RE: [MarkLogic Dev General] Surprising behavior with text node construction

Florentine, George Fri, 14 Mar 2008 06:45:01 -0700

Michael, thanks for the references; that makes sense.

The part that's still troubling to me is that the function declaration
states that the function should return a text node, not an atomic value.

So the behavior differs depending on whether you explicitly create a
text node in the function or rely on a cast from a string to a text
node. Does that behavior seem right to you?
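
One quick way to see the distinction in question is to test the type of
each value directly. The following is only a minimal sketch, not part of
the original code; it uses nothing beyond the standard "instance of"
operator:

```xquery
(
  (: false: a string literal is an atomic value, not a node :)
  "dummy" instance of text(),
  (: true: an explicitly constructed text node :)
  text { "dummy" } instance of text(),
  (: true: the literal is an xs:string :)
  "dummy" instance of xs:string
)
```

Applying the same "instance of text()" test to the result of
transform_dummy($element) should show which of the two the function
actually hands back. For comparison, the function conversion rules in
the XQuery 1.0 Recommendation do not turn a string into a text node to
satisfy an "as text()" declaration; a strictly conforming processor
would report a type error there rather than coerce the value.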

Thx,

G
---
George Florentine
VP, High Technology Content & Technical Publishing

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Michael Blakeley
Sent: Friday, March 14, 2008 2:18 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Surprising behavior with text node construction

We can make the test-case even shorter:

<test>
   <strings>{ for $i in 1 to 2 return "dummy" }</strings>
   <texts>{ for $i in 1 to 2 return text { "dummy" } }</texts>
</test>

=>
<test><strings>dummy dummy</strings><texts>dummydummy</texts></test>

I believe that this is the specified behavior, from 
http://www.w3.org/TR/xquery/#id-content (elided for simplicity):

> 1.e.i: For each adjacent sequence of one or more atomic values returned
> by an enclosed expression, a new text node is constructed, containing
> the result of casting each atomic value to a string, with a single
> space character inserted between adjacent values.

That matches "strings", above.

> 3. Adjacent text nodes in the content sequence are merged into a single
> text node by concatenating their contents, with no intervening blanks.

And that matches "texts", above.
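
The two rules can be seen side by side in a single query. This is a
minimal sketch rather than anything from the original code: the
<joined> element and the call to fn:string-join are additions for
illustration, showing one way to get the no-space result while still
working with plain strings.

```xquery
<test>
   <strings>{
      (: rule 1.e.i: adjacent atomic values become one text node,
         with a single space between them :)
      for $i in 1 to 2 return "dummy"
   }</strings>
   <texts>{
      (: rule 3: adjacent text nodes are merged with no separator :)
      for $i in 1 to 2 return text { "dummy" }
   }</texts>
   <joined>{
      (: illustrative addition: join the strings explicitly, so the
         enclosed expression returns a single atomic value :)
      fn:string-join((for $i in 1 to 2 return "dummy"), "")
   }</joined>
</test>
```

Because the enclosed expression in <joined> returns only one string,
rule 1.e.i has nothing to separate, and the element should serialize as
<joined>dummydummy</joined>, the same as <texts>.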

-- Mike

Williams, Paul wrote:
> Sorry... not an answer, just more on the question...
> 
> I reduced the sample code down to what I've included below in order to
> wrap my head around this a little better.  This code shows both results
> as George described.  But it focuses on the piece of the code that seems
> pertinent.  Running this test produces this output...
> 
> <test>
>   <strings>dummy dummy</strings>
>   <texts>dummydummy</texts>
> </test>
> 
> So why doesn't the explicit text constructor version in the "texts"
> element produce the same space-joined single text node as the
> auto-constructed version in the "strings" element?  
> 
> The "strings" version, I would assume, produces a set of strings first,
> then decides it needs a text node and must construct it.  The "texts"
> version, I assume, produces a set of text nodes first, then decides they
> need to be concatenated.  But for the "strings" version to end up with
> the space, it must be converting the set of strings into a set of text
> nodes and then concatenating into one.  So why doesn't that result in
> the same output as the set of text nodes in the "texts" version?  Hmmm.
> Curious.
> 
> Sample code, try this in CQ ...
> ----------------------------------------------------------------
> <test>
>   <strings>{ for $node in (<elem/>,<elem/>) return "dummy" }</strings>
>   <texts>{ for $node in (<elem/>,<elem/>) return text{"dummy"} }</texts>
> </test>
> ----------------------------------------------------------------
> 
> -- Paul
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] On Behalf Of Florentine, George
> Sent: Thursday, March 13, 2008 7:04 PM
> To: [EMAIL PROTECTED]
> Subject: [MarkLogic Dev General] Surprising behavior with text node construction
> 
> I've run into an interesting behavior (optimization? bug?) in MarkLogic
> and wanted to see what others think of it.
> 
> Here's the background: we have some code that dynamically generates
> content by processing DITA topics. Depending upon the structure of the
> content, our XQuery code may process two sequential elements that each
> return a text node from a function. What we see in this case is that
> only one text node is returned, and its value is the concatenation of
> the two string values separated by a single space character. This is
> somewhat in line with the 2003 spec
> (http://www.w3.org/TR/2003/WD-xquery-20030502/#doc-ComputedTextConstruct
> or section 3.7.2.4), which states:
> 
> ----
> The content expression of a text node constructor is processed as
> follows:
> 1. Atomization is applied to the value of the content expression,
> converting it to a sequence of atomic values.
> 2. If the result of atomization is an empty sequence, no text node is
> constructed. Otherwise, each atomic value in the atomized sequence is
> cast into a string.
> 3. The individual strings resulting from the previous step are merged
> into a single string by concatenating them with a single space character
> between each pair. The resulting string becomes the content of the
> constructed text node.
> -----
> 
> So it appears that there's some optimization in the output generation of
> nodes such that two sequential text nodes are collapsed into one.
> 
> Below is a concrete code example. If you run the 1st code snippet in CQ,
> the code generates the output <p>dummy dummy</p>: two calls to a
> function that should return two text nodes yield only one text node,
> with the return value of each call ("dummy") concatenated into a single
> text node with a space character separating the two.
> 
> If you run the same code (2nd snippet) with the one change that the
> function transform_dummy returns an explicitly constructed text node,
> the output is <p>dummydummy</p> (no space character). This is the
> behavior I was expecting and seems like the right behavior. Note that
> the return type in the function signature for transform_dummy() is
> text(), so I would assume that the xs:string "dummy" would be coerced
> into a text node and that a text node would be returned from this
> function in all cases.
> 
> It seems bad that this behavior is different. I'd like to get other
> perspectives on this.
> 
> Thx,
> 
> G
> -------------------------------
> 
> Code snippet 1 - no explicit text constructor in the function
> transform_dummy, returns <p>dummy dummy</p>
> -------------------------------
> 
> define function transform_default_element($element as element()) as node()
> {
>     (: create a new element with the same name and attributes and
>        recurse to travel the subtree. :)
>     element
>      {fn:node-name($element)}
>      {$element/@*,transform_template($element/node())}
> }
> define function transform_dummy($element as element()) as text()
> {
>    "dummy"
> }
> define function transform_element ( $element as element())  as node()*
> {
>     (: branch to more specialized functions based on the type of element :)
>     typeswitch ($element)
>         case element(dummy)
>             return transform_dummy($element)
>         default 
>             return transform_default_element ($element)
> }
> define function transform_template ( $nodes as node()* )  as node()* 
> {
>      
>    for $node in $nodes   
>    return 
>        typeswitch($node)
>            case element()                
>                return transform_element($node)
>             default
>                 (: PIs, text and comment nodes are outputted here :)
>                 return $node          
>  }
> 
> (: module start :)
> let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
> return transform_template($para/node())
> 
> -----------------------------------------
> Code snippet 2: explicit creation of text node in transform_dummy,
> returns <p>dummydummy</p>
> ------------------------------------------
> 
> define function transform_default_element($element as element()) as node()
> {
>     (: create a new element with the same name and attributes and
>        recurse to travel the subtree. :)
>     element
>      {fn:node-name($element)}
>      {$element/@*,transform_template($element/node())}
> }
> define function transform_dummy($element as element()) as text()
> {
>    (: explicitly create a text node before returning :)
>    text { "dummy" }
> }
> define function transform_element ( $element as element())  as node()*
> {
>     (: branch to more specialized functions based on the type of element :)
>     typeswitch ($element)
>         case element(dummy)
>             return transform_dummy($element)
>         default 
>             return transform_default_element ($element)
> }
> define function transform_template ( $nodes as node()* )  as node()* 
> {
>      
>    for $node in $nodes   
>    return 
>        typeswitch($node)
>            case element()                
>                return transform_element($node)
>             default
>                 (: PIs, text and comment nodes are outputted here :)
>                 return $node          
>  }
> 
> (: module start :)
> 
> let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
> return transform_template($para/node())
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general