Michael, thx for the references - makes sense. The part that's still troubling to me is that the function declaration states that the function should return a text node, not an atomic value.
So the behavior is different depending upon whether you explicitly create a text node in the function vs. relying on a cast from a string to a text node. Does that behavior seem right to you?

Thx,

G

---
George Florentine
VP, High Technology Content & Technical Publishing
[EMAIL PROTECTED]
O: 303.542.2173
C: 303.669.8628
F: 303.544.0522
www.FlatironsSolutions.com
An Inc. 500 Company

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Blakeley
Sent: Friday, March 14, 2008 2:18 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Surprising behavior with text node construction

We can make the test-case even shorter:

<test>
  <strings>{ for $i in 1 to 2 return "dummy" }</strings>
  <texts>{ for $i in 1 to 2 return text { "dummy" } }</texts>
</test>

=>

<test><strings>dummy dummy</strings><texts>dummydummy</texts></test>

I believe that this is the specified behavior, from http://www.w3.org/TR/xquery/#id-content (elided for simplicity):

> 1.e.i: For each adjacent sequence of one or more atomic values returned
> by an enclosed expression, a new text node is constructed, containing
> the result of casting each atomic value to a string, with a single space
> character inserted between adjacent values.

That matches "strings", above.

> 3. Adjacent text nodes in the content sequence are merged into a single
> text node by concatenating their contents, with no intervening blanks.

And that matches "texts", above.

-- Mike

Williams, Paul wrote:
> Sorry... not an answer, just more on the question...
>
> I reduced the sample code down to what I've included below in order to
> wrap my head around this a little better. This code shows both results
> as George described. But it focuses on the piece of the code that seems
> pertinent. Running this test produces this output...
>
> <test>
>   <strings>dummy dummy</strings>
>   <texts>dummydummy</texts>
> </test>
>
> So why doesn't the explicit text constructor version in the "texts"
> element produce the same space-joined single text node as the
> auto-constructed version in the "strings" element?
>
> The "strings" version, I would assume, produces a set of strings first,
> then decides it needs a text node and must construct it. The "texts"
> version, I assume, produces a set of text nodes first, then decides they
> need to be concatenated. But for the "strings" version to end up with
> the space, it must be converting the set of strings into a set of text
> nodes and then concatenating them into one. So why doesn't that result in
> the same output as the set of text nodes in the "texts" version? Hmm.
> Curious.
>
> Sample code, try this in CQ ...
> ----------------------------------------------------------------
> <test>
>   <strings>{ for $node in (<elem/>,<elem/>) return "dummy" }</strings>
>   <texts>{ for $node in (<elem/>,<elem/>) return text{"dummy"} }</texts>
> </test>
> ----------------------------------------------------------------
>
> -- Paul
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Florentine, George
> Sent: Thursday, March 13, 2008 7:04 PM
> To: [email protected]
> Subject: [MarkLogic Dev General] Surprising behavior with text
> node construction
>
> I've run into an interesting behavior (optimization? bug?) in MarkLogic
> and wanted to see what others thought of this.
>
> Here's the background - we have some code that dynamically generates
> content by processing DITA topics. Depending upon the structure of the
> content, it's possible that our XQuery code may process two sequential
> elements that would each return a text node from a function. What we see
> is that in this case, only one text node is returned and its value is
> the concatenation of the two string values separated by a single space
> character.
> This is somewhat in line with the 2003 working draft of the spec
> (http://www.w3.org/TR/2003/WD-xquery-20030502/#doc-ComputedTextConstructor,
> section 3.7.2.4), which states:
>
> ----
> The content expression of a text node constructor is processed as
> follows:
> 1. Atomization is applied to the value of the content expression,
> converting it to a sequence of atomic values.
> 2. If the result of atomization is an empty sequence, no text node is
> constructed. Otherwise, each atomic value in the atomized sequence is
> cast into a string.
> 3. The individual strings resulting from the previous step are merged
> into a single string by concatenating them with a single space character
> between each pair. The resulting string becomes the content of the
> constructed text node.
> -----
>
> So it appears that there's some optimization in the output generation of
> nodes such that two sequential text nodes are collapsed into one.
>
> Below is a concrete code example. If you run the 1st code snippet in CQ,
> the code generates the output <p>dummy dummy</p>. The function is called
> twice and should return two text nodes, but the output contains only one
> text node, with the two return values ("dummy") joined by a single
> space character.
>
> If you run the same code (2nd snippet) with one change, namely that the
> function transform_dummy returns a text node built with an explicit text
> constructor, the output is <p>dummydummy</p> (no space character). This
> is the behavior I was expecting and seems like the right behavior. Note
> that the return type in the function signature for transform_dummy() is
> text(), so I would assume that the xs:string "dummy" would be coerced
> into a text node and that a text node would be returned from this
> function in all cases.
>
> It seems bad that this behavior is different. I'd like to get other
> perspectives on this.
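The three steps of the 2003 working-draft rule quoted above can be modeled in a few lines. What follows is a hypothetical Python sketch. It is not XQuery and not MarkLogic's implementation. Python strings and integers stand in for atomic values, and `TextNode`, `atomize`, and `text_constructor` are illustrative names. The sketch shows what the rule actually covers: a single `text { }` constructor whose content is a sequence of several values. Two separate constructors still produce two nodes.

```python
from typing import Iterable, List, Optional, Union


class TextNode:
    """Hypothetical stand-in for an XDM text node."""

    def __init__(self, content: str) -> None:
        self.content = content

    def __repr__(self) -> str:
        return f"TextNode({self.content!r})"


# Python str/int play the role of XQuery atomic values in this model.
Atomic = Union[str, int]
Item = Union[Atomic, TextNode]


def atomize(items: Iterable[Item]) -> List[Atomic]:
    # Step 1: a text node atomizes to its string value; atomic values pass through.
    return [i.content if isinstance(i, TextNode) else i for i in items]


def text_constructor(content: Iterable[Item]) -> Optional[TextNode]:
    """Model of text { content } as described in the 2003 draft, section 3.7.2.4."""
    atoms = atomize(content)
    if not atoms:
        # Step 2: an empty sequence constructs no text node.
        return None
    # Steps 2-3: cast each value to a string, then join with single spaces.
    return TextNode(" ".join(str(a) for a in atoms))


# One constructor over a two-value sequence -> one space-joined node.
print(text_constructor(["dummy", "dummy"]))  # TextNode('dummy dummy')
# Two separate constructors -> two separate nodes.
print([text_constructor(["dummy"]), text_constructor(["dummy"])])
```

In this model the space comes only from joining values inside one constructor. Two adjacent results of separate constructor calls are never joined here, so this rule does not by itself explain the `<p>dummy dummy</p>` output.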
>
> Thx,
>
> G
> -------------------------------
>
> Code snippet 1 - no explicit text constructor in the function
> transform_dummy, returns <p>dummy dummy</p>
> -------------------------------
>
> define function transform_default_element($element as element()) as node()
> {
>   (: create a new element with the same name and attributes and
>      recurse to travel the subtree. :)
>   element {fn:node-name($element)}
>     {$element/@*, transform_template($element/node())}
> }
>
> define function transform_dummy($element as element()) as text()
> {
>   "dummy"
> }
>
> define function transform_element($element as element()) as node()*
> {
>   (: branch to more specialized functions based on the type of element :)
>   typeswitch ($element)
>     case element(dummy)
>       return transform_dummy($element)
>     default
>       return transform_default_element($element)
> }
>
> define function transform_template($nodes as node()*) as node()*
> {
>   for $node in $nodes
>   return
>     typeswitch ($node)
>       case element()
>         return transform_element($node)
>       default
>         (: PIs, text and comment nodes are outputted here :)
>         return $node
> }
>
> (: module start :)
> let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
> return transform_template($para/node())
>
> -----------------------------------------
> Code snippet 2: explicit creation of text node in transform_dummy,
> returns <p>dummydummy</p>
> ------------------------------------------
>
> define function transform_default_element($element as element()) as node()
> {
>   (: create a new element with the same name and attributes and
>      recurse to travel the subtree. :)
>   element {fn:node-name($element)}
>     {$element/@*, transform_template($element/node())}
> }
>
> define function transform_dummy($element as element()) as text()
> {
>   (: explicitly create a text node before returning :)
>   text { "dummy" }
> }
>
> define function transform_element($element as element()) as node()*
> {
>   (: branch to more specialized functions based on the type of element :)
>   typeswitch ($element)
>     case element(dummy)
>       return transform_dummy($element)
>     default
>       return transform_default_element($element)
> }
>
> define function transform_template($nodes as node()*) as node()*
> {
>   for $node in $nodes
>   return
>     typeswitch ($node)
>       case element()
>         return transform_element($node)
>       default
>         (: PIs, text and comment nodes are outputted here :)
>         return $node
> }
>
> (: module start :)
>
> let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
> return transform_template($para/node())
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
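Michael's explanation can be checked against a small model of the two content rules he quotes (1.e.i and 3). What follows is a hypothetical Python sketch. It is not XQuery and not MarkLogic's implementation. It covers the result of one enclosed expression only, and it uses Python strings as atomic values. `TextNode`, `ElementNode`, and `enclosed_content` are illustrative names. Two strings as input reproduce the `strings` result, and two text nodes reproduce the `texts` result.

```python
from typing import Iterable, List, Union


class TextNode:
    """Hypothetical stand-in for an XDM text node."""

    def __init__(self, content: str) -> None:
        self.content = content


class ElementNode:
    """Hypothetical stand-in for an element node (only its name is kept)."""

    def __init__(self, name: str) -> None:
        self.name = name


Node = Union[TextNode, ElementNode]
Item = Union[str, Node]  # a Python str plays the role of an atomic value


def enclosed_content(items: Iterable[Item]) -> List[Node]:
    """Model of content rules 1.e.i and 3 for the result of ONE enclosed expression."""
    # Rule 1.e.i: each run of adjacent atomic values becomes one new text
    # node whose content is the values joined by single spaces.
    nodes: List[Node] = []
    run: List[str] = []
    for item in items:
        if isinstance(item, str):
            run.append(item)
            continue
        if run:
            nodes.append(TextNode(" ".join(run)))
            run = []
        nodes.append(item)
    if run:
        nodes.append(TextNode(" ".join(run)))

    # Rule 3: adjacent text nodes are merged by concatenation, with no blanks.
    merged: List[Node] = []
    for node in nodes:
        prev = merged[-1] if merged else None
        if isinstance(node, TextNode) and isinstance(prev, TextNode):
            merged[-1] = TextNode(prev.content + node.content)
        else:
            merged.append(node)
    return merged


# <strings>: the loop returns two xs:string values.
print([n.content for n in enclosed_content(["dummy", "dummy"])])  # ['dummy dummy']
# <texts>: the loop returns two text nodes.
print([n.content for n in enclosed_content([TextNode("dummy"), TextNode("dummy")])])  # ['dummydummy']
```

The two rules run in order. A run of atomic values is first space-joined into one text node (1.e.i), and only then merged with neighboring text nodes by plain concatenation (3). So in the model, a string next to an existing text node gets no space: `enclosed_content([TextNode("x"), "y"])` yields a single node containing `xy`.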
