RE: [MarkLogic Dev General] Surprising behavior with text node construction

David Sewell Fri, 14 Mar 2008 08:15:26 -0700

I would say that there is indeed a bug in ML Server reflected in these
cases. For proof, consider that in ML Server 3.2-5 this code:

  define function f() as text() { "dummy" }
  <out>{f(), f()}</out>

produces output (<out>dummy dummy</out>), while this code:

  define function f() as text() { "dummy" }
  concat(f(), f())

throws an error, "XDMP-AS: f() -- Invalid coercion: 'dummy' as  text()".

The first example should throw the same error, I think. I'll submit this
to ML support just to be sure it gets in the bug tracking queue.
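
For comparison, here is a minimal sketch of a spec-conformant version, written for a standard XQuery 1.0 processor (`declare function local:f` rather than the 0.9-ml `define function` form above, so ML Server 3.2 may not accept it as written). Under the XQuery 1.0 function conversion rules, values are only cast when the declared type is atomic, so returning the bare string "dummy" from a function declared `as text()` should raise a type error (XPTY0004) in both contexts. Wrapping the value in a computed text constructor satisfies the declaration:

```xquery
xquery version "1.0";

(: Return a real text node; a bare "dummy" is an xs:string and does not
   match the declared return type text(). :)
declare function local:f() as text()
{
  text { "dummy" }
};

(: The two text nodes are adjacent element content, so they are merged
   with no separator: the expected result is <out>dummydummy</out>. :)
<out>{ local:f(), local:f() }</out>
```

Swapping `text { "dummy" }` back to `"dummy"` in this sketch is a quick way to check whether a given processor enforces the declared return type.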

David


On Fri, 14 Mar 2008, Williams, Paul wrote:

> Ok, one more exercise then...
>
> This may model George's situation a little more closely.  This test case
> produces the same results we've seen before.  So, given the spec excerpt
> from Mike, the function f() appears to be returning a string rather than
> a constructed text node.  Why is that?
>
> define function f() as node() {"dummy"}
>
> <test>
>    <strings>{ for $i in 1 to 2 return f() }</strings>
>    <texts>{ for $i in 1 to 2 return text { "dummy" } }</texts>
> </test>
>
> -- Paul
>
> [land] 402.592.8218
> [cell]  402.203.2232
>
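
One way to see what is going on with f(): a string literal is an atomic xs:string value, not a node, so `as node()` can only be satisfied by constructing one, and a conformant processor should reject f() with a type error rather than silently returning the string. A sketch for a standard XQuery 1.0 processor (untested on ML Server 3.2):

```xquery
xquery version "1.0";

(: A string literal is an atomic xs:string value; only a constructor
   such as text { } produces a node.
   Expected result: false true true true :)
(
  "dummy" instance of node(),
  "dummy" instance of xs:string,
  text { "dummy" } instance of node(),
  text { "dummy" } instance of text()
)
```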
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Michael
> Blakeley
> Sent: Friday, March 14, 2008 3:18 AM
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Surprising behavior with text
> node construction
>
> We can make the test-case even shorter:
>
> <test>
>    <strings>{ for $i in 1 to 2 return "dummy" }</strings>
>    <texts>{ for $i in 1 to 2 return text { "dummy" } }</texts>
> </test>
>
> =>
>
> <test><strings>dummy dummy</strings><texts>dummydummy</texts></test>
>
> I believe that this is the specified behavior, from
> http://www.w3.org/TR/xquery/#id-content (elided for simplicity):
>
> > 1.e.i: For each adjacent sequence of one or more atomic values returned
> > by an enclosed expression, a new text node is constructed, containing
> > the result of casting each atomic value to a string, with a single space
> > character inserted between adjacent values.
>
> That matches "strings", above.
>
> > 3. Adjacent text nodes in the content sequence are merged into a single
> > text node by concatenating their contents, with no intervening blanks.
>
> And that matches "texts", above.
>
> -- Mike
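
The two rules Mike quotes can be checked side by side. This sketch, for a standard XQuery 1.0 processor and untested on ML Server 3.2, also shows a case not covered in the thread: rule 1.e.i only joins atomic values returned by the same enclosed expression, so `{ "a" }{ "b" }` produces two text nodes, which rule 3 then merges without a space.

```xquery
xquery version "1.0";

(: Expected string values of the children:
     one-expr    "a b"    (1.e.i: one enclosed expression, space-joined)
     two-exprs   "ab"     (two enclosed expressions, then rule 3 merge)
     text-nodes  "ab"     (rule 3 only: adjacent text nodes merged)
     mixed       "a bcd"  ("a b", then "c", then "d", merged by rule 3) :)
<demo>
  <one-expr>{ "a", "b" }</one-expr>
  <two-exprs>{ "a" }{ "b" }</two-exprs>
  <text-nodes>{ text { "a" }, text { "b" } }</text-nodes>
  <mixed>{ "a", "b", text { "c" }, "d" }</mixed>
</demo>
```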
>
> Williams, Paul wrote:
> > Sorry... not an answer, just more on the question...
> >
> > I reduced the sample code down to what I've included below in order to
> > wrap my head around this a little better.  This code shows both results
> > as George described.  But it focuses on the piece of the code that seems
> > pertinent.  Running this test produces this output...
> >
> > <test>
> >   <strings>dummy dummy</strings>
> >   <texts>dummydummy</texts>
> > </test>
> >
> > So why doesn't the explicit text constructor version in the "texts"
> > element produce the same space-joined single text node as the
> > auto-constructed version in the "strings" element?
> >
> > The "strings" version, I would assume, produces a set of strings first,
> > then decides it needs a text node and must construct it.  The "texts"
> > version, I assume, produces a set of text nodes first, then decides they
> > need to be concatenated.  But for the "strings" version to end up with
> > the space, it must be converting the set of strings into a set of text
> > nodes and then concatenating them into one.  So why doesn't that result
> > in the same output as the set of text nodes in the "texts" version?
> > Hmmm.  Curious.
> >
> > Sample code, try this in CQ ...
> > ----------------------------------------------------------------
> > <test>
> >   <strings>{ for $node in (<elem/>,<elem/>) return "dummy" }</strings>
> >   <texts>{ for $node in (<elem/>,<elem/>) return text{"dummy"} }</texts>
> > </test>
> > ----------------------------------------------------------------
> >
> > -- Paul
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of
> > Florentine, George
> > Sent: Thursday, March 13, 2008 7:04 PM
> > To: [email protected]
> > Subject: [MarkLogic Dev General] Surprising behavior with text
> > node construction
> >
> > I've run into an interesting behavior (optimization? bug?) in MarkLogic
> > and wanted to see what others thought of this.
> >
> > Here's the background - we have some code that dynamically generates
> > content by processing DITA topics. Depending upon the structure of the
> > content it's possible that our XQuery code may process two sequential
> > elements that would each return a text node from a function. What we see
> > is that in this case, only one text node is returned and its value is
> > the concatenation of the two string values separated by a single space
> > character. This is somewhat in line with the 2003 spec
> > (http://www.w3.org/TR/2003/WD-xquery-20030502/#doc-ComputedTextConstructor,
> > section 3.7.2.4), which states:
> >
> > ----
> > The content expression of a text node constructor is processed as
> > follows:
> > 1. Atomization is applied to the value of the content expression,
> > converting it to a sequence of atomic values.
> > 2. If the result of atomization is an empty sequence, no text node is
> > constructed. Otherwise, each atomic value in the atomized sequence is
> > cast into a string.
> > 3. The individual strings resulting from the previous step are merged
> > into a single string by concatenating them with a single space character
> > between each pair. The resulting string becomes the content of the
> > constructed text node.
> > -----
> >
> > So it appears that there's some optimization in the output generation of
> > nodes such that two sequential text nodes are collapsed into one.
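
Note that this excerpt describes a single computed text constructor whose content is a sequence of values; it does not say that two separately constructed text nodes are joined with a space. A sketch contrasting the two cases, for a standard XQuery 1.0 processor and untested on MarkLogic:

```xquery
xquery version "1.0";

(: One text constructor over a two-item sequence: one text node whose
   content is "dummy dummy" (space inserted by the constructor). :)
let $one := text { ("dummy", "dummy") }

(: Two text constructors: two separate text nodes, each "dummy". :)
let $two := (text { "dummy" }, text { "dummy" })

(: As element content, adjacent text nodes are merged with no separator,
   so <two> should contain "dummydummy" and <count> should contain 2. :)
return
  <p>
    <one>{ $one }</one>
    <two>{ $two }</two>
    <count>{ count($two) }</count>
  </p>
```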
> >
> > Below is a concrete code example. If you run the first code snippet in
> > CQ, the code generates the output <p>dummy dummy</p>, showing an example
> > of two calls to a function that should return two text nodes but only
> > returns one text node, with the return value of each call ("dummy")
> > concatenated into a single text node with a space character separating
> > the two.
> >
> > If you run the same code (second snippet) with the one change that the
> > function transform_dummy returns an explicitly constructed text node,
> > the output is <p>dummydummy</p> (no space character). This is the
> > behavior I was expecting, and it seems like the right behavior. Note
> > that the return type in the function signature for transform_dummy() is
> > text(), so I would assume that the xs:string "dummy" would be coerced
> > into a text node and that a text node would be returned from this
> > function in all cases.
> >
> > It seems bad that this behavior is different. I'd like to get other
> > perspectives on this.
> >
> > Thx,
> >
> > G
> > -------------------------------
> >
> > Code snippet 1 - no explicit text constructor in the function
> > transform_dummy, returns <p>dummy dummy</p>
> > -------------------------------
> >
> > define function transform_default_element($element as element()) as node()
> > {
> >     (: create a new element with the same name and attributes and
> >        recurse to travel the subtree. :)
> >     element
> >      {fn:node-name($element)}
> >      {$element/@*, transform_template($element/node())}
> > }
> > define function transform_dummy($element as element()) as text()
> > {
> >    "dummy"
> > }
> > define function transform_element($element as element()) as node()*
> > {
> >     (: branch to more specialized functions based on the type of
> >        element :)
> >     typeswitch ($element)
> >         case element(dummy)
> >             return transform_dummy($element)
> >         default
> >             return transform_default_element($element)
> > }
> > define function transform_template($nodes as node()*) as node()*
> > {
> >    for $node in $nodes
> >    return
> >        typeswitch ($node)
> >            case element()
> >                return transform_element($node)
> >            default
> >                (: PIs, text and comment nodes are outputted here :)
> >                return $node
> > }
> >
> > (: module start :)
> > let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
> > return transform_template($para/node())
> >
>
> > -----------------------------------------
>
> > Code snippet 2: explicit creation of text node in transform_dummy,
>
> > returns <p>dummydummy</p>
>
> > ------------------------------------------
>
> >
>
> > define function transform_default_element($element as element()) as
>
> > node()
>
> > {
>
> >     (: create a new element with the same name and attributes and
>
> > recurse to travel the subtree. :)
>
> >     element
>
> >      {fn:node-name($element)}
>
> >      {$element/@*,transform_template($element/node())}
>
> > }
>
> > define function transform_dummy($element as element()) as text()
>
> > {
>
> >    (: explicitly create a text node before returning :)
>
> >    text { "dummy" }
>
> > }
>
> > define function transform_element ( $element as element())  as node()*
>
> > {
>
> >     (: branch to more specialized functions based on the type of
> element
>
> > :)
>
> >     typeswitch ($element)
>
> >         case element(dummy)
>
> >             return transform_dummy($element)
>
> >         default
>
> >             return transform_default_element ($element)
>
> > }
>
> > define function transform_template ( $nodes as node()* )  as node()*
>
> > {
>
> >
>
> >    for $node in $nodes
>
> >    return
>
> >        typeswitch($node)
>
> >            case element()
>
> >                return transform_element($node)
>
> >             default
>
> >                 (: PIs, text and comment nodes are outputted here :)
>
> >                 return $node
>
> >  }
>
> >
>
> > (: module start :)
>
> >
>
> > let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
>
> > return transform_template($para/node())
>
> >
> > ---------------------------------------------------------------------------
> > George Florentine
> >
> > [EMAIL PROTECTED]
> >   O:  303.542.2173
> >   C:  303.669.8628
> >   F:  303.544.0522
> >   www.FlatironsSolutions.com
> >  An Inc. 500 Company
> >
> > _______________________________________________
> > General mailing list
> > [email protected]
> > http://xqzone.com/mailman/listinfo/general

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: [EMAIL PROTECTED]   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/