At 2010-02-19 12:22 -0500, Tony Mariella wrote:
> My data is very complex, much more than my example.
Ah, then creating your own hash will probably help a lot if the
processor isn't creating a hash of its own for node tree subset comparison.
> I could not get Geert's example to work in my dataset. Since there
> were so many tags in my xml, the "order by" section of the items and
> the "where not" section of the $item section are not working correctly.
The example below reduces the number of deep-equal() calls when
determining the nodes being returned.
I would be curious whether you see an improvement in execution time
on your large data set. Looking at my solution below more closely,
you could probably also introduce the same short-cut in the
uniqueness checking, thus reducing the number of deep-equal()
comparisons in the first loop.
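As a rough, untested sketch of that idea, the first loop could filter
the earlier items by a cheap key before calling deep-equal(). The
function names local:key and local:unique-items are hypothetical, and
using addr as the key is only an assumption based on the sample data;
any child that rarely matches across distinct items would do.

```xquery
(: Hypothetical helper: a cheap string key for an item. string-join()
   tolerates items with zero or several addr children. :)
declare function local:key ($item as node()) as xs:string
{
  string-join( $item/addr, '|' )
};

(: Sketch of the uniqueness pass with the short-cut: deep-equal() is
   only attempted against earlier items that share the same key. :)
declare function local:unique-items ($items as node()*) as node()*
{
  for $i at $ipos in $items
  let $k := local:key($i)
  let $before_i := subsequence( $items, 1, $ipos - 1 )
                     [local:key(.) = $k]
  where every $bi in $before_i
        satisfies not( deep-equal($bi, $i) )
  return $i
};
```

Two items with different keys can never be deep-equal, so skipping
those comparisons should not change which items are kept; whether it
saves time depends on how selective the key is on the real data.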
I hope this helps.
. . . . . . . . . . . Ken
T:\ftemp>type tony3.xml
<results>
  <item>
    <addr>24 Short Rd</addr>
    <city>Baltimore</city>
    <state>MD</state>
    <testVal/>
  </item>
  <item>
    <addr>24 Short Rd</addr>
    <city>Baltimore</city>
    <state>MD</state>
    <testVal/>
  </item>
  <item>
    <addr>24 Short Rd</addr>
    <city>Baltimore</city>
    <state>MD</state>
    <testVal>TEST1</testVal>
  </item>
  <item>
    <addr>55 Tall Rd</addr>
    <city>Orlando</city>
    <state>FL</state>
    <testVal/>
  </item>
  <item>
    <addr>55 Tall Rd</addr>
    <city>Orlando</city>
    <state>FL</state>
    <testVal/>
  </item>
  <item>
    <addr>55 Tall Rd</addr>
    <city>Orlando</city>
    <state>FL</state>
    <testVal/>
  </item>
</results>
T:\ftemp>call xquery tony3.xq
<?xml version="1.0" encoding="UTF-8"?>
<results>
  <item>
    <addr>24 Short Rd</addr>
    <city>Baltimore</city>
    <state>MD</state>
    <testVal>TEST1</testVal>
  </item>
  <item>
    <addr>55 Tall Rd</addr>
    <city>Orlando</city>
    <state>FL</state>
    <testVal/>
  </item>
</results>
T:\ftemp>type tony3.xq
declare function local:distinct-items ($items as node()*) as node()*
{
  (:walk through the information finding unique members:)
  let $unique := for $i at $ipos in $items
                 let $before_i := subsequence( $items, 1, $ipos - 1 )
                 where every $bi in $before_i
                       satisfies not( deep-equal($bi, $i) )
                 return $i
  (:rearrange the information to isolate the non-testVal info; the hash
    holds some unlikely equal subset of the complete tree, and for
    example I'll use the address (the addr child in the sample data):)
  let $interim := for $u in $unique
                  return <interim>
                           <hash>{$u/addr}</hash>
                           <compare>{$u/node() except $u/testVal}</compare>
                           {$u}
                         </interim>
  (:walk through the rearranged information de-duping those without a
    value for testVal and for those with testVal removing all the same
    without it:)
  for $each in $interim return
    if ( string( $each/item/testVal ) )
    then $each/item (:because this has testVal:)
    else if ( some $i in ($interim except $each)
                         [hash = $each/hash]
              satisfies deep-equal( $i/compare, $each/compare ) )
    then () (:because the other must have testVal:)
    else $each/item (:because none have testVal:)
};
<results>
{ local:distinct-items( doc('tony3.xml')/results/item ) }
</results>
T:\ftemp>rem Done!
--
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/q/
G. Ken Holman