One last tidbit ... Back to my original comment: some XPath expressions
are not being optimized to use indexes.
THIS expression (similar to my original) is not optimized to use
indexes, even now that the records are all separate docs.



declare variable $id := '2483417';
declare variable $c := xdmp:directory("/rxnorm/rxnconso/")//row[RXAUI eq $id];
declare variable $id2 as xs:string := $c/RXAUI/string(); 

for $r in
     xdmp:directory("/rxnorm/rxnsat/")//row[RXAUI eq $id2]
return $r


--------

My hunch is that if the expression in the XPath predicate cannot be
calculated during static analysis, then it is not optimized to use
indexes ... because THIS expression does use indexes:

         xdmp:directory("/rxnorm/rxnconso/")//row[RXAUI eq $id]
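
One way to check this hunch, rather than inferring it from timings, might be
to ask the server for its query plan for both forms. The following is only a
sketch: it assumes your server version provides xdmp:plan (I have not
confirmed which release introduced it), and it reuses the directories and the
id from the queries above.

```xquery
xquery version "1.0-ml";

(: Sketch only, assuming xdmp:plan is available on this server version. :)
(: The directories and id are the ones used earlier in this message.    :)
declare variable $id := '2483417';
declare variable $c := xdmp:directory("/rxnorm/rxnconso/")//row[RXAUI eq $id];
declare variable $id2 as xs:string := $c/RXAUI/string();

(
  (: predicate bound to a literal-valued variable: reported to use indexes :)
  xdmp:plan(xdmp:directory("/rxnorm/rxnconso/")//row[RXAUI eq $id]),

  (: predicate bound to a value computed at run time: the slow case :)
  xdmp:plan(xdmp:directory("/rxnorm/rxnsat/")//row[RXAUI eq $id2])
)
```

If the hunch is right, I would expect the two plans to differ in whether the
RXAUI comparison appears as an index term, but that is a guess until the
output is actually compared.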



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Lee, David
Sent: Monday, December 07, 2009 7:20 AM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Interesting case where ML
refuses to optimize XPath

I tried all of these variants as an experiment.
They all take approximately the same time, 4.5 sec (regardless of how many
times I run them).
Overnight my load of separate documents worked.  Interestingly the speed
of the load slowly increased from 5 TPS to 42 TPS so it finished
overnight.

This query now runs in better time : (1 sec the first time, .001 sec the
2nd time)

declare variable $id := '2483417';
declare variable $c := xdmp:directory("/rxnorm/rxnconso/")//row[RXAUI eq $id];
declare variable $id2 as xs:string := $c/RXAUI/string(); 

for $r
in cts:search(
     xdmp:directory("/rxnorm/rxnsat/"),
        cts:element-query( xs:QName("RXAUI") , $id2 ) )
     
return $r
-----------


Note that this is in a different DB (same machine), which I have
configured slightly differently ...
who knows what made the difference.   Maybe tonight I'll try loading
these 3 mil docs into the SAME DB as the file
(or vice versa).





-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Geert
Josten
Sent: Monday, December 07, 2009 2:58 AM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Interesting case where ML
refuses to optimize XPath

Hi David and others,

I really wonder what the performance difference would be between David's
original:

for $r in cts:search(doc("/RxNorm/rxnsat.xml")/rxnsat/row,
  cts:element-query( xs:QName("RXAUI") , $id2 ))
return $r

And these:

for $r in cts:search(
  doc("/RxNorm/rxnsat.xml")/rxnsat/row,
  cts:element-value-query( xs:QName("RXAUI") , $id2 )
)
return $r

for $r in cts:search(
  /rxnsat/row,
  cts:and-query((
    cts:element-value-query( xs:QName("RXAUI") , $id2 ),
    cts:document-query("/RxNorm/rxnsat.xml")
  ))
)
return $r

And

for $r in cts:search(
  //row,
  cts:and-query((
    cts:element-value-query( xs:QName("RXAUI") , $id2 ),
    cts:document-query("/RxNorm/rxnsat.xml")
  ))
)
where $r/parent::rxnsat
return $r

David, is it really only the for loop that takes 5 sec? Or are you doing
more in the same query? A cts:search taking 5 sec doesn't sound like a
cts:search that is relying only on indexes; there must be something else
going on, I am sure.

Kind regards,
Geert



Drs. G.P.H. Josten
Consultant


http://www.daidalos.nl/
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
http://www.daidalos.nl/
KvK 27164984
The information sent in or with this email message originates from
Daidalos BV and is intended solely for the addressee. If you have
received this message unintentionally, we ask that you delete it.
No rights can be derived from this message.


> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Lee, David
> Sent: Saturday, 5 December 2009 15:20
> To: [email protected]
> Subject: [MarkLogic Dev General] Interesting case where ML
> refuses to optimize XPath
>
> I have 2 xml docs, each about 1GB and about 2 mil fragments
> ("rows") each ... in fact the elements are called "rows".
>
> Each "row" element is about 500 bytes.   But I don't yet have
> a better way to fragment them.
>
> ( Yes, it's been suggested to split these into separate docs and
> I may experiment with that. )
>
>
>
> Here's a case where I've found ML refuses to optimize XPath expressions.
>
>
>
> First off, this expression takes about 5 seconds, which I
> find a little slow ...  it returns 8 rows.
>
>
>
>
>
> declare variable $id := '2483417';
>
> for $r in doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $id]
>
> return $r
>
>
>
>
>
> Now to complicate things I actually need $id from a previous
> query so the real query is like
>
>
>
>
>
> declare variable $id := '2483417';
>
> declare variable $c :=
> doc("/RxNorm/rxnconso.xml")/rxnconso/row[RXAUI eq $id];
>
> declare variable $id2 as xs:string := $c/RXAUI/string();
>
>
>
> for $r in doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $id2]
>
> return $r
>
>
>
> This takes about 1 minute! Checking the profile, I find
> the expression row[RXAUI eq $id2] is evaluated a million
> times ... indicating it's not using the indexes.
>
>
>
> I've tried all sorts of combinations of these like
>
>
>
> doc("/RxNorm/rxnsat.xml")/rxnsat/row[xs:string(RXAUI) eq $id2]
>
> doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $c/RXAUI]
>
> doc("/RxNorm/rxnsat.xml")/rxnsat/row/RXAUI[. eq $id2]/ancestor::row
>
>
>
>
>
> All to the same avail ... no indexing !
>
>
>
> But of course this brings things back to speed
>
>
>
> ---------
>
> for $r in cts:search(doc("/RxNorm/rxnsat.xml")/rxnsat/row,
>
> cts:element-query( xs:QName("RXAUI") , $id2 ))
>
> return $r
>
>
>
> ------------
>
>
>
>
>
> Still takes too long (about 5 sec) ... but it's back to
> realtime at least.
>
>
>
> I'm experimenting now with fields ...
>
>
>
> But I find it strange that I can't get the XPath expression to use
> the indexes in one case, but it does in another that seems
> almost identical to me.
>
>
>
> This expression
>
> declare variable $id2 as xs:string := $c/RXAUI/string();
>
>
>
> should tell the system that $id2 is a single string, so why
> won't it use it in XPath-based index queries?
>
>
>
>
>
>
>
>
>
> ----------------------------------------
>
> David A. Lee
>
> Senior Principal Software Engineer
>
> Epocrates, Inc.
>
> [email protected] <mailto:[email protected]>
>
> 812-482-5224
>
>
>
>
>
>

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
