Re: [basex-talk] Spectactularly slow performance with db:open vs. db:text
> Your second query executes in a fraction of a second - but only because the > optimizer successfully rewrites it to use the TEXT index. Exactly. A single context item reference (".") can be rewritten for index if the database statistics indicate that the addressed element – in your case order-id – has no other descendant elements. If it did have, you would get different results: see e.g.: XY = 'XY' → true In your case, I assume that your database statistics have got outdated (e.g. due to updates), or there may be order-id elements with elements as children.
Re: [basex-talk] Spectactularly slow performance with db:open vs. db:text
Hello Christian, Your second query executes in a fraction of a second - but only because the optimizer successfully rewrites it to use the TEXT index. When I remove the TEXT index so that the optimizer uses db:open-pre(), then the query takes some 88 minutes to run. (I did improve it to use a direct path). On the other hand, I did some benchmarking against different sized lookups. I've discovered that I am averageing just over 10 lookups per second against DB2 using db:open-pre, which is independent of the number of lookups performed per query. Presumably this is just an expensive sort of query to repeat thousands of times, so the major learning for me is to double-check that queries are rewritten against the TEXT index where possible. Thanks for helping me think this through. C. -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 15 February 2016 10:44 To: Hondros, Constantine (ELS-AMS) Cc: BaseX Subject: Re: [basex-talk] Spectactularly slow performance with db:open vs. db:text Hi Constantine, > for $a in (db:open('DB1')/item/order-id) return > if (db:open('DB2')//order-id[. = $a]) then > $a > else > () Do some of the order-id elements contain descendant elements? db:open('DB1')/item/order-id[*] If yes, the following query might be faster: for $a in (db:open('DB1')/item/order-id) return if (db:open('DB2')//order-id[text() = $a]) then $a else () Here is another way to rewrite the query: for $a in (db:open('DB1')/item/order-id) where db:open('DB2')//order-id[text() = $a] return $a > > > Note that the optimized query uses db:open-pre to access DB2. When I > re-write the query myself to use the TEXT index then performance is > excellent. But why such a difference? > > > > > > Query 2 [returns in 0.3 second] > > > > > > for $a in (db:open('DB1')/item/order-id) > > return > > if (db:text('DB2', $a)/parent::order-id) then > > $a > > else > >() > > > > > Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The > Netherlands, Registration No. 33156677, Registered in The Netherlands. Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677, Registered in The Netherlands.
Re: [basex-talk] Spectactularly slow performance with db:open vs. db:text
Hi Constantine, > for $a in (db:open('DB1')/item/order-id) > return > if (db:open('DB2')//order-id[. = $a]) then > $a > else > () Do some of the order-id elements contain descendant elements? db:open('DB1')/item/order-id[*] If yes, the following query might be faster: for $a in (db:open('DB1')/item/order-id) return if (db:open('DB2')//order-id[text() = $a]) then $a else () Here is another way to rewrite the query: for $a in (db:open('DB1')/item/order-id) where db:open('DB2')//order-id[text() = $a] return $a > > > Note that the optimized query uses db:open-pre to access DB2. When I > re-write the query myself to use the TEXT index then performance is > excellent. But why such a difference? > > > > > > Query 2 [returns in 0.3 second] > > > > > > for $a in (db:open('DB1')/item/order-id) > > return > > if (db:text('DB2', $a)/parent::order-id) then > > $a > > else > >() > > > > > Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The > Netherlands, Registration No. 33156677, Registered in The Netherlands.
[basex-talk] Spectactularly slow performance with db:open vs. db:text
Hello Basexers, I'm getting such a low performance on a relatively simple join between two databases that I feel there must be something going wrong here. I can provide the sources if necessary, but basically DB1 is 26 MB, about 80,000 small documents; DB2 is 47 MB, about 18,500 small documents. I'm using 8.4, by the way, haven't tested on other releases. Query 1 [returns in 144 minutes] for $a in (db:open('DB1')/item/order-id) return if (db:open('DB2')//order-id[. = $a]) then $a else () Note that the optimized query uses db:open-pre to access DB2. When I re-write the query myself to use the TEXT index then performance is excellent. But why such a difference? Query 2 [returns in 0.3 second] for $a in (db:open('DB1')/item/order-id) return if (db:text('DB2', $a)/parent::order-id) then $a else () Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677, Registered in The Netherlands.