Re: I need help on getting Derby to use a primary key index on a query

mike matrigali Fri, 23 Aug 2013 11:59:14 -0700

The best way to get help on this would be to create a new JIRA and post
a reproducible case against trunk there.  Is that your goal at this point?

I would suggest creating a new JIRA now, so the discussion can takeplace there.


In your "real" application does the query include the id explicitly
in the query, or do you use "?" and set the parameter separately.  These
2 execution options take different paths through the optimizer.

I agree that the "0" estimates look suspicious.  Maybe there is a bug

somewhere where an estimate starts at 0 and the optimizer startsmultiplying that 0 by other stuff and still gets 0.


As to how to build off the 10.9 branch vs. trunk.  You should create
a new directory and then to fill your svn client use the following
as the source (maybe http vs https).  This gives you the top of
the 10.9 branch.  you can then use some svn magic to get that branch
to whatever build number you are interested.  I tend to almost always
just work off the top during bug fixes.:

svn co https://svn.apache.org/repos/asf/db/derby/code/branches/10.9/ .

On 8/23/2013 11:04 AM, Bergquist, Brett wrote:

I just enabled the
derby.storage.indexStats.debug.keepDisposableStats=true in
derby.properties and restarted the network server.   I am running with
straight 10.9.1.0 right now and have a database with exactly two tables.
  I have used the "syscs_util.syscs_drop_statistics" to drop the
statistics and then uses the "syscs_util.syscs_update_statistics" to
recreate the statistics.   Here is the statistics query:

select tablename || ' (' || CONGLOMERATENAME || ')' as INDEX,
REFERENCEID,
CREATIONTIMESTAMP,
TYPE,
VALID,
COLCOUNT,
cast (STATISTICS as varchar(80)) stats,
       CASE WHEN CAST(creationtimestamp AS varchar(24))  IS NULL
       THEN 'Recreate Index to Initialize'
       ELSE CAST(creationtimestamp AS varchar(24))  END
from sys.systables t, sys.sysconglomerates c
     LEFT OUTER JOIN sys.sysstatistics s ON c.conglomerateid = s.referenceid
where t.tableid = c.tableid
and c.isindex = true and t.tabletype = 'T'
order by INDEX;

Here is the output from that:

INDEXREFERENCEIDCREATIONTIMESTAMPTYPEVALIDCOLCOUNTSTATS8
CONFIGURATION_BUNDLE3
(CONFIGURATION_BUNDLE3_IX_1)522b00e8-0140-ab92-f364-ffff93450f772013-08-23
13:54:30.812Itrue1numunique= 135 numrows= 70034812013-08-23 13:54:30.812
CONFIGURATION_BUNDLE3
(CONFIGURATION_BUNDLE3_IX_2)597e80e5-0140-ab92-f364-ffff93450f772013-08-23
13:54:21.516Itrue1numunique= 92 numrows= 70034812013-08-23 13:54:21.516
CONFIGURATION_BUNDLE3
(SQL130823102831500)d7bd80dd-0140-ab92-f364-ffff93450f772013-08-23
13:54:13.129Itrue1numunique= 7003481 numrows= 70034812013-08-23 13:54:13.129
COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3
(COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3_IX_0)46f7c11b-0140-ab92-f364-ffff93450f772013-08-23
13:57:22.96Itrue1numunique= 53053312 numrows= 582481282013-08-23 13:57:22.96
COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3
(SQL130823102836170)4760c0fe-0140-ab92-f364-ffff93450f772013-08-23
13:55:30.662Itrue2numunique= 58248128 numrows= 582481282013-08-23
13:55:30.662
COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3
(SQL130823102836170)4760c0fe-0140-ab92-f364-ffff93450f772013-08-23
13:55:30.661Itrue1numunique= 910127 numrows= 582481282013-08-23 13:55:30.661
COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3
(SQL130823102836240)f8e18104-0140-ab92-f364-ffff93450f772013-08-23
13:56:24.379Itrue1numunique= 910127 numrows= 582481282013-08-23 13:56:24.379


So as can be seen, the primary key backing index
of CONFIGURATION_BUNDLE3 shows that the are 7003481 unique rows
and COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3
shows 58248128 unique rows.

The query that I am performing is:

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1);
SELECT * FROM APP.CONFIGURATION_BUNDLE3 T1,
APP.COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3 t0
WHERE t1.ID = t0.COSEDDROPPROFILEDSCPTABLEBUNDLE_ID and
t0.COSEDDROPPROFILEDSCPTABLEENTRY_ID = 12344444;
VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0);
CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(0);

The optimizer plan from this is below.   The plan correctly chooses to
use the index on the COSEDDROPPROFILEDSCPTABLEENTRY_ID on
the  COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3
table but still estimates 0 rows (there is actually 1 row) and then uses
a table scan on CONFIGURATION_BUNDLE3 even though the join
field APP.CONFIGURATION_BUNDLE3.ID is the primary key.

I have so far been able to do nothing that will cause this to operate
correctly now that it is in this state.   Dropping and recreating
statistics has no effect.   I will try dropping and recreating the
indexes and primary key next.

Here is the query plan:

Statement Name:
SQL_CURLH000C2
Statement Text:
SELECT * FROM APP.CONFIGURATION_BUNDLE3 T1,
APP.COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3 t0
WHERE t1.ID = t0.COSEDDROPPROFILEDSCPTABLEBUNDLE_ID and
t0.COSEDDROPPROFILEDSCPTABLEENTRY_ID = 12344444
Parse Time: 2
Bind Time: 3
Optimize Time: 16
Generate Time: 3
Compile Time: 24
Execute Time: 9609
Begin Compilation Timestamp : 2013-08-23 13:59:44.959
End Compilation Timestamp : 2013-08-23 13:59:44.983
Begin Execution Timestamp : 2013-08-23 13:59:44.986
End Execution Timestamp : 2013-08-23 13:59:54.595
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 1
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 1
next time (milliseconds) = 9608
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 0.00
optimizer estimated cost: 7.86
Source result set:
Nested Loop Join ResultSet:
Number of opens = 1
Rows seen from the left = 1
Rows seen from the right = 1
Rows filtered = 0
Rows returned = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 1
next time (milliseconds) = 9608
close time (milliseconds) = 0
optimizer estimated row count: 0.00
optimizer estimated cost: 7.86
Left result set:
Index Row to Base Row ResultSet for
COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3:
Number of opens = 1
Rows seen = 1
Columns accessed from heap = {0, 1}
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 0.00
optimizer estimated cost: 7.86
Index Scan ResultSet for
COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3 using
index
COSEDDROPPROFILEDSCPTABLEBUNDLE_COSEDDROPPROFILEDSCPTABLEENTRY3_IX_0 at
read committed isolation level using instantaneous share row locking
chosen by the optimizer
Number of opens = 1
Rows seen = 1
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0

scan information:
Bit set of columns fetched=All
Number of columns fetched=2
Number of deleted rows visited=0
Number of pages visited=5
Number of rows qualified=1
Number of rows visited=2
Scan type=btree
Tree height=5
start position:
 >= on first 1 column(s).
Ordered null semantics on the following columns:
0
stop position:
 > on first 1 column(s).
Ordered null semantics on the following columns:
0
qualifiers:
None
optimizer estimated row count: 0.00
optimizer estimated cost: 7.86

Right result set:
Table Scan ResultSet for CONFIGURATION_BUNDLE3 at read committed
isolation level using instantaneous share row locking chosen by the
optimizer
Number of opens = 1
Rows seen = 1
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 1
next time (milliseconds) = 9607
close time (milliseconds) = 1
next time in milliseconds/row = 9607

scan information:
Bit set of columns fetched=All
Number of columns fetched=4
Number of pages visited=197281
Number of rows qualified=1
Number of rows visited=7003481
Scan type=heap
start position:
null
stop position:
null
qualifiers:
Column[0][0] Id: 3
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
optimizer estimated row count: 0.00
optimizer estimated cost: 0.00


On Aug 23, 2013, at 1:10 PM, Mamta Satoor <[email protected]
<mailto:[email protected]>> wrote:

Hi Brett,

We have encountered and fixed at least one bug in case of
single-column unique index, where starting 10.9 we have decided not to
keep stats for such indexes. The bug fixed is DERBY-6045 and it is in
Derby 10.10.1.1(revision 1480153). I was wondering if you could share
java org.apache.derby.tools.sysinfo so we know if you have that fix.

If you do have that fix, then it is possible (though I am not certain)
that there may be other bug in this area that
you are running into. One way to figure that out is to use the
property derby.storage.indexStats.debug.keepDisposableStats=true. This
property will force Derby to collect the stats for single-column
unique index. So, if you can recreate your db with this property and
run the query, we can check the query plan to see if it starts using
the single-column unique index. It might be possible to use existing
db with this property and recreate the stats and try the query if
creating the db from scratch with this property is a time consuming
process.

thanks,
Mamta



On Wed, Aug 21, 2013 at 4:55 PM, Bryan Pendleton
<[email protected] <mailto:[email protected]>> wrote:

        Any thoughts or comments on why this the optimizer chooses on
        over the other?


    Unfortunately, no.

    But there have been concerns about the optimizer's cost
    analysis for years; see
    https://issues.apache.org/__jira/browse/DERBY-1905
    <https://issues.apache.org/jira/browse/DERBY-1905>
    and some of the other issues linked from it (DERBY-1259, DERBY-1260).

    So, while I know it's frustrating for you to be wrestling with
    an optimizer that won't behave, it's also good news, for Derby,
    that you've made such good progress in trying to understand
    this peculiar behavior.

    Perhaps when we get to the bottom of the strange behavior
    you're seeing, we'll have a better understanding of some of
    the other strange optimizer behaviors we've seen in the past.

    thanks,

    bryan

Re: I need help on getting Derby to use a primary key index on a query

Reply via email to