On 31.03.12 22:28, Mike Matrigali wrote:
Bergquist, Brett wrote:
I hitched up the debugger and caught the indexStat return "49" for the
number of rows. I walked the stack back and found this was for
conglomeratename = "f3ec4922-011d-491f-3d8c-0000376d74d3"
So I did this query:
select
c.TABLEID,
c.CONGLOMERATENUMBER,
c.CONGLOMERATENAME,
c.ISINDEX,
c.ISCONSTRAINT,
c.CONGLOMERATEID,
t.TABLEID,
t.TABLENAME,
t.TABLETYPE,
s.STATID,
s.REFERENCEID,
s.TABLEID,
s.CREATIONTIMESTAMP,
s.TYPE,
s.VALID,
s.COLCOUNT,
CAST(STATISTICS AS VARCHAR(40)) as STATISTICS
from sys.SYSCONGLOMERATES c join sys.SYSTABLES t on c.TABLEID =
t.TABLEID join sys.SYSSTATISTICS s on s.TABLEID = t.TABLEID where
c.CONGLOMERATENAME = 'f3ec4922-011d-491f-3d8c-0000376d74d3'
Which is for one of the tables that I am having problems with. This
returned:
TABLEID CONGLOMERATENUMBER CONGLOMERATENAME ISINDEX ISCONSTRAINT
CONGLOMERATEID TABLEID TABLENAME TABLETYPE STATID REFERENCEID TABLEID
CREATIONTIMESTAMP TYPE VALID COLCOUNT STATISTICS
f3ec4922-011d-491f-3d8c-0000376d74d3 30768
f3ec4922-011d-491f-3d8c-0000376d74d3 false false
0e34c923-011d-491f-3d8c-0000376d74d3
f3ec4922-011d-491f-3d8c-0000376d74d3 BC01_CONFIGURATION_SET T
645c405f-0136-6999-c1b4-000065089f97
2c5f8294-012b-3c38-b55c-000043ea6398
f3ec4922-011d-491f-3d8c-0000376d74d3 2012-03-31 12:50:05.348 I true 1
numunique= 20334 numrows= 20334
f3ec4922-011d-491f-3d8c-0000376d74d3 30768
f3ec4922-011d-491f-3d8c-0000376d74d3 false false
0e34c923-011d-491f-3d8c-0000376d74d3
f3ec4922-011d-491f-3d8c-0000376d74d3 BC01_CONFIGURATION_SET T
49fc4ea2-0129-489a-0a26-00000732b350
592fcc26-011d-491f-3d8c-0000376d74d3
f3ec4922-011d-491f-3d8c-0000376d74d3 2010-06-17 23:16:40.018 I true 1
numunique= 49 numrows= 49
f3ec4922-011d-491f-3d8c-0000376d74d3 30768
f3ec4922-011d-491f-3d8c-0000376d74d3 false false
0e34c923-011d-491f-3d8c-0000376d74d3
f3ec4922-011d-491f-3d8c-0000376d74d3 BC01_CONFIGURATION_SET T
2c44c05e-0136-6999-c1b4-000065089f97
e9a40921-011d-491f-3d8c-0000376d74d3
f3ec4922-011d-491f-3d8c-0000376d74d3 2012-03-31 12:50:03.427 I true 1
numunique= 20334 numrows= 20334
So there seems to be 3 statistics rows for this table. And low and
behold there is the "numrows = 49" which I see the indexStat when I
turn on tracing.
First thought is I didn't think we needed any statistics at all
for a single column primary key.
That's probably DERBY-3790 ([1]).
I'll see if I can write a quick patch proposal and then we can take it
from there.
There should be entries in this catalog for every index on a table, so
it is not necessarily expected to be one per table. I think for each
index there is an entry for each column in the index (actually the stat
applies to multiple columns so in a 3 column index I think there is an
entry for (col1), (col1,col2), and (col1, col2, col3). But looking at
your ddl I would only expect 1 entry, as there is one
index on the table, and it has one column. I am not sure what the
expectation is for the foreign key that matches the primary key, maybe
it is 2 rows?
I do wonder if the foreign key is causing some extra issues. I think
in the long past we would end up creating 2 indexes in this case and
they would be exactly the same. So functionality was added to just
"logically" create the index and it would rely on the underlying
physical index sort of "shared" by the primary key on id and the foreign
constraint on id. We have had bugs in this area in the past and wonder
if it left an "orphan" row in the statistics.
As mentioned in another post, I think there's an easy way to avoid
scanning the same index twice, but I need to make sure the check is
valid first.
I'm thinking of simply checking if we've seen the index conglomerate
number already, and if so, ignore that index. The question is, can two
truly different indexes be represented by the same conglomerate number?
Interesting that the second column it looks like it was created in 2010.
seems like there might be multiple problems here. It would be great to
figure out how the multiple rows got in there in the first place. I
would suggest seeing what the system does for ddl in a brand new
database, turn off index stat thread, run your ddl, add some rows to the
table, and hand run update statistics procedure.
Just for information, the istat daemon and the system procedure share
almost all of the core code at the moment. The addition of automatic
updates of index cardinality statistics consisted mainly of adding the
"daemon framework" and triggering logic.
I would not be surprised if existing code does not expect the number of
rows you are seeing. If this is really a buggy orphan row, probably best
zero admin fix is to change the update statistics code to look for
other rows and delete them.
you can follow the update statistic code starting at:
java/engine/org/apache/derby/impl/sql/execute/AlterTableConstantAction/updateStatistics
There may be a index stat daemon bug here too, but I think it might just
be getting confused by the bad data in the catalog. Maybe it just keeps
reading all the rows and queueing work, but calling update statistics
only update 2 of the 3 rows.
Yes, the triggering code is driven by the existing statistics for the
table, whereas the update code is driven by the list of indexes on the
table.
--
Kristian
[1] https://issues.apache.org/jira/browse/DERBY-3790
< snip >