Re: Question on why indexStat deamon is being triggered

Kristian Waagan Mon, 02 Apr 2012 11:38:49 -0700

On 02.04.2012 19:13, Mike Matrigali wrote:

Kristian Waagan wrote:

On 31.03.12 22:28, Mike Matrigali wrote:

Bergquist, Brett wrote:

I hitched up the debugger and caught the indexStat return "49" for the
number of rows. I walked the stack back and found this was for
conglomeratename = "f3ec4922-011d-491f-3d8c-0000376d74d3"

< snip >

First thought is I didn't think we needed any statistics at all
for a single column primary key.
That's probably DERBY-3790 ([1]).
I'll see if I can write a quick patch proposal and then we can takeit from there.
With daemon DERBY-3790 is probably more important than when peoplewere just
manually running it.  I think a good incremental change is just first
deal with single column unique indexes.  In that case I think we should
make there be no row in sysindexes and verify this is not a problem for
the optimizer because it understands unique index means that cardinality
stat is redundant. I think this should just be handled in updatestatistics code and then the daemon would not have to have special logic
about what index to go after.


Thanks, Mike.

I haven't written any code yet, but this looks like something that iseasily achievable with minor changes to the update statistics code.



I have not looked, but brett reported seeing a non-unique index being
created for a foreign key that matched the primary key.  I think this
is a very normal pattern for database design.  So may be worth looking

at as a next incremental step. I think it might have to be anon-unique index as at that level of the system we might not know thatthe keyis unique and always will be. But there might be room for anoptimization here. Maybe update statistics could determine that the

non-unique index is currently based on a unique index and not do
the scanning work, and instead just immediately add a stat row with
number of unique rows equal to current number of rows.  I think this
is easier than teaching the optimizer about this.  Again I would just
go after the single column case.  Once there is more than one column
we are going to scan whole index once for all stats, so not worth
worrying about unique index case.

This one requires more investigation (I don't know exactly what theindex descriptors look like yet for these), but I observed that inBrett's case the same conglomerate was scanned twice (seen from theistat output).

There should be entries in this catalog for every index on a table, so
it is not necessarily expected to be one per table. I think for each
index there is an entry for each column in the index (actually the stat
applies to multiple columns so in a 3 column index I think there is an
entry for (col1), (col1,col2), and (col1, col2, col3). But looking at
your ddl I would only expect 1 entry, as there is one
index on the table, and it has one column. I am not sure what the
expectation is for the foreign key that matches the primary key, maybe
it is 2 rows?

I do wonder if the foreign key is causing some extra issues. I think
in the long past we would end up creating 2 indexes in this case and
they would be exactly the same. So functionality was added to just
"logically" create the index and it would rely on the underlying
physical index sort of "shared" by the primary key on id and theforeign
constraint on id. We have had bugs in this area in the past and wonder
if it left an "orphan" row in the statistics.
As mentioned in another post, I think there's an easy way to avoidscanning the same index twice, but I need to make sure the check isvalid first.I'm thinking of simply checking if we've seen the index conglomeratenumber already, and if so, ignore that index. The question is, cantwo truly different indexes be represented by the same conglomeratenumber
Interesting that the second column it looks like it was created in2010.
seems like there might be multiple problems here. It would be great to
figure out how the multiple rows got in there in the first place. I
would suggest seeing what the system does for ddl in a brand new
database, turn off index stat thread, run your ddl, add some rows tothe
table, and hand run update statistics procedure.
Just for information, the istat daemon and the system procedure sharealmost all of the core code at the moment. The addition of automaticupdates of index cardinality statistics consisted mainly of addingthe "daemon framework" and triggering logic.
great, i would like that any fixes in the area of this bad row work for
both update stats triggered by the daemon and those triggered by direct
call by user.
I would not be surprised if existing code does not expect the number of
rows you are seeing. If this is really a buggy orphan row, probablybest
zero admin fix is to change the update statistics code to look for
other rows and delete them.
you can follow the update statistic code starting at:
java/engine/org/apache/derby/impl/sql/execute/AlterTableConstantAction/updateStatistics
There may be a index stat daemon bug here too, but I think it mightjust
be getting confused by the bad data in the catalog. Maybe it just keeps
reading all the rows and queueing work, but calling update statistics
only update 2 of the 3 rows.
Yes, the triggering code is driven by the existing statistics for thetable, whereas the update code is driven by the list of indexes onthe table.
Does the daemon when it determines an update is necessary, go throughthe udpate stat path telling it to update stats on all indexes or does
it call update stat one index at a time?


It goes through the path where all indexes are updated.
This can be changed in two locations if we wish to:

1) In the scheduling call (currently it takes a TableDescriptor as anargument)2) Inside the daemon itself, where we could (re-)check the status ofeach index before processing it.

When a statement is compiled the code checks the index cardinalitystatistics for staleness if there are qualifying indexes for the query.Once a stale index is found for a base table, the table descriptor isstashed aside to be submitted to the daemon as a unit of work (and westop checking the rest of the indexes of that table).

Currently, the only time a unit of work can be scheduled is when aselect statement using an index is compiled/recompiled.



--
Kristian

Re: Question on why indexStat deamon is being triggered

Reply via email to