On 11.02.2011 23:29, Mike Matrigali wrote:
Thanks, could you take a look at DERBY-4211.  It looks like
the stat updater is running, but I don't think it should be.
basically what would you expect to happen on a newly created
table, that then has 7 rows added to it.

I've only looked briefly at the test, and here are my thoughts about what's going on: o some of the tables in the test are created, populated and then having an index created. Since the table is not empty, the index creation will cause statistics to be generated.
 o queries in the test will then cause the istat scheduling logic to fire.
o due to inaccurate row estimates for the table the istat incorrectly schedules an update.

My opinion (after having looked very quickly at this) is that the istat code is doing as it should with the current parameters. The bad behavior is caused by a combination of poor information quality (the row estimate), the low number of rows in the table ("defeats" the logarithmic threshold), and the istat configuration (absdiff=0). Since the row estimate is exactly that - an estimate - it may be wise to reintroduce the absdiff parameter to avoid problems like these for small tables. At least it should be simple to change its value and re-run the test to see if the istat work is still happening or not (note that the value quoted below is wrong - derby.storage.indexStats.debug.absdiffThreshold is currently set to zero).

There are at least two issues with the row estimate handling:
 o not logged
o there are two ways to update the estimate: using an absolute value, or using deltas. In some cases these two ways interfere, i.e. changes already reflected by a set absolute value are also applied afterwards as delta operations.


case one: then queries are run from ij

If the stat updater is running for case one, where there are no indexes, that's certainly a bug!


Regards,
--
Kristian

case two: an index is created on the table, and then queries are run from ij

Kristian Waagan wrote:
On 11.02.11 20:11, Mike Matrigali wrote:
From DERBY-4934 i see there are the following thresholds:
 a) derby.storage.indexStats.debug.createThreshold (100)
 b) derby.storage.indexStats.debug.absdiffThreshold (1000)
 c) derby.storage.indexStats.debug.lndiffThreshold (1.0)
 d) derby.storage.indexStats.debug.queueSize (5)


My question is that I don't understand how they are expected to interact. If a table has less than 100 rows does that mean
stat will not be created even if b or c is exceeded.

Hi Mike,

To start with, you can probably ignore threshold (d) for now.
It applies to the scheduling phase - that is when the unit of work is
scheduled with the daemon - and to get that far (at least) one of the
other thresholds has to be exceeded. If the queue is full the unit of
work won't be scheduled, and another attempt may be made at a later time
during another statement compilation. This requires that someone
actually compiles a relevant query, or potentially that the existing
statement is recompiled (stale plan check).
The purpose of (d) is to avoid excessive queue growth. Since the queue
is implemented as a list, searching it for duplicates may also be
expensive if it grows too large.

Threshold (a) concerns indexes without existing statistics. If there are
less than 100 rows in the base table, statistics won't be created.

Thresholds (b) and (c) concern indexes with existing statistics.
Threshold (b) was introduced to avoid too frequent updates of existing
statistics for small tables. I don't remember off the top of my head
where it was discussed, but I ended up effectively removing it by
setting it to zero for now. I kept the property (and the relevant code)
to allow people to experiment somewhat without having to recompile the
code if they have an application running into trouble with this scenario.
Finally, the main threshold for existing statistics is (c). Here the
natural logarithms of the row estimate of the index statistics and the
row estimate of the base table are compared. If the difference is
greater than or equal to lndiffThreshold (defaults to 1.0) the
statistics for the index are scheduled for update. If the daemon queue
is full the request is discarded, assuming another compilation will
manage to schedule the update eventually.


Hope this helped a bit, feel free to ask additional questions. As I have
said before, these threshold may have to be changed significantly as we
test the feature (remove existing, add new ones, or modify existing ones).


Cheers,


Reply via email to