Re: question about istats thresholds

Kristian Waagan Tue, 15 Feb 2011 11:26:31 -0800

On 11.02.2011 23:29, Mike Matrigali wrote:

Thanks, could you take a look at DERBY-4211.  It looks like
the stat updater is running, but I don't think it should be.
Basically, what would you expect to happen on a newly created
table that then has 7 rows added to it?

I've only looked briefly at the test, and here are my thoughts about
what's going on:
 o some of the tables in the test are created, populated and then
   have an index created. Since the table is not empty, the index
   creation will cause statistics to be generated.

 o queries in the test will then cause the istat scheduling logic to fire.

 o due to inaccurate row estimates for the table the istat incorrectly
   schedules an update.

My opinion (after having looked very quickly at this) is that the istat
code is doing as it should with the current parameters. The bad behavior
is caused by a combination of poor information quality (the row
estimate), the low number of rows in the table ("defeats" the
logarithmic threshold), and the istat configuration (absdiff=0).
Since the row estimate is exactly that - an estimate - it may be wise to
reintroduce the absdiff parameter to avoid problems like these for small
tables. At least it should be simple to change its value and re-run the
test to see if the istat work is still happening or not (note that the
value quoted below is wrong -
derby.storage.indexStats.debug.absdiffThreshold is currently set to zero).
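
To make the small-table effect concrete, here is a rough arithmetic sketch. It is not taken from the Derby source: the class name, the helper, and the "bad" estimate of 20 rows are made up for illustration. Only the 7 rows, the logarithmic threshold of 1.0 and the absdiff value of 1000 come from this thread.

```java
// Hypothetical illustration only; not Derby source code.
final class SmallTableLnDiff {

    /** Absolute difference between the natural logs of two positive row counts. */
    static double lnDiff(long statsRows, long estimatedRows) {
        return Math.abs(Math.log(estimatedRows) - Math.log(statsRows));
    }

    public static void main(String[] args) {
        long statsRows = 7;     // row count when the statistics were generated
        long badEstimate = 20;  // made-up inaccurate row estimate for the table

        // ln(20) - ln(7) = ln(20/7), about 1.05, so a threshold of 1.0 is reached.
        System.out.println(lnDiff(statsRows, badEstimate) >= 1.0);       // true

        // The absolute difference is only 13 rows, far below an absdiff of 1000.
        System.out.println(Math.abs(badEstimate - statsRows) >= 1000);   // false
    }
}
```

A log difference of 1.0 corresponds to the two counts differing by a factor of about 2.72, which for a 7 row table is an error of just over a dozen rows. A non-zero absolute threshold would filter that case out.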


There are at least two issues with the row estimate handling:
 o not logged

 o there are two ways to update the estimate: using an absolute value,
   or using deltas. In some cases these two ways interfere, i.e. changes
   already reflected by a set absolute value are also applied afterwards as
   delta operations.


case one: then queries are run from ij

If the stat updater is running for case one, where there are no indexes,
that's certainly a bug!



Regards,
--
Kristian

case two: an index is created on the table, and then queries are run
from ij


Kristian Waagan wrote:

On 11.02.11 20:11, Mike Matrigali wrote:

From DERBY-4934 I see there are the following thresholds:
 a) derby.storage.indexStats.debug.createThreshold (100)
 b) derby.storage.indexStats.debug.absdiffThreshold (1000)
 c) derby.storage.indexStats.debug.lndiffThreshold (1.0)
 d) derby.storage.indexStats.debug.queueSize (5)

My question is that I don't understand how they are expected to
interact. If a table has less than 100 rows, does that mean
stats will not be created even if b or c is exceeded?


Hi Mike,

To start with, you can probably ignore threshold (d) for now.
It applies to the scheduling phase - that is when the unit of work is
scheduled with the daemon - and to get that far (at least) one of the
other thresholds has to be exceeded. If the queue is full the unit of
work won't be scheduled, and another attempt may be made at a later time
during another statement compilation. This requires that someone
actually compiles a relevant query, or potentially that the existing
statement is recompiled (stale plan check).
The purpose of (d) is to avoid excessive queue growth. Since the queue
is implemented as a list, searching it for duplicates may also be
expensive if it grows too large.
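
As a rough sketch of what a bounded, list-backed queue with a duplicate check can look like (this is not the daemon's actual code; the class, the method and the use of a table name as the unit of work are all hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; not the Derby daemon implementation.
final class BoundedWorkQueueSketch {
    private final int maxSize;                          // plays the role of queueSize (d)
    private final List<String> queue = new ArrayList<>();

    BoundedWorkQueueSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Returns true only if the unit of work was newly added to the queue. */
    synchronized boolean schedule(String tableName) {
        if (queue.contains(tableName)) {
            return false;   // duplicate; contains() is a linear scan of the list
        }
        if (queue.size() >= maxSize) {
            return false;   // queue full; a later compilation may try again
        }
        queue.add(tableName);
        return true;
    }

    synchronized int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        BoundedWorkQueueSketch q = new BoundedWorkQueueSketch(2);
        System.out.println(q.schedule("T1"));  // true
        System.out.println(q.schedule("T1"));  // false, already queued
        System.out.println(q.schedule("T2"));  // true
        System.out.println(q.schedule("T3"));  // false, queue is full
    }
}
```

The linear scan in contains() is the reason a small bound helps: the cost of the duplicate check grows with the queue length.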

Threshold (a) concerns indexes without existing statistics. If there are
less than 100 rows in the base table, statistics won't be created.

Thresholds (b) and (c) concern indexes with existing statistics.
Threshold (b) was introduced to avoid too frequent updates of existing
statistics for small tables. I don't remember off the top of my head
where it was discussed, but I ended up effectively removing it by
setting it to zero for now. I kept the property (and the relevant code)
to allow people to experiment somewhat without having to recompile the
code if they have an application running into trouble with this
scenario.

Finally, the main threshold for existing statistics is (c). Here the
natural logarithms of the row estimate of the index statistics and the
row estimate of the base table are compared. If the difference is
greater than or equal to lndiffThreshold (defaults to 1.0) the
statistics for the index are scheduled for update. If the daemon queue
is full the request is discarded, assuming another compilation will
manage to schedule the update eventually.
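
Put together, the checks for (a), (b) and (c) can be sketched roughly as below. This is not the Derby source: the class and method names are invented, clamping row counts to at least 1 before taking logarithms is my own guard, and requiring both (b) and (c) to be exceeded is an assumption about how the two combine. That assumption is at least consistent with a zero absdiffThreshold effectively disabling (b).

```java
// Hypothetical sketch of the threshold checks; not Derby source code.
final class IstatDecisionSketch {
    static final long CREATE_THRESHOLD = 100;  // (a), value quoted in this thread

    /** Index without statistics: create them only if the table has enough rows. */
    static boolean shouldCreate(long tableRows) {
        return tableRows >= CREATE_THRESHOLD;
    }

    /**
     * Index with existing statistics. ASSUMPTION: both the absolute
     * difference (b) and the log difference (c) must reach their thresholds.
     */
    static boolean shouldUpdate(long statsRows, long tableRows,
                                long absdiffThreshold, double lndiffThreshold) {
        long s = Math.max(1, statsRows);  // guard against ln(0); my addition
        long t = Math.max(1, tableRows);
        boolean absReached = Math.abs(t - s) >= absdiffThreshold;
        boolean lnReached = Math.abs(Math.log(t) - Math.log(s)) >= lndiffThreshold;
        return absReached && lnReached;
    }

    public static void main(String[] args) {
        System.out.println(shouldCreate(7));                  // false
        System.out.println(shouldUpdate(7, 20, 0, 1.0));      // true
        System.out.println(shouldUpdate(7, 20, 1000, 1.0));   // false
    }
}
```

If the update check passes, the request would still have to get past the queue size limit (d) before any work is done.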


Hope this helped a bit, feel free to ask additional questions. As I have
said before, these thresholds may have to be changed significantly as we
test the feature (remove existing, add new ones, or modify existing
ones).



Cheers,
