On 15.02.11 23:15, Mike Matrigali wrote:
Thanks for taking a look at this. As you point out, the row
estimates are particularly hard to count on for a very small number of
rows. After the table gets big, the
other params seem ok until we get more feedback. I wonder if there should
just be some minimum table size rather than "diff" sizes. I would lean
toward defaulting to not running stats on a table that has stats unless
it is over some minimum size - say 1000 rows.
First, when you refer to table size, do you mean the table row estimate?
If we make this change, the expected behavior would be:
o no scheduled istat runs for tables with no indexes
As it is today.
o no scheduled istat run for a table with indexes with under 100 rows.
As it is today - this controls creation of new statistics.
If we introduce a lower limit of 1000 rows for updating statistics,
would you consider increasing the creation limit to 1000 rows as well?
o no scheduled istat runs for a table with indexes and existing stats
with under 1000 rows.
This would be new behavior.
o tables with over 1000 rows get stats based on current logic.
As it is today, but extended with a cutoff value (i.e. lower limit of
1000 rows).
To stress the point: what we are discussing now concerns small
tables. Inaccuracies in the row estimate for larger tables are less
likely to trigger stat updates, due to the logarithmic comparison.
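The rules above could be sketched roughly like this. This is a minimal illustration only; the class, method, and constant names are made up for the example and are not Derby's actual istat API:

```java
// Hypothetical sketch of the proposed scheduling rules; names and
// structure are illustrative only, not Derby's actual istat code.
public class IstatScheduleSketch {
    static final long CREATE_MIN_ROWS = 100;   // existing limit for creating stats
    static final long UPDATE_MIN_ROWS = 1000;  // proposed limit for updating stats

    /**
     * Returns true if the istat daemon should consider scheduling work
     * for the table. Tables passing the update limit would still go
     * through the current (logarithmic) staleness check afterwards.
     */
    static boolean mayScheduleWork(boolean hasIndexes, boolean hasStats,
                                   long estimatedRows) {
        if (!hasIndexes) {
            return false;                          // nothing to collect stats for
        }
        if (!hasStats) {
            return estimatedRows >= CREATE_MIN_ROWS;   // creation of new stats
        }
        return estimatedRows >= UPDATE_MIN_ROWS;       // update of existing stats
    }
}
```

Note that the last branch only gates eligibility; whether an update is actually scheduled would still depend on the current diff logic.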
One question to end with:
If we introduce the 1000-row lower limit for stats updates, what will
happen to the stats on a table that shrinks from 1 million rows to 100
rows in one go?
This may be a corner case, but updating the stats once for a small table
is cheap. On the other hand, the suboptimal plan the optimizer may come
up with may not be that much more expensive to execute for a small table
with fewer than 1000 rows either.
--
Kristian
Kristian Waagan wrote:
On 11.02.2011 23:29, Mike Matrigali wrote:
Thanks, could you take a look at DERBY-4211? It looks like
the stat updater is running, but I don't think it should be.
Basically, what would you expect to happen on a newly created
table that then has 7 rows added to it?
I've only looked briefly at the test, and here are my thoughts about
what's going on:
o some of the tables in the test are created, populated, and then
have an index created. Since the table is not empty, the index
creation will cause statistics to be generated.
o queries in the test will then cause the istat scheduling logic to
fire.
o due to inaccurate row estimates for the table, the istat logic
incorrectly schedules an update.
My opinion (after having looked very quickly at this) is that the
istat code is doing as it should with the current parameters. The bad
behavior is caused by a combination of poor information quality (the
row estimate), the low number of rows in the table ("defeats" the
logarithmic threshold), and the istat configuration (absdiff=0).
Since the row estimate is exactly that - an estimate - it may be wise
to reintroduce the absdiff parameter to avoid problems like these for
small tables. At least it should be simple to change its value and
re-run the test to see if the istat work is still happening or not
(note that the value quoted below is wrong -
derby.storage.indexStats.debug.absdiffThreshold is currently set to
zero).
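To make concrete why a small table "defeats" the logarithmic threshold when absdiff is zero, here is a toy model of the staleness check. The method name, parameter names, and threshold values are assumptions for illustration, not Derby's exact implementation:

```java
// Toy model of the istat staleness check; names and defaults are
// illustrative assumptions, not Derby's actual code.
public class StaleCheckSketch {
    static boolean statsLookStale(long estimatedRows, long rowsAtLastStats,
                                  long absdiffThreshold, double lndiffThreshold) {
        long absdiff = Math.abs(estimatedRows - rowsAtLastStats);
        if (absdiff < absdiffThreshold) {
            return false;   // absolute change too small to bother with
        }
        // Logarithmic comparison: a large *relative* change triggers an update.
        double lndiff = Math.abs(Math.log(Math.max(estimatedRows, 1))
                               - Math.log(Math.max(rowsAtLastStats, 1)));
        return lndiff > lndiffThreshold;
    }
}
```

With absdiffThreshold at zero, an estimate moving from 2 to 7 rows gives a log difference of ln(7) - ln(2), roughly 1.25, enough to clear a log threshold of, say, 1.0, so an update gets scheduled even though only five rows changed. A non-zero absdiff would filter out that case while leaving large tables unaffected, since their absolute changes are big.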
There are at least two issues with the row estimate handling:
o not logged
o there are two ways to update the estimate: using an absolute
value, or using deltas. In some cases these two ways interfere, i.e.
changes already reflected by a set absolute value are also applied
afterwards as delta operations.
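The interference between the two update paths can be shown with a toy example; the class and its methods are hypothetical, not Derby's API:

```java
// Toy model of the row-estimate double counting described above;
// the class and method names are hypothetical.
public class RowEstimateSketch {
    private long estimate;

    /** Set the estimate outright, e.g. after a full scan of the table. */
    void setAbsolute(long rows) { estimate = rows; }

    /** Adjust the estimate incrementally, e.g. per insert or delete. */
    void applyDelta(long delta) { estimate += delta; }

    long get() { return estimate; }
}
```

If a scan that already observed 7 freshly inserted rows sets the absolute estimate to 7, and the same 7 inserts are afterwards also applied as deltas, the estimate ends up at 14 - double the real count.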
case one: when queries are run from ij
If the stat updater is running for case one, where there are no
indexes, that's certainly a bug!
Regards,