Thanks for the explanation! So if I understand this correctly, if the indexes that I am querying on are always unique (the computed value that is index is always unique), then as long as the optimizer knows about these statistics (ie I have run "syscs_util.syscs_update_statistics()" after the table has a few rows), and my query is always like "select * from table where indexval >= 'some starting value' and indexval < 'some ending value'" then it should always use the index?
Again, thanks. Brett -----Original Message----- From: Mike Matrigali [mailto:mikem_...@sbcglobal.net] Sent: Thursday, November 18, 2010 1:51 PM To: derby-dev@db.apache.org Subject: Re: Question on cardinality, statistics, and when things go stale derby uses 2 types of statistics, for this discussion I will call them distribution and cardinality. For distribution derby uses the index themselves at query compile time, so this info never goes stale. For distribution the optimizer might need to know what percentage of the keys in an index are between value1 and value2. Cardinality information can go stale and depends on the actual data in the table. Cardinality is used by the optimizer when the query does not have actual values at compile time (ie. ? operators). Cardinality is basically a single number that represents the average number of duplicates per data value in the table. In the case of a non unique index on a single column a logically it will maintain one number that represents the average number of duplicates for a. The stat can never be wrong for a unique index on a single column as it is always 1 value per data value. For a 2 column unique index on (a, b) a cardinality count will be maintained on just the a values as they can be duplicate and it is possible to use the index for just qualifying on a. No cardinality count is necesary for queries that will provide both a and b as again the cardinality count for that is a, b. For many applications once a significant number of rows in the table have been inserted it is enough to get the statistics once and as rows come and go it does not matter as the average does not change. But of course it is easy to define an app that can break this. Your question also depends on the query. Assuming your index is on (a, b) and your query is something like select * from x where a = ? and b = ? then the system is unlikely to ever not pick the index. But usually the choice is not as clear. Bergquist, Brett wrote: > I have read as much as I can about this subject and am a little > confused. I understand the cardinality statistics are not computed if > the tables are empty when indexes are created and I can work around that. > > > > Say I have a table with 15 million records in it and there is a main > index that I need to be used to query and I update the statistics and > verify that query optimizer is using that index. This table will have > many rows inserted and many rows deleted every day. Basically the index > is a function of a timestamp and a counter. So older records are going > to be deleted and newer records are going to be inserted. > > > > My question is will the query optimizer revert back to doing a table > scan at some point instead of using the index. Basically will the index > go stale? > > > > A second question is does a call to > "syscs_util.syscs_update_statistics' lock a table/index while it is > being performed? This tables of concern where the indexes might go > stale never have a free second in which inserts and deletes are being done.... > > > > > > Thanks for any information. > > > > Brett > > >