Andreas Korneliussen (JIRA) wrote:
[ http://issues.apache.org/jira/browse/DERBY-937?page=all ]
Andreas Korneliussen updated DERBY-937:
---------------------------------------
Attachment: DERBY-937.diff
DERBY-937.stat
I was able to reproduce this problem on every run on a very fast laptop (it was
not reproducible on any other Windows lab machine I tried). This laptop
happens to be the same kind as Ole uses for the nightly tests.
Adding a 10 second sleep after the population of the tables did not
have any effect. I therefore tried running a compress on the tables (based on
the assumption that statistics are updated on compress), and now the test
no longer fails for me.
Attached is the patch which makes this test stop failing. The patch does not
seem to have any side effects on the other platforms (Solaris) tested; however,
the test will take more time.
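For reference, the workaround in the patch amounts to compressing each
populated table with Derby's system procedure. The schema and table names
below are placeholders, not the test's actual tables:

```sql
-- Rebuild the table and its indexes in place; as a side effect this
-- recomputes the cardinality statistics used by the optimizer.
-- 'APP' and 'MYTABLE' are placeholder names.
CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'MYTABLE', 1);
```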
It is great that this fixes the problem in your reproducible case, so I say
we should commit. But I don't know why it works. To me it looks like
the loader program always first inserts all the data into the base
tables and then creates the indexes. In this case Derby automatically
creates the "statistics" that you mention here, and they should be
no different than what is recreated when you do the compress. Also
derby will create "packed" indexes in this case which again should look
no different than what compress will do (since all that has happened
are inserts into the table - if deletes or updates were involved it
would be a different story).
What the change does do is significantly alter the cache contents and the
I/O timing.
Also, I wanted to make sure again that people understand what statistics
mean in derby.
What is usually called "statistics" in other db's comes in 3 forms
in derby:
1) key distribution information
o this is automatically maintained by derby by using existing
indexes, no user interaction is ever needed and it is always up
to date. Basically we use the index itself to estimate a given
number of rows for a range of keys. Most other db's I know require
some sort of "update statistics" to get this info.
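To make item 1 concrete, here is a rough sketch (illustrative Python, not
Derby code - Derby's actual implementation walks btree pages in Java) of how
an ordered index can estimate the number of rows in a key range without any
separately maintained statistics:

```python
from bisect import bisect_left, bisect_right

def estimate_rows_in_range(sorted_keys, low, high):
    """Estimate how many rows have low <= key <= high by probing the
    ordered index itself: the distance between the two probe positions
    is the row count for the range."""
    return bisect_right(sorted_keys, high) - bisect_left(sorted_keys, low)

index = [1, 3, 3, 5, 7, 9, 9, 9, 12]
print(estimate_rows_in_range(index, 3, 9))  # 7 rows with keys in [3, 9]
```

Because the estimate comes straight from the index structure, it is always as
current as the index itself, which is the point being made above.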
2) base table row counts
o also automatically maintained by derby, but can be slightly out
of date as it is only an estimate for performance reasons. Row
counts are updated at insert/delete time on a per page basis
automatically, but we delay the rollup. The rollup is always done
when the page goes to disk. The rollup may also be done earlier if
the change is significant relative to the total - I think if the
delta is 10% or more of the table. The row count estimate is also
updated if the system ever happens to do a full scan of the
table.
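Item 2 (delayed rollup of per-page row counts) can be sketched like this.
The class and method names are my own, and the 10% threshold is the value
guessed at above, not a verified constant:

```python
class TableRowCount:
    """Toy model of a delayed row-count rollup: per-page deltas are
    folded into the table-level estimate only when a page is flushed,
    or earlier when the accumulated change reaches 10% of the estimate."""

    def __init__(self, initial_estimate):
        self.estimate = initial_estimate
        self.page_deltas = {}          # page id -> not-yet-rolled-up delta

    def record(self, page, delta):
        self.page_deltas[page] = self.page_deltas.get(page, 0) + delta
        pending = sum(abs(d) for d in self.page_deltas.values())
        if self.estimate and pending >= 0.10 * self.estimate:
            self.rollup_all()          # change is significant: roll up early

    def flush_page(self, page):
        # the rollup always happens when the page goes to disk
        self.estimate += self.page_deltas.pop(page, 0)

    def rollup_all(self):
        for page in list(self.page_deltas):
            self.flush_page(page)

t = TableRowCount(1000)
t.record(7, +20)       # below the 10% threshold: estimate still 1000
t.flush_page(7)        # page written to disk: delta rolled up
print(t.estimate)      # 1020
```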
3) cardinality statistic
This is basically one number per key prefix: the average number of
duplicates for a given key, where a key is a leading set of columns
in an index. So a single-column index has one number, and an index
with 3 columns (a, b, c) has 3 numbers indicating the cardinality of
(a), (a, b) and (a, b, c).
o This statistic is automatically created when an index is created,
and also when a number of "bait/switch" ddl operations are done like
compress table. There have been a number of discussions on the list
on how to automate the update of this. My opinion is that we should
attack it in the following 3 ways:
o provide a way for users to schedule it to be calculated, other than by
running compress. Not great for zero-admin - just a workaround
until we can better automate it. The code exists; it is just not
exposed to users right now.
o Develop a background "zero-admin" component which would
programmatically figure out how and when to schedule this kind of
activity.
o See if there is any smart, quick way to estimate the values
automatically by doing some statistical analysis of the btree
leaf pages.
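Finally, a sketch of what the item-3 cardinality numbers measure (again
illustrative Python, not Derby code): for each leading prefix of the index
columns, the statistic is the average number of rows per distinct prefix
value:

```python
from collections import Counter

def cardinality_stats(rows):
    """For index rows that are tuples over columns (a, b, c, ...),
    return, for each leading prefix length, the average number of
    duplicate rows per distinct prefix value."""
    stats = []
    for n in range(1, len(rows[0]) + 1):
        counts = Counter(row[:n] for row in rows)
        stats.append(len(rows) / len(counts))  # avg duplicates per key
    return stats

# index on (a, b): 6 rows, 2 distinct values of (a), 3 distinct of (a, b)
rows = [(1, 'x'), (1, 'x'), (1, 'y'), (2, 'x'), (2, 'x'), (2, 'x')]
print(cardinality_stats(rows))   # [3.0, 2.0]
```

Computing these numbers requires looking at the whole index, which is why
they go stale between index creation (or compress) and the next rebuild,
unlike the self-maintaining statistics in items 1 and 2.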