I have attached a patch that causes BackingStoreHashtable to spill to disk when it grows large. BackingStoreHashtable is used to implement hash joins, DISTINCT, scroll-insensitive cursors, and the HAVING clause. The unpatched BackingStoreHashtable never spills to disk, which can cause Derby to run out of memory. See Jira report Derby-106.

One of the arguments to the BackingStoreHashtable constructor, max_inmemory_rowcnt, is the maximum number of rows to store in memory. If this argument is positive, the patched BackingStoreHashtable spills to disk once more than that number of rows has been added to the hash table.

If max_inmemory_rowcnt is not positive, BackingStoreHashtable decides for itself when to spill to disk: it spills when its estimate of the hash table's size in bytes grows larger than 1% of the JVM's total memory, measured when the BackingStoreHashtable was created. Currently Derby only constructs BackingStoreHashtables with max_inmemory_rowcnt = -1, so this mechanism is always the one that decides when to spill.
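
To make the decision concrete, here is a small standalone sketch, not part of the patch, that mirrors the accounting done in the patch's spillToDisk() and doSpaceAccounting() methods; the class and method names (SpillDecision, addRowAndCheckSpill) are invented for illustration:

    public class SpillDecision
    {
        private final long maxInMemoryRowCount; // plays the role of max_inmemory_rowcnt
        private long inMemoryRowCount;           // plays the role of inmemory_rowcnt
        private long remainingBudget;            // plays the role of max_inmemory_size

        public SpillDecision(long maxInMemoryRowCount)
        {
            this.maxInMemoryRowCount = maxInMemoryRowCount;
            // With a positive row limit the byte budget is effectively unlimited;
            // otherwise the budget is 1% of the JVM's total memory at construction time.
            remainingBudget = (maxInMemoryRowCount > 0)
                ? Long.MAX_VALUE
                : Runtime.getRuntime().totalMemory() / 100;
        }

        // Account for a row kept in memory and report whether new rows should
        // now spill to disk.
        public boolean addRowAndCheckSpill(long estimatedRowBytes)
        {
            inMemoryRowCount++;
            remainingBudget -= estimatedRowBytes;
            if (maxInMemoryRowCount > 0)
                return inMemoryRowCount >= maxInMemoryRowCount;
            return remainingBudget <= 0;
        }
    }

In the patch itself the per-row byte estimate comes from DataValueDescriptor.estimateMemoryUsage(), plus a small allowance for object references and, for the first duplicate of a key, a Vector.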

The disk portion of the hash table is handled by the new class DiskHashtable. It does not implement a true dynamic hash table data structure; instead it uses an idea of Mike Matrigali's. The rows are stored in a standard heap conglomerate, the same kind used for ordinary Derby tables, and a Btree indexes the rows by the hash code of their keys. We cannot index them by the actual keys because our Btree implementation limits the length of a key. To find a record by key, DiskHashtable scans the Btree for all rows with the same hash code and matches each retrieved row against the target key.
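
The following sketch, which is not Derby code, illustrates the scheme with ordinary Java collections standing in for the conglomerates: a List plays the role of the heap (an index into it stands in for a RowLocation), a Map keyed by hash code plays the role of the Btree, and get() compares the actual key of every candidate row because different keys can share a hash code. The class name HashCodeIndexSketch and its methods are invented for this illustration:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class HashCodeIndexSketch
    {
        // Stand-in for the heap conglomerate: a row's index in this list plays
        // the role of its RowLocation.
        private final List<Object[]> heap = new ArrayList<>();
        // Stand-in for the Btree: maps a key's hash code to the locations of
        // all rows whose key has that hash code.
        private final Map<Integer, List<Integer>> index = new TreeMap<>();
        private final int keyColumn;

        public HashCodeIndexSketch(int keyColumn)
        {
            this.keyColumn = keyColumn;
        }

        public void put(Object key, Object[] row)
        {
            heap.add(row); // insert the row and remember its "location"
            int location = heap.size() - 1;
            index.computeIfAbsent(key.hashCode(), h -> new ArrayList<>()).add(location);
        }

        // Find all rows whose key equals the target: scan every entry with the
        // same hash code and compare the real key, since hash codes can collide.
        public List<Object[]> get(Object key)
        {
            List<Object[]> matches = new ArrayList<>();
            List<Integer> locations = index.get(key.hashCode());
            if (locations == null)
                return matches;
            for (int location : locations)
            {
                Object[] row = heap.get(location);
                if (key.equals(row[keyColumn]))
                    matches.add(row);
            }
            return matches;
        }
    }

In the real DiskHashtable the Btree key is the pair (hash code, row location), so the Btree itself can forbid duplicate keys while still holding many rows with the same hash code; a lookup scans all Btree entries whose hash code matches the target.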

The advantage of this approach is that it requires little new code because it reuses the existing heap and Btree conglomerate implementations. The disadvantage is that it is slower than a true dynamic hash table structure. We expect that in most cases BackingStoreHashtable will not spill to disk, so this trade-off seems appropriate.

Issues and Future Work
------------------------

The Derby community may want to consider some follow-on work.

Hash join costing should be re-evaluated now that BackingStoreHashtable can spill to disk. The optimizer can now consider hash joins on larger tables, but it should take the cost of disk access into account. This leads into a larger issue of optimizer costing: our cost numbers were derived many years ago by timing various operations on a particular machine. That machine is probably no longer available for timing BackingStoreHashtable, and in any case it is obsolete by now. We should probably re-derive all of our costing numbers on a more modern machine.

We may want to implement a real dynamic on-disk hash structure to improve the speed of BackingStoreHashtable after it has spilled to disk. If the disk structure were faster we could use hash joins more often, potentially speeding up some Derby joins. Furthermore, BackingStoreHashtable is used elsewhere, and our assumption that it will seldom spill to disk may not hold in all of those cases.

In my implementation BackingStoreHashtable keeps existing rows in memory after it decides to spill new rows to disk. Mike Matrigali suggested that it should instead move the existing rows to disk to further reduce memory usage. That would come at the price of slowing down both the time to populate a BackingStoreHashtable and the time to access an element, which is why I chose not to move old rows to disk.

Jack Klebanoff
Index: java/engine/org/apache/derby/impl/sql/execute/ScrollInsensitiveResultSet.java
===================================================================
--- java/engine/org/apache/derby/impl/sql/execute/ScrollInsensitiveResultSet.java   (revision 155029)
+++ java/engine/org/apache/derby/impl/sql/execute/ScrollInsensitiveResultSet.java   (working copy)
@@ -66,7 +66,6 @@
 
 
        private int                                     sourceRowWidth;
-       private TransactionController           tc;
 
        private   BackingStoreHashtable         ht;
        private   ExecRow                                       resultRow;
@@ -87,6 +86,8 @@
 
     private GeneratedMethod closeCleanup;
 
+    private boolean keepAfterCommit;
+
        /**
         * Constructor for a ScrollInsensitiveResultSet
         *
@@ -110,6 +111,7 @@
                          optimizerEstimatedRowCount, optimizerEstimatedCost);
                this.source = source;
                this.sourceRowWidth = sourceRowWidth;
+        keepAfterCommit = activation.getResultSetHoldability();
                maxRows = activation.getMaxRows();
                if (SanityManager.DEBUG)
                {
@@ -160,7 +162,7 @@
                 * We need BackingStoreHashtable to actually go to disk when it doesn't fit.
                 * This is a known limitation.
                 */
-               ht = new BackingStoreHashtable(tc,
+               ht = new BackingStoreHashtable(getTransactionController(),
                                                                           null,
                                                                           keyCols,
                                                                           false,
@@ -168,7 +170,8 @@
                                                                           HashScanResultSet.DEFAULT_MAX_CAPACITY,
                                                                           HashScanResultSet.DEFAULT_INITIAL_CAPACITY,
                                                                           HashScanResultSet.DEFAULT_MAX_CAPACITY,
-                                                                          false);
+                                                                          false,
+                                       keepAfterCommit);
 
                openTime += getElapsedMillis(beginTime);
                setBeforeFirstRow();
Index: java/engine/org/apache/derby/impl/sql/execute/HashTableResultSet.java
===================================================================
--- java/engine/org/apache/derby/impl/sql/execute/HashTableResultSet.java   (revision 155029)
+++ java/engine/org/apache/derby/impl/sql/execute/HashTableResultSet.java   (working copy)
@@ -221,7 +221,8 @@
                                                                                   maxInMemoryRowCount,
                                                                                   (int) initialCapacity,
                                                                                   loadFactor,
-                                                                                  skipNullKeyColumns);
+                                                                                  skipNullKeyColumns,
+                                           false /* Not kept after a commit */);
 
                        if (runTimeStatsOn)
                        {
Index: java/engine/org/apache/derby/impl/store/access/BackingStoreHashTableFromScan.java
===================================================================
--- java/engine/org/apache/derby/impl/store/access/BackingStoreHashTableFromScan.java   (revision 155029)
+++ java/engine/org/apache/derby/impl/store/access/BackingStoreHashTableFromScan.java   (working copy)
@@ -97,7 +97,8 @@
             max_inmemory_rowcnt,
             initialCapacity,
             loadFactor,
-                       skipNullKeyColumns);
+                       skipNullKeyColumns,
+            false /* Do not keep the hash table after a commit. */);
 
         open_scan =  (ScanManager)
             tc.openScan(
Index: java/engine/org/apache/derby/iapi/store/access/DiskHashtable.java
===================================================================
--- java/engine/org/apache/derby/iapi/store/access/DiskHashtable.java   (revision 0)
+++ java/engine/org/apache/derby/iapi/store/access/DiskHashtable.java   (revision 0)
@@ -0,0 +1,377 @@
+/*
+
+   Derby - Class org.apache.derby.iapi.store.access.DiskHashtable
+
+   Copyright 2005 The Apache Software Foundation or its licensors, as applicable.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+ */
+
+package org.apache.derby.iapi.store.access;
+
+import java.util.Enumeration;
+import java.util.NoSuchElementException;
+import java.util.Properties;
+import java.util.Vector;
+import org.apache.derby.iapi.error.StandardException;
+import org.apache.derby.iapi.services.io.FormatableBitSet;
+import org.apache.derby.iapi.types.DataValueDescriptor;
+import org.apache.derby.iapi.types.SQLInteger;
+import org.apache.derby.impl.store.access.heap.HeapRowLocation;
+import org.apache.derby.iapi.types.RowLocation;
+import org.apache.derby.iapi.services.context.ContextService;
+import org.apache.derby.iapi.sql.conn.LanguageConnectionContext;
+
+/**
+ * This class is used by BackingStoreHashtable when the BackingStoreHashtable must spill to disk.
+ * It implements the methods of a hash table: put, get, remove, elements, however it is not implemented
+ * as a hash table. In order to minimize the amount of unique code it is implemented using a Btree and a heap
+ * conglomerate. The Btree indexes the hash code of the row key. The actual key may be too long for
+ * our Btree implementation.
+ *
+ * Created: Fri Jan 28 13:58:03 2005
+ *
+ * @author <a href="mailto:[EMAIL PROTECTED]">Jack Klebanoff</a>
+ * @version 1.0
+ */
+
+public class DiskHashtable 
+{
+    private final long rowConglomerateId;
+    private ConglomerateController rowConglomerate;
+    private final long btreeConglomerateId;
+    private ConglomerateController btreeConglomerate;
+    private final DataValueDescriptor[] btreeRow;
+    private final int[] key_column_numbers;
+    private final boolean remove_duplicates;
+    private final TransactionController tc;
+    private final DataValueDescriptor[] row;
+    private final DataValueDescriptor[] scanKey = { new SQLInteger()};
+    private int size;
+    private boolean keepStatistics;
+
+    /**
+     * Creates a new <code>DiskHashtable</code> instance.
+     *
+     * @param tc
+     * @param template An array of DataValueDescriptors that serves as a template for the rows.
+     * @param key_column_numbers The indexes of the key columns (0 based)
+     * @param remove_duplicates If true then rows with duplicate keys are removed
+     * @param keepAfterCommit If true then the hash table is kept after a commit
+     */
+    public DiskHashtable( TransactionController tc,
+                          DataValueDescriptor[] template,
+                          int[] key_column_numbers,
+                          boolean remove_duplicates,
+                          boolean keepAfterCommit)
+        throws StandardException
+    {
+        this.tc = tc;
+        this.key_column_numbers = key_column_numbers;
+        this.remove_duplicates = remove_duplicates;
+        LanguageConnectionContext lcc = (LanguageConnectionContext)
+                               ContextService.getContextOrNull(LanguageConnectionContext.CONTEXT_ID);
+        keepStatistics = (lcc != null) && lcc.getRunTimeStatisticsMode();
+        row = new DataValueDescriptor[ template.length];
+        for( int i = 0; i < row.length; i++)
+            row[i] = template[i].getNewNull();
+        int tempFlags = keepAfterCommit ? (TransactionController.IS_TEMPORARY | TransactionController.IS_KEPT)
+          : TransactionController.IS_TEMPORARY;
+        
+        rowConglomerateId = tc.createConglomerate( "heap",
+                                                   template,
+                                                   (ColumnOrdering[]) null,
+                                                   (Properties) null,
+                                                   tempFlags);
+        rowConglomerate = tc.openConglomerate( rowConglomerateId,
+                                               false, // Do not hold past the end of transaction
+                                               TransactionController.OPENMODE_FORUPDATE,
+                                               TransactionController.MODE_TABLE,
+                                               TransactionController.ISOLATION_NOLOCK /* Single thread only */ );
+
+        btreeRow = new DataValueDescriptor[] { new SQLInteger(), rowConglomerate.newRowLocationTemplate()};
+        Properties btreeProps = new Properties();
+        btreeProps.put( "baseConglomerateId", String.valueOf( rowConglomerateId));
+        btreeProps.put( "rowLocationColumn", "1");
+        btreeProps.put( "allowDuplicates", "false"); // Because the row location is part of the key
+        btreeProps.put( "nKeyFields", "2"); // Include the row location column
+        btreeProps.put( "nUniqueColumns", "2"); // Include the row location column
+        btreeProps.put( "maintainParentLinks", "false");
+        btreeConglomerateId = tc.createConglomerate( "BTREE",
+                                                     btreeRow,
+                                                     (ColumnOrdering[]) null,
+                                                     btreeProps,
+                                                     tempFlags);
+
+        btreeConglomerate = tc.openConglomerate( btreeConglomerateId,
+                                                 false, // Do not hold past the end of transaction
+                                                 TransactionController.OPENMODE_FORUPDATE,
+                                                 TransactionController.MODE_TABLE,
+                                                 TransactionController.ISOLATION_NOLOCK /* Single thread only */ );
+    } // end of constructor
+
+    public void close() throws StandardException
+    {
+        btreeConglomerate.close();
+        rowConglomerate.close();
+        tc.dropConglomerate( btreeConglomerateId);
+        tc.dropConglomerate( rowConglomerateId);
+    } // end of close
+    
+    /**
+     * Put a new row in the overflow structure.
+     *
+     * @param key The row's key.
+     * @param row The row to be inserted.
+     *
+     * @return true if the row was added,
+     *         false if it was not added (because it was a duplicate and we are eliminating duplicates).
+     *
+     * @exception StandardException standard error policy
+     */
+    public boolean put( Object key, Object[] row)
+        throws StandardException
+    {
+        boolean isDuplicate = false;
+        if( remove_duplicates || keepStatistics)
+        {
+            // Go to the work of finding out whether it is a duplicate
+            isDuplicate = (getRemove( key, false, true) != null);
+            if( remove_duplicates && isDuplicate)
+                return false;
+        }
+        rowConglomerate.insertAndFetchLocation( (DataValueDescriptor[]) row, (RowLocation) btreeRow[1]);
+        btreeRow[0].setValue( key.hashCode());
+        btreeConglomerate.insert( btreeRow);
+        if( keepStatistics && !isDuplicate)
+            size++;
+        return true;
+    } // end of put
+
+    /**
+     * Get a row from the overflow structure.
+     *
+     * @param key If the rows only have one key column then the key value. If there is more than one
+     *            key column then a KeyHasher
+     *
+     * @return null if there is no corresponding row,
+     *         the row (DataValueDescriptor[]) if there is exactly one row with the key
+     *         a Vector of all the rows with the key if there is more than one.
+     *
+     * @exception StandardException
+     */
+    public Object get( Object key)
+        throws StandardException
+    {
+        return getRemove( key, false, false);
+    }
+
+    private Object getRemove( Object key, boolean remove, boolean existenceOnly)
+        throws StandardException
+    {
+        int hashCode = key.hashCode();
+        int rowCount = 0;
+        Object retValue = null;
+
+        scanKey[0].setValue( hashCode);
+        ScanController scan = tc.openScan( btreeConglomerateId,
+                                           false, // do not hold
+                                           remove ? TransactionController.OPENMODE_FORUPDATE : 0,
+                                           TransactionController.MODE_TABLE,
+                                           TransactionController.ISOLATION_READ_UNCOMMITTED,
+                                           null, // Scan all the columns
+                                           scanKey,
+                                           ScanController.GE,
+                                           (Qualifier[][]) null,
+                                           scanKey,
+                                           ScanController.GT);
+        try
+        {
+            while( scan.fetchNext( btreeRow))
+            {
+                if( rowConglomerate.fetch( (RowLocation) btreeRow[1], row, (FormatableBitSet) null /* all columns */)
+                    && rowMatches( row, key))
+                {
+                    if( existenceOnly)
+                        return this;
+
+                    rowCount++;
+                    if( rowCount == 1)
+                        retValue = BackingStoreHashtable.cloneRow( row);
+                    else 
+                    {
+                        Vector v;
+                        if( rowCount == 2)
+                        {
+                            v = new Vector( 2);
+                            v.add( retValue);
+                            retValue = v;
+                        }
+                        else
+                            v = (Vector) retValue;
+                        v.add( BackingStoreHashtable.cloneRow( row));
+                    }
+                    if( remove)
+                    {
+                        rowConglomerate.delete( (RowLocation) btreeRow[1]);
+                        scan.delete();
+                        size--;
+                    }
+                    if( remove_duplicates)
+                        // This must be the only row with the key
+                        return retValue;
+                }
+            }
+        }
+        finally
+        {
+            scan.close();
+        }
+        return retValue;
+    } // end of getRemove
+
+
+    private boolean rowMatches( DataValueDescriptor[] row,
+                                Object key)
+    {
+        if( key_column_numbers.length == 1)
+            return row[ key_column_numbers[0]].equals( key);
+
+        KeyHasher kh = (KeyHasher) key;
+        for( int i = 0; i < key_column_numbers.length; i++)
+        {
+            if( ! row[ key_column_numbers[i]].equals( kh.getObject(i)))
+                return false;
+        }
+        return true;
+    } // end of rowMatches
+
+    /**
+     * remove all rows with a given key from the hash table.
+     *
+     * @param key          The key of the rows to remove.
+     *
+     * @return The removed row(s).
+     *
+        * @exception  StandardException  Standard exception policy.
+     **/
+    public Object remove( Object key)
+               throws StandardException
+    {
+        return getRemove( key, true, false);
+    } // end of remove
+
+    /**
+     * @return The number of rows in the hash table
+     */
+    public int size()
+    {
+        return size;
+    }
+    
+    /**
+     * Return an Enumeration that can be used to scan entire table.
+     * <p>
+     * RESOLVE - is it worth it to support this routine?
+     *
+        * @return The Enumeration.
+     *
+        * @exception  StandardException  Standard exception policy.
+     **/
+    public Enumeration elements()
+        throws StandardException
+    {
+        return new ElementEnum();
+    }
+
+    private class ElementEnum implements Enumeration
+    {
+        private ScanController scan;
+        private boolean hasMore;
+
+        ElementEnum()
+        {
+            try
+            {
+                scan = tc.openScan( rowConglomerateId,
+                                    false, // do not hold
+                                    0, // read only
+                                    TransactionController.MODE_TABLE,
+                                    TransactionController.ISOLATION_NOLOCK,
+                                    (FormatableBitSet) null, // all columns
+                                    (DataValueDescriptor[]) null, // no start key
+                                    0, // no start key operator
+                                    (Qualifier[][]) null,
+                                    (DataValueDescriptor[]) null, // no stop key
+                                    0 /* no stop key operator */);
+                hasMore = scan.next();
+                if( ! hasMore)
+                {
+                    scan.close();
+                    scan = null;
+                }
+            }
+            catch( StandardException se)
+            {
+                hasMore = false;
+                if( scan != null)
+                {
+                    try
+                    {
+                        scan.close();
+                    }
+                    catch( StandardException se1){};
+                    scan = null;
+                }
+            }
+        } // end of constructor
+
+        public boolean hasMoreElements()
+        {
+            return hasMore;
+        }
+
+        public Object nextElement()
+        {
+            if( ! hasMore)
+                throw new NoSuchElementException();
+            try
+            {
+                scan.fetch( row);
+                Object retValue =  BackingStoreHashtable.cloneRow( row);
+                hasMore = scan.next();
+                if( ! hasMore)
+                {
+                    scan.close();
+                    scan = null;
+                }
+
+                return retValue;
+            }
+            catch( StandardException se)
+            {
+                if( scan != null)
+                {
+                    try
+                    {
+                        scan.close();
+                    }
+                    catch( StandardException se1){};
+                    scan = null;
+                }
+                throw new NoSuchElementException();
+            }
+        } // end of nextElement
+    } // end of class ElementEnum
+}
Index: java/engine/org/apache/derby/iapi/store/access/BackingStoreHashtable.java
===================================================================
--- java/engine/org/apache/derby/iapi/store/access/BackingStoreHashtable.java   (revision 155029)
+++ java/engine/org/apache/derby/iapi/store/access/BackingStoreHashtable.java   (working copy)
@@ -29,10 +29,13 @@
 import org.apache.derby.iapi.types.CloneableObject;
 import org.apache.derby.iapi.types.DataValueDescriptor;
 
+import org.apache.derby.iapi.services.cache.ClassSize;
+
 import java.util.Enumeration;
 import java.util.Hashtable;
 import java.util.Properties; 
 import java.util.Vector;
+import java.util.NoSuchElementException;
 
 /**
 A BackingStoreHashtable is a utility class which will store a set of rows into
@@ -102,13 +105,36 @@
      * Fields of the class
      **************************************************************************
      */
+    private TransactionController tc;
     private Hashtable   hash_table;
     private int[]       key_column_numbers;
     private boolean     remove_duplicates;
        private boolean         skipNullKeyColumns;
     private Properties  auxillary_runtimestats;
        private RowSource       row_source;
+    /* If max_inmemory_rowcnt > 0 then use that to decide when to spill to disk.
+     * Otherwise compute max_inmemory_size based on the JVM memory size when the BackingStoreHashtable
+     * is constructed and use that to decide when to spill to disk.
+     */
+    private long max_inmemory_rowcnt;
+    private long inmemory_rowcnt;
+    private long max_inmemory_size;
+    private boolean keepAfterCommit;
 
+    private static int vectorSize; // The estimated number of bytes used by Vector(0)
+    static {
+        try
+        {
+            vectorSize = ClassSize.estimateBase( java.util.Vector.class);
+        }
+        catch( SecurityException se)
+        {
+            vectorSize = 4*ClassSize.refSize;
+        }
+    };
+    
+    private DiskHashtable diskHashtable;
+
     /**************************************************************************
      * Constructors for This class:
      **************************************************************************
@@ -163,7 +189,10 @@
         *
        * @param skipNullKeyColumns    Skip rows with a null key column, if true.
     *
+     * @param keepAfterCommit If true the hash table is kept after a commit,
+     *                        if false the hash table is dropped on the next commit.
      *
+     *
         * @exception  StandardException  Standard exception policy.
      **/
     public BackingStoreHashtable(
@@ -175,13 +204,21 @@
     long                    max_inmemory_rowcnt,
     int                     initialCapacity,
     float                   loadFactor,
-       boolean                                 skipNullKeyColumns)
+       boolean                                 skipNullKeyColumns,
+    boolean                 keepAfterCommit)
         throws StandardException
     {
         this.key_column_numbers    = key_column_numbers;
         this.remove_duplicates    = remove_duplicates;
                this.row_source                    = row_source;
                this.skipNullKeyColumns    = skipNullKeyColumns;
+        this.max_inmemory_rowcnt = max_inmemory_rowcnt;
+        if( max_inmemory_rowcnt > 0)
+            max_inmemory_size = Long.MAX_VALUE;
+        else
+            max_inmemory_size = Runtime.getRuntime().totalMemory()/100;
+        this.tc = tc;
+        this.keepAfterCommit = keepAfterCommit;
 
         Object[] row;
 
@@ -280,7 +317,7 @@
      *
         * @exception  StandardException  Standard exception policy.
      **/
-    private Object[] cloneRow(Object[] old_row)
+    static Object[] cloneRow(Object[] old_row)
         throws StandardException
     {
         Object[] new_row = new DataValueDescriptor[old_row.length];
@@ -300,8 +337,6 @@
      * @param row               Row to add to the hash table.
      * @param hash_table        The java HashTable to load into.
      *
-        * @return true if successful, false if heap add fails.
-     *
         * @exception  StandardException  Standard exception policy.
      **/
     private void add_row_to_hash_table(
@@ -310,9 +345,14 @@
     Object[]    row)
                throws StandardException
     {
+        if( spillToDisk( hash_table, key, row))
+            return;
+        
         Object  duplicate_value = null;
 
-        if ((duplicate_value = hash_table.put(key, row)) != null)
+        if ((duplicate_value = hash_table.put(key, row)) == null)
+            doSpaceAccounting( row, false);
+        else
         {
             if (!remove_duplicates)
             {
@@ -321,6 +361,7 @@
                 // inserted a duplicate
                 if ((duplicate_value instanceof Vector))
                 {
+                    doSpaceAccounting( row, false);
                     row_vec = (Vector) duplicate_value;
                 }
                 else
@@ -330,6 +371,7 @@
 
                     // insert original row into vector
                     row_vec.addElement(duplicate_value);
+                    doSpaceAccounting( row, true);
                 }
 
                 // insert new row into vector
@@ -345,6 +387,89 @@
         row = null;
     }
 
+    private void doSpaceAccounting( Object[] row,
+                                    boolean firstDuplicate)
+    {
+        inmemory_rowcnt++;
+        if( max_inmemory_rowcnt <= 0)
+        {
+            for( int i = 0; i < row.length; i++)
+            {
+                if( row[i] instanceof DataValueDescriptor)
+                    max_inmemory_size -= ((DataValueDescriptor) row[i]).estimateMemoryUsage();
+                max_inmemory_size -= ClassSize.refSize;
+            }
+            max_inmemory_size -= ClassSize.refSize;
+            if( firstDuplicate)
+                max_inmemory_size -= vectorSize;
+        }
+    } // end of doSpaceAccounting
+
+    /**
+     * Determine whether a new row should be spilled to disk and, if so, do it.
+     *
+     * @param hash_table The in-memory hash table
+     * @param key The row's key
+     * @param row
+     *
+     * @return true if the row was spilled to disk, false if not
+     *
+     * @exception  StandardException  Standard exception policy.
+     */
+    private boolean spillToDisk( Hashtable   hash_table,
+                                 Object      key,
+                                 Object[]    row)
+               throws StandardException
+    {
+        // Once we have started spilling all new rows will go to disk, even if we have freed up some
+        // memory by moving duplicates to disk. This simplifies handling of duplicates and accounting.
+        if( diskHashtable == null)
+        {
+            if( max_inmemory_rowcnt > 0)
+            {
+                if( inmemory_rowcnt < max_inmemory_rowcnt)
+                    return false; // Do not spill
+            }
+            else if( max_inmemory_size > 0)
+                return false;
+            // Want to start spilling
+            if( ! (row instanceof DataValueDescriptor[]))
+            {
+                if( SanityManager.DEBUG)
+                    SanityManager.THROWASSERT( "BackingStoreHashtable row is not DataValueDescriptor[]");
+                // Do not know how to put it on disk
+                return false;
+            }
+            diskHashtable = new DiskHashtable( tc,
+                                               (DataValueDescriptor[]) row,
+                                               key_column_numbers,
+                                               remove_duplicates,
+                                               keepAfterCommit);
+        }
+        
+        Object duplicateValue = hash_table.get( key);
+        if( duplicateValue != null)
+        {
+            if( remove_duplicates)
+                return true; // a degenerate case of spilling
+            // If we are keeping duplicates then move all the duplicates from memory to disk.
+            // This simplifies finding duplicates: they are either all in memory or all on disk.
+            if( duplicateValue instanceof Vector)
+            {
+                Vector duplicateVec = (Vector) duplicateValue;
+                for( int i = duplicateVec.size() - 1; i >= 0; i--)
+                {
+                    Object[] dupRow = (Object[]) duplicateVec.elementAt(i);
+                    diskHashtable.put( key, dupRow);
+                }
+            }
+            else
+                diskHashtable.put( key, (Object []) duplicateValue);
+            hash_table.remove( key);
+        }
+        diskHashtable.put( key, row);
+        return true;
+    } // end of spillToDisk
     /**************************************************************************
      * Public Methods of This class:
      **************************************************************************
@@ -364,6 +489,11 @@
                throws StandardException
     {
         hash_table = null;
+        if( diskHashtable != null)
+        {
+            diskHashtable.close();
+            diskHashtable = null;
+        }
         return;
     }
 
@@ -380,7 +510,9 @@
     public Enumeration elements()
         throws StandardException
     {
-        return(hash_table.elements());
+        if( diskHashtable == null)
+            return(hash_table.elements());
+        return new BackingStoreHashtableEnumeration();
     }
 
     /**
@@ -420,7 +552,10 @@
     public Object get(Object key)
                throws StandardException
     {
-        return(hash_table.get(key));
+        Object obj = hash_table.get(key);
+        if( diskHashtable == null || obj != null)
+            return obj;
+        return diskHashtable.get( key);
     }
 
     /**
@@ -451,7 +586,10 @@
     Object      key)
                throws StandardException
     {
-        return(hash_table.remove(key));
+        Object obj = hash_table.remove(key);
+        if( obj != null || diskHashtable == null)
+            return obj;
+        return diskHashtable.remove(key);
     }
 
     /**
@@ -553,7 +691,54 @@
     public int size()
                throws StandardException
     {
-        return(hash_table.size());
+        if( diskHashtable == null)
+            return(hash_table.size());
+        return hash_table.size() + diskHashtable.size();
     }
 
+    private class BackingStoreHashtableEnumeration implements Enumeration
+    {
+        private Enumeration memoryEnumeration;
+        private Enumeration diskEnumeration;
+
+        BackingStoreHashtableEnumeration()
+        {
+            memoryEnumeration = hash_table.elements();
+            if( diskHashtable != null)
+            {
+                try
+                {
+                    diskEnumeration = diskHashtable.elements();
+                }
+                catch( StandardException se)
+                {
+                    diskEnumeration = null;
+                }
+            }
+        }
+        
+        public boolean hasMoreElements()
+        {
+            if( memoryEnumeration != null)
+            {
+                if( memoryEnumeration.hasMoreElements())
+                    return true;
+                memoryEnumeration = null;
+            }
+            if( diskEnumeration == null)
+                return false;
+            return diskEnumeration.hasMoreElements();
+        }
+
+        public Object nextElement() throws NoSuchElementException
+        {
+            if( memoryEnumeration != null)
+            {
+                if( memoryEnumeration.hasMoreElements())
+                    return memoryEnumeration.nextElement();
+                memoryEnumeration = null;
+            }
+            return diskEnumeration.nextElement();
+        }
+    } // end of class BackingStoreHashtableEnumeration
 }
Index: java/testing/org/apache/derbyTesting/functionTests/tests/lang/SpillHash.java
===================================================================
--- java/testing/org/apache/derbyTesting/functionTests/tests/lang/SpillHash.java   (revision 0)
+++ java/testing/org/apache/derbyTesting/functionTests/tests/lang/SpillHash.java   (revision 0)
@@ -0,0 +1,459 @@
+/*
+
+   Derby - Class org.apache.derbyTesting.functionTests.tests.lang.SpillHash
+
+   Copyright 2001, 2004 The Apache Software Foundation or its licensors, as applicable.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+ */
+
+package org.apache.derbyTesting.functionTests.tests.lang;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.DatabaseMetaData;
+import java.sql.ResultSet;
+import java.sql.PreparedStatement;
+import java.sql.Statement;
+import java.sql.SQLException;
+import java.util.BitSet;
+
+import org.apache.derby.tools.ij;
+import org.apache.derby.tools.JDBCDisplayUtil;
+
+/**
+ * Test BackingStoreHashtable spilling to disk.
+ * BackingStoreHashtable is used to implement hash joins, distinct, scroll insensitive cursors,
+ * outer joins, and the HAVING clause.
+ */
+public class SpillHash
+{
+    private static PreparedStatement joinStmt;
+    private static PreparedStatement distinctStmt;
+    private static PreparedStatement ojStmt1;
+    private static final int LOTS_OF_ROWS = 10000;
+    private static int errorCount = 0;
+    
+    public static void main (String args[]) 
+    {
+        try {
+            /* Load the JDBC Driver class */
+            // use the ij utility to read the property file and
+            // make the initial connection.
+            ij.getPropertyArg(args);
+            Connection conn = ij.startJBMS();
+            Statement stmt = conn.createStatement();
+
+            for( int i = 0; i < prep.length; i++)
+                stmt.executeUpdate( prep[i]);
+            PreparedStatement insA = conn.prepareStatement( "insert into ta(ca1,ca2) values(?,?)");
+            PreparedStatement insB = conn.prepareStatement( "insert into tb(cb1,cb2) values(?,?)");
+            insertDups( insA, insB, initDupVals);
+
+            joinStmt =
+              conn.prepareStatement( "select ta.ca1, ta.ca2, tb.cb2 from ta, tb where ca1 = cb1");
+            distinctStmt =
+              conn.prepareStatement( "select distinct ca1 from ta");
+            ojStmt1 =
+              conn.prepareStatement( "select cc1, cc2 from tc group by cc1,cc2 having cc1 in (select ta.ca1 from ta)");
+
+            runStatements( conn, 0, new String[][][] {initDupVals});
+
+            System.out.println( "Growing database.");
+            
+            // Add a lot of rows so that the hash tables have to spill to disk
+            conn.setAutoCommit(false);
+            for( int i = 1; i <= LOTS_OF_ROWS; i++)
+            {
+                insA.setInt(1, i);
+                insA.setString(2, ca2Val(i));
+                insA.executeUpdate();
+                insB.setInt(1, i);
+                insB.setString(2, cb2Val(i));
+                insB.executeUpdate();
+            }
+            conn.commit();
+            insertDups( insA, insB, spillDupVals);
+            conn.commit();
+
+            conn.setAutoCommit(true);
+            runStatements( conn, LOTS_OF_ROWS, new String[][][] {initDupVals, spillDupVals});
+            
+            conn.close();
+        } catch (Exception e) {
+            System.out.println("FAIL -- unexpected exception "+e);
+            JDBCDisplayUtil.ShowException(System.out, e);
+            e.printStackTrace();
+            errorCount++;
+        }
+        if( errorCount == 0)
+        {
+            System.out.println( "PASSED.");
+            System.exit(0);
+        }
+        else
+        {
+            System.out.println( "FAILED: " + errorCount + ((errorCount == 1) ? " error" : " errors"));
+            System.exit(1);
+        }
+    } // end of main
+    
+    private static final String[] prep =
+    {
+        "create table ta (ca1 integer, ca2 char(200))",
+        "create table tb (cb1 integer, cb2 char(200))",
+        "create table tc (cc1 integer, cc2 varchar(16))",
+        "insert into ta(ca1,ca2) values(null, 'Anull')",
+        "insert into tb(cb1,cb2) values(null, 'Bnull')",
+        "insert into tc(cc1,cc2) values(0, 'C0')"
+    };
+
+    private static final String[][] initDupVals =
+    {
+        { "0a", "0b"},
+        { "1a", "1b"},
+        { "2a"}
+    };
+    private static final String[][] spillDupVals =
+    {
+        {},
+        { "1c"},
+        { "2b"},
+        { "3a", "3b", "3c"}
+    };
+
+    private static void insertDups( PreparedStatement insA, PreparedStatement insB, String[][] dupVals)
+        throws SQLException
+    {
+        for( int i = 0; i < dupVals.length; i++)
+        {
+            insA.setInt(1, -i);
+            insB.setInt(1, -i);
+            String[] vals = dupVals[i];
+            for( int j = 0; j < vals.length; j++)
+            {
+                insA.setString( 2, "A" + vals[j]);
+                insA.executeUpdate();
+                insB.setString( 2, "B" + vals[j]);
+                insB.executeUpdate();
+            }
+        }
+    } // end of insertDups
+    
+    private static String ca2Val( int col1Val)
+    {
+        return "A" + col1Val;
+    }
+    
+    private static String cb2Val( int col1Val)
+    {
+        return "B" + col1Val;
+    }
+    
+    private static void runStatements( Connection conn, int maxColValue, String[][][] dupVals)
+        throws SQLException
+    {
+        runJoin( conn, maxColValue, dupVals);
+        runDistinct( conn, maxColValue, dupVals);
+        runOj( conn, maxColValue, dupVals, "1", ojStmt1);
+        runCursor( conn, maxColValue, dupVals);
+    }
+
+    private static void runJoin( Connection conn, int maxColValue, String[][][] dupVals)
+        throws SQLException
+    {
+        System.out.println( "Running join");
+        int expectedRowCount = maxColValue; // plus expected duplicates, to be counted below
+        ResultSet rs = joinStmt.executeQuery();
+        BitSet joinRowFound = new BitSet( maxColValue);
+        int dupKeyCount = 0;
+        for( int i = 0; i < dupVals.length; i++)
+        {
+            if( dupVals[i].length > dupKeyCount)
+                dupKeyCount = dupVals[i].length;
+        }
+        BitSet[] dupsFound = new BitSet[dupKeyCount];
+        int[] dupCount = new int[ dupKeyCount];
+        for( int i = 0; i < dupKeyCount; i++)
+        {
+            // count the number of rows with column(1) == -i
+            dupCount[i] = 0;
+            for( int j = 0; j < dupVals.length; j++)
+            {
+                if( i < dupVals[j].length)
+                    dupCount[i] += dupVals[j][i].length;
+            }
+            dupsFound[i] = new BitSet(dupCount[i]*dupCount[i]);
+            expectedRowCount += dupCount[i]*dupCount[i];
+        }
+        
+        int count;
+        for( count = 0; rs.next(); count++)
+        {
+            int col1Val = rs.getInt(1);
+            if( rs.wasNull())
+            {
+                System.out.println( "Null in join column.");
+                errorCount++;
+                continue;
+            }
+            if( col1Val > maxColValue)
+            {
+                System.out.println( "Invalid value in first join column.");
+                errorCount++;
+                continue;
+            }
+            if( col1Val > 0)
+            {
+                if( joinRowFound.get( col1Val - 1))
+                {
+                    System.out.println( "Multiple rows for value " + col1Val);
+                    errorCount++;
+                }
+                joinRowFound.set( col1Val - 1);
+                String col2Val = trim( rs.getString(2));
+                String col3Val = trim( rs.getString(3));
+                if( !( ca2Val( col1Val).equals( col2Val) && cb2Val( col1Val).equals( col3Val)))
+                {
+                    System.out.println( "Incorrect value in column 2 or 3 of join.");
+                    errorCount++;
+                }
+            }
+            else // col1Val <= 0, there are duplicates in the source tables
+            {
+                int dupKeyIdx = -col1Val;
+                int col2Idx = findDupVal( rs, 2, 'A', dupKeyIdx, dupVals);
+                int col3Idx = findDupVal( rs, 3, 'B', dupKeyIdx, dupVals);
+                if( col2Idx < 0 || col3Idx < 0)
+                    continue;
+
+                int idx = col2Idx + dupCount[dupKeyIdx]*col3Idx;
+                if( dupsFound[dupKeyIdx].get( idx))
+                {
+                    System.out.println( "Repeat of row with key value 0");
+                    errorCount++;
+                }
+                dupsFound[dupKeyIdx].set( idx);
+            }
+        };
+        if( count != expectedRowCount)
+        {
+            System.out.println( "Incorrect number of rows in join.");
+            errorCount++;
+        }
+        rs.close();
+    } // end of runJoin
+
+    private static int findDupVal( ResultSet rs, int col, char prefix, int keyIdx, String[][][] dupVals)
+        throws SQLException
+    {
+        String colVal = rs.getString(col);
+        if( colVal != null && colVal.length() > 1 && colVal.charAt(0) == prefix)
+        {
+            colVal = trim( colVal.substring( 1));
+            int dupIdx = 0;
+            for( int i = 0; i < dupVals.length; i++)
+            {
+                if( keyIdx < dupVals[i].length)
+                {
+                    for( int j = 0; j < dupVals[i][keyIdx].length; j++, dupIdx++)
+                    {
+                        if( colVal.equals( dupVals[i][keyIdx][j]))
+                            return dupIdx;
+                    }
+                }
+            }
+        }
+        System.out.println( "Incorrect value in column " + col + " of join with duplicate keys.");
+        errorCount++;
+        return -1;
+    } // end of findDupVal
+        
+    private static String trim( String str)
+    {
+        if( str == null)
+            return str;
+        return str.trim();
+    }
+    
+    private static void runDistinct( Connection conn, int maxColValue, String[][][] dupVals)
+        throws SQLException
+    {
+        System.out.println( "Running distinct");
+        ResultSet rs = distinctStmt.executeQuery();
+        checkAllCa1( rs, false, maxColValue, dupVals, "DISTINCT");
+    }
+
+    private static void checkAllCa1( ResultSet rs, boolean expectDups, int maxColValue, String[][][] dupVals, String label)
+        throws SQLException
+    {
+        int dupKeyCount = 0;
+        for( int i = 0; i < dupVals.length; i++)
+        {
+            if( dupVals[i].length > dupKeyCount)
+                dupKeyCount = dupVals[i].length;
+        }
+        int[] expectedDupCount = new int[dupKeyCount];
+        int[] dupFoundCount = new int[dupKeyCount];
+        for( int i = 0; i < dupKeyCount; i++)
+        {
+            
+            dupFoundCount[i] = 0;
+            if( !expectDups)
+                expectedDupCount[i] = 1;
+            else
+            {
+                expectedDupCount[i] = 0;
+                for( int j = 0; j < dupVals.length; j++)
+                {
+                    if( i < dupVals[j].length)
+                        expectedDupCount[i] += dupVals[j][i].length;
+                }
+            }
+        }
+        BitSet found = new BitSet( maxColValue);
+        int count = 0;
+        boolean nullFound = false;
+        try
+        {
+            for( count = 0; rs.next();)
+            {
+                int col1Val = rs.getInt(1);
+                if( rs.wasNull())
+                {
+                    if( nullFound)
+                    {
+                        System.out.println( "Too many nulls returned by " + label);
+                        errorCount++;
+                        continue;
+                    }
+                    nullFound = true;
+                    continue;
+                }
+                if( col1Val <= -dupKeyCount || col1Val > maxColValue)
+                {
+                    System.out.println( "Invalid value returned by " + label);
+                    errorCount++;
+                    continue;
+                }
+                if( col1Val <= 0)
+                {
+                    dupFoundCount[ -col1Val]++;
+                    if( !expectDups)
+                    {
+                        if( dupFoundCount[ -col1Val] > 1)
+                        {
+                            System.out.println( label + " returned a duplicate.");
+                            errorCount++;
+                            continue;
+                        }
+                    }
+                    else if( dupFoundCount[ -col1Val] > expectedDupCount[ -col1Val])
+                    {
+                        System.out.println( label + " returned too many duplicates.");
+                        errorCount++;
+                        continue;
+                    }
+                }
+                else
+                {
+                    if( found.get( col1Val))
+                    {
+                        System.out.println( label + " returned a duplicate.");
+                        errorCount++;
+                        continue;
+                    }
+                    found.set( col1Val);
+                    count++;
+                }
+            }
+            if( count != maxColValue)
+            {
+                System.out.println( "Incorrect number of rows in " + label);
+                errorCount++;
+            }
+            for( int i = 0; i < dupFoundCount.length; i++)
+            {
+                if( dupFoundCount[i] != expectedDupCount[i])
+                {
+                    System.out.println( "A duplicate key row is missing in " + label);
+                    errorCount++;
+                    break;
+                }
+            }
+        }
+        finally
+        {
+            rs.close();
+        }
+    } // End of checkAllCa1
+
+    private static void runOj( Connection conn,
+                               int maxColValue,
+                               String[][][] dupVals,
+                               String ojId,
+                               PreparedStatement ojStmt)
+        throws SQLException
+    {
+        System.out.println( "Running outer join " + ojId);
+        ResultSet rs = ojStmt.executeQuery();
+        try
+        {
+            if( ! rs.next())
+            {
+                System.out.println( "Outer join " + ojId + " returned no rows.");
+                errorCount++;
+                return;
+            }
+            int cc1 = rs.getInt(1);
+            if( rs.wasNull())
+            {
+                System.out.println( "Outer join " + ojId + " returned null");
+                errorCount++;
+                return;
+            }
+            String cc2 = trim(rs.getString(2));
+            if( rs.wasNull())
+            {
+                System.out.println( "Outer join " + ojId + " returned null");
+                errorCount++;
+                return;
+            }
+            if( cc1 != 0 || ! "C0".equals( cc2))
+            {
+                System.out.println( "Outer join " + ojId + " returned wrong values");
+                errorCount++;
+                return;
+            }
+            if( rs.next())
+            {
+                System.out.println( "Outer join " + ojId + " returned too many rows.");
+                errorCount++;
+            }
+        }
+        finally
+        {
+            rs.close();
+        }
+    } // End of runOj
+
+    private static void runCursor( Connection conn, int maxColValue, String[][][] dupVals)
+        throws SQLException
+    {
+        System.out.println( "Running scroll insensitive cursor");
+        Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
+        ResultSet rs = stmt.executeQuery( "SELECT ca1 FROM ta");
+        checkAllCa1( rs, true, maxColValue, dupVals, "scroll insensitive cursor");
+    }
+}
Index: java/testing/org/apache/derbyTesting/functionTests/tests/store/TestDiskHashtable.java
===================================================================
--- java/testing/org/apache/derbyTesting/functionTests/tests/store/TestDiskHashtable.java   (revision 0)
+++ java/testing/org/apache/derbyTesting/functionTests/tests/store/TestDiskHashtable.java   (revision 0)
@@ -0,0 +1,432 @@
+/*
+
+   Derby - Class org.apache.derbyTesting.functionTests.tests.store.TestDiskHashtable
+
+   Copyright 2005 The Apache Software Foundation or its licensors, as applicable.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+ */
+
+package org.apache.derbyTesting.functionTests.tests.store;
+
+import java.sql.Connection;
+import java.sql.ResultSet;
+import java.sql.SQLException;
+import java.sql.Statement;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.HashMap;
+import java.util.Vector;
+
+import org.apache.derby.iapi.error.PublicAPI;
+import org.apache.derby.iapi.error.StandardException;
+import org.apache.derby.iapi.sql.conn.ConnectionUtil;
+import org.apache.derby.iapi.sql.conn.LanguageConnectionContext;
+import org.apache.derby.iapi.store.access.DiskHashtable;
+import org.apache.derby.iapi.store.access.KeyHasher;
+import org.apache.derby.iapi.store.access.TransactionController;
+import org.apache.derby.iapi.types.DataValueDescriptor;
+import org.apache.derby.iapi.types.Orderable;
+import org.apache.derby.iapi.types.SQLInteger;
+import org.apache.derby.iapi.types.SQLLongint;
+import org.apache.derby.iapi.types.SQLVarchar;
+import org.apache.derby.tools.ij;
+import org.apache.derbyTesting.functionTests.util.TestUtil;
+
+/**
+ * This program tests the org.apache.derby.iapi.store.access.DiskHashtable class.
+ * The unit test interface is not used because that is undocumented and very difficult to decipher.
+ * Furthermore it is difficult to diagnose problems when using the unit test interface.
+ *
+ * Created: Wed Feb 09 15:44:12 2005
+ *
+ * @author <a href="mailto:[EMAIL PROTECTED]">Jack Klebanoff</a>
+ * @version 1.0
+ */
+public class TestDiskHashtable 
+{
+    private TransactionController tc;
+    private int failed = 0;
+    
+    public static void main( String args[])
+    {
+        int failed = 1;
+
+               REPORT("Test DiskHashtable starting");
+        try
+        {
+                       // use the ij utility to read the property file and
+                       // make the initial connection.
+                       ij.getPropertyArg(args);
+                       Connection conn = ij.startJBMS();
+            Statement stmt = conn.createStatement();
+            stmt.execute("CREATE FUNCTION testDiskHashtable() returns INTEGER EXTERNAL NAME 'org.apache.derbyTesting.functionTests.tests.store.TestDiskHashtable.runTests' LANGUAGE JAVA PARAMETER STYLE JAVA");
+            ResultSet rs = stmt.executeQuery( "values( testDiskHashtable())");
+            if( rs.next())
+                failed = rs.getInt(1);
+            stmt.close();
+            conn.close();
+        }
+        catch( SQLException e)
+        {
+                       TestUtil.dumpSQLExceptions( e);
+            failed = 1;
+        }
+        catch( Throwable t)
+        {
+                       REPORT("FAIL -- unexpected exception:" + t.toString());
+            failed = 1;
+               }
+        REPORT( (failed == 0) ? "OK" : "FAILED");
+        System.exit( (failed == 0) ? 0 : 1);
+    }
+
+    private void REPORT_FAILURE(String msg)
+    {
+        failed = 1;
+        REPORT( msg);
+    }
+    
+    private static void REPORT(String msg)
+    {
+        System.out.println( msg);
+    }
+    
+    public static int runTests() throws SQLException
+    {
+        TestDiskHashtable tester = new TestDiskHashtable();
+        return tester.doIt();
+    }
+
+    private TestDiskHashtable() throws SQLException
+    {
+        LanguageConnectionContext lcc = ConnectionUtil.getCurrentLCC();
+        if( lcc == null)
+            throw new SQLException( "Cannot get the LCC");
+        tc = lcc.getTransactionExecute();
+    }
+
+    private int doIt() throws SQLException
+    {
+               try {
+
+
+            REPORT( "Starting single key, keep duplicates test");
+            testOneVariant( tc, false, singleKeyTemplate, singleKeyCols, singleKeyRows);
+            REPORT( "Starting single key, remove duplicates test");
+            testOneVariant( tc, true, singleKeyTemplate, singleKeyCols, singleKeyRows);
+            REPORT( "Starting multiple key, keep duplicates test");
+            testOneVariant( tc, false, multiKeyTemplate, multiKeyCols, multiKeyRows);
+            REPORT( "Starting multiple key, remove duplicates test");
+            testOneVariant( tc, true, multiKeyTemplate, multiKeyCols, multiKeyRows);
+
+                       tc.commit();
+               }
+               catch (StandardException se)
+               {
+            throw PublicAPI.wrapStandardException( se);
+        }
+        return failed;
+    } // end of doIt
+
+    private static final DataValueDescriptor[] singleKeyTemplate = { new SQLInteger(), new SQLVarchar()};
+    private static final int[] singleKeyCols = {0};
+    private static final DataValueDescriptor[][] singleKeyRows =
+    {
+        {new SQLInteger(1), new SQLVarchar("abcd")},
+        {new SQLInteger(2), new SQLVarchar("abcd")},
+        {new SQLInteger(3), new SQLVarchar("e")},
+        {new SQLInteger(1), new SQLVarchar("zz")}
+    };
+
+    private static final DataValueDescriptor[] multiKeyTemplate = { new SQLLongint(), new SQLVarchar(), new SQLInteger()};
+    private static final int[] multiKeyCols = {1, 0};
+    private static final DataValueDescriptor[][] multiKeyRows =
+    {
+        {new SQLLongint(1), new SQLVarchar( "aa"), multiKeyTemplate[2].getNewNull()},
+        {new SQLLongint(2), new SQLVarchar( "aa"), new SQLInteger(1)},
+        {new SQLLongint(2), new SQLVarchar( "aa"), new SQLInteger(2)},
+        {new SQLLongint(2), new SQLVarchar( "b"), new SQLInteger(1)}
+    };
+
+    private static final int LOTS_OF_ROWS_COUNT = 50000;
+    
+    private void testOneVariant( TransactionController tc,
+                                 boolean removeDups,
+                                 DataValueDescriptor[] template,
+                                 int[] keyCols,
+                                 DataValueDescriptor[][] rows)
+        throws StandardException
+    {
+        DiskHashtable dht = new DiskHashtable(tc, template, keyCols, removeDups, false);
+        boolean[] isDuplicate = new boolean[ rows.length];
+        boolean[] found = new boolean[ rows.length];
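+        // simpleHash is an in-memory reference model; the DiskHashtable must always agree with it.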
+        HashMap simpleHash = new HashMap( rows.length);
+
+        testElements( removeDups, dht, keyCols, 0, rows, simpleHash, isDuplicate, found);
+
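+        // Insert the rows one at a time, checking put's return value, the lookup of every key
+        // inserted so far, and the full enumeration after each insert.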
+        for( int i = 0; i < rows.length; i++)
+        {
+            Object key = KeyHasher.buildHashKey( rows[i], keyCols);
+            Vector al = (Vector) simpleHash.get( key);
+            isDuplicate[i] = (al != null);
+            if( al == null)
+            {
+                al = new Vector(4);
+                simpleHash.put( key, al);
+            }
+            if( (!removeDups) || !isDuplicate[i])
+                al.add( rows[i]);
+            
+            if( dht.put( key, rows[i]) != (removeDups ? (!isDuplicate[i]) : true))
+                REPORT_FAILURE( "  put returned wrong value on row " + i);
+
+            for( int j = 0; j <= i; j++)
+            {
+                key = KeyHasher.buildHashKey( rows[j], keyCols);
+                if( ! rowsEqual( dht.get( key), simpleHash.get( key)))
+                    REPORT_FAILURE( "  get returned wrong value on key " + j);
+            }
+
+            testElements( removeDups, dht, keyCols, i+1, rows, simpleHash, isDuplicate, found);
+        }
+        // Remove them
+        for( int i = 0; i < rows.length; i++)
+        {
+            Object key = KeyHasher.buildHashKey( rows[i], keyCols);
+            if( ! rowsEqual( dht.remove( key), simpleHash.get( key)))
+                REPORT_FAILURE( "  remove returned wrong value on key " + i);
+            simpleHash.remove( key);
+            if( dht.get( key) != null)
+                REPORT_FAILURE( "  remove did not delete key " + i);
+        }
+        testElements( removeDups, dht, keyCols, 0, rows, simpleHash, isDuplicate, found);
+
+        testLargeTable( dht, keyCols, rows[0]);
+        dht.close();
+    } // end of testOneVariant
+
+    private void testLargeTable( DiskHashtable dht,
+                                 int[] keyCols,
+                                 DataValueDescriptor[] aRow)
+        throws StandardException
+    {
+        // Add a lot of elements
+        // If there are two or more key columns then we will vary the first two key columns, using an approximately
+        // square matrix of integer key values. Because the hash generator is commutative, key (i,j) hashes into the
+        // same bucket as key (j,i), testing the case where different keys hash into the same bucket.
+        int key1Count = (keyCols.length > 1) ? ((int) Math.round( Math.sqrt( (double) LOTS_OF_ROWS_COUNT))) : 1;
+        int key0Count = (LOTS_OF_ROWS_COUNT + key1Count - 1)/key1Count;
+
+        DataValueDescriptor[] row = new DataValueDescriptor[ aRow.length];
+        for( int i = 0; i < row.length; i++)
+            row[i] = aRow[i].getClone();
+        
+        for( int key0Idx = 0; key0Idx < key0Count; key0Idx++)
+        {
+            row[ keyCols[0]].setValue( key0Idx);
+            for( int key1Idx = 0; key1Idx < key1Count; key1Idx++)
+            {
+                if( keyCols.length > 1)
+                    row[ keyCols[1]].setValue( key1Idx);
+                Object key = KeyHasher.buildHashKey( row, keyCols);
+                if( ! dht.put( key, row))
+                {
+                    REPORT_FAILURE( "  put returned wrong value for key(" + key0Idx + "," + key1Idx + ")");
+                    key0Idx = key0Count;
+                    break;
+                }
+            }
+        }
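+        // Read every key back and check that get returns the row that was stored under it.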
+        for( int key0Idx = 0; key0Idx < key0Count; key0Idx++)
+        {
+            row[ keyCols[0]].setValue( key0Idx);
+            for( int key1Idx = 0; key1Idx < key1Count; key1Idx++)
+            {
+                if( keyCols.length > 1)
+                    row[ keyCols[1]].setValue( key1Idx);
+                Object key = KeyHasher.buildHashKey( row, keyCols);
+                if( ! rowsEqual( dht.get( key), row))
+                {
+                    REPORT_FAILURE( "  large table get returned wrong value for key(" + key0Idx + "," + key1Idx + ")");
+                    key0Idx = key0Count;
+                    break;
+                }
+            }
+        }
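+        // Enumerate all elements and use a BitSet to check that each key appears exactly once.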
+        BitSet found = new BitSet(key0Count * key1Count);
+        Enumeration elements = dht.elements();
+        while( elements.hasMoreElements())
+        {
+            Object el = elements.nextElement();
+            if( ! (el instanceof DataValueDescriptor[]))
+            {
+                REPORT_FAILURE( "  large table enumeration returned wrong element type");
+                break;
+            }
+            DataValueDescriptor[] fetchedRow = (DataValueDescriptor[]) el;
+            
+            int i = fetchedRow[ keyCols[0]].getInt() * key1Count;
+            if( keyCols.length > 1)
+                i += fetchedRow[ keyCols[1]].getInt();
+            if( i >= key0Count * key1Count)
+            {
+                REPORT_FAILURE( "  large table enumeration returned invalid element");
+                break;
+            }
+                
+            if( found.get(i))
+            {
+                REPORT_FAILURE( "  large table enumeration returned same element twice");
+                break;
+            }
+            found.set(i);
+        }
+        for( int i = key0Count * key1Count - 1; i >= 0; i--)
+        {
+            if( !found.get(i))
+            {
+                REPORT_FAILURE( "  large table enumeration missed at least one element");
+                break;
+            }
+        }
+    } // end of testLargeTable
+
+    private void testElements( boolean removeDups,
+                               DiskHashtable dht,
+                               int[] keyCols,
+                               int rowCount,
+                               DataValueDescriptor[][] rows,
+                               HashMap simpleHash,
+                               boolean[] isDuplicate,
+                               boolean[] found)
+        throws StandardException
+    {
+        for( int i = 0; i < rowCount; i++)
+            found[i] = false;
+        
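+        // Enumerate the hash table, marking each expected row as found and reporting
+        // nulls, unexpected element types, and repeated rows.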
+        for( Enumeration e = dht.elements(); e.hasMoreElements();)
+        {
+            Object el = e.nextElement();
+            if( el == null)
+            {
+                REPORT_FAILURE( "  table enumeration returned a null element");
+                return;
+            }
+            if( el instanceof DataValueDescriptor[])
+                checkElement( (DataValueDescriptor[]) el, rowCount, rows, found);
+            else if( el instanceof Vector)
+            {
+                Vector v = (Vector) el;
+                for( int i = 0; i < v.size(); i++)
+                    checkElement( (DataValueDescriptor[]) v.get(i), rowCount, rows, found);
+            }
+            // el has already been checked for null above, so anything else is an unexpected type
+            else
+            {
+                REPORT_FAILURE( "  table enumeration returned an incorrect element type");
+                return;
+            }
+        }
+        for( int i = 0; i < rowCount; i++)
+        {
+            if( (removeDups && isDuplicate[i]))
+            {
+                if( found[i])
+                {
+                    REPORT_FAILURE( "  table enumeration did not remove duplicates");
+                    return;
+                }
+            }
+            else if( ! found[i])
+            {
+                REPORT_FAILURE( "  table enumeration missed at least one element");
+                return;
+            }
+        }
+    } // end of testElements
+
+    private void checkElement( DataValueDescriptor[] fetchedRow,
+                               int rowCount,
+                               DataValueDescriptor[][] rows,
+                               boolean[] found)
+        throws StandardException
+    {
+        for( int i = 0; i < rowCount; i++)
+        {
+            if( rowsEqual( fetchedRow, rows[i]))
+            {
+                if( found[i])
+                {
+                    REPORT_FAILURE( "  table enumeration returned the same element twice");
+                    return;
+                }
+                found[i] = true;
+                return;
+            }
+        }
+        REPORT_FAILURE( "  table enumeration returned an incorrect element");
+    } // end of checkElement
+
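+    // Compares rows that may be represented either as bare DataValueDescriptor arrays or as
+    // Vectors of rows; a one-element Vector is considered equal to the bare row it contains.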
+    private boolean rowsEqual( Object r1, Object r2)
+        throws StandardException
+    {
+        if( r1 == null)
+            return r2 == null;
+
+        if( r1 instanceof DataValueDescriptor[])
+        {
+            DataValueDescriptor[] row1 = (DataValueDescriptor[]) r1;
+            DataValueDescriptor[] row2;
+            
+            if( r2 instanceof Vector)
+            {
+                Vector v2 = (Vector) r2;
+                if( v2.size() != 1)
+                    return false;
+                row2 = (DataValueDescriptor[]) v2.elementAt(0);
+            }
+            else if( r2 instanceof DataValueDescriptor[])
+                row2 = (DataValueDescriptor[]) r2;
+            else
+                return false;
+            
+            if( row1.length != row2.length)
+                return false;
+            for( int i = 0; i < row1.length; i++)
+            {
+                if( ! row1[i].compare( Orderable.ORDER_OP_EQUALS, row2[i], true, true))
+                    return false;
+            }
+            return true;
+        }
+        if( r1 instanceof Vector)
+        {
+            if( !(r2 instanceof Vector))
+                return false;
+            Vector v1 = (Vector) r1;
+            Vector v2 = (Vector) r2;
+            if( v1.size() != v2.size())
+                return false;
+            for( int i = v1.size() - 1; i >= 0; i--)
+            {
+                if( ! rowsEqual( v1.elementAt( i), v2.elementAt(i)))
+                    return false;
+            }
+            return true;
+        }
+        // What is it then?
+        return r1.equals( r2);
+    } // end of rowsEqual
+}
Index: java/testing/org/apache/derbyTesting/functionTests/master/TestDiskHashtable.out
===================================================================
--- java/testing/org/apache/derbyTesting/functionTests/master/TestDiskHashtable.out    (revision 0)
+++ java/testing/org/apache/derbyTesting/functionTests/master/TestDiskHashtable.out    (revision 0)
@@ -0,0 +1,6 @@
+Test DiskHashtable starting
+Starting single key, keep duplicates test
+Starting single key, remove duplicates test
+Starting multiple key, keep duplicates test
+Starting multiple key, remove duplicates test
+OK
Index: java/testing/org/apache/derbyTesting/functionTests/master/SpillHash.out
===================================================================
--- java/testing/org/apache/derbyTesting/functionTests/master/SpillHash.out    (revision 0)
+++ java/testing/org/apache/derbyTesting/functionTests/master/SpillHash.out    (revision 0)
@@ -0,0 +1,10 @@
+Running join
+Running distinct
+Running outer join 1
+Running scroll insensitive cursor
+Growing database.
+Running join
+Running distinct
+Running outer join 1
+Running scroll insensitive cursor
+PASSED.
Index: java/testing/org/apache/derbyTesting/functionTests/suites/derbylang.runall
===================================================================
--- java/testing/org/apache/derbyTesting/functionTests/suites/derbylang.runall  (revision 155029)
+++ java/testing/org/apache/derbyTesting/functionTests/suites/derbylang.runall  (working copy)
@@ -107,6 +107,7 @@
 lang/select.sql
 lang/simpleThreadWrapper.java
 lang/specjPlans.sql
+lang/SpillHash.java
 lang/staleplans.sql
 lang/stmtCache0.sql
 lang/stmtCache1.sql
Index: build.xml
===================================================================
--- build.xml   (revision 155029)
+++ build.xml   (working copy)
@@ -413,6 +413,7 @@
       <arg value="java.math.BigDecimal"/>
       <arg value="java.util.ArrayList"/>
       <arg value="java.util.GregorianCalendar"/>
+      <arg value="java.util.Vector"/>
     </java>
 
     <javac
