I believe he means to say saves a bit more "memory" rather then "money" -
but otherwise, yeah ;)
On 12/8/06, Mark Proctor < [EMAIL PROTECTED]> wrote:
>
> Found your original email. Thought I would add other than iteration
> there aren't large amounts of performance benefits for my implementation.
> However if your Objects can be the Entry then there is no need to create
> separate Entry wrapper objects, which saves a bit more money - in some cases
> you just store objects, avoiding the key/value field pairs. You can also
> gain performance by indexing those collections. This collection applies all
> the above and move optimisations:
>
>
http://anonsvn.labs.jboss.com/labs/jbossrules/trunk/drools-core/src/main/java/org/drools/util/TupleHashTable.java
>
>
> Mark
>
> Mark Proctor wrote:
>
> Hmm for some reason I don't remember reading that email from Matt,
> sorry for not replying. Note mine does not implement any java.utilIterfaces,
nor do I plan too.
>
>
http://anonsvn.labs.jboss.com/labs/jbossrules/trunk/drools-core/src/main/java/org/drools/util/AbstractHashTable.java
>
>
http://anonsvn.labs.jboss.com/labs/jbossrules/trunk/drools-core/src/main/java/org/drools/util/ObjectHashMap.java
>
>
> Mark
>
> Peter Van Weert wrote:
>
> Matt Casters wrote:
>
> Hi Mark,
>
> I came across you e-mail exchange with regards to the customer HashMap
> you made for the Drools project. (and of-course, this blog:
>
> http://woolfel.blogspot.com/2006/10/custom-hashtable.html)
> The open source Kettle project (http://kettle.pentaho.org
> ) could use
> exactly this piece of software for the in-memory lookup of large datasets.
> Typically this is used to store database id-pairs (foreign key lookup),
> however we see that a single long integer pair in a HashMap consumes
>
> around 128 bytes (!).
>
> I would **really** appreciate it if you could push us in the general
> direction of the source-code. (I assume that it is open-sourced too
> under Apache license?)
>
> Kind regards,
>
>
> Matt
> ____________________________________________
> Matt Casters, Chief Data Integration
> Pentaho, Open Source Business Intelligence
>
> http://www.pentaho.org <http://www.pentaho.org/> <http://www.pentaho.org/> --
>
> [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
> Fonteinstraat 70, 9400 OKEGEM, BELGIUM
> Tel. +32 (0) 486 97 29 37
>
> In attachment the customized hash map I use for the JCHR runtime (not
> yet in the released version). It is basically a stripped down version of
> the very general-purpose HashMap, taking into account some application
>
> specific preconditions (like not being null, sometimes knowing something
> is new or already in the map, etc). It didn't help much for time
> performance in my case, though it should improve space a bit (I haven't
>
> tested it).
>
> Do not worry too much about licensing issues, I don't either for now.
> For instance, I do not know whether I was allowed to use and modify the
> Sun implementation the way I did ;-)
> If you use ideas from it, then acknowledging my name somewhere in the
>
> Javadoc would be appreciated, though I will probably never notice it if
> you don't.
>
> Cheers,
> Peter
>
>
>
> Disclaimer:
> http://www.kuleuven.be/cwis/email_disclaimer.htm
>
> ------------------------------
>
> package runtime.hash;
>
> import java.util.HashMap;
> import java.util.Iterator;
> import java.util.NoSuchElementException;
>
> /*
> * IDEAS
> * - maybe we could use double linked entry lists and have keys
> * (i.e. entries) remove themselves (i.e. no more lookup needed)?
> * - move the second level hash function to the elements themselves:
>
> * . no more need for extra field in entry (downside: extra indirection)
> * . no more recalculation of second level hash if not necessary
> * - merge Entry and actual entry classes in some cases
> */
>
>
> /**
> * <p>
> * Special purpose hash index, loosely based on the general purpose [EMAIL
PROTECTED] HashMap}
> * implementation (version 1.72) by Doug Lea, Josh Bloch, Arthur van Hoff
> * and Neal Gafter. Notable changes include:
>
> * </p>
> * <ul>
> * <li>All non-essential code has been stripped and/or inlined.</li>
> * <li>
> * Domain-specific precondintions have been taken into account: keys
> * can never be <code>null</code>, keys generally never have been
>
> * inserted before, ... This allowed us to avoid several tests
> * </li>
> * <li>
> * Keys and values are merged into entries (turning it more into a set
> * then a map really). You can iterate over all entries in the index.
>
> * </li>
> * <li>
> * The [EMAIL PROTECTED] Iterator}s returned by [EMAIL PROTECTED]
#iterator()} are not fail-safe.
> * (i.e.: take care when using this class, see also [EMAIL PROTECTED]
#safeIterator()}!),
> * </li>
> * <li>
>
> * Some extra, slightly more specific methods have been added
> * </li>
> * </ul>
> * <p>
> * In fact, not much is left from the original code. Or, to paraphrase Peter
Lin:
> * <br/>
>
> * <q>If java.util.HashMap was a women, this custom hashtable would be naked.
> * no bikini, no g-string.</q>
> * <br/>
> * </p>
> * <p>
> * This implementation provides constant-time performance for the basic
>
> * operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
> * disperses the elements properly among the buckets. Iteration over
> * collection views requires time proportional to the "capacity" of the
>
> * <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
> * of key-value mappings). Thus, it's very important not to set the initial
> * capacity too high (or the load factor too low) if iteration performance is
>
> * important.
> * </p>
> * <p>
> * An instance of <tt>HashIndex</tt> has two parameters that affect its
> * performance: <i>initial capacity</i> and <i>load factor</i>. The
>
> * <i>capacity</i> is the number of buckets in the hash table, and the initial
> * capacity is simply the capacity at the time the hash table is created. The
> * <i>load factor</i> is a measure of how full the hash table is allowed to
>
> * get before its capacity is automatically increased. When the number of
> * entries in the hash table exceeds the product of the load factor and the
> * current capacity, the hash table is <i>rehashed</i> (that is, internal data
>
> * structures are rebuilt) so that the hash table has approximately twice the
> * number of buckets.
> * </p>
> * <p>
> * As a general rule, the default load factor (.75) offers a good tradeoff
>
> * between time and space costs. Higher values decrease the space overhead
> * but increase the lookup cost (reflected in most of the operations of the
> * <tt>HashMap</tt> class, including <tt>get</tt> and <tt>put</tt>). The
>
> * expected number of entries in the map and its load factor should be taken
> * into account when setting its initial capacity, so as to minimize the
> * number of rehash operations. If the initial capacity is greater
>
> * than the maximum number of entries divided by the load factor, no
> * rehash operations will ever occur.
> * </p>
> * <p>
> * If many mappings are to be stored in a <tt>HashMap</tt> instance,
>
> * creating it with a sufficiently large capacity will allow the mappings to
> * be stored more efficiently than letting it perform automatic rehashing as
> * needed to grow the table.
> * </p>
> * <p>
>
> * <strong>Note that this implementation is not synchronized.</strong>
> * If multiple threads access a hash map concurrently, and at least one of
> * the threads modifies the map structurally, it <i>must</i> be
>
> * synchronized externally. (A structural modification is any operation
> * that adds or deletes one or more mappings; merely changing the value
> * associated with a key that an instance already contains is not a
>
> * structural modification.) This is typically accomplished by
> * synchronizing on some object that naturally encapsulates the map.
> *
> * @param <E> the type of mapped values
> *
> * @author ( Doug Lea )
>
> * @author ( Josh Bloch )
> * @author ( Arthur van Hoff )
> * @author ( Neal Gafter )
> * @see HashMap
> *
> * @author Peter Van Weert
> */
> public final class HashIndex<E> implements Iterable<E> {
>
>
> /**
> * The default initial capacity - MUST be a power of two.
> */
> final static int DEFAULT_INITIAL_CAPACITY = 16;
>
> /**
> * The maximum capacity, used if a higher value is implicitly specified
>
> * by either of the constructors with arguments.
> * MUST be a power of two <= 1<<30.
> */
> final static int MAXIMUM_CAPACITY = 1 << 30;
>
> /**
> * The load factor used when none specified in constructor.
>
> */
> final static float DEFAULT_LOAD_FACTOR = 0.75f;
>
> /**
> * The table, resized as necessary. Length MUST Always be a power of two.
> */
> Entry<E>[] table;
>
> /**
>
> * The number of key-value mappings contained in this map.
> */
> int size;
>
> /**
> * The next size value at which to resize (capacity * load factor).
> */
> int threshold;
>
>
> /**
> * The load factor for the hash table.
> */
> final float loadFactor;
>
> /**
> * Constructs an empty <tt>HashIndex</tt> with the specified initial
> * capacity and load factor.
>
> *
> * @param initialCapacity the initial capacity
> * @param loadFactor the load factor
> * @throws IllegalArgumentException if the initial capacity is negative
> * or the load factor is nonpositive
>
> */
> @SuppressWarnings("unchecked")
> public HashIndex(int initialCapacity, float loadFactor) {
> if (initialCapacity < 0)
> throw new IllegalArgumentException(
> "Illegal initial capacity: " + initialCapacity);
>
> if (initialCapacity > MAXIMUM_CAPACITY)
> initialCapacity = MAXIMUM_CAPACITY;
> if (loadFactor <= 0 || Float.isNaN(loadFactor))
> throw new IllegalArgumentException(
>
> "Illegal load factor: " + loadFactor);
>
> // Find a power of 2 >= initialCapacity
> int capacity = 1;
> while (capacity < initialCapacity)
> capacity <<= 1;
>
>
> this.loadFactor = loadFactor;
> threshold = (int)(capacity * loadFactor);
> table = new Entry[capacity];
> }
>
> /**
> * Constructs an empty <tt>HashMap</tt> with the specified initial
>
> * capacity and the default load factor (0.75).
> *
> * @param initialCapacity the initial capacity.
> * @throws IllegalArgumentException if the initial capacity is negative.
> */
> public HashIndex(int initialCapacity) {
>
> this(initialCapacity, DEFAULT_LOAD_FACTOR);
> }
>
> /**
> * Constructs an empty <tt>HashMap</tt> with the default initial capacity
> * (16) and the default load factor (0.75
> ).
> */
> @SuppressWarnings("unchecked")
> public HashIndex() {
> this.loadFactor = DEFAULT_LOAD_FACTOR;
> threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
>
> table = new Entry[DEFAULT_INITIAL_CAPACITY];
> }
>
> /**
> * Returns the number of entries in this index.
> *
> * @return the number of entries in this index.
> */
> public int size() {
>
> return size;
> }
>
> /**
> * Returns <tt>true</tt> if this index contains no entries.
> *
> * @return <tt>true</tt> if this index contains no entries
> */
>
> public boolean isEmpty() {
> return size == 0;
> }
>
> /**
> * [EMAIL PROTECTED]
> * <br/>
> * <em>The returned iterator is not fail-safe under structural
modifications!</em>
>
> * (A structural modification is any operation
> * that adds or deletes one or more mappings; merely changing the value
> * associated with a key that an instance already contains is not a
> * structural modification).
>
> *
> * @see #safeIterator()
> */
> public Iterator<E> iterator() {
> return new HashIterator<E>(table);
> }
>
> /**
> * Returns an iterator over all entries currently in the store. Structural
>
> * changes to the table are <em>not</em> reflected in the iteration.
> * (A structural modification is any operation
> * that adds or deletes one or more mappings; merely changing the value
>
> * associated with a key that an instance already contains is not a
> * structural modification).
> *
> * @return an iterator over all entries currently in the store. Structural
> * changes to the table are <em>not</em> reflected in the iteration.
>
> *
> * @see #iterator()
> */
> public Iterator<E> safeIterator() {
> return new HashIterator<E>(table.clone());
> }
>
> /**
> * Returns the entry to which the specified key is "mapped",
>
> * or, in other words, if this index contains an entry that is equal
> * to the given key, or [EMAIL PROTECTED] null} if this is not the case.
> *
> * <p>More formally, if this index contains an entry [EMAIL PROTECTED] e}
>
> * such that [EMAIL PROTECTED] key.equals(e))}, then this method returns
> * [EMAIL PROTECTED] e}; otherwise it returns [EMAIL PROTECTED] null}.
> * (There can be at most one such entry.)
> *
> * @param key
> * The key to look up.
>
> *
> * @see #put(Object)
> * @see #insert(Object)
> */
> public E get(Object key) {
> int hash = key.hashCode();
> hash ^= (hash >>> 20) ^ (hash >>> 12);
>
> hash = hash ^ (hash >>> 7) ^ (hash >>> 4);
>
> for (Entry<E> e = table[(hash & (table.length-1))];
> e != null;
> e = e.next) {
> E k;
>
> if (e.hash == hash && key.equals(k = e.entry))
> return k;
> }
> return null;
> }
>
> /**
> * If an equal entry was already present, the latter entry is returned,
>
> * and the method behaves as a [EMAIL PROTECTED] #get(Object)} operation.
> * If no equal entry is present, the given <code>entry</code> is
> * inserted (i.e. the method behaves as a [EMAIL PROTECTED]
#put(Object)}),
>
> * and <code>null</code> is returned.
> *
> * @param entry
> * The entry to insert if no equal entry is already present.
> * @return If an equal entry was already present, the latter entry
>
> * is returned; else the result will be <code>null</code>.
> */
> public E insert(E entry) {
> int hash = entry.hashCode();
> hash ^= (hash >>> 20) ^ (hash >>> 12);
>
> hash = hash ^ (hash >>> 7) ^ (hash >>> 4);
> int i = hash & (table.length-1);
>
> for (Entry<E> e = table[i]; e != null; e = e.next) {
> E k;
>
> if (e.hash == hash && entry.equals(k = e.entry))
> return k;
> }
>
> // before returning null, we do a put!
> table[i] = new Entry<E>(hash, entry, table[i]);
>
> resize();
>
> return null;
> }
>
> /**
> * Adds the given entry to the index. No equal entry is already
> * stored in the index: this is an <em>essential precondition</em>
>
> * to this method. If this is not guaranteed, maybe [EMAIL PROTECTED]
#putAgain(Object)}
> * is what you need.
> *
> * @param entry
> * The new entry put in the store.
> *
> * @see #putAgain(Object)
>
> */
> public void put(E entry) {
> int hash = entry.hashCode();
> hash ^= (hash >>> 20) ^ (hash >>> 12);
> hash = hash ^ (hash >>> 7) ^ (hash >>> 4);
>
> int i = hash & (table.length-1);
>
> table[i] = new Entry<E>(hash, entry, table[i]);
> resize();
> }
>
> /**
> * Adds the given entry to the index, or replaces an equal entry
>
> * already stored in the index. If you are sure no equal entry is
> * already stored, using the [EMAIL PROTECTED] #put(Object)} method is
more
> * efficient.
> *
> * @param entry
> * The entry to add to the store.
>
> * @return The replaced, equal entry if such an entry was present,
> * or <code>null</code> otherwise.
> */
> public E putAgain(E entry) {
> int hash = entry.hashCode();
> hash ^= (hash >>> 20) ^ (hash >>> 12);
>
> hash = hash ^ (hash >>> 7) ^ (hash >>> 4);
> int i = hash & (table.length-1);
>
> for (Entry<E> e = table[i]; e != null; e = e.next) {
> E k;
>
> if (e.hash == hash && entry.equals(k = e.entry)) {
> E old = k;
> e.entry = entry;
> return old;
> }
> }
>
> table[i] = new Entry<E>(hash, entry, table[i]);
>
> resize();
>
> return null;
> }
>
>
> /**
> * Rehashes the contents of this map into a new array with a
> * larger capacity.
> *
> * If current capacity is MAXIMUM_CAPACITY, this method does not
>
> * resize the map, but sets threshold to Integer.MAX_VALUE.
> */
> @SuppressWarnings("unchecked")
> private final void resize() {
> if (size++ >= threshold) {
> Entry[] oldTable = table;
>
> int oldCapacity = oldTable.length;
>
> if (oldCapacity == MAXIMUM_CAPACITY) {
> threshold = Integer.MAX_VALUE;
> return;
> }
>
>
> int newCapacity = 2 * table.length;
>
> Entry<E>[] newTable = new Entry[newCapacity];
>
> for (int j = 0; j < oldTable.length; j++) {
> Entry<E> e = oldTable[j];
>
> if (e != null) {
> oldTable[j] = null;
> do {
> Entry<E> next = e.next;
> int i = e.hash & (newCapacity-1);
>
> e.next = newTable[i];
> newTable[i] = e;
> e = next;
> } while (e != null);
> }
> }
> table = newTable;
>
> threshold = (int)(newCapacity * loadFactor);
> }
> }
>
> /**
> * Removes the specified entry from this map. A <em>specific
> * precondition</em> to this method is that the provided
>
> * entry is in fact contained by the index (entries are
> * also only compared using reference comparison
> * (i.e. <code>==</code>) by this method).
> *
> * @param entry
>
> * the entry that is to be removed from the index
> */
> public void remove(E entry) {
> int hash = entry.hashCode();
> hash ^= (hash >>> 20) ^ (hash >>> 12);
> hash = hash ^ (hash >>> 7) ^ (hash >>> 4);
>
> int i = hash & (table.length-1);
>
> Entry<E> prev = table[i], e = prev;
>
> while (e != null) {
> Entry<E> next = e.next;
> if (e.entry == entry) {
>
> size--;
> if (prev == e)
> table[i] = next;
> else
> prev.next = next;
> return;
> }
> prev = e;
>
> e = next;
> }
> }
>
> /**
> * Removes all entries from this index.
> * The index will be empty after this call returns.
> */
> public void clear() {
> Entry[] tab = table;
>
> for (int i = 0; i < tab.length; i++)
> tab[i] = null;
> size = 0;
> }
>
> private final static class Entry<E> {
> E entry;
> Entry<E> next;
>
> final int hash;
>
> /**
> * Creates new entry.
> */
> Entry(int h, E entry, Entry<E> next) {
> this.entry = entry;
> this.next = next;
> hash = h;
>
> }
>
> @Override
> public final String toString() {
> return entry.toString();
> }
> }
>
>
> private final static class HashIterator<E> implements Iterator<E> {
>
> Entry<E> next; // next entry to return
> int index; // current slot
> Entry<E>[] table;
>
> HashIterator(Entry<E>[] table) {
> this.table = table;
>
> /* we know table.length is not 0 since 0 is not a power of two */
> while ((next = table[index]) == null && ++index < table.length)
{/**/}
> }
>
> public final boolean hasNext() {
>
> return next != null;
> }
>
> public E next() {
> Entry<E> e = next;
> if (e == null)
> throw new NoSuchElementException();
> if ((next =
> e.next) == null)
> while (++index < table.length && (next = table[index]) ==
null) {/**/}
> return e.entry;
> }
>
> public void remove() {
> throw new UnsupportedOperationException();
>
> }
> }
> }
>
>
>
>