[Haskell] Re: performance tuning Data.FiniteMap

2004-03-02 Thread oleg

[BTW, should we move to Haskell-Cafe?]

 Because updates are not so infrequent that I want to pay the cost of
 replicating the entire array every update (or every ten!).  I'm
 willing to exchange *some* read time for faster update. Also, because
 small array copies may be sufficiently faster than tree traversals
 that I may pay very little extra for faster reads.

 FYI, my current code looks like this:

I'm afraid I'm somewhat confused. The hash-table related code makes
the copy of the whole array every purgatory-size times. Thus, given the
sequence of 'n' unique inserts (which don't trigger the
major_rebuild), at most 2*n/|purgatory| elements will be moved.

As I understand your code, in particular,

   insert (ArrMap proto toBase ht) key elt = ArrMap proto toBase newHT
  where newHT= insert' proto ht (toBase key) elt
   insert' _ (HT x _) [] = HT x
   insert' proto (HT Nothing y) key = insert' proto (HT (Just proto) y) key
   insert' p (HT (Just ar) y) (k:ey) = \val - HT (Just $ newArray val) y
   where newArray val = ar//[(k,insert' p (ar!k) ey val)]

you make a copy of an array of the size |base| |key| times. _If_ the
tree is kept balanced and filled, then the sequence of n inserts will
copy (log n)/(log |base|)*|base| elements. For small n and large
|base|, that can be a lot. For example,

   testMap=newMap (chr 0) (chr 255) id
   main = do print $ lookup (insert testMap abc (Just def)) abc

involves copying a 256-element array three times. Right? 

I guess we have come to the point where we really need to know the
distribution of reads and writes, the length of the key (and if it is
bounded), and the distribution of key values. We must also be sure of
the cost basis. So far, we have concentrated only on the traversal
through and moving of elements as the function of the size of the
map. This is clearly not sufficient, as Andrew Bromage pointed out.


___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


[Haskell] Re: performance tuning Data.FiniteMap

2004-03-01 Thread oleg

Hello!

If indeed the read performance is at premium and updates are
infrequent, by bother with ternary etc. trees -- why not to use just a
single, one-level array. Given a reasonable hash function, the
retrieval performance is O(1). And still, no IO/ST are necessary.


{-# OPTIONS -fglasgow-exts #-}
module Foo where

import Data.Array
import Data.List

import Data.HashTable (hashString)
import Data.Int (Int32)

class Hashy a where
  hash:: a - Int
  
data MyFM key val = MyFM { base::  Int
 , purgatory:: [(key,val)]
 , store:: Array Int [(key,val)]
 } deriving Show
 

empty = MyFM {base = 41, purgatory = [],
  store = listArray (0,base(empty)-1) $ repeat []}
  
lkup fm key = case lookup key (purgatory fm) of
 t@(Just _) - t
 _  - lookup key item
   where item  = (store fm)! hashv
 hashv = (hash key) `mod` (base fm)
 
count = length . concat . elems . store

purgatory_limit = 10

ins fm key val 
 = rebuild_perhaps $ fm {purgatory = add_uniq (purgatory fm) key val}
  where
   rebuild_perhaps fm | length (purgatory fm)  purgatory_limit 
  = rebuild fm
   rebuild_perhaps fm = fm
   
rebuild fm | 2*(count fm)  base fm = major_rebuild fm
rebuild fm = fm{purgatory = [], store = (store fm) // updates}
  where
updates = map (retr . merge) $ groupBy gfirs $ 
  sortBy sfirs $ map (\p@(k,v) - (hashk k,p)) $ purgatory fm
hashk k = (hash k) `mod` (base fm)
gfirs (k1,_) (k2,_) = k1 == k2
sfirs (k1,_) (k2,_) = compare k1 k2
merge x = (fst$ head x, map snd x)
retr (h,v) = (h, unionBy gfirs v ((store fm)!h))

-- reallocate the hash table to the bigger size
major_rebuild fm = undefined -- exercise for the reader

-- add association (key,val) to the list, replacing an old association
-- with the same key, if any. At most one such association could have
-- existed
add_uniq [] key val = [(key,val)]
add_uniq ((hkey,_):t) key val | hkey == key = (key,val):t
add_uniq (h:t) key val = h: add_uniq t key val

instance Hashy String where
hash = fromInteger . toInteger . hashString



test1 = foldl (\fm v - ins fm v v) empty $ map (:[]) ['a'..'h']
test2 = foldl (\fm v - ins fm v v) test1 $ map (:[]) ['a'..'o']
test3 = foldl (\fm v - ins fm v v) test2 $ map (:[]) ['a'..'o']
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


[Haskell] Re: performance tuning Data.FiniteMap

2004-03-01 Thread S. Alexander Jacobson
On Fri, 27 Feb 2004 [EMAIL PROTECTED] wrote:
   If indeed the read performance is at premium and updates are
 infrequent, by bother with ternary etc. trees -- why not to use just a
 single, one-level array. Given a reasonable hash function

Because updates are not so infrequent that I want
to pay the cost of replicating the entire array
every update (or every ten!).  I'm willing to
exchange *some* read time for faster update. Also,
because small array copies may be sufficiently
faster than tree traversals that I may pay very
little extra for faster reads.

FYI, my current code looks like this:

  type HTArray base elt = Array base (HT base elt)
  data HT base elt = HT (Maybe (HTArray base elt)) (Maybe elt)
  data MyMap base key elt = ArrMap (HTArray base elt) (key-[base]) (HT base elt)

  newMap minBase maxBase toBase = ArrMap proto toBase emptyHT
where
proto= array (minBase,maxBase) [(x,emptyHT) | x- [minBase..maxBase]]
emptyHT=HT Nothing Nothing

  lookup (ArrMap _ toBase ht) key = lookup' ht $ toBase key
  lookup' (HT x y) [] = y
  lookup' (HT Nothing _) _ = Nothing
  lookup' (HT (Just ar) _) (k:ey) = lookup' (ar!k) ey

  insert (ArrMap proto toBase ht) key elt = ArrMap proto toBase newHT
 where newHT= insert' proto ht (toBase key) elt
  insert' _ (HT x _) [] = HT x
  insert' proto (HT Nothing y) key = insert' proto (HT (Just proto) y) key
  insert' p (HT (Just ar) y) (k:ey) = \val - HT (Just $ newArray val) y
where newArray val = ar//[(k,insert' p (ar!k) ey val)]

  -

  testMap=newMap (chr 0) (chr 255) id
  main = do print $ lookup (insert testMap abc (Just def)) abc

Make the difference between in minBase and
maxBase larger in the call to newMap to prefer
reads more.

Note: This format seems awkward.  I feel like I
want to have the user to define an enumeration
type e.g.

  data UpToFive = One | Two | Three | Four | Five
  instance Ix UpToFive where

and have

  newMap::(Bounded base,Ix base)=(key-[base]) - MyMap base key elt

But I can't figure out a nice way to auto-generate
arbitrary size enumerations and manually doing so
is too wearisome to contemplate.

If you can generate these enumeration classes,
then it would seem you could auto-derive functions
that translate from an arbitrary key into [base].

-Alex-

_
S. Alexander Jacobson  mailto:[EMAIL PROTECTED]
tel:917-770-6565   http://alexjacobson.com
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell