[Haskell] Re: performance tuning Data.FiniteMap
[BTW, should we move to Haskell-Cafe?] Because updates are not so infrequent that I want to pay the cost of replicating the entire array every update (or every ten!). I'm willing to exchange *some* read time for faster update. Also, because small array copies may be sufficiently faster than tree traversals that I may pay very little extra for faster reads. FYI, my current code looks like this: I'm afraid I'm somewhat confused. The hash-table related code makes the copy of the whole array every purgatory-size times. Thus, given the sequence of 'n' unique inserts (which don't trigger the major_rebuild), at most 2*n/|purgatory| elements will be moved. As I understand your code, in particular, insert (ArrMap proto toBase ht) key elt = ArrMap proto toBase newHT where newHT= insert' proto ht (toBase key) elt insert' _ (HT x _) [] = HT x insert' proto (HT Nothing y) key = insert' proto (HT (Just proto) y) key insert' p (HT (Just ar) y) (k:ey) = \val - HT (Just $ newArray val) y where newArray val = ar//[(k,insert' p (ar!k) ey val)] you make a copy of an array of the size |base| |key| times. _If_ the tree is kept balanced and filled, then the sequence of n inserts will copy (log n)/(log |base|)*|base| elements. For small n and large |base|, that can be a lot. For example, testMap=newMap (chr 0) (chr 255) id main = do print $ lookup (insert testMap abc (Just def)) abc involves copying a 256-element array three times. Right? I guess we have come to the point where we really need to know the distribution of reads and writes, the length of the key (and if it is bounded), and the distribution of key values. We must also be sure of the cost basis. So far, we have concentrated only on the traversal through and moving of elements as the function of the size of the map. This is clearly not sufficient, as Andrew Bromage pointed out. ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
[Haskell] Re: performance tuning Data.FiniteMap
Hello! If indeed the read performance is at premium and updates are infrequent, by bother with ternary etc. trees -- why not to use just a single, one-level array. Given a reasonable hash function, the retrieval performance is O(1). And still, no IO/ST are necessary. {-# OPTIONS -fglasgow-exts #-} module Foo where import Data.Array import Data.List import Data.HashTable (hashString) import Data.Int (Int32) class Hashy a where hash:: a - Int data MyFM key val = MyFM { base:: Int , purgatory:: [(key,val)] , store:: Array Int [(key,val)] } deriving Show empty = MyFM {base = 41, purgatory = [], store = listArray (0,base(empty)-1) $ repeat []} lkup fm key = case lookup key (purgatory fm) of t@(Just _) - t _ - lookup key item where item = (store fm)! hashv hashv = (hash key) `mod` (base fm) count = length . concat . elems . store purgatory_limit = 10 ins fm key val = rebuild_perhaps $ fm {purgatory = add_uniq (purgatory fm) key val} where rebuild_perhaps fm | length (purgatory fm) purgatory_limit = rebuild fm rebuild_perhaps fm = fm rebuild fm | 2*(count fm) base fm = major_rebuild fm rebuild fm = fm{purgatory = [], store = (store fm) // updates} where updates = map (retr . merge) $ groupBy gfirs $ sortBy sfirs $ map (\p@(k,v) - (hashk k,p)) $ purgatory fm hashk k = (hash k) `mod` (base fm) gfirs (k1,_) (k2,_) = k1 == k2 sfirs (k1,_) (k2,_) = compare k1 k2 merge x = (fst$ head x, map snd x) retr (h,v) = (h, unionBy gfirs v ((store fm)!h)) -- reallocate the hash table to the bigger size major_rebuild fm = undefined -- exercise for the reader -- add association (key,val) to the list, replacing an old association -- with the same key, if any. At most one such association could have -- existed add_uniq [] key val = [(key,val)] add_uniq ((hkey,_):t) key val | hkey == key = (key,val):t add_uniq (h:t) key val = h: add_uniq t key val instance Hashy String where hash = fromInteger . toInteger . hashString test1 = foldl (\fm v - ins fm v v) empty $ map (:[]) ['a'..'h'] test2 = foldl (\fm v - ins fm v v) test1 $ map (:[]) ['a'..'o'] test3 = foldl (\fm v - ins fm v v) test2 $ map (:[]) ['a'..'o'] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
[Haskell] Re: performance tuning Data.FiniteMap
On Fri, 27 Feb 2004 [EMAIL PROTECTED] wrote: If indeed the read performance is at premium and updates are infrequent, by bother with ternary etc. trees -- why not to use just a single, one-level array. Given a reasonable hash function Because updates are not so infrequent that I want to pay the cost of replicating the entire array every update (or every ten!). I'm willing to exchange *some* read time for faster update. Also, because small array copies may be sufficiently faster than tree traversals that I may pay very little extra for faster reads. FYI, my current code looks like this: type HTArray base elt = Array base (HT base elt) data HT base elt = HT (Maybe (HTArray base elt)) (Maybe elt) data MyMap base key elt = ArrMap (HTArray base elt) (key-[base]) (HT base elt) newMap minBase maxBase toBase = ArrMap proto toBase emptyHT where proto= array (minBase,maxBase) [(x,emptyHT) | x- [minBase..maxBase]] emptyHT=HT Nothing Nothing lookup (ArrMap _ toBase ht) key = lookup' ht $ toBase key lookup' (HT x y) [] = y lookup' (HT Nothing _) _ = Nothing lookup' (HT (Just ar) _) (k:ey) = lookup' (ar!k) ey insert (ArrMap proto toBase ht) key elt = ArrMap proto toBase newHT where newHT= insert' proto ht (toBase key) elt insert' _ (HT x _) [] = HT x insert' proto (HT Nothing y) key = insert' proto (HT (Just proto) y) key insert' p (HT (Just ar) y) (k:ey) = \val - HT (Just $ newArray val) y where newArray val = ar//[(k,insert' p (ar!k) ey val)] - testMap=newMap (chr 0) (chr 255) id main = do print $ lookup (insert testMap abc (Just def)) abc Make the difference between in minBase and maxBase larger in the call to newMap to prefer reads more. Note: This format seems awkward. I feel like I want to have the user to define an enumeration type e.g. data UpToFive = One | Two | Three | Four | Five instance Ix UpToFive where and have newMap::(Bounded base,Ix base)=(key-[base]) - MyMap base key elt But I can't figure out a nice way to auto-generate arbitrary size enumerations and manually doing so is too wearisome to contemplate. If you can generate these enumeration classes, then it would seem you could auto-derive functions that translate from an arbitrary key into [base]. -Alex- _ S. Alexander Jacobson mailto:[EMAIL PROTECTED] tel:917-770-6565 http://alexjacobson.com ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell