summary of storing highly shared data structures
I wrote: However, shared ATerms are always different for different types, because the corresponding data constructors are different. This isn't quite true. The shared ATerm for the empty list is the same for all instances. Finally, _reading in_ shared ATerms is fast, since ghc seems to exploit the sharing given by the injective ATermTable. This was also wrong. The ATermTable as IntMap ShATerm was not enough, because all objects were constructed anew from the same or different ShATerms. I had to setup an additional IntMap Dynamic for creating shared data structures. It was no fun to make everything Typeable (and having no Ord instance for TypeRep). Simon suggested to use StableName or StablePtr. I did not use StablePtr, because it lacks a hashing function! I wasn't happy with StableName either, because 1) it required IO 2) System.Mem.StableName is marked non-portable (without a reason in the haddock documentation) 3) There was no coercion function StableName a - StableName b I coerced all my objects with makeStableName and GHC.Prim.unsafeCoerce# to StableName () (I became non-portable, anywy.) Now I could use Data.HashTable (StableName ()) Value since I was in the IO monad anyway. (The Value is basically an Int, but here I want to avoid the potential confusion with the key.) For the sake of interest I also tried IntMap [(StableName (), Value)] using hashStableName as IntMap key. (An IntMap is also used in http://cs.helsinki.fi/u/ekarttun/SerTH that Einar Karttunen pointed to) It turned out that the IntMap was not slower than the HashTable (What is HashTable good for, then? Why is it so slow?) Since I used hashStableName to get a key for my IntMap, I see no reason why StableName should not have an Ord instance, in order to avoid the whole hashing with lists (although a Data.Map (StableName ()) Value may not be faster). Summerizing, the user-friendliness (at least) of System.Mem.StableName should be improved (also with respect to the +RTS -A10m business for performing makeStableName). Cheers Christian ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: summary of storing highly shared data structures
Hi Bulat, The difference between IntMap and HashTable is not large despite -A10m (without this option HashTable is unusable). HashTable: ghc: 2754665792 bytes, 287 GCs, 26495315/147911940 avg/max bytes residency (12 samples), 299M in use, 0.00 INIT (0.00 elapsed), 31.72 MUT (33.78 elapsed), 14.04 GC (17.73 elapsed) :ghc IntMap: ghc: 2705137096 bytes, 282 GCs, 30775754/176806684 avg/max bytes residency (12 samples), 356M in use, 0.00 INIT (0.01 elapsed), 30.35 MUT (31.28 elapsed), 13.64 GC (14.52 elapsed) :ghc I must admit, though, that my hash function may be bad: data EqKey = EqKey (StableName ()) TypeRep deriving Eq hashKey :: EqKey - Int32 hashKey (EqKey p t) = HTab.hashInt (hashStableName p) and calling HashTable.new (==) hashKey. (Adding hashString on show TypeRep is worse.) A hash-function for TypeReps would be better. But my IntMap also uses Eq on EqKey (IntMap [(EqKey, Int)]) Cheers Christian Bulat Ziganshin wrote: Hello Christian, Wednesday, January 11, 2006, 5:13:25 PM, you wrote: CM It turned out that the IntMap was not slower than the HashTable (What is CM HashTable good for, then? Why is it so slow?) see the http://cvs.haskell.org/trac/ghc/ticket/650 and the attached letter. -A10m should help in this case ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users