summary of storing highly shared data structures

2006-01-11 Thread Christian Maeder

I wrote:

However, shared ATerms are always different for different types,
because the corresponding data constructors are different.


This isn't quite true. The shared ATerm for the empty list is the same 
for all instances.


Finally, _reading in_ shared ATerms is fast, since ghc seems to exploit 
the sharing given by the injective ATermTable.


This was also wrong. The ATermTable as IntMap ShATerm was not enough, 
because all objects were constructed anew from the same or different 
ShATerms. I had to setup an additional IntMap Dynamic for creating 
shared data structures. It was no fun to make everything Typeable (and 
having no Ord instance for TypeRep).


Simon suggested to use StableName or StablePtr. I did not use StablePtr, 
because it lacks a hashing function!


I wasn't happy with StableName either, because
 1) it required IO
 2) System.Mem.StableName is marked non-portable
(without a reason in the haddock documentation)
 3) There was no coercion function StableName a - StableName b

I coerced all my objects with makeStableName and GHC.Prim.unsafeCoerce# 
to StableName () (I became non-portable, anywy.)


Now I could use Data.HashTable (StableName ()) Value since I was in 
the IO monad anyway. (The Value is basically an Int, but here I want to 
avoid the potential confusion with the key.)


For the sake of interest I also tried IntMap [(StableName (), Value)] 
using hashStableName as IntMap key. (An IntMap is also used in 
http://cs.helsinki.fi/u/ekarttun/SerTH that Einar Karttunen pointed to)


It turned out that the IntMap was not slower than the HashTable (What is 
HashTable good for, then? Why is it so slow?)


Since I used hashStableName to get a key for my IntMap, I see no reason 
why StableName should not have an Ord instance, in order to avoid the 
whole hashing with lists (although a Data.Map (StableName ()) Value 
may not be faster).


Summerizing, the user-friendliness (at least) of System.Mem.StableName 
should be improved (also with respect to the +RTS -A10m business for 
performing makeStableName).


Cheers Christian
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: summary of storing highly shared data structures

2006-01-11 Thread Christian Maeder

Hi Bulat,

The difference between IntMap and HashTable is not large despite -A10m 
(without this option HashTable is unusable).


HashTable:
ghc: 2754665792 bytes, 287 GCs, 26495315/147911940 avg/max bytes 
residency (12 samples), 299M in use, 0.00 INIT (0.00 elapsed), 31.72 MUT 
(33.78 elapsed), 14.04 GC (17.73 elapsed) :ghc


IntMap:
ghc: 2705137096 bytes, 282 GCs, 30775754/176806684 avg/max bytes 
residency (12 samples), 356M in use, 0.00 INIT (0.01 elapsed), 30.35 MUT 
(31.28 elapsed), 13.64 GC (14.52 elapsed) :ghc


I must admit, though, that my hash function may be bad:

  data EqKey = EqKey (StableName ()) TypeRep deriving Eq

  hashKey :: EqKey - Int32
  hashKey (EqKey p t) = HTab.hashInt (hashStableName p)

and calling HashTable.new (==) hashKey. (Adding hashString on show 
TypeRep is worse.) A hash-function for TypeReps would be better.


But my IntMap also uses Eq on EqKey (IntMap [(EqKey, Int)])

Cheers Christian


Bulat Ziganshin wrote:

Hello Christian,

Wednesday, January 11, 2006, 5:13:25 PM, you wrote:

CM It turned out that the IntMap was not slower than the HashTable (What is
CM HashTable good for, then? Why is it so slow?)

see the http://cvs.haskell.org/trac/ghc/ticket/650

and the attached letter. -A10m should help in this case



___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users