jnf: > > > wren ng thornton wrote: > > > > If you have many identical strings then you will save lots by memoizing > > your strings into Integers, and then serializing that memo table and the > > integerized version of your data structure. The amount of savings > > decreases as the number of duplications decrease, though since you don't > > need the memo table itself you should be able to serialize it in a way > > that doesn't have much overhead. > > > > I had problems with the size of the allocated heap space after serializing > and loading data with the binary package. The reason was that > binary does not support sharing of identical elements and considered a > restricted solution for strings and certain other data types first, but > came up with a generic solution in the end. > (I did it just last weekend).
And this is exactly the intended path -- that people will release their own special instances for doing more elaborate parsing/printing tricks! > I put the Binary monad in a state transformer with maps for memoization: > type PutShared = St.StateT (Map Object Int, Int) PutM () > type GetShared = St.StateT (IntMap Object) Bin.Get > > In addition to standard get ant put methods: > class (Typeable α, Ord α, Eq α) ⇒ BinaryShared α where > put :: α → PutShared > get :: GetShared α > I added putShared and getShared methods with memoization: > putShared :: (α → PutShared) → α → PutShared > getShared :: GetShared α → GetShared α > > For types that I don't want memoization I can either refer to the underlying > binary monad for primitive types, e.g.: > instance BinaryShared Int where > put = lift∘Bin.put > get = lift Bin.get > or stay in the BinaryShared monad for types of which I may memoize > components, e.g.: > instance (BinaryShared a, BinaryShared b) ⇒ BinaryShared (a,b) where > put (a,b) = put a ≫ put b > get = liftM2 (,) get get > > And for types for which I want memoization, I wrap it with putShared and > getShared ,e.g: > instance BinaryShared a ⇒ BinaryShared [a] where > put = putShared (λl → lift (Bin.put (length l)) ≫ mapM_ put l) > get = getShared (do > n ← lift (Bin.get :: Bin.Get Int) > replicateM n get) > > This save 1/3 of heap space to my application. I didn't measure time. > Maybe it would be useful to have something like this in the binary module. > Very nice. Maybe even upload these useful instances in a little binary-extras package? _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe