Re: runtime fusion for Data.ByteString.cons ?
claus.reinke: > >On Nov 19, 2006, at 11:54 AM, Claus Reinke wrote: > >>I noticed that ByteString is drastically slower than String if I use > >>cons a lot. according to the source, that is expected because of > >>the memcpy for the second parameter. > > > >Have you considered constructing your strings with unfoldr? It > >should be able to handle most (all?) of your string producing > >functions efficiently. > > old habits die hard - I still underappreciate unfold;-) > > perhaps I should expand my habits to include unfold more often, > but in this case, I was interested in the performance of naively > recursive ByteString programming. and the cons performance > was the very first thing I noticed, so I tried to do something > about it. > > I guess I got con(s)fused by the two branches of Data.ByteString: > since it is part of base since ghc 6.6, I thought that pulling from > the ghc/libraries darcs repository would give me the latest and > greatest Data.ByteString, as described in the string rewriting > paper. the lack of Yields et al should have tripped me up.. > > what is the plan for that branch? and if there are issues that > prevent an update, shouldn't they be mentioned on the > Data.ByteString page? > > claus Of course, you can also use stream-based ByteStrings right now, you just have to remove the Data.ByteString* dirs from base before you build, and then install the unstable fps branch via Cabal. -- Don ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: runtime fusion for Data.ByteString.cons ?
claus.reinke: > >On Nov 19, 2006, at 11:54 AM, Claus Reinke wrote: > >>I noticed that ByteString is drastically slower than String if I use > >>cons a lot. according to the source, that is expected because of > >>the memcpy for the second parameter. > > > >Have you considered constructing your strings with unfoldr? It > >should be able to handle most (all?) of your string producing > >functions efficiently. > > old habits die hard - I still underappreciate unfold;-) > > perhaps I should expand my habits to include unfold more often, > but in this case, I was interested in the performance of naively > recursive ByteString programming. and the cons performance > was the very first thing I noticed, so I tried to do something > about it. > > I guess I got con(s)fused by the two branches of Data.ByteString: > since it is part of base since ghc 6.6, I thought that pulling from > the ghc/libraries darcs repository would give me the latest and > greatest Data.ByteString, as described in the string rewriting > paper. the lack of Yields et al should have tripped me up.. > > what is the plan for that branch? and if there are issues that > prevent an update, shouldn't they be mentioned on the > Data.ByteString page? Sometime before January I expect to tag and release fps 0.9 (with stream fusion). It will then be merged into the base library, and be available with the next GHC release. The api should be identical to the currrent fps 0.8 in the base library, just faster, since its using stream fusion instead of old-style functional array fusion. -- Don ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: runtime fusion for Data.ByteString.cons ?
On Nov 19, 2006, at 11:54 AM, Claus Reinke wrote: I noticed that ByteString is drastically slower than String if I use cons a lot. according to the source, that is expected because of the memcpy for the second parameter. Have you considered constructing your strings with unfoldr? It should be able to handle most (all?) of your string producing functions efficiently. old habits die hard - I still underappreciate unfold;-) perhaps I should expand my habits to include unfold more often, but in this case, I was interested in the performance of naively recursive ByteString programming. and the cons performance was the very first thing I noticed, so I tried to do something about it. I guess I got con(s)fused by the two branches of Data.ByteString: since it is part of base since ghc 6.6, I thought that pulling from the ghc/libraries darcs repository would give me the latest and greatest Data.ByteString, as described in the string rewriting paper. the lack of Yields et al should have tripped me up.. what is the plan for that branch? and if there are issues that prevent an update, shouldn't they be mentioned on the Data.ByteString page? claus ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: runtime fusion for Data.ByteString.cons ?
On Sun, 2006-11-19 at 17:54 +, Claus Reinke wrote: > I noticed that ByteString is drastically slower than String if I use > cons a lot. according to the source, that is expected because of > the memcpy for the second parameter. > > but it seems to me that construction should be able to play the > dual trick to deconstruction (which does not copy the tail, but > returns an indirection into the original list). Another approach which I have considered is to do it directly by just poking into an array but then do cunning things to make it persistent at yet still O(1) in the best case of single-threaded construction. Here the representation: data StringBuilder = StringBuilder (ForeignPtr Word8) Int Int (IORef Int) -- pointer, offset, length and 'current length' So just like the ByteString representation but with an extra IORef Int. The idea is that the IORef tells us the current length of the used part of the memory block. So by comparing the length at the time this StringBuilder value was made with the real current length then we can see if we're using the 'latest' version of the StringBuilder or if it's been appended/prepended to since. If we're using the latest value then we can reserve some space by atomically incrementing the IORef and then directly write into the free space. If we're not starting from the latest value then we incur a O(n) penalty to copy the array. Of course in a sequence of cons/snoc operations to an old value the copying only happens once since now we have a new unshared array. To make this scheme efficient the locking has to be cheap or preferably someone could figure out a lockless version. This could usefully be combined with lazy bytestrings (implemented either as lists or unbalanced trees) to provide time and space efficient O(1) cons and snoc. Duncan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: runtime fusion for Data.ByteString.cons ?
On Nov 19, 2006, at 11:54 AM, Claus Reinke wrote: I noticed that ByteString is drastically slower than String if I use cons a lot. according to the source, that is expected because of the memcpy for the second parameter. Have you considered constructing your strings with unfoldr? It should be able to handle most (all?) of your string producing functions efficiently. Cheers, Spencer Janssen ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: runtime fusion for Data.ByteString.cons ?
dons: > claus.reinke: > > I noticed that ByteString is drastically slower than String if I use > > cons a lot. according to the source, that is expected because of > > the memcpy for the second parameter. > > Just a quick response, before I consider this in detail, in the stream > fusion branch of Data.ByteString cons is fusible: > > cons :: Word8 -> ByteString -> ByteString > cons c (PS x s l) = unsafeCreate (l+1) $ \p -> withForeignPtr x $ \f -> do > poke p c > memcpy (p `plusPtr` 1) (f `plusPtr` s) (fromIntegral l) > {-# INLINE [1] cons #-} > > {-# RULES > "FPS cons -> fused" [~1] forall w. > cons w = F.strTransformerUp (F.consS w) > "FPS cons -> unfused" [1] forall w. > F.strTransformerUp (F.consS w) = cons w > #-} > > strTransformerUp :: (Stream -> Stream) -> (ByteString -> ByteString) > strTransformerUp f = writeStrUp . f . readStrUp > {-# INLINE [0] strTransformerUp #-} > > consS :: Word8 -> Stream -> Stream > consS w (Stream nextx xs0 len) = Stream next' (True :*: xs0) (len+1) > where next' (True :*: xs) = Yield w (False :*: xs) > next' (_ :*: xs) = case nextx xs of > Done-> Done > Skip xs'-> Skip(False :*: xs') > Yield x xs' -> Yield x (False :*: xs') > {-# INLINE [0] consS #-} Oh, this is slower than it should be, too. Those Bools get in the way of GHC's specConstr optimisation. Instead it shoudl use a strict Either. consS :: Word8 -> Stream -> Stream consS w (Stream nextx xs0 len) = Stream next' (RightS xs0) (len+1) where next' (RightS xs) = Yield w (LeftS xs) next' (LeftS xs) = case nextx xs of Done-> Done Skip xs'-> Skip(LeftS xs') Yield x xs' -> Yield x (LeftS xs') {-# INLINE [0] consS #-} where data EitherS a b = LeftS !a | RightS !b deriving (Eq, Ord ) that should help a bit with the stripping away of constructors in consS. -- Don ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: runtime fusion for Data.ByteString.cons ?
[just saw your reply while sending this, so perhaps there's nothing new here? but then why the runtime difference? anyway, here goes nothing :-] a tentative idea would be to overload create so that it produces a proper, allocated ByteString where such is expected, but can also just pass through the PreBS where the context can handle it? attached is a variant that seems to do the trick, although I do confess myself slightly surprised that it does (I had thought that simplifier rules would be too late to fix overloaded types..). the core code for mapBS and mapBS' is now the same, but the versions of empty and cons used in the latter can produce either variant of ByteString, unallocated pre-ByteStrings or allocated ByteStrings (the instances of class IsByteString), and cons can also handle both variants as second parameter. so all we do in the simplifier rule is to request that mapBS' should internally use unallocated pre-ByteStrings, converting to allocated ByteString only at the end (this request then forces empty and cons to produce pre-ByteStrings as well, switching the whole recursion over to that representation). as I said, I'm somewhat surprised that this works, but commenting out the rule does have the expected impact on performance.. what do you think? Claus BScons.hs Description: Binary data ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: runtime fusion for Data.ByteString.cons ?
claus.reinke: > I noticed that ByteString is drastically slower than String if I use > cons a lot. according to the source, that is expected because of > the memcpy for the second parameter. Just a quick response, before I consider this in detail, in the stream fusion branch of Data.ByteString cons is fusible: cons :: Word8 -> ByteString -> ByteString cons c (PS x s l) = unsafeCreate (l+1) $ \p -> withForeignPtr x $ \f -> do poke p c memcpy (p `plusPtr` 1) (f `plusPtr` s) (fromIntegral l) {-# INLINE [1] cons #-} {-# RULES "FPS cons -> fused" [~1] forall w. cons w = F.strTransformerUp (F.consS w) "FPS cons -> unfused" [1] forall w. F.strTransformerUp (F.consS w) = cons w #-} strTransformerUp :: (Stream -> Stream) -> (ByteString -> ByteString) strTransformerUp f = writeStrUp . f . readStrUp {-# INLINE [0] strTransformerUp #-} consS :: Word8 -> Stream -> Stream consS w (Stream nextx xs0 len) = Stream next' (True :*: xs0) (len+1) where next' (True :*: xs) = Yield w (False :*: xs) next' (_ :*: xs) = case nextx xs of Done-> Done Skip xs'-> Skip(False :*: xs') Yield x xs' -> Yield x (False :*: xs') {-# INLINE [0] consS #-} Also, have you looked at Data.ByteString.Lazy which does have O(1) cons? I'll think about the rest of your proposal after getting some coffee :) -- Don ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users