Re: runtime fusion for Data.ByteString.cons ?

2006-11-19 Thread Donald Bruce Stewart
claus.reinke:
> >On Nov 19, 2006, at 11:54 AM, Claus Reinke wrote:
> >>I noticed that ByteString is drastically slower than String if I use
> >>cons a lot. according to the source, that is expected because of
> >>the memcpy for the second parameter.
> >
> >Have you considered constructing your strings with unfoldr?  It  
> >should be able to handle most (all?) of your string producing  
> >functions efficiently.
> 
> old habits die hard - I still underappreciate unfold;-)
> 
> perhaps I should expand my habits to include unfold more often,
> but in this case, I was interested in the performance of naively
> recursive ByteString programming. and the cons performance
> was the very first thing I noticed, so I tried to do something 
> about it.
> 
> I guess I got con(s)fused by the two branches of Data.ByteString:
> since it is part of base since ghc 6.6, I thought that pulling from 
> the ghc/libraries darcs repository would give me the latest and 
> greatest Data.ByteString, as described in the string rewriting 
> paper. the lack of Yields et al should have tripped me up.. 
> 
> what is the plan for that branch? and if there are issues that 
> prevent an update, shouldn't they be mentioned on the 
> Data.ByteString page?
> 
> claus

Of course, you can also use stream-based ByteStrings right now, you just
have to remove the Data.ByteString* dirs from base before you build, and
then install the unstable fps branch via Cabal.

-- Don
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: runtime fusion for Data.ByteString.cons ?

2006-11-19 Thread Donald Bruce Stewart
claus.reinke:
> >On Nov 19, 2006, at 11:54 AM, Claus Reinke wrote:
> >>I noticed that ByteString is drastically slower than String if I use
> >>cons a lot. according to the source, that is expected because of
> >>the memcpy for the second parameter.
> >
> >Have you considered constructing your strings with unfoldr?  It  
> >should be able to handle most (all?) of your string producing  
> >functions efficiently.
> 
> old habits die hard - I still underappreciate unfold;-)
> 
> perhaps I should expand my habits to include unfold more often,
> but in this case, I was interested in the performance of naively
> recursive ByteString programming. and the cons performance
> was the very first thing I noticed, so I tried to do something 
> about it.
> 
> I guess I got con(s)fused by the two branches of Data.ByteString:
> since it is part of base since ghc 6.6, I thought that pulling from 
> the ghc/libraries darcs repository would give me the latest and 
> greatest Data.ByteString, as described in the string rewriting 
> paper. the lack of Yields et al should have tripped me up.. 
> 
> what is the plan for that branch? and if there are issues that 
> prevent an update, shouldn't they be mentioned on the 
> Data.ByteString page?

Sometime before January I expect to tag and release fps 0.9 (with stream
fusion). It will then be merged into the base library, and be available
with the next GHC release.

The api should be identical to the currrent fps 0.8 in the base library,
just faster, since its using stream fusion instead of old-style
functional array fusion.

-- Don
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: runtime fusion for Data.ByteString.cons ?

2006-11-19 Thread Claus Reinke

On Nov 19, 2006, at 11:54 AM, Claus Reinke wrote:

I noticed that ByteString is drastically slower than String if I use
cons a lot. according to the source, that is expected because of
the memcpy for the second parameter.


Have you considered constructing your strings with unfoldr?  It  
should be able to handle most (all?) of your string producing  
functions efficiently.


old habits die hard - I still underappreciate unfold;-)

perhaps I should expand my habits to include unfold more often,
but in this case, I was interested in the performance of naively
recursive ByteString programming. and the cons performance
was the very first thing I noticed, so I tried to do something 
about it.


I guess I got con(s)fused by the two branches of Data.ByteString:
since it is part of base since ghc 6.6, I thought that pulling from 
the ghc/libraries darcs repository would give me the latest and 
greatest Data.ByteString, as described in the string rewriting 
paper. the lack of Yields et al should have tripped me up.. 

what is the plan for that branch? and if there are issues that 
prevent an update, shouldn't they be mentioned on the 
Data.ByteString page?


claus

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: runtime fusion for Data.ByteString.cons ?

2006-11-19 Thread Duncan Coutts
On Sun, 2006-11-19 at 17:54 +, Claus Reinke wrote:
> I noticed that ByteString is drastically slower than String if I use
> cons a lot. according to the source, that is expected because of
> the memcpy for the second parameter.
> 
> but it seems to me that construction should be able to play the 
> dual trick to deconstruction (which does not copy the tail, but
> returns an indirection into the original list).

Another approach which I have considered is to do it directly by just
poking into an array but then do cunning things to make it persistent at
yet still O(1) in the best case of single-threaded construction.

Here the representation:

data StringBuilder =
  StringBuilder (ForeignPtr Word8) Int Int (IORef Int)
  -- pointer, offset, length and 'current length'

So just like the ByteString representation but with an extra IORef Int.

The idea is that the IORef tells us the current length of the used part
of the memory block. So by comparing the length at the time this
StringBuilder value was made with the real current length then we can
see if we're using the 'latest' version of the StringBuilder or if it's
been appended/prepended to since.

If we're using the latest value then we can reserve some space by
atomically incrementing the IORef and then directly write into the free
space.

If we're not starting from the latest value then we incur a O(n) penalty
to copy the array. Of course in a sequence of cons/snoc operations to an
old value the copying only happens once since now we have a new unshared
array.

To make this scheme efficient the locking has to be cheap or preferably
someone could figure out a lockless version.

This could usefully be combined with lazy bytestrings (implemented
either as lists or unbalanced trees) to provide time and space efficient
O(1) cons and snoc.

Duncan

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: runtime fusion for Data.ByteString.cons ?

2006-11-19 Thread Spencer Janssen

On Nov 19, 2006, at 11:54 AM, Claus Reinke wrote:

I noticed that ByteString is drastically slower than String if I use
cons a lot. according to the source, that is expected because of
the memcpy for the second parameter.


Have you considered constructing your strings with unfoldr?  It  
should be able to handle most (all?) of your string producing  
functions efficiently.



Cheers,
Spencer Janssen


___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: runtime fusion for Data.ByteString.cons ?

2006-11-19 Thread Donald Bruce Stewart
dons:
> claus.reinke:
> > I noticed that ByteString is drastically slower than String if I use
> > cons a lot. according to the source, that is expected because of
> > the memcpy for the second parameter.
> 
> Just a quick response, before I consider this in detail, in the stream
> fusion branch of Data.ByteString cons is fusible:
> 
> cons :: Word8 -> ByteString -> ByteString
> cons c (PS x s l) = unsafeCreate (l+1) $ \p -> withForeignPtr x $ \f -> do
> poke p c
> memcpy (p `plusPtr` 1) (f `plusPtr` s) (fromIntegral l)
> {-# INLINE [1] cons #-}
> 
> {-# RULES
> "FPS cons -> fused"  [~1] forall w.
> cons w = F.strTransformerUp (F.consS w)
> "FPS cons -> unfused" [1]   forall w.
> F.strTransformerUp (F.consS w) = cons w
>   #-}
> 
> strTransformerUp :: (Stream -> Stream) -> (ByteString -> ByteString)
> strTransformerUp f = writeStrUp . f . readStrUp
> {-# INLINE [0] strTransformerUp #-}
> 
> consS :: Word8 -> Stream -> Stream
> consS w (Stream nextx xs0 len) = Stream next' (True :*: xs0) (len+1)
>   where next' (True  :*: xs) = Yield w (False :*: xs)
> next' (_ :*: xs) = case nextx xs of
> Done-> Done
> Skip xs'-> Skip(False :*: xs')
> Yield x xs' -> Yield x (False :*: xs')
> {-# INLINE [0] consS #-}

Oh, this is slower than it should be, too. Those Bools get in the way of
GHC's specConstr optimisation. Instead it shoudl use a strict Either.

consS :: Word8 -> Stream -> Stream
consS w (Stream nextx xs0 len) = Stream next' (RightS xs0) (len+1)
  where next' (RightS xs) = Yield w (LeftS xs)
next' (LeftS  xs) = case nextx xs of
Done-> Done
Skip xs'-> Skip(LeftS xs')
Yield x xs' -> Yield x (LeftS xs')
{-# INLINE [0] consS #-}

where

data  EitherS a b = LeftS !a | RightS !b deriving (Eq, Ord )

that should help a bit with the stripping away of constructors in consS.

-- Don
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: runtime fusion for Data.ByteString.cons ?

2006-11-19 Thread Claus Reinke

[just saw your reply while sending this, so perhaps there's nothing new here?
but then why the runtime difference? anyway, here goes nothing :-]


a tentative idea would be to overload create so that it produces a proper,
allocated ByteString where such is expected, but can also just pass through
the PreBS where the context can handle it?


attached is a variant that seems to do the trick, although I do confess myself
slightly surprised that it does (I had thought that simplifier rules would be 
too
late to fix overloaded types..).

the core code for mapBS and mapBS' is now the same, but the versions of
empty and cons used in the latter can produce either variant of ByteString,
unallocated pre-ByteStrings or allocated ByteStrings (the instances of 
class IsByteString), and cons can also handle both variants as second
parameter. 


so all we do in the simplifier rule is to request that mapBS' should internally
use unallocated pre-ByteStrings, converting to allocated ByteString only at
the end (this request then forces empty and cons to produce pre-ByteStrings
as well, switching the whole recursion over to that representation). as I said, 
I'm somewhat surprised that this works, but commenting out the rule does 
have the expected impact on performance..


what do you think?
Claus

BScons.hs
Description: Binary data
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: runtime fusion for Data.ByteString.cons ?

2006-11-19 Thread Donald Bruce Stewart
claus.reinke:
> I noticed that ByteString is drastically slower than String if I use
> cons a lot. according to the source, that is expected because of
> the memcpy for the second parameter.

Just a quick response, before I consider this in detail, in the stream
fusion branch of Data.ByteString cons is fusible:

cons :: Word8 -> ByteString -> ByteString
cons c (PS x s l) = unsafeCreate (l+1) $ \p -> withForeignPtr x $ \f -> do
poke p c
memcpy (p `plusPtr` 1) (f `plusPtr` s) (fromIntegral l)
{-# INLINE [1] cons #-}

{-# RULES
"FPS cons -> fused"  [~1] forall w.
cons w = F.strTransformerUp (F.consS w)
"FPS cons -> unfused" [1]   forall w.
F.strTransformerUp (F.consS w) = cons w
  #-}

strTransformerUp :: (Stream -> Stream) -> (ByteString -> ByteString)
strTransformerUp f = writeStrUp . f . readStrUp
{-# INLINE [0] strTransformerUp #-}

consS :: Word8 -> Stream -> Stream
consS w (Stream nextx xs0 len) = Stream next' (True :*: xs0) (len+1)
  where next' (True  :*: xs) = Yield w (False :*: xs)
next' (_ :*: xs) = case nextx xs of
Done-> Done
Skip xs'-> Skip(False :*: xs')
Yield x xs' -> Yield x (False :*: xs')
{-# INLINE [0] consS #-}

Also, have you looked at Data.ByteString.Lazy which does have O(1) cons?

I'll think about the rest of your proposal after getting some coffee :)

-- Don
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users