Re: [Haskell-cafe] stream/bytestring questions

2008-02-21 Thread Chad Scherrer
On Wed, Feb 20, 2008 at 5:53 PM, Roman Leshchinskiy <[EMAIL PROTECTED]> wrote:
>  In general, I don't see why programming directly with streams is
>  something that should be avoided. You do have to be quite careful,
>  though, if you want to get good performance (but GHC's simplifier is
>  becoming increasingly robust in this respect).

Hmm. I was taking the approach of getting something working, given
what is currently exported from Data.Stream. How would you deal with
this? Should there be a Data.Stream.Internal or something that exports
streams and unlifted types?

If I'm understanding this correctly, these things were not exported in
the first place because this fusion framework provides an
approximation, but not an isomorphism, so partial bottoms don't always
behave nicely. I was hoping to get around this by programming instead
to Step and then hoping rules could be constructed to translate to
Streams. Do you think there's a better way around it?

>  > extract ns xs == [xs !! n | n <- ns]
>
>  Note that in contrast to your function, this doesn't assume that ns is
>  sorted and hence, there is no way to implement this without producing an
>  intermediate list.

Oh yes, good point. It's so easy to forget about assumptions like that.

Chad
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-20 Thread Roman Leshchinskiy

Chad Scherrer wrote:


Here's an example of the problem. Start with a function

extract :: [Int] -> [a] -> [a]
extract = f 0
where
f !k nss@(n:ns) (x:xs)
  | n == k= x : f (k+1) ns xs
  | otherwise = f (k+1) nss xs
f _ _ _ = []


If you want this to play nicely with stream fusion, you have to define a 
version of extract which works on streams instead of lists:


extractS :: Stream Int -> Stream a -> Stream a

Now, you can easily define a fusible list version:

extract ns xs = unstream (extractS (stream ns) (stream xs))

In general, I don't see why programming directly with streams is 
something that should be avoided. You do have to be quite careful, 
though, if you want to get good performance (but GHC's simplifier is 
becoming increasingly robust in this respect).



extract ns xs == [xs !! n | n <- ns]


Note that in contrast to your function, this doesn't assume that ns is 
sorted and hence, there is no way to implement this without producing an 
intermediate list.


Roman

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-20 Thread Chad Scherrer
On Feb 17, 2008 6:06 PM, Derek Elkins <[EMAIL PROTECTED]> wrote:
> It's -quite- possible that a coalgebraic perspective is much more
> natural for your code/problem.  If that's the case, use it (the
> coalgebraic perspective that is).  Obviously depending on the internals
> of the stream library is not a good idea and using Streams directly was
> not their intent, but it is your code.  Do what you will.

Here's an example of the problem. Start with a function

extract :: [Int] -> [a] -> [a]
extract = f 0
where
f !k nss@(n:ns) (x:xs)
  | n == k= x : f (k+1) ns xs
  | otherwise = f (k+1) nss xs
f _ _ _ = []

which is just a more efficient way of getting
extract ns xs == [xs !! n | n <- ns]

There should be a way to write this that will be friendly for stream
fusion. The best option I can see is unfoldr. But if you try to write
it this way, you get something like

extract' ns xs = unfoldr f (0,ns,xs)
  where
  f (!k, nss@(n:ns), x:xs)
| n == k= Just (x, (k + 1, ns, xs))
| otherwise = f (k+1, nss, xs)
  f _ = Nothing

This is fine, except that the second-to-last line means this is still
recursive. If I understand correctly, fusion requires that the
recursion be encapsulated within the unfoldr or other functions that
are expressed internally as stream functions.

We could encapsulate the recursion with a function
stepUnfoldr :: (s -> Step a s) -> s -> [a]
stepUnfoldr f s = unfoldr g s
  where
  g s = case f s of
Done -> Nothing
Yield x s' -> Just (x,s')
Skip s' -> g s'

Using this, we could just write

extract'' ns xs = stepUnfoldr f (0,ns,xs)
  where
  f (!k, nss@(n:ns), x:xs)
| n == k= Yield x (k + 1, ns, xs)
| otherwise = Skip (k+1, nss, xs)
  f _ = Done

This is a pretty natural way to write the algorithm, and the recursion
is nicely tucked away. The only remaining question is whether we can
get things to fuse.

The type of stepUnfoldr looks familiar...

*Main> :t stepUnfoldr
stepUnfoldr :: (s -> Step a s) -> s -> [a]

*Main> :t \f s -> unstream $ Stream f s
\f s -> unstream $ Stream f s :: (Data.Stream.Unlifted s) =>
 (s -> Step a s) -> s -> [a]

If we could somehow swap out our state type for an unlifted one, we
could write a rule
  stepUnfoldr f = unstream . Stream f

It seems like there might be some subtleties there to watch out for,
but I'm not sure yet.

Anyway, this is the kind of thing I had in mind when I asked about
using the internals of Data.Stream more directly.

Chad
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-17 Thread Derek Elkins
On Sun, 2008-02-17 at 18:02 -0800, Chad Scherrer wrote:
> On Feb 17, 2008 5:01 PM, Don Stewart <[EMAIL PROTECTED]> wrote:
> > yeah, with lists, as compared to bytestrings, there are:
> >
> > * more complex operations to fuse
> > * allocation is much cheaper (lazy list cons nodes)
> > * built in desugaring for build/foldr fusion interferes (enumerations, 
> > comprehensions)
> >
> > so the benefits of fusing lists are less obvious than bytestrings, where
> > every fusion point knocks out a big array allocation, and they're a bit
> > more complex to get the full api going.
> 
> Ok, that makes sense.
> 
> > no, using the rules should be fine. you're not supposed to program in
> > the stream abstraction.
> 
> I was working on some run-length encoding stuff and found it most
> natural to do a lot of it using unfoldr, and I had a state in the
> unfold that was naturally represented as a tuple. I got a little
> worried about all the packing and unpacking tuples not being optimized
> out, and this got me wondering about just using Streams directly. At
> the time, the Skip constructor for a Step felt natural to use in some
> places, which I thought could be really convenient. I dunno, maybe
> it's not an issue. Hmm, if it would help I can try to post some of the
> code...


It's -quite- possible that a coalgebraic perspective is much more
natural for your code/problem.  If that's the case, use it (the
coalgebraic perspective that is).  Obviously depending on the internals
of the stream library is not a good idea and using Streams directly was
not their intent, but it is your code.  Do what you will.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-17 Thread Chad Scherrer
On Feb 17, 2008 5:01 PM, Don Stewart <[EMAIL PROTECTED]> wrote:
> yeah, with lists, as compared to bytestrings, there are:
>
> * more complex operations to fuse
> * allocation is much cheaper (lazy list cons nodes)
> * built in desugaring for build/foldr fusion interferes (enumerations, 
> comprehensions)
>
> so the benefits of fusing lists are less obvious than bytestrings, where
> every fusion point knocks out a big array allocation, and they're a bit
> more complex to get the full api going.

Ok, that makes sense.

> no, using the rules should be fine. you're not supposed to program in
> the stream abstraction.

I was working on some run-length encoding stuff and found it most
natural to do a lot of it using unfoldr, and I had a state in the
unfold that was naturally represented as a tuple. I got a little
worried about all the packing and unpacking tuples not being optimized
out, and this got me wondering about just using Streams directly. At
the time, the Skip constructor for a Step felt natural to use in some
places, which I thought could be really convenient. I dunno, maybe
it's not an issue. Hmm, if it would help I can try to post some of the
code...

> > Wouldn't
> > it make sense to do something like
> >
> > data ArrayList a i e = Empty | Chunk !(a i e) (ArrayList a i e)
> > ?
>
> someone could do that. we chose to  go with the monomorphic case, since
> its easier to get the api right, and the performance right.

Unless I'm overlooking something, this would involve something like...
(1) make the fusion stuff apply to an IArray (pretty handy anyway)
(2) make (strict) ByteString an instance of IArray (or maybe via StorableArray)
(3) write an ArrayList module, similar to Data.ByteString.Lazy

>From this, it would be pretty short again, in theory, to write an
alternate ByteString library using the abstractions. The point of this
would be to leverage future code improvements and reduce code
duplication.

Does this seem reasonable? Or is it too soon to consider this kind of extension?

Chad
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-17 Thread Don Stewart
chad.scherrer:
> > they currently use two different fusion systems. bytestring uses an
> > older version of what is now stream fusion. at some point we'll switch
> > bytestrings over to using the new stuff in the stream-fusion package,
> > since its a lot better.
> 
> Oh, that's pretty interesting. I had assumed bytestring had the latest
> fusion stuff.
> 
> > ah, you assume stream-fusion lists are slower. for list stuff, since
> > 6.8, i've only seen things the same speed or better. but if you have a
> > program doing the wrong thing, it would be worth looking at.
> 
> I was just going by the "Lists to Streams to Nothing At All"  paper.
> That's great that it's gotten faster! Does that mean the path for
> replacing Data.List with this is any clearer?

Not yet. We still don't have a good system for fusing nested concatMaps, 
that GHC likes, which come up when you desugar list comprehensions.

If you can live without too many comprehensions, then the stream-fusion
package is entirely viable.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-17 Thread Chad Scherrer
> they currently use two different fusion systems. bytestring uses an
> older version of what is now stream fusion. at some point we'll switch
> bytestrings over to using the new stuff in the stream-fusion package,
> since its a lot better.

Oh, that's pretty interesting. I had assumed bytestring had the latest
fusion stuff.

> ah, you assume stream-fusion lists are slower. for list stuff, since
> 6.8, i've only seen things the same speed or better. but if you have a
> program doing the wrong thing, it would be worth looking at.

I was just going by the "Lists to Streams to Nothing At All"  paper.
That's great that it's gotten faster! Does that mean the path for
replacing Data.List with this is any clearer?
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-17 Thread Don Stewart
chad.scherrer:
> On Feb 17, 2008 4:13 PM, Brandon S. Allbery KF8NH <[EMAIL PROTECTED]> wrote:
> 
> > Have you looked at the stream-fusion package on Hackage?
> > http://hackage.haskell.org/cgi-bin/hackage-scripts/package/stream-
> > fusion-0.1.1
> 
> Yeah, I've seen this. It's nice that this is separated, but a little
> unsatisfying that the bytestring library reimplements it rather than
> just requiring it as a dependency. This, and the fact that bytestrings

they currently use two different fusion systems. bytestring uses an
older version of what is now stream fusion. at some point we'll switch
bytestrings over to using the new stuff in the stream-fusion package,
since its a lot better.

> are way fast, while the stream-fusion stuff is sometimes slower than
> just using lists, make me think that either (1) there's sometimes an
> advantage in programming directly to streams, or (2) maybe more of the
> stream functions need to be exposed in the API. Or maybe there's
> another reason... that's part of my question.

ah, you assume stream-fusion lists are slower. for list stuff, since
6.8, i've only seen things the same speed or better. but if you have a
program doing the wrong thing, it would be worth looking at.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-17 Thread Don Stewart
chad.scherrer:
> ByteStrings have given a real performance boost to a lot of Haskell
> applications, and I'm curious why some of the techniques aren't more
> abstracted and widely available. If it's "because it's a big job",
> that's certainly understandable, but maybe there's something I'm
> overlooking that some of the abstractions aren't really desirable.
> 
> My first question is about the stream fusion. This seems (based on the
> ByteString paper) to speed things up quite a lot, but for some reason
> this isn't necessarily the case for lists. Is this just a matter of

yeah, with lists, as compared to bytestrings, there are:

* more complex operations to fuse
* allocation is much cheaper (lazy list cons nodes)
* built in desugaring for build/foldr fusion interferes (enumerations, 
comprehensions)

so the benefits of fusing lists are less obvious than bytestrings, where
every fusion point knocks out a big array allocation, and they're a bit
more complex to get the full api going.

> the existing fusion for lists doing such a good job in many cases that
> it's hard to beat? The streams paper suggests that some contructors
> aren't optimized away during fusion, but wouldn't this also be a
> problem for fusion in the bytestring context? Are there many cases

probably not, as the issues are mostly to do with deeply nested
comprehensions, which simply aren't done with bytestrings.

> where it helps performance to program to streams directly, rather than

no, using the rules should be fine. you're not supposed to program in
the stream abstraction.

> letting the rules go back and forth between them and lists? I tried to
> do this, but kept getting hung up on the Unlifted stuff; it's not
> exposed (pardon the pun) in the API, and I don't understand why the
> types are different than the usual (), Either a b, (a,b), etc.
> 
> Second, the idea of representing a sequence as a lazy list of strict
> arrays is very appealing. But why represent it so concretely? Wouldn't
> it make sense to do something like
> 
> data ArrayList a i e = Empty | Chunk !(a i e) (ArrayList a i e)
> ?

someone could do that. we chose to  go with the monomorphic case, since
its easier to get the api right, and the performance right.

> Then array fusion could be written for IArray types in general, and
> the ByteString library would become a set of instances with some
> specialization pragmas. Presumably a more general StorableVector could
> be represented as an IArray, and the NDP stuff seems a good fit too,
> so this would make it easy to leverage future work. Seems to separate
> the various concerns a little better, too, but again maybe there's a
> performance issue I'm missing.
> 
> Sorry for the barrage of questions; I guess there's just a lot I'm
> trying to understand better.

Sure. Hope that clears things up.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-17 Thread Brandon S. Allbery KF8NH


On Feb 17, 2008, at 19:23 , Chad Scherrer wrote:

On Feb 17, 2008 4:13 PM, Brandon S. Allbery KF8NH  
<[EMAIL PROTECTED]> wrote:

Have you looked at the stream-fusion package on Hackage?
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/stream-
fusion-0.1.1


Yeah, I've seen this. It's nice that this is separated, but a little
unsatisfying that the bytestring library reimplements it rather than
just requiring it as a dependency. This, and the fact that bytestrings
are way fast, while the stream-fusion stuff is sometimes slower than
just using lists, make me think that either (1) there's sometimes an
advantage in programming directly to streams, or (2) maybe more of the
stream functions need to be exposed in the API. Or maybe there's
another reason... that's part of my question.


You should be able to find discussion of this here if you search for  
"stream fusion".  The short version, IIRC, is that (a) this is still  
an area of active research, with the above lagging behind the current  
work which is usually in current bytestrings; (b) the spinoff library  
at times conflicts badly with the existing ghc RULEs for lists, and I  
think some changes to the way ghc applies RULEs are going to happen  
at some point in order to ameliorate this.  (ghc's current handling  
of RULEs, inlining, and build/foldr etc. is a bit ad hoc and needs to  
be given structure so that it is possible to accurately control when  
and how they are applied.  This also has been discussed, either here  
or on [EMAIL PROTECTED])


Don Stewart, Duncan Coutts (streams), and Simon Peyton-Jones (ghc)  
should be able to give details on this stuff, and hopefully one or  
more of them will jump in to correct any misrecollections or errors  
I've made.


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon universityKF8NH


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-17 Thread Chad Scherrer
On Feb 17, 2008 4:13 PM, Brandon S. Allbery KF8NH <[EMAIL PROTECTED]> wrote:

> Have you looked at the stream-fusion package on Hackage?
> http://hackage.haskell.org/cgi-bin/hackage-scripts/package/stream-
> fusion-0.1.1

Yeah, I've seen this. It's nice that this is separated, but a little
unsatisfying that the bytestring library reimplements it rather than
just requiring it as a dependency. This, and the fact that bytestrings
are way fast, while the stream-fusion stuff is sometimes slower than
just using lists, make me think that either (1) there's sometimes an
advantage in programming directly to streams, or (2) maybe more of the
stream functions need to be exposed in the API. Or maybe there's
another reason... that's part of my question.

Chad
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] stream/bytestring questions

2008-02-17 Thread Brandon S. Allbery KF8NH


On Feb 17, 2008, at 18:53 , Chad Scherrer wrote:


ByteStrings have given a real performance boost to a lot of Haskell
applications, and I'm curious why some of the techniques aren't more
abstracted and widely available. If it's "because it's a big job",
that's certainly understandable, but maybe there's something I'm
overlooking that some of the abstractions aren't really desirable.


Have you looked at the stream-fusion package on Hackage?
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/stream- 
fusion-0.1.1


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon universityKF8NH


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] stream/bytestring questions

2008-02-17 Thread Chad Scherrer
ByteStrings have given a real performance boost to a lot of Haskell
applications, and I'm curious why some of the techniques aren't more
abstracted and widely available. If it's "because it's a big job",
that's certainly understandable, but maybe there's something I'm
overlooking that some of the abstractions aren't really desirable.

My first question is about the stream fusion. This seems (based on the
ByteString paper) to speed things up quite a lot, but for some reason
this isn't necessarily the case for lists. Is this just a matter of
the existing fusion for lists doing such a good job in many cases that
it's hard to beat? The streams paper suggests that some contructors
aren't optimized away during fusion, but wouldn't this also be a
problem for fusion in the bytestring context? Are there many cases
where it helps performance to program to streams directly, rather than
letting the rules go back and forth between them and lists? I tried to
do this, but kept getting hung up on the Unlifted stuff; it's not
exposed (pardon the pun) in the API, and I don't understand why the
types are different than the usual (), Either a b, (a,b), etc.

Second, the idea of representing a sequence as a lazy list of strict
arrays is very appealing. But why represent it so concretely? Wouldn't
it make sense to do something like

data ArrayList a i e = Empty | Chunk !(a i e) (ArrayList a i e)
?

Then array fusion could be written for IArray types in general, and
the ByteString library would become a set of instances with some
specialization pragmas. Presumably a more general StorableVector could
be represented as an IArray, and the NDP stuff seems a good fit too,
so this would make it easy to leverage future work. Seems to separate
the various concerns a little better, too, but again maybe there's a
performance issue I'm missing.

Sorry for the barrage of questions; I guess there's just a lot I'm
trying to understand better.

Thanks,
Chad
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe