Philip
Thanks for your email. We are quite good at lazily reading (and caching)
things - an example is hGetContents -- but at the moment we don't have a good
mechanism for un-caching things. That is, taking a big data structure,
throwing it away, and replacing it with a thunk. Clearly we need programmer
help to know when and how to do this.
As luck would have it, I was talking with Conal Eliot about just such a
mechanism at ICFP. Here's the idea.
Suppose we have
data Revertable a b = REV (a->b) a b
mkRevertable :: (a->b) -> a -> Revertable a b
getValue :: Revertable a b -> b
The idea is that a value (REV f x v) is a kind of thunk: it has code (f), its
argument (x), and its value (v). If the garbage collector is short of space,
it's free to rewrite
REV f x v --> REF f x (f x)
thereby discarding v in favour of a small thunk (f x).
In your case (f x) would fetch data from the disk again. [Which involves I/O
but I assume that you are willing to promise that teh data isn't changing, so
unsafePerformIO is ok.]
The main thing I'm unhappy about is that if the user of a Revertable does
something like
let x = tail (getValue r)
and evaluates x, then he has a pointer into the value 'v' (inside that
Revertable), so it'll be retained after all. But I guess you just have
Revertables every so often in the structure, at a granularity you decide.
Anyway none of this is implemented. But it doesn't look too hard. Maybe an
intern.
Simon
From: Philip Scott [mailto:[email protected]]
Sent: 20 October 2009 20:17
To: Simon Peyton-Jones
Subject: Haskell fun
Hi Simon,
Thanks for coming in the other day, it was very interesting to hear what you
had to say. Just finished reading your bit in "Coders at Work" too, very
enlightening. Inspired, I downloaded ghc and have been having a play. I've got
a little pet problem, and was wondering if you might have any insight to offer.
It is related to my question I had for you about timeseries data, if you can
remember it!
So here is the problem:
I have a large (very, ~2 gigabytes) list of <timestamp, value> pairs stored in
a file. I would like to interact with this thing as a bidirectional lazy stream
with the ability to do random access. But what would really be crucial is the
ability to have some kind of cache, both for the initial data fetching and for
any expensive operations I perform later. Can you think of any way of doing
this nicely and efficiently in haskell?
For example, I might like to say
mystream = stream("foo.bin")
mystream.find( time("1 Jul 09, 10:00:00 ") )
And then get some values and either increment or decrement my pointer with
something like
mystream.next
mystream.prev
The hard bit, so far as I can se, would be the ability to keep mystream around
(e.g. in an interactive program), along with a cache of the values it has
already found so that if I ask the same question I don't have to return to the
file. Some ability to manage the cache (e.g. size limit it, get rid of the
least accessed etc..) would be handy as well.
That part doesn't _feel_ very functional, but I really love the composibility
of lazy streams of operations. It works just great when I am just zipping
through a whole file spurting the data out but in an more interactive way I
have got stuck.
I appreciate you're a busy chap, don't worry too much if you do not get chance
to answer; I've got Duncan and haskell-beginners to bother next :)
Thanks,
Philip
_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc