Philip

Thanks for your email.   We are quite good at lazily reading (and caching) 
things - an example is hGetContents --  but at the moment we don't have a good 
mechanism for un-caching things.  That is, taking a big data structure, 
throwing it away, and replacing it with a thunk.   Clearly we need programmer 
help to know when and how to do this.

As luck would have it, I was talking with Conal Eliot about just such a 
mechanism at ICFP.  Here's the idea.

Suppose we have
                data Revertable a b = REV (a->b) a b
                mkRevertable :: (a->b) -> a -> Revertable a b
                getValue :: Revertable a b -> b

The idea is that a value (REV f x v) is a kind of thunk: it has code (f), its 
argument (x), and its value (v).  If the garbage collector is short of space, 
it's free to rewrite
                REV f x v  -->  REF f x (f x)
thereby discarding v in favour of a small thunk (f x).

In your case (f x) would fetch data from the disk again.  [Which involves I/O 
but I assume that you are willing to promise that teh data isn't changing, so 
unsafePerformIO is ok.]


The main thing I'm unhappy about is that if the user of a Revertable does 
something like
                let x = tail (getValue r)
and evaluates x, then he has a pointer into the value 'v' (inside that 
Revertable), so it'll be retained after all.  But I guess you just have 
Revertables every so often in the structure, at a granularity you decide.


Anyway none of this is implemented. But it doesn't look too hard.  Maybe an 
intern.

Simon



From: Philip Scott [mailto:[email protected]]
Sent: 20 October 2009 20:17
To: Simon Peyton-Jones
Subject: Haskell fun

Hi Simon,

Thanks for coming in the other day, it was very interesting to hear what you 
had to say. Just finished reading your bit in "Coders at Work" too, very 
enlightening. Inspired, I downloaded ghc and have been having a play. I've got 
a little pet problem, and was wondering if you might have any insight to offer. 
It is related to my question I had for you about timeseries data, if you can 
remember it!

So here is the problem:

I have a large (very, ~2 gigabytes) list of <timestamp, value> pairs stored in 
a file. I would like to interact with this thing as a bidirectional lazy stream 
with the ability to do random access. But what would really be crucial is the 
ability to have some kind of cache, both for the initial data fetching and for 
any expensive operations I perform later. Can you think of any way of doing 
this nicely and efficiently in haskell?

For example, I might like to say
mystream = stream("foo.bin")
mystream.find( time("1 Jul 09, 10:00:00 ") )
And then get some values and either increment or decrement my pointer with 
something like
mystream.next
mystream.prev
The hard bit, so far as I can se, would be the ability to keep mystream around 
(e.g. in an interactive program), along with a cache of the values it has 
already found so that if I ask the same question I don't have to return to the 
file. Some ability to manage the cache (e.g. size limit it, get rid of the 
least accessed etc..) would be handy as well.

That part doesn't _feel_ very functional, but I really love the composibility 
of lazy streams of operations. It works just great when I am just zipping 
through a whole file spurting the data out but in an more interactive way I 
have got stuck.

I appreciate you're a busy chap, don't worry too much if you do not get chance 
to answer; I've got Duncan and haskell-beginners to bother next :)

Thanks,

Philip
_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc

Reply via email to