Re: [Haskell-cafe] weak pointers and memoization (was Re: memoization)

2009-09-18 Thread Job Vranish
Hey it works :D
Here is a proof of concept:
http://gist.github.com/189104

Maybe later today I'll try to make a version that can be safely used outside
IO.
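
Roughly, the idea: keep a Map from the hash of each key's StableName to a
weak pointer, and let the weak pointer's finalizer delete the entry when
the key gets garbage collected. A minimal sketch of that idea (untested,
not the gist's actual code, and ignoring StableName hash collisions):

import Data.IORef
import qualified Data.Map as M
import System.Mem.StableName (makeStableName, hashStableName)
import System.Mem.Weak (Weak, mkWeak, deRefWeak)

type WeakMap k v = IORef (M.Map Int (Weak v))

newWeakMap :: IO (WeakMap k v)
newWeakMap = newIORef M.empty

-- Insert a value keyed on k; the finalizer removes the entry
-- once k itself has been garbage collected.
insertWeak :: WeakMap k v -> k -> v -> IO ()
insertWeak ref k v = do
  sn <- makeStableName k
  let h = hashStableName sn
  wv <- mkWeak k v (Just (modifyIORef ref (M.delete h)))
  modifyIORef ref (M.insert h wv)

-- Nothing if k was never inserted, or if its entry has since died.
lookupWeak :: WeakMap k v -> k -> IO (Maybe v)
lookupWeak ref k = do
  sn <- makeStableName k
  m  <- readIORef ref
  maybe (return Nothing) deRefWeak (M.lookup (hashStableName sn) m)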

- Job


On Fri, Sep 18, 2009 at 10:19 AM, Job Vranish  wrote:

> Yeah, it seems like the general solution to the problem would be some sort
> of map-like data structure to which you add items as key/value pairs, and
> if a key gets GC'd, its entry gets removed from the structure.
>
> I've been wanting something like this as well, but I didn't know about weak
> references, so I didn't know whether it was possible. I think I could make
> something like this now. I'll give it a shot and let you guys know how it
> goes.
>
> Rodney, could you post your memo code that uses the weak references?
>
> - Job
>
>
> On Fri, Sep 18, 2009 at 7:56 AM, Peter Verswyvelen wrote:
>
>> I would also like to see a solution for problems like these.
>>
>> Haskell provides a lot of nice memoizing / caching data structures -
>> like a trie - but the ones I know of keep growing indefinitely, so no
>> garbage collection ever takes place?
>>
>> It would be nice to have a data structure that performs caching but
>> does not grow without bound.
>>
>> I had a similar problem with stable names; it is not possible to check
>> if a stable name is still "alive".
>>
>> On Fri, Sep 18, 2009 at 1:39 AM, Rodney Price 
>> wrote:
>> > In my case, the results of each computation are used to generate a node
>> > in a graph structure (dag).  The key, oddly, is a hash of a two-tuple
>> > that gets stored in the data structure after the computation of the
>> > node finishes.  If I don't memoize the function to build a node, the
>> > cost of generating the tree is exponential; if I do, it's somewhere
>> > between linear and quadratic.
>> >
>> > Another process prunes parts of this graph structure as time goes on.
>> > The entire data structure is intended to be persistent, lasting for
>> > days at a time in a server-like application.  If the parts pruned
>> > aren't garbage collected, the space leak will eventually be
>> > catastrophic.  Either the memo table or the graph structure itself will
>> > outgrow available memory.
>> >
>> > -Rod
>> >
>> >
>> > On Thu, 17 Sep 2009 13:32:13 -0400
>> > Job Vranish  wrote:
>> >
>> >> What are you trying to use this for? It seems to me that for memo
>> >> tables you almost never have references to the keys outside the
>> >> lookup table since the keys are usually computed right at the last
>> >> minute, and then discarded (otherwise it might be easier to just
>> >> cache stuff outside the function).
>> >>
>> >> For example with a naive fibs, the values you are passing in are
>> >> computed, and probably don't exist before you do the recursive call,
>> >> and then are discarded shortly afterward.
>> >>
>> >> It seems like putting a cap on the cache size, and then just
>> >> overwriting old entries would be better.
>> >> Am I missing something?
>> >>
>> >> - Job
>> >>
>> >>
>> >>
>> >> On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price 
>> >> wrote:
>> >>
>> >> > How does garbage collection work in an example like the one below?
>> >> > You memoize a function with some sort of lookup table, which stores
>> >> > function arguments as keys and function results as values.  As long
>> >> > as the function remains in scope, the keys in the lookup table
>> >> > remain in memory, which means that the keys themselves always
>> >> > remain reachable and they cannot be garbage collected.  Right?
>> >> >
>> >> > So what do you do in the case where you know that, after some
>> >> > period of time, some entries in the lookup table will never be
>> >> > accessed?  That is, there are no references to the keys for some
>> >> > entries remaining, except for the references in the lookup table
>> >> > itself.  You'd like to allow the memory occupied by the keys to be
>> >> > garbage collected.  Otherwise, if the function stays around for a
>> >> > long time, the size of the lookup table always grows.  How do you
>> >> > avoid the space leak?
>> >> >
>> >> > I notice that there is a function in Data.IORef,
>> >> >
>> >> > mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
>> >> >
>> >> > which looks promising.  In the code below, however, there's only one
>> >> > IORef, so either the entire table gets garbage collected or none of
>> >> > it does.
>> >> >
>> >> > I've been reading the paper "Stretching the storage manager: weak
>> >> > pointers and stable names in Haskell," which seems to answer my
>> >> > question.  When I attempt to run the memoization code in the paper
>> >> > on the simple fib example, I find that -- apparently due to lazy
>> >> > evaluation -- no new entries are entered into the lookup table, and
>> >> > therefore no lookups are ever successful!
>> >> >
>> >> > So apparently there is some interaction between lazy evaluation and
>> >> > garbage collection that I don't understand.  My head hurts.  Is it
>> >> > necessary to make the table lookup operation strict?  Or is it
>> >> > something entirely different that I am missing?

Re: [Haskell-cafe] weak pointers and memoization (was Re: memoization)

2009-09-18 Thread Job Vranish
Yeah, it seems like the general solution to the problem would be some sort of
map-like data structure to which you add items as key/value pairs, and if a
key gets GC'd, its entry gets removed from the structure.

I've been wanting something like this as well, but I didn't know about weak
references, so I didn't know whether it was possible. I think I could make
something like this now. I'll give it a shot and let you guys know how it
goes.

Rodney, could you post your memo code that uses the weak references?

- Job

On Fri, Sep 18, 2009 at 7:56 AM, Peter Verswyvelen wrote:

> I would also like to see a solution for problems like these.
>
> Haskell provides a lot of nice memoizing / caching data structures -
> like a trie - but the ones I know of keep growing indefinitely, so no
> garbage collection ever takes place?
>
> It would be nice to have a data structure that performs caching but
> does not grow without bound.
>
> I had a similar problem with stable names; it is not possible to check
> if a stable name is still "alive".
>
> On Fri, Sep 18, 2009 at 1:39 AM, Rodney Price 
> wrote:
> > In my case, the results of each computation are used to generate a node
> > in a graph structure (dag).  The key, oddly, is a hash of a two-tuple
> > that gets stored in the data structure after the computation of the
> > node finishes.  If I don't memoize the function to build a node, the
> > cost of generating the tree is exponential; if I do, it's somewhere
> > between linear and quadratic.
> >
> > Another process prunes parts of this graph structure as time goes on.
> > The entire data structure is intended to be persistent, lasting for
> > days at a time in a server-like application.  If the parts pruned
> > aren't garbage collected, the space leak will eventually be
> > catastrophic.  Either the memo table or the graph structure itself will
> > outgrow available memory.
> >
> > -Rod
> >
> >
> > On Thu, 17 Sep 2009 13:32:13 -0400
> > Job Vranish  wrote:
> >
> >> What are you trying to use this for? It seems to me that for memo
> >> tables you almost never have references to the keys outside the
> >> lookup table since the keys are usually computed right at the last
> >> minute, and then discarded (otherwise it might be easier to just
> >> cache stuff outside the function).
> >>
> >> For example with a naive fibs, the values you are passing in are
> >> computed, and probably don't exist before you do the recursive call,
> >> and then are discarded shortly afterward.
> >>
> >> It seems like putting a cap on the cache size, and then just
> >> overwriting old entries would be better.
> >> Am I missing something?
> >>
> >> - Job
> >>
> >>
> >>
> >> On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price 
> >> wrote:
> >>
> >> > How does garbage collection work in an example like the one below?
> >> > You memoize a function with some sort of lookup table, which stores
> >> > function arguments as keys and function results as values.  As long
> >> > as the function remains in scope, the keys in the lookup table
> >> > remain in memory, which means that the keys themselves always
> >> > remain reachable and they cannot be garbage collected.  Right?
> >> >
> >> > So what do you do in the case where you know that, after some
> >> > period of time, some entries in the lookup table will never be
> >> > accessed?  That is, there are no references to the keys for some
> >> > entries remaining, except for the references in the lookup table
> >> > itself.  You'd like to allow the memory occupied by the keys to be
> >> > garbage collected.  Otherwise, if the function stays around for a
> >> > long time, the size of the lookup table always grows.  How do you
> >> > avoid the space leak?
> >> >
> >> > I notice that there is a function in Data.IORef,
> >> >
> >> > mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
> >> >
> >> > which looks promising.  In the code below, however, there's only one
> >> > IORef, so either the entire table gets garbage collected or none of
> >> > it does.
> >> >
> >> > I've been reading the paper "Stretching the storage manager: weak
> >> > pointers and stable names in Haskell," which seems to answer my
> >> > question.  When I attempt to run the memoization code in the paper
> >> > on the simple fib example, I find that -- apparently due to lazy
> >> > evaluation -- no new entries are entered into the lookup table, and
> >> > therefore no lookups are ever successful!
> >> >
> >> > So apparently there is some interaction between lazy evaluation and
> >> > garbage collection that I don't understand.  My head hurts.  Is it
> >> > necessary to make the table lookup operation strict?  Or is it
> >> > something entirely different that I am missing?
> >> >
> >> > -Rod
> >> >
> >> >
> >> > On Thu, 10 Sep 2009 18:33:47 -0700
> >> > Ryan Ingram  wrote:
> >> >
> >> > >
>> >> > > memoIO :: Ord a => (a -> b) -> IO (a -> IO b)
>> >> > > memoIO f = do
>> >> > >     cache <- newIORef M.empty
>> >> > >     return $ \x -> do
>> >> > >         m <- readIORef cache
>> >> > >         case M.lookup x m of
>> >> > >             Just y  -> return y
>> >> > >             Nothing -> do let res = f x
>> >> > >                           writeIORef cache $ M.insert x res m
>> >> > >                           return res

Re: [Haskell-cafe] weak pointers and memoization (was Re: memoization)

2009-09-18 Thread Peter Verswyvelen
I would also like to see a solution for problems like these.

Haskell provides a lot of nice memoizing / caching data structures -
like a trie - but the ones I know of keep growing indefinitely, so no
garbage collection ever takes place?

It would be nice to have a data structure that performs caching but
does not grow without bound.

I had a similar problem with stable names; it is not possible to check
if a stable name is still "alive".
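
One workaround is to pair the stable name with a weak pointer to the same
value, so deRefWeak can answer the liveness question. A sketch (untested,
and the names here are mine, not a library API):

import System.Mem.StableName (StableName, makeStableName)
import System.Mem.Weak (Weak, mkWeakPtr, deRefWeak)
import Data.Maybe (isJust)

-- A stable name bundled with a weak pointer to the value it names.
data NamedWeak a = NamedWeak (StableName a) (Weak a)

nameValue :: a -> IO (NamedWeak a)
nameValue x = do
  sn <- makeStableName x
  w  <- mkWeakPtr x Nothing          -- no finalizer needed
  return (NamedWeak sn w)

-- True as long as the named value is still reachable.
isAlive :: NamedWeak a -> IO Bool
isAlive (NamedWeak _ w) = fmap isJust (deRefWeak w)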

On Fri, Sep 18, 2009 at 1:39 AM, Rodney Price  wrote:
> In my case, the results of each computation are used to generate a node
> in a graph structure (dag).  The key, oddly, is a hash of a two-tuple
> that gets stored in the data structure after the computation of the
> node finishes.  If I don't memoize the function to build a node, the
> cost of generating the tree is exponential; if I do, it's somewhere
> between linear and quadratic.
>
> Another process prunes parts of this graph structure as time goes on.
> The entire data structure is intended to be persistent, lasting for
> days at a time in a server-like application.  If the parts pruned
> aren't garbage collected, the space leak will eventually be
> catastrophic.  Either the memo table or the graph structure itself will
> outgrow available memory.
>
> -Rod
>
>
> On Thu, 17 Sep 2009 13:32:13 -0400
> Job Vranish  wrote:
>
>> What are you trying to use this for? It seems to me that for memo
>> tables you almost never have references to the keys outside the
>> lookup table since the keys are usually computed right at the last
>> minute, and then discarded (otherwise it might be easier to just
>> cache stuff outside the function).
>>
>> For example with a naive fibs, the values you are passing in are
>> computed, and probably don't exist before you do the recursive call,
>> and then are discarded shortly afterward.
>>
>> It seems like putting a cap on the cache size, and then just
>> overwriting old entries would be better.
>> Am I missing something?
>>
>> - Job
>>
>>
>>
>> On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price 
>> wrote:
>>
>> > How does garbage collection work in an example like the one below?
>> > You memoize a function with some sort of lookup table, which stores
>> > function arguments as keys and function results as values.  As long
>> > as the function remains in scope, the keys in the lookup table
>> > remain in memory, which means that the keys themselves always
>> > remain reachable and they cannot be garbage collected.  Right?
>> >
>> > So what do you do in the case where you know that, after some
>> > period of time, some entries in the lookup table will never be
>> > accessed?  That is, there are no references to the keys for some
>> > entries remaining, except for the references in the lookup table
>> > itself.  You'd like to allow the memory occupied by the keys to be
>> > garbage collected.  Otherwise, if the function stays around for a
>> > long time, the size of the lookup table always grows.  How do you
>> > avoid the space leak?
>> >
>> > I notice that there is a function in Data.IORef,
>> >
>> > mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
>> >
>> > which looks promising.  In the code below, however, there's only one
>> > IORef, so either the entire table gets garbage collected or none of
>> > it does.
>> >
>> > I've been reading the paper "Stretching the storage manager: weak
>> > pointers and stable names in Haskell," which seems to answer my
>> > question.  When I attempt to run the memoization code in the paper
>> > on the simple fib example, I find that -- apparently due to lazy
>> > evaluation -- no new entries are entered into the lookup table, and
>> > therefore no lookups are ever successful!
>> >
>> > So apparently there is some interaction between lazy evaluation and
>> > garbage collection that I don't understand.  My head hurts.  Is it
>> > necessary to make the table lookup operation strict?  Or is it
>> > something entirely different that I am missing?
>> >
>> > -Rod
>> >
>> >
>> > On Thu, 10 Sep 2009 18:33:47 -0700
>> > Ryan Ingram  wrote:
>> >
>> > >
>> > > memoIO :: Ord a => (a -> b) -> IO (a -> IO b)
>> > > memoIO f = do
>> > >    cache <- newIORef M.empty
>> > >    return $ \x -> do
>> > >        m <- readIORef cache
>> > >        case M.lookup x m of
>> > >            Just y -> return y
>> > >            Nothing -> do let res = f x
>> > >                          writeIORef cache $ M.insert x res m
>> > >                          return res
>> > >
>> > > memo :: Ord a => (a -> b) -> (a -> b)
>> > > memo f = unsafePerformIO $ do
>> > >     fmemo <- memoIO f
>> > >     return (unsafePerformIO . fmemo)
>> > >
>> > > I don't think there is any valid transformation that breaks this,
>> > > since the compiler can't lift anything through unsafePerformIO.
>> > > Am I mistaken?
>> > >
>> > >   -- ryan
>> >
>

Re: [Haskell-cafe] weak pointers and memoization (was Re: memoization)

2009-09-17 Thread Rodney Price
In my case, the results of each computation are used to generate a node
in a graph structure (dag).  The key, oddly, is a hash of a two-tuple
that gets stored in the data structure after the computation of the
node finishes.  If I don't memoize the function to build a node, the
cost of generating the tree is exponential; if I do, it's somewhere
between linear and quadratic.

Another process prunes parts of this graph structure as time goes on.
The entire data structure is intended to be persistent, lasting for
days at a time in a server-like application.  If the parts pruned
aren't garbage collected, the space leak will eventually be
catastrophic.  Either the memo table or the graph structure itself will
outgrow available memory.

-Rod


On Thu, 17 Sep 2009 13:32:13 -0400
Job Vranish  wrote:

> What are you trying to use this for? It seems to me that for memo
> tables you almost never have references to the keys outside the
> lookup table since the keys are usually computed right at the last
> minute, and then discarded (otherwise it might be easier to just
> cache stuff outside the function).
> 
> For example with a naive fibs, the values you are passing in are
> computed, and probably don't exist before you do the recursive call,
> and then are discarded shortly afterward.
> 
> It seems like putting a cap on the cache size, and then just
> overwriting old entries would be better.
> Am I missing something?
> 
> - Job
> 
> 
> 
> On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price 
> wrote:
> 
> > How does garbage collection work in an example like the one below?
> > You memoize a function with some sort of lookup table, which stores
> > function arguments as keys and function results as values.  As long
> > as the function remains in scope, the keys in the lookup table
> > remain in memory, which means that the keys themselves always
> > remain reachable and they cannot be garbage collected.  Right?
> >
> > So what do you do in the case where you know that, after some
> > period of time, some entries in the lookup table will never be
> > accessed?  That is, there are no references to the keys for some
> > entries remaining, except for the references in the lookup table
> > itself.  You'd like to allow the memory occupied by the keys to be
> > garbage collected.  Otherwise, if the function stays around for a
> > long time, the size of the lookup table always grows.  How do you
> > avoid the space leak?
> >
> > I notice that there is a function in Data.IORef,
> >
> > mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
> >
> > which looks promising.  In the code below, however, there's only one
> > IORef, so either the entire table gets garbage collected or none of
> > it does.
> >
> > I've been reading the paper "Stretching the storage manager: weak
> > pointers and stable names in Haskell," which seems to answer my
> > question.  When I attempt to run the memoization code in the paper
> > on the simple fib example, I find that -- apparently due to lazy
> > evaluation -- no new entries are entered into the lookup table, and
> > therefore no lookups are ever successful!
> >
> > So apparently there is some interaction between lazy evaluation and
> > garbage collection that I don't understand.  My head hurts.  Is it
> > necessary to make the table lookup operation strict?  Or is it
> > something entirely different that I am missing?
> >
> > -Rod
> >
> >
> > On Thu, 10 Sep 2009 18:33:47 -0700
> > Ryan Ingram  wrote:
> >
> > >
> > > memoIO :: Ord a => (a -> b) -> IO (a -> IO b)
> > > memoIO f = do
> > >     cache <- newIORef M.empty
> > >     return $ \x -> do
> > >         m <- readIORef cache
> > >         case M.lookup x m of
> > >             Just y  -> return y
> > >             Nothing -> do let res = f x
> > >                           writeIORef cache $ M.insert x res m
> > >                           return res
> > >
> > > memo :: Ord a => (a -> b) -> (a -> b)
> > > memo f = unsafePerformIO $ do
> > >     fmemo <- memoIO f
> > >     return (unsafePerformIO . fmemo)
> > >
> > > I don't think there is any valid transformation that breaks this,
> > > since the compiler can't lift anything through unsafePerformIO.
> > > Am I mistaken?
> > >
> > >   -- ryan
> >



Re: [Haskell-cafe] weak pointers and memoization (was Re: memoization)

2009-09-17 Thread Job Vranish
What are you trying to use this for? It seems to me that for memo tables you
almost never have references to the keys outside the lookup table since the
keys are usually computed right at the last minute, and then discarded
(otherwise it might be easier to just cache stuff outside the function).

For example with a naive fibs, the values you are passing in are computed,
and probably don't exist before you do the recursive call, and then are
discarded shortly afterward.

It seems like putting a cap on the cache size, and then just overwriting old
entries, would be better -- something like the sketch below.
Am I missing something?
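
A rough cut of the capped version (untested; eviction just drops the map's
smallest key when the cap is hit, where a real version would track recency):

import Data.IORef
import qualified Data.Map as M

-- Memoize in IO with a bounded table: evict an entry once the table
-- reaches maxSize, then insert the new result.
memoCapped :: Ord a => Int -> (a -> b) -> IO (a -> IO b)
memoCapped maxSize f = do
  cache <- newIORef M.empty
  return $ \x -> do
    m <- readIORef cache
    case M.lookup x m of
      Just y  -> return y
      Nothing -> do
        let res = f x
            m'  = if M.size m >= maxSize then M.deleteMin m else m
        writeIORef cache (M.insert x res m')
        return res

-- e.g.:  fibIO <- memoCapped 1000 fib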

- Job



On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price  wrote:

> How does garbage collection work in an example like the one below?  You
> memoize a function with some sort of lookup table, which stores function
> arguments as keys and function results as values.  As long as the
> function remains in scope, the keys in the lookup table remain in
> memory, which means that the keys themselves always remain reachable
> and they cannot be garbage collected.  Right?
>
> So what do you do in the case where you know that, after some period of
> time, some entries in the lookup table will never be accessed?  That is,
> there are no references to the keys for some entries remaining, except
> for the references in the lookup table itself.  You'd like to allow the
> memory occupied by the keys to be garbage collected.  Otherwise, if the
> function stays around for a long time, the size of the lookup table
> always grows.  How do you avoid the space leak?
>
> I notice that there is a function in Data.IORef,
>
> mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
>
> which looks promising.  In the code below, however, there's only one
> IORef, so either the entire table gets garbage collected or none of it
> does.
>
> I've been reading the paper "Stretching the storage manager: weak
> pointers and stable names in Haskell," which seems to answer my
> question.  When I attempt to run the memoization code in the paper on
> the simple fib example, I find that -- apparently due to lazy
> evaluation -- no new entries are entered into the lookup table, and
> therefore no lookups are ever successful!
>
> So apparently there is some interaction between lazy evaluation and
> garbage collection that I don't understand.  My head hurts.  Is it
> necessary to make the table lookup operation strict?  Or is it
> something entirely different that I am missing?
>
> -Rod
>
>
> On Thu, 10 Sep 2009 18:33:47 -0700
> Ryan Ingram  wrote:
>
> >
> > import Data.IORef (newIORef, readIORef, writeIORef)
> > import qualified Data.Map as M
> > import System.IO.Unsafe (unsafePerformIO)
> >
> > memoIO :: Ord a => (a -> b) -> IO (a -> IO b)
> > memoIO f = do
> >     cache <- newIORef M.empty
> >     return $ \x -> do
> >         m <- readIORef cache
> >         case M.lookup x m of
> >             Just y  -> return y
> >             Nothing -> do let res = f x  -- res is inserted as a thunk
> >                           writeIORef cache $ M.insert x res m
> >                           return res
> >
> > memo :: Ord a => (a -> b) -> (a -> b)
> > memo f = unsafePerformIO $ do
> >     fmemo <- memoIO f
> >     return (unsafePerformIO . fmemo)
> >
> > I don't think there is any valid transformation that breaks this,
> > since the compiler can't lift anything through unsafePerformIO.  Am I
> > mistaken?
> >
> >   -- ryan
>