Re: [Haskell-cafe] Persistent Concurrent Data Structures

2011-11-01 Thread dokondr
On Wed, Nov 2, 2011 at 3:12 AM, dokondr  wrote:

> Thanks everybody for advice!
> I'll try to clarify what I mean by persistence and concurrent access that
> preserves "happens-before" relationship.
> 1) Persistence - imagine Haskell run-time executing in infinite physical
> memory. My idea is to implement really huge,  "almost infinite memory" in
> the cloud of one or another type. Nothing more nothing less, nothing
> imperative, exactly the same environment that GHC runtime  works today, but
> extended to huge virtual memory in the cloud.
>
>
I am afraid I am confused everybody, didn't mean it, sorry. I understand
that  infinite physical memory is not quite the same that implementing
NData.* on top of some cloud framework. Yet NData.* may be the first step
in this direction. Infinite memory will require support in Haskell
run-time, run-time distributed across many physical instances. Yet, it
looks like physical memory will eventually will be very huge virtual memory
persistent in the cloud for some period of time. IMHO this will simplify
programming model considerably.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Persistent Concurrent Data Structures

2011-11-01 Thread dokondr
Thanks everybody for advice!
I'll try to clarify what I mean by persistence and concurrent access that
preserves "happens-before" relationship.
1) Persistence - imagine Haskell run-time executing in infinite physical
memory. My idea is to implement really huge,  "almost infinite memory" in
the cloud of one or another type. Nothing more nothing less, nothing
imperative, exactly the same environment that GHC runtime  works today, but
extended to huge virtual memory in the cloud.

2) Concurrent access that preserves "happens-before" relationship. Before
talking about transactions, I would use locking data structures in two
different ways:
- Optimistic lock - everybody can read and write / delete simultaneously,
system ensures only "happens-before" relationship. In other words Best
Effort Modification - you can try to modify data, but there is no guarantee
that  your modification will work.
- Pessimistic lock - when you get lock - structure is all yours  as long as
you held the lock - everybody else can only read, "happens-before"
relationship is ensured at all times.
About "happens-before" relationship:
http://en.wikipedia.org/wiki/Happened-before
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Persistent Concurrent Data Structures

2011-11-01 Thread Alberto G. Corona
hi Dimitri

Take a look at TCache. It is a transactional cache with configurable
persistence.

http://hackage.haskell.org/package/TCache

It defines persistent TVars  (DBRef`s)  with similar primitives.
Persistence can be defined by the user for each datatype by an
instance declaration. There is a default persistence in files.

Cache size, synchronization with the database and caching policies are
also defined by the user.

Using this package is easy to define persistent concurrent data
structures. An example are the persistent Queues defined in the
package Workflow:

http://hackage.haskell.org/package/Workflow



2011/11/1 dokondr :
> Hi,
> Please comment on the idea and advise on steps to implement it.
> Real world applications need persistent data, that can be accessed and
> modified concurrently by several clients, in a way that preserves
> "happen-before" relationship.
> Idea: Design and implement Persistent Concurrent Data Types in Haskell.
> These data types should mirror existing Data.List , Data.Map and similar
> types but provide persistency and support consistent concurrent access and
> modification (or simply - "concurrency").
> Persistency and concurrency should be configurable through these type
> interfaces. Configuration should include:
> 1) Media to persist data, such as file, DBMS, external key-value store (for
> example Amazon SimpleDB, CouchDB, MongoDB, Redis, etc)
> 2) Caching policy - when (on what events) and how much data to read/write
> from/to persistent media. Media reads / writes can be done asynchronously in
> separate threads.
> 3) Concurrency configuration: optimistic or pessimistic data locking.
>
> One may ask why encapsulate persistency and concurrency in the data type
> instead of using "native" storage API, such as for example key-value /
> row-column API that  NoSQL databases provide?
> The answer is simple: APIs that your code use greatly influence the code
> itself. Using low-level storage  API directly in your code results in
> bloated obscure code, or you need to encapsulate this low-level API in clear
> and powerful abstractions. So why not to do this encapsulation once and for
> all for such powerful types as Data.Map, for example, and forget all
> Cassandra and SimpleDB low-level access method details?
> When the right time comes and you will need to move your application to the
> next new "shiny_super_cloud", you will just write the implementation of
> NData.Map backed by Data.Map in terms of low-level API of this super-cloud.
>
> (Side note: I really need such a NData.Map type. I was requested to move my
> code that heavily uses Data.Map and simple text file persistence into Amazon
> AWS cloud. Looking at SimpleDB API, I realized that I will have to rewrite
> 90% of code. This rewrite will greatly bloat my code and will make it very
> unreadable. In case I had NData.Map I would just switch implementation from
> 'file' to SimpleDB persistency inside my NData.Map type.)
>
> Implementation:
> To start playing with this idea, NData.Map persisted in a regular file will
> do, no concurrency yet. Next step -   NData.Map persisted in SimpleDB or
> Cassandra or Redis, with concurrent access supported.
>
> So it looks like  NData.Map should be a monad ...
> Any ideas on implementation and similar work?
>
> Thanks!
> Dmitri
> ---
> http://sites.google.com/site/dokondr/welcome
>
>
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Persistent Concurrent Data Structures

2011-11-01 Thread David Barbour
Several of the Haskell web server frameworks (Yesod, HAppS, etc.) come with
persistence support.

I believe you're taking the wrong approach here, with respect to `modified
concurrently` and the like. What does it mean for a Data.List to be
'modified concurrently'? If you need concurrency, first find a good
abstraction for a concurrent collection - one that already covers such
details as ordered or collaborative updates. The result might not look much
like Data.List.



On Tue, Nov 1, 2011 at 3:31 PM, dokondr  wrote:

> Hi,
> Please comment on the idea and advise on steps to implement it.
> Real world applications need persistent data, that can be accessed and
> modified concurrently by several clients, in a way that preserves
> "happen-before" relationship.
> Idea: Design and implement Persistent Concurrent Data Types in Haskell.
> These data types should mirror existing Data.List , Data.Map and similar
> types but provide persistency and support consistent concurrent access and
> modification (or simply - "concurrency").
> Persistency and concurrency should be configurable through these type
> interfaces. Configuration should include:
> 1) Media to persist data, such as file, DBMS, external key-value store
> (for example Amazon SimpleDB, CouchDB, MongoDB, Redis, etc)
> 2) Caching policy - when (on what events) and how much data to read/write
> from/to persistent media. Media reads / writes can be done asynchronously
> in separate threads.
> 3) Concurrency configuration: optimistic or pessimistic data locking.
>
> One may ask why encapsulate persistency and concurrency in the data type
> instead of using "native" storage API, such as for example key-value /
> row-column API that  NoSQL databases provide?
> The answer is simple: APIs that your code use greatly influence the code
> itself. Using low-level storage  API directly in your code results in
> bloated obscure code, or you need to encapsulate this low-level API in
> clear and powerful abstractions. So why not to do this encapsulation once
> and for all for such powerful types as Data.Map, for example, and forget
> all Cassandra and SimpleDB low-level access method details?
> When the right time comes and you will need to move your application to
> the next new "shiny_super_cloud", you will just write the implementation of
> NData.Map backed by Data.Map in terms of low-level API of this super-cloud.
>
> (Side note: I really need such a NData.Map type. I was requested to move
> my code that heavily uses Data.Map and simple text file persistence into
> Amazon AWS cloud. Looking at SimpleDB API, I realized that I will have to
> rewrite 90% of code. This rewrite will greatly bloat my code and will make
> it very unreadable. In case I had NData.Map I would just switch
> implementation from 'file' to SimpleDB persistency inside my NData.Map
> type.)
>
> Implementation:
> To start playing with this idea, NData.Map persisted in a regular file
> will do, no concurrency yet. Next step -   NData.Map persisted in SimpleDB
> or Cassandra or Redis, with concurrent access supported.
>
> So it looks like  NData.Map should be a monad ...
> Any ideas on implementation and similar work?
>
> Thanks!
> Dmitri
> ---
> http://sites.google.com/site/dokondr/welcome
>
>
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Persistent Concurrent Data Structures

2011-11-01 Thread Evan Laforge
So I guess you're talking about imperative mutated data structures
(which is btw the opposite of what "persistence" usually means in
haskell).

It seems like switching data storage would be as easy or hard as
you've been able to abstract it, e.g. if you can put everything
through 'get' and 'put' then it's easy, just change those two
functions.  But if you can't, then maybe it's because you're relying
on features specific to some data store, and then some generic
interface won't help you.

So I guess I'm skeptical that generic auto-serialized types would be
able to help.  Why not write to an interface that expresses exactly
what your app requires instead?  Anyway, unless I'm misunderstanding
your question, this just an imperative style design thing, and no
different in haskell than python or java, except of course haskell
will encourage you to not write in this style in the first place :)

But maybe you can factor out the IO by e.g. 'data <- readData;
writeData (diff data (transform data))' where data is a plain
persistent (in the haskell sense) data structure.  That way
interaction with the storage is in one place.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Persistent Concurrent Data Structures

2011-11-01 Thread Jeremy Shaw
If I have a list [a], and I want to make that persistence, then I have
to have some way to serialize values of type 'a'. If I then modify my
type, then the serialized structure will be out of sync with the new
version of the type -- so I will need some sort of migration feature.

safecopy addresses both the issues of serializing the data and
migrating it when the datastructure changes:

http://hackage.haskell.org/package/safecopy

You should definitely consider using that.

When it comes to concurrency.. my big question is how do you plan to
deal with transaction boundaries / atomicness.

For example, if each function (like, filter, map, etc) is atomic. That
doesn't mean you have something atomic when you do:

 filter pred =<< map f l

something could sneak in between the 'map' and the 'filter'.

An obviously solution would be to do something like:

 transaction $ filter pred =<< map f l

Which could mean that the datastore would have to be locked until that
entire operation is done?

Also.. what does it mean to have a 'persistent list'. In that example,
is map destructive? Does it modify the list ? Or does it produce a new
list?

A somewhat related system is, of course, acid-state (formerly happstack-state).

The solution there is pretty simple, and somewhat more flexible. To
write code you basically just use the State monad. You can store just
about any types you want and use just about any functions you want. To
get and update the state you just use get/set.

simpleTransaction
  do l <- get
  let l' = filter pred (map f l)
  put l'
  return l'

That updates the list and returns the modified list as well.

To make that into a transaction we use a bit of template-haskell to
mark it is a transaction

$(makeAcidic ''MyDatabase ['simpleTransaction])

The appeal of this solution is that you are not limited to just a List
or Map or whatever types people have bother to import into the system.
If you decide you want to use Data.Tree you need only do:

$(deriveSafeCopy 1 'base ''Tree)

And now you can use it persistently and concurrently as well. You do
not have to recreate every function in Data.Tree.

Still, I can see the appeal of just being able to import NData.Map,
deriving a serialize instance for your data, and start writing very
normal looking code. There is something very nice about just being
able to use a function like 'transaction' to mark the boundaries of a
transaction rather than having to give the transaction and name and
call some template haskell function.

Using acid-state, it would be very easy to implement a persistent
version of Data.Map where each function is atomic. However, there is
currently no way to group multiple events into a single transaction.
Though I think I can imagine how to add such a feature. Of course, the
idea of having a big lock blocking everything is not very appealing.
But as an experimental fork it could be interesting..

But, first I would like to hear more about how you imagined
transactions would actually work in the first place..

The big issue I see is that transactions can be a real performance
problem. If I write code for a Map-like persistent structure:

 transaction $ do v <- lookup "key" pmap
 v' <- doSomethingExpensive v
 insert v pmap

That is going to really lock things up, since nothing else can happen
while that transaction is running?

Still, it sounds interesting.. just not easy :)

I would definitely encourage you to consider safecopy at the very
least. It is completely independent of acid-state. It is simply a fast
versioned data serialization library.

- jeremy



On Tue, Nov 1, 2011 at 5:31 PM, dokondr  wrote:
> Hi,
> Please comment on the idea and advise on steps to implement it.
> Real world applications need persistent data, that can be accessed and
> modified concurrently by several clients, in a way that preserves
> "happen-before" relationship.
> Idea: Design and implement Persistent Concurrent Data Types in Haskell.
> These data types should mirror existing Data.List , Data.Map and similar
> types but provide persistency and support consistent concurrent access and
> modification (or simply - "concurrency").
> Persistency and concurrency should be configurable through these type
> interfaces. Configuration should include:
> 1) Media to persist data, such as file, DBMS, external key-value store (for
> example Amazon SimpleDB, CouchDB, MongoDB, Redis, etc)
> 2) Caching policy - when (on what events) and how much data to read/write
> from/to persistent media. Media reads / writes can be done asynchronously in
> separate threads.
> 3) Concurrency configuration: optimistic or pessimistic data locking.
>
> One may ask why encapsulate persistency and concurrency in the data type
> instead of using "native" storage API, such as for example key-value /
> row-column API that  NoSQL databases provide?
> The answer is simple: APIs that your code use greatly influence the code
> itself. Using lo

[Haskell-cafe] Persistent Concurrent Data Structures

2011-11-01 Thread dokondr
Hi,
Please comment on the idea and advise on steps to implement it.
Real world applications need persistent data, that can be accessed and
modified concurrently by several clients, in a way that preserves
"happen-before" relationship.
Idea: Design and implement Persistent Concurrent Data Types in Haskell.
These data types should mirror existing Data.List , Data.Map and similar
types but provide persistency and support consistent concurrent access and
modification (or simply - "concurrency").
Persistency and concurrency should be configurable through these type
interfaces. Configuration should include:
1) Media to persist data, such as file, DBMS, external key-value store (for
example Amazon SimpleDB, CouchDB, MongoDB, Redis, etc)
2) Caching policy - when (on what events) and how much data to read/write
from/to persistent media. Media reads / writes can be done asynchronously
in separate threads.
3) Concurrency configuration: optimistic or pessimistic data locking.

One may ask why encapsulate persistency and concurrency in the data type
instead of using "native" storage API, such as for example key-value /
row-column API that  NoSQL databases provide?
The answer is simple: APIs that your code use greatly influence the code
itself. Using low-level storage  API directly in your code results in
bloated obscure code, or you need to encapsulate this low-level API in
clear and powerful abstractions. So why not to do this encapsulation once
and for all for such powerful types as Data.Map, for example, and forget
all Cassandra and SimpleDB low-level access method details?
When the right time comes and you will need to move your application to the
next new "shiny_super_cloud", you will just write the implementation of
NData.Map backed by Data.Map in terms of low-level API of this super-cloud.

(Side note: I really need such a NData.Map type. I was requested to move my
code that heavily uses Data.Map and simple text file persistence into
Amazon AWS cloud. Looking at SimpleDB API, I realized that I will have to
rewrite 90% of code. This rewrite will greatly bloat my code and will make
it very unreadable. In case I had NData.Map I would just switch
implementation from 'file' to SimpleDB persistency inside my NData.Map
type.)

Implementation:
To start playing with this idea, NData.Map persisted in a regular file will
do, no concurrency yet. Next step -   NData.Map persisted in SimpleDB or
Cassandra or Redis, with concurrent access supported.

So it looks like  NData.Map should be a monad ...
Any ideas on implementation and similar work?

Thanks!
Dmitri
---
http://sites.google.com/site/dokondr/welcome
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe