Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-11-16 Thread dokondr
Steve, thanks for sharing your experience with AWS!
So far I have evaluated several NoSQL storage solutions, including
SimpleDB, Riak, MongoDB and Cassandra. Lessons learned:
1) The storage that SimpleDB provides is too low-level and not very
convenient for storing dictionaries and the other b-tree data structures
that my app works with.
2) The "simpledb/dev" simulator is out of date and does not support the
complete feature set of SimpleDB today. Thus, without a major rewrite, the
"simpledb/dev" emulator cannot be used for development.
3) SimpleDB storage is 100% specific to the Amazon framework. It follows
that developing directly against the SimpleDB interface will make the app
non-portable across different cloud platforms.
4) The Cassandra row/column abstraction is awkward for the Data.Map
structures that my app needs.
5) Riak provides a convenient bucket/key/value abstraction and runs as a
cluster of nodes that is robust to failures. Its REST/JSON protocol is
simple to use, yet it is inefficient for the data exchanges my app
performs. I couldn't find simple libraries for the binary exchange
protocol that Riak also supports.
6) MongoDB meets my requirements best of all: it is powerful on the server
side (JavaScript filters, etc.) and uses an efficient communication
protocol based on BSON data exchange.

I also plan to use RabbitMQ for communication between the several Haskell
processes and the Java Web front-end that my app incorporates.
It would be great to know what tools people use in the cloud (AWS, etc.) to
connect a Web front-end with the rest of the (Haskell) system.
What Haskell tools are there for building a Web front-end?

Thanks!
Dmitri


On Wed, Nov 16, 2011 at 9:01 PM, Steve Severance wrote:

> We use AWS extensively. We use the aws package and have contributed to it,
> specifically SQS functionality. I will give you the rundown of what we do.
>
> We moved off of SimpleDB and now use MongoDB. The reason is that SimpleDB
> seemed to have problems with write pressure and there are no good tools
> for profiling your queries. My main application is extremely write-heavy,
> with a single instance needing to do hundreds or thousands of writes a
> second. MongoDB has worked well for us. I am scared of things like
> Cassandra, having looked at the code; however, some people have made it
> work.
>
> We store data such as crawled web pages in S3. The files are lzma
> compressed and the data format is built on protocol buffers. We picked lzma
> both for the storage costs of cold data and because the pipe between S3
> and EC2 is somewhat limited and we want to make the most effective use of
> it possible.
>
> In my experience AWS simulators are more trouble than they are worth since
> they don't accurately model the way AWS will respond to you under load. The
> free tier at AWS should allow you to experiment with building an app. The
> first couple of months of development cost us less than $1.
>
> Steve
>

Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-11-16 Thread Steve Severance
We use AWS extensively. We use the aws package and have contributed to it,
specifically SQS functionality. I will give you the rundown of what we do.

We moved off of SimpleDB and now use MongoDB. The reason is that SimpleDB
seemed to have problems with write pressure and there are no good tools
for profiling your queries. My main application is extremely write-heavy,
with a single instance needing to do hundreds or thousands of writes a
second. MongoDB has worked well for us. I am scared of things like
Cassandra, having looked at the code; however, some people have made it
work.

We store data such as crawled web pages in S3. The files are lzma
compressed and the data format is built on protocol buffers. We picked lzma
both for the storage costs of cold data and because the pipe between S3
and EC2 is somewhat limited and we want to make the most effective use of
it possible.

In my experience AWS simulators are more trouble than they are worth since
they don't accurately model the way AWS will respond to you under load. The
free tier at AWS should allow you to experiment with building an app. The
first couple of months of development cost us less than $1.

Steve

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-11-01 Thread dokondr
On Tue, Nov 1, 2011 at 10:53 AM, Neil Davies
wrote:

> Word of caution
>
> Understand the semantics (and cost profile) of the AWS services first -
> you can't just open a HTTP connection and dribble data out over several
> days and hope for things to work. It is not a system that has that sort of
> laziness at its heart.
>
> AWS doesn't supply traditional remote file store semantics - its queuing,
> simple database and object store have all been designed for large-scale
> systems being offered as a service to a (potentially hostile) large set of
> users - you can see that in the way that things are designed. There are all
> sorts of (sensible from their point of view) performance-related limits and
> retries.
>
> The challenge in designing nice clean layers on top of AWS is how/when to
> hide the transient/load related failures.
>
>
>
As a straw-man approach I would start with an NData.Map backed by Data.Map,
with the addition of a "flush" function to write the Data.Map to an external
key-value store / NoSQL DB.
Another requirement for NData.Map is concurrent consistency, so that
different clients can modify its state while preserving the "happens-before"
relationship. For this I would add to NData.Map a "refresh" function that
updates the local copy from the external key-value store.
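
A minimal sketch of the flush/refresh idea, to make the proposal concrete.
Everything here is an assumption for illustration: the names NMap, Store,
open and insertLocal are hypothetical, and the "external store" is just an
IORef so that the code runs without a network. It does not attempt the
"happens-before" requirement; the last flush simply wins on conflicting keys.

```haskell
import qualified Data.Map as Map
import Data.IORef

-- | Hypothetical handle to an external key-value store. Here it is only an
-- IORef; a real backend would talk to a remote database instead.
newtype Store k v = Store (IORef (Map.Map k v))

-- | A local Data.Map copy plus the store it is synchronised with.
data NMap k v = NMap
  { local :: Map.Map k v
  , store :: Store k v
  }

newStore :: IO (Store k v)
newStore = Store <$> newIORef Map.empty

-- | Open a map on a store, starting from the store's current contents.
open :: Store k v -> IO (NMap k v)
open s = refresh (NMap Map.empty s)

-- | Pure update of the local copy only; nothing is written until 'flush'.
insertLocal :: Ord k => k -> v -> NMap k v -> NMap k v
insertLocal k v m = m { local = Map.insert k v (local m) }

-- | Write the local copy to the external store. Local entries win on
-- conflicting keys because Map.union is left-biased.
flush :: Ord k => NMap k v -> IO ()
flush (NMap l (Store ref)) =
  atomicModifyIORef' ref (\remote -> (Map.union l remote, ()))

-- | Replace the local copy with the current contents of the external store.
refresh :: NMap k v -> IO (NMap k v)
refresh m@(NMap _ (Store ref)) = do
  remote <- readIORef ref
  return m { local = remote }
```

With two clients opened on the same store, a value inserted and flushed by
one becomes visible to the other only after it calls refresh.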

As for the hSimpleDB package, it looks like it doesn't build on GHC 7:
http://hackage.haskell.org/package/hSimpleDB


> The hSimpleDB package
>
> Interface to Amazon's SimpleDB service.
>
> Properties
> Versions: 0.1, 0.2, *0.3*
> Dependencies: base (≥3 & ≤4), bytestring, Crypto, dataenc, HTTP, hxt,
>   network, old-locale, old-time, utf8-string
> License: BSD3
> Author: David Himmelstrup 2009, Greg Heartsfield 2007
> Maintainer: David Himmelstrup
> Category: Database, Web, Network
> Upload date: Thu Sep 17 17:09:26 UTC 2009
> Uploaded by: DavidHimmelstrup
> Built on: ghc-6.10, ghc-6.12
> Build failure: ghc-7.0 (log)
>


Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-11-01 Thread Neil Davies
A word of caution.

Understand the semantics (and cost profile) of the AWS services first - you
can't just open an HTTP connection and dribble data out over several days and
hope for things to work. It is not a system that has that sort of laziness at
its heart.

AWS doesn't supply traditional remote file store semantics - its queuing,
simple database and object store have all been designed for large-scale
systems being offered as a service to a (potentially hostile) large set of
users - you can see that in the way that things are designed. There are all
sorts of (sensible from their point of view) performance-related limits and
retries.

The challenge in designing nice clean layers on top of AWS is how/when to hide
the transient/load-related failures.
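
One common way to hide such failures inside a layer is to retry with an
increasing delay. The following is only a sketch under assumptions: the
names retrying and failsFirst are hypothetical, and it retries on every
exception, whereas a real layer would retry only on errors known to be
transient (throttling, timeouts) and surface everything else.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, try)
import Data.IORef (newIORef, atomicModifyIORef')

-- | Run an action, making at most 'maxAttempts' attempts in total. The
-- delay (in microseconds) doubles after every failed attempt.
-- NOTE: catching SomeException is a simplification for this sketch.
retrying :: Int -> Int -> IO a -> IO (Either SomeException a)
retrying maxAttempts delayMicros action = go 1 delayMicros
  where
    go attempt delay = do
      result <- try action
      case result of
        Right x -> return (Right x)
        Left err
          | attempt >= maxAttempts -> return (Left err)
          | otherwise -> do
              threadDelay delay
              go (attempt + 1) (delay * 2)

-- | Stand-in for a flaky remote call: the returned action fails on its
-- first n invocations and succeeds afterwards.
failsFirst :: Int -> IO (IO String)
failsFirst n = do
  calls <- newIORef (0 :: Int)
  return $ do
    c <- atomicModifyIORef' calls (\x -> (x + 1, x + 1))
    if c <= n
      then ioError (userError "transient failure")
      else return "ok"
```

Returning Either rather than rethrowing leaves the "when to stop hiding"
decision with the caller once the attempts are exhausted.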



Neil




Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-11-01 Thread Neil Davies
We use all three (in various ways as they have arrived on the scene over time) 
in production systems.


On 1 Nov 2011, at 02:03, Ryan Newton wrote:

>  Any example code of using hscassandra package would really help!
> 
> I'll ask my student.  We may have some simple examples.
> 
> Also, I have no idea as to their quality but I was pleasantly surprised to 
> find three different amazon related packages on Hackage (simply by searching 
> for the word "Amazon" in the package list).  
> 
>http://hackage.haskell.org/package/hS3
>http://hackage.haskell.org/package/hSimpleDB
>http://hackage.haskell.org/package/aws
> 
> It would be great to know if these work.
> 
>  -Ryan
>



Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-10-31 Thread dokondr
On Tue, Nov 1, 2011 at 5:03 AM, Ryan Newton  wrote:

>  Any example code of using hscassandra package would really help!
>>
>
> I'll ask my student.  We may have some simple examples.
>
> Also, I have no idea as to their quality but I was pleasantly surprised to
> find three different amazon related packages on Hackage (simply by
> searching for the word "Amazon" in the package list).
>
>http://hackage.haskell.org/package/hS3
>http://hackage.haskell.org/package/hSimpleDB
>http://hackage.haskell.org/package/aws
>
> It would be great to know if these work.
>
>
Thinking about how to implement Data.Map on top of hscassandra or any other
key-value storage ...
For example, creating a new map with "fromList" will require storing *all*
(key, value) list elements in external storage at once. How should laziness
be dealt with in this case?
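
One possible answer, sketched under assumptions: instead of forcing the
whole list, write it to the store in fixed-size batches, so only one batch
needs to be evaluated at a time. The names storeFromList and memoryWriter
are hypothetical, and the IORef-backed writer merely stands in for a batch
write to a remote store.

```haskell
import qualified Data.Map as Map
import Data.IORef

-- | Split a list into chunks of at most n elements. Chunks are produced
-- lazily; a non-positive n is treated as 1 to guarantee progress.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | null xs = []
  | otherwise = let (h, t) = splitAt (max 1 n) xs in h : chunksOf n t

-- | Send a (possibly lazily produced) association list to a store in
-- batches, using the supplied batch-write action.
storeFromList :: Int -> ([(k, v)] -> IO ()) -> [(k, v)] -> IO ()
storeFromList batchSize writeBatch = mapM_ writeBatch . chunksOf batchSize

-- | Hypothetical batch writer backed by an IORef. As with Map.fromList,
-- later values replace earlier ones for the same key.
memoryWriter :: Ord k => IORef (Map.Map k v) -> [(k, v)] -> IO ()
memoryWriter ref kvs = modifyIORef' ref (Map.union (Map.fromList kvs))
```

The batch size then becomes a tuning knob between the number of requests
and the amount of data held in memory.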


Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-10-31 Thread Ryan Newton
>
>  Any example code of using hscassandra package would really help!
>

I'll ask my student.  We may have some simple examples.

Also, I have no idea as to their quality but I was pleasantly surprised to
find three different amazon related packages on Hackage (simply by
searching for the word "Amazon" in the package list).

   http://hackage.haskell.org/package/hS3
   http://hackage.haskell.org/package/hSimpleDB
   http://hackage.haskell.org/package/aws

It would be great to know if these work.

 -Ryan


Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-10-31 Thread dokondr
On Tue, Nov 1, 2011 at 12:07 AM, Ryan Newton  wrote:

> ...
> For a NOSQL layer -- I'm looking for the answer to that same question
> myself!  We've been experimenting with Cassandra (used via the hscassandra
> package based in turn on cassandra-thrift).  Already it's clear that there
> are many areas that need work.  The Haskell code generated by Thrift itself
> has a lot of room for improvement (for the intrepid hacker: cycles there
> would be well-spent).
>
>
Any example code using the hscassandra package would really help!


Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-10-31 Thread Ryan Newton
For distributed execution you can look at the recent work on "CloudHaskell":

   https://github.com/jepst/CloudHaskell
   http://groups.google.com/group/cloudhaskell

As for a programming model -- Philip Trinder et al. have a version of
monad-par that works in a distributed way over CloudHaskell; likewise,
CloudHaskell itself provides a simple "Task" layer.

For a NOSQL layer -- I'm looking for the answer to that same question
myself!  We've been experimenting with Cassandra (used via the hscassandra
package based in turn on cassandra-thrift).  Already it's clear that there
are many areas that need work.  The Haskell code generated by Thrift itself
has a lot of room for improvement (for the intrepid hacker: cycles there
would be well-spent).
   We haven't tried CouchDB yet.  Please keep us posted on what you find.

I don't know if anyone has a clean way of hooking a simple Haskell-ish
interface (e.g. Data.Map) up to a persistence layer.  But it seems like
there have been a bunch of papers on "database supported haskell" and the
like.  One of them must have solved this!

http://hackage.haskell.org/package/DSH

Cheers,
  -Ryan




Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-10-31 Thread dokondr
On Mon, Oct 31, 2011 at 6:50 PM, John Lenz  wrote:

> CouchDB works great, although I decided to go with SimpleDB since then it
> is amazon's problem to scale and allocate disk and so forth, which I like
> better.  For couchdb, you can use my package couchdb-enumerator on hackage.
>
>
Regarding CouchDB: so far I have my records keyed by Id and stored in a
Data.Map, which I serialize to a text file. Using Data.Map functions I do
many operations on these records, including mapping functions over keys
and values, accumulation, lookup, intersection, union, etc.
When I move this data to CouchDB and start using couchdb-enumerator to work
with it, how natural will it be to implement all the functions that I use
from Data.Map?
Or maybe it makes more sense to store my serialized Data.Map as a blob in
CouchDB, and not use views or similar CouchDB / SimpleDB interfaces at
all? Just retrieve the necessary blob, deserialize it to a Data.Map, update
it, and then store the modified blob in CouchDB again?
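
The blob option can be sketched with nothing more than the Show and Read
instances of Data.Map. This is only an illustration under assumptions: the
names Blob, toBlob, fromBlob and updateBlob are hypothetical, the blob is a
plain String, and the actual upload to and download from the database is
left out.

```haskell
import qualified Data.Map as Map
import Data.Char (isSpace)

-- | A serialized map; a plain String keeps the sketch dependency-free.
type Blob = String

-- | Serialize the whole map into one blob.
toBlob :: (Show k, Show v) => Map.Map k v -> Blob
toBlob = show

-- | Parse a blob back into a map; Nothing if the blob is malformed.
fromBlob :: (Ord k, Read k, Read v) => Blob -> Maybe (Map.Map k v)
fromBlob s = case reads s of
  [(m, rest)] | all isSpace rest -> Just m
  _ -> Nothing

-- | Retrieve, update, store: apply an ordinary Data.Map function to the
-- deserialized map and serialize the result again.
updateBlob :: (Ord k, Read k, Read v, Show k, Show v)
           => (Map.Map k v -> Map.Map k v) -> Blob -> Maybe Blob
updateBlob f = fmap (toBlob . f) . fromBlob
```

The trade-off is that every update rewrites the whole blob, and two clients
updating the same blob concurrently can overwrite each other's changes.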

It would be great if somebody had time to implement Data.List, Data.Map,
etc. on top of a generic NoSQL DB interface, with specific instances for
CouchDB, SimpleDB, etc.
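
To illustrate what such a generic interface might look like, here is a
rough sketch. The class KVStore and all of its method names are
hypothetical, keys and values are plain Strings for simplicity, and the
only instance shown is an in-memory one; instances for CouchDB, SimpleDB,
etc. would have to be written against their respective client libraries.

```haskell
import qualified Data.Map as Map
import Data.IORef

-- | Hypothetical minimal interface that every backend would provide.
class KVStore s where
  putKV    :: s -> String -> String -> IO ()
  getKV    :: s -> String -> IO (Maybe String)
  deleteKV :: s -> String -> IO ()
  keysKV   :: s -> IO [String]

-- | In-memory instance, usable for offline development and tests.
newtype MemStore = MemStore (IORef (Map.Map String String))

newMemStore :: IO MemStore
newMemStore = MemStore <$> newIORef Map.empty

instance KVStore MemStore where
  putKV (MemStore r) k v  = modifyIORef' r (Map.insert k v)
  getKV (MemStore r) k    = Map.lookup k <$> readIORef r
  deleteKV (MemStore r) k = modifyIORef' r (Map.delete k)
  keysKV (MemStore r)     = Map.keys <$> readIORef r

-- | A Data.Map-style operation written once against the interface:
-- apply a function to the value at a key, if the key is present.
adjustKV :: KVStore s => s -> (String -> String) -> String -> IO ()
adjustKV s f k = getKV s k >>= maybe (return ()) (putKV s k . f)

-- | Load everything into an ordinary Data.Map, for bulk operations such
-- as union or intersection that are awkward to do remotely.
snapshot :: KVStore s => s -> IO (Map.Map String String)
snapshot s = do
  ks <- keysKV s
  vs <- mapM (getKV s) ks
  return (Map.fromList [(k, v) | (k, Just v) <- zip ks vs])
```

Functions such as adjustKV work unchanged with any backend that implements
the class, which is the portability being asked for.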


Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-10-31 Thread dokondr
On Mon, Oct 31, 2011 at 6:50 PM, John Lenz  wrote:


>> 4) My code processes hundreds of messages. Every message is processed in
>> exactly the same way as the others. So the code can be easily
>> parallelized. Any Haskell frameworks that will allow me to run this code
>> in a simple concurrency model?
>
> Yes, there are many options.
>
> http://www.haskell.org/haskellwiki/GHC/Concurrency
>
John, thanks for the detailed reply!
I am looking at the Haskell Concurrency wiki, but cannot figure out which
framework (STM, sparks, threads, etc.) Amazon AWS will be able to scale.
As far as I know, to scale CPU and program memory in Amazon, all you can
ask of AWS is to start some number of additional VMs. Every VM
contains a complete image of your OS and executables, and all images are
exactly the same.
That's fine with me, as currently all my workflow tasks are performed by
separately compiled Haskell executables communicating via regular files.
So, to reformulate my question:
- Does any Haskell framework exist that allows orchestrating separate
processes (NOT threads that share the same process memory)?
On the other hand, if there is a way to make Amazon AWS scale
Haskell STM, sparks, threads, etc., I would happily rewrite my Haskell
code to use these frameworks. So my second question:
- Is there a way to make Amazon AWS scale Haskell STM, sparks, threads
or any other Haskell concurrency frameworks?


[Haskell-cafe] Amazon AWS storage best to use with Haskell?

2011-10-31 Thread dokondr
Hi,
Please share your experience / ideas on the AWS storage most friendly to
Haskell.
So far I store my data mostly in Data.Map structures serialized to text
files with write / read functions. Now I have been asked to move my app and
data to the Amazon cloud. As far as I know, there are two main storage types
that Amazon provides: S3 - basic object storage and SimpleDB (
http://docs.amazonwebservices.com/AmazonSimpleDB/latest/GettingStartedGuide/
)
Questions:
1) I would like to continue working with my data using abstractions similar
to the ones that Data.Map provides. Any ideas on how to iterate over and
modify SimpleDB records in a way as powerful as Data.Map provides? Or
maybe S3?
2) It would be great to do development and testing offline, without actually
connecting to AWS S3 / SimpleDB. Are there any AWS simulators + Haskell
libraries that would allow such offline development?
3) Any experience / ideas regarding Haskell libs for NoSQL, non-AWS-native
storage that will run well both offline and in AWS?
4) My code processes hundreds of messages. Every message is processed in
exactly the same way as the others, so the code can be easily parallelized.
Are there any Haskell frameworks that will allow me to run this code in a
simple concurrency model?
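
For question 4, one simple model can be sketched with GHC's base library
alone: one lightweight thread per message, with the results collected in
input order. The name processAll is hypothetical and the handler is
whatever per-message function the app already has. Note the assumption
that all work happens inside one process; using several cores requires
building with -threaded and running with +RTS -N, and spreading work over
several VMs is a separate problem that this does not address.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, try)

-- | Process every message in its own lightweight thread and collect the
-- results in the order of the input list. A failing handler yields a Left
-- for that message instead of aborting the whole run.
processAll :: (msg -> IO result) -> [msg] -> IO [Either SomeException result]
processAll handler msgs = do
  boxes <- mapM spawn msgs   -- start all workers first
  mapM takeMVar boxes        -- then wait for each result in turn
  where
    spawn m = do
      box <- newEmptyMVar
      _ <- forkIO (try (handler m) >>= putMVar box)
      return box
```

With only hundreds of messages, one thread per message is cheap in GHC; for
much larger inputs a bounded pool of workers would be the usual refinement.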

Thanks!
Dmitri