Re: happstack-ixset internals/performance (was Re: [Haskell-cafe] Inverse of HaskellDB)

2010-10-02 Thread Jeremy Shaw
In the current version of IxSet, the performance of querying on just
the Lon would be essentially the same as if you just had a Data.Map
Lon Point. But the queries on the second index are current not so
great. There is work in progress to rewrite the internals of IxSet to
be based on a kd-tree, in which case your query should be pretty
efficient.

So, that answer is pretty vague :) I am in the process of wrapping up
happstack 0.6 which has focused on fixing some performance issues with
happstack-server, and refactoring the code so that user API and
internals are more clearly separated and better documented.

happstack 0.7 is all about happstack-state. A key aspect will be
nailing down some solid performance benchmarks instead of vague hand
waving :)

The numbers you give are certainly within the scope of what we would
like 0.7 to be able to handle. Also, I should note that
happstack-state and happstack-ixset are independent from each other.
You can easily use something other than IxSet to store your points and
still use happstack-state.

- jeremy

On Fri, Oct 1, 2010 at 1:53 PM, Thomas M. DuBuisson
thomas.dubuis...@gmail.com wrote:
 That is pretty close to how it would look using happstack-state. Here
 is a complete, runnable example which defines the types, a query,
 creates/initializes the database, performs the query, and prints the
 results.
 [snip]

 How is data stored in Happstack.State?  I see the Component instance
 uses fromList from happstack-ixset but can't find any information on
 the algorithm used or its efficiency (computationally or wrt space).

 If making this more concrete helps then here is a possible use:

 == GPS Points ==
 I have a GPS logger that logs every 10 seconds when I jog.  Jogging for
 an hour a day for the past 180 days has resulted in 64k points.
 Pretending I hosted a site for joggers (and all points were in the same
 DB) I could easily result in a billion points ( 20K users).  Would
 happstack-ixset code in the form points @ (Lon -120) @ (Lon -125) @
 (Lat 45) @ (Lat 50) perform reasonably?

 Cheers,
 Thomas


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: happstack-ixset internals/performance (was Re: [Haskell-cafe] Inverse of HaskellDB)

2010-10-02 Thread Thomas M. DuBuisson
Thanks Jeremy, I just wrote up my own little analysis (below) while you
were responding.  I'll look for the kd-tree work; if I see discussion
(and am stupid enough to heap more work onto my plate) then I might get
involved.

Oops, didn't send...

Cheers,
Thomas

-

So another glance tells me there is a list of maps (one element for each
index method) and it uses Data.Map under the hood.  So I have O(m lg n)
where m is the number of index methods and n is the number of elements.

Space wise, I think Data.Map takes up 6 words per Bin constructor (1 for
the constructor, 1 for the 'Size' and one for the pointer indirection
for each additional field), so the space is 6 * n * m * w where w is
the word size.  This means indexing by 5 methods for 1M entries takes
about 256MB, assuming 28B per entry that's 28MB of data + 256 indexing ~
282MB needed.

Indexing my imaginary 1B points by user,date,lat,lon is 6 * 2^30 * 4 * 8
- or about 192 GB of indexing + 28GB of data for 220GB total.  Obviously
I shouldn't be talking about keeping a live data set of 28GB in memory
let alone indexing it all, but I was just curious about the ratio (220MB
for 1M points, which is just one heavy user).



On Sat, 2010-10-02 at 14:09 -0500, Jeremy Shaw wrote:
 In the current version of IxSet, the performance of querying on just
 the Lon would be essentially the same as if you just had a Data.Map
 Lon Point. But the queries on the second index are current not so
 great. There is work in progress to rewrite the internals of IxSet to
 be based on a kd-tree, in which case your query should be pretty
 efficient.
 
 So, that answer is pretty vague :) I am in the process of wrapping up
 happstack 0.6 which has focused on fixing some performance issues with
 happstack-server, and refactoring the code so that user API and
 internals are more clearly separated and better documented.
 
 happstack 0.7 is all about happstack-state. A key aspect will be
 nailing down some solid performance benchmarks instead of vague hand
 waving :)
 
 The numbers you give are certainly within the scope of what we would
 like 0.7 to be able to handle. Also, I should note that
 happstack-state and happstack-ixset are independent from each other.
 You can easily use something other than IxSet to store your points and
 still use happstack-state.
 
 - jeremy
 
 On Fri, Oct 1, 2010 at 1:53 PM, Thomas M. DuBuisson
 thomas.dubuis...@gmail.com wrote:
  That is pretty close to how it would look using happstack-state. Here
  is a complete, runnable example which defines the types, a query,
  creates/initializes the database, performs the query, and prints the
  results.
  [snip]
 
  How is data stored in Happstack.State?  I see the Component instance
  uses fromList from happstack-ixset but can't find any information on
  the algorithm used or its efficiency (computationally or wrt space).
 
  If making this more concrete helps then here is a possible use:
 
  == GPS Points ==
  I have a GPS logger that logs every 10 seconds when I jog.  Jogging for
  an hour a day for the past 180 days has resulted in 64k points.
  Pretending I hosted a site for joggers (and all points were in the same
  DB) I could easily result in a billion points ( 20K users).  Would
  happstack-ixset code in the form points @ (Lon -120) @ (Lon -125) @
  (Lat 45) @ (Lat 50) perform reasonably?
 
  Cheers,
  Thomas
 
 


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: happstack-ixset internals/performance (was Re: [Haskell-cafe] Inverse of HaskellDB)

2010-10-02 Thread Peter Robinson
Hi, Thomas.

 Thanks Jeremy, I just wrote up my own little analysis (below) while you
 were responding.  I'll look for the kd-tree work; if I see discussion
 (and am stupid enough to heap more work onto my plate) then I might get
 involved.

You can find the repository for the dynamic kd-tree implementation here [1].
I'm currently rewriting large parts of the core algorithms (balancing and multi
key traversal) and the implementation is far from being complete/usable.  Once
I'm done with these changes it's time for some serious benchmarking.  The
kd-tree implementation does seem to scale, as the last working version
outperformed Data.Map w.r.t. space and time when considering large data sets
(~100 elements). For single-key queries on small data sets, however, IxSet
is currently still faster while memory consumption is about the same.

I think the main advantage of using a kd-tree vs multiple Data.Maps, is that a
query involving multiple keys can still happen in O(log n) time, as the tree
needs to be traversed only once. Also, when an element is modified, most of the
m Data.Maps need to be rebuilt (i.e. O(m*n*log n)) because the indexing
information might be out of date. (This might have been optimized in recent
versions of happstack-ixset.) For the kd-tree we can get away with rebalancing a
subtree of some size k which takes O(k*log k) time.

  Peter

[1] http://darcs.monoid.at/kdtree/

 -

 So another glance tells me there is a list of maps (one element for each
 index method) and it uses Data.Map under the hood.  So I have O(m lg n)
 where m is the number of index methods and n is the number of elements.

 Space wise, I think Data.Map takes up 6 words per Bin constructor (1 for
 the constructor, 1 for the 'Size' and one for the pointer indirection
 for each additional field), so the space is 6 * n * m * w where w is
 the word size.  This means indexing by 5 methods for 1M entries takes
 about 256MB, assuming 28B per entry that's 28MB of data + 256 indexing ~
 282MB needed.

 Indexing my imaginary 1B points by user,date,lat,lon is 6 * 2^30 * 4 * 8
 - or about 192 GB of indexing + 28GB of data for 220GB total.  Obviously
 I shouldn't be talking about keeping a live data set of 28GB in memory
 let alone indexing it all, but I was just curious about the ratio (220MB
 for 1M points, which is just one heavy user).



 On Sat, 2010-10-02 at 14:09 -0500, Jeremy Shaw wrote:
 In the current version of IxSet, the performance of querying on just
 the Lon would be essentially the same as if you just had a Data.Map
 Lon Point. But the queries on the second index are current not so
 great. There is work in progress to rewrite the internals of IxSet to
 be based on a kd-tree, in which case your query should be pretty
 efficient.

 So, that answer is pretty vague :) I am in the process of wrapping up
 happstack 0.6 which has focused on fixing some performance issues with
 happstack-server, and refactoring the code so that user API and
 internals are more clearly separated and better documented.

 happstack 0.7 is all about happstack-state. A key aspect will be
 nailing down some solid performance benchmarks instead of vague hand
 waving :)

 The numbers you give are certainly within the scope of what we would
 like 0.7 to be able to handle. Also, I should note that
 happstack-state and happstack-ixset are independent from each other.
 You can easily use something other than IxSet to store your points and
 still use happstack-state.

 - jeremy

 On Fri, Oct 1, 2010 at 1:53 PM, Thomas M. DuBuisson
 thomas.dubuis...@gmail.com wrote:
  That is pretty close to how it would look using happstack-state. Here
  is a complete, runnable example which defines the types, a query,
  creates/initializes the database, performs the query, and prints the
  results.
  [snip]
 
  How is data stored in Happstack.State?  I see the Component instance
  uses fromList from happstack-ixset but can't find any information on
  the algorithm used or its efficiency (computationally or wrt space).
 
  If making this more concrete helps then here is a possible use:
 
  == GPS Points ==
  I have a GPS logger that logs every 10 seconds when I jog.  Jogging for
  an hour a day for the past 180 days has resulted in 64k points.
  Pretending I hosted a site for joggers (and all points were in the same
  DB) I could easily result in a billion points ( 20K users).  Would
  happstack-ixset code in the form points @ (Lon -120) @ (Lon -125) @
  (Lat 45) @ (Lat 50) perform reasonably?
 
  Cheers,
  Thomas
 
 


 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: happstack-ixset internals/performance (was Re: [Haskell-cafe] Inverse of HaskellDB)

2010-10-02 Thread Jeremy Shaw
Hello,

Check out these threads:

http://groups.google.com/group/happs/browse_thread/thread/23d92e45c99f88b1
http://groups.google.com/group/happs/browse_thread/thread/0b0d0a9158c3ad73

There is nothing inherently wrong with keeping 28GB of real data in
memory.  It depends largely on what you are trying to optimize for..

For personal hobby projects, the cost of hosting the project seems to
be the biggest expense that people are concerned worth. (Which is
quite reasonable). As a result people are willing to increase
development time, and sacrifice performance in order lower hosting
costs. (Also reasonable).

For commercial projects, the costs look different. For example, if
buying a server with 1TB of RAM (around $80,000 these days) means they
can hire one less developer (at $150,000/year when you look at total
cost of salary, benefits, etc), then that is a savings of $70,000.
Additionally, in RAM storage provides much more predictable
performance. If you have ever attempted to use a site like reddit
after its memcached servers have been reset, it is clear how bad the
penalties for hitting disk are. RAM is significantly faster and than
disk and has no seek time. RAM can achieve 100-1000x less latency and
higher throughput.

Facebook reportedly keeps as much as 80-90% of their working dataset
in memcached. In 2008 they had 28TB of memcached servers. No idea what
they are up to now
(http://www.facebook.com/note.php?note_id=39391378919).

Of course, there are datasets which are simply too big to be stored in
RAM. For example, google's search index. I am not that familiar with
their approach, but I believe they go the opposite extreme. They
likely assume that every query is going to hit the disk, and optimize
the system so that it can provide acceptable response even if nothing
is cached ? (I am totally guessing here).

The focus of the new IxSet internals is to help bring the indexing
overhead ratio down (and increase speed at the same time).

There are also some ideas, but not code, for how to build an
IxSet-like structure which stores the keys in RAM, but can store the
values of disk. That would, hopefully, give you the ease of use of
IxSet, but with lower memory requirements if your keys are
significantly smaller than the total value payload. The trade off
being that you get back into the ugly work of disk seeks :)

There was also some experimental work done where the values were
stored in RAM, but in their serialized byte format, under the
assumption that the serialized format is a lot more compact than the
normal representation. The trade off is that the values must be
deserialized everytime they are used, which requires more CPU. A more
complex version could be imagined, where the deserialized version is
cached for some period of time. Whether that is actually beneficial
can only be determined empirically I think...

So, clearly happstack-state is not the optimal choice for all haskell
web applications. And that is why it is a completely independent from
happstack-server. We do want to allow people to pick the best choice
for their application. Of course, if we want to provide off-the-shelf
components like a user account system -- then eventually we have to
nail down some specifics, such as the persistence layer. But those
choices would only apply to code that wants to use those optional
components.

At the same time, happstack-state is a very interesting and
challenging library to work on. The recent rise in popularity of
things like redis seems to validate happstack-state somewhat.

- jeremy


On Sat, Oct 2, 2010 at 3:09 PM, Thomas M. DuBuisson
thomas.dubuis...@gmail.com wrote:
 Thanks Jeremy, I just wrote up my own little analysis (below) while you
 were responding.  I'll look for the kd-tree work; if I see discussion
 (and am stupid enough to heap more work onto my plate) then I might get
 involved.

 Oops, didn't send...

 Cheers,
 Thomas

 -

 So another glance tells me there is a list of maps (one element for each
 index method) and it uses Data.Map under the hood.  So I have O(m lg n)
 where m is the number of index methods and n is the number of elements.

 Space wise, I think Data.Map takes up 6 words per Bin constructor (1 for
 the constructor, 1 for the 'Size' and one for the pointer indirection
 for each additional field), so the space is 6 * n * m * w where w is
 the word size.  This means indexing by 5 methods for 1M entries takes
 about 256MB, assuming 28B per entry that's 28MB of data + 256 indexing ~
 282MB needed.

 Indexing my imaginary 1B points by user,date,lat,lon is 6 * 2^30 * 4 * 8
 - or about 192 GB of indexing + 28GB of data for 220GB total.  Obviously
 I shouldn't be talking about keeping a live data set of 28GB in memory
 let alone indexing it all, but I was just curious about the ratio (220MB
 for 1M points, which is just one heavy user).



 On Sat, 2010-10-02 at 14:09 -0500, Jeremy Shaw wrote:
 In the current version of IxSet, the performance of querying on just
 the 

happstack-ixset internals/performance (was Re: [Haskell-cafe] Inverse of HaskellDB)

2010-10-01 Thread Thomas M. DuBuisson
 That is pretty close to how it would look using happstack-state. Here
 is a complete, runnable example which defines the types, a query,
 creates/initializes the database, performs the query, and prints the
 results.
[snip]

How is data stored in Happstack.State?  I see the Component instance
uses fromList from happstack-ixset but can't find any information on
the algorithm used or its efficiency (computationally or wrt space).

If making this more concrete helps then here is a possible use:

== GPS Points ==
I have a GPS logger that logs every 10 seconds when I jog.  Jogging for
an hour a day for the past 180 days has resulted in 64k points.
Pretending I hosted a site for joggers (and all points were in the same
DB) I could easily result in a billion points ( 20K users).  Would
happstack-ixset code in the form points @ (Lon -120) @ (Lon -125) @
(Lat 45) @ (Lat 50) perform reasonably?

Cheers,
Thomas

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-30 Thread Felipe Lessa
On Wed, Sep 29, 2010 at 7:21 AM, Michael Snoyman mich...@snoyman.com wrote:
 I think this approach is not possible without involving some fairly
 ugly unsafeInterleaveIO/unsafePerformIO calls. A simple example using
 a common web programming example: support I have a multi-user blog
 site, where each user can have multiple entries. I would model this
 using standard Haskell datatypes as:

 data Entry = Entry { title :: String, content :: String }
 data Blogger = Blogger { name :: String, entries :: [Entry] }

 Obviously we'll need some kind of blogger loading function:

 getBloggerByName :: String - IO Blogger

 Either this will load up all entries (a potentially incredibly costly
 operation) or use unsafe IO down the road. Especially when using
 database connections, this can be incredibly bad: the connection could
 be closed, the SQL statement could be reused by another request, etc.

It may be possible to tag those data fields that are not to be
loaded on the spot.  For example,

 data Entry = Entry { title :: String, content :: String }
 data Blogger db = Blogger { name :: String, entries :: OnDB db [Entry] }

 class Monad db = Database db where
   data OnDB db :: * - *
   fetch :: OnDB db a - db a
   fetchSome :: Criteria a - OnDB db [a] - db [a]

 newtype InMemory a = InMemory a
 instance Database InMemory where
   newtype OnDB db a = OnDBMem a
   fetch (OnDBMem x) = return x
   fetchSome = ...

 instance Database SQL where
   ...

Cheers,

--
Felipe.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-30 Thread Christopher Done
While we're on the topic of databases, I really wanted to try out
query/inserting from/to my database with records like this:
http://hpaste.org/40240/db_library_approach

Define a record:

data Person f =
  Person { pid:: f Integer
 , firstName  :: f String
 , middleName :: f (Maybe String)
 , lastName   :: f String
 , age:: f Integer
 }

then I'd query it like

personById :: Integer - Query Person
personById i =
  Person { pid= constant i
 , firstName  = anything
 , middleName = anything
 , lastName   = anything
 , age= anything
 }  deriving (Typeable,Data)

Or with a Data.Default instance:

personById :: Integer - Query Person
personById i = def { pid = constant i }

But I have yet to figure out how to derive a Typeable instance for
such a type. I don't want to write any instances of anything.
Technically I can get the field names and values using a
Data.Data.Data instance, but I don't know, maybe I should make a
TypeablePolymorphicKinds class or something and try to derive for it.
UHC's generic deriving would probably be good for something like this,
but I want GHC.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-30 Thread Michael Snoyman
On Thu, Sep 30, 2010 at 4:35 PM, Felipe Lessa felipe.le...@gmail.com wrote:
 On Wed, Sep 29, 2010 at 7:21 AM, Michael Snoyman mich...@snoyman.com wrote:
 I think this approach is not possible without involving some fairly
 ugly unsafeInterleaveIO/unsafePerformIO calls. A simple example using
 a common web programming example: support I have a multi-user blog
 site, where each user can have multiple entries. I would model this
 using standard Haskell datatypes as:

 data Entry = Entry { title :: String, content :: String }
 data Blogger = Blogger { name :: String, entries :: [Entry] }

 Obviously we'll need some kind of blogger loading function:

 getBloggerByName :: String - IO Blogger

 Either this will load up all entries (a potentially incredibly costly
 operation) or use unsafe IO down the road. Especially when using
 database connections, this can be incredibly bad: the connection could
 be closed, the SQL statement could be reused by another request, etc.

 It may be possible to tag those data fields that are not to be
 loaded on the spot.  For example,

 data Entry = Entry { title :: String, content :: String }
 data Blogger db = Blogger { name :: String, entries :: OnDB db [Entry] }

 class Monad db = Database db where
   data OnDB db :: * - *
   fetch :: OnDB db a - db a
   fetchSome :: Criteria a - OnDB db [a] - db [a]

 newtype InMemory a = InMemory a
 instance Database InMemory where
   newtype OnDB db a = OnDBMem a
   fetch (OnDBMem x) = return x
   fetchSome = ...

 instance Database SQL where
   ...

I wasn't claiming my approach was the *only* approach, just stating
that it doesn't seem feasible to use the simple Haskell data type
declarations.

Michael
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-30 Thread Jeremy Shaw
On Wed, Sep 29, 2010 at 5:21 AM, Michael Snoyman mich...@snoyman.com wrote:
 I think this approach is not possible without involving some fairly
 ugly unsafeInterleaveIO/unsafePerformIO calls. A simple example using
 a common web programming example: support I have a multi-user blog
 site, where each user can have multiple entries. I would model this
 using standard Haskell datatypes as:

 data Entry = Entry { title :: String, content :: String }
 data Blogger = Blogger { name :: String, entries :: [Entry] }

 Obviously we'll need some kind of blogger loading function:

 getBloggerByName :: String - IO Blogger

That is pretty close to how it would look using happstack-state. Here
is a complete, runnable example which defines the types, a query,
creates/initializes the database, performs the query, and prints the
results.

 {-# LANGUAGE DeriveDataTypeable, FlexibleContexts, MultiParamTypeClasses, 
 TemplateHaskell, TypeSynonymInstances, TypeFamilies #-}
 module Main where

 import Control.Exception (bracket)
 import Control.Monad.Reader (ask)
 import Data.Data
 import Happstack.Data
 import Happstack.Data.IxSet
 import Happstack.State

A simple type to identify a particular blogger:

 newtype Blogger = Blogger { name :: String }
 deriving (Eq, Ord, Read, Show, Data, Typeable)
 $(deriveSerialize ''Blogger)
 instance Version Blogger

The deriveSerialize instance automatically creates the instances for
serializing and deserializing to/from a binary representation for
storage, transmission, etc.

The Version instance is used for migration when the data type changes.
(Since there is no previous version of this type to migrate from, we
don't have to specify anything).

We create a similar type for the title of the blog post:

 newtype Title = Title { unTitle :: String }
 deriving (Eq, Ord, Read, Show, Data, Typeable)
 $(deriveSerialize ''Title)
 instance Version Title

And a simple record which actually contains a blog post:

 data Entry =
 Entry { title   :: Title
   , blogger :: Blogger
   , content :: String
   }
 deriving (Eq, Ord, Read, Show, Data, Typeable)
 $(deriveSerialize ''Entry)
 instance Version Entry

Obviously, it could be expanded to support tags, posted date, whether
or not in is published, etc. Next we create an IxSet which holds all
the Entries that have been posted:

 $(inferIxSet Entries ''Entry 'noCalcs [''Blogger, ''Title])

An IxSet is a bit like a normal Set, except it has indexes, which you
can use for performing queries. In this case, we use Blogger and Title
as indexes.

Next we define a component that actually stores the Entries:

 instance Component Entries where
 type Dependencies Entries = End
 initialValue = fromList [ Entry { title   = Title 10 Reasons you should 
 use Happstack.
 , blogger = Blogger stepcut
 , content = ...
 }
 , Entry { title   = Title Persistence made easy!
 , blogger = Blogger Jeremy Shaw
 , content = ...
 }
 ]

This component is prepopulated with 2 entries. Now we want to define a
query which retrieves all the entries by a particular Blogger:

 getEntriesByBlogger :: Blogger - Query Entries Entries
 getEntriesByBlogger blogger =
 do e - ask
return (e @= blogger)

The Query monad is essentially a specialized version of the Reader
monad. So we use 'ask' to get the Entries from the Entries
component. (@=) is an IxSet function which selects all the Entries with
the specified blogger.

Next we 'register' all the functions we want to use as queries for the
Entries Component:

 $(mkMethods ''Entries ['getEntriesByBlogger])

And finally, here is a main function which initializes the transaction
system, performs a query, prints the results, and shuts the
transaction system down:

 main :: IO ()
 main =
 bracket (startSystemState (Proxy :: Proxy Entries)) shutdownSystem $ \_ -
 do postsByStepcut - query (GetEntriesByBlogger (Blogger stepcut))
print postsByStepcut

Note that there is no outside or additional configuration which needs
to be done. If you have the happstack-state libraries installed on
your system, then you can simply run this program. You do not need to
configure or initialize any external database system.

The queries and updates are thread-safe, ACID-transactions. You can
use almost any Haskell datatype declared using the normal Haskell
syntax. Basically, if you could write a pair of Read/Show instances
for the type, then you can probably use it directly with
happstack-state. So that means the type can not have functions,
existentials, and a few other things. But Trees, etc, are no problem.

The queries and updates are just straight-forward functions in the
Reader and State monads. So, there is no 

Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-29 Thread Chris Eidhof
On 28 sep 2010, at 17:33, Ozgur Akgun wrote:

 How do you define relationships between data types?
 
 Well, why is it any different from other fields? From one of your examples 
 [1], I'd expect you to have a list of questions in the Quiz data type, and if 
 necessary, a quiz field in the Question data type. This might be a bit tricky 
 but certainly achievable [2].

This is really tricky. For example, consider storing a large tree in the 
database:

 data Tree = Node Int Tree Tree | Leaf Int

This means you need to read the entire tree from the database. Or consider 
cyclic datastructures (such as the example you gave). How do you store this? 
The only way to inspect this is using a library like data-reify [1].

I think the problem might be a bit harder than you suspect.

Another way to solve it is using Sebastiaan Visser's framework, described in 
his paper [2], but that's also rather complicated.

-chris

[1]: http://hackage.haskell.org/package/data-reify
[2]: 
http://github.com/downloads/sebastiaanvisser/msc-thesis/wgp10-genstorage.pdf___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-29 Thread Ozgur Akgun
OK, I am rephrasing it a bit then :)
I definitely don't think this would be trivial to implement. However, I'd
expect a decent solution to this problem, not to have special combinators to
describe relations between data types, but let the user model their data
using plain haskell data types, and infer the associated table structure
just by looking at the data types.

I'll give this a harder thought once I find the time. There is the huge
barrier of TH, stopping me from playing with things like this.

Anyway, have fun! :)

On 29 September 2010 10:41, Chris Eidhof ch...@eidhof.nl wrote:

 On 28 sep 2010, at 17:33, Ozgur Akgun wrote:

  How do you define relationships between data types?
 
  Well, why is it any different from other fields? From one of your
 examples [1], I'd expect you to have a list of questions in the Quiz data
 type, and if necessary, a quiz field in the Question data type. This might
 be a bit tricky but certainly achievable [2].

 This is really tricky. For example, consider storing a large tree in the
 database:

  data Tree = Node Int Tree Tree | Leaf Int

 This means you need to read the entire tree from the database. Or consider
 cyclic datastructures (such as the example you gave). How do you store this?
 The only way to inspect this is using a library like data-reify [1].

 I think the problem might be a bit harder than you suspect.

 Another way to solve it is using Sebastiaan Visser's framework, described
 in his paper [2], but that's also rather complicated.

 -chris

 [1]: http://hackage.haskell.org/package/data-reify
 [2]:
 http://github.com/downloads/sebastiaanvisser/msc-thesis/wgp10-genstorage.pdf




-- 
Ozgur Akgun
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-29 Thread Michael Snoyman
I think this approach is not possible without involving some fairly
ugly unsafeInterleaveIO/unsafePerformIO calls. A simple example using
a common web programming example: support I have a multi-user blog
site, where each user can have multiple entries. I would model this
using standard Haskell datatypes as:

data Entry = Entry { title :: String, content :: String }
data Blogger = Blogger { name :: String, entries :: [Entry] }

Obviously we'll need some kind of blogger loading function:

getBloggerByName :: String - IO Blogger

Either this will load up all entries (a potentially incredibly costly
operation) or use unsafe IO down the road. Especially when using
database connections, this can be incredibly bad: the connection could
be closed, the SQL statement could be reused by another request, etc.

My persistent library follows a similar approach to what Chris
describes, though with a very different syntax. You can see a very
similar example on the documentation site[1].

Michael

[1] http://docs.yesodweb.com/book/persistent/#relations

On Wed, Sep 29, 2010 at 12:01 PM, Ozgur Akgun ozgurak...@gmail.com wrote:
 OK, I am rephrasing it a bit then :)
 I definitely don't think this would be trivial to implement. However, I'd
 expect a decent solution to this problem, not to have special combinators to
 describe relations between data types, but let the user model their data
 using plain haskell data types, and infer the associated table structure
 just by looking at the data types.
 I'll give this a harder thought once I find the time. There is the huge
 barrier of TH, stopping me from playing with things like this.
 Anyway, have fun! :)

 On 29 September 2010 10:41, Chris Eidhof ch...@eidhof.nl wrote:

 On 28 sep 2010, at 17:33, Ozgur Akgun wrote:

  How do you define relationships between data types?
 
  Well, why is it any different from other fields? From one of your
  examples [1], I'd expect you to have a list of questions in the Quiz data
  type, and if necessary, a quiz field in the Question data type. This might
  be a bit tricky but certainly achievable [2].

 This is really tricky. For example, consider storing a large tree in the
 database:

  data Tree = Node Int Tree Tree | Leaf Int

 This means you need to read the entire tree from the database. Or consider
 cyclic datastructures (such as the example you gave). How do you store this?
 The only way to inspect this is using a library like data-reify [1].

 I think the problem might be a bit harder than you suspect.

 Another way to solve it is using Sebastiaan Visser's framework, described
 in his paper [2], but that's also rather complicated.

 -chris

 [1]: http://hackage.haskell.org/package/data-reify
 [2]:
 http://github.com/downloads/sebastiaanvisser/msc-thesis/wgp10-genstorage.pdf


 --
 Ozgur Akgun

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-28 Thread Chris Eidhof
Hey Jonathan,

I've done some work on this. The hard part is defining relationships between 
datatypes: how do you model this in Haskell? I've some code on github: 
http://github.com/chriseidhof/persist, you might be interested in that.

-chris

On 25 sep 2010, at 21:31, Jonathan Geddes wrote:

 Cafe,
 
 HaskellDB takes a database schema and produces Haskell data structures
 (plus some other query-related stuff for its EDSL query language).
 
 What I'm looking for is the inverse of this functionality. I want to
 create tables based on a Haskell data structure with a few simple
 rules. These rules include: if a field is not of the form `Maybe a'
 then it can't be nullable in the database. If a field is not a
 primitive (in the database) then it is actually stored in another
 table and a reference id is stored in the table. Tables are produced
 recursively, unless they already exist, etc.
 
 The HaskellDB approach is great for interfacing with existing tables,
 but in my case I already have data structures and now I would like a
 quick way to create tables to persist them.
 
 Does such a thing exist? If not, would you find it useful? I may take
 this up as a side project if it does not already exist and others
 would find it useful.
 
 Thanks,
 
 --Jonathan
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-28 Thread Jeremy Shaw
Do you need to persistently store your Haskell data types in tables ?
Or just persistently store them ?

happstack-state provides the latter for you. Your data-types are just
plain-old Haskell data types and your queries and updates are just
functions in the Reader or State monad. It provides ACID properties
with write-ahead logging to the disk, S3, or other backends. It also
includes data-type versioning and automatic migration. Although it is
associated with the happstack web application framework, there is
nothing web specific about it. It does not even require that you build
or install any of the web stuff in Happstack.

To get a quick feel for how it works, I recommend this tutorial:

http://nhlab.blogspot.com/2008/07/extending-asterisk-with-happs.html

It is a little out of date -- but mostly you just need to change the
imports from HAppS to Happstack.

 - jeremy

On Sat, Sep 25, 2010 at 2:31 PM, Jonathan Geddes
geddes.jonat...@gmail.com wrote:
 Cafe,

 HaskellDB takes a database schema and produces Haskell data structures
 (plus some other query-related stuff for its EDSL query language).

 What I'm looking for is the inverse of this functionality. I want to
 create tables based on a Haskell data structure with a few simple
 rules. These rules include: if a field is not of the form `Maybe a'
 then it can't be nullable in the database. If a field is not a
 primitive (in the database) then it is actually stored in another
 table and a reference id is stored in the table. Tables are produced
 recursively, unless they already exist, etc.

 The HaskellDB approach is great for interfacing with existing tables,
 but in my case I already have data structures and now I would like a
 quick way to create tables to persist them.

 Does such a thing exist? If not, would you find it useful? I may take
 this up as a side project if it does not already exist and others
 would find it useful.

 Thanks,

 --Jonathan
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-28 Thread Ozgur Akgun

 How do you define relationships between data types?


Well, why is it any different from other fields? From one of your examples
[1], I'd expect you to have a list of questions in the Quiz data type, and
if necessary, a quiz field in the Question data type. This might be a bit
tricky but certainly achievable [2].

Something like the following:

data Quiz = Quiz {
  description :: String,
  subject :: String,
  questions   :: [Question]
} deriving (Show, Read)

data Question = Question {
  title   :: String,
  choiceA :: String,
  choiceB :: String,
  choiceC :: String,
  quiz:: Quiz
} deriving (Show, Read)


[1] http://github.com/chriseidhof/persist/blob/master/examples/Model.phs
[2] http://www.haskell.org/haskellwiki/Tying_the_Knot

On 28 September 2010 16:13, Chris Eidhof ch...@eidhof.nl wrote:

 Hey Jonathan,

 I've done some work on this. The hard part is defining relationships
 between datatypes: how do you model this in Haskell? I've some code on
 github: http://github.com/chriseidhof/persist, you might be interested in
 that.

 -chris

 On 25 sep 2010, at 21:31, Jonathan Geddes wrote:

  Cafe,
 
  HaskellDB takes a database schema and produces Haskell data structures
  (plus some other query-related stuff for its EDSL query language).
 
  What I'm looking for is the inverse of this functionality. I want to
  create tables based on a Haskell data structure with a few simple
  rules. These rules include: if a field is not of the form `Maybe a'
  then it can't be nullable in the database. If a field is not a
  primitive (in the database) then it is actually stored in another
  table and a reference id is stored in the table. Tables are produced
  recursively, unless they already exist, etc.
 
  The HaskellDB approach is great for interfacing with existing tables,
  but in my case I already have data structures and now I would like a
  quick way to create tables to persist them.
 
  Does such a thing exist? If not, would you find it useful? I may take
  this up as a side project if it does not already exist and others
  would find it useful.
 
  Thanks,
 
  --Jonathan
  ___
  Haskell-Cafe mailing list
  Haskell-Cafe@haskell.org
  http://www.haskell.org/mailman/listinfo/haskell-cafe

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe




-- 
Ozgur Akgun
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-25 Thread Antoine Latter
That sounds pretty awesome to me.

Have you given any thought as to how you want to approach versioning?

Maybe I'm asking a silly question - I have very little real world experience
with relation databases and how to version schemas.

Antoine

On Sep 25, 2010 2:31 PM, Jonathan Geddes geddes.jonat...@gmail.com
wrote:
 Cafe,

 HaskellDB takes a database schema and produces Haskell data structures
 (plus some other query-related stuff for its EDSL query language).

 What I'm looking for is the inverse of this functionality. I want to
 create tables based on a Haskell data structure with a few simple
 rules. These rules include: if a field is not of the form `Maybe a'
 then it can't be nullable in the database. If a field is not a
 primitive (in the database) then it is actually stored in another
 table and a reference id is stored in the table. Tables are produced
 recursively, unless they already exist, etc.

 The HaskellDB approach is great for interfacing with existing tables,
 but in my case I already have data structures and now I would like a
 quick way to create tables to persist them.

 Does such a thing exist? If not, would you find it useful? I may take
 this up as a side project if it does not already exist and others
 would find it useful.

 Thanks,

 --Jonathan
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-25 Thread Jonathan Geddes
Versioning is a tricky problem regardless of how you are creating
tables. And that isn't the problem I was aiming to tackle; the problem
I was aiming to tackle is a bit more narrow than that: I have a record
and now I need a table to stick it in.

By the way, how does HaskellDB handle versioning?

--Jonathan

On Sat, Sep 25, 2010 at 1:57 PM, Antoine Latter aslat...@gmail.com wrote:
 That sounds pretty awesome to me.

 Have you given any thought as to how you want to approach versioning?

 Maybe I'm asking a silly question - I have very little real world experience
 with relation databases and how to version schemas.

 Antoine

 On Sep 25, 2010 2:31 PM, Jonathan Geddes geddes.jonat...@gmail.com
 wrote:
 Cafe,

 HaskellDB takes a database schema and produces Haskell data structures
 (plus some other query-related stuff for its EDSL query language).

 What I'm looking for is the inverse of this functionality. I want to
 create tables based on a Haskell data structure with a few simple
 rules. These rules include: if a field is not of the form `Maybe a'
 then it can't be nullable in the database. If a field is not a
 primitive (in the database) then it is actually stored in another
 table and a reference id is stored in the table. Tables are produced
 recursively, unless they already exist, etc.

 The HaskellDB approach is great for interfacing with existing tables,
 but in my case I already have data structures and now I would like a
 quick way to create tables to persist them.

 Does such a thing exist? If not, would you find it useful? I may take
 this up as a side project if it does not already exist and others
 would find it useful.

 Thanks,

 --Jonathan
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Inverse of HaskellDB

2010-09-25 Thread Jonathan Geddes
Have you given any thought as to how you want to approach versioning?

After giving this some more thought, I realized:
1) One of the best practices (even though I despise the term) of
versioning schema is to include a script with the code which checks
for each change to the tables, and makes the change if it is needed
[1]. Each time you check out new code, you run the script to ensure
that the code you are working with matches the tables you are working
with.
2) A system that generates tables from Haskell types could also be
made to check if a given table faithfully represents a given Haskell
record type. It could then make any changes to the table so that it
_does_ faithfully represent the record type.
3) In this way, your Haskell records ARE your table update script,
just (like most Haskell code) incredibly terse. Your usual code
repository will track when and by whom changes are made to the record.

Of course, there are some issues with this, but I think it could be
made to work well.

Hibernate does this, more or less, for Java classes.  That might be a
good place to look for ideas.

Good point. I'll start there.

[1] http://www.codeproject.com/KB/database/DatabaseSchemaVersioning.aspx

--Jonathan

On Sat, Sep 25, 2010 at 3:45 PM, Rogan Creswick cresw...@gmail.com wrote:
 On Sat, Sep 25, 2010 at 12:31 PM, Jonathan Geddes
 geddes.jonat...@gmail.com wrote:

 Does such a thing exist? If not, would you find it useful? I may take
 this up as a side project if it does not already exist and others
 would find it useful.


 I've been looking for something along these lines too.

 Hibernate does this, more or less, for Java classes.  That might be a
 good place to look for ideas.

 --Rogan

 Thanks,

 --Jonathan
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe