Re: Persistent storage

evins.mi...@gmail.com Fri, 19 Dec 2008 10:10:16 -0800

On Dec 18, 7:18 pm, "Mark McGranaghan" <mmcgr...@gmail.com> wrote:
> I've likewise thought a fair bit about this, but haven't been able to
> come up with a particularly satisfying solution.
>
> One approach I've considered is a watcher-type system where
> persistence is defined in terms of immutable snapshots and append-only
> journals: you snapshot the data to disk occasionally and otherwise
> maintain persistence by appending changes to a journal before
> committing; these changes can be replayed in the case of failure. This
> will probably require additional hooks into Clojure's MVCC
> implementation. This might actually be workable for smallish data
> sets, where it is reasonable to hold everything in memory.
>
> Anyone interested in the topic of "persistent data structures on disk"
> might want to look into the implementation of CouchDB. They currently
> use (IIUC) a persistent B tree on disk that uses crash-safe
> append-only modifications and is "garbage collected" by occasionally
> copying over all reachable portions of the tree into a new file. This
> may be interesting in and of itself, but I also expect that we'll see
> more interesting things in this space from the CouchDB project.
>
> - Mark
>
> On Thu, Dec 18, 2008 at 7:53 PM, Kyle Schaffrick <k...@raidi.us> wrote:
>
> > On Thu, 18 Dec 2008 18:06:40 -0500
> > Chouser <chou...@gmail.com> wrote:
>
> >> On Thu, Dec 18, 2008 at 4:47 PM, r <nbs.pub...@gmail.com> wrote:
>
> >> > Is it possible to use some kind of backend storage for Clojure's
> >> > data structures? I mean something like Perl's "tie" function that
> >> > makes data structures persistent (in sense of storage, not
> >> > immutability).
>
> >> > Such storage should be inherently immutable the way Clojure's data
> >> > are (so a simple wrapper on sql is probably not good enough) and
> >> > provide means of querying database and indexing (ideally
> >> > multidimensional).
>
> >> I would looove this.
>
> > This occurred to me the other day as well; the name "Mnejia" which
> > popped into my head says a lot about the sort of thing I had in mind :)
>
> >> > I wonder if this could be at library level or would rather have to
> >> > be hard-coded into Clojure itself. Did anyone try to do it?
>
> >> I've pondered a couple approaches, though only enough to find
> >> problems.
>
> >> One approach would act like a Clojure collection, with structural
> >> sharing on-disk.  This would be great because it would have
> >> multi-versioning and transaction features built right in.  It would
> >> also have the potential to cache some data in memory while managing
> >> reads and writes to disk.
>
> > This is an interesting observation.
>
> > Something in the vein of OTP's Mnesia for Clojure would be *very* cool
> > indeed, and I have been thinking a lot about ways to implement
> > distribution mechanisms for Clojure on top of which such a thing could
> > be built. I imagine however that even sans distribution it would be
> > quite powerful and useful, and a fun project.
>
> > [ Mostly off-topic musings follow :) ]
>
> > The big problem with mimicking Mnesia for distribution is that a lot of
> > Erlang distribution idioms (used heavily in Mnesia AFAIK) rely on BEAM's
> > ability to marshall funs across the network (and indeed I think Mnesia
> > can even use that to serialize them on disk tables). If serialization of
> > a fun is possible in Clojure, doing it is way over my head :) Obviously
> > if you can serialize the sexp before the compiler gets ahold of it, this
> > is easy, but usually you don't get that lucky.
>
> > If one were able to marshall a Clojure fun, I had envisioned
> > constructing a sort of "distributed send", probably built atop one of
> > the many good message-queueing protocols already extant, that can be
> > used to cause that fun to be run on a remote Agent, giving you a more
> > Clojure-flavored distribution mechanism. Not RPC, but not exactly Actor
> > model either.
>
> > Hmmmm... :)
>
> > -Kyle


The largest project for which I'm using Clojure currently uses a
CouchDB instance for server storage and uses neo4j for single-user
persistent storage. Both are working well for me so far.

They aren't transparent, but, then again, I'm not sure that any
persistent storage mechanism is. Based on past experience, I prefer to
have all operations that alter persistent storage go through a fairly
small number of API calls, so that the search space is small when it's
time to fix bugs. Writing that API, and then ensuring that everything
in the system uses it, is about as much work as writing the
data-marshaling subsystem that you need for storage that requires
marshaling, so in that sense it's no loss.

There may even be a disadvantage to a storage subsystem that purports
to be "transparent": if the only way to get objects into and out of
the store is by marshaling them through your API, then there won't be
rogue code somewhere using some other API to scribble something wrong
in your database. If you use a "transparent" storage layer, then there
are ways other than your API to get objects into and out of persistent
storage, and you therefore have that much more code to search when
fixing bugs.

Neo is a "graph database"; it stores objects that represent named
links between other values. To marshal data to Neo, you convert your
objects to DAGs of these objects.

CouchDB is a "document database". It stores "documents", each of which
is a JSON object, so you have to convert your objects to JSON objects.

I have a model subsystem that defines types and operations used to
represent application data, and a store subsystem that defines an API
for storing and loading these values.  In addition, there are two
different back-ends for the storage subsystem, one that talks to Couch
and one that talks to Neo. Each of them handles the data marshaling
for its particular store.
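
A minimal sketch of that arrangement, not the actual code from the
project: one small store API defined with multimethods, and one
back-end per store that does its own marshaling. Every name here
(save-value!, load-value, memory-store, the :backend key) is an
assumption made up for illustration, and the in-memory back-end is
only a stand-in for one that would talk to Couch or Neo.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical store API. A store is a plain map with a :backend key,
;; and both multimethods dispatch on that key.
(defmulti save-value! (fn [store id value] (:backend store)))
(defmulti load-value  (fn [store id] (:backend store)))

;; Stand-in back-end that keeps marshaled values in an atom.
;; A real back-end would call out to Couch or Neo here instead.
(defn memory-store []
  {:backend :memory
   :data    (atom {})})

(defmethod save-value! :memory [store id value]
  ;; Marshal to a string on the way in, as a document store would need.
  (swap! (:data store) assoc id (pr-str value))
  id)

(defmethod load-value :memory [store id]
  ;; Unmarshal on the way out; returns nil when the id is unknown.
  (when-let [s (get @(:data store) id)]
    (edn/read-string s)))

(comment
  ;; Usage: callers only ever see save-value! and load-value.
  (def s (memory-store))
  (save-value! s "note-1" {:title "hello" :tags #{:a :b}})
  (load-value s "note-1"))
```

Adding a second back-end means adding two more defmethod forms under a
different :backend value; code that calls the API does not change.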

I could see it being useful to add something like Python's pickles to
Clojure. Pickles are handy for very simple cases, when you want to
store a little bit of not-very-elaborate data, and you aren't worried
about ACID-type issues. It wouldn't solve the case where you need ACID
compliance, but I'm not sure there's a good way to do that at the
language level. Probably individual developers are going to want to
solve those issues in different ways, depending on their particular
applications. I've seen that already in my own application, where I'm
using one storage back-end for the shared multiuser case and a
different one for the single-user local case.
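
For the simple pickle-like case, a rough sketch using only the
printer and reader: write a value's printed form to a file and read
it back. The helper names pickle! and unpickle are hypothetical;
spit, slurp, pr-str and clojure.edn/read-string are standard Clojure.
It assumes the value is plain data (maps, vectors, sets, keywords,
strings, numbers) and makes no attempt at atomic writes.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical helper: write value to path as its printed form.
(defn pickle!
  [path value]
  (spit path (pr-str value)))

;; Hypothetical helper: read back a value written by pickle!.
(defn unpickle
  [path]
  (edn/read-string (slurp path)))

(comment
  ;; Usage: round-trip a small map through a file.
  (pickle! "/tmp/prefs.edn" {:user "r" :recent [1 2 3]})
  (unpickle "/tmp/prefs.edn"))
```

A crash in the middle of spit can leave a truncated file, which is
exactly the kind of ACID-type issue this approach does not address.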