Re: Local database recommendation?

Amirouche Boubekki Sun, 26 May 2013 15:20:25 -0700

>> 1) Is it structured, i.e. can an object have several fields, possibly
>> complex fields like lists or hashmaps, but also integers? Dates and UUIDs
>> can be emulated with strings and integers.
>> 2) Do objects have relations? A lot of relations?
>> 3) Is the data schema fixed at compile time, or do you need the schema
>> to be dynamic?
>>
>
> Much of the data is conditional in a certain sense -- if it's an X, it's
> also a Y and it may be a W or a Z as well, but if it's a G it's certainly
> not a W, etc.; though simply storing a large number of boolean columns that
> may be unused by many of the table rows would be acceptable.
>
> The thing that makes me slightly dubious about relational here is that
> there will necessarily either be many columns unused by many rows, as
> there's a lot of data that's N/A unless certain other conditions are met;
> or else there will be many whole tables and a lot of expensive joins, as we
> have a table of Foos, with an isBar? column with either a BarID or a null,
> and a table of Bars with an isBaz? column, and a table of Bazzes with an
> isQuux? column, and then a need to do joins on *all* of those tables to run
> a query over a subset of Quuxes and have access to some Foo table columns
> in the results.
>
> This sort of thing points towards an object database more than any other
> sort, with inherited fields from superclasses, or a map database that
> performs well with lots of null/missing keys in most of the maps. But maybe
> a relational DB table with very many columns but relatively few used by any
> given row would perform OK.
>


The only kind of object database I know of on the JVM that does ACID across
documents is TinkerPop's Blueprints. Blueprints is an abstraction layer on
top of many graph databases, including Neo4j and OrientDB. The difference
between a graph database and an object database is that «pointers» in a
graph database are known at both ends. If you don't know graphs, you will
need to learn a bit about them. Basically, if A is connected to B, B also
knows that A is connected to it, which is not the case with a pointer.
In other words, as in a relational database, you can ask for «all things
connected to B» or «all things B connects to»; the same query in an object
database will cost more. On top of that, it's schemaless like an object
database, but there is no notion of class similar to what is found in OO
programming (even if you can model the graph to have the concept of
classes).
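To make the "known at both ends" point concrete, here is a minimal Java sketch. `TinyGraph` is a hypothetical class for illustration only, NOT the Blueprints API. Every edge is recorded in both an outgoing and an incoming index, so both questions are a single lookup. A plain one-way pointer model has to scan every object to answer «all things connected to B».

```java
import java.util.*;

// Hypothetical sketch, NOT the Blueprints API: each edge is indexed at both
// ends, like a graph database, so both directions are cheap lookups.
class TinyGraph {
    // LinkedHashMap/LinkedHashSet only keep printed output in insertion order.
    private final Map<String, Set<String>> out = new LinkedHashMap<>();
    private final Map<String, Set<String>> in = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        out.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        in.computeIfAbsent(to, k -> new LinkedHashSet<>()).add(from);
    }

    // "All things B connects to": forward index lookup.
    Set<String> outgoing(String v) {
        return out.getOrDefault(v, Collections.emptySet());
    }

    // "All things connected to B": reverse index lookup, no scan needed.
    Set<String> incoming(String v) {
        return in.getOrDefault(v, Collections.emptySet());
    }

    // What a one-way pointer model must do instead: visit every object
    // and check its forward references.
    Set<String> incomingByScan(String v) {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> e : out.entrySet()) {
            if (e.getValue().contains(v)) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        TinyGraph g = new TinyGraph();
        g.addEdge("A", "B");
        g.addEdge("C", "B");
        g.addEdge("B", "D");
        System.out.println(g.outgoing("B"));       // [D]
        System.out.println(g.incoming("B"));       // [A, C]
        System.out.println(g.incomingByScan("B")); // [A, C], but via a full scan
    }
}
```

The cost of the reverse index is one extra write per edge. This is the trade-off a graph database makes so that «who points to B» stays cheap as the data grows.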


>
>>> The DB must be able to grow larger than available RAM without crashing the
>>> JVM and the seqs resulting from queries like the above will also need to be
>>> able to get bigger than RAM.
>>>
>>
>>
>>> My own research suggests that H2 may be a good choice, but it's a
>>> standard SQL/relational DB and I'm not 100% sure that fits well with the
>>> type of data and querying noted above. Note though that not all querying
>>> will take that form; there'll also be strings, UUIDs, dates, and other such
>>> field types, and the need to query on these and to join on some of them,
>>> as well as to do less-than comparisons on dates.
>>>
>>
>> Depending on your speed needs and the speed of the database, a kv store
>> can be enough: you serialize the data as strings and deserialize it when
>> you need to do computation. The catch is that kv stores are not easy to
>> deal with when you have complex queries, but again, it depends on the query.
>>
>
> I expect they'd also have problems with transactional integrity if, say,
> there was a power cut during an update. Anything involving "serialize the
> data as strings" sounds unsuited to either the volume I'm envisioning or
> the need for consistency. It certainly wouldn't do to overwrite the file
> with half of an updated version of itself and then lose power! Keeping the
> previous version around as a .bak file is scarcely much better. It pretty
> much needs to be ACID since there will need to be coordinated changes to
> more than one bit of the data sometimes and having an update interrupted
> with only half the changes done, and having it stay in that half-done
> state, would potentially be disastrous.
>

At least UnQLite is an embeddable kv store that is ACID across several keys,
so you won't have data cut in half (based on what is advertised). I think
Berkeley DB is also transactional.
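To show what "ACID across several keys" buys you against the half-written-file scenario, here is a minimal Java sketch. It uses a swapped-in technique: write the whole new state to a temp file, force it to disk, then atomically rename it over the old file. This is NOT how UnQLite or Berkeley DB work internally (they use journals or logs). `AtomicKvFile` is a hypothetical name. The sketch assumes a single writer process, a filesystem where `ATOMIC_MOVE` is supported, and it skips the directory fsync a fully durable rename would also need.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch: all-or-nothing multi-key commit for a tiny file-backed
// kv store (values serialized as strings via java.util.Properties).
// NOT UnQLite's or Berkeley DB's mechanism; single writer assumed.
class AtomicKvFile {
    private final Path file;

    AtomicKvFile(Path file) { this.file = file; }

    // Read the last committed state (empty if nothing was committed yet).
    Map<String, String> load() throws IOException {
        Properties p = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) { p.load(in); }
        }
        Map<String, String> m = new TreeMap<>();
        for (String k : p.stringPropertyNames()) m.put(k, p.getProperty(k));
        return m;
    }

    // Apply several key changes as one unit. A crash before the rename leaves
    // the old file untouched; after it, the new file is complete. There is
    // never a half-updated data file.
    void commit(Map<String, String> changes) throws IOException {
        Map<String, String> next = load();
        next.putAll(changes);
        Properties p = new Properties();
        p.putAll(next);
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (FileOutputStream out = new FileOutputStream(tmp.toFile())) {
            p.store(out, null);
            out.getFD().sync(); // push the temp file's bytes to the device
        }
        // On POSIX systems this is rename(2), which replaces the old file
        // atomically; may throw AtomicMoveNotSupportedException elsewhere.
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Rewriting the whole file on every commit is only reasonable for small data. A real embedded store logs just the changed pages or records, which is why it can stay ACID at the volumes discussed above.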

Also, I'm only interested in open-source software, so there might be
proprietary software that solves your problem better, but I doubt it ;)

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en