Use Postgres. If it makes sense later on, try a NoSQL solution. Until 
then, Postgres will probably do 95% of what you want out of the box. 
-Zack

On Sunday, May 26, 2013 6:20:02 PM UTC-4, Amirouche Boubekki wrote:
>
>
>>> 1) Is it structured, i.e. can an object have several fields, possibly 
>>> complex ones like lists or hashmaps, but also integers? Dates and UUIDs 
>>> can be emulated with strings and integers.
>>> 2) Do objects have relations? A lot of relations?
>>> 3) Is the data schema fixed at compilation, or does the schema need to 
>>> be dynamic?
>>>
>>
>> Much of the data is conditional in a certain sense -- if it's an X, it's 
>> also a Y and it may be a W or a Z as well, but if it's a G it's certainly 
>> not a W, etc.; though simply storing a large number of boolean columns that 
>> may be unused by many of the table rows would be acceptable.
>>
>> The thing that makes me slightly dubious about relational here is that 
>> there will necessarily either be many columns unused by many rows, as 
>> there's a lot of data that's N/A unless certain other conditions are met; 
>> or else there will be many whole tables and a lot of expensive joins, as we 
>> have a table of Foos, with an isBar? column with either a BarID or a null, 
>> and a table of Bars with an isBaz? column, and a table of Bazzes with an 
>> isQuux? column, and then a need to do joins on *all* of those tables to run 
>> a query over a subset of Quuxes and have access to some Foo table columns 
>> in the results.
>>  
>> This sort of thing points towards an object database more than any other 
>> sort, with inherited fields from superclasses, or a map database that 
>> performs well with lots of null/missing keys in most of the maps. But maybe 
>> a relational DB table with very many columns but relatively few used by any 
>> given row would perform OK.
>>
>
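
The wide-table option described above is easy to try out. A minimal sketch 
in Python, using the stdlib sqlite3 module as a stand-in for H2 or Postgres; 
the foo table and its columns are made up for illustration, and it says 
nothing about performance at your data volume:

```python
import sqlite3

# SQLite (stdlib) stands in for H2/Postgres here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE foo (
        id        INTEGER PRIMARY KEY,
        name      TEXT    NOT NULL,
        is_bar    INTEGER NOT NULL DEFAULT 0,
        bar_size  INTEGER,               -- NULL unless is_bar = 1
        is_quux   INTEGER NOT NULL DEFAULT 0,
        quux_note TEXT                   -- NULL unless is_quux = 1
    )
""")
conn.executemany(
    "INSERT INTO foo (name, is_bar, bar_size, is_quux, quux_note) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("plain",    0, None, 0, None),
        ("bar-only", 1, 10,   0, None),
        ("bar-quux", 1, 20,   1, "hello"),
    ],
)


def quux_rows(conn):
    """Return Foo-level and Quux-level columns for Quux rows, with no joins."""
    return conn.execute(
        "SELECT name, bar_size, quux_note FROM foo "
        "WHERE is_quux = 1 ORDER BY id"
    ).fetchall()
```

Calling quux_rows(conn) returns only the rows flagged as Quux, with the 
Foo-level name column alongside. The unused columns in the other rows are 
simply NULL.
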
> The only kind of object database I know of on the JVM that does ACID 
> across documents is TinkerPop's Blueprints. Blueprints is an abstraction 
> layer on top of many graph databases, among them Neo4j and OrientDB. The 
> difference between a graph database and an object database is that 
> «pointers» in a graph database are known at both ends. If you don't know 
> graphs you will need to learn a bit about them. Basically, if A is 
> connected to B, B also knows that A is connected to it, which is not the 
> case with a pointer. In other words, as in a relational database, you can 
> ask for «all things connected to B» or «all things B connects to». The 
> same query in an object database will cost more. On top of that it's 
> schemaless, like an object database, but there is no notion of class 
> similar to what is found in OO programming (even if you can model the 
> graph to have the concept of classes).
>  
>
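
To make the «known at both ends» point concrete, here is a toy in-memory 
sketch in Python. It is not the Blueprints API; the Graph class and its 
method names are hypothetical. Each edge is recorded in two indexes so that 
both directions can be looked up directly:

```python
from collections import defaultdict


class Graph:
    """Toy directed graph that indexes every edge at both ends."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # node -> nodes it points to
        self.incoming = defaultdict(set)  # node -> nodes pointing to it

    def connect(self, src, dst):
        # Record the edge on both sides, so dst "knows" about src too.
        self.outgoing[src].add(dst)
        self.incoming[dst].add(src)

    def connects_to(self, node):
        """All things `node` connects to."""
        return sorted(self.outgoing.get(node, ()))

    def connected_from(self, node):
        """All things connected to `node`."""
        return sorted(self.incoming.get(node, ()))


g = Graph()
g.connect("A", "B")
g.connect("C", "B")
g.connect("B", "D")
```

With a plain pointer from A to B, answering g.connected_from("B") would 
mean scanning every object. Here it is a single lookup in the incoming 
index.
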
>>
>>>> The DB must be able to grow larger than available RAM without crashing 
>>>> the JVM, and the seqs resulting from queries like the above will also 
>>>> need to be able to get bigger than RAM.
>>>>  
>>>  
>>>
>>>> My own research suggests that H2 may be a good choice, but it's a 
>>>> standard SQL/relational DB and I'm not 100% sure that fits well with the 
>>>> type of data and querying noted above. Note though that not all querying 
>>>> will take that form; there'll also be strings, uuids, dates, and other 
>>>> such field types and the need to query on these and to join on some of 
>>>> them; also, to do less-than comparisons on dates.
>>>>
>>>
>>> Depending on your speed needs and the speed of the database, a kv store 
>>> can be enough: you serialize the data as strings and deserialize it when 
>>> you need to do computation. That said, kv stores are not easy to deal 
>>> with when you have complex queries; again, it depends on the query.
>>>
>>
>> I expect they'd also have problems with transactional integrity if, say, 
>> there was a power cut during an update. Anything involving "serialize the 
>> data as strings" sounds unsuited to either the volume I'm envisioning or 
>> the need for consistency. It certainly wouldn't do to overwrite the file 
>> with half of an updated version of itself and then lose power! Keeping the 
>> previous version around as a .bak file is scarcely much better. It pretty 
>> much needs to be ACID since there will need to be coordinated changes to 
>> more than one bit of the data sometimes and having an update interrupted 
>> with only half the changes done, and having it stay in that half-done 
>> state, would potentially be disastrous.
>>
>
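
This all-or-nothing behaviour is what a transaction gives you. A minimal 
sketch in Python, with the stdlib sqlite3 module standing in for whichever 
ACID store you pick; the account table and the transfer function are 
hypothetical. A failure partway through (here a CHECK constraint, standing 
in for any mid-update failure) rolls back the half-applied change:

```python
import sqlite3

# ":memory:" keeps the demo self-contained; durability across a power cut
# would of course need a file-backed database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this second update fails, the first one is undone as well.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account"))
```

A transfer that would overdraw the source account credits the destination 
first, then fails on the debit. After the rollback, neither balance has 
changed.
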
> At least UnQLite is an embeddable kv store that is ACID across several 
> keys, so you won't have data cut in half (based on what is advertised). I 
> think Berkeley DB is also transactional.
>
> Also, I'm only interested in open-source software, so there might be 
> proprietary software that solves your problem better, but I doubt that ;)
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en