Use Postgres. If it makes sense later on, then try a NoSQL solution. Until then, Postgres will probably do 95% of what you want out of the box.

-Zack
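To make the relational option concrete, here is a minimal sketch of the "one wide table, many unused columns" layout that the quoted message worries about. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in so it runs without a server (in Postgres you would use BOOLEAN and DATE column types instead), and the table and column names (foo, is_bar, bar_weight, is_quux, quux_code) are hypothetical, loosely following the Foo/Bar/Quux naming from the thread.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one wide, sparsely populated table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE foo (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,      -- applies to every Foo
            created    TEXT NOT NULL,      -- ISO-8601 date stored as text
            is_bar     INTEGER NOT NULL DEFAULT 0,
            bar_weight REAL,               -- NULL unless is_bar = 1
            is_quux    INTEGER NOT NULL DEFAULT 0,
            quux_code  TEXT                -- NULL unless is_quux = 1
        )
        """
    )
    # Hypothetical sample rows; most of the conditional columns stay NULL.
    rows = [
        ("plain foo", "2013-01-10", 0, None, 0, None),
        ("a bar", "2013-03-02", 1, 2.5, 0, None),
        ("a quux", "2013-05-20", 1, 4.0, 1, "Q-7"),
        ("old quux", "2012-11-30", 1, 1.0, 1, "Q-1"),
    ]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO foo (name, created, is_bar, bar_weight, is_quux, quux_code)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )
    return conn


def quuxes_since(conn, date):
    """Return (name, quux_code) for Quuxes created on or after `date`."""
    cur = conn.execute(
        "SELECT name, quux_code FROM foo"
        " WHERE is_quux = 1 AND created >= ? ORDER BY created",
        (date,),
    )
    return cur.fetchall()


conn = build_demo_db()
result = quuxes_since(conn, "2013-01-01")
print(result)
```

The query reads a subset of Quuxes together with Foo-level columns without any join, and the less-than style date comparison works because ISO-8601 date strings sort chronologically. Whether the unused columns cost too much at your data volume is something only a benchmark on the real data can tell you.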
On Sunday, May 26, 2013 6:20:02 PM UTC-4, Amirouche Boubekki wrote:
>
>>> 1) Is it structured, i.e. can an object have several fields, possibly
>>> complex fields like lists or hashmaps but also integers? Dates and uuids
>>> can be emulated with strings and integers.
>>> 2) Do objects have relations? A lot of relations?
>>> 3) Is the data schema fixed at compilation, or do you need the
>>> schema to be dynamic?
>>>
>>
>> Much of the data is conditional in a certain sense -- if it's an X, it's
>> also a Y and it may be a W or a Z as well, but if it's a G it's certainly
>> not a W, etc.; though simply storing a large number of boolean columns that
>> may be unused by many of the table rows would be acceptable.
>>
>> The thing that makes me slightly dubious about relational here is that
>> there will necessarily either be many columns unused by many rows, as
>> there's a lot of data that's N/A unless certain other conditions are met;
>> or else there will be many whole tables and a lot of expensive joins, as we
>> have a table of Foos, with an isBar? column with either a BarID or a null,
>> and a table of Bars with an isBaz? column, and a table of Bazzes with an
>> isQuux? column, and then a need to do joins on *all* of those tables to run
>> a query over a subset of Quuxes and have access to some Foo table columns
>> in the results.
>>
>> This sort of thing points towards an object database more than any other
>> sort, with inherited fields from superclasses, or a map database that
>> performs well with lots of null/missing keys in most of the maps. But maybe
>> a relational DB table with very many columns but relatively few used by any
>> given row would perform OK.
>>
>
> The only kind of object database that does ACID across documents on the
> JVM that I know of is TinkerPop's Blueprints. Blueprints is an abstraction
> layer on top of many graph databases, among them Neo4j and OrientDB. The
> difference between a graph database and an object database is that
> «pointers» in a graph database are known at both ends. If you don't know
> graphs, you will need to learn a bit about them. Basically, if A is
> connected to B, B also knows about A being connected to it, which is not
> the case with a pointer. In other words, as in a relational database, you
> can ask for «all things connected to B» or «all things B connects to». The
> same query in an object database will cost more. On top of that it's
> schemaless, like an object database, but there is no notion of class like
> the one found in OO programming (even if you can model the graph to have
> the concept of classes).
>
>
>>>> The DB must be able to grow larger than available RAM without crashing
>>>> the JVM and the seqs resulting from queries like the above will also need
>>>> to be able to get bigger than RAM.
>>>>
>>>
>>>
>>>> My own research suggests that H2 may be a good choice, but it's a
>>>> standard SQL/relational DB and I'm not 100% sure that fits well with the
>>>> type of data and querying noted above. Note though that not all querying
>>>> will take that form; there'll also be strings, uuids, dates, and other
>>>> such field types and the need to query on these and to join on some of
>>>> them; also, to do less-than comparisons on dates.
>>>>
>>>
>>> Depending on your speed needs and the speed of the database, a kv store
>>> can be enough: you serialize the data as strings and deserialize it when
>>> you need to do computation. Except that kv stores are not easy to deal
>>> with when you have complex queries, but again it depends on the query.
>>>
>>
>> I expect they'd also have problems with transactional integrity if, say,
>> there was a power cut during an update. Anything involving "serialize the
>> data as strings" sounds unsuited to either the volume I'm envisioning or
>> the need for consistency. It certainly wouldn't do to overwrite the file
>> with half of an updated version of itself and then lose power! Keeping the
>> previous version around as a .bak file is scarcely much better. It pretty
>> much needs to be ACID since there will need to be coordinated changes to
>> more than one bit of the data sometimes and having an update interrupted
>> with only half the changes done, and having it stay in that half-done
>> state, would potentially be disastrous.
>>
>
> At least UnQLite is an embeddable kv store that is ACID across several
> keys, so you won't have data cut in half (based on what is advertised). I
> think Berkeley DB is also transactional.
>
> Also, I'm interested only in open source software, so there might be
> proprietary software that solves your problem best, but I doubt that ;)
>

--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.