Re: Local database recommendation?

2013-06-12 Thread Zack Maril
No. We have done nothing with laziness. If your graph doesn't change much, 
you could probably roll your own. If it is changing often, I'm not sure 
laziness would be such a good thing. One way of rolling your own could be 
going through and getting all the ids of the elements you want and then 
mapping across those lazily while doing any work you want with the objects 
they represent. That shouldn't be too difficult to do, and all your ids 
might easily fit into memory without any problems. 
-Zack
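
A minimal sketch of the roll-your-own approach described above: collect the ids first, then map over them lazily so each element is loaded only when it is consumed. The names `fake-store`, `all-vertex-ids`, and `load-vertex` are hypothetical stand-ins, not Titanium functions; swap in whatever your graph library actually provides.

```clojure
;; Hypothetical in-memory stand-in for the graph; NOT a Titanium API.
(def fake-store
  (into {} (for [id (range 1000)]
             [id {:id id :label (str "node-" id)}])))

(defn all-vertex-ids
  "Hypothetical: returns every id. Ids are small, so holding them is cheap."
  []
  (keys fake-store))

(defn load-vertex
  "Hypothetical: fetches the full object for one id."
  [id]
  (get fake-store id))

(defn lazy-vertices
  "Lazy seq of vertices; each one is loaded only as the seq is consumed."
  [ids]
  (map load-vertex ids))

;; Only the first few vertices need to be loaded here.
(take 3 (lazy-vertices (sort (all-vertex-ids))))
```

Note that `map` may realize elements in chunks when its input is a chunked seq, so "lazy" here means "a bounded batch at a time" rather than strictly one at a time.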

On Wednesday, June 12, 2013 10:30:02 PM UTC-4, Cedric Greevey wrote:
>
> I have some additional questions about Titanium, as the documentation did 
> not make these particular matters sufficiently clear:
>
> 1. Can query results take the form of a lazy sequence, one which will not 
> result in an OOME if it's too large so long as the head is not held onto 
> while it is consumed? E.g. (take 3 (run-query-with-ten-zillion-results)) 
> should not blow up.
>
> 2. In particular, can indexed-key searches do so?
>
> 3. Given a backing DB that supports global queries of the whole graph, and 
> supposing the graph was ginormous and a query returned all or most of the 
> nodes, could *those* be lazily consumed in an OOME-avoiding manner?
>  

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Local database recommendation?

2013-06-12 Thread Cedric Greevey
I have some additional questions about Titanium, as the documentation did
not make these particular matters sufficiently clear:

1. Can query results take the form of a lazy sequence, one which will not
result in an OOME if it's too large so long as the head is not held onto
while it is consumed? E.g. (take 3 (run-query-with-ten-zillion-results))
should not blow up.

2. In particular, can indexed-key searches do so?

3. Given a backing DB that supports global queries of the whole graph, and
supposing the graph was ginormous and a query returned all or most of the
nodes, could *those* be lazily consumed in an OOME-avoiding manner?
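
To make question 1 concrete, here is a rough sketch of what such a lazy result seq could look like if it were built by paging. `fetch-page` is a hypothetical stand-in that fabricates results; a real version would run a bounded query against the database. Nothing here is Titanium's API.

```clojure
(defn fetch-page
  "Hypothetical page fetcher: returns up to `page-size` fake results
  starting at `offset`. A real one would query the database."
  [offset page-size]
  (let [total 1000000000]
    (range offset (min total (+ offset page-size)))))

(defn query-results
  "Lazy seq over all pages; the next page is fetched only when needed."
  ([] (query-results 0 100))
  ([offset page-size]
   (lazy-seq
     (let [page (fetch-page offset page-size)]
       (when (seq page)
         (concat page
                 (query-results (+ offset page-size) page-size)))))))

;; Only the first page is ever fetched.
(take 3 (query-results)) ;=> (0 1 2)
```

As long as nothing holds a reference to the head of the seq, pages that have already been consumed can be garbage collected, which is the OOME-avoiding behavior being asked about.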





Re: Local database recommendation?

2013-05-29 Thread Cedric Greevey
Thanks for the various suggestions. This now needs hammock time.


On Mon, May 27, 2013 at 11:22 AM, gaz jones wrote:

> Sqlite is worth a look. Never used it with the JVM, but I assume there is
> a JDBC driver for it.
>
>
> On Mon, May 27, 2013 at 1:01 AM, Zack Maril  wrote:
>
>> Use postgres. If it makes sense later on, then try a nosql solution.
>> Until then, postgres will probably do 95% of what you want out of the box.
>> -Zack

Re: Local database recommendation?

2013-05-27 Thread gaz jones
Sqlite is worth a look. Never used it with the JVM, but I assume there is a
JDBC driver for it.


On Mon, May 27, 2013 at 1:01 AM, Zack Maril  wrote:

> Use postgres. If it makes sense later on, then try a nosql solution. Until
> then, postgres will probably do 95% of what you want out of the box.
> -Zack

Re: Local database recommendation?

2013-05-26 Thread Zack Maril
Use postgres. If it makes sense later on, then try a nosql solution. Until 
then, postgres will probably do 95% of what you want out of the box. 
-Zack


Re: Local database recommendation?

2013-05-26 Thread Amirouche Boubekki
>> 1) Is it structured aka. an object can have several fields possibly
>> complex fields like list or hashmaps but also integers ? dates and uuids
>> can be emulated with strings and integers
>> 2) Do objects have relations ? a lot of relations ?
>> 3) is the data schema fixed at compilation or do you need to have the
>> schema to be dynamic ?
>>
>
> Much of the data is conditional in a certain sense -- if it's an X, it's
> also a Y and it may be a W or a Z as well, but if it's a G it's certainly
> not a W, etc.; though simply storing a large number of boolean columns that
> may be unused by many of the table rows would be acceptable.
>
> The thing that makes me slightly dubious about relational here is that
> there will necessarily either be many columns unused by many rows, as
> there's a lot of data that's N/A unless certain other conditions are met;
> or else there will be many whole tables and a lot of expensive joins, as we
> have a table of Foos, with an isBar? column with either a BarID or a null,
> and a table of Bars with an isBaz? column, and a table of Bazzes with an
> isQuux? column, and then a need to do joins on *all* of those tables to run
> a query over a subset of Quuxes and have access to some Foo table columns
> in the results.
>
> This sort of thing points towards an object database more than any other
> sort, with inherited fields from superclasses, or a map database that
> performs well with lots of null/missing keys in most of the maps. But maybe
> a relational DB table with very many columns but relatively few used by any
> given row would perform OK.
>

The only kind of object database that does ACID across documents on the JVM
that I know of is TinkerPop's Blueprints. Blueprints is an abstraction layer
on top of many graph databases, among them Neo4j and OrientDB. The difference
between a graph database and an object database is that «pointers» in a
graph database are known at both ends. If you don't know graphs you will
need to learn a bit about them. Basically, if A is connected to B, B also
knows that A is connected to it, which is not the case with a pointer.
In other words, as in a relational database, you can ask for «all things
connected to B» or «all things B connects to». The same query in an object
database will cost more. On top of that it's schemaless, like an object
database, but there is no notion of class similar to what is found in OO
programming (even if you can model the graph to have the concept of
classes).
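
A toy sketch of the "known at both ends" idea, using plain Clojure maps rather than any real graph database. The names `empty-graph`, `add-edge`, `connects-to`, and `connected-from` are made up for illustration and are not part of Blueprints.

```clojure
;; Each edge is indexed twice: once by its source, once by its target.
(def empty-graph {:out {} :in {}})

(defn add-edge
  "Records the edge a->b in both directions so either end can find it."
  [g a b]
  (-> g
      (update-in [:out a] (fnil conj #{}) b)
      (update-in [:in b] (fnil conj #{}) a)))

(defn connects-to
  "All things `v` connects to."
  [g v]
  (get-in g [:out v] #{}))

(defn connected-from
  "All things connected to `v`."
  [g v]
  (get-in g [:in v] #{}))

(def g (-> empty-graph
           (add-edge :a :b)
           (add-edge :c :b)
           (add-edge :b :d)))

(connected-from g :b) ; the set containing :a and :c
(connects-to g :b)    ; the set containing :d
```

With a plain pointer only the `:out` index would exist, so answering "what points at B?" would mean scanning every object.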


>
>>> The DB must be able to grow larger than available RAM without crashing the
>>> JVM and the seqs resulting from queries like the above will also need to be
>>> able to get bigger than RAM.
>>>
>>
>>
>>> My own research suggests that H2 may be a good choice, but it's a
>>> standard SQL/relational DB and I'm not 100% sure that fits well with the
>>> type of data and querying noted above. Note though that not all querying
>>> will take that form; there'll also be strings, uuids, dates, and other such
>>> field types and the need to query on these and to join on some of them;
>>> also, to do less-than comparisons on dates.
>>>
>>
>> Depending on your speed needs and the speed of the database, a kv store
>> can be enough, you serialize the data as strings and deserialize it when
>> you need to do computation. Except that kv store are not easy to deal with
>> when you have complex queries, but again it depends on the query.
>>
>
> I expect they'd also have problems with transactional integrity if, say,
> there was a power cut during an update. Anything involving "serialize the
> data as strings" sounds unsuited to either the volume I'm envisioning or
> the need for consistency. It certainly wouldn't do to overwrite the file
> with half of an updated version of itself and then lose power! Keeping the
> previous version around as a .bak file is scarcely much better. It pretty
> much needs to be ACID since there will need to be coordinated changes to
> more than one bit of the data sometimes and having an update interrupted
> with only half the changes done, and having it stay in that half-done
> state, would potentially be disastrous.
>

At least UnQLite is an embeddable kv store that is ACID across several keys,
so you won't have data cut in half (based on what is advertised). I think
Berkeley DB is also transactional.

Also, I'm interested only in open-source software, so there might be
proprietary software that solves your problem best, but I doubt that ;)


Re: Local database recommendation?

2013-05-26 Thread fmjrey


On Sunday, May 26, 2013 9:11:03 PM UTC+2, Cedric Greevey wrote:
>
> On Sun, May 26, 2013 at 1:56 PM, James Thornton wrote:
>
>>
>>
>> On Sunday, May 26, 2013 12:14:22 PM UTC-5, Cedric Greevey wrote:
>>
>>> On Sun, May 26, 2013 at 11:33 AM, James Thornton wrote:
>>>
>>>> Hi Cedric -
>>>>
>>>> Look at Datomic free edition or the Titan graph database using
>>>> either Berkeley DB as its backend datastore or Cassandra in single-server
>>>> mode -- you can run both locally.
>>>>
>>>> Datomic: http://www.datomic.com/
>>>> Docs: http://docs.datomic.com/
>>>> Clojure Client: http://docs.datomic.com/clojure/index.html
>>>> Videos: http://www.datomic.com/videos.html
>>>> Blog: http://blog.datomic.com/
>>>> Query Language: http://docs.datomic.com/query.html (Datalog)

>>>
>>> The page seems to imply that the free edition DB has to fit in main 
>>> memory.
>>>
>>
>>
>> No, Datomic free edition does not have to fit in main memory -- the 
>> difference between Pro and Free are explained here:  
>> http://www.datomic.com/pricing.html
>>
>> The Datomic Free peer library includes a memory database and embedded 
>> Datomic Datalog.
>>
>
> No mention of a non-memory database. 
>
 
The peer (the reader, e.g. your application) uses a memory database, not the 
transactor (the single writer, the server all peers connect to).
See 
https://groups.google.com/forum/?fromgroups#!searchin/datomic/free$20memory/datomic/ay3S0M7mVSg/gbbCe0UEnnEJ

Based on what you explained in other posts, I again encourage you to 
check out Datomic and its notion of datoms. The video I mentioned earlier 
explains it well and should make you see its value.
OrientDB is also a suitable choice as it's lightweight, embeddable, and 
fast. It's rooted in object databases, and is able to act as a graph or 
document db, whichever is more suitable for your case. There's a Clojure 
OrientDB library somewhere if you google it, and the Blueprints API + 
TinkerPop stack could be a nice feature.





Re: Local database recommendation?

2013-05-26 Thread Cedric Greevey
On Sun, May 26, 2013 at 2:45 PM, Amirouche Boubekki <
amirouche.boube...@gmail.com> wrote:

>
>
>
> 2013/5/26 Cedric Greevey 
>
>> I may be developing an application which will need a persistent,
>> ACID
>
> which means at least transactional, are you sure you need that?
> Depending on the database, ACID means different things. Do you need data
> integrity across «documents», which means that a transaction must span
> modifications to several objects, and if a failure happens everything should
> be rolled back or not persisted?
>
>

Yes, I need things not to be able to get left half-done. :)


>> local database (on the same disk as the application, rather than having
>> to be accessed over the network)
>>
>
> which means embedded
>
>
>> containing information about potentially 100,000-1,000,000 (or more)
>> objects.
>>
>
> which means relatively big
>

I expect a few GB to a few tens of GB in practice. Chump change,
disk-space-wise, but just a bit too big to want to try loading it all into
RAM at once, even on the 8GB development box here. It would probably work,
but run like a pig and make the rest of the system horribly slow due to
paging.

>> Much of that information will be of a quasi-boolean character: "is it an X
>> or not?" for various choices of X, but with "yes", "no", "borderline", and
>> "not yet evaluated" as the four possible values. It will be desirable to
>> query for these, for example to get a lazy seq of all objects for which
>> it's a borderline Y or for which it's not yet evaluated whether it's a Z or
>> for which it's either "yes" or "borderline" on whether it's an X or
>> whatever.
>>
>
> It seems like loosely structured data for which a key/value store (also
> known as a kv store) might be great
>
>
>> I'm not that familiar with the local-DB solutions out there. I'd like a
>> recommendation for one which is
>> a) a good fit for Clojure use
>
> I'm not sure about Clojure specifics related to binding C/C++
> databases, but in Python it's some ctypes (or else) definitions away.
>
>
>> and b) a good fit for the type of data and queries noted above.
>>
>
> You are not very specific about the queries and the data.
>
> 1) Is it structured aka. an object can have several fields possibly
> complex fields like list or hashmaps but also integers ? dates and uuids
> can be emulated with strings and integers
> 2) Do objects have relations ? a lot of relations ?
> 3) is the data schema fixed at compilation or do you need to have the
> schema to be dynamic ?
>

Much of the data is conditional in a certain sense -- if it's an X, it's
also a Y and it may be a W or a Z as well, but if it's a G it's certainly
not a W, etc.; though simply storing a large number of boolean columns that
may be unused by many of the table rows would be acceptable.

The thing that makes me slightly dubious about relational here is that
there will necessarily either be many columns unused by many rows, as
there's a lot of data that's N/A unless certain other conditions are met;
or else there will be many whole tables and a lot of expensive joins, as we
have a table of Foos, with an isBar? column with either a BarID or a null,
and a table of Bars with an isBaz? column, and a table of Bazzes with an
isQuux? column, and then a need to do joins on *all* of those tables to run
a query over a subset of Quuxes and have access to some Foo table columns
in the results.

This sort of thing points towards an object database more than any other
sort, with inherited fields from superclasses, or a map database that
performs well with lots of null/missing keys in most of the maps. But maybe
a relational DB table with very many columns but relatively few used by any
given row would perform OK.
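
For what it's worth, the "map with lots of missing keys" shape can be sketched in plain Clojure. This is only an in-memory illustration of the data model and the four-valued flags, under the assumption that a missing key stands for "not yet evaluated"; it says nothing about how any particular database would store or index it. All names are made up for the example.

```clojure
;; Sparse records: a flag that has not been evaluated is simply absent.
(def objects
  [{:id 1 :x :yes :y :borderline}
   {:id 2 :x :no}
   {:id 3 :x :borderline :z :yes}
   {:id 4}])

(defn status
  "Status of flag `k` on object `o`; a missing key means not yet evaluated."
  [o k]
  (get o k :not-evaluated))

(defn with-status
  "Lazy seq of the objects whose flag `k` has one of the `wanted` statuses."
  [objs k wanted]
  (filter #(contains? wanted (status % k)) objs))

;; "yes" or "borderline" on X:
(map :id (with-status objects :x #{:yes :borderline})) ;=> (1 3)
;; not yet evaluated on Y:
(map :id (with-status objects :y #{:not-evaluated}))   ;=> (2 3 4)
```

Because `filter` is lazy, the same query shape works over a lazily produced seq of records as well as over an in-memory vector.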

>> The DB must be able to grow larger than available RAM without crashing the
>> JVM and the seqs resulting from queries like the above will also need to be
>> able to get bigger than RAM.
>>
>
>
>> My own research suggests that H2 may be a good choice, but it's a
>> standard SQL/relational DB and I'm not 100% sure that fits well with the
>> type of data and querying noted above. Note though that not all querying
>> will take that form; there'll also be strings, uuids, dates, and other such
>> field types and the need to query on these and to join on some of them;
>> also, to do less-than comparisons on dates.
>>
>
> Depending on your speed needs and the speed of the database, a kv store
> can be enough, you serialize the data as strings and deserialize it when
> you need to do computation. Except that kv store are not easy to deal with
> when you have complex queries, but again it depends on the query.
>
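
For reference, the "serialize as strings" idea from the quoted paragraph looks roughly like this in Clojure. The atom `kv` is only a stand-in for a real kv store, and `put-record!` / `get-record` are hypothetical helper names; this shows the round trip, not durability or transactions.

```clojure
(require '[clojure.edn :as edn])

;; Stand-in for a real kv store: string keys mapped to string values.
(def kv (atom {}))

(defn put-record!
  "Serializes `record` to an EDN string and stores it under key `k`."
  [k record]
  (swap! kv assoc k (pr-str record))
  nil)

(defn get-record
  "Reads the string stored under `k` back into Clojure data, or nil."
  [k]
  (some-> (get @kv k) edn/read-string))

(put-record! "obj-1" {:id 1 :x :borderline :tags ["a" "b"]})
(get-record "obj-1")
```

Any query more complex than "look up by key" means deserializing and scanning values, which is the difficulty with complex queries mentioned above.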

I expect they'd also have problems with transactional integrity if, say,
there was a power cut during an update. Anything involving "serialize the
data as strings" sounds unsuited to either the volume I'm envisioning or
the need for consistency. It certainly wouldn't do to overwrite the

Re: Local database recommendation?

2013-05-26 Thread Cedric Greevey
On Sun, May 26, 2013 at 1:56 PM, James Thornton wrote:

>
>
> On Sunday, May 26, 2013 12:14:22 PM UTC-5, Cedric Greevey wrote:
>
>> On Sun, May 26, 2013 at 11:33 AM, James Thornton wrote:
>>
>>> Hi Cedric -
>>>
>>> Look at Datomic free edition or the Titan graph database using
>>> either Berkeley DB as its backend datastore or Cassandra in single-server
>>> mode -- you can run both locally.
>>>
>>> Datomic: http://www.datomic.com/
>>> Docs: http://docs.datomic.com/
>>> Clojure Client: http://docs.datomic.com/clojure/index.html
>>> Videos: http://www.datomic.com/videos.html
>>> Blog: http://blog.datomic.com/
>>> Query Language: http://docs.datomic.com/query.html (Datalog)
>>>
>>
>> The page seems to imply that the free edition DB has to fit in main
>> memory.
>>
>
>
> No, Datomic free edition does not have to fit in main memory -- the
> differences between Pro and Free are explained here:
> http://www.datomic.com/pricing.html
>
> The Datomic Free peer library includes a memory database and embedded
> Datomic Datalog.
>

No mention of a non-memory database.

> The Datomic Free transactor includes an embedded durable storage engine.
> Datomic Free does not support any external storage services. The Datomic
> Free transactor is limited to 2 simultaneous peers. Even with those limits,
> it's still a quite capable system. Because the components are
> redistributable, it's great for applications that you want to share.
>
> If you have questions, here is the Datomic mailing list:
> https://groups.google.com/forum/?fromgroups=#!forum/datomic
>
>
>>
>>
>>> Titan: http://thinkaurelius.github.io/titan/
>>> Repo: https://github.com/thinkaurelius/titan
>>> Clojure Client: https://github.com/clojurewerkz/archimedes
>>> Blog: http://thinkaurelius.com/blog/
>>> Query Language: https://github.com/tinkerpop/gremlin/wiki (Gremlin)
>>>
>>> See the Resources section of the TinkerPop Book website for a collection
>>> of Titan videos and tutorials: http://www.tinkerpopbook.com/#resources
>>>
>>
>> That seems more promising, but there seems to be no documentation to
>> speak of for Archimedes and precious little for Titan, at least not without
>> devoting substantial time and bandwidth to viewing videos.
>>
>> It's unclear, then, how I'd go about assembling everything into a clooj
>> project that would find all of its dependencies, nor how I'd use the API to
>> represent, query, add, change, etc. the data. (That tends to happen when no
>> API documentation seems to be linked from anywhere. :))
>>
>> Long story short -- it seems that this stuff is either a) not ready for
>> prime time yet, b) targeted predominantly at people that are already at
>> expert proficiency working with graph databases with little concession for
>> learnability/usability by others, c) targeted predominantly at people using
>> Groovy rather than Clojure, or d) some combination of these things. :(
>>
>
> If you click through to the Titan wiki, you'll find extensive
> documentation: https://github.com/thinkaurelius/titan/wiki
>

Wikis typically make useful references but poor getting-started guides, so
when I'm looking for the latter I tend to skip them. Compare Oracle's Java
Tutorial with the Javadocs for the standard library: if I want material like
the former, a wiki is likely to resemble the latter, which is of little use
until you're already up and running and need to look up something you
already know about.


> And for a Quickstart guide to get Titan up and running in 5 mins, check
> out Marko's blog:
>
> "Titan Server: From a Single Server to a Highly Available Cluster"
>
> http://thinkaurelius.com/2013/03/30/titan-server-from-a-single-server-to-a-highly-available-cluster/
>
> Titan is the first native TinkerPop-Blueprints DB so you connect to it and
> interact with it like any other Blueprints graph database (
> https://github.com/tinkerpop/blueprints/wiki).
>
> Think of Gremlin as a domain-specific language for graphs you use in
> harmony with your native programming language. The original Gremlin was
> written in Groovy and it's what most people use, but Zack Maril and the
> ClojureWerkz team recently released a Gremlin-Clojure library called Ogre (
> https://github.com/clojurewerkz/ogre).
>
> Here are Ogre's docs: http://ogre.clojurewerkz.org/
>
> BTW: I should have pointed you to Titanium instead
> of Archimedes -- Archimedes is a lower-level library for connecting to
> any Blueprints database. Titanium is a higher-level, Titan-specific library
> built on

Re: Local database recommendation?

2013-05-26 Thread fmjrey
I would definitely check out Datomic. The free edition uses H2 internally 
to persist to disk; it's not just a memory db. The datalog query language 
for Datomic is great, and you should definitely give it some thought. A 
good overview video on Datomic, which I recommend, is listed on their site: 
*Datomic, and How We Built It*.
If you think a graph database might be suitable for your case, you may want 
to check out this comparison page: 
http://architects.dzone.com/articles/16-graph-databases-compared.
OrientDB (http://www.orientdb.org/) might be a suitable lightweight and 
mature option. It can be embedded and includes various query options: 
Traverser API, Blueprints, Rexster, Gremlin, and its own SQL-like query 
language.
In the case of a graph database, the Blueprints API may give you some 
added benefits, as explained here: 
http://cloud.dzone.com/articles/get-started-tinkerpop. Datomic can also be 
accessed through the Blueprints API.





Re: Local database recommendation?

2013-05-26 Thread Amirouche Boubekki
2013/5/26 Amirouche Boubekki 

> I think Blueprints from the TinkerPop stack would be best suited.
>

If a graph is needed.





Re: Local database recommendation?

2013-05-26 Thread Amirouche Boubekki
I think Blueprints from the TinkerPop stack would be best suited.


2013/5/26 James Thornton 

>
>
> On Sunday, May 26, 2013 12:14:22 PM UTC-5, Cedric Greevey wrote:
>
>> On Sun, May 26, 2013 at 11:33 AM, James Thornton wrote:
>>
>>> Hi Cedric -
>>>
>>> Look at Datomic free edition or the Titan graph database using
>>> either Berkeley DB as its backend datastore or Cassandra in single-server
>>> mode -- you can run both locally.
>>>
>>> Datomic: http://www.datomic.com/
>>> Docs: http://docs.datomic.com/
>>> Clojure Client: http://docs.datomic.com/clojure/index.html
>>> Videos: http://www.datomic.com/videos.html
>>> Blog: http://blog.datomic.com/
>>> Query Language: http://docs.datomic.com/query.html (Datalog)
>>>
>>
>> The page seems to imply that the free edition DB has to fit in main
>> memory.
>>
>
>
> No, Datomic free edition does not have to fit in main memory -- the
> differences between Pro and Free are explained here:
> http://www.datomic.com/pricing.html
>
> The Datomic Free peer library includes a memory database and embedded
> Datomic Datalog.
>
> The Datomic Free transactor includes an embedded durable storage engine.
> Datomic Free does not support any external storage services. The Datomic
> Free transactor is limited to 2 simultaneous peers. Even with those limits,
> it's still a quite capable system. Because the components are
> redistributable, it's great for applications that you want to share.
>
> If you have questions, here is the Datomic mailing list:
> https://groups.google.com/forum/?fromgroups=#!forum/datomic
>
>
>>
>>
>>> Titan: http://thinkaurelius.github.io/titan/
>>> Repo: https://github.com/thinkaurelius/titan
>>> Clojure Client: https://github.com/clojurewerkz/archimedes
>>> Blog: http://thinkaurelius.com/blog/
>>> Query Language: https://github.com/tinkerpop/gremlin/wiki (Gremlin)
>>>
>>> See the Resources section of the TinkerPop Book website for a collection
>>> of Titan videos and tutorials: http://www.tinkerpopbook.com/#resources
>>>
>>
>> That seems more promising, but there seems to be no documentation to
>> speak of for Archimedes and precious little for Titan, at least not without
>> devoting substantial time and bandwidth to viewing videos.
>>
>> It's unclear, then, how I'd go about assembling everything into a clooj
>> project that would find all of its dependencies, nor how I'd use the API to
>> represent, query, add, change, etc. the data. (That tends to happen when no
>> API documentation seems to be linked from anywhere. :))
>>
>> Long story short -- it seems that this stuff is either a) not ready for
>> prime time yet, b) targeted predominantly at people that are already at
>> expert proficiency working with graph databases with little concession for
>> learnability/usability by others, c) targeted predominantly at people using
>> Groovy rather than Clojure, or d) some combination of these things. :(
>>
>
> If you click through to the Titan wiki, you'll find extensive
> documentation: https://github.com/thinkaurelius/titan/wiki
>
> And for a Quickstart guide to get Titan up and running in 5 mins, check
> out Marko's blog:
>
> "Titan Server: From a Single Server to a Highly Available Cluster"
>
> http://thinkaurelius.com/2013/03/30/titan-server-from-a-single-server-to-a-highly-available-cluster/
>
> Titan is the first native TinkerPop-Blueprints DB so you connect to it and
> interact with it like any other Blueprints graph database (
> https://github.com/tinkerpop/blueprints/wiki).
>
> Think of Gremlin as a domain-specific language for graphs you use in
> harmony with your native programming language. The original Gremlin was
> written in Groovy and it's what most people use, but Zack Maril and the
> ClojureWerkz team recently released a Gremlin-Clojure library called Ogre (
> https://github.com/clojurewerkz/ogre).
>
> Here are Ogre's docs: http://ogre.clojurewerkz.org/
>
> BTW: I should have pointed you to Titanium instead
> of Archimedes -- Archimedes is a lower-level library for connecting to
> any Blueprints database. Titanium is a higher-level, Titan-specific library
> built on top of Archimedes, and it's well documented:
>
> Docs: http://titanium.clojurewerkz.org/
> Repo: https://github.com/clojurewerkz/titanium
>
> If you have questions, here are the mailing lists for Titan and Gremlin:
>
> Titan: https://groups.google.com/forum/#!forum/aureliusgraphs
> Gremlin: https://groups.google.com/forum/#!forum/gremlin-users
>
> - James
>
>
>
>

Re: Local database recommendation?

2013-05-26 Thread Amirouche Boubekki
2013/5/26 Cedric Greevey 

> I may be developing an application which will need a persistent,
>


> ACID
>

which means at least transactional. Are you sure you need that? Depending
on the database, ACID means different things. Do you need data integrity
across "documents"? That would mean a transaction must span modifications
to several objects, and if a failure happens everything should be rolled
back or not persisted.
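
To make "a transaction must span several objects" concrete, here is a
minimal sketch. It uses Python's built-in sqlite3 module purely as a
stand-in for whichever embedded database ends up being chosen; the table,
column, and function names are hypothetical and not taken from any library
discussed in this thread.

```python
import sqlite3

def update_pair(conn, first_id, second_id, fail=False):
    """Update two rows as one unit; if anything fails, neither change is kept."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE objects SET is_x = 'no' WHERE id = ?", (first_id,))
            if fail:
                # Simulated crash between the two updates (assumption for the demo).
                raise RuntimeError("simulated failure")
            conn.execute("UPDATE objects SET is_x = 'yes' WHERE id = ?", (second_id,))
        return True
    except RuntimeError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, is_x TEXT NOT NULL)")
conn.executemany("INSERT INTO objects VALUES (?, ?)", [(1, "yes"), (2, "no")])
conn.commit()
```

Calling update_pair(conn, 1, 2, fail=True) leaves both rows untouched,
which is the behaviour you would want after a failure in the middle of an
update.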


> local database (on the same disk as the application, rather than having to
> be accessed over the network)
>

which means embedded


> containing information about potentially 100,000-1,000,000 (or more)
> objects.
>

which means relatively big


> Much of that information will be of a quasi-boolean character: "is it an X
> or not?" for various choices of X, but with "yes", "no", "borderline", and
> "not yet evaluated" as the four possible values. It will be desirable to
> query for these, for example to get a lazy seq of all objects for which
> it's a borderline Y or for which it's not yet evaluated whether it's a Z or
> for which it's either "yes" or "borderline" on whether it's an X or
> whatever.
>

It seems like loosely structured data, for which a key/value store (also
known as a kv store) might be great.


> I'm not that familiar with the local-DB solutions out there. I'd like a
> recommendation for one which is
>


> a) a good fit for Clojure use
>

I'm not sure about the Clojure specifics of binding to C/C++ databases,
but in Python it's a few ctypes (or similar) definitions away.


> and b) a good fit for the type of data and queries noted above.
>

You are not very specific about the queries and the data.

1) Is it structured, i.e. can an object have several fields, possibly
complex ones like lists or hashmaps, but also integers? Dates and UUIDs can
be emulated with strings and integers.
2) Do objects have relations? A lot of relations?
3) Is the data schema fixed at compile time, or do you need the schema to
be dynamic?


> The DB must be able to grow larger than available RAM without crashing the
> JVM and the seqs resulting from queries like the above will also need to be
> able to get bigger than RAM.
>


> My own research suggests that H2 may be a good choice, but it's a standard
> SQL/relational DB and I'm not 100% sure that fits well with the type of
> data and querying noted above. Note though that not all querying will take
> that form; there'll also be strings, uuids, dates, and other such field
> types and the need to query on these and to join on some of them; also, to
> do less-than comparisons on dates.
>

Depending on your speed needs and the speed of the database, a kv store can
be enough: you serialize the data as strings and deserialize it when you
need to do computation. That said, kv stores are not easy to work with when
you have complex queries; again, it depends on the query.
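
A rough sketch of that trade-off, under loud assumptions: a plain Python
dict stands in for a real kv store, JSON is the string serialization, and
the helper names (put, get, scan_where) are made up for illustration.
Lookup by key is trivial, but a query on a field has to walk and
deserialize every value.

```python
import json

store = {}  # stand-in for a real kv store: string keys -> string values

def put(key, obj):
    """Serialize the object to a string before storing it."""
    store[key] = json.dumps(obj)

def get(key):
    """Deserialize on the way out, when computation is needed."""
    return json.loads(store[key])

def scan_where(field, wanted):
    """Lazily yield keys whose value has obj[field] in `wanted`.

    There is no index, so every stored value is deserialized and checked.
    """
    for key, raw in store.items():
        if json.loads(raw).get(field) in wanted:
            yield key

put("obj-1", {"is_x": "yes", "is_y": "borderline"})
put("obj-2", {"is_x": "no", "is_y": "not yet evaluated"})
put("obj-3", {"is_x": "borderline", "is_y": "yes"})
```

Something like scan_where("is_x", {"yes", "borderline"}) works, but its
cost grows with the whole store rather than with the number of matches,
which is why complex queries tend to push you towards a database with
indexes.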





Re: Local database recommendation?

2013-05-26 Thread Max Penet
H2 sounds like the safe choice.  
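
As a hedged illustration of how the four-valued fields and the date
comparison could look in a relational schema, here is a small sketch. It
uses Python's sqlite3 module only because it ships with the standard
library; it is not H2, and the table, column, and function names are
assumptions made up for the example.

```python
import sqlite3
from itertools import islice

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE objects (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        is_x    TEXT NOT NULL DEFAULT 'not yet evaluated'
                CHECK (is_x IN ('yes', 'no', 'borderline', 'not yet evaluated')),
        created TEXT NOT NULL   -- ISO-8601 dates compare correctly as strings
    )
""")
conn.execute("CREATE INDEX objects_is_x ON objects (is_x)")
conn.executemany(
    "INSERT INTO objects (name, is_x, created) VALUES (?, ?, ?)",
    [("a", "yes", "2013-05-01"),
     ("b", "borderline", "2013-05-20"),
     ("c", "no", "2013-05-21"),
     ("d", "not yet evaluated", "2013-05-26")],
)

def x_or_borderline_before(conn, cutoff):
    """Yield matching names one row at a time instead of building a list."""
    cursor = conn.execute(
        "SELECT name FROM objects "
        "WHERE is_x IN ('yes', 'borderline') AND created < ? ORDER BY id",
        (cutoff,),
    )
    for (name,) in cursor:
        yield name

# Consume only the first two results, in the spirit of (take 2 ...).
first_two = list(islice(x_or_borderline_before(conn, "2013-05-26"), 2))
```

Whether a given JDBC driver streams rows like this or buffers the whole
result set is something to check for the specific database before relying
on it for results larger than RAM.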

You could also evaluate the latest 
MapDB: https://github.com/jankotek/MapDB#features 
It's very easy to use from Clojure, but I have only used it on trivial 
stuff, and it's still considered alpha.

On Sunday, May 26, 2013 5:09:51 PM UTC+2, Cedric Greevey wrote:
>
> I may be developing an application which will need a persistent, ACID 
> local database (on the same disk as the application, rather than having to 
> be accessed over the network) containing information about potentially 
> 100,000-1,000,000 (or more) objects.
>
> Much of that information will be of a quasi-boolean character: "is it an X 
> or not?" for various choices of X, but with "yes", "no", "borderline", and 
> "not yet evaluated" as the four possible values. It will be desirable to 
> query for these, for example to get a lazy seq of all objects for which 
> it's a borderline Y or for which it's not yet evaluated whether it's a Z or 
> for which it's either "yes" or "borderline" on whether it's an X or 
> whatever.
>
> I'm not that familiar with the local-DB solutions out there. I'd like a 
> recommendation for one which is a) a good fit for Clojure use and b) a good 
> fit for the type of data and queries noted above. The DB must be able to 
> grow larger than available RAM without crashing the JVM and the seqs 
> resulting from queries like the above will also need to be able to get 
> bigger than RAM.
>
> My own research suggests that H2 may be a good choice, but it's a standard 
> SQL/relational DB and I'm not 100% sure that fits well with the type of 
> data and querying noted above. Note though that not all querying will take 
> that form; there'll also be strings, uuids, dates, and other such field 
> types and the need to query on these and to join on some of them; also, to 
> do less-than comparisons on dates.
>
> Also, what is the current best recommendation of clojure library for 
> interfacing to the DB? (Answer might depend on the sort of DB recommended 
> -- standard, object/NoSQL, graph/ontology, etc.)
>
>





Re: Local database recommendation?

2013-05-26 Thread James Thornton


On Sunday, May 26, 2013 12:14:22 PM UTC-5, Cedric Greevey wrote:
>
> On Sun, May 26, 2013 at 11:33 AM, James Thornton wrote:
>
>> Hi Cedric -
>>
>> Look at Datomic free edition or the Titan graph database using 
>> either Berkeley DB as its backend datastore or Cassandra in single-server 
>> mode -- you can run both locally. 
>>
>> Datomic: http://www.datomic.com/
>> Docs: http://docs.datomic.com/
>> Clojure Client: http://docs.datomic.com/clojure/index.html
>> Videos: http://www.datomic.com/videos.html
>> Blog: http://blog.datomic.com/
>> Query Language: http://docs.datomic.com/query.html (Datalog)
>>
>
> The page seems to imply that the free edition DB has to fit in main memory.
>


No, Datomic free edition does not have to fit in main memory -- the 
differences between Pro and Free are explained here: 
http://www.datomic.com/pricing.html

The Datomic Free peer library includes a memory database and embedded 
Datomic Datalog.

The Datomic Free transactor includes an embedded durable storage engine. 
Datomic Free does not support any external storage services. The Datomic 
Free transactor is limited to 2 simultaneous peers. Even with those limits, 
it's still a quite capable system. Because the components are 
redistributable, it's great for applications that you want to share.

If you have questions, here is the Datomic mailing list: 
https://groups.google.com/forum/?fromgroups=#!forum/datomic
 

>  
>
>> Titan: http://thinkaurelius.github.io/titan/
>> Repo: https://github.com/thinkaurelius/titan
>> Clojure Client: https://github.com/clojurewerkz/archimedes
>> Blog: http://thinkaurelius.com/blog/
>> Query Language: https://github.com/tinkerpop/gremlin/wiki (Gremlin)
>>
>> See the Resources section of the TinkerPop Book website for a collection 
>> of Titan videos and tutorials: http://www.tinkerpopbook.com/#resources
>>
>
> That seems more promising, but there seems to be no documentation to speak 
> of for Archimedes and precious little for Titan, at least not without 
> devoting substantial time and bandwidth to viewing videos.
>
> It's unclear, then, how I'd go about assembling everything into a clooj 
> project that would find all of its dependencies, nor how I'd use the API to 
> represent, query, add, change, etc. the data. (That tends to happen when no 
> API documentation seems to be linked from anywhere. :))
>
> Long story short -- it seems that this stuff is either a) not ready for 
> prime time yet, b) targeted predominantly at people that are already at 
> expert proficiency working with graph databases with little concession for 
> learnability/usability by others, c) targeted predominantly at people using 
> Groovy rather than Clojure, or d) some combination of these things. :(
>

If you click through to the Titan wiki, you'll find extensive 
documentation: https://github.com/thinkaurelius/titan/wiki

And for a Quickstart guide to get Titan up and running in 5 mins, check out 
Marko's blog:

"Titan Server: From a Single Server to a Highly Available Cluster"
http://thinkaurelius.com/2013/03/30/titan-server-from-a-single-server-to-a-highly-available-cluster/

Titan is the first native TinkerPop-Blueprints DB so you connect to it and 
interact with it like any other Blueprints graph database 
(https://github.com/tinkerpop/blueprints/wiki). 

Think of Gremlin as a domain-specific language for graphs you use in 
harmony with your native programming language. The original Gremlin was 
written in Groovy and it's what most people use, but Zack Maril and the 
ClojureWerkz team recently released a Gremlin-Clojure library called Ogre (
https://github.com/clojurewerkz/ogre).

Here are Ogre's docs: http://ogre.clojurewerkz.org/

BTW: I should have pointed you to Titanium instead 
of Archimedes -- Archimedes is a lower-level library for connecting to 
any Blueprints database. Titanium is a higher-level, Titan-specific library 
built on top of Archimedes, and it's well documented:

Docs: http://titanium.clojurewerkz.org/
Repo: https://github.com/clojurewerkz/titanium

If you have questions, here are the mailing lists for Titan and Gremlin:

Titan: https://groups.google.com/forum/#!forum/aureliusgraphs
Gremlin: https://groups.google.com/forum/#!forum/gremlin-users

- James








Re: Local database recommendation?

2013-05-26 Thread Patrick Logan
Apache Jena is another good choice for a graph database. It offers a choice 
of an in-memory store, a memory-mapped file store (optionally with ACID 
transactions), or a store mapped onto a relational database. It can also 
run as a separate database server. There is a procedural Java API and the 
standard SPARQL graph query / update language.

http://jena.apache.org/


On Sunday, May 26, 2013 8:33:22 AM UTC-7, James Thornton wrote:
>
> Hi Cedric -
>
> Look at Datomic free edition or the Titan graph database using 
> either Berkeley DB as its backend datastore or Cassandra in single-server 
> mode -- you can run both locally. 
>
> Datomic: http://www.datomic.com/
> Docs: http://docs.datomic.com/
> Clojure Client: http://docs.datomic.com/clojure/index.html
> Videos: http://www.datomic.com/videos.html
> Blog: http://blog.datomic.com/
> Query Language: http://docs.datomic.com/query.html (Datalog)
>
> Titan: http://thinkaurelius.github.io/titan/
> Repo: https://github.com/thinkaurelius/titan
> Clojure Client: https://github.com/clojurewerkz/archimedes
> Blog: http://thinkaurelius.com/blog/
> Query Language: https://github.com/tinkerpop/gremlin/wiki (Gremlin)
>
> See the Resources section of the TinkerPop Book website for a collection 
> of Titan videos and tutorials: http://www.tinkerpopbook.com/#resources
>
>
> - James
>
>
> On Sunday, May 26, 2013 10:09:51 AM UTC-5, Cedric Greevey wrote:
>>
>> I may be developing an application which will need a persistent, ACID 
>> local database (on the same disk as the application, rather than having to 
>> be accessed over the network) containing information about potentially 
>> 100,000-1,000,000 (or more) objects.
>>
>> Much of that information will be of a quasi-boolean character: "is it an 
>> X or not?" for various choices of X, but with "yes", "no", "borderline", 
>> and "not yet evaluated" as the four possible values. It will be desirable 
>> to query for these, for example to get a lazy seq of all objects for which 
>> it's a borderline Y or for which it's not yet evaluated whether it's a Z or 
>> for which it's either "yes" or "borderline" on whether it's an X or 
>> whatever.
>>
>> I'm not that familiar with the local-DB solutions out there. I'd like a 
>> recommendation for one which is a) a good fit for Clojure use and b) a good 
>> fit for the type of data and queries noted above. The DB must be able to 
>> grow larger than available RAM without crashing the JVM and the seqs 
>> resulting from queries like the above will also need to be able to get 
>> bigger than RAM.
>>
>> My own research suggests that H2 may be a good choice, but it's a 
>> standard SQL/relational DB and I'm not 100% sure that fits well with the 
>> type of data and querying noted above. Note though that not all querying 
>> will take that form; there'll also be strings, uuids, dates, and other such 
>> field types and the need to query on these and to join on some of them; 
>> also, to do less-than comparisons on dates.
>>
>> Also, what is the current best recommendation of clojure library for 
>> interfacing to the DB? (Answer might depend on the sort of DB recommended 
>> -- standard, object/NoSQL, graph/ontology, etc.)
>>
>>





Re: Local database recommendation?

2013-05-26 Thread Cedric Greevey
On Sun, May 26, 2013 at 11:33 AM, James Thornton
wrote:

> Hi Cedric -
>
> Look at Datomic free edition or the Titan graph database using
> either Berkeley DB as its backend datastore or Cassandra in single-server
> mode -- you can run both locally.
>
> Datomic: http://www.datomic.com/
> Docs: http://docs.datomic.com/
> Clojure Client: http://docs.datomic.com/clojure/index.html
> Videos: http://www.datomic.com/videos.html
> Blog: http://blog.datomic.com/
> Query Language: http://docs.datomic.com/query.html (Datalog)
>

The page seems to imply that the free edition DB has to fit in main memory.


> Titan: http://thinkaurelius.github.io/titan/
> Repo: https://github.com/thinkaurelius/titan
> Clojure Client: https://github.com/clojurewerkz/archimedes
> Blog: http://thinkaurelius.com/blog/
> Query Language: https://github.com/tinkerpop/gremlin/wiki (Gremlin)
>
> See the Resources section of the TinkerPop Book website for a collection
> of Titan videos and tutorials: http://www.tinkerpopbook.com/#resources
>

That seems more promising, but there seems to be no documentation to speak
of for Archimedes and precious little for Titan, at least not without
devoting substantial time and bandwidth to viewing videos.

It's unclear, then, how I'd go about assembling everything into a clooj
project that would find all of its dependencies, nor how I'd use the API to
represent, query, add, change, etc. the data. (That tends to happen when no
API documentation seems to be linked from anywhere. :))

Long story short -- it seems that this stuff is either a) not ready for
prime time yet, b) targeted predominantly at people that are already at
expert proficiency working with graph databases with little concession for
learnability/usability by others, c) targeted predominantly at people using
Groovy rather than Clojure, or d) some combination of these things. :(





Re: Local database recommendation?

2013-05-26 Thread James Thornton
Hi Cedric -

Look at Datomic free edition or the Titan graph database using 
either Berkeley DB as its backend datastore or Cassandra in single-server 
mode -- you can run both locally. 

Datomic: http://www.datomic.com/
Docs: http://docs.datomic.com/
Clojure Client: http://docs.datomic.com/clojure/index.html
Videos: http://www.datomic.com/videos.html
Blog: http://blog.datomic.com/
Query Language: http://docs.datomic.com/query.html (Datalog)

Titan: http://thinkaurelius.github.io/titan/
Repo: https://github.com/thinkaurelius/titan
Clojure Client: https://github.com/clojurewerkz/archimedes
Blog: http://thinkaurelius.com/blog/
Query Language: https://github.com/tinkerpop/gremlin/wiki (Gremlin)

See the Resources section of the TinkerPop Book website for a collection of 
Titan videos and tutorials: http://www.tinkerpopbook.com/#resources


- James


On Sunday, May 26, 2013 10:09:51 AM UTC-5, Cedric Greevey wrote:
>
> I may be developing an application which will need a persistent, ACID 
> local database (on the same disk as the application, rather than having to 
> be accessed over the network) containing information about potentially 
> 100,000-1,000,000 (or more) objects.
>
> Much of that information will be of a quasi-boolean character: "is it an X 
> or not?" for various choices of X, but with "yes", "no", "borderline", and 
> "not yet evaluated" as the four possible values. It will be desirable to 
> query for these, for example to get a lazy seq of all objects for which 
> it's a borderline Y or for which it's not yet evaluated whether it's a Z or 
> for which it's either "yes" or "borderline" on whether it's an X or 
> whatever.
>
> I'm not that familiar with the local-DB solutions out there. I'd like a 
> recommendation for one which is a) a good fit for Clojure use and b) a good 
> fit for the type of data and queries noted above. The DB must be able to 
> grow larger than available RAM without crashing the JVM and the seqs 
> resulting from queries like the above will also need to be able to get 
> bigger than RAM.
>
> My own research suggests that H2 may be a good choice, but it's a standard 
> SQL/relational DB and I'm not 100% sure that fits well with the type of 
> data and querying noted above. Note though that not all querying will take 
> that form; there'll also be strings, uuids, dates, and other such field 
> types and the need to query on these and to join on some of them; also, to 
> do less-than comparisons on dates.
>
> Also, what is the current best recommendation of clojure library for 
> interfacing to the DB? (Answer might depend on the sort of DB recommended 
> -- standard, object/NoSQL, graph/ontology, etc.)
>
>





Local database recommendation?

2013-05-26 Thread Cedric Greevey
I may be developing an application which will need a persistent, ACID local
database (on the same disk as the application, rather than having to be
accessed over the network) containing information about potentially
100,000-1,000,000 (or more) objects.

Much of that information will be of a quasi-boolean character: "is it an X
or not?" for various choices of X, but with "yes", "no", "borderline", and
"not yet evaluated" as the four possible values. It will be desirable to
query for these, for example to get a lazy seq of all objects for which
it's a borderline Y or for which it's not yet evaluated whether it's a Z or
for which it's either "yes" or "borderline" on whether it's an X or
whatever.

I'm not that familiar with the local-DB solutions out there. I'd like a
recommendation for one which is a) a good fit for Clojure use and b) a good
fit for the type of data and queries noted above. The DB must be able to
grow larger than available RAM without crashing the JVM and the seqs
resulting from queries like the above will also need to be able to get
bigger than RAM.

My own research suggests that H2 may be a good choice, but it's a standard
SQL/relational DB and I'm not 100% sure that fits well with the type of
data and querying noted above. Note though that not all querying will take
that form; there'll also be strings, uuids, dates, and other such field
types and the need to query on these and to join on some of them; also, to
do less-than comparisons on dates.
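
To make the relational option concrete, here is a minimal sketch of how the
four-valued flags, the date comparison, and lazy consumption of results might
look in plain SQL. Everything in it is an assumption for illustration: the
table and column names (objects, is_x, is_y, created), the integer encoding of
the four values, and the helper query_lazily are hypothetical, and SQLite
(through Python's standard sqlite3 module) stands in for H2 only so that the
example runs without extra dependencies.

```python
import sqlite3

# Hypothetical encoding of the four possible values of each flag.
NOT_EVALUATED, NO, BORDERLINE, YES = range(4)


def make_db(path=":memory:"):
    """Create the illustrative schema: one row per object, one column per flag."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE objects (
               id      INTEGER PRIMARY KEY,
               name    TEXT NOT NULL,
               created TEXT NOT NULL,  -- ISO-8601 date, so string order is date order
               is_x    INTEGER NOT NULL DEFAULT 0 CHECK (is_x BETWEEN 0 AND 3),
               is_y    INTEGER NOT NULL DEFAULT 0 CHECK (is_y BETWEEN 0 AND 3)
           )"""
    )
    # An index on a flag column lets "all borderline X" avoid a full table scan.
    conn.execute("CREATE INDEX objects_is_x ON objects (is_x)")
    return conn


def query_lazily(conn, sql, params=()):
    """Yield rows one at a time instead of materializing the whole result list."""
    cursor = conn.execute(sql, params)
    for row in cursor:
        yield row


conn = make_db()
conn.executemany(
    "INSERT INTO objects (name, created, is_x, is_y) VALUES (?, ?, ?, ?)",
    [
        ("a", "2013-01-05", YES, BORDERLINE),
        ("b", "2013-02-10", NO, NOT_EVALUATED),
        ("c", "2013-03-15", BORDERLINE, YES),
        ("d", "2013-05-20", NOT_EVALUATED, NO),
    ],
)

# Objects that are "yes" or "borderline" on X and were created before a date.
rows = query_lazily(
    conn,
    "SELECT name FROM objects WHERE is_x IN (?, ?) AND created < ? ORDER BY id",
    (YES, BORDERLINE, "2013-03-01"),
)
x_names = [name for (name,) in rows]

# Objects that are borderline on Y, or not yet evaluated on X.
mixed = query_lazily(
    conn,
    "SELECT name FROM objects WHERE is_y = ? OR is_x = ? ORDER BY id",
    (BORDERLINE, NOT_EVALUATED),
)
mixed_names = [name for (name,) in mixed]
```

The same shape should carry over to H2 or another SQL database, though whether
a given JDBC driver streams rows or buffers the full result set is something
to check for the specific database and library chosen.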

Also, what is the current best recommendation of Clojure library for
interfacing to the DB? (Answer might depend on the sort of DB recommended
-- standard, object/NoSQL, graph/ontology, etc.)
