Re: [h2] Manage MVStore file growth: Drop old versions? Best parameters? Entry order?

2014-04-10 Thread Markus Kropf
Thank you very much, Thomas, your hint on LSM-trees opened my eyes!
(A little small talk: sometimes I tend to forget that there are laws of 
nature in computer science too... Naive thought: that doesn't fit into 
memory, so let's store it in a database; DBs are so smart these days, they 
will give me fast random access on the API side and huge storage quantities 
on the persistence side, and a magic cache will solve all difficulties...)

-- 
You received this message because you are subscribed to the Google Groups H2 
Database group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To post to this group, send email to h2-database@googlegroups.com.
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.


Re: [h2] Manage MVStore file growth: Drop old versions? Best parameters? Entry order?

2014-04-10 Thread Noel Grandin
Thomas, perhaps we need a way of adding sorted entries to MVStore that is
optimized to reduce writes and reduce internal page node splitting?

Kind of like INSERT SORTED.
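A minimal sketch of the idea, assuming the simplest possible approach: buffer a batch of writes, then apply them to the target map in ascending key order. The class `SortedBatchWriter` and its `batchSize` parameter are hypothetical, not part of H2. The target is any `java.util.Map` (MVMap implements this interface), and a `LinkedHashMap` can be used to observe the order in which keys arrive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not an H2 API): buffers writes, then applies them
// to the target map in ascending key order.
class SortedBatchWriter<K extends Comparable<K>, V> {
    private final Map<K, V> target;
    private final int batchSize;
    private final Map<K, V> buffer = new HashMap<>(); // last write per key wins

    SortedBatchWriter(Map<K, V> target, int batchSize) {
        this.target = target;
        this.batchSize = batchSize;
    }

    void put(K key, V value) {
        buffer.put(key, value);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sort the buffered keys and write them to the target in that order.
    void flush() {
        List<K> keys = new ArrayList<>(buffer.keySet());
        Collections.sort(keys);
        for (K key : keys) {
            target.put(key, buffer.get(key));
        }
        buffer.clear();
    }
}
```

Sorting only helps within one batch; across batches the keys are still spread over the whole key range, so the batch size (or a merge-based approach, as discussed in this thread) determines how much is gained.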



[h2] Manage MVStore file growth: Drop old versions? Best parameters? Entry order?

2014-04-10 Thread Thomas Mueller
Hi,

 Thomas, perhaps we need a way of adding sorted entries to MVStore that is
 optimized to reduce writes and reduce internal page node splitting?

Well, inserting entries in sorted order shouldn't be all that slow, should it?
What is quite slow is adding entries in random order. I have run into this
problem quite a few times recently, and my plan is to write an extension
for the MVStore that internally does everything that is needed for such cases. This
is already on my to-do list (write an LSM-tree (log-structured merge-tree)
utility on top of the MVStore). One problem is that such writes need to be
blind, so duplicates are not detected immediately. (There is a
workaround, using a Bloom filter, but it only helps up to a point.)
Therefore, the regular java.util.Map API can't be used, as Map.put is
supposed to return the old value, which is not possible for blind writes.
Also, the size isn't known until the very end.
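The points about blind writes and the Bloom filter can be illustrated with a small sketch. It assumes an LSM-style in-memory buffer whose `put` returns nothing (unlike `Map.put`), plus a Bloom filter that can only answer "definitely new" or "possibly a duplicate". `BloomFilter` and `BlindWriteBuffer` are hypothetical names for illustration, not H2 classes and not the utility described above; the bit count and number of hash functions are arbitrary.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Hypothetical Bloom filter: mightContain() returns false only for keys that
// were definitely never added; true may be a false positive.
class BloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // i-th probe position via double hashing from the key's hashCode().
    private int index(Object key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1; // forced odd, so never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(Object key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    boolean mightContain(Object key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical LSM-style write buffer. put() is a blind write: it returns
// nothing, because older values may sit in runs that are not looked up.
class BlindWriteBuffer<K extends Comparable<K>, V> {
    private final TreeMap<K, V> memtable = new TreeMap<>(); // sorted; flushing as a run not shown
    private final BloomFilter seen = new BloomFilter(1 << 16, 3);
    private long possibleDuplicates;

    void put(K key, V value) {
        if (seen.mightContain(key)) {
            possibleDuplicates++; // may be a false positive
        }
        seen.add(key);
        memtable.put(key, value);
    }

    // Upper bound only; the exact count is known after the final merge.
    long possibleDuplicates() {
        return possibleDuplicates;
    }
}
```

Because a Bloom filter has false positives, the duplicate count is only an upper bound, and the exact number of distinct keys (the size) is known only after the final merge. A fixed-size filter also fills up as keys are added and its false-positive rate rises, which is one plausible reading of "only helps up to a point".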

Regards,
Thomas


On Thu, Apr 10, 2014 at 2:52 PM, Markus Kropf
m.kropf.allm...@gmail.com wrote:

 Thank you very much, Thomas, your hint on LSM-trees opened my eyes!
 (A little small talk: sometimes I tend to forget that there are laws of
 nature in computer science too... Naive thought: that doesn't fit into
 memory, so let's store it in a database; DBs are so smart these days, they
 will give me fast random access on the API side and huge storage quantities
 on the persistence side, and a magic cache will solve all difficulties...)



[h2] Manage MVStore file growth: Drop old versions? Best parameters? Entry order?

2014-04-09 Thread Markus Kropf
Hi, I'm trying MVStore to implement a change-tracking feature in a large 
graph structure, using several MVMaps with custom data types. In a recent 
test, I had a load like this:

Opened a new store, added ~1M entries with roughly 10 bytes of serialized data 
each, closed it. The MVStore file size was ~1.6 GB.
The store version was ~750, evidently the number of autocommits that happened.
Keys have a nearly random distribution.
In my case, a few of the latest versions are useful to support n concurrent readers 
with 1 writer. Most old versions could be discarded. I mean, 1.6 GB is a 
little large when, ideally, just ~10 MB would do. Can you give me some 
advice? Is there an option to discard old versions? Can I optimize chunk 
sizes, and what are the criteria? Should I play with the autocommit frequency?

Using version 1.3.175
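A rough back-of-the-envelope check using the round figures from the post, taking 1.6 GB as 1.6 × 10^9 bytes and assuming that each of the ~750 versions corresponds to one autocommit that appended a chunk:

```latex
\begin{aligned}
\text{payload} &\approx 10^{6} \times 10\,\text{B} = 10^{7}\,\text{B} \approx 10\,\text{MB} \\
\text{written per version} &\approx \frac{1.6\times 10^{9}\,\text{B}}{750} \approx 2.1\times 10^{6}\,\text{B} \approx 2.1\,\text{MB} \\
\text{file size / payload} &\approx \frac{1.6\times 10^{9}\,\text{B}}{10^{7}\,\text{B}} = 160
\end{aligned}
```

Under these assumptions each autocommit wrote about a fifth of the estimated live data, and the file ended up about 160 times the payload, which would be consistent with random keys dirtying a large share of the pages at every commit.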



Re: [h2] Manage MVStore file growth: Drop old versions? Best parameters? Entry order?

2014-04-09 Thread Thomas Mueller
Hi,

 Keys have a nearly random distribution.

I think that's the problem. If you insert random keys, then all pages are
affected all the time, and you end up with lots of I/O. I ran into a
similar problem recently (actually twice: first when optimizing CREATE
INDEX, and then for the ArchiveTool). What I would do is:

* try _not_ to use randomly distributed keys
* if that's not possible, then use merge sort
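Why random keys hurt can be shown with a toy model, which is an assumption and not MVStore's actual page layout: keys 0..n-1 live in leaf pages of `keysPerPage` consecutive keys, and with copy-on-write storage every page modified since the last commit is written again. The hypothetical `PageTouchModel` counts pages written for sorted versus shuffled insert order; all parameter values are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Toy model (not MVStore internals): count leaf pages rewritten per commit.
class PageTouchModel {
    static long pagesWritten(List<Integer> insertOrder, int keysPerPage, int keysPerCommit) {
        long total = 0;
        Set<Integer> dirty = new HashSet<>(); // pages modified since the last commit
        for (int i = 0; i < insertOrder.size(); i++) {
            dirty.add(insertOrder.get(i) / keysPerPage);
            boolean commit = (i + 1) % keysPerCommit == 0 || i == insertOrder.size() - 1;
            if (commit) {
                total += dirty.size(); // each dirty page is written once per commit
                dirty.clear();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000, keysPerPage = 100, keysPerCommit = 1_000;
        List<Integer> sorted = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            sorted.add(k);
        }
        List<Integer> random = new ArrayList<>(sorted);
        Collections.shuffle(random, new Random(42));
        System.out.println("sorted: " + pagesWritten(sorted, keysPerPage, keysPerCommit));
        System.out.println("random: " + pagesWritten(random, keysPerPage, keysPerCommit));
    }
}
```

In this model the sorted order writes 10 pages per commit (10,000 in total), while the shuffled order dirties roughly 950 of the 10,000 pages per commit, so it writes on the order of 95 times as many pages. Real B-tree pages also split and have parent nodes that get rewritten, so the numbers are only indicative.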

Merge sort goes like this: insert about 2 MB of data into map1 (so the map
easily fits in memory), then commit, switch to map2, and repeat until you have
stored all the data. That way you have multiple maps. Now, combine all maps
using merge sort. This is basically what LSM-trees do. The disadvantage is that
you cannot easily detect duplicate keys while inserting, because you have
blind writes. Well, you could try using a Bloom filter, but that's
complicated.
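The two phases described above (fill small sorted maps, then merge them) can be sketched as follows. `TreeMap` instances stand in for map1, map2, ..., `runSize` counts entries rather than the ~2 MB suggested, and the rule that the newest run wins on duplicate keys is an assumption made here, since duplicates only surface during the merge. `RunMerger` is a hypothetical name, not an H2 API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Sketch of "sorted runs, then k-way merge"; TreeMaps stand in for map1, map2, ...
class RunMerger {
    // Phase 1: write entries into sorted runs of at most runSize entries each.
    static <K extends Comparable<K>, V> List<TreeMap<K, V>> buildRuns(
            Iterable<Map.Entry<K, V>> input, int runSize) {
        List<TreeMap<K, V>> runs = new ArrayList<>();
        TreeMap<K, V> current = new TreeMap<>();
        for (Map.Entry<K, V> e : input) {
            current.put(e.getKey(), e.getValue());
            if (current.size() >= runSize) { // "commit" and switch to the next map
                runs.add(current);
                current = new TreeMap<>();
            }
        }
        if (!current.isEmpty()) {
            runs.add(current);
        }
        return runs;
    }

    // Current position in one run; runIndex is used to prefer newer runs.
    private static final class Cursor<K, V> {
        final Iterator<Map.Entry<K, V>> it;
        final int runIndex;
        Map.Entry<K, V> head;

        Cursor(Iterator<Map.Entry<K, V>> it, int runIndex) {
            this.it = it;
            this.runIndex = runIndex;
            this.head = it.next();
        }

        boolean advance() {
            if (it.hasNext()) {
                head = it.next();
                return true;
            }
            return false;
        }
    }

    // Phase 2: k-way merge; writes each key once, in ascending order.
    static <K extends Comparable<K>, V> void merge(List<TreeMap<K, V>> runs, Map<K, V> out) {
        Comparator<Cursor<K, V>> order = (a, b) -> {
            int c = a.head.getKey().compareTo(b.head.getKey());
            return c != 0 ? c : Integer.compare(b.runIndex, a.runIndex); // newer run first
        };
        PriorityQueue<Cursor<K, V>> pq = new PriorityQueue<>(order);
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                pq.add(new Cursor<>(runs.get(i).entrySet().iterator(), i));
            }
        }
        K last = null;
        while (!pq.isEmpty()) {
            Cursor<K, V> c = pq.poll();
            K key = c.head.getKey();
            if (last == null || key.compareTo(last) != 0) {
                out.put(key, c.head.getValue()); // first occurrence comes from the newest run
                last = key;
            } // else: older duplicate, skipped
            if (c.advance()) {
                pq.add(c);
            }
        }
    }
}
```

Because `merge` emits each key once and in ascending order, writing its output to the final map turns random inserts into sequential ones. The merge itself holds only one current entry per run; in this sketch the runs live in memory, whereas with the MVStore they would be maps in the store.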

Regards,
Thomas


On Thu, Apr 10, 2014 at 7:29 AM, Markus Kropf m.kropf.allm...@gmail.com wrote:

 Hi, I'm trying MVStore to implement a change-tracking feature in a large
 graph structure, using several MVMaps with custom data types. In a recent
 test, I had a load like this:

 Opened a new store, added ~1M entries with roughly 10 bytes of serialized data
 each, closed it. The MVStore file size was ~1.6 GB.
 The store version was ~750, evidently the number of autocommits that happened.
 Keys have a nearly random distribution.
 In my case, a few of the latest versions are useful to support n concurrent
 readers with 1 writer. Most old versions could be discarded. I mean, 1.6 GB
 is a little large when, ideally, just ~10 MB would do. Can you give me some
 advice? Is there an option to discard old versions? Can I optimize chunk
 sizes, and what are the criteria? Should I play with the autocommit frequency?

 Using version 1.3.175
