Re: [h2] Manage MVStore file growth: Drop old versions? Best parameters? Entry order?
Thank you very much, Thomas! Your hint on LSM-trees opened my eyes! (A little small talk: sometimes I tend to forget that there are laws of nature in computer science too... My naive thought was: that doesn't fit into memory, so let's store it in a database; DBs are so smart these days, they will give me fast random access on the API side and huge storage capacity on the persistence side, and a magic cache will solve all difficulties...)

-- You received this message because you are subscribed to the Google Groups H2 Database group. To unsubscribe from this group and stop receiving emails from it, send an email to h2-database+unsubscr...@googlegroups.com. To post to this group, send email to h2-database@googlegroups.com. Visit this group at http://groups.google.com/group/h2-database. For more options, visit https://groups.google.com/d/optout.
Re: [h2] Manage MVStore file growth: Drop old versions? Best parameters? Entry order?
Thomas, perhaps we need a way of adding sorted entries to MVStore that is optimized to reduce writes and reduce internal page node splitting? Kind of like INSERT SORTED.
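One way such an "insert sorted" path could avoid page splits is bottom-up bulk loading. The sketch below is only an illustration under assumptions: keys arrive strictly increasing, a leaf page is modeled as a list with a fixed capacity, and `SortedBulkLoad`, `packLeaves` and `separators` are hypothetical names, not MVStore API. Each leaf is filled to capacity exactly once, so no page is ever split. By contrast, a classic B-tree that splits full pages in half tends to leave pages about half full when keys are inserted one at a time in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of bottom-up bulk loading from already sorted keys:
 * each leaf page is filled to capacity once, so no page is ever split.
 * Not an MVStore API; names and the page model are assumptions.
 */
public class SortedBulkLoad {

    /** Packs strictly increasing keys into leaf pages of at most pageCapacity keys. */
    static List<List<Long>> packLeaves(long[] sortedKeys, int pageCapacity) {
        if (pageCapacity < 1) {
            throw new IllegalArgumentException("pageCapacity must be positive");
        }
        List<List<Long>> leaves = new ArrayList<>();
        List<Long> current = new ArrayList<>(pageCapacity);
        for (int i = 0; i < sortedKeys.length; i++) {
            if (i > 0 && sortedKeys[i] <= sortedKeys[i - 1]) {
                throw new IllegalArgumentException("keys must be strictly increasing");
            }
            current.add(sortedKeys[i]);
            if (current.size() == pageCapacity) {
                leaves.add(current); // page is full: seal it, it is not touched again
                current = new ArrayList<>(pageCapacity);
            }
        }
        if (!current.isEmpty()) {
            leaves.add(current); // the last page may be partially filled
        }
        return leaves;
    }

    /** Separator keys for the parent level: the first key of every leaf except the first. */
    static List<Long> separators(List<List<Long>> leaves) {
        List<Long> result = new ArrayList<>();
        for (int i = 1; i < leaves.size(); i++) {
            result.add(leaves.get(i).get(0));
        }
        return result;
    }

    public static void main(String[] args) {
        long[] keys = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
        List<List<Long>> leaves = packLeaves(keys, 4);
        System.out.println(leaves);             // [[0, 10, 20, 30], [40, 50, 60, 70], [80, 90]]
        System.out.println(separators(leaves)); // [40, 80]
    }
}
```

A complete loader would apply the same packing to the separator keys to build each internal level, bottom up.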
[h2] Manage MVStore file growth: Drop old versions? Best parameters? Entry order?
Hi,

> Thomas, perhaps we need a way of adding sorted entries to MVStore that is optimized to reduce writes and reduce internal page node splitting?

Well, inserting entries in sorted order shouldn't be all that slow, or is it? What is quite slow is adding entries in random order. I have run into this problem quite a few times recently, and my plan is to write an extension for the MVStore that internally does everything that's needed for such cases. This is already on my to-do list (write an LSM-tree (log-structured merge-tree) utility on top of the MVStore).

One problem is that such writes need to be blind, which means duplicates are not detected immediately. (There is a workaround using a Bloom filter, but it only helps up to a point.) Therefore, the regular java.util.Map API can't be used, as Map.put is supposed to return the old value, which is not possible for blind writes. Also, the size isn't known until the very end.

Regards,
Thomas

On Thu, Apr 10, 2014 at 2:52 PM, Markus Kropf <m.kropf.allm...@gmail.com> wrote:
> Thank you very much, Thomas! Your hint on LSM-trees opened my eyes! (A little small talk: sometimes I tend to forget that there are laws of nature in computer science too... My naive thought was: that doesn't fit into memory, so let's store it in a database; DBs are so smart these days, they will give me fast random access on the API side and huge storage capacity on the persistence side, and a magic cache will solve all difficulties...)
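Thomas mentions a Bloom filter as a partial workaround for detecting duplicates despite blind writes. Below is a minimal, illustrative Bloom filter in plain Java (no H2 classes); the class name, the bit count, the number of hash functions and the choice of hash (a SplitMix64-style mixer combined with double hashing) are all assumptions for this sketch. A negative answer proves a key was never added, so the write can stay blind; a positive answer only means "maybe", and confirming it would take a real lookup.

```java
import java.util.BitSet;

/**
 * Minimal Bloom filter sketch (illustrative only, not part of H2/MVStore).
 * mightContain(key) == false means the key was definitely never added;
 * true only means "possibly added", so a real lookup would be needed.
 */
public class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public BloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** SplitMix64 finalizer, used here as a cheap 64-bit hash. */
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** i-th bit position via double hashing: (h1 + i * h2) mod numBits. */
    private int index(long key, int i) {
        long h1 = mix(key);
        long h2 = mix(h1) | 1L; // odd step so the positions spread out
        return (int) Math.floorMod(h1 + i * h2, (long) numBits);
    }

    public void add(long key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    public boolean mightContain(long key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false; // at least one bit unset: definitely a new key
            }
        }
        return true; // all bits set: maybe a duplicate
    }

    /** Fraction of keys in [from, to) that were never added but still test positive. */
    static double falsePositiveRate(BloomFilterSketch f, long from, long to) {
        long positives = 0;
        for (long k = from; k < to; k++) {
            if (f.mightContain(k)) positives++;
        }
        return (double) positives / (to - from);
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(10_000, 7);
        for (long k = 0; k < 1_000; k++) filter.add(k);      // about 10 bits per key
        System.out.println("light load: " + falsePositiveRate(filter, 1_000_000, 1_010_000));
        for (long k = 1_000; k < 10_000; k++) filter.add(k); // about 1 bit per key
        System.out.println("heavy load: " + falsePositiveRate(filter, 1_000_000, 1_010_000));
    }
}
```

With these assumed parameters, the first rate should be small and the second close to 1: once most bits are set, nearly every incoming key "might" be a duplicate and needs a real lookup again, which is one way to read "it only helps up to a point".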
[h2] Manage MVStore file growth: Drop old versions? Best parameters? Entry order?
Hi,

I'm trying MVStore to implement a change-tracking feature in a large graph structure, using several MVMaps with custom data types.

In a recent test, I had a load like this: opened a new store, added ~1M entries with roughly 10 bytes of serialized data each, closed it. The MVStore file size was ~1.6 GB. The store version was ~750, obviously the number of autocommits that happened. Keys have a nearly random distribution.

In my case, a few of the most recent versions are useful for supporting n concurrent readers with 1 writer; most old versions could be discarded. I mean, 1.6 GB is a little large when, ideally, just ~10 MB would do.

Can you give me some advice?
* Is there an option to discard old versions?
* Can I optimize chunk sizes, and what are the criteria?
* Should I play with the autocommit frequency?

Using version 1.3.175
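A toy model can show why the nearly random key distribution matters so much in a copy-on-write store. It rests on simplifying assumptions, not on MVStore's actual page or chunk layout: the key space 0 to 2^40 is split into 10,000 fixed ranges, one leaf page each (about 100 of the ~1M keys per page); commits happen roughly 750 times over the load; and every commit rewrites each leaf page that received an insert since the previous commit. `PageWriteModel`, `pageWrites` and `compare` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/**
 * Toy model (an assumption, not MVStore's real layout): the key space is split
 * into fixed ranges, one leaf page per range, and every commit rewrites each
 * leaf page that received at least one insert since the previous commit.
 */
public class PageWriteModel {

    /** Total leaf-page writes when committing after every insertsPerCommit keys. */
    static long pageWrites(long[] keys, long keySpace, int numPages, int insertsPerCommit) {
        long rangePerPage = keySpace / numPages;
        Set<Long> dirtyPages = new HashSet<>();
        long writes = 0;
        for (int i = 0; i < keys.length; i++) {
            dirtyPages.add(Math.min(keys[i] / rangePerPage, numPages - 1L)); // page holding this key
            boolean commit = (i + 1) % insertsPerCommit == 0 || i == keys.length - 1;
            if (commit) {
                writes += dirtyPages.size(); // copy-on-write: each dirty page is written again
                dirtyPages.clear();
            }
        }
        return writes;
    }

    /** Returns {writes for the keys in sorted order, writes for the same keys in random order}. */
    static long[] compare(int n, int numPages, int commits, long seed) {
        long keySpace = 1L << 40;
        Random rnd = new Random(seed);
        long[] random = new long[n];
        for (int i = 0; i < n; i++) {
            random[i] = rnd.nextLong() & (keySpace - 1); // roughly uniform in [0, 2^40)
        }
        long[] sorted = random.clone();
        Arrays.sort(sorted);
        int perCommit = Math.max(1, n / commits);
        return new long[] {
            pageWrites(sorted, keySpace, numPages, perCommit),
            pageWrites(random, keySpace, numPages, perCommit)
        };
    }

    public static void main(String[] args) {
        // ~1M keys, 10,000 pages (~100 keys each), ~750 commits: loosely based on the test above
        long[] writes = compare(1_000_000, 10_000, 750, 42L);
        System.out.println("sorted keys: " + writes[0] + " leaf-page writes");
        System.out.println("random keys: " + writes[1] + " leaf-page writes");
    }
}
```

With sorted input, each commit touches only a handful of adjacent pages, so the total stays close to the number of pages plus the number of commits. With random input, each commit touches a large fraction of all pages, so the same data is written over and over.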
Re: [h2] Manage MVStore file growth: Drop old versions? Best parameters? Entry order?
Hi,

> Keys have a nearly random distribution.

I think that's the problem. If you insert random keys, then all pages are affected all the time, and you end up with a lot of I/O. I ran into a similar problem recently (actually twice: first when optimizing CREATE INDEX, and then for the ArchiveTool). What I would do is:

* try _not_ to use randomly distributed keys
* if that's not possible, then use merge sort

Merge sort goes like this: insert about 2 MB of data into map1 (so the map easily fits in memory), then commit, switch to map2, and repeat until you have stored all data. That way you have multiple maps. Now, combine all maps using merge sort. This is basically what LSM-trees do. The disadvantage is that you cannot easily detect duplicate keys while inserting, because you have blind writes. Well, you could try using a Bloom filter, but that's complicated.

Regards,
Thomas

On Thu, Apr 10, 2014 at 7:29 AM, Markus Kropf <m.kropf.allm...@gmail.com> wrote:
> Hi,
> I'm trying MVStore to implement a change-tracking feature in a large graph structure, using several MVMaps with custom data types. In a recent test, I had a load like this: opened a new store, added ~1M entries with roughly 10 bytes of serialized data each, closed it. The MVStore file size was ~1.6 GB. The store version was ~750, obviously the number of autocommits that happened. Keys have a nearly random distribution.
> In my case, a few of the most recent versions are useful for supporting n concurrent readers with 1 writer; most old versions could be discarded. I mean, 1.6 GB is a little large when, ideally, just ~10 MB would do.
> Can you give me some advice? Is there an option to discard old versions? Can I optimize chunk sizes, and what are the criteria? Should I play with the autocommit frequency?
> Using version 1.3.175
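A minimal sketch of the merge-sort approach described above, with its assumptions spelled out: java.util.TreeMap stands in for MVMap so the example runs without H2, `runCapacity` (an entry count) stands in for "about 2 MB" per map, and `SortedRunMerger`, `add` and `merge` are hypothetical names. `add` is a blind write that returns nothing; duplicate keys across runs are resolved only during the final k-way merge, where the most recent write wins.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/**
 * Sketch of "sorted runs + merge sort". java.util.TreeMap stands in for MVMap,
 * and runCapacity (entries per run) stands in for "about 2 MB" per map.
 * All names are hypothetical; this is not an MVStore API.
 */
public class SortedRunMerger<V> {
    private final int runCapacity;
    private final List<TreeMap<Long, V>> runs = new ArrayList<>();
    private TreeMap<Long, V> current = new TreeMap<>();

    public SortedRunMerger(int runCapacity) {
        this.runCapacity = runCapacity;
    }

    /** Blind write: returns nothing; duplicates across runs are resolved in merge(). */
    public void add(long key, V value) {
        current.put(key, value); // within one in-memory run, the later value wins
        if (current.size() >= runCapacity) {
            runs.add(current);   // in MVStore terms: commit, then switch to a new map
            current = new TreeMap<>();
        }
    }

    /** Iterator over one run plus its current head entry. */
    private static final class Cursor<T> {
        final int runIndex;
        final Iterator<Map.Entry<Long, T>> it;
        Map.Entry<Long, T> head;

        Cursor(int runIndex, Iterator<Map.Entry<Long, T>> it) {
            this.runIndex = runIndex;
            this.it = it;
            this.head = it.next();
        }

        boolean advance() {
            if (!it.hasNext()) return false;
            head = it.next();
            return true;
        }
    }

    /** k-way merge of all runs in key order; for equal keys the newest run wins. */
    public List<Map.Entry<Long, V>> merge() {
        if (!current.isEmpty()) {
            runs.add(current);
            current = new TreeMap<>();
        }
        PriorityQueue<Cursor<V>> queue = new PriorityQueue<>((a, b) -> {
            int byKey = Long.compare(a.head.getKey(), b.head.getKey());
            return byKey != 0 ? byKey : Integer.compare(b.runIndex, a.runIndex); // newer run first
        });
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                queue.add(new Cursor<>(i, runs.get(i).entrySet().iterator()));
            }
        }
        List<Map.Entry<Long, V>> out = new ArrayList<>(); // stands in for the final map
        Long lastKey = null;
        while (!queue.isEmpty()) {
            Cursor<V> c = queue.poll();
            long key = c.head.getKey();
            if (lastKey == null || key != lastKey) { // first, i.e. newest, copy of this key
                out.add(new AbstractMap.SimpleImmutableEntry<>(key, c.head.getValue()));
                lastKey = key;
            }
            if (c.advance()) queue.add(c);
        }
        runs.clear();
        return out;
    }

    public static void main(String[] args) {
        SortedRunMerger<String> merger = new SortedRunMerger<>(3);
        long[] keys = {42, 7, 19, 7, 88, 3, 42, 51};
        for (int i = 0; i < keys.length; i++) {
            merger.add(keys[i], "v" + i);
        }
        System.out.println(merger.merge()); // [3=v5, 7=v3, 19=v2, 42=v6, 51=v7, 88=v4]
    }
}
```

In a real store, each full run would be committed as its own map, and the merge output would be appended in key order to the final map, so the expensive random page updates are replaced by sequential writes. The merge is also the first point where the final number of distinct keys is known.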