> Hi, if one plans to store a lot of BLOBs (several gigabytes total) in H2
That's a good question. Regular file systems do support file sizes of many gigabytes, so the file size itself shouldn't be a problem. The main problems I see are: a) incremental backup, b) whether the file size will shrink when blobs are removed, c) unused files should be deleted early if possible. Are there more problems?

Incremental backup: if there is just one file, incremental backup is problematic. Some tools can deal with this case (rdiff-backup, for example), but having to use a special tool is not convenient. Both of your solutions (a database file split feature, and storing the large objects externally) should work for this case. The problem with storing each LOB as a separate file is that you end up with lots of files, most of them small (even if very small objects are stored in-place). I have some experience with this approach from the Apache Jackrabbit DataStore. There, a garbage collection algorithm is used to get rid of unused files. However, this can break incremental backup, because the GC updates the last-modified time. Also, because there are lots of files, one problem is how to lay out the directories. So both (1) and (2) have advantages and disadvantages.

File size shrink: I'm not sure this is such a big problem, as empty space in the database file(s) is re-used. But anyway: the current MVStore implementation does not support efficient shrinking of files; instead, the content at the end of the file needs to be moved into the freed space. However, there is an abstraction (mvstore.FileStore) that can help solve this problem. There is one implementation (OffHeapStore) that already supports freeing up space in the middle of the storage. It should be relatively easy to implement the same for real files, if this really turns out to be a problem.

Unused files should be deleted early: H2 does that, and I would like to keep it that way.
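To illustrate the "lots of files" layout question: a common approach (the one the Jackrabbit DataStore takes in spirit) is to name each blob file after its content hash and fan the files out over a shallow directory tree, so no single directory grows huge. The sketch below is illustrative only; the class and method names (FileLobStore, put, getPath) are made up for this example and are not H2 or Jackrabbit API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of a file-per-LOB store with a hash-based directory fan-out.
public class FileLobStore {
    private final Path root;

    public FileLobStore(Path root) throws IOException {
        this.root = root;
        Files.createDirectories(root);
    }

    // Store a blob under its content hash. Identical blobs share one file,
    // and the two-level fan-out keeps each directory small.
    public String put(byte[] data) throws IOException {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always present
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        String id = sb.toString();
        Path file = getPath(id);
        Files.createDirectories(file.getParent());
        if (!Files.exists(file)) {
            // Write to a temp file first, then move it into place, so a
            // reader never sees a half-written blob.
            Path tmp = Files.createTempFile(root, "lob", ".tmp");
            Files.write(tmp, data);
            Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
        }
        return id;
    }

    // e.g. id "ab12cd..." maps to root/ab/12/ab12cd...
    public Path getPath(String id) {
        return root.resolve(id.substring(0, 2))
                   .resolve(id.substring(2, 4))
                   .resolve(id);
    }

    public byte[] get(String id) throws IOException {
        return Files.readAllBytes(getPath(id));
    }
}
```

Note that content addressing gives de-duplication for free, but it is exactly what forces a garbage-collection step: nothing in the file name says who still references the blob.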
Actually, in previous versions of H2, LOB data was deleted too early (before the LOB was closed), which of course is a problem.

One alternative is to do garbage collection, similar to Java garbage collection. As written above, I have experience with that approach (the Jackrabbit DataStore). For H2, I would like to avoid it, as it is a separate (background) task to think about, and disk space is not re-used early.

Regards,
Thomas

--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
