Hi,

> if one plans to store a lot of BLOBs (several gigabytes total) in H2
That's a good question.

Regular file systems do support file sizes of many gigabytes, so the file
size itself shouldn't be a problem. The main problems I see are:

a) incremental backup
b) when removing BLOBs, will the file size shrink?
c) unused files should be deleted early if possible

Are there more problems?

Incremental backup: If there is just one file, then incremental backup is
problematic. Some tools are able to deal with this case (rdiff-backup, for
example), but having to use a special tool is not convenient. Both of your
solutions (the database file split feature and storing the large objects
externally) should work for this case. The problem with storing each LOB in
a separate file is that you end up with lots of files, most of them small
(even if very small objects are stored in-place). I have some experience
with this approach from the Apache Jackrabbit DataStore. There, a garbage
collection algorithm is used to get rid of unused files. However, this can
break incremental backup, because the GC updates the last modified time.
Also, because there are lots of files, one question is how to organize the
directories. So both (1) and (2) have advantages and disadvantages.
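To illustrate the directory question, here is a small sketch of the usual
content-addressed layout (similar in spirit to what the Jackrabbit
DataStore does, but the class and method names here are just illustrative,
not actual Jackrabbit or H2 code): the file name is a hash of the content,
and the first hash bytes are used as directory levels so no single
directory grows too large.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class LobPath {

    // Map a LOB's content to a relative path such as "2c/f2/2cf24d...".
    // Two prefix levels keep each directory small even with millions of files.
    static String pathFor(byte[] content) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        String h = hex.toString();
        return h.substring(0, 2) + "/" + h.substring(2, 4) + "/" + h;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pathFor("hello".getBytes(StandardCharsets.UTF_8)));
        // prints 2c/f2/2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
    }
}
```

A side effect of hashing is de-duplication: two identical BLOBs map to the
same file, which is one of the advantages of this layout.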

File size shrink: I'm not sure if this is such a big problem, as empty
space in the database file(s) is re-used. But anyway: the current MVStore
implementation does not support efficient shrinking of files; instead, the
content at the end of the file needs to be moved to the freed space.
However, there is an abstraction (mvstore.FileStore) that can help solve
this problem. There is one implementation (OffHeapStore) that already
supports freeing up space in the middle of the storage. It should be
relatively easy to implement the same for real files, if it really turns
out to be a problem.
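The "move content from the end into the freed space" idea can be sketched
in a few lines (this is not MVStore code, just a toy demonstration of the
compaction step under the assumption that we know which tail block is live
and where the free region starts):

```java
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Compact {

    // Copy `length` bytes from offset `src` to offset `dst` inside the file,
    // then truncate the file so the old tail copy is cut off.
    static void moveAndTruncate(Path file, long src, long dst, int length)
            throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            byte[] buf = new byte[length];
            raf.seek(src);
            raf.readFully(buf);
            raf.seek(dst);
            raf.write(buf);
            raf.setLength(src); // the live block now lives at dst
        }
    }

    // Demo: an 8-byte file where the first 4 bytes are free ("....")
    // and the live block "DATA" sits at the end; after compaction the
    // file content is just "DATA" and the file has shrunk to 4 bytes.
    static String demo() throws Exception {
        Path f = Files.createTempFile("compact", ".db");
        Files.write(f, "....DATA".getBytes(StandardCharsets.US_ASCII));
        moveAndTruncate(f, 4, 0, 4);
        String result = new String(Files.readAllBytes(f), StandardCharsets.US_ASCII);
        Files.delete(f);
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints DATA
    }
}
```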

Unused files should be deleted early: H2 does that, and I would like to
keep it that way. Actually, in previous versions of H2, LOB data was
deleted too early (before the LOB was closed), which of course is a
problem. One alternative is to do garbage collection, similar to Java
garbage collection. As written above, I have experience with that approach
(Jackrabbit DataStore). For H2, I would like to avoid it, as it is a
separate (background) task to think about, and disk space is not re-used
early on.
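For completeness, the garbage collection alternative boils down to a
mark-and-sweep over the stored files (again just an illustrative sketch,
not Jackrabbit or H2 code): "mark" collects every LOB id still referenced
from the database, and "sweep" deletes the stored files whose id was not
marked.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LobGc {

    // Return the ids of stored LOB files that are no longer referenced
    // anywhere in the database; those are the candidates for deletion.
    static Set<String> sweep(Set<String> stored, Set<String> referenced) {
        Set<String> garbage = new HashSet<>(stored);
        garbage.removeAll(referenced); // whatever was not marked is garbage
        return garbage;
    }

    public static void main(String[] args) {
        Set<String> stored = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> referenced = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println(sweep(stored, referenced)); // prints [b]
    }
}
```

The catch is exactly what is described above: the sweep runs as a separate
background task, unreferenced space stays on disk until the next run, and
touching files during the scan can interfere with incremental backup.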

Regards,
Thomas

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
Visit this group at http://groups.google.com/group/h2-database.