Here is my quick and dirty simulation of a "restore point" that works by
copying the in-memory database files:
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.h2.store.fs.FilePath;
import org.h2.util.IOUtils;

// Run with -ea so that the assert statements are actually checked.
public class RestorePointSimulation {

    public static void main(String[] args) throws Exception {
        Class.forName("org.h2.Driver");
        String goldenCopy = "test";
        // populate database with required data
        try (Connection conn = DriverManager.getConnection("jdbc:h2:memFS:" + goldenCopy)) {
            try (Statement stat = conn.createStatement()) {
                stat.execute("CREATE TABLE TEST(ID INT PRIMARY KEY, V INT)"
                        + " AS SELECT X, X FROM SYSTEM_RANGE(1, 1_000)");
            }
        }
        String workingCopy = "work";
        copyDatabaseFile(goldenCopy, workingCopy); // @BeforeTest
        // run first test
        try (Connection conn = DriverManager.getConnection("jdbc:h2:memFS:" + workingCopy)) {
            try (Statement stat = conn.createStatement()) {
                verifyRowCount(stat, 1_000);
                stat.execute("DELETE FROM TEST");
                verifyRowCount(stat, 0);
            }
        }
        copyDatabaseFile(goldenCopy, workingCopy); // @BeforeTest
        // run second test
        try (Connection conn = DriverManager.getConnection("jdbc:h2:memFS:" + workingCopy)) {
            try (Statement stat = conn.createStatement()) {
                verifyRowCount(stat, 1_000);
            }
        }
    }

    private static void copyDatabaseFile(String goldenCopy, String workingCopy)
            throws IOException {
        String workFileName = "memFS:" + workingCopy + ".mv.db";
        FilePath workFilePath = FilePath.get(workFileName);
        // copy the golden database file over the working copy
        try (InputStream in = FilePath.get("memFS:" + goldenCopy + ".mv.db").newInputStream();
                OutputStream out = workFilePath.newOutputStream(false)) {
            IOUtils.copy(in, out);
        }
        assert workFilePath.exists();
        assert workFilePath.isRegularFile();
        assert workFilePath.size() > 0;
    }

    private static void verifyRowCount(Statement stat, int expectedCount) throws SQLException {
        try (ResultSet resultSet = stat.executeQuery("SELECT COUNT(*) FROM TEST")) {
            assert resultSet.next();
            assert resultSet.getInt(1) == expectedCount;
        }
    }
}
On Saturday, November 9, 2024 at 6:59:39 AM UTC-5 Enno Thieleke wrote:
> Hello,
>
> thanks for the valuable feedback and time you've taken. I will try to go
> from top to bottom.
>
> > Don't use lambdas in unit tests at all unless it is strongly necessary.
>
> Understood.
>
> > I changed [...] to re-open connection and tests started to fail.
>
> Correct me if I'm wrong, but for in-memory databases failures are
> expected, because an in-memory database is dropped when the last connection
> to it is closed (unless prevented via a flag in the connection string).
> That's one of the things my lambdas tried to address. However, I've replaced
> the lambda-based tests with simpler, more straightforward ones that make it
> easier to reason about the test behavior.
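>
> For illustration, a minimal sketch of the kind of flag I mean; DB_CLOSE_DELAY=-1
> keeps a named in-memory database alive after the last connection to it is closed
> (the class and table names here are just placeholders):
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
>
> public class KeepAliveSketch {
>     public static void main(String[] args) throws Exception {
>         // Without DB_CLOSE_DELAY=-1 the in-memory database would be discarded
>         // as soon as this first connection is closed.
>         try (Connection setup = DriverManager
>                 .getConnection("jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1")) {
>             setup.createStatement().execute("CREATE TABLE T(ID INT)");
>         }
>         // The table is still there, because the database was kept alive.
>         try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:testdb");
>                 ResultSet rs = conn.createStatement()
>                         .executeQuery("SELECT COUNT(*) FROM T")) {
>             rs.next();
>             System.out.println(rs.getInt(1)); // prints 0
>         }
>     }
> }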
>
> > This code throws OOME unless restore point creation is commented out
>
> Thanks for pointing that out, and good catch. For the time being I will not
> address this, but I would address it once you (as a group) are convinced
> the feature should go in.
>
> > This code throws General Error
>
> Same as above, but an even better catch. In my opinion database compaction
> should not erase restore points. That would defeat the purpose of restore
> points (the way I intended them) entirely.
>
> > Because SCRIPT command can't export these restore points, it must be
> documented.
>
> Good point.
>
> > You also need to document this functionality as experimental only.
>
> Of course. If it goes in at all though.
>
> > I think content of INFORMATION_SCHEMA.RESTORE_POINTS shouldn't be
> visible for regular users.
>
> I disagree. The information schema contains all available information about a
> database, and restore points are a part of that information, much like
> tables, columns, and constraints. If nothing else, it simply enables
> users to realize that there are still restore points, i.e. that more is kept
> on disk than might be necessary. Or maybe I misunderstood: I'm thinking of
> users who have access to the information schema, and you'd like to restrict
> access to the table `restore_points` to admins?
>
> > Don't use MVCC in its documentation, this term only confuses people.
>
> I disagree again. MVCC is a widely used concept, and restore points simply
> expose an implementation detail of H2: that it uses MVCC. I think this is
> comparable to the Oracle documentation, which is pretty thorough. Even more
> so than PostgreSQL's (which is mind-blowing to me). But I also get your point
> that it would be great if we were able to hide that information from users
> in case H2 ever switches to another storage engine that is not MVCC-based.
> But we could always adjust the docs if that happens.
>
> > What is a purpose of OLDEST_DATABASE_VERSION_TO_KEEP and why this field
> isn't changed in newer restore points if older restore points were already
> dropped?
>
> The field tracks the `database_version` of the very first restore point
> across all restore points. The reason (for better or worse) is that if one
> creates two restore points A and B, then B includes the creation of A,
> because restore points share the same storage as the rest of the database.
> Ergo, if one drops restore point A but returns to point B (at which point
> the drop of A hasn't happened yet) and then returns to A, we expect the
> database to be in the state of A. And that's why we need to track the
> version of A (the oldest restore point in a chain) in every subsequent
> restore point. I'd be lying if I said that I like this detail.
>
> > [...] a [...] question, but why this feature is so badly needed?
>
> It is 100% *not* badly needed. It would merely be a nice-to-have and, to
> be completely honest, I don't think it would ever go beyond that. However,
> if somebody wants to be able to restore an H2 database to an earlier state
> without the hassle of requiring persistent storage and copying files
> around, that's where it would shine.
>
> > What's wrong with populating database to a desired state, shutting down,
> copying the file, and starting H2 with a file copy.
>
> Nothing. It's just a wee bit more work.
>
> > Why do you need this functionality for in-memory (non-persistent)
> database? If performance is the reason, you can use RAM-based file system
> instead.
>
> Performance is not the reason, at least not when it comes to H2 directly.
> Database population tools like Liquibase, on the other hand, can be a
> bottleneck. And I actually tried a RAM-based filesystem, Jimfs, but I
> didn't get H2 to work on it, because H2 doesn't use `Paths.get(URI)` to
> look up files, but `Paths.get(String, String...)` (see the sketch below).
> As for "real" RAM-based filesystems: that's simply too much work in my
> opinion.
> But to answer your question why I would like this functionality for
> in-memory databases: convenience. There's simply no other or better reason.
> But this kinda brings me back to the point I mentioned already: I think H2
> shouldn't need to know whether it's running in-memory or not, and it should
> work with Java-based in-memory filesystems. And maybe it even does and I was
> not able to figure out how, but I think it doesn't.
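>
> To illustrate the Jimfs point, a minimal sketch (the path name is just a
> placeholder): a Jimfs path lives in its own FileSystem instance and can be
> reached through that instance or via `Paths.get(URI)`, while
> `Paths.get(String, String...)` always resolves against the default (OS)
> filesystem:
>
> import java.net.URI;
> import java.nio.file.FileSystem;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
>
> import com.google.common.jimfs.Configuration;
> import com.google.common.jimfs.Jimfs;
>
> public class JimfsLookupSketch {
>     public static void main(String[] args) throws Exception {
>         try (FileSystem fs = Jimfs.newFileSystem(Configuration.unix())) {
>             Path inJimfs = fs.getPath("/db/test.mv.db");
>             Files.createDirectories(inJimfs.getParent());
>             Files.write(inJimfs, new byte[] { 1, 2, 3 });
>
>             // Works: the URI carries the "jimfs" scheme, so the Jimfs provider is used.
>             URI uri = inJimfs.toUri();
>             System.out.println(Files.exists(Paths.get(uri))); // true
>
>             // Does not see the Jimfs file: a plain string path is resolved
>             // against the default filesystem.
>             System.out.println(Files.exists(Paths.get("/db/test.mv.db"))); // false on a typical machine
>         }
>     }
> }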
>
> > I am struggling to understand what is going to happen with all other
> concurrent connections and their transactions at the moment when RESTORE TO
> POINT is executed.
>
> Gone. Simple as that. Concurrent connections are dropped. The connection
> that executes `restore to point` (which requires admin permissions) goes
> into exclusive mode and drops all other connections. It then rolls the
> storage back to the desired version and re-initializes the database
> (reloads META and rolls back transactions that were still open when the
> restore point being restored to was created). It is as if nothing ever
> happened after the restore point that's being returned to. That's the
> idea at least.
> Not gonna lie, it is complicated.
>
> > I also agree with Evgenij, that implementation based on MVStore will
> prevent a lot of intermediate versions from being dropped and heedlessly
> bloat database file.
>
> That shouldn't be a problem once all restore points have been dropped,
> because then the MVStore GC can work its magic, or once a database is
> restored to a point, because then at least some GC can happen.
>
> > So if I was interested in this feature, I would implement SAVEPOINT as
> "export to script" and RESTORE as "drop all schemas and execute script".
> Only questions I have: how to lock the DB while this happens and
> queue/stall any pending sessions -- in case of multi-user access.
>
> I like that idea *a lot*. It is much, much simpler than my approach.
> Although it's more than dropping all schemas, because it would require a
> wiped database. Nothing bad though. Let's maybe go even further and simply
> create a script export and then create a new database with `INIT=RUNSCRIPT
> FROM '~/create.sql'`. That should do the trick, no? Why didn't I come up
> with it? :D (A sketch of this follows below.)
> Yes, all connections to the existing database would have to be dropped,
> but it would be wiped clean anyway, so that's not an issue.
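>
> A minimal sketch of that idea using only existing H2 features (SCRIPT TO and
> INIT=RUNSCRIPT; the database and file names are just placeholders):
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> public class ScriptRestoreSketch {
>     public static void main(String[] args) throws Exception {
>         // "Savepoint": export the populated database to a SQL script.
>         try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:golden");
>                 Statement stat = conn.createStatement()) {
>             stat.execute("CREATE TABLE TEST(ID INT PRIMARY KEY)");
>             stat.execute("INSERT INTO TEST VALUES (1)");
>             stat.execute("SCRIPT TO '~/create.sql'");
>         }
>         // "Restore": open a fresh database and let H2 replay the script on init.
>         try (Connection conn = DriverManager.getConnection(
>                 "jdbc:h2:mem:work;INIT=RUNSCRIPT FROM '~/create.sql'");
>                 Statement stat = conn.createStatement();
>                 ResultSet rs = stat.executeQuery("SELECT COUNT(*) FROM TEST")) {
>             rs.next();
>             System.out.println(rs.getInt(1)); // prints 1
>         }
>     }
> }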
>
>
>
> Guys, thanks a lot for the exchange of thoughts. I'm still willing to spend
> time on "my approach" if you'd accept it into H2, but I'd also be fine with
> dropping it. As I said, what I've written so far was a good exercise, and
> I've learned more about H2 in the past few days than I have over the last 5
> or more years.
>
> Regards,
> Enno
>
> On Saturday, November 9, 2024 at 3:53:30 AM UTC+1 Andreas Reichel wrote:
>
> Dear All.
>
> On Fri, 2024-11-08 at 18:47 -0800, Andrei Tokar wrote:
>
> What's wrong with populating database to a desired state, shutting down,
> copying the file, and starting H2 with a file copy.
>
>
> Am I right to assume that a shutdown is not even needed: you can run an
> export to script/backup on an open instance without interruption, right?
> (Tiny detail, but it could matter.)
>
> So if I was interested in this feature, I would implement SAVEPOINT as
> "export to script" and RESTORE as "drop all schemas and execute script".
> Only questions I have: how to lock the DB while this happens and
> queue/stall any pending sessions -- in case of multi-user access.
>
> Cheers
> Andreas
>
>