Here is my quick and dirty simulation of a "restore point" that works by
copying the in-memory database files:
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.h2.store.fs.FilePath;
import org.h2.util.IOUtils;

// Run with -ea so that the assert statements are actually checked.
public class RestorePointSimulation {

    public static void main(String[] args) throws Exception {
        Class.forName("org.h2.Driver");
        String goldenCopy = "test";
        // populate database with required data
        try (Connection conn = DriverManager.getConnection("jdbc:h2:memFS:" + goldenCopy)) {
            try (Statement stat = conn.createStatement()) {
                stat.execute("CREATE TABLE TEST(ID INT PRIMARY KEY, V INT)"
                        + " AS SELECT X, X FROM SYSTEM_RANGE(1, 1_000)");
            }
        }
        String workingCopy = "work";
        copyDatabaseFile(goldenCopy, workingCopy); // @BeforeTest
        // run first test
        try (Connection conn = DriverManager.getConnection("jdbc:h2:memFS:" + workingCopy)) {
            try (Statement stat = conn.createStatement()) {
                verifyRowCount(stat, 1_000);
                stat.execute("DELETE FROM TEST");
                verifyRowCount(stat, 0);
            }
        }
        copyDatabaseFile(goldenCopy, workingCopy); // @BeforeTest
        // run second test
        try (Connection conn = DriverManager.getConnection("jdbc:h2:memFS:" + workingCopy)) {
            try (Statement stat = conn.createStatement()) {
                verifyRowCount(stat, 1_000);
            }
        }
    }

    private static void copyDatabaseFile(String goldenCopy, String workingCopy)
            throws IOException {
        String workFileName = "memFS:" + workingCopy + ".mv.db";
        FilePath workFilePath = FilePath.get(workFileName);
        // copy the golden database file over the working copy
        try (InputStream in = FilePath.get("memFS:" + goldenCopy + ".mv.db").newInputStream();
                OutputStream out = workFilePath.newOutputStream(false)) {
            IOUtils.copy(in, out);
        }
        assert workFilePath.exists();
        assert workFilePath.isRegularFile();
        assert workFilePath.size() > 0;
    }

    private static void verifyRowCount(Statement stat, int expectedCount) throws SQLException {
        try (ResultSet resultSet = stat.executeQuery("SELECT COUNT(*) FROM TEST")) {
            assert resultSet.next();
            assert resultSet.getInt(1) == expectedCount;
        }
    }
}
On Saturday, November 9, 2024 at 6:59:39 AM UTC-5 Enno Thieleke wrote:
> Hello,
>
> thanks for the valuable feedback and time you've taken. I will try to go
> from top to bottom.
>
> > Don't use lambdas in unit tests at all unless it is strongly necessary.
>
> Understood.
>
> > I changed [...] to re-open connection and tests started to fail.
>
> Correct me if I'm wrong, but for in-memory databases failures are
> expected, because an in-memory database is dropped when the last connection
> to it is closed (unless prevented via a flag in the connection string).
> That's one of the things my lambdas tried to address. However, I've replaced
> the lambda-based tests with simpler, more straightforward ones that make it
> easier to reason about the test behavior.
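>
> For illustration, a minimal sketch of the kind of flag I mean; DB_CLOSE_DELAY=-1
> keeps a named in-memory database alive after the last connection to it is closed
> (the class and table names here are just placeholders):
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
>
> public class KeepAliveSketch {
>     public static void main(String[] args) throws Exception {
>         // Without DB_CLOSE_DELAY=-1 the in-memory database would be discarded
>         // as soon as this first connection is closed.
>         try (Connection setup = DriverManager
>                 .getConnection("jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1")) {
>             setup.createStatement().execute("CREATE TABLE T(ID INT)");
>         }
>         // The table is still there, because the database was kept alive.
>         try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:testdb");
>                 ResultSet rs = conn.createStatement()
>                         .executeQuery("SELECT COUNT(*) FROM T")) {
>             rs.next();
>             System.out.println(rs.getInt(1)); // prints 0
>         }
>     }
> }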
>
> > This code throws OOME unless restore point creation is commented out
>
> Thanks for pointing that out, and good catch. For the time being I will not
> address this, but I would address it once you (as a group) are convinced
> the feature should go in.
>
> > This code throws General Error
>
> Same as above, but an even better catch. In my opinion database compaction
> should not erase restore points. That would defeat the purpose of restore
> points (the way I intended them) entirely.
>
> > Because SCRIPT command can't export these restore points, it must be
> documented.
>
> Good point.
>
> > You also need to document this functionality as experimental only.
>
> Of course. If it goes in at all though.
>
> > I think content of INFORMATION_SCHEMA.RESTORE_POINTS shouldn't be
> visible for regular users.
>
> I disagree. The information schema contains all available information about a
> database, and restore points are a part of that information, much like
> tables, columns, and constraints. If nothing else, it simply enables
> users to realize that there are still restore points, i.e. that more is kept
> on disk than might be necessary. Or maybe I misunderstood: I'm thinking of
> users who have access to the information schema, and you'd like to restrict
> access to the table `restore_points` to admins?
>
> > Don't use MVCC in its documentation, this term only confuses people.
>
> I disagree again. MVCC is a widely used concept, and restore points simply
> expose an implementation detail of H2: that it uses MVCC. I think this is
> comparable to the Oracle documentation, which is pretty thorough. Even more
> so than PostgreSQL's (which is mind-blowing to me). But I also get your point
> that it would be great if we were able to hide that information from users
> in case H2 ever switches to another storage engine that is not MVCC-based.
> But we could always adjust the docs if that happens.
>
> > What is a purpose of OLDEST_DATABASE_VERSION_TO_KEEP and why this field
> isn't changed in newer restore points if older restore points were already
> dropped?
>
> The field tracks the `database_version` of the very first restore point
> across all restore points. The reason (for better or worse) is that if one
> creates two restore points A and B, then B includes the creation of A,
> because restore points share the same storage as the rest of the database.
> Ergo, if one drops restore point A but returns to point B (at which point
> the drop of A hasn't happened yet) and then returns to A, we expect the
> database to be in the state of A. And that's why we need to track the
> version of A (the oldest restore point in a chain) in every subsequent
> restore point. I'd be lying if I said that I like this detail.
>
> > [...] a [...] question, but why this feature is so badly needed?
>
> It is 100% *not* badly needed. It would merely be a nice-to-have and, to
> be completely honest, I don't think it would ever go beyond that. However,
> if somebody wants to be able to restore an H2 database to an earlier state
> without the hassle of requiring persistent storage and copying files
> around, that's where it would shine.
>
> > What's wrong with populating database to a desired state, shutting down,
> copying the file, and starting H2 with a file copy.
>
> Nothing. It's just a wee bit more work.
>
> > Why do you need this functionality for in-memory (non-persistent)
> database? If performance is the reason, you can use RAM-based file system
> instead.
>
> Performance is not the reason, at least not when it comes to H2 directly.
> Database population tools like Liquibase, on the other hand, can be a
> bottleneck. And I actually tried a RAM-based filesystem, Jimfs, but I
> didn't get H2 to work on it, because H2 doesn't use `Paths.get(URI)` to
> look up files, but `Paths.get(String, String...)` (see the sketch below).
> As for "real" RAM-based filesystems: that's simply too much work in my
> opinion.
> But to answer your question why I would like this functionality for
> in-memory databases: convenience. There's simply no other or better reason.
> But this kinda brings me back to the point I mentioned already: I think H2
> shouldn't need to know whether it's running in-memory or not, and it should
> work with Java-based in-memory filesystems. And maybe it even does and I was
> not able to figure out how, but I think it doesn't.
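>
> To illustrate the Jimfs point, a minimal sketch (the path name is just a
> placeholder): a Jimfs path lives in its own FileSystem instance and can be
> reached through that instance or via `Paths.get(URI)`, while
> `Paths.get(String, String...)` always resolves against the default (OS)
> filesystem:
>
> import java.net.URI;
> import java.nio.file.FileSystem;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
>
> import com.google.common.jimfs.Configuration;
> import com.google.common.jimfs.Jimfs;
>
> public class JimfsLookupSketch {
>     public static void main(String[] args) throws Exception {
>         try (FileSystem fs = Jimfs.newFileSystem(Configuration.unix())) {
>             Path inJimfs = fs.getPath("/db/test.mv.db");
>             Files.createDirectories(inJimfs.getParent());
>             Files.write(inJimfs, new byte[] { 1, 2, 3 });
>
>             // Works: the URI carries the "jimfs" scheme, so the Jimfs provider is used.
>             URI uri = inJimfs.toUri();
>             System.out.println(Files.exists(Paths.get(uri))); // true
>
>             // Does not see the Jimfs file: a plain string path is resolved
>             // against the default filesystem.
>             System.out.println(Files.exists(Paths.get("/db/test.mv.db"))); // false on a typical machine
>         }
>     }
> }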
>
> > I am struggling to understand what is going to happen with all other
> concurrent connections and their transactions at the moment when RESTORE TO
> POINT is executed.
>
> Gone. Simple as that. Concurrent connections are dropped. The connection
> that executes `restore to point` (which requires admin permissions) goes
> into exclusive mode and drops all other connections. It then rolls the
> storage back to the desired version and re-initializes the database
> (reloads META and rolls back transactions that were still open when the
> restore point being restored to was created). It is as if nothing ever
> happened after the restore point that's being returned to. That's the
> idea at least.
> Not gonna lie, it is complicated.
>
> > I also agree with Evgenij, that implementation based on MVStore will
> prevent a lot of intermediate versions from being dropped and heedlessly
> bloat database file.
>
> That shouldn't be a problem once all restore points have been dropped,
> because then the MVStore GC can work its magic, or once a database is
> restored to a point, because then at least some GC can happen.
>
> > So if I was interested in this feature, I would implement SAVEPOINT as
> "export to script" and RESTORE as "drop all schemas and execute script".
> Only questions I have: how to lock the DB while this happens and
> queue/stall any pending sessions -- in case of multi-user access.
>
> I like that idea *a lot*. It is much, much simpler than my approach.
> Although it's more than dropping all schemas, because it would require a
> wiped database. Nothing bad though. Let's maybe go even further and simply
> create a script export and then create a new database with `INIT=RUNSCRIPT
> FROM '~/create.sql'`. That should do the trick, no? Why didn't I come up
> with it? :D (A sketch of this follows below.)
> Yes, all connections to the existing database would have to be dropped,
> but it would be wiped clean anyway, so that's not an issue.
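>
> A minimal sketch of that idea using only existing H2 features (SCRIPT TO and
> INIT=RUNSCRIPT; the database and file names are just placeholders):
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> public class ScriptRestoreSketch {
>     public static void main(String[] args) throws Exception {
>         // "Savepoint": export the populated database to a SQL script.
>         try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:golden");
>                 Statement stat = conn.createStatement()) {
>             stat.execute("CREATE TABLE TEST(ID INT PRIMARY KEY)");
>             stat.execute("INSERT INTO TEST VALUES (1)");
>             stat.execute("SCRIPT TO '~/create.sql'");
>         }
>         // "Restore": open a fresh database and let H2 replay the script on init.
>         try (Connection conn = DriverManager.getConnection(
>                 "jdbc:h2:mem:work;INIT=RUNSCRIPT FROM '~/create.sql'");
>                 Statement stat = conn.createStatement();
>                 ResultSet rs = stat.executeQuery("SELECT COUNT(*) FROM TEST")) {
>             rs.next();
>             System.out.println(rs.getInt(1)); // prints 1
>         }
>     }
> }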
>
>
>
> Guys, thanks a lot for the exchange of thoughts. I'm still willing to spend
> time on "my approach" if you'd accept it into H2, but I'd also be fine with
> dropping it. As I said, what I've written so far was a good exercise, and
> I've learned more about H2 in the past few days than I have over the last 5
> or more years.
>
> Regards,
> Enno
>
> On Saturday, November 9, 2024 at 3:53:30 AM UTC+1 Andreas Reichel wrote:
>
> Dear All.
>
> On Fri, 2024-11-08 at 18:47 -0800, Andrei Tokar wrote:
>
> What's wrong with populating database to a desired state, shutting down,
> copying the file, and starting H2 with a file copy.
>
>
> Am I right to assume that a shutdown is not even needed: you can run an
> export to script/backup on an open instance without interruption, right?
> (Tiny detail, but it could matter.)
>
> So if I was interested in this feature, I would implement SAVEPOINT as
> "export to script" and RESTORE as "drop all schemas and execute script".
> Only questions I have: how to lock the DB while this happens and
> queue/stall any pending sessions -- in case of multi-user access.
>
> Cheers
> Andreas
>
>