Ryan,

your answer helps a lot; thank you.

Another (maybe stupid) question: What exactly is the purpose of a 'transaction'? Do I need it for each SQL statement to be executed?


That's an excellent question in fact - lots of confusion out there.

Two main reasons, which I will quickly show - 1 - ACID guarantees (especially Isolation and Consistency, the others [Atomicity and Durability] you get anyway), and 2 - Speed.
(https://en.wikipedia.org/wiki/ACID)

In most DB engines (SQLite definitely) most executed commands gets their own little wrapped transaction if you did not start one explicitly - just to make the nuts and bolts of the engine function correctly. This might be a "lesser" form of transaction, but it has to at a minimum prepare the table, lock tables/rows/whatever, do the reading/writing, and then release those locks.

This also means that if you issue four commands, let's use an example (based loosely on a contacts-list table), say you execute these three in order:

SELECT MAX(ID) FROM contacts;
(The result of the previous statement is put into a variable, say: X)
INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES (X+1, 'Joe', 'Soap', Now, ...); INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES (X+2, 'Joe', 'Jones', Now, ...); INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES (X+3, 'Joe', 'Smith', Now, ...);

Now inside the DB Engine, it is wrapped in pseudo-code like this (Obviously there is more going on, but I'm just mentioning the interesting bits to our example):

tt = Start_Transaction;
    qq = prepare query( SELECT MAX(ID) FROM contacts; ) :
        if (qq is READ Query)
            obtain_read_lock(tt, qq); // Executed
        ELSE
            obtain_write_lock(tt, qq);// Skipped
        run_query(qq);
        output_results;
        drop_locks(tt, qq);
    release_prepared_resources(qq);
    IF ERRORS > 0
        roll_back(tt);
    ELSE
        commit(tt);
release_transaction(tt);
tt = Start_Transaction;
    qq = prepare query( INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ...) VALUES (X+1, 'Joe', 'Soap', Now, ...); ) :
        if (qq is READ Query)
            obtain_read_lock(tt, qq);// Skipped
        ELSE
            obtain_write_lock(tt, qq); // Executed
        run_query(qq);
        output_results;
        drop_locks(tt, qq);
    release_prepared_resources(qq);
    IF ERRORS > 0
        roll_back(tt);
    ELSE
        commit(tt);
release_transaction(tt);
tt = Start_Transaction;
    qq = prepare query(INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ...) VALUES (X+2, 'Joe', 'Jones', Now, ...); ) :
        if (qq is READ Query)
            obtain_read_lock(tt, qq); // Skipped
        ELSE
            obtain_write_lock(tt, qq); // Executed
        run_query(qq);
        output_results;
        drop_locks(tt, qq);
    release_prepared_resources(qq);
    IF ERRORS > 0
        roll_back(tt);
    ELSE
        commit(tt);
release_transaction(tt);
tt = Start_Transaction;
    qq = prepare query(INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ...) VALUES (X+3, 'Joe', 'Smith', Now, ...); ) :
        if (qq is READ Query)
            obtain_read_lock(tt, qq);   // Skipped
        ELSE
            obtain_write_lock(tt, qq); // Executed
        run_query(qq);
        output_results;
        drop_locks(tt, qq);
    release_prepared_resources(qq);
    IF ERRORS > 0
        roll_back(tt);
    ELSE
        commit(tt);
release_transaction(tt);

The "output_results;" command might be a no-op for INSERT queries, though some DBs do return values.

Now note some things:
Towards Point 1 above:  If someone else also were trying to insert to this database at the same time, they might get an X that is in-between your inserts, and indeed insert values in between. This is called a concurrancy problem and is fixed by the "isolation" in an ACID transaction.   (I know that behaviour can well be fixed by declaring the ID column with AUTO_INCREMENT, but let's imagine it wasn't in this example, you may well have other columns that must be Unique).

Note also that the inserted values for "Now()" would be slightly different every time since it's just a littlebit later when the next INSERT happens.  Inside a transaction, the TimeAdded values will all be the same. Basically, while the transaction is in progress, and for the duration of it, no other connection to the same database can mess with those values, plus those values will remain consistent for the duration of the transaction. This is a very important property to many people. There are a few other considerations to get technical with, like Serializability modes etc, allowing Dirty-reads, Read-uncommitted data (they don't fall in the scope of this discussion, but you can look them up if interested) - i'll only say here that not everyone wants their transactions to be closed off completely so Database Engines have switches/ways to allow breaking of the ACID model.

Point 2: Also note that the speed magnificently increases when you explicitly START the transaction, because the above script and pseudo-code changes to this:

BEGIN TRANSACTION;
SELECT MAX(ID) FROM contacts;
INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES (X+1, 'Joe', 'Soap', Now, ...); INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES (X+2, 'Joe', 'Jones', Now, ...); INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES (X+3, 'Joe', 'Smith', Now, ...);
END TRANSACTION;

tt = Start_Transaction;
    obtain_read_lock(tt);
    qq = prepare query( SELECT MAX(ID) FROM contacts; ) :
        if (qq is WRITE Query)escalate_to_write_lock(tt, qq); // Skipped
        run_query(qq);
        output_results;**
    release_prepared_resources(qq);
    qq = prepare query( INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ...) VALUES (X+1, 'Joe', 'Soap', Now, ...); ) :         if (qq is WRITE Query) if (qq is WRITE Query)escalate_to_write_lock(tt, qq); // Executed
        run_query(qq);
    release_prepared_resources(qq);
    qq = prepare query( INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ...) VALUES (X+2, 'Joe', 'Jones', Now, ...); ) : if (qq is WRITE Query) if (qq is WRITE Query)escalate_to_write_lock(tt, qq); // No-op because write-lock is already held.
        run_query(qq);
    release_prepared_resources(qq);
    qq = prepare query( INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ...) VALUES (X+3, 'Joe', 'Smith', Now, ...); ) : if (qq is WRITE Query) if (qq is WRITE Query)escalate_to_write_lock(tt, qq); // No-op because write-lock is already held.
        run_query(qq);
    release_prepared_resources(qq);
    IF ERRORS > 0
       roll_back(tt);
    ELSE
       commit(tt);
    output_results;**
release_transaction(tt);

** - The exact times and places when results are output in queries differ from Engine to Engine and can be controlled with switches in some. Typically inline after selects, but sometimes you only want to see the results that actually made it into the database (i.e. at the END of the transaction).

You can of course get even faster when using multi-line inserts and prepared statements - but that's also another discussion.

Note also that only transactions with explicit errors will be rolled back by the Engines typically. If there is an error (a thing that you don't want to commit to the data) but it isn't a problem to commit it for the DB engine, then you have to explicitly tell it to roll back. (In sqlite this is just querying the command: ROLLBACK;)

There are also conflict handlers which you can specify on how you want to proceed when a conflict (error) occurs, but again, that's another topic, this one is too long already. :)

Cheers,
Ryan



--
_______________________________________________
lazarus mailing list
lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Reply via email to