There is only cursor-stability-level isolation between queries on the same 
connection.  The query is executed as a descending series of loops, and for 
each table there is a "current record" pointer marking where the scan of that 
table currently stands in the query.  If you modify an entry, that 
modification is made immediately; then the next "step" of the descending loops 
is taken.  Your modification may, however, have changed the b-trees and 
related structures "out from under" your query.  When your query continues 
(takes its next step), it uses the "location pointers" of where it was and 
follows them "as they are now".

This makes it appear that the behaviour is undefined; however, that is only an 
appearance.  The behaviour is entirely rational, predictable, and deterministic.
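
For example, here is a rough sketch using Python's standard sqlite3 module on a 
single connection (the table, column names, and values are invented for 
illustration) -- the writes take effect immediately, and whether the 
still-running SELECT sees them depends on the SQLite version, the schema, and 
the query plan:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
    con.executemany("INSERT INTO t (val) VALUES (?)", [(i,) for i in range(5)])

    cur = con.cursor()
    cur.execute("SELECT id, val FROM t")
    for row_id, val in cur:
        print(row_id, val)
        # Same connection: this UPDATE is applied immediately, and the SELECT
        # above keeps stepping through the modified b-tree.  Whether (and how)
        # the change shows up in later rows is not something to rely on.
        con.execute("UPDATE t SET val = val + 10 WHERE id = ?", (row_id,))
    con.commit()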

The "proper" way to do this is to engage WAL and use a separate connection for 
each purpose (one for updating, one for reading).  When WAL is in effect, read 
operations WITHIN A TRANSACTION have Repeatable Read isolation.  That means 
that all READ operations within the transaction are repeatable and will return 
exactly the same results until the transaction is ended.

That is, if you do:

Connection 1:

create table x ( x );
insert into x values (1);

BEGIN;
select x from x;
                          Connection 2
                          BEGIN;
                          UPDATE x SET x=2;
select x from x;
                          COMMIT;
select x from x;          
                          SELECT x from x;
select x from x;

COMMIT;

select x from x;

All the selects in connection 1 within the transaction will return 1, while the 
selects in connection 2 will return 2.  The last select in connection 1 (taking 
place in a new automagic transaction) will return 2.
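
The same schedule can be run programmatically.  Here is a rough sketch using 
Python's standard sqlite3 module (the file name and connection handles are made 
up; a real file is needed because WAL does not apply to :memory: databases, and 
isolation_level=None simply means the BEGIN/COMMIT statements below are issued 
explicitly rather than by the module):

    import os, sqlite3, tempfile

    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    con1 = sqlite3.connect(path, isolation_level=None)   # explicit BEGIN/COMMIT
    con1.execute("PRAGMA journal_mode=WAL")
    con1.execute("CREATE TABLE x ( x )")
    con1.execute("INSERT INTO x VALUES (1)")

    con2 = sqlite3.connect(path, isolation_level=None)

    con1.execute("BEGIN")
    print(con1.execute("SELECT x FROM x").fetchone())    # (1,) -- snapshot taken

    con2.execute("BEGIN")
    con2.execute("UPDATE x SET x = 2")
    con2.execute("COMMIT")

    print(con1.execute("SELECT x FROM x").fetchone())    # still (1,)
    print(con2.execute("SELECT x FROM x").fetchone())    # (2,)

    con1.execute("COMMIT")
    print(con1.execute("SELECT x FROM x").fetchone())    # now (2,)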

In other words, the data READ on a connection between BEGIN ... COMMIT is 
REPEATABLE -- hence the name Repeatable Read.

The "normal" isolation level is Cursor Stability, which means that the pointers 
to the "current rows" remain "current" in the face of updates, but the results 
of the query may be perturbed by concurrent updates.

While the transaction is open on connection 1, you can run as many transactions 
as you like on connection 2.  Their effects will not be seen by connection 1 
until the changes are committed on connection 2 *AND* the Repeatable Read 
isolation is ended by committing the transaction on connection 1 (thus allowing 
you to see changes made to the database).
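
Continuing the sketch above (same con1 and con2 handles), that looks like:

    con1.execute("BEGIN")
    con1.execute("SELECT x FROM x").fetchone()           # pins con1's snapshot

    for i in (3, 4, 5):                                  # three committed
        con2.execute("BEGIN")                            # transactions on
        con2.execute("UPDATE x SET x = ?", (i,))         # connection 2
        con2.execute("COMMIT")

    print(con1.execute("SELECT x FROM x").fetchone())    # still the snapshot value
    con1.execute("COMMIT")
    print(con1.execute("SELECT x FROM x").fetchone())    # (5,) -- all now visible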

> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Stephan Mueller
> Sent: Wednesday, 7 September, 2016 16:12
> To: sqlite-users@mailinglists.sqlite.org
> Subject: [sqlite] Clarification on "No Isolation On Same Db Connection"
> 
> Hello all;
> 
> I have a scenario that is well covered by the text at
> https://www.sqlite.org/isolation.html in the section "No Isolation Between
> Operations On The Same Database Connection" (relevant paragraphs at end
> for convenience) but am hoping for a bit more clarity.  Here's an overview
> of my scenario.  I'm trying to find the right level of detail; if I've
> failed on the side of too little, I'm happy to provide more.
> 
> I have a non-trivial SELECT simulating an outer join
> (http://stackoverflow.com/questions/1923259/full-outer-join-with-sqlite)
> with the overall structure
> 
>         SELECT columns FROM t1 LEFT JOIN t2 ON t1.column = t2.column
>             WHERE conditions
>         UNION ALL
>         SELECT columns FROM t2 LEFT JOIN t1 ON t2.column = t1.column
>             WHERE t1.column IS NULL
>             AND more_conditions
>         ORDER BY column
> 
> The application fetches a row at a time, doing an UPDATE or INSERT or
> DELETE into t1 (henceforth referred to just  as 'updates') for each row
> fetched.  I _want_ the semantics to be that the SELECT is unperturbed by
> my updates.  (Details: there are actually three tables.  The joined-on
> column is UNIQUE (indeed, the PRIMARY KEY) in each table.  The query
> conditions are such that column is UNIQUE in the result set when not doing
> any updates while iterating.)
> 
> I understand that a way to ensure "SELECT is unperturbed" semantics is to
> use separate connections for SELECT and updates.   This is undesirable
> since I'd have to (IIUC) do all my updates (possibly millions) in a single
> transaction.  I'd prefer to commit after each update (to avoid redoing any
> work in case of unnatural program abort, for example).  The compromise for
> performance is to commit every N (4096, currently) updates, but
> potentially millions seems right out.
> 
> Another way is to fetch all the results into a temporary table tt, then
> iterate over tt while modifying t1.   This is easy to code, and SQLite can
> fill tt with a single INSERT INTO tt SELECT <as above>.  With potentially
> millions of rows, however, I'd rather an approach that doesn't spoil the
> interactive user experience by adding seconds spent on creating the
> temporary table, if that can be avoided.
> 
> I'm thus led to the following notion (apologies for my pseudo-code, which
> is neither SQL, nor C, nor the Perl I'm actually coding in):
> 
>         prepare SELECT from above
>         execute()
>         $last = "";
>         while (fetchnext) {
>                 if $fetched.column <= $last                (1)
>                         skip to next iteration of loop      (2)
>                 $last = $fetched.column                       (3)
>                 update-or-insert-or-delete
>         }
> 
> That is, if I ever receive a record that ought to have arrived earlier
> because of ORDER BY, it must be a since-SELECT-began update, and should be
> ignored.
> 
> I'm trying to not think in terms of potential implementation, and stick
> with "what must ORDER BY guarantee?"   I'd imagine if it ever returns a
> recent update, it would do so at the proper point in sort order.  If it's
> decided that an "alphabetically-prior-to-where-we-have-been-returning-
> records recent update" record must be returned, it would be returned ASAP
> -- as if the ORDER BY guarantee was that "all remaining records are
> sorted, and the lowest of those returned next".
> 
> Is my thinking sound?  -- are the numbered lines in my pseudo-code
> necessary and/or sufficient to achieve the goal of ignoring updates in
> this case, in a righteously precise use of SQL?
> 
> Thanks in advance for your thoughts!
> 
> Here are selections from the relevant SQLite documentation paragraphs to
> provide context, with comments/queries interspersed.
> 
>       No Isolation Between Operations On The Same Database Connection
> 
>       ... What if a SELECT statement is started and the
> sqlite3_step()<https://www.sqlite.org/c3ref/step.html> interface steps
> through roughly half of its output, then some
> UPDATE<https://www.sqlite.org/lang_update.html> statements are run by the
> application that modify the table that the SELECT statement is reading,
> then more calls to sqlite3_step()<https://www.sqlite.org/c3ref/step.html>
> are made to finish out the SELECT statement? Will the later steps of the
> SELECT statement see the changes made by the UPDATE or not? The answer is
> that this behavior is undefined.  In particular, whether or not the SELECT
> statement sees the concurrent changes depends on which release of SQLite
> is running, the schema of the database file, whether or not
> ANALYZE<https://www.sqlite.org/lang_analyze.html> has been run, and the
> details of the query. In some cases, it might depend on the content of the
> database file, too. There is no good way to know whether or not a SELECT
> statement will see changes that were made to the database by the same
> database connection after the SELECT statement was started. And hence,
> developers should diligently avoid writing applications that make
> assumptions about what will occur in that circumstance.
> 
> When talking about the C language specification, it's traditional to claim
> that "this behaviour is undefined" means "the C compiler is free to format
> your hard drive".  In this context, I hope it means "it is not specified
> whether later steps of the SELECT statement will see the changes made by
> the UPDATE, but the database is not at risk of corruption due to updates-
> while-iterating".  Text below suggests my hopeful interpretation is
> correct.
> 
> Formalizing my query above, I think I'm asking: given I would retrieve
> records 1..N iterating over the SELECT results when making no
> modifications while iterating, is it reasonable to assume that with
> updates-to-previously-returned-records-while-iterating, I am still
> guaranteed to receive at least the same records 1..N, in the same order
> (assuming the ORDER BY clause as in my query), possibly (but not
> necessarily) with additional records interspersed, and any interspersed
> records would need to respect the ORDER BY clause as well (which means "in
> sorted order for 'higher' than most-recently-returned, and 'as soon as
> possible' for 'lower or equal to' most-recently returned")?
> 
>       If an application issues a SELECT statement on a single table like
> "SELECT rowid, * FROM table WHERE ..." and starts stepping through the
> output of that statement
> using sqlite3_step()<https://www.sqlite.org/c3ref/step.html> and examining
> each row, then it is safe for the application to delete the current row or
> any prior row using "DELETE FROM table WHERE rowid=?". It is also safe (in
> the sense that it will not harm the database) for the application to
> delete a row that expected to appear later in the query but has not
> appeared yet. If a future row is deleted, however, it might happen that
> the row turns up after a subsequent sqlite3_step(), even after it has
> allegedly been deleted. Or it might not. That behavior is undefined. The
> application can also INSERT new rows into the table while the SELECT
> statement is running, but whether or not the new rows appear in subsequent
> sqlite3_step()s of the query is undefined. And the application can UPDATE
> the current row or any prior row, though doing so might cause that row to
> reappear in a subsequent sqlite3_step(). As long as the application is
> prepared to deal with these ambiguities, the operations themselves are
> safe and will not harm the database file.
> 
> This paragraph focuses on an example with a single table, perhaps because
> it's easy to imagine iterating over this with a simple cursor.  I can also
> imagine that more complex queries will involve more complex data
> structures to manage the iteration within SQLite, and so have a more
> complex relationship with what rows get returned.  Rather than trying to
> explain (and then have to maintain compatibility) I fully understand the
> point of "assume nothing" here.  Ultimately, I want to know enough to
> robustly deal with these ambiguities as the last sentence suggests.
> 
> many thanks,
> stephan();