Please reply if you sent this. Thanks.




-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Stephan Mueller
Sent: Wednesday, September 7, 2016 6:12 PM
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] Clarification on "No Isolation On Same Db Connection"

Hello all;

I have a scenario that is well covered by the text at
https://www.sqlite.org/isolation.html in the section "No Isolation Between
Operations On The Same Database Connection" (relevant paragraphs at end for
convenience) but am hoping for a bit more clarity.  Here's an overview of my
scenario.  I'm trying to find the right level of detail; if I've failed on
the side of too little, I'm happy to provide more.

I have a non-trivial SELECT simulating an outer join
(http://stackoverflow.com/questions/1923259/full-outer-join-with-sqlite)
with the overall structure

        SELECT columns FROM t1 LEFT JOIN t2 ON t1.column = t2.column
            WHERE conditions
        UNION ALL
        SELECT columns FROM t2 LEFT JOIN t1 ON t2.column = t1.column
            WHERE t1.column IS NULL
            AND more_conditions
        ORDER BY column

The application fetches a row at a time, doing an UPDATE or INSERT or DELETE
into t1 (henceforth referred to just  as 'updates') for each row fetched.  I
_want_ the semantics to be that the SELECT is unperturbed by my updates.
(Details: there are actually three tables.  The joined-on column is UNIQUE
(indeed, the PRIMARY KEY) in each table.  The query conditions are such that
column is UNIQUE in the result set when not doing any updates while
iterating.)

I understand that a way to ensure "SELECT is unperturbed" semantics is to
use separate connections for SELECT and updates.   This is undesirable since
I'd have to (IIUC) do all my updates (possibly millions) in a single
transaction.  I'd prefer to commit after each update (to avoid redoing any
work in case of unnatural program abort, for example).  The compromise for
performance is to commit every N (4096, currently) updates, but potentially
millions seems right out.

Another way is to fetch all the results into a temporary table tt, then
iterate over tt while modifying t1.   This is easy to code, and SQLite can
fill tt with a single INSERT INTO tt SELECT <as above>.  With potentially
millions of rows, however, I'm rather an approach that doesn't spoil the
interactive user experience by adding seconds spent on creating the
temporary table, if that can be avoided.

I'm thus led to the following notion (apologies for my pseudo-code, which is
neither SQL, nor C, nor the Perl I'm actually coding in):

        prepare SELECT from above
        execute()
        $last = "";
        while (fetchnext) {
                if $fetched.column <= $last                (1)
                        skip to next iteration of loop      (2)
                $last = $fetched.column                       (3)
                update-or-insert-or-delete
        }

That is, if I ever receive a record that ought to have arrived earlier
because of ORDER BY, it must be a since-SELECT-began update, and should be
ignored.

I'm trying to not think in terms of potential implementation, and stick with
"what must ORDER BY guarantee?"   I'd imagine if it ever returns a recent
update, it would do so at the proper point in sort order.  If it's decided
that an "alphabetically-prior-to-where-we-have-been-returning-records recent
update" record must be returned, it would be returned ASAP -- as if the
ORDER BY guarantee was that "all remaining records are sorted, and the
lowest of those returned next".

Is my thinking sound?  -- are the numbered lines in my pseudo-code necessary
and/or sufficient to achieve the goal of ignoring updates in this case, in a
righteously precise use of SQL?

Thanks in advance for your thoughts!

Here are selections from the relevant SQLite documentation paragraphs to
provide context, with comments/queries interspersed.

      No Isolation Between Operations On The Same Database Connection

      ... What if a SELECT statement is started and the
sqlite3_step()<https://www.sqlite.org/c3ref/step.html> interface steps
through roughly half of its output, then some
UPDATE<https://www.sqlite.org/lang_update.html> statements are run by the
application that modify the table that the SELECT statement is reading, then
more calls to sqlite3_step()<https://www.sqlite.org/c3ref/step.html> are
made to finish out the SELECT statement? Will the later steps of the SELECT
statement see the changes made by the UPDATE or not? The answer is that this
behavior is undefined.  In particular, whether or not the SELECT statement
sees the concurrent changes depends on which release of SQLite is running,
the schema of the database file, whether or not
ANALYZE<https://www.sqlite.org/lang_analyze.html> has been run, and the
details of the query. In some cases, it might depend on the content of the
database file, too. There is no good way to know whether or not a SELECT
statement will see changes that were made to the database by the same
database connection after the SELECT statement was started. And hence,
developers should diligently avoid writing applications that make
assumptions about what will occur in that circumstance.
When talking about the C language specification, it's traditional to claim
that "this behaviour is undefined" means "the C compiler is free to format
your hard drive".  In this context, I hope it means "it is not specified
whether later steps of the SELECT statement will see the changes made by the
UPDATE, but the database is not at risk of corruption due to
updates-while-iterating".  Text below suggests my hopeful interpretation is
correct.

Formalizing my query above, I think I'm asking: given I would retrieve
records 1..N iterating over the SELECT results when making no modifications
while iterating, is it reasonable to assume that with
updates-to-previously-returned-records-while-iterating, I am still
guaranteed to receive at least the same records 1..N, in the same order
(assuming the ORDER BY clause as in my query), possibly (but not
necessarily) with additional records interspersed, and any interspersed
records would need to respect the ORDER BY clause as well (which means "in
sorted order for 'higher' than most-recently-returned, and 'as soon as
possible' for 'lower or equal to' most-recently returned?

      If an application issues a SELECT statement on a single table like
"SELECT rowid, * FROM table WHERE ..." and starts stepping through the
output of that statement
usingsqlite3_step()<https://www.sqlite.org/c3ref/step.html> and examining
each row, then it is safe for the application to delete the current row or
any prior row using "DELETE FROM table WHERE rowid=?". It is also safe (in
the sense that it will not harm the database) for the application to delete
a row that expected to appear later in the query but has not appeared yet.
If a future row is deleted, however, it might happen that the row turns up
after a subsequent sqlite3_step(), even after it has allegedly been deleted.
Or it might not. That behavior is undefined. The application can also INSERT
new rows into the table while the SELECT statement is running, but whether
or not the new rows appear in subsequent sqlite3_step()s of the query is
undefined. And the application can UPDATE the current row or any prior row,
though doing so might cause that row to reappear in a subsequent
sqlite3_step(). As long as the application is prepared to deal with these
ambiguities, the operations themselves are safe and will not harm the
database file.
This paragraph focuses on an example with a single table, perhaps because
it's easy to imagine iterating over this with a simple cursor.  I can also
imagine that more complex queries will involve more complex data structures
to manage the iteration within SQLite, and so have a more complex
relationship with what rows get returned.  Rather than trying to explain
(and then have to maintain compatibility) I fully understand the point of
"assume nothing" here.  Ultimately, I want to know enough to robustly deal
with these ambiguities as the last sentence suggests.

many thanks,
stephan();



_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



TMGID:S11411219126200001013
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to