Hello all; I have a scenario that is well covered by the text at https://www.sqlite.org/isolation.html in the section "No Isolation Between Operations On The Same Database Connection" (relevant paragraphs at the end for convenience), but I am hoping for a bit more clarity. Here's an overview of my scenario. I'm trying to find the right level of detail; if I've erred on the side of too little, I'm happy to provide more.
I have a non-trivial SELECT simulating an outer join (http://stackoverflow.com/questions/1923259/full-outer-join-with-sqlite) with the overall structure:

    SELECT columns FROM t1 LEFT JOIN t2 ON t1.column = t2.column
    WHERE conditions
    UNION ALL
    SELECT columns FROM t2 LEFT JOIN t1 ON t2.column = t1.column
    WHERE t1.column IS NULL AND more_conditions
    ORDER BY column

The application fetches a row at a time, doing an UPDATE or INSERT or DELETE on t1 (henceforth referred to collectively as 'updates') for each row fetched. I _want_ the semantics to be that the SELECT is unperturbed by my updates.

(Details: there are actually three tables. The joined-on column is UNIQUE (indeed, the PRIMARY KEY) in each table. The query conditions are such that the column is UNIQUE in the result set when no updates are made while iterating.)

I understand that one way to ensure "SELECT is unperturbed" semantics is to use separate connections for the SELECT and the updates. This is undesirable since (IIUC) I'd have to do all my updates (possibly millions) in a single transaction. I'd prefer to commit after each update (to avoid redoing work in case of an unnatural program abort, for example). The compromise for performance is to commit every N updates (N = 4096, currently), but holding potentially millions of updates in one transaction seems right out.

Another way is to fetch all the results into a temporary table tt, then iterate over tt while modifying t1. This is easy to code, and SQLite can fill tt with a single INSERT INTO tt SELECT <as above>. With potentially millions of rows, however, I'd rather have an approach that doesn't spoil the interactive user experience by adding seconds spent creating the temporary table, if that can be avoided.
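For concreteness, here is a toy sketch of the simulated FULL OUTER JOIN, including the temporary-table variant. (Python's stdlib sqlite3 rather than the Perl I'm actually using, and the table/column names are my own illustrative stand-ins, not the real schema.)

```python
import sqlite3

# Toy schema standing in for the real one (my names, not the actual tables):
# k is the joined-on PRIMARY KEY column in each table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (k TEXT PRIMARY KEY, a TEXT);
    CREATE TABLE t2 (k TEXT PRIMARY KEY, b TEXT);
    INSERT INTO t1 VALUES ('alpha', 'a1'), ('beta', 'a2');
    INSERT INTO t2 VALUES ('beta',  'b1'), ('gamma', 'b2');
""")

# FULL OUTER JOIN simulated with LEFT JOIN + UNION ALL, as in the
# Stack Overflow answer linked above.
full_outer = """
    SELECT t1.k AS k, t1.a AS a, t2.b AS b
      FROM t1 LEFT JOIN t2 ON t1.k = t2.k
    UNION ALL
    SELECT t2.k AS k, t1.a AS a, t2.b AS b
      FROM t2 LEFT JOIN t1 ON t2.k = t1.k
     WHERE t1.k IS NULL
    ORDER BY k
"""
rows = con.execute(full_outer).fetchall()
print(rows)
# [('alpha', 'a1', None), ('beta', 'a2', 'b1'), ('gamma', None, 'b2')]

# The "fetch everything first" workaround: a single statement fills tt,
# after which iterating over tt is isolated from changes to t1.
con.execute("CREATE TEMP TABLE tt (k TEXT, a TEXT, b TEXT)")
con.execute("INSERT INTO tt " + full_outer)
```

The temporary-table version is what I'd like to avoid for very large result sets, per the above.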
I'm thus led to the following notion (apologies for my pseudo-code, which is neither SQL, nor C, nor the Perl I'm actually coding in):

    prepare SELECT from above
    execute()
    $last = "";
    while (fetchnext) {
        if ($fetched.column <= $last)
            (1) skip to next iteration of loop
        (2) $last = $fetched.column
        (3) update-or-insert-or-delete
    }

That is, if I ever receive a record that ought to have arrived earlier given the ORDER BY, it must be a since-the-SELECT-began update, and should be ignored.

I'm trying not to think in terms of potential implementation, and to stick with "what must ORDER BY guarantee?" I'd imagine that if the query ever returns a recent update, it would do so at the proper point in the sort order. If it's decided that a recent update alphabetically prior to where we have been returning records must be returned, it would be returned as soon as possible -- as if the ORDER BY guarantee were "all remaining records are sorted, and the lowest of those is returned next".

Is my thinking sound? That is, are the numbered lines in my pseudo-code necessary and/or sufficient to achieve the goal of ignoring updates in this case, in a righteously precise use of SQL? Thanks in advance for your thoughts!

Here are selections from the relevant SQLite documentation paragraphs to provide context, with my comments/queries interspersed.

No Isolation Between Operations On The Same Database Connection

... What if a SELECT statement is started and the sqlite3_step() <https://www.sqlite.org/c3ref/step.html> interface steps through roughly half of its output, then some UPDATE <https://www.sqlite.org/lang_update.html> statements are run by the application that modify the table that the SELECT statement is reading, then more calls to sqlite3_step() <https://www.sqlite.org/c3ref/step.html> are made to finish out the SELECT statement? Will the later steps of the SELECT statement see the changes made by the UPDATE or not? The answer is that this behavior is undefined.
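Before going further into the documentation, here is my loop made concrete (again a sketch in Python's stdlib sqlite3 rather than my Perl; the table, column, and key names are illustrative, and it assumes non-empty string sort keys):

```python
import sqlite3

# Single connection in autocommit mode, so every update is durable at once
# (the "commit after each update" behaviour described above).
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE t1 (k TEXT PRIMARY KEY, a TEXT)")
con.executemany("INSERT INTO t1 VALUES (?, ?)",
                [("alpha", "old"), ("beta", "old"), ("gamma", "old")])

processed = []
last = ""                      # sorts before any non-empty key
for (k,) in con.execute("SELECT k FROM t1 ORDER BY k"):
    if k <= last:              # (1) arrived "too late": must be our own update
        continue
    last = k                   # (2) advance the high-water mark
    processed.append(k)
    # (3) the update itself, on the same connection as the running SELECT
    con.execute("UPDATE t1 SET a = 'new' WHERE k = ?", (k,))

print(processed)
```

Note that whether guard (1) ever fires is exactly the undefined behaviour I'm asking about: it only matters if SQLite chooses to hand back a row affected by our own updates, which is not something the sketch can rely on either way.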
In particular, whether or not the SELECT statement sees the concurrent changes depends on which release of SQLite is running, the schema of the database file, whether or not ANALYZE <https://www.sqlite.org/lang_analyze.html> has been run, and the details of the query. In some cases, it might depend on the content of the database file, too. There is no good way to know whether or not a SELECT statement will see changes that were made to the database by the same database connection after the SELECT statement was started. And hence, developers should diligently avoid writing applications that make assumptions about what will occur in that circumstance.

When talking about the C language specification, it's traditional to claim that "this behaviour is undefined" means "the C compiler is free to format your hard drive". In this context, I hope it means "it is not specified whether later steps of the SELECT statement will see the changes made by the UPDATE, but the database is not at risk of corruption due to updates-while-iterating". The text below suggests my hopeful interpretation is correct.

Formalizing my question above: given that I would retrieve records 1..N when iterating over the SELECT results while making no modifications, is it reasonable to assume that, when updating previously-returned records while iterating, I am still guaranteed to receive at least those same records 1..N, in the same order (given the ORDER BY clause in my query)? Possibly (but not necessarily) with additional records interspersed? And would any interspersed records also need to respect the ORDER BY clause -- meaning "in sorted order" for records higher than the most recently returned, and "as soon as possible" for records lower than or equal to the most recently returned?

If an application issues a SELECT statement on a single table like "SELECT rowid, * FROM table WHERE ..."
and starts stepping through the output of that statement using sqlite3_step() <https://www.sqlite.org/c3ref/step.html> and examining each row, then it is safe for the application to delete the current row or any prior row using "DELETE FROM table WHERE rowid=?". It is also safe (in the sense that it will not harm the database) for the application to delete a row that is expected to appear later in the query but has not appeared yet. If a future row is deleted, however, it might happen that the row turns up after a subsequent sqlite3_step(), even after it has allegedly been deleted. Or it might not. That behavior is undefined. The application can also INSERT new rows into the table while the SELECT statement is running, but whether or not the new rows appear in subsequent sqlite3_step()s of the query is undefined. And the application can UPDATE the current row or any prior row, though doing so might cause that row to reappear in a subsequent sqlite3_step(). As long as the application is prepared to deal with these ambiguities, the operations themselves are safe and will not harm the database file.

This paragraph focuses on an example with a single table, perhaps because it's easy to imagine iterating over it with a simple cursor. I can also imagine that more complex queries involve more complex data structures to manage the iteration within SQLite, and so have a more complex relationship with which rows get returned. Rather than the documentation trying to explain that (and then having to maintain compatibility with the explanation), I fully understand the point of "assume nothing" here. Ultimately, I want to know enough to robustly deal with these ambiguities, as the last sentence suggests.

many thanks,

stephan();

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users