Re: [sqlite] Query plan gone haywire lays waste to my library's performance
On Friday, 28 April, 2017 15:55, Simon Slavinwrote: > > The only difference is the explicit JOIN statement. I was under the > > impression that using this, vs. the way I wrote it, is a matter of taste > > that doesn’t affect the execution of the query. > SQLite computes two-table searches using nested loops. Providing the JOIN > you want tells SQLite which order you think the loops should be nested in. > Expressing the searches with JOINs is also helps me figure out what SQLite > is actually doing. Before I did that it wasn’t obvious to me how little > the table "docs" mattered for this query. NO IT DOES NOT. FROM JOIN < ON IS EXACTLY THE SAME AS FROM , WHERE THERE IS NOT DIFFERENCE WHATSOEVER. EVER. PERIOD. It is also exactly the same as FROM JOIN WHERE FROM JOIN WHERE FROM JOIN ON FROM , WHERE The *ONLY TIME EVER THAT "ON " is not IDENTICAL to leaving out the JOIN keyword and putting the in the WHERE clause is the case of an OUTER (LEFT) JOIN. In all other cases they are just different spellings of exactly the same thing. The other somewhat limited exception is the CROSS JOIN which specifies the visitation order where the LHS table is the outer loop and the RHS table is the inner loop, however, in that case the ON clause is identical to a WHERE clause and is nothing more than alternate spelling (just like we spell colour properly here, but the Yangs spell it color). ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query plan gone haywire lays waste to my library's performance
The syntactic sugar of the ON clause does nothing for equijoins. Only for outer joins. Please RTFM: See https://sqlite.org/queryplanner.html And https://sqlite.org/optoverview.html And https://sqlite.org/queryplanner-ng.html -- ˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı > -Original Message- > From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] > On Behalf Of Simon Slavin > Sent: Friday, 28 April, 2017 15:00 > To: SQLite mailing list > Subject: Re: [sqlite] Query plan gone haywire lays waste to my library's > performance > > On 28 Apr 2017, at 9:49pm, Jens Alfke <j...@mooseyard.com> wrote: > > > SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs, > docs WHERE sequence>? AND current!=0 AND deleted=0 AND revs.doc_id = > docs.doc_id ORDER BY revs.doc_id, deleted, revid DESC > > 0 0 0 SCAN TABLE revs USING INDEX revs_by_docid_revid > > 0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?) > > > > Another query gets it wrong too: > > > > SELECT sequence, revs.doc_id, docid, revid, deleted FROM revs, docs > WHERE sequence > ? AND current=1 AND revs.doc_id = docs.doc_id ORDER BY > revs.doc_id, deleted, revid DESC > > > > 0 0 0 SCAN TABLE revs USING COVERING INDEX revs_current > > 0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?) > > 0 0 0 USE TEMP B-TREE FOR RIGHT PART OF ORDER BY > > What indexes do you have on these two tables ? I can’t recommend one > without knowing which columns belong to which tables. > > Once you have the indexes you want, run ANALYZE. Or rather, make sure > that the customers’ database files have had ANALYZE run on them while they > contain typical data. > > First query rewritten for clarity: > > SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs, docs > WHERE sequence>? AND current!=0 AND deleted=0 > AND revs.doc_id = docs.doc_id > ORDER BY revs.doc_id, deleted, revid DESC > > Should be more like > > SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs > JOIN docs ON docs.doc_id = revs.doc_id > WHERE sequence>? AND current!=0 AND deleted=0 > ORDER BY revs.doc_id, deleted, revid DESC > > Second query should be more like: > > SELECT revs.doc_id, sequence, docid, revid, deleted FROM revs > JOIN docs ON revs.doc_id = docs.doc_id > WHERE sequence > ? AND current=1 > ORDER BY revs.doc_id, deleted, revid DESC > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query plan gone haywire lays waste to my library's performance
On 4/28/2017 4:37 PM, Jens Alfke wrote: CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, docid TEXT UNIQUE NOT NULL, expiry_timestamp INTEGER); CREATE INDEX docs_docid ON docs(docid); For the record, this index is redundant. There's already an automatically created index on docs(docid), thanks to UNIQUE clause. This might be what confuses SQLite. See whether the behavior changes if you drop it. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query plan gone haywire lays waste to my library's performance
On 28 Apr 2017, at 10:23pm, Jens Alfkewrote: > On Apr 28, 2017, at 2:00 PM, Simon Slavin wrote: > >> What indexes do you have on these two tables ? I can’t recommend one >> without knowing which columns belong to which tables. > > I showed them at the end of my first message. Sorry, I missed that. > CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, docid TEXT UNIQUE NOT NULL, > expiry_timestamp INTEGER); > CREATE INDEX docs_docid ON docs(docid); > CREATE TABLE revs (sequence INTEGER PRIMARY KEY AUTOINCREMENT, doc_id INTEGER > NOT NULL REFERENCES docs(doc_id) ON DELETE CASCADE, revid TEXT NOT NULL > COLLATE REVID, parent INTEGER REFERENCES revs(sequence) ON DELETE SET NULL, > current BOOLEAN,deleted BOOLEAN DEFAULT 0, json BLOB, > no_attachments BOOLEAN, doc_type TEXT, UNIQUE (doc_id, revid)); > CREATE INDEX revs_parent ON revs(parent); > CREATE INDEX revs_by_docid_revid ON revs(doc_id, revid desc, current, > deleted); > CREATE INDEX revs_current ON revs(doc_id, current desc, deleted, revid desc); This search SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs JOIN docs ON docs.doc_id = revs.doc_id WHERE revs.sequence>? AND revs.current!=0 AND revs.deleted=0 ORDER BY revs.doc_id, revs.deleted, revs.revid DESC can be improved if you know that revs.current is never negative. If that’s the case then rephrase it SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs JOIN docs ON docs.doc_id = revs.doc_id WHERE revs.sequence>? AND revs.current>0 AND revs.deleted=0 ORDER BY revs.doc_id, revs.deleted, revs.revid DESC Either way, the search could benefit from a better index on revs. Depending on your data it would be one of these four: CREATE INDEX revs_1 ON revs(deleted,current,sequence,doc_id,revid); CREATE INDEX revs_2 ON revs(deleted,current,doc_id,sequence,revid); CREATE INDEX revs_3 ON revs(deleted,sequence,current,doc_id,revid); CREATE INDEX revs_4 ON revs(deleted,sequence,doc_id,current,revid); Create those four in a database with typical data in it. Then do ANALYZE. Then run the query and see if time has improved. Once you see which index SQLite decided to use, you can delete the other ones. > The only difference is the explicit JOIN statement. I was under the > impression that using this, vs. the way I wrote it, is a matter of taste that > doesn’t affect the execution of the query. SQLite computes two-table searches using nested loops. Providing the JOIN you want tells SQLite which order you think the loops should be nested in. Expressing the searches with JOINs is also helps me figure out what SQLite is actually doing. Before I did that it wasn’t obvious to me how little the table "docs" mattered for this query. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query plan gone haywire lays waste to my library's performance
> On Apr 28, 2017, at 2:00 PM, Simon Slavinwrote: > > What indexes do you have on these two tables ? I can’t recommend one without > knowing which columns belong to which tables. I showed them at the end of my first message. > First query rewritten for clarity: > […] Should be more like > > SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs > JOIN docs ON docs.doc_id = revs.doc_id > WHERE sequence>? AND current!=0 AND deleted=0 > ORDER BY revs.doc_id, deleted, revid DESC The only difference is the explicit JOIN statement. I was under the impression that using this, vs. the way I wrote it, is a matter of taste that doesn’t affect the execution of the query. (I just tried changing some queries to use JOIN syntax and it didn’t affect their query plans.) —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query plan gone haywire lays waste to my library's performance
On 28 Apr 2017, at 9:49pm, Jens Alfkewrote: > SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs, docs > WHERE sequence>? AND current!=0 AND deleted=0 AND revs.doc_id = docs.doc_id > ORDER BY revs.doc_id, deleted, revid DESC > 0 0 0 SCAN TABLE revs USING INDEX revs_by_docid_revid > 0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?) > > Another query gets it wrong too: > > SELECT sequence, revs.doc_id, docid, revid, deleted FROM revs, docs WHERE > sequence > ? AND current=1 AND revs.doc_id = docs.doc_id ORDER BY > revs.doc_id, deleted, revid DESC > > 0 0 0 SCAN TABLE revs USING COVERING INDEX revs_current > 0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?) > 0 0 0 USE TEMP B-TREE FOR RIGHT PART OF ORDER BY What indexes do you have on these two tables ? I can’t recommend one without knowing which columns belong to which tables. Once you have the indexes you want, run ANALYZE. Or rather, make sure that the customers’ database files have had ANALYZE run on them while they contain typical data. First query rewritten for clarity: SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs, docs WHERE sequence>? AND current!=0 AND deleted=0 AND revs.doc_id = docs.doc_id ORDER BY revs.doc_id, deleted, revid DESC Should be more like SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs JOIN docs ON docs.doc_id = revs.doc_id WHERE sequence>? AND current!=0 AND deleted=0 ORDER BY revs.doc_id, deleted, revid DESC Second query should be more like: SELECT revs.doc_id, sequence, docid, revid, deleted FROM revs JOIN docs ON revs.doc_id = docs.doc_id WHERE sequence > ? AND current=1 ORDER BY revs.doc_id, deleted, revid DESC ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query plan gone haywire lays waste to my library's performance
Sorry for the quick follow-up, but I’ve just run a quick test that does an EXPLAIN QUERY PLAN for every statement that gets compiled, in an empty database, and confirmed that the query plan comes out wrong right from the start: SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs, docs WHERE sequence>? AND current!=0 AND deleted=0 AND revs.doc_id = docs.doc_id ORDER BY revs.doc_id, deleted, revid DESC 0 0 0 SCAN TABLE revs USING INDEX revs_by_docid_revid 0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?) Another query gets it wrong too: SELECT sequence, revs.doc_id, docid, revid, deleted FROM revs, docs WHERE sequence > ? AND current=1 AND revs.doc_id = docs.doc_id ORDER BY revs.doc_id, deleted, revid DESC 0 0 0 SCAN TABLE revs USING COVERING INDEX revs_current 0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?) 0 0 0 USE TEMP B-TREE FOR RIGHT PART OF ORDER BY This is with SQLite 3.16.0. —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Query plan gone haywire lays waste to my library's performance
I’ve got an urgent-to-me issue wherein a customer says that our library has slowed down by several orders of magnitude in iOS 10.3 compared to iOS 9. I ran a test case they provided and found that some queries that should be fast are taking a very long time (~500ms) and generating huge numbers of reads. Using EXPLAIN QUERY PLAN I found that the optimizer has generated a horrifically bad plan, essentially an inside-out join, that results in linear scans of two large tables. (Presumably this is due to some change in the optimizer between the older and newer versions of SQLite in iOS.) Now I need to figure out how to fix this ASAP. Our library periodically runs ANALYZE, so sqlite might not even start out using the bad query plan; it might happen only with large databases or with some specific data sets. The query, with some extraneous details removed, looks like: SELECT docs.doc_id, revs.sequence, docs.docid, rev.revid, rev.json, […] FROM revs, docs WHERE revs.sequence>? AND revs.current!=0 AND revs.doc_id = docs.doc_id ORDER BY revs.doc_id, revs.deleted, revs.revid DESC The “revs” table has “sequence” as its integer primary key, and a foreign key “doc_id” referencing the “docs” table. (Plus many other columns.) The “docs” table has “doc_id” as its primary key and “docid” as a text column. Basically this query is meant to find the latest revs, including their “docid” strings joined from the “docs” table. It should just be using the “revs” table’s primary index starting from the “?” parameter, then looking up the corresponding doc for each revision. Instead it does this: selectid = 0 order = 0 from = 1 detail = SCAN TABLE docs USING COVERING INDEX docs_docid selectid = 0 order = 1 from = 0 detail = SEARCH TABLE revs USING INDEX revs_current (doc_id=?) selectid = 0 order = 0 from = 0 detail = USE TEMP B-TREE FOR ORDER BY It’s scanning through all 50,000+ rows in “docs”, and then for each document looking up all the rows in “revs” that reference it (“revs” has 175,000 rows), and comparing their sequences to the parameter. (I’ve pasted the relevant parts of the db schema at the bottom of this email if you want to refer to them.) My first priority is to find some way to restate this query such that it will always use the revs.sequence primary key index. A lower priority is to know whether there’s a valid reason the query planner came up with this, or if it’s a bug in the optimizer or the ANALYZE command? In the planner’s defense, it probably doesn’t know that this query is almost always run with the starting sequence (“?”) relatively close to the maximum sequence. If low numbers were frequently given for “?”, it’s possible that its current plan would be more efficient… Thanks, —Jens CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, docid TEXT UNIQUE NOT NULL, expiry_timestamp INTEGER); CREATE INDEX docs_docid ON docs(docid); CREATE TABLE revs (sequence INTEGER PRIMARY KEY AUTOINCREMENT, doc_id INTEGER NOT NULL REFERENCES docs(doc_id) ON DELETE CASCADE, revid TEXT NOT NULL COLLATE REVID, parent INTEGER REFERENCES revs(sequence) ON DELETE SET NULL, current BOOLEAN,deleted BOOLEAN DEFAULT 0, json BLOB, no_attachments BOOLEAN, doc_type TEXT, UNIQUE (doc_id, revid)); CREATE INDEX revs_parent ON revs(parent); CREATE INDEX revs_by_docid_revid ON revs(doc_id, revid desc, current, deleted); CREATE INDEX revs_current ON revs(doc_id, current desc, deleted, revid desc); ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users