Re: [sqlite] Query plan gone haywire lays waste to my library's performance

2017-04-28 Thread Keith Medcalf
On Friday, 28 April, 2017 15:55, Simon Slavin  wrote:

> > The only difference is the explicit JOIN statement. I was under the
> > impression that using this, vs. the way I wrote it, is a matter of taste
> > that doesn’t affect the execution of the query.
 
> SQLite computes two-table searches using nested loops.  Providing the JOIN
> you want tells SQLite which order you think the loops should be nested in.
> Expressing the searches with JOINs is also helps me figure out what SQLite
> is actually doing.  Before I did that it wasn’t obvious to me how little
> the table "docs" mattered for this query.

NO IT DOES NOT.

FROM  JOIN < ON 

IS EXACTLY THE SAME AS

FROM ,  WHERE 

THERE IS NOT DIFFERENCE WHATSOEVER.  EVER.  PERIOD.

It is also exactly the same as 

FROM  JOIN  WHERE 
FROM  JOIN  WHERE 
FROM  JOIN  ON 
FROM ,  WHERE 

The *ONLY TIME EVER THAT "ON " is not IDENTICAL to leaving out the 
JOIN keyword and putting the  in the WHERE clause is the case of an 
OUTER (LEFT) JOIN.  In all other cases they are just different spellings of 
exactly the same thing.  The other somewhat limited exception is the CROSS JOIN 
which specifies the visitation order where the LHS table is the outer loop and 
the RHS table is the inner loop, however, in that case the ON clause is 
identical to a WHERE clause and is nothing more than alternate spelling (just 
like we spell colour properly here, but the Yangs spell it color).





___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query plan gone haywire lays waste to my library's performance

2017-04-28 Thread Keith Medcalf

The syntactic sugar of the ON clause does nothing for equijoins.  Only for 
outer joins.  Please RTFM:


See https://sqlite.org/queryplanner.html
And https://sqlite.org/optoverview.html
And https://sqlite.org/queryplanner-ng.html


-- 
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Simon Slavin
> Sent: Friday, 28 April, 2017 15:00
> To: SQLite mailing list
> Subject: Re: [sqlite] Query plan gone haywire lays waste to my library's
> performance
> 
> On 28 Apr 2017, at 9:49pm, Jens Alfke <j...@mooseyard.com> wrote:
> 
> > SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs,
> docs WHERE sequence>? AND current!=0 AND deleted=0 AND revs.doc_id =
> docs.doc_id ORDER BY revs.doc_id, deleted, revid DESC
> > 0 0 0 SCAN TABLE revs USING INDEX revs_by_docid_revid
> > 0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?)
> >
> > Another query gets it wrong too:
> >
> > SELECT sequence, revs.doc_id, docid, revid, deleted  FROM revs, docs
> WHERE sequence > ? AND current=1 AND revs.doc_id = docs.doc_id ORDER BY
> revs.doc_id, deleted, revid DESC
> >
> > 0 0 0 SCAN TABLE revs USING COVERING INDEX revs_current
> > 0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?)
> > 0 0 0 USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
> 
> What indexes do you have on these two tables ?  I can’t recommend one
> without knowing which columns belong to which tables.
> 
> Once you have the indexes you want, run ANALYZE.  Or rather, make sure
> that the customers’ database files have had ANALYZE run on them while they
> contain typical data.
> 
> First query rewritten for clarity:
> 
> SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs, docs
>   WHERE sequence>? AND current!=0 AND deleted=0
>   AND revs.doc_id = docs.doc_id
>   ORDER BY revs.doc_id, deleted, revid DESC
> 
> Should be more like
> 
> SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs
>   JOIN docs ON docs.doc_id = revs.doc_id
>   WHERE sequence>? AND current!=0 AND deleted=0
>   ORDER BY revs.doc_id, deleted, revid DESC
> 
> Second query should be more like:
> 
> SELECT revs.doc_id, sequence, docid, revid, deleted FROM revs
>   JOIN docs ON revs.doc_id = docs.doc_id
>   WHERE sequence > ? AND current=1
>   ORDER BY revs.doc_id, deleted, revid DESC
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query plan gone haywire lays waste to my library's performance

2017-04-28 Thread Igor Tandetnik

On 4/28/2017 4:37 PM, Jens Alfke wrote:

CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, docid TEXT UNIQUE NOT NULL, 
expiry_timestamp INTEGER);
CREATE INDEX docs_docid ON docs(docid);


For the record, this index is redundant. There's already an 
automatically created index on docs(docid), thanks to UNIQUE clause. 
This might be what confuses SQLite. See whether the behavior changes if 
you drop it.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query plan gone haywire lays waste to my library's performance

2017-04-28 Thread Simon Slavin

On 28 Apr 2017, at 10:23pm, Jens Alfke  wrote:

> On Apr 28, 2017, at 2:00 PM, Simon Slavin  wrote:
> 
>> What indexes do you have on these two tables ?  I can’t recommend one 
>> without knowing which columns belong to which tables.
> 
> I showed them at the end of my first message.

Sorry, I missed that.

> CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, docid TEXT UNIQUE NOT NULL, 
> expiry_timestamp INTEGER);
> CREATE INDEX docs_docid ON docs(docid);

> CREATE TABLE revs (sequence INTEGER PRIMARY KEY AUTOINCREMENT, doc_id INTEGER 
> NOT NULL REFERENCES docs(doc_id) ON DELETE CASCADE, revid TEXT NOT NULL 
> COLLATE REVID, parent INTEGER REFERENCES revs(sequence) ON DELETE SET NULL,  
> current BOOLEAN,deleted BOOLEAN DEFAULT 0, json BLOB, 
> no_attachments BOOLEAN, doc_type TEXT, UNIQUE (doc_id, revid));
> CREATE INDEX revs_parent ON revs(parent);
> CREATE INDEX revs_by_docid_revid ON revs(doc_id, revid desc, current, 
> deleted);
> CREATE INDEX revs_current ON revs(doc_id, current desc, deleted, revid desc);

This search

SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs
JOIN docs ON docs.doc_id = revs.doc_id
WHERE revs.sequence>? AND revs.current!=0 AND revs.deleted=0
ORDER BY revs.doc_id, revs.deleted, revs.revid DESC

can be improved if you know that revs.current is never negative.  If that’s the 
case then rephrase it

SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs
JOIN docs ON docs.doc_id = revs.doc_id
WHERE revs.sequence>? AND revs.current>0 AND revs.deleted=0
ORDER BY revs.doc_id, revs.deleted, revs.revid DESC

Either way, the search could benefit from a better index on revs.  Depending on 
your data it would be one of these four:

CREATE INDEX revs_1 ON revs(deleted,current,sequence,doc_id,revid);
CREATE INDEX revs_2 ON revs(deleted,current,doc_id,sequence,revid);
CREATE INDEX revs_3 ON revs(deleted,sequence,current,doc_id,revid);
CREATE INDEX revs_4 ON revs(deleted,sequence,doc_id,current,revid);

Create those four in a database with typical data in it.  Then do ANALYZE.  
Then run the query and see if time has improved.  Once you see which index 
SQLite decided to use, you can delete the other ones.

> The only difference is the explicit JOIN statement. I was under the 
> impression that using this, vs. the way I wrote it, is a matter of taste that 
> doesn’t affect the execution of the query.


SQLite computes two-table searches using nested loops.  Providing the JOIN you 
want tells SQLite which order you think the loops should be nested in.  
Expressing the searches with JOINs is also helps me figure out what SQLite is 
actually doing.  Before I did that it wasn’t obvious to me how little the table 
"docs" mattered for this query.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query plan gone haywire lays waste to my library's performance

2017-04-28 Thread Jens Alfke

> On Apr 28, 2017, at 2:00 PM, Simon Slavin  wrote:
> 
> What indexes do you have on these two tables ?  I can’t recommend one without 
> knowing which columns belong to which tables.

I showed them at the end of my first message.

> First query rewritten for clarity:
> […] Should be more like
> 
> SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs
>   JOIN docs ON docs.doc_id = revs.doc_id
>   WHERE sequence>? AND current!=0 AND deleted=0
>   ORDER BY revs.doc_id, deleted, revid DESC

The only difference is the explicit JOIN statement. I was under the impression 
that using this, vs. the way I wrote it, is a matter of taste that doesn’t 
affect the execution of the query. (I just tried changing some queries to use 
JOIN syntax and it didn’t affect their query plans.)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query plan gone haywire lays waste to my library's performance

2017-04-28 Thread Simon Slavin
On 28 Apr 2017, at 9:49pm, Jens Alfke  wrote:

> SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs, docs 
> WHERE sequence>? AND current!=0 AND deleted=0 AND revs.doc_id = docs.doc_id 
> ORDER BY revs.doc_id, deleted, revid DESC
> 0 0 0 SCAN TABLE revs USING INDEX revs_by_docid_revid
> 0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?)
> 
> Another query gets it wrong too:
> 
> SELECT sequence, revs.doc_id, docid, revid, deleted  FROM revs, docs WHERE 
> sequence > ? AND current=1 AND revs.doc_id = docs.doc_id ORDER BY 
> revs.doc_id, deleted, revid DESC
> 
> 0 0 0 SCAN TABLE revs USING COVERING INDEX revs_current
> 0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?)
> 0 0 0 USE TEMP B-TREE FOR RIGHT PART OF ORDER BY

What indexes do you have on these two tables ?  I can’t recommend one without 
knowing which columns belong to which tables.

Once you have the indexes you want, run ANALYZE.  Or rather, make sure that the 
customers’ database files have had ANALYZE run on them while they contain 
typical data.

First query rewritten for clarity:

SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs, docs
WHERE sequence>? AND current!=0 AND deleted=0
AND revs.doc_id = docs.doc_id
ORDER BY revs.doc_id, deleted, revid DESC

Should be more like

SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs
JOIN docs ON docs.doc_id = revs.doc_id
WHERE sequence>? AND current!=0 AND deleted=0
ORDER BY revs.doc_id, deleted, revid DESC

Second query should be more like:

SELECT revs.doc_id, sequence, docid, revid, deleted FROM revs
JOIN docs ON revs.doc_id = docs.doc_id
WHERE sequence > ? AND current=1
ORDER BY revs.doc_id, deleted, revid DESC
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query plan gone haywire lays waste to my library's performance

2017-04-28 Thread Jens Alfke
Sorry for the quick follow-up, but I’ve just run a quick test that does an 
EXPLAIN QUERY PLAN for every statement that gets compiled, in an empty 
database, and confirmed that the query plan comes out wrong right from the 
start:

SELECT revs.doc_id, sequence, docid, revid, json, deleted FROM revs, docs WHERE 
sequence>? AND current!=0 AND deleted=0 AND revs.doc_id = docs.doc_id ORDER BY 
revs.doc_id, deleted, revid DESC
0 0 0 SCAN TABLE revs USING INDEX revs_by_docid_revid
0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?)

Another query gets it wrong too:

SELECT sequence, revs.doc_id, docid, revid, deleted  FROM revs, docs WHERE 
sequence > ? AND current=1 AND revs.doc_id = docs.doc_id ORDER BY revs.doc_id, 
deleted, revid DESC

0 0 0 SCAN TABLE revs USING COVERING INDEX revs_current
0 1 1 SEARCH TABLE docs USING INTEGER PRIMARY KEY (rowid=?)
0 0 0 USE TEMP B-TREE FOR RIGHT PART OF ORDER BY

This is with SQLite 3.16.0.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Query plan gone haywire lays waste to my library's performance

2017-04-28 Thread Jens Alfke
I’ve got an urgent-to-me issue wherein a customer says that our library has 
slowed down by several orders of magnitude in iOS 10.3 compared to iOS 9. I ran 
a test case they provided and found that some queries that should be fast are 
taking a very long time (~500ms) and generating huge numbers of reads. Using 
EXPLAIN QUERY PLAN I found that the optimizer has generated a horrifically bad 
plan, essentially an inside-out join, that results in linear scans of two large 
tables. (Presumably this is due to some change in the optimizer between the 
older and newer versions of SQLite in iOS.)

Now I need to figure out how to fix this ASAP. Our library periodically runs 
ANALYZE, so sqlite might not even start out using the bad query plan; it might 
happen only with large databases or with some specific data sets.

The query, with some extraneous details removed, looks like:
SELECT docs.doc_id, revs.sequence, docs.docid, rev.revid, rev.json, […] 
FROM revs, docs 
WHERE revs.sequence>? AND revs.current!=0 AND revs.doc_id = docs.doc_id 
ORDER BY revs.doc_id, revs.deleted, revs.revid DESC

The “revs” table has “sequence” as its integer primary key, and a foreign key 
“doc_id” referencing the “docs” table. (Plus many other columns.)
The “docs” table has “doc_id” as its primary key and “docid” as a text column.

Basically this query is meant to find the latest revs, including their “docid” 
strings joined from the “docs” table. It should just be using the “revs” 
table’s primary index starting from the “?” parameter, then looking up the 
corresponding doc for each revision. Instead it does this:

selectid = 0
   order = 0
from = 1
  detail = SCAN TABLE docs USING COVERING INDEX docs_docid

selectid = 0
   order = 1
from = 0
  detail = SEARCH TABLE revs USING INDEX revs_current (doc_id=?)

selectid = 0
   order = 0
from = 0
  detail = USE TEMP B-TREE FOR ORDER BY

It’s scanning through all 50,000+ rows in “docs”, and then for each document 
looking up all the rows in “revs” that reference it (“revs” has 175,000 rows), 
and comparing their sequences to the parameter.

(I’ve pasted the relevant parts of the db schema at the bottom of this email if 
you want to refer to them.)

My first priority is to find some way to restate this query such that it will 
always use the revs.sequence primary key index.

A lower priority is to know whether there’s a valid reason the query planner 
came up with this, or if it’s a bug in the optimizer or the ANALYZE command? In 
the planner’s defense, it probably doesn’t know that this query is almost 
always run with the starting sequence (“?”) relatively close to the maximum 
sequence.  If low numbers were frequently given for “?”, it’s possible that its 
current plan would be more efficient…

Thanks,

—Jens

CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, docid TEXT UNIQUE NOT NULL, 
expiry_timestamp INTEGER);
CREATE INDEX docs_docid ON docs(docid);
CREATE TABLE revs (sequence INTEGER PRIMARY KEY AUTOINCREMENT, doc_id INTEGER 
NOT NULL REFERENCES docs(doc_id) ON DELETE CASCADE, revid TEXT NOT NULL COLLATE 
REVID, parent INTEGER REFERENCES revs(sequence) ON DELETE SET NULL,  current 
BOOLEAN,deleted BOOLEAN DEFAULT 0, json BLOB, no_attachments 
BOOLEAN, doc_type TEXT, UNIQUE (doc_id, revid));
CREATE INDEX revs_parent ON revs(parent);
CREATE INDEX revs_by_docid_revid ON revs(doc_id, revid desc, current, deleted);
CREATE INDEX revs_current ON revs(doc_id, current desc, deleted, revid desc);

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users