Re: [sqlite] use of index in ORDER BY clause
Am 29.06.2006 um 17:17 schrieb Dennis Cote: Jens Miltner wrote: Is there any way to improve the ORDER BY performance for joined queries? From your answer that the intermediate results are sorted, I take it no index won't ever be used when using ORDER BY with a join query? Is there a way to rewrite the queries so we don't take the penalty of sorting without an index? In our case, the 'bar' table may have several 100,000 entries and the 'foo' table probably has much less (say order of thousand). A minimal query (i.e. one where we don't return a minimal amount of data, so the pure data shuffling doesn't get in the way) with a 'bar' table with 250,000 entries and a 'foo' table with around 10 entries (so the 250,000 entries in the 'bar' table relate to only a few records in the 'foo' table), a query like the above takes 10-20 minutes to run (depending on the ORDER BY clauses used)! (Side-note: The table does have quite a few columns and apparently, the amount of data per row also massively affects the performance... when I tried with the same table with not all columns filled in - i.e. less data - performance was much better. I also tried increasing the page size, hoping that less paging would have happen, but this didn't really make a noticeable difference) I would really appreciate any hints on how to improve performance with this kind of setup... Jens, If you create an index on your bar.something column, it will be used to do the ordering. The log below shows how this query will be executed before and after creating this index. Doh! You're right... I could have sworn when I tried, it didn't show an index on bar(something) to be used... I probably confuse this with our real schema & query, which is slightly more complex and which I couldn't get to use an index on the ORDER BY column(s)... Maybe it's because there sometimes are more than one sort column, even from distinct tables. I guess in that case, I'm probably stuck without an index? Thanks for your help so far,
Re: [sqlite] use of index in ORDER BY clause
Jens Miltner wrote: Is there any way to improve the ORDER BY performance for joined queries? From your answer that the intermediate results are sorted, I take it no index won't ever be used when using ORDER BY with a join query? Is there a way to rewrite the queries so we don't take the penalty of sorting without an index? In our case, the 'bar' table may have several 100,000 entries and the 'foo' table probably has much less (say order of thousand). A minimal query (i.e. one where we don't return a minimal amount of data, so the pure data shuffling doesn't get in the way) with a 'bar' table with 250,000 entries and a 'foo' table with around 10 entries (so the 250,000 entries in the 'bar' table relate to only a few records in the 'foo' table), a query like the above takes 10-20 minutes to run (depending on the ORDER BY clauses used)! (Side-note: The table does have quite a few columns and apparently, the amount of data per row also massively affects the performance... when I tried with the same table with not all columns filled in - i.e. less data - performance was much better. I also tried increasing the page size, hoping that less paging would have happen, but this didn't really make a noticeable difference) I would really appreciate any hints on how to improve performance with this kind of setup... Jens, If you create an index on your bar.something column, it will be used to do the ordering. The log below shows how this query will be executed before and after creating this index. SQLite version 3.3.5 Enter ".help" for instructions sqlite> CREATE TABLE foo (id integer primary key, name text); sqlite> CREATE TABLE bar (id integer primary key, foo_id integer, something text ); sqlite> sqlite> CREATE INDEX bar_idx on bar(foo_id, something); sqlite> sqlite> .explain on sqlite> explain SELECT ...> foo.name AS name, ...> bar.something AS something ...> FROM ...> bar ...> LEFT JOIN ...> foo ON foo.id=bar.foo_id ...> ORDER BY ...> bar.something ...> ; addr opcode p1 p2 p3 -- -- -- - 0 OpenVirtual 2 3 keyinfo(1,BINARY) 1 Goto0 40 2 Integer 0 0 3 OpenRead0 3 4 SetNumColumns 0 3 5 Integer 0 0 6 OpenRead1 2 7 SetNumColumns 1 2 8 Rewind 0 26 9 MemInt 0 1 10Column 0 1 11MustBeInt 1 22 12NotExists 1 22 13MemInt 1 1 14Column 1 1 15Column 0 2 16MakeRecord 2 0 17Column 0 2 18Sequence2 0 19Pull2 0 20MakeRecord 3 0 21IdxInsert 2 0 22IfMemPos1 25 23NullRow 1 0 24Goto0 13 25Next0 9 26Close 0 0 27Close 1 0 28OpenPseudo 3 0 29SetNumColumns 3 2 30Sort2 38 31Integer 1 0 32Column 2 2 33Insert 3 0 34Column 3 0 35Column 3 1 36Callback2 0 37Next2 31 38Close 3 0 39Halt0 0 40Transaction 0 0 41VerifyCookie0 3 42Goto0 2 43Noop0 0 sqlite> .explain off sqlite> explain query plan SELECT ...> foo.name AS name, ...> bar.something AS something ...> FROM ...> bar ...> LEFT JOIN ...> foo ON foo.id=bar.foo_id ...> ORDER BY ...> bar.something ...> ; 0|0|TABLE bar 1|1|TABLE foo USING PRIMARY KEY sqlite> sqlite> create index bar_some on bar(something); sqlite> sqlite> .explain on sqlite> explain SELECT ...> foo.name AS name, ...> bar.something AS something ...> FROM ...> bar ...> LEFT JOIN ...> foo ON foo.id=bar.foo_id ...> ORDER BY ...> bar.something ...> ; addr opcode p1 p2 p3 -- -- -- - 0 Noop0 0 1 Goto0 31 2 Integer 0 0 3 OpenRead0
Re: [sqlite] use of index in ORDER BY clause
On 6/29/06, Jens Miltner <[EMAIL PROTECTED]> wrote: Is there any way to improve the ORDER BY performance for joined queries? From your answer that the intermediate results are sorted, I take it no index won't ever be used when using ORDER BY with a join query? You can use the explain command to optimize queries: http://sqlite.org/lang_explain.html You might read through the optimizer notes to see how it works rather than trust to my poor memory: http://sqlite.org/optoverview.html -- SqliteImporter and SqliteReplicator: Command line utilities for Sqlite http://www.reddawn.net/~jsprenkl/Sqlite Cthulhu Bucks! http://www.cthulhubucks.com
Re: [sqlite] use of index in ORDER BY clause
Am 27.06.2006 um 18:06 schrieb Dennis Cote: Jens Miltner wrote: I have a schema similar to this: CREATE TABLE foo (id integer primary key, name text); CREATE TABLE bar (id integer primary key, foo_id integer, something text); CREATE INDEX bar_idx on bar(foo_id, something); When I run a query like SELECT foo.id AS foo_id, bar.id AS bar_id FROM bar LEFT JOIN foo ON foo.id=bar.foo_id ORDER BY bar.something ; sqlite will only use the primary key index of foo (as the output of 'explain query plan' shows: 0|0|TABLE bar 1|1|TABLE foo USING PRIMARY KEY ) If I try to disable the foo primary key index by prefixing foo.id with a '+' sign, no index will be used. I'd have expected the bar_idx index to be used for the ORDER BY clause? Or is this some unreasonable assumption? Is there a way to enforce this? Thanks, Jens, Your query is not using any index to do the order by clause, it is sorting the intermediate results. There is no need to join the foo table in your query since the foo_id is available in the bar table. The following is equivalent: SELECT foo_id, bar.id AS bar_id FROM bar ORDER BY bar.something ; You're right, of course, but then again, this was just a (bad) example of what our table relations are... In reality, our tables are far larger and we do indeed need to join them, because not all information is available from the base table: SELECT foo.name AS name, bar.something AS something FROM bar LEFT JOIN foo ON foo.id=bar.foo_id ORDER BY bar.something ; Is there any way to improve the ORDER BY performance for joined queries? From your answer that the intermediate results are sorted, I take it no index won't ever be used when using ORDER BY with a join query? Is there a way to rewrite the queries so we don't take the penalty of sorting without an index? In our case, the 'bar' table may have several 100,000 entries and the 'foo' table probably has much less (say order of thousand). A minimal query (i.e. one where we don't return a minimal amount of data, so the pure data shuffling doesn't get in the way) with a 'bar' table with 250,000 entries and a 'foo' table with around 10 entries (so the 250,000 entries in the 'bar' table relate to only a few records in the 'foo' table), a query like the above takes 10-20 minutes to run (depending on the ORDER BY clauses used)! (Side-note: The table does have quite a few columns and apparently, the amount of data per row also massively affects the performance... when I tried with the same table with not all columns filled in - i.e. less data - performance was much better. I also tried increasing the page size, hoping that less paging would have happen, but this didn't really make a noticeable difference) I would really appreciate any hints on how to improve performance with this kind of setup... Thanks,
Re: [sqlite] use of index in ORDER BY clause
Jens Miltner wrote: I have a schema similar to this: CREATE TABLE foo (id integer primary key, name text); CREATE TABLE bar (id integer primary key, foo_id integer, something text); CREATE INDEX bar_idx on bar(foo_id, something); When I run a query like SELECT foo.id AS foo_id, bar.id AS bar_id FROM bar LEFT JOIN foo ON foo.id=bar.foo_id ORDER BY bar.something ; sqlite will only use the primary key index of foo (as the output of 'explain query plan' shows: 0|0|TABLE bar 1|1|TABLE foo USING PRIMARY KEY ) If I try to disable the foo primary key index by prefixing foo.id with a '+' sign, no index will be used. I'd have expected the bar_idx index to be used for the ORDER BY clause? Or is this some unreasonable assumption? Is there a way to enforce this? Thanks, Jens, Your query is not using any index to do the order by clause, it is sorting the intermediate results. There is no need to join the foo table in your query since the foo_id is available in the bar table. The following is equivalent: SELECT foo_id, bar.id AS bar_id FROM bar ORDER BY bar.something ; It still does a sort as shown by the explain output sqlite> .explain on sqlite> explain SELECT ...> foo_id, ...> bar.id AS bar_id ...> FROM ...> bar ...> ORDER BY ...> bar.something ...> ; addr opcode p1 p2 p3 -- -- -- - 0 OpenVirtual 1 3 keyinfo(1,BINARY) 1 Goto0 28 2 Integer 0 0 3 OpenRead0 3 4 SetNumColumns 0 3 5 Rewind 0 15 6 Column 0 1 7 Rowid 0 0 8 MakeRecord 2 0 9 Column 0 2 10Sequence1 0 11Pull2 0 12MakeRecord 3 0 13IdxInsert 1 0 14Next0 6 15Close 0 0 16OpenPseudo 2 0 17SetNumColumns 2 2 18Sort1 26 19Integer 1 0 20Column 1 2 21Insert 2 0 22Column 2 0 23Column 2 1 24Callback2 0 25Next1 19 26Close 2 0 27Halt0 0 28Transaction 0 0 29VerifyCookie0 3 30Goto0 2 31Noop0 0 sqlite> .explain off sqlite> explain query plan SELECT ...> foo_id, ...> bar.id AS bar_id ...> FROM ...> bar ...> ORDER BY ...> bar.something ...> ; 0|0|TABLE bar sqlite> To use the index to do the oder by the index must start with the columns used to order the output. If you create an index on the bar.something column then it will be used to scan the bar table in order rather than from start to finish by primary key id. sqlite> create index bar_someting on bar(something); sqlite> .explain on sqlite> explain SELECT ...> foo_id, ...> bar.id AS bar_id ...> FROM ...> bar ...> ORDER BY ...> bar.something ...> ; addr opcode p1 p2 p3 -- -- -- - 0 Noop0 0 1 Goto0 19 2 Integer 0 0 3 OpenRead0 3 4 SetNumColumns 0 3 5 Integer 0 0 6 OpenRead2 5 keyinfo(1,BINARY) 7 Rewind 2 16 8 RowKey 2 0 9 IdxIsNull 0 15 10IdxRowid2 0 11MoveGe 0 0 12Column 0 1 13Rowid 0 0 14Callback2 0 15Next2 8 16Close 0 0 17Close 2 0 18Halt0 0 19Transaction 0 0 20VerifyCookie0 4 21Goto0 2 22Noop0 0 sqlite> .explain off sqlite> sqlite> explain query plan SELECT ...> foo_id, ...> bar.id AS bar_id ...> FROM ...> bar ...> ORDER BY ...> bar.something ...> ; 0|0|TABLE bar WITH INDEX bar_someting sqlite> HTH Dennis Cote
Re: [sqlite] use of index in ORDER BY clause
On 6/27/06, Jens Miltner <[EMAIL PROTECTED]> wrote: When I run a query like SELECT foo.id AS foo_id, bar.id AS bar_id FROM bar LEFT JOIN foo ON foo.id=bar.foo_id ORDER BY bar.something I think it's using the join first to determine what goes in the output set, then sorting the output set using the ORDER BY. That seems to be correct to me. It might not be optimal but that would be extremely difficult to determine before executing the query. -- SqliteImporter and SqliteReplicator: Command line utilities for Sqlite http://www.reddawn.net/~jsprenkl/Sqlite Cthulhu Bucks! http://www.cthulhubucks.com