Re: [sqlite] use of index in ORDER BY clause

2006-06-30 Thread Jens Miltner


Am 29.06.2006 um 17:17 schrieb Dennis Cote:


Jens Miltner wrote:




Is there any way to improve the ORDER BY performance for joined  
queries? From your answer that the intermediate results are  
sorted, I take it no index won't ever be used when using ORDER BY  
with a join query?


Is there a way to rewrite the queries so we don't take the penalty  
of sorting without an index?


In our case, the 'bar' table may have several 100,000 entries and  
the 'foo' table probably has much less (say order of thousand).


A minimal query (i.e. one where we don't return a minimal amount  
of data, so the pure data shuffling doesn't get in the way) with a  
'bar' table with 250,000 entries and a 'foo' table with around 10  
entries (so the 250,000 entries in the 'bar' table relate to only  
a few records in the 'foo' table), a query like the above takes  
10-20 minutes to run (depending on the ORDER BY clauses used)!


(Side-note: The table does have quite a few columns and  
apparently, the amount of data per row also massively affects the  
performance... when I tried with the same table with not all  
columns filled in - i.e. less data - performance was much better.  
I also tried increasing the page size, hoping that less paging  
would have happen, but this didn't really make a noticeable  
difference)


I would really appreciate any hints on how to improve performance  
with this kind of setup...




Jens,

If you create an index on your bar.something column, it will be  
used to do the ordering. The log below shows how this query will be  
executed before and after creating this index.


Doh! You're right... I could have sworn when I tried, it didn't show  
an index on bar(something) to be used...
I probably confuse this with our real schema & query, which is  
slightly more complex and which I couldn't get to use an index on the  
ORDER BY column(s)...
Maybe it's because there sometimes are more than one sort column,  
even from distinct tables. I guess in that case, I'm probably stuck  
without an index?


Thanks for your help so far,







Re: [sqlite] use of index in ORDER BY clause

2006-06-29 Thread Dennis Cote

Jens Miltner wrote:




Is there any way to improve the ORDER BY performance for joined 
queries? From your answer that the intermediate results are sorted, I 
take it no index won't ever be used when using ORDER BY with a join 
query?


Is there a way to rewrite the queries so we don't take the penalty of 
sorting without an index?


In our case, the 'bar' table may have several 100,000 entries and the 
'foo' table probably has much less (say order of thousand).


A minimal query (i.e. one where we don't return a minimal amount of 
data, so the pure data shuffling doesn't get in the way) with a 'bar' 
table with 250,000 entries and a 'foo' table with around 10 entries 
(so the 250,000 entries in the 'bar' table relate to only a few 
records in the 'foo' table), a query like the above takes 10-20 
minutes to run (depending on the ORDER BY clauses used)!


(Side-note: The table does have quite a few columns and apparently, 
the amount of data per row also massively affects the performance... 
when I tried with the same table with not all columns filled in - i.e. 
less data - performance was much better. I also tried increasing the 
page size, hoping that less paging would have happen, but this didn't 
really make a noticeable difference)


I would really appreciate any hints on how to improve performance with 
this kind of setup...




Jens,

If you create an index on your bar.something column, it will be used to 
do the ordering. The log below shows how this query will be executed 
before and after creating this index.


   SQLite version 3.3.5
   Enter ".help" for instructions
   sqlite> CREATE TABLE foo (id integer primary key, name text);
   sqlite> CREATE TABLE bar (id integer primary key, foo_id integer, 
something text

   );
   sqlite>
   sqlite> CREATE INDEX bar_idx on bar(foo_id, something);
   sqlite>
   sqlite> .explain on
   sqlite> explain SELECT
  ...> foo.name AS name,
  ...> bar.something AS something
  ...> FROM
  ...> bar
  ...> LEFT JOIN
  ...> foo ON foo.id=bar.foo_id
  ...> ORDER BY
  ...> bar.something
  ...> ;
   addr  opcode  p1  p2  p3
     --  --  --  
-

   0 OpenVirtual 2   3   keyinfo(1,BINARY)
   1 Goto0   40
   2 Integer 0   0
   3 OpenRead0   3
   4 SetNumColumns   0   3
   5 Integer 0   0
   6 OpenRead1   2
   7 SetNumColumns   1   2
   8 Rewind  0   26
   9 MemInt  0   1
   10Column  0   1
   11MustBeInt   1   22
   12NotExists   1   22
   13MemInt  1   1
   14Column  1   1
   15Column  0   2
   16MakeRecord  2   0
   17Column  0   2
   18Sequence2   0
   19Pull2   0
   20MakeRecord  3   0
   21IdxInsert   2   0
   22IfMemPos1   25
   23NullRow 1   0
   24Goto0   13
   25Next0   9
   26Close   0   0
   27Close   1   0
   28OpenPseudo  3   0
   29SetNumColumns   3   2
   30Sort2   38
   31Integer 1   0
   32Column  2   2
   33Insert  3   0
   34Column  3   0
   35Column  3   1
   36Callback2   0
   37Next2   31
   38Close   3   0
   39Halt0   0
   40Transaction 0   0
   41VerifyCookie0   3
   42Goto0   2
   43Noop0   0
   sqlite> .explain off
   sqlite> explain query plan SELECT
  ...> foo.name AS name,
  ...> bar.something AS something
  ...> FROM
  ...> bar
  ...> LEFT JOIN
  ...> foo ON foo.id=bar.foo_id
  ...> ORDER BY
  ...> bar.something
  ...> ;
   0|0|TABLE bar
   1|1|TABLE foo USING PRIMARY KEY
   sqlite>
   sqlite> create index bar_some on bar(something);
   sqlite>
   sqlite> .explain on
   sqlite> explain SELECT
  ...> foo.name AS name,
  ...> bar.something AS something
  ...> FROM
  ...> bar
  ...> LEFT JOIN
  ...> foo ON foo.id=bar.foo_id
  ...> ORDER BY
  ...> bar.something
  ...> ;
   addr  opcode  p1  p2  p3
     --  --  --  
-

   0 Noop0   0
   1 Goto0   31
   2 Integer 0   0
   3 OpenRead0 

Re: [sqlite] use of index in ORDER BY clause

2006-06-29 Thread Jay Sprenkle

On 6/29/06, Jens Miltner <[EMAIL PROTECTED]> wrote:



Is there any way to improve the ORDER BY performance for joined
queries? From your answer that the intermediate results are sorted, I
take it no index won't ever be used when using ORDER BY with a join
query?


You can use the explain command to optimize queries:
http://sqlite.org/lang_explain.html

You might read through the optimizer notes to see how it works
rather than trust to my poor memory:
http://sqlite.org/optoverview.html


--
SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite

Cthulhu Bucks!
http://www.cthulhubucks.com


Re: [sqlite] use of index in ORDER BY clause

2006-06-29 Thread Jens Miltner


Am 27.06.2006 um 18:06 schrieb Dennis Cote:


Jens Miltner wrote:

I have a schema similar to this:

CREATE TABLE foo (id integer primary key, name text);
CREATE TABLE bar (id integer primary key, foo_id integer,  
something text);


CREATE INDEX bar_idx on bar(foo_id, something);


When I run a query like

SELECT
foo.id AS foo_id,
bar.id AS bar_id
FROM
bar
LEFT JOIN
foo ON foo.id=bar.foo_id
ORDER BY
bar.something
;

sqlite will only use the primary key index of foo
(as the output of 'explain query plan' shows:

0|0|TABLE bar
1|1|TABLE foo USING PRIMARY KEY

)


If I try to disable the foo primary key index by prefixing foo.id  
with a '+' sign, no index will be used.


I'd have expected the bar_idx index to be used for the ORDER BY  
clause? Or is this some unreasonable assumption?

Is there a way to enforce this?

Thanks,




Jens,

Your query is not using any index to do the order by clause, it is  
sorting the intermediate results.


There is no need to join the foo table in your query since the  
foo_id is available in the bar table. The following is equivalent:


   SELECT
   foo_id,
   bar.id AS bar_id
   FROM
   bar
   ORDER BY
   bar.something
   ;



You're right, of course, but then again, this was just a (bad)  
example of what our table relations are... In reality, our tables are  
far larger and we do indeed need to join them, because not all  
information is available from the base table:


SELECT
foo.name AS name,
bar.something AS something
FROM
bar
LEFT JOIN
foo ON foo.id=bar.foo_id
ORDER BY
bar.something
;



Is there any way to improve the ORDER BY performance for joined  
queries? From your answer that the intermediate results are sorted, I  
take it no index won't ever be used when using ORDER BY with a join  
query?


Is there a way to rewrite the queries so we don't take the penalty of  
sorting without an index?


In our case, the 'bar' table may have several 100,000 entries and the  
'foo' table probably has much less (say order of thousand).


A minimal query (i.e. one where we don't return a minimal amount of  
data, so the pure data shuffling doesn't get in the way) with a 'bar'  
table with 250,000 entries and a 'foo' table with around 10 entries  
(so the 250,000 entries in the 'bar' table relate to only a few  
records in the 'foo' table), a query like the above takes 10-20  
minutes to run (depending on the ORDER BY clauses used)!


(Side-note: The table does have quite a few columns and apparently,  
the amount of data per row also massively affects the performance...  
when I tried with the same table with not all columns filled in -  
i.e. less data - performance was much better. I also tried increasing  
the page size, hoping that less paging would have happen, but this  
didn't really make a noticeable difference)


I would really appreciate any hints on how to improve performance  
with this kind of setup...


Thanks,




Re: [sqlite] use of index in ORDER BY clause

2006-06-27 Thread Dennis Cote

Jens Miltner wrote:

I have a schema similar to this:

CREATE TABLE foo (id integer primary key, name text);
CREATE TABLE bar (id integer primary key, foo_id integer, something 
text);


CREATE INDEX bar_idx on bar(foo_id, something);


When I run a query like

SELECT
foo.id AS foo_id,
bar.id AS bar_id
FROM
bar
LEFT JOIN
foo ON foo.id=bar.foo_id
ORDER BY
bar.something
;

sqlite will only use the primary key index of foo
(as the output of 'explain query plan' shows:

0|0|TABLE bar
1|1|TABLE foo USING PRIMARY KEY

)


If I try to disable the foo primary key index by prefixing foo.id with 
a '+' sign, no index will be used.


I'd have expected the bar_idx index to be used for the ORDER BY 
clause? Or is this some unreasonable assumption?

Is there a way to enforce this?

Thanks,




Jens,

Your query is not using any index to do the order by clause, it is 
sorting the intermediate results.


There is no need to join the foo table in your query since the foo_id is 
available in the bar table. The following is equivalent:


   SELECT
   foo_id,
   bar.id AS bar_id
   FROM
   bar
   ORDER BY
   bar.something
   ;

It still does a sort as shown by the explain output

   sqlite> .explain on
   sqlite> explain SELECT
  ...> foo_id,
  ...> bar.id AS bar_id
  ...> FROM
  ...> bar
  ...> ORDER BY
  ...> bar.something
  ...> ;
   addr  opcode  p1  p2  p3
     --  --  --  
-

   0 OpenVirtual 1   3   keyinfo(1,BINARY)
   1 Goto0   28
   2 Integer 0   0
   3 OpenRead0   3
   4 SetNumColumns   0   3
   5 Rewind  0   15
   6 Column  0   1
   7 Rowid   0   0
   8 MakeRecord  2   0
   9 Column  0   2
   10Sequence1   0
   11Pull2   0
   12MakeRecord  3   0
   13IdxInsert   1   0
   14Next0   6
   15Close   0   0
   16OpenPseudo  2   0
   17SetNumColumns   2   2
   18Sort1   26
   19Integer 1   0
   20Column  1   2
   21Insert  2   0
   22Column  2   0
   23Column  2   1
   24Callback2   0
   25Next1   19
   26Close   2   0
   27Halt0   0
   28Transaction 0   0
   29VerifyCookie0   3
   30Goto0   2
   31Noop0   0
   sqlite> .explain off
   sqlite> explain query plan SELECT
  ...> foo_id,
  ...> bar.id AS bar_id
  ...> FROM
  ...> bar
  ...> ORDER BY
  ...> bar.something
  ...> ;
   0|0|TABLE bar
   sqlite>

To use the index to do the oder by the index must start with the columns 
used to order the output. If you create an index on the bar.something 
column then it will be used to scan the bar table in order rather than 
from start to finish by primary key id.


   sqlite> create index bar_someting on bar(something);
   sqlite> .explain on
   sqlite> explain SELECT
  ...> foo_id,
  ...> bar.id AS bar_id
  ...> FROM
  ...> bar
  ...> ORDER BY
  ...> bar.something
  ...> ;
   addr  opcode  p1  p2  p3
     --  --  --  
-

   0 Noop0   0
   1 Goto0   19
   2 Integer 0   0
   3 OpenRead0   3
   4 SetNumColumns   0   3
   5 Integer 0   0
   6 OpenRead2   5   keyinfo(1,BINARY)
   7 Rewind  2   16
   8 RowKey  2   0
   9 IdxIsNull   0   15
   10IdxRowid2   0
   11MoveGe  0   0
   12Column  0   1
   13Rowid   0   0
   14Callback2   0
   15Next2   8
   16Close   0   0
   17Close   2   0
   18Halt0   0
   19Transaction 0   0
   20VerifyCookie0   4
   21Goto0   2
   22Noop0   0
   sqlite> .explain off
   sqlite>
   sqlite> explain query plan SELECT
  ...> foo_id,
  ...> bar.id AS bar_id
  ...> FROM
  ...> bar
  ...> ORDER BY
  ...> bar.something
  ...> ;
   0|0|TABLE bar WITH INDEX bar_someting
   sqlite>


HTH
Dennis Cote













Re: [sqlite] use of index in ORDER BY clause

2006-06-27 Thread Jay Sprenkle

On 6/27/06, Jens Miltner <[EMAIL PROTECTED]> wrote:

When I run a query like

SELECT
foo.id AS foo_id,
bar.id AS bar_id
FROM
bar
LEFT JOIN
foo ON foo.id=bar.foo_id
ORDER BY
bar.something



I think it's using the join first to determine what goes in the output set,
then sorting the output set using the ORDER BY.
That seems to be correct to me. It might not be optimal
but that would be extremely difficult to determine before executing
the query.


--
SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite

Cthulhu Bucks!
http://www.cthulhubucks.com