Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-26 Thread Hick Gunter
Do you have any experience with SQLite virtual tables? I guess not.

There are 20 issues here:

1) The abstract problem of choosing an Index for optimizing GROUP BY

2) the SQLite implementation (which I was referring to)

Ad 1)

Any index that covers all the GROUP BY fields is a good index because it 
allows aggregates to be computed on the fly as opposed to in a temporary 
table.

ORDER BY clause and multiple indices may complicate the matter


Ad 2)

SQLite attempts to handle virtual tables the same as native tables (which is 
one of the main reasons we chose SQLite). The VT interface does not allow 
publication of indexes nor creation of native indexes on virtual tables. The 
aOrderBy table of the interface implies an ordered list of fields, therefore 
SQLite would have to call xBestIndex n! times to discover the least costly 
index to use.

It is not unreasonable to assume that in a well designed SQL Statement the 
GROUP BY clause will be backed up by the necessary index and an identical ORDER 
BY clause (at least unintentionally by the programmer virtue of laziness 
resulting in copy-paste of the field list).

Thus the aOrderBy array being used for ORDER BY and GROUP BY in the VT 
interface.



http://www.sqlite.org/vtab.html

2.3 The xBestIndex Method
SQLite uses the xBestIndex method of a virtual table module to determine the 
best way to access the virtual table. The xBestIndex method has a prototype 
like this:

  int (*xBestIndex)(sqlite3_vtab *pVTab, sqlite3_index_info*);

...

Before calling this method, the SQLite core initializes an instance of the 
sqlite3_index_info structure with information about the query that it is 
currently trying to process. This information derives mainly from the WHERE 
clause and **ORDER BY or GROUP BY** clauses of the query, but also from any ON 
or USING clauses if the query is a join.


-Ursprüngliche Nachricht-
Von: James K. Lowden [mailto:jklow...@schemamania.org]
Gesendet: Donnerstag, 25. April 2013 16:34
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] Order of columns in group by statement affects query 
performance

On Thu, 25 Apr 2013 10:29:34 +0200
Hick Gunter h...@scigames.at wrote:

 AFAIK SQLite treats GROUP BY the same way as ORDER BY (taken from
 hints in the virtual table description).

That might be so, in some limited sense.  It's obviously false in general 
because they mean different things and have different effects.

 If you have an index that covers the GROUP BY clause in field order,
 then aggregate functions need store only the current value; if not,
 then you need an ephemeral table to hold the aggregate values.

Nonsense.  The query parser sees GROUP BY A,B.  The optimizer sees an index 
ordered B,A.  By permuting the order of the columns in the GROUP BY clause, it 
finds a match for the index and uses it.

Yes, the problem is O(n^2), where n is the number of columns in the GROUP BY, 
but n is always small; even 7 columns could be checked in less than 50 
iterations.

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


--
 Gunter Hick
Software Engineer
Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna, Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please notify us immediately by reply e-mail and then delete this message from 
your system. Please do not copy it or use it for any purposes, or disclose its 
contents to any person as to do so could be a breach of confidence. Thank you 
for your cooperation.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-26 Thread James K. Lowden
On Fri, 26 Apr 2013 11:12:15 +0200
Hick Gunter h...@scigames.at wrote:

 It is not unreasonable to assume that in a well designed SQL
 Statement the GROUP BY clause will be backed up by the necessary
 index and an identical ORDER BY clause

That is an entirely unreasonable assumption.  Order may or not matter.
I've often written GROUP BY queries ordered by the aggregate, or to
select the maximum.  

 Any index that covers all the GROUP BY fields is a good index
 because it allows aggregates to be computed on the fly as opposed
 to in a temporary table.

Agreed.  I hope it's clear now that covers is order-independent.  It
so happens, apparently, that the order in which the column names appear
in the GROUP BY determine whether or not the index is used.  Thats
unfortunate, because it makes two equivalent queries perform very
differently.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-25 Thread Hick Gunter
AFAIK SQLite treats GROUP BY the same way as ORDER BY (taken from hints in the 
virtual table description).

If you have an index that covers the GROUP BY clause in field order, then 
aggregate functions need store only the current value; if not, then you need an 
ephemeral table to hold the aggregate values.

In more detail:

IF you have an index that covers the GROUP BY clause in field order, then 
retrieving rows in index order guarantees that all rows belonging to the same 
group will be retrieved in one block AND makes the output rows come out sorted 
too. This allows SQLite to keep current aggregate values in registers.

IF you do not have an index that covers the GROUP BY clause, then rows will be 
retrieved in some deterministic order other than by group. You need to retrieve 
and update the current aggregate values for the group each row is in. This 
forces SQLite to keep current aggregate values in an ephemeral table.

IF you have an index that covers the GROUP BY clause in any other order, then 
you still have the guarantee that all rows belonging to the same group will be 
retrieved together, but the result rows will be ordered in index order and not 
GROUP BY order. This would probably require code paths presumably shared by 
GROUP BY and ORDER BY processing to be split.


-Ursprüngliche Nachricht-
Von: James K. Lowden [mailto:jklow...@schemamania.org]
Gesendet: Donnerstag, 25. April 2013 01:55
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] Order of columns in group by statement affects query 
performance

On Wed, 24 Apr 2013 17:46:00 +0100
Simon Slavin slav...@bigfraud.org wrote:

 On 24 Apr 2013, at 5:14pm, Igor Tandetnik i...@tandetnik.org wrote:
  Note though that the query doesn't have an ORDER BY clause. It
  doesn't request rows in any particular order. SQLite could, in
  principle, reorder columns in GROUP BY to take advantage of the
  index. I suppose the optimizer just happens to miss this particular
  opportunity.

 But the GROUP BY clause has an order:

  Query 1:  SELECT A,B,count(*) from tableTest group by A,B Query 2:
  SELECT A,B,count(*) from tableTest group by B,A

The order in which the columns appear syntactically in the GROUP BY clause is 
meaningless in SQL.

Igor is correct that the query processor could use any index beginning with B,A 
or A,B, should it so choose.

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


--
 Gunter Hick
Software Engineer
Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna, Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please notify us immediately by reply e-mail and then delete this message from 
your system. Please do not copy it or use it for any purposes, or disclose its 
contents to any person as to do so could be a breach of confidence. Thank you 
for your cooperation.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-25 Thread James K. Lowden
On Thu, 25 Apr 2013 10:29:34 +0200
Hick Gunter h...@scigames.at wrote:

 AFAIK SQLite treats GROUP BY the same way as ORDER BY (taken from
 hints in the virtual table description).

That might be so, in some limited sense.  It's obviously false in
general because they mean different things and have different effects.  

 If you have an index that covers the GROUP BY clause in field order,
 then aggregate functions need store only the current value; if not,
 then you need an ephemeral table to hold the aggregate values.

Nonsense.  The query parser sees GROUP BY A,B.  The optimizer sees an
index ordered B,A.  By permuting the order of the columns in the GROUP
BY clause, it finds a match for the index and uses it.  

Yes, the problem is O(n^2), where n is the number of columns in the
GROUP BY, but n is always small; even 7 columns could be checked in
less than 50 iterations.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-25 Thread Daniel Winter
2013/4/25 James K. Lowden jklow...@schemamania.org


 Nonsense.  The query parser sees GROUP BY A,B.  The optimizer sees an
 index ordered B,A.  By permuting the order of the columns in the GROUP
 BY clause, it finds a match for the index and uses it.

 Yes, the problem is O(n^2), where n is the number of columns in the
 GROUP BY, but n is always small; even 7 columns could be checked in
 less than 50 iterations.


I believe its O(n!), but still doable for small n.  I don't know the inner
workings of the query optimizer but mabye instead of asking/check for a
index of every permutation of the columns in the group by, it could just
check if an index exists which covers all columns (even the sorting order
doesn't matter). (the virtual table api needs an addition for that to work)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-25 Thread Jay A. Kreibich
On Thu, Apr 25, 2013 at 10:29:34AM +0200, Hick Gunter scratched on the wall:
 AFAIK SQLite treats GROUP BY the same way as ORDER BY (taken from hints
 in the virtual table description).

  They're not the same clause, they don't do the same thing.

  Now, it is true that most database systems implement the first step
  of a GROUP BY by sorting the query using semantics that are similar to
  ORDER BY.  That way all of the rows in a related group are next to
  each other, and they're easier to process.  I assume SQLite does the
  same thing.

  It is, however, as they say, an implementation detail.

 IF you have an index that covers the GROUP BY clause in any other order,
 then you still have the guarantee that all rows belonging to the same
 group will be retrieved together, but the result rows will be ordered
 in index order and not GROUP BY order.

  Except there is no such thing as GROUP BY order.  SQL Golden Rule:
  If there is no ORDER BY, the rows have no order.  According to SQL, 
  neither the groups, nor the rows within a group (as they are fed into
  aggregates) have a defined order.  Any query that makes assumptions
  about the ordered result of a GROUP BY is broken.

  Use the out-of-order index.

   -j

-- 
Jay A. Kreibich  J A Y  @  K R E I B I.C H 

Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable. -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-25 Thread Jay A. Kreibich
On Thu, Apr 25, 2013 at 05:08:04PM +0200, Daniel Winter scratched on the wall:
 2013/4/25 James K. Lowden jklow...@schemamania.org
 
 
  Nonsense.  The query parser sees GROUP BY A,B.  The optimizer sees an
  index ordered B,A.  By permuting the order of the columns in the GROUP
  BY clause, it finds a match for the index and uses it.
 
  Yes, the problem is O(n^2), where n is the number of columns in the
  GROUP BY, but n is always small; even 7 columns could be checked in
  less than 50 iterations.
 
 
 I believe its O(n!), but still doable for small n.  I don't know the inner
 workings of the query optimizer but mabye instead of asking/check for a
 index of every permutation of the columns in the group by, it could just
 check if an index exists which covers all columns (even the sorting order
 doesn't matter). (the virtual table api needs an addition for that to work)

  Permutations are O(N!), but that's not really what you want.  Given a
  set of GROUP BY terms you want, generally, the index with the most
  terms in any initial order.  You don't need a full match for the
  index to be a win.  For example, GROUP BY A,B,C,D,E is likely to
  get a performance boost from an index on (A,D,B) and, *in general*,
  that should be a bigger win than an index on (B,C).

  Of course, since this is a query optimizer, there are always edge
  cases... For example, if there is an index over (E) that has 99%
  unique values, it is likely a better choice than (A,D,B)... it
  depends on the distribution of the index.  Similarly, if any GROUP BY
  term maps to a unique index... boom, you're done.

  As with most things having to do with query optimization, the problem
  quickly explodes.  On the other hand, SQLite must already have
  assumptions about index costs (with or without ANALYZE), so at least
  there's an existing set of weights and assumptions to work from.

   -j

-- 
Jay A. Kreibich  J A Y  @  K R E I B I.C H 

Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable. -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-25 Thread Simon Slavin

On 25 Apr 2013, at 4:23pm, Jay A. Kreibich j...@kreibi.ch wrote:

  Except there is no such thing as GROUP BY order.  SQL Golden Rule:
  If there is no ORDER BY, the rows have no order.  According to SQL, 
  neither the groups, nor the rows within a group (as they are fed into
  aggregates) have a defined order.  Any query that makes assumptions
  about the ordered result of a GROUP BY is broken.
 
  Use the out-of-order index.

GROUP BY on multiple columns means that the values in all those columns have to 
be the same for the rows to be included in the same GROUP.  It says nothing 
about the order those groups should appear in in the results of the SELECT.

Okay.  So adding this to what went by upthread, I was wrong.  Column order in 
the GROUP BY clause doesn't matter.  Therefore the upthread comment that GROUP 
BY A,B means exactly the same as GROUP BY B,A is correct.

So if there's an index which features those columns in any order it can be 
used, not matter what order the columns appear in in the GROUP BY clause.

So it was perfectly reasonable for the OP to wonder why an index was used for 
one order but not another.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Order of columns in group by statement affects query performance

2013-04-24 Thread Daniel Winter
Hello,

I am using sqlite3 (3.7.15.2 at the moment) in a project.

I discovered that the order of columns in a group by affects the
performance of a query. Is this expected?

For example:

Table:  Column A int, Column B int, Column C int
One Index:  A,B  (combined)

Query 1:  SELECT A,B,count(*) from tableTest group by A,B
Query 2:  SELECT A,B,count(*) from tableTest group by B,A

Query 1 will use the index, while query 2 will not. (which makes Query 1 a
lot faster with bigger tables). Both querys will result with the same data.
I do not really understand why it doesn't use the index for both querys.

I am also using a virtual table (based on a two dimensional array) (very
nice feature btw, sad that it is so unknown). I am seeing the same
behaviour there. It will only use the index if it has the exact same order
as the columns in the group by.

Thanks,

Daniel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-24 Thread Larry Brasfield
*Daniel Winter wrote:0*
 I discovered that the order of columns in a group by affects the
 performance of a query. Is this expected?

Yes.

 For example:

 Table:  Column A int, Column B int, Column C int
 One Index:  A,B  (combined)

 Query 1:  SELECT A,B,count(*) from tableTest group by A,B
 Query 2:  SELECT A,B,count(*) from tableTest group by B,A

 Query 1 will use the index, while query 2 will not. (which makes Query 1 a
 lot faster with bigger tables). Both querys will result with the same
data.
 I do not really understand why it doesn't use the index for both querys.

The index has row references pre-sorted according to the criteria specified
for it.  That is why it can be used to speed up a query which sorts by
those criteria.  There is no mystery as to why it does not help to speed up
other queries which do not sort by those criteria.  Looking at the query
plan would show the index used for your first query and not for the second.

A similar effect would be seen with joins which are typically processed as
a sifted merge of sorted components.

Best regards,
-- 
Larry
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-24 Thread Igor Tandetnik

On 4/24/2013 11:49 AM, Larry Brasfield wrote:

*Daniel Winter wrote:0*

Table:  Column A int, Column B int, Column C int
One Index:  A,B  (combined)

Query 1:  SELECT A,B,count(*) from tableTest group by A,B
Query 2:  SELECT A,B,count(*) from tableTest group by B,A

Query 1 will use the index, while query 2 will not. (which makes Query 1 a
lot faster with bigger tables). Both querys will result with the same

data.

I do not really understand why it doesn't use the index for both querys.


The index has row references pre-sorted according to the criteria specified
for it.  That is why it can be used to speed up a query which sorts by
those criteria.


Note though that the query doesn't have an ORDER BY clause. It doesn't 
request rows in any particular order. SQLite could, in principle, 
reorder columns in GROUP BY to take advantage of the index. I suppose 
the optimizer just happens to miss this particular opportunity.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-24 Thread Simon Slavin

On 24 Apr 2013, at 5:14pm, Igor Tandetnik i...@tandetnik.org wrote:

 Note though that the query doesn't have an ORDER BY clause. It doesn't 
 request rows in any particular order. SQLite could, in principle, reorder 
 columns in GROUP BY to take advantage of the index. I suppose the optimizer 
 just happens to miss this particular opportunity.

But the GROUP BY clause has an order:

 Query 1:  SELECT A,B,count(*) from tableTest group by A,B
 Query 2:  SELECT A,B,count(*) from tableTest group by B,A

which is pretty much the same as having two different ORDER BY clauses.  An 
index which is ideal for one of those is not ideal for the other.  I don't find 
it weird that the optimiser would decide to use an index for one of them but 
not the other.

To the original poster: do an ANALYZE, then see if you get different timings 
afterwards.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Order of columns in group by statement affects query performance

2013-04-24 Thread James K. Lowden
On Wed, 24 Apr 2013 17:46:00 +0100
Simon Slavin slav...@bigfraud.org wrote:

 On 24 Apr 2013, at 5:14pm, Igor Tandetnik i...@tandetnik.org wrote:
  Note though that the query doesn't have an ORDER BY clause. It
  doesn't request rows in any particular order. SQLite could, in
  principle, reorder columns in GROUP BY to take advantage of the
  index. I suppose the optimizer just happens to miss this particular
  opportunity.
 
 But the GROUP BY clause has an order:
 
  Query 1:  SELECT A,B,count(*) from tableTest group by A,B
  Query 2:  SELECT A,B,count(*) from tableTest group by B,A

The order in which the columns appear syntactically in the GROUP BY
clause is meaningless in SQL.  

Igor is correct that the query processor could use any index beginning
with B,A or A,B, should it so choose.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users