Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-27 Thread Віталій Тимчишин
2010/1/26 Matthew Wakeling matt...@flymine.org

 On Tue, 26 Jan 2010, Richard Neill wrote:

 SELECT SUM (case when id  120 and id  121 then 1 else 0 end)
 from tbl_tracker;

 Explain shows that this does a sequential scan.


 I'd defer to Tom on this one, but really, for Postgres to work this out, it
 would have to peer deep into the mysterious SUM function, and realise that
 the number zero is a noop. I suppose it would be possible, but you'd have to
 define noops for each of the different possible functions, *and* make the
 planner clever enough to spot the noop-matching number in the else and
 convert the WHEN into a WHERE.

 Hello.

How  about SELECT SUM (case when id  120 and id  121 then 1 end)
from tbl_tracker;
It gives same result (may be unless there are no records at all) and
optimizer already knows it need not to call function for null input. Such an
optimization would cover much more cases. It would look like:
 * Check only for aggregate subselects
 * All the functions should be noop for null input
 * Add ORed constraint for every function input is not null (in this example
(case when id  A1 and id  B1 then 1 end is not null) or (case when id  A2
and id  B2 then 1 end is not null) or ... or (case when id  An and id  Bn
then 1 end is not null)
 * Know special case (case when id  A1 and id  B1 then 1 end is not
null) = (id  A1 and id  B1)
by ORing all the when conditions case when C1 then D1 when C2 then D2 ...
when Cm then Dm end is not null = C1 or C2 or ... or Cm.
Event without last part it may give bonuses even for select count(field)
from table transformed into select count(field) from table where field is
not null and using [partial] indexes.
As of last *, replacing COUNT with SUM(CASE()) is used often enough when
multiple count calculations are needed.

Best regards, Vitalii Tymchyshyn


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-27 Thread Matthew Wakeling

On Wed, 27 Jan 2010, Віталій Тимчишин wrote:

How  about SELECT SUM (case when id  120 and id  121 then 1 end)
from tbl_tracker;


That is very interesting.


* All the functions should be noop for null input


Alas, not true for COUNT(*), AVG(), etc.

Matthew

--
An optimist sees the glass as half full, a pessimist as half empty,
and an engineer as having redundant storage capacity.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-27 Thread Віталій Тимчишин
27 січня 2010 р. 19:01 Matthew Wakeling matt...@flymine.org написав:

 On Wed, 27 Jan 2010, Віталій Тимчишин wrote:

 How  about SELECT SUM (case when id  120 and id  121 then 1 end)
 from tbl_tracker;


 That is very interesting.


  * All the functions should be noop for null input


 Alas, not true for COUNT(*), AVG(), etc.

 select avg(b), count(b), count(*) from (values (2),(null))a(b)
gives  (2.0, 1, 2) for me, so AVG is in game. Sure, it won't work for
count(*), but optimizer already knows which aggregates are strict and which
are not, so no new information is needed.

Best regards, Vitalii Tymchyshyn


[PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Richard Neill

Dear All,

Just wondering whether there is a missing scope for the query planner 
(on 8.4.2) to be cleverer than it currently is.


Specifically, I wonder whether the optimiser should know that by 
converting a CASE condition into a WHERE condition, it can use an index.


Have I found a possible enhancement, or is this simply too hard to do?

Best wishes,

Richard



Example:


In this example, tbl_tracker has 255751 rows, with a primary key id, 
whose values lie uniformly in the range 1...1255750.


If one is trying to count multiple conditions, the following query seems 
to be the most obvious way to do it:


SELECT
  SUM (case when id  120 and id  121 then 1 else 0 end) AS c1,
  SUM (case when id  121 and id  122 then 1 else 0 end) AS c2,
  SUM (case when id  122 and id  123 then 1 else 0 end) AS c3,
  SUM (case when id  123 and id  124 then 1 else 0 end) AS c4,
  SUM (case when id  124 and id  125 then 1 else 0 end) AS c5
FROM tbl_tracker;


  c1  |  c2  |  c3  |  c4  |  c5
--+--+--+--+--
 2009 | 2018 | 2099 | 2051 | 2030

Time: 361.666 ms



This can be manually optimised into a far uglier (but much much faster) 
query:


SELECT * FROM
 (SELECT COUNT (1) AS c1 FROM tbl_tracker
WHERE id  120 and id  121) AS s1,
 (SELECT COUNT (1) AS c2 FROM tbl_tracker
WHERE id  121 and id  122) AS s2,
 (SELECT COUNT (1) AS c3 FROM tbl_tracker
WHERE id  122 and id  123) AS s3,
 (SELECT COUNT (1) AS c4 FROM tbl_tracker
WHERE id  123 and id  124) AS s4,
 (SELECT COUNT (1) AS c5 FROM tbl_tracker
WHERE id  124 and id  125) AS s5

  c1  |  c2  |  c3  |  c4  |  c5
--+--+--+--+--
 2009 | 2018 | 2099 | 2051 | 2030
(1 row)

Time: 21.091 ms





Debugging
-

The simple queries are:

SELECT SUM (case when id  120 and id  121 then 1 else 0 end) 
from tbl_tracker;


Time: 174.804 ms

Explain shows that this does a sequential scan.



SELECT COUNT(1) from tbl_tracker WHERE id  120 and id  121;

Time: 4.153 ms

Explain shows that this uses the index, as expected.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Tom Lane
Richard Neill rn...@cam.ac.uk writes:
 SELECT
SUM (case when id  120 and id  121 then 1 else 0 end) AS c1,
SUM (case when id  121 and id  122 then 1 else 0 end) AS c2,
...
 FROM tbl_tracker;

 This can be manually optimised into a far uglier (but much much faster) 
 query:

 SELECT * FROM
   (SELECT COUNT (1) AS c1 FROM tbl_tracker
  WHERE id  120 and id  121) AS s1,
   (SELECT COUNT (1) AS c2 FROM tbl_tracker
  WHERE id  121 and id  122) AS s2,
   ...

We're unlikely to consider doing this, for a couple of reasons:
it's unlikely to come up often enough to justify the cycles the planner
would spend looking for the case *on every query*, and it requires very
special knowledge about the behavior of two specific aggregate functions,
which is something the planner tends to avoid using.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Matthew Wakeling

On Tue, 26 Jan 2010, Richard Neill wrote:
SELECT SUM (case when id  120 and id  121 then 1 else 0 end) from 
tbl_tracker;


Explain shows that this does a sequential scan.


I'd defer to Tom on this one, but really, for Postgres to work this out, 
it would have to peer deep into the mysterious SUM function, and realise 
that the number zero is a noop. I suppose it would be possible, but you'd 
have to define noops for each of the different possible functions, *and* 
make the planner clever enough to spot the noop-matching number in the 
else and convert the WHEN into a WHERE.


In my mind, this is quite a lot of work for the planner to do to solve 
this one. That translates into quite a lot of work for some poor 
programmer to do to achieve it. If you have the money, then hire someone 
to do it!


Matthew

--
I don't want the truth. I want something I can tell parliament!
 -- Rt. Hon. Jim Hacker MP

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Richard Neill

Thanks for your answers.


David Wilson wrote:

 Why not simply add the where clause to the original query?

 SELECT
 SUM (case when id  120 and id  121 then 1 else 0 end) AS c1,
 SUM (case when id  121 and id  122 then 1 else 0 end) AS c2,
 SUM (case when id  122 and id  123 then 1 else 0 end) AS c3,
 SUM (case when id  123 and id  124 then 1 else 0 end) AS c4,
 SUM (case when id  124 and id  125 then 1 else 0 end) AS c5
 FROM tbl_tracker WHERE (id120) AND (id125);

 I didn't populate any test tables, but I'd expect that to do just as
 well without being any uglier than the original query is.

You're absolutely right, but I'm afraid this won't help. I'd simplified 
the original example query, but in real life, I've got about 50 
different sub-ranges, which cover virtually all the id-space.


--

Tom Lane wrote:

Richard Neill rn...@cam.ac.uk writes:

SELECT
   SUM (case when id  120 and id  121 then 1 else 0 end) AS c1,
   SUM (case when id  121 and id  122 then 1 else 0 end) AS c2,
   ...
FROM tbl_tracker;


This can be manually optimised into a far uglier (but much much faster) 
query:



SELECT * FROM
  (SELECT COUNT (1) AS c1 FROM tbl_tracker
 WHERE id  120 and id  121) AS s1,
  (SELECT COUNT (1) AS c2 FROM tbl_tracker
 WHERE id  121 and id  122) AS s2,
  ...


We're unlikely to consider doing this, for a couple of reasons:
it's unlikely to come up often enough to justify the cycles the planner
would spend looking for the case *on every query*, and it requires very
special knowledge about the behavior of two specific aggregate functions,
which is something the planner tends to avoid using.



OK - that's all I was wondering. I thought I'd raise this in case it 
might be helpful.


I'll add a note to:
http://www.postgresql.org/docs/8.4/interactive/functions-conditional.html
to point out that this is something of a trap for the unwary

Regards,

Richard

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Scott Carey

On Jan 26, 2010, at 9:41 AM, Richard Neill wrote:

 Thanks for your answers.
 
 
 David Wilson wrote:
 
 Why not simply add the where clause to the original query?
 
 SELECT
 SUM (case when id  120 and id  121 then 1 else 0 end) AS c1,
 SUM (case when id  121 and id  122 then 1 else 0 end) AS c2,
 SUM (case when id  122 and id  123 then 1 else 0 end) AS c3,
 SUM (case when id  123 and id  124 then 1 else 0 end) AS c4,
 SUM (case when id  124 and id  125 then 1 else 0 end) AS c5
 FROM tbl_tracker WHERE (id120) AND (id125);
 
 I didn't populate any test tables, but I'd expect that to do just as
 well without being any uglier than the original query is.
 
 You're absolutely right, but I'm afraid this won't help. I'd simplified 
 the original example query, but in real life, I've got about 50 
 different sub-ranges, which cover virtually all the id-space.
 

Well, it probably shouldn't use the index if it covers the vast majority of the 
table.  I wonder if it is actually faster to reformulate with WHERE or not at 
that point -- it might be slower.
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance