Re: [PERFORM] Index order ignored after `is null` in query

2014-11-06 Thread Artūras Lapinskas
After some more investigation, my wild guess would be that when nulls are 
involved in a query, PostgreSQL wants to double-check whether they are 
really nulls in the actual relation (maybe because of dead tuples). To do 
that it has to go and fetch pages from disk, and the best way to do that 
is to use a bitmap index scan. Sadly, bitmap scans tend not to be the best 
option when using LIMIT in queries. Which would make sense, if there really 
is a need to synchronize the index with the relation...
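
For what it's worth, the heap visits in question are what EXPLAIN shows as 
the "Heap Fetches" counter of an index-only scan, and they come from pages 
not yet marked all-visible rather than from anything null-specific. A quick 
sketch to see them disappear (using the test table from my earlier message 
below):

vacuum analyze test;
explain (analyze, verbose)
  select * from test where a = 1 and b = 1 order by c limit 1;
-- once vacuum has set the visibility map, the Index Only Scan
-- should report "Heap Fetches: 0"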


--
Best Regards,
Artūras Lapinskas

On Wed, Nov 05, 2014 at 10:42:43PM +0100, Artūras Lapinskas wrote:

Hello,

I am having a hard time understanding how PostgreSQL handles null 
values. As far as I understand, null values are stored in the b-tree as 
simple values (placed last or first depending on the index definition). 
But it seems that there is something really specific about them, as 
PostgreSQL deliberately ignores an obvious (I think...) optimization 
concerning index order after using one of them in a query. As a simple 
example, look at the table below:


arturas=# drop table if exists test;
DROP TABLE
arturas=# create table test (
arturas(#   a int not null,
arturas(#   b int,
arturas(#   c int not null
arturas(# );
CREATE TABLE

After filling this table with random data (the actual distribution of 
nulls/real values seems not to matter):


arturas=# insert into test (a, b, c)
arturas-#   select
arturas-#     case when random() < 0.5 then 1 else 2 end
arturas-#   , case when random() < 0.5 then null else 1 end
arturas-#   , case when random() < 0.5 then 1 else 2 end
arturas-#   from generate_series(1, 100, 1) as gen;
INSERT 0 100

And creating index:

arturas=# create index test_idx on test (a, b nulls first, c);
CREATE INDEX

We get fast queries with `order by` on c:

arturas=# explain analyze verbose select * from test where a = 1 and b = 1 order by c limit 1;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.53 rows=1 width=12) (actual time=0.052..0.052 rows=1 loops=1)
   Output: a, b, c
   ->  Index Only Scan using test_idx on public.test  (cost=0.42..25890.42 rows=251433 width=12) (actual time=0.051..0.051 rows=1 loops=1)
         Output: a, b, c
         Index Cond: ((test.a = 1) AND (test.b = 1))
         Heap Fetches: 1
 Total runtime: 0.084 ms
(7 rows)

But really slow ones if we search for null values of b:

arturas=# explain analyze verbose select * from test where a = 1 and b is null order by c limit 1;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15632.47..15632.47 rows=1 width=12) (actual time=138.127..138.127 rows=1 loops=1)
   Output: a, b, c
   ->  Sort  (cost=15632.47..16253.55 rows=248434 width=12) (actual time=138.127..138.127 rows=1 loops=1)
         Output: a, b, c
         Sort Key: test.c
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on public.test  (cost=6378.87..14390.30 rows=248434 width=12) (actual time=47.083..88.986 rows=249243 loops=1)
               Output: a, b, c
               Recheck Cond: ((test.a = 1) AND (test.b IS NULL))
               ->  Bitmap Index Scan on test_idx  (cost=0.00..6316.77 rows=248434 width=0) (actual time=46.015..46.015 rows=249243 loops=1)
                     Index Cond: ((test.a = 1) AND (test.b IS NULL))
 Total runtime: 138.200 ms
(12 rows)

Can someone please give some insight into this problem? :)

P.S. `select version()` reports: PostgreSQL 9.3.5 on 
x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 
64-bit; compiled from source with no default configuration changes.


--
Best Regards,
Artūras Lapinskas



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index order ignored after `is null` in query

2014-11-06 Thread Tom Lane
Artūras Lapinskas arturasl...@gmail.com writes:
> After some more investigation, my wild guess would be that when nulls are
> involved in a query, PostgreSQL wants to double-check whether they are
> really nulls in the actual relation (maybe because of dead tuples).

No, it's much simpler than that: IS NULL is not an equality operator,
so it's not treated as constraining sort order.

What you're asking for amounts to building in an assumption that all
nulls are equal, which is exactly not what the SQL semantics for NULL
say.  So I feel that you have probably chosen a bogus data design
that is misusing NULL for a purpose at variance with the SQL semantics.
That's likely to bite you on the rear in many more ways than this.

Even disregarding the question of whether it's semantically appropriate,
getting the planner to handle IS NULL this way would be a significant
amount of work.

regards, tom lane
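
A standard workaround, not raised in the thread: since IS NULL will not 
constrain the sort order, a partial index restricted to the null rows takes 
b out of the picture entirely, leaving an index ordered by (a, c) that the 
LIMIT query can read directly (a sketch against the test table above; the 
index name is made up):

create index test_b_null_idx on test (a, c) where b is null;

explain analyze
  select * from test where a = 1 and b is null order by c limit 1;
-- should plan as a plain index scan on test_b_null_idx, with no sort node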




Re: [PERFORM] pgtune + configurations with 9.3

2014-11-06 Thread Josh Berkus
On 10/29/2014 11:49 PM, Tory M Blue wrote:
> I looked at pgtune again today and the numbers it's spitting out took me
> aback; they are huge. From all historical conversations and attempts, a few
> of these larger numbers netted reduced performance vs better performance
> (but that was on older versions of Postgres).

Yeah, pgTune is pretty badly out of date.  It's been on my TODO list, as
I'm sure it has been on Greg's.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[PERFORM] Postgres does not use indexes with OR-conditions

2014-11-06 Thread arhipov

Hello,

I have just come across an interesting Postgres behaviour with 
OR-conditions. Are there any chances that the optimizer will handle this 
situation in the future?


select *
from commons.financial_documents fd
where fd.creation_time >= '2011-11-07 10:39:07.285022+08'
order by fd.creation_time desc
limit 200

Limit  (cost=4.30..44.50 rows=200 width=67) (actual time=0.032..1.376 rows=200 loops=1)
  ->  Index Scan Backward using financial_document_creation_time_index on financial_documents fd  (cost=4.30..292076.25 rows=1453075 width=67) (actual time=0.027..0.683 rows=200 loops=1)
        Index Cond: (creation_time >= '2011-11-07 11:39:07.285022+09'::timestamp with time zone)
Total runtime: 1.740 ms

select *
from commons.financial_documents fd
where fd.creation_time = '2011-11-07 10:39:07.285022+08'
   or fd.creation_time > '2011-11-07 10:39:07.285022+08'
order by fd.creation_time desc
limit 200

Limit  (cost=4.30..71.76 rows=200 width=67) (actual time=1067.935..1069.126 rows=200 loops=1)
  ->  Index Scan Backward using financial_document_creation_time_index on financial_documents fd  (cost=4.30..490104.07 rows=1453075 width=67) (actual time=1067.927..1068.532 rows=200 loops=1)
        Filter: ((creation_time = '2011-11-07 11:39:07.285022+09'::timestamp with time zone) OR (creation_time > '2011-11-07 11:39:07.285022+09'::timestamp with time zone))
        Rows Removed by Filter: 776785
Total runtime: 1069.480 ms





Re: [PERFORM] Postgres does not use indexes with OR-conditions

2014-11-06 Thread David Rowley
On Fri, Nov 7, 2014 at 5:16 PM, arhipov arhi...@dc.baikal.ru wrote:

> Hello,
>
> I have just come across an interesting Postgres behaviour with OR-conditions.
> Are there any chances that the optimizer will handle this situation in the
> future?
>
> select *
> from commons.financial_documents fd
> where fd.creation_time >= '2011-11-07 10:39:07.285022+08'
> order by fd.creation_time desc
> limit 200
>
> select *
> from commons.financial_documents fd
> where fd.creation_time = '2011-11-07 10:39:07.285022+08'
>    or fd.creation_time > '2011-11-07 10:39:07.285022+08'
> order by fd.creation_time desc
> limit 200

It would certainly be possible, providing the constants compare equally,
but... Question: Would you really want to pay a, say, 1% increase in
planning time for ALL queries, so that this unique case of queries could
perform better at execution time?

Is there a valid reason why you don't just write the query with the >=
operator?

Regards

David Rowley


Re: [PERFORM] Postgres does not use indexes with OR-conditions

2014-11-06 Thread Vlad Arkhipov

It was just a minimal example. The real query looks like this:

select *
from commons.financial_documents fd
where fd.creation_time > '2011-11-07 10:39:07.285022+08'
   or (fd.creation_time = '2011-11-07 10:39:07.285022+08'
       and fd.financial_document_id > 100)
order by fd.creation_time desc
limit 200

I need to rewrite it as below to make Postgres use the index:

select *
from commons.financial_documents fd
where fd.creation_time >= '2011-11-07 10:39:07.285022+08'
  and (
    fd.creation_time > '2011-11-07 10:39:07.285022+08'
      or (fd.creation_time = '2011-11-07 10:39:07.285022+08'
          and fd.financial_document_id > 100)
  )
order by fd.creation_time desc
limit 200
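
For this cursor-style predicate, a row-value comparison can express the 
same condition in a single clause; assuming an index exists on 
(creation_time, financial_document_id) and that adding 
financial_document_id to the sort keys is acceptable (both assumptions, 
not from the thread), PostgreSQL can use the comparison directly as an 
index condition:

select *
from commons.financial_documents fd
where (fd.creation_time, fd.financial_document_id)
    > ('2011-11-07 10:39:07.285022+08', 100)
order by fd.creation_time desc, fd.financial_document_id desc
limit 200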

On 11/07/2014 12:38 PM, David Rowley wrote:
> On Fri, Nov 7, 2014 at 5:16 PM, arhipov arhi...@dc.baikal.ru wrote:
>
>> Hello,
>>
>> I have just come across an interesting Postgres behaviour with
>> OR-conditions. Are there any chances that the optimizer will
>> handle this situation in the future?
>>
>> select *
>> from commons.financial_documents fd
>> where fd.creation_time >= '2011-11-07 10:39:07.285022+08'
>> order by fd.creation_time desc
>> limit 200
>>
>> select *
>> from commons.financial_documents fd
>> where fd.creation_time = '2011-11-07 10:39:07.285022+08'
>>    or fd.creation_time > '2011-11-07 10:39:07.285022+08'
>> order by fd.creation_time desc
>> limit 200
>
> It would certainly be possible, providing the constants compare equally,
> but... Question: Would you really want to pay a, say, 1% increase in
> planning time for ALL queries, so that this unique case of queries could
> perform better at execution time?
>
> Is there a valid reason why you don't just write the query with the >=
> operator?
>
> Regards
>
> David Rowley