Re: [PERFORM] Index order ignored after `is null` in query

2014-11-14 Thread Jim Nasby

On 11/7/14, 5:14 AM, Artūras Lapinskas wrote:

thanks for your time and answer. Not treating IS NULL as equality operator 
definitely helps me to make more sense out of previous explains.


You can also try creating a partial index WHERE b IS NULL. WHERE b IS NOT NULL 
can also sometimes be useful, though for different reasons.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index order ignored after `is null` in query

2014-11-07 Thread Artūras Lapinskas

Hi,

thanks for your time and answer. Not treating IS NULL as equality 
operator definitely helps me to make more sense out of previous 
explains.


--
Best Regard,
Artūras Lapinskas

On Thu, Nov 06, 2014 at 12:23:12PM -0500, Tom Lane wrote:

=?utf-8?Q?Art=C5=ABras?= Lapinskas  writes:

After some more investigation my wild guess would be that then nulls are
involved in query postgresql wants to double check whatever they are
really nulls in actual relation (maybe because of dead tuples).


No, it's much simpler than that: IS NULL is not an equality operator,
so it's not treated as constraining sort order.

What you're asking for amounts to building in an assumption that "all
nulls are equal", which is exactly not what the SQL semantics for NULL
say.  So I feel that you have probably chosen a bogus data design
that is misusing NULL for a purpose at variance with the SQL semantics.
That's likely to bite you on the rear in many more ways than this.

Even disregarding the question of whether it's semantically appropriate,
getting the planner to handle IS NULL this way would be a significant
amount of work.

regards, tom lane



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index order ignored after `is null` in query

2014-11-06 Thread Tom Lane
=?utf-8?Q?Art=C5=ABras?= Lapinskas  writes:
> After some more investigation my wild guess would be that then nulls are 
> involved in query postgresql wants to double check whatever they are 
> really nulls in actual relation (maybe because of dead tuples).

No, it's much simpler than that: IS NULL is not an equality operator,
so it's not treated as constraining sort order.

What you're asking for amounts to building in an assumption that "all
nulls are equal", which is exactly not what the SQL semantics for NULL
say.  So I feel that you have probably chosen a bogus data design
that is misusing NULL for a purpose at variance with the SQL semantics.
That's likely to bite you on the rear in many more ways than this.

Even disregarding the question of whether it's semantically appropriate,
getting the planner to handle IS NULL this way would be a significant
amount of work.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index order ignored after `is null` in query

2014-11-06 Thread Artūras Lapinskas
After some more investigation my wild guess would be that then nulls are 
involved in query postgresql wants to double check whatever they are 
really nulls in actual relation (maybe because of dead tuples). To do 
that it has to go and fetch pages from disk and the best way to do that 
is to use bitmap index. Sadly bitmaps tend to be not the best option 
when using limit in queries. Which would make sense, if it is really a 
need to synchronize index with relation...


--
Best Regard,
Artūras Lapinskas

On Wed, Nov 05, 2014 at 10:42:43PM +0100, Artūras Lapinskas wrote:

Hello,

I am having some hard time understanding how postgresql handles null 
values. As much I understand null values are stored in b-tree as 
simple values (put as last or first depending on index). But it seems 
that there is something really specific about them as postgresql 
deliberately ignores obvious (I think...) optimizations concerning 
index order after using one of them in a query. As a simple example 
look at table below:


arturas=# drop table if exists test;
DROP TABLE
arturas=# create table test (
arturas(#   a int not null,
arturas(#   b int,
arturas(#   c int not null
arturas(# );
CREATE TABLE

After filling this table with random data (actual distribution of 
null's/real values seams not to matter):


arturas=# insert into test (a, b, c)
arturas-#   select
arturas-# case when random() < 0.5 then 1 else 2 end
arturas-# , case when random() < 0.5 then null else 1 end
arturas-# , case when random() < 0.5 then 1 else 2 end
arturas-#   from generate_series(1, 100, 1) as gen;
INSERT 0 100

And creating index:

arturas=# create index test_idx on test (a, b nulls first, c);
CREATE INDEX

We get fast queries with `order by` on c:

arturas=# explain analyze verbose select * from test where a = 1 and b 
= 1 order by c limit 1;

QUERY PLAN  

---
 Limit  (cost=0.42..0.53 rows=1 width=12) (actual time=0.052..0.052 
rows=1 loops=1)
   Output: a, b, c
   ->  Index Only Scan using test_idx on public.test  
(cost=0.42..25890.42 rows=251433 width=12) (actual time=0.051..0.051 rows=1 
loops=1)
 Output: a, b, c
 Index Cond: ((test.a = 1) AND (test.b = 1))
 Heap Fetches: 1
 Total runtime: 0.084 ms
(7 rows)

But really slow ones if we search for null values of b:

arturas=# explain analyze verbose select * from test where a = 1 and b 
is null order by c limit 1;

 QUERY PLAN 

-
 Limit  (cost=15632.47..15632.47 rows=1 width=12) (actual 
time=138.127..138.127 rows=1 loops=1)
   Output: a, b, c
   ->  Sort  (cost=15632.47..16253.55 rows=248434 width=12) (actual 
time=138.127..138.127 rows=1 loops=1)
 Output: a, b, c
 Sort Key: test.c
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Bitmap Heap Scan on public.test  
(cost=6378.87..14390.30 rows=248434 width=12) (actual time=47.083..88.986 
rows=249243 loops=1)
   Output: a, b, c
   Recheck Cond: ((test.a = 1) AND (test.b IS 
NULL))
   ->  Bitmap Index Scan on test_idx  
(cost=0.00..6316.77 rows=248434 width=0) (actual time=46.015..46.015 rows=249243 
loops=1)
 Index Cond: ((test.a = 1) AND 
(test.b IS NULL))
 Total runtime: 138.200 ms
(12 rows)

Can someone please give some insight on this problem :)

P.S. I am using `select version()` => PostgreSQL 9.3.5 on 
x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 
64-bit, compiled from source with no default configuration changes.


--
Best Regard,
Artūras Lapinskas



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Index order ignored after `is null` in query

2014-11-05 Thread Artūras Lapinskas

Hello,

I am having some hard time understanding how postgresql handles null 
values. As much I understand null values are stored in b-tree as simple 
values (put as last or first depending on index). But it seems that 
there is something really specific about them as postgresql deliberately 
ignores obvious (I think...) optimizations concerning index order after 
using one of them in a query. As a simple example look at table below:


arturas=# drop table if exists test;
DROP TABLE
arturas=# create table test (
arturas(#   a int not null,
arturas(#   b int,
arturas(#   c int not null
arturas(# );
CREATE TABLE

After filling this table with random data (actual distribution of 
null's/real values seams not to matter):


arturas=# insert into test (a, b, c)
arturas-#   select
arturas-# case when random() < 0.5 then 1 else 2 end
arturas-# , case when random() < 0.5 then null else 1 end
arturas-# , case when random() < 0.5 then 1 else 2 end
arturas-#   from generate_series(1, 100, 1) as gen;
INSERT 0 100

And creating index:

arturas=# create index test_idx on test (a, b nulls first, c);
CREATE INDEX

We get fast queries with `order by` on c:

arturas=# explain analyze verbose select * from test where a = 1 and b = 1 
order by c limit 1;
   QUERY PLAN 
---

Limit  (cost=0.42..0.53 rows=1 width=12) (actual time=0.052..0.052 rows=1 
loops=1)
  Output: a, b, c
  ->  Index Only Scan using test_idx on public.test  (cost=0.42..25890.42 
rows=251433 width=12) (actual time=0.051..0.051 rows=1 loops=1)
Output: a, b, c
Index Cond: ((test.a = 1) AND (test.b = 1))
Heap Fetches: 1
Total runtime: 0.084 ms
(7 rows)

But really slow ones if we search for null values of b:

arturas=# explain analyze verbose select * from test where a = 1 and b is null 
order by c limit 1;
QUERY PLAN  
-

Limit  (cost=15632.47..15632.47 rows=1 width=12) (actual time=138.127..138.127 
rows=1 loops=1)
  Output: a, b, c
  ->  Sort  (cost=15632.47..16253.55 rows=248434 width=12) (actual 
time=138.127..138.127 rows=1 loops=1)
Output: a, b, c
Sort Key: test.c
Sort Method: top-N heapsort  Memory: 25kB
->  Bitmap Heap Scan on public.test  (cost=6378.87..14390.30 
rows=248434 width=12) (actual time=47.083..88.986 rows=249243 loops=1)
  Output: a, b, c
  Recheck Cond: ((test.a = 1) AND (test.b IS NULL))
  ->  Bitmap Index Scan on test_idx  (cost=0.00..6316.77 
rows=248434 width=0) (actual time=46.015..46.015 rows=249243 loops=1)
Index Cond: ((test.a = 1) AND (test.b IS NULL))
Total runtime: 138.200 ms
(12 rows)

Can someone please give some insight on this problem :)

P.S. I am using `select version()` => PostgreSQL 9.3.5 on 
x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 
64-bit, compiled from source with no default configuration changes.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Index order ignored after `is null` in query

2014-11-05 Thread Artūras Lapinskas

Hello,

I am having some hard time understanding how postgresql handles null 
values. As much I understand null values are stored in b-tree as simple 
values (put as last or first depending on index). But it seems that 
there is something really specific about them as postgresql deliberately 
ignores obvious (I think...) optimizations concerning index order after 
using one of them in a query. As a simple example look at table below:


arturas=# drop table if exists test;
DROP TABLE
arturas=# create table test (
arturas(#   a int not null,
arturas(#   b int,
arturas(#   c int not null
arturas(# );
CREATE TABLE

After filling this table with random data (actual distribution of 
null's/real values seams not to matter):


arturas=# insert into test (a, b, c)
arturas-#   select
arturas-# case when random() < 0.5 then 1 else 2 end
arturas-# , case when random() < 0.5 then null else 1 end
arturas-# , case when random() < 0.5 then 1 else 2 end
arturas-#   from generate_series(1, 100, 1) as gen;
INSERT 0 100

And creating index:

arturas=# create index test_idx on test (a, b nulls first, c);
CREATE INDEX

We get fast queries with `order by` on c:

arturas=# explain analyze verbose select * from test where a = 1 and b 
= 1 order by c limit 1;
	QUERY PLAN 
	---

 Limit  (cost=0.42..0.53 rows=1 width=12) (actual time=0.052..0.052 
rows=1 loops=1)
   Output: a, b, c
   ->  Index Only Scan using test_idx on public.test  
(cost=0.42..25890.42 rows=251433 width=12) (actual time=0.051..0.051 rows=1 
loops=1)
 Output: a, b, c
 Index Cond: ((test.a = 1) AND (test.b = 1))
 Heap Fetches: 1
 Total runtime: 0.084 ms
(7 rows)

But really slow ones if we search for null values of b:

arturas=# explain analyze verbose select * from test where a = 1 and b 
is null order by c limit 1;
	 QUERY PLAN  
	-

 Limit  (cost=15632.47..15632.47 rows=1 width=12) (actual 
time=138.127..138.127 rows=1 loops=1)
   Output: a, b, c
   ->  Sort  (cost=15632.47..16253.55 rows=248434 width=12) (actual 
time=138.127..138.127 rows=1 loops=1)
 Output: a, b, c
 Sort Key: test.c
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Bitmap Heap Scan on public.test  
(cost=6378.87..14390.30 rows=248434 width=12) (actual time=47.083..88.986 
rows=249243 loops=1)
   Output: a, b, c
   Recheck Cond: ((test.a = 1) AND (test.b IS 
NULL))
   ->  Bitmap Index Scan on test_idx  
(cost=0.00..6316.77 rows=248434 width=0) (actual time=46.015..46.015 rows=249243 
loops=1)
 Index Cond: ((test.a = 1) AND 
(test.b IS NULL))
 Total runtime: 138.200 ms
(12 rows)

Can someone please give some insight on this problem :)

P.S. I am using `select version()` => PostgreSQL 9.3.5 on 
x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 
64-bit, compiled from source with no default configuration changes.


--
Best Regard,
Artūras Lapinskas


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance