Re: [GENERAL] Indexes on NULL's and order by ... limit N queries

2008-12-01 Thread Tom Lane
Maxim Boguk [EMAIL PROTECTED] writes:
 Looks like when indexed search over NULL's added, planner wasn't learned 
 right way use such index on  where something is NULL order by ... limit ... 
 queries.

There's nothing wrong with the plan; you've just got too many NULLs to
make it worth using the index for that.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Indexes on NULL's and order by ... limit N queries

2008-12-01 Thread Maxim Boguk

Sorry with all my respect to you, you look like wrong.
Here example:

With NULL's:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id IS NULL ORDER BY pos LIMIT 5;
  QUERY PLAN
---
 Limit  (cost=1503.75..1503.76 rows=5 width=28) (actual time=93.334..93.353 
rows=5 loops=1)
   -  Sort  (cost=1503.75..1569.84 rows=26435 width=28) (actual 
time=93.329..93.335 rows=5 loops=1)
 Sort Key: pos
 Sort Method:  top-N heapsort  Memory: 25kB
 -  Bitmap Heap Scan on cluster_weight  (cost=314.32..1064.67 
rows=26435 width=28) (actual time=7.519..48.678 rows=26435 loops=1)
   Recheck Cond: (rubric_id IS NULL)
   -  Bitmap Index Scan on cluster_weight_2  (cost=0.00..307.72 
rows=26435 width=0) (actual time=7.350..7.350 rows=26435 loops=1)
 Index Cond: (rubric_id IS NULL)
 Total runtime: 93.433 ms
(9 rows)

Now lets change NULL's to -1
mboguk_billing=# UPDATE cluster_weight set rubric_id=-1 where rubric_id IS NULL;
UPDATE 26435

And ANALYZE
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE

And try same query with -1 instead of NULL:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id=-1 ORDER BY pos LIMIT 5;
  QUERY PLAN
---
 Limit  (cost=0.00..0.25 rows=5 width=28) (actual time=0.056..0.080 rows=5 
loops=1)
   -  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1334.41 
rows=26435 width=28) (actual time=0.053..0.065 rows=5 loops=1)
 Index Cond: (rubric_id = (-1))
 Total runtime: 0.133 ms
(4 rows)


And plan become normal. So issue not with too many NULL's in my dataset.


--
SY, Maxim Boguk

Tom Lane wrote:

Maxim Boguk [EMAIL PROTECTED] writes:
Looks like when indexed search over NULL's added, planner wasn't learned right way use such index on  where something is NULL order by ... limit ... 
queries.


There's nothing wrong with the plan; you've just got too many NULLs to
make it worth using the index for that.

regards, tom lane


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Indexes on NULL's and order by ... limit N queries

2008-12-01 Thread Alvaro Herrera
Maxim Boguk wrote:
 Sorry with all my respect to you, you look like wrong.

The difference is that the plan with -1 does not need to sort the
output, because it comes sorted out of the index; so the execution can
be stopped as soon as 5 tuples have come out.  With NULL, that can't be
done.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Indexes on NULL's and order by ... limit N queries

2008-12-01 Thread Maxim Boguk

Alvaro Herrera wrote:

Maxim Boguk wrote:

Sorry with all my respect to you, you look like wrong.


The difference is that the plan with -1 does not need to sort the
output, because it comes sorted out of the index; so the execution can
be stopped as soon as 5 tuples have come out.  With NULL, that can't be
done.


But why? NULL's have some special representation in index which don't work same 
as normal values?
Eg output with rubric_id is NULL dont come sorted from index?

Really my tests show same behavior of -1 and NULL values:

mboguk_billing=# SELECT pos from cluster_weight where rubric_id=-1 limit 20;
 pos
-
  20
  20
  25
  40
  40
  50
  60
  60
  80
  80
 100
 120
 140
 160
 180
 200
 220
 240
 260
 280
(20 rows)

mboguk_billing=# UPDATE cluster_weight set rubric_id=NULL where rubric_id=-1;
UPDATE 26435
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE
mboguk_billing=# SELECT pos from cluster_weight where rubric_id is NULL limit 
20;
 pos
-
  20
  20
  25
  40
  40
  50
  60
  60
  80
  80
 100
 120
 140
 160
 180
 200
 220
 240
 260
 280
(20 rows)

Eg output with rubric_id is NULL come ordered be pos from index (rubric_id, pos)

(
Here is explains:

mboguk_billing=# EXPLAIN ANALYZE SELECT pos from cluster_weight where rubric_id 
is NULL limit 20;
  QUERY PLAN
---
 Limit  (cost=0.00..0.99 rows=20 width=2) (actual time=0.050..0.144 rows=20 
loops=1)
   -  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1314.94 
rows=26435 width=2) (actual time=0.045..0.082 rows=20 loops=1)
 Index Cond: (rubric_id IS NULL)
 Total runtime: 0.214 ms
(4 rows)

mboguk_billing=# UPDATE cluster_weight set rubric_id=-1 where rubric_id IS NULL;
UPDATE 26435
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE

mboguk_billing=# EXPLAIN ANALYZE SELECT pos from cluster_weight where 
rubric_id=-1 limit 20;
  QUERY PLAN
---
 Limit  (cost=0.00..0.95 rows=20 width=2) (actual time=0.050..0.141 rows=20 
loops=1)
   -  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1259.05 
rows=26435 width=2) (actual time=0.045..0.081 rows=20 loops=1)
 Index Cond: (rubric_id = (-1))
 Total runtime: 0.214 ms
(4 rows)

Plans look same.
)

PS: REINDEX do not change situation.

--
SY Maxim Boguk

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Indexes on NULL's and order by ... limit N queries

2008-12-01 Thread Tom Lane
Maxim Boguk [EMAIL PROTECTED] writes:
 But why? NULL's have some special representation in index which don't work 
 same as normal values?

In general, NULLs don't work the same as normal values, no.

The reason this particular query isn't working as you are expecting is
that foo IS NULL isn't seen as an ordering constraint by the planner's
pathkey machinery, and so the query doesn't appear to match the index
order.  You could work around it by explicitly specifying a matching
ordering:

SELECT * from cluster_weight where cluster_weight.rubric_id IS NULL
ORDER BY rubric_id, pos LIMIT 5;
 ^^


regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general