Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2016-01-03 Thread Emre Hasegeli
> But is it? Is it impossible for the BRIN bitmap index scan to return 0 rows
> (say, if the value being matched is outside the min/max boundary for every
> block range?) Granted, if we document that it always returns 0 and should be
> ignored, then confusing the actual 0 with the 0 as a representation of
> “unknown” would be less a problem.

How about -1 ?  It is an impossible value for sure.  Maybe we should
change BitmapAnd and BitmapOr nodes, too.  It is better to make it
obvious that it is not the correct value.  I don't think many people
would notice the note on the documentation.

On the other hand, returning -1 broke parser of explain.depesz.com [1].

[1] http://explain.depesz.com/s/tAkd


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Oleksii Kliukin

> On 30 Dec 2015, at 18:38, Emre Hasegeli  wrote:
> 
>> which is much closer to the actual number of rows removed by the index
>> recheck + the one left.
> 
> Is it better to be closer?  We are saying those are the "actual"
> values not the estimates.  If we cannot provide the actual rows, I
> think it is better to provide nothing.  Something closer to the
> reality would create more confusion.  Maybe, we just just return the
> number of blocks, and put somewhere a note about it.  The actual row
> count is already available on the upper part of the plan.

I don’t see how to solve this problem without changing explain analyze output 
to accommodate for “unknown” value. I don’t think “0” is a non-confusing 
representation of “unknown” for most people, and from the practical standpoint, 
a “best effort” estimate is better than 0 (i.e. I will be able to estimate how 
efficient BRIN index is for my tables in terms of the number of tuples 
retrieved/thrown away)

We might still reflect in the documentation that the BRIN index cannot produce 
the exact number of rows during the bitmap scan and point people asking similar 
questions there.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Tom Lane
Emre Hasegeli  writes:
>> which is much closer to the actual number of rows removed by the index
>> recheck + the one left.

> Is it better to be closer?  We are saying those are the "actual"
> values not the estimates.  If we cannot provide the actual rows, I
> think it is better to provide nothing.

Return zero you mean?  Good point; there's precedent for that elsewhere
in EXPLAIN ANALYZE, IIRC.  If we do what I propose in my patch, it would
possibly just lead to more questions like Oleksii's, because people would
be even more likely to believe that the number is accurate.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Oleksii Kliukin

> On 30 Dec 2015, at 17:02, Tom Lane  wrote:
> 
> Oleksii Kliukin  writes:
>> Bitmap Heap Scan on example  (cost=744.44..757.64 rows=6 width=0) (actual 
>> time=73.895..73.895 rows=0 loops=1)
>>   Output: 1
>>   Recheck Cond: (example.event_time = (now() - '5 mons'::interval))
>>   Rows Removed by Index Recheck: 4030
>>   Heap Blocks: lossy=128
>>   Buffers: shared hit=629
>>   ->  Bitmap Index Scan on example_event_time_idx1  (cost=0.00..744.41 
>> rows=6 width=0) (actual time=70.335..70.335 rows=1280 loops=1)
>> Index Cond: (example.event_time = (now() - '5 mons'::interval))
>> Buffers: shared hit=501
> 
>> - how does it get 1280 rows from the BRIN index scan, given that BRIN only 
>> stores pointers to the heap blocks, not individual rows. Does it calculate 
>> the number of rows in the blocks returned?
> 
> It evidently returned 128 block IDs to the heapscan logic.  I have not
> looked at the code, but a reasonable bet is that it's just guessing that
> there are 10 rows per block.

You are right, this is at the end of bringetbitmap in brin.c
   /*
 * XXX We have an approximation of the number of *pages* that our scan
 * returns, but we don't have a precise idea of the number of heap 
tuples
 * involved.
 */
PG_RETURN_INT64(totalpages * 10);


> 
> That seems like an awfully low number, though; it equates to assuming
> that rows are 800 bytes wide on average.  If we're going to use a fixed
> number, 100 rows per block would probably be more nearly the correct
> order of magnitude.
> 
> Another idea would be to use the heap's row density as calculated
> by the last ANALYZE (ie, reltuples/relpages), with a fallback to 100
> if relpages=0.  This'd only be convenient if the bitmap scan node has
> the parent heap rel open, which it might not.

+1

Kind regards,
--
Oleksii



Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Tom Lane
Oleksii Kliukin  writes:
>> On 30 Dec 2015, at 17:02, Tom Lane  wrote:
>> Another idea would be to use the heap's row density as calculated
>> by the last ANALYZE (ie, reltuples/relpages), with a fallback to 100
>> if relpages=0.  This'd only be convenient if the bitmap scan node has
>> the parent heap rel open, which it might not.

> +1

Any objections to the attached?

regards, tom lane

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2622a7e..64bf788 100644
*** a/src/backend/access/brin/brin.c
--- b/src/backend/access/brin/brin.c
*** bringetbitmap(PG_FUNCTION_ARGS)
*** 279,286 
  	Relation	heapRel;
  	BrinOpaque *opaque;
  	BlockNumber nblocks;
  	BlockNumber heapBlk;
! 	int			totalpages = 0;
  	FmgrInfo   *consistentFn;
  	MemoryContext oldcxt;
  	MemoryContext perRangeCxt;
--- 279,287 
  	Relation	heapRel;
  	BrinOpaque *opaque;
  	BlockNumber nblocks;
+ 	int			heap_tuples_per_page;
  	BlockNumber heapBlk;
! 	int64		totalpages = 0;
  	FmgrInfo   *consistentFn;
  	MemoryContext oldcxt;
  	MemoryContext perRangeCxt;
*** bringetbitmap(PG_FUNCTION_ARGS)
*** 291,301 
  
  	/*
  	 * We need to know the size of the table so that we know how long to
! 	 * iterate on the revmap.
  	 */
  	heapOid = IndexGetRelation(RelationGetRelid(idxRel), false);
  	heapRel = heap_open(heapOid, AccessShareLock);
  	nblocks = RelationGetNumberOfBlocks(heapRel);
  	heap_close(heapRel, AccessShareLock);
  
  	/*
--- 292,309 
  
  	/*
  	 * We need to know the size of the table so that we know how long to
! 	 * iterate on the revmap.  While we have it open, estimate the number of
! 	 * tuples per heap page for use later.
  	 */
  	heapOid = IndexGetRelation(RelationGetRelid(idxRel), false);
  	heapRel = heap_open(heapOid, AccessShareLock);
  	nblocks = RelationGetNumberOfBlocks(heapRel);
+ 	if (heapRel->rd_rel->relpages != 0 && heapRel->rd_rel->reltuples > 0)
+ 		heap_tuples_per_page = (int)
+ 			((double) heapRel->rd_rel->reltuples /
+ 			 (BlockNumber) heapRel->rd_rel->relpages);
+ 	else	/* if no info, assume 100-byte tuples */
+ 		heap_tuples_per_page = BLCKSZ / 100;
  	heap_close(heapRel, AccessShareLock);
  
  	/*
*** bringetbitmap(PG_FUNCTION_ARGS)
*** 447,457 
  		ReleaseBuffer(buf);
  
  	/*
! 	 * XXX We have an approximation of the number of *pages* that our scan
! 	 * returns, but we don't have a precise idea of the number of heap tuples
! 	 * involved.
  	 */
! 	PG_RETURN_INT64(totalpages * 10);
  }
  
  /*
--- 455,465 
  		ReleaseBuffer(buf);
  
  	/*
! 	 * We have an approximation of the number of pages that our scan returns,
! 	 * but we don't have a precise idea of the number of heap tuples involved.
! 	 * We have to estimate based on average tuple density.
  	 */
! 	PG_RETURN_INT64(totalpages * heap_tuples_per_page);
  }
  
  /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Emre Hasegeli
> which is much closer to the actual number of rows removed by the index
> recheck + the one left.

Is it better to be closer?  We are saying those are the "actual"
values not the estimates.  If we cannot provide the actual rows, I
think it is better to provide nothing.  Something closer to the
reality would create more confusion.  Maybe, we just just return the
number of blocks, and put somewhere a note about it.  The actual row
count is already available on the upper part of the plan.

By the way, the estimation is a bigger problem than that.  Please see
my patch [1] about it.

[1] 
http://www.postgresql.org/message-id/cae2gyzzjvzpy-1csgzjjyh69izsa13segfc4i4r2z0qbq2p...@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Tom Lane
Oleksii Kliukin  writes:
>  Bitmap Heap Scan on example  (cost=744.44..757.64 rows=6 width=0) (actual 
> time=73.895..73.895 rows=0 loops=1)
>Output: 1
>Recheck Cond: (example.event_time = (now() - '5 mons'::interval))
>Rows Removed by Index Recheck: 4030
>Heap Blocks: lossy=128
>Buffers: shared hit=629
>->  Bitmap Index Scan on example_event_time_idx1  (cost=0.00..744.41 
> rows=6 width=0) (actual time=70.335..70.335 rows=1280 loops=1)
>  Index Cond: (example.event_time = (now() - '5 mons'::interval))
>  Buffers: shared hit=501

> - how does it get 1280 rows from the BRIN index scan, given that BRIN only 
> stores pointers to the heap blocks, not individual rows. Does it calculate 
> the number of rows in the blocks returned?

It evidently returned 128 block IDs to the heapscan logic.  I have not
looked at the code, but a reasonable bet is that it's just guessing that
there are 10 rows per block.

That seems like an awfully low number, though; it equates to assuming
that rows are 800 bytes wide on average.  If we're going to use a fixed
number, 100 rows per block would probably be more nearly the correct
order of magnitude.

Another idea would be to use the heap's row density as calculated
by the last ANALYZE (ie, reltuples/relpages), with a fallback to 100
if relpages=0.  This'd only be convenient if the bitmap scan node has
the parent heap rel open, which it might not.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Emre Hasegeli
> I don’t see how to solve this problem without changing explain analyze output 
> to accommodate for “unknown” value. I don’t think “0” is a non-confusing 
> representation of “unknown” for most people, and from the practical 
> standpoint, a “best effort” estimate is better than 0 (i.e. I will be able to 
> estimate how efficient BRIN index is for my tables in terms of the number of 
> tuples retrieved/thrown away)

The number of retrieved and thrown away rows are already available on
the upper part of the plan.  Bitmap Index Scan should provide the rows
that matched the index.  Another alternative would be just returning
the number of matching pages (by not multiplying with 10).  It might
be better understood.  The users who can understand the EXPLAIN
ANALYZE output shouldn't be expecting BRIN to return rows.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Oleksii Kliukin

> On 30 Dec 2015, at 21:12, Tom Lane  wrote:
> 
> Emre Hasegeli  writes:
>>> I don’t see how to solve this problem without changing explain analyze 
>>> output to accommodate for “unknown” value. I don’t think “0” is a 
>>> non-confusing representation of “unknown” for most people, and from the 
>>> practical standpoint, a “best effort” estimate is better than 0 (i.e. I 
>>> will be able to estimate how efficient BRIN index is for my tables in terms 
>>> of the number of tuples retrieved/thrown away)
> 
> We do already have a nearby precedent for returning zero when we don't
> have an accurate answer: that's what BitmapAnd and BitmapOr plan nodes
> do.  (This is documented btw, at the bottom of section 14.1.)

+1 for following a precedent.

> 
>> The number of retrieved and thrown away rows are already available on
>> the upper part of the plan.  Bitmap Index Scan should provide the rows
>> that matched the index.
> 
> It doesn't have that information.
> 
>> Another alternative would be just returning
>> the number of matching pages (by not multiplying with 10).  It might
>> be better understood.
> 
> No, it would not, at least not unless we found a way to explicitly mark
> the output as being blocks not rows (which would doubtless break a lot of
> existing client-side code).  Zero is fairly clearly an impossible value,
> whereas anything that's not zero is going to be taken at face value by
> many users.

But is it? Is it impossible for the BRIN bitmap index scan to return 0 rows 
(say, if the value being matched is outside the min/max boundary for every 
block range?) Granted, if we document that it always returns 0 and should be 
ignored, then confusing the actual 0 with the 0 as a representation of 
“unknown” would be less a problem. 

--
Oleksii



Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Tom Lane
Emre Hasegeli  writes:
>> I don’t see how to solve this problem without changing explain analyze 
>> output to accommodate for “unknown” value. I don’t think “0” is a 
>> non-confusing representation of “unknown” for most people, and from the 
>> practical standpoint, a “best effort” estimate is better than 0 (i.e. I 
>> will be able to estimate how efficient BRIN index is for my tables in terms 
>> of the number of tuples retrieved/thrown away)

We do already have a nearby precedent for returning zero when we don't
have an accurate answer: that's what BitmapAnd and BitmapOr plan nodes
do.  (This is documented btw, at the bottom of section 14.1.)

> The number of retrieved and thrown away rows are already available on
> the upper part of the plan.  Bitmap Index Scan should provide the rows
> that matched the index.

It doesn't have that information.

> Another alternative would be just returning
> the number of matching pages (by not multiplying with 10).  It might
> be better understood.

No, it would not, at least not unless we found a way to explicitly mark
the output as being blocks not rows (which would doubtless break a lot of
existing client-side code).  Zero is fairly clearly an impossible value,
whereas anything that's not zero is going to be taken at face value by
many users.

On balance I think likely the best thing to do is return zero, and
document that behavior.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Alvaro Herrera
Tom Lane wrote:
> Emre Hasegeli  writes:
> >> I don’t see how to solve this problem without changing explain analyze 
> >> output to accommodate for “unknown” value. I don’t think “0” is a 
> >> non-confusing representation of “unknown” for most people, and from the 
> >> practical standpoint, a “best effort” estimate is better than 0 (i.e. I 
> >> will be able to estimate how efficient BRIN index is for my tables in 
> >> terms of the number of tuples retrieved/thrown away)
> 
> We do already have a nearby precedent for returning zero when we don't
> have an accurate answer: that's what BitmapAnd and BitmapOr plan nodes
> do.  (This is documented btw, at the bottom of section 14.1.)

Hmm, but amgetbitmap is documented thusly:

The number of tuples fetched is returned (this might be just an
approximate count, for instance some AMs do not detect
duplicates).
http://www.postgresql.org/docs/9.5/static/index-functions.html

so I'm not sure we're actually violating an expectation that the number
of rows is exact.  I mean, if we zero out this one, shouldn't we set it
to zero for these other documented cases too?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> We do already have a nearby precedent for returning zero when we don't
>> have an accurate answer: that's what BitmapAnd and BitmapOr plan nodes
>> do.  (This is documented btw, at the bottom of section 14.1.)

> Hmm, but amgetbitmap is documented thusly:

>   The number of tuples fetched is returned (this might be just an
>   approximate count, for instance some AMs do not detect
>   duplicates).
>   http://www.postgresql.org/docs/9.5/static/index-functions.html

> so I'm not sure we're actually violating an expectation that the number
> of rows is exact.  I mean, if we zero out this one, shouldn't we set it
> to zero for these other documented cases too?

Well, the code comments may say that, but the user-facing docs don't ...

More generally, people are accustomed to the idea that the planner's
estimated rowcounts are just estimates, but they're not accustomed to
the idea that the runtime "actual" rowcounts might be just estimates.
That seems like it's moving EXPLAIN ANALYZE's contract quite a bit,
even if there are some documents for internal APIs that say something
else.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Oleksii Kliukin

> On 30 Dec 2015, at 17:44, Tom Lane  wrote:
> 
> Oleksii Kliukin  writes:
>>> On 30 Dec 2015, at 17:02, Tom Lane  wrote:
>>> Another idea would be to use the heap's row density as calculated
>>> by the last ANALYZE (ie, reltuples/relpages), with a fallback to 100
>>> if relpages=0.  This'd only be convenient if the bitmap scan node has
>>> the parent heap rel open, which it might not.
> 
>> +1
> 
> Any objections to the attached?

Looks good to me. On my sample system with 100K rows, the new version gives me:

— CREATE TABLE test AS SELECT id FROM generate_series(1,10) id;
— CREATE INDEX ON test USING brin(id);

postgres=# explain analyze select 1 from test where id = 500;
   QUERY PLAN
-
 Bitmap Heap Scan on test  (cost=12.01..16.02 rows=1 width=0) (actual 
time=0.199..4.220 rows=1 loops=1)
   Recheck Cond: (id = 500)
   Rows Removed by Index Recheck: 28927
   Heap Blocks: lossy=128
   ->  Bitmap Index Scan on test_id_idx  (cost=0.00..12.01 rows=1 width=0) 
(actual time=0.072..0.072 rows=28800 loops=1)
 Index Cond: (id = 500)
 Planning time: 0.433 ms
 Execution time: 4.323 ms
(8 rows)

which is much closer to the actual number of rows removed by the index recheck 
+ the one left.

--
Oleksii



[HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Oleksii Kliukin
Hi,

While experimenting with BRIN on PostgreSQL 9.5RC1 I came across the following 
plan (which is, btw a very good example of how BRIN rocks for the clustered 
data, the size of this table is around 90GB, the size of the index is around 
3MB):

explain (analyze, buffers, verbose) select 1 from example where event_time = 
now() - interval '5 months';
  QUERY PLAN
---
 Bitmap Heap Scan on example  (cost=744.44..757.64 rows=6 width=0) (actual 
time=73.895..73.895 rows=0 loops=1)
   Output: 1
   Recheck Cond: (example.event_time = (now() - '5 mons'::interval))
   Rows Removed by Index Recheck: 4030
   Heap Blocks: lossy=128
   Buffers: shared hit=629
   ->  Bitmap Index Scan on example_event_time_idx1  (cost=0.00..744.41 rows=6 
width=0) (actual time=70.335..70.335 rows=1280 loops=1)
 Index Cond: (example.event_time = (now() - '5 mons'::interval))
 Buffers: shared hit=501
 Planning time: 0.125 ms
 Execution time: 73.943 ms
(11 rows)

Time: 74.642 ms


Here EXPLAIN ANALYZE reports 1280 rows from the Bitmap Index Scan, but on the 
higher level, 4030 rows were removed by Index Recheck. 

The questions are:

- how does it get 1280 rows from the BRIN index scan, given that BRIN only 
stores pointers to the heap blocks, not individual rows. Does it calculate the 
number of rows in the blocks returned?

- how comes that the subsequent recheck filters out 4030 rows, out of 
supposedly 1280 returned?

Kind regards,
--
Oleksii