Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-03-22 Thread Tomas Vondra
On 03/22/2018 08:51 PM, Tom Lane wrote:
> I wrote:
>> Tomas Vondra  writes:
>>> The 0002 part is the main part, unifying the definition of reltuples on
>>> three main places:
> 
>> On to this part ...
> 
> I've pushed 0002 with several corrections: it did not seem to me that
> you'd correctly propagated what ANALYZE is doing into CREATE INDEX or
> pgstattuple.  Also, since one of the things VACUUM does with num_tuples
> is to report it as "the total number of non-removable tuples", I thought
> we'd better leave that calculation alone.  I made the added code count
> live tuples in a new variable live_tuples, instead.
> 

OK, makes sense. Thanks.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-03-22 Thread Tom Lane
I wrote:
> Tomas Vondra  writes:
>> The 0002 part is the main part, unifying the definition of reltuples on
>> three main places:

> On to this part ...

I've pushed 0002 with several corrections: it did not seem to me that
you'd correctly propagated what ANALYZE is doing into CREATE INDEX or
pgstattuple.  Also, since one of the things VACUUM does with num_tuples
is to report it as "the total number of non-removable tuples", I thought
we'd better leave that calculation alone.  I made the added code count
live tuples in a new variable live_tuples, instead.

regards, tom lane



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-03-22 Thread Tom Lane
Tomas Vondra  writes:
> 0001 fixes this by tracking the number of actually indexed rows in the
> build states, just like in the other index AMs.
> A VACUUM or ANALYZE will fix the estimate, of course, but for tables
> that are not changing very much it may take quite a while. So I think
> this is something we definitely need to back-patch.

Agreed, and done.  I noticed a second bug in contrib/bloom, too: it'd
forget to write out the last index page if that contained only one
tuple, because the new-page code path in bloomBuildCallback failed to
increment the "count" field after clearing it.

> The 0002 part is the main part, unifying the definition of reltuples on
> three main places:

On to this part ...

regards, tom lane



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-03-20 Thread Tomas Vondra
Hi,

So here is an updated version of the patch/fix, addressing the remaining
issues in v3 posted by Tom in November.


The 0001 part is actually a bugfix in bloom and spgist index AM, which
did something like this:

reltuples = IndexBuildHeapScan(...)

result->heap_tuples = result->index_tuples = reltuples;

That is, these two AMs simply used the number of heap rows for the
index. That does not work for partial indexes, of course, where the
correct index reltuples value is likely much lower.

0001 fixes this by tracking the number of actually indexed rows in the
build states, just like in the other index AMs.

A VACUUM or ANALYZE will fix the estimate, of course, but for tables
that are not changing very much it may take quite a while. So I think
this is something we definitely need to back-patch.


The 0002 part is the main part, unifying the definition of reltuples on
three main places:

 a) acquire_sample_rows (ANALYZE)
 b) lazy_scan_heap (VACUUM)
 c) IndexBuildHeapRangeScan (CREATE INDEX)

As the ANALYZE case seems the most constrained, the other two places
were updated to use the same criteria for which rows to include in the
reltuples estimate:

  * HEAPTUPLE_LIVE
  * HEAPTUPLE_INSERT_IN_PROGRESS (same transaction)
  * HEAPTUPLE_DELETE_IN_PROGRESS (not the same trasaction)

This resolves the issue with oscillating reltuples estimates, produced
by VACUUM and ANALYZE (with many non-removable dead tuples).

I've checked all IndexBuildHeapRangeScan callers, and none of them is
using the reltuples estimate for anything except for passing it to
index_update_stats. Aside from the bug fixed in 0001, of course.



regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Compute-index_tuples-correctly-in-bloom-and-spgist.patch.gz
Description: application/gzip


0002-Unify-the-definition-of-reltuples-in-VACUUM-ANALYZE-.patch.gz
Description: application/gzip


Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-03-05 Thread Tomas Vondra


On 03/05/2018 04:12 PM, David Steele wrote:
> Hi Tomas,
> 
> On 1/8/18 3:28 PM, Tomas Vondra wrote:
>>
>>
>> On 01/08/2018 08:39 PM, Tom Lane wrote:
>>> Tomas Vondra  writes:
 As I already mentioned, Tom's updated patch is better than what I
 posted initially, and I agree with his approach to the remaining
 issues he pointed out. But I somehow assumed that he's already
 looking into that. Tom, do you plan to look into this patch soon,
 or should I?
>>>
>>> No, I thought you were going to run with those ideas. I have a lot
>>> of other stuff on my plate ...
>>>
>>
>> OK, will do.
> 
> What the status of this patch?  It's been waiting on author since last
> November, though I can see there was some confusion over who was working
> on it.
> 
> Given that it's a bug fix it would be good to see a patch for this CF,
> or very soon after.
> 

Yeah, it's the next thing on my TODO.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-03-05 Thread David Steele
Hi Tomas,

On 1/8/18 3:28 PM, Tomas Vondra wrote:
> 
> 
> On 01/08/2018 08:39 PM, Tom Lane wrote:
>> Tomas Vondra  writes:
>>> As I already mentioned, Tom's updated patch is better than what I
>>> posted initially, and I agree with his approach to the remaining
>>> issues he pointed out. But I somehow assumed that he's already
>>> looking into that. Tom, do you plan to look into this patch soon,
>>> or should I?
>>
>> No, I thought you were going to run with those ideas. I have a lot
>> of other stuff on my plate ...
>>
> 
> OK, will do.

What the status of this patch?  It's been waiting on author since last
November, though I can see there was some confusion over who was working
on it.

Given that it's a bug fix it would be good to see a patch for this CF,
or very soon after.

Thanks,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-01-08 Thread Tomas Vondra


On 01/08/2018 08:39 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> As I already mentioned, Tom's updated patch is better than what I
>> posted initially, and I agree with his approach to the remaining
>> issues he pointed out. But I somehow assumed that he's already
>> looking into that. Tom, do you plan to look into this patch soon,
>> or should I?
> 
> No, I thought you were going to run with those ideas. I have a lot
> of other stuff on my plate ...
> 

OK, will do.

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-01-08 Thread Tom Lane
Tomas Vondra  writes:
> As I already mentioned, Tom's updated patch is better than what I posted
> initially, and I agree with his approach to the remaining issues he
> pointed out. But I somehow assumed that he's already looking into that.
> Tom, do you plan to look into this patch soon, or should I?

No, I thought you were going to run with those ideas.  I have a lot of
other stuff on my plate ...

regards, tom lane



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-01-08 Thread Tomas Vondra

On 01/05/2018 05:25 AM, Stephen Frost wrote:
> Tomas,
> 
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> On Sun, Nov 19, 2017 at 6:40 AM, Tomas Vondra
>>  wrote:
>>> Thanks for looking into this. I agree your patch is more consistent and
>>> generally cleaner.
>>
>> This is classified as a bug fix, and is marked as waiting on author. I
>> am moving it to next CF as work continues.
> 
> This is still in waiting-on-author state; it'd be great to get your 
> thoughts on where this patch is and what the next steps are to move
> it forward. If you need additional feedback or there needs to be
> more discussion (perhaps with Tom) then maybe this should be in
> needs-review instead, but it'd be great if you could review the
> discussion and patch again and at least provide an update on it (last
> update from you was in November).
> 

Hmmm, I'm not sure, TBH.

As I already mentioned, Tom's updated patch is better than what I posted
initially, and I agree with his approach to the remaining issues he
pointed out. But I somehow assumed that he's already looking into that.

Tom, do you plan to look into this patch soon, or should I?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-01-04 Thread Stephen Frost
Tomas,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Sun, Nov 19, 2017 at 6:40 AM, Tomas Vondra
>  wrote:
> > Thanks for looking into this. I agree your patch is more consistent and
> > generally cleaner.
> 
> This is classified as a bug fix, and is marked as waiting on author. I
> am moving it to next CF as work continues.

This is still in waiting-on-author state; it'd be great to get your
thoughts on where this patch is and what the next steps are to move it
forward.  If you need additional feedback or there needs to be more
discussion (perhaps with Tom) then maybe this should be in needs-review
instead, but it'd be great if you could review the discussion and patch
again and at least provide an update on it (last update from you was in
November).

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-11-27 Thread Michael Paquier
On Sun, Nov 19, 2017 at 6:40 AM, Tomas Vondra
 wrote:
> Thanks for looking into this. I agree your patch is more consistent and
> generally cleaner.

This is classified as a bug fix, and is marked as waiting on author. I
am moving it to next CF as work continues.
-- 
Michael



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-11-20 Thread Alvaro Herrera
BTW see bug #14863 which is related to this:
https://postgr.es/m/CAEBTBzu5j_E1K1jb9OKwTZj98MPeM7V81-vadp5adRm=nhj...@mail.gmail.com

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-11-18 Thread Tom Lane
Tomas Vondra  writes:
> On 11/02/2017 08:15 PM, Tom Lane wrote:
>> However, I'm not sure we're there yet, because there remains a fairly
>> nasty discrepancy even once we've gotten everyone onto the same page
>> about reltuples counting just live tuples: VACUUM and ANALYZE have
>> different definitions of what's "live".  In particular they do not treat
>> INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same.  Should we
>> try to do something about that?  If so, what?  It looks like ANALYZE's
>> behavior is pretty tightly constrained, judging by the comments in
>> acquire_sample_rows.

> ISTM we need to unify those definitions, probably so that VACUUM adopts
> what acquire_sample_rows does. I mean, if ANALYZE assumes that the stats
> will be updated at the end of transaction, why shouldn't VACUUM do the
> same thing?

That was the way I was leaning.  I haven't thought very hard about the
implications, but as long as the change in VACUUM's behavior extends
only to the live-tuple count it reports, it seems like adjusting it
couldn't break anything too badly.

>> Another problem is that it looks like CREATE INDEX will set reltuples
>> to the total number of heap entries it chose to index, because that
>> is what IndexBuildHeapScan counts.  Maybe we should adjust that?

> You mean by only counting live tuples in IndexBuildHeapRangeScan,
> following whatever definition we end up using in VACUUM/ANALYZE?

Right.  One issue is that, as I mentioned, the index AMs probably want to
think about total-tuples-indexed not live-tuples; so for their purposes,
what IndexBuildHeapScan currently counts is the right thing.  We need to
look and see if any AMs are actually using that value rather than just
silently passing it back.  If they are, we might need to go to the trouble
of computing/returning two values.

regards, tom lane



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-11-18 Thread Tomas Vondra
Hi,

On 11/02/2017 08:15 PM, Tom Lane wrote:
> Haribabu Kommi  writes:
>> The changes are fine and now it reports proper live tuples in both
>> pg_class and stats. The other issue of continuous vacuum operation
>> leading to decrease of number of live tuples is not related to this
>> patch and that can be handled separately.
> 
> I did not like this patch too much, because it did nothing to fix
> the underlying problem of confusion between "live tuples" and "total
> tuples"; in fact, it made that worse, because now the comment on
> LVRelStats.new_rel_tuples is a lie.  And there's at least one usage
> of that field value where I think we do want total tuples not live
> tuples: where we pass it to index AM cleanup functions.  Indexes
> don't really care whether heap entries are live or not, only that
> they're there.  Plus the VACUUM VERBOSE log message that uses the
> field is supposed to be reporting total tuples not live tuples.
> 
> I hacked up the patch trying to make that better, finding along the
> way that contrib/pgstattuple shared in the confusion about what
> it was trying to count.  Results attached.
> 

Thanks for looking into this. I agree your patch is more consistent and
generally cleaner.

> However, I'm not sure we're there yet, because there remains a fairly
> nasty discrepancy even once we've gotten everyone onto the same page
> about reltuples counting just live tuples: VACUUM and ANALYZE have
> different definitions of what's "live".  In particular they do not treat
> INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same.  Should we
> try to do something about that?  If so, what?  It looks like ANALYZE's
> behavior is pretty tightly constrained, judging by the comments in
> acquire_sample_rows.
> 

Yeah :-(

For the record (and people following this thread who are too lazy to
open the analyze.c and search for the comments), the problem is that
acquire_sample_rows does not count HEAPTUPLE_INSERT_IN_PROGRESS as live
(and HEAPTUPLE_DELETE_IN_PROGRESS as dead) as it assumes the transaction
will send the stats at the end.

Which is a correct assumption, but it means that when there is a
long-running transaction that inserted/deleted many rows, the reltuples
value will oscillate during VACUUM / ANALYZE runs.

ISTM we need to unify those definitions, probably so that VACUUM adopts
what acquire_sample_rows does. I mean, if ANALYZE assumes that the stats
will be updated at the end of transaction, why shouldn't VACUUM do the
same thing?

The one reason I can think of is that VACUUM is generally expected to
run longer than ANALYZE, so the assumption that large transactions
complete later is not quite reliable.

Or is there some other reason why VACUUM can't treat the in-progress
tuples the same way?

> Another problem is that it looks like CREATE INDEX will set reltuples
> to the total number of heap entries it chose to index, because that
> is what IndexBuildHeapScan counts.  Maybe we should adjust that?
> 

You mean by only counting live tuples in IndexBuildHeapRangeScan,
following whatever definition we end up using in VACUUM/ANALYZE? Seems
like a good idea I guess, although CREATE INDEX not as frequent as the
other commands.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services