Re: [HACKERS] GUC for cleanup indexes threshold.

2017-10-31 Thread Robert Haas
On Mon, Oct 16, 2017 at 3:02 PM, Masahiko Sawada  wrote:
> I guess that is the patch I proposed. However, I think there is still
> room for discussion because the patch cannot skip the cleanup vacuum
> during an aggressive vacuum, which is one of the situations I really
> wanted to skip.

Well, I think there are outstanding concerns that the patch in
question is not safe, and I don't see that anything has been done to
resolve them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-10-16 Thread Masahiko Sawada
On Fri, Oct 13, 2017 at 1:14 AM, Robert Haas  wrote:
> On Tue, Oct 10, 2017 at 5:55 AM, Pavel Golub  wrote:
>> DP> The new status of this patch is: Ready for Committer
>>
>> Seems like we may also be going to hit it, and it would be cool if
>> this vacuum issue were solved for the next PG version.
>
> Exactly which patch on this thread is someone proposing for commit?
>

I guess that is the patch I proposed. However, I think there is still
room for discussion because the patch cannot skip the cleanup vacuum
during an aggressive vacuum, which is one of the situations I really
wanted to skip.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-10-12 Thread Robert Haas
On Tue, Oct 10, 2017 at 5:55 AM, Pavel Golub  wrote:
> DP> The new status of this patch is: Ready for Committer
>
> Seems like we may also be going to hit it, and it would be cool if
> this vacuum issue were solved for the next PG version.

Exactly which patch on this thread is someone proposing for commit?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-10-10 Thread Pavel Golub
Hello, Darafei.

You wrote:

DP> The following review has been posted through the commitfest application:
DP> make installcheck-world:  tested, passed
DP> Implements feature:   tested, passed
DP> Spec compliant:   tested, passed
DP> Documentation:tested, passed

DP> We've been using Postgres with this patch for some time.

DP> In our use case we've got a quickly growing large table with events
DP> from our users.
DP> The table has a structure of (user_id, ts, ). Events are
DP> append only; each user generates events in a small predictable time
DP> frame, mostly each second.
DP> From time to time we need to read this table in the fashion of WHERE
DP> ts BETWEEN a AND b AND user_id=c.
DP> Such a query leads to an enormous amount of seeks, as records of each
DP> user are scattered across the relation and there are no pages that
DP> contain two events from the same user.

DP> To fight it, we created a btree index on (user_id, ts,
DP> ). The plan switched to index-only scans, but heap fetches
DP> and execution times were still the same.
DP> Manual 
DP> We noticed that autovacuum skips scanning the relation and freezing
DP> the Visibility Map.

DP> We started frequently performing VACUUM manually on the relation.
DP> This helped with freezing the Visibility Map.
DP> However, we found out that VACUUM makes a full scan over the index.
DP> As the index does not fit into memory, this means that each run
DP> flushes all the disk caches and eats up Amazon IOPS credits. 

DP> With this patch the behavior is much better for us - VACUUM finishes really quickly.

DP> As a future improvement, a similar change for other index types will
DP> be useful.
DP> After it happens, I'm looking forward to autovacuum kicking in on
DP> append-only tables, to freeze the Visibility Map.

DP> The new status of this patch is: Ready for Committer


Seems like we may also be going to hit it, and it would be cool if this
vacuum issue were solved for the next PG version.

-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com





Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-25 Thread Kyotaro HORIGUCHI
At Fri, 22 Sep 2017 17:15:08 +0300, Sokolov Yura  
wrote in 
> On 2017-09-22 16:22, Sokolov Yura wrote:
> > On 2017-09-22 11:21, Masahiko Sawada wrote:
> >> On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
> >>  wrote:
> >>> At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada
> >>>  wrote in
> >>> 
>  On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
>   wrote:
>  > I was just looking at the thread since I found it left alone for a
>  > long time in the CF app.
>  >
>  > At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  
>  > wrote
>  > in
>  > 
>  >> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund 
>  >> wrote:
>  >> > Hi,
>  >> >
>  >> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>  >> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas
>  >> >>  wrote:
>  >> >> [ lots of valuable discussion ]
>  >> >
>  >> > I think this patch clearly still is in the design stage, and has
>  >> > received plenty feedback this CF.  I'll therefore move this to the
>  >> > next
>  >> > commitfest.
>  >>
>  >> Does anyone have ideas on a way forward here? I don't, but then I
>  >> haven't thought about it in detail in several months.
>  >
>  > Is the additional storage in the metapage to store the current status
>  > of vacuum still unacceptable even if it can avoid a useless
>  > full-page scan on indexes, especially for stable tables?
>  >
>  > Or, how about an additional 1 bit in pg_stat_*_index to indicate
>  > that the index *doesn't* require the vacuum cleanup stage? (default
>  > value causes cleanup)
>  You meant that "the next cycle" is the lazy_cleanup_index() function
>  called by lazy_scan_heap()?
> >>> Both finally call btvacuumscan under a certain condition, but
> >>> what I meant by "the next cycle" is the lazy_cleanup_index call
> >>> in the next round of vacuum, since the abstraction layer (index AM)
> >>> isn't conscious of the details of btree.
> >>> 
>  > index_bulk_delete (or ambulkdelete) returns the flag in
>  > IndexBulkDeleteResult then lazy_scan_heap stores the flag in
>  > stats and in the next cycle it is looked up to decide the
>  > necessity of index cleanup.
>  >
>  Could you elaborate on this? For example, in a btree index, the index
>  cleanup skips the scan on the index if index_bulk_delete has been
>  called during vacuuming, because stats != NULL. So I think we don't
>  need such a flag.
> >>> The flag works so that two successive index full scans don't
> >>> happen in a vacuum round. If any rows were deleted, the
> >>> btvacuumcleanup that follows just does nothing.
> >>> I think what you wanted to solve here was the problem that
> >>> index_vacuum_cleanup runs a full scan even if it ends with no
> >>> actual work, during manual or anti-wraparound vacuums.  (I'm
> >>> getting a bit confused on this..) It is caused by using the
> >>> pointer "stats" as the flag that instructs it to do that. If the
> >>> stats-as-a-flag worked as expected, the GUC doesn't seem to be
> >>> required.
> >> Hmm, my proposal is that if a table hasn't changed much since the
> >> previous vacuum, we skip the index cleanup.
> >> If the table has any garbage, we do lazy_vacuum_index and then an
> >> IndexBulkDeleteResult is stored, which causes the btvacuumcleanup to
> >> be skipped. On the other hand, if the table doesn't have any garbage
> >> but some new tuples were inserted since the previous vacuum, we don't
> >> do lazy_vacuum_index but do lazy_cleanup_index. In this case, we
> >> always do lazy_cleanup_index (i.e., we do the full scan) even if only
> >> one tuple was inserted. That's why I proposed a new GUC parameter
> >> which allows us to skip lazy_cleanup_index in that case.
> >> 
> >>> In addition to that, as Simon and Peter pointed out,
> >>> index_bulk_delete can leave not-fully-removed pages (so-called
> >>> half-dead pages and pages that are recyclable but not registered
> >>> in the FSM, AFAICS) in some cases, mainly because of the
> >>> RecentGlobalXmin interlock. In this case, just inhibiting the
> >>> cleanup scan by a threshold lets such dangling pages persist in the
> >>> index. (I couldn't make that many dangling pages, though..)
> >>> The first patch in the mail (*1) does that. It seems to have some
> >>> bugs, though..
> >>> Since the dangling pages persist until autovacuum decides to scan
> >>> the table they belong to again, we should run a vacuum round (or
> >>> index_vacuum_cleanup itself) even with no dead rows if we want
> >>> to clean up such pages within a certain period. The 

Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-25 Thread Kyotaro HORIGUCHI
At Mon, 25 Sep 2017 19:20:07 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-25 Thread Masahiko Sawada
On Fri, Sep 22, 2017 at 5:31 PM, Kyotaro HORIGUCHI
 wrote:
> At Fri, 22 Sep 2017 17:21:04 +0900, Masahiko Sawada  
> wrote in 
>> On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada 
>> >  wrote in 
>> > 
>> >> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
>> >>  wrote:
>> >> Could you elaborate on this? For example, in a btree index, the index
>> >> cleanup skips the scan on the index if index_bulk_delete has been
>> >> called during vacuuming, because stats != NULL. So I think we don't
>> >> need such a flag.
>> >
>> > The flag works so that two successive index full scans don't
>> > happen in a vacuum round. If any rows were deleted, the
>> > btvacuumcleanup that follows just does nothing.
>> >
>> > I think what you wanted to solve here was the problem that
>> > index_vacuum_cleanup runs a full scan even if it ends with no
>> > actual work, during manual or anti-wraparound vacuums.  (I'm
>> > getting a bit confused on this..) It is caused by using the
>> > pointer "stats" as the flag that instructs it to do that. If the
>> > stats-as-a-flag worked as expected, the GUC doesn't seem to be
>> > required.
>>
>> Hmm, my proposal is that if a table hasn't changed much since the
>> previous vacuum, we skip the index cleanup.
>>
>> If the table has any garbage, we do lazy_vacuum_index and then an
>> IndexBulkDeleteResult is stored, which causes the btvacuumcleanup to be
>> skipped. On the other hand, if the table doesn't have any garbage but
>> some new tuples were inserted since the previous vacuum, we don't do
>> lazy_vacuum_index but do lazy_cleanup_index. In this case, we always do
>> lazy_cleanup_index (i.e., we do the full scan) even if only one tuple
>> was inserted. That's why I proposed a new GUC parameter which allows us
>> to skip lazy_cleanup_index in that case.
>
> I think the problem raised in this thread is that the last index
> scan may leave dangling pages.
>
>> > In addition to that, as Simon and Peter pointed out,
>> > index_bulk_delete can leave not-fully-removed pages (so-called
>> > half-dead pages and pages that are recyclable but not registered
>> > in the FSM, AFAICS) in some cases, mainly because of the
>> > RecentGlobalXmin interlock. In this case, just inhibiting the cleanup
>> > scan by a threshold lets such dangling pages persist in the index. (I
>> > couldn't make that many dangling pages, though..)
>> >
>> > The first patch in the mail (*1) does that. It seems to have some
>> > bugs, though..
>> >
>> > Since the dangling pages persist until autovacuum decides to scan
>> > the table they belong to again, we should run a vacuum round (or
>> > index_vacuum_cleanup itself) even with no dead rows if we want
>> > to clean up such pages within a certain period. The second patch
>> > does that.
>> >
>>
>> IIUC half-dead pages are not relevant to this proposal. The proposal
>> has two problems:
>>
>> * By skipping index cleanup we could leave recyclable pages that are
>> not marked as recyclable.
>
> Yes.
>
>> * we stash an XID when a btree page is deleted, which is used to
>> determine when it's finally safe to recycle the page
>
> Is it a "problem" of this proposal?
>

As Peter explained before[1], the problem is that there is an XID
stored in dead btree pages that is used in the subsequent
RecentGlobalXmin interlock that determines if recycling is safe.

[1] 
https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Darafei Praliaskouski
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

We've been using Postgres with this patch for some time.

In our use case we've got a quickly growing large table with events from our
users.
The table has a structure of (user_id, ts, ). Events are append only;
each user generates events in a small predictable time frame, mostly each
second.
From time to time we need to read this table in the fashion of WHERE ts
BETWEEN a AND b AND user_id=c.
Such a query leads to an enormous amount of seeks, as records of each user
are scattered across the relation and there are no pages that contain two
events from the same user.

To fight it, we created a btree index on (user_id, ts, ). The plan
switched to index-only scans, but heap fetches and execution times were
still the same.
Manual 
We noticed that autovacuum skips scanning the relation and freezing the
Visibility Map.

We started frequently performing VACUUM manually on the relation. This helped
with freezing the Visibility Map.
However, we found out that VACUUM makes a full scan over the index.
As the index does not fit into memory, this means that each run flushes all
the disk caches and eats up Amazon IOPS credits.

With this patch the behavior is much better for us - VACUUM finishes really quickly.

As a future improvement, a similar change for other index types will be
useful.
After it happens, I'm looking forward to autovacuum kicking in on append-only
tables, to freeze the Visibility Map.

The new status of this patch is: Ready for Committer



Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Sokolov Yura

On 2017-09-22 16:22, Sokolov Yura wrote:

On 2017-09-22 11:21, Masahiko Sawada wrote:

On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
 wrote:
At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada 
 wrote in 


On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
 wrote:
> I was just looking at the thread since I found it left alone for a
> long time in the CF app.
>
> At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote in 

>> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
wrote:
>> >> [ lots of valuable discussion ]
>> >
>> > I think this patch clearly still is in the design stage, and has
>> > received plenty feedback this CF.  I'll therefore move this to the next
>> > commitfest.
>>
>> Does anyone have ideas on a way forward here? I don't, but then I
>> haven't thought about it in detail in several months.
>
> Is the additional storage in the metapage to store the current status
> of vacuum still unacceptable even if it can avoid a useless
> full-page scan on indexes, especially for stable tables?
>
> Or, how about an additional 1 bit in pg_stat_*_index to indicate
> that the index *doesn't* require the vacuum cleanup stage? (default
> value causes cleanup)

You meant that "the next cycle" is the lazy_cleanup_index() function
called by lazy_scan_heap()?


Both finally call btvacuumscan under a certain condition, but
what I meant by "the next cycle" is the lazy_cleanup_index call
in the next round of vacuum, since the abstraction layer (index AM)
isn't conscious of the details of btree.


> index_bulk_delete (or ambulkdelete) returns the flag in
> IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> stats and in the next cycle it is looked up to decide the
> necessity of index cleanup.
>

Could you elaborate on this? For example, in a btree index, the index
cleanup skips the scan on the index if index_bulk_delete has been
called during vacuuming, because stats != NULL. So I think we don't
need such a flag.


The flag works so that two successive index full scans don't
happen in a vacuum round. If any rows were deleted, the
btvacuumcleanup that follows just does nothing.

I think what you wanted to solve here was the problem that
index_vacuum_cleanup runs a full scan even if it ends with no
actual work, during manual or anti-wraparound vacuums.  (I'm
getting a bit confused on this..) It is caused by using the
pointer "stats" as the flag that instructs it to do that. If the
stats-as-a-flag worked as expected, the GUC doesn't seem to be
required.


Hmm, my proposal is that if a table hasn't changed much since the
previous vacuum, we skip the index cleanup.

If the table has any garbage, we do lazy_vacuum_index and then an
IndexBulkDeleteResult is stored, which causes the btvacuumcleanup to be
skipped. On the other hand, if the table doesn't have any garbage but
some new tuples were inserted since the previous vacuum, we don't do
lazy_vacuum_index but do lazy_cleanup_index. In this case, we always do
lazy_cleanup_index (i.e., we do the full scan) even if only one tuple
was inserted. That's why I proposed a new GUC parameter which allows us
to skip lazy_cleanup_index in that case.



In addition to that, as Simon and Peter pointed out,
index_bulk_delete can leave not-fully-removed pages (so-called
half-dead pages and pages that are recyclable but not registered
in the FSM, AFAICS) in some cases, mainly because of the
RecentGlobalXmin interlock. In this case, just inhibiting the cleanup
scan by a threshold lets such dangling pages persist in the index. (I
couldn't make that many dangling pages, though..)

The first patch in the mail (*1) does that. It seems to have some
bugs, though..

Since the dangling pages persist until autovacuum decides to scan
the table they belong to again, we should run a vacuum round (or
index_vacuum_cleanup itself) even with no dead rows if we want
to clean up such pages within a certain period. The second patch
does that.



IIUC half-dead pages are not relevant to this proposal. The proposal
has two problems:
* By skipping index cleanup we could leave recyclable pages that are
not marked as recyclable.
* we stash an XID when a btree page is deleted, which is used to
determine when it's finally safe to recycle the page

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Here is a small patch that skips scanning a btree index if no pending
deleted pages exist.

It detects this situation by comparing pages_deleted with pages_free.
If they are equal, then there are no half-deleted pages, and it is
safe to skip the next lazy scan.

Flag 

Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Sokolov Yura

On 2017-09-22 11:21, Masahiko Sawada wrote:

On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
 wrote:
At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada 
 wrote in 


On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
 wrote:
> I was just looking at the thread since I found it left alone for a
> long time in the CF app.
>
> At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote in 

>> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
wrote:
>> >> [ lots of valuable discussion ]
>> >
>> > I think this patch clearly still is in the design stage, and has
>> > received plenty feedback this CF.  I'll therefore move this to the next
>> > commitfest.
>>
>> Does anyone have ideas on a way forward here? I don't, but then I
>> haven't thought about it in detail in several months.
>
> Is the additional storage in the metapage to store the current status
> of vacuum still unacceptable even if it can avoid a useless
> full-page scan on indexes, especially for stable tables?
>
> Or, how about an additional 1 bit in pg_stat_*_index to indicate
> that the index *doesn't* require the vacuum cleanup stage? (default
> value causes cleanup)

You meant that "the next cycle" is the lazy_cleanup_index() function
called by lazy_scan_heap()?


Both finally call btvacuumscan under a certain condition, but
what I meant by "the next cycle" is the lazy_cleanup_index call
in the next round of vacuum, since the abstraction layer (index AM)
isn't conscious of the details of btree.


> index_bulk_delete (or ambulkdelete) returns the flag in
> IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> stats and in the next cycle it is looked up to decide the
> necessity of index cleanup.
>

Could you elaborate on this? For example, in a btree index, the index
cleanup skips the scan on the index if index_bulk_delete has been
called during vacuuming, because stats != NULL. So I think we don't
need such a flag.


The flag works so that two successive index full scans don't
happen in a vacuum round. If any rows were deleted, the
btvacuumcleanup that follows just does nothing.

I think what you wanted to solve here was the problem that
index_vacuum_cleanup runs a full scan even if it ends with no
actual work, during manual or anti-wraparound vacuums.  (I'm
getting a bit confused on this..) It is caused by using the
pointer "stats" as the flag that instructs it to do that. If the
stats-as-a-flag worked as expected, the GUC doesn't seem to be
required.


Hmm, my proposal is that if a table hasn't changed much since the
previous vacuum, we skip the index cleanup.

If the table has any garbage, we do lazy_vacuum_index and then an
IndexBulkDeleteResult is stored, which causes the btvacuumcleanup to be
skipped. On the other hand, if the table doesn't have any garbage but
some new tuples were inserted since the previous vacuum, we don't do
lazy_vacuum_index but do lazy_cleanup_index. In this case, we always do
lazy_cleanup_index (i.e., we do the full scan) even if only one tuple
was inserted. That's why I proposed a new GUC parameter which allows us
to skip lazy_cleanup_index in that case.



In addition to that, as Simon and Peter pointed out,
index_bulk_delete can leave not-fully-removed pages (so-called
half-dead pages and pages that are recyclable but not registered
in the FSM, AFAICS) in some cases, mainly because of the
RecentGlobalXmin interlock. In this case, just inhibiting the cleanup
scan by a threshold lets such dangling pages persist in the index. (I
couldn't make that many dangling pages, though..)

The first patch in the mail (*1) does that. It seems to have some
bugs, though..

Since the dangling pages persist until autovacuum decides to scan
the table they belong to again, we should run a vacuum round (or
index_vacuum_cleanup itself) even with no dead rows if we want
to clean up such pages within a certain period. The second patch
does that.



IIUC half-dead pages are not relevant to this proposal. The proposal
has two problems:
* By skipping index cleanup we could leave recyclable pages that are
not marked as recyclable.
* we stash an XID when a btree page is deleted, which is used to
determine when it's finally safe to recycle the page

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Here is a small patch that skips scanning a btree index if no pending
deleted pages exist.

It detects this situation by comparing pages_deleted with pages_free.
If they are equal, then there are no half-deleted pages, and it is
safe to skip the next lazy scan.
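
A minimal sketch of that check, based only on the description above (not
the actual patch); pages_deleted and pages_free are existing
IndexBulkDeleteResult fields, while skip_next_cleanup is an invented name
for illustration:

/*
 * Sketch only: if every page deleted so far has also been recorded as
 * free, no half-deleted pages are pending, so the next cleanup scan can
 * be skipped.
 */
if (stats->pages_deleted == stats->pages_free)
    skip_next_cleanup = true;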

The flag is stored in btpo_flags. It is unset using 

Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Claudio Freire
On Fri, Sep 22, 2017 at 4:46 AM, Kyotaro HORIGUCHI
 wrote:
> I apologize in advance for possible silliness.
>
> At Thu, 21 Sep 2017 13:54:01 -0300, Claudio Freire  
> wrote in 

Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Kyotaro HORIGUCHI
At Fri, 22 Sep 2017 17:21:04 +0900, Masahiko Sawada  
wrote in 
> On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
>  wrote:
> > At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada  
> > wrote in 
> > 
> >> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
> >>  wrote:
> >> Could you elaborate on this? For example, in a btree index, the index
> >> cleanup skips the scan on the index if index_bulk_delete has been
> >> called during vacuuming, because stats != NULL. So I think we don't
> >> need such a flag.
> >
> > The flag works so that two successive index full scans don't
> > happen in a vacuum round. If any rows were deleted, the
> > btvacuumcleanup that follows just does nothing.
> >
> > I think what you wanted to solve here was the problem that
> > index_vacuum_cleanup runs a full scan even if it ends with no
> > actual work, during manual or anti-wraparound vacuums.  (I'm
> > getting a bit confused on this..) It is caused by using the
> > pointer "stats" as the flag that instructs it to do that. If the
> > stats-as-a-flag worked as expected, the GUC doesn't seem to be
> > required.
> 
> Hmm, my proposal is that if a table hasn't changed much since the
> previous vacuum, we skip the index cleanup.
> 
> If the table has any garbage, we do lazy_vacuum_index and then an
> IndexBulkDeleteResult is stored, which causes the btvacuumcleanup to be
> skipped. On the other hand, if the table doesn't have any garbage but
> some new tuples were inserted since the previous vacuum, we don't do
> lazy_vacuum_index but do lazy_cleanup_index. In this case, we always do
> lazy_cleanup_index (i.e., we do the full scan) even if only one tuple
> was inserted. That's why I proposed a new GUC parameter which allows us
> to skip lazy_cleanup_index in that case.

I think the problem raised in this thread is that the last index
scan may leave dangling pages.

> > In addition to that, as Simon and Peter pointed out,
> > index_bulk_delete can leave not-fully-removed pages (so-called
> > half-dead pages and pages that are recyclable but not registered
> > in the FSM, AFAICS) in some cases, mainly because of the
> > RecentGlobalXmin interlock. In this case, just inhibiting the cleanup
> > scan by a threshold lets such dangling pages persist in the index. (I
> > couldn't make that many dangling pages, though..)
> >
> > The first patch in the mail (*1) does that. It seems to have some
> > bugs, though..
> >
> > Since the dangling pages persist until autovacuum decides to scan
> > the table they belong to again, we should run a vacuum round (or
> > index_vacuum_cleanup itself) even with no dead rows if we want
> > to clean up such pages within a certain period. The second patch
> > does that.
> >
> 
> IIUC half-dead pages are not relevant to this proposal. The proposal
> has two problems:
> 
> * By skipping index cleanup we could leave recyclable pages that are
> not marked as recyclable.

Yes.

> * we stash an XID when a btree page is deleted, which is used to
> determine when it's finally safe to recycle the page

Is it a "problem" of this proposal?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Masahiko Sawada
On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
 wrote:
> At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada  
> wrote in 
>> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > I was just looking at the thread since I found it left alone for a
>> > long time in the CF app.
>> >
>> > At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote 
>> > in 
>> >> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
>> >> > Hi,
>> >> >
>> >> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
>> >> >> wrote:
>> >> >> [ lots of valuable discussion ]
>> >> >
>> >> > I think this patch clearly still is in the design stage, and has
>> >> > received plenty feedback this CF.  I'll therefore move this to the next
>> >> > commitfest.
>> >>
>> >> Does anyone have ideas on a way forward here? I don't, but then I
>> >> haven't thought about it in detail in several months.
>> >
>> > Is the additional storage in the metapage to store the current status
>> > of vacuum still unacceptable even if it can avoid a useless
>> > full-page scan on indexes, especially for stable tables?
>> >
>> > Or, how about an additional 1 bit in pg_stat_*_index to indicate
>> > that the index *doesn't* require the vacuum cleanup stage? (default
>> > value causes cleanup)
>>
>> You meant that "the next cycle" is the lazy_cleanup_index() function
>> called by lazy_scan_heap()?
>
> Both finally call btvacuumscan under a certain condition, but
> what I meant by "the next cycle" is the lazy_cleanup_index call
> in the next round of vacuum, since the abstraction layer (index AM)
> isn't conscious of the details of btree.
>
>> > index_bulk_delete (or ambulkdelete) returns the flag in
>> > IndexBulkDeleteResult then lazy_scan_heap stores the flag in
>> > stats and in the next cycle it is looked up to decide the
>> > necessity of index cleanup.
>> >
>>
>> Could you elaborate on this? For example, in a btree index, the index
>> cleanup skips the scan on the index if index_bulk_delete has been
>> called during vacuuming, because stats != NULL. So I think we don't
>> need such a flag.
>
> The flag works so that two successive index full scans don't
> happen in a vacuum round. If any rows were deleted, the
> btvacuumcleanup that follows just does nothing.
>
> I think what you wanted to solve here was the problem that
> index_vacuum_cleanup runs a full scan even if it ends with no
> actual work, during manual or anti-wraparound vacuums.  (I'm
> getting a bit confused on this..) It is caused by using the
> pointer "stats" as the flag that instructs it to do that. If the
> stats-as-a-flag worked as expected, the GUC doesn't seem to be
> required.

Hmm, my proposal is that if a table hasn't changed much since the
previous vacuum, we skip the index cleanup.

If the table has any garbage, we do lazy_vacuum_index and then an
IndexBulkDeleteResult is stored, which causes the btvacuumcleanup to be
skipped. On the other hand, if the table doesn't have any garbage but
some new tuples were inserted since the previous vacuum, we don't do
lazy_vacuum_index but do lazy_cleanup_index. In this case, we always do
lazy_cleanup_index (i.e., we do the full scan) even if only one tuple
was inserted. That's why I proposed a new GUC parameter which allows us
to skip lazy_cleanup_index in that case.
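
For reference, a simplified sketch of the btvacuumcleanup shortcut being
discussed (abridged; the ANALYZE-only and error paths are omitted):

IndexBulkDeleteResult *
btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
{
    /*
     * If btbulkdelete was called, its stats are passed in here and no
     * further scan is needed.  Only when it was not called (stats == NULL,
     * i.e. no dead tuples were found) does cleanup pay for a full index
     * scan.
     */
    if (stats == NULL)
    {
        stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
        btvacuumscan(info, stats, NULL, NULL, 0);   /* full index scan */
    }
    return stats;
}

This is why the proposed GUC only matters on the insert-only path: once
bulk delete has run, the cleanup scan is skipped anyway.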

>
> In addition to that, as Simon and Peter pointed out,
> index_bulk_delete can leave not-fully-removed pages (so-called
> half-dead pages and pages that are recyclable but not registered
> in the FSM, AFAICS) in some cases, mainly because of the
> RecentGlobalXmin interlock. In this case, just inhibiting the cleanup
> scan by a threshold lets such dangling pages persist in the index. (I
> couldn't make that many dangling pages, though..)
>
> The first patch in the mail (*1) does that. It seems to have some
> bugs, though..
>
> Since the dangling pages persist until autovacuum decides to scan
> the table they belong to again, we should run a vacuum round (or
> index_vacuum_cleanup itself) even with no dead rows if we want
> to clean up such pages within a certain period. The second patch
> does that.
>

IIUC half-dead pages are not relevant to this proposal. The proposal
has two problems:
* By skipping index cleanup we could leave recyclable pages that are
not marked as recyclable.
* we stash an XID when a btree page is deleted, which is used to
determine when it's finally safe to recycle the page

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Kyotaro HORIGUCHI
I apologize in advance for possible silliness.

At Thu, 21 Sep 2017 13:54:01 -0300, Claudio Freire  
wrote in 

Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Kyotaro HORIGUCHI
At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada  
wrote in 
> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
>  wrote:
> > I was just looking at the thread since I found it left alone for a
> > long time in the CF app.
> >
> > At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote in 
> > 
> >> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
> >> > Hi,
> >> >
> >> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
> >> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
> >> >> wrote:
> >> >> [ lots of valuable discussion ]
> >> >
> >> > I think this patch clearly still is in the design stage, and has
> >> > received plenty feedback this CF.  I'll therefore move this to the next
> >> > commitfest.
> >>
> >> Does anyone have ideas on a way forward here? I don't, but then I
> >> haven't thought about it in detail in several months.
> >
> > Is the additional storage in the metapage to store the current status
> > of vacuum still unacceptable even if it can avoid a useless
> > full-page scan on indexes, especially for stable tables?
> >
> > Or, how about an additional 1 bit in pg_stat_*_index to indicate
> > that the index *doesn't* require the vacuum cleanup stage? (default
> > value causes cleanup)
> 
> You meant that "the next cycle" is the lazy_cleanup_index() function
> called by lazy_scan_heap()?

Both finally call btvacuumscan under a certain condition, but
what I meant by "the next cycle" is the lazy_cleanup_index call
in the next round of vacuum, since the abstraction layer (index AM)
isn't conscious of the details of btree.

> > index_bulk_delete (or ambulkdelete) returns the flag in
> > IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> > stats and in the next cycle it is looked up to decide the
> > necessity of index cleanup.
> >
> 
> Could you elaborate on this? For example, in a btree index, the index
> cleanup skips the scan on the index if index_bulk_delete has been
> called during vacuuming, because stats != NULL. So I think we don't
> need such a flag.

The flag works so that two successive index full scans don't
happen in a vacuum round. If any rows were deleted, the
btvacuumcleanup that follows just does nothing.

I think what you wanted to solve here was the problem that
index_vacuum_cleanup runs a full scan even if it ends with no
actual work, during manual or anti-wraparound vacuums.  (I'm
getting a bit confused on this..) It is caused by using the
pointer "stats" as the flag that instructs it to do that. If the
stats-as-a-flag worked as expected, the GUC doesn't seem to be
required.

In addition to that, as Simon and Peter pointed out,
index_bulk_delete can leave not-fully-removed pages (so-called
half-dead pages and pages that are recyclable but not registered
in the FSM, AFAICS) in some cases, mainly because of the
RecentGlobalXmin interlock. In this case, just inhibiting the cleanup
scan by a threshold lets such dangling pages persist in the index. (I
couldn't make that many dangling pages, though..)

The first patch in the mail (*1) does that. It seems to have some
bugs, though..

Since the dangling pages persist until autovacuum decides to scan
the table they belong to again, we should run a vacuum round (or
index_vacuum_cleanup itself) even with no dead rows if we want
to clean up such pages within a certain period. The second patch
does that.


[*1] 
https://www.postgresql.org/message-id/20170921.174957.236914340.horiguchi.kyot...@lab.ntt.co.jp

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
 wrote:
> I was just looking at the thread since I found it left alone for a
> long time in the CF app.
>
> At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote in 
> 
>> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
>> >> wrote:
>> >> [ lots of valuable discussion ]
>> >
>> > I think this patch clearly still is in the design stage, and has
>> > received plenty feedback this CF.  I'll therefore move this to the next
>> > commitfest.
>>
>> Does anyone have ideas on a way forward here? I don't, but then I
>> haven't thought about it in detail in several months.
>
> Is the additional storage in the metapage to store the current status
> of vacuum still unacceptable even if it can avoid a useless
> full-page scan on indexes, especially for stable tables?
>
> Or, how about an additional 1 bit in pg_stat_*_index to indicate
> that the index *doesn't* require the vacuum cleanup stage? (default
> value causes cleanup)

You meant that "the next cycle" is the lazy_cleanup_index() function
called by lazy_scan_heap()?

>
> index_bulk_delete (or ambulkdelete) returns the flag in
> IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> stats and in the next cycle it is looked up to decide the
> necessity of index cleanup.
>

Could you elaborate on this? For example, in a btree index, the index
cleanup skips the scan on the index if index_bulk_delete has been
called during vacuuming, because stats != NULL. So I think we don't
need such a flag.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-21 Thread Claudio Freire
On Tue, Sep 19, 2017 at 8:55 PM, Peter Geoghegan  wrote:
> On Tue, Sep 19, 2017 at 4:47 PM, Claudio Freire  
> wrote:
>> Maybe this is looking at the problem from the wrong direction.
>>
>> Why can't the page be added to the FSM immediately and the check be
>> done at runtime when looking for a reusable page?
>>
>> Index FSMs currently store only 0 or 255, couldn't they store 128 for
>> half-recyclable pages and make the caller re-check reusability before
>> using it?
>
> No, because it's impossible for them to know whether the page
> that their index scan just landed on was recycled just a second ago, or
> has been like this since before their xact began/snapshot was acquired.
>
> For your reference, this RecentGlobalXmin interlock stuff is what
> Lanin & Shasha call "The Drain Technique" within "2.5 Freeing Empty
> Nodes". Seems pretty hard to do it any other way.

I don't see the difference between a vacuum run and distributed
maintenance at _bt_getbuf time. In fact, the code seems to be in
place already.
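
For context, the page-reuse path in _bt_getbuf already does roughly this
(abridged; WAL logging for standbys and error handling omitted):

for (;;)
{
    blkno = GetFreeIndexPage(rel);
    if (blkno == InvalidBlockNumber)
        break;              /* FSM exhausted; caller extends the relation */
    buf = ReadBuffer(rel, blkno);
    if (ConditionalLockBuffer(buf))
    {
        page = BufferGetPage(buf);
        if (_bt_page_recyclable(page))
        {
            /* Okay to use page.  Re-initialize and return it */
            _bt_pageinit(page, BufferGetPageSize(buf));
            return buf;
        }
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }
    ReleaseBuffer(buf);
}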

_bt_page_recyclable seems to prevent old transactions from treating
those pages as recyclable already, and the description of the
technique in 2.5 doesn't seem to preclude doing the drain while doing
other operations. In fact, Lehman even considers the possibility of
multiple concurrent garbage collectors.

It's only a matter of making the page visible in the FSM in a way that
can be efficiently skipped if we want to go directly to a page that
actually CAN be recycled, to avoid looping forever looking for a
recyclable page in _bt_getbuf. In fact, that's pretty much Lehman's
drain technique right there. FSM entries with 128 would be "the
queue", and FSM entries with 255 would be "the freelist". _bt_getbuf
can be the GC getting a buffer to try and recycle, give up after a few
tries, and get an actually recyclable buffer instead (or extend the
relation). In essence, microvacuum.

Unless I'm missing something and RecentGlobalXmin somehow doesn't
exclude all old transactions, I don't see a problem.

Lanin & Shasha use reference counting to do GC wait during the drain,
and most of the ordering of operations needed there is because of
that, but using the xmin seems to make all those considerations moot.
An xact earlier than RecentGlobalXmin cannot have active transactions
able to follow links to that page AFAIK.

TBH, I didn't read the whole papers, though I probably will.

But, in essence, what's the difference between vacuum doing

if (_bt_page_recyclable(page))
{
/* Okay to recycle this page */
RecordFreeIndexPage(rel, blkno);
vstate->totFreePages++;
stats->pages_deleted++;
}

vs. doing it in _bt_getbuf?

What am I missing?




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-21 Thread Kyotaro HORIGUCHI
Hi,

At Tue, 19 Sep 2017 16:55:38 -0700, Peter Geoghegan  wrote in 

> On Tue, Sep 19, 2017 at 4:47 PM, Claudio Freire  
> wrote:
> > Maybe this is looking at the problem from the wrong direction.
> >
> > Why can't the page be added to the FSM immediately and the check be
> > done at runtime when looking for a reusable page?
> >
> > Index FSMs currently store only 0 or 255, couldn't they store 128 for
> > half-recyclable pages and make the caller re-check reusability before
> > using it?
> 
> No, because it's impossible for them to know whether the page
> that their index scan just landed on was recycled just a second ago, or
> has been like this since before their xact began/snapshot was acquired.
> 
> For your reference, this RecentGlobalXmin interlock stuff is what
> Lanin & Shasha call "The Drain Technique" within "2.5 Freeing Empty
> Nodes". Seems pretty hard to do it any other way.

Anyway (:p), the attached first patch is a PoC for the
cleanup-state-in-stats method; it works only for btree. Some
LOG-level debugging messages are put in the patch to show how it
works.

The following steps make a not-recyclable page, but I'm not sure
they are general enough, and I couldn't generate half-dead pages.
The pg_sleep() in the following steps is inserted in order to see
the updated values in stats.


DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a int);
CREATE INDEX ON t1 (a);
INSERT INTO t1 (SELECT a FROM generate_series(0, 80) a);
DELETE FROM t1 WHERE a > 416700 AND a < 417250;
VACUUM t1;
DELETE FROM t1;
VACUUM t1;  -- 1 (or wait for autovacuum)
select pg_sleep(1);
VACUUM t1;  -- 2 (autovacuum doesn't work)
select pg_sleep(1);
VACUUM t1;  -- 3 (ditto)


The following logs are emitted while the three VACUUMs are issued.

# VACUUM t1;  -- 1 (or wait for autovacuum)
 LOG:  btvacuumscan(t1_a_idx) result: deleted = 2185, notrecyclable = 1, 
hafldead = 0, no_cleanup_needed = false
 LOG:  Vacuum cleanup of index t1_a_idx is NOT skipped
 LOG:  btvacuumcleanup on index t1_a_idx is skipped since bulkdelete has run 
just before.
# VACUUM t1;  -- 2
 LOG:  Vacuum cleanup of index t1_a_idx is NOT skipped
 LOG:  btvacuumscan(t1_a_idx) result: deleted = 2192, notrecyclable = 0, 
hafldead = 0, no_cleanup_needed = true
# VACUUM t1;  -- 3
 LOG:  Vacuum cleanup of index t1_a_idx is skipped


VACUUM #1 leaves an unrecyclable page and requests the next cleanup.
VACUUM #2 leaves no unrecyclable pages and inhibits the next cleanup.
VACUUM #3 (and ever after): no vacuum cleanup is executed.

# I suppose it is a known issue that the cleanup cycles are not
# executed automatically unless new dead tuples are generated.

- Getting stats takes a very long time to fail during
  initdb. Since I couldn't find the right way to cope with this,
  I added a tentative function pgstat_live(), which checks that
  the backend has a valid stats socket.

- The patch calls pg_stat_get_vac_cleanup_needed using
  DirectFunctionCall. It might be better to wrap it.


As a byproduct, this enables us to run extra autovacuum rounds for
index cleanup. With the second patch applied, autovacuum works as
follows.

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a int);
CREATE INDEX ON t1 (a);
INSERT INTO t1 (SELECT a FROM generate_series(0, 80) a);
DELETE FROM t1 WHERE a > 416700 AND a < 417250;
(autovacuum on t1 runs)
> LOG:  btvacuumscan(t1_a_idx) result: deleted = 0, notrecyclable = 0, hafldead 
> = 0, no_cleanup_needed = true
> LOG:  Vacuum cleanup of index t1_a_idx is skipped
> LOG:  automatic vacuum of table "postgres.public.t1": index scans: 1
DELETE FROM t1;
(autovacuum on t1 runs)
> LOG:  btvacuumscan(t1_a_idx) result: deleted = 2185, notrecyclable = 1, 
> hafldead = 0, no_cleanup_needed = false
> LOG:  Vacuum cleanup of index t1_a_idx is NOT skipped
> LOG:  btvacuumcleanup on index t1_a_idx is skipped since bulkdelete has run 
> just before.
> LOG:  automatic vacuum of table "postgres.public.t1": index scans: 1
(cleanup vacuum runs for t1 in the next autovac timing)
> LOG:  Vacuum cleanup of index t1_a_idx is NOT skipped
> LOG:  btvacuumscan(t1_a_idx) result: deleted = 2192, notrecyclable = 0, 
> hafldead = 0, no_cleanup_needed = true
> LOG:  automatic vacuum of table "postgres.public.t1": index scans: 0


Any suggestions are welcome.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/access/nbtree/nbtpage.c
--- b/src/backend/access/nbtree/nbtpage.c
***
*** 1110,1116  _bt_pagedel(Relation rel, Buffer buf)
  {
  	int			ndeleted = 0;
  	BlockNumber rightsib;
! 	bool		rightsib_empty;
  	Page		page;
  	BTPageOpaque opaque;
  
--- 1110,1116 
  {
  	int			ndeleted = 0;
  	BlockNumber rightsib;
! 	bool		rightsib_empty = false;
  	Page		page;
  	BTPageOpaque opaque;
  
*** a/src/backend/access/nbtree/nbtree.c
--- b/src/backend/access/nbtree/nbtree.c
***
*** 63,68  typedef struct
--- 63,70 
  	BlockNumber 

Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-19 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 4:47 PM, Claudio Freire  wrote:
> Maybe this is looking at the problem from the wrong direction.
>
> Why can't the page be added to the FSM immediately and the check be
> done at runtime when looking for a reusable page?
>
> Index FSMs currently store only 0 or 255, couldn't they store 128 for
> half-recyclable pages and make the caller re-check reusability before
> using it?

No, because it's impossible for them to know whether the page
that their index scan just landed on was recycled just a second ago, or
has been like this since before their xact began/snapshot was acquired.

For your reference, this RecentGlobalXmin interlock stuff is what
Lanin & Shasha call "The Drain Technique" within "2.5 Freeing Empty
Nodes". Seems pretty hard to do it any other way.

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-19 Thread Claudio Freire
On Tue, Sep 19, 2017 at 3:31 AM, Kyotaro HORIGUCHI
 wrote:
> I was just looking at the thread since I found it left alone for a
> long time in the CF app.
>
> At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote in 
> 
>> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
>> >> wrote:
>> >> [ lots of valuable discussion ]
>> >
>> > I think this patch clearly still is in the design stage, and has
>> > received plenty feedback this CF.  I'll therefore move this to the next
>> > commitfest.
>>
>> Does anyone have ideas on a way forward here? I don't, but then I
>> haven't thought about it in detail in several months.
>
> Is the additional storage in the metapage to store the current status
> of vacuum still unacceptable even if it can avoid a useless
> full-page scan on indexes, especially for stable tables?
>
> Or, how about an additional 1 bit in pg_stat_*_index to indicate
> that the index *doesn't* require the vacuum cleanup stage? (default
> value causes cleanup)
>
> index_bulk_delete (or ambulkdelete) returns the flag in
> IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> stats and in the next cycle it is looked up to decide the
> necessity of index cleanup.

Maybe this is looking at the problem from the wrong direction.

Why can't the page be added to the FSM immediately and the check be
done at runtime when looking for a reusable page?

Index FSMs currently store only 0 or 255, couldn't they store 128 for
half-recyclable pages and make the caller re-check reusability before
using it?

This would make the second pass totally unnecessary, for a slight
penalty when looking at the FSM. I.e.: 2 lookups in the FSM instead of
one: an initial one with min free space 128 to get a half-recyclable
page and, if the check fails on that page, a second lookup with min free
space 255 to get a surely-recyclable page.
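
A hypothetical sketch of that double lookup; the current index-FSM API
(GetFreeIndexPage/RecordFreeIndexPage) takes no category argument, so
GetIndexPageWithMinCategory and _bt_surely_recyclable below are invented
names for illustration only:

/* Hypothetical API, not in PostgreSQL today */
BlockNumber blkno;

blkno = GetIndexPageWithMinCategory(rel, 128);  /* maybe half-recyclable */
if (blkno != InvalidBlockNumber &&
    !_bt_surely_recyclable(rel, blkno))
{
    /* re-check failed: fall back to pages marked fully recyclable */
    blkno = GetIndexPageWithMinCategory(rel, 255);
}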




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-19 Thread Kyotaro HORIGUCHI
I was just looking at the thread since I found it left alone for a
long time in the CF app.

At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote in 

> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
> > Hi,
> >
> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
> >> wrote:
> >> [ lots of valuable discussion ]
> >
> > I think this patch clearly still is in the design stage, and has
> > received plenty feedback this CF.  I'll therefore move this to the next
> > commitfest.
> 
> Does anyone have ideas on a way forward here? I don't, but then I
> haven't thought about it in detail in several months.

Is the additional storage in the metapage to store the current status
of vacuum still unacceptable even if it can avoid a useless
full-page scan on indexes, especially for stable tables?

Or, how about an additional 1 bit in pg_stat_*_index to indicate
that the index *doesn't* require the vacuum cleanup stage? (default
value causes cleanup)

index_bulk_delete (or ambulkdelete) returns the flag in
IndexBulkDeleteResult, then lazy_scan_heap stores the flag in
stats, and in the next cycle it is looked up to decide the
necessity of index cleanup.
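
A minimal sketch of that proposal; every field below except the last
already exists in IndexBulkDeleteResult (genam.h), and cleanup_needed is
the proposed addition:

typedef struct IndexBulkDeleteResult
{
    BlockNumber num_pages;          /* pages remaining in index */
    BlockNumber pages_removed;      /* # removed during vacuum operation */
    bool        estimated_count;    /* num_index_tuples is an estimate */
    double      num_index_tuples;   /* tuples remaining */
    double      tuples_removed;     /* # removed during vacuum operation */
    BlockNumber pages_deleted;      /* # unused pages in index */
    BlockNumber pages_free;         /* # pages available for reuse */
    bool        cleanup_needed;     /* proposed: the next vacuum round
                                     * must run the index cleanup stage */
} IndexBulkDeleteResult;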

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-18 Thread Peter Geoghegan
On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
> Hi,
>
> On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  wrote:
>> [ lots of valuable discussion ]
>
> I think this patch clearly still is in the design stage, and has
> received plenty feedback this CF.  I'll therefore move this to the next
> commitfest.

Does anyone have ideas on a way forward here? I don't, but then I
haven't thought about it in detail in several months.

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-04-05 Thread Andres Freund
Hi,

On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  wrote:
> [ lots of valuable discussion ]

I think this patch clearly still is in the design stage, and has
received plenty feedback this CF.  I'll therefore move this to the next
commitfest.

- Andres




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-31 Thread Masahiko Sawada
On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  wrote:
> On Wed, Mar 29, 2017 at 2:23 AM, Masahiko Sawada  
> wrote:
>> I was thinking that the status of this patch is still "Needs review"
>> because I sent latest version patch[1].
>
> I think you're right.
>
> I took a look at this today.  I think there is some problem with the
> design of this patch.  I originally proposed a threshold based on the
> percentage of not-all-visible pages on the theory that we'd just skip
> looking at the indexes altogether in that case.  But that's not what
> the patch does: it only avoids the index *cleanup*, not the index
> *vacuum*.

Hmm, I guess there is a misunderstanding on this thread (my English
might have confused you, sorry).

I've been proposing a patch that allows lazy vacuum to skip only the
index cleanup vacuum, because the problem reported by xu jian[1] is
that a second vacuum freeze takes a long time even if the table has not
been modified since the first vacuum. The reason is that we cannot skip
lazy_cleanup_index even if the table has not changed at all since the
previous vacuum. If the table has garbage, the lazy vacuum does
lazy_vacuum_index; on the other hand, if the table doesn't have
garbage, which means that only insertions were executed or the table was
not modified, the lazy vacuum does lazy_cleanup_index instead. Since
I already knew this restriction, I proposed it. That's why the proposed
new GUC parameter name has been "vacuum_cleanup_index_scale_factor".

> And the comments in btvacuumcleanup say this:
>
> /*
>  * If btbulkdelete was called, we need not do anything, just return the
>  * stats from the latest btbulkdelete call.  If it wasn't called, we must
>  * still do a pass over the index, to recycle any newly-recyclable pages
>  * and to obtain index statistics.
>  *
>  * Since we aren't going to actually delete any leaf items, there's no
>  * need to go through all the vacuum-cycle-ID pushups.
>  */

In the current design, we can skip lazy_cleanup_index in the case where
the number of pages that need to be vacuumed is less than the threshold.
For example, after the whole table is frozen by an aggressive vacuum, the
vacuum can skip scanning the index if only a few insertions were executed
on that table. But if a transaction made garbage on the table, this patch
cannot prevent vacuuming the index, as you mentioned.
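
A minimal sketch of the kind of check being described, assuming a GUC
named vacuum_cleanup_index_scale_factor; the variable names are invented
for illustration and the actual patch's condition may differ:

/* Sketch only: decide whether lazy_cleanup_index can be skipped */
if (num_dead_tuples == 0 &&
    pages_changed < rel_pages * vacuum_cleanup_index_scale_factor)
    skip_index_cleanup = true;  /* table barely changed since last vacuum */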

By skipping index cleanup we could leave recyclable pages that are not
marked as recyclable. But they can be reclaimed by both the next index
vacuum and the next index cleanup, because btvacuumpage calls
RecordFreeIndexPage for recyclable pages. So I think it doesn't become
a problem.

And the next concern, pointed out by Peter, is that we stash an XID when
a btree page is deleted, which is used to determine when it's finally
safe to recycle the page. I prevent this situation by not allowing lazy
vacuum to skip the index cleanup during an aggressive vacuum. But since
this weakens the advantage of this patch, I'm considering a better idea.

> So, if I'm reading this correctly, the only time this patch saves
> substantial work - at least in the case of a btree index - is in the
> case where there are no dead tuples at all.  But if we only want to
> avoid the work in that case, then a threshold based on the percentage
> of all-visible pages seems like the wrong thing, because the other
> stuff btvacuumcleanup() is doing doesn't have anything to do with the
> number of all-visible pages.

I thought that if the lazy vacuum skipped almost the whole table
according to the visibility map, it means that the state of the table
has changed just a little and we can skip updating the index statistics.
Am I missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-31 Thread Robert Haas
On Wed, Mar 29, 2017 at 2:23 AM, Masahiko Sawada  wrote:
> I was thinking that the status of this patch is still "Needs review"
> because I sent the latest version of the patch[1].

I think you're right.

I took a look at this today.  I think there is some problem with the
design of this patch.  I originally proposed a threshold based on the
percentage of not-all-visible pages on the theory that we'd just skip
looking at the indexes altogether in that case.  But that's not what
the patch does: it only avoids the index *cleanup*, not the index
*vacuum*.  And the comments in btvacuumcleanup say this:

/*
 * If btbulkdelete was called, we need not do anything, just return the
 * stats from the latest btbulkdelete call.  If it wasn't called, we must
 * still do a pass over the index, to recycle any newly-recyclable pages
 * and to obtain index statistics.
 *
 * Since we aren't going to actually delete any leaf items, there's no
 * need to go through all the vacuum-cycle-ID pushups.
 */

So, if I'm reading this correctly, the only time this patch saves
substantial work - at least in the case of a btree index - is in the
case where there are no dead tuples at all.  But if we only want to
avoid the work in that case, then a threshold based on the percentage
of all-visible pages seems like the wrong thing, because the other
stuff btvacuumcleanup() is doing doesn't have anything to do with the
number of all-visible pages.

I'm not quite sure what the right thing to do is here, but I'm
doubtful that this is it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-31 Thread David Steele

On 3/29/17 2:23 AM, Masahiko Sawada wrote:

On Wed, Mar 29, 2017 at 12:23 AM, David Steele  wrote:

On 3/23/17 1:54 AM, Masahiko Sawada wrote:


On Wed, Mar 15, 2017 at 7:51 AM, Peter Geoghegan  wrote:


On Tue, Mar 14, 2017 at 3:10 PM, Peter Geoghegan  wrote:


We already have BTPageOpaqueData.btpo, a union whose contained type
varies based on the page being dead. We could just do the same with
some other field in that struct, and then store epoch there. Clearly
nobody really cares about most data that remains on the page. Index
scans just need to be able to land on it to determine that it's dead,
and VACUUM needs to be able to determine whether or not there could
possibly be such an index scan at the time it considers recycling..



ISTM that we need all of the fields within BTPageOpaqueData even for
dead pages, actually. The left links and right links still need to be
sane, and the flag bits are needed. Plus, the field that stores an XID
already is clearly necessary. Even if they weren't needed, it would
probably still be a good idea to keep them around for forensic
purposes. However, the page header field pd_prune_xid is currently
unused for indexes, and is the same width as CheckPoint.nextXidEpoch
(the extra thing we might want to store -- the epoch).

Maybe you could store the epoch within that field when B-Tree VACUUM
deletes a page, and then compare that within _bt_page_recyclable(). It
would come before the existing XID comparison in that function. One
nice thing about this idea is that pd_prune_xid will be all-zero for
index pages from the current format, so there is no need to take
special care to make sure that databases that have undergone
pg_upgrade don't break.



Thank you for the suggestion!
If we store the epoch within the union field, I think we will not be able
to use BTPageOpaqueData.btpo.xact at the same time. Since comparing
btpo.xact is still necessary to determine whether a page is recyclable,
we cannot store the epoch in the same union field. And if we store
it elsewhere in BTPageOpaqueData, it would break disk compatibility.



I have marked this patch "Waiting for Author".

This thread has been idle for five days.  Please respond with a new patch by
2017-03-30 00:00 AoE (UTC-12) or this submission will be marked "Returned
with Feedback".



I was thinking that the status of this patch is still "Needs review"
because I sent the latest version of the patch[1]. We're still discussing
how safely we can skip the cleanup vacuum on btree indexes. That
patch has some restrictions, but it would be a good first step.


I've marked this patch as "Needs Review".  Feel free to do that on your 
own if you think I've made a mistake in classifying a patch.


My view is that the patch is stalled and something might be required on 
your part to get it moving again, perhaps trying another approach.


--
-David
da...@pgmasters.net




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-29 Thread Masahiko Sawada
On Wed, Mar 29, 2017 at 12:23 AM, David Steele  wrote:
> On 3/23/17 1:54 AM, Masahiko Sawada wrote:
>>
>> On Wed, Mar 15, 2017 at 7:51 AM, Peter Geoghegan  wrote:
>>>
>>> On Tue, Mar 14, 2017 at 3:10 PM, Peter Geoghegan  wrote:

 We already have BTPageOpaqueData.btpo, a union whose contained type
 varies based on the page being dead. We could just do the same with
 some other field in that struct, and then store epoch there. Clearly
 nobody really cares about most data that remains on the page. Index
 scans just need to be able to land on it to determine that it's dead,
 and VACUUM needs to be able to determine whether or not there could
 possibly be such an index scan at the time it considers recycling..
>>>
>>>
>>> ISTM that we need all of the fields within BTPageOpaqueData even for
>>> dead pages, actually. The left links and right links still need to be
>>> sane, and the flag bits are needed. Plus, the field that stores an XID
>>> already is clearly necessary. Even if they weren't needed, it would
>>> probably still be a good idea to keep them around for forensic
>>> purposes. However, the page header field pd_prune_xid is currently
>>> unused for indexes, and is the same width as CheckPoint.nextXidEpoch
>>> (the extra thing we might want to store -- the epoch).
>>>
>>> Maybe you could store the epoch within that field when B-Tree VACUUM
>>> deletes a page, and then compare that within _bt_page_recyclable(). It
>>> would come before the existing XID comparison in that function. One
>>> nice thing about this idea is that pd_prune_xid will be all-zero for
>>> index pages from the current format, so there is no need to take
>>> special care to make sure that databases that have undergone
>>> pg_upgrade don't break.
>>>
>>
>> Thank you for the suggestion!
>> If we store the epoch within the union field, I think we will not be able
>> to use BTPageOpaqueData.btpo.xact at the same time. Since comparing
>> btpo.xact is still necessary to determine whether a page is recyclable,
>> we cannot store the epoch in the same union field. And if we store
>> it elsewhere in BTPageOpaqueData, it would break disk compatibility.
>
>
> I have marked this patch "Waiting for Author".
>
> This thread has been idle for five days.  Please respond with a new patch by
> 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked "Returned
> with Feedback".
>

I was thinking that the status of this patch is still "Needs review"
because I sent the latest version of the patch[1]. We're still discussing
how safely we can skip the cleanup vacuum on btree indexes. That
patch has some restrictions, but it would be a good first step.

[1] 
https://www.postgresql.org/message-id/CAD21AoDCmnoqKuKOmge6uc5AJAWOcLAz6jjB_WPSPFVQT5PkUA%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-28 Thread David Steele

On 3/23/17 1:54 AM, Masahiko Sawada wrote:

On Wed, Mar 15, 2017 at 7:51 AM, Peter Geoghegan  wrote:

On Tue, Mar 14, 2017 at 3:10 PM, Peter Geoghegan  wrote:

We already have BTPageOpaqueData.btpo, a union whose contained type
varies based on the page being dead. We could just do the same with
some other field in that struct, and then store epoch there. Clearly
nobody really cares about most data that remains on the page. Index
scans just need to be able to land on it to determine that it's dead,
and VACUUM needs to be able to determine whether or not there could
possibly be such an index scan at the time it considers recycling..


ISTM that we need all of the fields within BTPageOpaqueData even for
dead pages, actually. The left links and right links still need to be
sane, and the flag bits are needed. Plus, the field that stores an XID
already is clearly necessary. Even if they weren't needed, it would
probably still be a good idea to keep them around for forensic
purposes. However, the page header field pd_prune_xid is currently
unused for indexes, and is the same width as CheckPoint.nextXidEpoch
(the extra thing we might want to store -- the epoch).

Maybe you could store the epoch within that field when B-Tree VACUUM
deletes a page, and then compare that within _bt_page_recyclable(). It
would come before the existing XID comparison in that function. One
nice thing about this idea is that pd_prune_xid will be all-zero for
index pages from the current format, so there is no need to take
special care to make sure that databases that have undergone
pg_upgrade don't break.



Thank you for the suggestion!
If we store the epoch within the union field, I think we will not be able
to use BTPageOpaqueData.btpo.xact at the same time. Since comparing
btpo.xact is still necessary to determine whether a page is recyclable,
we cannot store the epoch in the same union field. And if we store
it elsewhere in BTPageOpaqueData, it would break disk compatibility.


I have marked this patch "Waiting for Author".

This thread has been idle for five days.  Please respond with a new 
patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-22 Thread Masahiko Sawada
On Wed, Mar 15, 2017 at 7:51 AM, Peter Geoghegan  wrote:
> On Tue, Mar 14, 2017 at 3:10 PM, Peter Geoghegan  wrote:
>> We already have BTPageOpaqueData.btpo, a union whose contained type
>> varies based on the page being dead. We could just do the same with
>> some other field in that struct, and then store epoch there. Clearly
>> nobody really cares about most data that remains on the page. Index
>> scans just need to be able to land on it to determine that it's dead,
>> and VACUUM needs to be able to determine whether or not there could
>> possibly be such an index scan at the time it considers recycling..
>
> ISTM that we need all of the fields within BTPageOpaqueData even for
> dead pages, actually. The left links and right links still need to be
> sane, and the flag bits are needed. Plus, the field that stores an XID
> already is clearly necessary. Even if they weren't needed, it would
> probably still be a good idea to keep them around for forensic
> purposes. However, the page header field pd_prune_xid is currently
> unused for indexes, and is the same width as CheckPoint.nextXidEpoch
> (the extra thing we might want to store -- the epoch).
>
> Maybe you could store the epoch within that field when B-Tree VACUUM
> deletes a page, and then compare that within _bt_page_recyclable(). It
> would come before the existing XID comparison in that function. One
> nice thing about this idea is that pd_prune_xid will be all-zero for
> index pages from the current format, so there is no need to take
> special care to make sure that databases that have undergone
> pg_upgrade don't break.
>

Thank you for the suggestion!
If we store the epoch within the union field, I think we will not be able
to use BTPageOpaqueData.btpo.xact at the same time. Since comparing
btpo.xact is still necessary to determine whether a page is recyclable,
we cannot store the epoch in the same union field. And if we store
it elsewhere in BTPageOpaqueData, it would break disk compatibility.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-22 Thread Robert Haas
On Tue, Mar 21, 2017 at 8:18 PM, Peter Geoghegan  wrote:
>> Not only might that be unnecessary, but if we don't have a test
>> demonstrating the problem, we also don't have a test demonstrating
>> that a given approach fixes it.
>
> Preventing recycling from happening until we feel like it is probably
> fine. It is not fine to break it forever, though. The specific problem
> is that there is an XID stored in dead B-Tree + SP-GiST pages that is
> used in the subsequent RecentGlobalXmin interlock that determines if
> recycling is safe (if there is no possible index scan that could land
> on the dead page). You know, the _bt_page_recyclable() check.

Oh.  OK, so this is not just about bloat -- it's about whether this
can be safely done at all.  Somehow, I missed that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 12:15 PM, Robert Haas  wrote:
> Wouldn't it break on-disk compatibility with existing btree indexes?

Yes, it would, but see my later remarks on pd_prune_xid. I think that
that would be safe.

> I think we're still trying to solve a problem that Simon postulated in
> advance of evidence that shows how much of a problem it actually is.

I don't think we're still doing that. I think we're discussing the
risk of recycling being broken indefinitely when it doesn't happen in
time.

> Not only might that be unnecessary, but if we don't have a test
> demonstrating the problem, we also don't have a test demonstrating
> that a given approach fixes it.

Preventing recycling from happening until we feel like it is probably
fine. It is not fine to break it forever, though. The specific problem
is that there is an XID stored in dead B-Tree + SP-GiST pages that is
used in the subsequent RecentGlobalXmin interlock that determines if
recycling is safe (if there is no possible index scan that could land
on the dead page). You know, the _bt_page_recyclable() check.

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-21 Thread Robert Haas
On Tue, Mar 14, 2017 at 6:10 PM, Peter Geoghegan  wrote:
> On Tue, Mar 14, 2017 at 2:48 PM, Peter Geoghegan  wrote:
>> I think that that's safe, but it is a little disappointing that it
>> does not allow us to skip work in the case that you really had in mind
>> when writing the patch. Better than nothing, though, and perhaps still
>> a good start. I would like to hear other people's opinions.
>
> Hmm. So, the SQL-callable function txid_current() exports a 64-bit
> XID, which includes an "epoch". If PostgreSQL used these 64-bit XIDs
> natively, we'd never need to freeze. Obviously we don't do that
> because the per-tuple overhead of visibility information is already
> high, and that would make it much worse. But, we can easily afford the
> extra overhead in a completely dead page. Maybe we can overcome the
> _bt_page_recyclable() limitation by being cleverer about how it
> determines if recycling is safe -- it can examine epoch, too. This
> would also be required in the similar function
> vacuumRedirectAndPlaceholder(), which is a part of SP-GiST.
>
> We already have BTPageOpaqueData.btpo, a union whose contained type
> varies based on the page being dead. We could just do the same with
> some other field in that struct, and then store epoch there. Clearly
> nobody really cares about most data that remains on the page. Index
> scans just need to be able to land on it to determine that it's dead,
> and VACUUM needs to be able to determine whether or not there could
> possibly be such an index scan at the time it considers recycling..
>
> Does anyone see a problem with that?

Wouldn't it break on-disk compatibility with existing btree indexes?

I think we're still trying to solve a problem that Simon postulated in
advance of evidence that shows how much of a problem it actually is.
Not only might that be unnecessary, but if we don't have a test
demonstrating the problem, we also don't have a test demonstrating
that a given approach fixes it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-21 Thread Masahiko Sawada
On Wed, Mar 22, 2017 at 2:29 AM, David Steele  wrote:
> Hi,
>
> On 3/15/17 9:50 AM, Amit Kapila wrote:
>>
>>
>> What if somebody does a manual vacuum and there are no garbage
>> tuples to clean; wouldn't you also want to avoid skipping
>> lazy_cleanup_index in that case?  Another option could be to skip updating the
>> relfrozenxid if we have skipped the index cleanup.
>
>
> This thread has been idle for six days.  Please respond and/or post a new
> patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked
> "Returned with Feedback".
>

I've changed the patch so that lazy_cleanup_index is not skipped during
aggressive vacuum. Patch attached.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


skip_cleanup_indexdes_v4.patch



Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-21 Thread David Steele

Hi,

On 3/15/17 9:50 AM, Amit Kapila wrote:


What if somebody does a manual vacuum and there are no garbage
tuples to clean; wouldn't you also want to avoid skipping
lazy_cleanup_index in that case?  Another option could be to skip updating the
relfrozenxid if we have skipped the index cleanup.


This thread has been idle for six days.  Please respond and/or post a 
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be 
marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-21 Thread Masahiko Sawada
On Wed, Mar 15, 2017 at 4:50 PM, Amit Kapila  wrote:
> On Thu, Mar 9, 2017 at 10:21 PM, Masahiko Sawada  
> wrote:
>> On Wed, Mar 8, 2017 at 1:43 AM, Peter Geoghegan  wrote:
>>> On Sat, Mar 4, 2017 at 1:30 AM, Amit Kapila  wrote:
> While I can't see this explained anywhere, I'm
> pretty sure that that's supposed to be impossible, which this patch
> changes.
>

 What makes you think that patch will allow pg_class.relfrozenxid to be
 advanced past opaque->btpo.xact which was previously not possible?
>>>
>>> By not reliably recycling pages in a timely manner, we won't change
>>> anything about the dead page. It just sticks around. This is mostly
>>> fine, but we still need VACUUM to be able to reason about it (to
>>> determine if it is in fact recyclable), which is what I'm concerned
>>> about breaking here. It still needs to be *possible* to recycle any
>>> recyclable page at some point (e.g., when we find it convenient).
>>>
>>> pg_class.relfrozenxid is InvalidTransactionId for indexes because
>>> indexes generally don't store XIDs. This is the one exception that I'm
>>> aware of, presumably justified by the fact that it's only for
>>> recyclable pages anyway, and those are currently *guaranteed* to get
>>> recycled quickly. In particular, they're guaranteed to get recycled by
>>> the next VACUUM. They may be recycled in the course of anti-wraparound
>>> VACUUM, even if VACUUM has no garbage tuples to kill (even if we only
>>> do lazy_cleanup_index() instead of lazy_vacuum_index()). This is the
>>> case that this patch proposes to have us skip touching indexes for.
>>>
>>
>> To prevent this, I think we need to not skip lazy_cleanup_index
>> during anti-wraparound vacuum (aggressive = true) even if the number of
>> scanned pages is less than the threshold. This can ensure that
>> pg_class.relfrozenxid is not advanced past opaque->btpo.xact with
>> minimal pain. Even if a dead btree page is left behind, subsequent
>> modifications create garbage on the heap, and then autovacuum recycles that
>> page during index vacuuming (lazy_vacuum_index).
>>
>
> What if somebody does a manual vacuum and there are no garbage
> tuples to clean; wouldn't you also want to avoid skipping
> lazy_cleanup_index in that case?

Yes, in that case lazy_cleanup_index will be skipped.

> Another option could be to skip updating the
> relfrozenxid if we have skipped the index cleanup.

This could make anti-wraparound VACUUM occur at high frequency, and we
would not be able to skip lazy_cleanup_index during aggressive vacuum after all.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-15 Thread Amit Kapila
On Thu, Mar 9, 2017 at 10:21 PM, Masahiko Sawada  wrote:
> On Wed, Mar 8, 2017 at 1:43 AM, Peter Geoghegan  wrote:
>> On Sat, Mar 4, 2017 at 1:30 AM, Amit Kapila  wrote:
 While I can't see this explained anywhere, I'm
 pretty sure that that's supposed to be impossible, which this patch
 changes.

>>>
>>> What makes you think that patch will allow pg_class.relfrozenxid to be
>>> advanced past opaque->btpo.xact which was previously not possible?
>>
>> By not reliably recycling pages in a timely manner, we won't change
>> anything about the dead page. It just sticks around. This is mostly
>> fine, but we still need VACUUM to be able to reason about it (to
>> determine if it is in fact recyclable), which is what I'm concerned
>> about breaking here. It still needs to be *possible* to recycle any
>> recyclable page at some point (e.g., when we find it convenient).
>>
>> pg_class.relfrozenxid is InvalidTransactionId for indexes because
>> indexes generally don't store XIDs. This is the one exception that I'm
>> aware of, presumably justified by the fact that it's only for
>> recyclable pages anyway, and those are currently *guaranteed* to get
>> recycled quickly. In particular, they're guaranteed to get recycled by
>> the next VACUUM. They may be recycled in the course of anti-wraparound
>> VACUUM, even if VACUUM has no garbage tuples to kill (even if we only
>> do lazy_cleanup_index() instead of lazy_vacuum_index()). This is the
>> case that this patch proposes to have us skip touching indexes for.
>>
>
> To prevent this, I think we need to not skip lazy_cleanup_index
> during anti-wraparound vacuum (aggressive = true) even if the number of
> scanned pages is less than the threshold. This can ensure that
> pg_class.relfrozenxid is not advanced past opaque->btpo.xact with
> minimal pain. Even if a dead btree page is left behind, subsequent
> modifications create garbage on the heap, and then autovacuum recycles that
> page during index vacuuming (lazy_vacuum_index).
>

What if somebody does a manual vacuum and there are no garbage
tuples to clean; wouldn't you also want to avoid skipping
lazy_cleanup_index in that case?  Another option could be to skip updating the
relfrozenxid if we have skipped the index cleanup.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-14 Thread Peter Geoghegan
On Tue, Mar 14, 2017 at 3:10 PM, Peter Geoghegan  wrote:
> We already have BTPageOpaqueData.btpo, a union whose contained type
> varies based on the page being dead. We could just do the same with
> some other field in that struct, and then store epoch there. Clearly
> nobody really cares about most data that remains on the page. Index
> scans just need to be able to land on it to determine that it's dead,
> and VACUUM needs to be able to determine whether or not there could
> possibly be such an index scan at the time it considers recycling..

ISTM that we need all of the fields within BTPageOpaqueData even for
dead pages, actually. The left links and right links still need to be
sane, and the flag bits are needed. Plus, the field that stores an XID
already is clearly necessary. Even if they weren't needed, it would
probably still be a good idea to keep them around for forensic
purposes. However, the page header field pd_prune_xid is currently
unused for indexes, and is the same width as CheckPoint.nextXidEpoch
(the extra thing we might want to store -- the epoch).

Maybe you could store the epoch within that field when B-Tree VACUUM
deletes a page, and then compare that within _bt_page_recyclable(). It
would come before the existing XID comparison in that function. One
nice thing about this idea is that pd_prune_xid will be all-zero for
index pages from the current format, so there is no need to take
special care to make sure that databases that have undergone
pg_upgrade don't break.
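To illustrate, here is a self-contained sketch of such an epoch-aware
test, assuming the deleting VACUUM stashed the then-current epoch in
pd_prune_xid (the function name and the zero-means-absent convention
are mine, not committed code):

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: page_epoch comes from pd_prune_xid (0 for pages written before
 * the scheme existed, including pg_upgrade'd indexes); btpo_xact is the
 * XID stashed when the page was deleted.
 */
static bool
page_recyclable_with_epoch(uint32_t page_epoch,
                           uint32_t btpo_xact,
                           uint64_t full_recent_global_xmin)
{
    if (page_epoch == 0)
        /* pre-scheme page: fall back to the wraparound-aware 32-bit check */
        return (int32_t) (btpo_xact - (uint32_t) full_recent_global_xmin) < 0;

    /* with the epoch available, the comparison is exact and never wraps */
    return (((uint64_t) page_epoch << 32) | btpo_xact) < full_recent_global_xmin;
}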

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-14 Thread Peter Geoghegan
On Tue, Mar 14, 2017 at 2:48 PM, Peter Geoghegan  wrote:
> I think that that's safe, but it is a little disappointing that it
> does not allow us to skip work in the case that you really had in mind
> when writing the patch. Better than nothing, though, and perhaps still
> a good start. I would like to hear other people's opinions.

Hmm. So, the SQL-callable function txid_current() exports a 64-bit
XID, which includes an "epoch". If PostgreSQL used these 64-bit XIDs
natively, we'd never need to freeze. Obviously we don't do that
because the per-tuple overhead of visibility information is already
high, and that would make it much worse. But, we can easily afford the
extra overhead in a completely dead page. Maybe we can overcome the
_bt_page_recyclable() limitation by being cleverer about how it
determines if recycling is safe -- it can examine epoch, too. This
would also be required in the similar function
vacuumRedirectAndPlaceholder(), which is a part of SP-GiST.

We already have BTPageOpaqueData.btpo, a union whose contained type
varies based on the page being dead. We could just do the same with
some other field in that struct, and then store epoch there. Clearly
nobody really cares about most data that remains on the page. Index
scans just need to be able to land on it to determine that it's dead,
and VACUUM needs to be able to determine whether or not there could
possibly be such an index scan at the time it considers recycling..

Does anyone see a problem with that?
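As a sketch of the arithmetic involved (not PostgreSQL source): the
64-bit form is just the epoch glued onto the 32-bit XID, so a deletion
stamped with both can be compared against an equally widened
RecentGlobalXmin with plain integer comparison, which never wraps:

#include <stdint.h>

/* Assemble a 64-bit XID in the spirit of txid_current(): high 32 bits
 * hold the epoch, low 32 bits the ordinary transaction ID. */
static inline uint64_t
make_full_xid(uint32_t epoch, uint32_t xid)
{
    return ((uint64_t) epoch << 32) | xid;
}

/* A deleted page is recyclable once its deletion point is older than
 * the oldest transaction any scan could still be running with. */
static inline int
deleted_before(uint64_t deleted_at, uint64_t full_recent_global_xmin)
{
    return deleted_at < full_recent_global_xmin;
}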

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-14 Thread Peter Geoghegan
On Thu, Mar 9, 2017 at 8:51 AM, Masahiko Sawada  wrote:
>> pg_class.relfrozenxid is InvalidTransactionId for indexes because
>> indexes generally don't store XIDs. This is the one exception that I'm
>> aware of, presumably justified by the fact that it's only for
>> recyclable pages anyway, and those are currently *guaranteed* to get
>> recycled quickly. In particular, they're guaranteed to get recycled by
>> the next VACUUM. They may be recycled in the course of anti-wraparound
>> VACUUM, even if VACUUM has no garbage tuples to kill (even if we only
>> do lazy_cleanup_index() instead of lazy_vacuum_index()). This is the
>> case that this patch proposes to have us skip touching indexes for.
>>
>
> To prevent this, I think we need to not skip lazy_cleanup_index
> during anti-wraparound vacuum (aggressive = true) even if the number of
> scanned pages is less than the threshold. This can ensure that
> pg_class.relfrozenxid is not advanced past opaque->btpo.xact with
> minimal pain. Even if a dead btree page is left behind, subsequent
> modifications create garbage on the heap, and then autovacuum recycles that
> page during index vacuuming (lazy_vacuum_index).

Sorry for the delay in getting back to you.

I think that that's safe, but it is a little disappointing that it
does not allow us to skip work in the case that you really had in mind
when writing the patch. Better than nothing, though, and perhaps still
a good start. I would like to hear other people's opinions.

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-09 Thread Masahiko Sawada
On Wed, Mar 8, 2017 at 1:43 AM, Peter Geoghegan  wrote:
> On Sat, Mar 4, 2017 at 1:30 AM, Amit Kapila  wrote:
>>> While I can't see this explained anywhere, I'm
>>> pretty sure that that's supposed to be impossible, which this patch
>>> changes.
>>>
>>
>> What makes you think that patch will allow pg_class.relfrozenxid to be
>> advanced past opaque->btpo.xact which was previously not possible?
>
> By not reliably recycling pages in a timely manner, we won't change
> anything about the dead page. It just sticks around. This is mostly
> fine, but we still need VACUUM to be able to reason about it (to
> determine if it is in fact recyclable), which is what I'm concerned
> about breaking here. It still needs to be *possible* to recycle any
> recyclable page at some point (e.g., when we find it convenient).
>
> pg_class.relfrozenxid is InvalidTransactionId for indexes because
> indexes generally don't store XIDs. This is the one exception that I'm
> aware of, presumably justified by the fact that it's only for
> recyclable pages anyway, and those are currently *guaranteed* to get
> recycled quickly. In particular, they're guaranteed to get recycled by
> the next VACUUM. They may be recycled in the course of anti-wraparound
> VACUUM, even if VACUUM has no garbage tuples to kill (even if we only
> do lazy_cleanup_index() instead of lazy_vacuum_index()). This is the
> case that this patch proposes to have us skip touching indexes for.
>

To prevent this, I think we need to not skip lazy_cleanup_index
during anti-wraparound vacuum (aggressive = true) even if the number of
scanned pages is less than the threshold. This can ensure that
pg_class.relfrozenxid is not advanced past opaque->btpo.xact with
minimal pain. Even if a dead btree page is left behind, subsequent
modifications create garbage on the heap, and then autovacuum recycles that
page during index vacuuming (lazy_vacuum_index).

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-07 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 1:30 AM, Amit Kapila  wrote:
>> While I can't see this explained anywhere, I'm
>> pretty sure that that's supposed to be impossible, which this patch
>> changes.
>>
>
> What makes you think that patch will allow pg_class.relfrozenxid to be
> advanced past opaque->btpo.xact which was previously not possible?

By not reliably recycling pages in a timely manner, we won't change
anything about the dead page. It just sticks around. This is mostly
fine, but we still need VACUUM to be able to reason about it (to
determine if it is in fact recyclable), which is what I'm concerned
about breaking here. It still needs to be *possible* to recycle any
recyclable page at some point (e.g., when we find it convenient).

pg_class.relfrozenxid is InvalidTransactionId for indexes because
indexes generally don't store XIDs. This is the one exception that I'm
aware of, presumably justified by the fact that it's only for
recyclable pages anyway, and those are currently *guaranteed* to get
recycled quickly. In particular, they're guaranteed to get recycled by
the next VACUUM. They may be recycled in the course of anti-wraparound
VACUUM, even if VACUUM has no garbage tuples to kill (even if we only
do lazy_cleanup_index() instead of lazy_vacuum_index()). This is the
case that this patch proposes to have us skip touching indexes for.

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-06 Thread Masahiko Sawada
On Sat, Mar 4, 2017 at 4:37 AM, Peter Geoghegan  wrote:
> On Fri, Mar 3, 2017 at 8:13 AM, Masahiko Sawada  wrote:
>> Thank you for the clarification. Let me check my understanding. IIUC,
>> skipping the second index vacuum path (lazy_cleanup_index) cannot
>> cause pages to be left in a half-dead state, but it could leave recyclable
>> pages that are not marked as recyclable. As for the second point, such pages can be
>> reclaimed by the next index vacuum because btvacuumpage calls
>> RecordFreeIndexPage for recyclable pages. Am I missing something?
>
> I think that this is a good summary. You have not missed anything.
>
>> My first motivation for this patch was to skip the second index vacuum
>> path when vacuum skipped the whole table via the visibility map.
>
> Makes sense.
>
>> But as Robert
>> suggested on another thread, I changed it to have a threshold. If my
>> understanding is correct, we can have a threshold that specifies the
>> fraction of pages scanned by vacuum. If it's set to 0.1,
>> lazy_scan_heap does the second index vacuum only when at least 10% of the table
>> is scanned. IOW, if 90% of the table's pages are skipped, which means almost
>> all of the table has not changed since the previous vacuum, we can skip the second
>> index vacuum.
>
> It sounds like a setting of 0.1 would leave us in a position where
> it's very unlikely that a VACUUM of indexes could fail to occur when
> autovacuum has been triggered in the common way, by the "vacuum
> threshold" having been exceeded. Does this feeling that I have about
> it seem accurate to you? Is that actually your intent? It's hard to
> really be sure, because there are so many confounding factors (e.g.
> HOT), but if that was 100% true, then I suppose there would
> theoretically be zero new risk (except, perhaps, of the "other type of
> bloat" that I described, which I am not very worried about).
>
> Please verify my understanding of your thought process: We don't have
> to freeze indexes at all, ever, so if we see index bloat as a separate
> problem, we also see that there is no need to *link* index needs to
> the need for freezing. XID burn rate is a very bad proxy for how
> bloated an index may be. Besides, we already have a separate trigger
> for the thing that *actually* matters to indexes (the vacuum threshold
> stuff).
>
> Maybe 0.0 is too low as a default for
> vacuum_cleanup_index_scale_factor, even though initially it seemed
> attractive to me for theoretical reasons. Something like 0.01 is still
> "practically zero", but removes the extreme sensitivity that would
> have with 0.0. So, 0.01 might make sense as a default for roughly the
> same reason that autovacuum_vacuum_threshold exists. (Maybe it should
> be more like autovacuum_vacuum_threshold, in that it's an absolute
> minimum number of heap blocks to trigger index clean-up.)

Agreed. I've attached the current design patch. I've also changed the default value to 0.01.

Two issues remain. The first is what you mentioned: that
pg_class.relfrozenxid/pg_database.datfrozenxid could be advanced
past opaque->btpo.xact. The second is to find out the impact on index
types other than btree. I'm investigating the second one now.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


skip_cleanup_indexdes_v3.patch



Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-04 Thread Amit Kapila
On Sat, Mar 4, 2017 at 8:55 AM, Peter Geoghegan  wrote:
> On Fri, Mar 3, 2017 at 6:58 PM, Amit Kapila  wrote:
>> You are right that we don't want the number of unclaimed-by-FSM
>> recyclable pages to grow forever, but I think that won't happen with
>> this patch.  As soon as there are more deletions (in heap), in the
>> next vacuum cycle, the pages will be reclaimed by lazy_vacuum_index().
>
> Right.
>
>>> (Thinks about it some more...)
>>>
>>> Unfortunately, I just saw a whole new problem with this patch:
>>> _bt_page_recyclable() is the one place in the B-Tree AM where we stash
>>> an XID.
>>>
>>
>> Can you be more specific about which code exactly you are referring to here?
>
> I meant that we stash an XID when a B-Tree page is deleted, used to
> determine when it's finally safe to recycle the page by comparing
> it to RecentGlobalXmin (recycling will happen during the next VACUUM).
> We can't immediately recycle a page, because an existing index scan
> might land on the page while following a stale pointer.
>
> _bt_page_recyclable(), which checks if recycling is safe (no possible
> index scan can land on a dead page), is a pretty small function. I'm
> concerned about what happens when
> pg_class.relfrozenxid/pg_database.datfrozenxid are advanced past
> opaque->btpo.xact.
>

Are you talking about the pg_class.relfrozenxid of the index or the table?  IIUC, for
indexes it will be InvalidTransactionId, and for tables I think it
will be updated with or without the patch.

> While I can't see this explained anywhere, I'm
> pretty sure that that's supposed to be impossible, which this patch
> changes.
>

What makes you think that patch will allow pg_class.relfrozenxid to be
advanced past opaque->btpo.xact which was previously not possible?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-03 Thread Peter Geoghegan
On Fri, Mar 3, 2017 at 6:58 PM, Amit Kapila  wrote:
> You are right that we don't want the number of unclaimed-by-FSM
> recyclable pages to grow forever, but I think that won't happen with
> this patch.  As soon as there are more deletions (in heap), in the
> next vacuum cycle, the pages will be reclaimed by lazy_vacuum_index().

Right.

>> (Thinks about it some more...)
>>
>> Unfortunately, I just saw a whole new problem with this patch:
>> _bt_page_recyclable() is the one place in the B-Tree AM where we stash
>> an XID.
>>
>
> Can you be more specific about which code exactly you are referring to here?

I meant that we stash an XID when a B-Tree page is deleted, used to
determine when it's finally safe to recycle the page by comparing
it to RecentGlobalXmin (recycling will happen during the next VACUUM).
We can't immediately recycle a page, because an existing index scan
might land on the page while following a stale pointer.

_bt_page_recyclable(), which checks if recycling is safe (no possible
index scan can land on a dead page), is a pretty small function. I'm
concerned about what happens when
pg_class.relfrozenxid/pg_database.datfrozenxid are advanced past
opaque->btpo.xact. While I can't see this explained anywhere, I'm
pretty sure that that's supposed to be impossible, which this patch
changes.
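For readers following along, a self-contained analogue of that check (a
paraphrase, not the real function): a deleted page is recyclable once
the stashed XID precedes RecentGlobalXmin under the wraparound-aware
modular comparison used by TransactionIdPrecedes():

#include <stdbool.h>
#include <stdint.h>

/*
 * Analogue of the _bt_page_recyclable() test described above: recyclable
 * iff btpo.xact precedes RecentGlobalXmin.  The subtraction-and-sign
 * trick models how XIDs are compared as points on a circle of 2^32.
 */
static bool
deleted_page_recyclable(uint32_t btpo_xact, uint32_t recent_global_xmin)
{
    return (int32_t) (btpo_xact - recent_global_xmin) < 0;
}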

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-03 Thread Amit Kapila
On Sat, Mar 4, 2017 at 5:59 AM, Peter Geoghegan  wrote:
> On Fri, Mar 3, 2017 at 2:41 PM, Peter Geoghegan  wrote:
>> In other words, the number of B-Tree pages that the last VACUUM
>> deleted, and thus made eligible to recycle by the next VACUUM has no
>> relationship with the number of pages the next VACUUM will itself end
>> up deleting, in general, or how long it will be before that next
>> VACUUM comes, if it comes at all, or anything else that seems at all
>> relevant.
>
> This raises another question, though: Why have this GUC at all? Why
> use *any* threshold that is to be compared against the number of heap
> pages that were processed by VACUUM this time?
>
> B-Tree page deletion isn't really part of the ordinary life cycle of a
> B-Tree index. In order for that to be the case, somebody would have to
> delete large ranges of indexed values (typically hundreds of logically
> contiguous values -- no gaps), without anyone else ever inserting new
> tuples that are in the same range before the next VACUUM. It's very
> unlikely that this would happen again and again in the real world. So,
> even if we never freeze, the number of B-Tree pages that we delete
> when we VACUUM today is generally a useless predictor of how many will
> be deleted by a VACUUM that occurs tomorrow. This is true despite the
> fact that the number of dead heap tuples is probably almost identical
> for each VACUUM (or the number of heap pages that end up being
> processed by VACUUM, if you prefer).
>
> Barring any concerns about crash safety, we can be completely certain
> that any "recycling-orientated B-Tree VACUUM" (a btvacuumcleanup()
> call to btvacuumscan(), which happens because there are no tuples in
> the index to kill) will end up recycling however many pages the last
> VACUUM managed to delete, which is a precisely knowable number (or
> could be made knowable if we stashed that number somewhere, like the
> meta-page). It will typically only take seconds or minutes after the
> VACUUM finishes for its RecentGlobalXmin interlock to stop being a
> problem (that is, for _bt_page_recyclable() to return "true" for any
> pages that that VACUUM deleted). From that point on, those deleted
> pages are "money in the bank" for the FSM. The only reason why we'd
> want to tie "the FSM withdrawing that money" to VACUUM is because that
> might be needed to clean up regular bloat anyway.
>
> The test performed by this patch within lazy_scan_heap(), to determine
> whether we should avoid calling lazy_cleanup_index() would therefore
> look like this, ideally: Do I want to go to the trouble of scanning
> this index (a cost that is proportionate to the size of the index) in
> order to recycle this number of known-deleted pages (a benefit that is
> proportionate to that number)? (I still think that the important thing
> is that we don't let the number of unclaimed-by-FSM recyclable pages
> grow forever, though.)
>

You are right that we don't want the number of unclaimed-by-FSM
recyclable pages to grow forever, but I think that won't happen with
this patch.  As soon as there are more deletions (in heap), in the
next vacuum cycle, the pages will be reclaimed by lazy_vacuum_index().


> (Thinks about it some more...)
>
> Unfortunately, I just saw a whole new problem with this patch:
> _bt_page_recyclable() is the one place in the B-Tree AM where we stash
> an XID.
>

Can you be more specific about which code exactly you are referring to here?



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-03 Thread Peter Geoghegan
On Fri, Mar 3, 2017 at 2:41 PM, Peter Geoghegan  wrote:
> In other words, the number of B-Tree pages that the last VACUUM
> deleted, and thus made eligible to recycle by the next VACUUM has no
> relationship with the number of pages the next VACUUM will itself end
> up deleting, in general, or how long it will be before that next
> VACUUM comes, if it comes at all, or anything else that seems at all
> relevant.

This raises another question, though: Why have this GUC at all? Why
use *any* threshold that is to be compared against the number of heap
pages that were processed by VACUUM this time?

B-Tree page deletion isn't really part of the ordinary life cycle of a
B-Tree index. In order for that to be the case, somebody would have to
delete large ranges of indexed values (typically hundreds of logically
contiguous values -- no gaps), without anyone else ever inserting new
tuples that are in the same range before the next VACUUM. It's very
unlikely that this would happen again and again in the real world. So,
even if we never freeze, the number of B-Tree pages that we delete
when we VACUUM today is generally a useless predictor of how many will
be deleted by a VACUUM that occurs tomorrow. This is true despite the
fact that the number of dead heap tuples is probably almost identical
for each VACUUM (or the number of heap pages that end up being
processed by VACUUM, if you prefer).

Barring any concerns about crash safety, we can be completely certain
that any "recycling-orientated B-Tree VACUUM" (a btvacuumcleanup()
call to btvacuumscan(), which happens because there are no tuples in
the index to kill) will end up recycling however many pages the last
VACUUM managed to delete, which is a precisely knowable number (or
could be made knowable if we stashed that number somewhere, like the
meta-page). It will typically only take seconds or minutes after the
VACUUM finishes for its RecentGlobalXmin interlock to stop being a
problem (that is, for _bt_page_recyclable() to return "true" for any
pages that that VACUUM deleted). From that point on, those deleted
pages are "money in the bank" for the FSM. The only reason why we'd
want to tie "the FSM withdrawing that money" to VACUUM is because that
might be needed to clean up regular bloat anyway.

The test performed by this patch within lazy_scan_heap(), to determine
whether we should avoid calling lazy_cleanup_index() would therefore
look like this, ideally: Do I want to go to the trouble of scanning
this index (a cost that is proportionate to the size of the index) in
order to recycle this number of known-deleted pages (a benefit that is
proportionate to that number)? (I still think that the important thing
is that we don't let the number of unclaimed-by-FSM recyclable pages
grow forever, though.)
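That ideal test could be as simple as comparing the two quantities
directly; a hypothetical sketch (payoff_ratio is an invented tuning
knob, not a real or proposed GUC):

#include <stdbool.h>

/*
 * Hypothetical cost/benefit test from the paragraph above: scan the
 * index (cost proportional to index_pages) only if enough known-deleted
 * pages would be recycled (benefit proportional to deleted_pages).
 */
static bool
cleanup_scan_worthwhile(double index_pages, double known_deleted_pages,
                        double payoff_ratio)
{
    return known_deleted_pages >= index_pages * payoff_ratio;
}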

(Thinks about it some more...)

Unfortunately, I just saw a whole new problem with this patch:
_bt_page_recyclable() is the one place in the B-Tree AM where we stash
an XID. We don't need to set this to FrozenTransactionId at any point,
because this is stashed for deleted pages only, pages that are likely
to be recycled very soon. It might be that the VACUUM that ends up
deleting any such page is an anti-wraparound VACUUM, especially in the
case that this patch really wants to improve. However, with this
patch, that recycling won't happen, of course. As a result,
_bt_page_recyclable() will falsely report that the page is not
recyclable if it is ever asked again.
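A toy example makes that failure mode concrete (constants are made up;
this is only an illustration of the modular comparison):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint32_t btpo_xact = 100;                            /* stashed long ago */
    uint32_t recent_global_xmin = 100 + (1u << 31) + 1;  /* > 2^31 XIDs later */

    /* the wraparound-aware "precedes" test, as in TransactionIdPrecedes() */
    int precedes = (int32_t) (btpo_xact - recent_global_xmin) < 0;

    printf("precedes = %d\n", precedes);  /* prints 0: the genuinely old
                                           * page no longer looks old */
    return 0;
}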

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-03 Thread Peter Geoghegan
On Fri, Mar 3, 2017 at 11:49 AM, Peter Geoghegan  wrote:
>> Please verify my understanding of your thought process: We don't have
>> to freeze indexes at all, ever, so if we see index bloat as a separate
>> problem, we also see that there is no need to *link* index needs to
>> the need for freezing. XID burn rate is a very bad proxy for how
>> bloated an index may be. Besides, we already have a separate trigger
>> for the thing that *actually* matters to indexes (the vacuum threshold
>> stuff).

Wait, I have this wrong. You're only ever avoiding
lazy_cleanup_index(), never lazy_vacuum_index() (I repeated Kuntal's
mistake). So, you're focused on avoiding recycling pages in the case
that no actual bloat is in the index. That hasn't changed from your
earliest version, where there was no new GUC. You are not proposing to
increase VACUUM's tolerance of "the bad kind of bloat". It seems that
this is much less risky than I first thought, then.

> Another thing I wonder about: It would be okay to use the number of
> unset freeze map bits as a reasonable proxy for how much bloat is in
> the index the first time we vacuum. But, don't we then set the freeze
> map bits, losing any record of having skipped indexes?

Still seems like we may want to remember that we skipped the
btvacuumcleanup() calls, to build back-pressure, though. That is, we
might want to do what Simon suggests: something may be stored in the
B-Tree meta-page (for example) to remember that we tolerated not doing
a btvacuumcleanup() when we could have and maybe should have. It would
remember the last number of pages deleted by the last VACUUM. That's
the only thing I can think of that matters. I believe this is what
Simon meant when he said "half dead pages".
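A hypothetical shape for that meta-page idea (illustrative only; this
is not the actual BTMetaPageData layout, and the new field name is
invented):

#include <stdint.h>

/* Sketch: a meta-page that remembers how many pages the last VACUUM
 * deleted, so a later cleanup pass can judge whether a
 * recycling-oriented scan would pay for itself. */
typedef struct BTMetaPageSketch
{
    uint32_t btm_magic;               /* existing: identifies a btree meta-page */
    uint32_t btm_version;             /* existing: on-disk version */
    /* ... other existing fields elided ... */
    uint32_t btm_last_deleted_pages;  /* invented: pages deleted by last VACUUM */
} BTMetaPageSketch;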

OTOH, I see no reason to insist on this meta-page stuff. The only
thing that can delete a B-Tree page is VACUUM. Hopefully, a subsequent
VACUUM will occur that goes on to recycle those deleted pages. But,
it's not as if a "recycling-orientated B-Tree VACUUM" (a
btvacuumcleanup() call to btvacuumscan()) occurs because of a build-up
of back-pressure that has any relationship to recycling. Back pressure
is actually created by "real" bloat, or maybe using up XIDs, which is
not the same thing. Bloat cannot occur within B-Tree pages that are
already fully deleted and (almost?) eligible to be recycled.

In other words, the number of B-Tree pages that the last VACUUM
deleted, and thus made eligible to recycle by the next VACUUM has no
relationship with the number of pages the next VACUUM will itself end
up deleting, in general, or how long it will be before that next
VACUUM comes, if it comes at all, or anything else that seems at all
relevant. In other other words, you are only preventing recycling that
would have occurred pretty much by accident before now, due to
concerns that had nothing to do with recycling in particular.

The really important thing is that you have bound the amount of
recycling that VACUUM can now fail to do to only one "round" of
VACUUM, because newly deleted pages can only occur due to another
VACUUM, which will always do however much recycling it can manage to
do as it scans a B-Tree index. I'm not very worried about VACUUM not
being super-eager about reclaiming disk space for the FSM if there is
a natural limit to that.

It would be intolerable for it to ever be possible for
recyclable-but-not-recycled pages to grow indefinitely, but I see no
risk of that here.

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-03 Thread Peter Geoghegan
On Fri, Mar 3, 2017 at 11:37 AM, Peter Geoghegan  wrote:
> Please verify my understanding of your thought process: We don't have
> to freeze indexes at all, ever, so if we see index bloat as a separate
> problem, we also see that there is no need to *link* index needs to
> the need for freezing. XID burn rate is a very bad proxy for how
> bloated an index may be. Besides, we already have a separate trigger
> for the thing that *actually* matters to indexes (the vacuum threshold
> stuff).

Another thing I wonder about: It would be okay to use the number of
unset freeze map bits as a reasonable proxy for how much bloat is in
the index the first time we vacuum. But, don't we then set the freeze
map bits, losing any record of having skipped indexes?

What mechanism exists that allows back-pressure to actually *build up*
over many vacuum anti-wraparound cycles, so that we slowly but surely
get around to actually vacuuming indexes at some point?

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-03 Thread Peter Geoghegan
On Fri, Mar 3, 2017 at 8:13 AM, Masahiko Sawada  wrote:
> Thank you for the clarification. Let me check my understanding. IIUC,
> skipping the second index vacuum path (lazy_cleanup_index) cannot
> cause pages to be left in a half-dead state, but it could leave recyclable
> pages that are not marked as recyclable. As for the second point, such pages can be
> reclaimed by the next index vacuum because btvacuumpage calls
> RecordFreeIndexPage for recyclable pages. Am I missing something?

I think that this is a good summary. You have not missed anything.

> My first motivation for this patch was to skip the second index vacuum
> path when vacuum skipped the whole table via the visibility map.

Makes sense.

> But as Robert
> suggested on another thread, I changed it to have a threshold. If my
> understanding is correct, we can have a threshold that specifies the
> fraction of pages scanned by vacuum. If it's set to 0.1,
> lazy_scan_heap does the second index vacuum only when at least 10% of the table
> is scanned. IOW, if 90% of the table's pages are skipped, which means almost
> all of the table has not changed since the previous vacuum, we can skip the second
> index vacuum.

It sounds like a setting of 0.1 would leave us in a position where
it's very unlikely that a VACUUM of indexes could fail to occur when
autovacuum has been triggered in the common way, by the "vacuum
threshold" having been exceeded. Does this feeling that I have about
it seem accurate to you? Is that actually your intent? It's hard to
really be sure, because there are so many confounding factors (e.g.
HOT), but if that was 100% true, then I suppose there would
theoretically be zero new risk (except, perhaps, of the "other type of
bloat" that I described, which I am not very worried about).

Please verify my understanding of your thought process: We don't have
to freeze indexes at all, ever, so if we see index bloat as a separate
problem, we also see that there is no need to *link* index needs to
the need for freezing. XID burn rate is a very bad proxy for how
bloated an index may be. Besides, we already have a separate trigger
for the thing that *actually* matters to indexes (the vacuum threshold
stuff).

Maybe 0.0 is too low as a default for
vacuum_cleanup_index_scale_factor, even though initially it seemed
attractive to me for theoretical reasons. Something like 0.01 is still
"practically zero", but removes the extreme sensitivity that would
have with 0.0. So, 0.01 might make sense as a default for roughly the
same reason that autovacuum_vacuum_threshold exists. (Maybe it should
be more like autovacuum_vacuum_threshold, in that it's an absolute
minimum number of heap blocks to trigger index clean-up.)
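For reference, the existing trigger being used as the model here:
autovacuum chooses a table for VACUUM once its dead-tuple count passes
a threshold computed from two GUCs (a paraphrase of the logic in
autovacuum.c, with stock defaults shown):

/* Vacuum a table once n_dead_tup exceeds this value. */
static double
autovacuum_vacuum_trigger(double reltuples)
{
    const double autovacuum_vacuum_threshold    = 50;   /* default */
    const double autovacuum_vacuum_scale_factor = 0.2;  /* default */

    return autovacuum_vacuum_threshold +
           autovacuum_vacuum_scale_factor * reltuples;
}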

At some point in the future, it may be possible to actually go ahead
with index vacumming, but do only a small amount of B-Tree vacuuming
by a process that is similar to a conventional index scan: A process
that collects index values from those rows found in the heap, and
performs subsequent look-ups to kill tuples in the index. I imagine
that in cases where the freeze map stuff really helps, the only index
is often on a bigserial column or similar, which naturally has good
locality. When you VACUUM heap pages, you attempt to build a range of
values for each index, a little like a BRIN index build. This range is
what you go on to use to do a cheap index-scan-based B-Tree VACUUM.
This could have far, far less I/O, though it has obvious risks that we
need to worry about. That's work for another release, of course.
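
A rough model of the bookkeeping that idea implies, assuming a single
int64 key per index (purely illustrative; not a worked-out design):

#include <stdbool.h>
#include <stdint.h>

/* Per-index range of key values observed among dead heap tuples. */
typedef struct DeadKeyRange
{
    int64_t min_key;
    int64_t max_key;
    bool    valid;
} DeadKeyRange;

/* Widen the range as heap vacuuming finds dead tuples, BRIN-style. */
static void
dead_key_range_add(DeadKeyRange *r, int64_t key)
{
    if (!r->valid)
    {
        r->min_key = key;
        r->max_key = key;
        r->valid = true;
        return;
    }
    if (key < r->min_key)
        r->min_key = key;
    if (key > r->max_key)
        r->max_key = key;
}

Afterwards, instead of scanning the whole index, VACUUM would descend to
min_key and walk the leaf level only until past max_key, killing
matching dead TIDs; cheap when the dead keys are clustered (the
bigserial case above), degenerating toward a full scan when they are
not.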

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-03 Thread Masahiko Sawada
On Fri, Mar 3, 2017 at 10:45 PM, David Steele  wrote:
> On 2/27/17 12:46 PM, Peter Geoghegan wrote:
>> On Sat, Feb 25, 2017 at 10:51 PM, Robert Haas  wrote:
>>
>>> Do you have an idea about that, or any ideas for experiments we could try?
>>
>> Nothing occurs to me right now, unfortunately. However, my general
>> sense is that it would probably be just fine when
>> vacuum_cleanup_index_scale_factor was 0.0, but there might be
>> non-linear increases in "the serious type of index bloat" as the
>> proposed new setting was scaled up. I'd be much more worried about
>> that.
>
> This was originally marked "Waiting on Author" due to some minor
> problems with the patch but on the whole there are much larger issues at
> play.
>
> The tenor seems to be that we should somehow prove the effectiveness of
> this patch one way or the other, but nobody is quite sure how to go
> about that, and in fact it would probably be different for each AM.
>
> Sawada, if you have ideas about how to go about this then we would need
> to see something very soon.  If not, I think marking this RWF is the
> best course of action.
>

Thank you for the reminder. I've posted a new idea about this. After
getting consensus on the design, I'm going to update the patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-03 Thread Masahiko Sawada
On Sat, Feb 25, 2017 at 7:10 AM, Peter Geoghegan  wrote:
> On Fri, Feb 24, 2017 at 9:26 AM, Robert Haas  wrote:
>> I think this thread is pretty short on evidence that would let us make
>> a smart decision about what to do here.  I see three possibilities.
>> The first is that this patch is a good idea whether we do something
>> about the issue of half-dead pages or not.  The second is that this
>> patch is a good idea if we do something about the issue of half-dead
>> pages but a bad idea if we don't.  The third is that this patch is a
>> bad idea whether or not we do anything about the issue of half-dead
>> pages.
>
> Half-dead pages are not really relevant to this discussion, AFAICT. I
> think that both you and Simon mean "recyclable" pages. There are
> several levels of indirection involved here, to keep the locking very
> granular, so it gets tricky to talk about.
>
> B-Tree page deletion is like a page split in reverse. It has a
> symmetry with page splits, which have two phases (atomic operations).
> There are also two phases for deletion, the first of which leaves the
> target page without a downlink in its parent, and marks it half dead.
> By the end of the first phase, there are still sibling pointers, so an
> index scan can land on them before the second phase of deletion begins
> -- they can visit a half-dead page before such time as the second
> phase of deletion begins, where the sibling link goes away. So, the
> sibling link isn't stale as such, but the page is still morally dead.
> (Second phase is where we remove even the sibling links, and declare
> it fully dead.)
>
> Even though there are two phases of deletion, the second still occurs
> immediately after the first within VACUUM. The need to have two phases
> is hard to explain, so I won't try, but it suffices to say that VACUUM
> does not actually ever leave a page half dead unless there is a hard
> crash.
>
> Recall that VACUUMing of a B-Tree is performed sequentially, so blocks
> can be recycled without needing to be found via a downlink or sibling
> link by VACUUM. What is at issue here, then, is VACUUM's degree of
> "eagerness" around putting *fully* dead B-Tree pages in the FSM for
> recycling. The interlock with RecentGlobalXmin is what makes it
> impossible for VACUUM to generally fully delete pages, *as well as*
> mark them as recyclable (put them in the FSM) all at once.
>
> Maybe you get this already, since, as I said, the terminology is
> tricky in this area, but I can't tell.
>

Thank you for the clarification. Let me check my understanding. IIUC,
skipping the second index vacuum pass (lazy_cleanup_index) cannot leave
a page in the half-dead state, but it could leave recyclable pages that
are not yet marked as recyclable. As for the second case, such pages can
be reclaimed by the next index vacuum, because btvacuumpage calls
RecordFreeIndexPage for recyclable pages. Am I missing something?

My first motivation for this patch is to skip the second index vacuum
pass when vacuum has skipped the whole table via the visibility map. But
as Robert suggested on another thread, I changed it to use a threshold.
If my understanding is correct, we can have a threshold that specifies
the fraction of pages scanned by vacuum. If it's set to 0.1,
lazy_scan_heap does the second index vacuum only when at least 10% of
the table was scanned. IOW, if 90% of the table's pages were skipped,
which means almost none of the table has changed since the previous
vacuum, we can skip the second index vacuum.

In this design, we could handle other types of AMs as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-03 Thread David Steele
On 2/27/17 12:46 PM, Peter Geoghegan wrote:
> On Sat, Feb 25, 2017 at 10:51 PM, Robert Haas  wrote:
>
>> Do you have an idea about that, or any ideas for experiments we could try?
> 
> Nothing occurs to me right now, unfortunately. However, my general
> sense is that it would probably be just fine when
> vacuum_cleanup_index_scale_factor was 0.0, but there might be
> non-linear increases in "the serious type of index bloat" as the
> proposed new setting was scaled up. I'd be much more worried about
> that.

This was originally marked "Waiting on Author" due to some minor
problems with the patch but on the whole there are much larger issues at
play.

The tenor seems to be that we should somehow prove the effectiveness of
this patch one way or the other, but nobody is quite sure how to go
about that, and in fact it would probably be different for each AM.

Sawada, if you have ideas about how to go about this then we would need
to see something very soon.  If not, I think marking this RWF is the
best course of action.

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-27 Thread Peter Geoghegan
On Sat, Feb 25, 2017 at 10:51 PM, Robert Haas  wrote:
> The thing that strikes me based on what you wrote is that our page
> recycling seems to admit of long delays even as things stand.  There's
> no bound on how much time could pass between one index vacuum and the
> next, and RecentGlobalXmin could and probably usually will advance
> past the point that would allow recycling long before the next index
> vacuum cycle.

Agreed.

> I don't know whether that strengthens or weakens
> Simon's argument.

I think it weakens it, but I hesitate to take a firm position on it just yet.

> On the one hand, you could argue that if we're
> already doing this on a long delay, making it even longer probably
> won't hurt much.  On the other hand, you could argue that if the
> current situation is bad, we should at least avoid making it worse.  I
> don't know which of those arguments is correct, if either.

Unsure. I will point out:

* There probably is a big problem with B-Tree index bloat for certain
workloads ("sparse deletion patterns"), that interacts badly with less
frequent VACUUMing.

* Whatever the bloat this patch makes worse is not *that* bloat, at
least with the proposed default for vacuum_cleanup_index_scale_factor;
it's not the bloat we usually think of when we talk about index bloat.
A full index scan will not need to visit any of the dead pages, even
just to immediately skip over them. We just won't be able to recycle
them, which is another problem.

* The problem of not recycling as soon as we'd prefer can only happen
when everything else is, roughly speaking, going right. Which is still
pretty good.  (Again, remarks only apply when the default
vacuum_cleanup_index_scale_factor is used.)

* Roughly speaking, the recycling of disk blocks, and efficient use of
disk space more generally is not a priority for the implementation.
Nor should it be.

I tend to think of this recycling as being more about the worst case
for space utilization than about the average case. Kind of like the
fast root vs. true root thing prevents our needing to descend "skinny"
B-Trees from the true root, which can only really happen following
vast changes to the key space, which are always going to be painful.
These cases are a bit pathological.

For more on this recycling stuff, see section 2.5 of the Lanin and
Shasha paper, "Freeing empty nodes" [1]. It's describing what is
essentially the RecentGlobalXmin interlock, and I think you're right
to point out that that could stand to be a lot more aggressive, which
is maybe the real problem, if any. (The paper's remarks suggest we
could stand to be more eager about it.)

> Do you have an idea about that, or any ideas for experiments we could try?

Nothing occurs to me right now, unfortunately. However, my general
sense is that it would probably be just fine when
vacuum_cleanup_index_scale_factor was 0.0, but there might be
non-linear increases in "the serious type of index bloat" as the
proposed new setting was scaled up. I'd be much more worried about
that.

[1] https://archive.org/stream/symmetricconcurr00lani#page/6/mode/2up
-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-25 Thread Robert Haas
On Sat, Feb 25, 2017 at 3:40 AM, Peter Geoghegan  wrote:
> On Fri, Feb 24, 2017 at 9:26 AM, Robert Haas  wrote:
>> I think this thread is pretty short on evidence that would let us make
>> a smart decision about what to do here.  I see three possibilities.
>> The first is that this patch is a good idea whether we do something
>> about the issue of half-dead pages or not.  The second is that this
>> patch is a good idea if we do something about the issue of half-dead
>> pages but a bad idea if we don't.  The third is that this patch is a
>> bad idea whether or not we do anything about the issue of half-dead
>> pages.
>
> Half-dead pages are not really relevant to this discussion, AFAICT. I
> think that both you and Simon mean "recyclable" pages. There are
> several levels of indirection involved here, to keep the locking very
> granular, so it gets tricky to talk about.

Thanks for the clarification.  I wasn't very clear on what was going
on here; that helps.

The thing that strikes me based on what you wrote is that our page
recycling seems to admit of long delays even as things stand.  There's
no bound on how much time could pass between one index vacuum and the
next, and RecentGlobalXmin could and probably usually will advance
past the point that would allow recycling long before the next index
vacuum cycle.  I don't know whether that strengthens or weakens
Simon's argument.  On the one hand, you could argue that if we're
already doing this on a long delay, making it even longer probably
won't hurt much.  On the other hand, you could argue that if the
current situation is bad, we should at least avoid making it worse.  I
don't know which of those arguments is correct, if either.  Do you
have an idea about that, or any ideas for experiments we could try?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-24 Thread Amit Kapila
On Sat, Feb 25, 2017 at 3:40 AM, Peter Geoghegan  wrote:
> On Fri, Feb 24, 2017 at 9:26 AM, Robert Haas  wrote:
>> I think this thread is pretty short on evidence that would let us make
>> a smart decision about what to do here.  I see three possibilities.
>> The first is that this patch is a good idea whether we do something
>> about the issue of half-dead pages or not.  The second is that this
>> patch is a good idea if we do something about the issue of half-dead
>> pages but a bad idea if we don't.  The third is that this patch is a
>> bad idea whether or not we do anything about the issue of half-dead
>> pages.
>

+1.  I think we can track the stats from
IndexBulkDeleteResult->pages_free to see the impact of the patch.


> Half-dead pages are not really relevant to this discussion, AFAICT. I
> think that both you and Simon mean "recyclable" pages.
>

Yes, I think so and I think that is just confusion about terminology.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-24 Thread Peter Geoghegan
On Fri, Feb 24, 2017 at 9:26 AM, Robert Haas  wrote:
> I think this thread is pretty short on evidence that would let us make
> a smart decision about what to do here.  I see three possibilities.
> The first is that this patch is a good idea whether we do something
> about the issue of half-dead pages or not.  The second is that this
> patch is a good idea if we do something about the issue of half-dead
> pages but a bad idea if we don't.  The third is that this patch is a
> bad idea whether or not we do anything about the issue of half-dead
> pages.

Half-dead pages are not really relevant to this discussion, AFAICT. I
think that both you and Simon mean "recyclable" pages. There are
several levels of indirection involved here, to keep the locking very
granular, so it gets tricky to talk about.

B-Tree page deletion is like a page split in reverse. It has a
symmetry with page splits, which have two phases (atomic operations).
There are also two phases for deletion, the first of which leaves the
target page without a downlink in its parent, and marks it half dead.
By the end of the first phase, there are still sibling pointers, so an
index scan can land on them before the second phase of deletion begins
-- they can visit a half-dead page before such time as the second
phase of deletion begins, where the sibling link goes away. So, the
sibling link isn't stale as such, but the page is still morally dead.
(Second phase is where we remove even the sibling links, and declare
it fully dead.)

Even though there are two phases of deletion, the second still occurs
immediately after the first within VACUUM. The need to have two phases
is hard to explain, so I won't try, but it suffices to say that VACUUM
does not actually ever leave a page half dead unless there is a hard
crash.

Recall that VACUUMing of a B-Tree is performed sequentially, so blocks
can be recycled without needing to be found via a downlink or sibling
link by VACUUM. What is at issue here, then, is VACUUM's degree of
"eagerness" around putting *fully* dead B-Tree pages in the FSM for
recycling. The interlock with RecentGlobalXmin is what makes it
impossible for VACUUM to generally fully delete pages, *as well as*
mark them as recyclable (put them in the FSM) all at once.
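
For reference, that interlock is visible in _bt_page_recyclable()
(paraphrased here from src/backend/access/nbtree/nbtpage.c; consult the
tree for the authoritative version):

bool
_bt_page_recyclable(Page page)
{
    BTPageOpaque opaque;

    /* An all-zeroes page (e.g. left by a crashed relation extension)
     * can be reclaimed immediately. */
    if (PageIsNew(page))
        return true;

    /*
     * Otherwise, recycle only if the page is deleted AND no running
     * transaction could still be interested in it; this comparison
     * against RecentGlobalXmin is the interlock.
     */
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    if (P_ISDELETED(opaque) &&
        TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
        return true;
    return false;
}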

Maybe you get this already, since, as I said, the terminology is
tricky in this area, but I can't tell.

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-24 Thread Jim Nasby

On 2/24/17 11:26 AM, Robert Haas wrote:

> I think we need to come up with some set of tests to figure out what
> actually works well in practice here.  Theories are a good starting
> point, but good vacuum behavior is really important, and a patch that
> changes it ought to be backed up by at least some experimental
> evidence.


I think something else worth considering is that if we had some method 
of mapping heap TIDs back to indexes then a lot (all?) of these problems 
would go away. 10+ years ago the idea of keeping such a mapping would 
probably have been untenable, but with resource forks and how much
cheaper storage is, maybe that's no longer the case.


For btree I think this could be done by keeping a second btree ordered 
by ctid that points either to index entries or even just to whole index 
pages. At ~ 20 bytes per entry, even a 1B row index would take ~20GB.


Page splits are obviously a big issue. Maybe it's safe to update the 
ctid map for every item that gets moved when a split happens.


Would a ctid map work for other indexes as well?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-24 Thread Robert Haas
On Fri, Feb 24, 2017 at 8:49 AM, Amit Kapila  wrote:
>> IIUC, I think that we need to have the number of half-dead pages in meta 
>> page.
>
> Don't you think we need to consider backward compatibility if we want
> to do that?

Yeah, that would be an on-disk format break.

I think this thread is pretty short on evidence that would let us make
a smart decision about what to do here.  I see three possibilities.
The first is that this patch is a good idea whether we do something
about the issue of half-dead pages or not.  The second is that this
patch is a good idea if we do something about the issue of half-dead
pages but a bad idea if we don't.  The third is that this patch is a
bad idea whether or not we do anything about the issue of half-dead
pages.

Unfortunately, we have no evidence at all that would let us figure out
which of those three things is true.  The original post didn't include
any relevant benchmarks or test results.  Simon's reply, which
suggested that the problem of half-dead pages needs to be fixed first,
didn't include any benchmarks or test results either.  In fact, in
neither place were any tests suggested, even hypothetically, which
would help us decide what to do.
I had a hunch when I saw this thread that it was a good idea, and
Simon has a hunch that this btree page recycling thing needs to be
fixed first, and he might be right.  Or we might both be wrong.  Or
... who knows, really?

I think we need to come up with some set of tests to figure out what
actually works well in practice here.  Theories are a good starting
point, but good vacuum behavior is really important, and a patch that
changes it ought to be backed up by at least some experimental
evidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-23 Thread Amit Kapila
On Thu, Feb 23, 2017 at 5:15 PM, Masahiko Sawada  wrote:
> On Wed, Feb 22, 2017 at 12:01 AM, Amit Kapila  wrote:
>
>> I understand that there could be some delay in reclaiming dead pages
>> but do you think it is such a big deal that we completely scan the
>> index for such cases or even try to change the metapage format?
>
> IIUC, I think that we need to have the number of half-dead pages in meta page.
>

Don't you think we need to consider backward compatibility if we want
to do that?

> Isn't it a problem that the freespace map of btree index is not
> vacuumed if all vacuums skip the second pass?
>

AFAIU, you want to skip only when there is no dead tuple removal; if
so, what is the need to update the freespace map of the btree index?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-23 Thread Masahiko Sawada
On Wed, Feb 22, 2017 at 12:01 AM, Amit Kapila  wrote:
> On Tue, Feb 21, 2017 at 1:09 AM, Simon Riggs  wrote:
>> On 20 February 2017 at 10:27, Amit Kapila  wrote:
>>> On Mon, Feb 20, 2017 at 3:01 PM, Simon Riggs  wrote:
>>>> On 20 February 2017 at 09:15, Amit Kapila  wrote:
>>>>> On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada 
>>>>> wrote:
>>>>>> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas 
>>>>>> wrote:
>>>>>>> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs 
>>>>>>> wrote:
>>>>>>>> On 15 February 2017 at 08:07, Masahiko Sawada 
>>>>>>>> wrote:
>>>>>>>>> It's a bug. Attached latest version patch, which passed make check.

>>>>>>>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>>>>>>>> half-dead pages. So skipping an index vacuum might mean that second
>>>>>>>> index scan never happens at all, which would be bad.
>>>>>>>
>>>>>>> Maybe.  If there are a tiny number of those half-dead pages in a huge
>>>>>>> index, it probably doesn't matter.  Also, I don't think it would never
>>>>>>> happen, unless the table just never gets any more updates or deletes -
>>>>>>> but that case could also happen today.  It's just a matter of
>>>>>>> happening less frequently.
>>>>>>
>>>>>
>>>>> Yeah thats right and I am not sure if it is worth to perform a
>>>>> complete pass to reclaim dead/deleted pages unless we know someway
>>>>> that there are many such pages.

>>>> Agreed which is why
>>>> On 16 February 2017 at 11:17, Simon Riggs  wrote:
>>>>> I suggest that we store the number of half-dead pages in the metapage
>>>>> after each VACUUM, so we can decide whether to skip the scan or not.


>>>>> Also, I think we do reclaim the
>>>>> complete page while allocating a new page in btree.

>>>> That's not how it works according to the README at least.

>>>
>>> I am referring to code (_bt_getbuf()->if (_bt_page_recyclable(page))),
>>> won't that help us in reclaiming the space?
>>
>> Not unless the README is incorrect, no.
>>
>
> Just to ensure that we both have the same understanding, let me try to
> write what I understand about this reclaim algorithm.  AFAIU, in the
> first pass vacuum will mark the half dead pages as Deleted and in the
> second pass, it will record such pages as free in FSM so that they can
> be reused as new pages when the indexam asks for a new block instead
> of extending the index relation.  Now, if we introduce this new GUC,
> then there are chances that sometimes we skip the second pass where it
> would not have been skipped.
>
> Note that we do perform the second pass in the same vacuum cycle when
> index has not been scanned for deleting the tuples as per below code:

The first pass uses cycle id given by _bt_start_vacuum, but the second
pass always uses cycle id 0.

> btvacuumcleanup()
> {
> ..
> if (stats == NULL)
> {
> stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
> btvacuumscan(info, stats, NULL, NULL, 0);
> ..
> }
>
> In above code stats won't be NULL, if the vacuum has scanned index for
> deleting tuples (btbulkdelete).  So, based on this I think it will
> skip scanning the index (or recycling pages marked as deleted) in the
> second vacuum only when there are no dead tuple removals in that
> vacuum.  Do we agree till here?

Agreed.

> I understand that there could be some delay in reclaiming dead pages
> but do you think it is such a big deal that we completely scan the
> index for such cases or even try to change the metapage format?

IIUC, I think that we need to have the number of half-dead pages in meta page.
Isn't it a problem that the freespace map of btree index is not
vacuumed if all vacuums skip the second pass?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-21 Thread Amit Kapila
On Tue, Feb 21, 2017 at 1:09 AM, Simon Riggs  wrote:
> On 20 February 2017 at 10:27, Amit Kapila  wrote:
>> On Mon, Feb 20, 2017 at 3:01 PM, Simon Riggs  wrote:
>>> On 20 February 2017 at 09:15, Amit Kapila  wrote:
>>>> On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada 
>>>> wrote:
>>>>> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas 
>>>>> wrote:
>>>>>> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs 
>>>>>> wrote:
>>>>>>> On 15 February 2017 at 08:07, Masahiko Sawada 
>>>>>>> wrote:
>>>>>>>> It's a bug. Attached latest version patch, which passed make check.
>>>>>>>
>>>>>>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>>>>>>> half-dead pages. So skipping an index vacuum might mean that second
>>>>>>> index scan never happens at all, which would be bad.
>>>>>>
>>>>>> Maybe.  If there are a tiny number of those half-dead pages in a huge
>>>>>> index, it probably doesn't matter.  Also, I don't think it would never
>>>>>> happen, unless the table just never gets any more updates or deletes -
>>>>>> but that case could also happen today.  It's just a matter of
>>>>>> happening less frequently.
>>>>>

>>>> Yeah thats right and I am not sure if it is worth to perform a
>>>> complete pass to reclaim dead/deleted pages unless we know someway
>>>> that there are many such pages.
>>>
>>> Agreed which is why
>>> On 16 February 2017 at 11:17, Simon Riggs  wrote:
>>>> I suggest that we store the number of half-dead pages in the metapage
>>>> after each VACUUM, so we can decide whether to skip the scan or not.
>>>
>>>
>>>> Also, I think we do reclaim the
>>>> complete page while allocating a new page in btree.
>>>
>>> That's not how it works according to the README at least.
>>>
>>
>> I am referring to code (_bt_getbuf()->if (_bt_page_recyclable(page))),
>> won't that help us in reclaiming the space?
>
> Not unless the README is incorrect, no.
>

Just to ensure that we both have the same understanding, let me try to
write what I understand about this reclaim algorithm.  AFAIU, in the
first pass vacuum will mark the half dead pages as Deleted and in the
second pass, it will record such pages as free in FSM so that they can
be reused as new pages when the indexam asks for a new block instead
of extending the index relation.  Now, if we introduce this new GUC,
then there are chances that sometimes we skip the second pass where it
would not have been skipped.

Note that we do perform the second pass in the same vacuum cycle when
index has not been scanned for deleting the tuples as per below code:

btvacuumcleanup()
{
..
if (stats == NULL)
{
stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
btvacuumscan(info, stats, NULL, NULL, 0);
..
}

In above code stats won't be NULL, if the vacuum has scanned index for
deleting tuples (btbulkdelete).  So, based on this I think it will
skip scanning the index (or recycling pages marked as deleted) in the
second vacuum only when there are no dead tuple removals in that
vacuum.  Do we agree till here?
I understand that there could be some delay in reclaiming dead pages
but do you think it is such a big deal that we completely scan the
index for such cases or even try to change the metapage format?

> That section of code is just a retest of pages retrieved from FSM;
>

Yes, I think you are right.  In short, I agree that only vacuum can
reclaim half-dead pages.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-20 Thread Peter Geoghegan
On Thu, Feb 16, 2017 at 10:41 AM, Robert Haas  wrote:
>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>> half-dead pages. So skipping an index vacuum might mean that second
>> index scan never happens at all, which would be bad.
>
> Maybe.  If there are a tiny number of those half-dead pages in a huge
> index, it probably doesn't matter.  Also, I don't think it would never
> happen, unless the table just never gets any more updates or deletes -
> but that case could also happen today.  It's just a matter of
> happening less frequently.
>
> I guess the question is whether the accumulation of half-dead pages in
> the index could become a problem before the unsetting of all-visible
> bits in the heap becomes a problem.  If the second one always happens
> first, then we don't have an issue here, but if it's possible for the
> first one to become a big problem before the second one gets to be a
> serious issue, then we need something more sophisticated.

Not getting to a second VACUUM where you might have otherwise can only
be a problem to the extent that users are sensitive to not reclaiming
disk space from indexes at the level of the FSM. It's not accurate to
say that pages could be left "half dead" indefinitely by this patch,
since that is something that lasts only as long as the first phase of
B-Tree page deletion. In fact, the only possible problem is that pages
are recyclable in principle, but that doesn't happen due to this new
GUC.

That isn't analogous to heap bloat at all, because it's not as if
there are any downlinks or right links or left links pointing to the
recyclable (fully deleted) pages; the previous key space *has* in fact
been *fully* reclaimed. These pages are fully dead, and as such are
out of the critical path of index scans entirely once the second phase
finishes. (They only need to continue to physically exist because old
index scans might follow a stale pointer).

Note that there is an interlock against RecentGlobalXmin, respected by
VACUUM, that prevents this sort of recycling. I suspect that the
restrictions on page deletion, as opposed to page recycling, are vastly
more likely to cause pain to users, and that's not made any worse by
this.

-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-20 Thread Simon Riggs
On 20 February 2017 at 10:27, Amit Kapila  wrote:
> On Mon, Feb 20, 2017 at 3:01 PM, Simon Riggs  wrote:
>> On 20 February 2017 at 09:15, Amit Kapila  wrote:
>>> On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada  
>>> wrote:
>>>> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas  wrote:
>>>>> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs 
>>>>> wrote:
>>>>>> On 15 February 2017 at 08:07, Masahiko Sawada 
>>>>>> wrote:
>>>>>>> It's a bug. Attached latest version patch, which passed make check.
>>>>>>
>>>>>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>>>>>> half-dead pages. So skipping an index vacuum might mean that second
>>>>>> index scan never happens at all, which would be bad.
>>>>>
>>>>> Maybe.  If there are a tiny number of those half-dead pages in a huge
>>>>> index, it probably doesn't matter.  Also, I don't think it would never
>>>>> happen, unless the table just never gets any more updates or deletes -
>>>>> but that case could also happen today.  It's just a matter of
>>>>> happening less frequently.

>>>
>>> Yeah thats right and I am not sure if it is worth to perform a
>>> complete pass to reclaim dead/deleted pages unless we know someway
>>> that there are many such pages.
>>
>> Agreed which is why
>> On 16 February 2017 at 11:17, Simon Riggs  wrote:
>>> I suggest that we store the number of half-dead pages in the metapage
>>> after each VACUUM, so we can decide whether to skip the scan or not.
>>
>>
>>> Also, I think we do reclaim the
>>> complete page while allocating a new page in btree.
>>
>> That's not how it works according to the README at least.
>>
>
> I am referring to code (_bt_getbuf()->if (_bt_page_recyclable(page))),
> won't that help us in reclaiming the space?

Not unless the README is incorrect, no.

That section of code is just a retest of pages retrieved from FSM;
they aren't even added there until two scans have occurred and even
then it may not be possible to recycle.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-20 Thread Masahiko Sawada
On Mon, Feb 20, 2017 at 6:15 PM, Amit Kapila  wrote:
> On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada  
> wrote:
>> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas  wrote:
>>> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs  wrote:
>>>> On 15 February 2017 at 08:07, Masahiko Sawada 
>>>> wrote:
>>>>> It's a bug. Attached latest version patch, which passed make check.

>>>> In its current form, I'm not sure this is a good idea. Problems...

>>>> 1. I'm pretty sure the world doesn't need another VACUUM parameter

>>>> I suggest that we use the existing vacuum scale factor/4 to reflect
>>>> that indexes are more sensitive to bloat.
>>>
>>> I do not think it's a good idea to control multiple behaviors with a
>>> single GUC.  We don't really know that dividing by 4 will be right for
>>> everyone, or even for most people.  It's better to have another
>>> parameter with a sensible default than to hardcode a ratio that might
>>> work out poorly for some people.
>>>
>>>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>>>> half-dead pages. So skipping an index vacuum might mean that second
>>>> index scan never happens at all, which would be bad.
>>>
>>> Maybe.  If there are a tiny number of those half-dead pages in a huge
>>> index, it probably doesn't matter.  Also, I don't think it would never
>>> happen, unless the table just never gets any more updates or deletes -
>>> but that case could also happen today.  It's just a matter of
>>> happening less frequently.
>>
>
> Yeah thats right and I am not sure if it is worth to perform a
> complete pass to reclaim dead/deleted pages unless we know someway
> that there are many such pages.  Also, I think we do reclaim the
> complete page while allocating a new page in btree.
>
>> The half-dead pages are never cleaned up if the ratio of pages
>> containing garbage is always lower than threshold.
>>
>
> Which threshold are you referring to here?
>

I meant the new parameter in the current patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-20 Thread Amit Kapila
On Mon, Feb 20, 2017 at 3:01 PM, Simon Riggs  wrote:
> On 20 February 2017 at 09:15, Amit Kapila  wrote:
>> On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada  
>> wrote:
>>> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas  wrote:
>>>> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs  wrote:
>>>>> On 15 February 2017 at 08:07, Masahiko Sawada 
>>>>> wrote:
>>>>>> It's a bug. Attached latest version patch, which passed make check.
>>>>>
>>>>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>>>>> half-dead pages. So skipping an index vacuum might mean that second
>>>>> index scan never happens at all, which would be bad.

>>>> Maybe.  If there are a tiny number of those half-dead pages in a huge
>>>> index, it probably doesn't matter.  Also, I don't think it would never
>>>> happen, unless the table just never gets any more updates or deletes -
>>>> but that case could also happen today.  It's just a matter of
>>>> happening less frequently.
>>>
>>
>> Yeah thats right and I am not sure if it is worth to perform a
>> complete pass to reclaim dead/deleted pages unless we know someway
>> that there are many such pages.
>
> Agreed which is why
> On 16 February 2017 at 11:17, Simon Riggs  wrote:
>> I suggest that we store the number of half-dead pages in the metapage
>> after each VACUUM, so we can decide whether to skip the scan or not.
>
>
>> Also, I think we do reclaim the
>> complete page while allocating a new page in btree.
>
> That's not how it works according to the README at least.
>

I am referring to code (_bt_getbuf()->if (_bt_page_recyclable(page))),
won't that help us in reclaiming the space?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-20 Thread Simon Riggs
On 20 February 2017 at 09:15, Amit Kapila  wrote:
> On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada  
> wrote:
>> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas  wrote:
>>> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs  wrote:
>>>> On 15 February 2017 at 08:07, Masahiko Sawada 
>>>> wrote:
>>>>> It's a bug. Attached latest version patch, which passed make check.

>>>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>>>> half-dead pages. So skipping an index vacuum might mean that second
>>>> index scan never happens at all, which would be bad.
>>>
>>> Maybe.  If there are a tiny number of those half-dead pages in a huge
>>> index, it probably doesn't matter.  Also, I don't think it would never
>>> happen, unless the table just never gets any more updates or deletes -
>>> but that case could also happen today.  It's just a matter of
>>> happening less frequently.
>>
>
> Yeah thats right and I am not sure if it is worth to perform a
> complete pass to reclaim dead/deleted pages unless we know someway
> that there are many such pages.

Agreed which is why
On 16 February 2017 at 11:17, Simon Riggs  wrote:
> I suggest that we store the number of half-dead pages in the metapage
> after each VACUUM, so we can decide whether to skip the scan or not.


> Also, I think we do reclaim the
> complete page while allocating a new page in btree.

That's not how it works according to the README at least.

You might be referring to cleaning out killed tuples just before a
page split? That's something different.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-20 Thread Amit Kapila
On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada  wrote:
> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas  wrote:
>> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs  wrote:
>>> On 15 February 2017 at 08:07, Masahiko Sawada  wrote:
>>>> It's a bug. Attached latest version patch, which passed make check.
>>>
>>> In its current form, I'm not sure this is a good idea. Problems...
>>>
>>> 1. I'm pretty sure the world doesn't need another VACUUM parameter
>>>
>>> I suggest that we use the existing vacuum scale factor/4 to reflect
>>> that indexes are more sensitive to bloat.
>>
>> I do not think it's a good idea to control multiple behaviors with a
>> single GUC.  We don't really know that dividing by 4 will be right for
>> everyone, or even for most people.  It's better to have another
>> parameter with a sensible default than to hardcode a ratio that might
>> work out poorly for some people.
>>
>>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>>> half-dead pages. So skipping an index vacuum might mean that second
>>> index scan never happens at all, which would be bad.
>>
>> Maybe.  If there are a tiny number of those half-dead pages in a huge
>> index, it probably doesn't matter.  Also, I don't think it would never
>> happen, unless the table just never gets any more updates or deletes -
>> but that case could also happen today.  It's just a matter of
>> happening less frequently.
>

Yeah thats right and I am not sure if it is worth to perform a
complete pass to reclaim dead/deleted pages unless we know someway
that there are many such pages.  Also, I think we do reclaim the
complete page while allocating a new page in btree.

> The half-dead pages are never cleaned up if the ratio of pages
> containing garbage is always lower than threshold.
>

Which threshold are you referring to here?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-19 Thread Masahiko Sawada
On Mon, Feb 20, 2017 at 11:35 AM, Jim Nasby  wrote:
> On 2/19/17 7:56 PM, Masahiko Sawada wrote:
>>
>> The half-dead pages are never cleaned up if the ratio of pages
>> containing garbage is always lower than threshold. Also in gin index
>> the pending list is never cleared, which becomes a big problem. I guess
>> that we should take action for each type of index.
>
>
> What worries me is that each AM is going to have a different notion of what
> needs to happen to support this. That indicates that trying to handle this
> at the vacuum level is not a good idea.
>
> I think it would be wiser to add support for skipping scans to the AM API
> instead. That also means you don't have to add support for this to every
> index type to start with.

Yeah, and it's better to have it as an index storage parameter.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-19 Thread Jim Nasby

On 2/19/17 7:56 PM, Masahiko Sawada wrote:

> The half-dead pages are never cleaned up if the ratio of pages
> containing garbage is always lower than threshold. Also in gin index
> the pending list is never cleared, which becomes a big problem. I guess
> that we should take action for each type of index.


What worries me is that each AM is going to have a different notion of 
what needs to happen to support this. That indicates that trying to 
handle this at the vacuum level is not a good idea.


I think it would be wiser to add support for skipping scans to the AM 
API instead. That also means you don't have to add support for this to 
every index type to start with.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-19 Thread Masahiko Sawada
On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas  wrote:
> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs  wrote:
>> On 15 February 2017 at 08:07, Masahiko Sawada  wrote:
>>> It's a bug. Attached latest version patch, which passed make check.
>>
>> In its current form, I'm not sure this is a good idea. Problems...
>>
>> 1. I'm pretty sure the world doesn't need another VACUUM parameter
>>
>> I suggest that we use the existing vacuum scale factor/4 to reflect
>> that indexes are more sensitive to bloat.
>
> I do not think it's a good idea to control multiple behaviors with a
> single GUC.  We don't really know that dividing by 4 will be right for
> everyone, or even for most people.  It's better to have another
> parameter with a sensible default than to hardcode a ratio that might
> work out poorly for some people.
>
>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>> half-dead pages. So skipping an index vacuum might mean that second
>> index scan never happens at all, which would be bad.
>
> Maybe.  If there are a tiny number of those half-dead pages in a huge
> index, it probably doesn't matter.  Also, I don't think it would never
> happen, unless the table just never gets any more updates or deletes -
> but that case could also happen today.  It's just a matter of
> happening less frequently.

The half-dead pages are never cleaned up if the ratio of pages
containing garbage is always lower than threshold. Also in gin index
the pending list is never cleared, which becomes a big problem. I guess
that we should take action for each type of index.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-16 Thread Robert Haas
On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs  wrote:
> On 15 February 2017 at 08:07, Masahiko Sawada  wrote:
>> It's a bug. Attached latest version patch, which passed make check.
>
> In its current form, I'm not sure this is a good idea. Problems...
>
> 1. I'm pretty sure the world doesn't need another VACUUM parameter
>
> I suggest that we use the existing vacuum scale factor/4 to reflect
> that indexes are more sensitive to bloat.

I do not think it's a good idea to control multiple behaviors with a
single GUC.  We don't really know that dividing by 4 will be right for
everyone, or even for most people.  It's better to have another
parameter with a sensible default than to hardcode a ratio that might
work out poorly for some people.

> 2. The current btree vacuum code requires 2 vacuums to fully reuse
> half-dead pages. So skipping an index vacuum might mean that second
> index scan never happens at all, which would be bad.

Maybe.  If there are a tiny number of those half-dead pages in a huge
index, it probably doesn't matter.  Also, I don't think it would never
happen, unless the table just never gets any more updates or deletes -
but that case could also happen today.  It's just a matter of
happening less frequently.

I guess the question is whether the accumulation of half-dead pages in
the index could become a problem before the unsetting of all-visible
bits in the heap becomes a problem.  If the second one always happens
first, then we don't have an issue here, but if it's possible for the
first one to become a big problem before the second one gets to be a
serious issue, then we need something more sophisticated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-16 Thread Simon Riggs
On 15 February 2017 at 08:07, Masahiko Sawada  wrote:
>
> It's a bug. Attached latest version patch, which passed make check.

In its current form, I'm not sure this is a good idea. Problems...

1. I'm pretty sure the world doesn't need another VACUUM parameter

I suggest that we use the existing vacuum scale factor/4 to reflect
that indexes are more sensitive to bloat.

2. The current btree vacuum code requires 2 vacuums to fully reuse
half-dead pages. So skipping an index vacuum might mean that second
index scan never happens at all, which would be bad.

I suggest that we store the number of half-dead pages in the metapage
after each VACUUM, so we can decide whether to skip the scan or not.
And we use some math like each half-dead page that needs to be reused
is worth 250 index entries, so the decision to skip is based upon rows
and empty pages, not just recently vacuumed rows.
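
As a sketch of that math (the names and structure below are just the
suggestion written out; nothing here is from an actual patch):

#include <stdbool.h>
#include <stdint.h>

#define HALF_DEAD_PAGE_WEIGHT 250   /* each reusable half-dead page
                                     * counts as this many entries */

/*
 * Skip the index scan only while the weighted total stays under the
 * threshold; half_dead_pages would be read from the btree metapage.
 */
static bool
skip_index_vacuum_scan(uint64_t recently_vacuumed_rows,
                       uint32_t half_dead_pages,
                       uint64_t threshold_rows)
{
    uint64_t effective = recently_vacuumed_rows +
        (uint64_t) half_dead_pages * HALF_DEAD_PAGE_WEIGHT;

    return effective < threshold_rows;
}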

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-16 Thread Kuntal Ghosh
On Mon, Feb 13, 2017 at 1:01 PM, Masahiko Sawada  wrote:

Thanks for the explanation. I've looked into the referred code. I'm
still in doubt. vacuumed_pages is incremented only when there are no
indexes, i.e. nindexes=0. Now, look at the following part in the
patch.

+   /*
+    * Do post-vacuum cleanup and statistics update for each index if
+    * the number of vacuumed page exceeds threshold.
+    */
+   cleanupidx_thresh = (float4) nblocks * vacuum_cleanup_index_scale;
+
+   elog(DEBUG3, "%s: vac: %d (threshold %0.f)",
+        RelationGetRelationName(onerel), nblocks, cleanupidx_thresh);
+   if (vacuumed_pages >= cleanupidx_thresh)
+   {
+       for (i = 0; i < nindexes; i++)
+           lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+   }
So, unless vacuum_cleanup_index_scale_thresh is zero,
lazy_cleanup_index will never be called. IMO, this seems to be
incorrect. Besides, I've tested with a non-zero (0.5)
vacuum_cleanup_index_scale_thresh, and the regression tests for brin
and gin fail (make installcheck).

+   {"vacuum_cleanup_index_scale_factor", PGC_USERSET,
CLIENT_CONN_STATEMENT,
+gettext_noop("Number of pages containing dead tuple
prior to vacuum as a fraction of relpages."),
+   NULL
+   },
+   _cleanup_index_scale,
+   0.0, 0.0, 100.0,
+   NULL, NULL, NULL
+   },
The maximum value for vacuum_cleanup_index_scale_factor should be 1
instead of 100. As the code indicates, it is treated as a fraction of
relpages, not a percentage.
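
For clarity, the corrected bounds would look something like this (a
sketch of the suggested fix, not a tested hunk):

{
    {"vacuum_cleanup_index_scale_factor", PGC_USERSET, CLIENT_CONN_STATEMENT,
        gettext_noop("Number of pages containing dead tuple prior to "
                     "vacuum as a fraction of relpages."),
        NULL
    },
    &vacuum_cleanup_index_scale,
    0.0, 0.0, 1.0,      /* max is 1.0: a fraction, not a percentage */
    NULL, NULL, NULL
},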

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-15 Thread Masahiko Sawada
On Wed, Feb 15, 2017 at 4:39 PM, Ideriha, Takeshi
 wrote:
> Hi, I tried the regression tests and found some errors concerning brin
> and gin, though I didn't look into them.
>
> Here's a log:
>
> *** /home/ideriha/postgres-master/src/test/regress/expected/brin.out
> 2017-02-13 11:33:43.270942937 +0900
> --- /home/ideriha/postgres-master/src/test/regress/results/brin.out 
> 2017-02-15 14:58:24.725984474 +0900
> ***
> *** 403,408 
>   SELECT brin_summarize_new_values('brinidx'); -- ok, no change expected
>brin_summarize_new_values
>   ---
> !  0
>   (1 row)
>
> --- 403,408 
>   SELECT brin_summarize_new_values('brinidx'); -- ok, no change expected
>brin_summarize_new_values
>   ---
> !  5
>   (1 row)
>
>
> ==
>
> *** /home/ideriha/postgres-master/src/test/regress/expected/gin.out 
> 2016-12-20 16:49:09.513050050 +0900
> --- /home/ideriha/postgres-master/src/test/regress/results/gin.out  
> 2017-02-15 14:58:25.536984461 +0900
> ***
> *** 20,26 
>   select gin_clean_pending_list('gin_test_idx'); -- nothing to flush
>gin_clean_pending_list
>   
> !   0
>   (1 row)
>
>   -- Test vacuuming
> --- 20,26 
>   select gin_clean_pending_list('gin_test_idx'); -- nothing to flush
>gin_clean_pending_list
>   
> !   8
>   (1 row)
>
>   -- Test vacuuming
>
> ==
>
>

Thank you for testing!

It's a bug. Attached latest version patch, which passed make check.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


skip_cleanup_indexdes_v2.patch
Description: Binary data



Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-14 Thread Ideriha, Takeshi
Hi, I tried the regression tests and found some errors concerning brin
and gin, though I didn't look into them.

Here's a log:

*** /home/ideriha/postgres-master/src/test/regress/expected/brin.out
2017-02-13 11:33:43.270942937 +0900
--- /home/ideriha/postgres-master/src/test/regress/results/brin.out 
2017-02-15 14:58:24.725984474 +0900
***
*** 403,408 
  SELECT brin_summarize_new_values('brinidx'); -- ok, no change expected
   brin_summarize_new_values 
  ---
!  0
  (1 row)
  
--- 403,408 
  SELECT brin_summarize_new_values('brinidx'); -- ok, no change expected
   brin_summarize_new_values 
  ---
!  5
  (1 row)
  

==

*** /home/ideriha/postgres-master/src/test/regress/expected/gin.out 
2016-12-20 16:49:09.513050050 +0900
--- /home/ideriha/postgres-master/src/test/regress/results/gin.out  
2017-02-15 14:58:25.536984461 +0900
***
*** 20,26 
  select gin_clean_pending_list('gin_test_idx'); -- nothing to flush
   gin_clean_pending_list 
  
!   0
  (1 row)
  
  -- Test vacuuming
--- 20,26 
  select gin_clean_pending_list('gin_test_idx'); -- nothing to flush
   gin_clean_pending_list 
  
!   8
  (1 row)
  
  -- Test vacuuming

==


Regards,
Ideriha Takeshi



Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-12 Thread Masahiko Sawada
On Fri, Feb 10, 2017 at 8:01 PM, Kuntal Ghosh
 wrote:
> On Wed, Jan 4, 2017 at 1:51 PM, Masahiko Sawada  wrote:
>
>> The attached patch introduces a new GUC parameter,
>> vacuum_cleanup_index_scale_factor, which specifies the fraction of
>> table pages containing dead tuples needed to trigger cleaning up
>> indexes. The default is 0.0, which means that the index cleanup is not
>> invoked if there has been no update on the table. In other words, if
>> the table is completely frozen then lazy vacuum can skip the index
>> scans as well. Increasing this value could reduce the total time of
>> lazy vacuum, but the statistics and the free space map of the index
>> are not updated.
>>
> I was looking into your patch and trying to understand how the
> following piece of code works.

Thank you for looking at this patch!

> +   if (vacuumed_pages > cleanupidx_thresh)
> +   {
> +       for (i = 0; i < nindexes; i++)
> +           lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
> +   }
> So, you are skipping deletion of index entries if it does not reach
> the clean-up index threshold. But you are removing all dead tuples
> from the heap that are pointed to by the same index. Hence, the index
> will contain entries with invalid references.

I think not. Before calling lazy_cleanup_index, all garbage in the heap
and indexes should already have been reclaimed by lazy_vacuum_heap and
lazy_vacuum_index.
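
In outline, the ordering that makes this work is roughly the following
(paraphrased from src/backend/commands/vacuumlazy.c; simplified):

/*
 * while scanning heap pages:
 *     accumulate dead tuple TIDs;
 *     if the dead-TID array fills up:
 *         for each index: lazy_vacuum_index()   -- remove index entries
 *         lazy_vacuum_heap()                    -- remove heap tuples
 *
 * after the heap scan:
 *     if any dead TIDs remain:
 *         for each index: lazy_vacuum_index()
 *         lazy_vacuum_heap()
 *     for each index: lazy_cleanup_index()      -- stats/FSM update only;
 *                                                  this is the pass the
 *                                                  patch makes skippable
 */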

> +This parameter can only be set anywhere.
> Oxymoron. :-)
>

Will fix it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-10 Thread Kuntal Ghosh
On Wed, Jan 4, 2017 at 1:51 PM, Masahiko Sawada  wrote:

> The attached patch introduces a new GUC parameter,
> vacuum_cleanup_index_scale_factor, which specifies the fraction of
> table pages containing dead tuples needed to trigger cleaning up
> indexes. The default is 0.0, which means that the index cleanup is not
> invoked if there has been no update on the table. In other words, if
> the table is completely frozen then lazy vacuum can skip the index
> scans as well. Increasing this value could reduce the total time of
> lazy vacuum, but the statistics and the free space map of the index
> are not updated.
>
I was looking into your patch and trying to understand how the
following piece of code works.
+if (vacuumed_pages > cleanupidx_thresh)
+{
+for (i = 0; i < nindexes; i++)
+lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+}
So, you are skipping the deletion of index entries if the cleanup
threshold is not reached. But you are removing all the dead tuples
from the heap that those index entries point to. Hence, the index
will contain entries with invalid references. How does that work? How
will you remove those index entries later? (I'm a newbie.)

+This parameter can only be set anywhere.
Oxymoron. :-)


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-01-05 Thread Robert Haas
On Wed, Jan 4, 2017 at 3:21 AM, Masahiko Sawada  wrote:
> Hi and happy new year.
>
> Lazy vacuum calls lazy_cleanup_index at the end of lazy_scan_heap to
> update the statistics of the indexes on a table, such as relpages
> and reltuples. For all index types, lazy_cleanup_index scans all
> index pages, even if the table has not been updated at all since the
> previous vacuum. The freeze map greatly reduces the execution time
> and cost of vacuuming a table when most of it is already frozen, but
> it does not help with cleaning up indexes. If a very large static
> table has an index, the index cleanup is still invoked and always
> scans all index pages, so it takes a long time, as reported[1].
>
> The attached patch introduces a new GUC parameter,
> vacuum_cleanup_index_scale_factor, which specifies the fraction of
> table pages containing dead tuples needed to trigger an index
> cleanup. The default is 0.0, which means that the index cleanup is
> not invoked if there have been no updates on the table. In other
> words, if the table is completely frozen then lazy vacuum can skip
> the index scans as well. Increasing this value can reduce the total
> time of lazy vacuum, but the index statistics and the free space map
> are then not updated.

Cool.  I'll look at this, but not until next CF.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] GUC for cleanup indexes threshold.

2017-01-04 Thread Masahiko Sawada
Hi and happy new year.

Lazy vacuum calls lazy_cleanup_index at the end of lazy_scan_heap to
update the statistics of the indexes on a table, such as relpages and
reltuples. For all index types, lazy_cleanup_index scans all index
pages, even if the table has not been updated at all since the
previous vacuum. The freeze map greatly reduces the execution time
and cost of vacuuming a table when most of it is already frozen, but
it does not help with cleaning up indexes. If a very large static
table has an index, the index cleanup is still invoked and always
scans all index pages, so it takes a long time, as reported[1].
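
For btree, for example, the cleanup path (btvacuumcleanup calling
btvacuumscan in nbtree.c) boils down to reading every index page just
to recompute the statistics; roughly like this (a simplified sketch,
not the actual code, which also re-checks the relation size and
handles concurrent page splits):

    num_pages = RelationGetNumberOfBlocks(rel);
    for (blkno = BTREE_METAPAGE + 1; blkno < num_pages; blkno++)
    {
        /* read the page and count the live index tuples for the stats */
        btvacuumpage(&vstate, blkno, blkno);
    }
    /* stats->num_pages and stats->num_index_tuples are then reported */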

The attached patch introduces a new GUC parameter,
vacuum_cleanup_index_scale_factor, which specifies the fraction of
table pages containing dead tuples needed to trigger an index cleanup.
The default is 0.0, which means that the index cleanup is not invoked
if there have been no updates on the table. In other words, if the
table is completely frozen then lazy vacuum can skip the index scans
as well. Increasing this value can reduce the total time of lazy
vacuum, but the index statistics and the free space map are then not
updated.
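
For illustration, the check amounts to the following (a sketch using
the variable names from the patch; the exact computation in the patch
may differ):

    /* skip index cleanup unless enough table pages had dead tuples */
    cleanupidx_thresh = (BlockNumber)
        (vacrelstats->rel_pages * vacuum_cleanup_index_scale_factor);

    if (vacuumed_pages > cleanupidx_thresh)
    {
        for (i = 0; i < nindexes; i++)
            lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
    }

With the default of 0.0, the cleanup still runs whenever at least one
page was vacuumed; setting it to, say, 0.01 would skip the full index
scans when dead tuples were found in less than 1% of the table's
pages.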

[1] 
https://www.postgresql.org/message-id/MWHPR20MB142177B86D893C946FAFC9A4A18C0%40MWHPR20MB1421.namprd20.prod.outlook.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


cleanup_indexes_threshold_v1.patch
Description: binary/octet-stream
