Re: [HACKERS] GUC for cleanup indexes threshold.

Sokolov Yura Fri, 22 Sep 2017 07:15:57 -0700

On 2017-09-22 16:22, Sokolov Yura wrote:

On 2017-09-22 11:21, Masahiko Sawada wrote:

On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
<[email protected]> wrote:

At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada<[email protected]> wrote in<cad21aod6zgb1w6ps1axj0ccab_chdyiitntedpmhkefgg13...@mail.gmail.com>

On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
<[email protected]> wrote:
> I was just looking the thread since it is found left alone for a
> long time in the CF app.
>
> At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan <[email protected]> wrote in 
<CAH2-WzkhJhAXD+6DdBp7D8WYLfJ3D0m=AZbGsiw=usujtmu...@mail.gmail.com>
>> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund <[email protected]> wrote:
>> > Hi,
>> >
>> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas <[email protected]> 
wrote:
>> >> [ lots of valuable discussion ]
>> >
>> > I think this patch clearly still is in the design stage, and has
>> > received plenty feedback this CF.  I'll therefore move this to the next
>> > commitfest.
>>
>> Does anyone have ideas on a way forward here? I don't, but then I
>> haven't thought about it in detail in several months.
>
> Is the additional storage in metapage to store the current status
> of vaccum is still unacceptable even if it can avoid useless
> full-page scan on indexes especially for stable tables?
>
> Or, how about additional 1 bit in pg_stat_*_index to indicate
> that the index *don't* require vacuum cleanup stage. (default
> value causes cleanup)


You meant that "the next cycle" is the lazy_cleanup_index() function
called by lazy_scan_heap()?


Both finally call btvacuumscan under a certain condition, but
what I meant by "the next cycle" is the lazy_cleanup_index call
in the next round of vacuum since abstract layer (index am) isn't
conscious of the detail of btree.

> index_bulk_delete (or ambulkdelete) returns the flag in
> IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> stats and in the next cycle it is looked up to decide the
> necessity of index cleanup.
>
Could you elaborate about this? For example in btree index, theindexcleanup skips to scan on the index scan if index_bulk_delete hasbeen
called during vacuuming because stats != NULL. So I think we don't
need such a flag.


The flag works so that successive two index full scans don't
happen in a vacuum round. If any rows are fully deleted, just
following btvacuumcleanup does nothing.

I think what you wanted to solve here was the problem that
index_vacuum_cleanup runs a full scan even if it ends with no
actual work, when manual or anti-wraparound vacuums.  (I'm
getting a bit confused on this..) It is caused by using the
pointer "stats" as the flag to instruct to do that. If the
stats-as-a-flag worked as expected, the GUC doesn't seem to be
required.


Hmm, my proposal is like that if a table doesn't changed since the
previous vacuum much we skip the cleaning up index.

If the table has at least one garbage we do the lazy_vacuum_index and
then IndexBulkDeleteResutl is stored, which causes to skip doing the
btvacuumcleanup. On the other hand, if the table doesn't have any
garbage but some new tuples inserted since the previous vacuum, we
don't do the lazy_vacuum_index but do the lazy_cleanup_index. In this
case, we always do the lazy_cleanup_index (i.g, we do the full scan)
even if only one tuple is inserted. That's why I proposed a new GUC
parameter which allows us to skip the lazy_cleanup_index in the case.


Addition to that, as Simon and Peter pointed out
index_bulk_delete can leave not-fully-removed pages (so-called
half-dead pages and pages that are recyclable but not registered
in FSM, AFAICS) in some cases mainly by RecentGlobalXmin
interlock. In this case, just inhibiting cleanup scan by a
threshold lets such dangling pages persist in the index. (I
conldn't make such a many dangling pages, though..)

The first patch in the mail (*1) does that. It seems having some
bugs, though..


Since the dangling pages persist until autovacuum decided to scan
the belonging table again, we should run a vacuum round (or
index_vacuum_cleanup itself) even having no dead rows if we want
to clean up such pages within a certain period. The second patch
doesn that.


IIUC half-dead pages are not relevant to this proposal. The proposal
has two problems;
* By skipping index cleanup we could leave recyclable pages that are
not marked as a recyclable.
* we stash an XID when a btree page is deleted, which is used to
determine when it's finally safe to recycle the page

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Here is a small patch that skips scanning btree index if no pending
deleted pages exists.

It detects this situation by comparing pages_deleted with pages_free.
If they are equal, then there is no half-deleted pages, and it is
safe to skip next lazy scan.

Flag stored in a btpo_flags. It is unset using general wal before scan.
If no half-deleted pages found, it is set without wal (like hint bit).
(it is safe to miss setting flag, but it is not safe to miss unsetting
flag).

This patch works correctly:
- if rows are only inserted and never deleted, index always skips
scanning (starting with second scan).
- if some rows updated/deleted, then some scans are not skipped. But
when all half-deleted pages are marked as deleted, lazy scans start to
be skipped.

Open question: what to do with index statistic? For simplicity this
patch skips updating stats (just returns NULL from btvacuumcleanup).
Probably, it should detect proportion of table changes, and do not
skip scans if table grows too much.

Excuse me, I didn't mean to overwrite "last attachment" on commitfestpage.


--
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GUC for cleanup indexes threshold.

Reply via email to