On 1/11/22, 1:01 PM, "Bossart, Nathan" <bossa...@amazon.com> wrote:

    On 1/10/22, 5:01 PM, "Imseih (AWS), Sami" <sims...@amazon.com> wrote:
    > I have attached the 3rd revision of the patch which also includes the 
documentation changes. Also attached is a rendered html of the docs for review.
    >
    > "max_index_vacuum_cycle_time" has been removed.
    > "index_rows_vacuumed" renamed to "index_tuples_removed". "tuples" is a 
more consistent with the terminology used.
    > "vacuum_cycle_ordinal_position" renamed to "index_ordinal_position".

    Thanks for the new version of the patch!

    nitpick: I get one whitespace error when applying the patch.

            Applying: Expose progress for the "vacuuming indexes" phase of a 
VACUUM operation.
            .git/rebase-apply/patch:44: tab in indent.
               Whenever <xref linkend="guc-vacuum-failsafe-age"/> is triggered, 
index
            warning: 1 line adds whitespace errors.

That was missed. Will fix it.

    +     <row>
    +      <entry role="catalog_table_entry"><para role="column_definition">
    +       <structfield>num_indexes_to_vacuum</structfield> <type>bigint</type>
    +      </para>
    +      <para>
    +       The number of indexes that will be vacuumed. Only indexes with
    +       <literal>pg_index.indisready</literal> set to "true" will be 
vacuumed.
    +       Whenever <xref linkend="guc-vacuum-failsafe-age"/> is triggered, 
index
    +       vacuuming will be bypassed.
    +      </para></entry>
    +     </row>
    +    </tbody>
    +   </tgroup>
    +  </table>

    We may want to avoid exhaustively listing the cases when this value
    will be zero.  I would suggest saying, "When index cleanup is skipped,
    this value will be zero" instead.

What about something like  "The number of indexes that are eligible for 
vacuuming".
This covers the cases where either an individual index is skipped or the entire 
"index vacuuming" phase is skipped.

    +     <row>
    +      <entry role="catalog_table_entry"><para role="column_definition">
    +       <structfield>relid</structfield> <type>oid</type>
    +      </para>
    +      <para>
    +       OID of the table being vacuumed.
    +      </para></entry>
    +     </row>

    Do we need to include this field?  I would expect indexrelid to go
    here.

Having indexrelid and relid makes the pg_stat_progress_vacuum_index view 
"self-contained". A user can lookup the index and table being vacuumed without 
joining back to pg_stat_progress_vacuum.

    +     <row>
    +      <entry role="catalog_table_entry"><para role="column_definition">
    +       <structfield>leader_pid</structfield> <type>bigint</type>
    +      </para>
    +      <para>
    +       Process ID of the parallel group leader. This field is 
<literal>NULL</literal>
    +       if this process is a parallel group leader or the
    +       <literal>vacuuming indexes</literal> phase is not performed in 
parallel.
    +      </para></entry>
    +     </row>

    Are there cases where the parallel group leader will have an entry in
    this view when parallelism is enabled?

Yes. A parallel group leader can perform an index vacuum just like a parallel 
worker. If you do something like "vacuum (parallel 3) ", you may have up to 4 
processes vacuuming indexes. The leader + 3 workers. 

    +     <row>
    +      <entry role="catalog_table_entry"><para role="column_definition">
    +       <structfield>index_ordinal_position</structfield> 
<type>bigint</type>
    +      </para>
    +      <para>
    +       The order in which the index is being vacuumed. Indexes are 
vacuumed by OID in ascending order.
    +      </para></entry>
    +     </row>

    Should we include the bit about the OID ordering?  I suppose that is
    unlikely to change in the near future, but I don't know if it is
    relevant information.  Also, do we need to include the "index_"
    prefix?  This view is specific for indexes.  (I have the same question
    for index_tuples_removed.)

I was on the fence about both of these as well. Will make a change to this.

    Should this new table go after the "VACUUM phases" table?  It might
    make sense to keep the phases table closer to where it is referenced.

I did not think that would read better. The introduction discusses both views 
and the "phase" table is linked from the pg_stat_progress_vacuum 

    +    /* Advertise the number of indexes to vacuum if we are not in failsafe 
mode */
    +    if (!lazy_check_wraparound_failsafe(vacrel))
    +        pgstat_progress_update_param(PROGRESS_VACUUM_TOTAL_INDEX_VACUUM, 
vacrel->nindexes);

    Shouldn't this be 0 when INDEX_CLEANUP is off, too?

This view is only covering the "vacuum index" phase, but it should also cover 
index_cleanup phase as well. Will update the patch.

    +#define PROGRESS_VACUUM_CURRENT_INDRELID         7
    +#define PROGRESS_VACUUM_LEADER_PID               8
    +#define PROGRESS_VACUUM_INDEX_ORDINAL            9
    +#define PROGRESS_VACUUM_TOTAL_INDEX_VACUUM       10
    +#define PROGRESS_VACUUM_DEAD_TUPLES_VACUUMED     11

    nitpick: I would suggest the following names to match the existing
    style:

            PROGRESS_VACUUM_NUM_INDEXES_TO_VACUUM
            PROGRESS_VACUUM_INDEX_LEADER_PID
            PROGRESS_VACUUM_INDEX_INDEXRELID
            PROGRESS_VACUUM_INDEX_ORDINAL_POSITION
            PROGRESS_VACUUM_INDEX_TUPLES_REMOVED

That looks better.

    Nathan


Reply via email to