Re: Add parallelism and glibc dependent only options to reindexdb

Alvaro Herrera Mon, 01 Jul 2019 06:51:48 -0700

Now that we have REINDEX CONCURRENTLY, I think reindexdb is going to
gain more popularity.


Please don't reuse a file name as generic as "parallel.c" -- it's
annoying when navigating source.  Maybe conn_parallel.c, multiconn.c,
connscripts.c, admconnection.c ...?

If your server crashes or is stopped midway during the reindex, you
would have to start again from scratch, and it's tedious (if it's
possible at all) to determine which indexes were missed.  I think it
would be useful to have a two-phase mode: in the initial phase reindexdb
computes the list of indexes to be reindexed and saves them into a work
table somewhere.  In the second phase, it reads indexes from that table
and processes them, marking each one as done in the work table.  If the
second phase crashes or is stopped, it can be restarted and will consult
the work table to skip what is already done.  I would keep the work
table afterwards, as it provides a bit of an
audit trail.  It may be important to be able to run even if unable to
create such a work table (because of the <ironic>numerous</> users that
DROP DATABASE postgres).

Maybe we'd have two flags in the work table for each index:
"reindex requested", "reindex done".
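
To make the two-phase idea concrete, here is a minimal sketch in Python
(chosen only for brevity).  An in-memory list stands in for the work
table, and the names WorkItem, plan and run are hypothetical; none of
this is reindexdb code, and a real version would keep the rows in a
table in the database.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class WorkItem:
    """One row of the (hypothetical) work table."""
    index_name: str
    requested: bool = True   # the "reindex requested" flag
    done: bool = False       # the "reindex done" flag


def plan(index_names: Iterable[str]) -> List[WorkItem]:
    """Phase 1: record every index that should be rebuilt."""
    return [WorkItem(name) for name in index_names]


def run(work_table: List[WorkItem], reindex: Callable[[str], None]) -> None:
    """Phase 2: rebuild pending indexes, marking each row once finished.

    If reindex() raises (standing in for a crash or a stop), rows that
    are already done keep their flag, so calling run() again resumes
    with the first index that was not completed.
    """
    for item in work_table:
        if item.requested and not item.done:
            reindex(item.index_name)
            item.done = True   # rows are kept afterwards as an audit trail
```

Calling run() a second time on the same list after a failure only
touches the rows whose "done" flag is still false, which is the restart
behavior described above.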
    
The "glibc filter" thing (which I take to mean "indexes that depend on
collations") would apply to the first phase: it just skips adding other
indexes to the work table.  I suppose ICU collations are not affected,
so the filter would be for glibc collations only?  The --glibc-dependent
switch seems too ad-hoc.  Maybe "--exclude-rule=glibc"?  That way we can
add other rules later.  (Not "--exclude=foo" because we'll want to add
the possibility to ignore specific indexes by name.)
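
As a rough sketch of how named rules could feed the first phase (again
in Python, purely for illustration): the registry EXCLUDE_RULES, the
IndexInfo type, its depends_on_glibc field and the select_indexes
function are all hypothetical.  I am also assuming here that the "glibc"
rule means "skip indexes that do not depend on a glibc collation", which
is only one reading of the proposal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class IndexInfo:
    name: str
    depends_on_glibc: bool   # hypothetical: computed elsewhere from the catalogs


# Rule name -> predicate returning True when the index should be skipped.
# Only "glibc" exists in this sketch; other rules could be added later.
EXCLUDE_RULES: Dict[str, Callable[[IndexInfo], bool]] = {
    "glibc": lambda idx: not idx.depends_on_glibc,
}


def select_indexes(indexes: Iterable[IndexInfo],
                   rules: Iterable[str] = (),
                   excluded_names: Iterable[str] = ()) -> List[str]:
    """Return the index names that phase 1 would add to the work table."""
    rules = list(rules)
    unknown = [r for r in rules if r not in EXCLUDE_RULES]
    if unknown:
        raise ValueError("unknown exclude rule(s): " + ", ".join(unknown))
    skip_names = set(excluded_names)   # exclusion of specific indexes by name
    return [
        idx.name
        for idx in indexes
        if idx.name not in skip_names
        and not any(EXCLUDE_RULES[r](idx) for r in rules)
    ]
```

Keeping rules and index names as separate arguments mirrors the point
above: a rule switch and a by-name exclusion switch can coexist without
overloading a single option.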

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
