On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> Thank you. Attached the rebased patch.
Here are some review comments.

+ started by a single utility command. Currently, the parallel
+ utility commands that support the use of parallel workers are
+ <command>CREATE INDEX</command> and <command>VACUUM</command>
+ without <literal>FULL</literal> option, and only when building
+ a B-tree index. Parallel workers are taken from the pool of

That sentence is garbled. The end part about b-tree indexes applies
only to CREATE INDEX, not to VACUUM, since VACUUM does not build
indexes.

+ Vacuum index and cleanup index in parallel
+ <replaceable class="parameter">N</replaceable> background workers (for the detail
+ of each vacuum phases, please refer to <xref linkend="vacuum-phases"/>. If the

I have two problems with this. One is that I can't understand the
English very well. I think you mean something like: "Perform the
'vacuum index' and 'cleanup index' phases of VACUUM in parallel using
N background workers," but I'm not entirely sure. The other is that
if that is what you mean, I don't think it's a sufficient
description. Users need to understand whether, for example, only one
worker can be used per index, or whether the work for a single index
can be split across workers.

+ parallel degree <replaceable class="parameter">N</replaceable> is omitted,
+ then <command>VACUUM</command> decides the number of workers based on
+ number of indexes on the relation which further limited by
+ <xref linkend="guc-max-parallel-workers-maintenance"/>. Also if this option

Now this makes it sound like it's one worker per index, but you could
be more explicit about it.

+ is specified multile times, the last parallel degree
+ <replaceable class="parameter">N</replaceable> is considered into the account.

Typo, but I'd just delete this sentence altogether; the behavior if
the option is multiply specified seems like a triviality that need
not be documented.

+ Setting a value for <literal>parallel_workers</literal> via
+ <xref linkend="sql-altertable"/> also controls how many parallel
+ worker processes will be requested by a <command>VACUUM</command>
+ against the table. This setting is overwritten by setting
+ <replaceable class="parameter">N</replaceable> of <literal>PARALLEL</literal>
+ option.

I wonder if we really want this behavior. Should a setting that
controls the degree of parallelism when scanning the table also
affect VACUUM? I tend to think that we probably don't ever want
VACUUM of a table to be parallel by default, but rather something
that the user must explicitly request. Happy to hear other opinions.
If we do want this behavior, I think this should be written
differently, something like this: The PARALLEL N option to VACUUM
takes precedence over this option.

+ * parallel mode nor destories the parallel context. For updating the index

Spelling.

+/* DSM keys for parallel lazy vacuum */
+#define PARALLEL_VACUUM_KEY_SHARED UINT64CONST(0xFFFFFFFFFFF00001)
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES UINT64CONST(0xFFFFFFFFFFF00002)
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT UINT64CONST(0xFFFFFFFFFFF00003)

Any special reason not to use just 1, 2, 3 here? The general
infrastructure stuff uses high numbers to avoid conflicting with
plan_node_id values, but end clients of the parallel infrastructure
can generally just use small integers.

+ bool updated; /* is the stats updated? */

is -> are

+ * LVDeadTuples controls the dead tuple TIDs collected during heap scan.

What do you mean by "controls", exactly? Stores?

+ * This is allocated in a dynamic shared memory segment when parallel
+ * lazy vacuum mode, or allocated in a local memory.

If this is in DSM, then max_tuples is a wart, I think. We can't grow
the segment at that point. I'm suspicious that we need a better
design here. It looks like you gather all of the dead tuples in
backend-local memory and then allocate an equal amount of DSM to copy
them. But that means that we are using twice as much memory, which
seems pretty bad. You'd have to do that at least momentarily no
matter what, but it's not obvious that the backend-local copy is ever
freed. There's another patch kicking around to allocate memory for
vacuum in chunks rather than preallocating the whole slab of memory
at once; we might want to think about getting that committed first
and then having this build on top of it. At least we need something
smarter than this.

-heap_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+heap_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,

We generally avoid passing a struct by value; copying the struct can
be expensive and having multiple shallow copies of the same data
sometimes leads to surprising results. I think it might be a good
idea to propose a preliminary refactoring patch that invents
VacuumOptions and gives it just a single 'int' member and refactors
everything to use it, and then that can be committed first. It
should pass a pointer, though, not the actual struct.
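Something along these lines, just to sketch what I mean (the member
and parameter names here are illustrative, not taken from the patch):

typedef struct VacuumOptions
{
    int         flags;          /* bitmask of VACOPT_* values */
} VacuumOptions;

/* ...and callees take a pointer rather than a by-value copy: */
extern void heap_vacuum_rel(Relation onerel, VacuumOptions *options,
                            VacuumParams *params,
                            BufferAccessStrategy bstrategy);

The parallel patch could then add its nworkers field to that same
struct without touching every call site again.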
+ LVState *lvstate;

It's not clear to me why we need this new LVState thing. What's the
motivation for that? If it's a good idea, could it be done as a
separate, preparatory patch? It seems to be responsible for a lot of
code churn in this patch. It also leads to strange stuff like this:

 ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
+ (errmsg("scanned index \"%s\" to remove %d row versions %s",
 RelationGetRelationName(indrel),
- vacrelstats->num_dead_tuples),
+ dead_tuples->num_tuples,
+ IsParallelWorker() ? "by parallel vacuum worker" : ""),

This doesn't seem to be great grammar, and translation guidelines
generally discourage this sort of incremental message construction
quite strongly. Since the user can probably infer what happened by a
suitable choice of log_line_prefix, I'm not totally sure this is
worth doing in the first place, but if we're going to do it, it
should probably have two completely separate message strings and pick
between them using IsParallelWorker(), rather than building it up
incrementally like this.

+compute_parallel_workers(Relation rel, int nrequests, int nindexes)

I think 'nrequests' is meant to be 'nrequested'. It isn't the number
of requests; it's the number of workers that were requested.

+ /* quick exit if no workers are prepared, e.g. under serializable isolation */

That comment makes very little sense in this context.

+ /* Report parallel vacuum worker information */
+ initStringInfo(&buf);
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker %s (planned: %d",
+ "launched %d parallel vacuum workers %s (planned: %d",
+ lvstate->pcxt->nworkers_launched),
+ lvstate->pcxt->nworkers_launched,
+ for_cleanup ? "for index cleanup" : "for index vacuum",
+ lvstate->pcxt->nworkers);
+ if (lvstate->options.nworkers > 0)
+ appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
+
+ appendStringInfo(&buf, ")");
+ ereport(elevel, (errmsg("%s", buf.data)));

This is another example of incremental message construction, again
violating translation guidelines.
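One way out, very roughly (untested sketch; it drops the trailing
", requested %d" part, which would otherwise need yet another pair of
messages), is to keep each variant as one complete translatable
string:

if (for_cleanup)
    ereport(elevel,
            (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
                             "launched %d parallel vacuum workers for index cleanup (planned: %d)",
                             lvstate->pcxt->nworkers_launched),
                    lvstate->pcxt->nworkers_launched,
                    lvstate->pcxt->nworkers)));
else
    ereport(elevel,
            (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
                             "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
                             lvstate->pcxt->nworkers_launched),
                    lvstate->pcxt->nworkers_launched,
                    lvstate->pcxt->nworkers)));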
+ WaitForParallelWorkersToAttach(lvstate->pcxt);

Why?

+ /*
+ * If there is already-updated result in the shared memory we use it.
+ * Otherwise we pass NULL to index AMs, meaning it's first time call,
+ * and copy the result to the shared memory segment.
+ */

I'm probably missing something here, but isn't the intention that we
only do each index once? If so, how would there be anything there
already? Once from for_cleanup = false and once for for_cleanup =
true?

+ if (a->options.flags != b->options.flags)
+ return false;
+ if (a->options.nworkers != b->options.nworkers)
+ return false;

You could just do

COMPARE_SCALAR_FIELD(options.flags);
COMPARE_SCALAR_FIELD(options.nworkers);

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company