From: Amit Kapila <amit.kapil...@gmail.com>
> We already allow users to specify the degree of parallelism for all
> the parallel operations: via the GUCs max_parallel_maintenance_workers
> and max_parallel_workers_per_gather, via the reloption
> parallel_workers, and via the VACUUM command's PARALLEL option, where
> users can specify the number of workers to use.  These are treated as
> hints; the actual degree of parallelism also depends on other factors,
> such as whether that many workers are available.  Why would users
> expect parallel DML to behave differently?
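
For reference, the existing knobs can be exercised like below (the table
name is only illustrative):

    SET max_parallel_workers_per_gather = 4;             -- per-query scans
    SET max_parallel_maintenance_workers = 4;            -- utility commands
    ALTER TABLE measurement SET (parallel_workers = 8);  -- per-table hint
    VACUUM (PARALLEL 4) measurement;                     -- per-command option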

I agree that the user would want to specify the degree of parallelism for DML, 
too.  My simple (probably silly) questions were, for INSERT SELECT:

* If the target table has 10 partitions and the source table has 100 
partitions, how would the user want to set these parameters?

* If the source and target tables have the same number of partitions, and the 
user specifies different values for parallel_workers and parallel_dml_workers, 
how many parallel workers would run?  (See the example after the plan below.)

* What would the query plan be like?  Something like below?  Can we easily 
support this sort of nesting?

Gather
  Workers Planned: <parallel_dml_workers>
  Insert
    Gather
      Workers Planned: <parallel_workers>
      Parallel Seq Scan
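
To make the second question concrete, here is a hypothetical setup, assuming 
parallel_dml_workers ends up as a reloption on the target table (the patch 
may expose it differently):

    ALTER TABLE source SET (parallel_workers = 8);
    ALTER TABLE target SET (parallel_dml_workers = 2);  -- hypothetical

    EXPLAIN (COSTS OFF)
    INSERT INTO target SELECT * FROM source;
    -- Which of the two settings determines "Workers Planned"?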


> Which memory specific to partitions are you referring to here, and does
> that apply to the patch being discussed?

I was referring to the relation cache and catalog cache, which are not 
specific to partitions.  This patch's current parallel safety check opens and 
closes every descendant partition of the target table, which leaves those 
cache entries in CacheMemoryContext after the SQL statement ends.  But as I 
said, we can consider this not a serious problem here, because parallel DML 
would be executed in a limited number of concurrent sessions.  I only touched 
on the memory consumption issue for completeness, in comparison with (3).
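
Incidentally, the retained cache memory can be observed per backend, assuming 
a server new enough to have the pg_backend_memory_contexts view (added in 
PostgreSQL 14):

    -- Inspect how much memory the backend's caches hold after the statement
    SELECT name, used_bytes, total_bytes
    FROM pg_backend_memory_contexts
    WHERE name = 'CacheMemoryContext';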


Regards
Takayuki Tsunakawa
