I'm working on a custom aggregate that generates a serialized data format.
The preparation of the geometry before formatting is computationally
intensive, so it is probably a good thing for that work to be done in
parallel, in partial aggregates. Here's an example SQL call:

EXPLAIN ANALYZE
SELECT length(ST_AsMVT(a)) FROM (
    SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true),
           gid, fed_num
    FROM pts_10 p
    WHERE p.geom && ::geometry_literal
    AND p.geom IS NOT NULL
    ) a;

The ST_AsMVTGeom() function can be comically expensive, so it benefits
enormously from running inside the partial aggregates. But the planner
seems to ignore the cost of the function.
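
For clarity, by "cost" I mean the procedural COST declared on the
function, which I've been adjusting roughly like this (the argument types
here are from my PostGIS build, so adjust to match yours; the cost value
is illustrative):

ALTER FUNCTION ST_AsMVTGeom(geometry, box2d, int4, int4, bool)
    COST 1000;  -- raise the planner's per-call cost estimate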

(First, note that in order to consistently get parallel plans I have to
brutally suppress parallel_tuple_cost, as described here:
http://blog.cleverelephant.ca/2017/10/parallel-postgis-2.html)
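
For concreteness, the sort of settings I'm running look like this (values
are illustrative, along the lines of the blog post above):

SET parallel_tuple_cost = 0.0001;         -- near zero, so tuple transfer
                                          -- cost doesn't swamp the plan
SET max_parallel_workers_per_gather = 4;  -- allow a few workers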

Whether I get a parallel aggregate seems entirely determined by the number
of rows, not the cost of preparing those rows.

When changing the number of rows in the subquery with a LIMIT, I can move
from a seq scan, to a parallel seq scan, and finally to a parallel
aggregate, as the number of rows goes up.
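
The experiment looks roughly like this, keeping the ::geometry_literal
placeholder from above and using the LIMIT as the only knob (the value is
illustrative):

EXPLAIN SELECT length(ST_AsMVT(a)) FROM (
    SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true),
           gid, fed_num
    FROM pts_10 p
    WHERE p.geom && ::geometry_literal
    AND p.geom IS NOT NULL
    LIMIT 100000
    ) a;
-- as the LIMIT grows: seq scan, then parallel seq scan, then parallel
-- aggregate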

An odd effect: when I have enough rows to get a parallel seq scan, I can
flip it back to a seq scan by *increasing* the cost of ST_AsMVTGeom. That
seems odd and backwards.
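
That is, starting from a plan that uses a parallel seq scan (same
signature caveat as above; the cost value is illustrative):

ALTER FUNCTION ST_AsMVTGeom(geometry, box2d, int4, int4, bool)
    COST 10000;  -- raise the declared per-call cost further
-- re-running the EXPLAIN above now shows a plain Seq Scan again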

Is there a guide or rough description anywhere of how costs are used in
determining parallel plans? The empirical approach starts to wear one down
after a while :)

P.
