All,

As we try and make PostGIS more "parallel sensitive" we have been added
costs to our functions, so that their relative CPU cost is more accurately
reflected in parallel plans.

This has resulted in an odd side effect: some of our "wrapper" functions
stop giving index scans in plans [1]. This is a problem!

An example of a "wrapper" is ST_Intersects(geom1, geom2). It combines an
index operation (geom1 && geom2) with an exact spatial test
(_ST_Intersects(geom1, geom2). This is primarily for user convenience, and
has worked for us well for a decade and more. Having this construct stop
working is definitely a problem.

As we add costs to our functions, the odds increase that one of the
parameters to a wrapper might be a costed function. It's not uncommon to
see:

ST_Interects(geom, ST_SetSRID('POLYGON(...)', 4326))

It's fair to say that we really do depend on our wrappers getting inlined
basically all the time. They are simple functions, they do nothing other
than 'SELECT func1() AND func2() AND arg1 && arg2'.

However, once costs are added to the parameters, the inlining can be turned
off relatively quickly. Here's a PgSQL native example:

    -- Create data table and index. Analyze.
    DROP TABLE IF EXISTS boxen;
    CREATE TABLE boxen AS
        SELECT row_number() OVER() As gid,
        box(point(x, y),point(x+1, y+1)) AS b, x, y
        FROM generate_series(-100,100) As y, generate_series(-100,100) As x;
    CREATE INDEX idx_b_geom_gist ON boxen USING gist(b);
    ANALYZE boxen;

    -- An inlined function
    -- When set 'STRICT' it breaks index access
    -- However 'IMMUTABLE' doesn't seem to bother it
    CREATE OR REPLACE FUNCTION good_box(box, box)
        RETURNS boolean
        AS 'SELECT $1 OPERATOR(&&) $2 AND length(lseg(point($1),point($2)))
< 3'
        LANGUAGE 'sql';

    -- Start with a low cost circle()
    ALTER FUNCTION circle(point, double precision) COST 1;

    -- [A] Query plan hits index
    EXPLAIN SELECT gid
    FROM boxen
    WHERE good_box(
        boxen.b,
        box(circle(point(20.5, 20.5), 2))
        );

    -- [B] Query plan hits index
    EXPLAIN SELECT gid
    FROM boxen,
        (SELECT x, y FROM boxen WHERE x < 0 and y < 0) AS c
    WHERE good_box(
        boxen.b,
        box(circle(point(c.x, c.y), 2))
        );

    -- Increase cost of circle
    ALTER FUNCTION circle(point, double precision) COST 100;

    -- [B] Query plan does not hit index!
    EXPLAIN SELECT gid
    FROM boxen,
        (SELECT x, y FROM boxen WHERE x < 0 and y < 0) AS c
    WHERE good_box(
        boxen.b,
        box(circle(point(c.x, c.y), 2))
        );

The inlining is getting tossed out on a test of how expensive the function
parameters are [2]. As a result, we lose what is really the correct plan,
and get a sequence scan instead of an index scan.

The test of parameter cost seems quite old (15+ years) and perhaps didn't
anticipate highly variable individual function costs (or maybe it did). As
it stands though, PostGIS is currently stuck choosing between having costs
on our functions or having our inlined wrappers, because we cannot have
both at the same time.

Any thoughts?

Thanks!

P.


[1] https://trac.osgeo.org/postgis/ticket/3675#comment:18
[2]
https://github.com/postgres/postgres/blob/ae20b23a9e7029f31ee902da08a464d968319f56/src/backend/optimizer/util/clauses.c#L4581-L4584

Reply via email to