On 20.02.23 15:03, Peter Eisentraut wrote:
On 15.02.23 19:04, Alvaro Herrera wrote:
That said, I think most of this code is invoked for DDL, where
performance is not so critical; probably just fixing
get_object_property_data to not be so naïve would suffice.

Ok, I'll look into that.

I did a variety of performance testing on this now.

I wrote a C function that calls the "is visible" functions in a tight loop:

Datum
pg_test_visible(PG_FUNCTION_ARGS)
{
    int32 count = PG_GETARG_INT32(0);
    Oid relid = PG_GETARG_OID(1);
    Oid typid = PG_GETARG_OID(2);

    for (int i = 0; i < count; i++)
    {
        RelationIsVisible(relid);
        TypeIsVisible(typid);
        //ObjectIsVisible(RelationRelationId, relid);
        //ObjectIsVisible(TypeRelationId, typid);
    }

    PG_RETURN_VOID();
}

(It's calling two different ones to defeat the caching in get_object_property_data().)

Here are some run times:

unpatched:

select pg_test_visible(100_000_000, 'pg_class', 'int4');
Time: 4536.747 ms (00:04.537)

select pg_test_visible(100_000_000, 'tenk1', 'widget');
Time: 10828.802 ms (00:10.829)

(Note that the "is visible" functions special case system catalogs.)

patched:

select pg_test_visible(100_000_000, 'pg_class', 'int4');
Time: 11409.948 ms (00:11.410)

select pg_test_visible(100_000_000, 'tenk1', 'widget');
Time: 18649.496 ms (00:18.649)

So, it's slower, but it's not clear whether it matters in practice, considering this test.

I also wondered if this is visible through a normal external function call, so I tried

do $$ begin perform pg_get_statisticsobjdef(28999) from generate_series(1, 1_000_000); end $$;

(where that is the OID of the first object from select * from pg_statistic_ext; in the regression database).

unpatched:

Time: 6952.259 ms (00:06.952)

patched (first patch only):

Time: 6993.655 ms (00:06.994)

patched (both patches):

Time: 7114.290 ms (00:07.114)

So there is some visible impact, but again, the test isn't realistic.

Then I tried a few ways to make get_object_property_data() faster. I tried building a class_id+index cache that is qsort'ed (once) and then bsearch'ed, that helped only minimally, not enough to make up the difference. I also tried just searching the class_id+index cache linearly, hoping maybe that if the cache is smaller it would be more efficient to access, but that actually made things (minimally) worse, probably because of the indirection. So it might be hard to get much more out of this. I also thought about PerfectHash, but I didn't code that up yet.

Another way would be to not use get_object_property_data() at all but write a "common" function that we pass in all it needs hardcodedly, like

bool
RelationIsVisible(Oid relid)
{
    return IsVisible_common(RELOID,
                            Anum_pg_class_relname
                            Anum_pg_class_relnamespace);
}

This would still save a lot of duplicate code.

But again, I don't think the micro-performance really matters here.



Reply via email to