Hi,

While testing pg_restore_extended_stats(), I noticed several small issues.

1. Inconsistent expression key verification behavior

In this example, “b” is an invalid key, so it raises a warning and rejects the 
input:
```
evantest=# select pg_restore_extended_stats(
    'schemaname', 'repro’,
    'relname', 't’,
    'statistics_schemaname', 'repro’,
    'statistics_name', 's’,
    'inherited', false, 
    'exprs', '[{"b": "1"}]'::jsonb);
WARNING:  could not import element in expression -1: invalid key name
 pg_restore_extended_stats
---------------------------
 f
(1 row)
```

However, when I tried “a”, which is also an invalid key, it is silently ignored:
```
evantest=# select pg_restore_extended_stats(
    'schemaname', 'repro’,
    'relname', 't’,
    'statistics_schemaname', 'repro’,
    'statistics_name', 's’,
    'inherited', false,
    'exprs', '[{"a": "1"}]'::jsonb);
 pg_restore_extended_stats
---------------------------
 t
(1 row)
```

After debugging, I think the problem is here:
```
/*
 * Check if key is found in the list of expression argnames.
 */
static bool
key_in_expr_argnames(JsonbValue *key)
{
        Assert(key->type == jbvString);
        for (int i = 0; i < NUM_ATTRIBUTE_STATS_ELEMS; i++)
        {
                if (strncmp(extexprargname[i], key->val.string.val, 
key->val.string.len) == 0)
                        return true;
        }
        return false;
}
```

This function is actually doing a prefix comparison using the input key length, 
so as long as an input key matches a prefix of a valid key name, the function 
returns true.

At runtime, it does not lead to an incorrect value being imported, because the 
invalid key will still be filtered out later. But one bad scenario I can 
imagine is that a user is trying to set “correlation”, and accidentally omits 
the last “n”. In that case, the key is silently discarded and the user is not 
aware of it, which can lead to a surprising result. For example:
```
evantest=# select pg_restore_extended_stats(
    'schemaname', 'repro’,
    'relname', 't’,
    'statistics_schemaname', 'repro’,
    'statistics_name', 's’,
    'inherited', false,
    'exprs', '[{"n_distinct": "5", "correlatio": "-0.6"}]'::jsonb);
 pg_restore_extended_stats
---------------------------
 t
(1 row)

evantest=# select n_distinct, correlation from pg_stats_ext_exprs where 
statistics_schemaname = 'repro' and statistics_name = 's';
 n_distinct | correlation
------------+-------------
          5 |
(1 row)
```

Here, the user may think correlation has been set, but it has not, and there is 
no warning. I think this behavior should be fixed.

The fix is straightforward. If we keep using strncmp() for safety, we should 
also compare the lengths of both strings.

2. Wrong number in a warning message

I have only one expression in this example:
```
evantest=# select pg_restore_extended_stats(
evantest(#   'schemaname', 'repro',
evantest(#   'relname', 't',
evantest(#   'statistics_schemaname', 'repro',
evantest(#   'statistics_name', 's',
evantest(#   'inherited', false,
evantest(#   'exprs', '[{"n_distinct": "1"}, {"n_distinct": "2"}, 
{"n_distinct": "3"}]'::jsonb
evantest(# );
WARNING:  could not parse "exprs": incorrect number of elements (3 required)
 pg_restore_extended_stats
---------------------------
 f
(1 row)
```

The warning message says “3 required”, but it should be “1 required”.

The root cause is here:
```
        num_root_elements = JsonContainerSize(root);
        if (numexprs != num_root_elements)
        {
                ereport(WARNING,
                                errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                errmsg("could not parse \"%s\": incorrect 
number of elements (%d required)",
                                           argname, num_root_elements));
                goto exprs_error;
        }
```

In the ereport, we should use numexprs instead.

3. Inconsistent heap_freetuple()

In pg_clear_extended_stats(), heap_freetuple(tup) is only called before the 
final return. However, in two earlier paths, the function only emits warning 
messages and returns. It seems those paths should also call heap_freetuple(tup).

But from my previous experience, I know this kind of issue is usually not 
considered as serious, so I only point it out here without making a fix for now.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/




Attachment: v1-0001-Fix-prefix-matching-of-expression-stats-keys.patch
Description: Binary data

Attachment: v1-0002-Fix-required-expression-count-in-stats-import-war.patch
Description: Binary data

Reply via email to