On 02.11.2019 8:30, Pavel Stehule wrote:
pá 1. 11. 2019 v 17:09 odesílatel Konstantin Knizhnik
<k.knizh...@postgrespro.ru <mailto:k.knizh...@postgrespro.ru>> napsal:
On 01.11.2019 18:26, Robert Haas wrote:
> On Fri, Nov 1, 2019 at 11:15 AM Konstantin Knizhnik
> <k.knizh...@postgrespro.ru <mailto:k.knizh...@postgrespro.ru>>
wrote:
>> It seems to me that I have found quite elegant solution for
per-backend statistic for GTT: I just inserting it in backend's
catalog cache, but not in pg_statistic table itself.
>> To do it I have to add InsertSysCache/InsertCatCache functions
which insert pinned entry in the correspondent cache.
>> I wonder if there are some pitfalls of such approach?
> That sounds pretty hackish. You'd have to be very careful, for
> example, that if the tables were dropped or re-analyzed, all of the
> old entries got removed --
I have checked it:
- when table is reanalyzed, then cache entries are replaced.
- when table is dropped, then cache entries are removed.
> and then it would still fail if any code
> tried to access the statistics directly from the table, rather than
> via the caches. My assumption is that the statistics ought to be
> stored in some backend-private data structure designed for that
> purpose, and that the code that needs the data should be taught to
> look for it there when the table is a GTT.
Yes, if you do "select * from pg_statistic" then you will not see
statistic for GTT in this case.
But I do not think that it is so critical. I do not believe that
anybody
is trying to manually interpret values in this table.
And optimizer is retrieving statistic through sys-cache mechanism
and so
is able to build correct plan in this case.
Years ago, when I though about it, I wrote patch with similar design.
It's working, but surely it's ugly.
I have another idea. Can be pg_statistics view instead a table?
Some like
SELECT * FROM pg_catalog.pg_statistics_rel
UNION ALL
SELECT * FROM pg_catalog.pg_statistics_gtt();
And pg_catalog.pg_statistics_gtt() is set returning functions?
I afraid that it is not acceptable solution from performance point of
view: pg_statictic table is accessed by keys (<relid>,<attpos>,<inh>)
If it can not be done using index scan, then it can cause significant
performance slow down.
Internally - when stat cache is filled, then there can be used
pg_statistics_rel and pg_statistics_gtt() directly. What I remember,
there was not possibility to work with queries, only with just relations.
Or crazy idea - today we can implement own types of heaps. Is possible
to create engine where result can be combination of some shared data
and local data. So union will be implemented on heap level.
This implementation can be simple, just scanning pages from shared
buffers and from local buffers. For these tables we don't need complex
metadata. It's crazy idea, and I think so union with table function
should be best.
Frankly speaking, implementing special heap access method for
pg_statistic just to handle case of global temp tables seems to be overkill
from my point of view. It requires a lot coding (or at least copying a
lot of code from heapam). Also, as I wrote above, we need also index for
efficient lookup of statistic.