Sent from my iPad
On 04-Sep-2013, at 15:39, Satoshi Nagayasu <sn...@uptime.jp> wrote:

> (2013/09/04 15:23), Atri Sharma wrote:
>>
>> Sent from my iPad
>>
>> On 04-Sep-2013, at 10:54, Satoshi Nagayasu <sn...@uptime.jp> wrote:
>>
>>> Hi,
>>>
>>> (2013/09/04 12:52), Atri Sharma wrote:
>>>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <sn...@uptime.jp> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm considering overhauling pgstat.stat, and would like to know how
>>>>> many people are interested in this topic.
>>>>>
>>>>> As you may know, this file can be hundreds of MB in size, because
>>>>> pgstat.stat holds all the access statistics for each database, and
>>>>> the entire file needs to be read and written frequently.
>>>>>
>>>>> As a result, pgstat.stat often generates massive I/O, particularly
>>>>> when a database holds a large number of tables.
>>>>>
>>>>> To support multi-tenancy, or just a large number of tables (up to
>>>>> 10k tables in a single database), I think pgstat.stat needs to be
>>>>> overhauled.
>>>>>
>>>>> I think using a heap and a btree in pgstat.stat would be preferable,
>>>>> to reduce reads/writes and to allow updating the access statistics
>>>>> for specific tables in the pgstat.stat file.
>>>>>
>>>>> Is this good for us?
>>>>
>>>> Hi,
>>>>
>>>> Nice thought. I/O reduction in pgstat can be really helpful.
>>>>
>>>> I am trying to think of our aim here. Would we be looking to split
>>>> pgstat per table, so that the I/O write happens for only a portion
>>>> of pgstat? Or reduce the I/O in general?
>>>
>>> I prefer the latter.
>>>
>>> Under the current implementation, the DBA needs to split a single
>>> database into many smaller databases while considering the access
>>> locality of the tables. That is difficult, and the locality could
>>> change in the future.
>>>
>>> Also, splitting the statistics data into many files (per table, for
>>> example) could cause another performance issue when collecting or
>>> showing the statistics all at once. Just my guess, though.
>>>
>>> So, I'm looking for a new way to reduce I/O for the statistics data
>>> in general.
>>>
>>> Regards,
>>>
>>>> If the latter, how would using a BTree help us? I would rather go
>>>> for a range tree or something. But again, I may be completely wrong.
>>>>
>>>> Please elaborate a bit more on the solution we are trying to
>>>> achieve. It seems really interesting.
>>>>
>>>> Regards,
>>>>
>>>> Atri
>>
>> Right, thanks.
>>
>> How would using a heap and a BTree help here? Are we looking at a
>> priority queue which supports the main storage system of the stats?
>
> For example, when you read only a single block from your table, you
> then need to write all the values in your database statistics. That
> often generates a large amount of I/O.
>
> However, if random access is allowed in the statistics, you can update
> only a single record for the specific table which you read. It would
> be less than 100 bytes for each table.
>
> I have no idea how a priority queue could work here so far. However,
> if the statistics are overhauled, PostgreSQL would be able to host a
> much larger number of customers.

Ah, now I get it. Thanks a ton for the detailed explanation.

Yes, a BTree will sufficiently isolate the per-table stats here and
allow for random access.

Another thing I can think of is having a write-back cache which could
be used to buffer updates before the actual stats write. I am just
musing here, though.

Regards,

Atri
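P.S. To make the random-access idea concrete, here is a rough sketch in
C of what updating a single fixed-size stats record in place could look
like. This is not the actual pgstat code: the struct layout and all the
names (PgStat_TableEntry, stat_index_lookup, stat_update_entry) are
hypothetical, and a real implementation would use a btree where this
sketch uses a linear scan over a toy in-memory index.

#include <stdio.h>
#include <stdint.h>

typedef uint32_t Oid;

/* One per-table counter record: fixed size, well under 100 bytes. */
typedef struct PgStat_TableEntry
{
    Oid      tableid;
    uint64_t numscans;
    uint64_t tuples_returned;
    uint64_t tuples_fetched;
    uint64_t blocks_fetched;
    uint64_t blocks_hit;
} PgStat_TableEntry;

/* Toy index mapping table OID -> record slot in the stats file. */
typedef struct StatIndexEntry
{
    Oid  tableid;
    long slot;              /* record number within the file */
} StatIndexEntry;

static long
stat_index_lookup(const StatIndexEntry *idx, int n, Oid tableid)
{
    /* A btree keyed by table OID would replace this linear scan. */
    for (int i = 0; i < n; i++)
        if (idx[i].tableid == tableid)
            return idx[i].slot;
    return -1;
}

/*
 * Overwrite just the one record: a seek plus a write of
 * sizeof(PgStat_TableEntry) bytes, instead of rewriting the
 * whole stats file.
 */
static int
stat_update_entry(FILE *fp, long slot, const PgStat_TableEntry *entry)
{
    if (fseek(fp, slot * (long) sizeof(PgStat_TableEntry), SEEK_SET) != 0)
        return -1;
    if (fwrite(entry, sizeof(PgStat_TableEntry), 1, fp) != 1)
        return -1;
    return 0;
}

The point is that the write becomes a seek plus one record, no matter
how many tables the database holds.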
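The write-back cache I was musing about could then sit in front of that
(again purely hypothetical, reusing PgStat_TableEntry and
stat_update_entry from the sketch above): a small direct-mapped array of
dirty entries that a periodic flush writes back, so repeated updates to
a hot table cost one file write instead of many.

#define STAT_CACHE_SIZE 128

typedef struct StatCacheSlot
{
    PgStat_TableEntry entry;
    long              slot;     /* record number in the file */
    int               dirty;
} StatCacheSlot;

static StatCacheSlot stat_cache[STAT_CACHE_SIZE];

/* Buffer an update in memory; usually no file I/O here. */
static void
stat_cache_update(FILE *fp, long slot, const PgStat_TableEntry *entry)
{
    StatCacheSlot *c = &stat_cache[slot % STAT_CACHE_SIZE];

    /* On a collision, write back the dirty victim before reuse. */
    if (c->dirty && c->slot != slot)
        (void) stat_update_entry(fp, c->slot, &c->entry);

    c->entry = *entry;
    c->slot = slot;
    c->dirty = 1;
}

/* Called periodically: write back only what actually changed. */
static void
stat_cache_flush(FILE *fp)
{
    for (int i = 0; i < STAT_CACHE_SIZE; i++)
    {
        if (stat_cache[i].dirty)
        {
            (void) stat_update_entry(fp, stat_cache[i].slot,
                                     &stat_cache[i].entry);
            stat_cache[i].dirty = 0;
        }
    }
}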