2013/9/4 Atri Sharma <atri.j...@gmail.com>
>
> Sent from my iPad
>
> On 04-Sep-2013, at 15:39, Satoshi Nagayasu <sn...@uptime.jp> wrote:
>
> > (2013/09/04 15:23), Atri Sharma wrote:
> >>
> >> Sent from my iPad
> >>
> >> On 04-Sep-2013, at 10:54, Satoshi Nagayasu <sn...@uptime.jp> wrote:
> >>
> >>> Hi,
> >>>
> >>> (2013/09/04 12:52), Atri Sharma wrote:
> >>>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <sn...@uptime.jp> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I'm considering overhauling pgstat.stat, and would like to know how many
> >>>>> people are interested in this topic.
> >>>>>
> >>>>> As you may know, this file can be hundreds of MB in size, because
> >>>>> pgstat.stat holds all the access statistics for each database, and the
> >>>>> entire pgstat.stat needs to be read and written frequently.
> >>>>>
> >>>>> As a result, pgstat.stat often generates massive I/O,
> >>>>> particularly when there is a large number of tables in the database.
> >>>>>
> >>>>> To support multi-tenancy, or just a large number of tables (up to 10k
> >>>>> tables in a single database), I think pgstat.stat needs to be overhauled.
> >>>>>
> >>>>> I think using a heap and btree in pgstat.stat would be preferable, to
> >>>>> reduce reads/writes and to allow updating the access statistics of
> >>>>> specific tables in the pgstat.stat file.
> >>>>>
> >>>>> Is this good for us?
> >>>>
> >>>> Hi,
> >>>>
> >>>> Nice thought. I/O reduction in pgstat can be really helpful.
> >>>>
> >>>> I am trying to think of our aim here. Would we be looking to split
> >>>> pgstat per table, so that the I/O write happens for only a portion of
> >>>> pgstat? Or reduce the I/O in general?
> >>>
> >>> I prefer the latter.
> >>>
> >>> Under the current implementation, DBAs need to split a single database
> >>> into many smaller databases, taking the access locality of the tables
> >>> into account. That is difficult, and the locality could change in the
> >>> future.
> >>>
> >>> And splitting the statistics data into many files (per table,
> >>> for example) would cause another performance issue when
> >>> collecting/showing all the statistics at once. Just my guess, though.
> >>>
> >>> So, I'm looking for a new way to reduce I/O for the statistics data
> >>> in general.
> >>>
> >>> Regards,
> >>>
> >>>>
> >>>> If the latter, how would using a BTree help us? I would rather go for a
> >>>> range tree or something. But again, I may be completely wrong.
> >>>>
> >>>> Please elaborate a bit more on the solution we are trying to
> >>>> achieve. It seems really interesting.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Atri
> >>
> >> Right, thanks.
> >>
> >> How would using a heap and BTree help here? Are we looking at a priority
> >> queue which supports the main storage system of the stats?
> >
> > For example, when you read only a single block from your table,
> > you then need to write all the values in your database statistics.
> > That often generates a large amount of I/O.
> >
> > However, if random access is allowed in the statistics file, you can
> > update only a single record for the specific table which you read.
> > That would be less than 100 bytes for each table.
> >
> > I have no idea how a priority queue could work here so far.
> > However, if the statistics file is overhauled, PostgreSQL would be able
> > to host a much larger number of customers.
>
> Ah, now I get it. Thanks a ton for the detailed explanation.
>
> Yes, a BTree would sufficiently isolate per-table stats here and allow for
> random access.
>
> Another thing I can think of is having a write-back cache which could
> probably be used as a buffer before the actual stats write. I am just
> musing here, though.
>
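The single-record update Satoshi describes can be sketched in a few lines. This is only an illustration with a made-up fixed-size record layout, not PostgreSQL's actual pgstat code: the current scheme rewrites every record on each flush, while a random-access scheme seeks to one table's slot and rewrites only its ~12 bytes.

```python
import os
import struct

# Hypothetical fixed-size per-table counter record: (table_oid, seq_scans, tup_read)
RECORD = struct.Struct("<III")

def write_all(path, stats):
    """Current scheme (simplified): rewrite every record on each flush."""
    with open(path, "wb") as f:
        for oid in sorted(stats):
            f.write(RECORD.pack(oid, *stats[oid]))

def update_one(path, slot, oid, counters):
    """Random-access scheme: seek to the record's slot and rewrite one record."""
    with open(path, "r+b") as f:
        f.seek(slot * RECORD.size)
        f.write(RECORD.pack(oid, *counters))

def read_all(path):
    """Read the whole file back into a dict keyed by table oid."""
    out = {}
    with open(path, "rb") as f:
        while chunk := f.read(RECORD.size):
            oid, scans, tups = RECORD.unpack(chunk)
            out[oid] = (scans, tups)
    return out
```

With 10k tables, `write_all` touches the whole file on every flush, whereas `update_one` touches a single record, which is the I/O reduction being discussed. A real design would still need an index (the proposed btree) to find a table's slot without assuming a fixed sorted layout.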
We very successfully use a tmpfs volume for the pgstat files (using a
backport of the multiple stats files feature from 9.3 to 9.1).

Regards

Pavel

> Regards,
>
> Atri
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
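Pavel's tmpfs approach can be set up along these lines on Linux. The mount point, size, and user are assumptions for a typical install; `stats_temp_directory` is the GUC (present since 8.4) that relocates the statistics collector's temporary files, and it only moves the working copy, not the permanent one saved at shutdown.

```shell
# Create and mount a small tmpfs for the stats temp files
# (paths and size are examples; adjust to your layout).
sudo mkdir -p /var/run/pgstat
sudo mount -t tmpfs -o size=256M,mode=0700 tmpfs /var/run/pgstat
sudo chown postgres:postgres /var/run/pgstat

# In postgresql.conf:
#   stats_temp_directory = '/var/run/pgstat'

# The setting is reloadable; no restart is needed:
sudo -u postgres pg_ctl reload -D "$PGDATA"
```

On 9.3, where the stats file is already split per database, only the files for databases with activity get rewritten, which combines well with keeping them on tmpfs; that split is what the backport mentioned above brings to 9.1.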