Hello
We are developing an application and would like to compute usage statistics on it, in order:
- to better understand which parts of the application are used most, so that we can model our load-test scenarios accurately;
- to provide other departments with information on how the application is used.

The problem is that our application is currently read-mostly, while statistics logging is a write-mostly process, and the collection will generate a huge volume of data (a very fine granularity is mandatory). We would like to avoid, as much as possible, any interference of the stats collection with the main application.

We have looked through the Postgres documentation and found several ideas:
- We have decided to isolate the stats in a dedicated schema.
- We have looked at partitioning (table inheritance plus rewrite rules; we had been calling this polymorphism) in order to split our stats tables into smaller ones that we can "detach" when they get old; see the sketch after this list.
- We have looked at fsync tuning, or better, at asynchronous commit, since these data are not critical.
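To make the partitioning idea concrete, here is roughly the layout we have in mind; the stats schema, the page_hits table and the monthly ranges are only examples:

    CREATE SCHEMA stats;

    -- Parent table: the application only ever inserts into this one.
    CREATE TABLE stats.page_hits (
        hit_time timestamptz NOT NULL,
        page     text        NOT NULL,
        user_id  integer
    );

    -- One child table per month, detachable once it gets old.
    CREATE TABLE stats.page_hits_2011_03 (
        CHECK (hit_time >= '2011-03-01' AND hit_time < '2011-04-01')
    ) INHERITS (stats.page_hits);

    -- Redirect inserts for the current month into its child table.
    CREATE RULE page_hits_2011_03_ins AS
        ON INSERT TO stats.page_hits
        WHERE NEW.hit_time >= '2011-03-01' AND NEW.hit_time < '2011-04-01'
        DO INSTEAD
            INSERT INTO stats.page_hits_2011_03 VALUES (NEW.*);

    -- "Detaching" an old month then becomes:
    --   ALTER TABLE stats.page_hits_2011_03 NO INHERIT stats.page_hits;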

But we have been facing several questions/problems:

Partitioning and ORM question:
- First, as we access PG through an ORM tool (Hibernate), the INSERT rewritten by our partitioning rules reported 0 rows affected, because the command status returned to the client generally came from a rule other than the one that actually performed the insert. In our case we know that only a single rule will ever match, so we worked around it by prefixing the name of the active rule with "zzz" so that it sorts last. That is very hacky, but at least Hibernate is happy with it.
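For the record, here is the shape of the hack. As far as we understand, rules are applied in alphabetical order of their names and the row count returned to the client comes from the last rewritten query, so the rule routing into the current partition gets the "zzz" prefix (names are examples again):

    -- The active rule sorts after every other rule name on the
    -- table, so its row count is the one Hibernate sees.
    CREATE RULE zzz_page_hits_current_ins AS
        ON INSERT TO stats.page_hits
        WHERE NEW.hit_time >= '2011-03-01' AND NEW.hit_time < '2011-04-01'
        DO INSTEAD
            INSERT INTO stats.page_hits_2011_03 VALUES (NEW.*);

    -- At month rollover we drop it, recreate it under a normal
    -- name, and create a new zzz_* rule for the new child table:
    --   DROP RULE zzz_page_hits_current_ins ON stats.page_hits;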

One or several databases, one or several servers?
- In such a case, could we store both our application content and the stats in the same database? Would it be better to use two databases in the same cluster, or even two dedicated servers?
- If we want to turn fsync off for the stats, I suppose we need a separate server, since fsync applies to a whole cluster. I have read that asynchronous commit can be set for a transaction. Is there a way to say that a given database, role or set of tables is in asynchronous commit by default, perhaps with triggers? See the sketch below.
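Here is what we have found so far (names are hypothetical; as far as we can tell there is no per-table setting):

    -- Per transaction: only this transaction commits asynchronously.
    BEGIN;
    SET LOCAL synchronous_commit TO off;
    INSERT INTO stats.page_hits VALUES (now(), '/home', 42);
    COMMIT;

    -- Default for a dedicated stats role:
    ALTER ROLE stats_writer SET synchronous_commit = off;

    -- Default for a whole database:
    ALTER DATABASE statsdb SET synchronous_commit = off;

If we understand the documentation correctly, asynchronous commit, unlike fsync = off, only risks losing the most recently committed transactions after a crash rather than corrupting the cluster, so it might spare us a separate server.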

We would like, in any case, to archive the old collected data on slow file storage, and to avoid the database growing to terabytes for data-collection concerns alone. Maybe this is a bad idea; if it is not, we again have questions. With partitioning we can dump (and drop) child tables regularly, but partitioning has turned out to be a bit complex, so we are also studying simpler designs with fewer but larger stats tables. We looked at a plain pg_dump with --data-only, but we would need to dump only part of a table, so we turned to the COPY command, which seems interesting in our case. Is there any experience or feedback on that command? Concretely, we were thinking of something like the sketch below.
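(Names and paths are examples; COPY from a subquery requires PostgreSQL 8.2 or later.)

    -- Server side: the file is written by the server process.
    COPY (SELECT * FROM stats.page_hits
          WHERE hit_time >= '2011-03-01'
            AND hit_time <  '2011-04-01')
    TO '/archive/page_hits_2011_03.csv' WITH CSV;

    -- Client side, via psql (the file lands where psql runs):
    --   \copy (SELECT * FROM stats.page_hits WHERE ...) TO 'page_hits.csv' CSV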

Sorry for the many questions; our problem is a bit broad, because there are several concerns:
- Partitioning or not
- One or several DB clusters or servers
- fsync / asynchronous commit
- Rule limitations
- Use of COPY
But to sum up: we would like to collect usage statistics (write-mostly tables, high-volume generation, non-critical data) about an application running on a read-mostly DB, with the least impact on that DB's performance. And we would also like to be able to archive the old collected data outside the DB.

Thanks for any help!

Pascal

