Hello
We are developing an application and would like to compute usage statistics on it, in order:
- to better understand which parts of the application are used most, so that we can model our load-test scenarios accurately;
- to provide other departments with information on how the application is used.

The problem is that our application is currently read-mostly, while statistics logging is a write-mostly process, and the collection will generate a huge volume of data (a very fine granularity is mandatory). We would like to avoid, as much as possible, any interference of the stats collection with the main application.

We have looked through the Postgres documentation and found several ideas:
- We have decided to isolate the stats in a dedicated schema.
- We have looked at partitioning (table inheritance plus rewrite rules; we had been calling this polymorphism) in order to split our stats tables into smaller ones that we can "detach" when they get old; see the sketch after this list.
- We have looked at fsync tuning, or better, at asynchronous commit, since these data are not critical.
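To make the partitioning idea concrete, here is roughly the layout we have in mind; the stats schema, the page_hits table and the monthly ranges are only examples:

    CREATE SCHEMA stats;

    -- Parent table: the application only ever inserts into this one.
    CREATE TABLE stats.page_hits (
        hit_time timestamptz NOT NULL,
        page     text        NOT NULL,
        user_id  integer
    );

    -- One child table per month, detachable once it gets old.
    CREATE TABLE stats.page_hits_2011_03 (
        CHECK (hit_time >= '2011-03-01' AND hit_time < '2011-04-01')
    ) INHERITS (stats.page_hits);

    -- Redirect inserts for the current month into its child table.
    CREATE RULE page_hits_2011_03_ins AS
        ON INSERT TO stats.page_hits
        WHERE NEW.hit_time >= '2011-03-01' AND NEW.hit_time < '2011-04-01'
        DO INSTEAD
            INSERT INTO stats.page_hits_2011_03 VALUES (NEW.*);

    -- "Detaching" an old month then becomes:
    --   ALTER TABLE stats.page_hits_2011_03 NO INHERIT stats.page_hits;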

But we have been facing several questions/problems:

Partitioning and ORM question:
- First, as we access PG through an ORM tool (Hibernate), the INSERT rewritten by our partitioning rules reported 0 rows affected, because the command status returned to the client generally came from a rule other than the one that actually performed the insert. In our case we know that only a single rule will ever match, so we worked around it by prefixing the name of the active rule with "zzz" so that it sorts last. That is very hacky, but at least Hibernate is happy with it.
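For the record, here is the shape of the hack. As far as we understand, rules are applied in alphabetical order of their names and the row count returned to the client comes from the last rewritten query, so the rule routing into the current partition gets the "zzz" prefix (names are examples again):

    -- The active rule sorts after every other rule name on the
    -- table, so its row count is the one Hibernate sees.
    CREATE RULE zzz_page_hits_current_ins AS
        ON INSERT TO stats.page_hits
        WHERE NEW.hit_time >= '2011-03-01' AND NEW.hit_time < '2011-04-01'
        DO INSTEAD
            INSERT INTO stats.page_hits_2011_03 VALUES (NEW.*);

    -- At month rollover we drop it, recreate it under a normal
    -- name, and create a new zzz_* rule for the new child table:
    --   DROP RULE zzz_page_hits_current_ins ON stats.page_hits;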

One or several databases, one or several servers?
- In such a case, could we store both our application content and the stats in the same database? Would it be better to use two databases in the same cluster, or even two dedicated servers?
- If we want to turn fsync off for the stats, I suppose we need a separate server, since fsync applies to a whole cluster. I have read that asynchronous commit can be set for a transaction. Is there a way to say that a given database, role or set of tables is in asynchronous commit by default, perhaps with triggers? See the sketch below.
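Here is what we have found so far (names are hypothetical; as far as we can tell there is no per-table setting):

    -- Per transaction: only this transaction commits asynchronously.
    BEGIN;
    SET LOCAL synchronous_commit TO off;
    INSERT INTO stats.page_hits VALUES (now(), '/home', 42);
    COMMIT;

    -- Default for a dedicated stats role:
    ALTER ROLE stats_writer SET synchronous_commit = off;

    -- Default for a whole database:
    ALTER DATABASE statsdb SET synchronous_commit = off;

If we understand the documentation correctly, asynchronous commit, unlike fsync = off, only risks losing the most recently committed transactions after a crash rather than corrupting the cluster, so it might spare us a separate server.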

We would like, in any case, to archive the old collected data on slow file storage, and to avoid the database growing to terabytes for data-collection concerns alone. Maybe this is a bad idea; if it is not, we again have questions. With partitioning we can dump (and drop) child tables regularly, but partitioning has turned out to be a bit complex, so we are also studying simpler designs with fewer but larger stats tables. We looked at a plain pg_dump with --data-only, but we would need to dump only part of a table, so we turned to the COPY command, which seems interesting in our case. Is there any experience or feedback on that command? Concretely, we were thinking of something like the sketch below.
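(Names and paths are examples; COPY from a subquery requires PostgreSQL 8.2 or later.)

    -- Server side: the file is written by the server process.
    COPY (SELECT * FROM stats.page_hits
          WHERE hit_time >= '2011-03-01'
            AND hit_time <  '2011-04-01')
    TO '/archive/page_hits_2011_03.csv' WITH CSV;

    -- Client side, via psql (the file lands where psql runs):
    --   \copy (SELECT * FROM stats.page_hits WHERE ...) TO 'page_hits.csv' CSV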

Sorry for the many questions; our problem is a bit broad, because there are several concerns:
- Partitioning or not
- One or several DB clusters or servers
- fsync / asynchronous commit
- Rule limitations
- Use of COPY
But to sum up: we would like to collect usage statistics (write-mostly tables, high-volume generation, non-critical data) about an application running on a read-mostly DB, with the least impact on that DB's performance. And we would also like to be able to archive the old collected data outside the DB.

Thanks for any help!

Pascal

