Re: [GENERAL] Up-to-date reports database

2012-05-24 Thread Sergey Konoplev
On Wed, May 23, 2012 at 5:11 PM, Herouth Maoz  wrote:
> A replication solution is not very good, either, because of course I can't 
> define indexes differently, I don't want *all* transactions in all tables to 
> be sent, and also, because I may want to cross reference data from different 
> systems. So ideally, I want to have a reporting database, where specific 
> tables (or maybe even just specific columns) from various databases are 
> collected, and have a reporting tool connect to this database. But I want to 
> push the data into into that database as close to real time as possible.

Look at PgQ from Skytools. You can queue your OLTP data changes and
restore specific columns only into your OLAP database.

>
> The most important data I am currently considering are two tables which have 
> an average of 7,600 transactions per hour (standard deviation 10,000, maximum 
> in May is 62,000 transactions per hour). There may be similar pairs of tables 
> collected from more than one database.
>
> I assume this is not an uncommon scenario. What solutions would you recommend?
>
>
> Herouth
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



-- 
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +79160686204

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Up-to-date reports database

2012-05-23 Thread Herouth Maoz

On 23/05/2012, at 18:54, Bartosz Dmytrak wrote:

> hi,
> my suggestion is to redesign reporting database to fit reporting specifics 
> (e.g. brake normal form of database, in some cases this will speed up 
> reports). Than you can use some ETL tool to sync production and reporting. 
> Good thing is to use some OLAP software to use multidimensional analyze - 
> this will make queries easier (with MDX language). I think this kind of 
> discussion is huge one :)
> 
> there are some opensource ETL and BI suits available.

Thanks, I'll take that into consideration.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Up-to-date reports database

2012-05-23 Thread Bartosz Dmytrak
hi,
my suggestion is to redesign reporting database to fit reporting specifics
(e.g. brake normal form of database, in some cases this will speed up
reports). Than you can use some ETL tool to sync production and reporting.
Good thing is to use some OLAP software to use multidimensional analyze -
this will make queries easier (with MDX language). I think this kind of
discussion is huge one :)

there are some opensource ETL and BI suits available.

Regards,
Bartek


Re: [GENERAL] Up-to-date reports database

2012-05-23 Thread Herouth Maoz

On 23/05/2012, at 17:20, Chris Ernst wrote:

> I would have a look at slony.  It is a trigger based replication system
> that allows you to replicate only the tables you define and you can have
> different indexing on the slave.  The only requirement is that each
> table you want to replicate has the same primary key or unique
> constraint on the master and slave.  Other than that, you can index the
> tables on the slave however you want.

Thanks for the pointer. I will read up about it.

Herouth

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Up-to-date reports database

2012-05-23 Thread Chris Ernst
On 05/23/2012 07:11 AM, Herouth Maoz wrote:
> Hi guys,
> 
> I'm interested in a solution that will allow our customers to run reports - 
> which may involve complicated queries - on data which is as up-to-date as 
> possible.
> 
> One thing I don't want to do is to let the reporting system connect to the 
> production database. I want the indexes in production to be limited to what 
> production needs, and not add indexes that are required for reports, for 
> instance. And basically, I don't want a customer to run a complicated report 
> and degrade the performance of my production system.
> 
> A replication solution is not very good, either, because of course I can't 
> define indexes differently, I don't want *all* transactions in all tables to 
> be sent, and also, because I may want to cross reference data from different 
> systems. So ideally, I want to have a reporting database, where specific 
> tables (or maybe even just specific columns) from various databases are 
> collected, and have a reporting tool connect to this database. But I want to 
> push the data into into that database as close to real time as possible.
> 
> The most important data I am currently considering are two tables which have 
> an average of 7,600 transactions per hour (standard deviation 10,000, maximum 
> in May is 62,000 transactions per hour). There may be similar pairs of tables 
> collected from more than one database.
> 
> I assume this is not an uncommon scenario. What solutions would you recommend?

Hi Herouth,

I would have a look at slony.  It is a trigger based replication system
that allows you to replicate only the tables you define and you can have
different indexing on the slave.  The only requirement is that each
table you want to replicate has the same primary key or unique
constraint on the master and slave.  Other than that, you can index the
tables on the slave however you want.

Cheers!

- Chris

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Up-to-date reports database

2012-05-23 Thread Herouth Maoz
Hi guys,

I'm interested in a solution that will allow our customers to run reports - 
which may involve complicated queries - on data which is as up-to-date as 
possible.

One thing I don't want to do is to let the reporting system connect to the 
production database. I want the indexes in production to be limited to what 
production needs, and not add indexes that are required for reports, for 
instance. And basically, I don't want a customer to run a complicated report 
and degrade the performance of my production system.

A replication solution is not very good, either, because of course I can't 
define indexes differently, I don't want *all* transactions in all tables to be 
sent, and also, because I may want to cross reference data from different 
systems. So ideally, I want to have a reporting database, where specific tables 
(or maybe even just specific columns) from various databases are collected, and 
have a reporting tool connect to this database. But I want to push the data 
into into that database as close to real time as possible.

The most important data I am currently considering are two tables which have an 
average of 7,600 transactions per hour (standard deviation 10,000, maximum in 
May is 62,000 transactions per hour). There may be similar pairs of tables 
collected from more than one database.

I assume this is not an uncommon scenario. What solutions would you recommend?


Herouth
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general