Re: [GENERAL] Postgres or Greenplum

2011-06-08 Thread Leonardo Francalanci
On 07/06/2011 23.52, Tom Lane wrote:
> Very fast on a very narrow set of use cases ...

Can you explain a little (if possible)?

Thank you

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres or Greenplum

2011-06-08 Thread Gabriele Bartolini

Hi Radoslaw,

On Wed, 08 Jun 2011 07:30:33 +0200, Radosław Smogura
rsmog...@softperience.eu wrote:

> But I think Greenplum is shared-nothing, isn't it?


Yes, indeed. Put very simply, Greenplum is a parallel-processing 
database solution that implements a shared-nothing architecture. One 
master server manages data distribution and query processing across 
several segments, which usually reside on multiple servers that share 
no physical resource other than the network. The shared-nothing 
architecture allows the system to scale linearly by adding commodity 
hardware to the cluster.
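
As a rough illustration of the shared-nothing idea described above, the 
following Python sketch hashes rows across independent segments and lets 
a master combine per-segment partial results. It is a toy model, not 
Greenplum's implementation: the Segment and Master classes, the 
CRC32-based placement, and the sample data are all assumptions made for 
the example.

```python
import zlib


class Segment:
    """One independent node: it owns its rows and shares nothing with peers."""

    def __init__(self, name):
        self.name = name
        self.rows = []

    def local_sum(self, column):
        # Each segment aggregates only the rows it stores locally.
        return sum(row[column] for row in self.rows)


class Master:
    """Hypothetical coordinator: routes rows and merges partial results."""

    def __init__(self, segment_count):
        self.segments = [Segment(f"seg{i}") for i in range(segment_count)]

    def insert(self, row, key):
        # Deterministic hash of the distribution key picks the owning segment.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        self.segments[digest % len(self.segments)].rows.append(row)

    def total(self, column):
        # Scatter the aggregate to every segment, then gather the partial sums.
        return sum(segment.local_sum(column) for segment in self.segments)


master = Master(segment_count=4)
for i in range(1000):
    master.insert({"id": i, "amount": i % 10}, key="id")

print(master.total("amount"))
print([len(segment.rows) for segment in master.segments])
```

In this model, adding a segment adds both storage and scan capacity, 
which is the intuition behind scaling out with commodity hardware. A 
real system also has to redistribute rows between segments for joins on 
other keys, which the sketch leaves out.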


There is a special version of Greenplum (Greenplum Community Edition) 
that can be used for testing and development (the license allows usage 
in production environments as well under certain circumstances - for 
instance research). Greenplum is owned by EMC and is not open-source.


Greenplum is typically used for data warehousing, as the main 
source for business intelligence and analytics queries. If you are 
interested in topics like this, I suggest that you take part in 
Char(11) (www.char11.org), the second edition of the Clustering, High 
Availability and Replication conference.


Cheers,
Gabriele

--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it - www.2ndQuadrant.it



Re: [GENERAL] Postgres or Greenplum

2011-06-07 Thread Tom Lane
Simon Windsor simon.wind...@cornfield.me.uk writes:
> I have been using Postgres for many years and have recently discovered
> Greenplum, which appears to be a heavily modified, Postgres-based, multi-node
> DB that is VERY fast.

Very fast on a very narrow set of use cases ...

regards, tom lane



Re: [GENERAL] Postgres or Greenplum

2011-06-07 Thread Simon Riggs
On Tue, Jun 7, 2011 at 10:26 PM, Simon Windsor
simon.wind...@cornfield.me.uk wrote:

> I have been using Postgres for many years and have recently discovered
> Greenplum, which appears to be a heavily modified, Postgres-based, multi-node
> DB that is VERY fast.
>
> All the tests that I have seen suggest that Greenplum, when implemented on a
> single server like Postgres but with several separate installations, can
> be many times faster than Postgres. This is achieved by using multiple
> DBs to store the data and using multiple logger and writer processes to
> fully use all the resources of the server.
>
> Has the Postgres development team ever considered using this technique to
> split the data into separate sequential files that can be accessed by
> multiple writer/reader processes? If so, what was the conclusion?
>
> Finally, thanks for all the good work over the years!

Yes, I've looked at implementing parallel query a number of times. My
estimate was that it's about 2 man-years of effort to do something
worthwhile there, and so far nobody has offered funding for such a
task. There has been some recent discussion about obtaining funding,
so we'll see how that goes. It is of course reasonably
straightforward to achieve trivial parallelism, but that's mostly
useless in the real world. So it's on the roadmap, but some way off
yet.

Many commercial implementations exist, and IMHO the Greenplum solution
is the best general-purpose DW solution currently available for
PostgreSQL-like environments. Greenplum does have a community edition
that is free to use, and your stated performance results match my
experience. We've worked with a number of data warehouse customers
who hit the limits and moved up to Greenplum. Once people give up the
Oracle mantra, it frees them to consider a range of alternatives.

The main reason for deferring work on parallel query has been that other
techniques have delivered useful gains more easily. For example,
partitioning allowed PostgreSQL to dramatically reduce scan times with
less complexity. Synchronized scans can also achieve good efficiencies
for cases where total throughput is important. I expect to do more
work on improving decision support query performance in the next
release (9.2), so if anybody wishes to partially fund development, that
would be much appreciated.
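
To make the partitioning point above concrete, here is a small Python
sketch of partition pruning: rows are grouped by month, and a date-range
query reads only the partitions that can contain matching rows. It is a
simplified model, not how PostgreSQL implements partitioning; the
MonthlyPartitionedTable class and the sample data are hypothetical.

```python
from datetime import date


class MonthlyPartitionedTable:
    """Toy table that stores rows in one list per (year, month)."""

    def __init__(self):
        self.partitions = {}  # (year, month) -> list of rows

    def insert(self, row):
        key = (row["day"].year, row["day"].month)
        self.partitions.setdefault(key, []).append(row)

    def scan(self, start, end):
        """Return (matching rows, rows actually read) for start <= day <= end."""
        low = (start.year, start.month)
        high = (end.year, end.month)
        matched, rows_read = [], 0
        for key, rows in self.partitions.items():
            if key < low or key > high:
                continue  # pruned: this partition cannot hold matching rows
            rows_read += len(rows)
            matched.extend(r for r in rows if start <= r["day"] <= end)
        return matched, rows_read


table = MonthlyPartitionedTable()
for month in range(1, 13):
    for day in range(1, 29):
        table.insert({"day": date(2011, month, day), "amount": 1})

matched, rows_read = table.scan(date(2011, 6, 1), date(2011, 6, 7))
total_rows = sum(len(rows) for rows in table.partitions.values())
print(len(matched), rows_read, total_rows)
```

The query for the first week of June reads one monthly partition
instead of the whole table, which is where the reduction in scan time
comes from.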

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [GENERAL] Postgres or Greenplum

2011-06-07 Thread Radosław Smogura

On Tue, 7 Jun 2011 23:04:04 +0100, Simon Riggs wrote:

> On Tue, Jun 7, 2011 at 10:26 PM, Simon Windsor
> simon.wind...@cornfield.me.uk wrote:
>
> > I have been using Postgres for many years and have recently discovered
> > Greenplum, which appears to be a heavily modified, Postgres-based,
> > multi-node DB that is VERY fast.
> >
> > All the tests that I have seen suggest that Greenplum, when
> > implemented on a single server like Postgres but with several separate
> > installations, can be many times faster than Postgres. This is
> > achieved by using multiple DBs to store the data and using multiple
> > logger and writer processes to fully use all the resources of the
> > server.
> >
> > Has the Postgres development team ever considered using this
> > technique to split the data into separate sequential files that can be
> > accessed by multiple writer/reader processes? If so, what was the
> > conclusion?
> >
> > Finally, thanks for all the good work over the years!
>
> Yes, I've looked at implementing parallel query a number of times. My
> estimate was that it's about 2 man-years of effort to do something
> worthwhile there, and so far nobody has offered funding for such a
> task. There has been some recent discussion about obtaining funding,
> so we'll see how that goes. It is of course reasonably
> straightforward to achieve trivial parallelism, but that's mostly
> useless in the real world. So it's on the roadmap, but some way off
> yet.
>
> Many commercial implementations exist, and IMHO the Greenplum solution
> is the best general-purpose DW solution currently available for
> PostgreSQL-like environments. Greenplum does have a community edition
> that is free to use, and your stated performance results match my
> experience. We've worked with a number of data warehouse customers
> who hit the limits and moved up to Greenplum. Once people give up the
> Oracle mantra, it frees them to consider a range of alternatives.
>
> The main reason for deferring work on parallel query has been that other
> techniques have delivered useful gains more easily. For example,
> partitioning allowed PostgreSQL to dramatically reduce scan times with
> less complexity. Synchronized scans can also achieve good efficiencies
> for cases where total throughput is important. I expect to do more
> work on improving decision support query performance in the next
> release (9.2), so if anybody wishes to partially fund development, that
> would be much appreciated.
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services


But I think Greenplum is shared-nothing, isn't it?

Regards,
Radek
