Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-11 Thread Pavel Stehule
2016-09-11 9:23 GMT+02:00 dandl :

> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Pavel Stehule
>
> I guess my question then is: how much do you pay for that durability? If
> you benchmark Postgres configured for pure in-memory usage with absolutely
> no writes to disk (or SSD or network), where is it spending its time? Is
> there a lot of overhead in getting data in and out of cache buffers and
> conversions and in concurrency control?
>
>
>
> It is not about durability only.
>
> Postgres holds data in a format that is the same as or similar to what is
> saved on persistent storage, so there is repeated serialization and
> deserialization. Some structures are designed to be simple to save (like the
> B-tree), with performance only a secondary target.
>
> I believe new in-memory databases can be 10-100x faster, depending on the
> use case, because they hold data primarily in memory and use different data
> structures. The performance of these databases is great when all the data
> fits in memory all the time, but it is pretty bad when that is not the case.
> There is another issue: as you increase the speed of database write
> operations, you will probably hit file system limits and spinlock issues,
> which is one reason why big systems are based more and more on distributed
> systems.
>
> That’s the point I’m making, exactly. The question is: does anyone have a
> handle on how big that cost really is, as a guide to whether to try to do
> anything about it? Is it really 25x as Stonebraker says?
>

Some years ago the MonetDB developers did a comparison against C++
applications. The cost of a generic database is the difference in speed
between the generic database and a native C application.

I did some benchmarks of MonetDB, and it is really pretty fast for OLAP.

You can try to implement the TPC-B benchmark in C (without SQL; the cost of
SQL is not significant) and check the difference yourself.
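To make that concrete, here is a minimal, hedged sketch of the idea: the
TPC-B-style transaction profile (update an account, a teller and a branch,
count a history record) run against plain in-memory C arrays. The table
sizes, transaction count and random distribution are illustrative assumptions
rather than the official TPC-B scaling rules, and it skips SQL, locking and
durability entirely, so it only gives a rough upper bound to compare a
pgbench run against.

/*
 * Hedged sketch: TPC-B-style transaction profile against in-memory arrays,
 * with no SQL, no locking and no durability. Sizes are assumptions.
 * Build: cc -O2 tpcb_mem.c -o tpcb_mem
 */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BRANCHES  1
#define TELLERS   10
#define ACCOUNTS  100000
#define TXNS      1000000

typedef struct { long balance; } row_t;

int main(void)
{
    row_t *accounts = calloc(ACCOUNTS, sizeof(row_t));
    row_t *tellers  = calloc(TELLERS,  sizeof(row_t));
    row_t *branches = calloc(BRANCHES, sizeof(row_t));
    long   history  = 0;                  /* stand-in for the history table */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (long i = 0; i < TXNS; i++)
    {
        long aid   = rand() % ACCOUNTS;
        long tid   = rand() % TELLERS;
        long bid   = rand() % BRANCHES;
        long delta = (rand() % 9999) - 5000;

        accounts[aid].balance += delta;   /* UPDATE ... accounts */
        tellers[tid].balance  += delta;   /* UPDATE ... tellers  */
        branches[bid].balance += delta;   /* UPDATE ... branches */
        history++;                        /* INSERT INTO ... history */
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d transactions in %.3f s (%.0f tps), %ld history rows\n",
           TXNS, secs, TXNS / secs, history);

    free(accounts); free(tellers); free(branches);
    return 0;
}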

Regards

Pavel





> Regards
>
> David M Bennett FACS
> Andl - A New Database Language - andl.org


Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-11 Thread dandl
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Pavel Stehule



I guess my question then is: how much do you pay for that durability? If you 
benchmark Postgres configured for pure in-memory usage with absolutely no 
writes to disk (or SSD or network), where is it spending its time? Is there a 
lot of overhead in getting data in and out of cache buffers and conversions and 
in concurrency control?

 

It is not about durability only. 

Postgres holds data in a format that is the same as or similar to what is
saved on persistent storage, so there is repeated serialization and
deserialization. Some structures are designed to be simple to save (like the
B-tree), with performance only a secondary target.

I believe new in-memory databases can be 10-100x faster, depending on the use
case, because they hold data primarily in memory and use different data
structures. The performance of these databases is great when all the data fits
in memory all the time, but it is pretty bad when that is not the case. There
is another issue: as you increase the speed of database write operations, you
will probably hit file system limits and spinlock issues, which is one reason
why big systems are based more and more on distributed systems.

That’s the point I’m making, exactly. The question is: does anyone have a 
handle on how big that cost really is, as a guide to whether to try to do 
anything about it? Is it really 25x as Stonebraker says?

 

Regards

David M Bennett FACS


Andl - A New Database Language - andl.org

 

 

 



Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-10 Thread Pavel Stehule
2016-09-11 7:20 GMT+02:00 dandl :

> > From: Jim Nasby [mailto:jim.na...@bluetreble.com]
> > My guess is this is a test scenario that completely favors VoltDB
> > while hamstringing Postgres, such as using no transaction durability
> > at all in VoltDB while using maximum durability in Postgres. Comparing
> > the cost of every COMMIT doing an fsync vs not could certainly produce
> > a 25x difference. There could be other cases where you'd get a 25x
> > difference.
>
> I guess my question then is: how much do you pay for that durability? If
> you benchmark Postgres configured for pure in-memory usage with absolutely
> no writes to disk (or SSD or network), where is it spending its time? Is
> there a lot of overhead in getting data in and out of cache buffers and
> conversions and in concurrency control?
>

It is not about durability only.

Postgres holds data in a format that is the same as or similar to what is
saved on persistent storage, so there is repeated serialization and
deserialization. Some structures are designed to be simple to save (like the
B-tree), with performance only a secondary target.

I believe new in-memory databases can be 10-100x faster, depending on the use
case, because they hold data primarily in memory and use different data
structures. The performance of these databases is great when all the data fits
in memory all the time, but it is pretty bad when that is not the case. There
is another issue: as you increase the speed of database write operations, you
will probably hit file system limits and spinlock issues, which is one reason
why big systems are based more and more on distributed systems.

Regards

Pavel



>
> As a case study, assume an RDBMS is required to monitor and record Internet
> (or phone or VHF) traffic. If the power goes off, the traffic continues, and
> it really doesn’t matter whether you lose 60 seconds of downtime or 63
> seconds; in any case another instance in another data centre will pick up
> the slack. So the requirement is atomicity, yes, but not durability. Should
> you bid Postgres for the job, or look elsewhere? How much slower would
> Postgres be than a competitor? Do we care?


> > You need to be careful of benchmarks from commercial companies. MySQL
> > used to tout how fast it was compared to Postgres, using a benchmark
> > it created specifically for that purpose that had very little to do
> > with the real world. People eventually discovered that as soon as you
> > had a concurrent workload Postgres was actually faster.
>
> Of course; but at the same time insisting on including durability favours
> Postgres when I'm actually asking about alternatives.
>
> Regards
> David M Bennett FACS
>
> Andl - A New Database Language - andl.org


Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-10 Thread dandl
> From: Jim Nasby [mailto:jim.na...@bluetreble.com]
> My guess is this is a test scenario that completely favors VoltDB
> while hamstringing Postgres, such as using no transaction durability
> at all in VoltDB while using maximum durability in Postgres. Comparing
> the cost of every COMMIT doing an fsync vs not could certainly produce
> a 25x difference. There could be other cases where you'd get a 25x
> difference.

I guess my question then is: how much do you pay for that durability? If you 
benchmark Postgres configured for pure in-memory usage with absolutely no 
writes to disk (or SSD or network), where is it spending its time? Is there a 
lot of overhead in getting data in and out of cache buffers and conversions and 
in concurrency control?

As a case study, assume an RDBMS is required to monitor and record Internet (or 
phone or VHF) traffic. If the power goes off, the traffic continues, and it 
really doesn’t matter whether you lose 60 seconds of downtime or 63 seconds; 
in any case another instance in another data centre will pick up the slack. So 
the requirement is atomicity, yes, but not durability. Should you bid Postgres 
for the job, or look elsewhere? How much slower would Postgres be than a 
competitor? Do we care?

> You need to be careful of benchmarks from commercial companies. MySQL
> used to tout how fast it was compared to Postgres, using a benchmark
> it created specifically for that purpose that had very little to do
> with the real world. People eventually discovered that as soon as you
> had a concurrent workload Postgres was actually faster.

Of course; but at the same time insisting on including durability favours 
Postgres when I'm actually asking about alternatives.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org









Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-10 Thread Jim Nasby

On 9/8/16 3:15 AM, Nicolas Grilly wrote:

So my question is not to challenge the Postgres way. It's simply to
ask whether there are any known figures that would directly support
or refute his claims. Does Postgres really spend 96% of its time in
thumb-twiddling once the entire database resides in memory?


Alas, I've been unable to find any relevant benchmark. I'm not motivated
enough to install PostgreSQL and VoltDB and try it for myself :-)


My guess is this is a test scenario that completely favors VoltDB while 
hamstringing Postgres, such as using no transaction durability at all in 
VoltDB while using maximum durability in Postgres. Comparing the cost of 
every COMMIT doing an fsync vs not could certainly produce a 25x 
difference. There could be other cases where you'd get a 25x difference.
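For a feel of how large that particular effect can be, here is a small,
hedged C sketch: it appends a fixed-size record per "commit" to a scratch
file, once with an fsync after every write and once letting the OS buffer
everything. The file name, record size and iteration count are arbitrary
assumptions, and real WAL writes are more involved, so treat the ratio as
indicative only.

/*
 * Hedged sketch: per-"commit" fsync vs. buffered writes. Not a WAL
 * simulation; file name and sizes are assumptions.
 * Build: cc -O2 fsync_cost.c -o fsync_cost
 */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define COMMITS 2000

static double run(int do_fsync)
{
    char buf[512];
    memset(buf, 'x', sizeof(buf));

    int fd = open("commit_test.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < COMMITS; i++)
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
            perror("write");
        if (do_fsync)
            fsync(fd);               /* what a durable COMMIT must wait for */
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    double buffered = run(0);
    double synced   = run(1);
    printf("buffered: %.0f commits/s, fsync'd: %.0f commits/s (ratio %.1fx)\n",
           COMMITS / buffered, COMMITS / synced, synced / buffered);
    return 0;
}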


You need to be careful of benchmarks from commercial companies. MySQL 
used to tout how fast it was compared to Postgres, using a benchmark it 
created specifically for that purpose that had very little to do with 
the real world. People eventually discovered that as soon as you had a 
concurrent workload Postgres was actually faster.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-08 Thread Nicolas Grilly
On Thu, Sep 8, 2016 at 2:35 AM, dandl  wrote:

> I understand that. What I'm trying to get a handle on is the magnitude of
> that cost and how it influences other parts of the product, specifically
> for Postgres. If the overhead for perfect durability were (say) 10%, few
> people would care about the cost. But Stonebraker puts the figure at 2500%!
> His presentation says that a pure relational in-memory store can beat a row
> store with disk fully cached in memory by 10x to 25x. [Ditto column stores
> beat row stores by 10x for complex queries in non-updatable data.]
>

VoltDB replication is synchronous in the same cluster/data center, and
asynchronous with a remote cluster/data center. As a consequence, if your
application needs to survive a data center power failure with zero data
loss, then you have to enable VoltDB's synchronous command logging (which
by the way is not available in the Community Edition — only in the
Enterprise Edition). When Stonebraker says VoltDB's throughput is 10~25x
greater, I'd guess this is with no command logging at all, and no periodic
snapshotting.

So my question is not to challenge the Postgres way. It's simply to ask
> whether there are any known figures that would directly support or refute
> his claims. Does Postgres really spend 96% of its time in thumb-twiddling
> once the entire database resides in memory?
>

Alas, I've been unable to find any relevant benchmark. I'm not motivated
enough to install PostgreSQL and VoltDB and try it for myself :-)


Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-07 Thread dandl
> > I don't think this is quite true. The mechanism he proposes has a
> small window in which committed transactions can be lost, and this
> should be addressed by replication or by a small amount of UPS (a few
> seconds).
> 
> Except that's the entire point where all those kind of solutions
> *completely* depart ways from Postgres. Postgres is designed to *lose
> absolutely no data after a COMMIT*, potentially including requiring
> that data to be synchronized out to a second server. That is worlds
> apart from "we might lose a few seconds", and there's a lot of stuff
> Postgres has to worry about to accomplish that. Some of that stuff can
> be short-circuited if you don't care (that's what SET
> synchronous_commit = off does), but there's always going to be some
> amount of extra work to support synchronous_commit = local or
> remote_*.

I understand that. What I'm trying to get a handle on is the magnitude of that 
cost and how it influences other parts of the product, specifically for 
Postgres. If the overhead for perfect durability were (say) 10%, few people 
would care about the cost. But Stonebraker puts the figure at 2500%! His 
presentation says that a pure relational in-memory store can beat a row store 
with disk fully cached in memory by 10x to 25x. [Ditto column stores beat row 
stores by 10x for complex queries in non-updatable data.]

So my question is not to challenge the Postgres way. It's simply to ask whether 
there are any known figures that would directly support or refute his claims. 
Does Postgres really spend 96% of its time in thumb-twiddling once the entire 
database resides in memory?

> Presumably there's more improvements that could be made to Postgres in
> this area, but if you really don't care about losing seconds worth of
> data and you need absolutely the best performance possible then maybe
> Postgres isn't the right choice for you.

Achieving durability for an in-memory database requires either UPS or active 
replication or both, which is an additional cost that is not needed for every 
application. My question precedes that one: is there a big performance gain 
there for the taking, or is it smoke and mirrors?

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org









Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-07 Thread Jim Nasby

On 9/2/16 7:39 PM, dandl wrote:

I don't think this is quite true. The mechanism he proposes has a small window 
in which committed transactions can be lost, and this should be addressed by 
replication or by a small amount of UPS (a few seconds).


Except that's the entire point where all those kind of solutions 
*completely* depart ways from Postgres. Postgres is designed to *lose 
absolutely no data after a COMMIT*, potentially including requiring that 
data to be synchronized out to a second server. That is worlds apart 
from "we might lose a few seconds", and there's a lot of stuff Postgres 
has to worry about to accomplish that. Some of that stuff can be 
short-circuited if you don't care (that's what SET synchronous_commit = 
off does), but there's always going to be some amount of extra work to 
support synchronous_commit = local or remote_*.
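As a rough way to see what that one knob buys, here is a hedged libpq sketch
that times a batch of single-statement INSERT transactions with
synchronous_commit on and then off against a local server. The connection
string, table name and row count are assumptions, and it is a toy
measurement, not a substitute for pgbench.

/*
 * Hedged sketch: timing implicit-COMMIT INSERTs with synchronous_commit
 * on vs. off via libpq. Connection string and table name are assumptions.
 * Build: cc sync_commit.c -o sync_commit -I$(pg_config --includedir) -lpq
 */
#define _POSIX_C_SOURCE 200809L
#include <libpq-fe.h>
#include <stdio.h>
#include <time.h>

#define ROWS 2000

static double run(PGconn *conn, const char *set_cmd)
{
    PQclear(PQexec(conn, set_cmd));                        /* toggle setting */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROWS; i++)
        PQclear(PQexec(conn, "INSERT INTO t VALUES (1)")); /* one txn per row */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");         /* assumed local DB */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }
    PQclear(PQexec(conn, "CREATE TABLE IF NOT EXISTS t (x int)"));

    double on  = run(conn, "SET synchronous_commit = on");
    double off = run(conn, "SET synchronous_commit = off");
    printf("synchronous_commit=on:  %.0f tps\n", ROWS / on);
    printf("synchronous_commit=off: %.0f tps\n", ROWS / off);

    PQfinish(conn);
    return 0;
}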


Presumably there's more improvements that could be made to Postgres in 
this area, but if you really don't care about losing seconds worth of 
data and you need absolutely the best performance possible then maybe 
Postgres isn't the right choice for you.


"All databases suck, each one just sucks in a different way." - Me, 
circa 1999.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-03 Thread Simon Riggs
On 3 September 2016 at 04:36, dandl  wrote:

> In any event, I am not backing his claims. I am simply asking: does anyone 
> have any facts to support or refute his 96% claim as applied to Postgres?

If that is scientific research, he will publish evidence. If not, it's
just words and no refutation is required.

Anybody can come here and discuss new features. Anybody. They just
need to explain their thoughts and produce evidence for their
assertions.

Come on in, database researchers, we're open to rational contributions.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread Peter Geoghegan
On Fri, Sep 2, 2016 at 8:36 PM, dandl  wrote:
> The paper is substantially in agreement with the presentation I quoted. If 
> there are differences in detail, they certainly don't dominate his argument.

My point is that the paper is rather light on details of the kind that
are really important. And, that it's noteworthy that Stonebraker has
in the past, during presentations, emphasized the buffer lock
crabbing/latch coupling thing *at length*, even though it's a totally
solved problem.

It's also true that Postgres has become vastly more scalable in the
past few years due to optimization that doesn't change the fundamental
nature of the system at all, so it's very easy to imagine individual
differences being more important than differences between major
classes of system.

Those are facts. You may take from them what you will.

> IMO your claim is far weaker. What specifically do you say is wrong about his 
> current claims, and on what facts do you base it?

I'm not the one making overarching conclusions. I'm not trying to
convince you of anything.

-- 
Peter Geoghegan




Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread dandl
> >> > I wondered if there are any figures or measurements on Postgres
> >> > performance in this ‘enough memory’ environment to support or
> >> > contest this point of view?
> >
> > I don't think that's really answerable without individual use-cases
> in
> > mind.  Answering that question for analytics, operational, ...
> > workloads is going to look different, and the overheads are
> elsewhere.
> >
> > I personally think that each implementation's restrictions are more
> > likely to be an issue than anything "fundamental".
> 
> +1
> 
> At one point, Stonebraker was regularly claiming that "crabbing" of
> buffer locks in B-Trees was a fundamental overhead paid in systems
> more or less based on System R. He did eventually start to acknowledge
> that Lehman and Yao figured out a technique that made that untrue in
> 1981, if only barely [1], but the lesson for me was to take his claims
> in this area with a generous pinch of salt.
> 
> [1] https://www.cs.cmu.edu/~pavlo/static/papers/stonebraker-ic2e2014.pdf
> (See his citation 11)

The paper is substantially in agreement with the presentation I quoted. If 
there are differences in detail, they certainly don't dominate his argument.

IMO your claim is far weaker. What specifically do you say is wrong about his 
current claims, and on what facts do you base it?

In any event, I am not backing his claims. I am simply asking: does anyone have 
any facts to support or refute his 96% claim as applied to Postgres?

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org









Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread 'Bruce Momjian'
On Sat, Sep  3, 2016 at 10:45:47AM +1000, dandl wrote:
> > Agreed. Stonebraker measured Shore DBMS, which is an academic
> > database:
> > 
> > http://research.cs.wisc.edu/shore/
> > 
> > If he had measured a production-quality database that had been
> > optimized like Postgres, I would put more stock in his "overhead"
> > numbers.
> 
> Exactly! And that's what I'm asking: has anyone done, or does anyone know of,
> any figures for Postgres to set against these?

Uh, well, there are Postgres tools that measure the overhead of locking
on queries and stuff.  I don't know any numbers myself.
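For anyone wanting to poke at this, one low-effort starting point is the
wait_event columns in pg_stat_activity (present as of PostgreSQL 9.6), which
show what running sessions are currently waiting on (locks, latches, I/O). A
hedged libpq sketch follows, with the connection string as an assumption; it
only samples the instant it runs, so it is nowhere near a full accounting of
overhead:

/*
 * Hedged sketch: sample lock/latch/IO waits from pg_stat_activity.
 * Assumes PostgreSQL 9.6+ (wait_event columns) and a local "postgres" DB.
 * Build: cc wait_sample.c -o wait_sample -I$(pg_config --includedir) -lpq
 */
#include <libpq-fe.h>
#include <stdio.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    PGresult *res = PQexec(conn,
        "SELECT wait_event_type, wait_event, count(*) "
        "FROM pg_stat_activity "
        "WHERE wait_event IS NOT NULL "
        "GROUP BY 1, 2 ORDER BY 3 DESC");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        for (int i = 0; i < PQntuples(res); i++)
            printf("%-10s %-30s %s\n",
                   PQgetvalue(res, i, 0),   /* wait_event_type */
                   PQgetvalue(res, i, 1),   /* wait_event */
                   PQgetvalue(res, i, 2));  /* number of backends waiting */

    PQclear(res);
    PQfinish(conn);
    return 0;
}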

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread dandl
> Agreed. Stonebraker measured Shore DBMS, which is an academic
> database:
> 
>   http://research.cs.wisc.edu/shore/
> 
> If he had measured a production-quality database that had been
> optimized like Postgres, I would put more stock in his "overhead"
> numbers.

Exactly! And that's what I'm asking: has anyone done, or does anyone know of,
any figures for Postgres to set against these?

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org









Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread dandl
> > > http://slideshot.epfl.ch/play/suri_stonebraker
> > >
> > >
> > >
> > > He makes the claim that in a modern ‘big iron’ RDBMS such as
> Oracle,
> > > DB2, MS SQL Server, Postgres, given enough memory that the entire
> > > database lives in cache, the server will spend 96% of its memory
> > > cycles on unproductive overhead. This includes buffer management,
> > > locking, latching (thread/CPU
> > > conflicts) and recovery (including log file reads and writes).
> 
> I think those numbers are overblown, and more PR than reality.

Did you check out the presentation? He presents figures obtained by experiment 
from instrumentation. Even if it's only 90% instead of 96%, he has a point.

> But there certainly are some things that can be made more efficient if
> you don't care about durability and replication.

He cares plenty. Durability and high availability both rely on active 
replication. 

> > > I wondered if there are any figures or measurements on Postgres
> > > performance in this ‘enough memory’ environment to support or
> > > contest this point of view?
> 
> I don't think that's really answerable without individual use-cases in
> mind.  Answering that question for analytics, operational, ...
> workloads is going to look different, and the overheads are elsewhere.

That's like saying we don't have any figures for how fast your car will go
because it depends on who's driving and how many passengers. My answer is:
yes, of course, but you can still provide figures for some specific set of
conditions, and they'll be better than none at all.

> I personally think that each implementation's restrictions are more
> likely to be an issue than anything "fundamental".

Unlikely. But you can still obtain figures.

> > What limits postgresql when everything fits in memory? The fact that
> > it's designed to survive a power outage and not lose all your data.
> >
> > Stonebraker's new stuff is cool, but it is NOT designed to survive
> > total power failure.
> >
> > Two totally different design concepts. It's apples and oranges to
> compare them.
> 
> I don't think they're that fundamentally different.

Agreed.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org









Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread dandl
> > Re this talk given by Michael Stonebraker:
> >
> > http://slideshot.epfl.ch/play/suri_stonebraker
> >
> >
> >
> > He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle,
> > DB2, MS SQL Server, Postgres, given enough memory that the entire
> > database lives in cache, the server will spend 96% of its memory
> > cycles on unproductive overhead. This includes buffer management,
> > locking, latching (thread/CPU
> > conflicts) and recovery (including log file reads and writes).
> >
> >
> >
> > [Enough memory in this case assumes that for just about any
> business,
> > 1TB is enough. The intent of his argument is that a server designed
> > correctly for it would run 25x faster.]
> >
> >
> >
> > I wondered if there are any figures or measurements on Postgres
> > performance in this ‘enough memory’ environment to support or
> contest
> > this point of view?
> 
> What limits postgresql when everything fits in memory? The fact that
> it's designed to survive a power outage and not lose all your data.
> 
> Stonebraker's new stuff is cool, but it is NOT designed to survive
> total power failure.

I don't think this is quite true. The mechanism he proposes has a small window 
in which committed transactions can be lost, and this should be addressed by 
replication or by a small amount of UPS (a few seconds).

But that isn't my question: I'm asking whether anyone *knows* any comparable 
figures for Postgres. IOW how much performance gain might be available for 
different design choices.

> Two totally different design concepts. It's apples and oranges to
> compare them.

Not to an end user. A system that runs 10x faster on OLTP and provides all the
same functionality is a direct competitor.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org









Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread Peter Geoghegan
On Fri, Sep 2, 2016 at 10:32 AM, Andres Freund  wrote:
>
>> > I wondered if there are any figures or measurements on Postgres performance
>> > in this ‘enough memory’ environment to support or contest this point of
>> > view?
>
> I don't think that's really answerable without individual use-cases in
> mind.  Answering that question for analytics, operational, ... workloads
> is going to look different, and the overheads are elsewhere.
>
> I personally think that each implementation's restrictions are more
> likely to be an issue than anything "fundamental".

+1

At one point, Stonebraker was regularly claiming that "crabbing" of
buffer locks in B-Trees was a fundamental overhead paid in systems
more or less based on System R. He did eventually start to acknowledge
that Lehman and Yao figured out a technique that made that untrue in
1981, if only barely [1], but the lesson for me was to take his claims
in this area with a generous pinch of salt.

[1] https://www.cs.cmu.edu/~pavlo/static/papers/stonebraker-ic2e2014.pdf
(See his citation 11)
-- 
Peter Geoghegan




Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread Bruce Momjian
On Fri, Sep  2, 2016 at 10:32:46AM -0700, Andres Freund wrote:
> On 2016-09-02 11:10:35 -0600, Scott Marlowe wrote:
> > On Fri, Sep 2, 2016 at 4:49 AM, dandl  wrote:
> > > Re this talk given by Michael Stonebraker:
> > >
> > > http://slideshot.epfl.ch/play/suri_stonebraker
> > >
> > >
> > >
> > > He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, 
> > > MS
> > > SQL Server, Postgres, given enough memory that the entire database lives 
> > > in
> > > cache, the server will spend 96% of its memory cycles on unproductive
> > > overhead. This includes buffer management, locking, latching (thread/CPU
> > > conflicts) and recovery (including log file reads and writes).
> 
> I think those numbers are overblown, and more PR than reality.
> 
> But there certainly are some things that can be made more efficient if
> you don't care about durability and replication.

Agreed. Stonebraker measured Shore DBMS, which is an academic database:

http://research.cs.wisc.edu/shore/

If he had measured a production-quality database that had been optimized
like Postgres, I would put more stock in his "overhead" numbers.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread Dorian Hoxha
Check out VoltDB (or ScyllaDB, which is more different) for the changes in
architecture required to achieve those performance increases.

On Fri, Sep 2, 2016 at 7:32 PM, Andres Freund  wrote:

> On 2016-09-02 11:10:35 -0600, Scott Marlowe wrote:
> > On Fri, Sep 2, 2016 at 4:49 AM, dandl  wrote:
> > > Re this talk given by Michael Stonebraker:
> > >
> > > http://slideshot.epfl.ch/play/suri_stonebraker
> > >
> > >
> > >
> > > He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle,
> DB2, MS
> > > SQL Server, Postgres, given enough memory that the entire database
> lives in
> > > cache, the server will spend 96% of its memory cycles on unproductive
> > > overhead. This includes buffer management, locking, latching
> (thread/CPU
> > > conflicts) and recovery (including log file reads and writes).
>
> I think those numbers are overblown, and more PR than reality.
>
> But there certainly are some things that can be made more efficient if
> you don't care about durability and replication.
>
>
> > > I wondered if there are any figures or measurements on Postgres
> performance
> > > in this ‘enough memory’ environment to support or contest this point of
> > > view?
>
> I don't think that's really answerable without individual use-cases in
> mind.  Answering that question for analytics, operational, ... workloads
> is going to look different, and the overheads are elsewhere.
>
> I personally think that each implementation's restrictions are more
> likely to be an issue than anything "fundamental".
>
>
> > What limits postgresql when everything fits in memory? The fact that
> > it's designed to survive a power outage and not lose all your data.
> >
> > Stonebraker's new stuff is cool, but it is NOT designed to survive
> > total power failure.
> >
> > Two totally different design concepts. It's apples and oranges to
> compare them.
>
> I don't think they're that fundamentally different.
>
>
> Greetings,
>
> Andres Freund
>
>


Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread Andres Freund
On 2016-09-02 11:10:35 -0600, Scott Marlowe wrote:
> On Fri, Sep 2, 2016 at 4:49 AM, dandl  wrote:
> > Re this talk given by Michael Stonebraker:
> >
> > http://slideshot.epfl.ch/play/suri_stonebraker
> >
> >
> >
> > He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS
> > SQL Server, Postgres, given enough memory that the entire database lives in
> > cache, the server will spend 96% of its memory cycles on unproductive
> > overhead. This includes buffer management, locking, latching (thread/CPU
> > conflicts) and recovery (including log file reads and writes).

I think those numbers are overblown, and more PR than reality.

But there certainly are some things that can be made more efficient if
you don't care about durability and replication.


> > I wondered if there are any figures or measurements on Postgres performance
> > in this ‘enough memory’ environment to support or contest this point of
> > view?

I don't think that's really answerable without individual use-cases in
mind.  Answering that question for analytics, operational, ... workloads
is going to look different, and the overheads are elsewhere.

I personally think that each implementation's restrictions are more
likely to be an issue than anything "fundamental".


> What limits postgresql when everything fits in memory? The fact that
> it's designed to survive a power outage and not lose all your data.
> 
> Stonebraker's new stuff is cool, but it is NOT designed to survive
> total power failure.
> 
> Two totally different design concepts. It's apples and oranges to compare 
> them.

I don't think they're that fundamentally different.


Greetings,

Andres Freund




Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread Scott Marlowe
On Fri, Sep 2, 2016 at 4:49 AM, dandl  wrote:
> Re this talk given by Michael Stonebraker:
>
> http://slideshot.epfl.ch/play/suri_stonebraker
>
>
>
> He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS
> SQL Server, Postgres, given enough memory that the entire database lives in
> cache, the server will spend 96% of its memory cycles on unproductive
> overhead. This includes buffer management, locking, latching (thread/CPU
> conflicts) and recovery (including log file reads and writes).
>
>
>
> [Enough memory in this case assumes that for just about any business, 1TB is
> enough. The intent of his argument is that a server designed correctly for
> it would run 25x faster.]
>
>
>
> I wondered if there are any figures or measurements on Postgres performance
> in this ‘enough memory’ environment to support or contest this point of
> view?

What limits postgresql when everything fits in memory? The fact that
it's designed to survive a power outage and not lose all your data.

Stonebraker's new stuff is cool, but it is NOT designed to survive
total power failure.

Two totally different design concepts. It's apples and oranges to compare them.

