Re: Issues Scaling Postgres Concurrency

2023-03-14 Thread Tom Lane
"Peter J. Holzer"  writes:
> On 2023-03-14 22:47:43 +0100, Laurenz Albe wrote:
>> A query that counts the number of rows in a table of half a million
>> rows is quite expensive and keeps a CPU core busy for a while
>> (provided everything is cached). At some degree of parallelism, your
>> CPU is overloaded, which leads to non-linear slowdown.

> The interesting thing is that on my laptop even two concurrent accesses
> cause a 100% slowdown. I think this is because the task is actually
> memory-bound: The cores may do the counting in parallel, but they have
> to read the data from the same RAM (since it's too large to fit in the
> CPU cache) and they have to take turns accessing it.

I wondered if synchronize_seqscans might exacerbate this behavior by
encouraging more cores to be touching the same buffers at the same time.
I couldn't measure much difference between having it on vs. off in a
pgbench test with all the clients counting the same table ... but maybe
on other hardware the effect would show up.

regards, tom lane




Re: Issues Scaling Postgres Concurrency

2023-03-14 Thread Peter J. Holzer
On 2023-03-14 22:47:43 +0100, Laurenz Albe wrote:
> On Mon, 2023-03-13 at 12:24 -0400, Harrison Borges wrote:
> > I’m running into severe performance problems with Postgres as I
> > increase the number of concurrent requests against my backend. I’ve
> > identified that the bottleneck is Postgres, and to simplify the test
> > case, I created an endpoint that only does a count query on a table
> > with ~500k rows. At 5 concurrent users, the response time was 33ms,
> > at 10 users it was 60ms, and at 20 users it was 120ms.
[...]
> > This manifests in essentially a server meltdown on production. As
> > the concurrent requests stack up, our server is stuck waiting for
> > more and more queries. Eventually requests begin timing out as they
> > start taking over 30 seconds to respond.
> > 
> > Am I doing something obviously wrong? Does this sound like normal
> > behavior?
> 
> That sounds like quite normal and expected behavior.
> 
> A query that counts the number of rows in a table of half a million
> rows is quite expensive and keeps a CPU core busy for a while
> (provided everything is cached). At some degree of parallelism, your
> CPU is overloaded, which leads to non-linear slowdown.

The slowdown looks pretty linear to me (6ms per user).

The interesting thing is that on my laptop even two concurrent accesses
cause a 100% slowdown. I think this is because the task is actually
memory-bound: The cores may do the counting in parallel, but they have
to read the data from the same RAM (since it's too large to fit in the
CPU cache) and they have to take turns accessing it.

> The thing you are doing wrong is that you are putting too much load on
> this system.

Or possibly counting stuff far more often than necessary. If an exact
count is necessary more frequently than it changes it is probably a good
idea to store that somewhere and update it in a trigger.

(If the count doesn't have to be totally up-to-date, caching it in the
application may be even better.)

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Issues Scaling Postgres Concurrency

2023-03-14 Thread Laurenz Albe
On Mon, 2023-03-13 at 12:24 -0400, Harrison Borges wrote:
> I’m running into severe performance problems with Postgres as I increase the 
> number
> of concurrent requests against my backend. I’ve identified that the 
> bottleneck is
> Postgres, and to simplify the test case, I created an endpoint that only does 
> a
> count query on a table with ~500k rows. At 5 concurrent users, the response 
> time
> was 33ms, at 10 users it was 60ms, and at 20 users it was 120ms.
> 
> As the number of concurrent users increases, the response time for the count 
> query
> also increases significantly, indicating that Postgres may not be scaling 
> well to
> handle the increasing load. 
> 
> This manifests in essentially a server meltdown on production. As the 
> concurrent
> requests stack up, our server is stuck waiting for more and more queries.
> Eventually requests begin timing out as they start taking over 30 seconds to 
> respond.
> 
> Am I doing something obviously wrong? Does this sound like normal behavior?

That sounds like quite normal and expected behavior.

A query that counts the number of rows in a table of half a million rows is
quite expensive and keeps a CPU core busy for a while (provided everything is
cached). At some degree of parallelism, your CPU is overloaded, which leads
to non-linear slowdown.

The thing you are doing wrong is that you are putting too much load on this
system.

Yours,
Laurenz Albe




Re: Issues Scaling Postgres Concurrency

2023-03-14 Thread Alan Hodgson
On Mon, 2023-03-13 at 12:24 -0400, Harrison Borges wrote:
> Hello everyone.
> 
> I’m running into severe performance problems with Postgres as I
> increase the number of concurrent requests against my backend. I’ve
> identified that the bottleneck is Postgres, and to simplify the
> test case, I created an endpoint that only does a count query on a
> table with ~500k rows. At 5 concurrent users, the response time was
> 33ms, at 10 users it was 60ms, and at 20 users it was 120ms.

I'm no expert on high concurrency, but for something this simple I'd
expect that you're just CPU bottlenecked. Count in PostgreSQL
actually has to read all the rows in the table. And yeah you can't do
too many of them at the same time.


Re: Issues Scaling Postgres Concurrency

2023-03-14 Thread Adrian Klaver

On 3/13/23 9:24 AM, Harrison Borges wrote:

Hello everyone.

I’m running into severe performance problems with Postgres as I increase 
the number of concurrent requests against my backend. I’ve identified 
that the bottleneck is Postgres, and to simplify the test case, I 
created an endpoint that only does a count query on a table with ~500k 
rows. At 5 concurrent users, the response time was 33ms, at 10 users it 
was 60ms, and at 20 users it was 120ms.


As the number of concurrent users increases, the response time for the 
count query also increases significantly, indicating that Postgres may 
not be scaling well to handle the increasing load.


This manifests in essentially a server meltdown on production. As the 
concurrent requests stack up, our server is stuck waiting for more and 
more queries. Eventually requests begin timing out as they start taking 
over 30 seconds to respond.


Am I doing something obviously wrong? Does this sound like normal 
behavior? I'm not very experienced at DB ops so I'm not 100% sure what 
to expect here, but I have worked as a Software Engineer for over 10 
years and I've not encountered problems like this before.


1) https://wiki.postgresql.org/wiki/Slow_Counting

2) Are you using connection pooling?



I would appreciate any insights or advice on how to optimize Postgres 
for high concurrency scenarios. Thank you in advance for your help!



--
Adrian Klaver
adrian.kla...@aklaver.com




Issues Scaling Postgres Concurrency

2023-03-14 Thread Harrison Borges
Hello everyone.

I’m running into severe performance problems with Postgres as I increase
the number of concurrent requests against my backend. I’ve identified that
the bottleneck is Postgres, and to simplify the test case, I created an
endpoint that only does a count query on a table with ~500k rows. At 5
concurrent users, the response time was 33ms, at 10 users it was 60ms, and
at 20 users it was 120ms.

As the number of concurrent users increases, the response time for the
count query also increases significantly, indicating that Postgres may not
be scaling well to handle the increasing load.

This manifests in essentially a server meltdown on production. As the
concurrent requests stack up, our server is stuck waiting for more and more
queries. Eventually requests begin timing out as they start taking over 30
seconds to respond.

Am I doing something obviously wrong? Does this sound like normal behavior?
I'm not very experienced at DB ops so I'm not 100% sure what to expect
here, but I have worked as a Software Engineer for over 10 years and I've
not encountered problems like this before.

I would appreciate any insights or advice on how to optimize Postgres for
high concurrency scenarios. Thank you in advance for your help!