Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-11 Thread James Cloos
>>>>> "GBB" == Graeme B Bell <graeme.b...@nibio.no> writes:

GBB 1a. For example AMD CPUs list the number of integer cores (e.g. 16),
GBB but there is actually only half as many cores available for floating
GBB point work (8). So if your functions need to use floating point, your
GBB scaling will suffer badly on FP functions.

That is half as many 256-bit float units; for scalar math and for
128-bit vector math, each core gets half of the float unit.

Only for the 256-bit vector math do the schedulers have to compete for
float unit access.

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 0x997A9F17ED7DAEA6


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell

On 09 Jul 2015, at 15:22, Thomas Kellerer spam_ea...@gmx.net wrote:

 Graeme B. Bell wrote on 09.07.2015 at 11:44:
 I don't recall seeing a clear statement telling me I should mark pl/pgsql
 functions nonvolatile wherever possible or throw all performance and
 scalability out the window. 
 
 From: http://www.postgresql.org/docs/current/static/xfunc-volatility.html
 
   For best optimization results, you should label your functions 
with the strictest volatility category that is valid for them.


Hi Thomas,

Thank you very much for the link.

However, the point I was making wasn't that no sentence exists anywhere. My 
point was that I've read the docs more than anyone else in my institute and I 
was completely unaware of this. 

It is also quite vague - if you hand that to a younger programmer in particular, 
how do they implement it in practice? When is it important to do it?  If this 
one factor silently breaks multiprocessor scaling of pl/pgsql, and 
multiprocessing is the biggest trend in CPU processing of the last decade 
(comparing server CPUs of 2005 with 2015), then why is this information not up 
front and clear?
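For what it's worth, here is a minimal sketch of what that labelling looks like 
in practice, using the is_prime() idea discussed in this thread (the function 
body below is invented for illustration, not taken from anyone's benchmark):

```sql
-- A naive is_prime in pl/pgsql. The body only does arithmetic on its
-- argument, so it can be labelled IMMUTABLE (the strictest category).
-- Without the label it defaults to VOLATILE and pays per-statement
-- snapshot overhead on every call.
CREATE OR REPLACE FUNCTION is_prime(n bigint) RETURNS boolean AS $$
DECLARE
    i bigint := 2;
BEGIN
    IF n < 2 THEN RETURN false; END IF;
    WHILE i * i <= n LOOP
        IF n % i = 0 THEN RETURN false; END IF;
        i := i + 1;
    END LOOP;
    RETURN true;
END;
$$ LANGUAGE plpgsql IMMUTABLE;  -- the one-word change in question
```

The only difference between the "scales" and "does not scale" versions is that 
trailing IMMUTABLE keyword, which is easy to miss if no example ever shows it.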


A second point to keep in mind is that optimization and 
parallelisation/scalability are not always the same thing. 

For example, in one project I took a bunch of looped parallel UPDATEs on a set 
of 50 tables, and rewrote them so as to run the loop all at once inside a 
pl/pgsql function. Crudely, I took out the table-level for loop and put it at 
row-level instead. 

I expected they'd execute much faster if UPDATEs were using data still in 
cache. Also, I would be updating without writing out WAL entries to disk 
repeatedly. 

It turns out the update per row ran much faster - as expected - when I used one 
table, but when I ran it in parallel on many tables, the performance was even 
worse than when I started. If you look at the benchmarks, you'll see that 
performance drops through the floor at 8-16 cores. I think that was when I 
first noticed this bug/feature.

[If anyone is curious, the way I solved that one in the end was to 
pre-calculate every possible way the tables might be updated after N loops of 
updates using Python, and import that as a lookup table into PG. It turns out 
that although we had tens of GB of data per table, there were only about 100,000 
different types of situation, and only e.g. 80 iterations to consider. Then I 
ran a single set of UPDATEs with no pl/pgsql. It was something like a 1x 
performance improvement.]
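Crudely, the final shape of that fix might be sketched like this (table and 
column names here are invented; the real tables and lookup were 
domain-specific):

```sql
-- Hypothetical sketch of the lookup-table approach described above.
-- Instead of iterating UPDATEs N times in pl/pgsql, precompute the
-- end state for each distinct situation and apply it in one pass.
CREATE TABLE update_lookup (
    situation_id integer PRIMARY KEY,   -- one row per distinct situation
    final_value  integer NOT NULL       -- state after all N iterations
);

-- One set-based UPDATE replaces the whole procedural loop:
UPDATE big_table b
SET    value = l.final_value
FROM   update_lookup l
WHERE  l.situation_id = b.situation_id;
```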

Graeme.



Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell
 
 3. I don't disagree that the benchmark code is objectively 'bad' in the 
 sense that it is missing an important optimisation.
 
 Particularly with regards documentation, a patch improving things is
 much more likely to improve the situation than griping.  Also,
 conversation on this list gets recorded for posterity and google is
 remarkably good at matching people looking for problems with
 solutions.  So, even in absence of a patch perhaps we've made the
 lives of future head-scratchers a little bit easier with this
 discussion.

I agree that patch > gripe, and about the google aspect. But nonetheless, a 
well-intentioned gripe is > ignorance of a problem. 

As mentioned earlier, I'm sick just now and will be back in hospital again 
tomorrow & Monday, so a patch may be a little bit much to ask from me here :-) 
It's a bit much even keeping up with the posts on the thread so far.

I might try to fix the documentation a bit later, though as someone with no 
experience in marking up volatility on pl/pgsql functions I doubt my efforts 
would be that great. I also have other OSS project contributions that need some 
attention first. 

Re: the google effect. Are these mailing list archives mirrored anywhere, 
incidentally? For example, I notice we just lost http:reddit.com/r/amd at the 
weekend; all the discussion of the last few years on that forum is now out of 
reach.  

Graeme Bell




Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Merlin Moncure
On Thu, Jul 9, 2015 at 10:12 AM, Graeme B. Bell graeme.b...@nibio.no wrote:

 3. I don't disagree that the benchmark code is objectively 'bad' in the 
 sense that it is missing an important optimisation.

 Particularly with regards documentation, a patch improving things is
 much more likely to improve the situation than griping.  Also,
 conversation on this list gets recorded for posterity and google is
 remarkably good at matching people looking for problems with
 solutions.  So, even in absence of a patch perhaps we've made the
 lives of future head-scratchers a little bit easier with this
 discussion.

 I agree that patch > gripe, and about the google aspect. But nonetheless, a 
 well-intentioned gripe is > ignorance of a problem.

 As mentioned earlier, I'm sick just now and will be back in hospital again 
 tomorrow & Monday, so a patch may be a little bit much to ask from me here 
 :-) It's a bit much even keeping up with the posts on the thread so far.

 I might try to fix the documentation a bit later, though as someone with no 
 experience in marking up volatility on pl/pgsql functions I doubt my efforts 
 would be that great. I also have other OSS project contributions that need 
 some attention first.

 Re: the google effect. Are these mailing list archives mirrored anywhere, 
 incidentally? For example, I notice we just lost http:reddit.com/r/amd at the 
 weekend, all the discussion of the last few years on that forum is out of 
 reach.

The community maintains its own mailing list archives on
postgresql.org.  Short of an array of tactical nuclear strikes this is
going to be preserved because it represents the history of the project
and in many ways is as important as the source code itself.

The archives are also mirrored by a number of high quality providers
such as nabble (which tend to beat our archives in google rankings --
likely due to the improved interface).

merlin




Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell

On 09 Jul 2015, at 17:42, Merlin Moncure mmonc...@gmail.com wrote:

 The community maintains its own mailing list archives on
 postgresql.org.  Short of an array of tactical nuclear strikes this is
 going to be preserved 

Good to know, I've seen a lot of dead software projects throughout my life. 

But still - we will have to pray that Kim Jong Un never decides to become a 
MySQL contributor... :)

Graeme.



Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Andres Freund
On 2015-07-08 23:38:38 -0400, Tom Lane wrote:
 and...@anarazel.de (Andres Freund) writes:
  On 2015-07-08 15:38:24 -0700, Craig James wrote:
  From my admittedly naive point of view, it's hard to see why any of this
  matters. I have functions that do purely CPU-intensive mathematical
  calculations ... you could imagine something like is_prime(N) that
  determines if N is a prime number. I have eight clients that connect to
  eight backends. Each client issues an SQL command like, select
  is_prime(N) where N is a simple number.

  I mostly replied to Merlin's general point (additionally in the context of
  plpgsql).

  But I have a hard time seing that postgres would be the bottleneck for a
  is_prime() function (or something with similar characteristics) that's
  written in C where the average runtime is more than, say, a couple
  thousand cyles.  I'd like to see a profile of that.

 But that was not the case that Graeme was complaining about.

No, Craig was complaining about that case...

 One of my Salesforce colleagues has been looking into ways that we could
 decide to skip the per-statement snapshot acquisition even in volatile
 functions, if we could be sure that a particular statement isn't going to
 do anything that would need a snapshot.

Yea, I actually commented about that on IRC as well.

I was thinking about actually continuing to get a snapshot, but mark it
as 'complete on usage'. I.e. call GetSnapshotData() only when the
snapshot is used to decide about visibility.  We probably can't do that
in the toplevel visibility case because it'll probably have noticeable
semantic effects, but ISTM it should be doable for the volatile function
using spi case.




Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell
On 08 Jul 2015, at 22:27, Andres Freund and...@anarazel.de wrote:

 On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
 On Wed, Jul 8, 2015 at 12:48 PM, Craig James cja...@emolecules.com wrote:
 
 Well, right, which is why I mentioned even with dozens of clients.
 Shouldn't that scale to at least all of the CPUs in use if the function is
 CPU intensive (which it is)?
 
 only in the absence of inter-process locking and cache line bouncing.
 
 And additionally memory bandwidth (shared between everything, even in
 the numa case), cross socket/bus bandwidth (absolutely performance
 critical in multi-socket configurations), cache capacity (shared between
 cores, and sometimes even sockets!).

1. Note for future readers - it's also worth noting that depending on the 
operation, and on your hardware, you may have fewer CPU cores than you think 
to parallelise on.

1a. For example AMD CPUs list the number of integer cores (e.g. 16), but there 
is actually only half as many cores available for floating point work (8). So 
if your functions need to use floating point, your scaling will suffer badly on 
FP functions. 

https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)
  In terms of hardware complexity and functionality, this module is equal to 
a dual-core processor in its integer power, and to a single-core processor in 
its floating-point power: for each two integer cores, there is one 
floating-point core.


1b. Or, if you have hyper-threading enabled on an Intel CPU, you may think you 
have e.g. 8 cores, but if all the threads are running the same type of 
operation repeatedly, it won't be possible for the hyper-threading to work well 
and you'll only get 4 in practice. Maybe less due to overheads. Or, if your 
work is continually going to main memory for data (e.g. limited by the memory 
bus), it will run at 4-core speed, because the cores have to share the same 
memory bus. 

Hyper-threading depends on the 2 logical cores being asked to perform two 
different types of tasks at once (each having relatively lower demands on 
memory).

When execution resources would not be used by the current task in a processor 
without hyper-threading, and especially when the processor is stalled, a 
hyper-threading equipped processor can use those execution resources to execute 
another scheduled task.
https://en.wikipedia.org/wiki/Hyper-threading
https://en.wikipedia.org/wiki/Superscalar


2. Keep in mind also when benchmarking that it's normal to see a small 
drop-off when you hit the maximum number of cores for your system. 
After all, the O/S and the benchmark program and anything else you have running 
will need a core or two.

 





Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell
On 09 Jul 2015, at 05:38, Tom Lane t...@sss.pgh.pa.us wrote:

 If you
 write your is_prime function purely in plpgsql, and don't bother to mark
 it nonvolatile, *it will not scale*.  

 much for properly written plpgsql; but there's an awful lot of bad plpgsql
 code out there, and it can make a huge difference for that.


Hi Tom, 

I object to phrases like 'don't bother to mark it' and 'bad plpgsql' here. That 
is putting the blame on programmers. Clearly, if there is no end of code out 
there that isn't right in this regard, there's something very wrong in the 
project documentation.

1. I have been writing pl/pgsql on and off for a couple of years now and I've 
read quite a bit of the postgres documentation, but I don't recall seeing a 
clear statement telling me I should mark pl/pgsql functions nonvolatile 
wherever possible or throw all performance and scalability out the window. I'm 
sure there may be a line hidden somewhere in the docs, but judging from the 
impact it has in practice, this seems like a very fundamental concept that 
should be repeatedly and clearly marked in the docs. 

2. Furthermore, I have never come across anything in the documentation that 
made it clear to me that any pl/pgsql function I write will, by default, be 
taking out locks for every single statement in the code. I've written code in I 
dunno, maybe 15-20 different languages in my life, and I can't think of another 
language offhand that does that by default. From the reactions on this thread 
to this benchmark and the par_psql benchmarks, it doesn't seem that it was even 
immediately obvious to many postgres enthusiasts and developers.

3. I don't disagree that the benchmark code is objectively 'bad' in the sense 
that it is missing an important optimisation. 

But I really don't think it helps to frame this as laziness or bad in any 
other sense of the word e.g. 'clumsy'.

Let's look at the postgresql documentation for some examples of 'bad' and lazy 
code: 

http://www.postgresql.org/docs/9.3/static/plpgsql-structure.html
http://www.postgresql.org/docs/9.3/static/plpgsql-declarations.html

There are about 13 functions on that page.
How many functions on that page make use of nonvolatile or immutable markings 
wherever it would be appropriate?  
zero.

or this one: 
http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html
zero

or this one: 
http://www.postgresql.org/docs/9.3/static/plpgsql-cursors.html#PLPGSQL-CURSOR-USING
zero

The reason 90% of people out there are 'not bothering' and 'writing bad code' 
is because **99% of the postgresql documentation teaches them to do it that 
way**. 

So when you talk about other people 'not bothering' to do things - who is 
really at fault here for what you see as endemic 'bad' or 'lazy' code? Is 
it the new postgres programmers, or the people that taught them with bad 
examples consistently throughout the *entire project documentation*, starting 
from the very first example? 

I think I'm going to raise this as a documentation bug. 

Graeme. 











Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Merlin Moncure
On Wed, Jul 8, 2015 at 5:38 PM, Craig James cja...@emolecules.com wrote:
 On Wed, Jul 8, 2015 at 1:27 PM, Andres Freund and...@anarazel.de wrote:

 On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
  On Wed, Jul 8, 2015 at 12:48 PM, Craig James cja...@emolecules.com
  wrote:
   On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake
   j...@commandprompt.com
   Using Apache Fast-CGI, you are going to fork a process for each
   instance
   of the function being executed and that in turn will use all CPUs up
   to the
   max available resource.
  
   With PostgreSQL, that isn't going to happen unless you are running
   (at
   least) 8 functions across 8 connections.
  
  
   Well, right, which is why I mentioned even with dozens of clients.
   Shouldn't that scale to at least all of the CPUs in use if the
   function is
   CPU intensive (which it is)?
 
  only in the absence of inter-process locking and cache line bouncing.

 And additionally memory bandwidth (shared between everything, even in
 the numa case), cross socket/bus bandwidth (absolutely performance
 critical in multi-socket configurations), cache capacity (shared between
 cores, and sometimes even sockets!).


 From my admittedly naive point of view, it's hard to see why any of this
 matters. I have functions that do purely CPU-intensive mathematical
 calculations ... you could imagine something like is_prime(N) that
 determines if N is a prime number. I have eight clients that connect to
 eight backends. Each client issues an SQL command like, select is_prime(N)
 where N is a simple number.

 Are you saying that in order to calculate is_prime(N), all of that stuff
 (inter-process locking, memory bandwith, bus bandwidth, cache capacity,
 etc.) is even relevant? And if so, how is it that Postgres is so different
 from an Apache fast-CGI program that runs the exact same is_prime(N)
 calculation?

 Just curious ... as I said, I've already implemented a different solution.

If your is_prime() was written in C and was written so that it did not
utilize the database API, it should scale up quite nicely.  This can
be easily proved.  On my quad core workstation,

postgres=# select 12345! * 0;
 ?column?
──
0
(1 row)

Time: 10435.554 ms


...which is heavily cpu bound, takes about 10 seconds.  scaling out to
4 threads via:

time ~/pgdev/bin/pgbench -n -t1 -c4 -f <(echo "select 12345! * 0;")

yields:
real 0m11.317s
user 0m0.001s
sys 0m0.005s

...I'll call that pretty good scaling.  The reason why this scales so
well is that the numeric code is all operating on local data
structures and is not involving backend componentry with its various
attached complexity, such as having to be checked for being visible to
the current transaction.

I submit that toy benchmarks like factoring or pi digits are not
really good indicators of language scaling and performance because
just about all real world code involves data structures, i/o, memory
allocation, amateur coders, etc.   Java tends to approach C in
benchmark shootouts but woefully underperforms my expectations
relative to C in code that does things that's actually useful (aside:
if you think I'm knocking java, the situation is even worse with most
other languages I come across).

pl/pgsql is simply not optimized for that style of coding, although if
you know postgres you can start to tickle the limits of what's
expected from the language.  If that isn't working for you, pl/v8
strikes me as the best alternative due to its performance and good
integration with postgres data structures (in fact, I'd be arguing for
it to be moved to core if the v8 dependency wasn't so capricious).
Either way, I'll advocate any solution that allows you to code inside
the database environment as opposed to the client side.

merlin




Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Merlin Moncure
On Thu, Jul 9, 2015 at 4:44 AM, Graeme B. Bell graeme.b...@nibio.no wrote:
 On 09 Jul 2015, at 05:38, Tom Lane t...@sss.pgh.pa.us wrote:

 If you
 write your is_prime function purely in plpgsql, and don't bother to mark
 it nonvolatile, *it will not scale*.

 much for properly written plpgsql; but there's an awful lot of bad plpgsql
 code out there, and it can make a huge difference for that.


 Hi Tom,

 I object to phrases like 'don't bother to mark it' and 'bad plpgsql' here. 
 That is putting the blame on programmers. Clearly, if there is no end of code 
 out there that isn't right in this regard, there's something very wrong in 
 the project documentation.

 1. I have been writing pl/pgsql on and off for a couple of years now and I've 
 read quite a bit of the postgres documentation, but I don't recall seeing a 
 clear statement telling me I should mark pl/pgsql functions nonvolatile 
 wherever possible or throw all performance and scalability out the window. 
 I'm sure there may be a line hidden somewhere in the docs, but judging from 
 the impact it has in practice, this seems like a very fundamental concept 
 that should be repeatedly and clearly marked in the docs.

 2. Furthermore, I have never come across anything in the documentation that 
 made it clear to me that any pl/pgsql function I write will, by default, be 
 taking out locks for every single statement in the code. I've written code in 
 I dunno, maybe 15-20 different languages in my life, and I can't think of 
 another language offhand that does that by default. From the reactions on 
 this thread to this benchmark and the par_psql benchmarks, it doesn't seem 
 that it was even immediately obvious to many postgres enthusiasts and 
 developers.

 3. I don't disagree that the benchmark code is objectively 'bad' in the sense 
 that it is missing an important optimisation.

Particularly with regards documentation, a patch improving things is
much more likely to improve the situation than griping.  Also,
conversation on this list gets recorded for posterity and google is
remarkably good at matching people looking for problems with
solutions.  So, even in absence of a patch perhaps we've made the
lives of future head-scratchers a little bit easier with this
discussion.

merlin




Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Graeme B. Bell
 On 07/07/2015 08:05 PM, Craig James wrote:
 
 
 No ideas, but I ran into the same thing. I have a set of C/C++ functions
 that put some chemistry calculations into Postgres as extensions (things
 like, calculate the molecular weight of this molecule). As SQL
 functions, the whole thing bogged down, and we never got the scalability
 we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at
 the same time, even with dozens of clients.


Hi all,

The sample code / results were put up last night at http://github.com/gbb/t

Craig's problem sounds similar to my own, assuming he means running C 
indirectly via SQL vs running C more directly.
Lots of parallel connections to postgres but maximum 2 CPUs of scaling (and it 
gets worse, as you try to run more things).

Tom Lane has posted an interesting comment over on the bugs list which 
identifies a likely source of at least one of the problems, maybe both. 
It seems to be linked to internal locking inside postgres (which makes sense, 
given the results - both problems feel 'lock-y').
Also, he mentions a workaround for some functions that scales to 8-way 
apparently. 

http://www.postgresql.org/message-id/31265.1436317...@sss.pgh.pa.us

I think it's potentially a big problem for CPU intensive postgres libraries 
like pgrouting, or perhaps the postgis & postgis raster functions, things like 
that.
I don't know how well their functions are marked for e.g. immutability. 
Are there any postgis devs on this list?
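(For anyone wanting to check an extension themselves, a quick sketch: function 
volatility is recorded in the pg_proc catalog as 'i' for immutable, 's' for 
stable, or 'v' for volatile. The schema name below is a placeholder.)

```sql
-- Sketch: list each installed function and its volatility marking.
-- Adjust the schema name to wherever the extension was installed.
SELECT p.proname, p.provolatile
FROM   pg_proc p
JOIN   pg_namespace n ON n.oid = p.pronamespace
WHERE  n.nspname = 'public'
ORDER  BY p.proname;
```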

Graeme Bell





Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Merlin Moncure
On Wed, Jul 8, 2015 at 12:48 PM, Craig James cja...@emolecules.com wrote:
 On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake j...@commandprompt.com
 wrote:


 On 07/07/2015 08:05 PM, Craig James wrote:



 No ideas, but I ran into the same thing. I have a set of C/C++ functions
 that put some chemistry calculations into Postgres as extensions (things
 like, calculate the molecular weight of this molecule). As SQL
 functions, the whole thing bogged down, and we never got the scalability
 we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at
 the same time, even with dozens of clients.

 When I moved these same functions into an Apache fast-CGI HTTP service
 (exact same code, same network overhead), I could easily scale up and
 use the full 100% of all eight CPUs.

 I have no idea why, and never investigated further. The convenience of
 having the functions in SQL wasn't that important.


 I admit that I haven't read this whole thread but:

 Using Apache Fast-CGI, you are going to fork a process for each instance
 of the function being executed and that in turn will use all CPUs up to the
 max available resource.

 With PostgreSQL, that isn't going to happen unless you are running (at
 least) 8 functions across 8 connections.


 Well, right, which is why I mentioned even with dozens of clients.
 Shouldn't that scale to at least all of the CPUs in use if the function is
 CPU intensive (which it is)?

only in the absence of inter-process locking and cache line bouncing.

merlin




Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Craig James
On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake j...@commandprompt.com
wrote:


 On 07/07/2015 08:05 PM, Craig James wrote:



 No ideas, but I ran into the same thing. I have a set of C/C++ functions
 that put some chemistry calculations into Postgres as extensions (things
 like, calculate the molecular weight of this molecule). As SQL
 functions, the whole thing bogged down, and we never got the scalability
 we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at
 the same time, even with dozens of clients.

 When I moved these same functions into an Apache fast-CGI HTTP service
 (exact same code, same network overhead), I could easily scale up and
 use the full 100% of all eight CPUs.

 I have no idea why, and never investigated further. The convenience of
 having the functions in SQL wasn't that important.


 I admit that I haven't read this whole thread but:

 Using Apache Fast-CGI, you are going to fork a process for each instance
 of the function being executed and that in turn will use all CPUs up to the
 max available resource.

 With PostgreSQL, that isn't going to happen unless you are running (at
 least) 8 functions across 8 connections.


Well, right, which is why I mentioned even with dozens of clients.
Shouldn't that scale to at least all of the CPUs in use if the function is
CPU intensive (which it is)?

Craig




 JD

 --
 Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
 PostgreSQL Centered full stack support, consulting and development.
 Announcing I'm offended is basically telling the world you can't
 control your own emotions, so everyone else should do it for you.




-- 
-
Craig A. James
Chief Technology Officer
eMolecules, Inc.
-


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Joshua D. Drake


On 07/08/2015 10:48 AM, Craig James wrote:


I admit that I haven't read this whole thread but:

Using Apache Fast-CGI, you are going to fork a process for each
instance of the function being executed and that in turn will use
all CPUs up to the max available resource.

With PostgreSQL, that isn't going to happen unless you are running
(at least) 8 functions across 8 connections.


Well, right, which is why I mentioned even with dozens of clients.
Shouldn't that scale to at least all of the CPUs in use if the function
is CPU intensive (which it is)?


In theory but that isn't PostgreSQL that does that, it will be the 
kernel scheduler. Although (and I am grasping at straws):


I wonder if the execution is taking place outside of the backend proper 
or... are you using a pooler?


JD




Craig




JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




--
-
Craig A. James
Chief Technology Officer
eMolecules, Inc.
-



--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Craig James
On Wed, Jul 8, 2015 at 10:52 AM, Joshua D. Drake j...@commandprompt.com
wrote:


 On 07/08/2015 10:48 AM, Craig James wrote:

  I admit that I haven't read this whole thread but:

 Using Apache Fast-CGI, you are going to fork a process for each
 instance of the function being executed and that in turn will use
 all CPUs up to the max available resource.

 With PostgreSQL, that isn't going to happen unless you are running
 (at least) 8 functions across 8 connections.


 Well, right, which is why I mentioned even with dozens of clients.
 Shouldn't that scale to at least all of the CPUs in use if the function
 is CPU intensive (which it is)?


 In theory but that isn't PostgreSQL that does that, it will be the kernel
 scheduler. Although (and I am grasping at straws):

 I wonder if the execution is taking place outside of the backend proper
 or... are you using a pooler?


No pooler, and the functions were in an ordinary SQL extension .so library
and loaded as

   CREATE OR REPLACE FUNCTION funcname( ... ) returns ...
   AS 'libxxx.so', 'funcname' LANGUAGE C STRICT IMMUTABLE COST 1;

Craig


 JD



 Craig




 JD

 --
 Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
 PostgreSQL Centered full stack support, consulting and development.
 Announcing I'm offended is basically telling the world you can't
 control your own emotions, so everyone else should do it for you.




 --
 -
 Craig A. James
 Chief Technology Officer
 eMolecules, Inc.
 -



 --
 Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
 PostgreSQL Centered full stack support, consulting and development.
 Announcing I'm offended is basically telling the world you can't
 control your own emotions, so everyone else should do it for you.




-- 
-
Craig A. James
Chief Technology Officer
eMolecules, Inc.
-


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Andres Freund
On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
 On Wed, Jul 8, 2015 at 12:48 PM, Craig James cja...@emolecules.com wrote:
  On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake j...@commandprompt.com
  Using Apache Fast-CGI, you are going to fork a process for each instance
  of the function being executed and that in turn will use all CPUs up to the
  max available resource.
 
  With PostgreSQL, that isn't going to happen unless you are running (at
  least) 8 functions across 8 connections.
 
 
  Well, right, which is why I mentioned even with dozens of clients.
  Shouldn't that scale to at least all of the CPUs in use if the function is
  CPU intensive (which it is)?
 
 only in the absence of inter-process locking and cache line bouncing.

And additionally memory bandwidth (shared between everything, even in
the NUMA case), cross-socket/bus bandwidth (absolutely performance
critical in multi-socket configurations), and cache capacity (shared
between cores, and sometimes even sockets!).




Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Tom Lane
and...@anarazel.de (Andres Freund) writes:
 On 2015-07-08 15:38:24 -0700, Craig James wrote:
 From my admittedly naive point of view, it's hard to see why any of this
 matters. I have functions that do purely CPU-intensive mathematical
 calculations ... you could imagine something like is_prime(N) that
 determines if N is a prime number. I have eight clients that connect to
 eight backends. Each client issues an SQL command like, select
 is_prime(N) where N is a simple number.

 I mostly replied to Merlin's general point (additionally in the context of
 plpgsql).

 But I have a hard time seeing that postgres would be the bottleneck for an
 is_prime() function (or something with similar characteristics) that's
 written in C where the average runtime is more than, say, a couple
 thousand cycles.  I'd like to see a profile of that.

But that was not the case that Graeme was complaining about.  He's talking
about simple-arithmetic-and-looping written in plpgsql, in a volatile
function that is going to take a new snapshot for every statement, even if
that's only n := n+1.  So it's going to spend a substantial fraction of
its runtime banging on the ProcArray, and that doesn't scale.  If you
write your is_prime function purely in plpgsql, and don't bother to mark
it nonvolatile, *it will not scale*.  It'll be slow even in single-thread
terms, but it'll be particularly bad if you're saturating a multicore
machine with it.
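To make the advice concrete: the fix Tom describes is a single keyword on the function definition. Below is a hypothetical pure-pl/pgsql is_prime (the name borrowed from Craig's example, not actual code from this thread); without a volatility label it defaults to VOLATILE, while IMMUTABLE is the strictest category that is valid for pure arithmetic and avoids the per-statement snapshot acquisition:

```sql
-- Hypothetical sketch of a pure-arithmetic pl/pgsql function.
-- Without the IMMUTABLE label it defaults to VOLATILE, in which case
-- every statement executed inside the loop takes a fresh snapshot,
-- contending on the ProcArray under concurrency.
CREATE OR REPLACE FUNCTION is_prime(n bigint) RETURNS boolean AS $$
DECLARE
    i bigint := 3;
BEGIN
    IF n < 2 THEN RETURN false; END IF;
    IF n % 2 = 0 THEN RETURN n = 2; END IF;
    WHILE i * i <= n LOOP
        IF n % i = 0 THEN RETURN false; END IF;
        i := i + 2;
    END LOOP;
    RETURN true;
END;
$$ LANGUAGE plpgsql IMMUTABLE;  -- the volatility label is the whole fix
```

The body is only a sketch; the point is the label on the last line.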

One of my Salesforce colleagues has been looking into ways that we could
decide to skip the per-statement snapshot acquisition even in volatile
functions, if we could be sure that a particular statement isn't going to
do anything that would need a snapshot.  Now, IMO that doesn't really do
much for properly written plpgsql; but there's an awful lot of bad plpgsql
code out there, and it can make a huge difference for that.

regards, tom lane




Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Craig James
On Wed, Jul 8, 2015 at 1:27 PM, Andres Freund and...@anarazel.de wrote:

 On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
  On Wed, Jul 8, 2015 at 12:48 PM, Craig James cja...@emolecules.com
 wrote:
   On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake j...@commandprompt.com
 
   Using Apache Fast-CGI, you are going to fork a process for each
 instance
   of the function being executed and that in turn will use all CPUs up
 to the
   max available resource.
  
   With PostgreSQL, that isn't going to happen unless you are running (at
   least) 8 functions across 8 connections.
  
  
   Well, right, which is why I mentioned even with dozens of clients.
   Shouldn't that scale to at least all of the CPUs in use if the
 function is
   CPU intensive (which it is)?
 
  only in the absence of inter-process locking and cache line bouncing.

 And additionally memory bandwidth (shared between everything, even in
 the NUMA case), cross-socket/bus bandwidth (absolutely performance
 critical in multi-socket configurations), and cache capacity (shared
 between cores, and sometimes even sockets!).


From my admittedly naive point of view, it's hard to see why any of this
matters. I have functions that do purely CPU-intensive mathematical
calculations ... you could imagine something like is_prime(N) that
determines if N is a prime number. I have eight clients that connect to
eight backends. Each client issues an SQL command like, select
is_prime(N) where N is a simple number.

Are you saying that in order to calculate is_prime(N), all of that stuff
(inter-process locking, memory bandwith, bus bandwidth, cache capacity,
etc.) is even relevant? And if so, how is it that Postgres is so different
from an Apache fast-CGI program that runs the exact same is_prime(N)
calculation?

Just curious ... as I said, I've already implemented a different solution.

Craig


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Andres Freund
On 2015-07-08 15:38:24 -0700, Craig James wrote:
 From my admittedly naive point of view, it's hard to see why any of this
 matters. I have functions that do purely CPU-intensive mathematical
 calculations ... you could imagine something like is_prime(N) that
 determines if N is a prime number. I have eight clients that connect to
 eight backends. Each client issues an SQL command like, select
 is_prime(N) where N is a simple number.

I mostly replied to Merlin's general point (additionally in the context of
plpgsql).

But I have a hard time seeing that postgres would be the bottleneck for an
is_prime() function (or something with similar characteristics) that's
written in C where the average runtime is more than, say, a couple
thousand cycles.  I'd like to see a profile of that.

Andres




[PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-07 Thread Graeme B. Bell
Hi everyone,

I've written a new open source tool for easily parallelising SQL scripts in 
postgres.   [obligatory plug:   https://github.com/gbb/par_psql   ]

Using it, I'm seeing a problem I've seen in other postgres projects involving 
parallelisation in the last 12 months.

Basically:

- I have machines here with up to 16 CPUs and 128GB memory, very fast SSDs and 
controller etc, carefully configured kernel/postgresql.conf for high 
performance.

- Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), 
e.g. almost up to 16x performance improvement.

- Calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT 
myfunction(some_stuff)) do not parallelise well, even when they are independent 
or accessing tables in a read-only way. They hit a limit at 2.5x performance 
improvement relative to single-CPU performance (pg9.4) and 2x performance 
(pg9.3). This is about 6 times slower than I'm expecting. 

- Can't see what would be locking. It seems like it's the pl/pgsql environment 
itself that is somehow locking or incurring some huge frictional costs. Whether 
I use independently defined functions, independent source tables, independent 
output tables, makes no difference whatsoever, so it doesn't feel 'locky'. It 
also doesn't seem to be WAL/synchronisation related, as the machines I'm using 
can hit absurdly high pgbench rates, and I'm using unlogged tables.
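(For concreteness, the functions in question look roughly like this hypothetical stand-in — a pure-arithmetic pl/pgsql loop with no table access; the real benchmark code is in the repository linked below:)

```sql
-- Hypothetical stand-in for the benchmarked functions: pure arithmetic,
-- no table access, no WAL traffic. Note that with no volatility label
-- it defaults to VOLATILE, which turns out to matter for the scaling
-- behaviour discussed in this thread.
CREATE OR REPLACE FUNCTION burn_cpu(iters integer) RETURNS double precision AS $$
DECLARE
    acc double precision := 0;
BEGIN
    FOR i IN 1..iters LOOP
        acc := acc + sqrt(i::double precision);
    END LOOP;
    RETURN acc;
END;
$$ LANGUAGE plpgsql;
```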

Curious? Take a quick peek here: 
https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md

Wondering what I'm missing here. Any ideas?

Graeme. 



Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-07 Thread Craig James
On Thu, Jul 2, 2015 at 9:15 AM, Graeme B. Bell graeme.b...@nibio.no wrote:

 Hi everyone,

 I've written a new open source tool for easily parallelising SQL scripts
 in postgres.   [obligatory plug:   https://github.com/gbb/par_psql   ]

 Using it, I'm seeing a problem I've seen in other postgres projects
 involving parallelisation in the last 12 months.

 Basically:

 - I have machines here with up to 16 CPUs and 128GB memory, very fast SSDs
 and controller etc, carefully configured kernel/postgresql.conf for high
 performance.

 - Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff
 ...), e.g. almost up to 16x performance improvement.

 - Calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT
 myfunction(some_stuff)) do not parallelise well, even when they are
 independent or accessing tables in a read-only way. They hit a limit at
 2.5x performance improvement relative to single-CPU performance (pg9.4) and
 2x performance (pg9.3). This is about 6 times slower than I'm expecting.

 - Can't see what would be locking. It seems like it's the pl/pgsql
 environment itself that is somehow locking or incurring some huge
 frictional costs. Whether I use independently defined functions,
 independent source tables, independent output tables, makes no difference
 whatsoever, so it doesn't feel 'locky'. It also doesn't seem to be
 WAL/synchronisation related, as the machines I'm using can hit absurdly
 high pgbench rates, and I'm using unlogged tables.

 Curious? Take a quick peek here:
 https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md

 Wondering what I'm missing here. Any ideas?


No ideas, but I ran into the same thing. I have a set of C/C++ functions
that put some chemistry calculations into Postgres as extensions (things
like, calculate the molecular weight of this molecule). As SQL functions,
the whole thing bogged down, and we never got the scalability we needed. On
our 8-CPU setup, we couldn't get more than 2 CPUs busy at the same time,
even with dozens of clients.

When I moved these same functions into an Apache fast-CGI HTTP service
(exact same code, same network overhead), I could easily scale up and use
the full 100% of all eight CPUs.

I have no idea why, and never investigated further. The convenience of
having the functions in SQL wasn't that important.

Craig



 Graeme.





-- 
-
Craig A. James
Chief Technology Officer
eMolecules, Inc.
-


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-07 Thread Joshua D. Drake


On 07/07/2015 08:05 PM, Craig James wrote:



No ideas, but I ran into the same thing. I have a set of C/C++ functions
that put some chemistry calculations into Postgres as extensions (things
like, calculate the molecular weight of this molecule). As SQL
functions, the whole thing bogged down, and we never got the scalability
we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at
the same time, even with dozens of clients.

When I moved these same functions into an Apache fast-CGI HTTP service
(exact same code, same network overhead), I could easily scale up and
use the full 100% of all eight CPUs.

I have no idea why, and never investigated further. The convenience of
having the functions in SQL wasn't that important.


I admit that I haven't read this whole thread but:

Using Apache Fast-CGI, you are going to fork a process for each instance 
of the function being executed and that in turn will use all CPUs up to 
the max available resource.


With PostgreSQL, that isn't going to happen unless you are running (at 
least) 8 functions across 8 connections.


JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.

