Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 12/4/15 4:15 AM, Peter J. Holzer wrote: On 2015-12-03 10:02:18 -0500, Tom Lane wrote: >"Peter J. Holzer" writes: > >Can those signals be safely ignored? Just blocking them (so that they > >are delivered after the UDF finishes) might be safer. But even that may > >be a problem: If the UDF then executes some SQL, could that rely on > >signals being delivered? I have no idea. > >The minute you start fooling with a backend's signal behavior, we're >going to politely refuse to support whatever breakage you run into. As I understood Jim he was talking about possible changes to postgresql to shield UDFs from those signals, not something the author of a UDF should do. No, I was suggesting that: 1) It might possible for Postgres error handling to detect that a signal had been received since entering a UDF if the UDF reported an error, and make that information available as part of the standard error report. 2) If there are alarms being set specifically for authentication, they should probably be cancelled after authentication is successful. I certainly do NOT think it would be wise to disable all signal handling during UDF execution. Just for starters, that means that if a UDF went into an infinite loop your only way to recover would be to PANIC the entire database. It would probably create a bunch of other problems as well. In other words, UDFs *must* be capable of handling an interrupt. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-03 10:02:18 -0500, Tom Lane wrote: > "Peter J. Holzer" writes: > > Can those signals be safely ignored? Just blocking them (so that they > > are delivered after the UDF finishes) might be safer. But even that may > > be a problem: If the UDF then executes some SQL, could that rely on > > signals being delivered? I have no idea. > > The minute you start fooling with a backend's signal behavior, we're > going to politely refuse to support whatever breakage you run into. As I understood Jim he was talking about possible changes to postgresql to shield UDFs from those signals, not something the author of a UDF should do. > We aren't sending those signals just for amusement's sake. Right. That's why I was sceptical whether those signals could be ignored. I wouldn't have thought so, but Jim clearly knows a lot more about the inner workings of postgresql than I do (which is easy - I know almost nothing) and maybe he knows of a way (something like "we can ignore signals while executing the UDF and just assume that we missed at least one signal and call the magic synchronize state function afterwards") hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
"Peter J. Holzer" writes: > Can those signals be safely ignored? Just blocking them (so that they > are delivered after the UDF finishes) might be safer. But even that may > be a problem: If the UDF then executes some SQL, could that rely on > signals being delivered? I have no idea. The minute you start fooling with a backend's signal behavior, we're going to politely refuse to support whatever breakage you run into. We aren't sending those signals just for amusement's sake. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-02 19:07:55 -0600, Jim Nasby wrote: > On 12/2/15 9:26 AM, Peter J. Holzer wrote: > >As explained in backend/utils/misc/timeout.c, the timers are never > >cancelled: If a timeout is cancelled, postgres just sees that it has > >nothing to do and resumes whatever it is doing. > > Hrm, if those timers are really just for auth purposes then perhaps they > should be cancelled. But aside from that, there's certainly other things > that can signal a backend (including fairly normal things, like DDL). Yep. I noticed that, too. In one of my test runs I got two signals instead of the one I expected. Checking the logs I found that it seemed be caused by another user dropping a table. > Offhand I don't think functions run in a CRITICAL block (and I don't think > it'd be a good idea for them to). So really, functions have to be handle > being interrupted. Right. I think that should be mentioned somewhere in the manual. Something like this: Note: PostgreSQL uses signals for various purposes. These signals may be delivered while a user-defined function is executed. Therefore user-defined functions must be able to handle being interrupted, in particular they must expect system calls to fail with errno=EINTR and handle that case appropriately. I'm not sure wether that's an issue with all procedural languages. If it is, it should probable go into "Chapter 39. Procedural Languages". If it is specific to plperl(u), I would put it in "42.8.2. Limitations and Missing Features". > Yeah, it'd be nice to detect that this had happened. Or maybe it's worth it > to ignore SIGALARM while a UDF is running. Can those signals be safely ignored? Just blocking them (so that they are delivered after the UDF finishes) might be safer. But even that may be a problem: If the UDF then executes some SQL, could that rely on signals being delivered? I have no idea. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 12/2/15 9:26 AM, Peter J. Holzer wrote: As explained in backend/utils/misc/timeout.c, the timers are never cancelled: If a timeout is cancelled, postgres just sees that it has nothing to do and resumes whatever it is doing. Hrm, if those timers are really just for auth purposes then perhaps they should be cancelled. But aside from that, there's certainly other things that can signal a backend (including fairly normal things, like DDL). Offhand I don't think functions run in a CRITICAL block (and I don't think it'd be a good idea for them to). So really, functions have to be handle being interrupted. This is also what I'm seeing: 60 seconds after start, the process receives a SIGALRM. If the process is idle or in a "normal" SQL statement at the time, thats not a problem. But if it is in one of my stored procedures which is currently calling a ØMQ function which is waiting for some I/O (zmq_msg_recv(), most likely), that gets interrupted and returns an error which my code doesn't know how to handle (yet). So the error gets back to the user. Yeah, it'd be nice to detect that this had happened. Or maybe it's worth it to ignore SIGALARM while a UDF is running. (We certainly wouldn't want to do a HOLD_INTERRUPTS() though...) -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 20:55:02 +0100, Peter J. Holzer wrote: > On 2015-12-01 18:58:31 +0100, Peter J. Holzer wrote: > > I suspect such an interaction because I cannot reproduce the problem > > outside of a stored procedure. A standalone Perl script doing the same > > requests doesn't get a timeout. [...] > The strace doesn't show a reason for the SIGALRM, though. No alarm(2) or > setitimer(2) system call (I connected strace to a running postgres > process just after I got the prompt from "psql" and before I typed > "select * from mb_search('export');" (I used a different (but very > similar) stored procedure for those tests because it is much easier to > find a search which is slow enough to trigger a timeout at least > sometimes than a data request (which normally finishes in > milliseconds)). > > So I guess my next task will be to find out where that SIGALRM comes > from and/or whether I can just restart the zmq_msg_recv if it happens. Ok, I think I know where that SIGALRM comes from: It's the AuthenticationTimeout. What I'm seeing in strace (if I attach it early enough) is that during authentication the postgres worker process calls setitimer with a 60 second timeout twice. This matches the comment in backend/postmaster/postmaster.c: * Note: AuthenticationTimeout is applied here while waiting for the * startup packet, and then again in InitPostgres for the duration of any * authentication operations. So a hostile client could tie up the * process for nearly twice AuthenticationTimeout before we kick him off. As explained in backend/utils/misc/timeout.c, the timers are never cancelled: If a timeout is cancelled, postgres just sees that it has nothing to do and resumes whatever it is doing. This is also what I'm seeing: 60 seconds after start, the process receives a SIGALRM. If the process is idle or in a "normal" SQL statement at the time, thats not a problem. But if it is in one of my stored procedures which is currently calling a ØMQ function which is waiting for some I/O (zmq_msg_recv(), most likely), that gets interrupted and returns an error which my code doesn't know how to handle (yet). So the error gets back to the user. A strange interaction between postgres and ØMQ indeed. But now that I know what's causing it I can handle that. Thanks for your patience. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 13:13:27 -0500, Tom Lane wrote: > "Peter J. Holzer" writes: > > Postgres worker processes are single-threaded, are they? Is there > > something else which could interact badly with a moderately complex > > multithreaded I/O library used from a stored procedure? > > Yes, lots. If you cause additional threads to appear inside a backend > process, things could break arbitrarily badly. It's up to you to ensure > that none of those extra threads ever escape to execute any non-Perl > code. Actually, non-�MQ code. Perl doesn't like to be unexpectedly multithreaded either. Yes, those threads should only ever execute code from the �MQ library. In fact they are automatically created and destroyed by the library and there is no way to control them from Perl code (there may be a way to do that from the C API, but I don't remember seeing that in the manual). > I suspect this could easily explain the problems you're seeing. Quite. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 18:58:31 +0100, Peter J. Holzer wrote: > I suspect such an interaction because I cannot reproduce the problem > outside of a stored procedure. A standalone Perl script doing the same > requests doesn't get a timeout. > > I guess Alvaro is right: I should strace the postgres worker process > while it executes the stored procedure. The problem of course is that > it happens often enough be annoying, but rarely enough that it's not > easily reproducible. I did manage to catch a timeout once with strace in the mean time, although that one was much more straightforward and less mysterious than the original case: postgres process sends message, about 10 seconds later it receives a SIGALRM which interrupts an epoll, reply hasn't yet arrived, error message to client and log file. No waits in functions which shouldn't wait or messages which arrive much later than they were (presumably) sent. The strace doesn't show a reason for the SIGALRM, though. No alarm(2) or setitimer(2) system call (I connected strace to a running postgres process just after I got the prompt from "psql" and before I typed "select * from mb_search('export');" (I used a different (but very similar) stored procedure for those tests because it is much easier to find a search which is slow enough to trigger a timeout at least sometimes than a data request (which normally finishes in milliseconds)). So I guess my next task will be to find out where that SIGALRM comes from and/or whether I can just restart the zmq_msg_recv if it happens. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 10:20:09 -0800, Adrian Klaver wrote: > On 12/01/2015 09:58 AM, Peter J. Holzer wrote: > >On 2015-12-01 07:16:04 -0800, Adrian Klaver wrote: > >>On 12/01/2015 06:51 AM, Peter J. Holzer wrote: > >>>A rather weird observation from the log files of our server (9.5 beta1): > >>> > >>>2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 > >>>WARNING: Use of uninitialized value $success in concatenation (.) or > >>>string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. > >>>2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 > >>>CONTEXT: PL/Perl function "mb_timeseriesdata_zmq" > >>>[lots of other stuff from different connections] > >>>2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 > >>>ERROR: impossible result '' (payload=) at > >>>/usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. > >>> WDS::Macrobond::Utils::decode_result("") called at line 30 > >>> main::__ANON__("gen_wqehur") called at -e line 0 > >>> eval {...} called at -e line 0 > >>> > >>>Two messages from the same line of the same plperlu stored procedure, 68 > >>>seconds apart. So what is this line 36? > >>> > >>> confess "impossible result '$success' (payload=$payload)"; > >>> > >>>What? The first message clearly comes from interpolating $success > >>>(which is undef at that point) into the argument. The second from > >>>confess itself. What could cause a plperlu procedure to freeze for 68 > >>>seconds between the call to confess and its output? > >>> > >>>Is it possible that only the writing of the log entry is delayed? > >>> > >>>Another weird thing: $success is undef because a ØMQ rpc call[1] timed > >> > >>And the call is? > > > >The sequence is: > > > >my $req_sck = zmq_socket($context, ZMQ_REQ); > >zmq_connect($req_sck, $url); > >my $qry_msg = join(...); > >zmq_send($req_sck, $qry_msg); > >my $res_msg = zmq_msg_init(); > >my $rv = zmq_msg_recv($res_msg, $req_sck); > ># check rv here. > >my $data = zmq_msg_data($res_msg); # $data is "" here > >my $result = WDS::Macrobond::Utils::decode_result($data); # the error > >messages are from this function > > > >(Yeah, ØMQ is quite low-level. There is a higher level Perl Module, but > >I'm not using it). > > > >I omitted that because I don't think it's terribly relevant here. > >Details of the usage of ØMQ are better discussed on the ØMQ mailing > >list. > > > >But there is something else which may be relevant: ØMQ uses threads > >internally, and I don't actually know whether zmq_msg_recv returning > > Except I see this here: > > http://api.zeromq.org/4-0:zmq-socket > > Thread safety > > ØMQ sockets are not thread safe. Applications MUST NOT use a socket from > multiple threads except after migrating a socket from one thread to another > with a "full fence" memory barrier. Well yes, but I don't use a ØMQ socket in multiple threads, It is created in the stored procedure and destroyed at the end (just checked the strace output: Yes it is. For a moment I wasn't sure whether lexical variables in plperlu procedures go out of scope.). It's the ØMQ library itself which creates extra threads (And it should terminate them properly and afaics from strace it does). [...] > From here: > > http://api.zeromq.org/4-0:zmq-connect > > It seems something like(I am not a Perl programmer, so approach carefully): > > my $rc = zmq_connect($req_sck, $url); > > Then you will have an error code to examine. Yes. I have been a bit sloppy with error checking. I check only the return value of the zmq_msg_recv() call which returns the empty message. It is possible that the problem actually occurs earlier and I should check those calls as well. Mea culpa. However, in at least one case the failed call was indeed zmq_msg_recv() not one of the earlier ones (see my next mail). > Have you looked at the Notes at the bottom of this page: > > http://www.postgresql.org/docs/9.4/interactive/plperl-trusted.html I have. I don't think that's a problem here: Debian perl is built with with both multiplicity and ithreads, and I would assume that the .deb packages from postgresql.org use the shared library provided by the system. But even if that wasn't the case it should not be a problem as only plperlu stored procedures are called. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 12/01/2015 09:58 AM, Peter J. Holzer wrote: On 2015-12-01 07:16:04 -0800, Adrian Klaver wrote: On 12/01/2015 06:51 AM, Peter J. Holzer wrote: A rather weird observation from the log files of our server (9.5 beta1): 2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 WARNING: Use of uninitialized value $success in concatenation (.) or string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. 2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 CONTEXT: PL/Perl function "mb_timeseriesdata_zmq" [lots of other stuff from different connections] 2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR: impossible result '' (payload=) at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. WDS::Macrobond::Utils::decode_result("") called at line 30 main::__ANON__("gen_wqehur") called at -e line 0 eval {...} called at -e line 0 Two messages from the same line of the same plperlu stored procedure, 68 seconds apart. So what is this line 36? confess "impossible result '$success' (payload=$payload)"; What? The first message clearly comes from interpolating $success (which is undef at that point) into the argument. The second from confess itself. What could cause a plperlu procedure to freeze for 68 seconds between the call to confess and its output? Is it possible that only the writing of the log entry is delayed? Another weird thing: $success is undef because a ØMQ rpc call[1] timed And the call is? The sequence is: my $req_sck = zmq_socket($context, ZMQ_REQ); zmq_connect($req_sck, $url); my $qry_msg = join(...); zmq_send($req_sck, $qry_msg); my $res_msg = zmq_msg_init(); my $rv = zmq_msg_recv($res_msg, $req_sck); # check rv here. my $data = zmq_msg_data($res_msg); # $data is "" here my $result = WDS::Macrobond::Utils::decode_result($data); # the error messages are from this function (Yeah, ØMQ is quite low-level. There is a higher level Perl Module, but I'm not using it). I omitted that because I don't think it's terribly relevant here. Details of the usage of ØMQ are better discussed on the ØMQ mailing list. But there is something else which may be relevant: ØMQ uses threads internally, and I don't actually know whether zmq_msg_recv returning Except I see this here: http://api.zeromq.org/4-0:zmq-socket Thread safety ØMQ sockets are not thread safe. Applications MUST NOT use a socket from multiple threads except after migrating a socket from one thread to another with a "full fence" memory barrier. means that the read(2) call (or whatever) on the socket terminates. It may actually continue in another thread. But I still don't see how that could block the main thread (or wake it up again in a place which has nothing to do with ØMQ (confess is a standard Perl function to print a stack trace and die)). Or - just thinking aloud here - I fear I'm abusing you guys as support teddy bears[1] - maybe it's the other way round: confess dies, so maybe it frees some lock during cleanup which allows the message which should have been sent by zmq_send to finally go out on the wire. But that still doesn't explain the 68 seconds spent in confess ... Postgres worker processes are single-threaded, are they? Is there something else which could interact badly with a moderately complex multithreaded I/O library used from a stored procedure? I suspect such an interaction because I cannot reproduce the problem outside of a stored procedure. A standalone Perl script doing the same requests doesn't get a timeout. I guess Alvaro is right: I should strace the postgres worker process while it executes the stored procedure. The problem of course is that it happens often enough be annoying, but rarely enough that it's not easily reproducible. From here: http://api.zeromq.org/4-0:zmq-connect It seems something like(I am not a Perl programmer, so approach carefully): my $rc = zmq_connect($req_sck, $url); Then you will have an error code to examine. Have you looked at the Notes at the bottom of this page: http://www.postgresql.org/docs/9.4/interactive/plperl-trusted.html out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem to have a default timeout of 60 seconds, and I don't set one). But at Network timeout? That was my first guess, but I don't see where it would come from. Or why it only is there if I call the code from a stored procedure, not from a standalone script. 09:24:45 (i.e. the time of the error message) the answer for that RPC call arrived. So it kind of looks like confess waited for the message to arrive (which makes no sense at all) or maybe that confess waited for something which also blocked the sending of the request (because according to the server logs, the RPC request only arrived there at 09:24:45 and was answered within 1 second), but that doesn't make any So if the request timed out how did you get a reply, a second r
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
"Peter J. Holzer" writes: > But there is something else which may be relevant: ØMQ uses threads > internally, and I don't actually know whether zmq_msg_recv returning > means that the read(2) call (or whatever) on the socket terminates. ... ugh ... > Postgres worker processes are single-threaded, are they? Is there > something else which could interact badly with a moderately complex > multithreaded I/O library used from a stored procedure? Yes, lots. If you cause additional threads to appear inside a backend process, things could break arbitrarily badly. It's up to you to ensure that none of those extra threads ever escape to execute any non-Perl code. I suspect this could easily explain the problems you're seeing. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 07:16:04 -0800, Adrian Klaver wrote: > On 12/01/2015 06:51 AM, Peter J. Holzer wrote: > >A rather weird observation from the log files of our server (9.5 beta1): > > > >2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 > >WARNING: Use of uninitialized value $success in concatenation (.) or string > >at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. > >2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 > >CONTEXT: PL/Perl function "mb_timeseriesdata_zmq" > >[lots of other stuff from different connections] > >2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR: > > impossible result '' (payload=) at > >/usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. > > WDS::Macrobond::Utils::decode_result("") called at line 30 > > main::__ANON__("gen_wqehur") called at -e line 0 > > eval {...} called at -e line 0 > > > >Two messages from the same line of the same plperlu stored procedure, 68 > >seconds apart. So what is this line 36? > > > > confess "impossible result '$success' (payload=$payload)"; > > > >What? The first message clearly comes from interpolating $success > >(which is undef at that point) into the argument. The second from > >confess itself. What could cause a plperlu procedure to freeze for 68 > >seconds between the call to confess and its output? > > > >Is it possible that only the writing of the log entry is delayed? > > > >Another weird thing: $success is undef because a ØMQ rpc call[1] timed > > And the call is? The sequence is: my $req_sck = zmq_socket($context, ZMQ_REQ); zmq_connect($req_sck, $url); my $qry_msg = join(...); zmq_send($req_sck, $qry_msg); my $res_msg = zmq_msg_init(); my $rv = zmq_msg_recv($res_msg, $req_sck); # check rv here. my $data = zmq_msg_data($res_msg); # $data is "" here my $result = WDS::Macrobond::Utils::decode_result($data); # the error messages are from this function (Yeah, ØMQ is quite low-level. There is a higher level Perl Module, but I'm not using it). I omitted that because I don't think it's terribly relevant here. Details of the usage of ØMQ are better discussed on the ØMQ mailing list. But there is something else which may be relevant: ØMQ uses threads internally, and I don't actually know whether zmq_msg_recv returning means that the read(2) call (or whatever) on the socket terminates. It may actually continue in another thread. But I still don't see how that could block the main thread (or wake it up again in a place which has nothing to do with ØMQ (confess is a standard Perl function to print a stack trace and die)). Or - just thinking aloud here - I fear I'm abusing you guys as support teddy bears[1] - maybe it's the other way round: confess dies, so maybe it frees some lock during cleanup which allows the message which should have been sent by zmq_send to finally go out on the wire. But that still doesn't explain the 68 seconds spent in confess ... Postgres worker processes are single-threaded, are they? Is there something else which could interact badly with a moderately complex multithreaded I/O library used from a stored procedure? I suspect such an interaction because I cannot reproduce the problem outside of a stored procedure. A standalone Perl script doing the same requests doesn't get a timeout. I guess Alvaro is right: I should strace the postgres worker process while it executes the stored procedure. The problem of course is that it happens often enough be annoying, but rarely enough that it's not easily reproducible. > >out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem > >to have a default timeout of 60 seconds, and I don't set one). But at > > Network timeout? That was my first guess, but I don't see where it would come from. Or why it only is there if I call the code from a stored procedure, not from a standalone script. > >09:24:45 (i.e. the time of the error message) the answer for that RPC > >call arrived. So it kind of looks like confess waited for the message to > >arrive (which makes no sense at all) or maybe that confess waited for > >something which also blocked the sending of the request (because > >according to the server logs, the RPC request only arrived there at > >09:24:45 and was answered within 1 second), but that doesn't make any > > So if the request timed out how did you get a reply, a second request? Nope. I don't really "get" the reply. I just see in the logs of the other server that it sent a reply at that time. The time line is like this timepostgres processmb_dal process T zmq_send() zmq_msg_recv() T+60zmq_msg_recv returns an empty message; decode() is called which notices that the message is empty and calls confess(); T+128 confess() prints a receives the message sent at T; stacktrace; sends a reply; hp [1] For
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On Tue, Dec 1, 2015 at 8:51 AM, Peter J. Holzer wrote: > What could cause a plperlu procedure to freeze for 68 > seconds between the call to confess and its output? > Another weird thing: $success is undef because a ØMQ rpc call[1] timed > out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem > to have a default timeout of 60 seconds, and I don't set one). One thing I would check is the TCP keepalive settings. If you don't sort it out, please read this and post with more information; it's tough to guess at a cause without knowing the version of PostgreSQL or the OS involved, etc. https://wiki.postgresql.org/wiki/Guide_to_reporting_problems -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
Peter J. Holzer wrote: > Two messages from the same line of the same plperlu stored procedure, 68 > seconds apart. So what is this line 36? > > confess "impossible result '$success' (payload=$payload)"; > > What? The first message clearly comes from interpolating $success > (which is undef at that point) into the argument. The second from > confess itself. What could cause a plperlu procedure to freeze for 68 > seconds between the call to confess and its output? Try running it under strace. Maybe it's trying to resolve some name and its resolver takes a minute to discover it can't, or something like that. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 12/01/2015 06:51 AM, Peter J. Holzer wrote: A rather weird observation from the log files of our server (9.5 beta1): 2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 WARNING: Use of uninitialized value $success in concatenation (.) or string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. 2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 CONTEXT: PL/Perl function "mb_timeseriesdata_zmq" [lots of other stuff from different connections] 2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR: impossible result '' (payload=) at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. WDS::Macrobond::Utils::decode_result("") called at line 30 main::__ANON__("gen_wqehur") called at -e line 0 eval {...} called at -e line 0 Two messages from the same line of the same plperlu stored procedure, 68 seconds apart. So what is this line 36? confess "impossible result '$success' (payload=$payload)"; What? The first message clearly comes from interpolating $success (which is undef at that point) into the argument. The second from confess itself. What could cause a plperlu procedure to freeze for 68 seconds between the call to confess and its output? Is it possible that only the writing of the log entry is delayed? Another weird thing: $success is undef because a ØMQ rpc call[1] timed And the call is? out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem to have a default timeout of 60 seconds, and I don't set one). But at Network timeout? 09:24:45 (i.e. the time of the error message) the answer for that RPC call arrived. So it kind of looks like confess waited for the message to arrive (which makes no sense at all) or maybe that confess waited for something which also blocked the sending of the request (because according to the server logs, the RPC request only arrived there at 09:24:45 and was answered within 1 second), but that doesn't make any So if the request timed out how did you get a reply, a second request? sense either. (Just noticed that 60 + 68 == 128, which is also a round number). This looks a question for a ØMQ list. Another thought, have you tried the code outside Postgres to see if works? hp [1] ØMQ is an IPC framework: See http://zeromq.org/ We use it to make RPC calls from stored procedures to a server process. -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] plperlu stored procedure seems to freeze for a minute
A rather weird observation from the log files of our server (9.5 beta1): 2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 WARNING: Use of uninitialized value $success in concatenation (.) or string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. 2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 CONTEXT: PL/Perl function "mb_timeseriesdata_zmq" [lots of other stuff from different connections] 2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR: impossible result '' (payload=) at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. WDS::Macrobond::Utils::decode_result("") called at line 30 main::__ANON__("gen_wqehur") called at -e line 0 eval {...} called at -e line 0 Two messages from the same line of the same plperlu stored procedure, 68 seconds apart. So what is this line 36? confess "impossible result '$success' (payload=$payload)"; What? The first message clearly comes from interpolating $success (which is undef at that point) into the argument. The second from confess itself. What could cause a plperlu procedure to freeze for 68 seconds between the call to confess and its output? Is it possible that only the writing of the log entry is delayed? Another weird thing: $success is undef because a ØMQ rpc call[1] timed out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem to have a default timeout of 60 seconds, and I don't set one). But at 09:24:45 (i.e. the time of the error message) the answer for that RPC call arrived. So it kind of looks like confess waited for the message to arrive (which makes no sense at all) or maybe that confess waited for something which also blocked the sending of the request (because according to the server logs, the RPC request only arrived there at 09:24:45 and was answered within 1 second), but that doesn't make any sense either. (Just noticed that 60 + 68 == 128, which is also a round number). hp [1] ØMQ is an IPC framework: See http://zeromq.org/ We use it to make RPC calls from stored procedures to a server process. -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature