Re: [BUGS] [PATCH v2] Use CC atomic builtins as a fallback
Martin Pitt writes:
> The updated patch only uses the gcc builtins if there is no explicit
> implementation, but drops the arm one as this doesn't work on ARMv7
> and newer, as stated in the original mail.

Getting this thread back to the original patch ...

I'm afraid that if we apply this as-is, what will happen is that we fix ARMv7 and break older versions. Some googling found this, for instance:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33413
which suggests that (1) ARM gcc hasn't had __sync_lock_test_and_set for very long, and (2) what it generates doesn't work pre-ARMv6. So I'm thinking that removing the swpb ASM option is not such a good idea. We could possibly test for __sync_lock_test_and_set first, and only use swpb if we're on ARM and don't have the builtin.

Another thing that is bothering me is that according to the gcc manual, eg here,
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
__sync_lock_test_and_set is nominally provided for datatypes 1, 2, 4, or 8 bytes in length, but the underlying hardware doesn't necessarily support all those widths natively. If you pick the wrong width then you don't get an inline operation at all, but a call to some possibly inefficient library subroutine. I see that your patch just assumes that "int" will be a good width for the lock type, but it's unclear to me what that choice is based on and whether or not it might be a really bad choice on some platforms. A look through s_lock.h suggests that only a minority of platforms prefer int-width locks ... but I have no idea how many of those assembly snippets could have been coded to use a different lock datatype without penalty.

Some other evidence that 4-byte __sync_lock_test_and_set isn't universal is here:
https://svn.boost.org/trac/boost/ticket/2525

Google is also finding some rather worrisome suggestions that __sync_lock_test_and_set might involve a kernel call on some flavors of ARM. That would be pretty disastrous from a performance standpoint.

regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
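
A rough sketch of the ordering suggested in the message above (use the builtin when the compiler provides it, keep the per-platform assembly as the fallback) might look like the following. This is an illustration under assumptions, not the patch: the names slock_sketch_t, tas_sketch, unlock_sketch and selfcheck_sketch are hypothetical and not taken from s_lock.h, the `__GNUC__` check stands in for whatever configure-time probe a real patch would use, and "int" simply mirrors the patch's choice without any claim that it is a good width on every platform.

```c
#include <assert.h>

/* Hypothetical lock type for this sketch; the width is an assumption. */
typedef int slock_sketch_t;

#if defined(__GNUC__)
/*
 * Try to take the lock with the gcc builtin.
 * Returns 0 if the lock was free (and is now held), nonzero if it was
 * already held by someone else.
 */
static inline int
tas_sketch(volatile slock_sketch_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* Release the lock (writes 0 with release semantics). */
static inline void
unlock_sketch(volatile slock_sketch_t *lock)
{
    __sync_lock_release(lock);
}
#else
/*
 * No builtin available: a real s_lock.h would fall back to its
 * per-platform assembly here (on ARM, the existing swpb-based code).
 */
#error "no test-and-set implementation in this sketch"
#endif

/* Small single-threaded check of the acquire/release behaviour. */
static void
selfcheck_sketch(void)
{
    volatile slock_sketch_t lock = 0;

    assert(tas_sketch(&lock) == 0);   /* first attempt acquires */
    assert(tas_sketch(&lock) != 0);   /* second attempt sees it held */
    unlock_sketch(&lock);
    assert(tas_sketch(&lock) == 0);   /* free again after release */
}
```

Whether the builtin compiles to an inline instruction, a library call, or something worse on a given ARM variant is exactly the open question in the message above; the sketch says nothing about that.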
Re: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
On 19/12/2011 11:14 PM, Andrea Grassi wrote:
> Hi, Craig
> Now my process is blocked and I have the case in my hands. Do you have
> something to ask me in order to have more details ?

As I tend to agree with Tom re this being a kernel issue, try (as root):

# Enable stack dumps etc via sysrq
echo 8 > /proc/sys/kernel/sysrq
# Trigger kernel stack dump of all processes via sysrq mechanism
echo t > /proc/sysrq-trigger

... then search the kernel log files to find the kernel stack dump associated with your test program.

If you're not on the latest kernel for your OS, you should update it.

--
Craig Ringer
Re: R: R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
On 21/12/2011 1:42 AM, Tom Lane wrote:
> Hrm. What's with the 48 bytes in the client's receive queue? Surely
> the kernel should be reporting that the socket is read-ready, if it's
> got some data. I think you've found an obscure kernel bug --- somehow
> it's failing to wake the poll() caller.

I've been leaning that way too; that's why I was asking him for /proc/$pid/stack and `ps -C programname -o wchan:80=` output - to get some idea of what function in the kernel it's sitting in.

Unfortunately the OP is on some enterprise distro that doesn't have /proc/$pid/stack . wchan info would still be useful.

I wonder how old their kernel is? The bug could've already been fixed. /proc/pid/stack has been around since 2008 so it must be pretty elderly.

OP: You can also get a kernel stack for a process by enabling the magic SysRQ key (see Google) then using Alt-SysRq-T . This requires a physical keyboard directly connected to the server. It emits the stack information via dmesg. See:

http://en.wikipedia.org/wiki/Magic_SysRq_key

There's a "sysrqd" that apparently lets you use these features remotely, but I've never tried it.

--
Craig Ringer
Re: [BUGS] Security definer "generated column" function used in index
Tom Lane wrote:
> On reflection what seems most likely is simply that turning these
> otherwise-inlineable SQL functions into SECURITY DEFINER disabled
> inline-ing them, resulting in catastrophic degradation of the
> generated plans, such that they took a lot longer than you were
> accustomed to (they shouldn't have been "hung" though).

Ah, I had not considered that. That also explains why my attempts to recreate the situation with "toy" tables didn't show the issue. Also, it didn't occur to me until later to check whether a continue and another backtrace showed things moving; all the evidence suggested (in retrospect) that it was "doing something" rather than being blocked, per se; but these are normally sub-second queries which were killed after running over an hour, so I (probably wrongly) assumed they were in an endless loop.

I will try again in just one site with a bit more care about which functions I flag. If that goes OK, I'll have the confidence to go forward with the application release.

Thanks!

-Kevin
Re: [BUGS] Security definer "generated column" function used in index
"Kevin Grittner" writes:
> ... It wasn't even clear to me that it was
> OK to have one security definer function call another, based on the
> code comment I quoted, so I didn't want to spend more hours on
> attempting to create a test case if it simply wasn't supported.

Yes, that's definitely *supposed* to work, though I'll grant that there could be bugs there. It's hard to see how it'd be a race condition though.

On reflection what seems most likely is simply that turning these otherwise-inlineable SQL functions into SECURITY DEFINER disabled inline-ing them, resulting in catastrophic degradation of the generated plans, such that they took a lot longer than you were accustomed to (they shouldn't have been "hung" though).

regards, tom lane
Re: [BUGS] Security definer "generated column" function used in index
Tom Lane wrote:
> "Kevin Grittner" writes:
>> No comments on this?
>
> If there was a reproducible test case in your original message,
> I didn't see it, so I assumed you intended to investigate further
> on your own. It wasn't even clear to me that this was a Postgres
> bug rather than some error in your trigger logic.

Sorry if my first post wasn't clear. It was happening on SELECT statements; no triggers involved. (I had *intended* just to get trigger functions, but had accidentally included some others.) I wasn't able to create a small, self-contained test case with a few hours of attempts, so I was hoping someone could suggest (from the stack traces and other clues) how best to attempt that or what other information might be useful. It wasn't even clear to me that it was OK to have one security definer function call another, based on the code comment I quoted, so I didn't want to spend more hours on attempting to create a test case if it simply wasn't supported.

Sad to say, the script which flagged the functions as security definer didn't cause problems in normal testing, and was deployed to production (in advance of a software release which will need the expanded permissions), where the problem surfaced under user load. The fact that the larger number of concurrent users hit the problem where my test scripts haven't suggests some race condition, so even if I create it here, it will probably be something where I need to know what information to capture while it is happening.

We only need to add the security definer flag on trigger functions at this point for the upcoming application release, but I'm not yet confident that this is safe.

-Kevin
Re: [BUGS] Security definer "generated column" function used in index
"Kevin Grittner" writes:
> No comments on this?

If there was a reproducible test case in your original message, I didn't see it, so I assumed you intended to investigate further on your own. It wasn't even clear to me that this was a Postgres bug rather than some error in your trigger logic.

regards, tom lane
Re: [BUGS] Security definer "generated column" function used in index
No comments on this? It seems to me that at a minimum this needs better documentation of a limitation, and the conditions under which you hit the problem. I'm not sure there isn't an outright bug here.

We would like to flag all of our trigger functions as SECURITY DEFINER, but there are triggers which do DML which can fire other triggers, and at this point I'm not sure whether that's safe.

Anyone?

-Kevin

On 2011-12-09 12:49 PM I wrote:

PostgreSQL version 9.0.4, 64 bit.

Linux version 2.6.16.60-0.39.3-smp (geeko@buildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP Mon May 11 11:46:34 UTC 2009

SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 2

We flagged some functions as SECURITY DEFINER and had queries which had been in use for months suddenly fail to complete. We set them back to SECURITY INVOKER and things returned to normal.

I took stack traces of the four connections with queries which seemed to be "stuck". They all had this sequence of calls in the middle:

#13 0x0054a5b6 in fmgr_sql (fcinfo=0x7fff9eec0b30) at functions.c:441
#14 0x006a2f05 in fmgr_security_definer (fcinfo=0x30006) at fmgr.c:957
#15 0x00544047 in ExecMakeFunctionResult (fcache=0x512ac70, econtext=0x512aa80, isNull=0x512b8a8 "", isDone=0x512b9c0) at execQual.c:1827
#16 0x00540fb7 in ExecProject (projInfo=, isDone=0x7fff9eec101c) at execQual.c:5089
#17 0x00555403 in ExecResult (node=0x512a970) at nodeResult.c:155
#18 0x005409a6 in ExecProcNode (node=0x512a970) at execProcnode.c:355
#19 0x0053f891 in standard_ExecutorRun (queryDesc=0x2b4121b11280, direction=-1628699568, count=1) at execMain.c:1188
#20 0x0054a656 in fmgr_sql (fcinfo=0x7fff9eec1310) at functions.c:475
#21 0x006a2f05 in fmgr_security_definer (fcinfo=0x30006) at fmgr.c:957
#22 0x00545ef0 in ExecMakeFunctionResultNoSets (fcache=0x2b4121aa2b98, econtext=0x2b4121aa1798, isNull=0x7fff9eec1a90 "", isDone=) at execQual.c:1894
#23 0x00545e6c in ExecMakeFunctionResultNoSets (fcache=0x2b4121aa2358, econtext=0x2b4121aa1798, isNull=0x7fff9eec1b8f "", isDone=) at execQual.c:1866
#24 0x00545f8f in ExecQual (qual=, econtext=0x2b4121aa1798, resultForNull=0 '\0') at execQual.c:4991
#25 0x005476ef in ExecScan (node=0x2b4121aa1688, accessMtd=0x5511a0 , recheckMtd=0x551150 ) at execScan.c:192
#26 0x005409ea in ExecProcNode (node=0x2b4121aa1688) at execProcnode.c:382
#27 0x00554935 in ExecNestLoop (node=0x2b4121a9dea0) at nodeNestloop.c:154
#28 0x00540a6a in ExecProcNode (node=0x2b4121a9dea0) at execProcnode.c:419

Full (unedited) stack traces for all four attached.

Notice the recursive calls to fmgr_security_definer(). I wonder whether that might be a problem, since the comment for that function says:

| This is not re-entrant, but then the fcinfo itself can't be used
| re-entrantly anyway.

All of these queries are similar, and involved searches using a LIKE clause against a "searchName" "generated column" -- a function taking the record type of the table as its parameter. That function then calls a function which takes several parameters. Both functions were changed to SECURITY DEFINER when the problems started. The functions are:

CREATE OR REPLACE FUNCTION "searchName"(rec "Party")
  RETURNS "SearchNameT"
  LANGUAGE sql
  IMMUTABLE
AS $$
  select "searchName"($1."nameL", $1."nameF", $1."nameM", $1."suffix");
$$;

CREATE OR REPLACE FUNCTION "searchName"("nameL" "LastNameT", "nameF" "FirstNameT", "nameM" "MiddleNameT", "suffix" "NameSuffixT")
  RETURNS "SearchNameT"
  LANGUAGE sql
  IMMUTABLE
AS $$
  select regexp_replace(upper(
    $1 || case when $2 is not null or $3 is not null or $4 is not null
               then ',' || coalesce($2, '') || coalesce($3, '') || coalesce($4, '')
               else ''
          end),
    '[^A-Z0-9\,]', '', 'g')::"SearchNameT"
$$

And there is an index on "Party":

"Party_SearchName" btree ("searchName"("Party".*))

First off, is there much chance that this is fixed between 9.0.4 and 9.0.6? If not, what do people feel would be the most useful information for diagnosing the problem?

-Kevin
Re: [BUGS] Incorrect comment in heapam.c
Simon Riggs writes:
> On Tue, Dec 20, 2011 at 5:50 PM, Peter Geoghegan wrote:
>> In fact, that macro is defined in access/htup.h...should it be?

> IMHO comment is wrong, code is in the right place.

It used to be in heapam.h ... evidently, whoever moved it missed this comment.

regards, tom lane
Re: [BUGS] Incorrect comment in heapam.c
On Tue, Dec 20, 2011 at 5:50 PM, Peter Geoghegan wrote:
> Line 834 of heapam.c has the following comment:
>
> /*
>  * This is formatted so oddly so that the correspondence to the macro
>  * definition in access/heapam.h is maintained.
>  */
>
> In fact, that macro is defined in access/htup.h...should it be?

IMHO comment is wrong, code is in the right place.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[BUGS] Incorrect comment in heapam.c
Line 834 of heapam.c has the following comment:

/*
 * This is formatted so oddly so that the correspondence to the macro
 * definition in access/heapam.h is maintained.
 */

In fact, that macro is defined in access/htup.h...should it be?

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: R: R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
"Andrea Grassi" writes:
> This is the server side stack kept by gdb:
> [ server is waiting to receive something from client ]

> The netstat command on client and server connection has this output:
> The first line should be the server, the second the client.

> Proto Recv-Q Send-Q Local Address    Foreign Address  State        PID/Program name
> tcp        0      0 127.0.0.1:5432   127.0.0.1:53129  ESTABLISHED  -
> tcp       48      0 127.0.0.1:53129  127.0.0.1:5432   ESTABLISHED  29802/g_mrprun.e

Hrm. What's with the 48 bytes in the client's receive queue? Surely the kernel should be reporting that the socket is read-ready, if it's got some data. I think you've found an obscure kernel bug --- somehow it's failing to wake the poll() caller.

regards, tom lane
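
For reference, the behaviour described above as expected (data sitting in a socket's receive queue should make poll() report the socket as readable) can be checked in isolation. The sketch below is an illustration under assumptions, not a reproduction of the reported hang: it uses an AF_UNIX socketpair() rather than the TCP loopback connection from the report, the function names are made up for this example, and EINTR is not retried.

```c
#include <assert.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if poll() reported POLLIN, 0 otherwise, -1 on error.
 */
static int
wait_readable_sketch(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Number of bytes queued for reading on fd, or -1 on error. */
static int
bytes_queued_sketch(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}

/*
 * Queue len bytes on one end of a socket pair and check that the other
 * end is not readable before the write, is readable after it, and
 * reports len bytes queued.  Returns 1 if all three hold, 0 if not,
 * -1 on a setup error.
 */
static int
queued_data_wakes_poll_sketch(size_t len)
{
    int sv[2];
    char buf[64] = {0};
    int before, after, queued;

    assert(len > 0 && len <= sizeof(buf));
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    before = wait_readable_sketch(sv[1], 0);    /* nothing queued yet */
    if (write(sv[0], buf, len) != (ssize_t) len)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    after = wait_readable_sketch(sv[1], 1000);  /* data is queued now */
    queued = bytes_queued_sketch(sv[1]);

    close(sv[0]);
    close(sv[1]);
    return (before == 0 && after == 1 && queued == (int) len) ? 1 : 0;
}
```

Calling queued_data_wakes_poll_sketch(48) mirrors the 48 bytes shown in Recv-Q; on a healthy kernel the check is expected to succeed, which is why a poll() that stays blocked with a non-empty receive queue points away from libpq.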
R: R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
This is the server side stack kept by gdb:

(gdb) bt full
#0 0x2b6488588ae5 in recv () from /lib64/libc.so.6
No symbol table info available.
#1 0x00550cd2 in secure_read ()
No symbol table info available.
#2 0x005563a4 in pq_recvbuf ()
No symbol table info available.
#3 0x005567a7 in pq_getbyte ()
No symbol table info available.
#4 0x005d33e6 in PostgresMain ()
No symbol table info available.
#5 0x005a9708 in ServerLoop ()
No symbol table info available.
#6 0x005aa2b7 in PostmasterMain ()
No symbol table info available.
#7 0x005580be in main ()
No symbol table info available.

The netstat command on client and server connection has this output:
The first line should be the server, the second the client.

Proto Recv-Q Send-Q Local Address    Foreign Address  State        PID/Program name
tcp        0      0 127.0.0.1:5432   127.0.0.1:53129  ESTABLISHED  -
tcp       48      0 127.0.0.1:53129  127.0.0.1:5432   ESTABLISHED  29802/g_mrprun.e

Regards,
Andrea

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Tuesday, 20 December 2011 17:38
To: Andrea Grassi
Cc: harry...@comcast.net; 'Craig Ringer'; 'Pg Bugs'; 'Alvaro Herrera'
Subject: Re: R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

"Andrea Grassi" writes:
> #0 0xe410 in __kernel_vsyscall ()
> No symbol table info available.
> #1 0xf76539cb in poll () from /lib/libc.so..
> #2 0xf770d39a in pqSocketCheck () from /home/pg/pgsql/lib-32/libpq.so.5
> #3 0xf770d49d in pqWaitTimed () from /home/pg/pgsql/lib-32/libpq.so.5
> #4 0xf770d513 in pqWait () from /home/pg/pgsql/lib-32/libpq.so.5
> #5 0xf770c6d6 in PQgetResult () from /home/pg/pgsql/lib-32/libpq.so.5
> #6 0xf770c89c in PQexecFinish () from /home/pg/pgsql/lib-32/libpq.so.5

What about a stack trace from the connected server process? libpq clearly thinks it's waiting for a message from the server, but I wonder what the server thinks. Also, what connection status does netstat show on each side?

regards, tom lane
Re: R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
"Andrea Grassi" writes:
> #0 0xe410 in __kernel_vsyscall ()
> No symbol table info available.
> #1 0xf76539cb in poll () from /lib/libc.so..
> #2 0xf770d39a in pqSocketCheck () from /home/pg/pgsql/lib-32/libpq.so.5
> #3 0xf770d49d in pqWaitTimed () from /home/pg/pgsql/lib-32/libpq.so.5
> #4 0xf770d513 in pqWait () from /home/pg/pgsql/lib-32/libpq.so.5
> #5 0xf770c6d6 in PQgetResult () from /home/pg/pgsql/lib-32/libpq.so.5
> #6 0xf770c89c in PQexecFinish () from /home/pg/pgsql/lib-32/libpq.so.5

What about a stack trace from the connected server process? libpq clearly thinks it's waiting for a message from the server, but I wonder what the server thinks. Also, what connection status does netstat show on each side?

regards, tom lane
R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
You wrote:
> I also have a client suffering an occasional 'application hang' running Suse
> 11.2 and postgresql 8.4
> on an 8 core box which is not reproducible in a VMWare test environment.
> Access to postgres is libpq 127.0.0.1 as well.
> Unfortunately the client must restart ASAP and I have not produced a 'test
> case'.

But did you examine the stack? Is it similar to mine?

#0 0xe410 in __kernel_vsyscall ()
No symbol table info available.
#1 0xf76539cb in poll () from /lib/libc.so..
#2 0xf770d39a in pqSocketCheck () from /home/pg/pgsql/lib-32/libpq.so.5
#3 0xf770d49d in pqWaitTimed () from /home/pg/pgsql/lib-32/libpq.so.5
#4 0xf770d513 in pqWait () from /home/pg/pgsql/lib-32/libpq.so.5
#5 0xf770c6d6 in PQgetResult () from /home/pg/pgsql/lib-32/libpq.so.5
#6 0xf770c89c in PQexecFinish () from /home/pg/pgsql/lib-32/libpq.so.5

Can you give the hardware and platform details of your machine, so we can see whether it has something in common with mine and narrow down the origin of the bug?

Thanks.
Andrea
R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
This is the output of the "bt full" command in gdb for my test program. In this case libpq was not compiled in debug mode.

(gdb) bt full
#0 0xe410 in __kernel_vsyscall ()
No symbol table info available.
#1 0xf76539cb in poll () from /lib/libc.so.6
No symbol table info available.
#2 0xf770d39a in pqSocketCheck () from /home/pg/pgsql/lib-32/libpq.so.5
No symbol table info available.
#3 0xf770d49d in pqWaitTimed () from /home/pg/pgsql/lib-32/libpq.so.5
No symbol table info available.
#4 0xf770d513 in pqWait () from /home/pg/pgsql/lib-32/libpq.so.5
No symbol table info available.
#5 0xf770c6d6 in PQgetResult () from /home/pg/pgsql/lib-32/libpq.so.5
No symbol table info available.
#6 0xf770c89c in PQexecFinish () from /home/pg/pgsql/lib-32/libpq.so.5
No symbol table info available.
#7 0x08048c3f in read_rigpia ()
No symbol table info available.
#8 0x08048ae9 in main ()

Below I add the complete stack of my business application (which is also blocked), in case it is useful. In this case libpq was compiled in debug mode, so we can see the parameter values of the functions and the line numbers of the code. Note that the stack up to PQexecFinish is the same as in my test program.

(gdb) bt full
#0 0xe410 in __kernel_vsyscall ()
No symbol table info available.
#1 0xf6cdb9cb in poll () from /lib/libc.so.6
No symbol table info available.
#2 0xf766a39a in pqSocketPoll (conn=0x90e0838, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1082
No locals.
#3 pqSocketCheck (conn=0x90e0838, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1024
        result = -1
#4 0xf766a49d in pqWaitTimed (forRead=1, forWrite=0, conn=0x90e0838, finish_time=-1) at fe-misc.c:956
        result =
#5 0xf766a513 in pqWait (forRead=1, forWrite=0, conn=0x90e0838) at fe-misc.c:939
No locals.
#6 0xf76696d6 in PQgetResult (conn=0x90e0838) at fe-exec.c:1554 flushResult = 1 res = 0x0 #7 0xf766989c in PQexecFinish (conn=0x90e0838) at fe-exec.c:1807 result = 0x23 lastResult = 0x0 #8 0xf767c3ec in pos_fetch (cur_dta=0x9485c80) at possup.c:930 cmd = "FETCH 100 IN cur038_00063", '\000' , "Þh\031\b\230\021&\bl_R\t(\tÜÿm\216\027\bø¸\016\t", '\000' , "ø¸\016\t\000\000\000\000\020\000\000\000\230\021&\bl_R\tи\016\tX\tÜÿ/\221\027\bl_R\tи\016\t\001\000\000" res = 0x0 bind = 0x0 buf = 0x0 colinfo = 0x0 colnum = 136712600 len = 156393324 type = 1 row = -2356856 null = 135921165 #9 0xf767b147 in dm_possup (request=35) at possup.c:216 retcode = 135268645 l = 156393324 eliminata = 0 #10 0x081076f3 in dm_call_fnc () No symbol table info available. #11 0x080fda3d in dm_do_a_fetch () No symbol table info available. #12 0x080fd913 in dm_fetch () No symbol table info available. #13 0x08102974 in dm_execute () No symbol table info available. #14 0x080f96de in execute_cursor () No symbol table info available. #15 0x080f8556 in dm_do_dbms () No symbol table info available. #16 0x080ff22f in dm_call () No symbol table info available. #17 0x080f7edd in dm_dbms () No symbol table info available. #18 0xf76a655e in dm_dbms_drv ( command=0xffdc0fa0 "with cursor cur038_00063 execute ") at r_sqlutifunc.c:1090 No locals. 
#19 0xf76ba4f6 in fetchCursorDb (curName=0xffdc1050 "cur038_00063") at sqlPanth.c:895 buffer = "with cursor cur038_00063 execute ", '\000' , "à*Öö\000\000\000\000\000\000\000\000ô/ÖöÇ¥l÷X¢Üÿ\030\020Üÿ¾\220ÆöP\020Üÿ\r¶l÷(\020Üÿ(\020Üÿ\004N<\bØ\020Üÿ+ k÷" app = 0x0 retcode = 0 command = 0xf76cb6e4 "execute" using_app = '\000' #20 0xf76ba03d in fetchCursor (f0_file=38, curName=0xf76cbb40 "") at sqlPanth.c:759 cursor = "cur038_00063", '\000' app = 0x0 retcode = 0 #21 0xf76bc020 in sqlRead (f0_file=38, w_dat=0xffdcc23c "", mode=7) at sqlRead.c:109 msg = "\000\000\000\000\000\000\000\000\\^Íö\000\000\000\000\035\000\000\000\000\000\000\b\000\000\000\000^\002Ýÿ\000\000\000\000ph\021\th\221Üÿ\022íj÷\224.o÷à*Öö«·\005\b" s_where = " \"cdart\" = '50110725' ", '\000' tslock = 0 tpOrd = 68 id_rec = 0 Failed = 0 failed_lock = 0 old_w_dat = 0x0 init_col = 0 ret = 0 #22 0xf7699159 in ISREAD (f0_file=38, w_dat=0xffdcc23c "", mode=7) at r_dbswsql7.c:75 ret = -2321976 environ_save = '\000' #23 0x0807f6a4 in cal_prodat () No symbol table info available. #24 0x08057650 in read_mrp () at /home/uwrk/pgsai/WRKUNX/g_mrprun.c:465 i = 0 idx = 0 dub = 4.8873862481069038e-313 dub1 = -1.209991882770505e+266 RFPO = {id = -153734240, cdart = "\027\000\000\000¨ÈÜÿéïÄöàÈÜÿ\000", descr = "\000\000\n\000\000\000\000\000\000\000HÉÜÿ$\\\a\bàÈÜÿ\004:9\b\031\000\000", bkini = 0, bkfin = 0, stato = 85 'U', cdpeg = '\000' , grpeg = "\000\000\000\000\000\000\000\000\000\065\000\000\000\000\000\000\000\000\000", tscon = 0 '\000', fillc = "\000\000\000", qtfan = 0, qtpro = 0, bkpeg = 0, lnuti = 0, anpia = "\000",
Re: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
Excerpts from Andrea Grassi's message of Tue Dec 20 06:01:55 -0300 2011:
> Sorry if I insist, but now I have the case at hand (my test program is now
> blocked), so I can check and verify whatever you want.
> I would like to know if it can be a libpq bug or if you think the fault is
> due to a system bug or to a machine issue and in this case I would be
> grateful if you could give me a hint on what it could be.

Please attach GDB to the stuck process (gdb -p `pidof testprogram`) and grab a backtrace (bt full).

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
I also have a client suffering an occasional 'application hang' running Suse 11.2 and postgresql 8.4 on an 8 core box which is not reproducible in a VMWare test environment. Access to postgres is libpq 127.0.0.1 as well. Unfortunately the client must restart ASAP and I have not produced a 'test case'.

On 12/20/2011 1:01 AM, Andrea Grassi wrote:
> Sorry if I insist, but now I have the case at hand (my test program is now
> blocked), so I can check and verify whatever you want.
> I would like to know if it can be a libpq bug or if you think the fault is
> due to a system bug or to a machine issue and in this case I would be
> grateful if you could give me a hint on what it could be.
>
> Regards,
> Andrea
>
> -----Original Message-----
> From: Craig Ringer [mailto:ring...@ringerc.id.au]
> Sent: Saturday, 17 December 2011 7:19
> To: Andrea Grassi
> Cc: pgsql-bugs@postgresql.org
> Subject: Re: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
>
> On 16/12/2011 10:10 PM, Andrea Grassi wrote:
>> The client program and the postgres server are on the same host, client
>> connects to 127.0.0.1.
>> In the meantime, my original program blocks (not my example but very probably
>> the reasons are the same).
>>
>> I typed "ps -C testprogramname -o wchan:80=" and the output was only a single
>> dash ( "-" ).
>
> That means it's not waiting in a kernel call right now. Was the program
> in the hung state you've observed at the time you ran the command? Its
> output would only be interesting when it's hung.
>
>> I searched for the complete stack in /proc/$pid/stack (where $pid was the
>> pid of my process) but this file doesn't exist !! Why ?
>
> Old kernel, maybe? You're running on some kind of enterprise-y distro,
> so who knows how ancient half the stuff in there is.
>
> --
> Craig Ringer
R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
Sorry if I insist, but now I have the case at hand (my test program is now blocked), so I can check and verify whatever you want.
I would like to know if it can be a libpq bug or if you think the fault is due to a system bug or to a machine issue and in this case I would be grateful if you could give me a hint on what it could be.

Regards,
Andrea

-----Original Message-----
From: Craig Ringer [mailto:ring...@ringerc.id.au]
Sent: Saturday, 17 December 2011 7:19
To: Andrea Grassi
Cc: pgsql-bugs@postgresql.org
Subject: Re: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

On 16/12/2011 10:10 PM, Andrea Grassi wrote:
> The client program and the postgres server are on the same host, client
> connects to 127.0.0.1.
> In the meantime, my original program blocks (not my example but very probably
> the reasons are the same).
>
> I typed "ps -C testprogramname -o wchan:80=" and the output was only a single
> dash ( "-" ).

That means it's not waiting in a kernel call right now. Was the program in the hung state you've observed at the time you ran the command? Its output would only be interesting when it's hung.

> I searched for the complete stack in /proc/$pid/stack (where $pid was the
> pid of my process) but this file doesn't exist !! Why ?

Old kernel, maybe? You're running on some kind of enterprise-y distro, so who knows how ancient half the stuff in there is.

--
Craig Ringer