Re: [BUGS] [PATCH v2] Use CC atomic builtins as a fallback

2011-12-20 Thread Tom Lane
Martin Pitt writes:
> The updated patch only uses the gcc builtins if there is no explicit
> implementation, but drops the arm one as this doesn't work on ARMv7
> and newer, as stated in the original mail.

Getting this thread back to the original patch ... I'm afraid that if we
apply this as-is, what will happen is that we fix ARMv7 and break older
versions.  Some googling found this, for instance:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33413

which suggests that (1) ARM gcc hasn't had __sync_lock_test_and_set for
very long, and (2) what it generates doesn't work pre-ARMv6.

So I'm thinking that removing the swpb ASM option is not such a good
idea.  We could possibly test for __sync_lock_test_and_set first, and
only use swpb if we're on ARM and don't have the builtin.

Another thing that is bothering me is that according to the gcc manual,
eg here,
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
__sync_lock_test_and_set is nominally provided for datatypes 1, 2, 4,
or 8 bytes in length, but the underlying hardware doesn't necessarily
support all those widths natively.  If you pick the wrong width then
you don't get an inline operation at all, but a call to some possibly
inefficient library subroutine.  I see that your patch just assumes that
"int" will be a good width for the lock type, but it's unclear to me
what that choice is based on and whether or not it might be a really bad
choice on some platforms.  A look through s_lock.h suggests that only a
minority of platforms prefer int-width locks ... but I have no idea
how many of those assembly snippets could have been coded to use a
different lock datatype without penalty.  Some other evidence that
4-byte __sync_lock_test_and_set isn't universal is here:
https://svn.boost.org/trac/boost/ticket/2525

Google is also finding some rather worrisome suggestions that
__sync_lock_test_and_set might involve a kernel call on some flavors of
ARM.  That would be pretty disastrous from a performance standpoint.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
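
For readers following the thread, a test-and-set lock built on the gcc builtins discussed above might look roughly like the sketch below. It is only an illustration, not the patch under review: the names slock_t, tas, s_lock and s_unlock are chosen here for the example, and the int-width lock is exactly the assumption the message questions, so a real port would have to check which width the target handles inline.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: an int-width lock word. */
typedef int slock_t;

/*
 * Test-and-set: atomically store 1 and return the previous value.
 * Returns 0 if the lock was free (and is now ours), nonzero if it was held.
 */
static int
tas(volatile slock_t *lock)
{
    assert(lock != NULL);
    return __sync_lock_test_and_set(lock, 1);
}

/* Release the lock by storing 0 with release semantics. */
static void
s_unlock(volatile slock_t *lock)
{
    assert(lock != NULL);
    __sync_lock_release(lock);
}

/* Naive acquire loop: spin until tas() reports the lock was free. */
static void
s_lock(volatile slock_t *lock)
{
    while (tas(lock) != 0)
        ;                       /* busy-wait; real code would back off */
}
```

Whether the builtin expands to an inline instruction, a library call, or something slower depends on the compiler version and target, which is the concern raised in the message.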


Re: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

2011-12-20 Thread Craig Ringer

On 19/12/2011 11:14 PM, Andrea Grassi wrote:

> Hi, Craig
> My process is now blocked, so I have the case at hand.
> Is there anything you would like me to check to get more details?


As I tend to agree with Tom re this being a kernel issue, try (as root):

# Enable stack dumps etc via sysrq
echo 8 > /proc/sys/kernel/sysrq
# Trigger kernel stack dump of all processes via sysrq mechanism
echo t > /proc/sysrq-trigger

... then search the kernel log files to find the kernel stack dump 
associated with your test program.


If you're not on the latest kernel for your OS, you should update it.

--
Craig Ringer



Re: R: R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

2011-12-20 Thread Craig Ringer

On 21/12/2011 1:42 AM, Tom Lane wrote:

> Hrm.  What's with the 48 bytes in the client's receive queue?  Surely
> the kernel should be reporting that the socket is read-ready, if it's
> got some data.  I think you've found an obscure kernel bug --- somehow
> it's failing to wake the poll() caller.

I've been leaning that way too; that's why I was asking him for
/proc/$pid/stack and `ps -C programname -o wchan:80=` output - to get
some idea of what function in the kernel it's sitting in.


Unfortunately the OP is on some enterprise distro that doesn't have
/proc/$pid/stack. wchan info would still be useful. I wonder how old
their kernel is? The bug could've already been fixed. /proc/$pid/stack
has been around since 2008, so their kernel must be pretty elderly.


OP: You can also get a kernel stack for a process by enabling the magic
SysRq key (see Google) then using Alt-SysRq-T. This requires a physical
keyboard directly connected to the server. It emits the stack
information via dmesg. See:


http://en.wikipedia.org/wiki/Magic_SysRq_key

There's a "sysrqd" that apparently lets you use these features remotely, 
but I've never tried it.


--
Craig Ringer
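
The wchan value that ps reports can also be read straight from procfs. The following is a minimal sketch, assuming a Linux /proc filesystem that provides /proc/<pid>/wchan; the helper names wchan_path and read_wchan are made up for this example and are not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the procfs path for a pid; returns the snprintf() result. */
static int
wchan_path(long pid, char *path, size_t pathlen)
{
    assert(path != NULL && pathlen > 0);
    return snprintf(path, pathlen, "/proc/%ld/wchan", pid);
}

/*
 * Read the wait channel of a process into buf.
 * Returns 0 on success, -1 if the file cannot be opened
 * (no such pid, no procfs, or a kernel without this file).
 */
static int
read_wchan(long pid, char *buf, size_t buflen)
{
    char path[64];
    FILE *fp;

    assert(buf != NULL && buflen > 0);
    wchan_path(pid, path, sizeof(path));

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* The file holds a single short token; an empty read is not an error. */
    if (fgets(buf, (int) buflen, fp) == NULL)
        buf[0] = '\0';
    buf[strcspn(buf, "\n")] = '\0';     /* drop a trailing newline, if any */

    fclose(fp);
    return 0;
}
```

As with the ps output, the value is only interesting while the process is actually hung; for a running process it does not name a kernel function.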



Re: [BUGS] Security definer "generated column" function used in index

2011-12-20 Thread Kevin Grittner
Tom Lane wrote:
 
> On reflection what seems most likely is simply that turning these
> otherwise-inlineable SQL functions into SECURITY DEFINER disabled
> inline-ing them, resulting in catastrophic degradation of the
> generated plans, such that they took a lot longer than you were
> accustomed to (they shouldn't have been "hung" though).
 
Ah, I had not considered that.  That also explains why my attempts
to recreate the situation with "toy" tables didn't show the issue.
Also, it didn't occur to me until later to check whether a continue
and another backtrace showed things moving; all the evidence
suggested (in retrospect) that it was "doing something" rather than
being blocked, per se; but these are normally sub-second queries
which were killed after running over an hour, so I (probably
wrongly) assumed they were in an endless loop.
 
I will try again in just one site with a bit more care about which
functions I flag.  If that goes OK, I'll have the confidence to go
forward with the application release.
 
Thanks!
 
-Kevin



Re: [BUGS] Security definer "generated column" function used in index

2011-12-20 Thread Tom Lane
"Kevin Grittner" writes:
> ... It wasn't even clear to me that it was
> OK to have one security definer function call another, based on the
> code comment I quoted, so I didn't want to spend more hours on
> attempting to create a test case if it simply wasn't supported.
 
Yes, that's definitely *supposed* to work, though I'll grant that there
could be bugs there.  It's hard to see how it'd be a race condition
though.

On reflection what seems most likely is simply that turning these
otherwise-inlineable SQL functions into SECURITY DEFINER disabled
inline-ing them, resulting in catastrophic degradation of the generated
plans, such that they took a lot longer than you were accustomed to
(they shouldn't have been "hung" though).

regards, tom lane



Re: [BUGS] Security definer "generated column" function used in index

2011-12-20 Thread Kevin Grittner
Tom Lane wrote:
> "Kevin Grittner" writes:
>> No comments on this?
> 
> If there was a reproducible test case in your original message,
> I didn't see it, so I assumed you intended to investigate further
> on your own.  It wasn't even clear to me that this was a Postgres
> bug rather than some error in your trigger logic.
 
Sorry if my first post wasn't clear.  It was happening on SELECT
statements; no triggers involved.  (I had *intended* just to get
trigger functions, but had accidentally included some others.)
 
I wasn't able to create a small, self-contained test case with a few
hours of attempts, so I was hoping someone could suggest (from the
stack traces and other clues) how best to attempt that or what other
information might be useful.  It wasn't even clear to me that it was
OK to have one security definer function call another, based on the
code comment I quoted, so I didn't want to spend more hours on
attempting to create a test case if it simply wasn't supported.
 
Sad to say, the script which flagged the functions as security
definer didn't cause problems in normal testing, and was deployed
to production (in advance of a software release which will need the
expanded permissions), where the problem surfaced under user load.
The fact that the larger number of concurrent users hit the problem
where my test scripts haven't suggests some race condition, so even
if I create it here, it will probably be something where I need to
know what information to capture while it is happening.
 
We only need to add the security definer flag on trigger functions
at this point for the upcoming application release, but I'm not yet
confident that this is safe.
 
-Kevin



Re: [BUGS] Security definer "generated column" function used in index

2011-12-20 Thread Tom Lane
"Kevin Grittner" writes:
> No comments on this?

If there was a reproducible test case in your original message,
I didn't see it, so I assumed you intended to investigate further
on your own.  It wasn't even clear to me that this was a Postgres
bug rather than some error in your trigger logic.

regards, tom lane



Re: [BUGS] Security definer "generated column" function used in index

2011-12-20 Thread Kevin Grittner
No comments on this?  It seems to me that at a minimum this needs
better documentation of a limitation, and the conditions under which
you hit the problem.  I'm not sure there isn't an outright bug here.
We would like to flag all of our trigger functions as SECURITY
DEFINER, but there are triggers which do DML which can fire other
triggers, and at this point I'm not sure whether that's safe.
 
Anyone?
 
-Kevin

 
On 2011-12-09 12:49 PM I wrote:

PostgreSQL version 9.0.4, 64 bit.
Linux version 2.6.16.60-0.39.3-smp (geeko@buildhost)
  (gcc version 4.1.2 20070115 (SUSE Linux))
  #1 SMP Mon May 11 11:46:34 UTC 2009
SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 2
 
We flagged some functions as SECURITY DEFINER and had queries which
had been in use for months suddenly fail to complete.  We set them
back to SECURITY INVOKER and things returned to normal.  I took
stack traces of the four connections with queries which seemed to be
"stuck".  They all had this sequence of calls in the middle:
 
#13 0x0054a5b6
  in fmgr_sql (fcinfo=0x7fff9eec0b30)
  at functions.c:441
#14 0x006a2f05
  in fmgr_security_definer (fcinfo=0x30006)
  at fmgr.c:957
#15 0x00544047
  in ExecMakeFunctionResult (fcache=0x512ac70, econtext=0x512aa80,
 isNull=0x512b8a8 "", isDone=0x512b9c0)
  at execQual.c:1827
#16 0x00540fb7
  in ExecProject (projInfo=,
  isDone=0x7fff9eec101c)
  at execQual.c:5089
#17 0x00555403
  in ExecResult (node=0x512a970)
  at nodeResult.c:155
#18 0x005409a6
  in ExecProcNode (node=0x512a970)
  at execProcnode.c:355
#19 0x0053f891
  in standard_ExecutorRun (queryDesc=0x2b4121b11280,
   direction=-1628699568, count=1)
  at execMain.c:1188
#20 0x0054a656
  in fmgr_sql (fcinfo=0x7fff9eec1310)
  at functions.c:475
#21 0x006a2f05
  in fmgr_security_definer (fcinfo=0x30006)
  at fmgr.c:957
#22 0x00545ef0
  in ExecMakeFunctionResultNoSets (fcache=0x2b4121aa2b98,
   econtext=0x2b4121aa1798,
   isNull=0x7fff9eec1a90 "",
   isDone=)
  at execQual.c:1894
#23 0x00545e6c
  in ExecMakeFunctionResultNoSets (fcache=0x2b4121aa2358,
   econtext=0x2b4121aa1798,
   isNull=0x7fff9eec1b8f "",
   isDone=)
  at execQual.c:1866
#24 0x00545f8f
  in ExecQual (qual=, econtext=0x2b4121aa1798,
   resultForNull=0 '\0')
  at execQual.c:4991
#25 0x005476ef
  in ExecScan (node=0x2b4121aa1688, accessMtd=0x5511a0 ,
   recheckMtd=0x551150 )
  at execScan.c:192
#26 0x005409ea
  in ExecProcNode (node=0x2b4121aa1688)
  at execProcnode.c:382
#27 0x00554935
  in ExecNestLoop (node=0x2b4121a9dea0)
  at nodeNestloop.c:154
#28 0x00540a6a
  in ExecProcNode (node=0x2b4121a9dea0)
  at execProcnode.c:419
 
Full (unedited) stack traces for all four attached.
 
Notice the recursive calls to fmgr_security_definer().  I wonder
whether that might be a problem, since the comment for that function
says:
 
| This is not re-entrant, but then the fcinfo itself can't be used
| re-entrantly anyway.
 
All of these queries are similar, and involved searches using a LIKE
clause against a "searchName" "generated column" -- a function
taking the record type of the table as its parameter.  That function
then calls a function which takes several parameters.  Both
functions were changed to SECURITY DEFINER when the problems
started.
 
The functions are:
 
CREATE OR REPLACE FUNCTION "searchName"(rec "Party")
 RETURNS "SearchNameT"
 LANGUAGE sql
 IMMUTABLE
AS $$
  select "searchName"($1."nameL", $1."nameF", $1."nameM",
  $1."suffix");
$$;

CREATE OR REPLACE FUNCTION "searchName"("nameL" "LastNameT",
                                        "nameF" "FirstNameT",
                                        "nameM" "MiddleNameT",
                                        "suffix" "NameSuffixT")
 RETURNS "SearchNameT"
 LANGUAGE sql
 IMMUTABLE
AS $$
  select regexp_replace(upper(
           $1
           || case when $2 is not null or $3 is not null or $4 is not null
                then ',' || coalesce($2, '') || coalesce($3, '')
                         || coalesce($4, '')
                else ''
              end),
         '[^A-Z0-9\,]', '', 'g')::"SearchNameT"
$$;
 
And there is an index on "Party":
 
  "Party_SearchName" btree ("searchName"("Party".*))
 
First off, is there much chance that this is fixed between 9.0.4 and
9.0.6?  If not, what do people feel would be the most useful
information for diagnosing the problem?
 
-Kevin





Re: [BUGS] Incorrect comment in heapam.c

2011-12-20 Thread Tom Lane
Simon Riggs writes:
> On Tue, Dec 20, 2011 at 5:50 PM, Peter Geoghegan wrote:
>> In fact, that macro is defined in access/htup.h...should it be?

> IMHO comment is wrong, code is in the right place.

It used to be in heapam.h ... evidently, whoever moved it missed this
comment.

regards, tom lane



Re: [BUGS] Incorrect comment in heapam.c

2011-12-20 Thread Simon Riggs
On Tue, Dec 20, 2011 at 5:50 PM, Peter Geoghegan wrote:
> Line 834 of heapam.c has the following comment:
>
> /*
>  * This is formatted so oddly so that the correspondence to the macro
>  * definition in access/heapam.h is maintained.
>  */
>
> In fact, that macro is defined in access/htup.h...should it be?

IMHO comment is wrong, code is in the right place.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[BUGS] Incorrect comment in heapam.c

2011-12-20 Thread Peter Geoghegan
Line 834 of heapam.c has the following comment:

/*
 * This is formatted so oddly so that the correspondence to the macro
 * definition in access/heapam.h is maintained.
 */

In fact, that macro is defined in access/htup.h...should it be?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: R: R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

2011-12-20 Thread Tom Lane
"Andrea Grassi" writes:
> This is the server-side stack captured by gdb:
> [ server is waiting to receive something from client ]

> The netstat output for the client and server connection is below.
> The first line should be the server, the second the client.

> Proto Recv-Q Send-Q Local Address    Foreign Address  State        PID/Program name
> tcp        0      0 127.0.0.1:5432   127.0.0.1:53129  ESTABLISHED  -
> tcp       48      0 127.0.0.1:53129  127.0.0.1:5432   ESTABLISHED  29802/g_mrprun.e

Hrm.  What's with the 48 bytes in the client's receive queue?  Surely
the kernel should be reporting that the socket is read-ready, if it's
got some data.  I think you've found an obscure kernel bug --- somehow
it's failing to wake the poll() caller.

regards, tom lane
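
The expectation stated above, that a socket with bytes in its receive queue must be reported read-ready by poll(), can be checked on a healthy system with a small program. The sketch below is an illustration under assumptions, not libpq code: it uses a Unix-domain socketpair() instead of the TCP loopback connection from the report, and the helper names queued_bytes, read_ready and poll_sees_queued_data are invented for the example.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bytes waiting to be read on fd (the Recv-Q idea), or -1 on error. */
static int
queued_bytes(int fd)
{
    int n = 0;

    assert(fd >= 0);
    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}

/* 1 if poll() reports fd readable within timeout_ms, 0 on timeout, -1 on error. */
static int
read_ready(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Queue 48 bytes on one end of a socket pair and poll the other end. */
static int
poll_sees_queued_data(void)
{
    int sv[2];
    char payload[48];
    int ready = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    memset(payload, 'x', sizeof(payload));
    if (write(sv[0], payload, sizeof(payload)) == (ssize_t) sizeof(payload) &&
        queued_bytes(sv[1]) == (int) sizeof(payload))
        ready = read_ready(sv[1], 1000);

    close(sv[0]);
    close(sv[1]);
    return ready;
}
```

On a correctly behaving kernel poll_sees_queued_data() returns 1; the hang in this thread corresponds to the case where data is queued but the poll() caller is never woken.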



R: R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

2011-12-20 Thread Andrea Grassi
This is the server-side stack captured by gdb:

(gdb) bt full
#0  0x2b6488588ae5 in recv () from /lib64/libc.so.6
No symbol table info available.
#1  0x00550cd2 in secure_read ()
No symbol table info available.
#2  0x005563a4 in pq_recvbuf ()
No symbol table info available.
#3  0x005567a7 in pq_getbyte ()
No symbol table info available.
#4  0x005d33e6 in PostgresMain ()
No symbol table info available.
#5  0x005a9708 in ServerLoop ()
No symbol table info available.
#6  0x005aa2b7 in PostmasterMain ()
No symbol table info available.
#7  0x005580be in main ()
No symbol table info available.

The netstat output for the client and server connection is below.
The first line should be the server, the second the client.

Proto Recv-Q Send-Q Local Address    Foreign Address  State        PID/Program name
tcp        0      0 127.0.0.1:5432   127.0.0.1:53129  ESTABLISHED  -
tcp       48      0 127.0.0.1:53129  127.0.0.1:5432   ESTABLISHED  29802/g_mrprun.e


Regards, Andrea

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Tuesday, 20 December 2011 17:38
To: Andrea Grassi
Cc: harry...@comcast.net; 'Craig Ringer'; 'Pg Bugs'; 'Alvaro Herrera'
Subject: Re: R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

"Andrea Grassi" writes:
> #0  0xe410 in __kernel_vsyscall ()
> No symbol table info available.
> #1  0xf76539cb in poll () from /lib/libc.so..
> #2  0xf770d39a in pqSocketCheck () from /home/pg/pgsql/lib-32/libpq.so.5 
> #3  0xf770d49d in pqWaitTimed () from /home/pg/pgsql/lib-32/libpq.so.5 
> #4  0xf770d513 in pqWait () from /home/pg/pgsql/lib-32/libpq.so.5 
> #5  0xf770c6d6 in PQgetResult () from /home/pg/pgsql/lib-32/libpq.so.5 
> #6  0xf770c89c in PQexecFinish () from /home/pg/pgsql/lib-32/libpq.so.5 

What about a stack trace from the connected server process?  libpq
clearly thinks it's waiting for a message from the server, but I wonder
what the server thinks.  Also, what connection status does netstat
show on each side?

regards, tom lane




Re: R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

2011-12-20 Thread Tom Lane
"Andrea Grassi" writes:
> #0  0xe410 in __kernel_vsyscall ()
> No symbol table info available.
> #1  0xf76539cb in poll () from /lib/libc.so..
> #2  0xf770d39a in pqSocketCheck () from /home/pg/pgsql/lib-32/libpq.so.5 
> #3  0xf770d49d in pqWaitTimed () from /home/pg/pgsql/lib-32/libpq.so.5 
> #4  0xf770d513 in pqWait () from /home/pg/pgsql/lib-32/libpq.so.5 
> #5  0xf770c6d6 in PQgetResult () from /home/pg/pgsql/lib-32/libpq.so.5 
> #6  0xf770c89c in PQexecFinish () from /home/pg/pgsql/lib-32/libpq.so.5 

What about a stack trace from the connected server process?  libpq
clearly thinks it's waiting for a message from the server, but I wonder
what the server thinks.  Also, what connection status does netstat
show on each side?

regards, tom lane



R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

2011-12-20 Thread Andrea Grassi
You wrote:

> I also have a client suffering an occasional 'application hang' running Suse
> 11.2 and PostgreSQL 8.4
> on an 8 core box which is not reproducible in a VMWare test environment.
> Access to postgres is libpq 127.0.0.1 as well.
> Unfortunately the client must restart ASAP and I have not produced a 'test
> case'.

But did you examine the stack? Is it similar to mine?

#0  0xe410 in __kernel_vsyscall ()
No symbol table info available.
#1  0xf76539cb in poll () from /lib/libc.so..
#2  0xf770d39a in pqSocketCheck () from /home/pg/pgsql/lib-32/libpq.so.5 
#3  0xf770d49d in pqWaitTimed () from /home/pg/pgsql/lib-32/libpq.so.5 
#4  0xf770d513 in pqWait () from /home/pg/pgsql/lib-32/libpq.so.5 
#5  0xf770c6d6 in PQgetResult () from /home/pg/pgsql/lib-32/libpq.so.5 
#6  0xf770c89c in PQexecFinish () from /home/pg/pgsql/lib-32/libpq.so.5 

Can you give the details of your machine's hardware and platform, so we can
see whether it has something in common with mine and narrow down the origin
of the bug?
Thanks.

Andrea





R: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

2011-12-20 Thread Andrea Grassi
This is the output of the "bt full" command in gdb for my test program.
In this case libpq was not compiled in debug mode.

(gdb) bt full
#0  0xe410 in __kernel_vsyscall ()
No symbol table info available.
#1  0xf76539cb in poll () from /lib/libc.so.6
No symbol table info available.
#2  0xf770d39a in pqSocketCheck () from /home/pg/pgsql/lib-32/libpq.so.5
No symbol table info available.
#3  0xf770d49d in pqWaitTimed () from /home/pg/pgsql/lib-32/libpq.so.5
No symbol table info available.
#4  0xf770d513 in pqWait () from /home/pg/pgsql/lib-32/libpq.so.5
No symbol table info available.
#5  0xf770c6d6 in PQgetResult () from /home/pg/pgsql/lib-32/libpq.so.5
No symbol table info available.
#6  0xf770c89c in PQexecFinish () from /home/pg/pgsql/lib-32/libpq.so.5
No symbol table info available.
#7  0x08048c3f in read_rigpia ()
No symbol table info available.
#8  0x08048ae9 in main ()

Below I add the complete stack of my business application (which is also
blocked), in case it is useful.
In this case libpq was compiled in debug mode, so we can see the function
parameter values and the source line numbers.
Note that the stack up to PQexecFinish is the same as in my test program.

(gdb) bt full
#0  0xe410 in __kernel_vsyscall ()
No symbol table info available.
#1  0xf6cdb9cb in poll () from /lib/libc.so.6
No symbol table info available.
#2  0xf766a39a in pqSocketPoll (conn=0x90e0838, forRead=1, forWrite=0, 
end_time=-1) at fe-misc.c:1082
No locals.
#3  pqSocketCheck (conn=0x90e0838, forRead=1, forWrite=0, end_time=-1)
at fe-misc.c:1024
result = -1
#4  0xf766a49d in pqWaitTimed (forRead=1, forWrite=0, conn=0x90e0838, 
finish_time=-1) at fe-misc.c:956
result = 
#5  0xf766a513 in pqWait (forRead=1, forWrite=0, conn=0x90e0838)
at fe-misc.c:939
No locals.
#6  0xf76696d6 in PQgetResult (conn=0x90e0838) at fe-exec.c:1554
flushResult = 1
res = 0x0
#7  0xf766989c in PQexecFinish (conn=0x90e0838) at fe-exec.c:1807
result = 0x23
lastResult = 0x0
#8  0xf767c3ec in pos_fetch (cur_dta=0x9485c80) at possup.c:930
   cmd = "FETCH 100 IN cur038_00063", '\000' , 
"Þh\031\b\230\021&\bl_R\t(\tÜÿm\216\027\bø¸\016\t", '\000' , 
"ø¸\016\t\000\000\000\000\020\000\000\000\230\021&\bl_R\tи\016\tX\tÜÿ/\221\027\bl_R\tи\016\t\001\000\000"
   res = 0x0
   bind = 0x0
   buf = 0x0
   colinfo = 0x0
   colnum = 136712600
   len = 156393324
   type = 1
   row = -2356856
   null = 135921165
#9  0xf767b147 in dm_possup (request=35) at possup.c:216
   retcode = 135268645
   l = 156393324
   eliminata = 0
#10 0x081076f3 in dm_call_fnc ()
   No symbol table info available.
#11 0x080fda3d in dm_do_a_fetch ()
   No symbol table info available.
#12 0x080fd913 in dm_fetch ()
   No symbol table info available.
#13 0x08102974 in dm_execute ()
   No symbol table info available.
#14 0x080f96de in execute_cursor ()
   No symbol table info available.
#15 0x080f8556 in dm_do_dbms ()
   No symbol table info available.
#16 0x080ff22f in dm_call ()
   No symbol table info available.
#17 0x080f7edd in dm_dbms ()
   No symbol table info available.
#18 0xf76a655e in dm_dbms_drv (
command=0xffdc0fa0 "with cursor cur038_00063 execute ")
at r_sqlutifunc.c:1090
No locals.
#19 0xf76ba4f6 in fetchCursorDb (curName=0xffdc1050 "cur038_00063")
at sqlPanth.c:895
buffer = "with cursor cur038_00063 execute ", '\000' , 
"à*Öö\000\000\000\000\000\000\000\000ô/ÖöÇ¥l÷X¢Üÿ\030\020Üÿ¾\220ÆöP\020Üÿ\r¶l÷(\020Üÿ(\020Üÿ\004N<\bØ\020Üÿ+
 k÷"
app = 0x0
retcode = 0
command = 0xf76cb6e4 "execute"
using_app = '\000' 
#20 0xf76ba03d in fetchCursor (f0_file=38, curName=0xf76cbb40 "")
at sqlPanth.c:759
cursor = "cur038_00063", '\000' 
app = 0x0
retcode = 0
#21 0xf76bc020 in sqlRead (f0_file=38, w_dat=0xffdcc23c "", mode=7)
at sqlRead.c:109
msg = 
"\000\000\000\000\000\000\000\000\\^Íö\000\000\000\000\035\000\000\000\000\000\000\b\000\000\000\000^\002Ýÿ\000\000\000\000ph\021\th\221Üÿ\022íj÷\224.o÷à*Öö«·\005\b"
s_where = " \"cdart\" = '50110725'  ", '\000' 
tslock = 0
tpOrd = 68
id_rec = 0
Failed = 0
failed_lock = 0
old_w_dat = 0x0
init_col = 0
ret = 0
#22 0xf7699159 in ISREAD (f0_file=38, w_dat=0xffdcc23c "", mode=7)
at r_dbswsql7.c:75
ret = -2321976
environ_save = '\000' 
#23 0x0807f6a4 in cal_prodat ()
No symbol table info available.
#24 0x08057650 in read_mrp () at /home/uwrk/pgsai/WRKUNX/g_mrprun.c:465
i = 0
idx = 0
dub = 4.8873862481069038e-313
dub1 = -1.209991882770505e+266
RFPO = {id = -153734240, cdart = "\027\000\000\000¨ÈÜÿéïÄöàÈÜÿ\000", 
descr = 
"\000\000\n\000\000\000\000\000\000\000HÉÜÿ$\\\a\bàÈÜÿ\004:9\b\031\000\000", 
bkini = 0, bkfin = 0, stato = 85 'U', 
cdpeg = '\000' , 
grpeg = 
"\000\000\000\000\000\000\000\000\000\065\000\000\000\000\000\000\000\000\000", 
tscon = 0 '\000', fillc = "\000\000\000", qtfan = 0, 
qtpro = 0, bkpeg = 0, lnuti = 0, anpia = "\000", 

Re: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

2011-12-20 Thread Alvaro Herrera

Excerpts from Andrea Grassi's message of Tue Dec 20 06:01:55 -0300 2011:
> Sorry to insist, but I now have the case at hand (my test program is
> currently blocked), so I can check and verify anything you want.
> I would like to know whether this could be a libpq bug, or whether you think
> the fault is due to a system bug or a machine issue; in that case I would be
> grateful if you could give me a hint about what it could be.

Please attach GDB to the stuck process (gdb -p `pidof testprogram`) and
grab a backtrace (bt full).

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

2011-12-20 Thread Harry Rossignol
I also have a client suffering an occasional 'application hang' running
Suse 11.2 and PostgreSQL 8.4 on an 8 core box which is not reproducible
in a VMWare test environment.
Access to postgres is libpq 127.0.0.1 as well. Unfortunately the client
must restart ASAP and I have not produced a 'test case'.



On 12/20/2011 1:01 AM, Andrea Grassi wrote:
> Sorry to insist, but I now have the case at hand (my test program is
> currently blocked), so I can check and verify anything you want.
> I would like to know whether this could be a libpq bug, or whether you think
> the fault is due to a system bug or a machine issue; in that case I would be
> grateful if you could give me a hint about what it could be.
>
> Regards, Andrea
>
> -----Original Message-----
> From: Craig Ringer [mailto:ring...@ringerc.id.au]
> Sent: Saturday, 17 December 2011 7:19
> To: Andrea Grassi
> Cc: pgsql-bugs@postgresql.org
> Subject: Re: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function
>
> On 16/12/2011 10:10 PM, Andrea Grassi wrote:
>> The client program and the postgres server are on the same host; the client
>> connects to 127.0.0.1.
>> In the meantime, my original program has blocked (not my example, but the
>> cause is very probably the same).
>>
>> I typed "ps -C testprogramname -o wchan:80=" and the output was only a
>> single dash ( "-" ).
> That means it's not waiting in a kernel call right now. Was the program
> in the hung state you've observed at the time you ran the command? Its
> output would only be interesting when it's hung.
>> I searched for the complete stack in /proc/$pid/stack (where $pid was the
>> pid of my process), but this file doesn't exist! Why?
> Old kernel, maybe? You're running on some kind of enterprise-y distro,
> so who knows how ancient half the stuff in there is.
>
> --
> Craig Ringer







R: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

2011-12-20 Thread Andrea Grassi
Sorry to insist, but I now have the case at hand (my test program is
currently blocked), so I can check and verify anything you want.
I would like to know whether this could be a libpq bug, or whether you think
the fault is due to a system bug or a machine issue; in that case I would be
grateful if you could give me a hint about what it could be.

Regards, Andrea

-----Original Message-----
From: Craig Ringer [mailto:ring...@ringerc.id.au]
Sent: Saturday, 17 December 2011 7:19
To: Andrea Grassi
Cc: pgsql-bugs@postgresql.org
Subject: Re: R: [BUGS] BUG #6342: libpq blocks forever in "poll" function

On 16/12/2011 10:10 PM, Andrea Grassi wrote:
> The client program and the postgres server are on the same host; the client
> connects to 127.0.0.1.
> In the meantime, my original program has blocked (not my example, but the
> cause is very probably the same).
>
> I typed "ps -C testprogramname -o wchan:80=" and the output was only a single
> dash ( "-" ).
That means it's not waiting in a kernel call right now. Was the program
in the hung state you've observed at the time you ran the command? Its
output would only be interesting when it's hung.
> I searched for the complete stack in /proc/$pid/stack (where $pid was the
> pid of my process), but this file doesn't exist! Why?
Old kernel, maybe? You're running on some kind of enterprise-y distro,
so who knows how ancient half the stuff in there is.

--
Craig Ringer

