Re: [GENERAL] Segmentation fault with core dump

2013-06-11 Thread Hiroshi Inoue

Hi,

(2013/05/09 1:39), Joshua Berry wrote:

| I'm using PG 9.1.9 with a client application using various versions of the
| pgsqlODBC driver on Windows. Cursors are used heavily, as well as some pretty
| heavy trigger queries on db writes which update several materialized views.
|
| The server has 48GB RAM installed, PG is configured for 12GB shared buffers,
| 8MB max_stack_depth, 32MB temp_buffers, and 2MB work_mem. Most of the other
| settings are defaults.
|
| The server will seg fault from every few days to up to two weeks. Each time
| one of the postgres server processes seg faults, the server gets terminated by
| signal 11, restarts in recovery for up to 30 seconds, after which time it
| accepts connections as if nothing ever happened. Unfortunately all the open
| cursors and connections are lost, so the client apps are left in a bad state.
|
| Seg faults have also occurred with PG 8.4. ... I migrated the database to a
| server running PG9.1 with the hopes that the problem would disappear, but it
| has not. So now I'm starting to debug.
|
| # uname -a
| Linux [hostname] 2.6.32-358.2.1.el6.x86_64 #1 SMP Tue Mar 12 14:18:09 CDT 2013
| x86_64 x86_64 x86_64 GNU/Linux
| # cat /etc/redhat-release
| Scientific Linux release 6.3 (Carbon)
|
| # psql -U jberry
| psql (9.1.9)
| Type "help" for help.
|
| jberry=# select version();
|                               version
| ----------------------------------------------------------------------
|  PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
|  20120313 (Red Hat 4.4.7-3), 64-bit
| (1 row)

I've had another postmaster segfault on my production server. It appears
to be the same failure as the last one nearly a month ago, but I wanted
to post the gdb bt details in case it helps shed light on the issue.
Please let me know if anyone would like to drill into the dumped core
with greater detail. Both the OS and PG versions remain unchanged.

Kind Regards,
-Joshua


On Fri, Apr 12, 2013 at 6:12 AM, Andres Freund <and...@2ndquadrant.com> wrote:

On 2013-04-10 19:06:12 -0400, Tom Lane wrote:
> I wrote:
>> (Wanders away wondering just how much the regression tests exercise
>> holdable cursors.)
>
> And the answer is they're not testing this code path at all, because if
> you do
> 	DECLARE c CURSOR WITH HOLD FOR ...
> 	FETCH ALL FROM c;
> then the second query executes with a portal (and resource owner)
> created to execute the FETCH command, not directly on the held portal.
>
> After a little bit of thought I'm not sure it's even possible to
> reproduce this problem with libpq, because it doesn't expose any way to
> issue a bare protocol Execute command against a pre-existing portal.
> (I had thought psqlODBC went through libpq, but maybe it's playing some
> games here.)
>
> Anyway, I'm thinking the appropriate fix might be like this
>
> -	CurrentResourceOwner = portal->resowner;
> +	if (portal->resowner)
> +		CurrentResourceOwner = portal->resowner;
>
> in several places in pquery.c; that is, keep using
> TopTransactionResourceOwner if the portal doesn't have its own.
>
> A more general but probably much more invasive solution would be to fake
> up an intermediate portal when pulling data from a held portal, to
> more closely approximate the explicit-FETCH case.

We could also allocate a new resowner for the duration of that
transaction. That would get reassigned to the transaction's resowner in
PreCommit_Portals (after a slight change there).
That actually seems simple enough?


I made some changes to the multi-thread handling of the psqlodbc driver.
It's also better to fix the crash at the backend side.

I made 2 patches.
The 1st one temporarily changes CurrentResourceOwner to
CurTransactionResourceOwner during catalog cache handling.

The 2nd one allocates a new resource owner for held portals.
Both fix the crash in my test case.

regards,
Hiroshi Inoue
diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index e87e675..258a4c2 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -238,6 +238,7 @@ printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs)
 {
 	int16	   *formats = myState->portal->formats;
 	int			i;
+	bool	resownerIsNull	= false;
 
 	/* get rid of any old data */
 	if (myState->myinfo)
@@ -252,6 +253,15 @@ printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs)
 	myState->myinfo = (PrinttupAttrInfo *)
 		palloc0(numAttrs * sizeof(PrinttupAttrInfo));
 
+	if (NULL == CurrentResourceOwner)
+	{
+		/*
+		 *	This occurs when processing execute messages
+		 *	for holdable cursors after commit.
+		 */
+		resownerIsNull = true;
+		CurrentResourceOwner = 

Re: [GENERAL] Segmentation fault with core dump

2013-06-11 Thread Tom Lane
Hiroshi Inoue <in...@tpf.co.jp> writes:
> It's also better to fix the crash at the backend side.

Yeah, definitely.  Do we have a self-contained test case for this?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Segmentation fault with core dump

2013-06-11 Thread Inoue, Hiroshi
(2013/06/12 0:03), Tom Lane wrote:
> Hiroshi Inoue <in...@tpf.co.jp> writes:
>> It's also better to fix the crash at the backend side.
>
> Yeah, definitely.  Do we have a self-contained test case for this?

Unfortunately no. I'm testing with a modified psqlodbc driver.
The simplest way may be as follows.

1. Implement an API function like PQexecPortal(portal name, count)
   which sends only an execute message.

2. Call the APIs as follows.

   // Connect to a database.
   conn = PQ();

   //
   // Don't begin a transaction, for simplicity.
   //
   // The following operation defines a holdable portal whose
   // portal name = cursor_name.
   PQexec(conn, "declare cursor_name cursor with hold for select ...");

   // At this point the holdable cursor becomes a held portal whose
   // resowner is reset to NULL. The following function call would
   // crash the backend.
   PQexecPortal(conn, cursor_name, 1);
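
For context, the bare Execute that such a hypothetical PQexecPortal() would
have to send is a single protocol-3.0 Execute ('E') message naming the
pre-existing portal, followed by Sync — something libpq's public API never
emits against a portal it did not create. A minimal sketch of the wire
format in Python (the function names and the portal name are illustrative;
the byte layout follows the documented frontend/backend protocol):

```python
import struct

def execute_message(portal: str, max_rows: int) -> bytes:
    """Build a v3 Execute ('E') message for an already-existing portal."""
    body = portal.encode("utf-8") + b"\x00" + struct.pack("!i", max_rows)
    # The Int32 length field counts itself (4 bytes) plus the body.
    return b"E" + struct.pack("!i", 4 + len(body)) + body

def sync_message() -> bytes:
    """Build a Sync ('S') message: no body, just the length field."""
    return b"S" + struct.pack("!i", 4)

# Fetch one row from the held portal named by the ODBC driver.
msg = execute_message("SQL_CUR017CB040", 1) + sync_message()
```

Sending such an Execute after COMMIT, against a WITH HOLD portal whose
resowner has been cleared, is the client-side half of the reproducer.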





Re: [GENERAL] Segmentation fault with core dump

2013-05-08 Thread Joshua Berry
| I'm using PG 9.1.9 with a client application using various versions of the
| pgsqlODBC driver on Windows. Cursors are used heavily, as well as some pretty
| heavy trigger queries on db writes which update several materialized views.
|
| The server has 48GB RAM installed, PG is configured for 12GB shared buffers,
| 8MB max_stack_depth, 32MB temp_buffers, and 2MB work_mem. Most of the other
| settings are defaults.
|
| The server will seg fault from every few days to up to two weeks. Each time
| one of the postgres server processes seg faults, the server gets terminated by
| signal 11, restarts in recovery for up to 30 seconds, after which time it
| accepts connections as if nothing ever happened. Unfortunately all the open
| cursors and connections are lost, so the client apps are left in a bad state.
|
| Seg faults have also occurred with PG 8.4. ... I migrated the database to a
| server running PG9.1 with the hopes that the problem would disappear, but it
| has not. So now I'm starting to debug.
|
| # uname -a
| Linux [hostname] 2.6.32-358.2.1.el6.x86_64 #1 SMP Tue Mar 12 14:18:09 CDT 2013
| x86_64 x86_64 x86_64 GNU/Linux
| # cat /etc/redhat-release
| Scientific Linux release 6.3 (Carbon)
|
| # psql -U jberry
| psql (9.1.9)
| Type "help" for help.
|
| jberry=# select version();
|                               version
| ----------------------------------------------------------------------
|  PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
|  20120313 (Red Hat 4.4.7-3), 64-bit
| (1 row)

I've had another postmaster segfault on my production server. It appears to
be the same failure as the last one nearly a month ago, but I wanted to
post the gdb bt details in case it helps shed light on the issue. Please
let me know if anyone would like to drill into the dumped core with greater
detail. Both the OS and PG versions remain unchanged.

Kind Regards,
-Joshua


From the PG log:
2013-05-07 08:48:35 CDT %@ LOG:  server process (PID 12367) was
terminated by signal 11: Segmentation fault

From /var/log/messages
  May  7 08:48:17 active kernel: postmaster[12367]: segfault at 40 ip
00710e2e sp 7fffdcaeedf0 error 4 in postgres[40+4ea000]


(gdb) bt full
#0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:605
        newmax = <value optimized out>
#1  0x006e1382 in SearchCatCache (cache=0x25bf270, v1=<value optimized out>,
    v2=<value optimized out>, v3=<value optimized out>,
    v4=<value optimized out>) at catcache.c:1143
        res = 1 '\001'
        cur_skey = {{sk_flags = 0, sk_attno = -2, sk_strategy = 3,
            sk_subtype = 0, sk_collation = 0, sk_func = {
              fn_addr = 0x686640 <oideq>, fn_oid = 184, fn_nargs = 2,
              fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2 '\002',
              fn_extra = 0x0, fn_mcxt = 0x252f108, fn_expr = 0x0},
            sk_argument = 20}, {sk_flags = 0, sk_attno = 0, sk_strategy = 0,
            sk_subtype = 0, sk_collation = 0, sk_func = {fn_addr = 0,
              fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000',
              fn_retset = 0 '\000', fn_stats = 0 '\000', fn_extra = 0x0,
              fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}, {sk_flags = 0,
            sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_collation = 0,
            sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0,
              fn_strict = 0 '\000', fn_retset = 0 '\000', fn_stats = 0 '\000',
              fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0},
          {sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0,
            sk_collation = 0, sk_func = {fn_addr = 0, fn_oid = 0,
              fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
              fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0,
              fn_expr = 0x0}, sk_argument = 0}}
        hashValue = 2280326203
        hashIndex = 59
        elt = 0x7fe0742cf228
        ct = 0x7fe0742cf218
        relation = <value optimized out>
        scandesc = <value optimized out>
        ntp = <value optimized out>
#2  0x006ec69e in getTypeOutputInfo (type=20, typOutput=0x277dea0,
    typIsVarlena=0x277dea8 "") at lsyscache.c:2438
        typeTuple = <value optimized out>
        pt = <value optimized out>
        __func__ = "getTypeOutputInfo"
#3  0x00459027 in printtup_prepare_info (myState=0x25d53d0,
    typeinfo=0x2686640, numAttrs=84) at printtup.c:263
        thisState = <value optimized out>
        format = <value optimized out>
        formats = 0x0
        i = <value optimized out>
        __func__ = "printtup_prepare_info"
#4  0x004593c4 in printtup (slot=0x26fd960, self=0x25d53d0) at
printtup.c:297
        typeinfo = <value optimized out>
        myState = 0x25d53d0
        buf = {data = 0x262cf40 "\320I\\\002", len = 40884576, maxlen = 0,
          cursor = 40884576}
        natts = 84
        i = <value optimized out>
#5  0x006376ca in RunFromStore (portal=0x262cf40,
    direction=<value optimized out>, count=10, dest=0x25d53d0) at pquery.c:1121
        oldcontext = 0x295a470
        ok = <value optimized out>
        current_tuple_count = 0
        slot = 

Re: [GENERAL] Segmentation fault with core dump

2013-04-12 Thread Andres Freund
On 2013-04-10 19:06:12 -0400, Tom Lane wrote:
> I wrote:
>> (Wanders away wondering just how much the regression tests exercise
>> holdable cursors.)
>
> And the answer is they're not testing this code path at all, because if
> you do
> 	DECLARE c CURSOR WITH HOLD FOR ...
> 	FETCH ALL FROM c;
> then the second query executes with a portal (and resource owner)
> created to execute the FETCH command, not directly on the held portal.
>
> After a little bit of thought I'm not sure it's even possible to
> reproduce this problem with libpq, because it doesn't expose any way to
> issue a bare protocol Execute command against a pre-existing portal.
> (I had thought psqlODBC went through libpq, but maybe it's playing some
> games here.)
>
> Anyway, I'm thinking the appropriate fix might be like this
>
> -	CurrentResourceOwner = portal->resowner;
> +	if (portal->resowner)
> +		CurrentResourceOwner = portal->resowner;
>
> in several places in pquery.c; that is, keep using
> TopTransactionResourceOwner if the portal doesn't have its own.
>
> A more general but probably much more invasive solution would be to fake
> up an intermediate portal when pulling data from a held portal, to
> more closely approximate the explicit-FETCH case.

We could also allocate a new resowner for the duration of that
transaction. That would get reassigned to the transaction's resowner in
PreCommit_Portals (after a slight change there).
That actually seems simple enough?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Alvaro Herrera
Joshua Berry escribió:

> gdb /usr/pgsql-9.1/bin/postmaster -c core.17356
> [...loading/reading symbols...]
> Core was generated by `postgres: [username] [databasename]
> [client_ipaddress](1500) SELECT  '.
> Program terminated with signal 11, Segmentation fault.
> #0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:605
> 605         if (owner->ncatrefs < owner->maxcatrefs)
> (gdb) q

Run "bt" here please (maybe "bt full" is even more illuminating)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Andres Freund
Hi,

On 2013-04-10 16:34:40 -0500, Joshua Berry wrote:
> Below are the relevant details. I'm not terribly savvy with gdb, so please
> let me know what else I could/should examine from the core dump, as well as
> anything else about the system/configuration.
>
> Kind Regards,
> -Joshua
>
> #NB: some info in square brackets has been [redacted]
> # grep postmaster /var/log/messages
> Apr 10 13:18:32 [hostname] kernel: postmaster[17356]: segfault at 40 ip
> 00710e2e sp 7fffd193ca70 error 4 in postgres[40+4ea000]
>
> gdb /usr/pgsql-9.1/bin/postmaster -c core.17356
> [...loading/reading symbols...]
> Core was generated by `postgres: [username] [databasename]
> [client_ipaddress](1500) SELECT  '.
> Program terminated with signal 11, Segmentation fault.
> #0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:605
> 605         if (owner->ncatrefs < owner->maxcatrefs)
> (gdb) q

Could you show the output of 'bt full'?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Joshua Berry
Hi Andres!

On Wed, Apr 10, 2013 at 4:49 PM, Andres Freund <and...@2ndquadrant.com> wrote:

> Could you show the output of 'bt full'?


Program terminated with signal 11, Segmentation fault.
#0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:605
605         if (owner->ncatrefs < owner->maxcatrefs)
(gdb) bt
#0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:605
#1  0x006e1382 in SearchCatCache (cache=0x2a1aad0,
    v1=<value optimized out>, v2=<value optimized out>,
    v3=<value optimized out>, v4=<value optimized out>) at catcache.c:1143
#2  0x006ec69e in getTypeOutputInfo (type=20, typOutput=0x2a8ac30,
    typIsVarlena=0x2a8ac38 "") at lsyscache.c:2438
#3  0x00459027 in printtup_prepare_info (myState=0x2a50c40,
    typeinfo=0x2e7fc50, numAttrs=42) at printtup.c:263
#4  0x004593c4 in printtup (slot=0x2d14618, self=0x2a50c40)
    at printtup.c:297
#5  0x006376ca in RunFromStore (portal=0x2aa9360,
    direction=<value optimized out>, count=10, dest=0x2a50c40) at
    pquery.c:1121
#6  0x006377b2 in PortalRunSelect (portal=0x2aa9360,
    forward=<value optimized out>, count=10, dest=0x2a50c40) at pquery.c:939
#7  0x00638c78 in PortalRun (portal=0x2aa9360, count=10,
    isTopLevel=1 '\001', dest=0x2a50c40, altdest=0x2a50c40,
    completionTag=0x7fffd193d0f0 "") at pquery.c:787
#8  0x0063661e in exec_execute_message (argc=<value optimized out>,
    argv=<value optimized out>, dbname=0x298a150 "[dbname]",
    username=<value optimized out>) at postgres.c:1965
#9  PostgresMain (argc=<value optimized out>, argv=<value optimized out>,
    dbname=0x298a150 "[dbname]", username=<value optimized out>)
    at postgres.c:4026
#10 0x005f6c61 in BackendRun () at postmaster.c:3612
#11 BackendStartup () at postmaster.c:3302
#12 ServerLoop () at postmaster.c:1466
#13 0x005f9431 in PostmasterMain (argc=<value optimized out>,
    argv=<value optimized out>) at postmaster.c:1127
#14 0x0059a9b0 in main (argc=5, argv=0x2988480) at main.c:199
(gdb) bt full
#0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:605
        newmax = <value optimized out>
#1  0x006e1382 in SearchCatCache (cache=0x2a1aad0,
    v1=<value optimized out>, v2=<value optimized out>,
    v3=<value optimized out>, v4=<value optimized out>) at catcache.c:1143
        res = 1 '\001'
        cur_skey = {{sk_flags = 0, sk_attno = -2, sk_strategy = 3,
            sk_subtype = 0, sk_collation = 0, sk_func = {
              fn_addr = 0x686640 <oideq>, fn_oid = 184, fn_nargs = 2,
              fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2 '\002',
              fn_extra = 0x0, fn_mcxt = 0x298b108, fn_expr = 0x0},
            sk_argument = 20}, {sk_flags = 0, sk_attno = 0, sk_strategy = 0,
            sk_subtype = 0, sk_collation = 0, sk_func = {fn_addr = 0,
              fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000',
              fn_retset = 0 '\000', fn_stats = 0 '\000', fn_extra = 0x0,
              fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}, {sk_flags = 0,
            sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_collation = 0,
            sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0,
              fn_strict = 0 '\000', fn_retset = 0 '\000', fn_stats = 0 '\000',
              fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0},
          {sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0,
            sk_collation = 0, sk_func = {fn_addr = 0, fn_oid = 0,
              fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
              fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0,
              fn_expr = 0x0}, sk_argument = 0}}
        hashValue = 2280326203
        hashIndex = 59
        elt = 0x7fc28648fb88
        ct = 0x7fc28648fb78
        relation = <value optimized out>
        scandesc = <value optimized out>
        ntp = <value optimized out>
#2  0x006ec69e in getTypeOutputInfo (type=20, typOutput=0x2a8ac30,
    typIsVarlena=0x2a8ac38 "") at lsyscache.c:2438
        typeTuple = <value optimized out>
        pt = <value optimized out>
        __func__ = "getTypeOutputInfo"
#3  0x00459027 in printtup_prepare_info (myState=0x2a50c40,
    typeinfo=0x2e7fc50, numAttrs=42) at printtup.c:263
        thisState = <value optimized out>
        format = <value optimized out>
        formats = 0x0
        i = <value optimized out>
        __func__ = "printtup_prepare_info"
#4  0x004593c4 in printtup (slot=0x2d14618, self=0x2a50c40)
    at printtup.c:297
        typeinfo = <value optimized out>
        myState = 0x2a50c40
        buf = {data = 0x2aa9360 "x\376\243\002", len = 47269400, maxlen = 0,
          cursor = 47269400}
        natts = 42
        i = <value optimized out>
#5  0x006376ca in RunFromStore (portal=0x2aa9360,
    direction=<value optimized out>, count=10, dest=0x2a50c40) at
    pquery.c:1121
        oldcontext = 0x2aca6f0
        ok = <value optimized out>
        current_tuple_count = 0
        slot = 0x2d14618
#6  

Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Andres Freund
On 2013-04-10 16:53:12 -0500, Joshua Berry wrote:
> #7  0x00638c78 in PortalRun (portal=0x2aa9360, count=10,
>     isTopLevel=1 '\001', dest=0x2a50c40, altdest=0x2a50c40,
>     completionTag=0x7fffd193d0f0 "") at pquery.c:787
>         save_exception_stack = 0x7fffd193cfe0
>         save_context_stack = 0x0
>         local_sigjmp_buf = {{__jmpbuf = {140736709513424,
>             8412518331319730861, 44733280, 0, 4294967295, 3,
>             -8412469024494566739, 8412517527857939117},
>             __mask_was_saved = 0, __saved_mask = {
>             __val = {76, 44316880, 5, 0, 44804496, 767, 5, 0, 4294967295, 3,
>             5565893, 11495616, 4746857, 140736709513424, 140736709513424,
>             0
>         result = <value optimized out>
>         nprocessed = <value optimized out>
>         saveTopTransactionResourceOwner = 0x2a439e0
>         saveTopTransactionContext = 0x2a438d0
>         saveActivePortal = 0x0
>         saveResourceOwner = 0x2a439e0
>         savePortalContext = 0x0
>         saveMemoryContext = 0x2a438d0
>         __func__ = "PortalRun"

Ok, so while we have a valid resource owner up to here, portal->resowner
is NULL.

Could you 'up' until you're in this frame and then do 'print *portal'?

Thanks,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Joshua Berry
> Ok, so while we have a valid resource owner up to here, portal->resowner
> is NULL.
>
> Could you 'up' until you're in this frame and then do 'print *portal'?


#7  0x00638c78 in PortalRun (portal=0x2aa9360, count=10,
    isTopLevel=1 '\001', dest=0x2a50c40, altdest=0x2a50c40,
    completionTag=0x7fffd193d0f0 "") at pquery.c:787
787                     nprocessed = PortalRunSelect(portal, true, count, dest);
(gdb) print *portal
$1 = {name = 0x2a3fe78 "SQL_CUR017CB040", prepStmtName = 0x0,
  heap = 0x2aca6f0, resowner = 0x0, cleanup = 0x535d30 <PortalCleanup>,
  createSubid = 0,
  sourceText = 0x2d141b8 "declare \"SQL_CUR017CB040\" cursor with hold for
SELECT
anl.LAB,anl.JOB,anl.COMPANYCODE,anl.SAMPLETYPE,anl.COMPANYLAB,anl.CYLNO,anl.CONTAINERCODE,anl.DATEGCANALYSIS,anl.DATEREPORT,anl.SAMPLENAME,anl.FIE"...,
  commandTag = 0x828f32 "SELECT", stmts = 0x2d145e8, cplan = 0x0,
  portalParams = 0x0, strategy = PORTAL_ONE_SELECT, cursorOptions = 52,
  status = PORTAL_ACTIVE, portalPinned = 0 '\000', queryDesc = 0x0,
  tupDesc = 0x2e7fc50, formats = 0x0, holdStore = 0x2d0d430,
  holdContext = 0x2f4e290, atStart = 0 '\000', atEnd = 0 '\000',
  posOverflow = 0 '\000', portalPos = 10, creation_time = 418933112934458,
  visible = 1 '\001'}
(gdb)


Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Tom Lane
Andres Freund <and...@2ndquadrant.com> writes:
> Ok, so while we have a valid resource owner up to here, portal->resowner
> is NULL.

Yeah, that's what I'm guessing.  Given the exposed reference to a cursor
WITH HOLD, it seems likely that the reason the portal has no resowner
is that PreCommit_Portals() got rid of it when the cursor was held.
However, if that were the explanation, it's not clear how come this
isn't falling over all the time for everybody who uses holdable cursors.
Any cache flush happening just before fetching from a holdable cursor
ought to do it ...

(Wanders away wondering just how much the regression tests exercise
holdable cursors.)

regards, tom lane




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Tom Lane
I wrote:
> (Wanders away wondering just how much the regression tests exercise
> holdable cursors.)

And the answer is they're not testing this code path at all, because if
you do
DECLARE c CURSOR WITH HOLD FOR ...
FETCH ALL FROM c;
then the second query executes with a portal (and resource owner)
created to execute the FETCH command, not directly on the held portal.

After a little bit of thought I'm not sure it's even possible to
reproduce this problem with libpq, because it doesn't expose any way to
issue a bare protocol Execute command against a pre-existing portal.
(I had thought psqlOBC went through libpq, but maybe it's playing some
games here.)

Anyway, I'm thinking the appropriate fix might be like this

-	CurrentResourceOwner = portal->resowner;
+	if (portal->resowner)
+		CurrentResourceOwner = portal->resowner;

in several places in pquery.c; that is, keep using
TopTransactionResourceOwner if the portal doesn't have its own.

A more general but probably much more invasive solution would be to fake
up an intermediate portal when pulling data from a held portal, to
more closely approximate the explicit-FETCH case.

regards, tom lane
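
[Editor's note: to make the failure mode concrete, here is a toy model in
plain Python standing in for the C structures; all names are illustrative,
not the actual pquery.c code. At commit, PreCommit_Portals releases a held
portal's resource owner, so a later bare Execute on that portal would
install NULL as CurrentResourceOwner; the proposed guard falls back to the
transaction-level owner instead.]

```python
class ResourceOwner:
    def __init__(self, name):
        self.name = name

class Portal:
    def __init__(self, name, resowner):
        self.name = name
        self.resowner = resowner

def precommit_hold(portal):
    # Model of PreCommit_Portals: a WITH HOLD portal survives commit,
    # but its per-portal resource owner is released (NULL afterwards).
    portal.resowner = None

def choose_owner_unpatched(portal, current_owner):
    # Old behavior: CurrentResourceOwner = portal->resowner, even if NULL.
    return portal.resowner

def choose_owner_patched(portal, current_owner):
    # Proposed guard: keep the existing (transaction-level) owner
    # when the portal has no resource owner of its own.
    return portal.resowner if portal.resowner is not None else current_owner

top_owner = ResourceOwner("TopTransactionResourceOwner")
held = Portal("SQL_CUR017CB040", ResourceOwner("Portal"))
precommit_hold(held)

# Unpatched: NULL owner, which the catcache bookkeeping later dereferences.
assert choose_owner_unpatched(held, top_owner) is None
# Patched: the transaction-level owner is used instead.
assert choose_owner_patched(held, top_owner) is top_owner
```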




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Andres Freund
On 2013-04-10 17:31:09 -0500, Joshua Berry wrote:
>> Ok, so while we have a valid resource owner up to here, portal->resowner
>> is NULL.
>>
>> Could you 'up' until you're in this frame and then do 'print *portal'?
>
> #7  0x00638c78 in PortalRun (portal=0x2aa9360, count=10,
>     isTopLevel=1 '\001', dest=0x2a50c40, altdest=0x2a50c40,
>     completionTag=0x7fffd193d0f0 "") at pquery.c:787
> 787                     nprocessed = PortalRunSelect(portal, true, count, dest);
> (gdb) print *portal
> $1 = {name = 0x2a3fe78 "SQL_CUR017CB040", prepStmtName = 0x0,
>   heap = 0x2aca6f0, resowner = 0x0, cleanup = 0x535d30 <PortalCleanup>,
>   createSubid = 0,
>   sourceText = 0x2d141b8 "declare \"SQL_CUR017CB040\" cursor with hold for
> SELECT
> anl.LAB,anl.JOB,anl.COMPANYCODE,anl.SAMPLETYPE,anl.COMPANYLAB,anl.CYLNO,anl.CONTAINERCODE,anl.DATEGCANALYSIS,anl.DATEREPORT,anl.SAMPLENAME,anl.FIE"...,
>   commandTag = 0x828f32 "SELECT", stmts = 0x2d145e8, cplan = 0x0,
>   portalParams = 0x0, strategy = PORTAL_ONE_SELECT, cursorOptions = 52,
>   status = PORTAL_ACTIVE, portalPinned = 0 '\000', queryDesc = 0x0,
>   tupDesc = 0x2e7fc50, formats = 0x0, holdStore = 0x2d0d430,
>   holdContext = 0x2f4e290, atStart = 0 '\000', atEnd = 0 '\000',
>   posOverflow = 0 '\000', portalPos = 10, creation_time = 418933112934458,
>   visible = 1 '\001'}
> (gdb)

Ok, I might be seeing what's going on here. Could you go to 'printtup'
and print *myState, *myState->attrinfo, *typeinfo?

Tom: It looks to me like printtup_prepare_info won't normally be called
in a held cursor. But if some concurrent DDL changed the number of
columns in typeinfo vs. those in the receiver, that could explain the
issue and why it's not seen all the time, right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Andres Freund
On 2013-04-10 19:06:12 -0400, Tom Lane wrote:
> I wrote:
>> (Wanders away wondering just how much the regression tests exercise
>> holdable cursors.)
>
> And the answer is they're not testing this code path at all, because if
> you do
> 	DECLARE c CURSOR WITH HOLD FOR ...
> 	FETCH ALL FROM c;
> then the second query executes with a portal (and resource owner)
> created to execute the FETCH command, not directly on the held portal.

But in that path CurrentResourceOwner gets reset to portal->resowner as
well (see PortalRunFetch())?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Tom Lane
Andres Freund <and...@2ndquadrant.com> writes:
> Tom: It looks to me like printtup_prepare_info won't normally be called
> in a held cursor. But if some concurrent DDL changed the number of
> columns in typeinfo vs. those in the receiver, that could explain the
> issue and why it's not seen all the time, right?

It looks to me like there are probably two triggering conditions:

1. Client is doing a direct protocol Execute on a held-cursor portal.

2. Cache flush occurs to drop the syscache entries needed by
getTypeOutputInfo.  (Otherwise, they'd still be around from when the
held cursor was created.)

The first of these explains why we don't see it in the
CLOBBER_CACHE_ALWAYS buildfarm runs, and the second one explains why
Joshua is only seeing it intermittently and not every darn time his
application does that.

regards, tom lane




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Joshua Berry
> Ok, I might be seeing what's going on here. Could you go to 'printtup'
> and print *myState, *myState->attrinfo, *typeinfo?

#4  0x004593c4 in printtup (slot=0x2d14618, self=0x2a50c40)
    at printtup.c:297
297             printtup_prepare_info(myState, typeinfo, natts);
(gdb) print *myState, *myState->attrinfo, *typeinfo
value has been optimized out
(gdb)


Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Andres Freund
On 2013-04-10 18:25:24 -0500, Joshua Berry wrote:
>> Ok, I might be seeing what's going on here. Could you go to 'printtup'
>> and print *myState, *myState->attrinfo, *typeinfo?
>
> #4  0x004593c4 in printtup (slot=0x2d14618, self=0x2a50c40)
>     at printtup.c:297
> 297             printtup_prepare_info(myState, typeinfo, natts);
> (gdb) print *myState, *myState->attrinfo, *typeinfo
> value has been optimized out

Hm. Make that a
	print *(DR_printtup *) self
	print *((DR_printtup *) self)->attrinfo
	print *slot->tts_tupleDescriptor

then.


Thanks,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Tom Lane
Andres Freund <and...@2ndquadrant.com> writes:
> On 2013-04-10 19:06:12 -0400, Tom Lane wrote:
>> And the answer is they're not testing this code path at all, because if
>> you do
>> 	DECLARE c CURSOR WITH HOLD FOR ...
>> 	FETCH ALL FROM c;
>> then the second query executes with a portal (and resource owner)
>> created to execute the FETCH command, not directly on the held portal.
>
> But in that path CurrentResourceOwner gets reset to portal->resowner as
> well (see PortalRunFetch())?

Right, but that's the FETCH's portal, which is a regular live portal
that has a ResOwner.

regards, tom lane




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Andres Freund
On 2013-04-10 19:06:12 -0400, Tom Lane wrote:
> I wrote:
>> (Wanders away wondering just how much the regression tests exercise
>> holdable cursors.)
>
> And the answer is they're not testing this code path at all, because if
> you do
> 	DECLARE c CURSOR WITH HOLD FOR ...
> 	FETCH ALL FROM c;
> then the second query executes with a portal (and resource owner)
> created to execute the FETCH command, not directly on the held portal.
>
> After a little bit of thought I'm not sure it's even possible to
> reproduce this problem with libpq, because it doesn't expose any way to
> issue a bare protocol Execute command against a pre-existing portal.
> (I had thought psqlODBC went through libpq, but maybe it's playing some
> games here.)

Hm. PQexecPrepared() looks like it can do that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Joshua Berry
> Hm. Make that a
> 	print *(DR_printtup *) self
> 	print *((DR_printtup *) self)->attrinfo
> 	print *slot->tts_tupleDescriptor

(gdb) print *(DR_printtup *) self
$2 = {pub = {receiveSlot = 0x459390 <printtup>,
    rStartup = 0x459550 <printtup_startup>,
    rShutdown = 0x458a20 <printtup_shutdown>,
    rDestroy = 0x458a10 <printtup_destroy>, mydest = DestRemoteExecute},
  portal = 0x2aa9360, sendDescrip = 0 '\000', attrinfo = 0x2e7fc50,
  nattrs = 42, myinfo = 0x2a8ac30}
(gdb) print *((DR_printtup *) self)->attrinfo
$3 = {natts = 42, attrs = 0x2e7fc78, constr = 0x0, tdtypeid = 2249,
  tdtypmod = -1, tdhasoid = 0 '\000', tdrefcount = -1}
(gdb) print *slot->tts_tupleDescriptor
$4 = {natts = 42, attrs = 0x2e7fc78, constr = 0x0, tdtypeid = 2249,
  tdtypmod = -1, tdhasoid = 0 '\000', tdrefcount = -1}
(gdb)
(gdb)


Re: [GENERAL] Segmentation fault with core dump

2013-04-10 Thread Andres Freund
On 2013-04-10 19:29:06 -0400, Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
>> On 2013-04-10 19:06:12 -0400, Tom Lane wrote:
>>> And the answer is they're not testing this code path at all, because if
>>> you do
>>> 	DECLARE c CURSOR WITH HOLD FOR ...
>>> 	FETCH ALL FROM c;
>>> then the second query executes with a portal (and resource owner)
>>> created to execute the FETCH command, not directly on the held portal.
>>
>> But in that path CurrentResourceOwner gets reset to portal->resowner as
>> well (see PortalRunFetch())?
>
> Right, but that's the FETCH's portal, which is a regular live portal
> that has a ResOwner.

I don't think so?

standard_ProcessUtility:
	PerformPortalFetch:
		/* get the portal from the portal name */
		portal = GetPortalByName(stmt->portalname);
		...
		/* Do it */
		nprocessed = PortalRunFetch(portal,
									stmt->direction,
									stmt->howMany,
									dest);
	PortalRunFetch:
		PG_TRY();
		{
			ActivePortal = portal;
			CurrentResourceOwner = portal->resowner;

So it seems to trigger a very similar codepath?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

