Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-03-27 Thread Day, David
Hi,

Update: A story with a happy ending.

I have not seen this segmentation fault since converting the plperlu functions
to Python within the FreeBSD 9.x environment, so I believe Guy Helmer’s
suggested cause was likely spot on.
Because we could not reproduce the issue on demand, there is a small chance
that this is not the root cause, but I’ll let the current empirical health of
the system speak loudest on this matter.

We are in the process of migrating development efforts to 10.x so selection of 
perl over python should become a non-issue.

Thanks to all who assisted me in figuring this out.

Best Regards


Dave


Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-02-18 Thread Day, David
Update/Information sharing: ( FreeBSD 10.0 (amd64) – Postgres 9.3.5 – Perl 5.18 )

I have converted our Postgres plperlu functions to plpython2u to see if the
postgres segmentation faults disappear.
Lacking a known way to reproduce the error on demand, I will have to wait a few
weeks for the absence of the symptom before I can conclude that the bug
reported to me by Guy Helmer was the issue. Migration/upgrade to FreeBSD
10.1 was not an immediate option.


Regards

Dave







Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-02-13 Thread Guy Helmer

 On Feb 12, 2015, at 3:21 PM, Day, David d...@redcom.com wrote:
 
 Update/Information sharing on my pursuit of segmentation faults
  
 FreeBSD 10.0-RELEASE-p12 amd64
 Postgres version 9.3.5
  
 Below are three postgres core files generated on two different machines
 ( Georgia and Alabama ) on Feb 11.
 These cores cannot have been caused by the environment update issue that I
 last suspected might be causing the segfaults,
 so I am kind of back to square one in terms of understanding what is occurring.
  
 - I am not sure that I understand the associated time events in the
 postgres log file output. Is this whatever happened to be running in the
 other postgres forked processes when the cored process was detected?
 If so, then I have probably been reading too much into the
 content of the postgres log file at the time of the core.
 It probably just represents collateral damage to routine transactions that
 were in other forked processes at the time one of the processes cored.
  
 Therefore I would now just assert that postgres has a sporadic segmentation
 problem with no known way to reliably cause it,
 and I am uncertain how to proceed to resolve it.

. . .

  Georgia-Core 8:38 -  Feb 11
 [New process 101032]
 [New Thread 802c06400 (LWP 101032)]
 Core was generated by `postgres'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x00080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
 (gdb) bt
 #0  0x00080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
 #1  0x00080c4cab49 in Perl_sv_clear () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
 #2  0x00080c4cb13a in Perl_sv_free2 () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
 #3  0x00080c4e5102 in Perl_free_tmps () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
 #4  0x00080bcfedea in plperl_destroy_interp () from /usr/local/lib/postgresql/plperl.so
 #5  0x00080bcfec05 in plperl_fini () from /usr/local/lib/postgresql/plperl.so
 #6  0x006292c6 in ?? ()
 #7  0x0062918d in proc_exit ()
 #8  0x006443f3 in PostgresMain ()
 #9  0x005ff267 in PostmasterMain ()
 #10 0x005a31ba in main ()
 (gdb) info threads
   Id   Target Id Frame
 * 2    Thread 802c06400 (LWP 101032) 0x00080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
 * 1    Thread 802c06400 (LWP 101032) 0x00080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
  

Given that two of the core dumps are down in libperl and this is FreeBSD 10.0
amd64, have you seen this?

https://rt.perl.org/Public/Bug/Display.html?id=122199

Michael Moll suggested setting vm.pmap.pcid_enabled to 0, but I don’t
recall seeing whether that helped.

Guy




Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-02-13 Thread Day, David
Guy,

No, I had not seen that bug report before
( https://rt.perl.org/Public/Bug/Display.html?id=122199 ).

We did migrate from FreeBSD 9.x (2?), and I think it is true
that we were not experiencing the problem at that time.
So it might be a good fit/explanation for our current experience.

There were a couple of suggestions to follow up on.
I’ll keep the thread updated.

Thanks, a good start to my Friday the 13th.


Regards


Dave Day









[GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-02-12 Thread Day, David
Update/Information sharing on my pursuit of segmentation faults

FreeBSD 10.0-RELEASE-p12 amd64
Postgres version 9.3.5

Below are three postgres core files generated on two different machines
( Georgia and Alabama ) on Feb 11.
These cores cannot have been caused by the environment update issue that I
last suspected might be causing the segfaults,
so I am kind of back to square one in terms of understanding what is occurring.

- I am not sure that I understand the associated time events in the postgres
log file output. Is this whatever happened to be running in the other postgres
forked processes when the cored process was detected?
If so, then I have probably been reading too much into the content
of the postgres log file at the time of the core.
It probably just represents collateral damage to routine transactions that
were in other forked processes at the time one of the processes cored.

Therefore I would now just assert that postgres has a sporadic segmentation
problem with no known way to reliably cause it,
and I am uncertain how to proceed to resolve it.
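
Editorial sketch (not part of the original message): the question about which log lines belong to the crashed process can be answered from the postmaster's own report. When a backend dies, the postmaster logs "server process (PID n) was terminated by signal ..." followed by a DETAIL line naming the statement that backend was running; lines from other PIDs around the same time are other backends being shut down. The function name `find_crashes` and the sample lines are illustrative, with the sample shortened from the log excerpt in this message.

```python
import re

# The two postmaster messages PostgreSQL emits when a backend dies.
CRASH_RE = re.compile(r"server process \(PID (\d+)\) was terminated by signal (\d+)")
QUERY_RE = re.compile(r"Failed process was running: (.*)")


def find_crashes(log_lines):
    """Return one dict per crashed backend reported by the postmaster."""
    crashes = []
    for line in log_lines:
        m = CRASH_RE.search(line)
        if m:
            crashes.append({"pid": int(m.group(1)),
                            "signal": int(m.group(2)),
                            "query": None})
            continue
        m = QUERY_RE.search(line)
        # Attach the statement to the most recent crash report, if any.
        if m and crashes and crashes[-1]["query"] is None:
            crashes[-1]["query"] = m.group(1).strip()
    return crashes


# Shortened sample lines; only PID 38319 is the process that cored.
sample = [
    "postgres[38321]: LOG:  duration: 4.384 ms  statement: COMMIT",
    "postgres[1018]: LOG:  server process (PID 38319) was terminated by signal 11: Segmentation fault",
    "postgres[1018]: DETAIL:  Failed process was running: SELECT * FROM cc.register_port_sip_user($1, $2)",
    "postgres[38321]: WARNING:  terminating connection because of crash of another server process",
]

crashes = find_crashes(sample)
print(crashes)
```

Anything logged by a PID other than the one in the "terminated by signal" line (here 38321) is collateral, not a clue about the fault itself.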


Georgia 8:38
Georgia 17:55
Alabama: 15:30

--


If someone sees something in these core file backtraces that suggests a
direction to pursue, it would be much appreciated.
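
Editorial sketch (not part of the original message): when comparing several cores, it can help to reduce each gdb `bt` listing to its innermost frame and the shared object that frame lives in. The parser below is a minimal illustration; `parse_backtrace` is a made-up name, and the sample is an excerpt of frames from the Georgia 8:38 core quoted elsewhere in this thread, including the line wrapping introduced by mail clients.

```python
import re

# One gdb frame: "#N  0xADDR in function () [from /path/to/lib.so]".
FRAME_RE = re.compile(
    r"^#(\d+)\s+(0x[0-9a-fA-F]+)\s+in\s+(\S+)\s+\(\)(?:\s+from\s+(\S+))?"
)


def parse_backtrace(text):
    """Parse gdb `bt` output into (index, function, library) tuples.

    Lines that do not start with '#' are treated as continuations of the
    previous frame, which undoes wrapping added by mail clients.
    """
    joined = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#") or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line
    frames = []
    for line in joined:
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(3), m.group(4)))
    return frames


sample = """
#0  0x00080c4b6d51 in Perl_hfree_next_entry () from
/usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#4  0x00080bcfedea in plperl_destroy_interp () from
/usr/local/lib/postgresql/plperl.so
#7  0x0062918d in proc_exit ()
"""

frames = parse_backtrace(sample)
innermost = frames[0]
print("crashed in", innermost[1], "from", innermost[2])
```

Frames without a "from" clause belong to the postgres executable itself and come back with a library of None.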



Thanks


Dave

Georgia - Core 17:55 - Feb 11
(gdb) bt
#0  0x006f8670 in SearchCatCache ()
#1  0x00672537 in enum_in ()
#2  0x0071375b in InputFunctionCall ()
#3  0x00713b7e in OidInputFunctionCall ()
#4  0x00509a3d in coerce_type ()
#5  0x00511af3 in make_fn_arguments ()
#6  0x00513fed in make_op ()
#7  0x0050f53b in ?? ()
#8  0x0050d706 in transformExpr ()
#9  0x00518333 in transformTargetList ()
#10 0x004f02bc in transformStmt ()
#11 0x0064109d in pg_analyze_and_rewrite_params ()
#12 0x006fbc6b in ?? ()
#13 0x006fb6f5 in GetCachedPlan ()
#14 0x0059597a in SPI_plan_get_cached_plan ()
#15 0x0008024ed34d in ?? () from /usr/local/lib/postgresql/plpgsql.so
#16 0x0008024f2590 in ?? () from /usr/local/lib/postgresql/plpgsql.so
#17 0x0008024ee0d0 in ?? () from /usr/local/lib/postgresql/plpgsql.so
#18 0x0008024eaf3b in ?? () from /usr/local/lib/postgresql/plpgsql.so
#19 0x0008024ea243 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so
#20 0x0008024e6551 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so
#21 0x0057611f in ExecMakeTableFunctionResult ()
#22 0x0058b6c7 in ?? ()
#23 0x0057bab2 in ExecScan ()
#24 0x005756b8 in ExecProcNode ()
#25 0x00573630 in standard_ExecutorRun ()
#26 0x00645b0a in ?? ()
#27 0x00645719 in PortalRun ()
#28 0x006438ea in PostgresMain ()
#29 0x005ff267 in PostmasterMain ()
#30 0x005a31ba in main ()
(gdb) info threads
  Id   Target Id Frame
* 2    Thread 802c06400 (LWP 100070) 0x006f8670 in SearchCatCache ()
* 1    Thread 802c06400 (LWP 100070) 0x006f8670 in SearchCatCache ()


- The gdb info threads response is still an annoying piece of information.
Connecting gdb to a healthy running postmaster gives the same thread count (2)
as the core file.
However, other system tools ( top, ps ) that report the number of threads
for a process show only one thread on the healthy process. So I think this
is a debugger bug.
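
Editorial sketch (not part of the original message): one way to check whether the two rows in an `info threads` listing describe distinct threads is to compare the thread pointer and LWP id on each row. The helper below is only an illustration; `distinct_threads` is a made-up name, and the sample is the listing from the core above. Identical pairs suggest the same kernel thread was listed twice, although that alone does not prove a debugger bug.

```python
import re
from collections import Counter

# Matches rows such as "* 2Thread 802c06400 (LWP 100070) 0x... in f ()".
ROW_RE = re.compile(r"^\*?\s*(\d+)\s*Thread\s+([0-9a-fA-F]+)\s+\(LWP\s+(\d+)")


def distinct_threads(info_threads_output):
    """Count rows per (thread pointer, LWP) pair in gdb `info threads` text."""
    seen = Counter()
    for line in info_threads_output.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            seen[(m.group(2), int(m.group(3)))] += 1
    return seen


sample = """\
  Id   Target Id Frame
* 2Thread 802c06400 (LWP 100070) 0x006f8670 in SearchCatCache ()
* 1Thread 802c06400 (LWP 100070) 0x006f8670 in SearchCatCache ()
"""

threads = distinct_threads(sample)
print(len(threads), "distinct thread(s);", sum(threads.values()), "row(s) listed")
```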



2015-02-11T17:55:13.732147-05:00 georgia local0 info postgres[38321]: [7236-1] user=ace_db_client, db=ace_db, proc=38321, audit=dbm_client9, LOG:  duration: 4.384 ms  statement: COMMIT
2015-02-11T17:55:13.743399-05:00 georgia local0 info postgres[86738]: [12-1] user=redcom, db=ace_db, proc=86738, audit=[unknown], LOG:  duration: 14.581 ms  statement: SELECT database, COALESCE(max(extract(epoch FROM CURRENT_TIMESTAMP-prepared)),0) FROM pg_prepared_xacts JOIN pg_database ON datname=database WHERE datname='ace_db' GROUP BY database ORDER BY 1
2015-02-11T17:55:13.833624-05:00 georgia local0 info postgres[1018]: [11-1] user=, db=, proc=1018, audit=, LOG:  server process (PID 38319) was terminated by signal 11: Segmentation fault
2015-02-11T17:55:13.833669-05:00 georgia local0 info postgres[1018]: [11-2] user=, db=, proc=1018, audit=, DETAIL:  Failed process was running: SELECT * FROM cc.register_port_sip_user($1, $2, $3, $4, $5, $6, $7, $8, $9, $10 )
2015-02-11T17:55:13.833701-05:00 georgia local0 info postgres[1018]: [12-1] user=, db=, proc=1018, audit=, LOG:  terminating any other active server processes
2015-02-11T17:55:13.833896-05:00 georgia local0 notice postgres[38321]: [7237-1] user=ace_db_client, db=ace_db, proc=38321, audit=dbm_client9, WARNING:  terminating connection because of crash of another server process
2015-02-11T17:55:13.833923-05:00 georgia local0 notice postgres[38321]: [7237-2] user=ace_db_client, db=ace_db, proc=38321, audit=dbm_client9, DETAIL:  The postmaster has commanded this server process to roll back

Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-01-30 Thread Day, David
Alan,

I tried as you suggested. I believe the gdb debugger is giving a false
indication about threads: whether I attach to a newly launched backend or to a
backend that has been executing the suspect plperlu function, the “info threads”
result is two. Suspiciously, they are both at the same location.

e.g.

* 2    Thread 802c06400 (LWP 101353) 0x00080bfa50a3 in Perl_fbm_instr ()
   from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
* 1    Thread 802c06400 (LWP 101353) 0x00080bfa50a3 in Perl_fbm_instr ()
   from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18

That seemed odd to me. If we use ‘top’ or ‘ps axuwwH’ to get a thread count for
a given process, the indication is only one thread in the same situations.

I am now pursuing a different causal hypothesis. There are instances of
another segmentation fault that do not involve this Perl function. Rather, they
involve a function that is also called regularly, even on a basically idle
system. It may therefore be happenstance which kind of fault occurs. I believe
this may relate to our update process.

Product developers frequently (daily) update environments/packages while
postgres, and possibly our application, is running. I suspect this update
process is not properly coordinated with a running postgres and may result in
occasional shared library issues. This thought is consistent with the fact that
our production testers, who update at a much lower frequency but use the same
update script, almost never see this segmentation fault.

I’ll attempt some script changes and meanwhile ask the developers to make
observations that would support this idea.

I’ll update the thread with future observations and the outcome,
possibly changing the subject to “careless developers cause segmentation fault”.


Thanks for your assistance on this matter.


Dave


From: Alex Hunsaker [mailto:bada...@gmail.com]
Sent: Thursday, January 29, 2015 6:10 PM
To: Day, David
Cc: pgsql-general@postgresql.org; Tom Lane
Subject: Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu 
related ?



On Thu, Jan 29, 2015 at 1:54 PM, Day, David d...@redcom.com wrote:
Thanks for the input. I’ll attempt to apply it and will update when I have
some new information.


BTW, a quick check would be to attach with gdb right after you connect, check
info threads (there should be none), run the plperlu procedure (with the right
arguments/calls to hit all the execution paths), and check info threads again.
If info threads now reports a thread, we are likely looking at the right
plperlu code. It should then just be a matter of commenting stuff out to deduce
what creates the thread. If not, it could be that plperlu is not at fault and
it's something else, like an extension or some other procedure/PL.


Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-01-30 Thread Alex Hunsaker
On Fri, Jan 30, 2015 at 9:54 AM, Day, David d...@redcom.com wrote:


 Alan,



 I tried as you suggested,  I believe the gdb debugger is giving some false
 indication about threads.

 Whether I attach to a newly launched  backend or a backend that has been
 executing the suspect perlu function.

 The “info threads” result is two.  Suspiciously  they are both at the same
 location.



Curious, hrm, well, assuming gdb isn't lying about threads, I think that
would point to an extension or an external library (shared_preload_libraries or
local_preload_libraries).

Does info threads on the postmaster also report threads?


Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-01-29 Thread Day, David
I am amending the info threads information: there are two threads.

I was using the wrong instance of the gdb debugger.
Program terminated with signal SIGSEGV, Segmentation fault.

(gdb) bt
#0  0x00080bfa50a3 in Perl_fbm_instr () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#1  0x00080c00ff93 in Perl_re_intuit_start () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#2  0x00080bfc27a2 in Perl_pp_match () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#3  0x00080bfbe6a3 in Perl_runops_standard () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#4  0x00080bf57bd8 in Perl_call_sv () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#5  0x00080bcfb7c7 in plperl_call_perl_func () from /usr/local/lib/postgresql/plperl.so
#6  0x00080bcf83c2 in plperl_call_handler () from /usr/local/lib/postgresql/plperl.so
#7  0x0057611f in ExecMakeTableFunctionResult ()
#8  0x0058b6c7 in ?? ()
#9  0x0057bab2 in ExecScan ()
#10 0x005756b8 in ExecProcNode ()
#11 0x005876a8 in ExecLimit ()
#12 0x00575771 in ExecProcNode ()
#13 0x00573630 in standard_ExecutorRun ()
#14 0x00593294 in ?? ()
#15 0x0059379c in SPI_execute_plan_with_paramlist ()
#16 0x0008024f19bc in ?? () from /usr/local/lib/postgresql/plpgsql.so
#17 0x0008024ee909 in ?? () from /usr/local/lib/postgresql/plpgsql.so
#18 0x0008024eaf3b in ?? () from /usr/local/lib/postgresql/plpgsql.so
#19 0x0008024ea243 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so
#20 0x0008024e6551 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so
#21 0x0057611f in ExecMakeTableFunctionResult ()
#22 0x0058b6c7 in ?? ()
#23 0x0057bab2 in ExecScan ()
#24 0x005756b8 in ExecProcNode ()
#25 0x00573630 in standard_ExecutorRun ()
#26 0x00645b0a in ?? ()
#27 0x00645719 in PortalRun ()
#28 0x006438ea in PostgresMain ()
#29 0x005ff267 in PostmasterMain ()
#30 0x005a31ba in main ()
(gdb) info thread
  Id   Target Id Frame
* 2    Thread 802c06400 (LWP 101353) 0x00080bfa50a3 in Perl_fbm_instr ()
   from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
* 1    Thread 802c06400 (LWP 101353) 0x00080bfa50a3 in Perl_fbm_instr ()
   from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18



Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-01-29 Thread Day, David
Hi Alan,

Thanks for your  input.

My initial simplistic stress test ( two connections calling the same suspect
function in a loop ) has failed to cause the problem, although I have not yet
varied the inputs for the possible parameters. Given your thoughts on the
internal mechanics, it seems unlikely that competing sessions are the cause.
I’ll see about varying and logging arguments in future testing. Reproducing is
90 % of the battle, and unfortunately we are losing on that front currently.

When I type (gdb) info threads on the most recent core file I see:
* 1 Thread 802c06400 (LWP 101353/postgres)  0x005756b8 in ExecProcNode ()
Not sure that fits with your expectations.

We only have two invoked Perl functions in the database, both of which are
plperlu. These functions are both invoked at least once in a normal usage
scenario, which makes the infrequency of the segmentation fault puzzling.


Regards


Dave




From: Alex Hunsaker [mailto:bada...@gmail.com]
Sent: Thursday, January 29, 2015 12:58 AM
To: Day, David
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu 
related ?


On Wed, Jan 28, 2015 at 1:23 PM, Day, David d...@redcom.com wrote:
It has been some time since we have seen this problem.
See the earlier message on this subject/thread for the suspect plperl function
executing at the time of the core.

Someone on our development team suggested it might relate to some build
options of Perl, in particular MULTIPLICITY or THREADS. We can have this Perl
function executing on two different connections/sessions at the same time.

Hrm, I can't see how >1 connections/sessions could tickle the bug. Or
THREADS/MULTIPLICITY, short of some Perl bug. Each backend is its own process,
so each Perl interpreter is isolated from the others at that level. IOW,
each new connection has its very own Perl interpreter that has no shared state
with any of the others (short of using $_SHARED). But hey, if your testing
finds it is easier to trigger with more connections, it just makes the bug more
interesting :).

open as you use it should just be the standard pipe(); fork(); exec(); dance.
And I'm fairly certain Perl does not do anything magic like making a thread
behind the scenes. In gdb you could also try info threads, just to see if
somehow a thread did get created.

Multiplicity should only come into play if you use plperl and plperlu in the
same session (without it, it should error out with "Cannot allocate multiple
Perl interpreters on this platform").



I believe below is a valid stack dump:

Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.
(gdb) bt
#0  0x00080bfa50a3 in Perl_fbm_instr () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#1  0x00080c00ff93 in Perl_re_intuit_start () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#2  0x00080bfc27a2 in Perl_pp_match () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18

This sure makes it look like it is segfaulting on some kind of regex /not/ open.

Any chance you could come up with a reproducible test case? I suspect the 
inputs to the function might help narrow it down to something reproducible. 
Maybe log the arguments at the start of the function? Or perhaps in your 
middleware when calling the function crashes, log how it was called?



Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-01-29 Thread Tom Lane
Day, David d...@redcom.com writes:
> I am amending the info threads info: there are two threads.

Well, that's your problem right there.  There should never, ever be more
than one thread in a Postgres backend process: none of the code in the
backend is meant for a multithreaded situation, and so there are no
interlocks on global variable access etc.

Presumably what is happening is that your plperlu function is somehow
managing to spawn an additional execution thread and let that return
control as well as the original thread.  You need to prevent that.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-01-29 Thread Alex Hunsaker
On Thu, Jan 29, 2015 at 8:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:

> Day, David d...@redcom.com writes:
> > I am amending the info threads info: there are two threads.
>
> Well, that's your problem right there.  There should never, ever be more
> than one thread in a Postgres backend process: none of the code in the
> backend is meant for a multithreaded situation, and so there are no
> interlocks on global variable access etc.


One thing you might try is setting a breakpoint on pthread_create (or
perhaps clone?) and see if that gives any clues as to what is spawning the
thread. If that doesn't help, I would try commenting out large chunks of
the plperlu function until the break point is not tripped, trying to find
what line causes it. It might also be interesting to see what happens if
you try with a non thread enabled perl-- but AFAICT nothing in
cc.get_sip_id() should cause threads to be used. A very quick grep of the
perl source seems to confirm this. Maybe something in the URI module?


Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-01-29 Thread Day, David
Thanks for the input. I'll attempt to apply it and will update when I have 
some new information.

Thanks


Dave

From: Alex Hunsaker [mailto:bada...@gmail.com]
Sent: Thursday, January 29, 2015 3:30 PM
To: Day, David
Cc: pgsql-general@postgresql.org; Tom Lane
Subject: Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu 
related ?



On Thu, Jan 29, 2015 at 8:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Day, David d...@redcom.com writes:
 I am amending the info threads info there are two threads.

Well, that's your problem right there.  There should never, ever be more
than one thread in a Postgres backend process: none of the code in the
backend is meant for a multithreaded situation, and so there are no
interlocks on global variable access etc.

One thing you might try is setting a breakpoint on pthread_create (or perhaps 
clone?) and see if that gives any clues as to what is spawning the thread. If 
that doesn't help, I would try commenting out large chunks of the plperlu 
function until the break point is not tripped, trying to find what line causes 
it. It might also be interesting to see what happens if you try with a non 
thread enabled perl-- but AFAICT nothing in cc.get_sip_id() should cause 
threads to be used. A very quick grep of the perl source seems to confirm this. 
Maybe something in the URI module?


Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-01-29 Thread Alex Hunsaker
On Thu, Jan 29, 2015 at 1:54 PM, Day, David d...@redcom.com wrote:

> Thanks for the input. I'll attempt to apply it and will update when I
> have some new information.




BTW a quick check would be to attach with gdb right after you connect,
check info threads (there should be none), run the plperlu procedure (with
the right arguments/calls to hit all the execution paths), check info
threads again. If info threads now reports a thread, we are likely looking
at the right plperlu code. It should just be a matter of commenting stuff
out to deduce what makes the thread. If not, it could be that plperlu is
not at fault and it's something else like an extension or some other
procedure/pl.
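
The before/after check described above can be scripted. What follows is only a sketch under assumptions, not something posted in the thread: it assumes a gdb recent enough to accept -p, -batch and -ex, that the caller may attach to the backend, and that thread rows in the `info threads` output start with a numeric Id followed by "Thread", "LWP" or "process". The helper names count_gdb_threads and backend_thread_count are hypothetical.

```perl
use strict;
use warnings;

# Count the thread rows in the text printed by gdb's `info threads`.
# Assumption: each row starts with an optional "*", a numeric Id, and a
# target Id beginning with "Thread", "LWP" or "process".
sub count_gdb_threads {
    my ($text) = @_;
    my $count = 0;
    for my $line (split /\n/, $text) {
        $count++ if $line =~ /^[*\s]*\d+\s+(?:Thread|LWP|process)\b/;
    }
    return $count;
}

# Attach to a running backend, print its thread list, and let gdb exit.
# The list form of the piped open avoids passing $pid through a shell.
sub backend_thread_count {
    my ($pid) = @_;
    open(my $gdb, '-|', 'gdb', '-p', $pid, '-batch', '-ex', 'info threads')
        or die "cannot run gdb: $!";
    my $out = do { local $/; <$gdb> };   # slurp all of gdb's output
    close $gdb;
    return count_gdb_threads($out // '');
}
```

Calling backend_thread_count with the backend PID once right after connecting and once after running the plperlu procedure gives the two numbers to compare; anything above one after the call points at the procedure.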


Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-01-28 Thread Day, David
It has been some time since we have seen this problem.
See the earlier message in this thread for the suspect plperl function that
was executing at the time of the core.

Someone on our development team suggested it might relate to some build
options of perl, in particular MULTIPLICITY or THREADS. We can have this perl
function executing on two different connections/sessions at the same time.
I intend to write some test scripts that will increase the likelihood of this
occurrence to see if it makes the problem more reproducible.

I'll update again after completing some testing. Meanwhile, other thoughts
and/or confirmation that these build options should be enabled are welcome.

Thanks

Dave Day
   
I believe below is a valid stack dump:

Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.
(gdb) bt
#0  0x00080bfa50a3 in Perl_fbm_instr () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#1  0x00080c00ff93 in Perl_re_intuit_start () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#2  0x00080bfc27a2 in Perl_pp_match () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#3  0x00080bfbe6a3 in Perl_runops_standard () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#4  0x00080bf57bd8 in Perl_call_sv () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#5  0x00080bcfb7c7 in plperl_call_perl_func () from /usr/local/lib/postgresql/plperl.so
#6  0x00080bcf83c2 in plperl_call_handler () from /usr/local/lib/postgresql/plperl.so
#7  0x0057611f in ExecMakeTableFunctionResult ()
#8  0x0058b6c7 in ExecFunctionScan ()
#9  0x0057bab2 in ExecScan ()
#10 0x005756b8 in ExecProcNode ()
#11 0x005876a8 in ExecLimit ()
#12 0x00575771 in ExecProcNode ()
#13 0x00573630 in standard_ExecutorRun ()
#14 0x00593294 in SPI_execute ()
#15 0x0059379c in SPI_execute_plan_with_paramlist ()
#16 0x0008024f19bc in plpgsql_subxact_cb () from /usr/local/lib/postgresql/plpgsql.so
#17 0x0008024ee909 in plpgsql_subxact_cb () from /usr/local/lib/postgresql/plpgsql.so
#18 0x0008024eaf3b in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so
#19 0x0008024ea243 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so
#20 0x0008024e6551 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so
#21 0x0057611f in ExecMakeTableFunctionResult ()
#22 0x0058b6c7 in ExecFunctionScan ()
#23 0x0057bab2 in ExecScan ()
#24 0x005756b8 in ExecProcNode ()
#25 0x00573630 in standard_ExecutorRun ()
#26 0x00645b0a in PortalRun ()
#27 0x00645719 in PortalRun ()
#28 0x006438ea in PostgresMain ()
#29 0x005ff267 in PostmasterMain ()
#30 0x005a31ba in main ()

pkg info perl5
perl5-5.18.4_11
Name   : perl5
Version: 5.18.4_11
Installed on   : Mon Jan  5 09:28:05 EST 2015
Origin : lang/perl5.18
Architecture   : freebsd:10:x86:64
Prefix : /usr/local
Categories : perl5 lang devel
Licenses   : GPLv1 or ART10
Maintainer : p...@freebsd.org
WWW: http://www.perl.org/
Comment: Practical Extraction and Report Language
Options:
DEBUG  : off
GDBM   : off
MULTIPLICITY   : off
PERL_64BITINT  : on
PERL_MALLOC: off
PTHREAD: on
SITECUSTOMIZE  : off
THREADS: off
USE_PERL   : on
Shared Libs provided:
libperl.so.5.18
Annotations:
cpe: cpe:2.3:a:perl:perl:5.18.4:freebsd10:x64:11
repo_type  : binary
repository : redcom
Flat size  : 49.2MiB
Description:
Perl is a language that combines some of the features of C, sed, awk and
shell.  See the manual page for more hype.  There are also many books
published by O'Reilly & Assoc.  See pod/perlbook.pod for more
information.





Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2015-01-28 Thread Alex Hunsaker
On Wed, Jan 28, 2015 at 1:23 PM, Day, David d...@redcom.com wrote:

> It has been some time since we have seen this problem.
> See earlier message on this subject/thread for the suspect plperl
> function executing
> at the time of the core.
>
> Someone on our development team suggested it might relate to some build
> options of perl.
> In particular MULTIPLICITY or THREADS. We can have this perl fx executing
> on
> two different connections/sessions at the same time.


Hrm, I can't see how >1 connections/sessions could tickle the bug. Or
THREADS/MULTIPLICITY, short of some perl bug. Each backend is its own
process and so each perl interpreter is isolated from the others at that
level. IOW each new connection has its very own perl interpreter that has
no shared state with any of the others (short of using $_SHARED). But hey,
if your testing finds it is easier to trigger with more connections, it
just makes the bug more interesting :).

open as you use it should just be the standard pipe(); fork(); exec(); dance.
And I'm fairly certain perl does not do anything magic like making a thread
behind the scenes. In gdb you could also try info threads, just to see if
somehow a thread did get created.

Multiplicity should only come into play if you use plperl and plperlu in
the same session (without it, it should error out with "Cannot allocate
multiple Perl interpreters on this platform").



> I believe below is a valid stack dump:
>
> Core was generated by `postgres'.
> Program terminated with signal 11, Segmentation fault.
> (gdb) bt
> #0  0x00080bfa50a3 in Perl_fbm_instr () from
> /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
> #1  0x00080c00ff93 in Perl_re_intuit_start () from
> /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
> #2  0x00080bfc27a2 in Perl_pp_match () from
> /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18


This sure makes it look like it is segfaulting on some kind of regex /not/
open.

Any chance you could come up with a reproducible test case? I suspect the
inputs to the function might help narrow it down to something reproducible.
Maybe log the arguments at the start of the function? Or perhaps in your
middleware when calling the function crashes, log how it was called?
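
As an illustration of the argument-logging idea above, here is a minimal sketch and not code from the thread. format_args is a hypothetical helper name. PL/Perl documents elog(level, msg) for writing to the server log, so inside the function the resulting string could be passed to it; the sketch itself is plain Perl so it can be tried outside the backend. It assumes array arguments arrive as plain array references; anything else is logged by its string form.

```perl
use strict;
use warnings;

# Render the arguments of a call as one log-friendly string.
# undef is shown as NULL and array references are expanded in braces.
sub format_args {
    my @out;
    for my $arg (@_) {
        if (!defined $arg) {
            push @out, 'NULL';
        } elsif (ref $arg eq 'ARRAY') {
            push @out, '{' . join(',', map { defined $_ ? $_ : 'NULL' } @$arg) . '}';
        } else {
            push @out, "'$arg'";
        }
    }
    return join ', ', @out;
}

# Inside the plperlu function this could become (assumption, untested here):
#   elog(LOG, 'get_sip_id(' . format_args($cca, $tgrp, $dhost, $usr) . ')');
print 'get_sip_id(', format_args('agent1', undef, 'host1', ['sip:u@h']), ")\n";
```

Logging at entry means the last such line before a crash records the inputs that were being processed, which is what a reproducible test case needs.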


Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2014-12-04 Thread Day, David

Tom,

Thanks very much for the feedback.

It is very likely that the date of the core was 'touched' to make the rebuilt
Postgres binary with symbols play nice with gdb.
Apparently, that was not a great idea based on your comments.

In any case we are better prepared to analyze it on the next instance.
Unfortunately the issue has been in remission since the Thanksgiving holiday.

The combination of FreeBSD and postgres had been remarkably stable and
dependable up to very recently. The bit of logic that we suspect
is related to the event was originally written in plpgsql. The logic needed
some access to system-level info for which plpgsql had no built-in support.
I suspect the 'getaddrinfo', 'getnameinfo' and 'open' related statements.
The open was the last piece added, so it bears the best correlation to the
problem onset.

In checking the thread counts for the backend processes that have executed
this logic successfully, I see only one thread per backend.

Pondering and awaiting an AHA moment. I'll keep the list apprised of any
progress on the matter.

Best Regards


Dave Day



-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Wednesday, December 03, 2014 3:57 PM
To: Day, David
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu 
related ?

Day, David d...@redcom.com writes:
> We are developing on and running Postgres 9.3.5 on FreeBSD 10.0-p12.
> We have been experiencing an intermittent postgres core dump which
> seems primarily to be associated with the two functions below.

> Given the onset of this problem, we suspect it has something to do with the
> addition of DNS lookup within our plperlu function cc.get_sip_id(...).

So this bit is new?

> open my $fh, "/sbin/route get $host |";

I wonder if your version of Perl thinks this is sufficient license to go 
multithreaded or something like that.  That could be problematic.  You might 
try looking to see if a backend process that's successfully executed this code 
now contains multiple threads.

It's difficult to offer much help on the basis of the info provided.
One comment is that the stack trace you show is completely nonsensical:
functions by those names do exist in PG, but the calling order shown is 
impossible.  So it seems there's some problem in how you rebuilt with debug 
symbols --- maybe the symbols being used don't match the executable?  I'm not a 
FreeBSD user so I have no useful speculation to offer about how such a mixup 
might occur on that platform.

regards, tom lane




Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2014-12-04 Thread Tom Lane
Day, David d...@redcom.com writes:
> It is very likely that the date of the core was 'touched' to make the rebuilt
> Postgres binary with symbols play nice with gdb.
> Apparently, that was not a great idea based on your comments.

Oh, so you rebuilt with debug enabled and then retrospectively tried to
use that executable with a core file from a previous executable?  Yeah,
I'm unsurprised that that didn't work :-( ... perhaps it would in an
ideal world, but it's unreliable in the real world.  Make sure you have
the debug-enabled build installed as the running server so the next
corefile can be examined meaningfully.

regards, tom lane




Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2014-12-03 Thread Tom Lane
Day, David d...@redcom.com writes:
> We are developing on and running Postgres 9.3.5 on FreeBSD 10.0-p12.
> We have been experiencing an intermittent postgres core dump which
> seems primarily to be associated with the two functions below.

> Given the onset of this problem, we suspect it has something to do with the
> addition of DNS lookup within our plperlu function cc.get_sip_id(...).

So this bit is new?

> open my $fh, "/sbin/route get $host |";

I wonder if your version of Perl thinks this is sufficient license to go
multithreaded or something like that.  That could be problematic.  You
might try looking to see if a backend process that's successfully executed
this code now contains multiple threads.

It's difficult to offer much help on the basis of the info provided.
One comment is that the stack trace you show is completely nonsensical:
functions by those names do exist in PG, but the calling order shown
is impossible.  So it seems there's some problem in how you rebuilt
with debug symbols --- maybe the symbols being used don't match the
executable?  I'm not a FreeBSD user so I have no useful speculation
to offer about how such a mixup might occur on that platform.

regards, tom lane
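
For readers following the discussion of the piped open, here is a standalone sketch of the same lookup. It is an illustration under assumptions, not the poster's code and not a fix for the crash: it still forks a child process. The name route_is_loopback is hypothetical, and the parsing assumes FreeBSD-style `route get` output with a line of the form "interface: lo0". The list form of open passes the host as a single argument instead of interpolating it into a shell command line.

```perl
use strict;
use warnings;

# Decide from the text printed by `route get HOST` whether the route
# leaves through the loopback interface (lo0 on FreeBSD).
sub route_is_loopback {
    my ($route_output) = @_;
    for my $line (split /\n/, $route_output) {
        # The relevant line looks like "  interface: lo0".
        if ($line =~ /^\s*interface:\s*(\S+)/) {
            return $1 eq 'lo0' ? 1 : 0;
        }
    }
    return 0;    # no interface line found
}

# Run /sbin/route without a shell and hand its output to the parser.
sub is_local {
    my ($host) = @_;
    open(my $fh, '-|', '/sbin/route', 'get', $host) or return 0;
    my $output = do { local $/; <$fh> };   # slurp everything at once
    close $fh;
    return route_is_loopback($output // '');
}
```

Keeping the parsing separate from the process spawn lets the parsing be exercised with captured output, without touching the routing table or a live backend.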




[GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?

2014-12-02 Thread Day, David
We are developing on and running Postgres 9.3.5 on FreeBSD 10.0-p12.
We have been experiencing an intermittent postgres core dump which
seems primarily to be associated with the two functions below.

The area of interest is based on the content of the postgres log file, which 
often indicates:

2014-12-01T14:37:41.559725-05:00 puertorico local0 info postgres[30154]: [3-1] LOG:  server process (PID 30187) was terminated by signal 11: Segmentation fault
2014-12-01T14:37:41.559787-05:00 puertorico local0 info postgres[30154]: [3-2] DETAIL:  Failed process was running: SELECT * FROM cc.get_port_and_registration_data($1, $2, $3, $4, $5)
2014-12-01T14:37:41.559794-05:00 puertorico local0 info postgres[30154]: [4-1] LOG:  terminating any other active server processes

The core file backtrace may also show an association with the perl libraries. 
We currently have only two perl functions, and the one shown below is the most 
relevant logic.

Given the onset of this problem, we suspect it has something to do with the 
addition of DNS lookup within our plperlu function cc.get_sip_id(...).
I would note that we have captured the details of the arguments to 
cc.get_port_and_registration_data at the time of a core and can repeat
the same query after the core event without incident. Currently we are testing 
for an absence of the core event by commenting out the DNS logic in the perl 
function, and have rebuilt postgres with debugging symbols. An example 
backtrace from such a core is below (prior to the function alteration).

I am used to debugging simpler program errors that do not have such a bad 
impact on the postgres server.
I would appreciate any comment on potential issues or bad practices in the 
suspect functions and/or additional details
that could be gathered from the core files that might assist in resolving this 
matter.


Thanks

Dave Day



CREATE OR REPLACE FUNCTION cc.get_port_and_registration_data(cca character 
varying, tgrp character varying, dhost character varying, usr character 
varying[], orig_flag boolean)
  RETURNS SETOF cc.port_type_tbl AS
$BODY$
--  The inputs to this overloaded function are sip parameters.
DECLARE pid INTEGER;
DECLARE uid INTEGER;
DECLARE mode CHARACTER VARYING;
DECLARE sql_result record;

BEGIN

  SELECT * FROM cc.get_sip_id($1, $2, $3, $4) INTO pid LIMIT 1;   -- Perl invocation

  FOR sql_result IN
    SELECT cc.get_db_refhndl($5) AS db_ref_hndl, * FROM cc.port_info t1
      LEFT JOIN (SELECT translator_id, mgcp_digit_map FROM cc.translator_sys) t2 USING (translator_id)
      LEFT JOIN cc.register_port USING (port_id)
    WHERE port_id = pid AND op_mode = 'default'
    ORDER BY expiration DESC
  LOOP
    RETURN NEXT sql_result;
  END LOOP;
  RETURN;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100
  ROWS 1000;
ALTER FUNCTION cc.get_port_and_registration_data(character varying, character 
varying, character varying, character varying[], boolean)
  OWNER TO redcom;


CREATE OR REPLACE FUNCTION cc.get_sip_id(cca character varying, tgrp character 
varying, dhost character varying, usr character varying[])
  RETURNS integer AS
$BODY$

use Socket qw(getaddrinfo getnameinfo
  PF_UNSPEC SOCK_STREAM AI_NUMERICHOST NI_NAMEREQD NIx_NOSERV);
use URI;

sub is_local {
    my $host = shift(@_);
    my $result = 0;
    open my $fh, "/sbin/route get $host |";
    while (<$fh>) {
        if (m/interface/) {
            chomp;
            my @fields = split /\s+/;
            if ($fields[2] eq "lo0") {
                $result = 1;
            }
            last;
        }
    }
    close $fh;
    return $result;
}

my ($cca, $tgrp, $dhost, $usr) = @_;

$do_dns_lookup = 1;
{
    my $query = qq{
        SELECT sip_dns_lookup_on_incoming_requests FROM admin.system_options;
    };
    my $rv = spi_exec_query($query, 1);
    if ($rv->{status} =~ /^SPI_OK/ && $rv->{processed} > 0) {
        $do_dns_lookup = $rv->{rows}[0]->{sip_dns_lookup_on_incoming_requests};
    }
}

if ($tgrp ne '') {
    my $query = qq{
        SELECT port_id FROM cc.port_info WHERE destination_group_id = '$tgrp';
    };
    my $rv = spi_exec_query($query, 1);
    if ($rv->{status} =~ /^SPI_OK/ && $rv->{processed} > 0) {
        return $rv->{rows}[0]->{port_id};
    }
}

if ($cca ne '') {
    my $query = qq{
        SELECT port_id FROM cc.port_info WHERE call_control_agent = '$cca';
    };
    my $rv = spi_exec_query($query, 1);
    if ($rv->{status} =~ /^SPI_OK/ && $rv->{processed} > 0) {
        return $rv->{rows}[0]->{port_id};
    }
}

for my $uristr (@$usr) {
    if ($uristr ne '') {
        my $uri = URI->new($uristr);
        if (is_local($uri->host)) {
            $dhost = '';
            my $name = $uri->user;
            if ($name ne '') {
                my $query = qq{
                    SELECT port_id FROM cc.port_info
                    WHERE registration_user = '$name';
                };
                my $rv = spi_exec_query($query, 1);
                if ($rv->{status} =~ /^SPI_OK/ && $rv->{processed} > 0) {