Re: [R] Fwd: Help: malloc/free deadlock in unsafe signal handler 'Rf_onsigusr1'

2016-08-02 Thread Ming Li
Thanks, Luke. Cc'ing the HAWQ dev team.

I sent this email to R-devel two days before forwarding it to R-help, but no
one replied.

Is there any workaround? When are SIGUSR1 and SIGUSR2 sent in R? Or should
all non-urgent operations be moved out of the signal handler?
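
On the workaround question, one option on the caller's side is to block SIGUSR1 and SIGUSR2 while code that may hold the allocator lock is running, so the handler cannot interrupt free(). This is only a sketch under assumptions: the names run_with_usr_signals_blocked and release_buffer are hypothetical and are not part of R, PLR, or HAWQ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback standing in for teardown code that calls free(). */
void release_buffer(void *p)
{
    free(p);
}

/* Hypothetical helper: run fn(arg) with SIGUSR1 and SIGUSR2 blocked, so a
 * handler for either signal cannot run while fn is inside malloc()/free().
 * Returns 0 on success, -1 if the signal mask could not be changed. */
int run_with_usr_signals_blocked(void (*fn)(void *), void *arg)
{
    sigset_t block, saved;

    assert(fn != NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigaddset(&block, SIGUSR2);

    /* A multithreaded program would use pthread_sigmask() here instead. */
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    fn(arg);

    /* Restoring the old mask lets any signal that arrived meanwhile be
     * delivered now, outside the allocator. */
    return sigprocmask(SIG_SETMASK, &saved, NULL);
}
```

A call such as run_with_usr_signals_blocked(release_buffer, ptr) only delays delivery; the handler still runs once the mask is restored, so this avoids the interruption of free() rather than making the handler itself safe.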

Thanks.


On Tue, Aug 2, 2016 at 4:02 AM, Luke Tierney wrote:

> The handlers for SIGUSR1 and SIGUSR2 are really intended as an
> emergency break, not for ordinary programming. These could be
> rewritten to be safer but that would make them less immediate.
>
> Followups would be more appropriate on R-devel.
>
> Best,
>
> luke
>
>
> On Mon, 1 Aug 2016, Ming Li wrote:
>
>> Hi all,
>>
>> I am working on a bug that occurs when running PLR on HAWQ. The process
>> hangs and can't be terminated.
>>
>> From my investigation, it seems the signal handler 'Rf_onsigusr1' triggers
>> a malloc/free deadlock.
>>
>> The call stack is below.
>>
>> Thread 1 (Thread 0x7f4c93af48e0 (LWP 431263)):
>> #0  0x7f4c9015805e in __lll_lock_wait_private () from /lib64/libc.so.6
>> #1  0x7f4c900dd16b in _L_lock_9503 () from /lib64/libc.so.6
>> #2  0x7f4c900da6a6 in malloc () from /lib64/libc.so.6
>> #3  0x7f4c9008fb39 in _nl_make_l10nflist () from /lib64/libc.so.6
>> #4  0x7f4c9008ddf5 in _nl_find_domain () from /lib64/libc.so.6
>> #5  0x7f4c9008d6e0 in __dcigettext () from /lib64/libc.so.6
>> #6  0x7f4c6fabcfe3 in Rf_onsigusr1 () from /usr/local/lib64/R/lib/libR.so
>> #7  <signal handler called>
>> #8  0x7f4c9014079a in brk () from /lib64/libc.so.6
>> #9  0x7f4c90140845 in sbrk () from /lib64/libc.so.6
>> #10 0x7f4c900dd769 in __default_morecore () from /lib64/libc.so.6
>> #11 0x7f4c900d87a2 in _int_free () from /lib64/libc.so.6
>> #12 0x00b3ff24 in gp_free2 ()
>> #13 0x00b356fc in AllocSetDelete ()
>> #14 0x00b38391 in MemoryContextDeleteImpl ()
>> #15 0x0077c851 in ExecEndAgg ()
>> #16 0x007592ad in ExecEndNode ()
>> #17 0x0075186c in ExecEndPlan ()
>> #18 0x0079dffa in ExecEndSubqueryScan ()
>> #19 0x0075921d in ExecEndNode ()
>> #20 0x0075186c in ExecEndPlan ()
>> #21 0x00752565 in ExecutorEnd ()
>> #22 0x006dd9bd in PortalCleanup ()
>> #23 0x00b3f077 in AtCommit_Portals ()
>> #24 0x0051abe5 in CommitTransaction ()
>> #25 0x0051f1d5 in CommitTransactionCommand ()
>> #26 0x0099809e in PostgresMain ()
>> #27 0x008f1031 in BackendStartup ()
>> #28 0x008f70e0 in PostmasterMain ()
>> #29 0x007f63da in main ()
>>
>>
>> I searched and found the following advice, which may help fix it: the best
>> way to avoid this kind of deadlock is to call only asynchronous-safe
>> functions within signal handlers.
>>
>>
>> https://www.securecoding.cert.org/confluence/display/c/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers
>>
>> Thanks a lot.
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:  319-335-3386
> Department of Statistics and        Fax:    319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
> Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
>



Re: [R] Fwd: Help: malloc/free deadlock in unsafe signal handler 'Rf_onsigusr1'

2016-08-01 Thread luke-tierney

The handlers for SIGUSR1 and SIGUSR2 are really intended as an
emergency break, not for ordinary programming. These could be
rewritten to be safer but that would make them less immediate.
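
As an illustration of that trade-off, here is a minimal sketch of a deferred handler. It is not R's actual code: the names on_sigusr1_deferred, install_deferred_sigusr1, poll_deferred_sigusr1, save_and_cleanup and saves_done are all hypothetical. The handler only records that the signal arrived, and the real work runs later from a safe point in the main loop, which is safer but not immediate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t usr1_pending = 0;

/* Counts how often the deferred work ran (for illustration only). */
int saves_done = 0;

/* Placeholder for the expensive part: translated messages, saving state,
 * cleanup. It runs in normal context, so malloc() and gettext() are fine. */
void save_and_cleanup(void)
{
    saves_done++;
}

/* The handler does nothing except set a flag of type
 * volatile sig_atomic_t, which is safe inside a signal handler. */
void on_sigusr1_deferred(int signo)
{
    (void)signo;
    usr1_pending = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_deferred_sigusr1(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigusr1_deferred;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Call from a safe point in the main loop.
 * Returns 1 if a pending request was handled, 0 otherwise. */
int poll_deferred_sigusr1(void)
{
    if (!usr1_pending)
        return 0;
    usr1_pending = 0;
    save_and_cleanup();
    return 1;
}
```

The cost is latency: if the main loop never reaches poll_deferred_sigusr1(), for example because it is stuck in a long computation, the request is never serviced.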

Followups would be more appropriate on R-devel.

Best,

luke

On Mon, 1 Aug 2016, Ming Li wrote:


Hi all,

I am working on a bug that occurs when running PLR on HAWQ. The process
hangs and can't be terminated.

From my investigation, it seems the signal handler 'Rf_onsigusr1' triggers
a malloc/free deadlock.

The call stack is below.

Thread 1 (Thread 0x7f4c93af48e0 (LWP 431263)):
#0  0x7f4c9015805e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x7f4c900dd16b in _L_lock_9503 () from /lib64/libc.so.6
#2  0x7f4c900da6a6 in malloc () from /lib64/libc.so.6
#3  0x7f4c9008fb39 in _nl_make_l10nflist () from /lib64/libc.so.6
#4  0x7f4c9008ddf5 in _nl_find_domain () from /lib64/libc.so.6
#5  0x7f4c9008d6e0 in __dcigettext () from /lib64/libc.so.6
#6  0x7f4c6fabcfe3 in Rf_onsigusr1 () from /usr/local/lib64/R/lib/libR.so
#7  <signal handler called>
#8  0x7f4c9014079a in brk () from /lib64/libc.so.6
#9  0x7f4c90140845 in sbrk () from /lib64/libc.so.6
#10 0x7f4c900dd769 in __default_morecore () from /lib64/libc.so.6
#11 0x7f4c900d87a2 in _int_free () from /lib64/libc.so.6
#12 0x00b3ff24 in gp_free2 ()
#13 0x00b356fc in AllocSetDelete ()
#14 0x00b38391 in MemoryContextDeleteImpl ()
#15 0x0077c851 in ExecEndAgg ()
#16 0x007592ad in ExecEndNode ()
#17 0x0075186c in ExecEndPlan ()
#18 0x0079dffa in ExecEndSubqueryScan ()
#19 0x0075921d in ExecEndNode ()
#20 0x0075186c in ExecEndPlan ()
#21 0x00752565 in ExecutorEnd ()
#22 0x006dd9bd in PortalCleanup ()
#23 0x00b3f077 in AtCommit_Portals ()
#24 0x0051abe5 in CommitTransaction ()
#25 0x0051f1d5 in CommitTransactionCommand ()
#26 0x0099809e in PostgresMain ()
#27 0x008f1031 in BackendStartup ()
#28 0x008f70e0 in PostmasterMain ()
#29 0x007f63da in main ()


I searched and found the following advice, which may help fix it: the best
way to avoid this kind of deadlock is to call only asynchronous-safe
functions within signal handlers.

https://www.securecoding.cert.org/confluence/display/c/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers

Thanks a lot.



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu



[R] Fwd: Help: malloc/free deadlock in unsafe signal handler 'Rf_onsigusr1'

2016-08-01 Thread Ming Li
Hi all,

I am working on a bug that occurs when running PLR on HAWQ. The process
hangs and can't be terminated.

From my investigation, it seems the signal handler 'Rf_onsigusr1' triggers
a malloc/free deadlock.

The call stack is below.

Thread 1 (Thread 0x7f4c93af48e0 (LWP 431263)):
#0  0x7f4c9015805e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x7f4c900dd16b in _L_lock_9503 () from /lib64/libc.so.6
#2  0x7f4c900da6a6 in malloc () from /lib64/libc.so.6
#3  0x7f4c9008fb39 in _nl_make_l10nflist () from /lib64/libc.so.6
#4  0x7f4c9008ddf5 in _nl_find_domain () from /lib64/libc.so.6
#5  0x7f4c9008d6e0 in __dcigettext () from /lib64/libc.so.6
#6  0x7f4c6fabcfe3 in Rf_onsigusr1 () from /usr/local/lib64/R/lib/libR.so
#7  <signal handler called>
#8  0x7f4c9014079a in brk () from /lib64/libc.so.6
#9  0x7f4c90140845 in sbrk () from /lib64/libc.so.6
#10 0x7f4c900dd769 in __default_morecore () from /lib64/libc.so.6
#11 0x7f4c900d87a2 in _int_free () from /lib64/libc.so.6
#12 0x00b3ff24 in gp_free2 ()
#13 0x00b356fc in AllocSetDelete ()
#14 0x00b38391 in MemoryContextDeleteImpl ()
#15 0x0077c851 in ExecEndAgg ()
#16 0x007592ad in ExecEndNode ()
#17 0x0075186c in ExecEndPlan ()
#18 0x0079dffa in ExecEndSubqueryScan ()
#19 0x0075921d in ExecEndNode ()
#20 0x0075186c in ExecEndPlan ()
#21 0x00752565 in ExecutorEnd ()
#22 0x006dd9bd in PortalCleanup ()
#23 0x00b3f077 in AtCommit_Portals ()
#24 0x0051abe5 in CommitTransaction ()
#25 0x0051f1d5 in CommitTransactionCommand ()
#26 0x0099809e in PostgresMain ()
#27 0x008f1031 in BackendStartup ()
#28 0x008f70e0 in PostmasterMain ()
#29 0x007f63da in main ()


I searched and found the following advice, which may help fix it: the best
way to avoid this kind of deadlock is to call only asynchronous-safe
functions within signal handlers.

https://www.securecoding.cert.org/confluence/display/c/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers
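
To make the rule concrete: in the trace above, Rf_onsigusr1 reaches malloc() through __dcigettext while the interrupted code is still inside _int_free. A handler that follows the rule would avoid message translation and formatted output entirely. The following is a minimal sketch, not R's handler; the names usr1_msg, usr1_fd, on_sigusr1_safe and install_safe_sigusr1 are hypothetical. It emits a fixed message with write(), which POSIX lists as async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Fixed, untranslated message: no gettext(), no printf(), no malloc(). */
static const char usr1_msg[] = "SIGUSR1 received\n";

/* Descriptor the handler writes to; standard error by default. */
static volatile sig_atomic_t usr1_fd = STDERR_FILENO;

/* Handler restricted to async-signal-safe operations. */
void on_sigusr1_safe(int signo)
{
    int saved_errno = errno;   /* write() may change errno */
    ssize_t n;

    (void)signo;
    n = write(usr1_fd, usr1_msg, sizeof usr1_msg - 1);
    (void)n;                   /* nothing safe to do on failure */
    errno = saved_errno;
}

/* Install the handler and choose where it writes.
 * Returns 0 on success, -1 on failure. */
int install_safe_sigusr1(int fd)
{
    struct sigaction sa;

    usr1_fd = fd;
    sa.sa_handler = on_sigusr1_safe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Calling install_safe_sigusr1(STDERR_FILENO) would report the signal on standard error. Any cleanup that needs the allocator would still have to happen outside the handler.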

Thanks a lot.
