[ https://issues.apache.org/jira/browse/MYNEWT-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christopher Collins reassigned MYNEWT-745:
------------------------------------------

    Assignee: Christopher Collins

> Sim - deadlock involving system calls
> -------------------------------------
>
>                 Key: MYNEWT-745
>                 URL: https://issues.apache.org/jira/browse/MYNEWT-745
>             Project: Mynewt
>          Issue Type: Bug
>            Reporter: Christopher Collins
>            Assignee: Christopher Collins
>             Fix For: v1_1_0_rel
>
>         Attachments: main.c
>
>
> The problem appears to occur when a system call is interrupted by a sim
> context switch. Because a sim context switch is implemented as a signal
> handler that never returns (it calls longjmp()), the system call is left
> unfinished. In some cases, it seems the system call had acquired resources
> that it never got a chance to release, leading to deadlock on a subsequent
> system call. For whatever reason, when the original system call is resumed
> (i.e., when Mynewt switches back to the original task), the syscall is
> unable to recover.
> In sim, a context switch is triggered by delivery of a SIGURG signal. Two
> kinds of events generate this signal:
> # A task calls an OS function with the potential to switch tasks (e.g.,
> os_eventq_get(), os_mutex_release(), etc.).
> # An OS tick occurs.
> The problem appears to occur when an OS tick generates the SIGURG signal.
> The OS ticker is implemented via an itimer, which generates the SIGALRM
> signal on each tick. The SIGALRM handler advances the OS time, and then
> calls os_sched(), potentially generating a SIGURG signal. If the current
> task happened to be in the middle of a syscall when the tick timer expired,
> the SIGURG signal gets handled before the syscall returns.
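The failure mode described above can be sketched in a few lines of plain C. This is not Mynewt code and makes no claim about how the sim port is actually written; it only illustrates the mechanism under stated assumptions. The names fake_libc_lock, fake_libc_call(), ctxsw_handler_sketch() and lock_left_held() are hypothetical, the lock is a simple atomic flag standing in for whatever lock the interrupted call holds, and raise(SIGURG) stands in for a tick arriving in the middle of the call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a lock taken inside a library/system call. */
static atomic_flag fake_libc_lock = ATOMIC_FLAG_INIT;

/* Saved context of the "other task" that the handler switches to. */
static sigjmp_buf other_task_ctx;

/* Like the sim context switch described above, this handler never
 * returns to the interrupted code; it jumps to another context. */
static void ctxsw_handler_sketch(int sig)
{
    (void)sig;
    siglongjmp(other_task_ctx, 1);
}

/* Hypothetical call that holds a lock while it works. */
static void fake_libc_call(void)
{
    while (atomic_flag_test_and_set(&fake_libc_lock)) {
        /* spin until the lock is free */
    }
    raise(SIGURG);                      /* "tick" arrives mid-call */
    atomic_flag_clear(&fake_libc_lock); /* skipped: handler jumped away */
}

/* Returns 1 if the lock is still held after the "context switch". */
int lock_left_held(void)
{
    void (*prev)(int) = signal(SIGURG, ctxsw_handler_sketch);
    assert(prev != SIG_ERR);

    atomic_flag_clear(&fake_libc_lock); /* start from an unlocked state */

    if (sigsetjmp(other_task_ctx, 1) == 0) {
        fake_libc_call();               /* "task 1" enters the call */
        return 0;                       /* not reached in this sketch */
    }

    /* "Task 2" now runs and probes the same lock. A blocking acquire
     * here would wait forever, because task 1 cannot release the lock
     * until it is scheduled again. */
    return atomic_flag_test_and_set(&fake_libc_lock) ? 1 : 0;
}
```

Calling lock_left_held() from a small main() reports that the lock is still held once control has moved to the second context. The sketch probes the lock instead of blocking on it so that the program terminates rather than reproducing the hang.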
> Here is a stack trace showing a context switch in the middle of a system call:
> {noformat}
> (gdb) whe
> #0  0x0804a3bd in ctxsw_handler (sig=23)
>     at kernel/os/src/arch/sim/os_arch_sim.c:150
> #1  <signal handler called>
> #2  0xf7ffdbe7 in __kernel_vsyscall ()
> #3  0x08097630 in __lll_lock_wait_private ()
> #4  0x080923b0 in __tz_convert ()
> #5  0x08091673 in localtime ()
> #6  0x0809162c in ctime ()
> #7  0x08048a5a in task1_handler (arg=0x0) at apps/slinky/src/main.c:162
> #8  0x0804a2c8 in os_arch_task_start (sf=0x8160314, rc=1)
>     at kernel/os/src/arch/sim/os_arch_sim.c:88
> #9  0x0804ad90 in os_arch_frame_init ()
>     at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
> #10 0x0804ad90 in os_arch_frame_init ()
>     at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
> {noformat}
> Attached is a simple Mynewt app that can be used to replicate this issue
> (main.c).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)