[jira] [Updated] (MYNEWT-745) Sim - deadlock involving system calls

2017-05-08 Thread Christopher Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYNEWT-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Collins updated MYNEWT-745:
---
Description: 
The problem appears to occur when a system call is interrupted by a sim context 
switch.  Because a sim context switch is implemented as a signal handler that 
never returns (it calls longjmp()), the system call is left unfinished.  In 
some cases, it seems the system call acquired some resources that it never got 
a chance to release, leading to deadlock on a subsequent system call. For 
whatever reason, when the original system call is resumed (i.e., when Mynewt 
switches back to the original task), the syscall is unable to recover.
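
The mechanism described above can be shown in miniature. The following is a minimal sketch, not Mynewt code: ctxsw_sketch, locked_call and lock_leaked_after_switch are hypothetical names, a plain flag stands in for the lock a libc call might take internally, and raise() is used so that the signal arrives at a deterministic point.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-ins: a saved "scheduler" context and a flag that
 * plays the role of a lock held inside an interrupted library call. */
static sigjmp_buf sched_ctx;
static volatile sig_atomic_t lock_held;

/* Like the sim context switch, this handler never returns normally. */
static void ctxsw_sketch(int sig)
{
    (void)sig;
    siglongjmp(sched_ctx, 1);
}

/* Takes the "lock", is interrupted, and never reaches the release. */
static void locked_call(void)
{
    lock_held = 1;
    raise(SIGURG);      /* the handler runs here and jumps away */
    lock_held = 0;      /* not reached */
}

/* Returns 1 if the lock is still held after the jump, -1 on setup error. */
int lock_leaked_after_switch(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = ctxsw_sketch;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGURG, &sa, NULL) != 0) {
        return -1;
    }

    lock_held = 0;
    if (sigsetjmp(sched_ctx, 1) == 0) {
        locked_call();
    }
    /* Execution arrives here through siglongjmp() with the flag set. */
    return lock_held;
}
```

Calling lock_leaked_after_switch() returns 1: the "lock" stays held because the release was skipped, which is the state a later call that needs the same lock would block on.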

In sim, a context switch is triggered by delivery of a SIGURG signal. A few 
events generate this signal:
# A task calls an OS function with the potential to switch tasks (e.g., 
os_eventq_get(), os_mutex_release(), etc.).
# An OS tick occurs.

The problem appears to occur when an OS tick generates the SIGURG signal.  The 
OS ticker is implemented via an itimer, which generates the SIGALRM signal on 
each tick.  The SIGALRM handler advances the OS time, and then calls 
os_sched(), potentially generating a SIGURG signal.  If the current task 
happened to be in the middle of a syscall when the tick timer expired, the 
SIGURG signal gets handled before the syscall returns.
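
One conceivable way to keep a tick from interrupting a library call is to block both signals around the call. This is only a sketch under the assumption that sim runs as a single-threaded process; this description does not say which fix was adopted, and ctime_signals_blocked is a hypothetical helper, not a Mynewt API.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: formats `t` into `buf` (at least 26 bytes) while
 * SIGALRM and SIGURG are blocked, so neither the tick nor the context
 * switch can interrupt the call.  Returns 0 on success, -1 on failure.
 */
int ctime_signals_blocked(time_t t, char *buf)
{
    sigset_t block;
    sigset_t saved;
    char *res;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigaddset(&block, SIGURG);

    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0) {
        return -1;
    }

    res = ctime_r(&t, buf);

    /* Restore the previous mask; pending signals are delivered now. */
    sigprocmask(SIG_SETMASK, &saved, NULL);

    return res != NULL ? 0 : -1;
}
```

A signal that arrives while blocked stays pending and is delivered once the mask is restored, so the context switch is delayed rather than lost.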

Here is a stack trace showing a context switch in the middle of a system call:

{noformat}
(gdb) whe
#0  0x0804a3bd in ctxsw_handler (sig=23)
at kernel/os/src/arch/sim/os_arch_sim.c:150
#1  <signal handler called>
#2  0xf7ffdbe7 in __kernel_vsyscall ()
#3  0x08097630 in __lll_lock_wait_private ()
#4  0x080923b0 in __tz_convert ()
#5  0x08091673 in localtime ()
#6  0x0809162c in ctime ()
#7  0x08048a5a in task1_handler (arg=0x0) at apps/slinky/src/main.c:162
#8  0x0804a2c8 in os_arch_task_start (sf=0x8160314, rc=1)
at kernel/os/src/arch/sim/os_arch_sim.c:88
#9  0x0804ad90 in os_arch_frame_init ()
at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
#10 0x0804ad90 in os_arch_frame_init ()
at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
{noformat}

Attached is a simple Mynewt app that can be used to replicate this issue 
(main.c).

  was:
The problem appears to occur when a system call is interrupted by a sim context 
switch.  Because a sim context switch is implemented as a signal handler that 
never returns (it calls longjmp()), the system call is left unfinished.  In 
some cases, it seems the system call acquired some resources that it never got 
a chance to release, leading to deadlock on a subsequent system call.

Sim has protections in place to prevent this problem from happening.  
Specifically, a context switch is triggered by delivery of a SIGURG signal, and 
SIGURG is only sent from within the SIGALRM signal handler.  These handlers 
are configured such that all signals are blocked until the handlers complete (I 
am not sure how this works for the SIGURG handler, considering it never 
returns).

My initial guess was that a pending SIGURG signal does not get delivered as 
soon as it is unblocked at the end of the SIGALRM handler.  However, a simple 
test using sigpending() and sleep proves that this is not the case.

Here is a stack trace showing a context switch in the middle of a system call:

{noformat}
(gdb) whe
#0  0x0804a3bd in ctxsw_handler (sig=23)
at kernel/os/src/arch/sim/os_arch_sim.c:150
#1  <signal handler called>
#2  0xf7ffdbe7 in __kernel_vsyscall ()
#3  0x08097630 in __lll_lock_wait_private ()
#4  0x080923b0 in __tz_convert ()
#5  0x08091673 in localtime ()
#6  0x0809162c in ctime ()
#7  0x08048a5a in task1_handler (arg=0x0) at apps/slinky/src/main.c:162
#8  0x0804a2c8 in os_arch_task_start (sf=0x8160314, rc=1)
at kernel/os/src/arch/sim/os_arch_sim.c:88
#9  0x0804ad90 in os_arch_frame_init ()
at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
#10 0x0804ad90 in os_arch_frame_init ()
at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
{noformat}

Attached is a simple Mynewt app that can be used to replicate this issue 
(main.c).


> Sim - deadlock involving system calls
> -
>
> Key: MYNEWT-745
> URL: https://issues.apache.org/jira/browse/MYNEWT-745
> Project: Mynewt
> Issue Type: Bug
> Reporter: Christopher Collins
> Fix For: v1_1_0_rel
>
> Attachments: main.c
>
>

[jira] [Updated] (MYNEWT-744) SensorAPI: improvements

2017-05-08 Thread Vipul Rahane (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYNEWT-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vipul Rahane updated MYNEWT-744:

Fix Version/s: v1_1_0_rel

> SensorAPI: improvements
> ---
>
> Key: MYNEWT-744
> URL: https://issues.apache.org/jira/browse/MYNEWT-744
> Project: Mynewt
> Issue Type: Improvement
> Reporter: Vipul Rahane
> Assignee: Vipul Rahane
> Fix For: v1_1_0_rel
>
>
> 1. Change the data structure to SLIST instead of TAILQ so that it can be 
> initialized statically. 
> 2. Remove dummy functions in the bsp and add the sensor device initialization 
> functions instead.
> 3. sensor_dev_create() should be moved out of hal_bsp.c.  
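
Item 1 can be pictured with the BSD list macros. This is a minimal sketch that assumes the host's <sys/queue.h> (Mynewt ships its own queue header); struct sensor_sketch, sensor_list and the two functions are hypothetical names, not the Sensor API.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical sensor record linked through a singly linked list. */
struct sensor_sketch {
    const char *name;
    SLIST_ENTRY(sensor_sketch) next;
};

/* The head is set up at compile time; no init call is needed. */
static SLIST_HEAD(, sensor_sketch) sensor_list =
    SLIST_HEAD_INITIALIZER(sensor_list);

/* Adds a sensor at the front of the list. */
void sensor_sketch_register(struct sensor_sketch *s)
{
    SLIST_INSERT_HEAD(&sensor_list, s, next);
}

/* Counts the registered sensors by walking the list. */
size_t sensor_sketch_count(void)
{
    size_t n = 0;
    struct sensor_sketch *s;

    SLIST_FOREACH(s, &sensor_list, next) {
        n++;
    }
    return n;
}
```

Because the head is initialized statically, sensor_sketch_register() can be called before any package init function has run.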



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MYNEWT-526) Gap params update succeeds, but times out immediately after

2017-05-08 Thread Jacob (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYNEWT-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob updated MYNEWT-526:
-

Thanks for the hard work everyone

On Mon, May 8, 2017 at 2:00 AM, Łukasz Rymanowski (JIRA) 



> Gap params update succeeds, but times out immediately after
> ---
>
> Key: MYNEWT-526
> URL: https://issues.apache.org/jira/browse/MYNEWT-526
> Project: Mynewt
> Issue Type: Bug
> Components: Nimble
> Affects Versions: v1_0_0_beta1
> Environment: macos sierra, gcc version 5.4.1 20160919
> Reporter: Jacob
> Assignee: Łukasz Rymanowski
> Attachments: polar_mynewt.pcap, polar_mynewt.pcap, 
> polar_mynewt_with_android_param.pcapng
>
>
> Modification of blecent, connecting and subscribing to my polar HRM.
> My monitor (consistently) connects and subscribes and gets notifications for 
> like 30 seconds before it does a successful gap params update, and then 
> promptly times out.
> I haven't confirmed this yet, but terminology-wise you do seem to be mixing 
> terms: timeout_multiplier and supervision_timeout are separate things in the spec.
> l2cap_params->timeout_multiplier = params->supervision_timeout; 
> Perhaps the timeout_multiplier should be multiplied by 10ms to get the 
> supervision_timeout?
> 3924:[ts=30656208ssb, mod=4 level=0] rxed att command: notify req; conn=1 
> handle=0x0011
> 3926:[ts=30671832ssb, mod=64 level=1] received notification; conn_handle=1 
> attr_handle=17 attr_len=4
> 3928:[ts=30687456ssb, mod=64 level=1] 0x16:0x00:0xee:0x02
> 3929:[ts=30695268ssb, mod=64 level=1] pkthdr_len=16; om_len=4  
> ble_hs_hci_evt_acl_process(): handle=1 pb=2 len=13 data=0x09 0x00 0x04 0x00 
> 0x1b 0x11 0x00 0x16 0x55 0xda 0x02 0xc9 0x02 
> 4052:[ts=31656208ssb, mod=4 level=0] rxed att command: notify req; conn=1 
> handle=0x0011
> 4054:[ts=31671832ssb, mod=64 level=1] received notification; conn_handle=1 
> attr_handle=17 attr_len=6
> 4056:[ts=31687456ssb, mod=64 level=1] 0x16:0x55:0xda:0x02:0xc9:0x02
> 4057:[ts=31695268ssb, mod=64 level=1] pkthdr_len=16; om_len=6  
> ble_hs_hci_evt_acl_process(): handle=1 pb=2 len=16 data=0x0c 0x00 0x05 0x00 
> 0x12 0x01 0x08 0x00 0xfa 0x00 0x90 0x01 0x01 0x00 0x58 0x02 
> 4148:[ts=32406224ssb, mod=4 level=0] L2CAP - rxed signalling msg: 0x12 0x01 
> 0x08 0x00 0xfa 0x00 0x90 0x01 0x01 0x00 0x58 0x02 
> 4151:[ts=32429660ssb, mod=4 level=1] GAP procedure initiated: connection 
> parameter update; conn_handle=1 itvl_min=250 itvl_max=400 latency=1 
> supervision_timeout=600 min_ce_len=16 max_ce_len
> 4156:[ts=32468720ssb, mod=4 level=0] ble_hs_hci_cmd_send: ogf=0x08 ocf=0x13 
> len=14
> 4158:[ts=32484344ssb, mod=4 level=0] 0x13 0x20 0x0e 0x01 0x00 0xfa 0x00 0x90 
> 0x01 0x01 0x00 0x58 0x02 0x10 0x00 0x00 0x03 
> 4159:[ts=32492156ssb, mod=4 level=0] Command Status: status=0 cmd_pkts=1 
> ocf=0x13 ogf=0x8
> 4162:[ts=32515592ssb, mod=4 level=0] host tx hci data; handle=1 length=10
> 4163:[ts=32523404ssb, mod=4 level=0] ble_hs_hci_acl_tx(): 0x01 0x00 0x0a 0x00 
> 0x06 0x00 0x05 0x00 0x13 0x01 0x02 0x00 0x00 0x00 
> 4171:[ts=32585900ssb, mod=4 level=0] Number of Completed Packets: 
> num_handles=1
> 4173:[ts=32601524ssb, mod=4 level=0] handle:1 pkts:1
> 4178:[ts=32640584ssb, mod=4 level=0] ble_hs_hci_evt_acl_process(): handle=1 
> pb=2 len=11 data=0x07 0x00 0x04 0x00 0x1b 0x11 0x00 0x16 0x56 0xc0 0x02 
> 4181:[ts=32664020ssb, mod=4 level=0] rxed att command: notify req; conn=1 
> handle=0x0011
> 4183:[ts=32679644ssb, mod=64 level=1] received notification; conn_handle=1 
> attr_handle=17 attr_len=4
> 4185:[ts=32695268ssb, mod=64 level=1] 0x16:0x56:0xc0:0x02
> 4186:[ts=32703080ssb, mod=64 level=1] pkthdr_len=16; om_len=4  LE Connection 
> Update Complete. handle=1 itvl=400 latency=1 timeout=600
> 4978:[ts=38890568ssb, mod=4 level=0] Disconnection Complete: status=0 
> handle=1 reason=8
> 4980:[ts=38906192ssb, mod=64 level=1] disconnect; reason=520 handle=1 
> our_ota_addr_type=0 our_ota_addr=0c:0c:0c:0c:0c:0c our_id_addr_type=0 
> our_id_addr=0c:0c:0c:0c:0c:0c peer_ota_addr_type=0 
> peer_ota_addr=00:22:d0:2a:e4:a3 peer_id_addr_type=0 
> peer_id_addr=00:22:d0:2a:e4:a3 conn_itvl=400 conn_latency=1 
> supervision_timeout=600 encrypted=0 authenticated=0 bonded=0
> 4988:[ts=38968688ssb, mod=4 level=1] GAP procedure initiated: discovery; 
> own_addr_type=0 filter_policy=0 passive=1 limited=0 filter_duplicates=1 
> duration=forever
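
The units in question can be checked against the log above. The sketch below decodes the eight parameter bytes of the L2CAP signalling message (0xfa 0x00 0x90 0x01 0x01 0x00 0x58 0x02) as little-endian 16-bit fields and converts them using the Bluetooth units of 1.25 ms per connection-interval step and 10 ms per timeout step. The struct and function names are hypothetical and are not NimBLE APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the four 16-bit request fields. */
struct conn_update_req_sketch {
    uint16_t itvl_min;            /* units of 1.25 ms */
    uint16_t itvl_max;            /* units of 1.25 ms */
    uint16_t latency;             /* connection events */
    uint16_t timeout_multiplier;  /* units of 10 ms */
};

/* Reads a little-endian 16-bit value. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parses the 8-byte parameter block; returns 0 on success, -1 if short. */
int parse_conn_update_req(const uint8_t *data, size_t len,
                          struct conn_update_req_sketch *out)
{
    if (len < 8) {
        return -1;
    }
    out->itvl_min = get_le16(data);
    out->itvl_max = get_le16(data + 2);
    out->latency = get_le16(data + 4);
    out->timeout_multiplier = get_le16(data + 6);
    return 0;
}

/* Supervision timeout in milliseconds: multiplier * 10 ms. */
uint32_t supervision_timeout_ms(uint16_t timeout_multiplier)
{
    return (uint32_t)timeout_multiplier * 10u;
}

/* Connection interval in microseconds: value * 1.25 ms. */
uint32_t conn_itvl_us(uint16_t itvl)
{
    return (uint32_t)itvl * 1250u;
}
```

For the bytes in the log this gives itvl_min=250, itvl_max=400, latency=1 and a timeout multiplier of 600, i.e. a 500 ms maximum interval and a 6000 ms supervision timeout, matching the values printed by the GAP procedure line.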





[jira] [Resolved] (MYNEWT-728) Raspberry Pi 3 support

2017-05-08 Thread Marko Kiiskila (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYNEWT-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Kiiskila resolved MYNEWT-728.
---
Resolution: Fixed

> Raspberry Pi 3 support
> --
>
> Key: MYNEWT-728
> URL: https://issues.apache.org/jira/browse/MYNEWT-728
> Project: Mynewt
> Issue Type: New Feature
> Components: OS
> Reporter: Sheela
> Assignee: Marko Kiiskila
> Fix For: v1_1_0_rel
>
>
> support raspberry pi as dev environment





[jira] [Resolved] (MYNEWT-526) Gap params update succeeds, but times out immediately after

2017-05-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MYNEWT-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Łukasz Rymanowski resolved MYNEWT-526.
--
Resolution: Fixed

A workaround for the broken TI has been delivered in the following PR:

https://github.com/apache/incubator-mynewt-core/pull/257



> Gap params update succeeds, but times out immediately after
> ---
>
> Key: MYNEWT-526
> URL: https://issues.apache.org/jira/browse/MYNEWT-526
> Project: Mynewt
> Issue Type: Bug
> Components: Nimble
> Affects Versions: v1_0_0_beta1
> Environment: macos sierra, gcc version 5.4.1 20160919
> Reporter: Jacob
> Assignee: Łukasz Rymanowski
> Attachments: polar_mynewt.pcap, polar_mynewt.pcap, 
> polar_mynewt_with_android_param.pcapng
>
>