[jira] [Comment Edited] (MYNEWT-765) os_mbuf memory corruption on native platform

2017-05-25 Thread Christopher Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/MYNEWT-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025576#comment-16025576
 ] 

Christopher Collins edited comment on MYNEWT-765 at 5/26/17 12:52 AM:
--

In addition, the double free issue 
(https://github.com/apache/incubator-mynewt-core/pull/292) was causing a 
problem.  After merging that PR, everything looks good to me.

I am going to merge the above PR now.  If you test this again, be sure to grab 
the latest from master.
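This is a minimal sketch of why a double free in a fixed-block pool is so damaging. It uses a hypothetical `demo_pool`, which is just a free list plus a free-block counter. It is loosely shaped like a memory pool but is not Mynewt's `os_mempool` code and not the code changed in PR 292. Freeing the same block twice corrupts the free list, so the counter overshoots and two later allocations can return the same block. After that, two owners write over each other's data.

```c
#include <assert.h>   /* assert() is used by the usage checks */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-block pool (NOT Mynewt's implementation). */
#define DEMO_NUM_BLOCKS 4

struct demo_block {
    struct demo_block *next;      /* meaningful only while the block is free */
    unsigned char payload[28];
};

struct demo_pool {
    struct demo_block blocks[DEMO_NUM_BLOCKS];
    struct demo_block *free_list;
    int num_free;
};

static void demo_pool_init(struct demo_pool *p) {
    p->free_list = NULL;
    for (int i = DEMO_NUM_BLOCKS - 1; i >= 0; i--) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
    p->num_free = DEMO_NUM_BLOCKS;
}

static void *demo_get(struct demo_pool *p) {
    struct demo_block *b = p->free_list;
    if (b == NULL) {
        return NULL;
    }
    p->free_list = b->next;
    p->num_free--;
    return b;
}

/* Naive put: no ownership check. Putting the same block twice corrupts the
 * free list; if the block is still at the head, it ends up pointing at itself. */
static void demo_put(struct demo_pool *p, void *ptr) {
    struct demo_block *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
    p->num_free++;
}

/* Defensive put: refuses a block that is already on the free list.
 * The O(n) scan is usually only acceptable in debug builds. */
static bool demo_put_checked(struct demo_pool *p, void *ptr) {
    for (struct demo_block *it = p->free_list; it != NULL; it = it->next) {
        if (it == ptr) {
            return false;             /* double free detected */
        }
    }
    demo_put(p, ptr);
    return true;
}
```

With the naive `demo_put`, freeing one block twice leaves `num_free` at `DEMO_NUM_BLOCKS + 1`. The next two `demo_get` calls then return the same pointer. `demo_put_checked` rejects the second free instead.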


> os_mbuf memory corruption on native platform
> 
>
> Key: MYNEWT-765
> URL: https://issues.apache.org/jira/browse/MYNEWT-765
> Project: Mynewt
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
> Environment: bsncent app on native 32-bit Ubuntu 17.04
>Reporter: Michał Narajowski
>Priority: Minor
>
> h4. General description:
> There is a segmentation fault in function {{ble_hs_log_mbuf}} in file 
> {{net/nimble/host/src/ble_hs_log.c}} when receiving notifications at a high 
> rate. Tested using the *bsncent* app from 
> https://github.com/rymanluk/incubator-mynewt-core/tree/bsn and the *bsnprph* app 
> from https://github.com/apache/incubator-mynewt-core/tree/bsnbranch
> Data from an HCI command overwrites the os_mbuf struct itself instead of being 
> written to {{om->om_data}}. I tried to catch that memory violation earlier in 
> the code, but for some reason it is only triggered in the {{ble_hs_log_mbuf}} function.
> h4. How to reproduce:
> 1. Build and flash the *bsnprph* app from 
> https://github.com/apache/incubator-mynewt-core/tree/bsnbranch with the 
> following configuration:
> {quote}
> app=@apache-mynewt-core/apps/bsnprph
> bsp=@apache-mynewt-core/hw/bsp/nrf52dk
> build_profile=optimized
> {quote}
> 2. Build the *bsncent* app from 
> https://github.com/rymanluk/incubator-mynewt-core/tree/bsn with the following 
> configuration:
> {quote}
> app=@apache-mynewt-core/apps/bsncent
> bsp=@apache-mynewt-core/hw/bsp/native
> build_profile=debug
> syscfg=BLE_HS_DEBUG=1:BLE_MAX_CONNECTIONS=5:BLE_SM_BONDING=1:BLE_SM_IO_CAP=BLE_HS_IO_KEYBOARD_DISPLAY:BLE_SM_LEGACY=1:BLE_SM_MITM=1:BLE_SM_OUR_KEY_DIST=7:BLE_SM_SC=1:BLE_SOCK_LINUX_DEV=0:BLE_SOCK_USE_LINUX_BLUE=1:BLE_SOCK_USE_TCP=0:LOG_LEVEL=0:MCU_NATIVE_USE_SIGNALS=1:OS_MAIN_STACK_SIZE=512:SHELL_TASK=1
> {quote}
> 3. The issue can be reproduced using the Mynewt controller (although another 
> issue, described below, sometimes shows up) or another controller, such as PTS, 
> with some hacks in {{ble_hs_startup.c}} to start the controller.
> 4. Run the *bsncent* app on 32-bit Ubuntu.
> Here is the backtrace from GDB:
> {quote}
> Program received signal SIGSEGV, Segmentation fault.
> __memcpy_sse2_unaligned () at ../sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S:651
> 651 ../sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
> (gdb) bt
> #0  __memcpy_sse2_unaligned () at ../sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S:651
> #1  0x80009fc0 in os_mbuf_copydata (m=0x8008fb6c, off=0, len=1, dst=0x800746c7 )
>     at repos/apache-mynewt-core/kernel/os/src/os_mbuf.c:722
> #2  0x8001fb5a in ble_hs_log_mbuf (om=0x8008fb6c)
>     at repos/apache-mynewt-core/net/nimble/host/src/ble_hs_log.c:32
> #3  0x8001f18c in ble_hs_hci_evt_acl_process (om=0x8008fb6c)
>     at repos/apache-mynewt-core/net/nimble/host/src/ble_hs_hci_evt.c:631
> #4  0x80018c1f in ble_hs_process_rx_data_queue ()
>     at repos/apache-mynewt-core/net/nimble/host/src/ble_hs.c:195
> #5  0x80019020 in ble_hs_event_data (ev=0x80075aec )
>     at repos/apache-mynewt-core/net/nimble/host/src/ble_hs.c:379
> #6  0x80007009 in os_eventq_run (evq=0x80074908 )
>     at repos/apache-mynewt-core/kernel/os/src/os_eventq.c:172
> #7  0x80002308 in main (argc=0, argv=0x0)
>     at repos/apache-mynewt-core/apps/bsncent/src/main.c:457
> {quote}
> h4. Another issue
> There is also a second problem. When using *blehci* as the controller, 
> communication between the central and the peripheral freezes, most of the time 
> somewhere around GATT discovery. It happens quite randomly. 
> To reproduce it:
> 1. Build and flash the *blehci* app from 
> 
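The report describes writes that land on the mbuf header instead of in its data area, but the crash only surfaces later in {{ble_hs_log_mbuf}}. One common way to catch that kind of violation closer to its source has two parts. First, bounds-check every copy into the data area. Second, stamp each header with a canary value that debug code verifies. The sketch below shows the idea on a hypothetical {{demo_mbuf}} with an inline data buffer. The struct, the canary constant, and the helpers are illustrative assumptions, not Mynewt's {{os_mbuf}} layout or API.

```c
#include <assert.h>   /* assert() is used by the usage checks */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CANARY   0xC0FFEE11u  /* arbitrary marker value (assumption) */
#define DEMO_BUF_SIZE 32

/* Hypothetical mbuf-like block: a small header followed by its data area. */
struct demo_mbuf {
    uint32_t canary;               /* overwritten => header was clobbered */
    uint16_t om_len;               /* bytes of valid data in data[] */
    uint8_t  data[DEMO_BUF_SIZE];
};

static void demo_mbuf_init(struct demo_mbuf *m) {
    m->canary = DEMO_CANARY;
    m->om_len = 0;
}

/* Cheap header sanity check, suitable for sprinkling through debug builds. */
static bool demo_mbuf_hdr_ok(const struct demo_mbuf *m) {
    return m->canary == DEMO_CANARY && m->om_len <= DEMO_BUF_SIZE;
}

/* Appends len bytes only if they fit entirely inside data[], so the copy
 * can never spill into this header or a neighbouring block. */
static bool demo_mbuf_append(struct demo_mbuf *m, const void *src, size_t len) {
    if (!demo_mbuf_hdr_ok(m)) {
        return false;
    }
    if (len > (size_t)(DEMO_BUF_SIZE - m->om_len)) {
        return false;              /* would overflow the data area */
    }
    memcpy(m->data + m->om_len, src, len);
    m->om_len += (uint16_t)len;
    return true;
}
```

Because the canary sits at the start of each header, an overrun from the previous block in a contiguous pool would hit it first. Checking {{demo_mbuf_hdr_ok}} whenever a block is handed out or processed can therefore flag the damage well before a later {{memcpy}} faults.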

[jira] [Comment Edited] (MYNEWT-765) os_mbuf memory corruption on native platform

2017-05-25 Thread Christopher Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/MYNEWT-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025576#comment-16025576
 ] 

Christopher Collins edited comment on MYNEWT-765 at 5/26/17 12:02 AM:
--

Looks like I spoke too soon. I'm seeing another issue after letting it run for 
several minutes:
{noformat}
(gdb) p ble_hci_uart_acl_pool
$4 = {mp_block_size = 292, mp_num_blocks = 1000, mp_num_free = -1467022310, 
mp_min_free = 983, mp_membuf_addr = 1290240, mp_list = {stqe_next = 0xc0be0 
}, {slh_first = 0x13b248}, name = 0x3d0a1 
"ble_hci_uart_acl_pool"}
{noformat}

{{mp_num_free}} shouldn't be negative!  I am going to keep looking at this.  
Hopefully this and the memory corruption issue are related.

(The issues I mentioned in the above comment are still valid.)
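A negative {{mp_num_free}} can never be a legitimate count of free blocks. A cheap consistency check on the pool header can help pinpoint when it first goes bad if it runs after each get/put in a debug build. The sketch below uses a hypothetical {{demo_pool_hdr}} whose field names echo the gdb dump. It is not Mynewt's {{struct os_mempool}}, and the helper name is made up for illustration.

```c
#include <assert.h>   /* assert() is used by the usage checks */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool header; not Mynewt's struct os_mempool. */
struct demo_free_blk {
    struct demo_free_blk *next;
};

struct demo_pool_hdr {
    int num_blocks;                    /* total blocks in the pool */
    int num_free;                      /* blocks currently on the free list */
    int min_free;                      /* low-water mark of num_free */
    struct demo_free_blk *free_list;
};

/* Returns true if the header is self-consistent. The walk is capped at
 * num_blocks + 1 nodes, so a cyclic list (e.g. after a double free)
 * is reported instead of hanging the check. */
static bool demo_pool_sane(const struct demo_pool_hdr *p) {
    if (p->num_free < 0 || p->num_free > p->num_blocks) {
        return false;
    }
    if (p->min_free < 0 || p->min_free > p->num_free) {
        return false;
    }
    int count = 0;
    for (const struct demo_free_blk *b = p->free_list; b != NULL; b = b->next) {
        if (++count > p->num_blocks) {
            return false;              /* cycle or overrun */
        }
    }
    return count == p->num_free;
}
```

Plugging in the values from the dump above ({{num_blocks}} = 1000, {{num_free}} = -1467022310, {{min_free}} = 983) fails the first range check immediately.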



[jira] [Comment Edited] (MYNEWT-765) os_mbuf memory corruption on native platform

2017-05-23 Thread Christopher Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/MYNEWT-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021734#comment-16021734
 ] 

Christopher Collins edited comment on MYNEWT-765 at 5/23/17 11:26 PM:
--

{quote}
2. Build bsncent app from 
https://github.com/rymanluk/incubator-mynewt-core/tree/bsn with the following 
configuration:

app=@apache-mynewt-core/apps/bsncent
bsp=@apache-mynewt-core/hw/bsp/native
build_profile=debug

syscfg=BLE_HS_DEBUG=1:BLE_MAX_CONNECTIONS=5:BLE_SM_BONDING=1:BLE_SM_IO_CAP=BLE_HS_IO_KEYBOARD_DISPLAY:BLE_SM_LEGACY=1:BLE_SM_MITM=1:BLE_SM_OUR_KEY_DIST=7:BLE_SM_SC=1:BLE_SOCK_LINUX_DEV=0:BLE_SOCK_USE_LINUX_BLUE=1:BLE_SOCK_USE_TCP=0:LOG_LEVEL=0:MCU_NATIVE_USE_SIGNALS=1:OS_MAIN_STACK_SIZE=512:SHELL_TASK=1
{quote}

MCU_NATIVE_USE_SIGNALS should probably be set to 0 here (not 1).  From 
{{hw/mcu/native/syscfg.yml}}:
{noformat}
MCU_NATIVE_USE_SIGNALS:
    description: >
        Whether to use POSIX signals to implement context switches.  Valid
        values are as follows:
        1: More correctness; less stability.  The OS tick timer will
           cause a high-priority task to preempt a low-priority task.
           This causes stability issues because a task can be preempted
           while it is in the middle of a system call, potentially
           causing deadlock or memory corruption.

        0: Less correctness; more stability.  The OS tick timer only
           runs while the idle task is active.  Therefore, a sleeping
           high-priority task will not preempt a low-priority task due
           to a timing event (e.g., delay or callout expired).
           However, this version of sim does not suffer from the
           stability issues that affect the "signals" implementation.

        Unit tests should use 1.  Long-running sim processes should use 0.
{noformat}

Setting this to 0 causes sim to use the new behavior implemented in commit 
{{9864f55e53df0b945fa3482d9b9ea63109c09123}}.

Hopefully this is the problem.  With this setting equal to 1, I have seen 
memory corruption like this (typically when mmap() or sbrk(), called via 
malloc(), gets longjmped out of).
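The hazard described here is a signal-driven context switch that longjmps out of malloc() while it is inside sbrk() or mmap(). POSIX programs commonly avoid it by blocking the signal around the non-reentrant call. The sketch below shows that generic pattern with {{sigprocmask()}}. The helper name and the use of {{SIGALRM}} as the tick signal are assumptions, and this is not a description of how Mynewt's native port works. (Per the syscfg description above, setting {{MCU_NATIVE_USE_SIGNALS}} to 0 avoids the problem differently, by only running the OS tick timer while the idle task is active.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>   /* assert() is used by the usage checks */
#include <signal.h>
#include <stdlib.h>

/* Hypothetical helper: keeps tick_sig blocked for the duration of malloc(),
 * so a handler that switches context (e.g. via longjmp) cannot interrupt
 * the allocator halfway through sbrk()/mmap(). A tick that arrives in the
 * meantime stays pending and is delivered when the old mask is restored.
 * In a multithreaded program, pthread_sigmask() should be used instead. */
static void *malloc_with_signal_blocked(size_t size, int tick_sig) {
    sigset_t block, saved;
    sigemptyset(&block);
    sigaddset(&block, tick_sig);
    sigprocmask(SIG_BLOCK, &block, &saved);   /* defer tick delivery */
    void *p = malloc(size);
    sigprocmask(SIG_SETMASK, &saved, NULL);   /* restore the previous mask */
    return p;
}
```

The cost is added latency: a tick that arrives while the signal is blocked is handled only after the allocation finishes, which trades some timing accuracy for not corrupting the heap.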

