Re: uart rpmsg driver compatibility
Hi,

The attachments are the rpmsg-tty patches based on Linux LTS 5.15. If you have any problems, feel free to contact me.

Regards,
Bowen Wang

Xiang Xiao wrote on Mon, March 11, 2024 at 23:43:
> On Mon, Mar 11, 2024 at 11:13 PM Andre Heinemans wrote:
>> Hi,
>>
>> Does the NuttX uart_rpmsg.c driver have a Linux counterpart to
>> interact with?
>
> Yes, the old version is here:
> https://lore.kernel.org/lkml/CAH2Cfb87Wacgsh=xz9h9kgwygbkxnbdbcdj4w3ups2likbt...@mail.gmail.com/
>
>> I want to achieve a virtual uart connection through rpmsg on an imx8mp
>> between NuttX (m7) and Linux (a53). The tty_rpmsg.c driver in mainline
>> Linux does not seem compatible, as it reads and writes the raw data
>> directly from the rpmsg buffers.
>
> The mainline version comes from an ST developer; it lacks flow control
> and can easily lose data under fast transactions.
>
>> Whereas the NuttX driver uses a struct 'uart_rpmsg_write_s' which
>> contains the raw data in one of its fields.
>
> We renewed the rpmsg-tty driver on top of Linux 5.14 recently, and it
> works perfectly with the mainline NuttX uart_rpmsg driver. Bowen can
> share the implementation tomorrow.
> > Kind regards,
> > Andre

From 32118f9e7f00bd874cec91f8f23d8647a9576e7c Mon Sep 17 00:00:00 2001
From: Bowen Wang
Date: Fri, 15 Dec 2023 19:29:34 +0800
Subject: [PATCH 2/4] rpmsg: support the zero copy transmit by
 rpmsg_get_tx_payload_buffer and rpmsg_sendxxx_nocopy

VELAPLATFO-18516

Change-Id: I0b9ae403783232e33d736eee9be888c33317d3e0
Signed-off-by: Xiang Xiao
---
 drivers/rpmsg/rpmsg_core.c       | 110 +
 drivers/rpmsg/rpmsg_internal.h   |  31 +++--
 drivers/rpmsg/virtio_rpmsg_bus.c | 203 ---
 include/linux/rpmsg.h            |  42 +++
 4 files changed, 280 insertions(+), 106 deletions(-)

diff --git a/drivers/rpmsg/rpmsg_core.c b/drivers/rpmsg/rpmsg_core.c
index 43f40d3713a9..65be0d411403 100644
--- a/drivers/rpmsg/rpmsg_core.c
+++ b/drivers/rpmsg/rpmsg_core.c
@@ -133,6 +133,116 @@ void rpmsg_destroy_ept(struct rpmsg_endpoint *ept)
 }
 EXPORT_SYMBOL(rpmsg_destroy_ept);
 
+/**
+ * rpmsg_get_tx_payload_buffer() - get the payload buffer from the pool
+ * @ept: the rpmsg endpoint
+ * @len: length of payload
+ * @wait: wait if the pool is empty
+ *
+ * Returns the buffer on success and an appropriate error value on failure.
+ */
+void *rpmsg_get_tx_payload_buffer(struct rpmsg_endpoint *ept,
+				  unsigned int *len, bool wait)
+{
+	if (WARN_ON(!ept))
+		return ERR_PTR(-EINVAL);
+	if (!ept->ops->get_tx_payload_buffer)
+		return ERR_PTR(-ENXIO);
+
+	return ept->ops->get_tx_payload_buffer(ept, len, wait);
+}
+EXPORT_SYMBOL(rpmsg_get_tx_payload_buffer);
+
+/**
+ * rpmsg_send_nocopy() - send a message across to the remote processor
+ * @ept: the rpmsg endpoint
+ * @data: payload of message
+ * @len: length of payload
+ *
+ * This function sends @data of length @len on the @ept endpoint.
+ * The message will be sent to the remote processor which the @ept
+ * endpoint belongs to, using @ept's address and its associated rpmsg
+ * device destination addresses.
+ * In case there are no TX buffers available, the function will block until
+ * one becomes available, or a timeout of 15 seconds elapses. When the latter
+ * happens, -ERESTARTSYS is returned.
+ *
+ * Can only be called from process context (for now).
+ *
+ * Returns 0 on success and an appropriate error value on failure.
+ */
+int rpmsg_send_nocopy(struct rpmsg_endpoint *ept, void *data, int len)
+{
+	if (WARN_ON(!ept))
+		return -EINVAL;
+	if (!ept->ops->send_nocopy)
+		return -ENXIO;
+
+	return ept->ops->send_nocopy(ept, data, len);
+}
+EXPORT_SYMBOL(rpmsg_send_nocopy);
+
+/**
+ * rpmsg_sendto_nocopy() - send a message across to the remote processor,
+ * specify dst
+ * @ept: the rpmsg endpoint
+ * @data: payload of message
+ * @len: length of payload
+ * @dst: destination address
+ *
+ * This function sends @data of length @len to the remote @dst address.
+ * The message will be sent to the remote processor which the @ept
+ * endpoint belongs to, using @ept's address as source.
+ * In case there are no TX buffers available, the function will block until
+ * one becomes available, or a timeout of 15 seconds elapses. When the latter
+ * happens, -ERESTARTSYS is returned.
+ *
+ * Can only be called from process context (for now).
+ *
+ * Returns 0 on success and an appropriate error value on failure.
+ */
+int rpmsg_sendto_nocopy(struct rpmsg_endpoint *ept, void *data, int len,
+			u32 dst)
+{
+	if (WARN_ON(!ept))
+		return -EINVAL;
+	if (!ept->ops->sendto_nocopy)
+		return -ENXIO;
+
+	return ept->ops->sendto_nocopy(ept, data, len, dst);
+}
+EXPORT_SYMBOL(rpmsg_sendto_nocopy);
+
+/**
+ * rpmsg_send_offchannel_nocopy() - send a message using explicit src/dst
+ * addresses
+ * @ept: the rpmsg endpoint
+ * @src: source address
+ * @dst: destination address
+ * @data: payload of message
+ * @len: length of payload
+ *
+ * This functio
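The API this patch adds follows an obtain/fill/send pattern: fetch a TX buffer from the pool with rpmsg_get_tx_payload_buffer(), build the payload directly in it, then hand it back with rpmsg_send_nocopy() so the core never has to memcpy it into a transport buffer. The sketch below models that flow with a single-slot toy pool; the toy_* names and the pool itself are illustrative stand-ins, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the driver's TX buffer pool: one slot of 512 bytes. */
#define TX_BUF_SIZE 512
static unsigned char tx_pool[TX_BUF_SIZE];
static bool tx_busy;
static size_t tx_sent_len;   /* length handed to the "transport" */

/* Mirrors rpmsg_get_tx_payload_buffer(): hand out a buffer and report
 * its capacity through *len.  Returns NULL when the pool is exhausted
 * (the real call can block when wait == true or return an ERR_PTR). */
void *toy_get_tx_payload_buffer(unsigned int *len, bool wait)
{
  (void)wait;
  if (tx_busy)
    return NULL;
  tx_busy = true;
  *len = TX_BUF_SIZE;
  return tx_pool;
}

/* Mirrors rpmsg_send_nocopy(): the caller already filled the buffer in
 * place, so no copy happens here.  Ownership just moves back to the
 * transport, which frees the slot once the remote consumes it. */
int toy_send_nocopy(void *data, int len)
{
  if (data != tx_pool || !tx_busy)
    return -1;               /* not a buffer we handed out */
  tx_sent_len = (size_t)len;
  tx_busy = false;           /* remote "consumed" it immediately */
  return 0;
}

/* The zero-copy pattern the patch enables: obtain, fill in place, send. */
int toy_send_message(const char *msg)
{
  unsigned int cap;
  size_t n = strlen(msg) + 1;
  void *buf = toy_get_tx_payload_buffer(&cap, true);

  if (buf == NULL || n > cap)
    return -1;
  memcpy(buf, msg, n);       /* the *sender* builds the payload in place */
  return toy_send_nocopy(buf, (int)n);
}
```

The one memcpy left is the sender constructing its payload; what the patch removes is the second copy that rpmsg_send() would otherwise do from the caller's buffer into the virtio ring buffer.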
Re: mm/mm_heap assertion error
meminfo() can be helpful too. It detects many heap corruption problems (but perhaps not all?). By sprinkling a few calls to kmm_meminfo() in choice locations, you should also be able to isolate the culprit: perhaps after each run of the low-priority (lpwork) worker, or after each RPMSG transaction.

On 3/11/2024 1:20 PM, Simon Filgis wrote:
> Is there a way to colorize the heap to track down the bandit? Like a CRC
> pattern in all the spaces around, with a check on every call that the
> CRC pattern is still OK?
>
> Gregory Nutt wrote on Mon., March 11, 2024, 19:27:
>> If the memory location that is corrupted is consistent, then you can
>> monitor that location to find the culprit (perhaps using debug output).
>> If your debugger supports it, then setting a watchpoint could also
>> trigger a break when the corruption occurs.
>>
>> Maybe you can also try disabling features until you find the feature
>> logic that is corrupting the heap. There is no easy way to accomplish
>> this.
>>
>> On 3/11/2024 11:27 AM, Nathan Hartman wrote:
>>> What's needed is some way to binary-search where the culprit is.
>>>
>>> If I understand correctly, it looks like the crash is happening in the
>>> later stages of board bring-up? What is running before that? Can parts
>>> be disabled or skipped to see if the problem goes away?
>>>
>>> Another idea is to try running a static analysis tool on the sources
>>> and see if it finds anything suspicious to be looked into more
>>> carefully.
>>>
>>> On Mon, Mar 11, 2024 at 10:00 AM Gregory Nutt wrote:
>>>> The reason that the error is confusing is that the error probably did
>>>> not occur at the time of the assertion; it probably occurred much
>>>> earlier.
>>>>
>>>> In most crashes due to heap corruption there are two players: the
>>>> culprit and the victim threads. The culprit thread actually causes
>>>> the corruption, but at the time of the corruption no error occurs.
>>>> The error will not occur until later.
>>>>
>>>> So sometime later, the victim thread runs, encounters the clobbered
>>>> heap, and crashes. In this case, "AppBringup" and "rptun" are
>>>> potential victim threads. The fact that they crash tells you very
>>>> little about the culprit.
>>>>
>>>> On 3/10/2024 6:51 PM, yfliu2008 wrote:
>>>>> Gregory, thank you for the analysis.
>>>>>
>>>>> The crashes happened during system boot, mostly in the "AppBringup"
>>>>> or "rptun" threads, as per the assertion logs. The only other
>>>>> threads are "idle" and "lpwork", as per the sched logs; there
>>>>> should be no others since NSH creation is still ongoing. As for
>>>>> interrupts, the UART and IPI run in kernel space and the MTIMER
>>>>> runs in NuttSBI space. The NSH is loaded from an RPMSGFS volume, so
>>>>> there is a lot of RPMSG communication.
>>>>>
>>>>> Is KASAN suitable for use in kernel mode?
>>>>>
>>>>> With MM_KASAN_ALL it reports a read access error:
>>>>>
>>>>> BCkasan_report: kasan detected a read access error, address at
>>>>> 0x708fe90, size is 8, return address: 0x701aeac
>>>>> _assert: Assertion failed panic: at file: kasan/kasan.c:117 task:
>>>>> Idle_Task process: Kernel 0x70023c0
>>>>>
>>>>> The call stack looks like:
>>>>>
>>>>> #0  _assert (filename=0x7060f78 "kasan/kasan.c", linenum=117,
>>>>>     msg=0x7060ff0 "panic", regs=0x7082720) at misc/assert.c:536
>>>>> #1  0x07010248 in __assert (filename=0x7060f78 "kasan/kasan.c",
>>>>>     linenum=117, msg=0x7060ff0 "panic") at assert/lib_assert.c:36
>>>>> #2  0x070141d6 in kasan_report (addr=0x708fe90, size=8,
>>>>>     is_write=false, return_address=0x701aeac) at kasan/kasan.c:117
>>>>> #3  0x07014412 in kasan_check_report (addr=0x708fe90, size=8,
>>>>>     is_write=false, return_address=0x701aeac) at kasan/kasan.c:190
>>>>> #4  0x0701468c in __asan_load8_noabort (addr=0x708fe90) at
>>>>>     kasan/kasan.c:315
>>>>> #5  0x0701aeac in riscv_swint (irq=0, context=0x708fe40, arg=0x0)
>>>>>     at common/riscv_swint.c:133
>>>>> #6  0x0701b8fe in riscv_perform_syscall (regs=0x708fe40) at
>>>>>     common/supervisor/riscv_perform_syscall.c:45
>>>>> #7  0x07000570 in sys_call6 ()
>>>>>
>>>>> With MM_KASAN_DISABLE_READ_CHECKS=y, it reports:
>>>>>
>>>>> _assert: Assertion failed : at file: mm_heap/mm_malloc.c:245 task:
>>>>> rptun process: Kernel 0x704a030
>>>>>
>>>>> The call stack is:
>>>>>
>>>>> #0  _assert (filename=0x7056060 "mm_heap/mm_malloc.c", linenum=245,
>>>>>     msg=0x0, regs=0x7082720) at misc/assert.c:536
>>>>> #1  0x0700df18 in __assert (filename=0x7056060
>>>>>     "mm_heap/mm_malloc.c", linenum=245, msg=0x0) at
>>>>>     assert/lib_assert.c:36
>>>>> #2  0x07013082 in mm_malloc (heap=0x7089c00, size=128) at
>>>>>     mm_heap/mm_malloc.c:245
>>>>> #3  0x07011694 in kmm_malloc (size=128) at kmm_heap/kmm_malloc.c:51
>>>>> #4  0x0704efd4 in metal_allocate_memory (size=128) at
>>>>>     .../nuttx/include/metal/system/nuttx/alloc.h:27
>>>>> #5  0x0704fd8a in rproc_virtio_create_vdev (role=1, notifyid=0,
>>>>>     rsc=0x80200050, rsc_io=0x7080408, priv=0x708ecd8,
>>>>>     notify=0x704e6d2, rst_cb=0x0) at
>>>>>     open-amp/lib/remoteproc/remoteproc_virtio.c:356
>>>>> #6  0x0704e956 in remoteproc_create_virtio (rproc=0x708ecd8,
>>>>>     vdev_id=0, role=1, rst_cb=0x0) at
>>>>>     open-amp/lib/remoteproc/remoteproc.c:957
>>>>> #7  0x0704b1ee in rptun_dev_start (rproc=0x708ecd8) at
>>>>>     rptun/rptun.c:757
>>>>> #8  0x07049ff8 in rptun_start_worker (arg=0x708eac0) at
>>>>>     rptun/rptun.c:233
>>>>> #9  0x0704a0ac in rptun_thread (argc=3, argv=0x7092010) at
>>>>>     rptun/rptun.c:253
>>>>> #10 0x0700437e in nxtask_start () at task/task_start.c:107
>>>>>
>>>>> This looks like the heap is already corrupted at this point. I also
>>>>> noticed there is a mm_checkcorruption() function, but I am not sure
>>>>> how to use it
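On the mm_checkcorruption() question at the end: that function walks the heap's node chain and verifies the metadata invariants, so calling it periodically during bring-up (for example between boot steps, or from a timer) can bracket the moment the heap goes bad. The sketch below is a standalone toy version of such a walk; struct walk_node and its two fields only loosely model the real mm_heap metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy heap region carved into nodes.  Each node starts with its own
 * size and a copy of the previous node's size, loosely modeling the
 * per-allocation metadata that NuttX's mm_heap maintains. */
struct walk_node
{
  uint32_t size;       /* size of this node, header included */
  uint32_t preceding;  /* size of the previous node */
};

#define WALK_HEAP_BYTES 256
static unsigned char walk_heap[WALK_HEAP_BYTES];

/* Carve the region into equal nodes so the walk has a chain to follow. */
void walk_heap_init(uint32_t node_size)
{
  uint32_t prev = 0;

  for (size_t off = 0; off + node_size <= WALK_HEAP_BYTES; off += node_size)
    {
      struct walk_node *n = (struct walk_node *)(walk_heap + off);
      n->size = node_size;
      n->preceding = prev;
      prev = node_size;
    }
}

/* Walk every node and verify the size/preceding chain, in the same
 * spirit as mm_checkcorruption().  Returns 0 while the chain is
 * consistent and -1 at the first inconsistent node. */
int walk_checkcorruption(void)
{
  uint32_t prev = 0;
  size_t off = 0;

  while (off + sizeof(struct walk_node) <= WALK_HEAP_BYTES)
    {
      struct walk_node *n = (struct walk_node *)(walk_heap + off);
      if (n->size == 0 || off + n->size > WALK_HEAP_BYTES ||
          n->preceding != prev)
        return -1;           /* wild write detected */
      prev = n->size;
      off += n->size;
    }

  return 0;
}
```

A wild write across a node boundary flips the result from 0 to -1, which is exactly the bracketing signal you want when hunting the culprit thread.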
Re: mm/mm_heap assertion error
Is there a way to colorize the heap to track down the bandit? Like a CRC pattern in all the spaces around, with a check on every call that the CRC pattern is still OK?

Gregory Nutt wrote on Mon., March 11, 2024, 19:27:
> If the memory location that is corrupted is consistent, then you can
> monitor that location to find the culprit (perhaps using debug output).
> If your debugger supports it, then setting a watchpoint could also
> trigger a break when the corruption occurs.
>
> Maybe you can also try disabling features until you find the feature
> logic that is corrupting the heap. There is no easy way to accomplish
> this.
>
> On 3/11/2024 11:27 AM, Nathan Hartman wrote:
> > What's needed is some way to binary-search where the culprit is.
> >
> > If I understand correctly, it looks like the crash is happening in
> > the later stages of board bring-up? What is running before that? Can
> > parts be disabled or skipped to see if the problem goes away?
> >
> > Another idea is to try running a static analysis tool on the sources
> > and see if it finds anything suspicious to be looked into more
> > carefully.
> >
> > On Mon, Mar 11, 2024 at 10:00 AM Gregory Nutt wrote:
> >> The reason that the error is confusing is that the error probably
> >> did not occur at the time of the assertion; it probably occurred
> >> much earlier.
> >>
> >> In most crashes due to heap corruption there are two players: the
> >> culprit and the victim threads. The culprit thread actually causes
> >> the corruption, but at the time of the corruption no error occurs.
> >> The error will not occur until later.
> >>
> >> So sometime later, the victim thread runs, encounters the clobbered
> >> heap, and crashes. In this case, "AppBringup" and "rptun" are
> >> potential victim threads. The fact that they crash tells you very
> >> little about the culprit.
> >>
> >> On 3/10/2024 6:51 PM, yfliu2008 wrote:
> >>> Gregory, thank you for the analysis.
> >>>
> >>> The crashes happened during system boot, mostly in the "AppBringup"
> >>> or "rptun" threads, as per the assertion logs. The only other
> >>> threads are "idle" and "lpwork", as per the sched logs; there
> >>> should be no others since NSH creation is still ongoing. As for
> >>> interrupts, the UART and IPI run in kernel space and the MTIMER
> >>> runs in NuttSBI space. The NSH is loaded from an RPMSGFS volume, so
> >>> there is a lot of RPMSG communication.
> >>>
> >>> Is KASAN suitable for use in kernel mode?
> >>>
> >>> With MM_KASAN_ALL it reports a read access error:
> >>>
> >>> BCkasan_report: kasan detected a read access error, address at
> >>> 0x708fe90, size is 8, return address: 0x701aeac
> >>> _assert: Assertion failed panic: at file: kasan/kasan.c:117 task:
> >>> Idle_Task process: Kernel 0x70023c0
> >>>
> >>> The call stack looks like:
> >>>
> >>> #0  _assert (filename=0x7060f78 "kasan/kasan.c", linenum=117,
> >>>     msg=0x7060ff0 "panic", regs=0x7082720) at misc/assert.c:536
> >>> #1  0x07010248 in __assert (filename=0x7060f78 "kasan/kasan.c",
> >>>     linenum=117, msg=0x7060ff0 "panic") at assert/lib_assert.c:36
> >>> #2  0x070141d6 in kasan_report (addr=0x708fe90, size=8,
> >>>     is_write=false, return_address=0x701aeac) at kasan/kasan.c:117
> >>> #3  0x07014412 in kasan_check_report (addr=0x708fe90, size=8,
> >>>     is_write=false, return_address=0x701aeac) at kasan/kasan.c:190
> >>> #4  0x0701468c in __asan_load8_noabort (addr=0x708fe90) at
> >>>     kasan/kasan.c:315
> >>> #5  0x0701aeac in riscv_swint (irq=0, context=0x708fe40, arg=0x0)
> >>>     at common/riscv_swint.c:133
> >>> #6  0x0701b8fe in riscv_perform_syscall (regs=0x708fe40) at
> >>>     common/supervisor/riscv_perform_syscall.c:45
> >>> #7  0x07000570 in sys_call6 ()
> >>>
> >>> With MM_KASAN_DISABLE_READ_CHECKS=y, it reports:
> >>>
> >>> _assert: Assertion failed : at file: mm_heap/mm_malloc.c:245 task:
> >>> rptun process: Kernel 0x704a030
> >>>
> >>> The call stack is:
> >>>
> >>> #0  _assert (filename=0x7056060 "mm_heap/mm_malloc.c", linenum=245,
> >>>     msg=0x0, regs=0x7082720) at misc/assert.c:536
> >>> #1  0x0700df18 in __assert (filename=0x7056060
> >>>     "mm_heap/mm_malloc.c", linenum=245, msg=0x0) at
> >>>     assert/lib_assert.c:36
> >>> #2  0x07013082 in mm_malloc (heap=0x7089c00, size=128) at
> >>>     mm_heap/mm_malloc.c:245
> >>> #3  0x07011694 in kmm_malloc (size=128) at kmm_heap/kmm_malloc.c:51
> >>> #4  0x0704efd4 in metal_allocate_memory (size=128) at
> >>>     .../nuttx/include/metal/system/nuttx/alloc.h:27
> >>> #5  0x0704fd8a in rproc_virtio_create_vdev (role=1, notifyid=0,
> >>>     rsc=0x80200050, rsc_io=0x7080408, priv=0x708ecd8,
> >>>     notify=0x704e6d2, rst_cb=0x0) at
> >>>     open-amp/lib/remoteproc/remoteproc_virtio.c:356
> >>> #6  0x0704e956 in remoteproc_create_virtio (rproc=0x708ecd8,
> >>>     vdev_id=0, role=1, rst_cb=0x0) at
> >>>     open-amp/lib/remoteproc/remoteproc.c:9
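Simon's "colorize the heap" idea can be prototyped outside the kernel as guard words: surround every allocation with a known pattern and re-check the pattern on each call. A minimal standalone sketch follows (not NuttX code; the CANARY value and the guard width are arbitrary choices for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CANARY 0xDEADBEEFu
#define NGUARD 4                /* guard words on each side */

/* Allocate size bytes with guard words on both ends, "colored" with a
 * known pattern.  A wild write past either end clobbers the pattern. */
void *guard_malloc(size_t size)
{
  size_t words = (size + sizeof(uint32_t) - 1) / sizeof(uint32_t);
  uint32_t *raw = malloc((words + 2 * NGUARD + 1) * sizeof(uint32_t));

  if (raw == NULL)
    return NULL;

  raw[0] = (uint32_t)words;     /* remember payload size in words */
  for (int i = 0; i < NGUARD; i++)
    {
      raw[1 + i] = CANARY;                  /* front guard */
      raw[1 + NGUARD + words + i] = CANARY; /* back guard  */
    }

  return &raw[1 + NGUARD];      /* hand the caller the payload area */
}

/* Returns 1 while both guard zones are intact, 0 once one is clobbered.
 * Sprinkle calls to this at choice locations, just like the
 * kmm_meminfo() suggestion in the thread. */
int guard_check(void *p)
{
  uint32_t *user = p;
  uint32_t *raw = user - NGUARD - 1;
  uint32_t words = raw[0];

  for (int i = 0; i < NGUARD; i++)
    if (raw[1 + i] != CANARY || raw[1 + NGUARD + words + i] != CANARY)
      return 0;

  return 1;
}
```

A CRC over the guard zone (as Simon suggested) would catch an attacker-like write that happens to store the canary value itself; for accidental wild writes a fixed pattern is usually enough and much cheaper to verify.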
Re: mm/mm_heap assertion error
If the memory location that is corrupted is consistent, then you can monitor that location to find the culprit (perhaps using debug output). If your debugger supports it then setting a watchpoint could also trigger a break when the corruption occurs. Maybe you can also try disabling features until you find the feature logic that is corrupting the heap. There is no easy way to accomplish this. On 3/11/2024 11:27 AM, Nathan Hartman wrote: What's needed is some way to binary search where the culprit is. If I understand correctly, it looks like the crash is happening in the later stages of board bring-up? What is running before that? Can parts be disabled or skipped to see if the problem goes away? Another idea is to try running a static analysis tool on the sources and see if it finds anything suspicious to be looked into more carefully. On Mon, Mar 11, 2024 at 10:00 AM Gregory Nutt wrote: The reason that the error is confusing is because the error probably did not occur at the time of the assertion; it probably occurred much earlier. In most crashes due to heap corruption there are two players: the culprit and the victim threads. The culprit thread actually cause the corruption. But at the time of the corruption, no error occurs. The error will not occur until later. So sometime later, the victim thread runs, encounters the clobbered heap and crashes. In this case, "AppBringup" and "rptun" are potential victim threads. The fact that they crash tell you very little about the culprit. On 3/10/2024 6:51 PM, yfliu2008 wrote: Gregory, thank you for the analysis. The crashes happened during system booting up, mostly at "AppBringup" or "rptun" threads, as per the assertion logs. The other threads existing are the "idle" and the "lpwork" threads as per the sched logs. There should be no other threads as NSH creation is still ongoing. As for interruptions, the UART and IPI are running in kernel space and MTIMER are in NuttSBI space. 
The NSH is loaded from a RPMSGFS volume, thus there are a lot RPMSG communications. Is the KASAN proper for use in Kernel mode? With MM_KASAN_ALL it reports a read access error: BCkasan_report: kasan detected a read access error, address at 0x708fe90,size is 8, return address: 0x701aeac _assert: Assertion failed panic: at file: kasan/kasan.c:117 task: Idle_Task process: Kernel 0x70023c0 The call stack looks like: #0 _assert (filename=0x7060f78 "kasan/kasan.c", linenum=117, msg=0x7060ff0 "panic", regs=0x7082720
Re: mm/mm_heap assertion error
What's needed is some way to binary-search where the culprit is.

If I understand correctly, it looks like the crash is happening in the later stages of board bring-up? What is running before that? Can parts be disabled or skipped to see if the problem goes away?

Another idea is to try running a static analysis tool on the sources and see if it finds anything suspicious to be looked into more carefully.

On Mon, Mar 11, 2024 at 10:00 AM Gregory Nutt wrote:
> The reason that the error is confusing is that the error probably did
> not occur at the time of the assertion; it probably occurred much
> earlier.
>
> In most crashes due to heap corruption there are two players: the
> culprit and the victim threads. The culprit thread actually causes the
> corruption, but at the time of the corruption no error occurs. The
> error will not occur until later.
>
> So sometime later, the victim thread runs, encounters the clobbered
> heap, and crashes. In this case, "AppBringup" and "rptun" are potential
> victim threads. The fact that they crash tells you very little about
> the culprit.
>
> On 3/10/2024 6:51 PM, yfliu2008 wrote:
> > Gregory, thank you for the analysis.
> >
> > The crashes happened during system boot, mostly in the "AppBringup"
> > or "rptun" threads, as per the assertion logs. The only other threads
> > are "idle" and "lpwork", as per the sched logs; there should be no
> > others since NSH creation is still ongoing. As for interrupts, the
> > UART and IPI run in kernel space and the MTIMER runs in NuttSBI
> > space. The NSH is loaded from an RPMSGFS volume, so there is a lot of
> > RPMSG communication.
> >
> > Is KASAN suitable for use in kernel mode?
> >
> > With MM_KASAN_ALL it reports a read access error:
> >
> > BCkasan_report: kasan detected a read access error, address at
> > 0x708fe90, size is 8, return address: 0x701aeac
> > _assert: Assertion failed panic: at file: kasan/kasan.c:117 task:
> > Idle_Task process: Kernel 0x70023c0
> >
> > The call stack looks like:
> >
> > #0  _assert (filename=0x7060f78 "kasan/kasan.c", linenum=117,
> >     msg=0x7060ff0 "panic", regs=0x7082720) at misc/assert.c:536
> > #1  0x07010248 in __assert (filename=0x7060f78 "kasan/kasan.c",
> >     linenum=117, msg=0x7060ff0 "panic") at assert/lib_assert.c:36
> > #2  0x070141d6 in kasan_report (addr=0x708fe90, size=8,
> >     is_write=false, return_address=0x701aeac) at kasan/kasan.c:117
> > #3  0x07014412 in kasan_check_report (addr=0x708fe90, size=8,
> >     is_write=false, return_address=0x701aeac) at kasan/kasan.c:190
> > #4  0x0701468c in __asan_load8_noabort (addr=0x708fe90) at
> >     kasan/kasan.c:315
> > #5  0x0701aeac in riscv_swint (irq=0, context=0x708fe40, arg=0x0) at
> >     common/riscv_swint.c:133
> > #6  0x0701b8fe in riscv_perform_syscall (regs=0x708fe40) at
> >     common/supervisor/riscv_perform_syscall.c:45
> > #7  0x07000570 in sys_call6 ()
> >
> > With MM_KASAN_DISABLE_READ_CHECKS=y, it reports:
> >
> > _assert: Assertion failed : at file: mm_heap/mm_malloc.c:245 task:
> > rptun process: Kernel 0x704a030
> >
> > The call stack is:
> >
> > #0  _assert (filename=0x7056060 "mm_heap/mm_malloc.c", linenum=245,
> >     msg=0x0, regs=0x7082720) at misc/assert.c:536
> > #1  0x0700df18 in __assert (filename=0x7056060 "mm_heap/mm_malloc.c",
> >     linenum=245, msg=0x0) at assert/lib_assert.c:36
> > #2  0x07013082 in mm_malloc (heap=0x7089c00, size=128) at
> >     mm_heap/mm_malloc.c:245
> > #3  0x07011694 in kmm_malloc (size=128) at kmm_heap/kmm_malloc.c:51
> > #4  0x0704efd4 in metal_allocate_memory (size=128) at
> >     .../nuttx/include/metal/system/nuttx/alloc.h:27
> > #5  0x0704fd8a in rproc_virtio_create_vdev (role=1, notifyid=0,
> >     rsc=0x80200050, rsc_io=0x7080408, priv=0x708ecd8,
> >     notify=0x704e6d2, rst_cb=0x0) at
> >     open-amp/lib/remoteproc/remoteproc_virtio.c:356
> > #6  0x0704e956 in remoteproc_create_virtio (rproc=0x708ecd8,
> >     vdev_id=0, role=1, rst_cb=0x0) at
> >     open-amp/lib/remoteproc/remoteproc.c:957
> > #7  0x0704b1ee in rptun_dev_start (rproc=0x708ecd8) at
> >     rptun/rptun.c:757
> > #8  0x07049ff8 in rptun_start_worker (arg=0x708eac0) at
> >     rptun/rptun.c:233
> > #9  0x0704a0ac in rptun_thread (argc=3, argv=0x7092010) at
> >     rptun/rptun.c:253
> > #10 0x0700437e in nxtask_start () at task/task_start.c:107
> >
> > This looks like the heap is already corrupted at this point.
> >
> > I also noticed there is a mm_checkcorruption() function, but I am not
> > sure how to use it yet.
> >
> > Regards,
> > yf
> >
> > Original
> > From: "Gregory Nutt" < spudan...@gmail.com >
> > Date: 2024/3/11 1:43
> > To: "dev" < dev@nuttx.apache.org >
> > Subject: Re: mm/mm_heap assertion error
> >
> > On 3/10/2024 4:38 AM, yfliu2008 wrote:
> > > Dear experts,
Re: uart rpmsg driver compatibility
On Mon, Mar 11, 2024 at 11:13 PM Andre Heinemans wrote:
> Hi,
>
> Does the NuttX uart_rpmsg.c driver have a Linux counterpart to
> interact with?

Yes, the old version is here:
https://lore.kernel.org/lkml/CAH2Cfb87Wacgsh=xz9h9kgwygbkxnbdbcdj4w3ups2likbt...@mail.gmail.com/

> I want to achieve a virtual uart connection through rpmsg on an imx8mp
> between NuttX (m7) and Linux (a53). The tty_rpmsg.c driver in mainline
> Linux does not seem compatible, as it reads and writes the raw data
> directly from the rpmsg buffers.

The mainline version comes from an ST developer; it lacks flow control and can easily lose data under fast transactions.

> Whereas the NuttX driver uses a struct 'uart_rpmsg_write_s' which
> contains the raw data in one of its fields.

We renewed the rpmsg-tty driver on top of Linux 5.14 recently, and it works perfectly with the mainline NuttX uart_rpmsg driver. Bowen can share the implementation tomorrow.

> Kind regards,
> Andre
uart rpmsg driver compatibility
Hi,

Does the NuttX uart_rpmsg.c driver have a Linux counterpart to interact with?

I want to achieve a virtual uart connection through rpmsg on an imx8mp between NuttX (m7) and Linux (a53). The tty_rpmsg.c driver in mainline Linux does not seem compatible, as it reads and writes the raw data directly from the rpmsg buffers. Whereas the NuttX driver uses a struct 'uart_rpmsg_write_s' which contains the raw data in one of its fields.

Kind regards,
Andre
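The incompatibility described in this thread is a framing mismatch: mainline rpmsg-tty treats the entire rpmsg buffer as raw characters, while the NuttX driver wraps each transfer in a header struct whose fields carry the payload. A toy illustration of such header framing follows; struct toy_uart_frame and its fields are hypothetical stand-ins, not the actual uart_rpmsg_write_s layout:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative frame header; the real uart_rpmsg_write_s in NuttX
 * carries additional bookkeeping beyond what is sketched here. */
struct toy_uart_frame
{
  uint32_t command;   /* e.g. "data write" */
  uint32_t count;     /* number of payload bytes that follow */
  char data[];        /* raw characters, as a flexible array member */
};

#define TOY_CMD_WRITE 1

/* Pack a header-framed message into an rpmsg-style buffer.  Returns
 * the total frame length, or -1 if it does not fit. */
int toy_frame_write(void *buf, size_t cap, const char *s, uint32_t n)
{
  struct toy_uart_frame *f = buf;

  if (sizeof(*f) + n > cap)
    return -1;
  f->command = TOY_CMD_WRITE;
  f->count = n;
  memcpy(f->data, s, n);
  return (int)(sizeof(*f) + n);
}

/* A peer that expects the same framing recovers the payload; a peer
 * that treats the buffer as raw bytes (like mainline rpmsg-tty) would
 * hand the 8 header bytes to the terminal as garbage, which is why the
 * two drivers cannot interoperate. */
int toy_frame_read(const void *buf, char *out, size_t outcap)
{
  const struct toy_uart_frame *f = buf;

  if (f->command != TOY_CMD_WRITE || f->count > outcap)
    return -1;
  memcpy(out, f->data, f->count);
  return (int)f->count;
}
```

Because every frame declares a command and its own length, the protocol has room for control traffic alongside data, which is the kind of structure a raw-byte stream cannot express.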
Re: mm/mm_heap assertion error
The reason that the error is confusing is that the error probably did not occur at the time of the assertion; it probably occurred much earlier.

In most crashes due to heap corruption there are two players: the culprit and the victim threads. The culprit thread actually causes the corruption, but at the time of the corruption no error occurs. The error will not occur until later.

So sometime later, the victim thread runs, encounters the clobbered heap, and crashes. In this case, "AppBringup" and "rptun" are potential victim threads. The fact that they crash tells you very little about the culprit.

On 3/10/2024 6:51 PM, yfliu2008 wrote:
> Gregory, thank you for the analysis.
>
> The crashes happened during system boot, mostly in the "AppBringup" or
> "rptun" threads, as per the assertion logs. The only other threads are
> "idle" and "lpwork", as per the sched logs; there should be no others
> since NSH creation is still ongoing. As for interrupts, the UART and
> IPI run in kernel space and the MTIMER runs in NuttSBI space. The NSH
> is loaded from an RPMSGFS volume, so there is a lot of RPMSG
> communication.
>
> Is KASAN suitable for use in kernel mode?
>
> With MM_KASAN_ALL it reports a read access error:
>
> BCkasan_report: kasan detected a read access error, address at
> 0x708fe90, size is 8, return address: 0x701aeac
> _assert: Assertion failed panic: at file: kasan/kasan.c:117 task:
> Idle_Task process: Kernel 0x70023c0
>
> The call stack looks like:
>
> #0  _assert (filename=0x7060f78 "kasan/kasan.c", linenum=117,
>     msg=0x7060ff0 "panic", regs=0x7082720
>
> Original
> From: "Gregory Nutt" < spudan...@gmail.com >
> Date: 2024/3/11 1:43
> To: "dev" < dev@nuttx.apache.org >
> Subject: Re: mm/mm_heap assertion error
>
> On 3/10/2024 4:38 AM, yfliu2008 wrote:
> > Dear experts,
> >
> > When doing a regression check on K230 with a previously working
> > kernel-mode configuration, I got an assertion error like below:
> >
> > #0  _assert (filename=0x704c598 "mm_heap/mm_malloc.c", linenum=245,
> >     msg=0x0, regs=0x7082730
> > #2  0x070110f0 in mm_malloc (heap=0x7089c00, size=112) at
> >     mm_heap/mm_malloc.c:245
> > #3  0x0700fd74 in kmm_malloc (size=112) at kmm_heap/kmm_malloc.c:51
> > #4  0x07028d4e in elf_loadphdrs (loadinfo=0x7090550) at
> >     libelf/libelf_sections.c:207
> > #5  0x07028b0c in elf_load (loadinfo=0x7090550) at
> >     libelf/libelf_load.c:337
> > #6  0x070278aa in elf_loadbinary (binp=0x708f5d0, filename=0x704bca8
> >     "/system/bin/init", exports=0x0, nexports=0) at elf.c:257
> > #7  0x070293ea in load_absmodule (bin=0x708f5d0, filename=0x704bca8
> >     "/system/bin/init", exports=0x0, nexports=0) at
> >     binfmt_loadmodule.c:115
> > #8  0x07029504 in load_module (bin=0x708f5d0, filename=0x704bca8
> >     "/system/bin/init", exports=0x0, nexports=0) at
> >     binfmt_loadmodule.c:219
> > #9  0x07027674 in exec_internal (filename=0x704bca8
> >     "/system/bin/init", argv=0x70907a0, envp=0x0, exports=0x0,
> >     nexports=0, actions=0x0, attr=0x7090788, spawn=true) at
> >     binfmt_exec.c:98
> > #10 0x0702779c in exec_spawn (filename=0x704bca8 "/system/bin/init",
> >     argv=0x70907a0, envp=0x0, exports=0x0, nexports=0, actions=0x0,
> >     attr=0x7090788) at binfmt_exec.c:220
> > #11 0x0700299e in nx_start_application () at init/nx_bringup.c:375
> > #12 0x070029f0 in nx_start_task (argc=1, argv=0x7090010) at
> >     init/nx_bringup.c:403
> > #13 0x07003f84 in nxtask_start () at task/task_start.c:107
> >
> > It looks like the mm/mm_heap data structure consistency was broken.
> > As I am unfamiliar with these internals, I am looking forward to any
> > hints about how to find the root cause.
> >
> > Regards,
> > yf
>
> This does indicate heap corruption:
>
>     240 /* Node next must be alloced, otherwise it should be merged.
>     241  * Its prenode (the founded node) must be free and preceding
>     242  * should match with nodesize.
>     243  */
>     244
>     245 DEBUGASSERT(MM_NODE_IS_ALLOC(next) && MM_PREVNODE_IS_FREE(next) &&
>     246             next->preceding == nodesize);
>
> Heap corruption normally occurs when there is a wild write outside of
> the allocated memory region. These kinds of wild writes may clobber
> some other thread's data and directly or indirectly clobber the heap
> metadata. Trying to traverse the damaged heap metadata is probably the
> direct cause of the reported problem. Only a kernel thread or interrupt
> handler could damage the heap.
>
> The cause of this corruption can be really difficult to find because
> the reported error does not occur when the heap is damaged and may not
> manifest itself until sometime later. It is unlikely that anyone will
> be able to solve this by just talking about it.
>
> It might be worth increasing some kernel thread stack sizes just to
> eliminate that common cause.
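The delayed-failure mechanism described above can be reproduced in miniature: a wild write past the end of one allocation silently clobbers the next node's header, and nothing complains until something later checks the preceding/size invariant, just as the DEBUGASSERT fires long after the culprit ran. The layout below is illustrative only, not the real mm_heap structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two adjacent toy nodes, loosely modeled on the metadata checked by
 * the DEBUGASSERT in mm_malloc.c: the node after an allocation must
 * record the size of the node in front of it ("preceding"). */
struct toy_node
{
  uint32_t size;       /* size of this node, header included */
  uint32_t preceding;  /* size of the node just before this one */
  char payload[24];    /* user data area */
};

static struct toy_node heap_area[2];

void toy_init(void)
{
  heap_area[0].size = sizeof(struct toy_node);
  heap_area[0].preceding = 0;
  heap_area[1].size = sizeof(struct toy_node);
  heap_area[1].preceding = sizeof(struct toy_node);
}

/* The invariant behind "next->preceding == nodesize". */
int toy_invariant_holds(void)
{
  return heap_area[1].preceding == heap_area[0].size;
}

/* The "culprit": writes n bytes starting at node 0's payload.  With
 * n <= 24 it stays in bounds; with a larger n it silently runs into
 * node 1's header.  No fault happens here, which is why the victim
 * only crashes later, when the next allocation checks the metadata. */
void culprit_write(size_t n)
{
  unsigned char *base = (unsigned char *)heap_area;
  memset(base + offsetof(struct toy_node, payload), 0xAA, n);
}
```

This is also why a watchpoint on the clobbered metadata address, as suggested earlier in the thread, is so effective: it breaks inside culprit_write() at the moment of the overwrite rather than at the much later assertion.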