Hi, thank you all for sharing your networking and dcache expertise!
The descriptors and buffers are all aligned and sized to the dcache line
size. The problem is the following. (The values of the numbers do not
matter, but they help express the nature of the problem.)

Suppose CONFIG_NET_ETH_PKTSIZE is the default 490 and a 1518-byte frame
on the network is received by the F7. The DMA hardware will store the
frame as n buffer-sized segments plus zero or one remainder-sized
buffer. The following happens: 490 becomes 608 after sizing and
alignment, and the DMA populates the buffers from the descriptors:

   +> D0->B0(608)    FL is 1518; d_len is set to 1514
   |      |          (FL from the FL bits in RDES0[29:16], minus 4)
   |      V
   |  D1->B1(608)
   |      |
   |      V
   |  D2->B2(298)
   |      |
   |      V
   <+ Dn->Bn[]
      ....

From RM0410:

   To compute the amount of valid data in this final buffer, the driver
   must read the frame length (FL bits in RDES0[29:16]) and subtract
   the sum of the buffer sizes of the preceding buffers in this frame.

But the code is invalidating from &B0[0] to &B0[1513]. If the buffers
were contiguous in memory this would be OK. But the buffers used in RX
are replaced (in the descriptors) from the free pool using the
g_txbuffer memory. While at boot B0 to Bn are contiguous, they become
scattered as a consequence of receiving (the nature of the ring and of
replacement from the free pool).

The ring:

   /* Scan descriptors owned by the CPU.  Scan until:
    *
    *   1) We find a descriptor still owned by the DMA,
    *   2) We have examined all of the RX descriptors, or
    *   3) All of the TX descriptors are in flight.
    */

The replacement:

   buffer = stm32_allocbuffer(priv);

   /* Take the buffer from the RX descriptor of the first free
    * segment, put it into the network device structure, then
    * replace the buffer in the RX descriptor with the newly
    * allocated buffer.
    */

   dev->d_buf    = (uint8_t *)rxcurr->rdes2;
   rxcurr->rdes2 = (uint32_t)buffer;

Eventually, B0 is allocated from one of the buffers in the g_txbuffer
array.
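To make the RM0410 rule above concrete, here is a minimal sketch (the
function and parameter names are illustrative, not the driver's actual
code): the preceding buffers in a multi-segment frame are completely
full, so the valid data in the final buffer is the frame length minus
the sum of their sizes.

```c
#include <stdint.h>

/* Sketch of the RM0410 rule: valid data in the final buffer of a
 * multi-segment frame.  frame_len is the payload length the driver
 * derives from the FL bits in RDES0[29:16]; buffer_size is the
 * per-descriptor buffer size; nsegments is the number of buffers the
 * frame spans.  All names here are hypothetical. */

uint32_t last_segment_len(uint32_t frame_len, uint32_t buffer_size,
                          uint32_t nsegments)
{
  /* The first nsegments - 1 buffers are completely filled. */
  return frame_len - (nsegments - 1) * buffer_size;
}

/* With the numbers from the diagram: 1514 bytes of frame data in
 * 608-byte buffers gives B0(608), B1(608) and a final B2 holding
 * last_segment_len(1514, 608, 3) = 298 bytes. */
```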
Given this layout of memory (low to high):

   /* Descriptor allocations */

   g_rxtable[RXTABLE_SIZE]
   g_txtable[TXTABLE_SIZE]

   /* Buffer allocations */

   g_rxbuffer[RXBUFFER_ALLOC]
   g_txbuffer[TXBUFFER_ALLOC]

   /* These are the pre-allocated Ethernet device structures */

   struct stm32_ethmac_s g_stm32ethmac[STM32F7_NETHERNET];

dev->d_buf is an address in g_txbuffer. dev->d_len is the frame length,
1514, NOT the buffer length! The up_invalidate_dcache then corrupts
g_stm32ethmac. The result is that dev->d_buf and dev->d_len are both 0.

Context before the call to up_invalidate_dcache:

   dev->d_buf = &g_txbuffer[n * (RXBUFFER_ALLOC/608)]
   dev->d_len = 1514

   up_invalidate_dcache((uintptr_t)dev->d_buf,
                        (uintptr_t)dev->d_buf + dev->d_len);

Context after the call to up_invalidate_dcache:

   dev->d_buf = 0
   dev->d_len = 0

This then returns OK, and stm32_receive dereferences a null pointer and
places the null into the free pool. The hard fault then happens.

When CONFIG_NET_ETH_PKTSIZE is 1514, the corruption does not happen
because sizeof FRAME == sizeof BUFFER. (The system will still crash if
the hardware can receive a bigger frame; the numbers are relative.)

The driver is not quite right: the code manages the segments but does
not coalesce them back into a frame. (A memcpy with such a DMA is
gross, though.) So the RX data is useless to the network layer. If the
network layer used IOBs and could handle on-the-fly assembly, the
system would be most efficient. But that is a major undertaking.

The goal now is to harden the driver:

   1) Discard frames (all segments) greater than the size of one
      buffer.
   2) Fix the invalidation.

David

-----Original Message-----
From: Gregory Nutt [mailto:spudan...@gmail.com]
Sent: Thursday, March 04, 2021 5:01 PM
To: dev@nuttx.apache.org
Subject: Re: STM32H7 ethernet hardfaults

> My question for Greg was: Is there an assumption that
> CONFIG_NET_ETH_PKTSIZE has to be 1514?  So that ultimately a frame
> must be received completely into one buffer?
Search for "packet" and "frame" in the Ethernet section of the
reference manual. The hardware will DMA up to (I think it was) 2048
bytes without Jumbo mode. These are the 2K packets. This is for the H7
(and probably the F7). However, that should not happen because nothing
should ever send packets that large to the station (and PROMISCUOUS
mode should be disabled).

From what you are saying, the packet buffer should also be aligned to
the cache line and be a multiple of the cache line size. So for an MTU
of X, I think that should be:

   mask   = Y - 1
   size   = ((X + 18) + mask) & ~mask
   buffer = memalign(Y, size);

where

 * X is the MTU (like 1500).  The MTU does not include the Ethernet
   overhead.
 * Y is the cache line size, usually 32.
 * 18 is the Ethernet overhead (14-byte Ethernet header + 4-byte FCS
   at the end).  The FCS is usually accounted for by
   CONFIG_GUARD_SIZE, which defaults to 2.  But some hardware
   transfers additional info after the FCS.

The 1514 packet size you mention may not be meaningful without some
clarification. The packet buffer size of 1518 is a 1500-byte MTU plus
the 18-byte Ethernet overhead. Some hardware verifies the FCS and does
not transfer it to memory ... that would make 1514. But that is not a
typical case.

The selection of the MTU should not matter on most networks or with
most network hardware. TCP should follow the negotiated MSS, and
everything else should be small. The MTU should be selected based on
the configured properties of the network, the resources available in
the target, and the needs of the application:

   frame size = Ethernet header + MTU + FCS
              = Ethernet header + IP header + protocol header
                + MSS + FCS

IPv4 hosts are required to be able to handle an MSS of at least 536
bytes; IPv6 hosts are required to be able to handle an MSS of 1220
bytes. Those requirements determine the minimum sizes.
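For what it's worth, the sizing arithmetic above can be checked with a
few lines of C (eth_buffer_size is just an illustrative name for the
formula, not an existing NuttX function):

```c
#include <stddef.h>

/* Round (MTU + 18 bytes of Ethernet overhead) up to a multiple of the
 * cache line size, per the formula above.  The cache line size must
 * be a power of two for the mask trick to work. */

size_t eth_buffer_size(size_t mtu, size_t cacheline)
{
  size_t mask = cacheline - 1;
  return ((mtu + 18) + mask) & ~mask;
}

/* For a 1500-byte MTU and 32-byte cache lines:
 * 1500 + 18 = 1518, rounded up to 1536 (48 cache lines). */
```

A buffer of this size both covers the full frame and guarantees that a
dcache invalidate over the buffer never touches an adjacent allocation.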