Hi Morten, On 2024/4/10 18:30, Morten Brørup wrote: >> From: fengchengwen [mailto:fengcheng...@huawei.com] >> Sent: Wednesday, 10 April 2024 11.34 >> >> Hi All, >> >> We have a problem introduced by a compiler upgrade (from gcc10 to gcc12.3), >> we've done some >> research but haven't been able to figure out why. We'd like the community's >> help. >> >> Environment: >> 1. Source: DPDK 23.11 >> 2. GCC: 12.3.1 [1] >> 3. Compiled with target kunpeng SoC (ARM64) >> 4. Run on kunpeng SoC >> >> >> Problem & Debug: >> 1. We found the hns3 driver fails to update the link status. The >> corresponding >> function is >> hns3_update_linkstatus_and_event [2], and we found the rte_eth_linkstatus_set >> [3] always return zero. >> 2. After disassembly the hns3_update_linkstatus_and_event, and found >> rte_eth_linkstatus_set's >> rte_atomic_exchange_explicit return to xzr register (which is zero register): >> 1239fec: 3900f3e0 strb w0, [sp, #60] >> 1239ff0: 9101a041 add x1, x2, #0x68 ---x2 seem not the >> variable new_link >> 1239ff4: f8ff8022 swpal xzr, x2, [x1] ---this instr >> corresponding rte_atomic_exchange_explicit, >> it will place >> the resut in xzr which always zero, >> and this will >> lead to rte_eth_linkstatus_set return 0. >> 1239ff8: 3940f3e0 ldrb w0, [sp, #60] >> 1239ffc: d3609c41 ubfx x1, x2, #32, #8 >> 123a000: 4a010000 eor w0, w0, w1 >> 123a004: 36100080 tbz w0, #2, 123a014 >> <hns3_update_linkstatus_and_event+0xd4> >> 3. We checked other "ret = rte_eth_linkstatus_set" calls, and can't find >> similar problem. >> 4. After reading a lot of documents, we preliminarily think that the problem >> is caused by -fstrict-aliasing >> (which was enabled default with O2 or O3), if compiled with -fno-strict- >> aliasing, then this problem don't >> exist. We guest this maybe strict-aliasing's bug which only happened in our >> function. >> 5. We also try to use union to avoid such aliasing in rte_eth_linkstatus_set, >> we changed the struct >> rte_eth_link define, and it works: >> -__extension__ >> -struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 >> read/write >> */ >> - uint32_t link_speed; /**< RTE_ETH_SPEED_NUM_ */ >> - uint16_t link_duplex : 1; /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */ >> - uint16_t link_autoneg : 1; /**< RTE_ETH_LINK_[AUTONEG/FIXED] */ >> - uint16_t link_status : 1; /**< RTE_ETH_LINK_[DOWN/UP] */ >> +struct rte_eth_link { /**< aligned for atomic64 read/write */ >> + union { >> + uint64_t val64; >> + struct { >> + uint32_t link_speed; /**< RTE_ETH_SPEED_NUM_ >> */ >> + uint16_t link_duplex : 1; /**< >> RTE_ETH_LINK_[HALF/FULL]_DUPLEX */ >> + uint16_t link_autoneg : 1; /**< >> RTE_ETH_LINK_[AUTONEG/FIXED] */ >> + uint16_t link_status : 1; /**< >> RTE_ETH_LINK_[DOWN/UP] */ >> + }; >> + }; >> }; >> the corresponding rte_eth_linkstatus_set: >> @@ -1674,18 +1674,13 @@ static inline int >> rte_eth_linkstatus_set(struct rte_eth_dev *dev, >> const struct rte_eth_link *new_link) >> { >> - RTE_ATOMIC(uint64_t) *dev_link = (uint64_t __rte_atomic *)&(dev- >>> data->dev_link); >> - union { >> - uint64_t val64; >> - struct rte_eth_link link; >> - } orig; >> - >> - RTE_BUILD_BUG_ON(sizeof(*new_link) != sizeof(uint64_t)); >> + struct rte_eth_link old_link; >> >> - orig.val64 = rte_atomic_exchange_explicit(dev_link, *(const >> uint64_t >> *)new_link, >> + old_link.val64 = rte_atomic_exchange_explicit(&dev->data- >>> dev_link.val64, >> + new_link->val64, >> rte_memory_order_seq_cst); >> >> - return (orig.link.link_status == new_link->link_status) ? -1 : 0; >> + return (old_link.link_status == new_link->link_status) ? -1 : 0; >> } >> 6. BTW: the linux kernel enabled "-fno-strict-aliasing" default, please see >> [4] for more. >> > > Thank you for the detailed analysis. > Looking at the GCC documentation for -fstrict-aliasing supports your > conclusion. > > Just out of curiosity: > Does building with -Wstrict-aliasing=3 or -Wstrict-aliasing=1 provide any > useful information?
There are no any useful information with -Wstrict-aliasing=3 But for -Wstrict-aliasing=1, there are plenty of warnings, below is rte_eth_linkstatus_set: In file included from ../../dpdk/lib/ethdev/rte_ethdev.c:32: ../../dpdk/lib/ethdev/ethdev_driver.h: In function ‘rte_eth_linkstatus_set’: ../../dpdk/lib/ethdev/ethdev_driver.h:1677:67: warning: dereferencing type-punned pointer might break strict-aliasing rules [-Wstrict-aliasing] 1677 | RTE_ATOMIC(uint64_t) *dev_link = (uint64_t __rte_atomic *)&(dev->data->dev_link); | ^~~~~~~~~~~~~~~~~~~~~~ ../../dpdk/lib/ethdev/ethdev_driver.h:1685:9: warning: dereferencing type-punned pointer might break strict-aliasing rules [-Wstrict-aliasing] 1685 | orig.val64 = rte_atomic_exchange_explicit(dev_link, *(const uint64_t *)new_link, | ^~~~ Note: above without apply my fix (use union). > And are the any differences in the strict-aliasing warnings without/with your > fix? After apply my fix (use union), with enabled -Wstrict-aliasing=1, there are no strict-aliasing warnings related to rte_eth_linkstatus_set. Thanks > >> >> Last: We think there are two ways to solve this problem. >> 1. Add the compilation option '-fno-strict-aliasing' for hold DPDK project. >> 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please see >> above). >> PS: We prefer first way. >> >> Hope for more discuess. >> >> Thanks > > Unfortunately, DPDK uses a lot of type casting where other methods would be > formally more correct. > I fear that you only found one of potentially many more bugs like this. > Fixing this generally would be great, but probably not realistic. > > I prefer correctness over performance, and thus am in favor of adding > -fno-strict-aliasing project wide. > > Performance can be improved by adding more hints and attributes to functions > and parameters, e.g. rte_memcpy() could be: > __attribute__((access(write_only, 1, 3), access(read_only, 2, 3))) > static void * > rte_memcpy(void * restrict dst, const void * restrict src, size_t n); > > Instead of just: > static void * > rte_memcpy(void * dst, const void * src, size_t n); > > >> >> >> [1] https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads/12-3- >> rel1 >> [2] void >> hns3_update_linkstatus_and_event(struct hns3_hw *hw, bool query) >> { >> struct rte_eth_dev *dev = &rte_eth_devices[hw->data->port_id]; >> struct rte_eth_link new_link; >> int ret; >> >> if (query) >> hns3_update_port_link_info(dev); >> >> memset(&new_link, 0, sizeof(new_link)); >> hns3_setup_linkstatus(dev, &new_link); >> >> ret = rte_eth_linkstatus_set(dev, &new_link); >> if (ret == 0 && dev->data->dev_conf.intr_conf.lsc != 0) >> hns3_start_report_lse(dev); >> } >> [3] static inline int >> rte_eth_linkstatus_set(struct rte_eth_dev *dev, >> const struct rte_eth_link *new_link) >> { >> RTE_ATOMIC(uint64_t) *dev_link = (uint64_t __rte_atomic *)&(dev->data- >>> dev_link); >> union { >> uint64_t val64; >> struct rte_eth_link link; >> } orig; >> >> RTE_BUILD_BUG_ON(sizeof(*new_link) != sizeof(uint64_t)); >> >> orig.val64 = rte_atomic_exchange_explicit(dev_link, *(const uint64_t >> *)new_link, >> rte_memory_order_seq_cst); >> >> return (orig.link.link_status == new_link->link_status) ? -1 : 0; >> } >> [4] https://lore.kernel.org/all/b3itcd$2bi$1...@penguin.transmeta.com/