> From: fengchengwen [mailto:fengcheng...@huawei.com] > Sent: Wednesday, 10 April 2024 11.34 > > Hi All, > > We have a problem introduced by a compiler upgrade (from gcc10 to gcc12.3), > we've done some > research but haven't been able to figure out why. We'd like the community's > help. > > Environment: > 1. Source: DPDK 23.11 > 2. GCC: 12.3.1 [1] > 3. Compiled with target kunpeng SoC (ARM64) > 4. Run on kunpeng SoC > > > Problem & Debug: > 1. We found the hns3 driver fails to update the link status. The corresponding > function is > hns3_update_linkstatus_and_event [2], and we found the rte_eth_linkstatus_set > [3] always return zero. > 2. After disassembly the hns3_update_linkstatus_and_event, and found > rte_eth_linkstatus_set's > rte_atomic_exchange_explicit return to xzr register (which is zero register): > 1239fec: 3900f3e0 strb w0, [sp, #60] > 1239ff0: 9101a041 add x1, x2, #0x68 ---x2 seem not the > variable new_link > 1239ff4: f8ff8022 swpal xzr, x2, [x1] ---this instr > corresponding rte_atomic_exchange_explicit, > it will place > the resut in xzr which always zero, > and this will > lead to rte_eth_linkstatus_set return 0. > 1239ff8: 3940f3e0 ldrb w0, [sp, #60] > 1239ffc: d3609c41 ubfx x1, x2, #32, #8 > 123a000: 4a010000 eor w0, w0, w1 > 123a004: 36100080 tbz w0, #2, 123a014 > <hns3_update_linkstatus_and_event+0xd4> > 3. We checked other "ret = rte_eth_linkstatus_set" calls, and can't find > similar problem. > 4. After reading a lot of documents, we preliminarily think that the problem > is caused by -fstrict-aliasing > (which was enabled default with O2 or O3), if compiled with -fno-strict- > aliasing, then this problem don't > exist. We guest this maybe strict-aliasing's bug which only happened in our > function. > 5. We also try to use union to avoid such aliasing in rte_eth_linkstatus_set, > we changed the struct > rte_eth_link define, and it works: > -__extension__ > -struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write > */ > - uint32_t link_speed; /**< RTE_ETH_SPEED_NUM_ */ > - uint16_t link_duplex : 1; /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */ > - uint16_t link_autoneg : 1; /**< RTE_ETH_LINK_[AUTONEG/FIXED] */ > - uint16_t link_status : 1; /**< RTE_ETH_LINK_[DOWN/UP] */ > +struct rte_eth_link { /**< aligned for atomic64 read/write */ > + union { > + uint64_t val64; > + struct { > + uint32_t link_speed; /**< RTE_ETH_SPEED_NUM_ > */ > + uint16_t link_duplex : 1; /**< > RTE_ETH_LINK_[HALF/FULL]_DUPLEX */ > + uint16_t link_autoneg : 1; /**< > RTE_ETH_LINK_[AUTONEG/FIXED] */ > + uint16_t link_status : 1; /**< > RTE_ETH_LINK_[DOWN/UP] */ > + }; > + }; > }; > the corresponding rte_eth_linkstatus_set: > @@ -1674,18 +1674,13 @@ static inline int > rte_eth_linkstatus_set(struct rte_eth_dev *dev, > const struct rte_eth_link *new_link) > { > - RTE_ATOMIC(uint64_t) *dev_link = (uint64_t __rte_atomic *)&(dev- > >data->dev_link); > - union { > - uint64_t val64; > - struct rte_eth_link link; > - } orig; > - > - RTE_BUILD_BUG_ON(sizeof(*new_link) != sizeof(uint64_t)); > + struct rte_eth_link old_link; > > - orig.val64 = rte_atomic_exchange_explicit(dev_link, *(const uint64_t > *)new_link, > + old_link.val64 = rte_atomic_exchange_explicit(&dev->data- > >dev_link.val64, > + new_link->val64, > rte_memory_order_seq_cst); > > - return (orig.link.link_status == new_link->link_status) ? -1 : 0; > + return (old_link.link_status == new_link->link_status) ? -1 : 0; > } > 6. BTW: the linux kernel enabled "-fno-strict-aliasing" default, please see > [4] for more. >
Thank you for the detailed analysis. Looking at the GCC documentation for -fstrict-aliasing supports your conclusion. Just out of curiosity: Does building with -Wstrict-aliasing=3 or -Wstrict-aliasing=1 provide any useful information? And are the any differences in the strict-aliasing warnings without/with your fix? > > Last: We think there are two ways to solve this problem. > 1. Add the compilation option '-fno-strict-aliasing' for hold DPDK project. > 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please see > above). > PS: We prefer first way. > > Hope for more discuess. > > Thanks Unfortunately, DPDK uses a lot of type casting where other methods would be formally more correct. I fear that you only found one of potentially many more bugs like this. Fixing this generally would be great, but probably not realistic. I prefer correctness over performance, and thus am in favor of adding -fno-strict-aliasing project wide. Performance can be improved by adding more hints and attributes to functions and parameters, e.g. rte_memcpy() could be: __attribute__((access(write_only, 1, 3), access(read_only, 2, 3))) static void * rte_memcpy(void * restrict dst, const void * restrict src, size_t n); Instead of just: static void * rte_memcpy(void * dst, const void * src, size_t n); > > > [1] https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads/12-3- > rel1 > [2] void > hns3_update_linkstatus_and_event(struct hns3_hw *hw, bool query) > { > struct rte_eth_dev *dev = &rte_eth_devices[hw->data->port_id]; > struct rte_eth_link new_link; > int ret; > > if (query) > hns3_update_port_link_info(dev); > > memset(&new_link, 0, sizeof(new_link)); > hns3_setup_linkstatus(dev, &new_link); > > ret = rte_eth_linkstatus_set(dev, &new_link); > if (ret == 0 && dev->data->dev_conf.intr_conf.lsc != 0) > hns3_start_report_lse(dev); > } > [3] static inline int > rte_eth_linkstatus_set(struct rte_eth_dev *dev, > const struct rte_eth_link *new_link) > { > RTE_ATOMIC(uint64_t) *dev_link = (uint64_t __rte_atomic *)&(dev->data- > >dev_link); > union { > uint64_t val64; > struct rte_eth_link link; > } orig; > > RTE_BUILD_BUG_ON(sizeof(*new_link) != sizeof(uint64_t)); > > orig.val64 = rte_atomic_exchange_explicit(dev_link, *(const uint64_t > *)new_link, > rte_memory_order_seq_cst); > > return (orig.link.link_status == new_link->link_status) ? -1 : 0; > } > [4] https://lore.kernel.org/all/b3itcd$2bi$1...@penguin.transmeta.com/