On 1/16/2020 7:43 AM, Andrew Rybchenko wrote:
> On 1/15/20 11:43 PM, Thomas Monjalon wrote:
>> 15/01/2020 19:35, Ferruh Yigit:
>>> On 1/15/2020 6:49 AM, 方统浩50450 wrote:
>>>> Hi Ferruh, thanks for your message.
>>>>
>>>> We developed an ethtool-dpdk, which is a secondary process based on
>>>> DPDK 17.08. Our device supports hotplug detach, but hotplug detach
>>>> fails when we use ethtool-dpdk. We found that the secondary process
>>>> changes the shared memory when initializing. The secondary process calls
>>>> the "rte_eth_dev_pci_allocate" function and enters the
>>>> "rte_eth_copy_pci_info" function
>>>> (rte_eth_dev_pci_generic_probe -> rte_eth_dev_pci_allocate ->
>>>> rte_eth_copy_pci_info).
>>>> Then it sets the value of "rte_eth_dev_data.dev_flags" to zero. On
>>>> our platform, this value is equal to 0x0003
>>>> (RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC), but after "dev_flags"
>>>> is reset, the value changes to 0x0002 (RTE_ETH_DEV_INTR_LSC only). So
>>>> our device hotplug detach fails. I found a similar problem in other
>>>> DPDK versions, including DPDK 19.11. Even though device hotplug detach
>>>> has been discarded, I think the change to shared memory is unexpected
>>>> by the primary process.
>
> Hold on, just for my understanding. As far as I can see
> RTE_ETH_DEV_DETACHABLE was removed in 17.11. Does it
> change something in above description?

Overall the secondary overwrites the primary's values; I think we should fix
it independent of the flags involved.

>
>>> I agree this is the problem.
>>> In the driver code, 'rte_eth_copy_pci_info' is called only by the primary
>>> process,
>>>
>>> but the generic code is faulty.
>>>
>>> And in 19.11 additionally 'eth_dev_pci_specific_init' also seems to have
>>> the same problem.
>
> Yes, as I understand RTE_ETH_DEV_CLOSE_REMOVE,
> RTE_ETH_DEV_BONDED_SLAVE, RTE_ETH_DEV_REPRESENTOR and
> RTE_ETH_DEV_NOLIVE_MAC_ADDR may be lost because of
> reinit (if not restored in other branches). Bad anyway.
>
>>>> Our driver is ixgbe, but I think this problem has little to do with
>>>> the driver. The secondary process enters "rte_eth_copy_pci_info" via
>>>> "rte_eth_dev_pci_allocate". And I agree with your opinion that the
>>>> helper function should be simple in what it does. I have two ways to
>>>> fix this problem: one is to add an if-statement in the
>>>> "rte_eth_dev_pci_allocate" function to forbid the secondary process
>>>> from entering the "rte_eth_copy_pci_info" function;
>>>> the other is to add an if-statement in the "rte_eth_copy_pci_info"
>>>> function to forbid the secondary process from changing shared memory.
>>>> The first way needs to ensure that the "rte_eth_copy_pci_info"
>>>> function won't be called anywhere else.
>>>> I think the second way is simpler and lower risk.
>>>
>>> Yes these are the two options.
>>>
>>> I agree adding the check in 'rte_eth_copy_pci_info' covers all cases and
>>> is safer.
>>> BUT my concern was adding decision making to a simple/leaf function and
>>> making it harder to debug/use, instead of deciding at a higher level
>>> what the primary/secondary process should call.
>>>
>>> But I just recognized that some PMDs are calling 'rte_eth_copy_pci_info'
>>> in the secondary process, like mlx4 or szedata2, and most probably this
>>> is not their intention.
>>> And 'eth_dev->intr_handle' is set in 'rte_eth_copy_pci_info'; not
>>> calling this function may have the side effect of
>>> 'eth_dev->intr_handle' not being set in the secondary.
>>>
>>> With the above considerations I am OK with your proposal to cover all
>>> cases. Thomas, Andrew, any concern?
>
> I would put if condition in rte_eth_copy_pci_info().
> It is the function which writes shared space from
> secondary process when it should not be done and it
> should be fixed there.

OK

>
>> Do you mean drivers need to be fixed?
>
> I'm not sure that I fully understand it. Since copy function
> cares about intr_handle copying I'm afraid that it is not
> 100% correct to skip it in secondary process completely as
> many drivers do right now. Basically it makes eth_dev structure
> in secondary process inconsistent. However, it looks like
> most of these drivers simply obtain handle from pci_dev
> directly and it explains why they are not affected.
> There are exceptions which are potentially bugs, e.g.
> drivers/net/ice/ice_ethdev.c: ice_interrupt_handler at the end.
>
> I think that it would be better if intr_handle is always
> correct in eth_dev (both primary and secondary cases) and
> drivers use it instead of the same from pci_dev.
>

OK

So this suggests going on with Fang's patch. I only requested an additional
note in the function comment related to this secondary check.
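
As a rough illustration of the check discussed above (this is not Fang's patch), the guard could look something like the standalone sketch below. Everything in it is a stand-in: the struct layouts, the flag macros, the `copy_pci_info` name and the explicit `proc` argument are all hypothetical simplifications so that the snippet compiles without DPDK. In the real code the process type would come from rte_eal_process_type() rather than a parameter. The flag values mirror the 17.08 example from the report (0x0003 being DETACHABLE | INTR_LSC).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the DPDK flag definitions (values as in the 17.08 report). */
#define DEV_FLAG_DETACHABLE 0x0001u
#define DEV_FLAG_INTR_LSC   0x0002u
#define DRV_FLAG_INTR_LSC   0x0008u /* hypothetical driver capability bit */

/* Hypothetical replacement for the value of rte_eal_process_type(). */
enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

struct intr_handle { int fd; };

struct pci_dev {
	struct intr_handle intr_handle;
	uint32_t drv_flags;
};

/* In DPDK this structure lives in memory shared by all processes. */
struct dev_data { uint32_t dev_flags; };

/* Per-process structure; 'data' points into the shared area. */
struct eth_dev {
	struct dev_data *data;
	struct intr_handle *intr_handle;
};

static void
copy_pci_info(struct eth_dev *eth_dev, struct pci_dev *pci_dev,
	      enum proc_type proc)
{
	if (eth_dev == NULL || pci_dev == NULL)
		return;

	/* Process-local pointer: set it in both primary and secondary. */
	eth_dev->intr_handle = &pci_dev->intr_handle;

	/* Shared data: only the primary process may reinitialise it. */
	if (proc == PROC_PRIMARY) {
		eth_dev->data->dev_flags = 0;
		if (pci_dev->drv_flags & DRV_FLAG_INTR_LSC)
			eth_dev->data->dev_flags |= DEV_FLAG_INTR_LSC;
	}
}
```

Keeping the intr_handle assignment outside the guard is one way to address the concern that eth_dev should stay consistent in the secondary process, while the shared dev_flags are left exactly as the primary set them.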