On 05/28/2014 09:55 AM, Alexander Graf wrote: > > On 27.05.14 06:51, Alexey Kardashevskiy wrote: >> On 05/23/2014 12:25 AM, Alexey Kardashevskiy wrote: >>> On 05/22/2014 08:57 PM, Alexander Graf wrote: >>>> On 22.05.14 12:53, Alexey Kardashevskiy wrote: >>>>> On 05/22/2014 05:16 PM, Alexander Graf wrote:> >>>>>>> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <a...@ozlabs.ru>: >>>>>>> >>>>>>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote: >>>>>>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote: >>>>>>>>> >>>>>>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote: >>>>>>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote: >>>>>>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote: >>>>>>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote: >>>>>>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote: >>>>>>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote: >>>>>>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote: >>>>>>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote: >>>>>>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote: >>>>>>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX >>>>>>>>>>>>>>>>>> interrupt as >>>>>>>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a >>>>>>>>>>>>>>>>>> problem for >>>>>>>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest >>>>>>>>>>>>>>>>>> driver >>>>>>>>>>>>>>>>>> reload or >>>>>>>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of >>>>>>>>>>>>>>>>>> devices >>>>>>>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no >>>>>>>>>>>>>>>>>> good >>>>>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>>>> for that. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the >>>>>>>>>>>>>>>>>> first >>>>>>>>>>>>>>>>>> IRQ >>>>>>>>>>>>>>>>>> number >>>>>>>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there >>>>>>>>>>>>>>>>>> is no >>>>>>>>>>>>>>>>>> need to >>>>>>>>>>>>>>>>>> store >>>>>>>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags >>>>>>>>>>>>>>>>>> telling >>>>>>>>>>>>>>>>>> what >>>>>>>>>>>>>>>>>> type >>>>>>>>>>>>>>>>>> of interrupt for which device it has configured in order to >>>>>>>>>>>>>>>>>> return >>>>>>>>>>>>>>>>>> error if >>>>>>>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to >>>>>>>>>>>>>>>>>> disable >>>>>>>>>>>>>>>>>> MSI >>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>> it has not enabled. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled >>>>>>>>>>>>>>>>>> devices >>>>>>>>>>>>>>>>>> per PHB, >>>>>>>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This changes migration stream as it fixes >>>>>>>>>>>>>>>>>> vmstate_spapr_pci_msi::name >>>>>>>>>>>>>>>>>> which was >>>>>>>>>>>>>>>>>> wrong since the beginning. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This fixed traces to be more informative. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I >>>>>>>>>>>>>>>>>> could >>>>>>>>>>>>>>>>>> remove >>>>>>>>>>>>>>>>>> msi/msix >>>>>>>>>>>>>>>>>> bitmaps from this patch, would it make sense? >>>>>>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose >>>>>>>>>>>>>>>>> between >>>>>>>>>>>>>>>>> MSIX and >>>>>>>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a >>>>>>>>>>>>>>>>> PCI >>>>>>>>>>>>>>>>> limitation, >>>>>>>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your >>>>>>>>>>>>>>>>> implementation? >>>>>>>>>>>>>>>> My implementation does not have this limitation, I asked if >>>>>>>>>>>>>>>> I can >>>>>>>>>>>>>>>> simplify >>>>>>>>>>>>>>>> code by introducing one :) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX >>>>>>>>>>>>>>>> enabled >>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> it does not seem to be used by anyone => cannot debug and >>>>>>>>>>>>>>>> confirm. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when >>>>>>>>>>>>>>>> MSI is >>>>>>>>>>>>>>>> already >>>>>>>>>>>>>>>> enabled, this is a "change", not enabling both types. But >>>>>>>>>>>>>>>> it also >>>>>>>>>>>>>>>> says MSI >>>>>>>>>>>>>>>> and MSIX vector numbers are not shared. Hm. >>>>>>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd >>>>>>>>>>>>>>> rather not impose one. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* >>>>>>>>>>>>>>> MSI-X at >>>>>>>>>>>>>>> the same time? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Alex >>>>>>>>>>>>>> No, and the PCI spec says: >>>>>>>>>>>>>> A function is permitted to implement both MSI and >>>>>>>>>>>>>> MSI-X, but >>>>>>>>>>>>>> system >>>>>>>>>>>>>> software is >>>>>>>>>>>>>> prohibited from enabling both at the same time. If system >>>>>>>>>>>>>> software >>>>>>>>>>>>>> enables both at the same time, the result is undefined. >>>>>>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :). >>>>>>>>>>>> Heh. This solves just half of the problem - I still have to keep >>>>>>>>>>>> track of >>>>>>>>>>>> what device got MSI/MSIX configured via that ibm,change-msi >>>>>>>>>>>> interface. I >>>>>>>>>>>> was hoping I can store such flag somewhere in a device PCI config >>>>>>>>>>>> space >>>>>>>>>>>> but >>>>>>>>>>>> MSI/MSIX enable bit is not good as it is not set when those >>>>>>>>>>>> calls are >>>>>>>>>>>> made. >>>>>>>>>>>> And I cannot rely on address/data fields much as the guest can >>>>>>>>>>>> change them >>>>>>>>>>>> (I already use them to store IRQ numbers and btw it is missing >>>>>>>>>>>> checks when >>>>>>>>>>>> I read them back for disposal, I'll fix in next round). >>>>>>>>>>>> >>>>>>>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI >>>>>>>>>>>> config space >>>>>>>>>>>> and on "disable" check if it is not zero, then configuration took >>>>>>>>>>>> place, >>>>>>>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did >>>>>>>>>>>> any change >>>>>>>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird >>>>>>>>>>>> selftest >>>>>>>>>>>> cases), I compare .data with what ICS can possibly have and either >>>>>>>>>>>> reject >>>>>>>>>>>> "disable" or handle it and if it breaks XICS - that's too bad >>>>>>>>>>>> for the >>>>>>>>>>>> stupid guest. Would that be acceptable? >>>>>>>>>>> Can't you prohibit the guest from writing to the MSI configuration >>>>>>>>>>> registers itself? Then you don't need to do sanity checks. >>>>>>>>>> I could for emulated devices but VFIO uses the same code. For >>>>>>>>>> example, >>>>>>>>>> there is an IBM SCSI IPR card which does a "self test". For that, it >>>>>>>>>> saves >>>>>>>>>> MSIX BAR content, does reboot via some backdoor interface and >>>>>>>>>> restores MSIX >>>>>>>>>> BAR. It has been solved for VFIO in the host kernel by restoring >>>>>>>>>> MSIX data >>>>>>>>>> from cached values when guest is trying to restore it with what it >>>>>>>>>> thinks >>>>>>>>>> is actual MSIX data (it is virtualized because of x86). But there is >>>>>>>>>> cache >>>>>>>>> We already have a cache because we don't access the real PCI >>>>>>>>> registers with >>>>>>>>> msi_set_message(), no? >>>>>>>> For emulated devices there is no cache. And in any case the guest is >>>>>>>> allowed to write to it... Who knows what AIX does? I do not. >>>>>>> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok >>>>>>> but >>>>>>> how to migrate such thing? Temporary cache somewhere and then unpack >>>>>>> it? Or >>>>>>> use old style migration callbacks? >>>>>> Could you try to introduce a new vmstate type that just serializes and >>>>>> deserializes hash tables? Maybe there is already a serialization >>>>>> function for it in glib? >>>>> I have not found any (most likely I do not know how to search there), >>>>> I added mine, here are VMSTATE_HASH + its use for SPAPR. >>>>> >>>>> >>>>> Is this a movement to right direction? I need to put key/value sizes >>>>> into VMSTATE definition somehow but do not really want to touch >>>>> VMStateField. >>>> I'm not sure. Juan? >>> >>> I also tried to simplify this particular thing more by assuming that the >>> key is always "int" and put size of value to VMStateField::size. But there >>> is no way to get the size in VMStateInfo::get(). Or I could do a >>> "subsection" and try implementing has as an array (and have get/put defined >>> for items, should work) but in this case I'll need to cache number of >>> elements of the hash table somewhere so I'll need a wrapper around >>> GHashTable. >>> >>> Making generalized version is not obvious for me :( >> Juan? >> Alex? >> Anybody? >> Ping. >> >> How do I migrate GHashTable? If I am allowed to use custom and bit more >> polished get/put from "[PATCH 1/2] vmstate: Add helper to enable GHashTable >> migration", I'll be fine. > > Yeah, I think it's ok to be custom in this case. Or another crazy idea - > could you flatten the hash table into an array of structs that you can > describe using VMState? You could then convert from the flat array to/from > the GHashTable with pre_load/post_load hooks.
Array is exactly what I am trying to get rid of. Ok, I'll remove hashmap at all and implement dynamic flat array (yay, yet another bicycle!). > The benefit of that would be that the data becomes more easily > introspectible with tools like my live migration parser. Wow. What is that parser? -- Alexey