On 09/04/2014 11:34 AM, Michael Roth wrote: > Quoting Michael Roth (2014-09-04 11:12:15) >> Quoting Bharata B Rao (2014-09-04 10:08:20) >>> On Thu, Sep 4, 2014 at 4:33 AM, Michael Roth <mdr...@linux.vnet.ibm.com> >>> wrote: >>>>>> +static int spapr_device_hotplug_add(DeviceState *qdev, PCIDevice *dev) >>>>>> +{ >>>>>> + sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev); >>>>>> + sPAPRDrcEntry *drc_entry, *drc_entry_slot; >>>>>> + sPAPRConfigureConnectorState *ccs; >>>>>> + int slot = PCI_SLOT(dev->devfn); >>>>>> + int offset, ret; >>>>>> + void *fdt_orig, *fdt; >>>>>> + char nodename[512]; >>>>>> + uint32_t encoded = ENCODE_DRC_STATE(INDICATOR_ENTITY_SENSE_PRESENT, >>>>>> + INDICATOR_ENTITY_SENSE_MASK, >>>>>> + INDICATOR_ENTITY_SENSE_SHIFT); >>>>>> + >>>>> >>>>> I am building on this infrastructure of yours and adding CPU hotplug >>>>> support to sPAPR guests. >>>>> >>>>> So we start with dr state of UNUSABLE and change it to PRESENT like >>>>> above when performing hotplug operation. But after this, in case of >>>>> CPU hotplug, the CPU hotplug code in the kernel >>>>> (arch/powerpc/platforms/pseries/dlpar.c:dlpar_acquire_drc()) does >>>>> get-sensor-state rtas call and expects the dr state to be UNUSABLE. Is >>>>> the guest kernel right in expecting dr state to be in UNUSABLE state >>>>> like this ? I have in fact disabled this check in the guest kernel to >>>>> get a CPU hotplugged successfully, but wanted to understand the state >>>>> changes and the expectations from the guest kernel correctly. >>>> >>>> According to PAPR+ 2.7 13.5.3.3, >>>> >>>> PRESENT (1): >>>> >>>> Returned for logical and physical DR entities when the DR connector is >>>> allocated to the OS and the DR entity is present. For physical DR >>>> entities, >>>> this indicates that the DR connector actually has a DR entity plugged >>>> into >>>> it. For DR connectors of physical DR entities, the DR connector must be >>>> allocated to the OS to return this value, otherwise a status of -3, no >>>> such >>>> sensor implemented, will be returned from the get-sensor-state RTAS >>>> call. For >>>> DR connectors of logical DR entities, the DR connector must be allocated >>>> to >>>> the OS to return this value, otherwise a sensor value of 2 or 3 will be >>>> returned. >>>> >>>> UNUSABLE (2): >>>> >>>> Returned for logical DR entities when the DR entity is not currently >>>> available to the OS, but may possibly be made available to the OS by >>>> calling >>>> set-indicator with the allocation-state indicator, setting that >>>> indicator to >>>> usable. >>>> >>>> So it seems 'PRESENT' is in fact the right value immediately after PCI >>>> hotplug, but it doesn't seem clear from the documentation whether 'PRESENT' >>>> or 'UNUSABLE' is more correct immediately after CPU hotplug. What does >>>> seem clear as that UNUSABLE does not have any use in the case of PCI >>>> devices: just PRESENT/EMPTY(0). >>>> >>>> But we actually 0-initialize the sensor field for DRCEntrys corresponding >>>> to PCI devices, which corresponds to 'EMPTY' (0), so the handling seems >>>> correct for PCI devices... >>> >>> Thanks Michael for the above information on PRESENT and USABLE states. >>> >>>> >>>> And since we already initialize PHB sensors to UNUSABLE in the top-level >>>> DRC list, I'm not sure why adjacent CPU entries would be affected by what >>>> we do later for PCI devices? >>> >>> Sorry if I wasn't clear enough in my previous mail. CPU hotplug isn't >>> affected by what you do for PCI devices, but... >>> >>>> It seems like you'd just need to do the >>>> equivalent of what we do for PHBs during realize: >>> >>> when I try to do the same state changes for CPU hotplug, things don't >>> work as expected. >>> >>>> >>>> spapr_add_phb_to_drc_table(sphb->buid, 2 /* Unusable */); >>>> >>>> So I'm not sure where the need for guest kernel changes is coming from for >>>> CPU hotplug. >>> >>> When the resource is hotplugged, you change the state from UNUSABLE to >>> PRESENT in QEMU before signalling the guest (via check exception irq). >>> But the same state change in CPU hotplug case isn't as per guest >>> kernel's expectation. >>> >>>> Do you have a snippet of what the initialize/hot_add hooks >>>> like in your case? >>> >>> I am talking about this piece of code in the the kernel in >>> arch/powerpc/platforms/pseries/dlpar.c >>> >>> int dlpar_acquire_drc(u32 drc_index) >>> { >>> int dr_status, rc; >>> >>> rc = rtas_call(rtas_token("get-sensor-state"), 2, 2, &dr_status, >>> DR_ENTITY_SENSE, drc_index); >>> if (rc || dr_status != DR_ENTITY_UNUSABLE) >>> return -1; >>> ... >>> } >>> >>> I have circumvented this problem by not setting the state to PRESENT >>> in my current hotplug patch. You can refer to the same in >>> spapr_cpu_hotplug_add() routine that's part of my patch 14/15 >>> (https://lists.gnu.org/archive/html/qemu-devel/2014-09/msg00757.html) >> >> Yah, that's what I was getting at: at least just to get things working >> for testing, just avoid the PRESENT bits in your hot_add_cpu hook rather >> than patching the guest. Unfortunately the documentation isn't particularly >> clear about which of these approaches is more correct as far as CPUs go. But >> looking at it again: >> >> UNUSABLE (2): >> >> Returned for logical DR entities when the DR entity is not currently >> available to the OS, but may possibly be made available to the OS by >> calling >> set-indicator with the allocation-state indicator, setting that indicator >> to >> usable. >> >> That 'usable' indicator setting is documented for set-indicator as (1), which >> happens to correspond to PRESENT (1). So my read would be that for 'physical' >> hotplug (like PCI), the firmware changes the indicator state to >> PRESENT/USABLE >> immediately after hotplug, whereas with 'logical' hotplug (like PHB/CPU), the >> guest OS signals this transition to USABLE through set-indicator calls for >> the >> 9003 sensor/allocation state after hotplug (which also seems to match up with >> the kernel code). >> >> This seems to correspond with the dlpar_acquire_drc() function, but I'm a >> little confused why that's not also called in the PHB path...I think maybe >> in that case it's handled by drmgr in userspace... will take another look >> to confirm. > > Yah, here's the code from drmgr, same expectations:
Yes, the guest expects the state to be UNUSABLE. As mentioned above, I don't think we should be changing the state to PRESENT before notifying the guest. -Nathan > > int > acquire_drc(uint32_t drc_index) > { > int rc; > > say(DEBUG, "Acquiring drc index 0x%x\n", drc_index); > > rc = dr_entity_sense(drc_index); > if (rc != STATE_UNUSABLE) { > say(ERROR, "Entity sense failed for drc %x with %d\n%s\n", > drc_index, rc, entity_sense_error(rc)); > return -1; > } > > say(DEBUG, "Setting allocation state to 'alloc usable'\n"); > rc = rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_USABLE); > if (rc) { > say(ERROR, "Allocation failed for drc %x with %d\n%s\n", > drc_index, rc, set_indicator_error(rc)); > return -1; > } > > say(DEBUG, "Setting indicator state to 'unisolate'\n"); > rc = rtas_set_indicator(ISOLATION_STATE, drc_index, UNISOLATE); > if (rc) { > int ret; > rc = -1; > > say(ERROR, "Unisolate failed for drc %x with %d\n%s\n", > drc_index, rc, set_indicator_error(rc)); > ret = rtas_set_indicator(ALLOCATION_STATE, drc_index, > ALLOC_UNUSABLE); > if (ret) { > say(ERROR, "Failed recovery to unusable state after " > "unisolate failure for drc %x with %d\n%s\n", > drc_index, ret, set_indicator_error(ret)); > } > } > > return rc; > } > >> >>> >>> Regards, >>> Bharata.