Quoting Michael Roth (2014-09-04 11:12:15) > Quoting Bharata B Rao (2014-09-04 10:08:20) > > On Thu, Sep 4, 2014 at 4:33 AM, Michael Roth <mdr...@linux.vnet.ibm.com> > > wrote: > > >> > +static int spapr_device_hotplug_add(DeviceState *qdev, PCIDevice *dev) > > >> > +{ > > >> > + sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev); > > >> > + sPAPRDrcEntry *drc_entry, *drc_entry_slot; > > >> > + sPAPRConfigureConnectorState *ccs; > > >> > + int slot = PCI_SLOT(dev->devfn); > > >> > + int offset, ret; > > >> > + void *fdt_orig, *fdt; > > >> > + char nodename[512]; > > >> > + uint32_t encoded = > > >> > ENCODE_DRC_STATE(INDICATOR_ENTITY_SENSE_PRESENT, > > >> > + INDICATOR_ENTITY_SENSE_MASK, > > >> > + INDICATOR_ENTITY_SENSE_SHIFT); > > >> > + > > >> > > >> I am building on this infrastructure of yours and adding CPU hotplug > > >> support to sPAPR guests. > > >> > > >> So we start with dr state of UNUSABLE and change it to PRESENT like > > >> above when performing hotplug operation. But after this, in case of > > >> CPU hotplug, the CPU hotplug code in the kernel > > >> (arch/powerpc/platforms/pseries/dlpar.c:dlpar_acquire_drc()) does > > >> get-sensor-state rtas call and expects the dr state to be UNUSABLE. Is > > >> the guest kernel right in expecting dr state to be in UNUSABLE state > > >> like this ? I have in fact disabled this check in the guest kernel to > > >> get a CPU hotplugged successfully, but wanted to understand the state > > >> changes and the expectations from the guest kernel correctly. > > > > > > According to PAPR+ 2.7 13.5.3.3, > > > > > > PRESENT (1): > > > > > > Returned for logical and physical DR entities when the DR connector is > > > allocated to the OS and the DR entity is present. For physical DR > > > entities, > > > this indicates that the DR connector actually has a DR entity plugged > > > into > > > it. For DR connectors of physical DR entities, the DR connector must be > > > allocated to the OS to return this value, otherwise a status of -3, no > > > such > > > sensor implemented, will be returned from the get-sensor-state RTAS > > > call. For > > > DR connectors of logical DR entities, the DR connector must be > > > allocated to > > > the OS to return this value, otherwise a sensor value of 2 or 3 will be > > > returned. > > > > > > UNUSABLE (2): > > > > > > Returned for logical DR entities when the DR entity is not currently > > > available to the OS, but may possibly be made available to the OS by > > > calling > > > set-indicator with the allocation-state indicator, setting that > > > indicator to > > > usable. > > > > > > So it seems 'PRESENT' is in fact the right value immediately after PCI > > > hotplug, but it doesn't seem clear from the documentation whether > > > 'PRESENT' > > > or 'UNUSABLE' is more correct immediately after CPU hotplug. What does > > > seem clear as that UNUSABLE does not have any use in the case of PCI > > > devices: just PRESENT/EMPTY(0). > > > > > > But we actually 0-initialize the sensor field for DRCEntrys corresponding > > > to PCI devices, which corresponds to 'EMPTY' (0), so the handling seems > > > correct for PCI devices... > > > > Thanks Michael for the above information on PRESENT and USABLE states. > > > > > > > > And since we already initialize PHB sensors to UNUSABLE in the top-level > > > DRC list, I'm not sure why adjacent CPU entries would be affected by what > > > we do later for PCI devices? > > > > Sorry if I wasn't clear enough in my previous mail. CPU hotplug isn't > > affected by what you do for PCI devices, but... > > > > > It seems like you'd just need to do the > > > equivalent of what we do for PHBs during realize: > > > > when I try to do the same state changes for CPU hotplug, things don't > > work as expected. > > > > > > > > spapr_add_phb_to_drc_table(sphb->buid, 2 /* Unusable */); > > > > > > So I'm not sure where the need for guest kernel changes is coming from for > > > CPU hotplug. > > > > When the resource is hotplugged, you change the state from UNUSABLE to > > PRESENT in QEMU before signalling the guest (via check exception irq). > > But the same state change in CPU hotplug case isn't as per guest > > kernel's expectation. > > > > > Do you have a snippet of what the initialize/hot_add hooks > > > like in your case? > > > > I am talking about this piece of code in the the kernel in > > arch/powerpc/platforms/pseries/dlpar.c > > > > int dlpar_acquire_drc(u32 drc_index) > > { > > int dr_status, rc; > > > > rc = rtas_call(rtas_token("get-sensor-state"), 2, 2, &dr_status, > > DR_ENTITY_SENSE, drc_index); > > if (rc || dr_status != DR_ENTITY_UNUSABLE) > > return -1; > > ... > > } > > > > I have circumvented this problem by not setting the state to PRESENT > > in my current hotplug patch. You can refer to the same in > > spapr_cpu_hotplug_add() routine that's part of my patch 14/15 > > (https://lists.gnu.org/archive/html/qemu-devel/2014-09/msg00757.html) > > Yah, that's what I was getting at: at least just to get things working > for testing, just avoid the PRESENT bits in your hot_add_cpu hook rather > than patching the guest. Unfortunately the documentation isn't particularly > clear about which of these approaches is more correct as far as CPUs go. But > looking at it again: > > UNUSABLE (2): > > Returned for logical DR entities when the DR entity is not currently > available to the OS, but may possibly be made available to the OS by > calling > set-indicator with the allocation-state indicator, setting that indicator > to > usable. > > That 'usable' indicator setting is documented for set-indicator as (1), which > happens to correspond to PRESENT (1). So my read would be that for 'physical' > hotplug (like PCI), the firmware changes the indicator state to PRESENT/USABLE > immediately after hotplug, whereas with 'logical' hotplug (like PHB/CPU), the > guest OS signals this transition to USABLE through set-indicator calls for the > 9003 sensor/allocation state after hotplug (which also seems to match up with > the kernel code). > > This seems to correspond with the dlpar_acquire_drc() function, but I'm a > little confused why that's not also called in the PHB path...I think maybe > in that case it's handled by drmgr in userspace... will take another look > to confirm.
Yah, here's the code from drmgr, same expectations: int acquire_drc(uint32_t drc_index) { int rc; say(DEBUG, "Acquiring drc index 0x%x\n", drc_index); rc = dr_entity_sense(drc_index); if (rc != STATE_UNUSABLE) { say(ERROR, "Entity sense failed for drc %x with %d\n%s\n", drc_index, rc, entity_sense_error(rc)); return -1; } say(DEBUG, "Setting allocation state to 'alloc usable'\n"); rc = rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_USABLE); if (rc) { say(ERROR, "Allocation failed for drc %x with %d\n%s\n", drc_index, rc, set_indicator_error(rc)); return -1; } say(DEBUG, "Setting indicator state to 'unisolate'\n"); rc = rtas_set_indicator(ISOLATION_STATE, drc_index, UNISOLATE); if (rc) { int ret; rc = -1; say(ERROR, "Unisolate failed for drc %x with %d\n%s\n", drc_index, rc, set_indicator_error(rc)); ret = rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_UNUSABLE); if (ret) { say(ERROR, "Failed recovery to unusable state after " "unisolate failure for drc %x with %d\n%s\n", drc_index, ret, set_indicator_error(ret)); } } return rc; } > > > > > Regards, > > Bharata.