On Thu, Oct 09, 2014 at 01:53:16PM +0800, Chen, Tiejun wrote: > > > On 2014/10/7 15:26, Michael S. Tsirkin wrote: > >On Tue, Sep 30, 2014 at 10:43:09AM +0800, Chen, Tiejun wrote: > >>On 2014/9/29 18:01, Michael S. Tsirkin wrote: > >>>On Sun, Sep 28, 2014 at 10:59:05AM +0800, Chen, Tiejun wrote: > >>>>On 2014/9/3 9:40, Kay, Allen M wrote: > >>>>> > >>>>> > >>>>>>-----Original Message----- > >>>>>>From: Chen, Tiejun > >>>>>>Sent: Monday, September 01, 2014 12:50 AM > >>>>>>To: Michael S. Tsirkin > >>>>>>Cc: xen-de...@lists.xensource.com; Kay, Allen M; qemu-devel@nongnu.org; > >>>>>>Konrad Rzeszutek Wilk > >>>>>>Subject: Re: [Qemu-devel] [Xen-devel] [PATCH 2/2] xen:i386:pc_piix: > >>>>>>create > >>>>>>isa bridge specific to IGD passthrough > >>>>>> > >>>>>>On 2014/9/1 14:05, Michael S. Tsirkin wrote: > >>>>>>>On Mon, Sep 01, 2014 at 10:50:37AM +0800, Chen, Tiejun wrote: > >>>>>>>>On 2014/8/31 16:58, Michael S. Tsirkin wrote: > >>>>>>>>>On Fri, Aug 29, 2014 at 09:28:50AM +0800, Chen, Tiejun wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>On 2014/8/28 8:56, Chen, Tiejun wrote: > >>>>>>>>>>>>>>>>>+ */ > >>>>>>>>>>>>>>>>>+ dev = pci_create_simple(bus, PCI_DEVFN(0x1f, 0), > >>>>>>>>>>>>>>>>>+ > >>>>>>>>>>>>>>>>>"xen-igd-passthrough-isa-bridge"); > >>>>>>>>>>>>>>>>>+ if (dev) { > >>>>>>>>>>>>>>>>>+ r = xen_host_pci_device_get(&hdev, 0, 0, > >>>>>>>>>>>>>>>>>+ PCI_DEVFN(0x1f, > >>>>>>>>>>>>>>>>>0), 0); > >>>>>>>>>>>>>>>>>+ if (!r) { > >>>>>>>>>>>>>>>>>+ pci_config_set_vendor_id(dev->config, > >>>>>>hdev.vendor_id); > >>>>>>>>>>>>>>>>>+ pci_config_set_device_id(dev->config, > >>>>>>>>>>>>>>>>>+ hdev.device_id); > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>Can you, instead, implement the reverse logic, probing the card > >>>>>>>>>>>>>>and supplying the correct device id for PCH? > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>Here what is your so-called reverse logic as I already asked you > >>>>>>>>>>>>>previously? Do you mean I should list all PCHs with a combo > >>>>>>>>>>>>>illustrated with the vendor/device id in advance? Then look up > >>>>>>>>>>>>>if we can find a > >>>>>>>>>>>> > >>>>>>>>>>>>Michael, > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>Ping. > >>>>>>>>>>> > >>>>>>>>>>>Thanks > >>>>>>>>>>>Tiejun > >>>>>>>>>>> > >>>>>>>>>>>>Could you explain this exactly? Then I can try follow-up your > >>>>>>>>>>>>idea ASAP if this is necessary and possible. > >>>>>>>>>> > >>>>>>>>>>Michel, > >>>>>>>>>> > >>>>>>>>>>Could you give us some explanation for your "reverse logic" when > >>>>>>>>>>you're free? > >>>>>>>>>> > >>>>>>>>>>Thanks > >>>>>>>>>>Tiejun > >>>>>>>>> > >>>>>>>>>So future drivers will look at device ID for the card and figure out > >>>>>>>>>how things should work from there. > >>>>>>>>>Old drivers still poke at device id of the chipset for this, but > >>>>>>>>>maybe qemu can do what newer drivers do: > >>>>>>>>>look at the card and figure out what guest should do, then present > >>>>>>>>>the appropriate chipset id. > >>>>>>>>> > >>>>>>>>>This is based on what Jesse said: > >>>>>>>>> Practically speaking, we could probably assume specific GPU/PCH > >>>>>>combos, > >>>>>>>>> as I don't think they're generally mixed across generations, > >>>>>>>>> though > >>>>>>SNB > >>>>>>>>> and IVB did have compatible sockets, so there is the > >>>>>>>>> possibility of > >>>>>>>>> mixing CPT and PPT PCHs, but those are register identical as > >>>>>>>>> far as the > >>>>>>>>> graphics driver is concerned, so even that should be safe. > >>>>>>>>> > >>>>>>>> > >>>>>>>>Michael, > >>>>>>>> > >>>>>>>>Thanks for your explanation. > >>>>>>>> > >>>>>>>>>so the idea is to have a reverse table mapping GPU back to PCH. > >>>>>>>>>Present to guest the ID that will let it assume the correct GPU. > >>>>>>>> > >>>>>>>>I guess you mean we should create to maintain such a table: > >>>>>>>> > >>>>>>>>[GPU Card: device_id(s), PCH: device_id] > >>>>>>>> > >>>>>>>>Then each time, instead of exposing that real PCH device id directly, > >>>>>>>>qemu first can get the real GPU device id to lookup this table to > >>>>>>>>present a appropriate PCH's device id to the guest. > >>>>>>>> > >>>>>>>>And looks here that appropriate PCH's device id is not always a that > >>>>>>>>real PCH's device id. Right? If I'm wrong please correct me. > >>>>>>> > >>>>>>>Exactly: we don't really care what the PCH ID is - we only need it for > >>>>>>>the guest driver to do the right thing. > >>>>>> > >>>>>>Agreed. > >>>>>> > >>>>>>I need to ask Allen to check if one given GPU card device id is always > >>>>>>corresponding to one given PCH on both HSW and BDW currently. If yes, I > >>>>>>can > >>>>>>do this quickly. Otherwise I need some time to establish that sort > >>>>>>of connection. > >>>> > >>>>Michael, > >>>> > >>>>Sorry for this delay response but please keep in mind we really are in > >>>>this > >>>>process. > >>>> > >>>>You know this involve different GPU components we have to take some time > >>>>to > >>>>communicate or even discuss internal. > >>>> > >>>>Now I have a draft codes, could you take a look at this? I hope that > >>>>comment > >>>>can help us to understand something, but if you have any question, we can > >>>>further respond inline. > >>>> > >>>>--- > >>>>typedef struct { > >>>> uint16_t gpu_device_id; > >>>> uint16_t pch_device_id; > >>>>} XenIGDDeviceIDInfo; > >>>> > >>>>/* In real world different GPU should have different PCH. But actually > >>>> * the different PCH DIDs likely map to different PCH SKUs. We do the > >>>> * same thing for the GPU. For PCH, the different SKUs are going to be > >>>> * all the same silicon design and implementation, just different > >>>> * features turn on and off with fuses. The SW interfaces should be > >>>> * consistent across all SKUs in a given family (eg LPT). But just same > >>>> * features may not be supported. > >>>> * > >>>> * Most of these different PCH features probably don't matter to the > >>>> * Gfx driver, but obviously any difference in display port connections > >>>> * will so it should be fine with any PCH in case of passthrough. > >>>> * > >>>> * So currently use one PCH version, 0x8c4e, to cover all HSW scenarios, > >>>> * 0x9cc3 for BDW. > >>> > >>>Pls write Haswell and Broadwell in full in comment. > >> > >>Are you saying I should list all PCH device ids both of HSW and BDW here > >>like this format? > >> > >>HSWGT1D: 0xABCD > >>... > > > >I am merely saying don't say "HSW, BDW" in comments. Say "Haswell, > > Sorry I'm misunderstanding what you mean. > > >Broadwell, or if you like HWS(Haswell), BDW(Broadwell). > > Looks the latter is moderate. > > + * So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell) > + * scenarios, 0x9cc3 for BDW(Broadwell). > + */
Exactly. > > > > >>> > >>>> */ > >>>>static const XenIGDDeviceIDInfo xen_igd_combo_id_infos[] = { > >>>> /* HSW Classic */ > >>>> {0x0402, 0x8c4e}, /* HSWGT1D, HSWD_w7 */ > >>>> {0x0406, 0x8c4e}, /* HSWGT1M, HSWM_w7 */ > >>>> {0x0412, 0x8c4e}, /* HSWGT2D, HSWD_w7 */ > >>>> {0x0416, 0x8c4e}, /* HSWGT2M, HSWM_w7 */ > >>>> {0x041E, 0x8c4e}, /* HSWGT15D, HSWD_w7 */ > >>>> /* HSW ULT */ > >>>> {0x0A06, 0x8c4e}, /* HSWGT1UT, HSWM_w7 */ > >>>> {0x0A16, 0x8c4e}, /* HSWGT2UT, HSWM_w7 */ > >>>> {0x0A26, 0x8c4e}, /* HSWGT3UT, HSWM_w7 */ > >>>> {0x0A2E, 0x8c4e}, /* HSWGT3UT28W, HSWM_w7 */ > >>>> {0x0A1E, 0x8c4e}, /* HSWGT2UX, HSWM_w7 */ > >>>> {0x0A0E, 0x8c4e}, /* HSWGT1ULX, HSWM_w7 */ > >>>> /* HSW CRW */ > >>>> {0x0D26, 0x8c4e}, /* HSWGT3CW, HSWM_w7 */ > >>>> {0x0D22, 0x8c4e}, /* HSWGT3CWDT, HSWD_w7 */ > >>>> /* HSW Server */ > >>>> {0x041A, 0x8c4e}, /* HSWSVGT2, HSWD_w7 */ > >>>> /* HSW SRVR */ > >>>> {0x040A, 0x8c4e}, /* HSWSVGT1, HSWD_w7 */ > >>>> /* BSW */ > >>>> {0x1606, 0x9cc3}, /* BDWULTGT1, BDWM_w7 */ > >>>> {0x1616, 0x9cc3}, /* BDWULTGT2, BDWM_w7 */ > >>>> {0x1626, 0x9cc3}, /* BDWULTGT3, BDWM_w7 */ > >>>> {0x160E, 0x9cc3}, /* BDWULXGT1, BDWM_w7 */ > >>>> {0x161E, 0x9cc3}, /* BDWULXGT2, BDWM_w7 */ > >>>> {0x1602, 0x9cc3}, /* BDWHALOGT1, BDWM_w7 */ > >>>> {0x1612, 0x9cc3}, /* BDWHALOGT2, BDWM_w7 */ > >>>> {0x1622, 0x9cc3}, /* BDWHALOGT3, BDWM_w7 */ > >>>> {0x162B, 0x9cc3}, /* BDWHALO28W, BDWM_w7 */ > >>>> {0x162A, 0x9cc3}, /* BDWGT3WRKS, BDWM_w7 */ > >>>> {0x162D, 0x9cc3}, /* BDWGT3SRVR, BDWM_w7 */ > >>>>}; > >>>> > >>>>static void xen_igd_passthrough_pciisabridge_get_pci_device_id(Object > >>>>*obj, > >>>> Visitor > >>>> *v, > >>>> void > >>>> *opaque, > >>>> const char > >>>>*name, > >>>> Error > >>>> **errp) > >>>>{ > >>>> uint32_t value = 0; > >>>> XenHostPCIDevice hdev; > >>>> int r = 0, num; > >>>> > >>>> r = xen_host_pci_device_get(&hdev, 0, 0, 0x02, 0); > >>>> if (!r) { > >>>> value = hdev.device_id; > >>>> > >>>> num = sizeof(xen_igd_combo_id_infos)/sizeof(uint16_t); > >>>> for (r = 0; r < num; r++) > >>>> if (value == xen_igd_combo_id_infos[r].gpu_device_id) > >>>> value = xen_igd_combo_id_infos[r].pch_device_id; > >>> > >>>that's messy, I would use different variables for ID > >>>of GPU and PCH. > >> > >> for (r = 0; r < num; r++) > >> if (gpu_id == xen_igd_combo_id_infos[r].gpu_device_id) > >> pch_id = xen_igd_combo_id_infos[r].pch_device_id; > > > >yea > > > >>> > >>>> } > >>>> > >>>> visit_type_uint32(v, &value, name, errp); > >>> > >>> > >>>what I was suggesting is to start with the GPU device ID > >>>(you can get it e.g. from the config array). > >> > >>I think current codes is doing this: > >> > >> r = xen_host_pci_device_get(&hdev, 0, 0, 0x02, 0); > >> if (!r) { > >> gpu_id = hdev.device_id; > >> > >>Here xen_host_pci_device_get() is a wrapper of reading pci sysfs. > > > > > >Do you want the device ID of the GPU? > > Yes. > > >So please don't poke at it directly in sysfs. > >Instead, find the GPU's PCIDevice and do > > gpu->config_read(); > >or just get from gpu->config. > > Good, but how can we get gpu as pci_dev instance here? Then we can call > pci_dev->config_read as expect. You can't do this at host init but you can do this when gpu is added or at machine done. > > > >>>Use that to look up the PCH ID in xen_igd_combo_id_infos. > >>>If there, override the PCH ID. > >>> > >>>If not there, this is a new device so its driver will > >>>not look at PCH at all, we can make it whatever or > >>>skip it completely. > >> > >>I think we should return one initialized value, 0xffff, since often this > >>represents an invalid PCI device. > > > >I'm not sure you really want an invalid PCI device, > >would be nicer not to have a PCH device there at all. > > Yes, I means 0xffff indicates that there's no any PCI device. > > > > > > >>> > >>>This seems to almost do this, however > >>>- Why are you looking at host PCH device ID at all? > >>>- Why don't you look at the GPU device ID? > >> > >>We still fix this bridge at 1f.0, and current our implementation can cover > >>our requirement and safe. I means this bridge should not be used for other > >>use cases, so if its still be accessed we mightn't take care of them, right? > > > >I was asking about the host bridge device, not guest. > > Are you assuming what will happen to a new device? Yes, the assumption is a new devices will use a new driver which does everything through BAR, so it will not need these hacks. > > > >>> > >>> > >>>>} > >>>> > >>>>static void xen_igd_passthrough_isa_bridge_initfn(Object *obj) > >>>>{ > >>>> object_property_add(obj, "device-id", "int", > >>>> > >>>> xen_igd_passthrough_pciisabridge_get_pci_device_id, > >>>> NULL, NULL, NULL, NULL); > >>>>} > >>> > >>>OK and what reads this property? > >> > >>In sequent patch I will do something like this, > >> > >>@@ -464,6 +464,32 @@ static void pc_xen_hvm_init(MachineState *machine) > >> } > >> } > >> > >>+static void xen_igd_passthrough_isa_bridge_create(PCIBus *bus) > >>+{ > >>+ struct PCIDevice *dev; > >>+ Error *local_err = NULL; > >>+ uint16_t device_id = 0xffff; > >>+ > >>+ /* Currently IGD drivers always need to access PCH by 1f.0. */ > >>+ dev = pci_create_simple(bus, PCI_DEVFN(0x1f, 0), > >>+ "xen-igd-passthrough-isa-bridge"); > >>+ > >>+ /* Identify PCH card with its own real vendor/device ids. > >>+ * Here that vendor id is always PCI_VENDOR_ID_INTEL. > >>+ */ > >>+ if (dev) { > >>+ device_id = (uint16_t)object_property_get_int(OBJECT(dev), > >>+ "device-id", > >>+ &local_err); > > > >cast is not needed here. > > > >>+ if ((!local_err) && (device_id != 0xffff)) { > > > >too many (). > > + device_id = object_property_get_int(OBJECT(dev), "device-id", > + &local_err); > + if (!local_err && device_id != 0xffff) { > > > > >>+ pci_config_set_device_id(dev->config, device_id); > >>+ return; > >>+ } > >>+ } > >>+ > >>+ fprintf(stderr, "xen set xen-igd-passthrough-isa-bridge failed\n"); > >>+} > >>+ > >> static void xen_igd_passthrough_pc_hvm_init(MachineState *machine) > >> { > >> PCIBus *bus; > >>@@ -473,6 +499,7 @@ static void xen_igd_passthrough_pc_hvm_init(MachineState > >>*machine) > >> bus = pci_find_primary_bus(); > >> if (bus != NULL) { > >> pci_create_simple(bus, -1, "xen-platform"); > >>+ xen_igd_passthrough_isa_bridge_create(bus); > >> } > >> } > >> #endif > >> > >>Tiejun > >> > > > >By the way, I wonder: would it make sense to add this code in xen_pt_initfn? > > > >You would then do the following there: > > > > - look up GPU by device/vendor ID > > - if a legacy GPU is found, get the ID, translate > > to host bridge ID, and create the compat bridge. > > > >saves the hassle of using a distinct machine type for passthrough, no? > > As I remember Paolo thought we'd better have a separate machine specific to > IGD passthrough, so I'm not sure if its fine to him now. > > Thanks > Tiejun I think the point was mostly to reserve 1f to prevent devices from using it. As we populate slots in order it doesn't seem to important ... Paolo? > > > >>> > >>>>Thanks > >>>>Tiejun > >>>> > >>>> > >>>>>> > >>>>> > >>>>>If I understand this correctly, the only difference is instead of > >>>>>reading PCH DevID/RevID from the host hardware, QEMU inserts those > >>>>>values into PCH virtual device by looking at the reverse mapping table > >>>>>it maintains. > >>>>> > >>>>>I agree the downside of doing this is the reverse mapping table may be > >>>>>hard to maintain. > >>>>> > >>>>>What is the advantage of doing this instead of having QEMU reading it > >>>>>from the host? Is it to test to make sure reverse mapping methods works > >>>>>before it is adopted in the new drivers? > >>>>> > >>>>>>Thanks > >>>>>>Tiejun > >>>>>> > >>>>>>> > >>>>>>>>> > >>>>>>>>>the problem with these tables is they are hard to keep up to date > >>>>>>>> > >>>>>>>>Yeah. But I think currently we can just start from some modern CPU > >>>>>>>>such as HSW and BDW, then things could be easy. > >>>>>>>> > >>>>>>>>Allen, > >>>>>>>> > >>>>>>>>Any idea to this suggestion? > >>>>>>>> > >>>>>>>>>as new hardware comes out, but as future hardware won't need these > >>>>>>>>>hacks, we shall be fine. > >>>>>>>> > >>>>>>>>Yeah. > >>>>>>>> > >>>>>>>>Thanks > >>>>>>>>Tiejun > >>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>Thanks > >>>>>>>>>>>>Tiejun > >>>>>>>>>>>> > >>>>>>>>>>>>>matched PCH? If yes, what is that benefit you expect in > >>>>>>>>>>>>>passthrough case? Shouldn't we pass these info to VM directly in > >>>>>>passthrough case? > >>>>>>>>>>>>> > >>>>>>>>>>>>>Thanks > >>>>>>>>>>>>>Tiejun > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>>> > >>>>> > >>>>>Allen > >>>>> > >>>>> > >>> > > > >