Re: [PATCH v6,1/2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
On Thu, Aug 15, 2019 at 05:01:37PM +, Haiyang Zhang wrote: > Currently in Azure cloud, for passthrough devices, the host sets the device > instance ID's bytes 8 - 15 to a value derived from the host HWID, which is > the same on all devices in a VM. So, the device instance ID's bytes 8 and 9 > provided by the host are no longer unique. This affects all Azure hosts > since July 2018, and can cause device passthrough to VMs to fail because > the bytes 8 and 9 are used as PCI domain number. Collision of domain > numbers will cause the second device with the same domain number fail to > load. > > In the cases of collision, we will detect and find another number that is > not in use. > > Suggested-by: Michael Kelley > Signed-off-by: Haiyang Zhang > Acked-by: Sasha Levin > --- > drivers/pci/controller/pci-hyperv.c | 92 > +++-- > 1 file changed, 79 insertions(+), 13 deletions(-) I have applied both patches to pci/hv for v5.4. Thanks, Lorenzo > diff --git a/drivers/pci/controller/pci-hyperv.c > b/drivers/pci/controller/pci-hyperv.c > index 40b6254..31b8fd5 100644 > --- a/drivers/pci/controller/pci-hyperv.c > +++ b/drivers/pci/controller/pci-hyperv.c > @@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus) > complete(&hbus->remove_event); > } > > +#define HVPCI_DOM_MAP_SIZE (64 * 1024) > +static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE); > + > +/* > + * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0 > + * as invalid for passthrough PCI devices of this driver. > + */ > +#define HVPCI_DOM_INVALID 0 > + > +/** > + * hv_get_dom_num() - Get a valid PCI domain number > + * Check if the PCI domain number is in use, and return another number if > + * it is in use. > + * > + * @dom: Requested domain number > + * > + * return: domain number on success, HVPCI_DOM_INVALID on failure > + */ > +static u16 hv_get_dom_num(u16 dom) > +{ > + unsigned int i; > + > + if (test_and_set_bit(dom, hvpci_dom_map) == 0) > + return dom; > + > + for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) { > + if (test_and_set_bit(i, hvpci_dom_map) == 0) > + return i; > + } > + > + return HVPCI_DOM_INVALID; > +} > + > +/** > + * hv_put_dom_num() - Mark the PCI domain number as free > + * @dom: Domain number to be freed > + */ > +static void hv_put_dom_num(u16 dom) > +{ > + clear_bit(dom, hvpci_dom_map); > +} > + > /** > * hv_pci_probe() - New VMBus channel probe, for a root PCI bus > * @hdev:VMBus's tracking struct for this root PCI bus > @@ -2521,6 +2563,7 @@ static int hv_pci_probe(struct hv_device *hdev, > const struct hv_vmbus_device_id *dev_id) > { > struct hv_pcibus_device *hbus; > + u16 dom_req, dom; > int ret; > > /* > @@ -2535,19 +2578,34 @@ static int hv_pci_probe(struct hv_device *hdev, > hbus->state = hv_pcibus_init; > > /* > - * The PCI bus "domain" is what is called "segment" in ACPI and > - * other specs. Pull it from the instance ID, to get something > - * unique. Bytes 8 and 9 are what is used in Windows guests, so > - * do the same thing for consistency. Note that, since this code > - * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee > - * that (1) the only domain in use for something that looks like > - * a physical PCI bus (which is actually emulated by the > - * hypervisor) is domain 0 and (2) there will be no overlap > - * between domains derived from these instance IDs in the same > - * VM. > + * The PCI bus "domain" is what is called "segment" in ACPI and other > + * specs. Pull it from the instance ID, to get something usually > + * unique. In rare cases of collision, we will find out another number > + * not in use. > + * > + * Note that, since this code only runs in a Hyper-V VM, Hyper-V > + * together with this guest driver can guarantee that (1) The only > + * domain used by Gen1 VMs for something that looks like a physical > + * PCI bus (which is actually emulated by the hypervisor) is domain 0. > + * (2) There will be no overlap between domains (after fixing possible > + * collisions) in the same VM. >*/ > - hbus->sysdata.domain = hdev->dev_instance.b[9] | > -hdev->dev_instance.b[8] << 8; > + dom_req = hdev->dev_instance.b[8] << 8 | hdev->dev_instance.b[9]; > + dom = hv_get_dom_num(dom_req); > + > + if (dom == HVPCI_DOM_INVALID) { > + dev_err(&hdev->device, > + "Unable to use dom# 0x%hx or other numbers", dom_req); > + ret = -EINVAL; > + goto free_bus; > + } > + > + if (dom != dom_req) > + dev_info(&hdev->device, > + "PCI dom# 0x%hx has collision, using 0x%hx", > + dom_req, do
RE: [PATCH v6,1/2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
> -Original Message- > From: Lorenzo Pieralisi > Sent: Friday, August 16, 2019 5:52 AM > To: Haiyang Zhang > Cc: sas...@kernel.org; bhelg...@google.com; linux- > hyp...@vger.kernel.org; linux-...@vger.kernel.org; KY Srinivasan > ; Stephen Hemminger ; > o...@aepfle.de; vkuznets ; linux- > ker...@vger.kernel.org > Subject: Re: [PATCH v6,1/2] PCI: hv: Detect and fix Hyper-V PCI domain > number collision > > On Thu, Aug 15, 2019 at 05:01:37PM +, Haiyang Zhang wrote: > > Currently in Azure cloud, for passthrough devices, the host sets the > > device instance ID's bytes 8 - 15 to a value derived from the host > > HWID, which is the same on all devices in a VM. So, the device > > instance ID's bytes 8 and 9 provided by the host are no longer unique. > > This affects all Azure hosts since July 2018, and can cause device > > passthrough to VMs to fail because the bytes 8 and 9 are used as PCI > > domain number. Collision of domain numbers will cause the second > > device with the same domain number fail to load. > > > > In the cases of collision, we will detect and find another number that > > is not in use. > > > > Suggested-by: Michael Kelley > > Signed-off-by: Haiyang Zhang > > Acked-by: Sasha Levin > > I assume you will take care of backporting and sending this patch to stable > kernels given that you have not applied any tag with such request. > > I appreciate it may not be easy to define but a Fixes: tag would help. Sure, I will add a Fixes tag, and Cc stable. Usually Sasha from our team will do the stable porting in batches. Thanks, - Haiyang
Re: [PATCH v6,1/2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
On Thu, Aug 15, 2019 at 05:01:37PM +, Haiyang Zhang wrote: > Currently in Azure cloud, for passthrough devices, the host sets the device > instance ID's bytes 8 - 15 to a value derived from the host HWID, which is > the same on all devices in a VM. So, the device instance ID's bytes 8 and 9 > provided by the host are no longer unique. This affects all Azure hosts > since July 2018, and can cause device passthrough to VMs to fail because > the bytes 8 and 9 are used as PCI domain number. Collision of domain > numbers will cause the second device with the same domain number fail to > load. > > In the cases of collision, we will detect and find another number that is > not in use. > > Suggested-by: Michael Kelley > Signed-off-by: Haiyang Zhang > Acked-by: Sasha Levin I assume you will take care of backporting and sending this patch to stable kernels given that you have not applied any tag with such request. I appreciate it may not be easy to define but a Fixes: tag would help. Thanks, Lorenzo > --- > drivers/pci/controller/pci-hyperv.c | 92 > +++-- > 1 file changed, 79 insertions(+), 13 deletions(-) > > diff --git a/drivers/pci/controller/pci-hyperv.c > b/drivers/pci/controller/pci-hyperv.c > index 40b6254..31b8fd5 100644 > --- a/drivers/pci/controller/pci-hyperv.c > +++ b/drivers/pci/controller/pci-hyperv.c > @@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus) > complete(&hbus->remove_event); > } > > +#define HVPCI_DOM_MAP_SIZE (64 * 1024) > +static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE); > + > +/* > + * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0 > + * as invalid for passthrough PCI devices of this driver. > + */ > +#define HVPCI_DOM_INVALID 0 > + > +/** > + * hv_get_dom_num() - Get a valid PCI domain number > + * Check if the PCI domain number is in use, and return another number if > + * it is in use. > + * > + * @dom: Requested domain number > + * > + * return: domain number on success, HVPCI_DOM_INVALID on failure > + */ > +static u16 hv_get_dom_num(u16 dom) > +{ > + unsigned int i; > + > + if (test_and_set_bit(dom, hvpci_dom_map) == 0) > + return dom; > + > + for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) { > + if (test_and_set_bit(i, hvpci_dom_map) == 0) > + return i; > + } > + > + return HVPCI_DOM_INVALID; > +} > + > +/** > + * hv_put_dom_num() - Mark the PCI domain number as free > + * @dom: Domain number to be freed > + */ > +static void hv_put_dom_num(u16 dom) > +{ > + clear_bit(dom, hvpci_dom_map); > +} > + > /** > * hv_pci_probe() - New VMBus channel probe, for a root PCI bus > * @hdev:VMBus's tracking struct for this root PCI bus > @@ -2521,6 +2563,7 @@ static int hv_pci_probe(struct hv_device *hdev, > const struct hv_vmbus_device_id *dev_id) > { > struct hv_pcibus_device *hbus; > + u16 dom_req, dom; > int ret; > > /* > @@ -2535,19 +2578,34 @@ static int hv_pci_probe(struct hv_device *hdev, > hbus->state = hv_pcibus_init; > > /* > - * The PCI bus "domain" is what is called "segment" in ACPI and > - * other specs. Pull it from the instance ID, to get something > - * unique. Bytes 8 and 9 are what is used in Windows guests, so > - * do the same thing for consistency. Note that, since this code > - * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee > - * that (1) the only domain in use for something that looks like > - * a physical PCI bus (which is actually emulated by the > - * hypervisor) is domain 0 and (2) there will be no overlap > - * between domains derived from these instance IDs in the same > - * VM. > + * The PCI bus "domain" is what is called "segment" in ACPI and other > + * specs. Pull it from the instance ID, to get something usually > + * unique. In rare cases of collision, we will find out another number > + * not in use. > + * > + * Note that, since this code only runs in a Hyper-V VM, Hyper-V > + * together with this guest driver can guarantee that (1) The only > + * domain used by Gen1 VMs for something that looks like a physical > + * PCI bus (which is actually emulated by the hypervisor) is domain 0. > + * (2) There will be no overlap between domains (after fixing possible > + * collisions) in the same VM. >*/ > - hbus->sysdata.domain = hdev->dev_instance.b[9] | > -hdev->dev_instance.b[8] << 8; > + dom_req = hdev->dev_instance.b[8] << 8 | hdev->dev_instance.b[9]; > + dom = hv_get_dom_num(dom_req); > + > + if (dom == HVPCI_DOM_INVALID) { > + dev_err(&hdev->device, > + "Unable to use dom# 0x%hx or other numbers", dom_req); > + ret = -EINVAL; > + goto free_bus; > + } > + > +