On Mon, 2013-12-16 at 17:48 -0700, Bjorn Helgaas wrote:
> On Tue, Dec 10, 2013 at 11:48:45AM -0700, Alex Williamson wrote:
> > While we don't really have any infrastructure for making use of VC
> > support, the system BIOS can configure the topology to non-default
> > VC values prior to boot.  This may be due to silicon bugs, desire to
> > reserve traffic classes, or perhaps just BIOS bugs.  When we reset
> > devices, the VC configuration may return to default values, which can
> > be incompatible with devices upstream.  For instance, Nvidia GRID
> > cards provide a PCIe switch and some number of GPUs, all supporting
> > VC.  The power-on default for VC is to support TC0-7 across VC0,
> > however some platforms will only enable TC0/VC0 mapping across the
> > topology.  When we do a secondary bus reset on the downstream switch
> > port, the GPU is reset to a TC0-7/VC0 mapping while the opposite end
> > of the link only enables TC0/VC0.  If the GPU attempts to use TC1-7,
> > it fails.
> > 
> > This patch attempts to provide complete support for VC save/restore,
> > even beyond the minimally required use case above.  This includes
> > save/restore and reload of the arbitration table, save/restore and
> > reload of the port arbitration tables, and re-enabling of the
> > channels for VC, VC9, and MFVC capabilities.
> > 
> > Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> > ---
> >  drivers/pci/pci.c |  387 
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> Wow, that's a lot of code just to support resetting a device.  I wish I had
> a better idea, but I don't.

I know, me too.  I tried to keep the code as compact as I could.  If you
look at it on a per capability basis, since this supports VC/VC9/MFVC,
it's not so bad ;)

>   I know we have similar save/restore code in
> pci.c already, but would it make sense to put this in a vc.c to keep pci.c
> from growing without bound?

Yep, I can do that.

> >  1 file changed, 387 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index a898e4e..3a1d060 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -969,6 +969,367 @@ static void pci_restore_pcix_state(struct pci_dev 
> > *dev)
> >     pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
> >  }
> >  
> > +/**
> > + * pci_vc_save_restore_dwords - Save or restore a series of dwords
> > + * @dev: device
> > + * @pos: starting config space position
> > + * @buf: buffer to save to or restore from
> > + * @dwords: number of dwords to save/restore
> > + * @save: whether to save or restore
> > + */
> > +static void pci_vc_save_restore_dwords(struct pci_dev *dev, int pos,
> > +                                  u32 *buf, int dwords, bool save)
> 
> Nothing VC-specific here, so maybe remove "_vc_" from the name.

Sure, but if I make a vc.c there's no reason not to move this function
there so retaining _vc_ avoids polluting the common pci namespace.

> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < dwords; i++, buf++) {
> > +           if (save)
> > +                   pci_read_config_dword(dev, pos + (i * 4), buf);
> > +           else
> > +                   pci_write_config_dword(dev, pos + (i * 4), *buf);
> > +   }
> > +}
> > +
> > +/**
> > + * pci_vc_load_port_arb_table - load and wait for VC arbitration table
> > + * @dev: device
> > + * @pos: starting position of VC capability (VC/VC9/MFVC)
> > + *
> > + * Signal a VC arbitration table to load and wait for completion.
> > + */
> > +static void pci_vc_load_arb_table(struct pci_dev *dev, int pos)
> > +{
> > +   u32 ctrl;
> > +
> > +   pci_read_config_dword(dev, pos + PCI_VC_PORT_CTRL, &ctrl);
> > +   pci_write_config_dword(dev, pos + PCI_VC_PORT_CTRL, ctrl | 1);
> > +   if (pci_wait_for_pending(dev, pos + PCI_VC_PORT_STATUS, 1))
> 
> Comment doesn't match function name.  The spec "Request hardware to apply
> new values" language would be a useful clue that this doesn't actually
> *write* the table; before reading the spec, I initially looked for a
> buffer containing the arbitration table.  It'd be nice to have #defines
> for the bits here, i.e., PCI_VC_PORT_CTRL_LOAD_ARB or similar.  The

Ok

> spec says these are 16-bit registers; shouldn't these be "word" accesses?

The control registers are 32-bit, the status registers are 16-bit.
pci_wait_for_pending uses word access.

> > +           return;
> > +
> > +   dev_err(&dev->dev, "VC arbitration table failed to load\n");
> > +}
> > +
> > +/**
> > + * pci_vc_load_port_arb_table - Load and wait for VC port arbitration table
> > + * @dev: device
> > + * @pos: starting position of VC capability (VC/VC9/MFVC)
> > + * @res: VC res number, ie. VCn (0-7)
> > + *
> > + * Signal a VC port arbitration table to load and wait for completion.
> > + */
> > +static void pci_vc_load_port_arb_table(struct pci_dev *dev, int pos, int 
> > res)
> > +{
> > +   int ctrl_pos, status_pos;
> > +   u32 ctrl;
> > +
> > +   ctrl_pos = pos + PCI_VC_RES_CTRL + (res * PCI_CAP_VC_PER_VC_SIZEOF);
> > +   status_pos = pos + PCI_VC_RES_STATUS + (res * PCI_CAP_VC_PER_VC_SIZEOF);
> > +
> > +   pci_read_config_dword(dev, ctrl_pos, &ctrl);
> > +   pci_write_config_dword(dev, ctrl_pos, ctrl | (1 << 16));
> > +
> > +   if (pci_wait_for_pending(dev, status_pos, 1))
> 
> #defines, please, to help grep users later.

Yep

> > +           return;
> > +
> > +   dev_err(&dev->dev, "VC%d port arbitration table failed to load\n", res);
> > +}
> > +
> > +/**
> > + * pci_vc_enable - Enable virtual channel
> > + * @dev: device
> > + * @pos: starting position of VC capability (VC/VC9/MFVC)
> > + * @res: VC res number, ie. VCn (0-7)
> > + *
> > + * A VC is enabled by setting the enable bit in matching resource control
> > + * registers on both sides of a link.  We therefore need to find the 
> > opposite
> > + * end of the link.  To keep this simple we enable from the downstream 
> > device.
> > + * RC devices do not have an upstream device, nor does it seem that VC9 do
> > + * (spec is unclear).  Once we find the upstream device, match the VC ID to
> > + * get the correct resource, disable and enable on both ends.
> > + */
> > +static void pci_vc_enable(struct pci_dev *dev, int pos, int res)
> > +{
> > +   int ctrl_pos, status_pos, id, pos2, evcc, i, ctrl_pos2, status_pos2;
> > +   u32 ctrl, header, reg1, ctrl2;
> > +   struct pci_dev *link = NULL;
> > +
> > +   /* Enable VCs from the downstream device */
> > +   if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > +       pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)
> > +           return;
> > +
> > +   ctrl_pos = pos + PCI_VC_RES_CTRL + (res * PCI_CAP_VC_PER_VC_SIZEOF);
> > +   status_pos = pos + PCI_VC_RES_STATUS + (res * PCI_CAP_VC_PER_VC_SIZEOF);
> > +
> > +   pci_read_config_dword(dev, ctrl_pos, &ctrl);
> > +   id = (ctrl >> 24) & 0x7;
> 
> #define PCI_VC_RES_CTRL_ID 0x07000000.
> 
> It gets a little cumbersome to have #defines for *both* the mask and the
> shift; the compromise I like is to have a #define for the mask (which shows
> the position in the register) and use bare numbers when we need to shift.
> Then it's easy to find uses of the field with grep or cscope, and the mask
> definition helps manual decoding.  There are several other opportunities
> for mask #defines below.

Ok, figuring out the define for mask vs shift did play a part in doing
neither.  I'll follow your standard.

> In this case, you only compare ID values, so there's actually no need to
> shift it.

Good point.

> > +
> > +   pci_read_config_dword(dev, pos, &header);
> > +
> > +   /* If there is no opposite end of the link, skip to enable */
> > +   if (PCI_EXT_CAP_ID(header) == PCI_EXT_CAP_ID_VC9 ||
> > +       pci_is_root_bus(dev->bus))
> > +           goto enable;
> > +
> > +   pos2 = pci_find_ext_capability(dev->bus->self, PCI_EXT_CAP_ID_VC);
> > +   if (pos2 <= 0)
> > +           goto enable;
> 
> I don't think < 0 is possible (several other occurrences below, too).

Copied from elsewhere, but no need to propagate poor form.

> > +
> > +   pci_read_config_dword(dev->bus->self, pos2 + PCI_VC_PORT_REG1, &reg1);
> > +   evcc = reg1 & PCI_VC_REG1_EVCC;
> > +
> > +   /* VC0 is hardwired enabled, so we can start with 1 */
> > +   for (i = 1; i < evcc + 1; i++) {
> > +           ctrl_pos2 = pos2 + PCI_VC_RES_CTRL +
> > +                           (i * PCI_CAP_VC_PER_VC_SIZEOF);
> > +           status_pos2 = pos2 + PCI_VC_RES_STATUS +
> > +                           (i * PCI_CAP_VC_PER_VC_SIZEOF);
> > +           pci_read_config_dword(dev->bus->self, ctrl_pos2, &ctrl2);
> > +           if (((ctrl2 >> 24) & 0x7) == id) {
> > +                   link = dev->bus->self;
> > +                   break;
> > +           }
> > +   }
> > +
> > +   if (!link)
> > +           goto enable;
> > +
> > +   /* Disable if enabled */
> > +   if (ctrl2 & (1U << 31)) {
> > +           ctrl2 &= ~(1U << 31);
> > +           pci_write_config_dword(link, ctrl_pos2, ctrl2);
> > +   }
> > +
> > +   /* Enable on both ends */
> > +   ctrl2 |= 1U << 31;
> > +   pci_write_config_dword(link, ctrl_pos2, ctrl2);
> > +enable:
> > +   ctrl |= 1U << 31;
> > +   pci_write_config_dword(dev, ctrl_pos, ctrl);
> > +
> > +   if (!pci_wait_for_pending(dev, status_pos, 0x2))
> > +           dev_err(&dev->dev, "VC%d negotiation stuck pending\n", id);
> > +
> > +   if (link && !pci_wait_for_pending(link, status_pos2, 0x2))
> > +           dev_err(&link->dev, "VC%d negotiation stuck pending\n", id);
> > +}
> > +
> > +/**
> > + * pci_vc_do_save_buffer - Size, save, or restore VC state
> > + * @dev: device
> > + * @pos: starting position of VC capability (VC/VC9/MFVC)
> > + * @save_state: buffer for save/restore
> > + * @name: for error message
> > + * @save: if provided a buffer, this indicates what to do with it
> > + *
> > + * Walking Virtual Channel config space to size, save, or restore it
> > + * is complicated, so we do it all from one function to reduce code and
> > + * guarantee ordering matches in the buffer.  When called with NULL
> > + * @save_state, return the size of the necessary save buffer.  When called
> > + * with a non-NULL @save_state, @save determines whether we save to the
> > + * buffer or restore from it.
> > + */
> > +static int pci_vc_do_save_buffer(struct pci_dev *dev, int pos,
> > +                            struct pci_cap_saved_state *save_state,
> > +                            bool save)
> > +{
> > +   u32 reg1;
> > +   char evcc, lpevcc, parb_size;
> > +   int i, len = 0;
> > +   u32 *buf = save_state ? save_state->cap.data : NULL;
> > +
> > +   /* Sanity check buffer size for save/restore */
> > +   if (buf && save_state->cap.size !=
> > +       pci_vc_do_save_buffer(dev, pos, NULL, save)) {
> > +           dev_err(&dev->dev,
> > +                   "VC save buffer size does not match @0x%x\n", pos);
> > +           return -ENOMEM;
> > +   }
> > +
> > +   pci_read_config_dword(dev, pos + PCI_VC_PORT_REG1, &reg1);
> > +   /* Extended VC Count (not counting VC0) */
> > +   evcc = reg1 & PCI_VC_REG1_EVCC;
> > +   /* Low Priority Extended VC Count (not counting VC0) */
> > +   lpevcc = (reg1 >> 4) & PCI_VC_REG1_EVCC;
> 
> Please make a new #define, since LPEVCC is a separate field.

Ok

Thanks for the review,

Alex

> +     /* Port Arbitration Table Entry Size (bits) */
> > +   parb_size = 1 << ((reg1 >> 10) & 0x3);
> > +
> > +   /*
> > +    * Port VC Control Register contains VC Arbitration Select, which
> > +    * cannot be modified when more than one LPVC is in operation.  We
> > +    * therefore save/restore it first, as only VC0 should be enabled
> > +    * after device reset.
> > +    */
> > +   if (buf) {
> > +           pci_vc_save_restore_dwords(dev, pos + PCI_VC_PORT_CTRL,
> > +                                      buf, 1, save);
> > +           buf++;
> > +   }
> > +   len += 4;
> > +
> > +   /*
> > +    * If we have any Low Priority VCs and a VC Arbitration Table Offset
> > +    * in Port VC Capability Register 2 then save/restore it next.
> > +    */
> > +   if (lpevcc) {
> > +           u32 reg2;
> > +           int vcarb_offset;
> > +
> > +           pci_read_config_dword(dev, pos + PCI_VC_PORT_REG2, &reg2);
> > +           vcarb_offset = (reg2 >> 24) * 16;
> > +
> > +           if (vcarb_offset) {
> > +                   int size, vcarb_phases = 0;
> > +
> > +                   if (reg2 & PCI_VC_REG2_128_PHASE)
> > +                           vcarb_phases = 128;
> > +                   else if (reg2 & PCI_VC_REG2_64_PHASE)
> > +                           vcarb_phases = 64;
> > +                   else if (reg2 & PCI_VC_REG2_32_PHASE)
> > +                           vcarb_phases = 32;
> > +
> > +                   /* Fixed 4 bits per phase per lpevcc (plus VC0) */
> > +                   size = ((lpevcc + 1) * vcarb_phases * 4) / 8;
> > +
> > +                   if (size && buf) {
> > +                           pci_vc_save_restore_dwords(dev,
> > +                                                      pos + vcarb_offset,
> > +                                                      buf, size / 4, save);
> > +                           /*
> > +                            * On restore, we need to signal hardware to
> > +                            * re-load the VC Arbitration Table.
> > +                            */
> > +                           if (!save)
> > +                                   pci_vc_load_arb_table(dev, pos);
> > +
> > +                           buf += size / 4;
> > +                   }
> > +                   len += size;
> > +           }
> > +   }
> > +
> > +   /*
> > +    * In addition to each VC Resource Control Register, we may have a
> > +    * Port Arbitration Table attached to each VC.  The Port Arbitration
> > +    * Table Offset in each VC Resource Capability Register tells us if
> > +    * it exists.  The entry size is global from the Port VC Capability
> > +    * Register1 above.  The number of phases is determined per VC.
> > +    */
> > +   for (i = 0; i < evcc + 1; i++) {
> > +           u32 cap;
> > +           int parb_offset;
> > +
> > +           pci_read_config_dword(dev, pos + PCI_VC_RES_CAP +
> > +                                 (i * PCI_CAP_VC_PER_VC_SIZEOF), &cap);
> > +           parb_offset = (cap >> 24) * 16;
> > +           if (parb_offset) {
> > +                   int size, parb_phases = 0;
> > +
> > +                   if (cap & 0x20)
> > +                           parb_phases = 256;
> > +                   else if (cap & 0x18)
> > +                           parb_phases = 128;
> > +                   else if (cap & 0x4)
> > +                           parb_phases = 64;
> > +                   else if (cap & 0x2)
> > +                           parb_phases = 32;
> > +
> > +                   size = (parb_size * parb_phases) / 8;
> > +
> > +                   if (size && buf) {
> > +                           pci_vc_save_restore_dwords(dev,
> > +                                                      pos + parb_offset,
> > +                                                      buf, size / 4, save);
> > +                           buf += size / 4;
> > +                   }
> > +                   len += size;
> > +           }
> > +
> > +           /* VC Resource Control Register */
> > +           if (buf) {
> > +                   int ctrl_pos = pos + PCI_VC_RES_CTRL +
> > +                                           (i * PCI_CAP_VC_PER_VC_SIZEOF);
> > +                   if (save)
> > +                           pci_read_config_dword(dev, ctrl_pos, buf);
> > +                   else {
> > +                           u32 tmp, ctrl = *(u32 *)buf;
> > +                           /*
> > +                            * For an FLR case, the VC config may remain.
> > +                            * Preserve enable bit, restore the rest.
> > +                            */
> > +                           pci_read_config_dword(dev, ctrl_pos, &tmp);
> > +                           tmp &= 1U << 31;
> > +                           tmp |= ctrl & ~(1U << 31);
> > +                           pci_write_config_dword(dev, ctrl_pos, tmp);
> > +                           /* Load port arbitration table if used */
> > +                           if (ctrl & (7 << 17))
> > +                                   pci_vc_load_port_arb_table(dev, pos, i);
> > +                           /* Re-enable if needed */
> > +                           if ((ctrl ^ tmp) & (1U << 31))
> > +                                   pci_vc_enable(dev, pos, i);
> > +                   }
> > +                   buf++;
> > +           }
> > +           len += 4;
> > +   }
> > +
> > +   return buf ? 0 : len;
> > +}
> > +
> > +static struct {
> > +   u16 id;
> > +   const char *name;
> > +} vc_caps[] = { { PCI_EXT_CAP_ID_MFVC, "MFVC" },
> > +           { PCI_EXT_CAP_ID_VC, "VC" },
> > +           { PCI_EXT_CAP_ID_VC9, "VC9" } };
> > +
> > +static int pci_save_vc_state(struct pci_dev *dev)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
> > +           int pos, ret;
> > +           struct pci_cap_saved_state *save_state;
> > +
> > +           pos = pci_find_ext_capability(dev, vc_caps[i].id);
> > +           if (pos <= 0)
> > +                   continue;
> > +
> > +           save_state = pci_find_saved_ext_cap(dev, vc_caps[i].id);
> > +           if (!save_state) {
> > +                   dev_err(&dev->dev, "%s buffer not found in %s\n",
> > +                           vc_caps[i].name, __func__);
> > +                   return -ENOMEM;
> > +           }
> > +
> > +           ret = pci_vc_do_save_buffer(dev, pos, save_state, true);
> > +           if (ret) {
> > +                   dev_err(&dev->dev, "%s save unsuccessful %s\n",
> > +                           vc_caps[i].name, __func__);
> > +                   return ret;
> > +           }
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> > +static void pci_restore_vc_state(struct pci_dev *dev)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
> > +           int pos;
> > +           struct pci_cap_saved_state *save_state;
> > +
> > +           pos = pci_find_ext_capability(dev, vc_caps[i].id);
> > +           save_state = pci_find_saved_ext_cap(dev, vc_caps[i].id);
> > +           if (!save_state || pos <= 0)
> > +                   continue;
> > +
> > +           pci_vc_do_save_buffer(dev, pos, save_state, false);
> > +   }
> > +}
> > +
> >  
> >  /**
> >   * pci_save_state - save the PCI configuration space of a device before 
> > suspending
> > @@ -986,6 +1347,8 @@ pci_save_state(struct pci_dev *dev)
> >             return i;
> >     if ((i = pci_save_pcix_state(dev)) != 0)
> >             return i;
> > +   if ((i = pci_save_vc_state(dev)) != 0)
> > +           return i;
> >     return 0;
> >  }
> >  
> > @@ -1048,6 +1411,7 @@ void pci_restore_state(struct pci_dev *dev)
> >     /* PCI Express register must be restored first */
> >     pci_restore_pcie_state(dev);
> >     pci_restore_ats_state(dev);
> > +   pci_restore_vc_state(dev);
> >  
> >     pci_restore_config_space(dev);
> >  
> > @@ -2104,6 +2468,27 @@ static int pci_add_ext_cap_save_buffer(struct 
> > pci_dev *dev,
> >     return _pci_add_cap_save_buffer(dev, cap, true, size);
> >  }
> >  
> > +static void pci_allocate_vc_save_buffers(struct pci_dev *dev)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
> > +           int error, pos = pci_find_ext_capability(dev, vc_caps[i].id);
> > +           int len;
> > +
> > +           if (pos <= 0)
> > +                   continue;
> > +
> > +           len = pci_vc_do_save_buffer(dev, pos, NULL, false);
> > +           error = pci_add_ext_cap_save_buffer(dev,
> > +                                               vc_caps[i].id, len);
> > +           if (error)
> > +                   dev_err(&dev->dev,
> > +                           "unable to preallocate %s save buffer\n",
> > +                           vc_caps[i].name);
> > +   }
> > +}
> > +
> >  /**
> >   * pci_allocate_cap_save_buffers - allocate buffers for saving capabilities
> >   * @dev: the PCI device
> > @@ -2122,6 +2507,8 @@ void pci_allocate_cap_save_buffers(struct pci_dev 
> > *dev)
> >     if (error)
> >             dev_err(&dev->dev,
> >                     "unable to preallocate PCI-X save buffer\n");
> > +
> > +   pci_allocate_vc_save_buffers(dev);
> >  }
> >  
> >  void pci_free_cap_save_buffers(struct pci_dev *dev)
> > 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to