Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s
On Thu, Aug 17, 2017 at 11:17:32AM -0600, Jason Gunthorpe wrote:
> On Thu, Aug 17, 2017 at 12:38:07PM +0200, Sebastian Andrzej Siewior wrote:
> > > I worry a bit about "appears to fix". It seems odd that the TPM device
> > > driver would be the first code to uncover this. Can anyone confirm that
> > > the chipset does indeed have this bug?
> >
> > What Haris says makes sense. It is just that not all architectures
> > accumulate/batch writes to HW.
>
> It doesn't seem that odd to me. In modern Intel chipsets the physical
> LPC bus is used for very little. Maybe some flash and possibly a
> Winbond super I/O at worst? Plus the TPM.
>
> I can't confirm what Intel has done, but if writes are posted, then it
> is not a 'bug', but expected operation for a PCI/LPC bridge device to
> have an ordered queue of posted writes, and thus higher latency when
> processing reads due to ordering requirements.
>
> Other drivers may not see it because most LPC usages would not be
> write heavy, or might use IO instructions, which are not posted.
>
> I can confirm that my ARM systems with a custom PCI-LPC bridge will
> have exactly the same problem, and that the readl is the only
> solution.
>
> This is because writes to LPC are posted over PCI and will be buffered
> in the root complex, device end port, and internally in the LPC
> bridge. Since they are posted there is no way for the CPU to know when
> they complete and when it would be 'low latency' to issue a read.

What you say makes sense to me, but wasn't the patch tried with an SPI TPM,
not LPC? Anyway, what you're saying probably still applies.

/Jarkko
Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s
Neither wmb() nor mb() have any effect when substituted for
ioread8(iobase + TPM_ACCESS(0)) in tpm_tis_flush(). I still see
300 - 400 us spikes in cyclictest when invoking my TPM chip's RNG.

-- Haris

On 08/17/2017 12:17 PM, Jason Gunthorpe wrote:
> On Thu, Aug 17, 2017 at 12:38:07PM +0200, Sebastian Andrzej Siewior wrote:
> > > I worry a bit about "appears to fix". It seems odd that the TPM device
> > > driver would be the first code to uncover this. Can anyone confirm that
> > > the chipset does indeed have this bug?
> >
> > What Haris says makes sense. It is just that not all architectures
> > accumulate/batch writes to HW.
>
> It doesn't seem that odd to me. In modern Intel chipsets the physical
> LPC bus is used for very little. Maybe some flash and possibly a
> Winbond super I/O at worst? Plus the TPM.
>
> I can't confirm what Intel has done, but if writes are posted, then it
> is not a 'bug', but expected operation for a PCI/LPC bridge device to
> have an ordered queue of posted writes, and thus higher latency when
> processing reads due to ordering requirements.
>
> Other drivers may not see it because most LPC usages would not be
> write heavy, or might use IO instructions, which are not posted.
>
> I can confirm that my ARM systems with a custom PCI-LPC bridge will
> have exactly the same problem, and that the readl is the only
> solution.
>
> This is because writes to LPC are posted over PCI and will be buffered
> in the root complex, device end port, and internally in the LPC
> bridge. Since they are posted there is no way for the CPU to know when
> they complete and when it would be 'low latency' to issue a read.
>
> > So powerpc (for instance) has a sync operation after each write to HW. I
> > am wondering if we could need something like that on x86.
>
> Even on something like PPC, 'sync' is not defined to globally flush
> posted writes, and will not help. WMB is probably similar.
>
> Jason
Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s
On Thu, Aug 17, 2017 at 12:38:07PM +0200, Sebastian Andrzej Siewior wrote:
> > I worry a bit about "appears to fix". It seems odd that the TPM device
> > driver would be the first code to uncover this. Can anyone confirm that
> > the chipset does indeed have this bug?
>
> What Haris says makes sense. It is just that not all architectures
> accumulate/batch writes to HW.

It doesn't seem that odd to me. In modern Intel chipsets the physical
LPC bus is used for very little. Maybe some flash and possibly a
Winbond super I/O at worst? Plus the TPM.

I can't confirm what Intel has done, but if writes are posted, then it
is not a 'bug', but expected operation for a PCI/LPC bridge device to
have an ordered queue of posted writes, and thus higher latency when
processing reads due to ordering requirements.

Other drivers may not see it because most LPC usages would not be
write heavy, or might use IO instructions, which are not posted.

I can confirm that my ARM systems with a custom PCI-LPC bridge will
have exactly the same problem, and that the readl is the only
solution.

This is because writes to LPC are posted over PCI and will be buffered
in the root complex, device end port, and internally in the LPC
bridge. Since they are posted there is no way for the CPU to know when
they complete and when it would be 'low latency' to issue a read.

> So powerpc (for instance) has a sync operation after each write to HW. I
> am wondering if we could need something like that on x86.

Even on something like PPC, 'sync' is not defined to globally flush
posted writes, and will not help. WMB is probably similar.

Jason
Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s
On 2017-08-16 17:15:55 [-0400], Ken Goldman wrote:
> On 8/15/2017 4:13 PM, Haris Okanovic wrote:
> > ioread8() operations to TPM MMIO addresses can stall the CPU when
> > immediately following a sequence of iowrite*()s to the same region.
> >
> > For example, cyclictest measures ~400us latency spikes when a non-RT
> > usermode application communicates with an SPI-based TPM chip (Intel Atom
> > E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
> > stalling ioread8() operation following a sequence of 30+ iowrite8()s to
> > the same address. I believe this happens because the write sequence is
> > buffered (in the CPU or somewhere along the bus), and gets flushed on
> > the first LOAD instruction (ioread*()) that follows.
> >
> > The enclosed change appears to fix this issue: read the TPM chip's
> > access register (status code) after every iowrite*() operation to
> > amortize the cost of flushing data to the chip across multiple
> > instructions.

Haris, could you try a wmb() instead of the read?

> I worry a bit about "appears to fix". It seems odd that the TPM device
> driver would be the first code to uncover this. Can anyone confirm that
> the chipset does indeed have this bug?

What Haris says makes sense. It is just that not all architectures
accumulate/batch writes to HW.

> I'd also like an indication of the performance penalty. We're doing a
> lot of work to improve the performance and I worry that "do a read after
> every write" will have a performance impact.

So powerpc (for instance) has a sync operation after each write to HW. I
am wondering if we could need something like that on x86.

Sebastian
Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s
On Wednesday 16 August 2017 17:15:55, Ken Goldman wrote:
> On 8/15/2017 4:13 PM, Haris Okanovic wrote:
> > ioread8() operations to TPM MMIO addresses can stall the CPU when
> > immediately following a sequence of iowrite*()s to the same region.
> >
> > For example, cyclictest measures ~400us latency spikes when a non-RT
> > usermode application communicates with an SPI-based TPM chip (Intel Atom
> > E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
> > stalling ioread8() operation following a sequence of 30+ iowrite8()s to
> > the same address. I believe this happens because the write sequence is
> > buffered (in the CPU or somewhere along the bus), and gets flushed on
> > the first LOAD instruction (ioread*()) that follows.
> >
> > The enclosed change appears to fix this issue: read the TPM chip's
> > access register (status code) after every iowrite*() operation to
> > amortize the cost of flushing data to the chip across multiple
> > instructions.
>
> I worry a bit about "appears to fix". It seems odd that the TPM device
> driver would be the first code to uncover this. Can anyone confirm that
> the chipset does indeed have this bug?

No; there was already a similar problem in e1000e, where a PCIe read
stalled the CPU so that no interrupts were serviced. See
https://www.spinics.net/lists/linux-rt-users/msg14077.html
AFAIK there was no outcome though.

> I'd also like an indication of the performance penalty. We're doing a
> lot of work to improve the performance and I worry that "do a read after
> every write" will have a performance impact.

Realtime will always affect performance, but IMHO the latter is much more
important.

Best regards,
Alexander
Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s
On 8/15/2017 4:13 PM, Haris Okanovic wrote:
> ioread8() operations to TPM MMIO addresses can stall the CPU when
> immediately following a sequence of iowrite*()s to the same region.
>
> For example, cyclictest measures ~400us latency spikes when a non-RT
> usermode application communicates with an SPI-based TPM chip (Intel Atom
> E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
> stalling ioread8() operation following a sequence of 30+ iowrite8()s to
> the same address. I believe this happens because the write sequence is
> buffered (in the CPU or somewhere along the bus), and gets flushed on the
> first LOAD instruction (ioread*()) that follows.
>
> The enclosed change appears to fix this issue: read the TPM chip's
> access register (status code) after every iowrite*() operation to
> amortize the cost of flushing data to the chip across multiple
> instructions.

I worry a bit about "appears to fix". It seems odd that the TPM device
driver would be the first code to uncover this. Can anyone confirm that
the chipset does indeed have this bug?

I'd also like an indication of the performance penalty. We're doing a
lot of work to improve the performance and I worry that "do a read after
every write" will have a performance impact.