Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

2017-08-19 Thread Jarkko Sakkinen
On Thu, Aug 17, 2017 at 11:17:32AM -0600, Jason Gunthorpe wrote:
> On Thu, Aug 17, 2017 at 12:38:07PM +0200, Sebastian Andrzej Siewior wrote:
> 
> > > I worry a bit about "appears to fix".  It seems odd that the TPM device
> > > driver would be the first code to uncover this.  Can anyone confirm that
> > > the chipset does indeed have this bug?
> > 
> > What Haris says makes sense. It is just that not all architectures
> > accumulate/batch writes to HW.
> 
> It doesn't seem that odd to me.. In modern Intel chipsets the physical
> LPC bus is used for very little. Maybe some flash and possibly a
> winbond super IO at worst?  Plus the TPM.
> 
> I can't confirm what Intel has done, but if writes are posted, then it
> is not a 'bug', but expected operation for a PCI/LPC bridge device to
> have an ordered queue of posted writes, and thus higher latency when
> processing reads due to ordering requirements.
> 
> Other drivers may not see it because most LPC usages would not be
> write heavy, or might use IO instructions which are not posted..
> 
> I can confirm that my ARM systems with a custom PCI-LPC bridge will
> have exactly the same problem, and that the readl is the only
> solution.
> 
> This is because writes to LPC are posted over PCI and will be buffered
> in the root complex, device end port and internally in the LPC
> bridge. Since they are posted there is no way for the CPU to know when
> they complete and when it would be 'low latency' to issue a read.

What you say makes sense to me, but wasn't the patch tried with an SPI TPM,
not LPC? Anyway, what you're saying probably still applies.

/Jarkko


Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

2017-08-17 Thread Haris Okanovic
Neither wmb() nor mb() has any effect when substituted for
ioread8(iobase + TPM_ACCESS(0)) in tpm_tis_flush(). I still see 300-400 us
spikes in cyclictest when invoking my TPM chip's RNG.
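
For reference, the shape of the change is roughly the following
write-then-flush pattern (a minimal sketch only, assuming helpers along
the lines of the tpm_tis_flush() mentioned above, not the exact diff):

  static inline void tpm_tis_flush(void __iomem *iobase)
  {
          /* Dummy read of the access register: a read cannot pass the
           * posted writes ahead of it, so this forces them out to the
           * chip instead of letting them accumulate. */
          (void) ioread8(iobase + TPM_ACCESS(0));
  }

  static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
  {
          iowrite8(b, iobase + addr);
          /* Flush after each write so the cost is spread across the
           * write sequence instead of one long stall on the next read. */
          tpm_tis_flush(iobase);
  }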


-- Haris


On 08/17/2017 12:17 PM, Jason Gunthorpe wrote:

> On Thu, Aug 17, 2017 at 12:38:07PM +0200, Sebastian Andrzej Siewior wrote:
> 
> > > I worry a bit about "appears to fix".  It seems odd that the TPM device
> > > driver would be the first code to uncover this.  Can anyone confirm that the
> > > chipset does indeed have this bug?
> > 
> > What Haris says makes sense. It is just that not all architectures
> > accumulate/batch writes to HW.
> 
> It doesn't seem that odd to me.. In modern Intel chipsets the physical
> LPC bus is used for very little. Maybe some flash and possibly a
> winbond super IO at worst?  Plus the TPM.
> 
> I can't confirm what Intel has done, but if writes are posted, then it
> is not a 'bug', but expected operation for a PCI/LPC bridge device to
> have an ordered queue of posted writes, and thus higher latency when
> processing reads due to ordering requirements.
> 
> Other drivers may not see it because most LPC usages would not be
> write heavy, or might use IO instructions which are not posted..
> 
> I can confirm that my ARM systems with a custom PCI-LPC bridge will
> have exactly the same problem, and that the readl is the only
> solution.
> 
> This is because writes to LPC are posted over PCI and will be buffered
> in the root complex, device end port and internally in the LPC
> bridge. Since they are posted there is no way for the CPU to know when
> they complete and when it would be 'low latency' to issue a read.
> 
> > So powerpc (for instance) has a sync operation after each write to HW. I
> > am wondering if we might need something like that on x86.
> 
> Even on something like PPC 'sync' is not defined to globally flush
> posted writes, and will not help. WMB is probably similar.
> 
> Jason



Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

2017-08-17 Thread Jason Gunthorpe
On Thu, Aug 17, 2017 at 12:38:07PM +0200, Sebastian Andrzej Siewior wrote:

> > I worry a bit about "appears to fix".  It seems odd that the TPM device
> > driver would be the first code to uncover this.  Can anyone confirm that the
> > chipset does indeed have this bug?
> 
> What Haris says makes sense. It is just that not all architectures
> accumulate/batch writes to HW.

It doesn't seem that odd to me.. In modern Intel chipsets the physical
LPC bus is used for very little. Maybe some flash and possibly a
winbond super IO at worst?  Plus the TPM.

I can't confirm what Intel has done, but if writes are posted, then it
is not a 'bug', but expected operation for a PCI/LPC bridge device to
have an ordered queue of posted writes, and thus higher latency when
processing reads due to ordering requirements.

Other drivers may not see it because most LPC usages would not be
write heavy, or might use IO instructions which are not posted..

I can confirm that my ARM systems with a custom PCI-LPC bridge will
have exactly the same problem, and that the readl is the only
solution.

This is because writes to LPC are posted over PCI and will be buffered
in the root complex, device end port and internally in the LPC
bridge. Since they are posted there is no way for the CPU to know when
they complete and when it would be 'low latency' to issue a read.

> So powerpc (for instance) has a sync operation after each write to HW. I
> am wondering if we might need something like that on x86.

Even on something like PPC 'sync' is not defined to globally flush
posted writes, and will not help. WMB is probably similar.
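
Schematically, the difference (placeholder names, just to illustrate):

  iowrite8(val, base + reg);
  wmb();                          /* orders the stores on the CPU side, but
                                     they can still sit posted in a bridge */
  (void) ioread8(base + status);  /* a read may not pass posted writes, so
                                     it forces them to complete first */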

Jason


Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

2017-08-17 Thread Sebastian Andrzej Siewior
On 2017-08-16 17:15:55 [-0400], Ken Goldman wrote:
> On 8/15/2017 4:13 PM, Haris Okanovic wrote:
> > ioread8() operations to TPM MMIO addresses can stall the cpu when
> > immediately following a sequence of iowrite*()'s to the same region.
> > 
> > For example, cyclictest measures ~400us latency spikes when a non-RT
> > usermode application communicates with an SPI-based TPM chip (Intel Atom
> > E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
> > stalling ioread8() operation following a sequence of 30+ iowrite8()s to
> > the same address. I believe this happens because the write sequence is
> > buffered (in cpu or somewhere along the bus), and gets flushed on the
> > first LOAD instruction (ioread*()) that follows.
> > 
> > The enclosed change appears to fix this issue: read the TPM chip's
> > access register (status code) after every iowrite*() operation to
> > amortize the cost of flushing data to chip across multiple instructions.

Haris, could you try a wmb() instead of the read?

> I worry a bit about "appears to fix".  It seems odd that the TPM device
> driver would be the first code to uncover this.  Can anyone confirm that the
> chipset does indeed have this bug?

What Haris says makes sense. It is just that not all architectures
accumulate/batch writes to HW.

> I'd also like an indication of the performance penalty.  We're doing a lot
> of work to improve the performance and I worry that "do a read after every
> write" will have a performance impact.

So powerpc (for instance) has a sync operation after each write to HW. I
am wondering if we might need something like that on x86.

Sebastian


Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

2017-08-16 Thread Alexander Stein
On Wednesday 16 August 2017 17:15:55, Ken Goldman wrote:
> On 8/15/2017 4:13 PM, Haris Okanovic wrote:
> > ioread8() operations to TPM MMIO addresses can stall the cpu when
> > immediately following a sequence of iowrite*()'s to the same region.
> > 
> > For example, cyclictest measures ~400us latency spikes when a non-RT
> > usermode application communicates with an SPI-based TPM chip (Intel Atom
> > E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
> > stalling ioread8() operation following a sequence of 30+ iowrite8()s to
> > the same address. I believe this happens because the write sequence is
> > buffered (in cpu or somewhere along the bus), and gets flushed on the
> > first LOAD instruction (ioread*()) that follows.
> > 
> > The enclosed change appears to fix this issue: read the TPM chip's
> > access register (status code) after every iowrite*() operation to
> > amortize the cost of flushing data to chip across multiple instructions.
> 
> I worry a bit about "appears to fix".  It seems odd that the TPM device
> driver would be the first code to uncover this.  Can anyone confirm that
> the chipset does indeed have this bug?

No, there was already a similar problem in e1000e where a PCIe read stalled
the CPU, hence no interrupts were serviced. See
https://www.spinics.net/lists/linux-rt-users/msg14077.html
AFAIK there was no outcome though.

> I'd also like an indication of the performance penalty.  We're doing a
> lot of work to improve the performance and I worry that "do a read after
> every write" will have a performance impact.

Realtime will always affect performance, but IMHO determinism is much more
important.

Best regards,
Alexander




Re: [tpmdd-devel] [PATCH v2] tpm_tis: fix stall after iowrite*()s

2017-08-16 Thread Ken Goldman

On 8/15/2017 4:13 PM, Haris Okanovic wrote:

> ioread8() operations to TPM MMIO addresses can stall the cpu when
> immediately following a sequence of iowrite*()'s to the same region.
> 
> For example, cyclictest measures ~400us latency spikes when a non-RT
> usermode application communicates with an SPI-based TPM chip (Intel Atom
> E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
> stalling ioread8() operation following a sequence of 30+ iowrite8()s to
> the same address. I believe this happens because the write sequence is
> buffered (in cpu or somewhere along the bus), and gets flushed on the
> first LOAD instruction (ioread*()) that follows.
> 
> The enclosed change appears to fix this issue: read the TPM chip's
> access register (status code) after every iowrite*() operation to
> amortize the cost of flushing data to chip across multiple instructions.


I worry a bit about "appears to fix".  It seems odd that the TPM device 
driver would be the first code to uncover this.  Can anyone confirm that 
the chipset does indeed have this bug?


I'd also like an indication of the performance penalty.  We're doing a 
lot of work to improve the performance and I worry that "do a read after 
every write" will have a performance impact.



