Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount

2017-08-16 Thread Ken Goldman

On 8/14/2017 6:10 AM, Jarkko Sakkinen wrote:

Nuvoton, ST Micro, and Infineon confirmed that the TPM empties a byte from
the FIFO in under 1 usec.  So, even with a static burst count, the entire
FIFO would empty in under 10 usec.


Does this break anything lets say in a decade time frame? If it does, I
will not even consider this. Can you give a definitive answer for that?


My attempt at predicting the future ...

1 - Clock speed should get faster, not slower, so the 1 usec wait state 
should get shorter.


2 - TPM buffers should get larger.  I'm already reading of 256 byte 
buffers.  So there should be fewer wait states.


3 - There is a trend toward using the CRB, making the FIFO less applicable.

4 - There is a trend away from LPC connected serial port devices, 
printers, and PS/2 mouse and keyboard, and toward USB connected devices. 
 I assume this will continue, making the already insignificant wait 
states irrelevant.




Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount

2017-08-16 Thread Ken Goldman

On 8/14/2017 6:10 AM, Jarkko Sakkinen wrote:

Nuvoton, ST Micro, and Infineon confirmed that the TPM empties a byte from
the FIFO in under 1 usec.  So, even with a static burst count, the entire
FIFO would empty in under 10 usec.


Does this break anything lets say in a decade time frame? If it does, I
will not even consider this. Can you give a definitive answer for that?


My attempt at predicting the future ...

1 - Clock speed should get faster, not slower, so the 1 usec wait state 
should get shorter.


2 - TPM buffers should get larger.  I'm already reading of 256 byte 
buffers.  So there should be fewer wait states.


3 - There is a trend toward using the CRB, making the FIFO less applicable.

4 - There is a trend away from LPC connected serial port devices, 
printers, and PS/2 mouse and keyboard, and toward USB connected devices. 
 I assume this will continue, making the already insignificant wait 
states irrelevant.




Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount

2017-08-11 Thread Ken Goldman

On 8/9/2017 4:43 PM, Peter Huewe wrote:


Yes that's bad, especially with current msleep(5) is actually
msleep(20). However, please also keep in mind SPI tpms show a much
higher burst count value,  (255) our I2C TPM SLB9645 even shows
something in the range of 1k. :)


One of our platforms has a TPM 1.2 with an 8 byte FIFO and a static 
burst count.  This is where we're debugging.



Would another solution be to reduce the burst count poll and sleep
to, e.g., 100 usec or even 10 usec? This would probably help
greatly, but till not incur the wait states that triggered the
NACK.


If you use sleep it's not guaranteed that you wakeup after exactly xx
specified amount of time. Just that you sleep at least xx amount of
time. Otherwise you would have to do delay/busywaiting.


Understood.  However, if the DD maintainers will not accept ignoring 
burstCount, would a short sleep (e.g., 10 usec) be acceptable.


If so, we can benchmark it.


Imho the best option is to figure out whether any vendor can
determine the "FIFO flush time" i.e. how much time does it take to
empty the fifo and fillup the burstcount and use this on the lower
bound of an usleep range. I don't think tpms are in the range of < 10
us...

@Ken: Maybe can you check in DDWG?


I asked this week.

Nuvoton, ST Micro, and Infineon confirmed that the TPM empties a byte 
from the FIFO in under 1 usec.  So, even with a static burst count, the 
entire FIFO would empty in under 10 usec.




Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount

2017-08-11 Thread Ken Goldman

On 8/9/2017 4:43 PM, Peter Huewe wrote:


Yes that's bad, especially with current msleep(5) is actually
msleep(20). However, please also keep in mind SPI tpms show a much
higher burst count value,  (255) our I2C TPM SLB9645 even shows
something in the range of 1k. :)


One of our platforms has a TPM 1.2 with an 8 byte FIFO and a static 
burst count.  This is where we're debugging.



Would another solution be to reduce the burst count poll and sleep
to, e.g., 100 usec or even 10 usec? This would probably help
greatly, but till not incur the wait states that triggered the
NACK.


If you use sleep it's not guaranteed that you wakeup after exactly xx
specified amount of time. Just that you sleep at least xx amount of
time. Otherwise you would have to do delay/busywaiting.


Understood.  However, if the DD maintainers will not accept ignoring 
burstCount, would a short sleep (e.g., 10 usec) be acceptable.


If so, we can benchmark it.


Imho the best option is to figure out whether any vendor can
determine the "FIFO flush time" i.e. how much time does it take to
empty the fifo and fillup the burstcount and use this on the lower
bound of an usleep range. I don't think tpms are in the range of < 10
us...

@Ken: Maybe can you check in DDWG?


I asked this week.

Nuvoton, ST Micro, and Infineon confirmed that the TPM empties a byte 
from the FIFO in under 1 usec.  So, even with a static burst count, the 
entire FIFO would empty in under 10 usec.




Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount

2017-08-11 Thread Ken Goldman

On 8/9/2017 5:00 PM, Peter Huewe wrote:


Since we are the linux kernel, we do have to care for legacy
devices. And a system with LPC, PS2Mouse on SuperIO and a TPM are not
that uncommon.

And heck, we even have support for 1.1b TPM devices


Understood.  However, remember that SuperIO is a 1980's device that
predates the TPM.  Since the TPM requires special LPC bus cycles, it's
even less likely that an old chipset has an attached TPM.


Interesting - let me check with Georg tomorrow. Unfortunately I do
not have access to my tcg mails from home (since I'm not working :), 
but did you _explicitly_ talk about LPC and the system? I'm sure the

TPM does not care about the waitstates...


Yes, Infineon was one of the 3 TPM vendors who confirmed this.




Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount

2017-08-11 Thread Ken Goldman

On 8/9/2017 5:00 PM, Peter Huewe wrote:


Since we are the linux kernel, we do have to care for legacy
devices. And a system with LPC, PS2Mouse on SuperIO and a TPM are not
that uncommon.

And heck, we even have support for 1.1b TPM devices


Understood.  However, remember that SuperIO is a 1980's device that
predates the TPM.  Since the TPM requires special LPC bus cycles, it's
even less likely that an old chipset has an attached TPM.


Interesting - let me check with Georg tomorrow. Unfortunately I do
not have access to my tcg mails from home (since I'm not working :), 
but did you _explicitly_ talk about LPC and the system? I'm sure the

TPM does not care about the waitstates...


Yes, Infineon was one of the 3 TPM vendors who confirmed this.




Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount

2017-08-09 Thread Peter Huewe
Hi Ken,
(again speaking only on my behalf, not my employer)

> Does anyone know of platforms where this occurs?
> I suspect (but not sure) that the days of SuperIO connecting floppy
> drives, printer ports, and PS/2 mouse ports on the LPC bus are over, and
> such legacy systems will not have a TPM. Would SuperIO even support the
> special TPM LPC bus cycles?

Since we are the linux kernel, we do have to care for legacy devices.
And a system with LPC, PS2Mouse on SuperIO and a TPM are not that uncommon.

And heck, we even have support for 1.1b TPM devices


>> One more viewpoint: TCG must added the burst count for a reason (might
>> be very well related what Peter said). Is ignoring it something that TCG
>> recommends? Not following standard exactly in the driver code sometimes
>> makes sense on *small details* but I would not say that this a small
>> detail...

> I checked with the TCG's device driver work group (DDWG). Both the spec
> editor and 3 TPM vendors - Infineon, Nuvoton, and ST Micro - agreed that
> ignoring burst count may incur wait states but nothing more. Operations
> will still be successful.

Interesting - let me check with Georg tomorrow.
Unfortunately I do not have access to my tcg mails from home (since I'm not 
working :), 
but did you _explicitly_ talk about LPC and the system?
I'm sure the TPM does not care about the waitstates...

If my memory does not betray me, 
it is actually possible to "freeze up" a system completly by flooding the lpc 
bus.
Let me double check tomorrow...


In anycase - I really would like to see a much more performant tpm subsystem - 
however it will be quite an effort with a lot of legacy testing.
(which I unfortunately cannot spend on my private time ... and also of course 
lacking test systems).

Thanks,
Peter


Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount

2017-08-09 Thread Peter Huewe
Hi Ken,
(again speaking only on my behalf, not my employer)

> Does anyone know of platforms where this occurs?
> I suspect (but not sure) that the days of SuperIO connecting floppy
> drives, printer ports, and PS/2 mouse ports on the LPC bus are over, and
> such legacy systems will not have a TPM. Would SuperIO even support the
> special TPM LPC bus cycles?

Since we are the linux kernel, we do have to care for legacy devices.
And a system with LPC, PS2Mouse on SuperIO and a TPM are not that uncommon.

And heck, we even have support for 1.1b TPM devices


>> One more viewpoint: TCG must added the burst count for a reason (might
>> be very well related what Peter said). Is ignoring it something that TCG
>> recommends? Not following standard exactly in the driver code sometimes
>> makes sense on *small details* but I would not say that this a small
>> detail...

> I checked with the TCG's device driver work group (DDWG). Both the spec
> editor and 3 TPM vendors - Infineon, Nuvoton, and ST Micro - agreed that
> ignoring burst count may incur wait states but nothing more. Operations
> will still be successful.

Interesting - let me check with Georg tomorrow.
Unfortunately I do not have access to my tcg mails from home (since I'm not 
working :), 
but did you _explicitly_ talk about LPC and the system?
I'm sure the TPM does not care about the waitstates...

If my memory does not betray me, 
it is actually possible to "freeze up" a system completly by flooding the lpc 
bus.
Let me double check tomorrow...


In anycase - I really would like to see a much more performant tpm subsystem - 
however it will be quite an effort with a lot of legacy testing.
(which I unfortunately cannot spend on my private time ... and also of course 
lacking test systems).

Thanks,
Peter


Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount

2017-08-09 Thread Peter Huewe
Hi Ken,

(speaking on behalf of myself here, not my employer :) )
> On Mon, Aug 07, 2017 at 01:52:34PM +0200, Peter Huewe wrote:
>> Imho: NACK from my side.
> After these viewpoints definitive NACK from my side too...


> I responded to the thread comments separately. However, assuming NACK
> is the final response, I have a question.

Nothing is ever final :)

> The problem is the 5 msec sleep between polls of burst count. In the
> case of one TPM with an 8 byte FIFO, a 32 byte transfer incurs 4 of
> these sleeps.

Yes that's bad, especially with current msleep(5) is actually msleep(20).
However, please also keep in mind SPI tpms show a much higher burst count 
value,  (255)
our I2C TPM SLB9645 even shows something in the range of 1k. :)


> Would another solution be to reduce the burst count poll and sleep to,
> e.g., 100 usec or even 10 usec? This would probably help greatly, but
> till not incur the wait states that triggered the NACK.

Imho the 5ms were an arbitrary chosen value for old old tpms.
Back then also only msleep was available which was msleep(20) anyway.


> My worry is that the scheduler would not be able to context switch that
> fast, and so we wouldn't actually see usec speed polling.
> Can a kernel expert offer an opinion?
If you use sleep it's not guaranteed that you wakeup after exactly xx specified 
amount of time.
Just that you sleep at least xx amount of time. Otherwise you would have to do 
delay/busywaiting.

https://www.kernel.org/doc/Documentation/timers/timers-howto.txt


Imho the best option is to
figure out whether any vendor can determine the "FIFO flush time" i.e. how much 
time does it take to empty the fifo and fillup the burstcount and use this on 
the lower bound of an usleep range.
I don't think tpms are in the range of < 10 us...

@Ken: Maybe can you check in DDWG?

What we often see/saw during some performance optimization tests is something 
like 
burstcount 7, burstcount 1, burstcount 7, burstcount 1...
which is the oposite of what you want to achieve.

You'd like to have something like
burstcount 8, burstcount 8, burstcount 8

Unfortunately TPMS don't report their max burstcount (afaik),
otherwise it would be easy to write a calibration routine for the poll times.


Thanks,
Peter






Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount

2017-08-09 Thread Peter Huewe
Hi Ken,

(speaking on behalf of myself here, not my employer :) )
> On Mon, Aug 07, 2017 at 01:52:34PM +0200, Peter Huewe wrote:
>> Imho: NACK from my side.
> After these viewpoints definitive NACK from my side too...


> I responded to the thread comments separately. However, assuming NACK
> is the final response, I have a question.

Nothing is ever final :)

> The problem is the 5 msec sleep between polls of burst count. In the
> case of one TPM with an 8 byte FIFO, a 32 byte transfer incurs 4 of
> these sleeps.

Yes that's bad, especially with current msleep(5) is actually msleep(20).
However, please also keep in mind SPI tpms show a much higher burst count 
value,  (255)
our I2C TPM SLB9645 even shows something in the range of 1k. :)


> Would another solution be to reduce the burst count poll and sleep to,
> e.g., 100 usec or even 10 usec? This would probably help greatly, but
> till not incur the wait states that triggered the NACK.

Imho the 5ms were an arbitrary chosen value for old old tpms.
Back then also only msleep was available which was msleep(20) anyway.


> My worry is that the scheduler would not be able to context switch that
> fast, and so we wouldn't actually see usec speed polling.
> Can a kernel expert offer an opinion?
If you use sleep it's not guaranteed that you wakeup after exactly xx specified 
amount of time.
Just that you sleep at least xx amount of time. Otherwise you would have to do 
delay/busywaiting.

https://www.kernel.org/doc/Documentation/timers/timers-howto.txt


Imho the best option is to
figure out whether any vendor can determine the "FIFO flush time" i.e. how much 
time does it take to empty the fifo and fillup the burstcount and use this on 
the lower bound of an usleep range.
I don't think tpms are in the range of < 10 us...

@Ken: Maybe can you check in DDWG?

What we often see/saw during some performance optimization tests is something 
like 
burstcount 7, burstcount 1, burstcount 7, burstcount 1...
which is the oposite of what you want to achieve.

You'd like to have something like
burstcount 8, burstcount 8, burstcount 8

Unfortunately TPMS don't report their max burstcount (afaik),
otherwise it would be easy to write a calibration routine for the poll times.


Thanks,
Peter