Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount
On 8/14/2017 6:10 AM, Jarkko Sakkinen wrote: Nuvoton, ST Micro, and Infineon confirmed that the TPM empties a byte from the FIFO in under 1 usec. So, even with a static burst count, the entire FIFO would empty in under 10 usec. Does this break anything lets say in a decade time frame? If it does, I will not even consider this. Can you give a definitive answer for that? My attempt at predicting the future ... 1 - Clock speed should get faster, not slower, so the 1 usec wait state should get shorter. 2 - TPM buffers should get larger. I'm already reading of 256 byte buffers. So there should be fewer wait states. 3 - There is a trend toward using the CRB, making the FIFO less applicable. 4 - There is a trend away from LPC connected serial port devices, printers, and PS/2 mouse and keyboard, and toward USB connected devices. I assume this will continue, making the already insignificant wait states irrelevant.
Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount
On 8/14/2017 6:10 AM, Jarkko Sakkinen wrote: Nuvoton, ST Micro, and Infineon confirmed that the TPM empties a byte from the FIFO in under 1 usec. So, even with a static burst count, the entire FIFO would empty in under 10 usec. Does this break anything lets say in a decade time frame? If it does, I will not even consider this. Can you give a definitive answer for that? My attempt at predicting the future ... 1 - Clock speed should get faster, not slower, so the 1 usec wait state should get shorter. 2 - TPM buffers should get larger. I'm already reading of 256 byte buffers. So there should be fewer wait states. 3 - There is a trend toward using the CRB, making the FIFO less applicable. 4 - There is a trend away from LPC connected serial port devices, printers, and PS/2 mouse and keyboard, and toward USB connected devices. I assume this will continue, making the already insignificant wait states irrelevant.
Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount
On 8/9/2017 4:43 PM, Peter Huewe wrote: Yes that's bad, especially with current msleep(5) is actually msleep(20). However, please also keep in mind SPI tpms show a much higher burst count value, (255) our I2C TPM SLB9645 even shows something in the range of 1k. :) One of our platforms has a TPM 1.2 with an 8 byte FIFO and a static burst count. This is where we're debugging. Would another solution be to reduce the burst count poll and sleep to, e.g., 100 usec or even 10 usec? This would probably help greatly, but till not incur the wait states that triggered the NACK. If you use sleep it's not guaranteed that you wakeup after exactly xx specified amount of time. Just that you sleep at least xx amount of time. Otherwise you would have to do delay/busywaiting. Understood. However, if the DD maintainers will not accept ignoring burstCount, would a short sleep (e.g., 10 usec) be acceptable. If so, we can benchmark it. Imho the best option is to figure out whether any vendor can determine the "FIFO flush time" i.e. how much time does it take to empty the fifo and fillup the burstcount and use this on the lower bound of an usleep range. I don't think tpms are in the range of < 10 us... @Ken: Maybe can you check in DDWG? I asked this week. Nuvoton, ST Micro, and Infineon confirmed that the TPM empties a byte from the FIFO in under 1 usec. So, even with a static burst count, the entire FIFO would empty in under 10 usec.
Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount
On 8/9/2017 4:43 PM, Peter Huewe wrote: Yes that's bad, especially with current msleep(5) is actually msleep(20). However, please also keep in mind SPI tpms show a much higher burst count value, (255) our I2C TPM SLB9645 even shows something in the range of 1k. :) One of our platforms has a TPM 1.2 with an 8 byte FIFO and a static burst count. This is where we're debugging. Would another solution be to reduce the burst count poll and sleep to, e.g., 100 usec or even 10 usec? This would probably help greatly, but till not incur the wait states that triggered the NACK. If you use sleep it's not guaranteed that you wakeup after exactly xx specified amount of time. Just that you sleep at least xx amount of time. Otherwise you would have to do delay/busywaiting. Understood. However, if the DD maintainers will not accept ignoring burstCount, would a short sleep (e.g., 10 usec) be acceptable. If so, we can benchmark it. Imho the best option is to figure out whether any vendor can determine the "FIFO flush time" i.e. how much time does it take to empty the fifo and fillup the burstcount and use this on the lower bound of an usleep range. I don't think tpms are in the range of < 10 us... @Ken: Maybe can you check in DDWG? I asked this week. Nuvoton, ST Micro, and Infineon confirmed that the TPM empties a byte from the FIFO in under 1 usec. So, even with a static burst count, the entire FIFO would empty in under 10 usec.
Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount
On 8/9/2017 5:00 PM, Peter Huewe wrote: Since we are the linux kernel, we do have to care for legacy devices. And a system with LPC, PS2Mouse on SuperIO and a TPM are not that uncommon. And heck, we even have support for 1.1b TPM devices Understood. However, remember that SuperIO is a 1980's device that predates the TPM. Since the TPM requires special LPC bus cycles, it's even less likely that an old chipset has an attached TPM. Interesting - let me check with Georg tomorrow. Unfortunately I do not have access to my tcg mails from home (since I'm not working :), but did you _explicitly_ talk about LPC and the system? I'm sure the TPM does not care about the waitstates... Yes, Infineon was one of the 3 TPM vendors who confirmed this.
Re: Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount
On 8/9/2017 5:00 PM, Peter Huewe wrote: Since we are the linux kernel, we do have to care for legacy devices. And a system with LPC, PS2Mouse on SuperIO and a TPM are not that uncommon. And heck, we even have support for 1.1b TPM devices Understood. However, remember that SuperIO is a 1980's device that predates the TPM. Since the TPM requires special LPC bus cycles, it's even less likely that an old chipset has an attached TPM. Interesting - let me check with Georg tomorrow. Unfortunately I do not have access to my tcg mails from home (since I'm not working :), but did you _explicitly_ talk about LPC and the system? I'm sure the TPM does not care about the waitstates... Yes, Infineon was one of the 3 TPM vendors who confirmed this.
Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount
Hi Ken, (again speaking only on my behalf, not my employer) > Does anyone know of platforms where this occurs? > I suspect (but not sure) that the days of SuperIO connecting floppy > drives, printer ports, and PS/2 mouse ports on the LPC bus are over, and > such legacy systems will not have a TPM. Would SuperIO even support the > special TPM LPC bus cycles? Since we are the linux kernel, we do have to care for legacy devices. And a system with LPC, PS2Mouse on SuperIO and a TPM are not that uncommon. And heck, we even have support for 1.1b TPM devices >> One more viewpoint: TCG must added the burst count for a reason (might >> be very well related what Peter said). Is ignoring it something that TCG >> recommends? Not following standard exactly in the driver code sometimes >> makes sense on *small details* but I would not say that this a small >> detail... > I checked with the TCG's device driver work group (DDWG). Both the spec > editor and 3 TPM vendors - Infineon, Nuvoton, and ST Micro - agreed that > ignoring burst count may incur wait states but nothing more. Operations > will still be successful. Interesting - let me check with Georg tomorrow. Unfortunately I do not have access to my tcg mails from home (since I'm not working :), but did you _explicitly_ talk about LPC and the system? I'm sure the TPM does not care about the waitstates... If my memory does not betray me, it is actually possible to "freeze up" a system completly by flooding the lpc bus. Let me double check tomorrow... In anycase - I really would like to see a much more performant tpm subsystem - however it will be quite an effort with a lot of legacy testing. (which I unfortunately cannot spend on my private time ... and also of course lacking test systems). Thanks, Peter
Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount
Hi Ken, (again speaking only on my behalf, not my employer) > Does anyone know of platforms where this occurs? > I suspect (but not sure) that the days of SuperIO connecting floppy > drives, printer ports, and PS/2 mouse ports on the LPC bus are over, and > such legacy systems will not have a TPM. Would SuperIO even support the > special TPM LPC bus cycles? Since we are the linux kernel, we do have to care for legacy devices. And a system with LPC, PS2Mouse on SuperIO and a TPM are not that uncommon. And heck, we even have support for 1.1b TPM devices >> One more viewpoint: TCG must added the burst count for a reason (might >> be very well related what Peter said). Is ignoring it something that TCG >> recommends? Not following standard exactly in the driver code sometimes >> makes sense on *small details* but I would not say that this a small >> detail... > I checked with the TCG's device driver work group (DDWG). Both the spec > editor and 3 TPM vendors - Infineon, Nuvoton, and ST Micro - agreed that > ignoring burst count may incur wait states but nothing more. Operations > will still be successful. Interesting - let me check with Georg tomorrow. Unfortunately I do not have access to my tcg mails from home (since I'm not working :), but did you _explicitly_ talk about LPC and the system? I'm sure the TPM does not care about the waitstates... If my memory does not betray me, it is actually possible to "freeze up" a system completly by flooding the lpc bus. Let me double check tomorrow... In anycase - I really would like to see a much more performant tpm subsystem - however it will be quite an effort with a lot of legacy testing. (which I unfortunately cannot spend on my private time ... and also of course lacking test systems). Thanks, Peter
Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount
Hi Ken, (speaking on behalf of myself here, not my employer :) ) > On Mon, Aug 07, 2017 at 01:52:34PM +0200, Peter Huewe wrote: >> Imho: NACK from my side. > After these viewpoints definitive NACK from my side too... > I responded to the thread comments separately. However, assuming NACK > is the final response, I have a question. Nothing is ever final :) > The problem is the 5 msec sleep between polls of burst count. In the > case of one TPM with an 8 byte FIFO, a 32 byte transfer incurs 4 of > these sleeps. Yes that's bad, especially with current msleep(5) is actually msleep(20). However, please also keep in mind SPI tpms show a much higher burst count value, (255) our I2C TPM SLB9645 even shows something in the range of 1k. :) > Would another solution be to reduce the burst count poll and sleep to, > e.g., 100 usec or even 10 usec? This would probably help greatly, but > till not incur the wait states that triggered the NACK. Imho the 5ms were an arbitrary chosen value for old old tpms. Back then also only msleep was available which was msleep(20) anyway. > My worry is that the scheduler would not be able to context switch that > fast, and so we wouldn't actually see usec speed polling. > Can a kernel expert offer an opinion? If you use sleep it's not guaranteed that you wakeup after exactly xx specified amount of time. Just that you sleep at least xx amount of time. Otherwise you would have to do delay/busywaiting. https://www.kernel.org/doc/Documentation/timers/timers-howto.txt Imho the best option is to figure out whether any vendor can determine the "FIFO flush time" i.e. how much time does it take to empty the fifo and fillup the burstcount and use this on the lower bound of an usleep range. I don't think tpms are in the range of < 10 us... @Ken: Maybe can you check in DDWG? What we often see/saw during some performance optimization tests is something like burstcount 7, burstcount 1, burstcount 7, burstcount 1... which is the oposite of what you want to achieve. You'd like to have something like burstcount 8, burstcount 8, burstcount 8 Unfortunately TPMS don't report their max burstcount (afaik), otherwise it would be easy to write a calibration routine for the poll times. Thanks, Peter
Aw: Re: [tpmdd-devel] [PATCH] tpm: improve tpm_tis send() performance by ignoring burstcount
Hi Ken, (speaking on behalf of myself here, not my employer :) ) > On Mon, Aug 07, 2017 at 01:52:34PM +0200, Peter Huewe wrote: >> Imho: NACK from my side. > After these viewpoints definitive NACK from my side too... > I responded to the thread comments separately. However, assuming NACK > is the final response, I have a question. Nothing is ever final :) > The problem is the 5 msec sleep between polls of burst count. In the > case of one TPM with an 8 byte FIFO, a 32 byte transfer incurs 4 of > these sleeps. Yes that's bad, especially with current msleep(5) is actually msleep(20). However, please also keep in mind SPI tpms show a much higher burst count value, (255) our I2C TPM SLB9645 even shows something in the range of 1k. :) > Would another solution be to reduce the burst count poll and sleep to, > e.g., 100 usec or even 10 usec? This would probably help greatly, but > till not incur the wait states that triggered the NACK. Imho the 5ms were an arbitrary chosen value for old old tpms. Back then also only msleep was available which was msleep(20) anyway. > My worry is that the scheduler would not be able to context switch that > fast, and so we wouldn't actually see usec speed polling. > Can a kernel expert offer an opinion? If you use sleep it's not guaranteed that you wakeup after exactly xx specified amount of time. Just that you sleep at least xx amount of time. Otherwise you would have to do delay/busywaiting. https://www.kernel.org/doc/Documentation/timers/timers-howto.txt Imho the best option is to figure out whether any vendor can determine the "FIFO flush time" i.e. how much time does it take to empty the fifo and fillup the burstcount and use this on the lower bound of an usleep range. I don't think tpms are in the range of < 10 us... @Ken: Maybe can you check in DDWG? What we often see/saw during some performance optimization tests is something like burstcount 7, burstcount 1, burstcount 7, burstcount 1... which is the oposite of what you want to achieve. You'd like to have something like burstcount 8, burstcount 8, burstcount 8 Unfortunately TPMS don't report their max burstcount (afaik), otherwise it would be easy to write a calibration routine for the poll times. Thanks, Peter