Re: sched/fair: Hard lockup from idle_balance()/task_numa_migrate() race
On Mon, Oct 16, 2017 at 4:11 PM, Ed Swierk wrote: > To recap: a dual-socket Xeon (E5 v4) server system had been running a > bunch of KVM workloads just fine for over 6 weeks. Suddenly hard > lockups occurred on cpu 13 in task_numa_migrate(), and cpu 0 in > idle_balance(). That condition persisted until the system was rebooted > after a couple of hours, and meanwhile all manner of soft lockups and > even SATA errors appeared. > > [3851435.62] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13 > [3851435.784582] Modules linked in: vhost_net tun vhost macvtap 8021q garp > mrp drbg ansi_cprng dm_crypt algif_skcipher af_alg ip_gre ip_tunnel gre > macvlan ebtable_filter ebtables openvswitch nf_defrag_ipv6 libcrc32c nfsd > auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ip6table_filter ip6_tables > nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 > nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables > iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp > kvm_intel kvm irqbypass crct10dif_pclmul raid10 crc32_pclmul aesni_intel > aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac lpc_ich > mfd_core mei_me mei i2c_i801 ioatdma ipmi_ssif ipmi_msghandler acpi_pad > acpi_cpufreq squashfs lz4_decompress xhci_pci xhci_hcd ixgbe mdio vxlan > ip6_udp_tunnel udp_tunnel ptp pps_core dca ehci_pci ehci_hcd usbcore ahci > libahci usb_common tpm_tis > [3851435.874884] irq event stamp: 3828060294 > [3851435.879398] hardirqs last enabled at (3828060293): [] > 810db86c > [3851435.10] hardirqs last disabled at (3828060294): [] > 8170ff74 > [3851435.898254] softirqs last enabled at (3828060156): [] > 8107f31e > [3851435.907666] softirqs last disabled at (3828060139): [] > 8107f648 > [3851435.917094] CPU: 13 PID: 2146 Comm: pocviexg Not tainted 4.4.52-grsec #1 > [3851435.925252] Hardware name: Intel S2600WTTR, BIOS > SE5C610.86B.01.01.0016.C1.033120161139 03/31/2016 > [3851435.937473] task: 881c9ff35c00 ti: 881c9ff372b0 task.ti: > 881c9ff372b0 > [3851435.946104] RIP: 0010:[] [] > try_to_wake_up+0xe4/0x500 > [3851435.955614] RSP: :c90016beba40 EFLAGS: 0002 > [3851435.961780] RAX: RBX: 88102cc8e3d8 RCX: > 0001 > [3851435.970016] RDX: 0001 RSI: 01e4 RDI: > > [3851435.978259] RBP: c90016beba80 R08: R09: > 0001 > [3851435.986493] R10: 00e0 R11: 881c9ff35c00 R12: > 0082 > [3851435.994727] R13: 88102cc8dc00 R14: 001d3f00 R15: > 81f25060 > [3851436.002962] FS: 6e79227fc700() GS:88203400() > knlGS: > [3851436.012266] CS: 0010 DS: ES: CR0: 80050033 > [3851436.018920] CR2: 6e79227fba38 CR3: 00201d2fe000 CR4: > 003626f0 > [3851436.027155] DR0: DR1: DR2: > > [3851436.035391] DR3: DR6: fffe0ff0 DR7: > 0400 > [3851436.043626] Stack: > [3851436.046098] 8820341ce620 8820341ce608 > 8820341ce600 > [3851436.054695] c90016bebb48 000d > 881033fce600 > [3851436.063259] c90016beba90 810adc80 c90016bebbe8 > 8112c830 > [3851436.071868] Call Trace: > [3851436.074862] [] wake_up_process+0x10/0x20 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/core.c#L1997 > [3851436.081317] [] stop_two_cpus+0x1b0/0x280 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/stop_machine.c#L235 > [3851436.087778] [] ? cpu_stop_park+0x40/0x40 > [3851436.094237] [] ? cpu_stop_park+0x40/0x40 > [3851436.100715] [] ? __migrate_swap_task.part.94+0x70/0x70 > [3851436.108539] [] migrate_swap+0xf1/0x1f0 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/core.c#L1408 > [3851436.114800] [] task_numa_migrate+0x345/0xb60 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L1572 > [3851436.121656] [] ? futex_wait_queue_me+0xf3/0x160 > [3851436.128801] [] numa_migrate_preferred+0x73/0x80 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L1597 > [3851436.135936] [] task_numa_fault+0x7a4/0xee0 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L2137 > [3851436.142604] [] ? should_numa_migrate_memory+0x50/0x130 > [3851436.150437] [] handle_mm_fault+0xfd6/0x1e10 > https:/
Re: sched/fair: Hard lockup from idle_balance()/task_numa_migrate() race
On Fri, Nov 3, 2017 at 8:03 AM, Linus Torvalds wrote: > I suspect that you'd get more responses if it was an upstream kernel > (and preferably newer), or even a distro one. > > If you run the grsec patches, you need to go to grsec to get support. > We don't even know what they are doing, since they're hiding their > patches and not breaking them out. I understand--we are also not keen on the way grsec distributes their patches or the timeliness with which they follow stable/LTS kernel releases. In this case, I can't conceive of a way to reproduce the problem with an upstream kernel or any other, yet to my limited understanding of the evidence it appears there may indeed be a real problem lurking in there. I will follow up with the grsec folks. > On Thu, Nov 2, 2017 at 5:51 PM, Ed Swierk wrote: >> Ping? >> >> On Wed, Oct 25, 2017 at 9:35 PM, Ed Swierk >> wrote: >>> >>> Ping? >>> >>> On Mon, Oct 16, 2017 at 4:11 PM, Ed Swierk >>> wrote: >>> > >>> > Ping for Peter, Ingo and other sched maintainers: >>> > >>> > I'd appreciate any feedback on this hard lockup issue, which occurred >>> > on a system running kernel 4.4.52-grsec. Thanks, --Ed
Re: sched/fair: Hard lockup from idle_balance()/task_numa_migrate() race
Ping? On Wed, Oct 25, 2017 at 9:35 PM, Ed Swierk wrote: > > Ping? > > On Mon, Oct 16, 2017 at 4:11 PM, Ed Swierk wrote: > > > > Ping for Peter, Ingo and other sched maintainers: > > > > I'd appreciate any feedback on this hard lockup issue, which occurred > > on a system running kernel 4.4.52-grsec. > > > > To recap: a dual-socket Xeon (E5 v4) server system had been running a > > bunch of KVM workloads just fine for over 6 weeks. Suddenly hard > > lockups occurred on cpu 13 in task_numa_migrate(), and cpu 0 in > > idle_balance(). That condition persisted until the system was rebooted > > after a couple of hours, and meanwhile all manner of soft lockups and > > even SATA errors appeared. > > > > [3851435.62] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13 > > [3851435.784582] Modules linked in: vhost_net tun vhost macvtap 8021q garp > > mrp drbg ansi_cprng dm_crypt algif_skcipher af_alg ip_gre ip_tunnel gre > > macvlan ebtable_filter ebtables openvswitch nf_defrag_ipv6 libcrc32c nfsd > > auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ip6table_filter > > ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp > > nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter > > ip_tables x_tables iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal > > intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul raid10 > > crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper > > cryptd sb_edac lpc_ich mfd_core mei_me mei i2c_i801 ioatdma ipmi_ssif > > ipmi_msghandler acpi_pad acpi_cpufreq squashfs lz4_decompress xhci_pci > > xhci_hcd ixgbe mdio vxlan ip6_udp_tunnel udp_tunnel ptp pps_core dca > > ehci_pci ehci_hcd usbcore ahci libahci usb_common tpm_tis > > [3851435.874884] irq event stamp: 3828060294 > > [3851435.879398] hardirqs last enabled at (3828060293): > > [] 810db86c > > [3851435.10] hardirqs last disabled at (3828060294): > > [] 8170ff74 > > [3851435.898254] softirqs last enabled at (3828060156): > > [] 8107f31e > > [3851435.907666] softirqs last disabled at (3828060139): > > [] 8107f648 > > [3851435.917094] CPU: 13 PID: 2146 Comm: pocviexg Not tainted 4.4.52-grsec > > #1 > > [3851435.925252] Hardware name: Intel S2600WTTR, BIOS > > SE5C610.86B.01.01.0016.C1.033120161139 03/31/2016 > > [3851435.937473] task: 881c9ff35c00 ti: 881c9ff372b0 task.ti: > > 881c9ff372b0 > > [3851435.946104] RIP: 0010:[] [] > > try_to_wake_up+0xe4/0x500 > > [3851435.955614] RSP: :c90016beba40 EFLAGS: 0002 > > [3851435.961780] RAX: RBX: 88102cc8e3d8 RCX: > > 0001 > > [3851435.970016] RDX: 0001 RSI: 01e4 RDI: > > > > [3851435.978259] RBP: c90016beba80 R08: R09: > > 0001 > > [3851435.986493] R10: 00e0 R11: 881c9ff35c00 R12: > > 0082 > > [3851435.994727] R13: 88102cc8dc00 R14: 001d3f00 R15: > > 81f25060 > > [3851436.002962] FS: 6e79227fc700() GS:88203400() > > knlGS: > > [3851436.012266] CS: 0010 DS: ES: CR0: 80050033 > > [3851436.018920] CR2: 6e79227fba38 CR3: 00201d2fe000 CR4: > > 003626f0 > > [3851436.027155] DR0: DR1: DR2: > > > > [3851436.035391] DR3: DR6: fffe0ff0 DR7: > > 0400 > > [3851436.043626] Stack: > > [3851436.046098] 8820341ce620 8820341ce608 > > 8820341ce600 > > [3851436.054695] c90016bebb48 000d > > 881033fce600 > > [3851436.063259] c90016beba90 810adc80 c90016bebbe8 > > 8112c830 > > [3851436.071868] Call Trace: > > [3851436.074862] [] wake_up_process+0x10/0x20 > > > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/core.c#L1997 > > [3851436.081317] [] stop_two_cpus+0x1b0/0x280 > > > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/stop_machine.c#L235 > > [3851436.087778] [] ? cpu_stop_park+0x40/0x40 > > [3851436.094237] [] ? cpu_stop_park+0x40/0x40 > > [3851436.100715] [] ? > > __migrate_swap_task.part.94+0x70/0x70 > > [3851436.108539] [] migrate_swap+0xf1/0x1f0 > > > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kern
Re: sched/fair: Hard lockup from idle_balance()/task_numa_migrate() race
Ping? On Mon, Oct 16, 2017 at 4:11 PM, Ed Swierk wrote: > > Ping for Peter, Ingo and other sched maintainers: > > I'd appreciate any feedback on this hard lockup issue, which occurred > on a system running kernel 4.4.52-grsec. > > To recap: a dual-socket Xeon (E5 v4) server system had been running a > bunch of KVM workloads just fine for over 6 weeks. Suddenly hard > lockups occurred on cpu 13 in task_numa_migrate(), and cpu 0 in > idle_balance(). That condition persisted until the system was rebooted > after a couple of hours, and meanwhile all manner of soft lockups and > even SATA errors appeared. > > [3851435.62] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13 > [3851435.784582] Modules linked in: vhost_net tun vhost macvtap 8021q garp > mrp drbg ansi_cprng dm_crypt algif_skcipher af_alg ip_gre ip_tunnel gre > macvlan ebtable_filter ebtables openvswitch nf_defrag_ipv6 libcrc32c nfsd > auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ip6table_filter ip6_tables > nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 > nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables > iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp > kvm_intel kvm irqbypass crct10dif_pclmul raid10 crc32_pclmul aesni_intel > aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac lpc_ich > mfd_core mei_me mei i2c_i801 ioatdma ipmi_ssif ipmi_msghandler acpi_pad > acpi_cpufreq squashfs lz4_decompress xhci_pci xhci_hcd ixgbe mdio vxlan > ip6_udp_tunnel udp_tunnel ptp pps_core dca ehci_pci ehci_hcd usbcore ahci > libahci usb_common tpm_tis > [3851435.874884] irq event stamp: 3828060294 > [3851435.879398] hardirqs last enabled at (3828060293): [] > 810db86c > [3851435.10] hardirqs last disabled at (3828060294): [] > 8170ff74 > [3851435.898254] softirqs last enabled at (3828060156): [] > 8107f31e > [3851435.907666] softirqs last disabled at (3828060139): [] > 8107f648 > [3851435.917094] CPU: 13 PID: 2146 Comm: pocviexg Not tainted 4.4.52-grsec #1 > [3851435.925252] Hardware name: Intel S2600WTTR, BIOS > SE5C610.86B.01.01.0016.C1.033120161139 03/31/2016 > [3851435.937473] task: 881c9ff35c00 ti: 881c9ff372b0 task.ti: > 881c9ff372b0 > [3851435.946104] RIP: 0010:[] [] > try_to_wake_up+0xe4/0x500 > [3851435.955614] RSP: :c90016beba40 EFLAGS: 0002 > [3851435.961780] RAX: RBX: 88102cc8e3d8 RCX: > 0001 > [3851435.970016] RDX: 0001 RSI: 01e4 RDI: > > [3851435.978259] RBP: c90016beba80 R08: R09: > 0001 > [3851435.986493] R10: 00e0 R11: 881c9ff35c00 R12: > 0082 > [3851435.994727] R13: 88102cc8dc00 R14: 001d3f00 R15: > 81f25060 > [3851436.002962] FS: 6e79227fc700() GS:88203400() > knlGS: > [3851436.012266] CS: 0010 DS: ES: CR0: 80050033 > [3851436.018920] CR2: 6e79227fba38 CR3: 00201d2fe000 CR4: > 003626f0 > [3851436.027155] DR0: DR1: DR2: > > [3851436.035391] DR3: DR6: fffe0ff0 DR7: > 0400 > [3851436.043626] Stack: > [3851436.046098] 8820341ce620 8820341ce608 > 8820341ce600 > [3851436.054695] c90016bebb48 000d > 881033fce600 > [3851436.063259] c90016beba90 810adc80 c90016bebbe8 > 8112c830 > [3851436.071868] Call Trace: > [3851436.074862] [] wake_up_process+0x10/0x20 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/core.c#L1997 > [3851436.081317] [] stop_two_cpus+0x1b0/0x280 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/stop_machine.c#L235 > [3851436.087778] [] ? cpu_stop_park+0x40/0x40 > [3851436.094237] [] ? cpu_stop_park+0x40/0x40 > [3851436.100715] [] ? __migrate_swap_task.part.94+0x70/0x70 > [3851436.108539] [] migrate_swap+0xf1/0x1f0 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/core.c#L1408 > [3851436.114800] [] task_numa_migrate+0x345/0xb60 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L1572 > [3851436.121656] [] ? futex_wait_queue_me+0xf3/0x160 > [3851436.128801] [] numa_migrate_preferred+0x73/0x80 > > https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L1597 > [3851436.135936] [] task_numa_fault+0x7a4/0xee0 > > https://github.com/skyp
sched/fair: Hard lockup from idle_balance()/task_numa_migrate() race
Ping for Peter, Ingo and other sched maintainers: I'd appreciate any feedback on this hard lockup issue, which occurred on a system running kernel 4.4.52-grsec. To recap: a dual-socket Xeon (E5 v4) server system had been running a bunch of KVM workloads just fine for over 6 weeks. Suddenly hard lockups occurred on cpu 13 in task_numa_migrate(), and cpu 0 in idle_balance(). That condition persisted until the system was rebooted after a couple of hours, and meanwhile all manner of soft lockups and even SATA errors appeared. [3851435.62] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13 [3851435.784582] Modules linked in: vhost_net tun vhost macvtap 8021q garp mrp drbg ansi_cprng dm_crypt algif_skcipher af_alg ip_gre ip_tunnel gre macvlan ebtable_filter ebtables openvswitch nf_defrag_ipv6 libcrc32c nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul raid10 crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac lpc_ich mfd_core mei_me mei i2c_i801 ioatdma ipmi_ssif ipmi_msghandler acpi_pad acpi_cpufreq squashfs lz4_decompress xhci_pci xhci_hcd ixgbe mdio vxlan ip6_udp_tunnel udp_tunnel ptp pps_core dca ehci_pci ehci_hcd usbcore ahci libahci usb_common tpm_tis [3851435.874884] irq event stamp: 3828060294 [3851435.879398] hardirqs last enabled at (3828060293): [] 810db86c [3851435.10] hardirqs last disabled at (3828060294): [] 8170ff74 [3851435.898254] softirqs last enabled at (3828060156): [] 8107f31e [3851435.907666] softirqs last disabled at (3828060139): [] 8107f648 [3851435.917094] CPU: 13 PID: 2146 Comm: pocviexg Not tainted 4.4.52-grsec #1 [3851435.925252] Hardware name: Intel S2600WTTR, BIOS SE5C610.86B.01.01.0016.C1.033120161139 03/31/2016 [3851435.937473] task: 881c9ff35c00 ti: 881c9ff372b0 task.ti: 881c9ff372b0 [3851435.946104] RIP: 0010:[] [] try_to_wake_up+0xe4/0x500 [3851435.955614] RSP: :c90016beba40 EFLAGS: 0002 [3851435.961780] RAX: RBX: 88102cc8e3d8 RCX: 0001 [3851435.970016] RDX: 0001 RSI: 01e4 RDI: [3851435.978259] RBP: c90016beba80 R08: R09: 0001 [3851435.986493] R10: 00e0 R11: 881c9ff35c00 R12: 0082 [3851435.994727] R13: 88102cc8dc00 R14: 001d3f00 R15: 81f25060 [3851436.002962] FS: 6e79227fc700() GS:88203400() knlGS: [3851436.012266] CS: 0010 DS: ES: CR0: 80050033 [3851436.018920] CR2: 6e79227fba38 CR3: 00201d2fe000 CR4: 003626f0 [3851436.027155] DR0: DR1: DR2: [3851436.035391] DR3: DR6: fffe0ff0 DR7: 0400 [3851436.043626] Stack: [3851436.046098] 8820341ce620 8820341ce608 8820341ce600 [3851436.054695] c90016bebb48 000d 881033fce600 [3851436.063259] c90016beba90 810adc80 c90016bebbe8 8112c830 [3851436.071868] Call Trace: [3851436.074862] [] wake_up_process+0x10/0x20 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/core.c#L1997 [3851436.081317] [] stop_two_cpus+0x1b0/0x280 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/stop_machine.c#L235 [3851436.087778] [] ? cpu_stop_park+0x40/0x40 [3851436.094237] [] ? cpu_stop_park+0x40/0x40 [3851436.100715] [] ? __migrate_swap_task.part.94+0x70/0x70 [3851436.108539] [] migrate_swap+0xf1/0x1f0 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/core.c#L1408 [3851436.114800] [] task_numa_migrate+0x345/0xb60 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L1572 [3851436.121656] [] ? futex_wait_queue_me+0xf3/0x160 [3851436.128801] [] numa_migrate_preferred+0x73/0x80 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L1597 [3851436.135936] [] task_numa_fault+0x7a4/0xee0 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L2137 [3851436.142604] [] ? should_numa_migrate_memory+0x50/0x130 [3851436.150437] [] handle_mm_fault+0xfd6/0x1e10 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/mm/memory.c#L3503 [3851436.157189] [] ? handle_mm_fault+0x48/0x1e10 [3851436.164051] [] ? ___might_sleep+0x1ea/0x220 [3851436.170804] [] __do_page_fault+0x1bf/0x460 [3851436.177475] [] ? sys_futex+0x74/0x230 [3851436.183642] [] do_page_fault+0x2c/0x40 [3851436.189909] [] page_fault+0x28/0x30 [3851
Re: Which came first, hard kernel lockup or SATA errors?
Continuing the conversation with the voices in my head... On Mon, Oct 9, 2017 at 10:45 PM, Ed Swierk wrote: > Based on the addresses in the stack and registers, here's what I think > happened. > > On cpu 13: > > - task_numa_fault() calls task_numa_migrate(), which selects the task > on cpu 0 as the dst_task. > - migrate_swap() calls stop_two_cpus(), which acquires the cpu_stopper > locks for the dst_cpu (cpu 0, at 0x881033fce600) and src_cpu > (cpu X, at 0x8820341ce600). > - stop_two_cpus() calls wake_up_process() on the lower-numbered cpu > first, which has to be cpu 0. > - wake_up_process() spins until the cpu 0 task (at 0x88102cc8dc00) > is no longer on_cpu. > > On cpu 0: > > - pick_next_task_fair() calls idle_balance(). According to the "This > is OK" comment, current is on_cpu at this point. > - idle_balance() calls load_balance() for dst_cpu 0. > - load_balance() decides to move a task from cpu X, so calls > stop_one_cpu_nowait() on cpu X. > - stop_one_cpu_nowait() spins trying to acquire the cpu_stopper lock > for cpu X (at 0x8820341ce600). > > So idle_balance() on cpu 0 is stuck waiting for task_numa_fault() to > move a task to cpu 0, which is blocked on idle_balance() completing. Also, it appears that task_numa_fault() tries to migrate current, so the src_cpu X used by task_numa_migrate() is cpu 13 in this case. Though the key issue is that both task_numa_migrate() and idle_balance() are trying to stop the same cpu, regardless of whether it's the cpu task_numa_migrate() is running on. So I'm wondering how this situation could be prevented. Can task_numa_migrate() avoid picking a dst_task that might itself try to stop either src_cpu or dst_cpu? Or, can load_balance() avoid a cpu that might be stopped for migration (or any other reason), or detect such a conflict and bail out rather than spinning forever? --Ed
Re: Which came first, hard kernel lockup or SATA errors?
On Fri, Oct 6, 2017 at 6:25 PM, Ed Swierk wrote: > I'm trying to untangle a series of problems that suddenly occurred on > a dual-socket Xeon server system that had been running a bunch of KVM > workloads just fine for over 6 weeks (4.4.52-grsec kernel, > Debian-derived userspace). I think I've narrowed the hard lockup to a race between task_numa_migrate() and idle_balance(). Here are the two lockup messages again, this time with approximate code references. [3851435.62] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13 [3851435.784582] Modules linked in: vhost_net tun vhost macvtap 8021q garp mrp drbg ansi_cprng dm_crypt algif_skcipher af_alg ip_gre ip_tunnel gre macvlan ebtable_filter ebtables openvswitch nf_defrag_ipv6 libcrc32c nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul raid10 crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac lpc_ich mfd_core mei_me mei i2c_i801 ioatdma ipmi_ssif ipmi_msghandler acpi_pad acpi_cpufreq squashfs lz4_decompress xhci_pci xhci_hcd ixgbe mdio vxlan ip6_udp_tunnel udp_tunnel ptp pps_core dca ehci_pci ehci_hcd usbcore ahci libahci usb_common tpm_tis [3851435.874884] irq event stamp: 3828060294 [3851435.879398] hardirqs last enabled at (3828060293): [] 810db86c [3851435.10] hardirqs last disabled at (3828060294): [] 8170ff74 [3851435.898254] softirqs last enabled at (3828060156): [] 8107f31e [3851435.907666] softirqs last disabled at (3828060139): [] 8107f648 [3851435.917094] CPU: 13 PID: 2146 Comm: pocviexg Not tainted 4.4.52-grsec #1 [3851435.925252] Hardware name: Intel S2600WTTR, BIOS SE5C610.86B.01.01.0016.C1.033120161139 03/31/2016 [3851435.937473] task: 881c9ff35c00 ti: 881c9ff372b0 task.ti: 881c9ff372b0 [3851435.946104] RIP: 0010:[] [] try_to_wake_up+0xe4/0x500 [3851435.955614] RSP: :c90016beba40 EFLAGS: 0002 [3851435.961780] RAX: RBX: 88102cc8e3d8 RCX: 0001 [3851435.970016] RDX: 0001 RSI: 01e4 RDI: [3851435.978259] RBP: c90016beba80 R08: R09: 0001 [3851435.986493] R10: 00e0 R11: 881c9ff35c00 R12: 0082 [3851435.994727] R13: 88102cc8dc00 R14: 001d3f00 R15: 81f25060 [3851436.002962] FS: 6e79227fc700() GS:88203400() knlGS: [3851436.012266] CS: 0010 DS: ES: CR0: 80050033 [3851436.018920] CR2: 6e79227fba38 CR3: 00201d2fe000 CR4: 003626f0 [3851436.027155] DR0: DR1: DR2: [3851436.035391] DR3: DR6: fffe0ff0 DR7: 0400 [3851436.043626] Stack: [3851436.046098] 8820341ce620 8820341ce608 8820341ce600 [3851436.054695] c90016bebb48 000d 881033fce600 [3851436.063259] c90016beba90 810adc80 c90016bebbe8 8112c830 [3851436.071868] Call Trace: [3851436.074862] [] wake_up_process+0x10/0x20 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/core.c#L1997 [3851436.081317] [] stop_two_cpus+0x1b0/0x280 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/stop_machine.c#L235 [3851436.087778] [] ? cpu_stop_park+0x40/0x40 [3851436.094237] [] ? cpu_stop_park+0x40/0x40 [3851436.100715] [] ? __migrate_swap_task.part.94+0x70/0x70 [3851436.108539] [] migrate_swap+0xf1/0x1f0 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/core.c#L1408 [3851436.114800] [] task_numa_migrate+0x345/0xb60 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L1572 [3851436.121656] [] ? futex_wait_queue_me+0xf3/0x160 [3851436.128801] [] numa_migrate_preferred+0x73/0x80 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L1597 [3851436.135936] [] task_numa_fault+0x7a4/0xee0 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/kernel/sched/fair.c#L2137 [3851436.142604] [] ? should_numa_migrate_memory+0x50/0x130 [3851436.150437] [] handle_mm_fault+0xfd6/0x1e10 https://github.com/skyportsystems/linux/blob/4.4.52-grsec/mm/memory.c#L3503 [3851436.157189] [] ? handle_mm_fault+0x48/0x1e10 [3851436.164051] [] ? ___might_sleep+0x1ea/0x220 [3851436.170804] [] __do_page_fault+0x1bf/0x460 [3851436.177475] [] ? sys_futex+0x74/0x230 [3851436.183642] [] do_page_fault+0x2c/0x40 [3851436.189909] [] page_fault+0x28/0x30 [3851436.195892] Code: 0f 85 e0 02 00 00 41 8
Which came first, hard kernel lockup or SATA errors?
I'm trying to untangle a series of problems that suddenly occurred on a dual-socket Xeon server system that had been running a bunch of KVM workloads just fine for over 6 weeks (4.4.52-grsec kernel, Debian-derived userspace). Here are the highlights, with timestamps in seconds: [3851435] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13 [3851438] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 [3851450] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [pocviexg:2136] [3851450] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [CPU 0/KVM:9662] [3851450] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [CPU 1/KVM:10302] [3851478] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [pocviexg:2136] [3851478] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [CPU 0/KVM:9662] [3851478] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [CPU 1/KVM:10302] [3851482] BUG: spinlock lockup suspected on CPU#0, migration/0/10 [3851482] Sending NMI to all CPUs: [3851482] NMI backtrace for cpu 0 [3851483] NMI backtrace for cpu 1 [3851483] NMI backtrace for cpu 2 [3851483] NMI backtrace for cpu 3 [3851483] NMI backtrace for cpu 4 [3851484] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ** [3851484] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ** [3851484] NMI backtrace for cpu 5 [3851484] NMI backtrace for cpu 6 ... [3851489] NMI backtrace for cpu 24 [3851489] ata8.00: qc timeout (cmd 0xec) ** [3851489] ata7.00: qc timeout (cmd 0xec) ** [3851489] NMI backtrace for cpu 25 ... [3851490] NMI backtrace for cpu 31 [3851499] ata8.00: qc timeout (cmd 0xec) ** [3851499] ata7.00: qc timeout (cmd 0xec) ** [3851510] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [CPU 0/KVM:9662] ...and so on for 2 more hours, until the system was finally rebooted First I want to know if I can blame the whole mess on flaky hardware. The error messages marked with ** are from IO operations on two SATA SSDs. On one hand, getting errors from two different drives at the exact same time suggests that the drives themselves aren't the problem. On the other hand, the fact that the system has two other drives connected to the same controller that continued to work (even though all were being accessed more or less simultaneously in a software RAID-10 configuration) suggests that the controller isn't totally broken. (The controller is embedded in the C610 PCH on the motherboard.) No SATA errors have occurred on that system before or since. Is there any chance that those SATA errors could have been caused by the kernel lockups--say due to a timing-sensitive interrupt not getting serviced? Or by any other conceivable software problem? If not, is it plausible that a hardware problem with the SATA controller or drives caused the kernel lockups, even though the first hard lockups were reported 48 seconds before the first SATA error messages? Following is an excerpt of the console log. Let me know if any other details would be helpful. [3851435.62] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13 [3851435.784582] Modules linked in: vhost_net tun vhost macvtap 8021q garp mrp drbg ansi_cprng dm_crypt algif_skcipher af_alg ip_gre ip_tunnel gre macvlan ebtable_filter ebtables openvswitch nf_defrag_ipv6 libcrc32c nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul raid10 crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac lpc_ich mfd_core mei_me mei i2c_i801 ioatdma ipmi_ssif ipmi_msghandler acpi_pad acpi_cpufreq squashfs lz4_decompress xhci_pci xhci_hcd ixgbe mdio vxlan ip6_udp_tunnel udp_tunnel ptp pps_core dca ehci_pci ehci_hcd usbcore ahci libahci usb_common tpm_tis [3851435.874884] irq event stamp: 3828060294 [3851435.879398] hardirqs last enabled at (3828060293): [] 810db86c [3851435.10] hardirqs last disabled at (3828060294): [] 8170ff74 [3851435.898254] softirqs last enabled at (3828060156): [] 8107f31e [3851435.907666] softirqs last disabled at (3828060139): [] 8107f648 [3851435.917094] CPU: 13 PID: 2146 Comm: pocviexg Not tainted 4.4.52-grsec #1 [3851435.925252] Hardware name: Intel S2600WTTR, BIOS SE5C610.86B.01.01.0016.C1.033120161139 03/31/2016 [3851435.937473] task: 881c9ff35c00 ti: 881c9ff372b0 task.ti: 881c9ff372b0 [3851435.946104] RIP: 0010:[] [] try_to_wake_up+0xe4/0x500 [3851435.955614] RSP: :c90016beba40 EFLAGS: 0002 [3851435.961780] RAX: RBX: 88102cc8e3d8 RCX: 0001 [3851435.970016] RDX: 0001 RSI: 01e4 RDI: [3851435.978259] RBP: c90016beba80 R08:
[PATCH] thermal: intel_pch_thermal: Read large temp values correctly
On all supported platforms, the TS Reading (TSR) field in the Temperature (TEMP) register is 9 bits wide. Values above 0x100 (78 degrees C) are plausible, so don't mask out the topmost bit. And the register itself is 16 bits wide, so use readw() rather than readl(). Signed-off-by: Ed Swierk --- drivers/thermal/intel_pch_thermal.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/thermal/intel_pch_thermal.c b/drivers/thermal/intel_pch_thermal.c index 9889c90..318fc1b 100644 --- a/drivers/thermal/intel_pch_thermal.c +++ b/drivers/thermal/intel_pch_thermal.c @@ -49,7 +49,7 @@ #define WPT_TSGPEN 0x84/* General Purpose Event Enables */ /* Wildcat Point-LP PCH Thermal Register bit definitions */ -#define WPT_TEMP_TSR 0x00ff /* Temp TS Reading */ +#define WPT_TEMP_TSR 0x01ff /* Temp TS Reading */ #define WPT_TSC_CPDE 0x01/* Catastrophic Power-Down Enable */ #define WPT_TSS_TSDSS 0x10/* Thermal Sensor Dynamic Shutdown Status */ #define WPT_TSS_GPES 0x08/* GPE status */ @@ -174,9 +174,9 @@ static int pch_wpt_init(struct pch_thermal_device *ptd, int *nr_trips) static int pch_wpt_get_temp(struct pch_thermal_device *ptd, int *temp) { - u8 wpt_temp; + u16 wpt_temp; - wpt_temp = WPT_TEMP_TSR & readl(ptd->hw_base + WPT_TEMP); + wpt_temp = WPT_TEMP_TSR & readw(ptd->hw_base + WPT_TEMP); /* Resolution of 1/2 degree C and an offset of -50C */ *temp = (wpt_temp * 1000 / 2 - 5); -- 1.9.1
[PATCH] thermal: intel_pch_thermal: Fix enable check on Broadwell-DE
Using the TSDSS flag to determine whether the thermal sensor is enabled is problematic. Broadwell-DE (Xeon D-1500) does not support dynamic shutdown and the TSDSS flag always reads 0 (contrary to the current datasheet). Even on hardware supporting dynamic shutdown, the driver does nothing to configure it, and the dynamic shutdown state should not prevent the driver from loading. The ETS flag itself indicates whether the thermal sensor is enabled, so use it instead of the TSDSS flag on all hardware platforms. Signed-off-by: Ed Swierk --- drivers/thermal/intel_pch_thermal.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/thermal/intel_pch_thermal.c b/drivers/thermal/intel_pch_thermal.c index 2b49e8d..9889c90 100644 --- a/drivers/thermal/intel_pch_thermal.c +++ b/drivers/thermal/intel_pch_thermal.c @@ -125,7 +125,7 @@ static int pch_wpt_init(struct pch_thermal_device *ptd, int *nr_trips) *nr_trips = 0; /* Check if BIOS has already enabled thermal sensor */ - if (WPT_TSS_TSDSS & readb(ptd->hw_base + WPT_TSS)) { + if (WPT_TSEL_ETS & readb(ptd->hw_base + WPT_TSEL)) { ptd->bios_enabled = true; goto read_trips; } @@ -141,7 +141,7 @@ static int pch_wpt_init(struct pch_thermal_device *ptd, int *nr_trips) } writeb(tsel|WPT_TSEL_ETS, ptd->hw_base + WPT_TSEL); - if (!(WPT_TSS_TSDSS & readb(ptd->hw_base + WPT_TSS))) { + if (!(WPT_TSEL_ETS & readb(ptd->hw_base + WPT_TSEL))) { dev_err(&ptd->pdev->dev, "Sensor can't be enabled\n"); return -ENODEV; } -- 1.9.1
Disable all network protocols on an interface?
I have a Linux kernel 4.4 system hosting a number of kvm VMs. Physical interface eth0 connects to an 802.1Q trunk port on an external switch. Each VM has a virtual interface (e1000 or virtio-net) connected to the physical NIC through a macvtap interface and a VLAN interface; traffic between the external switch and the host is tagged with a per-VM tag. The only logic is demultiplexing incoming traffic by VLAN tag and stripping the tag, and adding the tag for outgoing traffic. Other than that, the eth0-VM datapath is a dumb pipe. eth0 is assigned an IP address for host applications to send and receive untagged packets. For example, here's the setup with 2 VMs. +- (untagged) 192.168.0.2 eth0 -+- (tag 1) --- eth0.1 --- macvtap1 --- VM1 +- (tag 2) --- eth0.2 --- macvtap2 --- VM2 Various iptables rules filter the untagged packets received for host applications. The last rule in the INPUT chain logs incoming packets that don't match earlier rules: -A INPUT -m limit --limit 10/min -j LOG --log-prefix FilterInput This all works, but I see occasional FilterInput messages for traffic received on eth0.1 and eth0.2: so far, only DHCP packets with destination MAC address ff:ff:ff:ff:ff:ff. FilterInput IN=eth0.1 OUT= MAC=ff:ff:ff:ff:ff:ff:00:01:02:03:04:05:08:00 SRC=0.0.0.0 DST=255.255.255.255 LEN=328 TOS=0x10 PREC=0x00 TTL=128 ID=0 PROTO=UDP SPT=68 DPT=67 LEN=308 Even though these are IP packets, I naively expect packets received on the VLAN interface lacking IP address to be either consumed by the attached macvtap or dropped before they trigger an iptables filter INPUT rule. It's a bit alarming to see packets destined for a VM to be processed at all by the host IP stack. Digging through the code, I find that the core packet receive function __netif_receive_skb_core() first gives master devices like bridges and macvlans/macvtaps a chance to consume the packet; otherwise the packet gets handled by all installed protocols like IPv4. The packet gets pretty far down the IP receive process before it's discovered that there's nowhere to route it to, and no local sockets to deliver it to. The iptables INPUT chain is invoked well before that happens. (As far as I can tell, there's no explicit check in the IP receive code whether a local interface has an IP address.) The macvlan's rx_handler definitively consumes or drops unicast packets, depending on the destination MAC address. But for broadcast packets, it passes the packet to the attached VM interface and also tells the core receive function to continue processing it. Presumably this is to allow a macvlan to attach to one or more VMs as well as have a local IP address. The logic in the bridge driver is a bit different: it consumes all packets from the slave interface. This makes sense as only the bridge master interface can be assigned a local IP address. However in my application, I'm setting up the macvtap interfaces in passthrough mode, which precludes assigning a local IP address, just like a bridge slave. So it stands to reason that for a macvlan in passthrough mode, its rx_handler should consume or drop all packets, and not allow broadcast packets to also be handled locally. This one-line change seems to do the trick: --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -411,7 +411,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb) rx_handler_result_t handle_res; port = macvlan_port_get_rcu(skb->dev); - if (is_multicast_ether_addr(eth->h_dest)) { + if (is_multicast_ether_addr(eth->h_dest) && !port->passthru) { skb = ip_check_defrag(dev_net(skb->dev), skb, IP_DEFRAG_MACVLAN); if (!skb) return RX_HANDLER_CONSUMED; Well, mostly. I still see FilterInput log messages in the brief window between creating the VLAN interface and attaching the macvtap to it, since there's no rx_handler to consume them. Hooking the VLAN interface to a bridge rather than a macvtap suppresses local IP processing on the slave but enables it on the bridge master interface. Apparently any non-slave interface can handle IP traffic to some extent, even if it doesn't have an IP address. I worry that allowing any IP processing at all on eth0-VM traffic is a potential security hole, and I'm one configuration typo away from letting VM's traffic leak into another VM or a host application, and vice versa. And logging those FilterInput messages for non-local traffic just looks like sloppy security. Is there some way to stop all local protocols from handling packets received on an interface--a protocol-agnostic equivalent of net.ipv6.conf.INTF.disable_ipv6? Would it be reasonable to implement one? --Ed
Re: [PATCH v2 00/11] staging: octeon: multi rx group (queue) support
On 8/31/16 13:57, Aaro Koskinen wrote: > This series implements multiple RX group support that should improve > the networking performance on multi-core OCTEONs. Basically we register > IRQ and NAPI for each group, and ask the HW to select the group for > the incoming packets based on hash. > > Tested on EdgeRouter Lite with a simple forwarding test using two flows > and 16 RX groups distributed between two cores - the routing throughput > is roughly doubled. > > Also tested with EBH5600 (8 cores) and EBB6800 (16 cores) by sending > and receiving traffic in both directions using SGMII interfaces. With this series on 4.4.19, rx works with receive_group_order > 0. Setting receive_group_order=4, I do see 16 Ethernet interrupts. I tried fiddling with various smp_affinity values (e.g. setting them all to , or assigning a different one to each interrupt, or giving a few to some and a few to others), as well as different values for rps_cpus. 10-thread parallel iperf performance varies between 0.5 and 1.5 Gbit/sec total depending on the particular settings. With the SDK kernel I get over 8 Gbit/sec. It seems to be achieving that using just one interrupt (not even a separate one for tx, as far as I can tell) pegged to CPU 0 (the default smp_affinity). I must be missing some other major configuration tweak, perhaps specific to 10G. Can you run a test on the EBB6800 with the interfaces in 10G mode? --Ed
Re: [PATCH 0/9] staging: octeon: multi rx group (queue) support
On 8/31/16 14:20, Aaro Koskinen wrote: > On Wed, Aug 31, 2016 at 09:20:07AM -0700, Ed Swierk wrote: >> Here's my workaround: > > [...] > >> -static int cvm_oct_poll(struct oct_rx_group *rx_group, int budget) >> +static int cvm_oct_poll(int group, int budget) >> { >> const int coreid = cvmx_get_core_num(); >> u64 old_group_mask; >> @@ -181,13 +181,13 @@ static int cvm_oct_poll(struct oct_rx_group *rx_group, >> int budget) >> if (OCTEON_IS_MODEL(OCTEON_CN68XX)) { >> old_group_mask = cvmx_read_csr(CVMX_SSO_PPX_GRP_MSK(coreid)); >> cvmx_write_csr(CVMX_SSO_PPX_GRP_MSK(coreid), >> - BIT(rx_group->group)); >> + BIT(group)); >> @@ -447,7 +447,7 @@ static int cvm_oct_napi_poll(struct napi_struct *napi, >> int budget) >> napi); >> int rx_count; >> >> -rx_count = cvm_oct_poll(rx_group, budget); >> +rx_count = cvm_oct_poll(rx_group->group, budget); > > I'm confused - there should be no difference?! I can't figure out the difference either. I get a crash within the first couple packets, while with the workaround I can't get it to crash at all. It always bombs in netif_receive_skb(), which isn't very close to any rx_group pointer dereference. # ping 172.16.100.253 PING 172.16.100.253 (172.16.100.253): 56 data bytes Data bus error, epc == 803fd4ac, ra == 801943d8 Oops[#1]: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.19+ #94 task: 80863e80 ti: 8084 task.ti: 8084 $ 0 : 80126078 ef7bdef7bdef7bdf 815d3860 $ 4 : 80e045c8 81aae950 81aae950 $ 8 : 81aae950 0038 0070 03bf $12 : 5400 03bd $16 : 81aae950 81aae950 80e045c8 $20 : 00fa 0001 5c02e0fa $24 : 0062 80548468 $28 : 8084 808436d0 80feba38 801943d8 Hi: Lo: 05198e3760c0 epc : 803fd4ac __list_add_rcu+0x7c/0xa0 ra: 801943d8 __lock_acquire+0xd94/0x1bf0 Status: 10008ce2KX SX UX KERNEL EXL Cause : 40808c1c (ExcCode 07) PrId : 000d910a (Cavium Octeon II) Modules linked in: Process swapper/0 (pid: 0, threadinfo=8084, task=80863e80, tls=) Stack : 80863e80 808646c8 81aae950 801943d8 00fa 808646c0 0002 8057ab90 80864690 80870990 0001 0017 80193e08 0017 80864688 0001 8057ab90 808a7d28 80007f4b7500 80007a0b52e8 0001 807f 80007f768068 8085fac8 8019568c 808a7d10 80645e60 80007f4a8600 0254 808a7d58 8057ab90 0008 80007f7680a0 ... Call Trace: [<__list_add_rcu at list_debug.c:97 (discriminator 2)>] __list_add_rcu+0x7c/0xa0 [] __lock_acquire+0xd94/0x1bf0 [] lock_acquire+0x50/0x78 [<__raw_read_lock at rwlock_api_smp.h:150 (inlined by) _raw_read_lock at spinlock.c:223>] _raw_read_lock+0x4c/0x90 [] raw_local_deliver+0x58/0x1e8 [] ip_local_deliver_finish+0x118/0x4a8 [] ip_local_deliver+0x68/0xe0 [] ip_rcv+0x398/0x478 [<__netif_receive_skb_core at dev.c:3948>] __netif_receive_skb_core+0x764/0x818 [] netif_receive_skb_internal+0x148/0x214 [] cvm_oct_napi_poll+0x790/0xa2c [] net_rx_action+0x130/0x2e0 [] __do_softirq+0x1f0/0x318 [] irq_exit+0x64/0xcc [] octeon_irq_ciu2+0x154/0x1c4 [] plat_irq_dispatch+0x70/0x108 [] ret_from_irq+0x0/0x4 [] __r4k_wait+0x20/0x40 [] cpu_startup_entry+0x154/0x1d0 [] start_kernel+0x538/0x554 Presumably there's some sort of race condition that my change doesn't really fix but happens to avoid by dereferencing rx_group just once early on? --Ed
Re: [PATCH 0/9] staging: octeon: multi rx group (queue) support
Aaro Koskinen wrote: > Oops, looks like I tested without CONFIG_NET_POLL_CONTROLLER enabled > and that seems to be broken. Sorry. I'm not using CONFIG_NET_POLL_CONTROLLER either; the problem is in the normal cvm_oct_napi_poll() path. Here's my workaround: --- a/drivers/staging/octeon/ethernet-rx.c +++ b/drivers/staging/octeon/ethernet-rx.c @@ -159,7 +159,7 @@ static inline int cvm_oct_check_rcv_error(cvmx_wqe_t *work) return 0; } -static int cvm_oct_poll(struct oct_rx_group *rx_group, int budget) +static int cvm_oct_poll(int group, int budget) { const int coreid = cvmx_get_core_num(); u64 old_group_mask; @@ -181,13 +181,13 @@ static int cvm_oct_poll(struct oct_rx_group *rx_group, int budget) if (OCTEON_IS_MODEL(OCTEON_CN68XX)) { old_group_mask = cvmx_read_csr(CVMX_SSO_PPX_GRP_MSK(coreid)); cvmx_write_csr(CVMX_SSO_PPX_GRP_MSK(coreid), - BIT(rx_group->group)); + BIT(group)); cvmx_read_csr(CVMX_SSO_PPX_GRP_MSK(coreid)); /* Flush */ } else { old_group_mask = cvmx_read_csr(CVMX_POW_PP_GRP_MSKX(coreid)); cvmx_write_csr(CVMX_POW_PP_GRP_MSKX(coreid), (old_group_mask & ~0xull) | - BIT(rx_group->group)); + BIT(group)); } if (USE_ASYNC_IOBDMA) { @@ -212,15 +212,15 @@ static int cvm_oct_poll(struct oct_rx_group *rx_group, int budget) if (!work) { if (OCTEON_IS_MODEL(OCTEON_CN68XX)) { cvmx_write_csr(CVMX_SSO_WQ_IQ_DIS, - BIT(rx_group->group)); + BIT(group)); cvmx_write_csr(CVMX_SSO_WQ_INT, - BIT(rx_group->group)); + BIT(group)); } else { union cvmx_pow_wq_int wq_int; wq_int.u64 = 0; - wq_int.s.iq_dis = BIT(rx_group->group); - wq_int.s.wq_int = BIT(rx_group->group); + wq_int.s.iq_dis = BIT(group); + wq_int.s.wq_int = BIT(group); cvmx_write_csr(CVMX_POW_WQ_INT, wq_int.u64); } break; @@ -447,7 +447,7 @@ static int cvm_oct_napi_poll(struct napi_struct *napi, int budget) napi); int rx_count; - rx_count = cvm_oct_poll(rx_group, budget); + rx_count = cvm_oct_poll(rx_group->group, budget); if (rx_count < budget) { /* No more work */ @@ -466,7 +466,7 @@ static int cvm_oct_napi_poll(struct napi_struct *napi, int budget) */ void cvm_oct_poll_controller(struct net_device *dev) { - cvm_oct_poll(oct_rx_group, 16); + cvm_oct_poll(oct_rx_group->group, 16); } #endif > Can you see multiple ethernet IRQs in /proc/interrupts and their > counters increasing? > > With receive_group_order=4 you should see 16 IRQs. I see the 16 IRQs, and the first one does increase. But packets don't make it to the application. --Ed
Re: [PATCH 0/9] staging: octeon: multi rx group (queue) support
Hi Aaro, On Tue, Aug 30, 2016 at 11:47 AM, Aaro Koskinen wrote: > This series implements multiple RX group support that should improve > the networking performance on multi-core OCTEONs. Basically we register > IRQ and NAPI for each group, and ask the HW to select the group for > the incoming packets based on hash. > > Tested on EdgeRouter Lite with a simple forwarding test using two flows > and 16 RX groups distributed between two cores - the routing throughput > is roughly doubled. I applied the series to my 4.4.19 tree, which involved backporting a bunch of other patches from master, most of them trivial. When I test it on a Cavium Octeon 2 (CN6880) board, I get an immediate crash (bus error) in the netif_receive_skb() call from cvm_oct_poll(). Replacing the rx_group argument to cvm_oct_poll() with int group, and dereferencing rx_group->group in the caller (cvm_oct_napi_poll()) instead makes the crash disappear. Apparently there's some race in dereferencing rx_group from within cvm_oct_poll(). With this workaround in place, I can send and receive on XAUI interfaces, but don't see any performance improvement. I'm guessing I need to set receive_group_order > 0. But any value between 1 and 4 seems to break rx altogether. When I ping another host I see both request and response on the wire, and the interface counters increase, but the response doesn't make it back to ping. Is some other configuration needed to make use of multiple rx groups? --Ed
Re: [PATCH] xenbus: don't look up transaction IDs for ordinary writes
On Tue, Aug 16, 2016 at 3:07 AM, Juergen Gross wrote: > On 15/08/16 17:02, Jan Beulich wrote: >> This should really only be done for XS_TRANSACTION_END messages, or >> else at least some of the xenstore-* tools don't work anymore. >> >> Fixes: 0beef634b8 ("xenbus: don't BUG() on user mode induced condition") >> Reported-by: Richard Schütz >> Cc: >> Signed-off-by: Jan Beulich >> Tested-by: Richard Schütz > > Reviewed-by: Juergen Gross Confirmed that this patch fixes xenstore-read, xenstore-ls et al, on stable kernel 4.4.17. It should also go to 4.7, 4.6, 4.1, 3.18. --Ed
Re: [PATCH v9 0/5] tpm: Command duration logging and chip-specific override
On Wed, Jul 13, 2016 at 10:36 AM, Jason Gunthorpe wrote: > I think your bios is broken? The BIOS is broken in many ways. I already have to pass memmap=256M$0x8000, otherwise PCIe extended config space (MMCONFIG) is inaccessible. Also I found memmap=0x7000$0x7a7d works around "APEI: Can not request [mem 0x7a7d0018-0x7a7d0067] for APEI ERST registers", as the BIOS seems to be mistakenly reserving 0x7b7d-7b7d7000 instead. > A working BIOS will look like this: > > $ cat /proc/iomem | grep -i fed400 > fed4-fed44fff : pnp 00:00 > > It sets aside the struct resource during pnp: > > [0.097318] pnp: PnP ACPI init > [0.097366] system 00:00: [mem 0xfed4-0xfed44fff] has been reserved > > What did your system do? > > You should see prints like this: > > printk(KERN_DEBUG >"e820: reserve RAM buffer [mem %#010llx-%#010llx]\n", >start, end); > > Which only happen if E820_RAM is set, which is certainly not right for > TPM memory. On my system I see e820: reserve RAM buffer [mem 0x0009b000-0x0009] e820: reserve RAM buffer [mem 0x75b02000-0x77ff] e820: reserve RAM buffer [mem 0x7a1f6000-0x7bff] e820: reserve RAM buffer [mem 0x7b80-0x7bff] e820: reserve RAM buffer [mem 0xfed3-0x] e820: reserve RAM buffer [mem 0x505deb000-0x507ff] which doesn't make a whole lot of sense, as several of those areas overlap each other, never mind devices. > I don't know what kernel convention is to handle these sorts of > defects? > > Is the use of the memmap kernel command line an appropriate work > around? It works for me, though I would like to know if there's another approach. --Ed
Re: [PATCH v9 0/5] tpm: Command duration logging and chip-specific override
On Wed, Jul 13, 2016 at 9:19 AM, Ed Swierk wrote: > v9: Include command duration in existing error messages rather than > logging an extra debug message. Rebase onto Jarkko's tree. Incidentally, with Jarkko's tree the tpm_tis module refuses to initialize (with or without force=1): tpm_tis 00:03: can't request region for resource [mem 0xfed4-0xfed44fff] tpm_tis: probe of 00:03 failed with error -16 The memory region is not marked reserved by the BIOS: e820: BIOS-provided physical RAM map: Xen: [mem 0x-0x0005] usable Xen: [mem 0x0006-0x00067fff] reserved Xen: [mem 0x00068000-0x0009afff] usable Xen: [mem 0x000a-0x000f] reserved Xen: [mem 0x0010-0x75b01fff] usable Xen: [mem 0x75baf000-0x7a1f5fff] usable Xen: [mem 0x7b7d7000-0x7b7f] usable Xen: [mem 0x7bf0-0x7bff] usable Xen: [mem 0xc7ffc000-0xc7ffcfff] reserved Xen: [mem 0xfbffc000-0xfbffcfff] reserved Xen: [mem 0xfec0-0xfec01fff] reserved Xen: [mem 0xfec4-0xfec40fff] reserved Xen: [mem 0xfed2-0xfed2] usable Xen: [mem 0xfee0-0xfeef] reserved Xen: [mem 0x0001-0x000505deafff] usable With force=1, /proc/iomem shows: fed3-fedf : RAM buffer fed4-fed44fff : tpm_tis I can work around this problem by passing memmap=0x5000$0xfed4 on the kernel command line. --Ed
[PATCH v9 2/5] tpm: Add optional logging of TPM command durations
Some TPMs violate their own advertised command durations. This is much easier to debug with data about how long each command actually takes to complete. Add debug messages that can be enabled by running echo -n 'module tpm +p' >/sys/kernel/debug/dynamic_debug/control on a kernel configured with DYNAMIC_DEBUG=y. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm-interface.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index 5e3c1b6..a4beb53 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -335,13 +335,14 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, { ssize_t rc; u32 count, ordinal; - unsigned long stop; + unsigned long start, stop; if (bufsiz > TPM_BUFSIZE) bufsiz = TPM_BUFSIZE; count = be32_to_cpu(*((__be32 *) (buf + 2))); ordinal = be32_to_cpu(*((__be32 *) (buf + 6))); + dev_dbg(&chip->dev, "starting command %d count %d\n", ordinal, count); if (count == 0) return -ENODATA; if (count > bufsiz) { @@ -362,18 +363,23 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, if (chip->flags & TPM_CHIP_FLAG_IRQ) goto out_recv; + start = jiffies; if (chip->flags & TPM_CHIP_FLAG_TPM2) - stop = jiffies + tpm2_calc_ordinal_duration(chip, ordinal); + stop = start + tpm2_calc_ordinal_duration(chip, ordinal); else - stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal); + stop = start + tpm_calc_ordinal_duration(chip, ordinal); do { u8 status = chip->ops->status(chip); if ((status & chip->ops->req_complete_mask) == - chip->ops->req_complete_val) + chip->ops->req_complete_val) { + dev_dbg(&chip->dev, "completed command %d in %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); goto out_recv; + } if (chip->ops->req_canceled(chip, status)) { - dev_err(&chip->dev, "Operation Canceled\n"); + dev_err(&chip->dev, "canceled command %d after %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); rc = -ECANCELED; goto out; } @@ -383,7 +389,8 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, } while (time_before(jiffies, stop)); chip->ops->cancel(chip); - dev_err(&chip->dev, "Operation Timed out\n"); + dev_err(&chip->dev, "command %d timed out after %d ms\n", ordinal, + jiffies_to_msecs(jiffies - start)); rc = -ETIME; goto out; -- 1.9.1
[PATCH v9 3/5] tpm: Clean up reading of timeout and duration capabilities
Call tpm_getcap() from tpm_get_timeouts() to eliminate redundant code. Return all errors to the caller rather than swallowing them (e.g. when tpm_transmit_cmd() returns nonzero). Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm-interface.c | 74 +++- 1 file changed, 27 insertions(+), 47 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index a4beb53..dc492ee 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -460,9 +460,19 @@ ssize_t tpm_getcap(struct tpm_chip *chip, __be32 subcap_id, cap_t *cap, tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); tpm_cmd.params.getcap_in.subcap = subcap_id; } + rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, desc); + + if (!rc && + ((subcap_id == TPM_CAP_PROP_TIS_TIMEOUT && + be32_to_cpu(tpm_cmd.header.out.length) != TPM_HEADER_SIZE + 20) || +(subcap_id == TPM_CAP_PROP_TIS_DURATION && + be32_to_cpu(tpm_cmd.header.out.length) != TPM_HEADER_SIZE + 16))) + rc = -EINVAL; + if (!rc) *cap = tpm_cmd.params.getcap_out.cap; + return rc; } @@ -503,10 +513,9 @@ static int tpm_startup(struct tpm_chip *chip, __be16 startup_type) int tpm_get_timeouts(struct tpm_chip *chip) { - struct tpm_cmd_t tpm_cmd; + cap_t cap; unsigned long new_timeout[4]; unsigned long old_timeout[4]; - struct duration_t *duration_cap; ssize_t rc; if (chip->flags & TPM_CHIP_FLAG_TPM2) { @@ -524,42 +533,25 @@ int tpm_get_timeouts(struct tpm_chip *chip) return 0; } - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL); - + rc = tpm_getcap(chip, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + "attempting to determine the timeouts"); if (rc == TPM_ERR_INVALID_POSTINIT) { /* The TPM is not started, we are the first to talk to it. Execute a startup command. */ - dev_info(&chip->dev, "Issuing TPM_STARTUP"); + dev_info(&chip->dev, "Issuing TPM_STARTUP\n"); if (tpm_startup(chip, TPM_ST_CLEAR)) return rc; - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, - NULL); - } - if (rc) { - dev_err(&chip->dev, - "A TPM error (%zd) occurred attempting to determine the timeouts\n", - rc); - goto duration; + rc = tpm_getcap(chip, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + "attempting to determine the timeouts"); } + if (rc) + return rc; - if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 || - be32_to_cpu(tpm_cmd.header.out.length) - != sizeof(tpm_cmd.header.out) + sizeof(u32) + 4 * sizeof(u32)) - return -EINVAL; - - old_timeout[0] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.a); - old_timeout[1] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.b); - old_timeout[2] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.c); - old_timeout[3] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.d); + old_timeout[0] = be32_to_cpu(cap.timeout.a); + old_timeout[1] = be32_to_cpu(cap.timeout.b); + old_timeout[2] = be32_to_cpu(cap.timeout.c); + old_timeout[3] = be32_to_cpu(cap.timeout.d); memcpy(new_timeout, old_timeout, sizeof(new_timeout)); /* @@ -597,29 +589,17 @@ int tpm_get_timeouts(struct tpm_chip *chip) chip->timeout_c = usecs_to_jiffies(new_timeout[2]); chip->timeout_d = usecs_to_jiffies(new_timeout[3]); -duration: - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_DURATION; - - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, - "attempting to determine the durations"); + rc = tpm_getcap(chip, TPM_CAP_PROP_TIS_DURATION
[PATCH v9 4/5] tpm: Allow TPM chip drivers to override reported command durations
Some TPM chips report bogus command durations in their capabilities, just as others report incorrect timeouts. Rework tpm_get_timeouts() to allow chip drivers to override either via a single callback. Also clean up handling of TPMs that report milliseconds instead of microseconds. Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm-interface.c | 148 ++- drivers/char/tpm/tpm_tis_core.c | 32 +++-- include/linux/tpm.h | 3 +- 3 files changed, 94 insertions(+), 89 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index dc492ee..9dafc25 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -513,10 +513,10 @@ static int tpm_startup(struct tpm_chip *chip, __be16 startup_type) int tpm_get_timeouts(struct tpm_chip *chip) { - cap_t cap; - unsigned long new_timeout[4]; - unsigned long old_timeout[4]; - ssize_t rc; + cap_t cap1, cap2; + int rc; + unsigned long orig_timeout_a, orig_timeout_b, orig_timeout_c, + orig_timeout_d, orig_duration[3]; if (chip->flags & TPM_CHIP_FLAG_TPM2) { /* Fixed timeouts for TPM2 */ @@ -533,7 +533,7 @@ int tpm_get_timeouts(struct tpm_chip *chip) return 0; } - rc = tpm_getcap(chip, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + rc = tpm_getcap(chip, TPM_CAP_PROP_TIS_TIMEOUT, &cap1, "attempting to determine the timeouts"); if (rc == TPM_ERR_INVALID_POSTINIT) { /* The TPM is not started, we are the first to talk to it. @@ -542,77 +542,95 @@ int tpm_get_timeouts(struct tpm_chip *chip) if (tpm_startup(chip, TPM_ST_CLEAR)) return rc; - rc = tpm_getcap(chip, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + rc = tpm_getcap(chip, TPM_CAP_PROP_TIS_TIMEOUT, &cap1, "attempting to determine the timeouts"); } if (rc) return rc; - old_timeout[0] = be32_to_cpu(cap.timeout.a); - old_timeout[1] = be32_to_cpu(cap.timeout.b); - old_timeout[2] = be32_to_cpu(cap.timeout.c); - old_timeout[3] = be32_to_cpu(cap.timeout.d); - memcpy(new_timeout, old_timeout, sizeof(new_timeout)); - - /* -* Provide ability for vendor overrides of timeout values in case -* of misreporting. -*/ - if (chip->ops->update_timeouts != NULL) - chip->timeout_adjusted = - chip->ops->update_timeouts(chip, new_timeout); - - if (!chip->timeout_adjusted) { - /* Don't overwrite default if value is 0 */ - if (new_timeout[0] != 0 && new_timeout[0] < 1000) { - int i; - - /* timeouts in msec rather usec */ - for (i = 0; i != ARRAY_SIZE(new_timeout); i++) - new_timeout[i] *= 1000; - chip->timeout_adjusted = true; - } - } - - /* Report adjusted timeouts */ - if (chip->timeout_adjusted) { - dev_info(&chip->dev, -HW_ERR "Adjusting reported timeouts: A %lu->%luus B %lu->%luus C %lu->%luus D %lu->%luus\n", -old_timeout[0], new_timeout[0], -old_timeout[1], new_timeout[1], -old_timeout[2], new_timeout[2], -old_timeout[3], new_timeout[3]); - } - - chip->timeout_a = usecs_to_jiffies(new_timeout[0]); - chip->timeout_b = usecs_to_jiffies(new_timeout[1]); - chip->timeout_c = usecs_to_jiffies(new_timeout[2]); - chip->timeout_d = usecs_to_jiffies(new_timeout[3]); - - rc = tpm_getcap(chip, TPM_CAP_PROP_TIS_DURATION, &cap, + rc = tpm_getcap(chip, TPM_CAP_PROP_TIS_DURATION, &cap2, "attempting to determine the durations"); if (rc) return rc; - chip->duration[TPM_SHORT] = - usecs_to_jiffies(be32_to_cpu(cap.duration.tpm_short)); - chip->duration[TPM_MEDIUM] = - usecs_to_jiffies(be32_to_cpu(cap.duration.tpm_medium)); - chip->duration[TPM_LONG] = - usecs_to_jiffies(be32_to_cpu(cap.duration.tpm_long)); + be32_to_cpus(&cap1.timeout.a); + be32_to_cpus(&cap1.timeout.b); + be32_to_cpus(&cap1.timeout.c); + be32_to_cpus(&cap1.timeout.d); + chip->timeout_a = usecs_to_jiffies(cap1.timeout.a); + chip->timeout_b = usecs_to_jiffies(cap1.timeout.b); + chip->timeout_c = usecs_to_jiffies(cap1.timeout.c); + chip->timeout_d = usecs_to_jiffies(cap1.timeout.d); - /* The Broadcom BCM0
[PATCH v9 0/5] tpm: Command duration logging and chip-specific override
v9: Include command duration in existing error messages rather than logging an extra debug message. Rebase onto Jarkko's tree. v8: Fix v7 goof-up in tpm_getcap(). v7: Use tpm_getcap() instead of a redundant new function. v6: Split tpm_get_cap_prop() out of tpm_get_timeouts(); always return error on TPM command failure. v5: Use msecs_to_jiffies() instead of * HZ / 1000. v4: Rework tpm_get_timeouts() to allow overriding both timeouts and durations via a single callback. This series - improves TPM command error reporting - adds optional logging of TPM command durations - allows chip-specific override of command durations as well as protocol timeouts - overrides ST19NP18 TPM command duration to avoid lockups Ed Swierk (5): tpm_tis: Improve reporting of IO errors tpm: Add optional logging of TPM command durations tpm: Clean up reading of timeout and duration capabilities tpm: Allow TPM chip drivers to override reported command durations tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup drivers/char/tpm/tpm-interface.c | 219 --- drivers/char/tpm/tpm_tis_core.c | 46 include/linux/tpm.h | 3 +- 3 files changed, 136 insertions(+), 132 deletions(-) -- 1.9.1
[PATCH v9 5/5] tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup
The STMicro ST19NP18-TPM sometimes takes much longer to execute commands than it reports in its capabilities. For example, command 186 (TPM_FlushSpecific) has been observed to take 14560 msec to complete, far longer than the 3000 msec limit for "short" commands reported by the chip. The behavior has also been seen with command 101 (TPM_GetCapability). Worse, when the tpm_tis driver attempts to cancel the current command (by writing commandReady = 1 to TPM_STS_x), the chip locks up completely, returning all-1s from all memory-mapped register reads. The lockup can be cleared only by resetting the system. The occurrence of this excessive command duration depends on the sequence of commands preceding it. One sequence is creating at least 2 new keys via TPM_CreateWrapKey, then letting the TPM idle for at least 30 seconds, then loading a key via TPM_LoadKey2. The next TPM_FlushSpecific occasionally takes tens of seconds to complete. Another sequence is creating many keys in a row without pause. The TPM_CreateWrapKey operation gets much slower after the first few iterations, as one would expect when the pool of precomputed keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2 followed by TPM_FlushSpecific sequence triggers the behavior. Our working theory is that this older TPM sometimes pauses to precompute keys, which modern chips implement as a background process. Without access to the chip's implementation details it's impossible to know whether any commands are immune to being blocked by this process. So it seems safest to ignore the chip's reported command durations, and use a value much higher than any observed duration, like 180 sec (which is the duration this chip reports for "long" commands). Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm_tis_core.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c index f013664..84d6008 100644 --- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -416,6 +416,12 @@ static void tpm_tis_update_timeouts(struct tpm_chip *chip) chip->timeout_d = msecs_to_jiffies(TIS_SHORT_TIMEOUT); chip->timeout_adjusted = true; break; + case 0x104a: /* STMicro ST19NP18-TPM */ + chip->duration[TPM_SHORT] = 180 * HZ; + chip->duration[TPM_MEDIUM] = 180 * HZ; + chip->duration[TPM_LONG] = 180 * HZ; + chip->duration_adjusted = true; + break; } } -- 1.9.1
[PATCH v9 1/5] tpm_tis: Improve reporting of IO errors
Mysterious TPM behavior can be difficult to track down through all the layers of software. Add error messages for conditions that should never happen. Also include the manufacturer ID along with other chip data printed during init. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm_tis_core.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c index 8110b52..e62fdeb 100644 --- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -217,6 +217,8 @@ static int tpm_tis_recv(struct tpm_chip *chip, u8 *buf, size_t count) expected = be32_to_cpu(*(__be32 *) (buf + 2)); if (expected > count) { + dev_err(&chip->dev, "Response too long (wanted %zd, got %d)\n", + count, expected); size = -EIO; goto out; } @@ -283,6 +285,8 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &priv->int_queue, false); status = tpm_tis_status(chip); if (!itpm && (status & TPM_STS_DATA_EXPECT) == 0) { + dev_err(&chip->dev, "Chip not accepting %zd bytes\n", + len - count); rc = -EIO; goto out_err; } @@ -297,6 +301,7 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &priv->int_queue, false); status = tpm_tis_status(chip); if (!itpm && (status & TPM_STS_DATA_EXPECT) != 0) { + dev_err(&chip->dev, "Chip not accepting last byte\n"); rc = -EIO; goto out_err; } @@ -707,8 +712,9 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, if (rc < 0) goto out_err; - dev_info(dev, "%s TPM (device-id 0x%X, rev-id %d)\n", + dev_info(dev, "%s TPM (manufacturer-id 0x%X, device-id 0x%X, rev-id %d)\n", (chip->flags & TPM_CHIP_FLAG_TPM2) ? "2.0" : "1.2", +priv->manufacturer_id, vendor >> 16, rid); if (!(priv->flags & TPM_TIS_ITPM_POSSIBLE)) { -- 1.9.1
[PATCH v8 3/5] tpm: Clean up reading of timeout and duration capabilities
Call tpm_getcap() from tpm_get_timeouts() to eliminate redundant code. Return all errors to the caller rather than swallowing them (e.g. when tpm_transmit_cmd() returns nonzero). Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm-interface.c | 74 +++- 1 file changed, 27 insertions(+), 47 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index cc1e5bc..b65c139 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -461,9 +461,19 @@ ssize_t tpm_getcap(struct device *dev, __be32 subcap_id, cap_t *cap, tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); tpm_cmd.params.getcap_in.subcap = subcap_id; } + rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, desc); + + if (!rc && + ((subcap_id == TPM_CAP_PROP_TIS_TIMEOUT && + be32_to_cpu(tpm_cmd.header.out.length) != TPM_HEADER_SIZE + 20) || +(subcap_id == TPM_CAP_PROP_TIS_DURATION && + be32_to_cpu(tpm_cmd.header.out.length) != TPM_HEADER_SIZE + 16))) + rc = -EINVAL; + if (!rc) *cap = tpm_cmd.params.getcap_out.cap; + return rc; } @@ -504,48 +514,30 @@ static int tpm_startup(struct tpm_chip *chip, __be16 startup_type) int tpm_get_timeouts(struct tpm_chip *chip) { - struct tpm_cmd_t tpm_cmd; + cap_t cap; unsigned long new_timeout[4]; unsigned long old_timeout[4]; - struct duration_t *duration_cap; ssize_t rc; - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL); - + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + "attempting to determine the timeouts"); if (rc == TPM_ERR_INVALID_POSTINIT) { /* The TPM is not started, we are the first to talk to it. Execute a startup command. */ - dev_info(chip->pdev, "Issuing TPM_STARTUP"); + dev_info(chip->pdev, "Issuing TPM_STARTUP\n"); if (tpm_startup(chip, TPM_ST_CLEAR)) return rc; - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, - NULL); - } - if (rc) { - dev_err(chip->pdev, - "A TPM error (%zd) occurred attempting to determine the timeouts\n", - rc); - goto duration; + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + "attempting to determine the timeouts"); } + if (rc) + return rc; - if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 || - be32_to_cpu(tpm_cmd.header.out.length) - != sizeof(tpm_cmd.header.out) + sizeof(u32) + 4 * sizeof(u32)) - return -EINVAL; - - old_timeout[0] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.a); - old_timeout[1] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.b); - old_timeout[2] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.c); - old_timeout[3] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.d); + old_timeout[0] = be32_to_cpu(cap.timeout.a); + old_timeout[1] = be32_to_cpu(cap.timeout.b); + old_timeout[2] = be32_to_cpu(cap.timeout.c); + old_timeout[3] = be32_to_cpu(cap.timeout.d); memcpy(new_timeout, old_timeout, sizeof(new_timeout)); /* @@ -583,29 +575,17 @@ int tpm_get_timeouts(struct tpm_chip *chip) chip->vendor.timeout_c = usecs_to_jiffies(new_timeout[2]); chip->vendor.timeout_d = usecs_to_jiffies(new_timeout[3]); -duration: - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_DURATION; - - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, - "attempting to determine the durations"); + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_DURATION, &cap, + "attempting to determine the durations"); if (rc) return rc; - if (be32_to
[PATCH v8 2/5] tpm: Add optional logging of TPM command durations
Some TPMs violate their own advertised command durations. This is much easier to debug with data about how long each command actually takes to complete. Add debug messages that can be enabled by running echo -n 'module tpm +p' >/sys/kernel/debug/dynamic_debug/control on a kernel configured with DYNAMIC_DEBUG=y. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm-interface.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index c50637d..cc1e5bc 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -333,13 +333,14 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, { ssize_t rc; u32 count, ordinal; - unsigned long stop; + unsigned long start, stop; if (bufsiz > TPM_BUFSIZE) bufsiz = TPM_BUFSIZE; count = be32_to_cpu(*((__be32 *) (buf + 2))); ordinal = be32_to_cpu(*((__be32 *) (buf + 6))); + dev_dbg(chip->pdev, "starting command %d count %d\n", ordinal, count); if (count == 0) return -ENODATA; if (count > bufsiz) { @@ -360,18 +361,24 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, if (chip->vendor.irq) goto out_recv; + start = jiffies; if (chip->flags & TPM_CHIP_FLAG_TPM2) - stop = jiffies + tpm2_calc_ordinal_duration(chip, ordinal); + stop = start + tpm2_calc_ordinal_duration(chip, ordinal); else - stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal); + stop = start + tpm_calc_ordinal_duration(chip, ordinal); do { u8 status = chip->ops->status(chip); if ((status & chip->ops->req_complete_mask) == - chip->ops->req_complete_val) + chip->ops->req_complete_val) { + dev_dbg(chip->pdev, "completed command %d in %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); goto out_recv; + } if (chip->ops->req_canceled(chip, status)) { dev_err(chip->pdev, "Operation Canceled\n"); + dev_dbg(chip->pdev, "canceled command %d after %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); rc = -ECANCELED; goto out; } @@ -382,6 +389,8 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, chip->ops->cancel(chip); dev_err(chip->pdev, "Operation Timed out\n"); + dev_dbg(chip->pdev, "command %d timed out after %d ms\n", ordinal, + jiffies_to_msecs(jiffies - start)); rc = -ETIME; goto out; -- 1.9.1
[PATCH v8 4/5] tpm: Allow TPM chip drivers to override reported command durations
Some TPM chips report bogus command durations in their capabilities, just as others report incorrect timeouts. Rework tpm_get_timeouts() to allow chip drivers to override either via a single callback. Also clean up handling of TPMs that report milliseconds instead of microseconds. Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm-interface.c | 143 +-- drivers/char/tpm/tpm_tis.c | 35 +++--- include/linux/tpm.h | 3 +- 3 files changed, 88 insertions(+), 93 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index b65c139..5f98488 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -514,12 +514,11 @@ static int tpm_startup(struct tpm_chip *chip, __be16 startup_type) int tpm_get_timeouts(struct tpm_chip *chip) { - cap_t cap; - unsigned long new_timeout[4]; - unsigned long old_timeout[4]; - ssize_t rc; + cap_t cap1, cap2; + int rc; + struct tpm_vendor_specific orig_vendor; - rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap1, "attempting to determine the timeouts"); if (rc == TPM_ERR_INVALID_POSTINIT) { /* The TPM is not started, we are the first to talk to it. @@ -528,77 +527,91 @@ int tpm_get_timeouts(struct tpm_chip *chip) if (tpm_startup(chip, TPM_ST_CLEAR)) return rc; - rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap1, "attempting to determine the timeouts"); } if (rc) return rc; - old_timeout[0] = be32_to_cpu(cap.timeout.a); - old_timeout[1] = be32_to_cpu(cap.timeout.b); - old_timeout[2] = be32_to_cpu(cap.timeout.c); - old_timeout[3] = be32_to_cpu(cap.timeout.d); - memcpy(new_timeout, old_timeout, sizeof(new_timeout)); - - /* -* Provide ability for vendor overrides of timeout values in case -* of misreporting. -*/ - if (chip->ops->update_timeouts != NULL) - chip->vendor.timeout_adjusted = - chip->ops->update_timeouts(chip, new_timeout); - - if (!chip->vendor.timeout_adjusted) { - /* Don't overwrite default if value is 0 */ - if (new_timeout[0] != 0 && new_timeout[0] < 1000) { - int i; - - /* timeouts in msec rather usec */ - for (i = 0; i != ARRAY_SIZE(new_timeout); i++) - new_timeout[i] *= 1000; - chip->vendor.timeout_adjusted = true; - } - } - - /* Report adjusted timeouts */ - if (chip->vendor.timeout_adjusted) { - dev_info(chip->pdev, -HW_ERR "Adjusting reported timeouts: A %lu->%luus B %lu->%luus C %lu->%luus D %lu->%luus\n", -old_timeout[0], new_timeout[0], -old_timeout[1], new_timeout[1], -old_timeout[2], new_timeout[2], -old_timeout[3], new_timeout[3]); - } - - chip->vendor.timeout_a = usecs_to_jiffies(new_timeout[0]); - chip->vendor.timeout_b = usecs_to_jiffies(new_timeout[1]); - chip->vendor.timeout_c = usecs_to_jiffies(new_timeout[2]); - chip->vendor.timeout_d = usecs_to_jiffies(new_timeout[3]); - - rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_DURATION, &cap, + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_DURATION, &cap2, "attempting to determine the durations"); if (rc) return rc; - chip->vendor.duration[TPM_SHORT] = - usecs_to_jiffies(be32_to_cpu(cap.duration.tpm_short)); - chip->vendor.duration[TPM_MEDIUM] = - usecs_to_jiffies(be32_to_cpu(cap.duration.tpm_medium)); - chip->vendor.duration[TPM_LONG] = - usecs_to_jiffies(be32_to_cpu(cap.duration.tpm_long)); + be32_to_cpus(&cap1.timeout.a); + be32_to_cpus(&cap1.timeout.b); + be32_to_cpus(&cap1.timeout.c); + be32_to_cpus(&cap1.timeout.d); + chip->vendor.timeout_a = usecs_to_jiffies(cap1.timeout.a); + chip->vendor.timeout_b = usecs_to_jiffies(cap1.timeout.b); + chip->vendor.timeout_c = usecs_to_jiffies(cap1.timeout.c); + chip->vendor.timeout_d = usecs_to_jiffies(cap1.timeout.d); - /* The Broadcom BCM0102 chipset in a Dell Latitude D820 gets the above -* value wrong and apparently reports msecs rather than u
[PATCH v8 0/5] tpm: Command duration logging and chip-specific override
v8: Fix v7 goof-up in tpm_getcap(). v7: Use tpm_getcap() instead of a redundant new function. v6: Split tpm_get_cap_prop() out of tpm_get_timeouts(); always return error on TPM command failure. v5: Use msecs_to_jiffies() instead of * HZ / 1000. v4: Rework tpm_get_timeouts() to allow overriding both timeouts and durations via a single callback. This series - improves TPM command error reporting - adds optional logging of TPM command durations - allows chip-specific override of command durations as well as protocol timeouts - overrides ST19NP18 TPM command duration to avoid lockups Ed Swierk (5): tpm_tis: Improve reporting of IO errors tpm: Add optional logging of TPM command durations tpm: Clean up reading of timeout and duration capabilities tpm: Allow TPM chip drivers to override reported command durations tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup drivers/char/tpm/tpm-interface.c | 212 --- drivers/char/tpm/tpm_tis.c | 49 - include/linux/tpm.h | 3 +- 3 files changed, 130 insertions(+), 134 deletions(-) -- 1.9.1
[PATCH v8 5/5] tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup
The STMicro ST19NP18-TPM sometimes takes much longer to execute commands than it reports in its capabilities. For example, command 186 (TPM_FlushSpecific) has been observed to take 14560 msec to complete, far longer than the 3000 msec limit for "short" commands reported by the chip. The behavior has also been seen with command 101 (TPM_GetCapability). Worse, when the tpm_tis driver attempts to cancel the current command (by writing commandReady = 1 to TPM_STS_x), the chip locks up completely, returning all-1s from all memory-mapped register reads. The lockup can be cleared only by resetting the system. The occurrence of this excessive command duration depends on the sequence of commands preceding it. One sequence is creating at least 2 new keys via TPM_CreateWrapKey, then letting the TPM idle for at least 30 seconds, then loading a key via TPM_LoadKey2. The next TPM_FlushSpecific occasionally takes tens of seconds to complete. Another sequence is creating many keys in a row without pause. The TPM_CreateWrapKey operation gets much slower after the first few iterations, as one would expect when the pool of precomputed keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2 followed by TPM_FlushSpecific sequence triggers the behavior. Our working theory is that this older TPM sometimes pauses to precompute keys, which modern chips implement as a background process. Without access to the chip's implementation details it's impossible to know whether any commands are immune to being blocked by this process. So it seems safest to ignore the chip's reported command durations, and use a value much higher than any observed duration, like 180 sec (which is the duration this chip reports for "long" commands). Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm_tis.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index caf7278..8355b45 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -485,6 +485,12 @@ static void tpm_tis_update_timeouts(struct tpm_chip *chip) chip->vendor.timeout_d = msecs_to_jiffies(TIS_SHORT_TIMEOUT); chip->vendor.timeout_adjusted = true; break; + case 0x104a: /* STMicro ST19NP18-TPM */ + chip->vendor.duration[TPM_SHORT] = 180 * HZ; + chip->vendor.duration[TPM_MEDIUM] = 180 * HZ; + chip->vendor.duration[TPM_LONG] = 180 * HZ; + chip->vendor.duration_adjusted = true; + break; } } -- 1.9.1
[PATCH v8 1/5] tpm_tis: Improve reporting of IO errors
Mysterious TPM behavior can be difficult to track down through all the layers of software. Add error messages for conditions that should never happen. Also include the manufacturer ID along with other chip data printed during init. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm_tis.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index 65f7eec..088fa86 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -299,6 +299,8 @@ static int tpm_tis_recv(struct tpm_chip *chip, u8 *buf, size_t count) expected = be32_to_cpu(*(__be32 *) (buf + 2)); if (expected > count) { + dev_err(chip->pdev, "Response too long (wanted %zd, got %d)\n", + count, expected); size = -EIO; goto out; } @@ -366,6 +368,8 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &chip->vendor.int_queue, false); status = tpm_tis_status(chip); if (!itpm && (status & TPM_STS_DATA_EXPECT) == 0) { + dev_err(chip->pdev, "Chip not accepting %zd bytes\n", + len - count); rc = -EIO; goto out_err; } @@ -378,6 +382,7 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &chip->vendor.int_queue, false); status = tpm_tis_status(chip); if ((status & TPM_STS_DATA_EXPECT) != 0) { + dev_err(chip->pdev, "Chip not accepting last byte\n"); rc = -EIO; goto out_err; } @@ -689,8 +694,9 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info, vendor = ioread32(chip->vendor.iobase + TPM_DID_VID(0)); chip->vendor.manufacturer_id = vendor; - dev_info(dev, "%s TPM (device-id 0x%X, rev-id %d)\n", + dev_info(dev, "%s TPM (manufacturer-id 0x%X, device-id 0x%X, rev-id %d)\n", (chip->flags & TPM_CHIP_FLAG_TPM2) ? "2.0" : "1.2", +chip->vendor.manufacturer_id, vendor >> 16, ioread8(chip->vendor.iobase + TPM_RID(0))); if (!itpm) { -- 1.9.1
Re: [PATCH v7 3/5] tpm: Clean up reading of timeout and duration capabilities
On Mon, Jun 20, 2016 at 6:54 PM, Ed Swierk wrote: > --- a/drivers/char/tpm/tpm-interface.c > +++ b/drivers/char/tpm/tpm-interface.c > @@ -461,9 +461,19 @@ ssize_t tpm_getcap(struct device *dev, __be32 subcap_id, > cap_t *cap, > tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); > tpm_cmd.params.getcap_in.subcap = subcap_id; > } > + > rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, desc); > + > + if (!rc && > + ((subcap_id == TPM_CAP_PROP_TIS_TIMEOUT && > + be32_to_cpu(tpm_cmd.header.out.length) != TPM_HEADER_SIZE + 5) > || > +(subcap_id == TPM_CAP_PROP_TIS_DURATION && > + be32_to_cpu(tpm_cmd.header.out.length) != TPM_HEADER_SIZE + 4))) > + rc = -EINVAL; > + Woops, a totally innocuous last-minute (post-testing) cleanup broke this code; should be TPM_HEADER_SIZE + 20 and + 16. I'll push out v8 as soon as I redo my tests. --Ed
[PATCH v7 3/5] tpm: Clean up reading of timeout and duration capabilities
Call tpm_getcap() from tpm_get_timeouts() to eliminate redundant code. Return all errors to the caller rather than swallowing them (e.g. when tpm_transmit_cmd() returns nonzero). Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm-interface.c | 74 +++- 1 file changed, 27 insertions(+), 47 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index cc1e5bc..73c3ee0 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -461,9 +461,19 @@ ssize_t tpm_getcap(struct device *dev, __be32 subcap_id, cap_t *cap, tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); tpm_cmd.params.getcap_in.subcap = subcap_id; } + rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, desc); + + if (!rc && + ((subcap_id == TPM_CAP_PROP_TIS_TIMEOUT && + be32_to_cpu(tpm_cmd.header.out.length) != TPM_HEADER_SIZE + 5) || +(subcap_id == TPM_CAP_PROP_TIS_DURATION && + be32_to_cpu(tpm_cmd.header.out.length) != TPM_HEADER_SIZE + 4))) + rc = -EINVAL; + if (!rc) *cap = tpm_cmd.params.getcap_out.cap; + return rc; } @@ -504,48 +514,30 @@ static int tpm_startup(struct tpm_chip *chip, __be16 startup_type) int tpm_get_timeouts(struct tpm_chip *chip) { - struct tpm_cmd_t tpm_cmd; + cap_t cap; unsigned long new_timeout[4]; unsigned long old_timeout[4]; - struct duration_t *duration_cap; ssize_t rc; - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL); - + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + "attempting to determine the timeouts"); if (rc == TPM_ERR_INVALID_POSTINIT) { /* The TPM is not started, we are the first to talk to it. Execute a startup command. */ - dev_info(chip->pdev, "Issuing TPM_STARTUP"); + dev_info(chip->pdev, "Issuing TPM_STARTUP\n"); if (tpm_startup(chip, TPM_ST_CLEAR)) return rc; - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, - NULL); - } - if (rc) { - dev_err(chip->pdev, - "A TPM error (%zd) occurred attempting to determine the timeouts\n", - rc); - goto duration; + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + "attempting to determine the timeouts"); } + if (rc) + return rc; - if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 || - be32_to_cpu(tpm_cmd.header.out.length) - != sizeof(tpm_cmd.header.out) + sizeof(u32) + 4 * sizeof(u32)) - return -EINVAL; - - old_timeout[0] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.a); - old_timeout[1] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.b); - old_timeout[2] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.c); - old_timeout[3] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.d); + old_timeout[0] = be32_to_cpu(cap.timeout.a); + old_timeout[1] = be32_to_cpu(cap.timeout.b); + old_timeout[2] = be32_to_cpu(cap.timeout.c); + old_timeout[3] = be32_to_cpu(cap.timeout.d); memcpy(new_timeout, old_timeout, sizeof(new_timeout)); /* @@ -583,29 +575,17 @@ int tpm_get_timeouts(struct tpm_chip *chip) chip->vendor.timeout_c = usecs_to_jiffies(new_timeout[2]); chip->vendor.timeout_d = usecs_to_jiffies(new_timeout[3]); -duration: - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_DURATION; - - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, - "attempting to determine the durations"); + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_DURATION, &cap, + "attempting to determine the durations"); if (rc) return rc; - if (be32_to
[PATCH v7 1/5] tpm_tis: Improve reporting of IO errors
Mysterious TPM behavior can be difficult to track down through all the layers of software. Add error messages for conditions that should never happen. Also include the manufacturer ID along with other chip data printed during init. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm_tis.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index 65f7eec..088fa86 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -299,6 +299,8 @@ static int tpm_tis_recv(struct tpm_chip *chip, u8 *buf, size_t count) expected = be32_to_cpu(*(__be32 *) (buf + 2)); if (expected > count) { + dev_err(chip->pdev, "Response too long (wanted %zd, got %d)\n", + count, expected); size = -EIO; goto out; } @@ -366,6 +368,8 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &chip->vendor.int_queue, false); status = tpm_tis_status(chip); if (!itpm && (status & TPM_STS_DATA_EXPECT) == 0) { + dev_err(chip->pdev, "Chip not accepting %zd bytes\n", + len - count); rc = -EIO; goto out_err; } @@ -378,6 +382,7 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &chip->vendor.int_queue, false); status = tpm_tis_status(chip); if ((status & TPM_STS_DATA_EXPECT) != 0) { + dev_err(chip->pdev, "Chip not accepting last byte\n"); rc = -EIO; goto out_err; } @@ -689,8 +694,9 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info, vendor = ioread32(chip->vendor.iobase + TPM_DID_VID(0)); chip->vendor.manufacturer_id = vendor; - dev_info(dev, "%s TPM (device-id 0x%X, rev-id %d)\n", + dev_info(dev, "%s TPM (manufacturer-id 0x%X, device-id 0x%X, rev-id %d)\n", (chip->flags & TPM_CHIP_FLAG_TPM2) ? "2.0" : "1.2", +chip->vendor.manufacturer_id, vendor >> 16, ioread8(chip->vendor.iobase + TPM_RID(0))); if (!itpm) { -- 1.9.1
[PATCH v7 0/5] tpm: Command duration logging and chip-specific override
v7: Use tpm_getcap() instead of a redundant new function. v6: Split tpm_get_cap_prop() out of tpm_get_timeouts(); always return error on TPM command failure. v5: Use msecs_to_jiffies() instead of * HZ / 1000. v4: Rework tpm_get_timeouts() to allow overriding both timeouts and durations via a single callback. This series - improves TPM command error reporting - adds optional logging of TPM command durations - allows chip-specific override of command durations as well as protocol timeouts - overrides ST19NP18 TPM command duration to avoid lockups Ed Swierk (5): tpm_tis: Improve reporting of IO errors tpm: Add optional logging of TPM command durations tpm: Clean up reading of timeout and duration capabilities tpm: Allow TPM chip drivers to override reported command durations tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup drivers/char/tpm/tpm-interface.c | 212 --- drivers/char/tpm/tpm_tis.c | 49 - include/linux/tpm.h | 3 +- 3 files changed, 130 insertions(+), 134 deletions(-) -- 1.9.1
[PATCH v7 4/5] tpm: Allow TPM chip drivers to override reported command durations
Some TPM chips report bogus command durations in their capabilities, just as others report incorrect timeouts. Rework tpm_get_timeouts() to allow chip drivers to override either via a single callback. Also clean up handling of TPMs that report milliseconds instead of microseconds. Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm-interface.c | 143 +-- drivers/char/tpm/tpm_tis.c | 35 +++--- include/linux/tpm.h | 3 +- 3 files changed, 88 insertions(+), 93 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index 73c3ee0..36a6861 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -514,12 +514,11 @@ static int tpm_startup(struct tpm_chip *chip, __be16 startup_type) int tpm_get_timeouts(struct tpm_chip *chip) { - cap_t cap; - unsigned long new_timeout[4]; - unsigned long old_timeout[4]; - ssize_t rc; + cap_t cap1, cap2; + int rc; + struct tpm_vendor_specific orig_vendor; - rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap1, "attempting to determine the timeouts"); if (rc == TPM_ERR_INVALID_POSTINIT) { /* The TPM is not started, we are the first to talk to it. @@ -528,77 +527,91 @@ int tpm_get_timeouts(struct tpm_chip *chip) if (tpm_startup(chip, TPM_ST_CLEAR)) return rc; - rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap, + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_TIMEOUT, &cap1, "attempting to determine the timeouts"); } if (rc) return rc; - old_timeout[0] = be32_to_cpu(cap.timeout.a); - old_timeout[1] = be32_to_cpu(cap.timeout.b); - old_timeout[2] = be32_to_cpu(cap.timeout.c); - old_timeout[3] = be32_to_cpu(cap.timeout.d); - memcpy(new_timeout, old_timeout, sizeof(new_timeout)); - - /* -* Provide ability for vendor overrides of timeout values in case -* of misreporting. -*/ - if (chip->ops->update_timeouts != NULL) - chip->vendor.timeout_adjusted = - chip->ops->update_timeouts(chip, new_timeout); - - if (!chip->vendor.timeout_adjusted) { - /* Don't overwrite default if value is 0 */ - if (new_timeout[0] != 0 && new_timeout[0] < 1000) { - int i; - - /* timeouts in msec rather usec */ - for (i = 0; i != ARRAY_SIZE(new_timeout); i++) - new_timeout[i] *= 1000; - chip->vendor.timeout_adjusted = true; - } - } - - /* Report adjusted timeouts */ - if (chip->vendor.timeout_adjusted) { - dev_info(chip->pdev, -HW_ERR "Adjusting reported timeouts: A %lu->%luus B %lu->%luus C %lu->%luus D %lu->%luus\n", -old_timeout[0], new_timeout[0], -old_timeout[1], new_timeout[1], -old_timeout[2], new_timeout[2], -old_timeout[3], new_timeout[3]); - } - - chip->vendor.timeout_a = usecs_to_jiffies(new_timeout[0]); - chip->vendor.timeout_b = usecs_to_jiffies(new_timeout[1]); - chip->vendor.timeout_c = usecs_to_jiffies(new_timeout[2]); - chip->vendor.timeout_d = usecs_to_jiffies(new_timeout[3]); - - rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_DURATION, &cap, + rc = tpm_getcap(chip->pdev, TPM_CAP_PROP_TIS_DURATION, &cap2, "attempting to determine the durations"); if (rc) return rc; - chip->vendor.duration[TPM_SHORT] = - usecs_to_jiffies(be32_to_cpu(cap.duration.tpm_short)); - chip->vendor.duration[TPM_MEDIUM] = - usecs_to_jiffies(be32_to_cpu(cap.duration.tpm_medium)); - chip->vendor.duration[TPM_LONG] = - usecs_to_jiffies(be32_to_cpu(cap.duration.tpm_long)); + be32_to_cpus(&cap1.timeout.a); + be32_to_cpus(&cap1.timeout.b); + be32_to_cpus(&cap1.timeout.c); + be32_to_cpus(&cap1.timeout.d); + chip->vendor.timeout_a = usecs_to_jiffies(cap1.timeout.a); + chip->vendor.timeout_b = usecs_to_jiffies(cap1.timeout.b); + chip->vendor.timeout_c = usecs_to_jiffies(cap1.timeout.c); + chip->vendor.timeout_d = usecs_to_jiffies(cap1.timeout.d); - /* The Broadcom BCM0102 chipset in a Dell Latitude D820 gets the above -* value wrong and apparently reports msecs rather than u
[PATCH v7 5/5] tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup
The STMicro ST19NP18-TPM sometimes takes much longer to execute commands than it reports in its capabilities. For example, command 186 (TPM_FlushSpecific) has been observed to take 14560 msec to complete, far longer than the 3000 msec limit for "short" commands reported by the chip. The behavior has also been seen with command 101 (TPM_GetCapability). Worse, when the tpm_tis driver attempts to cancel the current command (by writing commandReady = 1 to TPM_STS_x), the chip locks up completely, returning all-1s from all memory-mapped register reads. The lockup can be cleared only by resetting the system. The occurrence of this excessive command duration depends on the sequence of commands preceding it. One sequence is creating at least 2 new keys via TPM_CreateWrapKey, then letting the TPM idle for at least 30 seconds, then loading a key via TPM_LoadKey2. The next TPM_FlushSpecific occasionally takes tens of seconds to complete. Another sequence is creating many keys in a row without pause. The TPM_CreateWrapKey operation gets much slower after the first few iterations, as one would expect when the pool of precomputed keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2 followed by TPM_FlushSpecific sequence triggers the behavior. Our working theory is that this older TPM sometimes pauses to precompute keys, which modern chips implement as a background process. Without access to the chip's implementation details it's impossible to know whether any commands are immune to being blocked by this process. So it seems safest to ignore the chip's reported command durations, and use a value much higher than any observed duration, like 180 sec (which is the duration this chip reports for "long" commands). Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm_tis.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index caf7278..8355b45 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -485,6 +485,12 @@ static void tpm_tis_update_timeouts(struct tpm_chip *chip) chip->vendor.timeout_d = msecs_to_jiffies(TIS_SHORT_TIMEOUT); chip->vendor.timeout_adjusted = true; break; + case 0x104a: /* STMicro ST19NP18-TPM */ + chip->vendor.duration[TPM_SHORT] = 180 * HZ; + chip->vendor.duration[TPM_MEDIUM] = 180 * HZ; + chip->vendor.duration[TPM_LONG] = 180 * HZ; + chip->vendor.duration_adjusted = true; + break; } } -- 1.9.1
[PATCH v7 2/5] tpm: Add optional logging of TPM command durations
Some TPMs violate their own advertised command durations. This is much easier to debug with data about how long each command actually takes to complete. Add debug messages that can be enabled by running echo -n 'module tpm +p' >/sys/kernel/debug/dynamic_debug/control on a kernel configured with DYNAMIC_DEBUG=y. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm-interface.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index c50637d..cc1e5bc 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -333,13 +333,14 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, { ssize_t rc; u32 count, ordinal; - unsigned long stop; + unsigned long start, stop; if (bufsiz > TPM_BUFSIZE) bufsiz = TPM_BUFSIZE; count = be32_to_cpu(*((__be32 *) (buf + 2))); ordinal = be32_to_cpu(*((__be32 *) (buf + 6))); + dev_dbg(chip->pdev, "starting command %d count %d\n", ordinal, count); if (count == 0) return -ENODATA; if (count > bufsiz) { @@ -360,18 +361,24 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, if (chip->vendor.irq) goto out_recv; + start = jiffies; if (chip->flags & TPM_CHIP_FLAG_TPM2) - stop = jiffies + tpm2_calc_ordinal_duration(chip, ordinal); + stop = start + tpm2_calc_ordinal_duration(chip, ordinal); else - stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal); + stop = start + tpm_calc_ordinal_duration(chip, ordinal); do { u8 status = chip->ops->status(chip); if ((status & chip->ops->req_complete_mask) == - chip->ops->req_complete_val) + chip->ops->req_complete_val) { + dev_dbg(chip->pdev, "completed command %d in %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); goto out_recv; + } if (chip->ops->req_canceled(chip, status)) { dev_err(chip->pdev, "Operation Canceled\n"); + dev_dbg(chip->pdev, "canceled command %d after %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); rc = -ECANCELED; goto out; } @@ -382,6 +389,8 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, chip->ops->cancel(chip); dev_err(chip->pdev, "Operation Timed out\n"); + dev_dbg(chip->pdev, "command %d timed out after %d ms\n", ordinal, + jiffies_to_msecs(jiffies - start)); rc = -ETIME; goto out; -- 1.9.1
[PATCH v6 2/5] tpm: Add optional logging of TPM command durations
Some TPMs violate their own advertised command durations. This is much easier to debug with data about how long each command actually takes to complete. Add debug messages that can be enabled by running echo -n 'module tpm +p' >/sys/kernel/debug/dynamic_debug/control on a kernel configured with DYNAMIC_DEBUG=y. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm-interface.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index c50637d..cc1e5bc 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -333,13 +333,14 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, { ssize_t rc; u32 count, ordinal; - unsigned long stop; + unsigned long start, stop; if (bufsiz > TPM_BUFSIZE) bufsiz = TPM_BUFSIZE; count = be32_to_cpu(*((__be32 *) (buf + 2))); ordinal = be32_to_cpu(*((__be32 *) (buf + 6))); + dev_dbg(chip->pdev, "starting command %d count %d\n", ordinal, count); if (count == 0) return -ENODATA; if (count > bufsiz) { @@ -360,18 +361,24 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, if (chip->vendor.irq) goto out_recv; + start = jiffies; if (chip->flags & TPM_CHIP_FLAG_TPM2) - stop = jiffies + tpm2_calc_ordinal_duration(chip, ordinal); + stop = start + tpm2_calc_ordinal_duration(chip, ordinal); else - stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal); + stop = start + tpm_calc_ordinal_duration(chip, ordinal); do { u8 status = chip->ops->status(chip); if ((status & chip->ops->req_complete_mask) == - chip->ops->req_complete_val) + chip->ops->req_complete_val) { + dev_dbg(chip->pdev, "completed command %d in %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); goto out_recv; + } if (chip->ops->req_canceled(chip, status)) { dev_err(chip->pdev, "Operation Canceled\n"); + dev_dbg(chip->pdev, "canceled command %d after %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); rc = -ECANCELED; goto out; } @@ -382,6 +389,8 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, chip->ops->cancel(chip); dev_err(chip->pdev, "Operation Timed out\n"); + dev_dbg(chip->pdev, "command %d timed out after %d ms\n", ordinal, + jiffies_to_msecs(jiffies - start)); rc = -ETIME; goto out; -- 1.9.1
[PATCH v6 3/5] tpm: Factor out reading of timeout and duration capabilities
Factor sending the TPM_GetCapability command and validating the result from tpm_get_timeouts() into a new function. Return all errors to the caller rather than swallowing them (e.g. when tpm_transmit_cmd() returns nonzero). Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm-interface.c | 96 ++-- 1 file changed, 52 insertions(+), 44 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index cc1e5bc..4d1f62c 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -502,6 +502,52 @@ static int tpm_startup(struct tpm_chip *chip, __be16 startup_type) "attempting to start the TPM"); } +static int tpm_get_cap_prop(struct tpm_chip *chip, __be32 type, int size, + cap_t *cap, char *desc) +{ + struct tpm_cmd_t tpm_cmd; + ssize_t rc; + + tpm_cmd.header.in = tpm_getcap_header; + tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; + tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); + tpm_cmd.params.getcap_in.subcap = type; + rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL); + + if (rc == TPM_ERR_INVALID_POSTINIT) { + /* The TPM is not started, we are the first to talk to it. + Execute a startup command. */ + dev_info(chip->pdev, "Issuing TPM_STARTUP\n"); + if (tpm_startup(chip, TPM_ST_CLEAR)) + return rc; + + tpm_cmd.header.in = tpm_getcap_header; + tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; + tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); + tpm_cmd.params.getcap_in.subcap = type; + rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, + NULL); + } + + if (rc) { + dev_err(chip->pdev, + "Error %zd reading %s\n", rc, desc); + return rc; + } + + if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 || + be32_to_cpu(tpm_cmd.header.out.length) + != sizeof(tpm_cmd.header.out) + sizeof(u32) + size * sizeof(u32)) { + dev_err(chip->pdev, + "Bad return code or length reading %s\n", desc); + return -EINVAL; + } + + memcpy(cap, &tpm_cmd.params.getcap_out.cap, sizeof(cap_t)); + + return 0; +} + int tpm_get_timeouts(struct tpm_chip *chip) { struct tpm_cmd_t tpm_cmd; @@ -510,37 +556,10 @@ int tpm_get_timeouts(struct tpm_chip *chip) struct duration_t *duration_cap; ssize_t rc; - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL); - - if (rc == TPM_ERR_INVALID_POSTINIT) { - /* The TPM is not started, we are the first to talk to it. - Execute a startup command. */ - dev_info(chip->pdev, "Issuing TPM_STARTUP"); - if (tpm_startup(chip, TPM_ST_CLEAR)) - return rc; - - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; - rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, - NULL); - } - if (rc) { - dev_err(chip->pdev, - "A TPM error (%zd) occurred attempting to determine the timeouts\n", - rc); - goto duration; - } - - if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 || - be32_to_cpu(tpm_cmd.header.out.length) - != sizeof(tpm_cmd.header.out) + sizeof(u32) + 4 * sizeof(u32)) - return -EINVAL; + rc = tpm_get_cap_prop(chip, TPM_CAP_PROP_TIS_TIMEOUT, 4, + &tpm_cmd.params.getcap_out.cap, "timeouts"); + if (rc) + return rc; old_timeout[0] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.a); old_timeout[1] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.b); @@ -583,22 +602,11 @@ int tpm_get_timeouts(struct tpm_chip *chip) chip->vendor.timeout_c = usecs_to_jiffies(new_timeout[2]); chip->vendor.timeout_d = usecs_to_jiffies(new_timeout[3]); -duration: - tpm_cmd.header.in = tpm_getcap_header; - tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; - tpm_cmd.params.getcap
[PATCH v6 4/5] tpm: Allow TPM chip drivers to override reported command durations
Some TPM chips report bogus command durations in their capabilities, just as others report incorrect timeouts. Rework tpm_get_timeouts() to allow chip drivers to override either via a single callback. Also clean up handling of TPMs that report milliseconds instead of microseconds. Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm-interface.c | 139 +-- drivers/char/tpm/tpm_tis.c | 35 +++--- include/linux/tpm.h | 3 +- 3 files changed, 85 insertions(+), 92 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index 4d1f62c..a14adfd 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -550,83 +550,94 @@ static int tpm_get_cap_prop(struct tpm_chip *chip, __be32 type, int size, int tpm_get_timeouts(struct tpm_chip *chip) { - struct tpm_cmd_t tpm_cmd; - unsigned long new_timeout[4]; - unsigned long old_timeout[4]; - struct duration_t *duration_cap; - ssize_t rc; + cap_t cap1, cap2; + int rc; + struct tpm_vendor_specific orig_vendor; - rc = tpm_get_cap_prop(chip, TPM_CAP_PROP_TIS_TIMEOUT, 4, - &tpm_cmd.params.getcap_out.cap, "timeouts"); + rc = tpm_get_cap_prop(chip, TPM_CAP_PROP_TIS_TIMEOUT, 4, &cap1, + "timeouts"); if (rc) return rc; - old_timeout[0] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.a); - old_timeout[1] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.b); - old_timeout[2] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.c); - old_timeout[3] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.d); - memcpy(new_timeout, old_timeout, sizeof(new_timeout)); + rc = tpm_get_cap_prop(chip, TPM_CAP_PROP_TIS_DURATION, 3, &cap2, + "durations"); + if (rc) + return rc; - /* -* Provide ability for vendor overrides of timeout values in case -* of misreporting. -*/ - if (chip->ops->update_timeouts != NULL) - chip->vendor.timeout_adjusted = - chip->ops->update_timeouts(chip, new_timeout); + be32_to_cpus(&cap1.timeout.a); + be32_to_cpus(&cap1.timeout.b); + be32_to_cpus(&cap1.timeout.c); + be32_to_cpus(&cap1.timeout.d); + chip->vendor.timeout_a = usecs_to_jiffies(cap1.timeout.a); + chip->vendor.timeout_b = usecs_to_jiffies(cap1.timeout.b); + chip->vendor.timeout_c = usecs_to_jiffies(cap1.timeout.c); + chip->vendor.timeout_d = usecs_to_jiffies(cap1.timeout.d); - if (!chip->vendor.timeout_adjusted) { - /* Don't overwrite default if value is 0 */ - if (new_timeout[0] != 0 && new_timeout[0] < 1000) { - int i; - - /* timeouts in msec rather usec */ - for (i = 0; i != ARRAY_SIZE(new_timeout); i++) - new_timeout[i] *= 1000; - chip->vendor.timeout_adjusted = true; - } + /* Some TPMs report timeouts in milliseconds rather than + microseconds. Use a value between 1 and 1000 as an + indication that this is the case. */ + if (cap1.timeout.a > 0 && cap1.timeout.a < 1000) { + chip->vendor.timeout_a = msecs_to_jiffies(cap1.timeout.a); + chip->vendor.timeout_b = msecs_to_jiffies(cap1.timeout.b); + chip->vendor.timeout_c = msecs_to_jiffies(cap1.timeout.c); + chip->vendor.timeout_d = msecs_to_jiffies(cap1.timeout.d); + chip->vendor.timeout_adjusted = true; } - /* Report adjusted timeouts */ + be32_to_cpus(&cap2.duration.tpm_short); + be32_to_cpus(&cap2.duration.tpm_medium); + be32_to_cpus(&cap2.duration.tpm_long); + chip->vendor.duration[TPM_SHORT] = + usecs_to_jiffies(cap2.duration.tpm_short); + chip->vendor.duration[TPM_MEDIUM] = + usecs_to_jiffies(cap2.duration.tpm_medium); + chip->vendor.duration[TPM_LONG] = + usecs_to_jiffies(cap2.duration.tpm_long); + + memcpy(&orig_vendor, &chip->vendor, sizeof(orig_vendor)); + + /* Interpret duration values between 1 and 1 as + milliseconds to deal with TPMs like the Broadcom BCM0102 in + the Dell Latitude D820. */ + if (cap2.duration.tpm_short > 0 && cap2.duration.tpm_short < 1) { + chip->vendor.duration[TPM_SHORT] = + msecs_to_jiffies(cap2.duration.tpm_short); + chip->vendor.duration[TPM_MEDIUM] = + msecs_to_jiffies(cap2.dur
[PATCH v6 0/5] tpm: Command duration logging and chip-specific override
v6: Split tpm_get_cap_prop() out of tpm_get_timeouts(); always return error on TPM command failure. v5: Use msecs_to_jiffies() instead of * HZ / 1000. v4: Rework tpm_get_timeouts() to allow overriding both timeouts and durations via a single callback. This series - improves TPM command error reporting - adds optional logging of TPM command durations - allows chip-specific override of command durations as well as protocol timeouts - overrides ST19NP18 TPM command duration to avoid lockups Ed Swierk (5): tpm_tis: Improve reporting of IO errors tpm: Add optional logging of TPM command durations tpm: Factor out reading of timeout and duration capabilities tpm: Allow TPM chip drivers to override reported command durations tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup drivers/char/tpm/tpm-interface.c | 192 ++- drivers/char/tpm/tpm_tis.c | 48 +- include/linux/tpm.h | 3 +- 3 files changed, 132 insertions(+), 111 deletions(-) -- 1.9.1
[PATCH v6 5/5] tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup
The STMicro ST19NP18-TPM sometimes takes much longer to execute commands than it reports in its capabilities. For example, command 186 (TPM_FlushSpecific) has been observed to take 14560 msec to complete, far longer than the 3000 msec limit for "short" commands reported by the chip. The behavior has also been seen with command 101 (TPM_GetCapability). Worse, when the tpm_tis driver attempts to cancel the current command (by writing commandReady = 1 to TPM_STS_x), the chip locks up completely, returning all-1s from all memory-mapped register reads. The lockup can be cleared only by resetting the system. The occurrence of this excessive command duration depends on the sequence of commands preceding it. One sequence is creating at least 2 new keys via TPM_CreateWrapKey, then letting the TPM idle for at least 30 seconds, then loading a key via TPM_LoadKey2. The next TPM_FlushSpecific occasionally takes tens of seconds to complete. Another sequence is creating many keys in a row without pause. The TPM_CreateWrapKey operation gets much slower after the first few iterations, as one would expect when the pool of precomputed keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2 followed by TPM_FlushSpecific sequence triggers the behavior. Our working theory is that this older TPM sometimes pauses to precompute keys, which modern chips implement as a background process. Without access to the chip's implementation details it's impossible to know whether any commands are immune to being blocked by this process. So it seems safest to ignore the chip's reported command durations, and use a value much higher than any observed duration, like 180 sec (which is the duration this chip reports for "long" commands). Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm_tis.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index caf7278..862c502 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -485,6 +485,11 @@ static void tpm_tis_update_timeouts(struct tpm_chip *chip) chip->vendor.timeout_d = msecs_to_jiffies(TIS_SHORT_TIMEOUT); chip->vendor.timeout_adjusted = true; break; + case 0x104a: /* STMicro ST19NP18-TPM */ + chip->vendor.duration[TPM_SHORT] = 180 * HZ; + chip->vendor.duration[TPM_MEDIUM] = 180 * HZ; + chip->vendor.duration[TPM_LONG] = 180 * HZ; + chip->vendor.duration_adjusted = true; } } -- 1.9.1
[PATCH v6 1/5] tpm_tis: Improve reporting of IO errors
Mysterious TPM behavior can be difficult to track down through all the layers of software. Add error messages for conditions that should never happen. Also include the manufacturer ID along with other chip data printed during init. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm_tis.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index 65f7eec..088fa86 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -299,6 +299,8 @@ static int tpm_tis_recv(struct tpm_chip *chip, u8 *buf, size_t count) expected = be32_to_cpu(*(__be32 *) (buf + 2)); if (expected > count) { + dev_err(chip->pdev, "Response too long (wanted %zd, got %d)\n", + count, expected); size = -EIO; goto out; } @@ -366,6 +368,8 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &chip->vendor.int_queue, false); status = tpm_tis_status(chip); if (!itpm && (status & TPM_STS_DATA_EXPECT) == 0) { + dev_err(chip->pdev, "Chip not accepting %zd bytes\n", + len - count); rc = -EIO; goto out_err; } @@ -378,6 +382,7 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &chip->vendor.int_queue, false); status = tpm_tis_status(chip); if ((status & TPM_STS_DATA_EXPECT) != 0) { + dev_err(chip->pdev, "Chip not accepting last byte\n"); rc = -EIO; goto out_err; } @@ -689,8 +694,9 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info, vendor = ioread32(chip->vendor.iobase + TPM_DID_VID(0)); chip->vendor.manufacturer_id = vendor; - dev_info(dev, "%s TPM (device-id 0x%X, rev-id %d)\n", + dev_info(dev, "%s TPM (manufacturer-id 0x%X, device-id 0x%X, rev-id %d)\n", (chip->flags & TPM_CHIP_FLAG_TPM2) ? "2.0" : "1.2", +chip->vendor.manufacturer_id, vendor >> 16, ioread8(chip->vendor.iobase + TPM_RID(0))); if (!itpm) { -- 1.9.1
Re: [PATCH v5 3/4] tpm: Allow TPM chip drivers to override reported command durations
On Fri, Jun 10, 2016 at 12:42 PM, Jarkko Sakkinen wrote: > On Fri, Jun 10, 2016 at 10:34:15AM -0700, Ed Swierk wrote: >> On Fri, Jun 10, 2016 at 5:19 AM, Jarkko Sakkinen >> wrote: >> > On Wed, Jun 08, 2016 at 04:00:17PM -0700, Ed Swierk wrote: >> >> Some TPM chips report bogus command durations in their capabilities, >> >> just as others report incorrect timeouts. Rework tpm_get_timeouts() >> >> to allow chip drivers to override either via a single callback. >> >> Also clean up handling of TPMs that report milliseconds instead of >> >> microseconds. >> >> >> >> Signed-off-by: Ed Swierk >> >> --- >> >> drivers/char/tpm/tpm-interface.c | 177 >> >> +-- >> >> drivers/char/tpm/tpm_tis.c | 35 ++-- >> >> include/linux/tpm.h | 3 +- >> >> 3 files changed, 106 insertions(+), 109 deletions(-) >> >> >> >> diff --git a/drivers/char/tpm/tpm-interface.c >> >> b/drivers/char/tpm/tpm-interface.c >> >> index cc1e5bc..b8a08bb 100644 >> >> --- a/drivers/char/tpm/tpm-interface.c >> >> +++ b/drivers/char/tpm/tpm-interface.c >> >> @@ -502,123 +502,138 @@ static int tpm_startup(struct tpm_chip *chip, >> >> __be16 startup_type) >> >> "attempting to start the TPM"); >> >> } >> >> >> >> -int tpm_get_timeouts(struct tpm_chip *chip) >> >> +static int tpm_get_cap_prop(struct tpm_chip *chip, __be32 type, int size, >> >> + cap_t *cap, char *desc) >> >> { >> >> struct tpm_cmd_t tpm_cmd; >> >> - unsigned long new_timeout[4]; >> >> - unsigned long old_timeout[4]; >> >> - struct duration_t *duration_cap; >> >> ssize_t rc; >> >> >> >> tpm_cmd.header.in = tpm_getcap_header; >> >> tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; >> >> tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); >> >> - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; >> >> + tpm_cmd.params.getcap_in.subcap = type; >> >> rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, >> >> NULL); >> >> >> >> if (rc == TPM_ERR_INVALID_POSTINIT) { >> >> /* The TPM is not started, we are the first to talk to it. >> >> Execute a startup command. */ >> >> - dev_info(chip->pdev, "Issuing TPM_STARTUP"); >> >> + dev_info(chip->pdev, "Issuing TPM_STARTUP\n"); >> >> if (tpm_startup(chip, TPM_ST_CLEAR)) >> >> return rc; >> >> >> >> tpm_cmd.header.in = tpm_getcap_header; >> >> tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; >> >> tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); >> >> - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; >> >> + tpm_cmd.params.getcap_in.subcap = type; >> >> rc = tpm_transmit_cmd(chip, &tpm_cmd, >> >> TPM_INTERNAL_RESULT_SIZE, >> >> NULL); >> >> } >> >> + >> >> if (rc) { >> >> dev_err(chip->pdev, >> >> - "A TPM error (%zd) occurred attempting to determine >> >> the timeouts\n", >> >> - rc); >> >> - goto duration; >> >> + "Error %zd reading %s\n", rc, desc); >> >> + return -EINVAL; >> >> } >> >> >> >> if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 || >> >> be32_to_cpu(tpm_cmd.header.out.length) >> >> - != sizeof(tpm_cmd.header.out) + sizeof(u32) + 4 * sizeof(u32)) >> >> + != sizeof(tpm_cmd.header.out) + sizeof(u32) + size * >> >> sizeof(u32)) { >> >> + dev_err(chip->pdev, >> >> + "Bad return code or length reading %s\n", desc); >> >> return -EINVAL; >> >> - >> >> - old_timeout[0] = >> >> be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.a); >> >> -
Re: [PATCH v5 3/4] tpm: Allow TPM chip drivers to override reported command durations
On Fri, Jun 10, 2016 at 5:19 AM, Jarkko Sakkinen wrote: > On Wed, Jun 08, 2016 at 04:00:17PM -0700, Ed Swierk wrote: >> Some TPM chips report bogus command durations in their capabilities, >> just as others report incorrect timeouts. Rework tpm_get_timeouts() >> to allow chip drivers to override either via a single callback. >> Also clean up handling of TPMs that report milliseconds instead of >> microseconds. >> >> Signed-off-by: Ed Swierk >> --- >> drivers/char/tpm/tpm-interface.c | 177 >> +-- >> drivers/char/tpm/tpm_tis.c | 35 ++-- >> include/linux/tpm.h | 3 +- >> 3 files changed, 106 insertions(+), 109 deletions(-) >> >> diff --git a/drivers/char/tpm/tpm-interface.c >> b/drivers/char/tpm/tpm-interface.c >> index cc1e5bc..b8a08bb 100644 >> --- a/drivers/char/tpm/tpm-interface.c >> +++ b/drivers/char/tpm/tpm-interface.c >> @@ -502,123 +502,138 @@ static int tpm_startup(struct tpm_chip *chip, __be16 >> startup_type) >> "attempting to start the TPM"); >> } >> >> -int tpm_get_timeouts(struct tpm_chip *chip) >> +static int tpm_get_cap_prop(struct tpm_chip *chip, __be32 type, int size, >> + cap_t *cap, char *desc) >> { >> struct tpm_cmd_t tpm_cmd; >> - unsigned long new_timeout[4]; >> - unsigned long old_timeout[4]; >> - struct duration_t *duration_cap; >> ssize_t rc; >> >> tpm_cmd.header.in = tpm_getcap_header; >> tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; >> tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); >> - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; >> + tpm_cmd.params.getcap_in.subcap = type; >> rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL); >> >> if (rc == TPM_ERR_INVALID_POSTINIT) { >> /* The TPM is not started, we are the first to talk to it. >> Execute a startup command. */ >> - dev_info(chip->pdev, "Issuing TPM_STARTUP"); >> + dev_info(chip->pdev, "Issuing TPM_STARTUP\n"); >> if (tpm_startup(chip, TPM_ST_CLEAR)) >> return rc; >> >> tpm_cmd.header.in = tpm_getcap_header; >> tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; >> tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); >> - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; >> + tpm_cmd.params.getcap_in.subcap = type; >> rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, >> NULL); >> } >> + >> if (rc) { >> dev_err(chip->pdev, >> - "A TPM error (%zd) occurred attempting to determine >> the timeouts\n", >> - rc); >> - goto duration; >> + "Error %zd reading %s\n", rc, desc); >> + return -EINVAL; >> } >> >> if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 || >> be32_to_cpu(tpm_cmd.header.out.length) >> - != sizeof(tpm_cmd.header.out) + sizeof(u32) + 4 * sizeof(u32)) >> + != sizeof(tpm_cmd.header.out) + sizeof(u32) + size * sizeof(u32)) { >> + dev_err(chip->pdev, >> + "Bad return code or length reading %s\n", desc); >> return -EINVAL; >> - >> - old_timeout[0] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.a); >> - old_timeout[1] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.b); >> - old_timeout[2] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.c); >> - old_timeout[3] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.d); >> - memcpy(new_timeout, old_timeout, sizeof(new_timeout)); >> - >> - /* >> - * Provide ability for vendor overrides of timeout values in case >> - * of misreporting. >> - */ >> - if (chip->ops->update_timeouts != NULL) >> - chip->vendor.timeout_adjusted = >> - chip->ops->update_timeouts(chip, new_timeout); >> - >> - if (!chip->vendor.timeout_adjusted) { >> - /* Don't overwrite default if value is 0 */ >> - if (new_timeout[0] != 0 && new_timeout[0] < 1000) { >>
[PATCH v5 4/4] tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup
The STMicro ST19NP18-TPM sometimes takes much longer to execute commands than it reports in its capabilities. For example, command 186 (TPM_FlushSpecific) has been observed to take 14560 msec to complete, far longer than the 3000 msec limit for "short" commands reported by the chip. The behavior has also been seen with command 101 (TPM_GetCapability). Worse, when the tpm_tis driver attempts to cancel the current command (by writing commandReady = 1 to TPM_STS_x), the chip locks up completely, returning all-1s from all memory-mapped register reads. The lockup can be cleared only by resetting the system. The occurrence of this excessive command duration depends on the sequence of commands preceding it. One sequence is creating at least 2 new keys via TPM_CreateWrapKey, then letting the TPM idle for at least 30 seconds, then loading a key via TPM_LoadKey2. The next TPM_FlushSpecific occasionally takes tens of seconds to complete. Another sequence is creating many keys in a row without pause. The TPM_CreateWrapKey operation gets much slower after the first few iterations, as one would expect when the pool of precomputed keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2 followed by TPM_FlushSpecific sequence triggers the behavior. Our working theory is that this older TPM sometimes pauses to precompute keys, which modern chips implement as a background process. Without access to the chip's implementation details it's impossible to know whether any commands are immune to being blocked by this process. So it seems safest to ignore the chip's reported command durations, and use a value much higher than any observed duration, like 180 sec (which is the duration this chip reports for "long" commands). Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm_tis.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index caf7278..862c502 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -485,6 +485,11 @@ static void tpm_tis_update_timeouts(struct tpm_chip *chip) chip->vendor.timeout_d = msecs_to_jiffies(TIS_SHORT_TIMEOUT); chip->vendor.timeout_adjusted = true; break; + case 0x104a: /* STMicro ST19NP18-TPM */ + chip->vendor.duration[TPM_SHORT] = 180 * HZ; + chip->vendor.duration[TPM_MEDIUM] = 180 * HZ; + chip->vendor.duration[TPM_LONG] = 180 * HZ; + chip->vendor.duration_adjusted = true; } } -- 1.9.1
[PATCH v5 1/4] tpm_tis: Improve reporting of IO errors
Mysterious TPM behavior can be difficult to track down through all the layers of software. Add error messages for conditions that should never happen. Also include the manufacturer ID along with other chip data printed during init. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm_tis.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index 65f7eec..088fa86 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -299,6 +299,8 @@ static int tpm_tis_recv(struct tpm_chip *chip, u8 *buf, size_t count) expected = be32_to_cpu(*(__be32 *) (buf + 2)); if (expected > count) { + dev_err(chip->pdev, "Response too long (wanted %zd, got %d)\n", + count, expected); size = -EIO; goto out; } @@ -366,6 +368,8 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &chip->vendor.int_queue, false); status = tpm_tis_status(chip); if (!itpm && (status & TPM_STS_DATA_EXPECT) == 0) { + dev_err(chip->pdev, "Chip not accepting %zd bytes\n", + len - count); rc = -EIO; goto out_err; } @@ -378,6 +382,7 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &chip->vendor.int_queue, false); status = tpm_tis_status(chip); if ((status & TPM_STS_DATA_EXPECT) != 0) { + dev_err(chip->pdev, "Chip not accepting last byte\n"); rc = -EIO; goto out_err; } @@ -689,8 +694,9 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info, vendor = ioread32(chip->vendor.iobase + TPM_DID_VID(0)); chip->vendor.manufacturer_id = vendor; - dev_info(dev, "%s TPM (device-id 0x%X, rev-id %d)\n", + dev_info(dev, "%s TPM (manufacturer-id 0x%X, device-id 0x%X, rev-id %d)\n", (chip->flags & TPM_CHIP_FLAG_TPM2) ? "2.0" : "1.2", +chip->vendor.manufacturer_id, vendor >> 16, ioread8(chip->vendor.iobase + TPM_RID(0))); if (!itpm) { -- 1.9.1
[PATCH v5 0/4] tpm: Command duration logging and chip-specific override
v5: Use msecs_to_jiffies() instead of * HZ / 1000. v4: Rework tpm_get_timeouts() to allow overriding both timeouts and durations via a single callback. This series - improves TPM command error reporting - adds optional logging of TPM command durations - allows chip-specific override of command durations as well as protocol timeouts - overrides ST19NP18 TPM command duration to avoid lockups Ed Swierk (4): tpm_tis: Improve reporting of IO errors tpm: Add optional logging of TPM command durations tpm: Allow TPM chip drivers to override reported command durations tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup drivers/char/tpm/tpm-interface.c | 194 ++- drivers/char/tpm/tpm_tis.c | 48 +- include/linux/tpm.h | 3 +- 3 files changed, 131 insertions(+), 114 deletions(-) -- 1.9.1
[PATCH v5 2/4] tpm: Add optional logging of TPM command durations
Some TPMs violate their own advertised command durations. This is much easier to debug with data about how long each command actually takes to complete. Add debug messages that can be enabled by running echo -n 'module tpm +p' >/sys/kernel/debug/dynamic_debug/control on a kernel configured with DYNAMIC_DEBUG=y. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm-interface.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index c50637d..cc1e5bc 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -333,13 +333,14 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, { ssize_t rc; u32 count, ordinal; - unsigned long stop; + unsigned long start, stop; if (bufsiz > TPM_BUFSIZE) bufsiz = TPM_BUFSIZE; count = be32_to_cpu(*((__be32 *) (buf + 2))); ordinal = be32_to_cpu(*((__be32 *) (buf + 6))); + dev_dbg(chip->pdev, "starting command %d count %d\n", ordinal, count); if (count == 0) return -ENODATA; if (count > bufsiz) { @@ -360,18 +361,24 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, if (chip->vendor.irq) goto out_recv; + start = jiffies; if (chip->flags & TPM_CHIP_FLAG_TPM2) - stop = jiffies + tpm2_calc_ordinal_duration(chip, ordinal); + stop = start + tpm2_calc_ordinal_duration(chip, ordinal); else - stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal); + stop = start + tpm_calc_ordinal_duration(chip, ordinal); do { u8 status = chip->ops->status(chip); if ((status & chip->ops->req_complete_mask) == - chip->ops->req_complete_val) + chip->ops->req_complete_val) { + dev_dbg(chip->pdev, "completed command %d in %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); goto out_recv; + } if (chip->ops->req_canceled(chip, status)) { dev_err(chip->pdev, "Operation Canceled\n"); + dev_dbg(chip->pdev, "canceled command %d after %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); rc = -ECANCELED; goto out; } @@ -382,6 +389,8 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, chip->ops->cancel(chip); dev_err(chip->pdev, "Operation Timed out\n"); + dev_dbg(chip->pdev, "command %d timed out after %d ms\n", ordinal, + jiffies_to_msecs(jiffies - start)); rc = -ETIME; goto out; -- 1.9.1
[PATCH v5 3/4] tpm: Allow TPM chip drivers to override reported command durations
Some TPM chips report bogus command durations in their capabilities, just as others report incorrect timeouts. Rework tpm_get_timeouts() to allow chip drivers to override either via a single callback. Also clean up handling of TPMs that report milliseconds instead of microseconds. Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm-interface.c | 177 +-- drivers/char/tpm/tpm_tis.c | 35 ++-- include/linux/tpm.h | 3 +- 3 files changed, 106 insertions(+), 109 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index cc1e5bc..b8a08bb 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -502,123 +502,138 @@ static int tpm_startup(struct tpm_chip *chip, __be16 startup_type) "attempting to start the TPM"); } -int tpm_get_timeouts(struct tpm_chip *chip) +static int tpm_get_cap_prop(struct tpm_chip *chip, __be32 type, int size, + cap_t *cap, char *desc) { struct tpm_cmd_t tpm_cmd; - unsigned long new_timeout[4]; - unsigned long old_timeout[4]; - struct duration_t *duration_cap; ssize_t rc; tpm_cmd.header.in = tpm_getcap_header; tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; + tpm_cmd.params.getcap_in.subcap = type; rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL); if (rc == TPM_ERR_INVALID_POSTINIT) { /* The TPM is not started, we are the first to talk to it. Execute a startup command. */ - dev_info(chip->pdev, "Issuing TPM_STARTUP"); + dev_info(chip->pdev, "Issuing TPM_STARTUP\n"); if (tpm_startup(chip, TPM_ST_CLEAR)) return rc; tpm_cmd.header.in = tpm_getcap_header; tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; + tpm_cmd.params.getcap_in.subcap = type; rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL); } + if (rc) { dev_err(chip->pdev, - "A TPM error (%zd) occurred attempting to determine the timeouts\n", - rc); - goto duration; + "Error %zd reading %s\n", rc, desc); + return -EINVAL; } if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 || be32_to_cpu(tpm_cmd.header.out.length) - != sizeof(tpm_cmd.header.out) + sizeof(u32) + 4 * sizeof(u32)) + != sizeof(tpm_cmd.header.out) + sizeof(u32) + size * sizeof(u32)) { + dev_err(chip->pdev, + "Bad return code or length reading %s\n", desc); return -EINVAL; - - old_timeout[0] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.a); - old_timeout[1] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.b); - old_timeout[2] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.c); - old_timeout[3] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.d); - memcpy(new_timeout, old_timeout, sizeof(new_timeout)); - - /* -* Provide ability for vendor overrides of timeout values in case -* of misreporting. -*/ - if (chip->ops->update_timeouts != NULL) - chip->vendor.timeout_adjusted = - chip->ops->update_timeouts(chip, new_timeout); - - if (!chip->vendor.timeout_adjusted) { - /* Don't overwrite default if value is 0 */ - if (new_timeout[0] != 0 && new_timeout[0] < 1000) { - int i; - - /* timeouts in msec rather usec */ - for (i = 0; i != ARRAY_SIZE(new_timeout); i++) - new_timeout[i] *= 1000; - chip->vendor.timeout_adjusted = true; - } } - /* Report adjusted timeouts */ - if (chip->vendor.timeout_adjusted) { - dev_info(chip->pdev, -HW_ERR "Adjusting reported timeouts: A %lu->%luus B %lu->%luus C %lu->%luus D %lu->%luus\n", -old_timeout[0], new_timeout[0], -old_timeout[1], new_timeout[1], -old_timeout[2], new_timeout[2], -old_timeout[3], new_timeout[3]); - } + memcpy(cap, &tpm_cmd.params.getcap_out.ca
Re: [tpmdd-devel] [PATCH v4 3/4] tpm: Allow TPM chip drivers to override reported command durations
On Wed, Jun 8, 2016 at 12:05 PM, Jason Gunthorpe wrote: > On Tue, Jun 07, 2016 at 05:45:39PM -0700, Ed Swierk wrote: >> + case 0x32041114: /* Atmel 3204 */ >> + chip->vendor.timeout_a = TIS_SHORT_TIMEOUT * HZ / 1000; >> + chip->vendor.timeout_b = TIS_LONG_TIMEOUT * HZ / 1000; >> + chip->vendor.timeout_c = TIS_SHORT_TIMEOUT * HZ / 1000; >> + chip->vendor.timeout_d = TIS_SHORT_TIMEOUT * HZ / 1000; > > Shouldn't these use msec_to_jiffies? Indeed * HZ / 1000 can be one jiffy less than msec_to_jiffies(), depending on HZ. I'll change it. --Ed
[PATCH v4 2/4] tpm: Add optional logging of TPM command durations
Some TPMs violate their own advertised command durations. This is much easier to debug with data about how long each command actually takes to complete. Add debug messages that can be enabled by running echo -n 'module tpm +p' >/sys/kernel/debug/dynamic_debug/control on a kernel configured with DYNAMIC_DEBUG=y. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm-interface.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index c50637d..cc1e5bc 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -333,13 +333,14 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, { ssize_t rc; u32 count, ordinal; - unsigned long stop; + unsigned long start, stop; if (bufsiz > TPM_BUFSIZE) bufsiz = TPM_BUFSIZE; count = be32_to_cpu(*((__be32 *) (buf + 2))); ordinal = be32_to_cpu(*((__be32 *) (buf + 6))); + dev_dbg(chip->pdev, "starting command %d count %d\n", ordinal, count); if (count == 0) return -ENODATA; if (count > bufsiz) { @@ -360,18 +361,24 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, if (chip->vendor.irq) goto out_recv; + start = jiffies; if (chip->flags & TPM_CHIP_FLAG_TPM2) - stop = jiffies + tpm2_calc_ordinal_duration(chip, ordinal); + stop = start + tpm2_calc_ordinal_duration(chip, ordinal); else - stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal); + stop = start + tpm_calc_ordinal_duration(chip, ordinal); do { u8 status = chip->ops->status(chip); if ((status & chip->ops->req_complete_mask) == - chip->ops->req_complete_val) + chip->ops->req_complete_val) { + dev_dbg(chip->pdev, "completed command %d in %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); goto out_recv; + } if (chip->ops->req_canceled(chip, status)) { dev_err(chip->pdev, "Operation Canceled\n"); + dev_dbg(chip->pdev, "canceled command %d after %d ms\n", + ordinal, jiffies_to_msecs(jiffies - start)); rc = -ECANCELED; goto out; } @@ -382,6 +389,8 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf, chip->ops->cancel(chip); dev_err(chip->pdev, "Operation Timed out\n"); + dev_dbg(chip->pdev, "command %d timed out after %d ms\n", ordinal, + jiffies_to_msecs(jiffies - start)); rc = -ETIME; goto out; -- 1.9.1
[PATCH v4 1/4] tpm_tis: Improve reporting of IO errors
Mysterious TPM behavior can be difficult to track down through all the layers of software. Add error messages for conditions that should never happen. Also include the manufacturer ID along with other chip data printed during init. Signed-off-by: Ed Swierk Reviewed-by: Jarkko Sakkinen --- drivers/char/tpm/tpm_tis.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index 65f7eec..088fa86 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -299,6 +299,8 @@ static int tpm_tis_recv(struct tpm_chip *chip, u8 *buf, size_t count) expected = be32_to_cpu(*(__be32 *) (buf + 2)); if (expected > count) { + dev_err(chip->pdev, "Response too long (wanted %zd, got %d)\n", + count, expected); size = -EIO; goto out; } @@ -366,6 +368,8 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &chip->vendor.int_queue, false); status = tpm_tis_status(chip); if (!itpm && (status & TPM_STS_DATA_EXPECT) == 0) { + dev_err(chip->pdev, "Chip not accepting %zd bytes\n", + len - count); rc = -EIO; goto out_err; } @@ -378,6 +382,7 @@ static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) &chip->vendor.int_queue, false); status = tpm_tis_status(chip); if ((status & TPM_STS_DATA_EXPECT) != 0) { + dev_err(chip->pdev, "Chip not accepting last byte\n"); rc = -EIO; goto out_err; } @@ -689,8 +694,9 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info, vendor = ioread32(chip->vendor.iobase + TPM_DID_VID(0)); chip->vendor.manufacturer_id = vendor; - dev_info(dev, "%s TPM (device-id 0x%X, rev-id %d)\n", + dev_info(dev, "%s TPM (manufacturer-id 0x%X, device-id 0x%X, rev-id %d)\n", (chip->flags & TPM_CHIP_FLAG_TPM2) ? "2.0" : "1.2", +chip->vendor.manufacturer_id, vendor >> 16, ioread8(chip->vendor.iobase + TPM_RID(0))); if (!itpm) { -- 1.9.1
[PATCH v4 0/4] tpm: Command duration logging and chip-specific override
v4: Rework tpm_get_timeouts() to allow overriding both timeouts and durations via a single callback. This series - improves TPM command error reporting - adds optional logging of TPM command durations - allows chip-specific override of command durations as well as protocol timeouts - overrides ST19NP18 TPM command duration to avoid lockups Ed Swierk (4): tpm_tis: Improve reporting of IO errors tpm: Add optional logging of TPM command durations tpm: Allow TPM chip drivers to override reported command durations tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup drivers/char/tpm/tpm-interface.c | 194 ++- drivers/char/tpm/tpm_tis.c | 48 +- include/linux/tpm.h | 3 +- 3 files changed, 131 insertions(+), 114 deletions(-) -- 1.9.1
[PATCH v4 4/4] tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup
The STMicro ST19NP18-TPM sometimes takes much longer to execute commands than it reports in its capabilities. For example, command 186 (TPM_FlushSpecific) has been observed to take 14560 msec to complete, far longer than the 3000 msec limit for "short" commands reported by the chip. The behavior has also been seen with command 101 (TPM_GetCapability). Worse, when the tpm_tis driver attempts to cancel the current command (by writing commandReady = 1 to TPM_STS_x), the chip locks up completely, returning all-1s from all memory-mapped register reads. The lockup can be cleared only by resetting the system. The occurrence of this excessive command duration depends on the sequence of commands preceding it. One sequence is creating at least 2 new keys via TPM_CreateWrapKey, then letting the TPM idle for at least 30 seconds, then loading a key via TPM_LoadKey2. The next TPM_FlushSpecific occasionally takes tens of seconds to complete. Another sequence is creating many keys in a row without pause. The TPM_CreateWrapKey operation gets much slower after the first few iterations, as one would expect when the pool of precomputed keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2 followed by TPM_FlushSpecific sequence triggers the behavior. Our working theory is that this older TPM sometimes pauses to precompute keys, which modern chips implement as a background process. Without access to the chip's implementation details it's impossible to know whether any commands are immune to being blocked by this process. So it seems safest to ignore the chip's reported command durations, and use a value much higher than any observed duration, like 180 sec (which is the duration this chip reports for "long" commands). Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm_tis.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index 5c74980..0041622 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -485,6 +485,11 @@ static void tpm_tis_update_timeouts(struct tpm_chip *chip) chip->vendor.timeout_d = TIS_SHORT_TIMEOUT * HZ / 1000; chip->vendor.timeout_adjusted = true; break; + case 0x104a: /* STMicro ST19NP18-TPM */ + chip->vendor.duration[TPM_SHORT] = 180 * HZ; + chip->vendor.duration[TPM_MEDIUM] = 180 * HZ; + chip->vendor.duration[TPM_LONG] = 180 * HZ; + chip->vendor.duration_adjusted = true; } } -- 1.9.1
[PATCH v4 3/4] tpm: Allow TPM chip drivers to override reported command durations
Some TPM chips report bogus command durations in their capabilities, just as others report incorrect timeouts. Rework tpm_get_timeouts() to allow chip drivers to override either via a single callback. Also clean up handling of TPMs that report milliseconds instead of microseconds. Signed-off-by: Ed Swierk --- drivers/char/tpm/tpm-interface.c | 177 +-- drivers/char/tpm/tpm_tis.c | 35 ++-- include/linux/tpm.h | 3 +- 3 files changed, 106 insertions(+), 109 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index cc1e5bc..b8a08bb 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -502,123 +502,138 @@ static int tpm_startup(struct tpm_chip *chip, __be16 startup_type) "attempting to start the TPM"); } -int tpm_get_timeouts(struct tpm_chip *chip) +static int tpm_get_cap_prop(struct tpm_chip *chip, __be32 type, int size, + cap_t *cap, char *desc) { struct tpm_cmd_t tpm_cmd; - unsigned long new_timeout[4]; - unsigned long old_timeout[4]; - struct duration_t *duration_cap; ssize_t rc; tpm_cmd.header.in = tpm_getcap_header; tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; + tpm_cmd.params.getcap_in.subcap = type; rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL); if (rc == TPM_ERR_INVALID_POSTINIT) { /* The TPM is not started, we are the first to talk to it. Execute a startup command. */ - dev_info(chip->pdev, "Issuing TPM_STARTUP"); + dev_info(chip->pdev, "Issuing TPM_STARTUP\n"); if (tpm_startup(chip, TPM_ST_CLEAR)) return rc; tpm_cmd.header.in = tpm_getcap_header; tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP; tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4); - tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT; + tpm_cmd.params.getcap_in.subcap = type; rc = tpm_transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL); } + if (rc) { dev_err(chip->pdev, - "A TPM error (%zd) occurred attempting to determine the timeouts\n", - rc); - goto duration; + "Error %zd reading %s\n", rc, desc); + return -EINVAL; } if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 || be32_to_cpu(tpm_cmd.header.out.length) - != sizeof(tpm_cmd.header.out) + sizeof(u32) + 4 * sizeof(u32)) + != sizeof(tpm_cmd.header.out) + sizeof(u32) + size * sizeof(u32)) { + dev_err(chip->pdev, + "Bad return code or length reading %s\n", desc); return -EINVAL; - - old_timeout[0] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.a); - old_timeout[1] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.b); - old_timeout[2] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.c); - old_timeout[3] = be32_to_cpu(tpm_cmd.params.getcap_out.cap.timeout.d); - memcpy(new_timeout, old_timeout, sizeof(new_timeout)); - - /* -* Provide ability for vendor overrides of timeout values in case -* of misreporting. -*/ - if (chip->ops->update_timeouts != NULL) - chip->vendor.timeout_adjusted = - chip->ops->update_timeouts(chip, new_timeout); - - if (!chip->vendor.timeout_adjusted) { - /* Don't overwrite default if value is 0 */ - if (new_timeout[0] != 0 && new_timeout[0] < 1000) { - int i; - - /* timeouts in msec rather usec */ - for (i = 0; i != ARRAY_SIZE(new_timeout); i++) - new_timeout[i] *= 1000; - chip->vendor.timeout_adjusted = true; - } } - /* Report adjusted timeouts */ - if (chip->vendor.timeout_adjusted) { - dev_info(chip->pdev, -HW_ERR "Adjusting reported timeouts: A %lu->%luus B %lu->%luus C %lu->%luus D %lu->%luus\n", -old_timeout[0], new_timeout[0], -old_timeout[1], new_timeout[1], -old_timeout[2], new_timeout[2], -old_timeout[3], new_timeout[3]); - } + memcpy(cap, &tpm_cmd.params.getcap_out.ca
Re: [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug
I believe e967ef02 "MIPS: Fix restart of indirect syscalls" should be backported to all stable kernels. It would be a surprising coincidence if parisc suffers from the same problem. --Ed On Thu, Dec 17, 2015 at 4:54 AM, Mathieu Desnoyers wrote: > - On Dec 16, 2015, at 5:09 PM, Mathieu Desnoyers > mathieu.desnoy...@efficios.com wrote: > >> When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model : >> Ingenic JZRISC V4.15 FPU V0.0), we notice that a blocked sys_futex >> FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART >> signal handler. This spurious ENOSYS behavior causes hangs in liburcu >> 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the >> same behavior. This might affect earlier kernels. >> >> This issue appears to be fixed in 3.18.y stable kernels and 3.19, but >> nevertheless, we should try to handle this kernel bug more gracefully >> than a user-space hang due to unexpected spurious ENOSYS return value. > > It's actually fixed in 3.19, but not in 3.18.y stable kernels. The > Linux kernel upstream fix commit is: > e967ef02 "MIPS: Fix restart of indirect syscalls" > > I've created a small test program that could also be used on parisc > to check if it suffers from the same issue (see attached). > > On bogus mips kernels, we see the following output: > [OK] Test program with pid: 5748 SIGUSR1 handler > [FAIL] futex returns -1, Function not implemented > > Let me know if someone can try it out on a parisc kernel. > > Thanks! > > Mathieu > >> >> Therefore, fallback on the "async-safe" version of compat_futex in those >> situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has >> the nice property of being OK to use concurrently with other FUTEX_WAKE >> and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme. >> >> We suspect that parisc might be affected by a similar issue (Debian >> build bots reported a similar hang on both mips and parisc), but we do >> not have access to the hardware required to test this hypothesis. >> >> Signed-off-by: Mathieu Desnoyers >> CC: Michael Jeanson >> CC: Paul E. McKenney >> CC: Ralf Baechle >> CC: linux-m...@linux-mips.org >> CC: linux-kernel@vger.kernel.org >> CC: "James E.J. Bottomley" >> CC: Helge Deller >> CC: linux-par...@vger.kernel.org >> --- >> compat_futex.c | 2 ++ >> urcu/futex.h | 12 +++- >> 2 files changed, 13 insertions(+), 1 deletion(-) >> >> diff --git a/compat_futex.c b/compat_futex.c >> index b7f78f0..9e918fe 100644 >> --- a/compat_futex.c >> +++ b/compat_futex.c >> @@ -111,6 +111,8 @@ end: >> * _ASYNC SIGNAL-SAFE_. >> * For now, timeout, uaddr2 and val3 are unused. >> * Waiter will busy-loop trying to read the condition. >> + * It is OK to use compat_futex_async() on a futex address on which >> + * futex() WAKE operations are also performed. >> */ >> >> int compat_futex_async(int32_t *uaddr, int op, int32_t val, >> diff --git a/urcu/futex.h b/urcu/futex.h >> index 4d16cfa..a17eda8 100644 >> --- a/urcu/futex.h >> +++ b/urcu/futex.h >> @@ -73,7 +73,17 @@ static inline int futex_noasync(int32_t *uaddr, int op, >> int32_t val, >> >> ret = futex(uaddr, op, val, timeout, uaddr2, val3); >> if (caa_unlikely(ret < 0 && errno == ENOSYS)) { >> - return compat_futex_noasync(uaddr, op, val, timeout, >> + /* >> + * The fallback on ENOSYS is the async-safe version of >> + * the compat futex implementation, because the >> + * async-safe compat implementation allows being used >> + * concurrently with calls to futex(). Indeed, sys_futex >> + * FUTEX_WAIT, on some architectures (e.g. mips), within >> + * a given process, spuriously return ENOSYS due to >> + * signal restart bugs on some kernel versions (e.g. >> + * Linux kernel 3.18 and possibly earlier). >> + */ >> + return compat_futex_async(uaddr, op, val, timeout, >> uaddr2, val3); >> } >> return ret; >> -- >> 2.1.4 > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ethtool: Extend ethtool plugin module eeprom API to phylib
This patch extends the ethtool plugin module eeprom API to support cards whose phy support is delegated to a separate driver. The handlers for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPROM call the module_info and module_eeprom functions if the phy driver provides them; otherwise the handlers call the equivalent ethtool_ops functions provided by network drivers with built-in phy support. Signed-off-by: Ed Swierk --- include/linux/phy.h | 9 + net/core/ethtool.c | 45 ++--- 2 files changed, 43 insertions(+), 11 deletions(-) diff --git a/include/linux/phy.h b/include/linux/phy.h index 565188c..04e5f5c 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -472,6 +472,15 @@ struct phy_driver { /* See set_wol, but for checking whether Wake on LAN is enabled. */ void (*get_wol)(struct phy_device *dev, struct ethtool_wolinfo *wol); + /* Get the size and type of the eeprom contained within a plug-in +* module */ + int (*module_info)(struct phy_device *dev, + struct ethtool_modinfo *modinfo); + + /* Get the eeprom information from the plug-in module */ + int (*module_eeprom)(struct phy_device *dev, +struct ethtool_eeprom *ee, u8 *data); + struct device_driver driver; }; #define to_phy_driver(d) container_of(d, struct phy_driver, driver) diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 30071de..466526b 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1405,20 +1405,31 @@ static int ethtool_get_ts_info(struct net_device *dev, void __user *useraddr) return err; } +static int __ethtool_get_module_info(struct net_device *dev, +struct ethtool_modinfo *modinfo) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + struct phy_device *phydev = dev->phydev; + + if (phydev && phydev->drv && phydev->drv->module_info) + return phydev->drv->module_info(phydev, modinfo); + + if (ops->get_module_info) + return ops->get_module_info(dev, modinfo); + + return -EOPNOTSUPP; +} + static int ethtool_get_module_info(struct net_device *dev, void __user *useraddr) { int ret; struct ethtool_modinfo modinfo; - const struct ethtool_ops *ops = dev->ethtool_ops; - - if (!ops->get_module_info) - return -EOPNOTSUPP; if (copy_from_user(&modinfo, useraddr, sizeof(modinfo))) return -EFAULT; - ret = ops->get_module_info(dev, &modinfo); + ret = __ethtool_get_module_info(dev, &modinfo); if (ret) return ret; @@ -1428,21 +1439,33 @@ static int ethtool_get_module_info(struct net_device *dev, return 0; } +static int __ethtool_get_module_eeprom(struct net_device *dev, + struct ethtool_eeprom *ee, u8 *data) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + struct phy_device *phydev = dev->phydev; + + if (phydev && phydev->drv && phydev->drv->module_eeprom) + return phydev->drv->module_eeprom(phydev, ee, data); + + if (ops->get_module_eeprom) + return ops->get_module_eeprom(dev, ee, data); + + return -EOPNOTSUPP; +} + static int ethtool_get_module_eeprom(struct net_device *dev, void __user *useraddr) { int ret; struct ethtool_modinfo modinfo; - const struct ethtool_ops *ops = dev->ethtool_ops; - - if (!ops->get_module_info || !ops->get_module_eeprom) - return -EOPNOTSUPP; - ret = ops->get_module_info(dev, &modinfo); + ret = __ethtool_get_module_info(dev, &modinfo); if (ret) return ret; - return ethtool_get_any_eeprom(dev, useraddr, ops->get_module_eeprom, + return ethtool_get_any_eeprom(dev, useraddr, + __ethtool_get_module_eeprom, modinfo.eeprom_len); } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[updated PATCH] forcedeth: power down phy when interface is down
On 9/19/07, Ayaz Abdulla <[EMAIL PROTECTED]> wrote: > It seems that you are powering down the phy even if WOL is enabled. Right; I've updated the patch to skip powering down the phy when wol is enabled. > Secondly, can you powerdown the phy at the same time you start > performing autoneg restart? This seems to work fine with a BCM5461S phy; it would be worth testing with other phys to be sure. --Ed forcedeth-phy-power-down.patch Description: Binary data
[PATCH] forcedeth: "no link" is informational
Log "no link during initialization" at KERN_INFO as it's not an error, and occurs every time the interface comes up (when the forcedeth-phy-power-down patch is applied). Signed-off-by: Ed Swierk <[EMAIL PROTECTED]> forcedeth-open-no-link-printk.patch Description: Binary data
[PATCH] forcedeth: power down phy when interface is down
Bring the physical link down when the interface is down, by placing the PHY in power-down state. This mirrors the behavior of other drivers including e1000 and tg3. Signed-off-by: Ed Swierk <[EMAIL PROTECTED]> forcedeth-phy-power-down.patch Description: Binary data
Re: [PATCH] Configurable tap interface MTU
On 9/11/07, Herbert Xu <[EMAIL PROTECTED]> wrote: > Please make it 65535 without an Ethernet header and 65521 > with an Ethernet header. Here is a revised patch that allows MTUs up to 65535 for tap interfaces and up to 65521 for tun interfaces. (If I set the MTU to 65521 on a tun interface, ping complains "message too long" when I send a 65521-byte packet; 65520 works okay, though.) --Ed tap-change-mtu.patch Description: Binary data
[PATCH] Configurable tap interface MTU
This patch makes it possible to change the MTU on a tap interface. Increasing the MTU beyond the 1500-byte default is useful for applications that interoperate with Ethernet devices supporting jumbo frames. The patch caps the MTU somewhat arbitrarily at 16000 bytes. This is slightly lower than the value used by the e1000 driver, so it seems like a safe upper limit. Signed-off-by: Ed Swierk <[EMAIL PROTECTED]> --- tap-change-mtu.patch Description: Binary data
Re: Longer command line for kexec on i386
Ed Swierk arastra.com> writes: > I made the following change to kexec: > > [deleted] > > but I'm now seeing intermittent corruption of the initrd in the new kernel--a > few bytes at different locations each time. > > I suspect I've neglected some other important changes to the boot protocol. The corruption is due to a "hardware" bug (qemu), which was in fact corrupting memory (http://article.gmane.org/gmane.comp.emulators.qemu/18774). Extending the command line in kexec somehow made the corruption noticeable. --Ed - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Longer command line for kexec on i386
I'm attempting to get kexec to pass a command line longer than 256 bytes to the new kernel, using kexec-tools-testing-20070330 and kernel 2.6.22.1. The new kernel is a bzImage, and I'm using kexec to tack on a command line and an initrd. I made the following change to kexec: --- kexec-tools-testing-20070330.orig/include/x86/x86-linux.h +++ kexec-tools-testing-20070330/include/x86/x86-linux.h @@ -148,14 +148,13 @@ #endif struct e820entry e820_map[E820MAX]; /* 0x2d0 */ /* 0x550 */ -#define COMMAND_LINE_SIZE 256 +#define COMMAND_LINE_SIZE 2048 }; struct x86_linux_faked_param_header { struct x86_linux_param_header hdr; /* 0x00 */ uint8_t reserved16[688];/* 0x550 */ - uint8_t command_line[COMMAND_LINE_SIZE]; /* 0x800 */ - uint8_t reserved17[1792]; /* 0x900 - 0x1000 */ + uint8_t command_line[COMMAND_LINE_SIZE]; /* 0x800 - 0x1000 */ }; struct x86_linux_header { but I'm now seeing intermittent corruption of the initrd in the new kernel--a few bytes at different locations each time. I suspect I've neglected some other important changes to the boot protocol. Any clues would be appreciated. Thanks, --Ed - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/