Re: [PATCH V5 0/2] nvme-pci: fix the timeout case when reset is ongoing
On Thu, Jan 18, 2018 at 06:10:00PM +0800, Jianchao Wang wrote:
> Hello
>
> Please consider the following scenario.
> nvme_reset_ctrl
>   -> set state to RESETTING
>   -> queue reset_work
>   (scheduling)
> nvme_reset_work
>   -> nvme_dev_disable
>     -> quiesce queues
>     -> nvme_cancel_request
>        on outstanding requests
> ---_boundary_
>   -> nvme initializing (issue request on adminq)
>
> Before the _boundary_, not only quiesce the queues, but also cancel
> all the outstanding requests.
>
> A request could expire when the ctrl state is RESETTING.
>  - If the timeout occurs before the _boundary_, the expired requests
>    are from the previous work.
>  - Otherwise, the expired requests are from the controller initializing
>    procedure, such as sending cq/sq create commands to adminq to setup
>    io queues.
> In current implementation, nvme_timeout cannot identify the _boundary_
> so only handles second case above.

Bear with me a moment, as I'm only just now getting a real chance to look
at this, and I'm not quite sure I follow what problem this is solving.

The nvme_dev_disable routine makes forward progress without depending on
timeout handling to complete expired commands. Once controller disabling
completes, there can't possibly be any started requests that can expire.
So we don't need nvme_timeout to do anything for requests above the
boundary.
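To make the argument above concrete, here is a toy userspace model of the "disable" phase. It is only a sketch under stated assumptions: the toy_* names, the fixed slot count, and the three request states are hypothetical and are not taken from the nvme driver. It shows the property being claimed: once the disable step has quiesced the queues and cancelled everything in flight, no started request is left for a timeout handler to find.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_REQS 4

/* Hypothetical request states for this sketch only. */
enum toy_req_state { TOY_REQ_IDLE, TOY_REQ_STARTED, TOY_REQ_CANCELLED };

struct toy_ctrl {
	bool quiesced;                         /* no new submissions once set */
	enum toy_req_state reqs[TOY_NR_REQS];  /* per-slot request state */
};

/* Models the disable phase: quiesce first, then cancel all started requests. */
static void toy_dev_disable(struct toy_ctrl *c)
{
	assert(c != NULL);
	c->quiesced = true;
	for (size_t i = 0; i < TOY_NR_REQS; i++)
		if (c->reqs[i] == TOY_REQ_STARTED)
			c->reqs[i] = TOY_REQ_CANCELLED;
}

/* Counts requests that could still expire, i.e. those in the started state. */
static size_t toy_started_count(const struct toy_ctrl *c)
{
	size_t n = 0;

	assert(c != NULL);
	for (size_t i = 0; i < TOY_NR_REQS; i++)
		if (c->reqs[i] == TOY_REQ_STARTED)
			n++;
	return n;
}
```

Calling toy_dev_disable() on a controller with started requests and then toy_started_count() gives zero, which is the "nothing left to expire above the boundary" situation described in the reply.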
Re: [PATCH 6/6] s390: scrub registers on kernel entry and KVM exit
On 01/19/2018 07:29 AM, QingFeng Hao wrote:
>
>
> On 2018/1/17 17:48, Martin Schwidefsky wrote:
>> Clear all user space registers on entry to the kernel and all KVM guest
>> registers on KVM guest exit if the register does not contain either a
>> parameter or a result value.
> I am not sure if I understand this but it will be safer?

It is similar to commit 0cb5b30698fd ("kvm: vmx: Scrub hardware GPRs at VM-exit").
The idea is to minimize potential payload channels.

> And can we abstract the operations to be a macro like CLEAR_REG_7?

No, please. xgr %r7,%r7 is absolutely clear what it does, a MACRO often is not.
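For readers less familiar with the idiom: xgr is the s390 64-bit register XOR, so xgr %r7,%r7 leaves zero in %r7 whatever it held before. A minimal C sketch of the same operation follows; the function name is hypothetical and this is an illustration of the arithmetic, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * XOR of a value with itself clears every bit, which is the effect
 * "xgr %r7,%r7" has on the register.  The name is made up for this sketch.
 */
static uint64_t scrub_by_xor(uint64_t reg)
{
	uint64_t out = reg ^ reg;

	assert(out == 0);	/* holds for any input */
	return out;
}
```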
[PATCH][V2] mtd: nand: marvell: fix spelling mistake: "suceed"-> "succeed"
From: Colin Ian King

Trivial fix to spelling mistakes in dev_err error message text.

Signed-off-by: Colin Ian King
---
 drivers/mtd/nand/marvell_nand.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/marvell_nand.c b/drivers/mtd/nand/marvell_nand.c
index b8fec6093b75..4bd53b360277 100644
--- a/drivers/mtd/nand/marvell_nand.c
+++ b/drivers/mtd/nand/marvell_nand.c
@@ -517,7 +517,7 @@ static int marvell_nfc_prepare_cmd(struct nand_chip *chip)
 	/* Poll ND_RUN and clear NDSR before issuing any command */
 	ret = marvell_nfc_wait_ndrun(chip);
 	if (ret) {
-		dev_err(nfc->dev, "Last operation did not suceed\n");
+		dev_err(nfc->dev, "Last operation did not succeed\n");
 		return ret;
 	}
 
--
2.15.1
Re: [PATCH 4.4 045/115] sched/deadline: Throttle a constrained deadline task activated after the deadline
On Fri, Jan 19, 2018 at 01:00:45AM +, Ben Hutchings wrote:
> On Mon, 2017-12-18 at 16:48 +0100, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch. If anyone has any objections, please let me
> > know.
> >
> > --
> >
> > From: Daniel Bristot de Oliveira
> >
> >
> > [ Upstream commit df8eac8cafce7d086be3bd5cf5a838fa37594dfb ]
> [...]
>
> I think this needs another fix on top:
>
> commit ae83b56a56f8d9643dedbee86b457fa1c5d42f59
> Author: Xunlei Pang
> Date: Wed May 10 21:03:37 2017 +0800
>
>     sched/deadline: Zero out positive runtime after throttling constrained
>     tasks

Now queued up, thanks.

> There's another fix related to this, but it doesn't appear to fix a
> regression and I don't know how critical it is:
>
> commit 3effcb4247e74a51f5d8b775a1ee4abf87cc089a
> Author: Daniel Bristot de Oliveira
> Date: Mon May 29 16:24:03 2017 +0200
>
>     sched/deadline: Use the revised wakeup rule for suspending constrained dl
>     tasks

I'll hold off on this one until someone actually asks for it, as it's a
big change.

thanks again for the review,

greg k-h
Re: [PATCH 4.4 040/115] scsi: hpsa: update check for logical volume status
On Fri, Jan 19, 2018 at 12:29:12AM +, Ben Hutchings wrote:
> On Mon, 2017-12-18 at 16:48 +0100, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch. If anyone has any objections, please let me know.
> >
> > --
> >
> > From: Don Brace
> >
> >
> > [ Upstream commit 85b29008d8af6d94a0723aaa8d93cfb6e041158b ]
> >
> > - Add in a new case for volume offline. Resolves internal testing bug
> >   for multilun array management.
> > - Return correct status for failed TURs.
> [...]
>
> This apparently caused a regression that is fixed by:
>
> commit eb94588dabec82e012281608949a860f64752914
> Author: Tomas Henzl
> Date: Mon Mar 20 16:42:48 2017 +0100
>
>     scsi: hpsa: fix volume offline state

Many thanks, also now queued up for 4.9 which needs this too.

greg k-h
Re: [PATCH] general protection fault in sock_has_perm
On Thu, Jan 18, 2018 at 01:58:45PM -0800, Mark Salyzyn wrote:
> general protection fault: [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 14233 Comm: syz-executor2 Not tainted 4.4.112-g5f6325b #28
> task: 8801d1095f00 task.stack: 8800b595
> RIP: 0010:[] []
> sock_has_perm+0x1fe/0x3e0 security/selinux/hooks.c:4069
> RSP: 0018:8800b5957ce0 EFLAGS: 00010202
> RAX: dc00 RBX: 110016b2af9f RCX: 81b69b51
> RDX: 0002 RSI: RDI: 0010
> RBP: 8800b5957de0 R08: 0001 R09: 0001
> R10: R11: 110016b2af68 R12: 8800b5957db8
> R13: R14: 8800b7259f40 R15: 00d7
> FS: 7f72f5ae2700() GS:8801db30() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 00a2fa38 CR3: 0001d798 CR4: 00160670
> DR0: DR1: DR2:
> DR3: DR6: fffe0ff0 DR7: 0400
> Stack:
> 81b69a1f 8800b5957d58 8000b5957d30 41b58ab3
> 83fc82f2 81b69980 0246 8801d1096770
> 8801d3165668 8157844b 8801d1095f00
> 8801
> Call Trace:
> [] selinux_socket_setsockopt+0x4d/0x80
> security/selinux/hooks.c:4338
> [] security_socket_setsockopt+0x7d/0xb0
> security/security.c:1257
> [] SYSC_setsockopt net/socket.c:1757 [inline]
> [] SyS_setsockopt+0xe8/0x250 net/socket.c:1746
> [] entry_SYSCALL_64_fastpath+0x16/0x92
> Code: c2 42 9b b6 81 be 01 00 00 00 48 c7 c7 a0 cb 2b 84 e8
> f7 2f 6d ff 49 8d 7d 10 48 b8 00 00 00 00 00 fc ff df 48 89
> fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 83 01 00
> 00 41 8b 75 10 31
> RIP [] sock_has_perm+0x1fe/0x3e0
> security/selinux/hooks.c:4069
> RSP
> ---[ end trace 7b5aaf788fef6174 ]---
>
> In the absence of commit a4298e4522d6 ("net: add SOCK_RCU_FREE socket
> flag") and all the associated infrastructure changes to take advantage
> of a RCU grace period before freeing, there is a heightened
> possibility that a security check is performed while an ill-timed
> setsockopt call races in from user space. It then is prudent to null
> check sk_security, and if the case, reject the permissions.
>
> This adjustment is orthogonal to infrastructure improvements that may
> nullify the needed check, but should be added as good code hygiene.
>
> Signed-off-by: Mark Salyzyn
> Cc: Paul Moore
> Cc: Stephen Smalley
> Cc: Eric Paris
> Cc: James Morris
> Cc: "Serge E. Hallyn"
> Cc: seli...@tycho.nsa.gov
> Cc: linux-security-mod...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: sta...@vger.kernel.org
> ---
> This patch should be applied to all stable trees (author wants
> minimum of 3.18, 4.4, 4.9 and 4.14)

Note, if you want this type of thing to show up in the patch itself, so
I will see it when it hits Linus's tree, you can just change the stable
line to be:

	cc: stable # 3.18+

thanks,

greg k-h
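The check the changelog describes ("null check sk_security, and if the case, reject the permissions") can be sketched in userspace as below. This is a minimal illustration, not the patch itself: the toy_* structs are hypothetical stand-ins for the kernel's socket and security blob, and returning -EACCES is an assumption about what "reject" means here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's socket and its security blob. */
struct toy_sk_security {
	unsigned int sid;
};

struct toy_sock {
	struct toy_sk_security *sk_security;	/* may already be NULL if freed */
};

/*
 * Returns 0 when the permission check may proceed and -EACCES when the
 * security blob is missing.  The error code is an assumption of this sketch.
 */
static int toy_sock_has_perm(const struct toy_sock *sk)
{
	const struct toy_sk_security *sksec;

	assert(sk != NULL);
	sksec = sk->sk_security;
	if (!sksec)
		return -EACCES;

	/* A real hook would go on to consult policy using sksec->sid. */
	return 0;
}
```

Note that a NULL test like this only narrows the race window; as the changelog says, it is independent of the RCU-based infrastructure change that addresses the lifetime problem itself.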
Re: Re: Re: [PATCH v6] mfd: Add support for RTS5250S power saving
> On Wed, Dec 27, 2017 at 05:37:50PM -0600, Bjorn Helgaas wrote:
> > On Tue, Dec 19, 2017 at 08:15:24AM +, 冯锐 wrote:
> > > > On Fri, Dec 15, 2017 at 09:42:45AM +, 冯锐 wrote:
> > > > > > [+cc Hans, Dave, linux-pci]
> > > > > >
> > > > > > On Thu, Sep 07, 2017 at 04:26:39PM +0800, rui_f...@realsil.com.cn wrote:
> > > > > > > From: Rui Feng
> > > > > >
> > > > > > I wish this had been posted to linux-pci before being merged.
> > > > > >
> > > > > > I'm concerned because some of this appears to overlap and
> > > > > > conflict with PCI core management of ASPM.
> > > > > >
> > > > > > I assume these devices advertise ASPM support in their Link
> > > > > > Capabilites registers, right? If so, why isn't the existing
> > > > > > PCI core ASPM support sufficient?
> > > > > >
> > > > > When L1SS is configured, the device(hardware) can't enter L1SS
> > > > > status automatically, it need driver(software) to do some work
> > > > > to achieve the function.
> > > >
> > > > So this is a hardware defect in the device? As far as I know,
> > > > ASPM and L1SS are specified such that they should work without
> > > > special driver support.
> > > >
> > > Yes, you can say that.
> > > >
> > > > > > > Enable power saving for RTS5250S as following steps:
> > > > > > > 1.Set 0xFE58 to enable clock power management.
> > > > > >
> > > > > > Is this clock power management something specific to RTS5250S,
> > > > > > or is it standard PCIe architected stuff?
> > > > > >
> > > > > 0xFE58 is specific register to RTS5250S not standard PCIe
> > > > > architected stuff.
> > > >
> > > > OK. I asked because devices often mirror architected PCIe config
> > > > things in device-specific MMIO space, and if I squint just right,
> > > > I can sort of match up the register bits you used with things in
> > > > the PCIe spec.
> > > >
> > > > > > > 2.Check cfg space whether support L1SS or not.
> > > > > >
> > > > > > This sounds like standard PCIe ASPM L1 Substates, right?
> > > > > >
> > > > > Yes.
> > > > >
> > > > > > > 3.If support L1SS, set 0xFF03 to free clkreq.
> > > > > > > 4.When entering idle status, enable aspm
> > > > > > >   and set parameters for L1SS and LTR.
> > > > > > > 5.Wnen entering run status, disable aspm
> > > > > > >   and set parameters for L1SS and LTR.
> > > > > >
> > > > > > In general, drivers should not configure ASPM, L1SS, and LTR
> > > > > > themselves; the PCI core should do that.
> > > > > >
> > > > > > If a driver needs to tweak ASPM at run-time, it should use
> > > > > > interfaces exported by the PCI core to do so.
> > > > > >
> > > > > Which interface I can use to set ASPM? I use
> > > > > "pci_write_config_byte" now.
> > > >
> > > > What do you need to do? include/linux/pci-aspm.h exports
> > > > pci_disable_link_state(), which is mainly used to avoid ASPM
> > > > states that have hardware errata.
> > > >
> > > I want to enable ASPM(L0 -> L1) and disable ASPM(L1 -> L0), which
> > > interface can I use?
> >
> > You can use pci_disable_link_state() to disable usage of L1.
> >
> > Currently there is no corresponding pci_enable_link_state(). What if
> > we added something like the following (untested)? Would that work for
> > you?
>
> Hi Rui,
>
> Any thoughts on the patch below?

I'm busy with other work at the moment. The patch seems OK; I will test it later.

>
> > commit 209930d809fa602b8aafdd171b26719cee6c6649
> > Author: Bjorn Helgaas
> > Date: Wed Dec 27 16:56:26 2017 -0600
> >
> >     PCI/ASPM: Add pci_enable_link_state()
> >
> >     Some drivers want control over the ASPM states their device is allowed to
> >     use. We already have a pci_disable_link_state(), and drivers can use that
> >     to prevent the device from entering L0 or L1s.
> >
> >     Add a corresponding pci_enable_link_state() so a driver can enable use of
> >     L0 or L1s again.
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 3b9b4d50cd98..ca217195f800 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -1028,6 +1028,67 @@ void pcie_aspm_powersave_config_link(struct pci_dev *pdev)
> >  	up_read(&pci_bus_sem);
> >  }
> >
> > +/**
> > + * pci_enable_link_state - Enable device's link state, so the link may
> > + * enter specific states. Note that if the BIOS didn't grant ASPM
> > + * control to the OS, this does nothing because we can't touch the LNKCTL
> > + * register.
> > + *
> > + * @pdev: PCI device
> > + * @state: ASPM link state to enable
> > + */
> > +void pci_enable_link_state(struct pci_dev *pdev, int state)
> > +{
> > +	struct pci_dev *parent = pdev->bus->self;
> > +	struct pcie_link_state *link;
> > +	u32 lnkcap;
> > +
> > +	if (!pci_is_pcie(pdev))
> > +		return;
> > +
> > +	if (pdev->has_secondary_link)
> > +		parent = pdev;
> > +	if (!parent || !parent->link_state)
> > +		return;
> > +
> > +	/*
> > +	 * A driver requested that ASPM be enabled on this device, but
> > +	 * if we don't have
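
The disable/enable pairing discussed in this thread can be pictured as a per-link mask of states the driver has asked to keep off. The sketch below is a userspace illustration under stated assumptions only: the toy_* names and the flag values are hypothetical, and it does not reproduce how the PCI core tracks ASPM state or what the quoted (truncated) patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch; not taken from kernel headers. */
#define TOY_LINK_STATE_L0S 0x1u
#define TOY_LINK_STATE_L1  0x2u

struct toy_link {
	unsigned int aspm_disable;	/* states a driver asked to keep off */
};

/* Forbid the given state(s) on this link. */
static void toy_disable_link_state(struct toy_link *link, unsigned int state)
{
	assert(link != NULL);
	link->aspm_disable |= state;
}

/* Undo an earlier disable so the link may use the state(s) again. */
static void toy_enable_link_state(struct toy_link *link, unsigned int state)
{
	assert(link != NULL);
	link->aspm_disable &= ~state;
}

/* Nonzero when none of the given state(s) is currently forbidden. */
static int toy_state_allowed(const struct toy_link *link, unsigned int state)
{
	assert(link != NULL);
	return (link->aspm_disable & state) == 0;
}
```

A driver following the idle/run steps from the original patch description would, in this model, call toy_enable_link_state() with the L1 flag when going idle and toy_disable_link_state() with it when returning to the run state.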
Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
On Fri, Jan 19, 2018 at 05:09:46AM +, Bart Van Assche wrote:
> On Fri, 2018-01-19 at 10:32 +0800, Ming Lei wrote:
> > Now most of times both NVMe and SCSI won't return BLK_STS_RESOURCE, and
> > it should be DM-only which returns STS_RESOURCE so often.
>
> That's wrong at least for SCSI. See also
> https://marc.info/?l=linux-block=151578329417076.
>
> For other scenario's, e.g. if a SCSI initiator submits a
> SCSI request over a fabric and the SCSI target replies with "BUSY" then the

Could you explain a bit when a SCSI target replies with BUSY very often?

Inside the initiator, we already limit the max per-LUN requests and
per-host requests before calling .queue_rq().

> SCSI core will end the I/O request with status BLK_STS_RESOURCE after the
> maximum number of retries has been reached (see also scsi_io_completion()).
> In that last case, if a SCSI target sends a "BUSY" reply over the wire back
> to the initiator, there is no other approach for the SCSI initiator to
> figure out whether it can queue another request than to resubmit the
> request. The worst possible strategy is to resubmit a request immediately
> because that will cause a significant fraction of the fabric bandwidth to
> be used just for replying "BUSY" to requests that can't be processed
> immediately.

--
Ming
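The quoted point that immediate resubmission is the worst strategy implies some delay between retries. One possible policy, shown purely as an illustration, is a capped exponential backoff. The function name and both constants are hypothetical, and nothing here is claimed to match what the SCSI core or blk-mq actually does.

```c
#include <assert.h>

/* Hypothetical tuning values for this sketch. */
#define TOY_BASE_DELAY_MS 1u
#define TOY_MAX_DELAY_MS  64u

/*
 * Delay before resubmitting after the given number of earlier BUSY
 * replies: doubles with each attempt and is capped at TOY_MAX_DELAY_MS.
 */
static unsigned int toy_retry_delay_ms(unsigned int attempt)
{
	unsigned int delay = TOY_BASE_DELAY_MS;

	while (attempt-- > 0 && delay < TOY_MAX_DELAY_MS)
		delay *= 2;

	assert(delay <= TOY_MAX_DELAY_MS);
	return delay;
}
```

The cap keeps a long run of BUSY replies from pushing the delay up without bound, while the doubling quickly reduces how much fabric bandwidth is spent on requests the target cannot accept yet.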
Re: [patch v17 2/4] drivers: jtag: Add Aspeed SoC 24xx and 25xx families JTAG master driver
Hi Oleksandr,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.15-rc8]
[cannot apply to next-20180118]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Oleksandr-Shamray/drivers-jtag-Add-JTAG-core-driver/20180119-123719
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64

Note: the linux-review/Oleksandr-Shamray/drivers-jtag-Add-JTAG-core-driver/20180119-123719 HEAD b9c3d4721186f8264960ad87c6c499cdd1b6c2e8 builds fine.
      It only hurts bisectibility.

All error/warnings (new ones prefixed by >>):

   drivers/jtag/jtag-aspeed.c: In function 'aspeed_jtag_init':
>> drivers/jtag/jtag-aspeed.c:657:21: error: implicit declaration of function 'devm_reset_control_get_shared'; did you mean 'devm_pinctrl_get_select'? [-Werror=implicit-function-declaration]
     aspeed_jtag->rst = devm_reset_control_get_shared(aspeed_jtag->dev,
                        ^
                        devm_pinctrl_get_select
>> drivers/jtag/jtag-aspeed.c:657:19: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     aspeed_jtag->rst = devm_reset_control_get_shared(aspeed_jtag->dev,
                      ^
>> drivers/jtag/jtag-aspeed.c:664:2: error: implicit declaration of function 'reset_control_deassert' [-Werror=implicit-function-declaration]
     reset_control_deassert(aspeed_jtag->rst);
     ^~
   drivers/jtag/jtag-aspeed.c: In function 'aspeed_jtag_deinit':
>> drivers/jtag/jtag-aspeed.c:707:2: error: implicit declaration of function 'reset_control_assert' [-Werror=implicit-function-declaration]
     reset_control_assert(aspeed_jtag->rst);
     ^~~~
   cc1: some warnings being treated as errors

vim +657 drivers/jtag/jtag-aspeed.c

   631
   632  int aspeed_jtag_init(struct platform_device *pdev,
   633                       struct aspeed_jtag *aspeed_jtag)
   634  {
   635          struct resource *res;
   636          int err;
   637
   638          res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
   639          aspeed_jtag->reg_base = devm_ioremap_resource(aspeed_jtag->dev, res);
   640          if (IS_ERR(aspeed_jtag->reg_base))
   641                  return -ENOMEM;
   642
   643          aspeed_jtag->pclk = devm_clk_get(aspeed_jtag->dev, NULL);
   644          if (IS_ERR(aspeed_jtag->pclk)) {
   645                  dev_err(aspeed_jtag->dev, "devm_clk_get failed\n");
   646                  return PTR_ERR(aspeed_jtag->pclk);
   647          }
   648
   649          aspeed_jtag->irq = platform_get_irq(pdev, 0);
   650          if (aspeed_jtag->irq < 0) {
   651                  dev_err(aspeed_jtag->dev, "no irq specified\n");
   652                  return -ENOENT;
   653          }
   654
   655          clk_prepare_enable(aspeed_jtag->pclk);
   656
 > 657          aspeed_jtag->rst = devm_reset_control_get_shared(aspeed_jtag->dev,
   658                                                           NULL);
   659          if (IS_ERR(aspeed_jtag->rst)) {
   660                  dev_err(aspeed_jtag->dev,
   661                          "missing or invalid reset controller device tree entry");
   662                  return PTR_ERR(aspeed_jtag->rst);
   663          }
 > 664          reset_control_deassert(aspeed_jtag->rst);
   665
   666          /* Enable clock */
   667          aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_CTL_ENG_EN |
   668                            ASPEED_JTAG_CTL_ENG_OUT_EN, ASPEED_JTAG_CTRL);
   669          aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_SW_MODE_EN |
   670                            ASPEED_JTAG_SW_MODE_TDIO, ASPEED_JTAG_SW);
   671
   672          err = devm_request_irq(aspeed_jtag->dev, aspeed_jtag->irq,
   673                                 aspeed_jtag_interrupt, 0,
   674                                 "aspeed-jtag", aspeed_jtag);
   675          if (err) {
   676                  dev_err(aspeed_jtag->dev, "unable to get IRQ");
   677                  goto clk_unprep;
   678          }
   679          dev_dbg(>dev, "IRQ %d.\n", aspeed_jtag->irq);
   680
   681          aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_ISR_INST_PAUSE |
   682                            ASPEED_JTAG_ISR_INST_COMPLETE |
   683                            ASPEED_JTAG_ISR_DATA_PAUSE |
   684                            ASPEED_JTAG
Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
On Thu, Jan 18, 2018 at 09:02:45PM -0700, Jens Axboe wrote:
> On 1/18/18 7:32 PM, Ming Lei wrote:
> > On Thu, Jan 18, 2018 at 01:11:01PM -0700, Jens Axboe wrote:
> >> On 1/18/18 11:47 AM, Bart Van Assche wrote:
> This is all very tiresome.
> >>>
> >>> Yes, this is tiresome. It is very annoying to me that others keep
> >>> introducing so many regressions in such important parts of the kernel.
> >>> It is also annoying to me that I get blamed if I report a regression
> >>> instead of seeing that the regression gets fixed.
> >>
> >> I agree, it sucks that any change there introduces the regression. I'm
> >> fine with doing the delay insert again until a new patch is proven to be
> >> better.
> >
> > That way is still buggy as I explained, since rerun queue before adding
> > request to hctx->dispatch_list isn't correct. Who can make sure the request
> > is visible when __blk_mq_run_hw_queue() is called?
>
> That race basically doesn't exist for a 10ms gap.
>
> > Not mention this way will cause performance regression again.
>
> How so? It's _exactly_ the same as what you are proposing, except mine
> will potentially run the queue when it need not do so. But given that
> these are random 10ms queue kicks because we are screwed, it should not
> matter. The key point is that it only should be if we have NO better
> options. If it's a frequently occurring event that we have to return
> BLK_STS_RESOURCE, then we need to get a way to register an event for
> when that condition clears. That event will then kick the necessary
> queue(s).

Please see queue_delayed_work_on(), hctx->run_work is shared by all
scheduling, once blk_mq_delay_run_hw_queue(100ms) returns, no new
scheduling can make progress during the 100ms.

>
> >> From the original topic of this email, we have conditions that can cause
> >> the driver to not be able to submit an IO. A set of those conditions can
> >> only happen if IO is in flight, and those cases we have covered just
> >> fine. Another set can potentially trigger without IO being in flight.
> >> These are cases where a non-device resource is unavailable at the time
> >> of submission. This might be iommu running out of space, for instance,
> >> or it might be a memory allocation of some sort. For these cases, we
> >> don't get any notification when the shortage clears. All we can do is
> >> ensure that we restart operations at some point in the future. We're SOL
> >> at that point, but we have to ensure that we make forward progress.
> >
> > Right, it is a generic issue, not DM-specific one, almost all drivers
> > call kmalloc(GFP_ATOMIC) in IO path.
>
> GFP_ATOMIC basically never fails, unless we are out of memory. The

I guess GFP_KERNEL may never fail, but GFP_ATOMIC failure might be
possible, and it is mentioned[1] there is such code in mm allocation
path, also OOM can happen too.

	if (some randomly generated condition) && (request is atomic)
		return NULL;

[1] https://lwn.net/Articles/276731/

> exception is higher order allocations. If a driver has a higher order
> atomic allocation in its IO path, the device driver writer needs to be
> taken out behind the barn and shot. Simple as that. It will NEVER work
> well in a production environment. Witness the disaster that so many NIC
> driver writers have learned.
>
> This is NOT the case we care about here. It's resources that are more
> readily depleted because other devices are using them. If it's a high
> frequency or generally occurring event, then we simply must have a
> callback to restart the queue from that. The condition then becomes
> identical to device private starvation, the only difference being from
> where we restart the queue.
>
> > IMO, there is enough time for figuring out a generic solution before
> > 4.16 release.
>
> I would hope so, but the proposed solutions have not filled me with
> a lot of confidence in the end result so far.
>
> >> That last set of conditions better not be a a common occurence, since
> >> performance is down the toilet at that point. I don't want to introduce
> >> hot path code to rectify it. Have the driver return if that happens in a
> >> way that is DIFFERENT from needing a normal restart. The driver knows if
> >> this is a resource that will become available when IO completes on this
> >> device or not. If we get that return, we have a generic run-again delay.
> >
> > Now most of times both NVMe and SCSI won't return BLK_STS_RESOURCE, and
> > it should be DM-only which returns STS_RESOURCE so often.
>
> Where does the dm STS_RESOURCE error usually come from - what's exact
> resource are we running out of?

It is from blk_get_request(underlying queue), see
multipath_clone_and_map().

Thanks,
Ming
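
[Editor's note] The point about queue_delayed_work_on() can be illustrated with a small userspace model. This is only a sketch: struct sketch_dwork and sketch_queue_delayed() are hypothetical stand-ins, not kernel code. The one property taken from the kernel API is that queueing a delayed work item that is already pending is refused and leaves the existing timer untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one shared delayed work item (like hctx->run_work). */
struct sketch_dwork {
	bool pending;             /* already queued and waiting for its timer */
	unsigned long expires_ms; /* time at which the work will run */
};

/*
 * Models the "refuse if already pending" behaviour: the first caller
 * wins, and a later caller cannot shorten the delay.
 */
static bool sketch_queue_delayed(struct sketch_dwork *w,
				 unsigned long now_ms, unsigned long delay_ms)
{
	assert(w != NULL);
	if (w->pending)
		return false; /* existing timer is kept as it is */
	w->pending = true;
	w->expires_ms = now_ms + delay_ms;
	return true;
}
```

In this model, once a 100ms delayed run has been queued at t=0, a request at t=10 for an immediate run returns false and the work still fires at t=100, which is the stall described in the reply.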
Re: [RESEND PATCH 3/3] x86/apic: Clean up the names of legacy irq mode setting related functions
On 01/19/18 at 02:42pm, Dou Liyang wrote:
> Hi Baoquan,
>
> At 01/05/2018 12:39 PM, Baoquan He wrote:
> [...]
> >   /*
> > - * Not an __init, needed by kexec/kdump code.
> > - * For safety IO-APIC and Local APIC need be cleared before this.
> > + * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
> > + * provided by using the APICs in conjunction with standard 8259A-equivalent
> > + * programmable interrupt controllers (PICs). It's necessary to deliver legacy
> > + * interrupts even when APIC mode is not enabled. This is required by kexec/
> > + * kdump before enter into the 2nd kernel.
> >    */
> >   void switch_to_legacy_irq_mode(void)
> >   {
> >   	if (!nr_legacy_irqs())
> >   		return;
> > -	x86_io_apic_ops.disable();
> > +	ioapic_set_virtual_wire_mode();
> > +
> > +	if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> > +		lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);
>
> Seems these two function, ioapic/lapic_set_legacy_irq_mode should be
> exclusive.

Thanks for looking into this, dou!

It might be not exclusive. You can see mp_spec 3.6.2.2 Virtual Wire Mode
subsection, there are two kinds of virtual wire mode, one is
8259A-Equivalent pics is connected to lint0 of boot cpu LAPIC, the other
is 8259A-Equivalent pics go through IO-APIC, then is connected to lint0
of LAPIC. Whatever it is, LAPIC need be set as through-lapic.

Above is what I got from mp_spec. But from function
native_disable_io_apic() and disconnect_bsp_APIC(), the code seems to be
telling that if io-apic is connected to 8259A-Equivalent pics, we need
mask lvt0 of LAPIC. This conflicts with mp_spec 3.6.2.2.

Thanks
Baoquan

>
> But We do that because both the through-lapic and through-ioapic virtual
> wire mode need setup the APIC_SPIV_APIC_ENABLED which is only located in
> the lapic_set_legacy_irq_mode(). So we need call them both.
>
> IMO, this cleanup may not make it clear. we can separate these two mode
> totally or just keep it like before.
>
> Thanks,
> dou.
> >   }
> >   #ifdef CONFIG_X86_32
> > diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> > index 1151ccd72ce9..c30f0f273dbd 100644
> > --- a/arch/x86/kernel/x86_init.c
> > +++ b/arch/x86/kernel/x86_init.c
> > @@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
> >   struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
> >   	.read		= native_io_apic_read,
> > -	.disable	= native_disable_io_apic,
> > +	.disable	= switch_to_legacy_irq_mode,
> >   };
> > diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> > index 49721b4e1975..751472ddf536 100644
> > --- a/drivers/iommu/irq_remapping.c
> > +++ b/drivers/iommu/irq_remapping.c
> > @@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
> >   	 * now.
> >   	 */
> >   	if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> > -		disconnect_bsp_APIC(0);
> > +		lapic_set_legacy_irq_mode(0);
> >   }
> >   static void __init irq_remapping_modify_x86_ops(void)
> >
> >
Re: [PATCH v4] perf report: Fix regression when decoding intelPT traces
On 18/01/18 18:29, Arnaldo Carvalho de Melo wrote:
> On Wed, Jan 10, 2018 at 01:31:52PM -0700, Mathieu Poirier wrote:
>> Commit (93d10af26bb7 perf tools: Optimize sample parsing for ordered
>> events) breaks intelPT trace decoding by invariably returning an error if
>> the event type isn't a PERF_SAMPLE_TIME.
>
> Adrian, have you had the chance of looking at this?
>
> I'm tentatively applying with Jiri's ack.

Yes, it is fine. FWIW

Acked-by: Adrian Hunter

>
> - Arnaldo
>
>> With this patch the timestamp is initialised and processing is allowed to
>> continue if the error returned by function
>> perf_evlist__parse_sample_timestamp() is not a fault.
>>
>> Signed-off-by: Mathieu Poirier
>> Acked-by: Jiri Olsa
>> ---
>> Changes for v4:
>> - Rebased to latest perf/core branch
>> - Added Jiri's ACK
>> ---
>>  tools/perf/util/session.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 54e30f1bcbd7..07221884f725 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -1508,10 +1508,10 @@ static s64 perf_session__process_event(struct perf_session *session,
>>  		return perf_session__process_user_event(session, event, file_offset);
>>
>>  	if (tool->ordered_events) {
>> -		u64 timestamp;
>> +		u64 timestamp = -1ULL;
>>
>>  		ret = perf_evlist__parse_sample_timestamp(evlist, event, &timestamp);
>> -		if (ret)
>> +		if (ret && ret != -1)
>>  			return ret;
>>
>>  		ret = perf_session__queue_event(session, event, timestamp, file_offset);
>> --
>> 2.7.4
>
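
[Editor's note] The control flow of the two changed lines can be sketched in userspace as follows. The helper names are hypothetical, and the specific return values of the parser (-1 for "no timestamp in this event", -EFAULT for a real fault) are an assumption made for the illustration; the patch itself only shows that -1 is tolerated and any other non-zero value is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for perf_evlist__parse_sample_timestamp().
 * Assumed convention: 0 on success, -1 when the event carries no
 * timestamp, -EFAULT when the event data is malformed.
 */
static int sketch_parse_timestamp(int has_time, int malformed,
				  uint64_t raw, uint64_t *timestamp)
{
	assert(timestamp != NULL);
	if (malformed)
		return -EFAULT;
	if (!has_time)
		return -1; /* *timestamp deliberately left untouched */
	*timestamp = raw;
	return 0;
}

/* Mirrors the patched logic: default timestamp, tolerate -1 only. */
static int sketch_process_event(int has_time, int malformed,
				uint64_t raw, uint64_t *queued_ts)
{
	uint64_t timestamp = -1ULL; /* initialised, as in the patch */
	int ret = sketch_parse_timestamp(has_time, malformed, raw, &timestamp);

	if (ret && ret != -1)
		return ret;         /* genuine fault: stop here */

	*queued_ts = timestamp;     /* "queue" the event with its timestamp */
	return 0;
}
```

With the old `if (ret)` check the second case below would have aborted; with the old uninitialised `timestamp` it would have queued garbage.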
[PATCH] Fix explanation of lower bits in the SPARSEMEM mem_map pointer
The comment is confusing. On the one hand, it refers to 32-bit
alignment (struct page alignment on 32-bit platforms), but this
would only guarantee that the 2 lowest bits must be zero. On the
other hand, it claims that at least 3 bits are available, and 3 bits
are actually used.

This is not broken, because there is a stronger alignment guarantee,
just less obvious. Let's fix the comment to make it clear how many
bits are available and why.

Signed-off-by: Petr Tesarik
---
 include/linux/mmzone.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 67f2e3c38939..7522a6987595 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1166,8 +1166,16 @@ extern unsigned long usemap_size(void);
 
 /*
  * We use the lower bits of the mem_map pointer to store
- * a little bit of information.  There should be at least
- * 3 bits here due to 32-bit alignment.
+ * a little bit of information.  The pointer is calculated
+ * as mem_map - section_nr_to_pfn(pnum).  The result is
+ * aligned to the minimum alignment of the two values:
+ *   1. All mem_map arrays are page-aligned.
+ *   2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
+ *      lowest bits.  PFN_SECTION_SHIFT is arch-specific
+ *      (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
+ *      worst combination is powerpc with 256k pages,
+ *      which results in PFN_SECTION_SHIFT equal 6.
+ * To sum it up, at least 6 bits are available.
  */
 #define	SECTION_MARKED_PRESENT	(1UL<<0)
 #define SECTION_HAS_MEM_MAP	(1UL<<1)
-- 
2.13.6
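
[Editor's note] A small userspace sketch of the idea behind the comment: flags can live in the low bits of a value only as long as the alignment guarantees those bits are zero. The two flag defines are the ones visible in the patch context; everything prefixed sketch_ or SKETCH_ is hypothetical, and the 3-bit mask follows the "3 bits are actually used" statement in the changelog.

```c
#include <assert.h>

#define SECTION_MARKED_PRESENT	(1UL << 0)
#define SECTION_HAS_MEM_MAP	(1UL << 1)

/* Hypothetical: the 3 low bits the changelog says are in use. */
#define SKETCH_FLAG_BITS 3
#define SKETCH_FLAG_MASK ((1UL << SKETCH_FLAG_BITS) - 1)

/* Guaranteed-zero low bits of a difference: the weaker of the two alignments. */
static unsigned int sketch_min_align_bits(unsigned int a_bits, unsigned int b_bits)
{
	return a_bits < b_bits ? a_bits : b_bits;
}

/* Store flags in the low bits of a value whose low bits are known to be zero. */
static unsigned long sketch_encode(unsigned long aligned_value, unsigned long flags)
{
	assert((aligned_value & SKETCH_FLAG_MASK) == 0);
	assert((flags & ~SKETCH_FLAG_MASK) == 0);
	return aligned_value | flags;
}

static unsigned long sketch_decode_value(unsigned long coded)
{
	return coded & ~SKETCH_FLAG_MASK;
}

static unsigned long sketch_decode_flags(unsigned long coded)
{
	return coded & SKETCH_FLAG_MASK;
}
```

For the worst case named in the comment, 256k pages mean a page alignment of 18 bits, so sketch_min_align_bits(18, 6) yields 6, which comfortably covers the 3 bits in use.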
Re: [PATCH v5 0/2] kprobes: improve error handling when arming/disarming kprobes
Hi Ingo,

Could you pick this to tip tree?

Thank you,

On Wed, 10 Jan 2018 00:51:22 +0100
Jessica Yu wrote:

> Hi,
>
> This patchset attempts to improve error handling when arming or disarming
> ftrace-based kprobes. The current behavior is to simply WARN when ftrace
> (un-)registration fails, without propagating the error code. This can lead
> to confusing situations where, for example, register_kprobe()/enable_kprobe()
> would return 0 indicating success even if arming via ftrace had failed. In
> this scenario we'd end up with a non-functioning kprobe even though kprobe
> registration (or enablement) returned success. In this patchset, we take
> errors from ftrace into account and propagate the error when we cannot arm
> or disarm a kprobe.
>
> Below is an example that illustrates the problem using livepatch and
> systemtap (which uses kprobes underneath). Both livepatch and kprobes use
> ftrace ops with the IPMODIFY flag set, so registration at the same
> function entry is limited to only one ftrace user.
>
> Before
> ------
> # modprobe livepatch-sample # patches cmdline_proc_show, ftrace ops has IPMODIFY set
> # stap -e 'probe kernel.function("cmdline_proc_show").call { printf ("cmdline_proc_show\n"); }'
>
>    .. (nothing prints after reading /proc/cmdline) ..
>
> The systemtap handler doesn't execute due to a kprobe arming failure caused
> by a ftrace IPMODIFY conflict with livepatch, and there isn't an obvious
> indication of error from systemtap (because register_kprobe() returned
> success) unless the user inspects dmesg.
>
> After
> -----
> # modprobe livepatch-sample
> # stap -e 'probe kernel.function("cmdline_proc_show").call { printf ("cmdline_proc_show\n"); }'
> WARNING: probe kernel.function("cmdline_proc_show@/home/jeyu/work/linux-next/fs/proc/cmdline.c:6").call (address 0xa82fe910) registration error (rc -16)
>
> Although the systemtap handler doesn't execute (as it shouldn't), the
> ftrace error is propagated and now systemtap prints a visible error message
> stating that (kprobe) registration had failed (because register_kprobe()
> returned an error), along with the propagated error code.
>
> This patchset was based on Petr Mladek's original patchset (patches 2 and 3)
> back in 2015, which improved kprobes error handling, found here:
>
>    https://lkml.org/lkml/2015/2/26/452
>
> However, further work on this had been paused since then and the patches
> were not upstreamed.
>
> This patchset has been lightly sanity-tested (on linux-next) with kprobes,
> kretprobes, and optimized kprobes. It passes the kprobes smoke test, but
> more testing is greatly appreciated.
>
> Changes from v4:
> - Switch from WARN() to pr_debug() in arm_kprobe_ftrace() so the stack
>   dumps don't pollute dmesg, as IPMODIFY conflicts can occur in normal usage
> - Added Masami's ack to the first patch
>
> Changes from v3:
> - Have (dis)arm_kprobe_ftrace() return -ENODEV instead of 0 in case of
>   !CONFIG_KPROBES_ON_FTRACE
> - Add total count of all probes tried in (dis)arm_all_kprobes()
>
> Changes from v2:
> - Add missing synchronize rcu in register_aggr_kprobe()
> - s/kprobes/probes/ on error message in (dis)arm_all_kprobes()
>
> Changes from v1:
> - Don't arm the kprobe before adding it to the kprobe table, otherwise
>   we'll temporarily see a stray breakpoint.
> - Remove kprobe from the kprobe_table and call synchronize_sched() if
>   arming during register_kprobe() fails.
> - add Masami's ack on the 2nd patch (unchanged from v1)
>
> ---
> Jessica Yu (2):
>   kprobes: propagate error from arm_kprobe_ftrace()
>   kprobes: propagate error from disarm_kprobe_ftrace()
>
>  kernel/kprobes.c | 178 +++
>  1 file changed, 128 insertions(+), 50 deletions(-)
>
> --
> 2.13.6
>

--
Masami Hiramatsu
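
[Editor's note] The before/after behaviour described in the cover letter can be modelled in a few lines of userspace C. All names here are hypothetical and this is not the kernel implementation; the only details borrowed from the thread are the ordering from the v1 changelog (add to the table, then arm, then remove again on failure) and the error code, since "rc -16" in the systemtap output corresponds to -EBUSY on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a probe's registration state. */
struct sketch_probe {
	bool in_table;
	bool armed;
};

/* Stand-in for arming via ftrace; fails when another IPMODIFY user holds the slot. */
static int sketch_arm(struct sketch_probe *p, bool slot_taken)
{
	assert(p != NULL);
	if (slot_taken)
		return -EBUSY;
	p->armed = true;
	return 0;
}

/* Old behaviour: the arming error is only warned about, caller sees success. */
static int sketch_register_old(struct sketch_probe *p, bool slot_taken)
{
	p->in_table = true;
	if (sketch_arm(p, slot_taken) != 0) {
		/* a WARN would be emitted here; the error code is dropped */
	}
	return 0;
}

/* New behaviour: undo the table insertion and propagate the error. */
static int sketch_register_new(struct sketch_probe *p, bool slot_taken)
{
	int ret;

	p->in_table = true;
	ret = sketch_arm(p, slot_taken);
	if (ret)
		p->in_table = false;
	return ret;
}
```

The old variant leaves a probe that is registered but never fires, which matches the silent "nothing prints" case in the Before example.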
Re: [PATCH v5 0/2] kprobes: improve error handling when arming/disarming kprobes
Hi Ingo,

Could you pick this to tip tree?

Thank you,

On Wed, 10 Jan 2018 00:51:22 +0100 Jessica Yu wrote:

> Hi,
>
> This patchset attempts to improve error handling when arming or disarming
> ftrace-based kprobes. The current behavior is to simply WARN when ftrace
> (un-)registration fails, without propagating the error code. This can lead
> to confusing situations where, for example, register_kprobe()/enable_kprobe()
> would return 0 indicating success even if arming via ftrace had failed. In
> this scenario we'd end up with a non-functioning kprobe even though kprobe
> registration (or enablement) returned success. In this patchset, we take
> errors from ftrace into account and propagate the error when we cannot arm
> or disarm a kprobe.
>
> Below is an example that illustrates the problem using livepatch and
> systemtap (which uses kprobes underneath). Both livepatch and kprobes use
> ftrace ops with the IPMODIFY flag set, so registration at the same
> function entry is limited to only one ftrace user.
>
> Before
> ------
> # modprobe livepatch-sample  # patches cmdline_proc_show, ftrace ops has IPMODIFY set
> # stap -e 'probe kernel.function("cmdline_proc_show").call { printf ("cmdline_proc_show\n"); }'
>
> .. (nothing prints after reading /proc/cmdline) ..
>
> The systemtap handler doesn't execute due to a kprobe arming failure caused
> by a ftrace IPMODIFY conflict with livepatch, and there isn't an obvious
> indication of error from systemtap (because register_kprobe() returned
> success) unless the user inspects dmesg.
>
> After
> -----
> # modprobe livepatch-sample
> # stap -e 'probe kernel.function("cmdline_proc_show").call { printf ("cmdline_proc_show\n"); }'
> WARNING: probe kernel.function("cmdline_proc_show@/home/jeyu/work/linux-next/fs/proc/cmdline.c:6").call (address 0xa82fe910) registration error (rc -16)
>
> Although the systemtap handler doesn't execute (as it shouldn't), the
> ftrace error is propagated and now systemtap prints a visible error message
> stating that (kprobe) registration had failed (because register_kprobe()
> returned an error), along with the propagated error code.
>
> This patchset was based on Petr Mladek's original patchset (patches 2 and 3)
> back in 2015, which improved kprobes error handling, found here:
>
>    https://lkml.org/lkml/2015/2/26/452
>
> However, further work on this had been paused since then and the patches
> were not upstreamed.
>
> This patchset has been lightly sanity-tested (on linux-next) with kprobes,
> kretprobes, and optimized kprobes. It passes the kprobes smoke test, but
> more testing is greatly appreciated.
>
> Changes from v4:
> - Switch from WARN() to pr_debug() in arm_kprobe_ftrace() so the stack
>   dumps don't pollute dmesg, as IPMODIFY conflicts can occur in normal usage
> - Added Masami's ack to the first patch
>
> Changes from v3:
> - Have (dis)arm_kprobe_ftrace() return -ENODEV instead of 0 in case of
>   !CONFIG_KPROBES_ON_FTRACE
> - Add total count of all probes tried in (dis)arm_all_kprobes()
>
> Changes from v2:
> - Add missing synchronize rcu in register_aggr_kprobe()
> - s/kprobes/probes/ on error message in (dis)arm_all_kprobes()
>
> Changes from v1:
> - Don't arm the kprobe before adding it to the kprobe table, otherwise
>   we'll temporarily see a stray breakpoint.
> - Remove kprobe from the kprobe_table and call synchronize_sched() if
>   arming during register_kprobe() fails.
> - add Masami's ack on the 2nd patch (unchanged from v1)
>
> ---
> Jessica Yu (2):
>   kprobes: propagate error from arm_kprobe_ftrace()
>   kprobes: propagate error from disarm_kprobe_ftrace()
>
>  kernel/kprobes.c | 178 +++
>  1 file changed, 128 insertions(+), 50 deletions(-)
>
> --
> 2.13.6
>

--
Masami Hiramatsu
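
The behavior change described in the cover letter can be modeled in plain userspace C. This is only a minimal sketch under stated assumptions, not the kernel code: fake_ftrace_register(), arm_probe_old() and arm_probe_new() are hypothetical stand-ins, and the single ipmodify_owner slot is an assumed simplification of the rule that only one IPMODIFY ftrace user may hook a given function entry.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: at most one IPMODIFY owner for the function entry. */
static const char *ipmodify_owner;

/* Stand-in for ftrace registration: fails with -EBUSY on a conflict. */
int fake_ftrace_register(const char *user)
{
	if (ipmodify_owner && strcmp(ipmodify_owner, user) != 0)
		return -EBUSY;
	ipmodify_owner = user;
	return 0;
}

/* Old behavior: warn, but still report success to the caller. */
int arm_probe_old(const char *user)
{
	int ret = fake_ftrace_register(user);

	if (ret)
		fprintf(stderr, "WARN: arming failed (%d)\n", ret);
	return 0;
}

/* New behavior: hand the ftrace error code back to the caller. */
int arm_probe_new(const char *user)
{
	return fake_ftrace_register(user);
}
```

If "livepatch" registers first, arm_probe_old("kprobe") still returns 0 while arm_probe_new("kprobe") returns -EBUSY, which on Linux is the -16 shown in the systemtap warning above.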
Re: [PATCH] xhci:Fix NULL pointer in xhci debugfs
On 19.01.2018 04:13, Zhengjun Xing wrote:

Commit dde634057da7 ("xhci: Fix use-after-free in xhci debugfs") causes a
null pointer dereference while fixing xhci-debugfs usage of ring pointers
that were freed during hibernate.

The fix passed addresses to ring pointers instead, but forgot to do this
change for the xhci_ring_trb_show function.

The address of the ring pointer passed to xhci-debugfs was of a temporary
ring pointer "new_ring" instead of the actual ring "ring" pointer. The
temporary new_ring pointer will be set to NULL later causing the NULL
pointer dereference.

This issue was seen when reading xhci related files in debugfs:

cat /sys/kernel/debug/usb/xhci/*/devices/*/ep*/trbs

[ 184.604861] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 184.613776] IP: xhci_ring_trb_show+0x3a/0x890
[ 184.618733] PGD 264193067 P4D 264193067 PUD 263238067 PMD 0
[ 184.625184] Oops: [#1] SMP
[ 184.726410] RIP: 0010:xhci_ring_trb_show+0x3a/0x890
[ 184.731944] RSP: 0018:ba8243c0fd90 EFLAGS: 00010246
[ 184.737880] RAX: RBX: RCX: 000295d6
[ 184.746020] RDX: 000295d5 RSI: 0001 RDI: 971a6418d400
[ 184.754121] RBP: R08: R09:
[ 184.76] R10: 971a64c98a80 R11: 971a62a00e40 R12: 971a62a85500
[ 184.770325] R13: 0002 R14: 971a6418d400 R15: 971a6418d400
[ 184.778448] FS: 7fe725a79700() GS:971a6ec0() knlGS:
[ 184.787644] CS: 0010 DS: ES: CR0: 80050033
[ 184.794168] CR2: CR3: 00025f365005 CR4: 003606f0
[ 184.802318] Call Trace:
[ 184.805094] ? seq_read+0x281/0x3b0
[ 184.809068] seq_read+0xeb/0x3b0
[ 184.812735] full_proxy_read+0x4d/0x70
[ 184.817007] __vfs_read+0x23/0x120
[ 184.820870] vfs_read+0x91/0x130
[ 184.824538] SyS_read+0x42/0x90
[ 184.828106] entry_SYSCALL_64_fastpath+0x1a/0x7d

Fixes: dde634057da7 ("xhci: Fix use-after-free in xhci debugfs")
Signed-off-by: Zhengjun Xing
---

Thanks, adding to queue

-Mathias
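
The failure mode in the commit message (keeping the address of a temporary pointer that is later cleared) can be illustrated with a small userspace model. This is a sketch under assumptions: struct ring, struct endpoint, struct debug_entry, show_trbs() and commit_ring() are hypothetical names and are not the xhci driver's data structures; the -1 return value stands in for the oops.

```c
#include <stddef.h>

struct ring {
	int num_trbs;
};

struct endpoint {
	struct ring *ring;	/* long-lived ring pointer */
	struct ring *new_ring;	/* temporary, cleared once setup is done */
};

/* What a debugfs file would keep: the address of a ring pointer. */
struct debug_entry {
	struct ring **ringp;
};

/* Reader for the hypothetical "trbs" file; -1 marks the NULL case. */
int show_trbs(const struct debug_entry *e)
{
	struct ring *r = *e->ringp;

	return r ? r->num_trbs : -1;
}

/* Finish endpoint setup: hand the ring over and clear the temporary. */
void commit_ring(struct endpoint *ep)
{
	ep->ring = ep->new_ring;
	ep->new_ring = NULL;
}
```

An entry built from &ep.new_ring reads NULL once commit_ring() has run, while an entry built from &ep.ring keeps working, which matches the direction of the fix described above.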
linux-next: Tree for Jan 19
Hi all,

News: there will probably be very few, if any, releases next week as LCA
is on (unfortunate clash with the merge window).

Changes since 20180118:

The powerpc tree gained a build failure due to an interaction with Linus'
tree, so I applied a merge fix patch. It gained another for which I
applied a supplied fix patch.

The f2fs tree gained a build failure due to an interaction with the btrfs
tree for which I reverted a commit.

The net-next tree gained a conflict against the net tree.

Non-merge commits (relative to Linus' tree): 9833
 9793 files changed, 406830 insertions(+), 263432 deletions(-)

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one. You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source. There are also quilt-import.log and merge.log
files in the Next directory. Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 256 trees (counting Linus' and 44 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next . If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul
Gortmaker for triage and bug fixes.

--
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (dda3e15231b3 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (36c1681678b5 genksyms: drop *.hash.c from .gitignore)
Merging arc-current/for-curr (8ff3afc159f2 ARC: Enable fatal signals on boot for dev platforms)
Merging arm-current/fixes (091f02483df7 ARM: net: bpf: clarify tail_call index)
Merging m68k-current/for-linus (5e387199c17c m68k/defconfig: Update defconfigs for v4.14-rc7)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (1b689a95ce74 powerpc/pseries: include linux/types.h in asm/hvcall.h)
Merging sparc/master (59585b4be9ae sparc64: repair calling incorrect hweight function from stubs)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (b200bfd6112a fm10k: mark PM functions as __maybe_unused)
Merging bpf/master (7155f8f39157 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf)
Merging ipsec/master (ad9294dbc227 bpf: fix cls_bpf on filter replace)
Merging netfilter/master (889c604fd0b5 netfilter: x_tables: fix int overflow in xt_alloc_table_info())
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set)
Merging wireless-drivers/master (cc124d5cc8d8 brcmfmac: fix CLM load error for legacy chips when user helper is enabled)
Merging mac80211/master (59b179b48ce2 cfg80211: check dev_set_name() return value)
Merging rdma-fixes/for-rc (ae59c3f0b6cf RDMA/mlx5: Fix out-of-bound access while querying AH)
Merging sound-current/for-linus (b3defb791b26 ALSA: seq: Make ioctls race-free)
Merging pci-current/for-linus (d6c1efecd1e1 x86/PCI: Enable AMD 64-bit window on resume)
Merging driver-core.current/driver-core-linus (30a7acd57389 Linux 4.15-rc6)
Merging tty.current/tty-linus (30a7acd57389 Linux 4.15-rc6)
Merging usb.current/usb-linus (a8750ddca918 Linux 4.15-rc8)
Merging usb-gadget-fixes/fixes (b2cd1df66037 Linux 4.15-rc7)
Merging usb-serial-fixes/usb-linus (d14ac576d10f USB: serial: cp210x: add new device ID ELV ALC 8xxx)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: fix ulpi-node lookup)
Merging phy/fixes (2b88212c4cc6 phy: rcar-gen3-usb2: select USB_COMMON)
Merging staging.current/staging-linus (a8750ddca918 Linux 4.15-rc8)
Merging char-misc.current/char-misc-linus (a8750ddca918 Linux 4.15-rc8)
Merging input-current/for-linus
Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is ongoing
Hi Keith

Thanks for the kind reminder.

On 01/19/2018 02:05 PM, Keith Busch wrote:
>>> The driver may be giving up on the command here, but that doesn't mean
>>> the controller has. We can't just end the request like this because that
>>> will release the memory the controller still owns. We must wait until
>>> after nvme_dev_disable clears bus master because we can't say for sure
>>> the controller isn't going to write to that address right after we end
>>> the request.
>>>
>> Yes, but the controller is going to be reseted or shutdown at the moment,
>> even if the controller accesses a bad address and goes wrong, everything will
>> be ok after reset or shutdown. :)
> Hm, I don't follow. DMA access after free is never okay.

Yes, this may cause unexpected memory corruption.

Thanks
Jianchao
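
The ordering constraint Keith describes (do not end a request while the device may still DMA into its buffer) can be sketched as a tiny userspace model. Everything here is an assumption made for illustration: struct ctrl, struct request, model_dev_disable() and model_end_request() are hypothetical and only mimic the idea that the disable path clears bus mastering before requests are completed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a controller that can DMA into a request buffer. */
struct ctrl {
	bool bus_master;	/* true while the device may still issue DMA */
};

struct request {
	unsigned char *buf;	/* memory the device was told to write to */
	bool ended;
};

/* Stand-in for the disable path: after this the device cannot DMA. */
void model_dev_disable(struct ctrl *c)
{
	c->bus_master = false;
}

/*
 * End a request only when the device can no longer touch its buffer.
 * Returns 0 on success, -1 if ending now would risk DMA after free.
 */
int model_end_request(const struct ctrl *c, struct request *rq)
{
	if (c->bus_master)
		return -1;
	rq->buf = NULL;		/* buffer is handed back to its owner */
	rq->ended = true;
	return 0;
}
```

In this model a timeout handler that calls model_end_request() before model_dev_disable() is refused, which mirrors the objection raised in the thread.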
[git pull] drm fixes for 4.15 final
Hi Linus,

This is a set of drm regression fixes that I'd like to get into 4.15
final, but I understand if it's too much too late, and am happy to drop
these into -next and make people chase the stable monkey.

The i915 change fixes a display corruption problem introduced in 4.15,
the nouveau changes are for regressions in 4.15, one of the vmwgfx
fixes goes back a little further, the other is a 4.15 regression fix,
the 3 sun4i changes fix blank HDMI output on those devices.

Again happy if you don't take these, just let me know, I suspect 4.15
will have a lot of stable backports for security things over time!

Thanks,
Dave.

The following changes since commit a8750ddca918032d6349adbf9a4b6555e7db20da:

  Linux 4.15-rc8 (2018-01-14 15:32:30 -0800)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux tags/drm-fixes-for-v4.15-rc9

for you to fetch changes up to 04cef3eadcf0bf9783a985286cc5f48c5d33fd7a:

  Merge tag 'drm-intel-fixes-2018-01-18' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes (2018-01-19 12:40:07 +1000)

nouveau, i915, vmwgfx and sun4i regression fixes

Ben Skeggs (1):
      drm/nouveau/mmu/mcp77: fix regressions in stolen memory handling

Dave Airlie (4):
      Merge branch 'vmwgfx-fixes-4.15' of git://people.freedesktop.org/~thomash/linux into drm-fixes
      Merge tag 'drm-misc-fixes-2018-01-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
      Merge branch 'linux-4.15' of git://github.com/skeggsb/linux into drm-fixes
      Merge tag 'drm-intel-fixes-2018-01-18' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

Jon Hunter (1):
      drm/nouveau/bar/gk20a: Avoid bar teardown during init

Jonathan Liu (3):
      drm/sun4i: hdmi: Check for unset best_parent in sun4i_tmds_determine_rate
      drm/sun4i: hdmi: Fix incorrect assignment in sun4i_tmds_determine_rate
      drm/sun4i: hdmi: Add missing rate halving check in sun4i_tmds_determine_rate

Rob Clark (1):
      drm/vmwgfx: fix memory corruption with legacy/sou connectors

Thierry Reding (1):
      drm/nouveau/drm/nouveau: Pass the proper arguments to nvif_object_map_handle()

Ville Syrjälä (3):
      drm/i915: Add .get_hw_state() method for planes
      drm/i915: Redo plane sanitation during readout
      drm/i915: Fix deadlock in i830_disable_pipe()

Woody Suwalski (1):
      drm/vmwgfx: Fix a boot time warning

 drivers/gpu/drm/i915/intel_display.c               | 303 +++--
 drivers/gpu/drm/i915/intel_drv.h                   |   2 +
 drivers/gpu/drm/i915/intel_sprite.c                |  83 ++
 drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h  |   1 +
 drivers/gpu/drm/nouveau/nouveau_bo.c               |   4 +-
 drivers/gpu/drm/nouveau/nvkm/engine/device/base.c  |   4 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c     |   3 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c    |   1 -
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/Kbuild     |   2 +
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/mcp77.c    |  41 +++
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h      |  10 +
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmmcp77.c |  45 +++
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c  |  16 +-
 drivers/gpu/drm/sun4i/sun4i_hdmi_tmds_clk.c        |   9 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c                |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c                |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c               |   4 +-
 17 files changed, 367 insertions(+), 167 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/mcp77.c
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmmcp77.c
Re: [RESEND PATCH 3/3] x86/apic: Clean up the names of legacy irq mode setting related functions
Hi Baoquan,

At 01/05/2018 12:39 PM, Baoquan He wrote:
[...]

 /*
- * Not an __init, needed by kexec/kdump code.
- * For safety IO-APIC and Local APIC need be cleared before this.
+ * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
+ * provided by using the APICs in conjunction with standard 8259A-equivalent
+ * programmable interrupt controllers (PICs). It's necessary to deliver legacy
+ * interrupts even when APIC mode is not enabled. This is required by kexec/
+ * kdump before enter into the 2nd kernel.
  */
 void switch_to_legacy_irq_mode(void)
 {
 	if (!nr_legacy_irqs())
 		return;
 
-	x86_io_apic_ops.disable();
+	ioapic_set_virtual_wire_mode();
+
+	if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
+		lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);

It seems these two functions, ioapic/lapic_set_legacy_irq_mode, should be
mutually exclusive. But we call both because both the through-lapic and
through-ioapic virtual wire modes need to set APIC_SPIV_APIC_ENABLED,
which is only done in lapic_set_legacy_irq_mode(). So we need to call
them both.

IMO, this cleanup may not make that clear. We can separate these two
modes completely or just keep it like before.

Thanks,
	dou.

 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 1151ccd72ce9..c30f0f273dbd 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
 struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
 	.read = native_io_apic_read,
-	.disable= native_disable_io_apic,
+	.disable= switch_to_legacy_irq_mode,
 };
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 49721b4e1975..751472ddf536 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
 	 * now.
 	 */
 	if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
-		disconnect_bsp_APIC(0);
+		lapic_set_legacy_irq_mode(0);
 }
 
 static void __init irq_remapping_modify_x86_ops(void)
Re: [RESEND] phy: sun4i-usb: add support for R40 USB PHY
On January 19, 2018 at 2:25:09 PM GMT+08:00, Chen-Yu Tsai wrote:
>Hi Kishon,
>
>On Mon, Jan 15, 2018 at 11:06 PM, Hermann Lauer wrote:
>> On Wed, Jan 03, 2018 at 04:49:44PM +0800, Icenowy Zheng wrote:
>>> Allwinner R40 features a USB PHY like the one in A64, but with 3 PHYs.
>>>
>>> Add support for it.
>>>
>>> Signed-off-by: Icenowy Zheng
>>> Acked-by: Maxime Ripard
>>> Acked-by: Rob Herring
>>
>> You may add
>>
>> Tested-by: hermann.la...@iwr.uni-heidelberg.de
>
>Gentle ping for this patch to be included in 4.16

I think maybe I forgot PATCH in title so it didn't enter patchwork?

>
>ChenYu
>
>___
>linux-arm-kernel mailing list
>linux-arm-ker...@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Re: [PATCH 6/6] s390: scrub registers on kernel entry and KVM exit
On 2018/1/17 17:48, Martin Schwidefsky wrote:

Clear all user space registers on entry to the kernel and all KVM guest
registers on KVM guest exit if the register does not contain either a
parameter or a result value.

I am not sure I understand this; will it be safer?
Also, can we abstract these operations into a macro like CLEAR_REG_7?
Thanks

Suggested-by: Christian Borntraeger
Reviewed-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky
---
 arch/s390/kernel/entry.S | 41 +
 1 file changed, 41 insertions(+)

diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 2a22c03..47227d3 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -322,6 +322,12 @@ ENTRY(sie64a)
 sie_exit:
 	lg	%r14,__SF_EMPTY+8(%r15)		# load guest register save area
 	stmg	%r0,%r13,0(%r14)		# save guest gprs 0-13
+	xgr	%r0,%r0				# clear guest registers
+	xgr	%r1,%r1
+	xgr	%r2,%r2
+	xgr	%r3,%r3
+	xgr	%r4,%r4
+	xgr	%r5,%r5
 	lmg	%r6,%r14,__SF_GPRS(%r15)	# restore kernel registers
 	lg	%r2,__SF_EMPTY+16(%r15)		# return exit reason code
 	br	%r14
@@ -358,6 +364,7 @@ ENTRY(system_call)
 	UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
 	BPENTER __TI_flags(%r12),_TIF_NOBP
 	stmg	%r0,%r7,__PT_R0(%r11)
+	xgr	%r0,%r0
 	mvc	__PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
 	mvc	__PT_PSW(16,%r11),__LC_SVC_OLD_PSW
 	mvc	__PT_INT_CODE(4,%r11),__LC_SVC_ILC
@@ -640,6 +647,14 @@ ENTRY(pgm_check_handler)
 4:	lgr	%r13,%r11
 	la	%r11,STACK_FRAME_OVERHEAD(%r15)
 	stmg	%r0,%r7,__PT_R0(%r11)
+	xgr	%r0,%r0				# clear user space registers
+	xgr	%r1,%r1
+	xgr	%r2,%r2
+	xgr	%r3,%r3
+	xgr	%r4,%r4
+	xgr	%r5,%r5
+	xgr	%r6,%r6
+	xgr	%r7,%r7
 	mvc	__PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
 	stmg	%r8,%r9,__PT_PSW(%r11)
 	mvc	__PT_INT_CODE(4,%r11),__LC_PGM_ILC
@@ -706,6 +721,15 @@ ENTRY(io_int_handler)
 	lmg	%r8,%r9,__LC_IO_OLD_PSW
 	SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
 	stmg	%r0,%r7,__PT_R0(%r11)
+	xgr	%r0,%r0				# clear user space registers
+	xgr	%r1,%r1
+	xgr	%r2,%r2
+	xgr	%r3,%r3
+	xgr	%r4,%r4
+	xgr	%r5,%r5
+	xgr	%r6,%r6
+	xgr	%r7,%r7
+	xgr	%r10,%r10
 	mvc	__PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
 	stmg	%r8,%r9,__PT_PSW(%r11)
 	mvc	__PT_INT_CODE(12,%r11),__LC_SUBCHANNEL_ID
@@ -924,6 +948,15 @@ ENTRY(ext_int_handler)
 	lmg	%r8,%r9,__LC_EXT_OLD_PSW
 	SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
 	stmg	%r0,%r7,__PT_R0(%r11)
+	xgr	%r0,%r0				# clear user space registers
+	xgr	%r1,%r1
+	xgr	%r2,%r2
+	xgr	%r3,%r3
+	xgr	%r4,%r4
+	xgr	%r5,%r5
+	xgr	%r6,%r6
+	xgr	%r7,%r7
+	xgr	%r10,%r10
 	mvc	__PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
 	stmg	%r8,%r9,__PT_PSW(%r11)
 	lghi	%r1,__LC_EXT_PARAMS2
@@ -1133,6 +1166,14 @@ ENTRY(mcck_int_handler)
 .Lmcck_skip:
 	lghi	%r14,__LC_GPREGS_SAVE_AREA+64
 	stmg	%r0,%r7,__PT_R0(%r11)
+	xgr	%r0,%r0				# clear user space registers
+	xgr	%r2,%r2
+	xgr	%r3,%r3
+	xgr	%r4,%r4
+	xgr	%r5,%r5
+	xgr	%r6,%r6
+	xgr	%r7,%r7
+	xgr	%r10,%r10
 	mvc	__PT_R8(64,%r11),0(%r14)
 	stmg	%r8,%r9,__PT_PSW(%r11)
 	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)

--
Regards
QingFeng Hao
Re: [RESEND] phy: sun4i-usb: add support for R40 USB PHY
Hi Kishon,

On Mon, Jan 15, 2018 at 11:06 PM, Hermann Lauer wrote:
> On Wed, Jan 03, 2018 at 04:49:44PM +0800, Icenowy Zheng wrote:
>> Allwinner R40 features a USB PHY like the one in A64, but with 3 PHYs.
>>
>> Add support for it.
>>
>> Signed-off-by: Icenowy Zheng
>> Acked-by: Maxime Ripard
>> Acked-by: Rob Herring
>
> You may add
>
> Tested-by: hermann.la...@iwr.uni-heidelberg.de

Gentle ping for this patch to be included in 4.16

ChenYu
Re: [RESEND PATCH 2/3] x86/apic/kexec: Enable legacy irq mode before jump to kexec/kdump kernel
Hi Baoquan,

At 01/17/2018 06:08 PM, Baoquan He wrote:
> On 01/17/18 at 05:47pm, Dou Liyang wrote:
>> Hi Baoquan,
>>
>> At 01/05/2018 12:38 PM, Baoquan He wrote:
>>> In commit 522e66464467 ("x86/apic: Disable I/O APIC before shutdown
>>> of the local APIC"), the lapic_shutdown() invocation is moved after
>>> disable_IO_APIC(). In fact, disable_IO_APIC() not only calls
>>> clear_IO_APIC() to disable the IO-APIC, it also sets the LAPIC and
>>> IO-APIC to put the system into PIC or Virtual Wire mode. With the
>>> above commit, calling disable_IO_APIC() earlier means the local APIC
>>> is completely disabled afterwards. So the legacy irq mode is disabled
>>> too before jumping to the kexec/kdump kernel.
>>
>> I have a question:
>>
>> As you said, because disable_IO_APIC() is triggered before
>> lapic_shutdown(), the interrupt virtual wire mode will be disabled.
>>
>> But I found that after machine_crash_shutdown() is executed, Linux
>> will call machine_kexec(), and in machine_kexec(), disable_IO_APIC()
>> will be called again. Why can't it switch to virtual wire mode
>> successfully? Or is my understanding wrong?
>
> The disable_IO_APIC() call has a condition check:
>
> 	if (image->preserve_context) {
> 		disable_IO_APIC();
> 	}
>
> The preserve_context case comes from kernel_kexec(). You can check it
> in the kexec man page; that is another scenario we use kexec for, but
> not kexec and kdump.

Understood! This patch looks good to me and I also tested it, it's OK.

Thanks,
	dou.

>> +----------------+
>> | __crash_kexec  |
>> +----------------+
>>     |
>>     |    +------------------------+
>>     +--> | machine_crash_shutdown |
>>     |    +------------------------+
>>     |        |
>>     |        |    +-----------------+
>>     |        +--> | disable_IO_APIC |
>>     |        |    +-----------------+
>>     |        |
>>     |        |    +----------------+
>>     |        +--> | lapic_shutdown |
>>     |             +----------------+
>>     |
>>     |    +---------------+
>>     +--> | machine_kexec |
>>     |    +---------------+
>>     |        |
>>     |        |    +-----------------+
>>     |        +--> | disable_IO_APIC |
>>     |             +-----------------+
>>     v
>>
>> Thanks,
>> 	dou.
>>
>>> In a normal kernel the system defaults to PIC mode or Virtual Wire
>>> mode during system initialization, before APIC mode is enabled, and
>>> this is done by BIOS initialization. But the kexec/kdump kernel won't
>>> go through the BIOS, so we should set the system to PIC or Virtual
>>> Wire mode before jumping to the kdump kernel code directly.
>>>
>>> So let's take clear_IO_APIC out of disable_IO_APIC and rename
>>> disable_IO_APIC to switch_to_legacy_irq_mode. Then only call
>>> clear_IO_APIC when the IO-APIC needs to be disabled. And call
>>> switch_to_legacy_irq_mode before the kexec/kdump jump.
>>>
>>> Signed-off-by: Baoquan He
>>> ---
>>>  arch/x86/include/asm/io_apic.h     |  3 ++-
>>>  arch/x86/kernel/apic/io_apic.c     | 12
>>>  arch/x86/kernel/crash.c            |  2 +-
>>>  arch/x86/kernel/machine_kexec_32.c | 15 +--
>>>  arch/x86/kernel/machine_kexec_64.c | 15 +--
>>>  arch/x86/kernel/reboot.c           |  2 +-
>>>  6 files changed, 18 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
>>> index a8834dd546cd..e38ad3863a2c 100644
>>> --- a/arch/x86/include/asm/io_apic.h
>>> +++ b/arch/x86/include/asm/io_apic.h
>>> @@ -192,7 +192,8 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
>>>  extern void setup_IO_APIC(void);
>>>  extern void enable_IO_APIC(void);
>>> -extern void disable_IO_APIC(void);
>>> +extern void clear_IO_APIC (void);
>>> +extern void switch_to_legacy_irq_mode(void);
>>>  extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
>>>  extern void print_IO_APICs(void);
>>>  #else /* !CONFIG_X86_IO_APIC */
>>> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
>>> index 8a7963421460..a47aa915d18c 100644
>>> --- a/arch/x86/kernel/apic/io_apic.c
>>> +++ b/arch/x86/kernel/apic/io_apic.c
>>> @@ -587,7 +587,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
>>>  		mpc_ioapic_id(apic), pin);
>>>  }
>>>
>>> -static void clear_IO_APIC (void)
>>> +void clear_IO_APIC (void)
>>>  {
>>>  	int apic, pin;
>>>
>>> @@ -1439,15 +1439,11 @@ void native_disable_io_apic(void)
>>>  }
>>>
>>>  /*
>>> - * Not an __init, needed by the reboot code
>>> + * Not an __init, needed by kexec/kdump code.
>>> + * For safety IO-APIC and Local APIC need be cleared before this.
>>>   */
>>> -void disable_IO_APIC(void)
>>> +void switch_to_legacy_irq_mode(void)
>>>  {
>>> -	/*
>>> -	 * Clear the IO-APIC before rebooting:
>>> -	 */
>>> -	clear_IO_APIC();
>>> -
>>>  	if (!nr_legacy_irqs())
>>>  		return;
>>>
>>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>>> index 10e74d4778a1..318ffeaaf55a 100644
>>> --- a/arch/x86/kernel/crash.c
>>> +++ b/arch/x86/kernel/crash.c
>>> @@ -199,7 +199,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>>>  #ifdef CONFIG_X86_IO_APIC
>>>  	/* Prevent crash_kexec() from deadlocking on ioapic_lock. */
>>>  	ioapic_zap_locks();
>>> -	disable_IO_APIC();
>>> +	clear_IO_APIC();
>>>  #endif
>>>  	lapic_shutdown();
>>>  #ifdef CONFIG_HPET_TIMER
>>> diff --git a/arch/x86/kernel/machine_kexec_32.c
Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
On 01/18/2018 12:44 AM, Michael S. Tsirkin wrote:
> On Wed, Jan 17, 2018 at 01:10:11PM +0800, Wei Wang wrote:
>> +static void virtballoon_changed(struct virtio_device *vdev)
>> +{
>> +	struct virtio_balloon *vb = vdev->priv;
>> +	unsigned long flags;
>> +	__u32 cmd_id;
>> +	s64 diff = towards_target(vb);
>> +
>> +	if (diff) {
>> +		spin_lock_irqsave(&vb->stop_update_lock, flags);
>> +		if (!vb->stop_update)
>
> Why do you ignore stop_update for freeze?
> This means new wq entries can be added during remove
> causing use after free issues.

I think stop_update isn't needed, because the locking has already been
handled internally by the APIs. A similar example is mem_cgroup_css_free()
in "mm/memcontrol.c": no such lock is used there around
cancel_work_sync(&memcg->high_work).

Best,
Wei
Re: [PATCH v5 20/44] dt-bindings: clock: Add bindings for TI DA8XX USB PHY clocks
On Friday 19 January 2018 12:30 AM, David Lechner wrote:
> On 01/18/2018 06:10 AM, Sekhar Nori wrote:
>> On Monday 08 January 2018 07:47 AM, David Lechner wrote:
>>> This adds a new binding for TI DA8XX USB PHY clocks. These clocks are
>>> part of a syscon register called CFGCHIP3.
>>
>> CFGCHIP2
>>
>>>
>>> Signed-off-by: David Lechner
>>
>>> +Examples:
>>> +
>>> +	cfgchip: syscon@1417c {
>>> +		compatible = "ti,da830-cfgchip", "syscon", "simple-mfd";
>>> +		reg = <0x1417c 0x14>;
>>> +
>>> +		usb0_phy_clk: usb0-phy-clock {
>>> +			compatible = "ti,da830-usb0-phy-clock";
>>> +			#clock-cells = <0>;
>>> +			clocks = <_refclkin>, <_aux_clk>, < 1>;
>>> +			clock-names = "usb_refclkin", "auxclk", "usb0_lpsc";
>>> +			clock-output-names = "usb0_phy_clk";
>>
>> Probably call this "usb0_phy" to match with the input name used for
>> usb1_phy_clk?
>
> I was planning on just dropping clock-output-names altogether actually
> since they don't really do anything useful.
>
> Also, I was considering sending a series to change the con_id for the
> PHY clocks.
>
> My current revision of the device tree bindings is looking like this:
>
> 	usb_phy: usb-phy {
> 		compatible = "ti,da830-usb-phy";
> 		#phy-cells = <1>;
> 		clocks = <_phy_clk 0>, <_phy_clk 1>;
> 		clock-names = "usb20_phy", "usb11_phy";
> 		status = "disabled";
> 	};
> 	usb_phy_clk: usb-phy-clocks {
> 		compatible = "ti,da830-usb-phy-clocks";
> 		#clock-cells = <1>;
> 		clocks = < 1>, <_refclkin>, <_auxclk>;
> 		clock-names = "fck", "usb_refclkin", "auxclk";
> 	};
>
> The clock-names = "usb20_phy", "usb11_phy" comes from the existing con_ids
> in the PHY driver's clk_get()s.
>
> However, in device tree, we are usually referring to the USB devices as
> usb0 and usb1 instead of usb20 and usb11, respectively. Figure 6-2 "USB
> Clocking Diagram" in spruh82c.pdf (AM1808 TRM) calls these clocks "CLK48"
> and "CLK48MHz from USB 2.0 PHY", so I was thinking of changing the con_ids
> (and therefore also clock-names) to "usb0_clk48" and "usb1_clk48".

This is fine with me.

Thanks,
Sekhar
Re: [PATCH v5 43/44] ARM: da8xx-dt: switch to device tree clocks
On Friday 19 January 2018 12:10 AM, David Lechner wrote:
> On 01/18/2018 09:27 AM, Sekhar Nori wrote:
>> On Monday 08 January 2018 07:55 AM, David Lechner wrote:
>>> This removes all of the clock init code from da8xx-dt.c. This includes
>>> all of the OF_DEV_AUXDATA that was just used for looking up clocks.
>>>
>>> Note: You need to have clocks defined in your device tree or your system
>>> won't boot after this patch.
>>
>> I am not sure we can do this then, as we cannot break DT compatibility.
>>
>
> In the past, you have told me that you don't want the .dts changes and code
> changes in the same patch. In this case, if you apply either one

That's still true.

> separately,
> it will break clocks. It does not matter which one is first.
>
> So either we have to squash [PATCH v5 44/44] ARM: dts: da850: Add clocks
> into this patch or deal with the breakage.

I am not so much concerned about temporary breakage in the middle of the
series, but more about DT compatibility after the entire series is applied.

Thanks,
Sekhar
[PATCH V2 net-next 1/4] net: hns3: add support for get_regs
From: Fuyun Liang

This patch adds get_regs support for ethtool cmd.

Signed-off-by: Fuyun Liang
Signed-off-by: Peng Li
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |   3 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  23 +++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   4 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 176 +
 4 files changed, 205 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 634e932..d104ce5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -356,7 +356,8 @@ struct hnae3_ae_ops {
 			    u32 stringset, u8 *data);
 	int (*get_sset_count)(struct hnae3_handle *handle, int stringset);
 
-	void (*get_regs)(struct hnae3_handle *handle, void *data);
+	void (*get_regs)(struct hnae3_handle *handle, u32 *version,
+			 void *data);
 	int (*get_regs_len)(struct hnae3_handle *handle);
 
 	u32 (*get_rss_key_size)(struct hnae3_handle *handle);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 358f780..1c8b293 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1063,6 +1063,27 @@ static int hns3_set_coalesce(struct net_device *netdev,
 	return 0;
 }
 
+static int hns3_get_regs_len(struct net_device *netdev)
+{
+	struct hnae3_handle *h = hns3_get_handle(netdev);
+
+	if (!h->ae_algo->ops->get_regs_len)
+		return -EOPNOTSUPP;
+
+	return h->ae_algo->ops->get_regs_len(h);
+}
+
+static void hns3_get_regs(struct net_device *netdev,
+			  struct ethtool_regs *cmd, void *data)
+{
+	struct hnae3_handle *h = hns3_get_handle(netdev);
+
+	if (!h->ae_algo->ops->get_regs)
+		return;
+
+	h->ae_algo->ops->get_regs(h, &cmd->version, data);
+}
+
 static const struct ethtool_ops hns3vf_ethtool_ops = {
 	.get_drvinfo = hns3_get_drvinfo,
 	.get_ringparam = hns3_get_ringparam,
@@ -1103,6 +1124,8 @@ static const struct ethtool_ops hns3_ethtool_ops = {
 	.set_channels = hns3_set_channels,
 	.get_coalesce = hns3_get_coalesce,
 	.set_coalesce = hns3_set_coalesce,
+	.get_regs_len = hns3_get_regs_len,
+	.get_regs = hns3_get_regs,
 };
 
 void hns3_ethtool_set_ops(struct net_device *netdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 3c3159b..2561e7a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -102,6 +102,10 @@ enum hclge_opcode_type {
 	HCLGE_OPC_STATS_64_BIT = 0x0030,
 	HCLGE_OPC_STATS_32_BIT = 0x0031,
 	HCLGE_OPC_STATS_MAC = 0x0032,
+
+	HCLGE_OPC_QUERY_REG_NUM = 0x0040,
+	HCLGE_OPC_QUERY_32_BIT_REG = 0x0041,
+	HCLGE_OPC_QUERY_64_BIT_REG = 0x0042,
 	/* Device management command */
 
 	/* MAC commond */
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 27f0ab6..c3d2cca 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5544,6 +5544,180 @@ static int hclge_set_channels(struct hnae3_handle *handle, u32 new_tqps_num)
 	return ret;
 }
 
+static int hclge_get_regs_num(struct hclge_dev *hdev, u32 *regs_num_32_bit,
+			      u32 *regs_num_64_bit)
+{
+	struct hclge_desc desc;
+	u32 total_num;
+	int ret;
+
+	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_QUERY_REG_NUM, true);
+	ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+	if (ret) {
+		dev_err(&hdev->pdev->dev,
+			"Query register number cmd failed, ret = %d.\n", ret);
+		return ret;
+	}
+
+	*regs_num_32_bit = le32_to_cpu(desc.data[0]);
+	*regs_num_64_bit = le32_to_cpu(desc.data[1]);
+
+	total_num = *regs_num_32_bit + *regs_num_64_bit;
+	if (!total_num)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int hclge_get_32_bit_regs(struct hclge_dev *hdev, u32 regs_num,
+				 void *data)
+{
+#define HCLGE_32_BIT_REG_RTN_DATANUM 8
+
+	struct hclge_desc *desc;
+	u32 *reg_val = data;
+	__le32 *desc_data;
+	int cmd_num;
+	int i, k, n;
+	int ret;
+
+	if (regs_num == 0)
+		return 0;
+
+	cmd_num = DIV_ROUND_UP(regs_num + 2, HCLGE_32_BIT_REG_RTN_DATANUM);
+	desc = kcalloc(cmd_num, sizeof(struct hclge_desc), GFP_KERNEL);
+	if (!desc)
+		return -ENOMEM;
+
[PATCH V2 net-next 2/4] net: hns3: add manager table initialization for hardware
From: Fuyun Liang

The manager table is empty by default. If it is not initialized, the
management pkgs like LLDP will be dropped by hardware. Default entries
need to be added to manager table.

Signed-off-by: Fuyun Liang
Signed-off-by: Peng Li
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  22 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 101 +
 2 files changed, 123 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 2561e7a..1cd28e0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -605,6 +605,28 @@ struct hclge_mac_vlan_mask_entry_cmd {
 	u8 rsv2[14];
 };
 
+#define HCLGE_MAC_MGR_MASK_VLAN_B		BIT(0)
+#define HCLGE_MAC_MGR_MASK_MAC_B		BIT(1)
+#define HCLGE_MAC_MGR_MASK_ETHERTYPE_B		BIT(2)
+#define HCLGE_MAC_ETHERTYPE_LLDP		0x88cc
+
+struct hclge_mac_mgr_tbl_entry_cmd {
+	u8 flags;
+	u8 resp_code;
+	__le16 vlan_tag;
+	__le32 mac_addr_hi32;
+	__le16 mac_addr_lo16;
+	__le16 rsv1;
+	__le16 ethter_type;
+	__le16 egress_port;
+	__le16 egress_queue;
+	u8 sw_port_id_aware;
+	u8 rsv2;
+	u8 i_port_bitmap;
+	u8 i_port_direction;
+	u8 rsv3[2];
+};
+
 #define HCLGE_CFG_MTA_MAC_SEL_S		0x0
 #define HCLGE_CFG_MTA_MAC_SEL_M		GENMASK(1, 0)
 #define HCLGE_CFG_MTA_MAC_EN_B		0x7
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index c3d2cca..6e64bed 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -392,6 +392,16 @@ static const struct hclge_comm_stats_str g_mac_stats_string[] = {
 		HCLGE_MAC_STATS_FIELD_OFF(mac_rx_send_app_bad_pkt_num)}
 };
 
+static const struct hclge_mac_mgr_tbl_entry_cmd hclge_mgr_table[] = {
+	{
+		.flags = HCLGE_MAC_MGR_MASK_VLAN_B,
+		.ethter_type = cpu_to_le16(HCLGE_MAC_ETHERTYPE_LLDP),
+		.mac_addr_hi32 = cpu_to_le32(htonl(0x0180C200)),
+		.mac_addr_lo16 = cpu_to_le16(htons(0x000E)),
+		.i_port_bitmap = 0x1,
+	},
+};
+
 static int hclge_64_bit_update_stats(struct hclge_dev *hdev)
 {
 #define HCLGE_64_BIT_CMD_NUM 5
@@ -4249,6 +4259,91 @@ int hclge_rm_mc_addr_common(struct hclge_vport *vport,
 	return status;
 }
 
+static int hclge_get_mac_ethertype_cmd_status(struct hclge_dev *hdev,
+					      u16 cmdq_resp, u8 resp_code)
+{
+#define HCLGE_ETHERTYPE_SUCCESS_ADD		0
+#define HCLGE_ETHERTYPE_ALREADY_ADD		1
+#define HCLGE_ETHERTYPE_MGR_TBL_OVERFLOW	2
+#define HCLGE_ETHERTYPE_KEY_CONFLICT		3
+
+	int return_status;
+
+	if (cmdq_resp) {
+		dev_err(&hdev->pdev->dev,
+			"cmdq execute failed for get_mac_ethertype_cmd_status, status=%d.\n",
+			cmdq_resp);
+		return -EIO;
+	}
+
+	switch (resp_code) {
+	case HCLGE_ETHERTYPE_SUCCESS_ADD:
+	case HCLGE_ETHERTYPE_ALREADY_ADD:
+		return_status = 0;
+		break;
+	case HCLGE_ETHERTYPE_MGR_TBL_OVERFLOW:
+		dev_err(&hdev->pdev->dev,
+			"add mac ethertype failed for manager table overflow.\n");
+		return_status = -EIO;
+		break;
+	case HCLGE_ETHERTYPE_KEY_CONFLICT:
+		dev_err(&hdev->pdev->dev,
+			"add mac ethertype failed for key conflict.\n");
+		return_status = -EIO;
+		break;
+	default:
+		dev_err(&hdev->pdev->dev,
+			"add mac ethertype failed for undefined, code=%d.\n",
+			resp_code);
+		return_status = -EIO;
+	}
+
+	return return_status;
+}
+
+static int hclge_add_mgr_tbl(struct hclge_dev *hdev,
+			     const struct hclge_mac_mgr_tbl_entry_cmd *req)
+{
+	struct hclge_desc desc;
+	u8 resp_code;
+	u16 retval;
+	int ret;
+
+	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MAC_ETHTYPE_ADD, false);
+	memcpy(desc.data, req, sizeof(struct hclge_mac_mgr_tbl_entry_cmd));
+
+	ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+	if (ret) {
+		dev_err(&hdev->pdev->dev,
+			"add mac ethertype failed for cmd_send, ret =%d.\n",
+			ret);
+		return ret;
+	}
+
+	resp_code = (le32_to_cpu(desc.data[0]) >> 8) & 0xff;
+	retval = le16_to_cpu(desc.retval);
+
+	return hclge_get_mac_ethertype_cmd_status(hdev, retval, resp_code);
[PATCH V2 net-next 3/4] net: hns3: add ethtool -p support for fiber port
From: Jian Shen

Add led location support for fiber port. The led will keep blinking
when locating.

Signed-off-by: Jian Shen
Signed-off-by: Peng Li
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  2 +
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 12 
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h | 20 +++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 70 ++
 4 files changed, 104 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index d104ce5..fd06bc7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -405,6 +405,8 @@ struct hnae3_ae_ops {
 	int (*set_channels)(struct hnae3_handle *handle, u32 new_tqps_num);
 	void (*get_flowctrl_adv)(struct hnae3_handle *handle,
 				 u32 *flowctrl_adv);
+	int (*set_led_id)(struct hnae3_handle *handle,
+			  enum ethtool_phys_id_state status);
 };
 
 struct hnae3_dcb_ops {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 1c8b293..7410205 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1084,6 +1084,17 @@ static void hns3_get_regs(struct net_device *netdev,
 	h->ae_algo->ops->get_regs(h, &cmd->version, data);
 }
 
+static int hns3_set_phys_id(struct net_device *netdev,
+			    enum ethtool_phys_id_state state)
+{
+	struct hnae3_handle *h = hns3_get_handle(netdev);
+
+	if (!h->ae_algo || !h->ae_algo->ops || !h->ae_algo->ops->set_led_id)
+		return -EOPNOTSUPP;
+
+	return h->ae_algo->ops->set_led_id(h, state);
+}
+
 static const struct ethtool_ops hns3vf_ethtool_ops = {
 	.get_drvinfo = hns3_get_drvinfo,
 	.get_ringparam = hns3_get_ringparam,
@@ -1126,6 +1137,7 @@ static const struct ethtool_ops hns3_ethtool_ops = {
 	.set_coalesce = hns3_set_coalesce,
 	.get_regs_len = hns3_get_regs_len,
 	.get_regs = hns3_get_regs,
+	.set_phys_id = hns3_set_phys_id,
 };
 
 void hns3_ethtool_set_ops(struct net_device *netdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 1cd28e0..122f862 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -227,6 +227,9 @@ enum hclge_opcode_type {
 
 	/* Mailbox cmd */
 	HCLGEVF_OPC_MBX_PF_TO_VF	= 0x2000,
+
+	/* Led command */
+	HCLGE_OPC_LED_STATUS_CFG	= 0xB000,
 };
 
 #define HCLGE_TQP_REG_OFFSET 0x8
@@ -807,6 +810,23 @@ struct hclge_reset_cmd {
 #define HCLGE_NIC_CMQ_DESC_NUM 1024
 #define HCLGE_NIC_CMQ_DESC_NUM_S 3
 
+#define HCLGE_LED_PORT_SPEED_STATE_S	0
+#define HCLGE_LED_PORT_SPEED_STATE_M	GENMASK(5, 0)
+#define HCLGE_LED_ACTIVITY_STATE_S	0
+#define HCLGE_LED_ACTIVITY_STATE_M	GENMASK(1, 0)
+#define HCLGE_LED_LINK_STATE_S		0
+#define HCLGE_LED_LINK_STATE_M		GENMASK(1, 0)
+#define HCLGE_LED_LOCATE_STATE_S	0
+#define HCLGE_LED_LOCATE_STATE_M	GENMASK(1, 0)
+
+struct hclge_set_led_state_cmd {
+	u8 port_speed_led_config;
+	u8 link_led_config;
+	u8 activity_led_config;
+	u8 locate_led_config;
+	u8 rsv[20];
+};
+
 int hclge_cmd_init(struct hclge_dev *hdev);
 static inline void hclge_write_reg(void __iomem *base, u32 reg, u32 value)
 {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 6e64bed..12150f2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5819,6 +5819,75 @@ static void hclge_get_regs(struct hnae3_handle *handle, u32 *version,
 			"Get 64 bit register failed, ret = %d.\n", ret);
 }
 
+static int hclge_set_led_status_sfp(struct hclge_dev *hdev, u8 speed_led_status,
+				    u8 act_led_status, u8 link_led_status,
+				    u8 locate_led_status)
+{
+	struct hclge_set_led_state_cmd *req;
+	struct hclge_desc desc;
+	int ret;
+
+	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_LED_STATUS_CFG, false);
+
+	req = (struct hclge_set_led_state_cmd *)desc.data;
+	hnae_set_field(req->port_speed_led_config, HCLGE_LED_PORT_SPEED_STATE_M,
+		       HCLGE_LED_PORT_SPEED_STATE_S, speed_led_status);
+	hnae_set_field(req->link_led_config, HCLGE_LED_ACTIVITY_STATE_M,
+		       HCLGE_LED_ACTIVITY_STATE_S, act_led_status);
+	hnae_set_field(req->activity_led_config, HCLGE_LED_LINK_STATE_M,
+
[PATCH V2 net-next 2/4] net: hns3: add manager table initialization for hardware
From: Fuyun Liang

The manager table is empty by default. If it is not initialized,
management packets such as LLDP will be dropped by hardware. Default
entries need to be added to the manager table.

Signed-off-by: Fuyun Liang
Signed-off-by: Peng Li
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  22 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 101 +
 2 files changed, 123 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 2561e7a..1cd28e0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -605,6 +605,28 @@ struct hclge_mac_vlan_mask_entry_cmd {
 	u8 rsv2[14];
 };
 
+#define HCLGE_MAC_MGR_MASK_VLAN_B		BIT(0)
+#define HCLGE_MAC_MGR_MASK_MAC_B		BIT(1)
+#define HCLGE_MAC_MGR_MASK_ETHERTYPE_B		BIT(2)
+#define HCLGE_MAC_ETHERTYPE_LLDP		0x88cc
+
+struct hclge_mac_mgr_tbl_entry_cmd {
+	u8 flags;
+	u8 resp_code;
+	__le16 vlan_tag;
+	__le32 mac_addr_hi32;
+	__le16 mac_addr_lo16;
+	__le16 rsv1;
+	__le16 ethter_type;
+	__le16 egress_port;
+	__le16 egress_queue;
+	u8 sw_port_id_aware;
+	u8 rsv2;
+	u8 i_port_bitmap;
+	u8 i_port_direction;
+	u8 rsv3[2];
+};
+
 #define HCLGE_CFG_MTA_MAC_SEL_S		0x0
 #define HCLGE_CFG_MTA_MAC_SEL_M		GENMASK(1, 0)
 #define HCLGE_CFG_MTA_MAC_EN_B		0x7
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index c3d2cca..6e64bed 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -392,6 +392,16 @@ static const struct hclge_comm_stats_str g_mac_stats_string[] = {
 		HCLGE_MAC_STATS_FIELD_OFF(mac_rx_send_app_bad_pkt_num)}
 };
 
+static const struct hclge_mac_mgr_tbl_entry_cmd hclge_mgr_table[] = {
+	{
+		.flags = HCLGE_MAC_MGR_MASK_VLAN_B,
+		.ethter_type = cpu_to_le16(HCLGE_MAC_ETHERTYPE_LLDP),
+		.mac_addr_hi32 = cpu_to_le32(htonl(0x0180C200)),
+		.mac_addr_lo16 = cpu_to_le16(htons(0x000E)),
+		.i_port_bitmap = 0x1,
+	},
+};
+
 static int hclge_64_bit_update_stats(struct hclge_dev *hdev)
 {
 #define HCLGE_64_BIT_CMD_NUM 5
@@ -4249,6 +4259,91 @@ int hclge_rm_mc_addr_common(struct hclge_vport *vport,
 	return status;
 }
 
+static int hclge_get_mac_ethertype_cmd_status(struct hclge_dev *hdev,
+					      u16 cmdq_resp, u8 resp_code)
+{
+#define HCLGE_ETHERTYPE_SUCCESS_ADD		0
+#define HCLGE_ETHERTYPE_ALREADY_ADD		1
+#define HCLGE_ETHERTYPE_MGR_TBL_OVERFLOW	2
+#define HCLGE_ETHERTYPE_KEY_CONFLICT		3
+
+	int return_status;
+
+	if (cmdq_resp) {
+		dev_err(&hdev->pdev->dev,
+			"cmdq execute failed for get_mac_ethertype_cmd_status, status=%d.\n",
+			cmdq_resp);
+		return -EIO;
+	}
+
+	switch (resp_code) {
+	case HCLGE_ETHERTYPE_SUCCESS_ADD:
+	case HCLGE_ETHERTYPE_ALREADY_ADD:
+		return_status = 0;
+		break;
+	case HCLGE_ETHERTYPE_MGR_TBL_OVERFLOW:
+		dev_err(&hdev->pdev->dev,
+			"add mac ethertype failed for manager table overflow.\n");
+		return_status = -EIO;
+		break;
+	case HCLGE_ETHERTYPE_KEY_CONFLICT:
+		dev_err(&hdev->pdev->dev,
+			"add mac ethertype failed for key conflict.\n");
+		return_status = -EIO;
+		break;
+	default:
+		dev_err(&hdev->pdev->dev,
+			"add mac ethertype failed for undefined, code=%d.\n",
+			resp_code);
+		return_status = -EIO;
+	}
+
+	return return_status;
+}
+
+static int hclge_add_mgr_tbl(struct hclge_dev *hdev,
+			     const struct hclge_mac_mgr_tbl_entry_cmd *req)
+{
+	struct hclge_desc desc;
+	u8 resp_code;
+	u16 retval;
+	int ret;
+
+	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MAC_ETHTYPE_ADD, false);
+	memcpy(desc.data, req, sizeof(struct hclge_mac_mgr_tbl_entry_cmd));
+
+	ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+	if (ret) {
+		dev_err(&hdev->pdev->dev,
+			"add mac ethertype failed for cmd_send, ret =%d.\n",
+			ret);
+		return ret;
+	}
+
+	resp_code = (le32_to_cpu(desc.data[0]) >> 8) & 0xff;
+	retval = le16_to_cpu(desc.retval);
+
+	return hclge_get_mac_ethertype_cmd_status(hdev, retval, resp_code);
+}
+
+static int init_mgr_tbl(struct hclge_dev *hdev)
+{
+	int
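The default table entry in this patch stores the LLDP multicast address 01:80:C2:00:00:0E as a 32-bit high part and a 16-bit low part. The following is a minimal userspace sketch of that split, assuming the six address bytes are meant to sit in wire order in memory. The names `struct mgr_mac`, `pack_mgr_mac` and `unpack_mgr_mac` are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two MAC fields of the table entry. */
struct mgr_mac {
	uint32_t hi32;	/* first four address bytes, wire order in memory */
	uint16_t lo16;	/* last two address bytes, wire order in memory */
};

/* Copy a 6-byte MAC address into the split hi32/lo16 representation. */
void pack_mgr_mac(const uint8_t mac[6], struct mgr_mac *out)
{
	assert(mac && out);
	memcpy(&out->hi32, mac, 4);
	memcpy(&out->lo16, mac + 4, 2);
}

/* Recover the 6-byte MAC address from the split representation. */
void unpack_mgr_mac(const struct mgr_mac *in, uint8_t mac[6])
{
	assert(in && mac);
	memcpy(mac, &in->hi32, 4);
	memcpy(mac + 4, &in->lo16, 2);
}
```

Packing the bytes 01 80 C2 00 00 0E this way yields the same values as htonl(0x0180C200) and htons(0x000E). Whether the additional cpu_to_le32()/cpu_to_le16() wrapping in the patch keeps that byte order on a big-endian host is not something this sketch checks.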
[PATCH V2 net-next 3/4] net: hns3: add ethtool -p support for fiber port
From: Jian Shen

Add LED location support for fiber ports. The LED keeps blinking
while locating.

Signed-off-by: Jian Shen
Signed-off-by: Peng Li
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  2 +
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 12
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h | 20 +++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 70 ++
 4 files changed, 104 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index d104ce5..fd06bc7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -405,6 +405,8 @@ struct hnae3_ae_ops {
 	int (*set_channels)(struct hnae3_handle *handle, u32 new_tqps_num);
 	void (*get_flowctrl_adv)(struct hnae3_handle *handle,
 				 u32 *flowctrl_adv);
+	int (*set_led_id)(struct hnae3_handle *handle,
+			  enum ethtool_phys_id_state status);
 };
 
 struct hnae3_dcb_ops {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 1c8b293..7410205 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1084,6 +1084,17 @@ static void hns3_get_regs(struct net_device *netdev,
 	h->ae_algo->ops->get_regs(h, &cmd->version, data);
 }
 
+static int hns3_set_phys_id(struct net_device *netdev,
+			    enum ethtool_phys_id_state state)
+{
+	struct hnae3_handle *h = hns3_get_handle(netdev);
+
+	if (!h->ae_algo || !h->ae_algo->ops || !h->ae_algo->ops->set_led_id)
+		return -EOPNOTSUPP;
+
+	return h->ae_algo->ops->set_led_id(h, state);
+}
+
 static const struct ethtool_ops hns3vf_ethtool_ops = {
 	.get_drvinfo = hns3_get_drvinfo,
 	.get_ringparam = hns3_get_ringparam,
@@ -1126,6 +1137,7 @@ static const struct ethtool_ops hns3_ethtool_ops = {
 	.set_coalesce = hns3_set_coalesce,
 	.get_regs_len = hns3_get_regs_len,
 	.get_regs = hns3_get_regs,
+	.set_phys_id = hns3_set_phys_id,
 };
 
 void hns3_ethtool_set_ops(struct net_device *netdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 1cd28e0..122f862 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -227,6 +227,9 @@ enum hclge_opcode_type {
 
 	/* Mailbox cmd */
 	HCLGEVF_OPC_MBX_PF_TO_VF	= 0x2000,
+
+	/* Led command */
+	HCLGE_OPC_LED_STATUS_CFG	= 0xB000,
 };
 
 #define HCLGE_TQP_REG_OFFSET		0x8
@@ -807,6 +810,23 @@ struct hclge_reset_cmd {
 
 #define HCLGE_NIC_CMQ_DESC_NUM		1024
 #define HCLGE_NIC_CMQ_DESC_NUM_S	3
+#define HCLGE_LED_PORT_SPEED_STATE_S	0
+#define HCLGE_LED_PORT_SPEED_STATE_M	GENMASK(5, 0)
+#define HCLGE_LED_ACTIVITY_STATE_S	0
+#define HCLGE_LED_ACTIVITY_STATE_M	GENMASK(1, 0)
+#define HCLGE_LED_LINK_STATE_S		0
+#define HCLGE_LED_LINK_STATE_M		GENMASK(1, 0)
+#define HCLGE_LED_LOCATE_STATE_S	0
+#define HCLGE_LED_LOCATE_STATE_M	GENMASK(1, 0)
+
+struct hclge_set_led_state_cmd {
+	u8 port_speed_led_config;
+	u8 link_led_config;
+	u8 activity_led_config;
+	u8 locate_led_config;
+	u8 rsv[20];
+};
+
 int hclge_cmd_init(struct hclge_dev *hdev);
 static inline void hclge_write_reg(void __iomem *base, u32 reg, u32 value)
 {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 6e64bed..12150f2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5819,6 +5819,75 @@ static void hclge_get_regs(struct hnae3_handle *handle, u32 *version,
 			"Get 64 bit register failed, ret = %d.\n", ret);
 }
 
+static int hclge_set_led_status_sfp(struct hclge_dev *hdev, u8 speed_led_status,
+				    u8 act_led_status, u8 link_led_status,
+				    u8 locate_led_status)
+{
+	struct hclge_set_led_state_cmd *req;
+	struct hclge_desc desc;
+	int ret;
+
+	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_LED_STATUS_CFG, false);
+
+	req = (struct hclge_set_led_state_cmd *)desc.data;
+	hnae_set_field(req->port_speed_led_config, HCLGE_LED_PORT_SPEED_STATE_M,
+		       HCLGE_LED_PORT_SPEED_STATE_S, speed_led_status);
+	hnae_set_field(req->link_led_config, HCLGE_LED_ACTIVITY_STATE_M,
+		       HCLGE_LED_ACTIVITY_STATE_S, act_led_status);
+	hnae_set_field(req->activity_led_config, HCLGE_LED_LINK_STATE_M,
+		       HCLGE_LED_LINK_STATE_S, link_led_status);
+
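The LED command above fills each config byte through a mask/shift field update. A small userspace sketch of that kind of update follows. `set_field_u8` and `set_locate_state` are illustrative names rather than driver functions, and the two constants only mirror the GENMASK(1, 0) mask and shift of 0 that the patch defines for the locate field.

```c
#include <assert.h>
#include <stdint.h>

#define LED_LOCATE_STATE_S 0		/* shift, as HCLGE_LED_LOCATE_STATE_S */
#define LED_LOCATE_STATE_M 0x03u	/* value of GENMASK(1, 0) */

/* Clear the bits under mask, then store val shifted into that field. */
uint8_t set_field_u8(uint8_t origin, uint8_t mask, unsigned int shift,
		     uint8_t val)
{
	assert(shift < 8);
	origin &= (uint8_t)~mask;
	origin |= (uint8_t)((val << shift) & mask);
	return origin;
}

/* Update only the locate field of a config byte, leaving other bits alone. */
uint8_t set_locate_state(uint8_t config, uint8_t state)
{
	return set_field_u8(config, LED_LOCATE_STATE_M, LED_LOCATE_STATE_S,
			    state);
}
```

Because the value is masked after shifting, an out-of-range state cannot spill into neighbouring bits of the same byte.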
[PATCH V2 net-next 4/4] net: hns3: add net status led support for fiber port
From: Jian Shen

Check the net status every second, including port speed, total rx/tx
packets and link status, and update the LED status for fiber ports.

Signed-off-by: Jian Shen
Signed-off-by: Peng Li
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   1 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 109 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |   3 +
 3 files changed, 113 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 122f862..3fd10a6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -115,6 +115,7 @@ enum hclge_opcode_type {
 	HCLGE_OPC_QUERY_LINK_STATUS	= 0x0307,
 	HCLGE_OPC_CONFIG_MAX_FRM_SIZE	= 0x0308,
 	HCLGE_OPC_CONFIG_SPEED_DUP	= 0x0309,
+	HCLGE_OPC_STATS_MAC_TRAFFIC	= 0x0314,
 	/* MACSEC command */
 
 	/* PFC/Pause CMD*/
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 12150f2..32bc6f6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -39,6 +39,7 @@ static int hclge_set_mta_filter_mode(struct hclge_dev *hdev,
 static int hclge_set_mtu(struct hnae3_handle *handle, int new_mtu);
 static int hclge_init_vlan_config(struct hclge_dev *hdev);
 static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev);
+static int hclge_update_led_status(struct hclge_dev *hdev);
 
 static struct hnae3_ae_algo ae_algo;
 
@@ -505,6 +506,38 @@ static int hclge_32_bit_update_stats(struct hclge_dev *hdev)
 	return 0;
 }
 
+static int hclge_mac_get_traffic_stats(struct hclge_dev *hdev)
+{
+	struct hclge_mac_stats *mac_stats = &hdev->hw_stats.mac_stats;
+	struct hclge_desc desc;
+	__le64 *desc_data;
+	int ret;
+
+	/* for fiber port, need to query the total rx/tx packets statstics,
+	 * used for data transferring checking.
+	 */
+	if (hdev->hw.mac.media_type != HNAE3_MEDIA_TYPE_FIBER)
+		return 0;
+
+	if (test_bit(HCLGE_STATE_STATISTICS_UPDATING, &hdev->state))
+		return 0;
+
+	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_STATS_MAC_TRAFFIC, true);
+	ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+	if (ret) {
+		dev_err(&hdev->pdev->dev,
+			"Get MAC total pkt stats fail, ret = %d\n", ret);
+
+		return ret;
+	}
+
+	desc_data = (__le64 *)(&desc.data[0]);
+	mac_stats->mac_tx_total_pkt_num += le64_to_cpu(*desc_data++);
+	mac_stats->mac_rx_total_pkt_num += le64_to_cpu(*desc_data);
+
+	return 0;
+}
+
 static int hclge_mac_update_stats(struct hclge_dev *hdev)
 {
 #define HCLGE_MAC_CMD_NUM 21
@@ -2846,13 +2879,20 @@ static void hclge_service_task(struct work_struct *work)
 	struct hclge_dev *hdev =
 		container_of(work, struct hclge_dev, service_task);
 
+	/* The total rx/tx packets statstics are wanted to be updated
+	 * per second. Both hclge_update_stats_for_all() and
+	 * hclge_mac_get_traffic_stats() can do it.
+	 */
 	if (hdev->hw_stats.stats_timer >= HCLGE_STATS_TIMER_INTERVAL) {
 		hclge_update_stats_for_all(hdev);
 		hdev->hw_stats.stats_timer = 0;
+	} else {
+		hclge_mac_get_traffic_stats(hdev);
 	}
 
 	hclge_update_speed_duplex(hdev);
 	hclge_update_link_status(hdev);
+	hclge_update_led_status(hdev);
 	hclge_service_complete(hdev);
 }
 
@@ -5888,6 +5928,75 @@ static int hclge_set_led_id(struct hnae3_handle *handle,
 	return ret;
 }
 
+enum hclge_led_port_speed {
+	HCLGE_SPEED_LED_FOR_1G,
+	HCLGE_SPEED_LED_FOR_10G,
+	HCLGE_SPEED_LED_FOR_25G,
+	HCLGE_SPEED_LED_FOR_40G,
+	HCLGE_SPEED_LED_FOR_50G,
+	HCLGE_SPEED_LED_FOR_100G,
+};
+
+static u8 hclge_led_get_speed_status(u32 speed)
+{
+	u8 speed_led;
+
+	switch (speed) {
+	case HCLGE_MAC_SPEED_1G:
+		speed_led = HCLGE_SPEED_LED_FOR_1G;
+		break;
+	case HCLGE_MAC_SPEED_10G:
+		speed_led = HCLGE_SPEED_LED_FOR_10G;
+		break;
+	case HCLGE_MAC_SPEED_25G:
+		speed_led = HCLGE_SPEED_LED_FOR_25G;
+		break;
+	case HCLGE_MAC_SPEED_40G:
+		speed_led = HCLGE_SPEED_LED_FOR_40G;
+		break;
+	case HCLGE_MAC_SPEED_50G:
+		speed_led = HCLGE_SPEED_LED_FOR_50G;
+		break;
+	case HCLGE_MAC_SPEED_100G:
+		speed_led = HCLGE_SPEED_LED_FOR_100G;
+		break;
+	default:
+		speed_led = HCLGE_LED_NO_CHANGE;
+	}
+
+	return speed_led;
+}
+
+static int hclge_update_led_status(struct hclge_dev *hdev)
+{
+	u8 port_speed_status, link_status,
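The body of hclge_update_led_status() is cut off above, so how the driver turns the per-second packet totals into an activity indication is not shown. One plausible approach, sketched here purely as an assumption, is to compare the running totals against the values seen at the previous tick. `struct traffic_sample` and `traffic_active` are hypothetical names and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port state: counter totals seen at the previous tick. */
struct traffic_sample {
	uint64_t tx_total;
	uint64_t rx_total;
};

/*
 * Return true if either total changed since the previous tick, then
 * remember the new totals for the next comparison.
 */
bool traffic_active(struct traffic_sample *prev, uint64_t tx_total,
		    uint64_t rx_total)
{
	bool active;

	assert(prev);
	active = tx_total != prev->tx_total || rx_total != prev->rx_total;
	prev->tx_total = tx_total;
	prev->rx_total = rx_total;
	return active;
}
```

Called once per service-task run, this reports activity only for ticks in which the totals actually moved, which is why the totals need refreshing every second rather than only at the longer full-statistics interval.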
[PATCH V2 net-next 0/4] add some features to hns3 driver
This patchset adds some features to the hns3 driver, including support
for the ethtool commands -d and -p and support for the manager table.

[Patch 1/4] adds support for ethtool command -d; its op is get_regs.
The driver sends a command to the command queue and reads back the
number of registers and the register values from the command queue.
[Patch 2/4] adds manager table initialization for hardware.
[Patch 3/4] adds support for ethtool command -p. For fiber ports, the
driver sends a command to the command queue, and the IMP writes the
SGPIO registers to control the LEDs.
[Patch 4/4] adds net status LED support for fiber ports. The net status
includes port speed, total rx/tx packets and link status. The driver
sends the status to the command queue, and the IMP writes SGPIO to
control the LEDs.

---
Change log:
V1 -> V2:
1, fix comments from Andrew Lunn, remove the patch "net: hns3: add
ethtool -p support for phy device".
---

Fuyun Liang (2):
  net: hns3: add support for get_regs
  net: hns3: add manager table initialization for hardware

Jian Shen (2):
  net: hns3: add ethtool -p support for fiber port
  net: hns3: add net status led support for fiber port

 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |   5 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  35 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  47 +++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 456 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |   3 +
 5 files changed, 545 insertions(+), 1 deletion(-)

--
2.9.3
Re: [PATCH 3/4] drm/gem: adjust per file OOM badness on handling buffers
On 2018-01-19 00:47, Andrey Grodzovsky wrote:
> Large amounts of VRAM are usually not CPU accessible, so they are not
> mapped into the process's address space. But since the device drivers
> usually support swapping buffers from VRAM to system memory we can
> still run into an out of memory situation when userspace starts to
> allocate too much.
>
> This patch gives the OOM another hint which process is
> holding how many resources.
>
> Signed-off-by: Andrey Grodzovsky
> ---
>  drivers/gpu/drm/drm_file.c | 12
>  drivers/gpu/drm/drm_gem.c  |  8
>  include/drm/drm_file.h     |  4
>  3 files changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index b3c6e99..626cc76 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -747,3 +747,15 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>  	spin_unlock_irqrestore(&dev->event_lock, irqflags);
>  }
>  EXPORT_SYMBOL(drm_send_event);
> +
> +long drm_oom_badness(struct file *f)
> +{
> +
> +	struct drm_file *file_priv = f->private_data;
> +
> +	if (file_priv)
> +		return atomic_long_read(&file_priv->f_oom_badness);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_oom_badness);
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 01f8d94..ffbadc8 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -264,6 +264,9 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
>  	drm_gem_remove_prime_handles(obj, file_priv);
>  	drm_vma_node_revoke(&obj->vma_node, file_priv);
>
> +	atomic_long_sub(obj->size >> PAGE_SHIFT,
> +				&file_priv->f_oom_badness);
> +
>  	drm_gem_object_handle_put_unlocked(obj);
>
>  	return 0;
> @@ -299,6 +302,8 @@ drm_gem_handle_delete(struct drm_file *filp, u32 handle)
>  	idr_remove(&filp->object_idr, handle);
>  	spin_unlock(&filp->table_lock);
>
> +	atomic_long_sub(obj->size >> PAGE_SHIFT, &filp->f_oom_badness);
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL(drm_gem_handle_delete);
> @@ -417,6 +422,9 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
>  	}
>
>  	*handlep = handle;
> +
> +	atomic_long_add(obj->size >> PAGE_SHIFT,
> +				&file_priv->f_oom_badness);

For the VRAM case, the buffer should be counted only when the VRAM BO is
evicted to system memory.
For example, if total VRAM is 8GB and total system memory is 8GB, one
application may allocate 7GB of VRAM and 7GB of system memory, which is
allowed. But following your idea, this application would be killed by
the OOM killer, right?

Regards,
David Zhou

>  	return 0;
>
>  err_revoke:
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 0e0c868..ac3aa75 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -317,6 +317,8 @@ struct drm_file {
>  	/* private: */
>  	unsigned long lock_count; /* DRI1 legacy lock count */
> +
> +	atomic_long_t f_oom_badness;
>  };
>
>  /**
> @@ -378,4 +380,6 @@ void drm_event_cancel_free(struct drm_device *dev,
>  void drm_send_event_locked(struct drm_device *dev, struct drm_pending_event *e);
>  void drm_send_event(struct drm_device *dev, struct drm_pending_event *e);
>
> +long drm_oom_badness(struct file *f);
> +
>  #endif /* _DRM_FILE_H_ */
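David Zhou's 8GB/8GB example can be worked through with the page accounting the patch uses (obj->size >> PAGE_SHIFT). The sketch below assumes 4 KiB pages. `buffer_pages`, `badness_pages` and the `count_vram` switch are hypothetical names used only to contrast the two accounting policies under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define GiB (1ULL << 30)

/* Pages charged for one buffer, mirroring obj->size >> PAGE_SHIFT. */
uint64_t buffer_pages(uint64_t size_bytes)
{
	return size_bytes >> SKETCH_PAGE_SHIFT;
}

/*
 * Badness in pages for a process holding vram_bytes in VRAM and sys_bytes
 * in system memory. count_vram selects whether VRAM-resident buffers are
 * charged (as the patch does) or skipped until eviction (as David suggests).
 */
uint64_t badness_pages(uint64_t vram_bytes, uint64_t sys_bytes,
		       bool count_vram)
{
	uint64_t pages = buffer_pages(sys_bytes);

	if (count_vram) {
		uint64_t vram_pages = buffer_pages(vram_bytes);

		assert(pages <= UINT64_MAX - vram_pages);
		pages += vram_pages;
	}
	return pages;
}
```

With 7 GiB in each pool, charging VRAM gives 3670016 pages, which exceeds the 2097152 pages of an 8 GiB system, while charging only system memory gives 1835008 pages. That gap is the over-accounting the reply is pointing at.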
Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is ongoing
On Fri, Jan 19, 2018 at 01:55:29PM +0800, jianchao.wang wrote:
> On 01/19/2018 12:59 PM, Keith Busch wrote:
> > On Thu, Jan 18, 2018 at 06:10:02PM +0800, Jianchao Wang wrote:
> >> +	 * - When the ctrl.state is NVME_CTRL_RESETTING, the expired
> >> +	 *   request should come from the previous work and we handle
> >> +	 *   it as nvme_cancel_request.
> >> +	 * - When the ctrl.state is NVME_CTRL_RECONNECTING, the expired
> >> +	 *   request should come from the initializing procedure such as
> >> +	 *   setup io queues, because all the previous outstanding
> >> +	 *   requests should have been cancelled.
> >>  	 */
> >> -	if (dev->ctrl.state == NVME_CTRL_RESETTING) {
> >> -		dev_warn(dev->ctrl.device,
> >> -			 "I/O %d QID %d timeout, disable controller\n",
> >> -			 req->tag, nvmeq->qid);
> >> -		nvme_dev_disable(dev, false);
> >> +	switch (dev->ctrl.state) {
> >> +	case NVME_CTRL_RESETTING:
> >> +		nvme_req(req)->status = NVME_SC_ABORT_REQ;
> >> +		return BLK_EH_HANDLED;
> >> +	case NVME_CTRL_RECONNECTING:
> >> +		WARN_ON_ONCE(nvmeq->qid);
> >>  		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> >>  		return BLK_EH_HANDLED;
> >> +	default:
> >> +		break;
> >>  	}
> >
> > The driver may be giving up on the command here, but that doesn't mean
> > the controller has. We can't just end the request like this because that
> > will release the memory the controller still owns. We must wait until
> > after nvme_dev_disable clears bus master because we can't say for sure
> > the controller isn't going to write to that address right after we end
> > the request.
> >
> Yes, but the controller is going to be reset or shut down at the moment,
> even if the controller accesses a bad address and goes wrong, everything will
> be ok after reset or shutdown. :)

Hm, I don't follow. DMA access after free is never okay.
RE: [RFC] Per file OOM badness
-----Original Message-----
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Michal Hocko
Sent: Friday, January 19, 2018 1:14 AM
To: Grodzovsky, Andrey
Cc: linux...@kvack.org; amd-...@lists.freedesktop.org; linux-kernel@vger.kernel.org; dri-de...@lists.freedesktop.org; Koenig, Christian
Subject: Re: [RFC] Per file OOM badness

On Thu 18-01-18 18:00:06, Michal Hocko wrote:
> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> > Hi, this series is a revised version of an RFC sent by Christian
> > König a few years ago. The original RFC can be found at
> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> >
> > This is the same idea and I've just adressed his concern from the
> > original RFC and switched to a callback into file_ops instead of a new
> > member in struct file.
>
> Please add the full description to the cover letter and do not make
> people hunt links.
>
> Here is the origin cover letter text
> : I'm currently working on the issue that when device drivers allocate memory on
> : behalf of an application the OOM killer usually doesn't knew about that unless
> : the application also get this memory mapped into their address space.
> :
> : This is especially annoying for graphics drivers where a lot of the VRAM
> : usually isn't CPU accessible and so doesn't make sense to map into the
> : address space of the process using it.
> :
> : The problem now is that when an application starts to use a lot of VRAM those
> : buffers objects sooner or later get swapped out to system memory, but when we
> : now run into an out of memory situation the OOM killer obviously doesn't knew
> : anything about that memory and so usually kills the wrong process.

OK, but how do you attribute that memory to a particular OOM killable
entity? And how do you actually enforce that those resources get freed
on the oom killer action?

Here I think we need more fine granularity for distinguishing the buffer is taking VRAM or system memory.

> : The following set of patches tries to address this problem by introducing a per
> : file OOM badness score, which device drivers can use to give the OOM killer a
> : hint how many resources are bound to a file descriptor so that it can make
> : better decisions which process to kill.

But files are not killable, they can be shared... In other words this
doesn't help the oom killer to make an educated guess at all.

> :
> : So question at every one: What do you think about this approach?

I thing is just just wrong semantically. Non-reclaimable memory is a
pain, especially when there is way too much of it. If you can free that
memory somehow then you can hook into slab shrinker API and react on the
memory pressure. If you can account such amemory to a particular process
and make sure that the consumption is bound by the process life time
then we can think of an accounting that oom_badness can consider when
selecting a victim.

I think you are misunderstanding here.
Actually for now, the memory in TTM Pools already has mm_shrink which is implemented in ttm_pool_mm_shrink_init.
And here the memory we want to make it contribute to OOM badness is not in TTM Pools.
Because when TTM buffer allocation success, the memory already is removed from TTM Pools.

Thanks
Roger(Hongbo.He)

--
Michal Hocko
SUSE Labs
___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is ongoing
Hi Keith

Thanks for your kind response and guidance.

On 01/19/2018 12:59 PM, Keith Busch wrote:
> On Thu, Jan 18, 2018 at 06:10:02PM +0800, Jianchao Wang wrote:
>> +	 * - When the ctrl.state is NVME_CTRL_RESETTING, the expired
>> +	 *   request should come from the previous work and we handle
>> +	 *   it as nvme_cancel_request.
>> +	 * - When the ctrl.state is NVME_CTRL_RECONNECTING, the expired
>> +	 *   request should come from the initializing procedure such as
>> +	 *   setup io queues, because all the previous outstanding
>> +	 *   requests should have been cancelled.
>>  	 */
>> -	if (dev->ctrl.state == NVME_CTRL_RESETTING) {
>> -		dev_warn(dev->ctrl.device,
>> -			 "I/O %d QID %d timeout, disable controller\n",
>> -			 req->tag, nvmeq->qid);
>> -		nvme_dev_disable(dev, false);
>> +	switch (dev->ctrl.state) {
>> +	case NVME_CTRL_RESETTING:
>> +		nvme_req(req)->status = NVME_SC_ABORT_REQ;
>> +		return BLK_EH_HANDLED;
>> +	case NVME_CTRL_RECONNECTING:
>> +		WARN_ON_ONCE(nvmeq->qid);
>>  		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
>>  		return BLK_EH_HANDLED;
>> +	default:
>> +		break;
>>  	}
>
> The driver may be giving up on the command here, but that doesn't mean
> the controller has. We can't just end the request like this because that
> will release the memory the controller still owns. We must wait until
> after nvme_dev_disable clears bus master because we can't say for sure
> the controller isn't going to write to that address right after we end
> the request.
>
Yes, but the controller is about to be reset or shut down at this point.
Even if the controller accesses a bad address and goes wrong, everything
will be OK after the reset or shutdown. :)

Thanks
Jianchao
linux-next: build failure after merge of the powerpc tree
Hi all,

After merging the powerpc tree, today's linux-next build (powerpc64
allnoconfig) failed like this:

arch/powerpc/kernel/mce_power.o: In function `.mce_handle_error':
mce_power.c:(.text+0x5a8): undefined reference to `.hash__tlbiel_all'
mce_power.c:(.text+0x6b8): undefined reference to `.hash__tlbiel_all'
arch/powerpc/mm/hash_utils_64.o: In function `.hash__early_init_mmu':
hash_utils_64.c:(.init.text+0x9d0): undefined reference to `.hash__tlbiel_all'

Caused by commit

  d4748276ae14 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")

The definition of hash__tlbiel_all() is in arch/powerpc/mm/hash_native_64.c
which is only built if CONFIG_PPC_NATIVE is set, which it is not for
this build.

I applied a supplied fix patch.

--
Cheers,
Stephen Rothwell
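For readers unfamiliar with this class of link failure, the sketch below shows one common C idiom for a function whose definition is compiled only under a config option: callers see a real implementation when the option is set and an empty inline stub otherwise. This is a generic illustration under stated assumptions, not the fix that was applied here (the message does not say what the supplied patch does). `MODEL_NATIVE`, `model_tlbiel_all()` and `model_handle_error()` are hypothetical stand-ins.

```c
#include <assert.h>

/*
 * Hypothetical switch standing in for a Kconfig option; it is left
 * undefined here, mirroring a build where the option is not set.
 */
/* #define MODEL_NATIVE 1 */

/* Counts how often the "real" flush ran; stays 0 when the stub is used. */
static int flush_calls;

#ifdef MODEL_NATIVE
/* Real implementation, only compiled when the option is enabled. */
static void model_tlbiel_all(void)
{
	flush_calls++;
}
#else
/* Empty stub so that callers still compile and link without the option. */
static inline void model_tlbiel_all(void)
{
}
#endif

/* A caller that is built regardless of the option. */
static void model_handle_error(void)
{
	model_tlbiel_all();
}
```

Whether an empty stub is actually correct depends on what the caller needs from the function in that configuration, so this only explains the mechanics of avoiding the undefined reference.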
Re: [PATCH 2/4] dmaengine: qcom: bam_dma: add num-channels binding for remotely controlled
On Tue, Jan 16, 2018 at 07:02:34PM +0000, srinivas.kandaga...@linaro.org wrote:
> From: Srinivas Kandagatla
>
> When Linux is master of BAM, it can directly read registers to know number
> of supported channels, however when its remotely controlled reading these
> registers would trigger a crash if the BAM is not yet intialized/powered up
> on the remote side.
>
> This patch adds num-channels binding to specify number of supported
> dma channels on remotely controlled BAM.
>
> Signed-off-by: Srinivas Kandagatla
> ---
>  Documentation/devicetree/bindings/dma/qcom_bam_dma.txt |  2 ++
>  drivers/dma/qcom/bam_dma.c                             | 13 +++--
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/dma/qcom_bam_dma.txt b/Documentation/devicetree/bindings/dma/qcom_bam_dma.txt
> index 9cbf5d9df8fd..aa6822cbb230 100644
> --- a/Documentation/devicetree/bindings/dma/qcom_bam_dma.txt
> +++ b/Documentation/devicetree/bindings/dma/qcom_bam_dma.txt
> @@ -15,6 +15,8 @@ Required properties:
>  		the secure world.
>  - qcom,controlled-remotely : optional, indicates that the bam is controlled by
>  			remote proccessor i.e. execution environment.
> +- num-channels : optional, indicates supported number of DMA channels in a
> +	remotely controlled bam.
>
>  Example:
>
> diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
> index 78e488e8f96d..523bd178047a 100644
> --- a/drivers/dma/qcom/bam_dma.c
> +++ b/drivers/dma/qcom/bam_dma.c
> @@ -1083,8 +1083,10 @@ static int bam_init(struct bam_device *bdev)
>  	if (bdev->ee >= val)
>  		return -EINVAL;
>
> -	val = readl_relaxed(bam_addr(bdev, 0, BAM_NUM_PIPES));
> -	bdev->num_channels = val & BAM_NUM_PIPES_MASK;
> +	if (!bdev->num_channels) {
> +		val = readl_relaxed(bam_addr(bdev, 0, BAM_NUM_PIPES));
> +		bdev->num_channels = val & BAM_NUM_PIPES_MASK;
> +	}
>
>  	if (bdev->controlled_remotely)
>  		return 0;
> @@ -1179,6 +1181,13 @@ static int bam_dma_probe(struct platform_device *pdev)
>  	bdev->controlled_remotely = of_property_read_bool(pdev->dev.of_node,
>  						"qcom,controlled-remotely");
>
> +	if (bdev->controlled_remotely) {

hmm so if we remove the remotely controlled instanced from DT and then
Linux won't see them and not do anything. Do we need to do configuration
of these instances too?

> +		ret = of_property_read_u32(pdev->dev.of_node, "num-channels",
> +					   &bdev->num_channels);
> +		if (ret)
> +			dev_err(bdev->dev, "num-channels unspecified in dt\n");
> +	}
> +
>  	bdev->bamclk = devm_clk_get(bdev->dev, "bam_clk");
>  	if (IS_ERR(bdev->bamclk)) {
>  		bdev->bamclk = NULL;
> --
> 2.15.1
>

--
~Vinod
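The fallback logic in the quoted bam_init() hunk can be modelled in plain C as follows. This is a sketch under stated assumptions, not the driver: `struct model_bam`, `model_read_pipes()` and the 0xff mask value are made up for illustration, and a channel count of 0 is taken to mean "nothing was supplied by the device tree".

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_PIPES_MASK 0xffu	/* assumed mask, for illustration only */

/* Hypothetical stand-in for the driver's device structure. */
struct model_bam {
	uint32_t num_channels;	/* 0 means "not supplied by the device tree" */
	uint32_t pipes_reg;	/* pretend contents of the pipe-count register */
	int reg_reads;		/* how many times the register was touched */
};

/* Stand-in for a register read; counts accesses so tests can check them. */
static uint32_t model_read_pipes(struct model_bam *bam)
{
	bam->reg_reads++;
	return bam->pipes_reg;
}

/* Prefer the preset count; touch the hardware only when none was given. */
static void model_init_channels(struct model_bam *bam)
{
	if (!bam->num_channels)
		bam->num_channels = model_read_pipes(bam) & MODEL_NUM_PIPES_MASK;
}
```

With a preset count the register is never read, which is the property the patch description relies on for a BAM that may not be powered up on the remote side.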
RE: [PATCH] USB TYPEC: RT1711H Type-C Chip Driver
Hi Jun,

For now, RT1711H is not fully compatible with TCPCI.
So the existing tcpci.c may not work for it.

Best Regards,
*
Shu-Fan Lee
Richtek Technology Corporation
TEL: +886-3-5526789 #2359 FAX: +886-3-5526612
*

-----Original Message-----
From: Jun Li [mailto:jun...@nxp.com]
Sent: Friday, January 19, 2018 11:10 AM
To: ShuFanLee; heikki.kroge...@linux.intel.com
Cc: cy_huang(黃啟原); shufan_lee(李書帆); linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; Guenter Roeck
Subject: RE: [PATCH] USB TYPEC: RT1711H Type-C Chip Driver

Hi

> -----Original Message-----
> From: linux-usb-ow...@vger.kernel.org [mailto:linux-usb-ow...@vger.kernel.org] On Behalf Of ShuFanLee
> Sent: Wednesday, January 10, 2018 2:59 PM
> To: heikki.kroge...@linux.intel.com
> Cc: cy_hu...@richtek.com; shufan_...@richtek.com; linux-ker...@vger.kernel.org; linux-...@vger.kernel.org
> Subject: [PATCH] USB TYPEC: RT1711H Type-C Chip Driver
>
> From: ShuFanLee
>
> Richtek RT1711H Type-C chip driver that works with Type-C Port
> Controller Manager to provide USB PD and USB Type-C functionalities.

A general question, is this Rt1711h type-c chip compatible with TCPCI (Universal Serial Bus Type-C Port Controller Interface Specification)? looks like it has the same register map and has some extension, can the existing ./drivers/staging/typec/tcpic.c basically work for you?

+Guenter

Li Jun

>
> Signed-off-by: ShuFanLee
> ---
>  .../devicetree/bindings/usb/richtek,rt1711h.txt |   38 +
>  arch/arm64/boot/dts/hisilicon/rt1711h.dtsi      |   11 +
>  drivers/usb/typec/Kconfig                       |    2 +
>  drivers/usb/typec/Makefile                      |    1 +
>  drivers/usb/typec/rt1711h/Kconfig               |    7 +
>  drivers/usb/typec/rt1711h/Makefile              |    2 +
>  drivers/usb/typec/rt1711h/rt1711h.c             | 2241
>  drivers/usb/typec/rt1711h/rt1711h.h             |  300 +++
>  8 files changed, 2602 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/usb/richtek,rt1711h.txt
>  create mode 100644 arch/arm64/boot/dts/hisilicon/rt1711h.dtsi
>  create mode 100644 drivers/usb/typec/rt1711h/Kconfig
>  create mode 100644 drivers/usb/typec/rt1711h/Makefile
>  create mode 100644 drivers/usb/typec/rt1711h/rt1711h.c
>  create mode 100644 drivers/usb/typec/rt1711h/rt1711h.h
>

* Email Confidentiality Notice
The information contained in this e-mail message (including any attachments) may be confidential, proprietary, privileged, or otherwise exempt from disclosure under applicable laws. It is intended to be conveyed only to the designated recipient(s). Any use, dissemination, distribution, printing, retaining or copying of this e-mail (including its attachments) by unintended recipient(s) is strictly prohibited and may be unlawful. If you are not an intended recipient of this e-mail, or believe that you have received this e-mail in error, please notify the sender immediately (by replying to this e-mail), delete any and all copies of this e-mail (including any attachments) from your system, and do not disclose the content of this e-mail to any other person. Thank you!
Re: [PATCH 1/4] dmaengine: qcom: bam_dma: make bam clk optional
On Tue, Jan 16, 2018 at 07:02:33PM +0000, srinivas.kandaga...@linaro.org wrote:
> From: Srinivas Kandagatla
>
> When BAM is remotely controlled it does not sound correct to control
> its clk on Linux side. Make it optional, so that its not madatory

s/madatory/mandatory

> for remote controlled BAM instances.
>
> Signed-off-by: Srinivas Kandagatla
> ---
>  drivers/dma/qcom/bam_dma.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
> index 03c4eb3fd314..78e488e8f96d 100644
> --- a/drivers/dma/qcom/bam_dma.c
> +++ b/drivers/dma/qcom/bam_dma.c
> @@ -1180,13 +1180,14 @@ static int bam_dma_probe(struct platform_device *pdev)
>  						"qcom,controlled-remotely");
>
>  	bdev->bamclk = devm_clk_get(bdev->dev, "bam_clk");

but you still do clk_get unconditionally?

> -	if (IS_ERR(bdev->bamclk))
> -		return PTR_ERR(bdev->bamclk);
> -
> -	ret = clk_prepare_enable(bdev->bamclk);
> -	if (ret) {
> -		dev_err(bdev->dev, "failed to prepare/enable clock\n");
> -		return ret;
> +	if (IS_ERR(bdev->bamclk)) {
> +		bdev->bamclk = NULL;
> +	} else {
> +		ret = clk_prepare_enable(bdev->bamclk);
> +		if (ret) {
> +			dev_err(bdev->dev, "failed to prepare/enable clock\n");
> +			return ret;
> +		}

wouldn't it be better to set that an instance is remote controlled and
thus not at all visible to Linux?

>  	}
>
>  	ret = bam_init(bdev);
> --
> 2.15.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe dmaengine" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
~Vinod
Re: [PATCH] print kdump kernel loaded status in stack dump
On (01/18/18 10:02), Andi Kleen wrote:
> Dave Young writes:
> > 	printk("%sHardware name: %s\n",
> > 	       log_lvl, dump_stack_arch_desc_str);
> > +	if (kexec_crash_loaded())
> > +		printk("%skdump kernel loaded\n", log_lvl);
>
> Oops/warnings are getting longer and longer, often scrolling away
> from the screen, and if the kernel crashes backscroll does not work
> anymore, so precious information is lost.

true.

I even ended up having a console_reflush_on_panic() function. it simply
re-prints with a delay [so I can at least read the oops] logbuf entries
every once in a while, starting with the first oops_in_progress record.
something like below [it's completely hacked up, but at least gives an
idea]

---
 include/linux/console.h |  1 +
 kernel/panic.c          |  7 +++
 kernel/printk/printk.c  | 39 ++-
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..502e3f539448 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -168,6 +168,7 @@ extern void console_unlock(void);
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
 extern void console_flush_on_panic(void);
+extern void console_reflush_on_panic(void);
 extern struct tty_driver *console_device(int *);
 extern void console_stop(struct console *);
 extern void console_start(struct console *);
diff --git a/kernel/panic.c b/kernel/panic.c
index 2cfef408fec9..39cd59bbfaab 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -137,6 +137,7 @@ void panic(const char *fmt, ...)
 	va_list args;
 	long i, i_next = 0;
 	int state = 0;
+	int reflush_tick = 0;
 	int old_cpu, this_cpu;
 	bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
 
@@ -298,6 +299,12 @@ void panic(const char *fmt, ...)
 			i_next = i + 3600 / PANIC_BLINK_SPD;
 		}
 		mdelay(PANIC_TIMER_STEP);
+
+		reflush_tick++;
+		if (reflush_tick == 32) { /* don't reflush too often */
+			console_reflush_on_panic();
+			reflush_tick = 0;
+		}
 	}
 }
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..ef3f28d4c741 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -426,6 +426,10 @@ static u32 log_next_idx;
 static u64 console_seq;
 static u32 console_idx;
 
+/* index and sequence number of the record which started the oops print out */
+static u64 log_oops_seq;
+static u32 log_oops_idx;
+
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
 static u32 clear_idx;
@@ -1736,6 +1740,15 @@ static inline void printk_delay(void)
 	}
 }
 
+/*
+ * Why do we have printk_delay() in vprintk_emit()
+ * and not in console_unlock()?
+ */
+static inline void console_unlock_delay(void)
+{
+	printk_delay();
+}
+
 /*
  * Continuation lines are buffered, and not committed to the record buffer
  * until the line is complete, or a race forces it. The line fragments
@@ -1849,6 +1862,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	/* This stops the holder of console_sem just where we want him */
 	logbuf_lock_irqsave(flags);
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
@@ -1890,7 +1904,11 @@ asmlinkage int vprintk_emit(int facility, int level,
 		lflags |= LOG_PREFIX|LOG_NEWLINE;
 
 	printed_len = log_output(facility, level, lflags, dict, dictlen, text, text_len);
-
+	/* Oops... */
+	if (oops_in_progress && !log_oops_seq) {
+		log_oops_seq = log_next_seq;
+		log_oops_idx = log_next_idx;
+	}
 	logbuf_unlock_irqrestore(flags);
 
 	/* If called from the scheduler, we can not call up(). */
@@ -2396,6 +2414,7 @@ void console_unlock(void)
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
+		console_unlock_delay();
 		start_critical_timings();
 
 		if (console_lock_spinning_disable_and_check()) {
@@ -2495,6 +2514,24 @@ void console_flush_on_panic(void)
 	console_unlock();
 }
 
+/**
+ * console_reflush_on_panic - re-flush console content starting from the
+ * first oops_in_progress record
+ */
+void console_reflush_on_panic(void)
+{
+	unsigned long flags;
+
+	logbuf_lock_irqsave(flags);
+	console_seq = log_oops_seq;
+	console_idx = log_oops_idx;
+	logbuf_unlock_irqrestore(flags);
+
+	if (!printk_delay_msec)
+		printk_delay_msec = 273; /* I can't read any faster */
+	console_flush_on_panic();
+}
+
 /*
  * Return the console tty driver structure and its associated index
  */
--
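The replay idea in the patch above (remember where the oops output starts, then rewind the console position to that record and flush again) can be tried out in user space. What follows is only a minimal sketch under simplifying assumptions: the log is a small append-only array rather than the printk ring buffer, there is no locking and no delay, and `struct log`, `log_store()`, `log_flush()` and `log_reflush()` are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define LOG_MAX 16

/* Hypothetical, simplified log: an append-only array, not the printk ring buffer. */
struct log {
	const char *rec[LOG_MAX];
	size_t next_seq;	/* next record to be written */
	size_t console_seq;	/* next record to be printed */
	size_t oops_seq;	/* first record stored while an oops was in progress */
	int oops_seen;		/* non-zero once oops_seq is valid */
};

/* Append a record; remember where the oops output starts (first time only). */
static void log_store(struct log *l, const char *text, int oops_in_progress)
{
	assert(l->next_seq < LOG_MAX);
	if (oops_in_progress && !l->oops_seen) {
		l->oops_seq = l->next_seq;
		l->oops_seen = 1;
	}
	l->rec[l->next_seq++] = text;
}

/* Print every record not printed yet; returns the number of records emitted. */
static size_t log_flush(struct log *l)
{
	size_t n = 0;

	while (l->console_seq < l->next_seq) {
		puts(l->rec[l->console_seq++]);
		n++;
	}
	return n;
}

/* Rewind the console position to the first oops record and flush again. */
static size_t log_reflush(struct log *l)
{
	if (!l->oops_seen)
		return 0;
	l->console_seq = l->oops_seq;
	return log_flush(l);
}
```

Storing one ordinary record followed by two oops records and then calling `log_flush()` prints all three; a later `log_reflush()` prints only the two oops records again. Unlike the patch, the sketch takes the marker before the record is stored and uses an explicit `oops_seen` flag instead of treating sequence number 0 as "not set".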
Re: [PATCH] cpufreq: remove at32ap-cpufreq
On 18-01-18, 21:02, Corentin Labbe wrote:
> Since AVR32 arch was removed, at32ap-cpufreq is useless.
> Remove this driver.
>
> Signed-off-by: Corentin Labbe
> ---
>  drivers/cpufreq/Kconfig          |  10 ---
>  drivers/cpufreq/Makefile         |   1 -
>  drivers/cpufreq/at32ap-cpufreq.c | 127 ---
>  3 files changed, 138 deletions(-)
>  delete mode 100644 drivers/cpufreq/at32ap-cpufreq.c

Acked-by: Viresh Kumar

--
viresh
Re: [PATCH 0/7] PM /Domain/OPP: Add support to get performance state from DT
On 18-01-18, 20:24, Rafael J. Wysocki wrote:
> On Thursday, January 18, 2018 7:34:04 AM CET Viresh Kumar wrote:
> > On 22-12-17, 12:56, Viresh Kumar wrote:
> > > Hi,
> > >
> > > Now that the DT bindings [1] are already Reviewed/Acked by respective
> > > maintainers, here is the code to start using them.
> > >
> > > The first two patches provide helpers in the OPP core, [3-5]/7 update
> > > the PM domain core to start supporting domain OPP tables, etc, 6/7
> > > updates the OPP core to use the new callback provided by the PM domains
> > > to get performance state and the last one removes the unused helpers
> > > now.
> > >
> > > This is tested on Hikey620 and works just fine.
> >
> > Ping !
>
> Well, whom are you pinging exactly and why?

Ulf and Kevin, as it's been almost a month since this series was posted and
it has received no comments at all.

--
viresh
RE: [RFC] Per file OOM badness
Basically, the idea looks right to me.

1. But we need finer granularity to control the contribution to OOM badness,
   because when the TTM buffer resides in VRAM rather than being evicted to
   system memory, we should not count it towards badness.
   I think that is not easy to implement, though.

2. If the TTM buffer (GTT here) is mapped to user space for CPU access, I am
   not sure whether the buffer size is already accounted for by the kernel.
   If it is, the size will end up being counted again by your patches.

So I am wondering whether we can count the TTM buffer size into:

struct mm_rss_stat {
	atomic_long_t count[NR_MM_COUNTERS];
};

which is maintained by the kernel based on the CPU VM (page table).

Something like this:

When GTT allocation succeeds:
add_mm_counter(vma->vm_mm, MM_ANONPAGES, buffer_size);

When GTT is swapped out:
dec_mm_counter from MM_ANONPAGES first, then
add_mm_counter(vma->vm_mm, MM_SWAPENTS, buffer_size); // or MM_SHMEMPAGES, or add a new item.

Always update the corresponding item in mm_rss_stat.
That way, we can control the status update accurately.
What do you think about that?
And are there any side effects to this approach?

Thanks
Roger(Hongbo.He)

-----Original Message-----
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of Andrey Grodzovsky
Sent: Friday, January 19, 2018 12:48 AM
To: linux-kernel@vger.kernel.org; linux...@kvack.org; dri-de...@lists.freedesktop.org; amd-...@lists.freedesktop.org
Cc: Koenig, Christian
Subject: [RFC] Per file OOM badness

Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html

This is the same idea and I've just addressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file.

Thanks,
Andrey
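The bookkeeping Roger proposes above (charge a GTT buffer to one per-mm counter when it is allocated, then move the charge to another counter when the buffer is swapped out) can be modelled in a few lines. This is only a user-space sketch under stated assumptions: `struct mm_model`, `model_add_counter()`, `gtt_allocated()`, `gtt_swapped_out()` and `badness_pages()` are hypothetical names, sizes are taken to be in pages, and the sum only roughly mirrors what the OOM killer looks at (page tables and score adjustments are left out).

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's per-mm counters; all values are in pages. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, MM_SHMEMPAGES, NR_MM_COUNTERS };

struct mm_model {
	long count[NR_MM_COUNTERS];
};

/* Add a (possibly negative) number of pages to one counter. */
static void model_add_counter(struct mm_model *mm, int member, long pages)
{
	assert(member >= 0 && member < NR_MM_COUNTERS);
	mm->count[member] += pages;
}

/* A GTT buffer was successfully allocated in system memory. */
static void gtt_allocated(struct mm_model *mm, long pages)
{
	model_add_counter(mm, MM_ANONPAGES, pages);
}

/* A GTT buffer was swapped out: move its charge from anon pages to swap entries. */
static void gtt_swapped_out(struct mm_model *mm, long pages)
{
	model_add_counter(mm, MM_ANONPAGES, -pages);
	model_add_counter(mm, MM_SWAPENTS, pages);
}

/* Rough analogue of the page total an OOM score is based on. */
static long badness_pages(const struct mm_model *mm)
{
	return mm->count[MM_FILEPAGES] + mm->count[MM_ANONPAGES] +
	       mm->count[MM_SHMEMPAGES] + mm->count[MM_SWAPENTS];
}
```

In this model a 256-page buffer contributes 256 pages to the total both before and after it is swapped out; only the counter it is charged to changes. A buffer that lives in VRAM would never be passed to `gtt_allocated()` at all, which is the finer granularity asked for in point 1.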
Re: [RFC PATCH] e1000e: Remove Other from EIAC.
On 2018/01/18 18:42, Shrikrishna Khare wrote:
>
>
> On Thu, 18 Jan 2018, Benjamin Poirier wrote:
>
> > On 2018/01/18 15:50, Benjamin Poirier wrote:
> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> > > icr=0x8004 (_INT_ASSERTED | _OTHER) in the same situation.
> > >
> > > Some experimentation showed that this flaw in vmware e1000e emulation can
> > > be worked around by not setting Other in EIAC. This is how it was before
> > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> >
> > vmware folks, please comment.
>
> Thank you for bringing this to our attention.
>
> Using the reported build (ESX 6.5, 7526125) and 4.15.0-rc8+ kernel (which
> has the said patch), I could bring up e1000e interface (version: 3.2.6-k),
> get dhcp address and even do large file downloads without difficulty.
>
> Could you give us more pointers on how we may be able to reproduce this
> locally? Was there anything different with the configuration when the
> issue was observed? Is the issue consistently reproducible?

It's consistently reproducible, however I noticed that once in a while
there is a genuine "Other" interrupt that comes in and triggers the link
status change. The problem is with interrupts that are triggered via a
write to ICS (such as in e1000e_trigger_lsc()).

Can you reproduce a problem if you do:
ip link set ethX down
ip link set ethX up

If you're building your own kernel, you can add the following patch and
cat /sys/kernel/debug/tracing/trace_pipe

For me it shows on v4.15-rc8:
<...>-2578 [000] 83527.938321: e1000e_trigger_lsc: trigger_lsc
<...>-2578 [000] d.h. 83527.938398: e1000_msix_other: icr 0x0

With the patch that I submitted, it shows:
wickedd-1329 [002] .N..20.123545: e1000e_trigger_lsc: trigger_lsc
-0 [000] d.h.20.123630: e1000_msix_other: icr 0x8104
-0 [000] d.h.20.123654: e1000_msix_other: lsc
-0 [000] d.h.20.123676: e1000_msix_other: mod_timer

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9f18d39bdc8f..16620ce840fc 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1918,22 +1918,29 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
 	bool enable = true;
 
 	icr = er32(ICR);
+	trace_printk("icr 0x%x\n", icr);
+
 	if (icr & E1000_ICR_RXO) {
+		trace_printk("rxo\n");
 		ew32(ICR, E1000_ICR_RXO);
 		enable = false;
 		/* napi poll will re-enable Other, make sure it runs */
 		if (napi_schedule_prep(>napi)) {
+			trace_printk("napi schedule\n");
 			adapter->total_rx_bytes = 0;
 			adapter->total_rx_packets = 0;
 			__napi_schedule(>napi);
 		}
 	}
 	if (icr & E1000_ICR_LSC) {
+		trace_printk("lsc\n");
 		ew32(ICR, E1000_ICR_LSC);
 		hw->mac.get_link_status = true;
 		/* guard against interrupt when we're going down */
-		if (!test_bit(__E1000_DOWN, >state))
+		if (!test_bit(__E1000_DOWN, >state)) {
+			trace_printk("mod_timer\n");
 			mod_timer(>watchdog_timer, jiffies + 1);
+		}
 	}
 
 	if (enable && !test_bit(__E1000_DOWN, >state))
@@ -4221,6 +4228,8 @@ static void e1000e_trigger_lsc(struct e1000_adapter *adapter)
 {
 	struct e1000_hw *hw = >hw;
 
+	trace_printk("trigger_lsc\n");
+
 	if (adapter->msix_entries)
 		ew32(ICS, E1000_ICS_LSC | E1000_ICS_OTHER);
 	else
Re: [PATCH v8 5/5] document: add document for kaslr_mem
On Fri, Jan 19, 2018 at 11:53:31AM +0800, Baoquan He wrote:
>On 01/19/18 at 11:36am, Chao Fan wrote:
>> Signed-off-by: Chao Fan
>> ---
>>  Documentation/admin-guide/kernel-parameters.txt | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt
>> b/Documentation/admin-guide/kernel-parameters.txt
>> index e2de7c006a74..28a879f62560 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2350,6 +2350,16 @@
>>  			allocations which rules out almost all kernel
>>  			allocations. Use with caution!
>>
>> +	kaslr_mem=nn[KMG][@ss[KMG]]
>> +			[KNL] Force usage of a specific region of memory
>> +			for KASLR during kernel decompression stage.
>> +			Region of usable memory is from ss to ss+nn. If ss
>> +			is omitted, it is equivalent to kaslr_mem=nn[KMG]@0.
>> +			Multiple regions can be specified, comma delimited.
>> +			Notice: we support 4 regions at most now.
>
>Better not use 'we' here. You can refer to kernel-parameter.txt.

You are right, so I resend this part, and add several Cc.

Thanks,
Chao Fan

>
>> +			Example:
>> +			kaslr_mem=1G,500M@2G,1G@4G
>> +
>>  	MTD_Partition=	[MTD]
>>  			Format: ,,,
>>
>> --
>> 2.14.3
>>
>>
>>
>
>
[RESEND PATCH v8 5/5] document: add document for kaslr_mem
Cc: linux-...@vger.kernel.org
Cc: Jonathan Corbet
Cc: Randy Dunlap
Signed-off-by: Chao Fan
---
 Documentation/admin-guide/kernel-parameters.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e2de7c006a74..2e3d5fb13f7f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2350,6 +2350,16 @@
 			allocations which rules out almost all kernel
 			allocations. Use with caution!
 
+	kaslr_mem=nn[KMG][@ss[KMG]]
+			[KNL] Force usage of a specific region of memory
+			for KASLR during kernel decompression stage.
+			Region of usable memory is from ss to ss+nn. If ss
+			is omitted, it is equivalent to kaslr_mem=nn[KMG]@0.
+			Multiple regions can be specified, comma delimited.
+			Notice: only support 4 regions at most now.
+			Example:
+			kaslr_mem=1G,500M@2G,1G@4G
+
 	MTD_Partition=	[MTD]
 			Format: ,,,
 
--
2.14.3
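To make the documented syntax concrete, here is a minimal user-space sketch of how a string such as kaslr_mem=1G,500M@2G,1G@4G could be split into regions. It is not the parser from this patch series: `struct mem_region`, `parse_size()` and `parse_kaslr_mem()` are hypothetical names, the suffix handling is only modelled loosely on what the kernel's memparse() does, and rejecting a fifth region is one possible reading of "4 regions at most".

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_KASLR_REGIONS 4	/* limit stated in the documentation text */

struct mem_region {
	unsigned long long start;	/* ss, 0 when "@ss" is omitted */
	unsigned long long size;	/* nn */
};

/* Parse a number with an optional K/M/G suffix; *end points past what was consumed. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	if (*end == s)		/* no digits at all */
		return 0;
	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse comma-delimited "nn[KMG][@ss[KMG]]" entries into out[].
 * Returns the number of regions stored, or -1 on malformed input or
 * when more than MAX_KASLR_REGIONS regions are given.
 */
static int parse_kaslr_mem(const char *arg, struct mem_region *out)
{
	const char *p = arg;
	int n = 0;

	assert(arg && out);
	while (*p) {
		struct mem_region r = { 0, 0 };
		char *end;

		if (n == MAX_KASLR_REGIONS)
			return -1;
		r.size = parse_size(p, &end);
		if (end == p || r.size == 0)
			return -1;
		p = end;
		if (*p == '@') {
			p++;
			r.start = parse_size(p, &end);
			if (end == p)
				return -1;
			p = end;
		}
		out[n++] = r;
		if (*p == ',')
			p++;
		else if (*p)
			return -1;
	}
	return n;
}
```

For the example from the documentation, the sketch yields three regions: 1G starting at 0, 500M starting at 2G, and 1G starting at 4G. The first entry shows the "@0" default that applies when ss is omitted.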
Re: [PATCH v4 07/13] ARM: dts: rockchip: add clocks in vop iommu nodes
On Fri, Jan 19, 2018 at 1:55 PM, JeffyChen wrote:
> Hi Tomasz,
>
> Thanks for your reply.
>
>
> On 01/19/2018 11:23 AM, Tomasz Figa wrote:
>>
>> On Thu, Jan 18, 2018 at 8:52 PM, Jeffy Chen
>> wrote:
>>>
>>> Add clocks in vop iommu nodes, since we are going to control clocks in
>>> rockchip iommu driver.
>>>
>>> Signed-off-by: Jeffy Chen
>>> ---
>>>
>>> Changes in v4: None
>>> Changes in v3: None
>>> Changes in v2: None
>>>
>>>  arch/arm/boot/dts/rk3036.dtsi | 2 ++
>>>  arch/arm/boot/dts/rk3288.dtsi | 4
>>>  2 files changed, 6 insertions(+)
>>>
>>> diff --git a/arch/arm/boot/dts/rk3036.dtsi
>>> b/arch/arm/boot/dts/rk3036.dtsi
>>> index 3b704cfed69a..95b0ebc7a40f 100644
>>> --- a/arch/arm/boot/dts/rk3036.dtsi
>>> +++ b/arch/arm/boot/dts/rk3036.dtsi
>>> @@ -197,6 +197,8 @@
>>>  		reg = <0x10118300 0x100>;
>>>  		interrupts = ;
>>>  		interrupt-names = "vop_mmu";
>>> +		clocks = < ACLK_LCDC>, < SCLK_LCDC>, <
>>> HCLK_LCDC>;
>>> +		clock-names = "aclk_vop", "dclk_vop", "hclk_vop";
>>
>>
>> We should remove clock-names from IOMMU nodes. The Rockchip IOMMU
>> bindings don't define clock names and only the clocks property should
>> be given.
>>
> hmmm, i'm trying to switch to clk_bulk APIs, the get and put are name based.
> or maybe i can use clk_get/put along with other clk_bulk APIs

I think it should be possible to just put the clock pointers to the
clk_bulk_data struct manually. Otherwise, I'm not sure what names we
could use for clock-names, since the clocks depend on master.
(Something like "clock0, clock1, clock2, ..., clockN" could work, but
it doesn't add any value IMHO...).
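The suggestion above (fill the clk_bulk_data entries by hand, looking the clocks up by position rather than by name) might look roughly like the following. This is a user-space model only, written under explicit assumptions: `struct clk`, `struct clk_bulk_data` and `struct fake_node` here are simplified stand-ins rather than the kernel definitions, and `fake_clk_get_by_index()`, `fill_bulk_by_index()` and `fake_bulk_enable()` are hypothetical helpers. In a driver, the lookup would presumably be an index-based call such as of_clk_get(np, index), with the usual error handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel types; not the real definitions. */
struct clk {
	int enabled;
};

struct clk_bulk_data {
	const char *id;		/* left NULL: the binding defines no clock names */
	struct clk *clk;
};

#define MAX_CLOCKS 4

/* Stand-in for the "clocks" property of one device node. */
struct fake_node {
	struct clk clocks[MAX_CLOCKS];
	int num_clocks;
};

/* Stand-in for an index-based clock lookup. */
static struct clk *fake_clk_get_by_index(struct fake_node *np, int index)
{
	if (index < 0 || index >= np->num_clocks)
		return NULL;
	return &np->clocks[index];
}

/* Fill the bulk array by position; returns the count, or -1 if it does not fit. */
static int fill_bulk_by_index(struct fake_node *np, struct clk_bulk_data *clks, int max)
{
	int i;

	assert(np && clks);
	if (np->num_clocks > max)
		return -1;
	for (i = 0; i < np->num_clocks; i++) {
		clks[i].id = NULL;
		clks[i].clk = fake_clk_get_by_index(np, i);
	}
	return np->num_clocks;
}

/* Stand-in for a bulk enable: walks the array and never looks at ->id. */
static void fake_bulk_enable(int num, struct clk_bulk_data *clks)
{
	int i;

	for (i = 0; i < num; i++)
		clks[i].clk->enabled = 1;
}
```

The point of the sketch is that only the lookup step depends on names; once the array is filled, the bulk operations just iterate over it, so the IOMMU node can carry a plain clocks property without clock-names.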
Re: linux-next: build warning after merge of the crypto tree
On Fri, Jan 19, 2018 at 09:51:43AM +0530, Harsh Jain wrote:
> Hi Herbert,
>
> It's an indentation issue. Seems checkpatch and default compile options does
> not report this warning.
>
> How would you like to take the fix. Should I sent whole series again with fix
> or only indentation patch.

Please send an incremental patch.

Thanks,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
On Fri, 2018-01-19 at 10:32 +0800, Ming Lei wrote:
> Now most of times both NVMe and SCSI won't return BLK_STS_RESOURCE, and
> it should be DM-only which returns STS_RESOURCE so often.

That's wrong at least for SCSI. See also
https://marc.info/?l=linux-block=151578329417076.

Bart.
Re: [PATCH v5 29/44] ARM: da8xx: add new USB PHY clock init using common clock framework
On Friday 19 January 2018 12:13 AM, David Lechner wrote:
> On 01/18/2018 09:14 AM, Sekhar Nori wrote:
>> On Monday 08 January 2018 07:47 AM, David Lechner wrote:
>>> +int __init da8xx_register_usb20_phy_clk(bool use_usb_refclkin)
>>> +{
>>> +	struct regmap *cfgchip;
>>> +	struct clk *usb0_psc_clk, *clk;
>>> +	struct clk_hw *parent;
>>> +
>>> +	cfgchip = syscon_regmap_lookup_by_compatible("ti,da830-cfgchip");
>>
>> Am I right in understanding that this API is only called for non-DT
>> boot? If yes, do we really need the lookup by compatible?
>
> This code is used in DT boot until [PATCH v5 43/44] "ARM: da8xx-dt:
> switch to device tree clocks". So, yes it is needed temporarily to
> prevent breaking USB.

Alright, so this line should probably be dropped either as part of 43/44
or later.

Thanks,
Sekhar