Re: [RFC][PATCH] sysctl: Remove the sysctl system call

2019-10-02 Thread Florian Weimer
Is anyone else getting a very incomplete set of messages in this
thread?

These changes likely matter to glibc, and I've yet to see the actual
patch.  Would someone please forward it to me?

The original message didn't make it into the lore.kernel.org archives
(the cross-post to linux-kernel should have taken care of that).


Re: [PATCH v11 5/7] drm/sun4i: sun6i_mipi_dsi: Add VCC-DSI regulator support

2019-10-02 Thread Chen-Yu Tsai
On Thu, Oct 3, 2019 at 2:46 PM Jagan Teki  wrote:
>
> Allwinner MIPI DSI controllers are supplied from the SoC
> DSI power rails via the VCC-DSI pin.
>
> Add support for this supply pin by adding voltage
> regulator handling code to the MIPI DSI driver.
>
> Tested-by: Merlijn Wajer 
> Signed-off-by: Jagan Teki 
> ---
>  drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c | 14 ++
>  drivers/gpu/drm/sun4i/sun6i_mipi_dsi.h |  2 ++
>  2 files changed, 16 insertions(+)
>
> diff --git a/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c 
> b/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c
> index 446dc56cc44b..fe9a3667f3a1 100644
> --- a/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c
> +++ b/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c
> @@ -1110,6 +1110,12 @@ static int sun6i_dsi_probe(struct platform_device 
> *pdev)
> return PTR_ERR(base);
> }
>
> +   dsi->regulator = devm_regulator_get(dev, "vcc-dsi");
> +   if (IS_ERR(dsi->regulator)) {
> +   dev_err(dev, "Couldn't get VCC-DSI supply\n");
> +   return PTR_ERR(dsi->regulator);
> +   }
> +
> dsi->regs = devm_regmap_init_mmio_clk(dev, "bus", base,
>   &sun6i_dsi_regmap_config);
> if (IS_ERR(dsi->regs)) {
> @@ -1183,6 +1189,13 @@ static int sun6i_dsi_remove(struct platform_device 
> *pdev)
>  static int __maybe_unused sun6i_dsi_runtime_resume(struct device *dev)
>  {
> struct sun6i_dsi *dsi = dev_get_drvdata(dev);
> +   int err;
> +
> +   err = regulator_enable(dsi->regulator);
> +   if (err) {
> +   dev_err(dsi->dev, "failed to enable VCC-DSI supply: %d\n", 
> err);
> +   return err;
> +   }
>
> reset_control_deassert(dsi->reset);
> clk_prepare_enable(dsi->mod_clk);
> @@ -1215,6 +1228,7 @@ static int __maybe_unused 
> sun6i_dsi_runtime_suspend(struct device *dev)
>
> clk_disable_unprepare(dsi->mod_clk);
> reset_control_assert(dsi->reset);
> +   regulator_disable(dsi->regulator);
>
> return 0;
>  }
> diff --git a/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.h 
> b/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.h
> index 5c3ad5be0690..a01d44e9e461 100644
> --- a/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.h
> +++ b/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.h
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

You don't need to include the header file, since you are only adding a
pointer to the struct, and nothing else.
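For illustration, a minimal self-contained sketch of that approach (the
structure name below is a hypothetical stand-in, not the actual sun6i_dsi
definition):

struct regulator;			/* forward declaration is enough */

struct example_dsi {			/* hypothetical stand-in structure */
	struct regulator *regulator;	/* pointer member: full type not needed */
};

Only the .c file that actually calls the regulator API needs to include
<linux/regulator/consumer.h>.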

Otherwise,

Reviewed-by: Chen-Yu Tsai 

>
>  #define SUN6I_DSI_TCON_DIV 4
>
> @@ -23,6 +24,7 @@ struct sun6i_dsi {
> struct clk  *bus_clk;
> struct clk  *mod_clk;
> struct regmap   *regs;
> +   struct regulator*regulator;
> struct reset_control*reset;
> struct phy  *dphy;
>
> --
> 2.18.0.321.gffc6fa0e3
>


Re: [Patch 1/3] media: ov5640: add PIXEL_RATE control

2019-10-02 Thread Jacopo Mondi
Hi Benoit,

On Wed, Oct 02, 2019 at 10:06:15AM -0500, Benoit Parrot wrote:
> Jacopo Mondi  wrote on Wed [2019-Oct-02 16:32:26 +0200]:
> > Hi Benoit,
> >
> > On Wed, Oct 02, 2019 at 07:14:38AM -0500, Benoit Parrot wrote:
> > > Hi Jacopo,
> > >
> > > Maybe I misspoke when I mentioned a helper; I did not intend a
> > > framework-level generic function. Just a function to help in this case :)
> >
> > Yes indeed, the discussion thread I linked here was mostly interesting
> > because Hugues tried to do the same for LINK_FREQ iirc, and there
> > were some useful pointers.
> >
> > >
> > > That being said, I re-read the thread you mentioned. And as Hugues pointed
> > > out dynamically generating a "working" link frequency value which can be
> > > used by a CSI2 receiver to properly configure its PHY is not trivial.
> > >
> > > When I created this patch, I also had another to add V4L2_CID_LINK_FREQ
> > > support. I am testing this against the TI CAL CSI2 receiver, which already
> > > uses the V4L2_CID_PIXEL_RATE value for that purpose, so I also had a patch
> > > to add support for V4L2_CID_LINK_FREQ to that driver as well.
> > >
> > > Unfortunately, similar to Hugues' findings, I was not able to make it
> > > "work" with all supported resolutions/framerates.
> >
> > As reported in Hugues' findings, the PLL calculation procedure might be
> > faulty, and the actual frequencies on the bus are different from the
> > calculated ones.
> >
> > I wish I had more time to look at that again, as they worked for my and
> > Sam's use cases, but they deserve some rework.
> >
> > >
> > > Unlike my V4L2_CID_PIXEL_RATE solution, which now works in all modes
> > > with the same receiver.
> > >
> >
> > It seems to me you're reporting a fixed rate. It might make your
> > receiver happy, but does not report what is actually put on the bus.
> > Am I missing something?
>
> No, it is not fixed; the only fixed value was the initial value (which is
> representative of the initial/default resolution and framerate), and I
> fixed this in v2. The reported PIXEL_RATE is re-calculated every time there
> is an s_fmt and/or framerate change, and the V4L2_CID_PIXEL_RATE control
> value is updated accordingly.
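For reference, a minimal sketch of that update path (illustrative names
only, not the actual ov5640 code): the rate derived from the sensor's
current PLL/format configuration is pushed into the read-only control so
that receivers such as TI CAL can pick it up.

#include <media/v4l2-ctrls.h>

/* Illustrative helper: update V4L2_CID_PIXEL_RATE with a recomputed rate. */
static int example_refresh_pixel_rate(struct v4l2_ctrl *pixel_rate_ctrl,
				      s64 rate)
{
	return v4l2_ctrl_s_ctrl_int64(pixel_rate_ctrl, rate);
}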

Oh, I missed v2! I only saw this one.
I'll reply there.

Thanks
  j

>
> >
> > > So long story short I dropped the V4L2_CID_LINK_FREQ patch and focused on
> > > V4L2_CID_PIXEL_RATE instead.
> > >
> >
> > As Sakari pointed out, going from one to the other is trivial and
> > could be done on top.
>
> As you said it could be done on top. :)
>
> Benoit
>
> >
> > Thanks
> >j
> >
> > > Regards,
> > > Benoit
> > >
> > > Jacopo Mondi  wrote on Wed [2019-Oct-02 09:59:51 
> > > +0200]:
> > > > Hi Benoit,
> > > >   +Hugues
> > > >
> > > > If you're considering a helper, this thread might be useful to you:
> > > > https://patchwork.kernel.org/patch/11019673/
> > > >
> > > > Thanks
> > > >j
> > > >
> > >
> > >
>
>


Re: [PATCH v2 2/2] reset: Reset controller driver for Intel LGM SoC

2019-10-02 Thread Dilip Kota

Hi Martin and Philipp,


On 20/9/2019 10:47 AM, Dilip Kota wrote:

Hi Martin,

On 9/20/2019 3:51 AM, Martin Blumenstingl wrote:

Hi Dilip,

(sorry for the late reply)

On Thu, Sep 12, 2019 at 8:38 AM Dilip Kota 
 wrote:

[...]

The major differences between the VRX200 and LGM are:
1.) The RCU in VRX200 has multiple register regions, whereas the RCU in LGM
has one single register region.
2.) Register offsets and bit offsets are different.

So I am enhancing intel-reset-syscon.c to provide compatibility/support
for VRX200.
Please check the below dtsi binding proposal and let me know your view.

rcu0:reset-controller@ {
compatible= "intel,rcu-lgm";
reg = <0x000 0x8>, , ,
;

I'm not sure that I understand what reg_set2/3/4 are for.
The first resource (0x8 at 0x0) already covers the whole LGM RCU,
so what is the purpose of the other register resources?
Yes, as you said, the first register resource is enough for the LGM RCU, as
its registers are in one single region. Whereas in the older SoCs the RCU
registers are in different regions, so for that reason reg_set2/3/4
are used.


The driver will decide how many register resources to read based on
the "struct of_device_id" match data.


Regards,
Dilip



intel,global-reset = <0x10 30>;
#reset-cells = <3>;
};

"#reset-cells":
const:3
description: |
The 1st cell is the reset register offset.
The 2nd cell is the reset set bit offset.
The 3rd cell is the reset status bit offset.

I think this will work fine for VRX200 (and even older SoCs);
as you have described in your previous emails, we can determine the
status offset from the reset offset using a simple if/else.

For LGM I like your initial suggestion with #reset-cells = <2> because
it's easier to read and write.


The reset driver takes care of parsing the register address "reg" as per the
".data" structure in struct of_device_id.
The reset driver also takes care of deriving the status register offset.

the differentiation between two and three #reset-cells can also happen
based on the struct of_device_id (a rough sketch follows below):
- the LGM implementation would simply also use the reset bit as status
bit (only two cells are needed)
- the implementation for earlier SoCs would parse the third cell and
use that as status bit
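A possible shape for that match data (names, compatibles and values here
are hypothetical, not taken from the actual driver):

#include <linux/mod_devicetable.h>

/* Hypothetical per-SoC configuration selected through of_device_id.data */
struct intel_reset_soc_data {
	unsigned int reset_cells;	/* 2 on LGM, 3 on the older SoCs */
	bool separate_status_bit;	/* older SoCs encode a status bit */
};

static const struct intel_reset_soc_data lgm_data = {
	.reset_cells		= 2,
	.separate_status_bit	= false,
};

static const struct intel_reset_soc_data legacy_data = {
	.reset_cells		= 3,
	.separate_status_bit	= true,
};

static const struct of_device_id intel_reset_match[] = {
	{ .compatible = "intel,rcu-lgm",    .data = &lgm_data },
	{ .compatible = "intel,rcu-vrx200", .data = &legacy_data },
	{ /* sentinel */ }
};

The driver's probe path and of_xlate() callback can then look at this data
to decide how many cells to parse and how to derive the status bit.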

Philipp, can you please share your opinion on how to move forward with
the reset-intel driver from this series?
The reset_control_ops from the reset-intel driver are (in my opinion)
a bug-fixed and improved version of what we already have in
drivers/reset/reset-lantiq.c. The driver is NOT simply copy and paste
because the register layout was greatly simplified for the newer SoCs
(for which there is reset-intel) compared to the older ones
(reset-lantiq).
Dilip's suggestion (in my own words) is that you take his new
reset-intel driver, then we will work on porting reset-lantiq over to
that so in the end we can drop the reset-lantiq driver. This approach
means more work for me (as I am probably the one who then has to do
the work to port reset-lantiq over to reset-intel). I'm happy to do
that work if you think that it's worth following this approach.
So I want your opinion on this before I spend any effort on porting
reset-lantiq over to reset-intel.


I will start implementing this design in the next patch version along
with the other changes suggested in this patch review. Please let me
know if you have other thoughts on this design.


Regards,
Dilip




Martin


[PATCH] arm64: dts: qcom: msm8998: Disable coresight by default

2019-10-02 Thread Sai Prakash Ranjan
A boot failure has been reported on an MSM8998-based laptop when
coresight is enabled. This is most likely due to lack of
firmware support for coresight on the production device
compared to a debug device like MTP, where this issue is not
observed. So disable coresight by default for MSM8998 and
enable it only for the MSM8998 MTP.

Reported-and-tested-by: Jeffrey Hugo 
Fixes: 783abfa2249a ("arm64: dts: qcom: msm8998: Add Coresight support")
Signed-off-by: Sai Prakash Ranjan 
---
 arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi | 68 +++
 arch/arm64/boot/dts/qcom/msm8998.dtsi | 51 +++--
 2 files changed, 102 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi 
b/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
index 108667ce4f31..8d15572d18e6 100644
--- a/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
@@ -27,6 +27,66 @@
status = "okay";
 };
 
+&etf {
+   status = "okay";
+};
+
+&etm1 {
+   status = "okay";
+};
+
+&etm2 {
+   status = "okay";
+};
+
+&etm3 {
+   status = "okay";
+};
+
+&etm4 {
+   status = "okay";
+};
+
+&etm5 {
+   status = "okay";
+};
+
+&etm6 {
+   status = "okay";
+};
+
+&etm7 {
+   status = "okay";
+};
+
+&etm8 {
+   status = "okay";
+};
+
+&etr {
+   status = "okay";
+};
+
+&funnel1 {
+   status = "okay";
+};
+
+&funnel2 {
+   status = "okay";
+};
+
+&funnel3 {
+   status = "okay";
+};
+
+&funnel4 {
+   status = "okay";
+};
+
+&funnel5 {
+   status = "okay";
+};
+
 &pm8005_lsid1 {
pm8005-regulators {
compatible = "qcom,pm8005-regulators";
@@ -51,6 +111,10 @@
vdda-phy-dpdm-supply = <&vreg_l24a_3p075>;
 };
 
+&replicator1 {
+   status = "okay";
+};
+
 &rpm_requests {
pm8998-regulators {
compatible = "qcom,rpm-pm8998-regulators";
@@ -249,6 +313,10 @@
pinctrl-1 = <&sdc2_clk_off &sdc2_cmd_off &sdc2_data_off &sdc2_cd_off>;
 };
 
+&stm {
+   status = "okay";
+};
+
 &ufshc {
vcc-supply = <&vreg_l20a_2p95>;
vccq-supply = <&vreg_l26a_1p2>;
diff --git a/arch/arm64/boot/dts/qcom/msm8998.dtsi 
b/arch/arm64/boot/dts/qcom/msm8998.dtsi
index c6f81431983e..ffb64fc239ee 100644
--- a/arch/arm64/boot/dts/qcom/msm8998.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8998.dtsi
@@ -998,11 +998,12 @@
#interrupt-cells = <0x2>;
};
 
-   stm@6002000 {
+   stm: stm@6002000 {
compatible = "arm,coresight-stm", "arm,primecell";
reg = <0x06002000 0x1000>,
  <0x1628 0x18>;
reg-names = "stm-base", "stm-data-base";
+   status = "disabled";
 
clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc 
RPM_SMD_QDSS_A_CLK>;
clock-names = "apb_pclk", "atclk";
@@ -1016,9 +1017,10 @@
};
};
 
-   funnel@6041000 {
+   funnel1: funnel@6041000 {
compatible = "arm,coresight-dynamic-funnel", 
"arm,primecell";
reg = <0x06041000 0x1000>;
+   status = "disabled";
 
clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc 
RPM_SMD_QDSS_A_CLK>;
clock-names = "apb_pclk", "atclk";
@@ -1045,9 +1047,10 @@
};
};
 
-   funnel@6042000 {
+   funnel2: funnel@6042000 {
compatible = "arm,coresight-dynamic-funnel", 
"arm,primecell";
reg = <0x06042000 0x1000>;
+   status = "disabled";
 
clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc 
RPM_SMD_QDSS_A_CLK>;
clock-names = "apb_pclk", "atclk";
@@ -1075,9 +1078,10 @@
};
};
 
-   funnel@6045000 {
+   funnel3: funnel@6045000 {
compatible = "arm,coresight-dynamic-funnel", 
"arm,primecell";
reg = <0x06045000 0x1000>;
+   status = "disabled";
 
clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc 
RPM_SMD_QDSS_A_CLK>;
clock-names = "apb_pclk", "atclk";
@@ -1113,9 +1117,10 @@
};
};
 
-   replicator@6046000 {
+   replicator1: replicator@6046000 {
compatible = "arm,coresight-dynamic-replicator", 
"arm,primecell";
reg = <0x06046000 0x1000>;
+   status = "disabled";
 
clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc 
RPM_SMD_QDSS_A_CLK>;
clock-names = "apb_pclk", "atclk";
@@ -1137,9 +1142,10 @@
};
};
 
-   etf@6047000 {
+   e

RE: [PATCH] Input: hyperv-keyboard: Add the support of hibernation

2019-10-02 Thread Dexuan Cui
> From: Dexuan Cui
> Sent: Wednesday, October 2, 2019 10:35 PM
> > ... 
> >
> > ¯\_(ツ)_/¯ If you do not want to implement hibernation properly in vmbus
> > code that is totally up to you (have you read in pm.h how freeze() is
> > different from suspend()?).
> > Dmitry
> 
> I understand freeze() is different from suspend(). Here I treat suspend() as a
> heavyweight freeze() for simplicity and IMHO the extra cost of time is
> negligible considering the long hibernation process, which can take
> 5~10+ seconds.
> 
> Even if I implement all the pm ops, IMO the issue we're talking about
> (i.e. the hibernation process can be aborted by user's keyboard/mouse
> activities) still exists. Actually I think a physical Linux machine should 
> have
> the same issue.
> 
> In practice, IMO the issue is not a big concern, as the VM usually runs in
> a remote data center, and the user has no access to the VM's
> keyboard/mouse. :-)
> 
> I hope I understood your comments. I'll post a v2 without the notifier.
> Please Ack the v2 if it looks good to you.
> 
> -- Dexuan

I think I understand now: it looks like the vmbus driver should implement
a prepare() or freeze(), which calls the hyperv_keyboard driver's
prepare() or freeze(), which can set the flag or disable the keyboard
event handling. This way we don't need the notifier.
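A rough sketch of that shape (all names here are hypothetical, not the
actual hyperv-keyboard code): a freeze() callback in dev_pm_ops that makes
the driver stop reacting to key events while hibernation is in progress.

#include <linux/device.h>
#include <linux/pm.h>

static bool hv_kbd_frozen;	/* hypothetical flag checked in the event path */

static int hv_kbd_pm_freeze(struct device *dev)
{
	hv_kbd_frozen = true;	/* ignore keystrokes from now on */
	return 0;
}

static int hv_kbd_pm_restore(struct device *dev)
{
	hv_kbd_frozen = false;	/* resume normal event handling */
	return 0;
}

static const struct dev_pm_ops hv_kbd_pm_ops = {
	.freeze  = hv_kbd_pm_freeze,
	.restore = hv_kbd_pm_restore,
};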

Please let me know if I still don't get it right.

Thanks,
-- Dexuan


[PATCH] perf/core: fix corner case in perf_rotate_context()

2019-10-02 Thread Song Liu
This is a rare corner case, but it does happen:

In perf_rotate_context(), when the first cpu flexible event fails to
schedule, cpu_rotate is 1, while cpu_event is NULL. Since cpu_event is
NULL, perf_rotate_context will _NOT_ call cpu_ctx_sched_out(), thus
cpuctx->ctx.is_active will have EVENT_FLEXIBLE set. Then, the next
perf_event_sched_in() will skip all cpu flexible events because of the
EVENT_FLEXIBLE bit.

In the next call of perf_rotate_context(), cpu_rotate stays 1, and
cpu_event stays NULL, so this process repeats. The end result is, flexible
events on this cpu will not be scheduled (until another event is added
to the cpuctx).

A similar issue may happen with the task_ctx, but it is usually not a
problem because the task_ctx moves around between CPUs.

Fix this corner case by using cpu_rotate and task_rotate to gate the calls to
(cpu_)ctx_sched_out() and rotate_ctx(). Also enable rotate_ctx() to handle
the event == NULL case.

Fixes: 8d5bce0c37fa ("perf/core: Optimize perf_rotate_context() event 
scheduling")
Cc: sta...@vger.kernel.org # v4.17+
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Thomas Gleixner 
Signed-off-by: Song Liu 
---
 kernel/events/core.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4655adbbae10..50021735f367 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3775,6 +3775,13 @@ static void rotate_ctx(struct perf_event_context *ctx, 
struct perf_event *event)
if (ctx->rotate_disable)
return;
 
+   /* if no event specified, try to rotate the first event */
+   if (!event)
+   event = rb_entry_safe(rb_first(&ctx->flexible_groups.tree),
+ typeof(*event), group_node);
+   if (!event)
+   return;
+
perf_event_groups_delete(&ctx->flexible_groups, event);
perf_event_groups_insert(&ctx->flexible_groups, event);
 }
@@ -3816,14 +3823,14 @@ static bool perf_rotate_context(struct perf_cpu_context 
*cpuctx)
 * As per the order given at ctx_resched() first 'pop' task flexible
 * and then, if needed CPU flexible.
 */
-   if (task_event || (task_ctx && cpu_event))
+   if (task_rotate || (task_ctx && cpu_rotate))
ctx_sched_out(task_ctx, cpuctx, EVENT_FLEXIBLE);
-   if (cpu_event)
+   if (cpu_rotate)
cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
 
-   if (task_event)
+   if (task_rotate)
rotate_ctx(task_ctx, task_event);
-   if (cpu_event)
+   if (cpu_rotate)
rotate_ctx(&cpuctx->ctx, cpu_event);
 
perf_event_sched_in(cpuctx, task_ctx, current);
-- 
2.17.1



Re: Fwd: Re: [PATCH v3 1/2] dt-bindings: PCI: intel: Add YAML schemas for the PCIe RC controller

2019-10-02 Thread Dilip Kota

Hi Rob,


On 18/9/2019 2:56 PM, Dilip Kota wrote:

On 9/18/2019 2:40 AM, Rob Herring wrote:

On Wed, Sep 04, 2019 at 06:10:30PM +0800, Dilip Kota wrote:

The Intel PCIe RC controller is a Synopsys DesignWare
based PCIe core. Add YAML schemas for the PCIe controller in RC mode
present in the Intel Universal Gateway SoC.

Signed-off-by: Dilip Kota 
---
changes on v3:
Add the appropriate License-Identifier
Rename intel,rst-interval to 'reset-assert-us'
Add additionalProperties: false
Rename phy-names to 'pciephy'
Remove the dtsi node split of SoC and board in the example
Add #interrupt-cells = <1>; or else interrupt parsing will fail
Name yaml file with compatible name

.../devicetree/bindings/pci/intel,lgm-pcie.yaml | 137 
+

1 file changed, 137 insertions(+)
create mode 100644 
Documentation/devicetree/bindings/pci/intel,lgm-pcie.yaml


diff --git 
a/Documentation/devicetree/bindings/pci/intel,lgm-pcie.yaml 
b/Documentation/devicetree/bindings/pci/intel,lgm-pcie.yaml

new file mode 100644
index ..5e5cc7fd66cd
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/intel,lgm-pcie.yaml
@@ -0,0 +1,137 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/pci/intel-pcie.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Intel AXI bus based PCI express root complex
+
+maintainers:
+ - Dilip Kota 
+
+properties:
+ compatible:
+ const: intel,lgm-pcie
+
+ device_type:
+ const: pci
+
+ "#address-cells":
+ const: 3
+
+ "#size-cells":
+ const: 2

These all belong in a common schema.


+
+ reg:
+ items:
+ - description: Controller control and status registers.
+ - description: PCIe configuration registers.
+ - description: Controller application registers.
+
+ reg-names:
+ items:
+ - const: dbi
+ - const: config
+ - const: app
+
+ ranges:
+ description: Ranges for the PCI memory and I/O regions.

And this.


+
+ resets:
+ maxItems: 1
+
+ clocks:
+ description: PCIe registers interface clock.
+
+ phys:
+ maxItems: 1
+
+ phy-names:
+ const: pciephy
+
+ reset-gpios:
+ maxItems: 1
+
+ num-lanes:
+ description: Number of lanes to use for this port.
+
+ linux,pci-domain:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: PCI domain ID.

These 2 also should be common.


+
+ interrupts:
+ description: PCIe core integrated miscellaneous interrupt.

How many? No need for description if there's only 1.


+
+ '#interrupt-cells':
+ const: 1
+
+ interrupt-map-mask:
+ description: Standard PCI IRQ mapping properties.
+
+ interrupt-map:
+ description: Standard PCI IRQ mapping properties.
+
+ max-link-speed:
+ description: Specify PCI Gen for link capability.
+
+ bus-range:
+ description: Range of bus numbers associated with this controller.

All common.

You mean to remove all the common schema entry descriptions?
In most of the Documentation/devicetree/bindings/pci documents all the
common entries are described, so I followed the same approach.


If the common schema entries are removed, I get the below warning during
dt_binding_check:


Documentation/devicetree/bindings/pci/intel,lgm-pcie.example.dt.yaml: 
pcie@d0e0: '#address-cells', '#interrupt-cells', '#size-cells', 
'bus-range', 'device_type', 'interrupt-map', 'interrupt-map-mask', 
'interrupt-parent', 'linux,pci-domain', 'ranges', 'reset-gpios' do not 
match any of the regexes: 'pinctrl-[0-9]+'


Regards,
Dilip




+
+ reset-assert-ms:
+ description: |
+ Device reset interval in ms.
+ Some devices need an interval up to 500ms. By default it is 100ms.

This is a property of a device, so it belongs in a device node. How
would you deal with this without DT?
This property is for the PCIe RC to keep a delay before notifying the
reset to the device.
If this entry is not present, the PCIe driver will set a default value of
100ms.
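For context, a host controller driver would typically consume such a
property roughly like this (a sketch with hypothetical names, not the
actual intel-pcie code):

#include <linux/delay.h>
#include <linux/device.h>
#include <linux/gpio/consumer.h>
#include <linux/of.h>

struct example_pcie {			/* hypothetical container */
	struct device *dev;
	struct gpio_desc *reset_gpio;
};

static void example_toggle_perst(struct example_pcie *pcie)
{
	u32 interval = 100;		/* default when the property is absent */

	of_property_read_u32(pcie->dev->of_node, "reset-assert-ms", &interval);

	gpiod_set_value_cansleep(pcie->reset_gpio, 1);	/* assert PERST# */
	msleep(interval);				/* keep it asserted */
	gpiod_set_value_cansleep(pcie->reset_gpio, 0);	/* release the device */
}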



+
+required:
+ - compatible
+ - device_type
+ - reg
+ - reg-names
+ - ranges
+ - resets
+ - clocks
+ - phys
+ - phy-names
+ - reset-gpios
+ - num-lanes
+ - linux,pci-domain
+ - interrupts
+ - interrupt-map
+ - interrupt-map-mask
+
+additionalProperties: false
+
+examples:
+ - |
+ pcie10:pcie@d0e0 {
+ compatible = "intel,lgm-pcie";
+ device_type = "pci";
+ #address-cells = <3>;
+ #size-cells = <2>;
+ reg = <
+ 0xd0e0 0x1000
+ 0xd200 0x80
+ 0xd0a41000 0x1000
+ >;
+ reg-names = "dbi", "config", "app";
+ linux,pci-domain = <0>;
+ max-link-speed = <4>;
+ bus-range = <0x00 0x08>;
+ interrupt-parent = <&ioapic1>;
+ interrupts = <67 1>;
+ #interrupt-cells = <1>;
+ interrupt-map-mask = <0 0 0 0x7>;
+ interrupt-map = <0 0 0 1 &ioapic1 27 1>,
+ <0 0 0 2 &ioapic1 28 1>,
+ <0 0 0 3 &ioapic1 29 1>,
+ <0 0 0 4 &ioapic1 30 1>;
+ ranges = <0x0200 0 0xd400 0xd400 0 0x0400>;
+ resets = <&rcu0 0x50 0>;
+ clocks = <&cgu0 LGM_GCLK_PCIE10>;
+ phys = <&cb0phy0>;
+ phy-names = "pciephy";
+ status = "okay";
+ reset-assert-ms = <500>;
+ reset-gpios = <&gpio0 3 GPIO_ACTIVE_LOW>;
+ num-lanes = <2>;
+ };
-- 2.11.0



[PATCH v5 0/3] Forced-wakeup for stop states on Powernv

2019-10-02 Thread Abhishek Goel
Currently, the cpuidle governors determine what idle state an idling CPU
should enter into based on heuristics that depend on the idle history on
that CPU. Given that no predictive heuristic is perfect, there are cases
where the governor predicts a shallow idle state, hoping that the CPU will
be busy soon. However, if no new workload is scheduled on that CPU in the
near future, the CPU will end up in the shallow state.

Motivation
--
In the case of POWER, this is problematic when the predicted state in the
aforementioned scenario is a shallow stop state on a tickless system, as
we might get stuck in shallow states even for hours, in the absence of ticks
or interrupts.

To address this, we forcefully wake up the CPU by setting the decrementer.
The decrementer is set to a value that corresponds to the residency of
the next available state, thus firing up a timer that will forcefully
wake up the CPU. A few such iterations will essentially train the governor to
select a deeper state for that CPU, as the timer here corresponds to the
next available cpuidle state's residency. Thus, the CPU will eventually end up
in the deepest possible state and we won't get stuck in a shallow state
for a long duration.

Experiment
--
For earlier versions, when this feature was meant to be only for shallow lite
states, I performed experiments for three scenarios to collect some data.

case 1 :
Without this patch and without the tick retained, i.e. in an upstream kernel,
it could take even more than a second to get out of stop0_lite.

case 2 : With the tick retained in an upstream kernel -

Generally, we have a sched tick every 4ms (CONFIG_HZ = 250). Ideally I expected
it to take 8 sched ticks to get out of stop0_lite. Experimentally, the
observation was:

=
sample  min    max    99th percentile
20      4ms    12ms   4ms
=

It would take at least one sched tick to get out of stop0_lite.

case 3 :  With this patch (not stopping the tick, but explicitly queuing a
  timer)


sample  min     max     99th percentile

20      144us   192us   144us



Description of current implementation
-

We calculate the timeout for the current idle state as the residency value
of the next available idle state. If the decrementer is set to be
greater than this timeout, we update the decrementer value with the
residency of the next available idle state. Thus, we essentially train the
governor to select the next available deeper state until we reach the
deepest state. Hence, we won't get stuck unnecessarily in shallow states
for a long duration.


v1 of auto-promotion : https://lkml.org/lkml/2019/3/22/58 This patch was
implemented only for shallow lite state in generic cpuidle driver.

v2 : Removed timeout_needed and rebased to current
upstream kernel

Then,
v1 of forced-wakeup : Moved the code to cpuidle powernv driver and started
as forced wakeup instead of auto-promotion

v2 : Extended the forced wakeup logic for all states.
Setting the decrementer instead of queuing up a hrtimer to implement the
logic.

v3 : 1) Cleanly handle setting the decrementer after exiting out of stop
   states.
 2) Added a disable_callback feature to compute timeout whenever a
state is enabled or disabled instead of computing it every time in the fast
idle path.
 3) Use disable callback to recompute timeout whenever state usage
is changed for a state. Also, cleaned up the get_snooze_timeout
function.

v4 :Changed the type and name of set/reset decrementer function.
Handled irq work pending in try_set_dec_before_idle.
No change in patch 2 and 3.

v5 :Removed forced wakeup for the last state. We don't want to wake up
unnecessarily when already in the deepest state. It was a mistake in
previous patches that was found in recent experiments.
No change in patch 2 and 3.

Abhishek Goel (3):
  cpuidle-powernv : forced wakeup for stop states
  cpuidle : Add callback whenever a state usage is enabled/disabled
  cpuidle-powernv : Recompute the idle-state timeouts when state usage
is enabled/disabled

 arch/powerpc/include/asm/time.h   |  2 ++
 arch/powerpc/kernel/time.c| 43 
 drivers/cpuidle/cpuidle-powernv.c | 55 +++
 drivers/cpuidle/sysfs.c   | 15 -
 include/linux/cpuidle.h   |  4 +++
 5 files changed, 105 insertions(+), 14 deletions(-)

-- 
2.17.1



[RFC v5 2/3] cpuidle : Add callback whenever a state usage is enabled/disabled

2019-10-02 Thread Abhishek Goel
To force wakeup of a CPU, we need to compute the timeout in the fast idle
path, as a state may be enabled or disabled, but there was no
feedback to the driver when a state is enabled or disabled.
This patch adds a callback whenever a state_usage records a store for the
disable attribute.

Signed-off-by: Abhishek Goel 
---
 drivers/cpuidle/sysfs.c | 15 ++-
 include/linux/cpuidle.h |  3 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
index 2bb2683b4..6c9bf2f7b 100644
--- a/drivers/cpuidle/sysfs.c
+++ b/drivers/cpuidle/sysfs.c
@@ -418,8 +418,21 @@ static ssize_t cpuidle_state_store(struct kobject *kobj, 
struct attribute *attr,
struct cpuidle_state_attr *cattr = attr_to_stateattr(attr);
struct cpuidle_device *dev = kobj_to_device(kobj);
 
-   if (cattr->store)
+   if (cattr->store) {
ret = cattr->store(state, state_usage, buf, size);
+   if (ret == size &&
+   strncmp(cattr->attr.name, "disable",
+   strlen("disable"))) {
+   struct kobject *cpuidle_kobj = kobj->parent;
+   struct cpuidle_device *dev =
+   to_cpuidle_device(cpuidle_kobj);
+   struct cpuidle_driver *drv =
+   cpuidle_get_cpu_driver(dev);
+
+   if (drv->disable_callback)
+   drv->disable_callback(dev, drv);
+   }
+   }
 
/* reset poll time cache */
dev->poll_limit_ns = 0;
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 4b6b5bea8..1729a497b 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -122,6 +122,9 @@ struct cpuidle_driver {
/* the driver handles the cpus in cpumask */
struct cpumask  *cpumask;
 
+   void (*disable_callback)(struct cpuidle_device *dev,
+struct cpuidle_driver *drv);
+
/* preferred governor to switch at register time */
const char  *governor;
 };
-- 
2.17.1



[RFC v5 3/3] cpuidle-powernv : Recompute the idle-state timeouts when state usage is enabled/disabled

2019-10-02 Thread Abhishek Goel
The disable callback can be used to compute the timeout for other states
whenever a state is enabled or disabled. We store the computed timeout
in "timeout" defined in the cpuidle state structure. So, we compute the
timeout only when some state is enabled or disabled, and not every time in
the fast idle path.
We also use the computed timeout to get the timeout for snooze, thus getting
rid of get_snooze_timeout for the snooze loop.

Signed-off-by: Abhishek Goel 
---
 drivers/cpuidle/cpuidle-powernv.c | 35 +++
 include/linux/cpuidle.h   |  1 +
 2 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index d7686ce6e..a75226f52 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -45,7 +45,6 @@ struct stop_psscr_table {
 static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] 
__read_mostly;
 
 static u64 default_snooze_timeout __read_mostly;
-static bool snooze_timeout_en __read_mostly;
 
 static u64 forced_wakeup_timeout(struct cpuidle_device *dev,
 struct cpuidle_driver *drv,
@@ -67,26 +66,13 @@ static u64 forced_wakeup_timeout(struct cpuidle_device *dev,
return 0;
 }
 
-static u64 get_snooze_timeout(struct cpuidle_device *dev,
- struct cpuidle_driver *drv,
- int index)
+static void pnv_disable_callback(struct cpuidle_device *dev,
+struct cpuidle_driver *drv)
 {
int i;
 
-   if (unlikely(!snooze_timeout_en))
-   return default_snooze_timeout;
-
-   for (i = index + 1; i < drv->state_count; i++) {
-   struct cpuidle_state *s = &drv->states[i];
-   struct cpuidle_state_usage *su = &dev->states_usage[i];
-
-   if (s->disabled || su->disable)
-   continue;
-
-   return s->target_residency * tb_ticks_per_usec;
-   }
-
-   return default_snooze_timeout;
+   for (i = 0; i < drv->state_count; i++)
+   drv->states[i].timeout = forced_wakeup_timeout(dev, drv, i);
 }
 
 static int snooze_loop(struct cpuidle_device *dev,
@@ -94,16 +80,20 @@ static int snooze_loop(struct cpuidle_device *dev,
int index)
 {
u64 snooze_exit_time;
+   u64 snooze_timeout = drv->states[index].timeout;
+
+   if (!snooze_timeout)
+   snooze_timeout = default_snooze_timeout;
 
set_thread_flag(TIF_POLLING_NRFLAG);
 
local_irq_enable();
 
-   snooze_exit_time = get_tb() + get_snooze_timeout(dev, drv, index);
+   snooze_exit_time = get_tb() + snooze_timeout;
ppc64_runlatch_off();
HMT_very_low();
while (!need_resched()) {
-   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+   if (get_tb() > snooze_exit_time) {
/*
 * Task has not woken up but we are exiting the polling
 * loop anyway. Require a barrier after polling is
@@ -168,7 +158,7 @@ static int stop_loop(struct cpuidle_device *dev,
u64 timeout_tb;
bool forced_wakeup = false;
 
-   timeout_tb = forced_wakeup_timeout(dev, drv, index);
+   timeout_tb = drv->states[index].timeout;
 
if (timeout_tb) {
/* Ensure that the timeout is at least one microsecond
@@ -263,6 +253,7 @@ static int powernv_cpuidle_driver_init(void)
 */
 
drv->cpumask = (struct cpumask *)cpu_present_mask;
+   drv->disable_callback = pnv_disable_callback;
 
return 0;
 }
@@ -422,8 +413,6 @@ static int powernv_idle_probe(void)
/* Device tree can indicate more idle states */
max_idle_state = powernv_add_idle_states();
default_snooze_timeout = TICK_USEC * tb_ticks_per_usec;
-   if (max_idle_state > 1)
-   snooze_timeout_en = true;
} else
return -ENODEV;
 
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 1729a497b..64195861b 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -50,6 +50,7 @@ struct cpuidle_state {
int power_usage; /* in mW */
unsigned inttarget_residency; /* in US */
booldisabled; /* disabled on all CPUs */
+   unsigned long long timeout; /* timeout for exiting out of a state */
 
int (*enter)(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
-- 
2.17.1



[PATCH v5 1/3] cpuidle-powernv : forced wakeup for stop states

2019-10-02 Thread Abhishek Goel
Currently, the cpuidle governors determine what idle state an idling CPU
should enter into based on heuristics that depend on the idle history on
that CPU. Given that no predictive heuristic is perfect, there are cases
where the governor predicts a shallow idle state, hoping that the CPU will
be busy soon. However, if no new workload is scheduled on that CPU in the
near future, the CPU may end up in the shallow state.

This is problematic when the predicted state in the aforementioned
scenario is a shallow stop state on a tickless system, as we might get
stuck in shallow states for hours, in the absence of ticks or interrupts.

To address this, we forcefully wake up the CPU by setting the
decrementer. The decrementer is set to a value that corresponds to the
residency of the next available state, thus firing up a timer that will
forcefully wake up the CPU. A few such iterations will essentially train the
governor to select a deeper state for that CPU, as the timer here
corresponds to the next available cpuidle state's residency. Thus, the CPU
will eventually end up in the deepest possible state.

Signed-off-by: Abhishek Goel 
---

Auto-promotion
  v1 : started as auto promotion logic for cpuidle states in generic
driver
  v2 : Removed timeout_needed and rebased the code to upstream kernel
Forced-wakeup
  v1 : New patch with name of forced wakeup started
  v2 : Extending the forced wakeup logic for all states. Setting the
decrementer instead of queuing up a hrtimer to implement the logic.
  v3 : Cleanly handle setting/resetting of decrementer so as to not break
irq work
  v4 : Changed type and name of set/reset decrementer function
   Handled irq_work_pending in try_set_dec_before_idle
  v5 : Removed forced wakeup for last stop state by changing the
   checking condition of timeout_tb

 arch/powerpc/include/asm/time.h   |  2 ++
 arch/powerpc/kernel/time.c| 43 +++
 drivers/cpuidle/cpuidle-powernv.c | 40 
 3 files changed, 85 insertions(+)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 08dbe3e68..06a6a2314 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -184,6 +184,8 @@ static inline unsigned long tb_ticks_since(unsigned long 
tstamp)
 extern u64 mulhdu(u64, u64);
 #endif
 
+extern bool try_set_dec_before_idle(u64 timeout);
+extern void try_reset_dec_after_idle(void);
 extern void div128_by_32(u64 dividend_high, u64 dividend_low,
 unsigned divisor, struct div_result *dr);
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 694522308..d004c0d8e 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -576,6 +576,49 @@ void arch_irq_work_raise(void)
 
 #endif /* CONFIG_IRQ_WORK */
 
+/*
+ * This function tries setting decrementer before entering into idle.
+ * Returns true if we have reprogrammed the decrementer for idle.
+ * Returns false if the decrementer is unchanged.
+ */
+bool try_set_dec_before_idle(u64 timeout)
+{
+   u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
+   u64 now = get_tb_or_rtc();
+
+   if (now + timeout > *next_tb)
+   return false;
+
+   set_dec(timeout);
+   if (test_irq_work_pending())
+   set_dec(1);
+
+   return true;
+}
+
+/*
+ * This function gets called if we have set decrementer before
+ * entering into idle. It tries to reset/restore the decrementer
+ * to its original value.
+ */
+void try_reset_dec_after_idle(void)
+{
+   u64 now;
+   u64 *next_tb;
+
+   if (test_irq_work_pending())
+   return;
+
+   now = get_tb_or_rtc();
+   next_tb = this_cpu_ptr(&decrementers_next_tb);
+   if (now >= *next_tb)
+   return;
+
+   set_dec(*next_tb - now);
+   if (test_irq_work_pending())
+   set_dec(1);
+}
+
 /*
  * timer_interrupt - gets called when the decrementer overflows,
  * with interrupts disabled.
diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 84b1ebe21..d7686ce6e 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Expose only those Hardware idle states via the cpuidle framework
@@ -46,6 +47,26 @@ static struct stop_psscr_table 
stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly
 static u64 default_snooze_timeout __read_mostly;
 static bool snooze_timeout_en __read_mostly;
 
+static u64 forced_wakeup_timeout(struct cpuidle_device *dev,
+struct cpuidle_driver *drv,
+int index)
+{
+   int i;
+
+   for (i = index + 1; i < drv->state_count; i++) {
+   struct cpuidle_state *s = &drv->states[i];
+   struct cpuidle_state_usage *su = &dev->states_usage[i];
+
+   if (s->disabled || su->disable)
+   c

Re: [PATCH 1/2] x86: math-emu: check __copy_from_user result

2019-10-02 Thread Kees Cook
On Wed, Oct 02, 2019 at 09:11:23AM +0200, Arnd Bergmann wrote:
> On Wed, Oct 2, 2019 at 1:39 AM Kees Cook  wrote:
> 
> > > diff --git a/arch/x86/math-emu/reg_ld_str.c 
> > > b/arch/x86/math-emu/reg_ld_str.c
> > > index f3779743d15e..fe6246ff9887 100644
> > > --- a/arch/x86/math-emu/reg_ld_str.c
> > > +++ b/arch/x86/math-emu/reg_ld_str.c
> > > @@ -85,7 +85,7 @@ int FPU_load_extended(long double __user *s, int stnr)
> > >
> > >   RE_ENTRANT_CHECK_OFF;
> > >   FPU_access_ok(s, 10);
> > > - __copy_from_user(sti_ptr, s, 10);
> > > + FPU_copy_from_user(sti_ptr, s, 10);
> >
> > These access_ok() checks seem redundant everywhere in this file (after
> > your switch from __copy* to copy*). I mean, I guess, just leave them, but
> > *shrug*
> 
> There have always been duplicate/inconsistent checks for the get_user/put_user
> case. I considered cleaning it all up but then decided to touch it as little
> as possible.

Yeah, at this point, I'd agree. :)

-- 
Kees Cook


Re: [PATCH] kasan: fix the missing underflow in memmove and memcpy with CONFIG_KASAN_GENERIC=y

2019-10-02 Thread Dmitry Vyukov
On Thu, Oct 3, 2019 at 4:18 AM Walter Wu  wrote:
>
> On Wed, 2019-10-02 at 15:57 +0200, Dmitry Vyukov wrote:
> > On Wed, Oct 2, 2019 at 2:15 PM Walter Wu  wrote:
> > >
> > > On Mon, 2019-09-30 at 12:36 +0800, Walter Wu wrote:
> > > > On Fri, 2019-09-27 at 21:41 +0200, Dmitry Vyukov wrote:
> > > > > On Fri, Sep 27, 2019 at 4:22 PM Walter Wu  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, 2019-09-27 at 15:07 +0200, Dmitry Vyukov wrote:
> > > > > > > On Fri, Sep 27, 2019 at 5:43 AM Walter Wu 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > memmove() and memcpy() have missing underflow issues.
> > > > > > > > When -7 <= size < 0, KASAN will fail to catch the
> > > > > > > > underflow issue.
> > > > > > > > It looks like the shadow start address and shadow end address
> > > > > > > > are the same,
> > > > > > > > so it does not actually check anything.
> > > > > > > >
> > > > > > > > The following test is indeed not caught by KASAN:
> > > > > > > >
> > > > > > > > char *p = kmalloc(64, GFP_KERNEL);
> > > > > > > > memset((char *)p, 0, 64);
> > > > > > > > memmove((char *)p, (char *)p + 4, -2);
> > > > > > > > kfree((char*)p);
> > > > > > > >
> > > > > > > > It should be checked here:
> > > > > > > >
> > > > > > > > void *memmove(void *dest, const void *src, size_t len)
> > > > > > > > {
> > > > > > > > check_memory_region((unsigned long)src, len, false, 
> > > > > > > > _RET_IP_);
> > > > > > > > check_memory_region((unsigned long)dest, len, true, 
> > > > > > > > _RET_IP_);
> > > > > > > >
> > > > > > > > return __memmove(dest, src, len);
> > > > > > > > }
> > > > > > > >
> > > > > > > > We fix how the shadow end address is calculated, so that
> > > > > > > > generic KASAN
> > > > > > > > gets the right shadow end address and detects this underflow
> > > > > > > > issue.
> > > > > > > >
> > > > > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=199341
> > > > > > > >
> > > > > > > > Signed-off-by: Walter Wu 
> > > > > > > > Reported-by: Dmitry Vyukov 
> > > > > > > > ---
> > > > > > > >  lib/test_kasan.c   | 36 
> > > > > > > >  mm/kasan/generic.c |  8 ++--
> > > > > > > >  2 files changed, 42 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/lib/test_kasan.c b/lib/test_kasan.c
> > > > > > > > index b63b367a94e8..8bd014852556 100644
> > > > > > > > --- a/lib/test_kasan.c
> > > > > > > > +++ b/lib/test_kasan.c
> > > > > > > > @@ -280,6 +280,40 @@ static noinline void __init 
> > > > > > > > kmalloc_oob_in_memset(void)
> > > > > > > > kfree(ptr);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static noinline void __init 
> > > > > > > > kmalloc_oob_in_memmove_underflow(void)
> > > > > > > > +{
> > > > > > > > +   char *ptr;
> > > > > > > > +   size_t size = 64;
> > > > > > > > +
> > > > > > > > +   pr_info("underflow out-of-bounds in memmove\n");
> > > > > > > > +   ptr = kmalloc(size, GFP_KERNEL);
> > > > > > > > +   if (!ptr) {
> > > > > > > > +   pr_err("Allocation failed\n");
> > > > > > > > +   return;
> > > > > > > > +   }
> > > > > > > > +
> > > > > > > > +   memset((char *)ptr, 0, 64);
> > > > > > > > +   memmove((char *)ptr, (char *)ptr + 4, -2);
> > > > > > > > +   kfree(ptr);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static noinline void __init 
> > > > > > > > kmalloc_oob_in_memmove_overflow(void)
> > > > > > > > +{
> > > > > > > > +   char *ptr;
> > > > > > > > +   size_t size = 64;
> > > > > > > > +
> > > > > > > > +   pr_info("overflow out-of-bounds in memmove\n");
> > > > > > > > +   ptr = kmalloc(size, GFP_KERNEL);
> > > > > > > > +   if (!ptr) {
> > > > > > > > +   pr_err("Allocation failed\n");
> > > > > > > > +   return;
> > > > > > > > +   }
> > > > > > > > +
> > > > > > > > +   memset((char *)ptr, 0, 64);
> > > > > > > > +   memmove((char *)ptr + size, (char *)ptr, 2);
> > > > > > > > +   kfree(ptr);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  static noinline void __init kmalloc_uaf(void)
> > > > > > > >  {
> > > > > > > > char *ptr;
> > > > > > > > @@ -734,6 +768,8 @@ static int __init kmalloc_tests_init(void)
> > > > > > > > kmalloc_oob_memset_4();
> > > > > > > > kmalloc_oob_memset_8();
> > > > > > > > kmalloc_oob_memset_16();
> > > > > > > > +   kmalloc_oob_in_memmove_underflow();
> > > > > > > > +   kmalloc_oob_in_memmove_overflow();
> > > > > > > > kmalloc_uaf();
> > > > > > > > kmalloc_uaf_memset();
> > > > > > > > kmalloc_uaf2();
> > > > > > > > diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
> > > > > > > > index 616f9dd82d12..34ca23d59e67 100644
> > > > > > > > --- a/mm/kasan/generic.c
> > > > > > > > +++ b/mm/kasan/generic.c
> > > > > > > > @@ -131,9 +131,13 @@ static __always_inline bool 
> > > > > > > > memory_is_poisoned_n(

KASAN: use-after-free Read in rds_inc_put

2019-10-02 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:a32db7e1 Add linux-next specific files for 20191002
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=10ff857b60
kernel config:  https://syzkaller.appspot.com/x/.config?x=599cf05035799eef
dashboard link: https://syzkaller.appspot.com/bug?extid=322126673a98080e677f
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+322126673a98080e6...@syzkaller.appspotmail.com

==
BUG: KASAN: use-after-free in rds_inc_put+0x141/0x150 net/rds/recv.c:82
Read of size 8 at addr 88806f5371b0 by task syz-executor.0/18418

CPU: 1 PID: 18418 Comm: syz-executor.0 Not tainted 5.4.0-rc1-next-20191002  
#0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:634
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 rds_inc_put+0x141/0x150 net/rds/recv.c:82
 rds_clear_recv_queue+0x157/0x380 net/rds/recv.c:770
 rds_release+0x117/0x3d0 net/rds/af_rds.c:73
 __sock_release+0xce/0x280 net/socket.c:591
 sock_close+0x1e/0x30 net/socket.c:1269
 __fput+0x2ff/0x890 fs/file_table.c:280
 fput+0x16/0x20 fs/file_table.c:313
 task_work_run+0x145/0x1c0 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x904/0x2e60 kernel/exit.c:817
 do_group_exit+0x135/0x360 kernel/exit.c:921
 get_signal+0x47c/0x2500 kernel/signal.c:2734
 do_signal+0x87/0x1700 arch/x86/kernel/signal.c:815
 exit_to_usermode_loop+0x286/0x380 arch/x86/entry/common.c:159
 prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
 do_syscall_64+0x65f/0x760 arch/x86/entry/common.c:300
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459a29
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00

RSP: 002b:7f4088663cf8 EFLAGS: 0246 ORIG_RAX: 00ca
RAX: fe00 RBX: 0075bf28 RCX: 00459a29
RDX:  RSI: 0080 RDI: 0075bf28
RBP: 0075bf20 R08:  R09: 
R10:  R11: 0246 R12: 0075bf2c
R13: 7ffe10e1ad8f R14: 7f40886649c0 R15: 0075bf2c

Allocated by task 12004:
 save_stack+0x23/0x90 mm/kasan/common.c:69
 set_track mm/kasan/common.c:77 [inline]
 __kasan_kmalloc mm/kasan/common.c:510 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:483
 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
 slab_post_alloc_hook mm/slab.h:584 [inline]
 slab_alloc mm/slab.c:3319 [inline]
 kmem_cache_alloc+0x121/0x710 mm/slab.c:3483
 kmem_cache_zalloc include/linux/slab.h:680 [inline]
 __rds_conn_create+0x63f/0x20b0 net/rds/connection.c:193
 rds_conn_create_outgoing+0x4b/0x60 net/rds/connection.c:351
 rds_sendmsg+0x19a4/0x35b0 net/rds/send.c:1294
 sock_sendmsg_nosec net/socket.c:638 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:658
 __sys_sendto+0x262/0x380 net/socket.c:1953
 __do_sys_sendto net/socket.c:1965 [inline]
 __se_sys_sendto net/socket.c:1961 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1961
 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 9330:
 save_stack+0x23/0x90 mm/kasan/common.c:69
 set_track mm/kasan/common.c:77 [inline]
 kasan_set_free_info mm/kasan/common.c:332 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:471
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:480
 __cache_free mm/slab.c:3425 [inline]
 kmem_cache_free+0x86/0x320 mm/slab.c:3693
 rds_conn_destroy+0x61f/0x880 net/rds/connection.c:501
 rds_loop_kill_conns net/rds/loop.c:213 [inline]
 rds_loop_exit_net+0x2fc/0x4a0 net/rds/loop.c:219
 ops_exit_list.isra.0+0xaa/0x150 net/core/net_namespace.c:172
 cleanup_net+0x4e2/0xa60 net/core/net_namespace.c:594
 process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x361/0x430 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at 88806f537168
 which belongs to the cache rds_connection of size 232
The buggy address is located 72 bytes inside of
 232-byte region [88806f537168, 88806f537250)
The buggy address belongs to the page:
page:ea0001bd4dc0 refcount:1 mapcount:0 mapping:88809bcbd1c0  
index:0x8

Re: [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures

2019-10-02 Thread Alexey Kardashevskiy



On 13/08/2019 02:06, Catalin Marinas wrote:
> Object scan areas are an optimisation aimed to decrease the false
> positives and slightly improve the scanning time of large objects known
> to only have a few specific pointers. If a struct scan_area fails to
> allocate, kmemleak can still function normally by scanning the full
> object.
> 
> Introduce an OBJECT_FULL_SCAN flag and mark objects as such when
> scan_area allocation fails.


I came across this one while bisecting a sudden drop in throughput of a 100Gbit
Mellanox CX4 ethernet card in a PPC POWER9
system; the speed dropped from 100Gbit to about 40Gbit. Bisect pointed at
dba82d943177; these are the relevant config
options:

[fstn1-p1 kernel]$ grep KMEMLEAK ~/pbuild/kernel-le-4g/.config
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=16000
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y

Setting CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400 or even 4000 (this is what 
KMEMLEAK_EARLY_LOG_SIZE is now in the master)
produces soft lockups on the recent upstream (sha1 a3c0e7b1fe1f):

[c01fde64fb60] [c0c24ed4] _raw_write_unlock_irqrestore+0x54/0x70
[c01fde64fb90] [c04117e4] find_and_remove_object+0xa4/0xd0
[c01fde64fbe0] [c0411c74] delete_object_full+0x24/0x50
[c01fde64fc00] [c0411d28] __kmemleak_do_cleanup+0x88/0xd0
[c01fde64fc40] [c012a1a4] process_one_work+0x374/0x6a0
[c01fde64fd20] [c012a548] worker_thread+0x78/0x5a0
[c01fde64fdb0] [c0135508] kthread+0x198/0x1a0
[c01fde64fe20] [c000b980] ret_from_kernel_thread+0x5c/0x7c

KMEMLEAK_EARLY_LOG_SIZE=8000 works, but it is slow.

Interestingly KMEMLEAK_EARLY_LOG_SIZE=400 on dba82d943177 still worked and I 
saw my 100Gbit. Disabling KMEMLEAK also
fixes the speed (apparently).

Is that something expected? Thanks,



> 
> Signed-off-by: Catalin Marinas 
> ---
>  mm/kmemleak.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index f6e602918dac..5ba7fad00fda 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -168,6 +168,8 @@ struct kmemleak_object {
>  #define OBJECT_REPORTED  (1 << 1)
>  /* flag set to not scan the object */
>  #define OBJECT_NO_SCAN   (1 << 2)
> +/* flag set to fully scan the object when scan_area allocation failed */
> +#define OBJECT_FULL_SCAN (1 << 3)
>  
>  #define HEX_PREFIX   ""
>  /* number of bytes to print per line; must be 16 or 32 */
> @@ -773,12 +775,14 @@ static void add_scan_area(unsigned long ptr, size_t 
> size, gfp_t gfp)
>   }
>  
>   area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
> - if (!area) {
> - pr_warn("Cannot allocate a scan area\n");
> - goto out;
> - }
>  
>   spin_lock_irqsave(&object->lock, flags);
> + if (!area) {
> + pr_warn_once("Cannot allocate a scan area, scanning the full 
> object\n");
> + /* mark the object for full scan to avoid false positives */
> + object->flags |= OBJECT_FULL_SCAN;
> + goto out_unlock;
> + }
>   if (size == SIZE_MAX) {
>   size = object->pointer + object->size - ptr;
>   } else if (ptr + size > object->pointer + object->size) {
> @@ -795,7 +799,6 @@ static void add_scan_area(unsigned long ptr, size_t size, 
> gfp_t gfp)
>   hlist_add_head(&area->node, &object->area_list);
>  out_unlock:
>   spin_unlock_irqrestore(&object->lock, flags);
> -out:
>   put_object(object);
>  }
>  
> @@ -1408,7 +1411,8 @@ static void scan_object(struct kmemleak_object *object)
>   if (!(object->flags & OBJECT_ALLOCATED))
>   /* already freed object */
>   goto out;
> - if (hlist_empty(&object->area_list)) {
> + if (hlist_empty(&object->area_list) ||
> + object->flags & OBJECT_FULL_SCAN) {
>   void *start = (void *)object->pointer;
>   void *end = (void *)(object->pointer + object->size);
>   void *next;
> 

-- 
Alexey


Re: [PATCH] rt2x00: remove input-polldev.h header

2019-10-02 Thread Kalle Valo
Dmitry Torokhov  writes:

> The driver does not use the input subsystem, so we do not need this header,
> and it is being removed, so stop pulling it in.
>
> Signed-off-by: Dmitry Torokhov 

I'll queue this for v5.4.

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


Re: [PATCH 6/6] arm64: tegra: Enable XUSB host in P2972-0000 board

2019-10-02 Thread JC Kuo
On 10/2/19 6:26 PM, Thierry Reding wrote:
> On Wed, Oct 02, 2019 at 04:00:51PM +0800, JC Kuo wrote:
>> This commit enables the XUSB host and pad controller on the Tegra194
>> P2972-0000 board.
>>
>> Signed-off-by: JC Kuo 
>> ---
>>  .../arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 31 +-
>>  .../boot/dts/nvidia/tegra194-p2972-0000.dts   | 59 +++
>>  2 files changed, 89 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi 
>> b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
>> index 4c38426a6969..cb236edc6a0d 100644
>> --- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
>> +++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
>> @@ -229,7 +229,7 @@
>>  regulator-max-microvolt = 
>> <330>;
>>  };
>>  
>> -ldo5 {
>> +vdd_usb_3v3: ldo5 {
>>  regulator-name = "VDD_USB_3V3";
>>  regulator-min-microvolt = 
>> <330>;
>>  regulator-max-microvolt = 
>> <330>;
>> @@ -313,5 +313,34 @@
>>  regulator-boot-on;
>>  enable-active-low;
>>  };
>> +
>> +vdd_5v_sata: regulator@4 {
>> +compatible = "regulator-fixed";
>> +reg = <4>;
>> +
>> +regulator-name = "vdd-5v-sata";
> 
> Please keep capitalization of regulator names consistent. We use
> all-caps and underscores for the others (which mirrors the names in the
> schematics), so please stick with that for this one as well.
> 
Sure. I will fix this.
> Also, I'm wondering if perhaps you can clarify something here. My
> understanding is that SATA functionality is provided via a controller on
> the PCI bus. Why is it that we route the 5 V SATA power to the USB port?
> Oh wait... this is one of those eSATAp (hybrid) ports that can take
> either eSATA or USB, right? Do we need any additional setup to switch
> between eSATA and USB modes? Or is this all done in hardware? That is,
> if I plug in an eSATA, does it automatically hotplug detect the device
> as SATA and if I plug in a USB device, does it automatically detect it
> as USB?
> 
Yes, this 5V supply is for the eSATAp port. The eSATA cable will deliver the 5V to
the SATA device. No SATA/USB switch is required, as USB SuperSpeed and the PCIe-to-SATA
controller each have their own UPHY lane. The SATA TX/RX pairs also have dedicated pins
on the eSATAp connector. An external SATA drive can be hotplugged and the hardware will
detect it automatically.

>> +regulator-min-microvolt = <500>;
>> +regulator-max-microvolt = <500>;
>> +gpio = <&gpio TEGRA194_MAIN_GPIO(Z, 1) GPIO_ACTIVE_LOW>;
> 
> This will actually cause a warning on boot. For fixed regulators the
> polarity of the enable GPIO is not specified in the GPIO specifier.
> Instead you're supposed to use the boolean enable-active-high property
> to specify if the enable GPIO is active-high. By default the enable GPIO
> is considered to be active-low. The GPIO specifier needs to have the
> GPIO_ACTIVE_HIGH flag set regardless for backwards-compatibility
> reasons.
> 
> Note that regulator@3 above your new entry does this wrongly, but
> next-20191002 should have a fix for that.
> 
Thanks for the information. I will fix this in the next revision.
>> +};
>> +};
>> +
>> +padctl@352 {
> 
> Don't forget to move this into /cbb as well to match the changes to
> patch 5/6.
> 
Sure, will do.
>> +avdd-usb-supply = <&vdd_usb_3v3>;
>> +vclamp-usb-supply = <&vdd_1v8ao>;
>> +ports {
> 
> Blank line between the above two for better readability.
> 
Okay.
>> +usb2-1 {
>> +vbus-supply = <&vdd_5v0_sys>;
>> +};
>> +usb2-3 {
> 
> Same here and below.
> 
>> +vbus-supply = <&vdd_5v_sata>;
>> +};
>> +usb3-0 {
>> +vbus-supply = <&vdd_5v0_sys>;
>> +};
>> +usb3-3 {
>> +vbus-supply = <&vdd_5v0_sys&g

Re: [PATCH 2/3] x86/alternatives,jump_label: Provide better text_poke() batching interface

2019-10-02 Thread Masami Hiramatsu
Hi Peter,

On Tue, 27 Aug 2019 20:06:24 +0200
Peter Zijlstra  wrote:

> Adding another text_poke_bp_batch() user made me realize the interface
> is all sorts of wrong. The text poke vector should be internal to the
> implementation.
> 
> This then results in a trivial interface:
> 
>   text_poke_queue()  - which has the 'normal' text_poke_bp() interface
>   text_poke_finish() - which takes no arguments and flushes any
>pending text_poke()s.

Looks good to me. Maybe it would be easy to apply to optprobes too.
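For illustration, a caller-side sketch of the proposed interface (a
hypothetical call site, not taken from the patch):

struct patch_site {				/* hypothetical descriptor */
	void *addr;
	u8 opcode[POKE_MAX_OPCODE_SIZE];
	size_t len;
};

static void example_patch_sites(struct patch_site *sites, int nr)
{
	int i;

	/* Queue the patches in ascending address order... */
	for (i = 0; i < nr; i++)
		text_poke_queue(sites[i].addr, sites[i].opcode,
				sites[i].len, NULL);

	/* ...then flush them with a single int3 + sync-cores round. */
	text_poke_finish();
}

Out-of-order or overflowing batches are handled internally by flushing
early, as the tp_order_fail()/text_poke_flush() helpers below show.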

Reviewed-by: Masami Hiramatsu 

Thank you,

> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Cc: Steven Rostedt 
> Cc: Daniel Bristot de Oliveira 
> Cc: Masami Hiramatsu 
> ---
>  arch/x86/include/asm/text-patching.h |   16 ++-
>  arch/x86/kernel/alternative.c|   64 +---
>  arch/x86/kernel/jump_label.c |   80 
> +--
>  3 files changed, 84 insertions(+), 76 deletions(-)
> 
> --- a/arch/x86/include/asm/text-patching.h
> +++ b/arch/x86/include/asm/text-patching.h
> @@ -25,14 +25,6 @@ static inline void apply_paravirt(struct
>   */
>  #define POKE_MAX_OPCODE_SIZE 5
>  
> -struct text_poke_loc {
> - void *addr;
> - int len;
> - s32 rel32;
> - u8 opcode;
> - const char text[POKE_MAX_OPCODE_SIZE];
> -};
> -
>  extern void text_poke_early(void *addr, const void *opcode, size_t len);
>  
>  /*
> @@ -53,13 +45,15 @@ extern void *text_poke(void *addr, const
>  extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
>  extern int poke_int3_handler(struct pt_regs *regs);
>  extern void text_poke_bp(void *addr, const void *opcode, size_t len, const 
> void *emulate);
> -extern void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int 
> nr_entries);
> -extern void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
> -const void *opcode, size_t len, const void 
> *emulate);
> +
> +extern void text_poke_queue(void *addr, const void *opcode, size_t len, 
> const void *emulate);
> +extern void text_poke_finish(void);
> +
>  extern int after_bootmem;
>  extern __ro_after_init struct mm_struct *poking_mm;
>  extern __ro_after_init unsigned long poking_addr;
>  
> +
>  #ifndef CONFIG_UML_X86
>  static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
>  {
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -936,6 +936,14 @@ static void do_sync_core(void *info)
>   sync_core();
>  }
>  
> +struct text_poke_loc {
> + void *addr;
> + int len;
> + s32 rel32;
> + u8 opcode;
> + const char text[POKE_MAX_OPCODE_SIZE];
> +};
> +
>  static struct bp_patching_desc {
>   struct text_poke_loc *vec;
>   int nr_entries;
> @@ -1017,6 +1025,10 @@ int poke_int3_handler(struct pt_regs *re
>  }
>  NOKPROBE_SYMBOL(poke_int3_handler);
>  
> +#define TP_VEC_MAX (PAGE_SIZE / sizeof(struct text_poke_loc))
> +static struct text_poke_loc tp_vec[TP_VEC_MAX];
> +static int tp_vec_nr;
> +
>  /**
>   * text_poke_bp_batch() -- update instructions on live kernel on SMP
>   * @tp:  vector of instructions to patch
> @@ -1038,7 +1050,7 @@ NOKPROBE_SYMBOL(poke_int3_handler);
>   * replacing opcode
>   *   - sync cores
>   */
> -void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
> +static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int 
> nr_entries)
>  {
>   unsigned char int3 = INT3_INSN_OPCODE;
>   int patched_all_but_first = 0;
> @@ -1105,11 +1117,7 @@ void text_poke_loc_init(struct text_poke
>  {
>   struct insn insn;
>  
> - if (!opcode)
> - opcode = (void *)tp->text;
> - else
> - memcpy((void *)tp->text, opcode, len);
> -
> + memcpy((void *)tp->text, opcode, len);
>   if (!emulate)
>   emulate = opcode;
>  
> @@ -1147,6 +1155,50 @@ void text_poke_loc_init(struct text_poke
>   }
>  }
>  
> +/*
> + * We hard rely on the tp_vec being ordered; ensure this is so by flushing
> + * early if needed.
> + */
> +static bool tp_order_fail(void *addr)
> +{
> + struct text_poke_loc *tp;
> +
> + if (!tp_vec_nr)
> + return false;
> +
> + if (!addr) /* force */
> + return true;
> +
> + tp = &tp_vec[tp_vec_nr - 1];
> + if ((unsigned long)tp->addr > (unsigned long)addr)
> + return true;
> +
> + return false;
> +}
> +
> +static void text_poke_flush(void *addr)
> +{
> + if (tp_vec_nr == TP_VEC_MAX || tp_order_fail(addr)) {
> + text_poke_bp_batch(tp_vec, tp_vec_nr);
> + tp_vec_nr = 0;
> + }
> +}
> +
> +void text_poke_finish(void)
> +{
> + text_poke_flush(NULL);
> +}
> +
> +void text_poke_queue(void *addr, const void *opcode, size_t len, const void 
> *emulate)
> +{
> + struct text_poke_loc *tp;
> +
> + text_poke_flush(addr);
> +
> + tp = &tp_vec[tp_vec_nr++];
> + text_poke_loc_init(tp,

Re: [PATCH 3/3] x86/ftrace: Use text_poke()

2019-10-02 Thread Masami Hiramatsu
On Wed, 2 Oct 2019 18:35:26 +0200
Daniel Bristot de Oliveira  wrote:

> ftrace was already batching the updates, for instance, causing 3 IPIs to 
> enable
> all functions. The text_poke() batching also works. But because of the limited
> buffer [ see the reply to the patch 2/3 ], it is flushing the buffer during 
> the
> operation, causing more IPIs than the previous code. Using the 5.4-rc1 in a 
> VM,
> when enabling the function tracer, I see 250+ intermediate text_poke_finish()
> because of a full buffer...
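
(For a rough sanity check of that figure, assuming 4 KiB pages and a
~24-byte struct text_poke_loc: TP_VEC_MAX = PAGE_SIZE / sizeof(struct
text_poke_loc) ≈ 4096 / 24 ≈ 170 entries per batch, and with the 40,000+
ftrace sites of a typical distro kernel that works out to roughly 240-265
intermediate flushes, which is consistent with the 250+ reported above.)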

Do you have any performance numbers comparing the previous code against this patch applied?

Thank you,


-- 
Masami Hiramatsu 


RE: [PATCH] Input: hyperv-keyboard: Add the support of hibernation

2019-10-02 Thread Dexuan Cui
> From: dmitry.torok...@gmail.com 
> Sent: Monday, September 30, 2019 4:07 PM
> 
> On Mon, Sep 30, 2019 at 10:09:27PM +, Dexuan Cui wrote:
> > > From: dmitry.torok...@gmail.com 
> > > Sent: Friday, September 27, 2019 5:32 PM
> > > > ...
> > > > pm_wakeup_pending() is tested in a lot of places in the suspend
> > > > process and eventually an unintentional keystroke (or mouse movement,
> > > > when it comes to the Hyper-V mouse driver drivers/hid/hid-hyperv.c)
> > > > causes the whole hibernation process to be aborted. Usually this
> > > > behavior is not expected by the user, I think.
> > >
> > > Why not? If a device is configured as wakeup source, then it activity
> > > should wake up the system, unless you disable it.
> >
> > Generally speaking, I agree, but compared to a physical machine, IMO
> > the scenario is a little different when it comes to a VM running on Hyper-V:
> > on the host there is a window that represents the VM, and the user can
> > unintentionally switch the keyboard input focus to the window (or move
> > the mouse/cursor over the window) and then the host automatically
> > sends some special keystrokes (and mouse events) , and this aborts the
> > hibernation process.
> >
> > And, when it comes to the Hyper-V mouse device, IMO it's easy for the
> > user to unintentionally move the mouse after the "hibernation" button
> > is clicked. I suppose a physical machine would have the same issue, though.
> 
> If waking the machine up by mouse/keyboard activity is not desired in
> Hyper-V environment, then simply disable them as wakeup sources.

Sorry for the late reply! I have been sidetracked by something else...

Several years ago, we marked the Hyper-V mouse/keyboard devices as wakeup
sources to fix exactly such a bug: the VM could not be woken up after we ran
"echo freeze > /sys/power/state". IMO we should keep the mouse/keyboard
as wakeup sources.
 
> >
> > > > So, I use the notifier to set the flag variable and with it the driver 
> > > > can
> > > > know when it should not call pm_wakeup_hard_event().
> > >
> > > No, please implement hibernation support properly, as notifier + flag is
> > > a hack.
> >
> > The keyboard/mouse driver can avoid the flag by disabling the
> > keyboard/mouse event handling, but the problem is that they don't know
> > when exactly they should disable the event handling. I think the PM
> > notifier is the only way to tell the drivers a hibernation process is 
> > ongoing.
> 
> Whatever initiates hibernation (in userspace) can adjust wakeup sources
> as needed if you want them disabled completely.

Good to know this! I just found the userspace is able to disable the Hyper-V
mouse/keyboard as wakeup sources by something like:

echo disabled >  /sys/bus/vmbus/devices/XXX/power/wakeup
(XXX is the device GUID).
 
> >
> > Do you think this idea (notifier + disabling event handling) is acceptable?
> 
> No, I believe this a hack, that is why I am pushing back on this.

Ok, I think we can get rid of the notifier completely, and tell the users to 
disable
the 2 wakeup sources, if they think the wakeup behavior is undesired.
 
> >
> > If not, then I'll have to remove the notifier completely, and document this 
> > as
> > a known issue to the user: when a hibernation process is started, be careful
> > to not switch input focus and not touch the keyboard/mouse until the
> > hibernation process is finished. :-)
> >
> > > In this particular case you do not want to have your
> > > hv_kbd_resume() to be called in place of pm_ops->thaw() as that is what
> > > reenables the keyboard vmbus channel and causes the undesired wakeup
> > > events.
> >
> > This is only part of the issues. Another example: before the
> > pm_ops()->freeze()'s of all the devices are called, pm_wakeup_pending()
> > is already tested in a lot of places (e.g. in try_to_freeze_tasks ()) in the
> > suspend process, and can abort the whole suspend process upon the user's
> > unintentional input focus switch, keystroke and mouse movement.
> 
> How long is the prepare() phase on your systems? 

I have no specific data, but I know it's fast.

> User may wiggle mouse at any time really, even before the notifier fires up.

This doesn't matter, because the counter "pm_abort_suspend" is cleared at
a later place. The code path is:

hibernate() ->
  __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls)
  freeze_processes() ->
pm_wakeup_clear() -> 
  atomic_set(&pm_abort_suspend, 0);

This patch sets the flag in the PM_HIBERNATION_PREPARE notifier, so
there is no race.

Since I'm going to get rid of the notifier, we don't care at all about this now.
 
> >
> > > Your vmbus implementation should allow individual drivers to
> > > control the set of PM operations that they wish to use, instead of
> > > forcing everything through suspend/resume.
> > >
> > > Dmitry
> >
> > Since the devices are pure software-emulated devices, no PM operation was
> > supported in the past, and now suspend/resume are the only two PM
> operat

Re: [PATCH v2 00/21] Refine memblock API

2019-10-02 Thread Mike Rapoport
(trimmed the CC)

On Wed, Oct 02, 2019 at 06:14:11AM -0500, Adam Ford wrote:
> On Wed, Oct 2, 2019 at 2:36 AM Mike Rapoport  wrote:
> >
> 
> Before the patch:
> 
> # cat /sys/kernel/debug/memblock/memory
>0: 0x1000..0x8fff
> # cat /sys/kernel/debug/memblock/reserved
>0: 0x10004000..0x10007fff
>   34: 0x2f88..0x3fff
> 
> 
> After the patch:
> # cat /sys/kernel/debug/memblock/memory
>0: 0x1000..0x8fff
> # cat /sys/kernel/debug/memblock/reserved
>0: 0x10004000..0x10007fff
>   36: 0x8000..0x8fff

I'm still not convinced that the memblock refactoring didn't uncover an
issue in the etnaviv driver.

Why does moving the CMA area from 0x8000 to 0x3000 make it fail?

BTW, the code that complained about "command buffer outside valid memory
window" has been removed by the commit 17e4660ae3d7 ("drm/etnaviv:
implement per-process address spaces on MMUv2"). 

Could it be that the recent changes to etnaviv's MMU management resolve the
issue?
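
For context, a sketch of the behavioural difference being restored by the
patch quoted below (the calls are assumed to be made from early boot code;
this is not taken from the patch itself):

#include <linux/memblock.h>
#include <linux/sizes.h>

void __init memblock_limit_example(void)
{
	memblock_set_current_limit(SZ_512M);

	/* returns a virtual address; still clamped to current_limit */
	void *buf = memblock_alloc(SZ_1M, SMP_CACHE_BYTES);

	/* returns a physical address; with the fix it is no longer clamped,
	 * so the region may land above the 512M limit again */
	phys_addr_t pa = memblock_phys_alloc(SZ_1M, SMP_CACHE_BYTES);

	(void)buf;
	(void)pa;
}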

> > From 06529f861772b7dea2912fc2245debe4690139b8 Mon Sep 17 00:00:00 2001
> > From: Mike Rapoport 
> > Date: Wed, 2 Oct 2019 10:14:17 +0300
> > Subject: [PATCH] mm: memblock: do not enforce current limit for 
> > memblock_phys*
> >  family
> >
> > Until commit 92d12f9544b7 ("memblock: refactor internal allocation
> > functions") the maximal address for memblock allocations was forced to
> > memblock.current_limit only for the allocation functions returning virtual
> > address. The changes introduced by that commit moved the limit enforcement
> > into the allocation core and as a result the allocation functions returning
> > physical address also started to limit allocations to
> > memblock.current_limit.
> >
> > This caused breakage of etnaviv GPU driver:
> >
> > [3.682347] etnaviv etnaviv: bound 13.gpu (ops gpu_ops)
> > [3.688669] etnaviv etnaviv: bound 134000.gpu (ops gpu_ops)
> > [3.695099] etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops)
> > [3.700800] etnaviv-gpu 13.gpu: model: GC2000, revision: 5108
> > [3.723013] etnaviv-gpu 13.gpu: command buffer outside valid
> > memory window
> > [3.731308] etnaviv-gpu 134000.gpu: model: GC320, revision: 5007
> > [3.752437] etnaviv-gpu 134000.gpu: command buffer outside valid
> > memory window
> > [3.760583] etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
> > [3.766766] etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0
> >
> > Restore the behaviour of memblock_phys* family so that these functions will
> > not enforce memblock.current_limit.
> >
> 
> This fixed the issue.  Thank you
> 
> Tested-by: Adam Ford  #imx6q-logicpd
> 
> > Fixes: 92d12f9544b7 ("memblock: refactor internal allocation functions")
> > Reported-by: Adam Ford 
> > Signed-off-by: Mike Rapoport 
> > ---
> >  mm/memblock.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 7d4f61a..c4b16ca 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -1356,9 +1356,6 @@ static phys_addr_t __init 
> > memblock_alloc_range_nid(phys_addr_t size,
> > align = SMP_CACHE_BYTES;
> > }
> >
> > -   if (end > memblock.current_limit)
> > -   end = memblock.current_limit;
> > -
> >  again:
> > found = memblock_find_in_range_node(size, align, start, end, nid,
> > flags);
> > @@ -1469,6 +1466,9 @@ static void * __init memblock_alloc_internal(
> > if (WARN_ON_ONCE(slab_is_available()))
> > return kzalloc_node(size, GFP_NOWAIT, nid);
> >
> > +   if (max_addr > memblock.current_limit)
> > +   max_addr = memblock.current_limit;
> > +
> > alloc = memblock_alloc_range_nid(size, align, min_addr, max_addr, 
> > nid);
> >
> > /* retry allocation without lower limit */
> > --
> > 2.7.4
> >
> >
> > > > adam
> > > >
> > > > On Sat, Sep 28, 2019 at 2:33 AM Mike Rapoport  
> > > > wrote:
> > > > >
> > > > > On Thu, Sep 26, 2019 at 02:35:53PM -0500, Adam Ford wrote:
> > > > > > On Thu, Sep 26, 2019 at 11:04 AM Mike Rapoport  
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Thu, Sep 26, 2019 at 08:09:52AM -0500, Adam Ford wrote:
> > > > > > > > On Wed, Sep 25, 2019 at 10:17 AM Fabio Estevam 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Sep 25, 2019 at 9:17 AM Adam Ford 
> > > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > > I tried cma=256M and noticed the cma dump at the beginning 
> > > > > > > > > > didn't
> > > > > > > > > > change.  Do we need to setup a reserved-memory node like
> > > > > > > > > > imx6ul-ccimx6ulsom.dtsi did?
> > > > > > > > >
> > > > > > > > > I don't think so.
> > > > > > > > >
> > > > > > > > > Were you able to identify what was the exact commit that 
> > > > > > > > > caused such regression?
> > > > > > > >
> > > > > > > > I was able to narrow it down the 92d12f9544b7 ("memblock: 
> > > > > > 

Re: [PATCH v3 2/3] media: i2c: Add IMX290 CMOS image sensor driver

2019-10-02 Thread Manivannan Sadhasivam
Hi Sakari,

On Wed, Oct 02, 2019 at 01:37:15PM +0300, Sakari Ailus wrote:
> Hi Manivannan,
> 
> On Wed, Oct 02, 2019 at 12:12:00AM +0530, Manivannan Sadhasivam wrote:
> > Hi Sakari,
> > 
> > On Mon, Sep 23, 2019 at 12:22:09PM +0300, Sakari Ailus wrote:
> > > Hi Manivannan,
> > > 
> > > On Fri, Aug 30, 2019 at 02:49:42PM +0530, Manivannan Sadhasivam wrote:
> > > > Add driver for Sony IMX290 CMOS image sensor driver. The driver only
> > > > supports I2C interface for programming and MIPI CSI-2 for sensor output.
> > > > 
> > > > Signed-off-by: Manivannan Sadhasivam 
> > > > ---
> > > >  drivers/media/i2c/Kconfig  |  11 +
> > > >  drivers/media/i2c/Makefile |   1 +
> > > >  drivers/media/i2c/imx290.c | 881 +
> > > >  3 files changed, 893 insertions(+)
> > > >  create mode 100644 drivers/media/i2c/imx290.c
> > > > 
> > > > diff --git a/drivers/media/i2c/Kconfig b/drivers/media/i2c/Kconfig
> > > > index 79ce9ec6fc1b..4ebb80b18748 100644
> > > > --- a/drivers/media/i2c/Kconfig
> > > > +++ b/drivers/media/i2c/Kconfig
> > > > @@ -595,6 +595,17 @@ config VIDEO_IMX274
> > > >   This is a V4L2 sensor driver for the Sony IMX274
> > > >   CMOS image sensor.
> > > >  
> > > > +config VIDEO_IMX290
> > > > +   tristate "Sony IMX290 sensor support"
> > > > +   depends on I2C && VIDEO_V4L2 && VIDEO_V4L2_SUBDEV_API
> > > > +   depends on MEDIA_CAMERA_SUPPORT
> > > 
> > > Please drop this line. It will be redundant very soon.
> > > 
> > 
> > okay.
> > 
> > > > +   help
> > > > + This is a Video4Linux2 sensor driver for the Sony
> > > > + IMX290 camera sensor.
> > > > +
> > > > + To compile this driver as a module, choose M here: the
> > > > + module will be called imx290.
> > > > +
> > > >  config VIDEO_IMX319
> > > > tristate "Sony IMX319 sensor support"
> > > > depends on I2C && VIDEO_V4L2 && VIDEO_V4L2_SUBDEV_API
> > > > diff --git a/drivers/media/i2c/Makefile b/drivers/media/i2c/Makefile
> > > > index fd4ea86dedd5..04411ddb4922 100644
> > > > --- a/drivers/media/i2c/Makefile
> > > > +++ b/drivers/media/i2c/Makefile
> > > > @@ -111,6 +111,7 @@ obj-$(CONFIG_VIDEO_TC358743)+= tc358743.o
> > > >  obj-$(CONFIG_VIDEO_IMX214) += imx214.o
> > > >  obj-$(CONFIG_VIDEO_IMX258) += imx258.o
> > > >  obj-$(CONFIG_VIDEO_IMX274) += imx274.o
> > > > +obj-$(CONFIG_VIDEO_IMX290) += imx290.o
> > > >  obj-$(CONFIG_VIDEO_IMX319) += imx319.o
> > > >  obj-$(CONFIG_VIDEO_IMX355) += imx355.o
> > > >  obj-$(CONFIG_VIDEO_ST_MIPID02) += st-mipid02.o
> > > > diff --git a/drivers/media/i2c/imx290.c b/drivers/media/i2c/imx290.c
> > > > new file mode 100644
> > > > index ..db5bb0d69eb8
> > > > --- /dev/null
> > > > +++ b/drivers/media/i2c/imx290.c
> > > > @@ -0,0 +1,881 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * Sony IMX290 CMOS Image Sensor Driver
> > > > + *
> > > > + * Copyright (C) 2019 FRAMOS GmbH.
> > > > + *
> > > > + * Copyright (C) 2019 Linaro Ltd.
> > > > + * Author: Manivannan Sadhasivam 
> > > > + */
> > > > +
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +
> > > > +#define IMX290_STANDBY 0x3000
> > > > +#define IMX290_REGHOLD 0x3001
> > > > +#define IMX290_XMSTA 0x3002
> > > > +#define IMX290_GAIN 0x3014
> > > > +
> > > > +#define IMX290_DEFAULT_LINK_FREQ 44550
> > > > +
> > > > +static const char * const imx290_supply_name[] = {
> > > > +   "vdda",
> > > > +   "vddd",
> > > > +   "vdddo",
> > > > +};
> > > > +
> > > > +#define IMX290_NUM_SUPPLIES ARRAY_SIZE(imx290_supply_name)
> > > > +
> > > > +struct imx290_regval {
> > > > +   u16 reg;
> > > > +   u8 val;
> > > > +};
> > > > +
> > > > +struct imx290_mode {
> > > > +   u32 width;
> > > > +   u32 height;
> > > > +   u32 pixel_rate;
> > > > +   u32 link_freq_index;
> > > > +
> > > > +   const struct imx290_regval *data;
> > > > +   u32 data_size;
> > > > +};
> > > > +
> > > > +struct imx290 {
> > > > +   struct device *dev;
> > > > +   struct clk *xclk;
> > > > +   struct regmap *regmap;
> > > > +
> > > > +   struct v4l2_subdev sd;
> > > > +   struct v4l2_fwnode_endpoint ep;
> > > > +   struct media_pad pad;
> > > > +   struct v4l2_mbus_framefmt current_format;
> > > > +   const struct imx290_mode *current_mode;
> > > > +
> > > > +   struct regulator_bulk_data supplies[IMX290_NUM_SUPPLIES];
> > > > +   struct gpio_desc *rst_gpio;
> > > > +
> > > > +   struct v4l2_ctrl_handler ctrls;
> > > > +   struct v4l2_ctrl *link_freq;
> > > > +   struct v4l2_ctrl *pixel_rate;
> > > > +
> > > > +   struct mutex lock;
> > > > +};
> > > > +
> > > > +struct imx290_pixfmt {
> > > > +   u32 code;
> > > > +};
> >

Re: [PATCH v2 3/3] RISC-V: Move SBI related macros under uapi.

2019-10-02 Thread Anup Patel
On Sat, Sep 28, 2019 at 3:51 AM Christoph Hellwig  wrote:
>
> On Thu, Sep 26, 2019 at 05:09:15PM -0700, Atish Patra wrote:
> > All SBI related macros can be reused by KVM RISC-V and userspace tools
> > such as kvmtool, qemu-kvm. SBI calls can also be emulated by userspace
> > if required. Any future vendor extensions can leverage this to emulate
> > the specific extension in userspace instead of kernel.
>
> Just because userspace can use them that doesn't mean they are a
> userspace API.  Please don't do this as this limits how we can ever
> remove previously existing symbols.  Just copy over the current
> version of the file into the other project of your choice instead
> of creating and API we need to maintain.

These defines are indeed part of KVM userspace API because we will
be forwarding SBI calls not handled by KVM RISC-V kernel module to
KVM userspace (QEMU/KVMTOOL). The forwarded SBI call details
are passed to userspace via "struct kvm_run" of KVM_RUN ioctl.

Please refer PATCH17 and PATCH18 of KVM RISC-V v8 series.

Currently, we implement SBI v0.1 for the KVM Guest, hence we end up
forwarding CONSOLE_GETCHAR and CONSOLE_PUTCHAR to
KVM userspace.

In future we will implement SBI v0.2 for KVM Guest where we will be
forwarding the SBI v0.2 experimental and vendor extension calls
to KVM userspace.

Eventually, we will stop emulating SBI v0.1 for Guest once we have
all required calls in SBI v0.2. At that time, all SBI v0.1 calls will
always be forwarded to KVM userspace.

Regards,
Anup


Re: [rfc] mm, hugetlb: allow hugepage allocations to excessively reclaim

2019-10-02 Thread Michal Hocko
On Wed 02-10-19 16:03:03, David Rientjes wrote:
> Hugetlb allocations use __GFP_RETRY_MAYFAIL to aggressively attempt to get 
> hugepages that the user needs.  Commit b39d0ee2632d ("mm, page_alloc: 
> avoid expensive reclaim when compaction may not succeed") intends to 
> improve allocator behind for thp allocations to prevent excessive amounts 
> of reclaim especially when constrained to a single node.
> 
> Since hugetlb allocations have explicitly preferred to loop and do reclaim 
> and compaction, exempt them from this new behavior at least for the time 
> being.  It is not shown that hugetlb allocation success rate has been 
> impacted by commit b39d0ee2632d but hugetlb allocations are admittedly 
> beyond the scope of what the patch is intended to address (thp 
> allocations).

It has become pretty clear that b39d0ee2632d has regressed hugetlb
allocation success rate for any non-trivial case (completely free
memory) http://lkml.kernel.org/r/20191001054343.ga15...@dhcp22.suse.cz.
And this really is not just about hugetlb requests. They are
likely the most obvious example, but __GFP_RETRY_MAYFAIL in general is
supposed to try as hard as feasible to make the allocation succeed. The
decision to bail out is done at a different spot, and b39d0ee2632d is
effectively bypassing that logic.

Now to the patch itself. I didn't get to test it on my testing
workload, but the steps are clearly documented and easy to set up and
reproduce. I am at a training today and unlikely to get to test it by
the end of the week, unfortunately. Anyway, the patch should fix
the problem because it explicitly opts out for __GFP_RETRY_MAYFAIL.

I am pretty sure we will need more follow-ups because the bail-out logic
is simply behaving quite randomly, as my measurements show (I would really
appreciate feedback there). We need a more systematic solution because
the current logic has been rushed through without a proper analysis and
without any actual workloads to verify the effect.
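
For context, a sketch of the kind of request affected (the flags below are
assumed to mirror hugetlb's buddy allocations; they are not taken from the
patch):

static struct page *alloc_one_hugepage_example(int nid, unsigned int order)
{
	/* __GFP_RETRY_MAYFAIL opts in to the heavier reclaim/compaction
	 * retries that this patch exempts from the early bail-out */
	gfp_t gfp = GFP_HIGHUSER_MOVABLE | __GFP_COMP |
		    __GFP_RETRY_MAYFAIL | __GFP_NOWARN;

	return alloc_pages_node(nid, gfp, order);
}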

> Cc: Mike Kravetz 
Fixes: b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction 
may not succeed")

> Signed-off-by: David Rientjes 

I am willing to give my ack, considering that this is a clear
regression and this is probably the simplest fix, but the changelog
should be explicit about the effect (feel free to borrow my numbers and
explanation from this thread).

> ---
>  Mike, you alluded that you may want to opt hugetlbfs out of this for the
>  time being in https://marc.info/?l=linux-kernel&m=156771690024533 --
>  not sure if you want to allow this excessive amount of reclaim for 
>  hugetlb allocations or not given the swap storms Andrea has shown is
>  possible (and nr_hugepages_mempolicy does exist), but hugetlbfs was not
>  part of the problem we are trying to address here so no objection to
>  opting it out.  
> 
>  You might want to consider how expensive hugetlb allocations can become
>  and disruptive to the system if it does not yield additional hugepages,
>  but that can be done at any time later as a general improvement rather
>  than part of a series aimed at thp.
> 
>  mm/page_alloc.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4467,12 +4467,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
> order,
>   if (page)
>   goto got_pg;
>  
> -  if (order >= pageblock_order && (gfp_mask & __GFP_IO)) {
> +  if (order >= pageblock_order && (gfp_mask & __GFP_IO) &&
> +  !(gfp_mask & __GFP_RETRY_MAYFAIL)) {
>   /*
>* If allocating entire pageblock(s) and compaction
>* failed because all zones are below low watermarks
>* or is prohibited because it recently failed at this
> -  * order, fail immediately.
> +  * order, fail immediately unless the allocator has
> +  * requested compaction and reclaim retry.
>*
>* Reclaim is
>*  - potentially very expensive because zones are far

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2] misc: add cc1101 devicetree binding

2019-10-02 Thread Heiko Schocher

Hello Rob,

Am 02.10.2019 um 16:19 schrieb Rob Herring:

On Sun, Sep 22, 2019 at 08:03:55AM +0200, Heiko Schocher wrote:

add devicetree binding for cc1101 misc driver.

Signed-off-by: Heiko Schocher 
---

  .../devicetree/bindings/misc/cc1101.txt   | 27 +++
  1 file changed, 27 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/misc/cc1101.txt


Can you please convert this to DT schema.


Of course, missed this point, sorry, reworked. Is there a HowTo
for writing a schema?
(besides Documentation/devicetree/bindings/example-schema.yaml)


diff --git a/Documentation/devicetree/bindings/misc/cc1101.txt 
b/Documentation/devicetree/bindings/misc/cc1101.txt
new file mode 100644
index 0..afea6acf4a9c5
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/cc1101.txt


Normal naming is to use compatible string. So ti,cc1101.yaml for schema.


renamed.




@@ -0,0 +1,27 @@
+driver cc1101 Low-Power Sub-1 GHz RF Transceiver chip from Texas
+Instruments.
+
+Requires node properties:
+- compatible : should be "ti,cc1101";
+- reg: Chip select address of device, see:
+   Documentation/devicetree/bindings/spi/spi-bus.txt
+- gpios  : list of 2 gpios, first gpio is for GDO0 pin
+   second for GDO2 pin, see more:


Is there a GDO1? Would be hard to add later because you can't change the
indices once defined.


Good point. There is a GDO1, so yes, this makes sense, added.




+   Documentation/devicetree/bindings/gpio/gpio.txt
+
+Recommended properties:
+ - spi-max-frequency: Definition as per
+Documentation/devicetree/bindings/spi/spi-bus.txt


Notice that this file is now just in redirection...


Ok.




+ - freq   : used spi frequency for communication with cc1101 chip


What's this? Doesn't spi-max-frequency cover it?


Of course, removed.

Thanks for your time and review.

bye,
Heiko
--
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: h...@denx.de


Re: [PATCH v2 0/3] media: cedrus: improvements

2019-10-02 Thread Jernej Škrabec
On Thursday, 3 October 2019 at 00:23:07 CEST, Paul Kocialkowski wrote:
> Hi,
> 
> On Wed 02 Oct 19, 21:35, Jernej Skrabec wrote:
> > This is continuation of https://lkml.org/lkml/2019/5/30/1459 with several
> > patches removed (2 merged, others needs redesign) and one added.
> 
> Thanks for the continued effort on this, these fixes are greatly appreciated
> (and more generally, all the work you are putting into cedrus)!
> 
> Although I've been out of the loop on cedrus, it is very nice to see
> development happening. Keep up the good work!

Thanks! Those fixes are just cleaned-up versions of patches I've been using in
LibreELEC for some time now. I hate maintaining out-of-tree patches over a
long period, so pushing them upstream is beneficial for all :)

I'll send more improvements once your HEVC patches are merged.

Best regards,
Jernej

> 
> Cheers,
> 
> Paul
> 
> > Patch 1 fixes h264 playback issue which happens in rare cases.
> > 
> > Patch 2 sets PPS default reference index count in register from PPS
> > control. Currently it was set to values from slice control.
> > 
> > Patch 3 replaces direct accesses to capture queue from m2m contex with
> > helpers which was designed exactly for that. It's also safer since
> > helpers do some checks.
> > 
> > Best regards,
> > Jernej
> > 
> > Jernej Skrabec (3):
> >   media: cedrus: Fix decoding for some H264 videos
> >   media: cedrus: Fix H264 default reference index count
> >   media: cedrus: Use helpers to access capture queue
> >  
> >  drivers/staging/media/sunxi/cedrus/cedrus.h   |  8 +++-
> >  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 46 ++-
> >  .../staging/media/sunxi/cedrus/cedrus_regs.h  |  3 ++
> >  3 files changed, 44 insertions(+), 13 deletions(-)






Re: [PATCH] dt-bindings: memory-controllers: exynos5422-dmc: Correct example syntax and memory region

2019-10-02 Thread Lukasz Luba
Hi Krzysztof,

On 10/2/19 7:44 PM, Krzysztof Kozlowski wrote:
> After adding the interrupt properties to Exynos5422 DMC bindings
> example, the mapped memory region must be big enough to access
> performance counters registers.
> 
> Fix also syntax errors (semicolons) and adjust indentation.
> 
> Signed-off-by: Krzysztof Kozlowski 
> 
> ---
> 
> Rebased on top of my for-next branch as exynos5422-dmc.txt bindings were
> applied by me.
> ---
>   .../bindings/memory-controllers/exynos5422-dmc.txt| 8 
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git 
> a/Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt 
> b/Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt
> index e2434cac4858..02e4a1f862f1 100644
> --- a/Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt
> +++ b/Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt
> @@ -55,7 +55,7 @@ Example:
>   
>   dmc: memory-controller@10c2 {
>   compatible = "samsung,exynos5422-dmc";
> - reg = <0x10c2 0x100>, <0x10c3 0x100>,
> + reg = <0x10c2 0x1>, <0x10c3 0x1>;
>   clocks = <&clock CLK_FOUT_SPLL>,
><&clock CLK_MOUT_SCLK_SPLL>,
><&clock CLK_FF_DOUT_SPLL2>,
> @@ -63,7 +63,7 @@ Example:
><&clock CLK_MOUT_BPLL>,
><&clock CLK_SCLK_BPLL>,
><&clock CLK_MOUT_MX_MSPLL_CCORE>,
> -  <&clock CLK_MOUT_MCLK_CDREX>,
> +  <&clock CLK_MOUT_MCLK_CDREX>;
>   clock-names = "fout_spll",
> "mout_sclk_spll",
> "ff_dout_spll2",
> @@ -71,10 +71,10 @@ Example:
> "mout_bpll",
> "sclk_bpll",
> "mout_mx_mspll_ccore",
> -   "mout_mclk_cdrex",
> +   "mout_mclk_cdrex";
>   operating-points-v2 = <&dmc_opp_table>;
>   devfreq-events = <&ppmu_event3_dmc0_0>, <&ppmu_event3_dmc0_1>,
> - <&ppmu_event3_dmc1_0>, <&ppmu_event3_dmc1_1>;
> +  <&ppmu_event3_dmc1_0>, <&ppmu_event3_dmc1_1>;
>   device-handle = <&samsung_K3QF2F20DB>;
>   vdd-supply = <&buck1_reg>;
>   samsung,syscon-clk = <&clock>;
> 

Thank you for the patch. Indeed it must also be updated.

Reviewed-by: Lukasz Luba 

Regards,
Lukasz


Re: [PATCH v2 2/3] RISC-V: Add basic support for SBI v0.2

2019-10-02 Thread Anup Patel
On Fri, Sep 27, 2019 at 5:39 AM Atish Patra  wrote:
>
> The SBI v0.2 introduces a base extension which is backward compatible
> with v0.1. Implement all helper functions and minimum required SBI
> calls from v0.2 for now. All other base extension function will be
> added later as per need.
> As v0.2 calling convention is backward compatible with v0.1, remove
> the v0.1 helper functions and just use v0.2 calling convention.
>
> Signed-off-by: Atish Patra 
> ---
>  arch/riscv/include/asm/sbi.h | 139 ++--
>  arch/riscv/kernel/Makefile   |   1 +
>  arch/riscv/kernel/sbi.c  | 241 +++
>  arch/riscv/kernel/setup.c|   2 +
>  4 files changed, 311 insertions(+), 72 deletions(-)
>  create mode 100644 arch/riscv/kernel/sbi.c
>
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 2147f384fad0..279b7f10b3c2 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -8,93 +8,88 @@
>
>  #include 
>
> -#define SBI_EXT_0_1_SET_TIMER 0x0
> -#define SBI_EXT_0_1_CONSOLE_PUTCHAR 0x1
> -#define SBI_EXT_0_1_CONSOLE_GETCHAR 0x2
> -#define SBI_EXT_0_1_CLEAR_IPI 0x3
> -#define SBI_EXT_0_1_SEND_IPI 0x4
> -#define SBI_EXT_0_1_REMOTE_FENCE_I 0x5
> -#define SBI_EXT_0_1_REMOTE_SFENCE_VMA 0x6
> -#define SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID 0x7
> -#define SBI_EXT_0_1_SHUTDOWN 0x8
> +enum sbi_ext_id {
> +   SBI_EXT_0_1_SET_TIMER = 0x0,
> +   SBI_EXT_0_1_CONSOLE_PUTCHAR = 0x1,
> +   SBI_EXT_0_1_CONSOLE_GETCHAR = 0x2,
> +   SBI_EXT_0_1_CLEAR_IPI = 0x3,
> +   SBI_EXT_0_1_SEND_IPI = 0x4,
> +   SBI_EXT_0_1_REMOTE_FENCE_I = 0x5,
> +   SBI_EXT_0_1_REMOTE_SFENCE_VMA = 0x6,
> +   SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID = 0x7,
> +   SBI_EXT_0_1_SHUTDOWN = 0x8,
> +   SBI_EXT_BASE = 0x10,
> +};
>
> -#define SBI_CALL(which, arg0, arg1, arg2, arg3) ({ \
> -   register uintptr_t a0 asm ("a0") = (uintptr_t)(arg0);   \
> -   register uintptr_t a1 asm ("a1") = (uintptr_t)(arg1);   \
> -   register uintptr_t a2 asm ("a2") = (uintptr_t)(arg2);   \
> -   register uintptr_t a3 asm ("a3") = (uintptr_t)(arg3);   \
> -   register uintptr_t a7 asm ("a7") = (uintptr_t)(which);  \
> -   asm volatile ("ecall"   \
> - : "+r" (a0)   \
> - : "r" (a1), "r" (a2), "r" (a3), "r" (a7)  \
> - : "memory");  \
> -   a0; \
> -})
> +enum sbi_ext_base_fid {
> +   SBI_BASE_GET_SPEC_VERSION = 0,
> +   SBI_BASE_GET_IMP_ID,
> +   SBI_BASE_GET_IMP_VERSION,
> +   SBI_BASE_PROBE_EXT,
> +   SBI_BASE_GET_MVENDORID,
> +   SBI_BASE_GET_MARCHID,
> +   SBI_BASE_GET_MIMPID,
> +};
>
> -/* Lazy implementations until SBI is finalized */
> -#define SBI_CALL_0(which) SBI_CALL(which, 0, 0, 0, 0)
> -#define SBI_CALL_1(which, arg0) SBI_CALL(which, arg0, 0, 0, 0)
> -#define SBI_CALL_2(which, arg0, arg1) SBI_CALL(which, arg0, arg1, 0, 0)
> -#define SBI_CALL_3(which, arg0, arg1, arg2) \
> -   SBI_CALL(which, arg0, arg1, arg2, 0)
> -#define SBI_CALL_4(which, arg0, arg1, arg2, arg3) \
> -   SBI_CALL(which, arg0, arg1, arg2, arg3)
> +#define SBI_SPEC_VERSION_DEFAULT   0x1
> +#define SBI_SPEC_VERSION_MAJOR_OFFSET  24
> +#define SBI_SPEC_VERSION_MAJOR_MASK0x7f
> +#define SBI_SPEC_VERSION_MINOR_MASK0xff
>
> -static inline void sbi_console_putchar(int ch)
> -{
> -   SBI_CALL_1(SBI_EXT_0_1_CONSOLE_PUTCHAR, ch);
> -}
> +/* SBI return error codes */
> +#define SBI_SUCCESS0
> +#define SBI_ERR_FAILURE-1
> +#define SBI_ERR_NOT_SUPPORTED  -2
> +#define SBI_ERR_INVALID_PARAM   -3
> +#define SBI_ERR_DENIED -4
> +#define SBI_ERR_INVALID_ADDRESS -5
>
> -static inline int sbi_console_getchar(void)
> -{
> -   return SBI_CALL_0(SBI_EXT_0_1_CONSOLE_GETCHAR);
> -}
> -
> -static inline void sbi_set_timer(uint64_t stime_value)
> -{
> -#if __riscv_xlen == 32
> -   SBI_CALL_2(SBI_EXT_0_1_SET_TIMER, stime_value,
> - stime_value >> 32);
> -#else
> -   SBI_CALL_1(SBI_EXT_0_1_SET_TIMER, stime_value);
> -#endif
> -}
> +extern unsigned long sbi_spec_version;
> +struct sbiret {
> +   long error;
> +   long value;
> +};
>
> -static inline void sbi_shutdown(void)
> -{
> -   SBI_CALL_0(SBI_EXT_0_1_SHUTDOWN);
> -}
> +void sbi_init(void);
> +struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
> + unsigned long arg1, unsigned long arg2,
> + unsigned long arg3);
> +int sbi_err_map_linux_errorno(int err);
>
> -static inline void sbi_clear_ipi(void)
> -{
> -   SBI_CALL_0(SBI_EXT_0_1_CLEAR_IPI);
> -}
> +void sbi_console_putchar(int ch);
> +int sbi_console_getchar(void);
> +void sbi_set_timer(uint64_t stime_value);
> +void sbi_shutdown(void);
> +void 
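
For reference, a minimal usage sketch built on the sbi_ecall() helper and the
SBI_EXT_BASE/SBI_BASE_PROBE_EXT ids declared in the patch above (the wrapper
function itself is hypothetical), probing whether the firmware implements a
given extension:

static long sbi_probe_extension_example(long extid)
{
	struct sbiret ret;

	ret = sbi_ecall(SBI_EXT_BASE, SBI_BASE_PROBE_EXT, extid, 0, 0, 0);
	if (!ret.error)
		return ret.value;	/* non-zero means the extension is present */

	return sbi_err_map_linux_errorno(ret.error);
}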

Re: [PATCH v2 2/3] media: cedrus: Fix H264 default reference index count

2019-10-02 Thread Jernej Škrabec
On Thursday, 3 October 2019 at 00:06:50 CEST, Paul Kocialkowski wrote:
> Hi,
> 
> On Wed 02 Oct 19, 21:35, Jernej Skrabec wrote:
> > Reference index count in VE_H264_PPS should come from PPS control.
> > However, this is not really important, because reference index count is
> > in our case always overridden by that from slice header.
> 
> Thanks for the fixup!
> 
> Our libva userspace and v4l2-request testing tool currently don't provide
> this, but I have a pending merge request adding it for the hantro so it's
> good to go.

Actually, I think this is just cosmetic and it would work even if it were
always 0. We always override this number in the SHS2 register with the
VE_H264_SHS2_NUM_REF_IDX_ACTIVE_OVRD flag, and recently a patch was merged
to clarify that the value in the slice parameters should be the default
value whenever the override flag is not set in the bitstream:
https://git.linuxtv.org/media_tree.git/commit/?
id=187ef7c5c78153acdce8c8714e5918b1018c710b

Well, we could always compare the default against the value in the slice
parameters, but I really don't see the benefit of doing that extra work.

Best regards,
Jernej

> 
> Acked-by: Paul Kocialkowski 
> 
> Cheers,
> 
> Paul
> 
> > Signed-off-by: Jernej Skrabec 
> > ---
> > 
> >  drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 8 ++--
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c index
> > bd848146eada..4a0e69855c7f 100644
> > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > @@ -364,12 +364,8 @@ static void cedrus_set_params(struct cedrus_ctx *ctx,
> > 
> > // picture parameters
> > reg = 0;
> > 
> > -   /*
> > -* FIXME: the kernel headers are allowing the default value to
> > -* be passed, but the libva doesn't give us that.
> > -*/
> > -   reg |= (slice->num_ref_idx_l0_active_minus1 & 0x1f) << 10;
> > -   reg |= (slice->num_ref_idx_l1_active_minus1 & 0x1f) << 5;
> > +   reg |= (pps->num_ref_idx_l0_default_active_minus1 & 0x1f) << 10;
> > +   reg |= (pps->num_ref_idx_l1_default_active_minus1 & 0x1f) << 5;
> > 
> > reg |= (pps->weighted_bipred_idc & 0x3) << 2;
> > if (pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE)
> > 
> > reg |= VE_H264_PPS_ENTROPY_CODING_MODE;






[PATCH v8 19/19] RISC-V: KVM: Add MAINTAINERS entry

2019-10-02 Thread Anup Patel
Add myself as maintainer for KVM RISC-V and Atish as designated reviewer.

Signed-off-by: Atish Patra 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 MAINTAINERS | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 296de2b51c83..67f6cb317866 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8980,6 +8980,16 @@ F:   arch/powerpc/include/asm/kvm*
 F: arch/powerpc/kvm/
 F: arch/powerpc/kernel/kvm*
 
+KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)
+M: Anup Patel 
+R: Atish Patra 
+L: k...@vger.kernel.org
+T: git git://github.com/kvm-riscv/linux.git
+S: Maintained
+F: arch/riscv/include/uapi/asm/kvm*
+F: arch/riscv/include/asm/kvm*
+F: arch/riscv/kvm/
+
 KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
 M: Christian Borntraeger 
 M: Janosch Frank 
-- 
2.17.1



[PATCH v8 17/19] RISC-V: KVM: Forward unhandled SBI calls to userspace

2019-10-02 Thread Anup Patel
Instead of returning an error to the Guest for unhandled SBI calls, we should
forward such SBI calls to the KVM user-space tool (QEMU/KVMTOOL).

This way KVM userspace tool can do something about unhandled SBI calls:
1. Print unhandled SBI call details and kill the Guest
2. Emulate unhandled SBI call and resume the Guest

To achieve this, we end up having a RISC-V specific SBI exit reason
and a riscv_sbi member under "struct kvm_run". The riscv_sbi member of
"struct kvm_run" added by this patch is compatible with both SBI v0.1
and SBI v0.2 specs.

Currently, we implement SBI v0.1 for Guest where CONSOLE_GETCHAR and
CONSOLE_PUTCHAR SBI calls are unhandled in KVM RISC-V kernel module
so we forward these calls to userspace. In future when we implement
SBI v0.2 for Guest, we will forward SBI v0.2 experimental and vendor
extension calls to userspace.

Signed-off-by: Anup Patel 
---
 arch/riscv/include/asm/kvm_host.h |  8 
 arch/riscv/kvm/vcpu.c |  9 
 arch/riscv/kvm/vcpu_sbi.c | 69 +--
 include/uapi/linux/kvm.h  |  8 
 4 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 74ccd8d00ec5..6f44eefc1641 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -74,6 +74,10 @@ struct kvm_mmio_decode {
int return_handled;
 };
 
+struct kvm_sbi_context {
+   int return_handled;
+};
+
 #define KVM_MMU_PAGE_CACHE_NR_OBJS 32
 
 struct kvm_mmu_page_cache {
@@ -176,6 +180,9 @@ struct kvm_vcpu_arch {
/* MMIO instruction details */
struct kvm_mmio_decode mmio_decode;
 
+   /* SBI context */
+   struct kvm_sbi_context sbi_context;
+
/* Cache pages needed to program page tables with spinlock held */
struct kvm_mmu_page_cache mmu_page_cache;
 
@@ -250,6 +257,7 @@ bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
 
+int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 #endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 8f2b058a4714..27174e2ec8a0 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -885,6 +885,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
}
}
 
+   /* Process SBI value returned from user-space */
+   if (run->exit_reason == KVM_EXIT_RISCV_SBI) {
+   ret = kvm_riscv_vcpu_sbi_return(vcpu, vcpu->run);
+   if (ret) {
+   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
+   return ret;
+   }
+   }
+
if (run->immediate_exit) {
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
return -EINTR;
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
index 88fa0faa3545..983ccaf2a54e 100644
--- a/arch/riscv/kvm/vcpu_sbi.c
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -31,6 +31,44 @@ static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu,
run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
 }
 
+static void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu,
+  struct kvm_run *run)
+{
+   struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+
+   vcpu->arch.sbi_context.return_handled = 0;
+   run->exit_reason = KVM_EXIT_RISCV_SBI;
+   run->riscv_sbi.extension_id = cp->a7;
+   run->riscv_sbi.function_id = cp->a6;
+   run->riscv_sbi.args[0] = cp->a0;
+   run->riscv_sbi.args[1] = cp->a1;
+   run->riscv_sbi.args[2] = cp->a2;
+   run->riscv_sbi.args[3] = cp->a3;
+   run->riscv_sbi.args[4] = cp->a4;
+   run->riscv_sbi.args[5] = cp->a5;
+   run->riscv_sbi.ret[0] = cp->a0;
+   run->riscv_sbi.ret[1] = cp->a1;
+}
+
+int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+
+   /* Handle SBI return only once */
+   if (vcpu->arch.sbi_context.return_handled)
+   return 0;
+   vcpu->arch.sbi_context.return_handled = 1;
+
+   /* Update return values */
+   cp->a0 = run->riscv_sbi.ret[0];
+   cp->a1 = run->riscv_sbi.ret[1];
+
+   /* Move to next instruction */
+   vcpu->arch.guest_context.sepc += 4;
+
+   return 0;
+}
+
 int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
int i, ret = 1;
@@ -44,7 +82,16 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
return -EINVAL;
 
switch (cp->a7) {
-   case SBI_SET_TIMER:
+   case SBI_EXT_0_1_CONSOLE_GETCHAR:
+   case SBI_EXT_0_1_CONSOLE_PUTCHAR:
+   /*
+* The CONSOLE

[PATCH v8 18/19] RISC-V: KVM: Document RISC-V specific parts of KVM API.

2019-10-02 Thread Anup Patel
Document RISC-V specific parts of the KVM API, such as:
 - The interrupt numbers passed to the KVM_INTERRUPT ioctl.
 - The states supported by the KVM_{GET,SET}_MP_STATE ioctls.
 - The registers supported by the KVM_{GET,SET}_ONE_REG interface
   and the encoding of those register ids.
 - The exit reason KVM_EXIT_RISCV_SBI for SBI calls forwarded to
   userspace tool.

Signed-off-by: Anup Patel 
---
 Documentation/virt/kvm/api.txt | 158 +++--
 1 file changed, 151 insertions(+), 7 deletions(-)

diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index 4833904d32a5..f9ea81fe1143 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -471,7 +471,7 @@ struct kvm_translation {
 4.16 KVM_INTERRUPT
 
 Capability: basic
-Architectures: x86, ppc, mips
+Architectures: x86, ppc, mips, riscv
 Type: vcpu ioctl
 Parameters: struct kvm_interrupt (in)
 Returns: 0 on success, negative on failure.
@@ -531,6 +531,22 @@ interrupt number dequeues the interrupt.
 
 This is an asynchronous vcpu ioctl and can be invoked from any thread.
 
+RISC-V:
+
+Queues an external interrupt to be injected into the virutal CPU. This ioctl
+is overloaded with 2 different irq values:
+
+a) KVM_INTERRUPT_SET
+
+  This sets external interrupt for a virtual CPU and it will receive
+  once it is ready.
+
+b) KVM_INTERRUPT_UNSET
+
+  This clears pending external interrupt for a virtual CPU.
+
+This is an asynchronous vcpu ioctl and can be invoked from any thread.
+
 
 4.17 KVM_DEBUG_GUEST
 
@@ -1219,7 +1235,7 @@ for vm-wide capabilities.
 4.38 KVM_GET_MP_STATE
 
 Capability: KVM_CAP_MP_STATE
-Architectures: x86, s390, arm, arm64
+Architectures: x86, s390, arm, arm64, riscv
 Type: vcpu ioctl
 Parameters: struct kvm_mp_state (out)
 Returns: 0 on success; -1 on error
@@ -1233,7 +1249,8 @@ uniprocessor guests).
 
 Possible values are:
 
- - KVM_MP_STATE_RUNNABLE:the vcpu is currently running [x86,arm/arm64]
+ - KVM_MP_STATE_RUNNABLE:the vcpu is currently running
+ [x86,arm/arm64,riscv]
  - KVM_MP_STATE_UNINITIALIZED:   the vcpu is an application processor (AP)
  which has not yet received an INIT signal 
[x86]
  - KVM_MP_STATE_INIT_RECEIVED:   the vcpu has received an INIT signal, and is
@@ -1242,7 +1259,7 @@ Possible values are:
  is waiting for an interrupt [x86]
  - KVM_MP_STATE_SIPI_RECEIVED:   the vcpu has just received a SIPI (vector
  accessible via KVM_GET_VCPU_EVENTS) [x86]
- - KVM_MP_STATE_STOPPED: the vcpu is stopped [s390,arm/arm64]
+ - KVM_MP_STATE_STOPPED: the vcpu is stopped [s390,arm/arm64,riscv]
  - KVM_MP_STATE_CHECK_STOP:  the vcpu is in a special error state [s390]
  - KVM_MP_STATE_OPERATING:   the vcpu is operating (running or halted)
  [s390]
@@ -1253,7 +1270,7 @@ On x86, this ioctl is only useful after 
KVM_CREATE_IRQCHIP. Without an
 in-kernel irqchip, the multiprocessing state must be maintained by userspace on
 these architectures.
 
-For arm/arm64:
+For arm/arm64/riscv:
 
 The only states that are valid are KVM_MP_STATE_STOPPED and
 KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
@@ -1261,7 +1278,7 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused 
or not.
 4.39 KVM_SET_MP_STATE
 
 Capability: KVM_CAP_MP_STATE
-Architectures: x86, s390, arm, arm64
+Architectures: x86, s390, arm, arm64, riscv
 Type: vcpu ioctl
 Parameters: struct kvm_mp_state (in)
 Returns: 0 on success; -1 on error
@@ -1273,7 +1290,7 @@ On x86, this ioctl is only useful after 
KVM_CREATE_IRQCHIP. Without an
 in-kernel irqchip, the multiprocessing state must be maintained by userspace on
 these architectures.
 
-For arm/arm64:
+For arm/arm64/riscv:
 
 The only states that are valid are KVM_MP_STATE_STOPPED and
 KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
@@ -2282,6 +2299,116 @@ following id bit patterns:
   0x7020  0003 02 <0:3> 
 
 
+RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of
+that is the register group type.
+
+RISC-V config registers are meant for configuring a Guest VCPU and it has
+the following id bit patterns:
+  0x8020  01  (32bit Host)
+  0x8030  01  (64bit Host)
+
+Following are the RISC-V config registers:
+
+EncodingRegister  Description
+--
+  0x80x0  0100  isa   ISA feature bitmap of Guest VCPU
+  0x80x0  0100 0001 tbfreqTime base frequency
+
+The isa config register can be read anytime but can only be written before
+a Guest VCPU runs. It will have ISA feature bits matching underlying host
+set by default. The tbfreq config register is a read-only register and it
+will return host timebase frequenc.
+
+RISC-V core registers represent the general excution state of a Guest VC
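
A userspace sketch of reading one of these registers with KVM_GET_ONE_REG;
the id value 0x8030000001000000 for the 64-bit isa config register is
inferred from the table above (which lost some digits in the archive), so
treat it as an assumption rather than a definitive encoding:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int read_guest_isa(int vcpu_fd, uint64_t *isa)
{
	struct kvm_one_reg reg = {
		.id   = 0x8030000001000000ULL,	/* assumed: 64-bit host, config group, index 0 (isa) */
		.addr = (uintptr_t)isa,
	};

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}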

[PATCH v8 13/19] RISC-V: KVM: Add timer functionality

2019-10-02 Thread Anup Patel
From: Atish Patra 

The RISC-V hypervisor specification doesn't have any virtual timer
feature.

Due to this, the guest VCPU timer will be programmed via SBI calls.
The host will use a separate hrtimer event for each guest VCPU to
provide timer functionality. We inject a virtual timer interrupt to
the guest VCPU whenever the guest VCPU hrtimer event expires.

The following features are not supported yet and will be added in
future:
1. A time offset to adjust guest time from host time
2. A saved next event in guest vcpu for vm migration

Signed-off-by: Atish Patra 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
---
 arch/riscv/include/asm/kvm_host.h   |   4 +
 arch/riscv/include/asm/kvm_vcpu_timer.h |  30 +++
 arch/riscv/kvm/Makefile |   2 +-
 arch/riscv/kvm/vcpu.c   |   6 ++
 arch/riscv/kvm/vcpu_timer.c | 113 
 drivers/clocksource/timer-riscv.c   |   8 ++
 include/clocksource/timer-riscv.h   |  16 
 7 files changed, 178 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
 create mode 100644 arch/riscv/kvm/vcpu_timer.c
 create mode 100644 include/clocksource/timer-riscv.h

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 79ceb2aa8ae6..9179ff019235 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_64BIT
 #define KVM_MAX_VCPUS  (1U << 16)
@@ -168,6 +169,9 @@ struct kvm_vcpu_arch {
unsigned long irqs_pending;
unsigned long irqs_pending_mask;
 
+   /* VCPU Timer */
+   struct kvm_vcpu_timer timer;
+
/* MMIO instruction details */
struct kvm_mmio_decode mmio_decode;
 
diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h 
b/arch/riscv/include/asm/kvm_vcpu_timer.h
new file mode 100644
index ..6f904d49e27e
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra 
+ */
+
+#ifndef __KVM_VCPU_RISCV_TIMER_H
+#define __KVM_VCPU_RISCV_TIMER_H
+
+#include 
+
+struct kvm_vcpu_timer {
+   bool init_done;
+   /* Check if the timer is programmed */
+   bool next_set;
+   u64 next_cycles;
+   struct hrtimer hrt;
+   /* Mult & Shift values to get nanosec from cycles */
+   u32 mult;
+   u32 shift;
+};
+
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu, u64 ncycles);
+
+#endif
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index c0f57f26c13d..3e0c7558320d 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 kvm-objs := $(common-objs-y)
 
 kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
-kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
 
 obj-$(CONFIG_KVM)  += kvm.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 12bd837f564a..2ca913f00570 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -54,6 +54,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
 
memcpy(cntx, reset_cntx, sizeof(*cntx));
 
+   kvm_riscv_vcpu_timer_reset(vcpu);
+
WRITE_ONCE(vcpu->arch.irqs_pending, 0);
WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
 }
@@ -108,6 +110,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
cntx->hstatus |= HSTATUS_SP2P;
cntx->hstatus |= HSTATUS_SPV;
 
+   /* Setup VCPU timer */
+   kvm_riscv_vcpu_timer_init(vcpu);
+
/* Reset VCPU */
kvm_riscv_reset_vcpu(vcpu);
 
@@ -116,6 +121,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
+   kvm_riscv_vcpu_timer_deinit(vcpu);
kvm_riscv_stage2_flush_cache(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
new file mode 100644
index ..9ffdd6ff8d6e
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_timer.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define VCPU_TIMER_PROGRAM_THRESHOLD_NS1000
+
+static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
+{
+   struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
+   struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);
+
+   t->n
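
A simplified sketch of what the mult/shift pair in kvm_vcpu_timer above is
for (overflow handling omitted; the real conversion in the driver is expected
to be more careful than this):

static u64 kvm_riscv_cycles_to_ns_sketch(struct kvm_vcpu_timer *t, u64 cycles)
{
	/* scaled multiply, the usual clocksource-style cycles -> ns math */
	return (cycles * t->mult) >> t->shift;
}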

[PATCH v8 11/19] RISC-V: KVM: Implement stage2 page table programming

2019-10-02 Thread Anup Patel
This patch implements all required functions for programming
the stage2 page table for each Guest/VM.

At high-level, the flow of stage2 related functions is similar
from KVM ARM/ARM64 implementation but the stage2 page table
format is quite different for KVM RISC-V.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
---
 arch/riscv/include/asm/kvm_host.h |  10 +
 arch/riscv/include/asm/pgtable-bits.h |   1 +
 arch/riscv/kvm/mmu.c  | 643 +-
 3 files changed, 644 insertions(+), 10 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 8aaf22a900be..bc27f664b443 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -73,6 +73,13 @@ struct kvm_mmio_decode {
int return_handled;
 };
 
+#define KVM_MMU_PAGE_CACHE_NR_OBJS 32
+
+struct kvm_mmu_page_cache {
+   int nobjs;
+   void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
+};
+
 struct kvm_cpu_context {
unsigned long zero;
unsigned long ra;
@@ -164,6 +171,9 @@ struct kvm_vcpu_arch {
/* MMIO instruction details */
struct kvm_mmio_decode mmio_decode;
 
+   /* Cache pages needed to program page tables with spinlock held */
+   struct kvm_mmu_page_cache mmu_page_cache;
+
/* VCPU power-off state */
bool power_off;
 
diff --git a/arch/riscv/include/asm/pgtable-bits.h 
b/arch/riscv/include/asm/pgtable-bits.h
index bbaeb5d35842..be49d62fcc2b 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -26,6 +26,7 @@
 
 #define _PAGE_SPECIAL   _PAGE_SOFT
 #define _PAGE_TABLE _PAGE_PRESENT
+#define _PAGE_LEAF  (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 
 /*
  * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 2b965f9aac07..590669290139 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -18,6 +18,438 @@
 #include 
 #include 
 
+#ifdef CONFIG_64BIT
+#define stage2_have_pmdtrue
+#define stage2_gpa_size((phys_addr_t)(1ULL << 39))
+#define stage2_cache_min_pages 2
+#else
+#define pmd_index(x)   0
+#define pfn_pmd(x, y)  ({ pmd_t __x = { 0 }; __x; })
+#define stage2_have_pmdfalse
+#define stage2_gpa_size((phys_addr_t)(1ULL << 32))
+#define stage2_cache_min_pages 1
+#endif
+
+static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
+ int min, int max)
+{
+   void *page;
+
+   BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
+   if (pcache->nobjs >= min)
+   return 0;
+   while (pcache->nobjs < max) {
+   page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+   if (!page)
+   return -ENOMEM;
+   pcache->objects[pcache->nobjs++] = page;
+   }
+
+   return 0;
+}
+
+static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
+{
+   while (pcache && pcache->nobjs)
+   free_page((unsigned long)pcache->objects[--pcache->nobjs]);
+}
+
+static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
+{
+   void *p;
+
+   if (!pcache)
+   return NULL;
+
+   BUG_ON(!pcache->nobjs);
+   p = pcache->objects[--pcache->nobjs];
+
+   return p;
+}
+
+struct local_guest_tlb_info {
+   struct kvm_vmid *vmid;
+   gpa_t addr;
+};
+
+static void local_guest_tlb_flush_vmid_gpa(void *info)
+{
+   struct local_guest_tlb_info *infop = info;
+
+   __kvm_riscv_hfence_gvma_vmid_gpa(READ_ONCE(infop->vmid->vmid_version),
+infop->addr);
+}
+
+static void stage2_remote_tlb_flush(struct kvm *kvm, gpa_t addr)
+{
+   struct local_guest_tlb_info info;
+   struct kvm_vmid *vmid = &kvm->arch.vmid;
+
+   /*
+* Ideally, we should have a SBI call OR some remote TLB instruction
+* but we don't have it so we explicitly flush TLBs using IPIs.
+*
+* TODO: Instead of cpu_online_mask, we should only target CPUs
+* where the Guest/VM is running.
+*/
+   info.vmid = vmid;
+   info.addr = addr;
+   preempt_disable();
+   smp_call_function_many(cpu_online_mask,
+  local_guest_tlb_flush_vmid_gpa, &info, true);
+   preempt_enable();
+}
+
+static int stage2_set_pgd(struct kvm *kvm, gpa_t addr, const pgd_t *new_pgd)
+{
+   pgd_t *pgdp = &kvm->arch.pgd[pgd_index(addr)];
+
+   *pgdp = *new_pgd;
+   if (pgd_val(*pgdp) & _PAGE_LEAF)
+   stage2_remote_tlb_flush(kvm, addr);
+
+   return 0;
+}
+
+static int stage2_set_pmd(struct kvm *kvm, struct kvm_mmu_page_cache *pcache,
+ gpa_t addr, const pmd_t *new_pmd)
+{
+   int rc;
+   pmd_t *pmdp;
+   pgd_t new_pgd;
+   pgd_t *pgdp = &kvm->arch.pg

[PATCH v8 10/19] RISC-V: KVM: Implement VMID allocator

2019-10-02 Thread Anup Patel
We implement a simple VMID allocator for Guests/VMs (a rough sketch of
the scheme follows the list) which:
1. Detects the number of VMID bits at boot-time
2. Uses an atomic number to track the VMID version and increments
   the VMID version whenever we run out of VMIDs
3. Flushes Guest TLBs on all host CPUs whenever we run out
   of VMIDs
4. Force-updates the HW Stage2 VMID for each Guest VCPU, whenever the
   VMID changes, using the KVM_REQ_UPDATE_HGATP VCPU request
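
Roughly, the version-based scheme above boils down to the following
sketch; all names are illustrative (not the actual kernel symbols) and
locking is omitted.

struct example_vmid {
        unsigned long version;
        unsigned long vmid;
};

static unsigned long example_vmid_version = 1;
static unsigned long example_next_vmid = 1;

static void example_flush_all_guest_tlbs(void)
{
        /* would IPI all host CPUs and execute hfence.gvma there */
}

/* Called before entering the guest; vmid_bits was probed at boot. */
static void example_vmid_update(struct example_vmid *v, unsigned long vmid_bits)
{
        if (v->version == example_vmid_version)
                return;                         /* still valid, nothing to do */

        if (example_next_vmid >= (1UL << vmid_bits)) {
                /* Ran out of VMIDs: bump the version and recycle them all. */
                example_vmid_version++;
                example_next_vmid = 1;
                example_flush_all_guest_tlbs();
        }

        v->vmid = example_next_vmid++;
        v->version = example_vmid_version;
}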

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |  25 ++
 arch/riscv/kvm/Makefile   |   3 +-
 arch/riscv/kvm/main.c |   4 +
 arch/riscv/kvm/tlb.S  |  43 +++
 arch/riscv/kvm/vcpu.c |   9 +++
 arch/riscv/kvm/vm.c   |   6 ++
 arch/riscv/kvm/vmid.c | 123 ++
 7 files changed, 212 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/kvm/tlb.S
 create mode 100644 arch/riscv/kvm/vmid.c

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 2a5209fff68d..8aaf22a900be 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
 #define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(1)
+#define KVM_REQ_UPDATE_HGATP   KVM_ARCH_REQ(2)
 
 struct kvm_vm_stat {
ulong remote_tlb_flush;
@@ -47,7 +48,19 @@ struct kvm_vcpu_stat {
 struct kvm_arch_memory_slot {
 };
 
+struct kvm_vmid {
+   /*
+* Writes to vmid_version and vmid happen with vmid_lock held
+* whereas reads happen without any lock held.
+*/
+   unsigned long vmid_version;
+   unsigned long vmid;
+};
+
 struct kvm_arch {
+   /* stage2 vmid */
+   struct kvm_vmid vmid;
+
/* stage2 page table */
pgd_t *pgd;
phys_addr_t pgd_phys;
@@ -170,6 +183,12 @@ static inline void kvm_arch_vcpu_block_finish(struct 
kvm_vcpu *vcpu) {}
 int kvm_riscv_setup_vsip(void);
 void kvm_riscv_cleanup_vsip(void);
 
+void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long vmid,
+ unsigned long gpa);
+void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
+void __kvm_riscv_hfence_gvma_gpa(unsigned long gpa);
+void __kvm_riscv_hfence_gvma_all(void);
+
 int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
 bool is_write);
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
@@ -177,6 +196,12 @@ int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
 
+void kvm_riscv_stage2_vmid_detect(void);
+unsigned long kvm_riscv_stage2_vmid_bits(void);
+int kvm_riscv_stage2_vmid_init(struct kvm *kvm);
+bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid);
+void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu);
+
 void __kvm_riscv_unpriv_trap(void);
 
 unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu,
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 845579273727..c0f57f26c13d 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -8,6 +8,7 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 
 kvm-objs := $(common-objs-y)
 
-kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o vcpu_switch.o
+kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
 
 obj-$(CONFIG_KVM)  += kvm.o
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index d088247843c5..55df85184241 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -72,8 +72,12 @@ int kvm_arch_init(void *opaque)
if (ret)
return ret;
 
+   kvm_riscv_stage2_vmid_detect();
+
kvm_info("hypervisor extension available\n");
 
+   kvm_info("host has %ld VMID bits\n", kvm_riscv_stage2_vmid_bits());
+
return 0;
 }
 
diff --git a/arch/riscv/kvm/tlb.S b/arch/riscv/kvm/tlb.S
new file mode 100644
index ..453fca8d7940
--- /dev/null
+++ b/arch/riscv/kvm/tlb.S
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel 
+ */
+
+#include 
+#include 
+
+   .text
+   .altmacro
+   .option norelax
+
+   /*
+* Instruction encoding of hfence.gvma is:
+* 0110001 rs2(5) rs1(5) 000 0 1110011
+*/
+
+ENTRY(__kvm_riscv_hfence_gvma_vmid_gpa)
+   /* hfence.gvma a1, a0 */
+   .word 0x62a60073
+   ret
+ENDPROC(__kvm_riscv_hfence_gvma_vmid_gpa)
+
+ENTRY(__kvm_riscv_hfence_gvma_vmid)
+   /* hfence.gvma zero, a0 */
+   .word 0x62a00073
+   ret
+ENDPROC(__kvm_riscv_hfence_gvma_vmid)
+
+ENTRY(__kvm_riscv_hfence_gvma_gpa)
+   /* hfence.gvma a0 */
+   .word 0x62050073
+   ret
+ENDPROC(__kvm_riscv_hfence_gvma_gpa)
+

[PATCH v8 15/19] RISC-V: KVM: Implement ONE REG interface for FP registers

2019-10-02 Thread Anup Patel
From: Atish Patra 

Add a KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctl interface for floating
point registers such as F0-F31 and FCSR. This support is added for
both 'F' and 'D' extensions.

Signed-off-by: Atish Patra 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/uapi/asm/kvm.h |  10 +++
 arch/riscv/kvm/vcpu.c | 104 ++
 2 files changed, 114 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/kvm.h 
b/arch/riscv/include/uapi/asm/kvm.h
index 997b85f6fded..19811823ab70 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -96,6 +96,16 @@ struct kvm_riscv_csr {
 #define KVM_REG_RISCV_CSR_REG(name)\
(offsetof(struct kvm_riscv_csr, name) / sizeof(unsigned long))
 
+/* F extension registers are mapped as type4 */
+#define KVM_REG_RISCV_FP_F (0x04 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_FP_F_REG(name)   \
+   (offsetof(struct __riscv_f_ext_state, name) / sizeof(u32))
+
+/* D extension registers are mapped as type 5 */
+#define KVM_REG_RISCV_FP_D (0x05 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_FP_D_REG(name)   \
+   (offsetof(struct __riscv_d_ext_state, name) / sizeof(u64))
+
 #endif
 
 #endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 67f9dd66f2db..8f2b058a4714 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -431,6 +431,98 @@ static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu 
*vcpu,
return 0;
 }
 
+static int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu,
+const struct kvm_one_reg *reg,
+unsigned long rtype)
+{
+   struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+   unsigned long isa = vcpu->arch.isa;
+   unsigned long __user *uaddr =
+   (unsigned long __user *)(unsigned long)reg->addr;
+   unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+   KVM_REG_SIZE_MASK |
+   rtype);
+   void *reg_val;
+
+   if ((rtype == KVM_REG_RISCV_FP_F) &&
+   riscv_isa_extension_available(&isa, f)) {
+   if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+   return -EINVAL;
+   if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
+   reg_val = &cntx->fp.f.fcsr;
+   else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
+ reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+   reg_val = &cntx->fp.f.f[reg_num];
+   else
+   return -EINVAL;
+   } else if ((rtype == KVM_REG_RISCV_FP_D) &&
+  riscv_isa_extension_available(&isa, d)) {
+   if (reg_num == KVM_REG_RISCV_FP_D_REG(fcsr)) {
+   if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+   return -EINVAL;
+   reg_val = &cntx->fp.d.fcsr;
+   } else if ((KVM_REG_RISCV_FP_D_REG(f[0]) <= reg_num) &&
+  reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
+   if (KVM_REG_SIZE(reg->id) != sizeof(u64))
+   return -EINVAL;
+   reg_val = &cntx->fp.d.f[reg_num];
+   } else
+   return -EINVAL;
+   } else
+   return -EINVAL;
+
+   if (copy_to_user(uaddr, reg_val, KVM_REG_SIZE(reg->id)))
+   return -EFAULT;
+
+   return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu,
+const struct kvm_one_reg *reg,
+unsigned long rtype)
+{
+   struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+   unsigned long isa = vcpu->arch.isa;
+   unsigned long __user *uaddr =
+   (unsigned long __user *)(unsigned long)reg->addr;
+   unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+   KVM_REG_SIZE_MASK |
+   rtype);
+   void *reg_val;
+
+   if ((rtype == KVM_REG_RISCV_FP_F) &&
+   riscv_isa_extension_available(&isa, f)) {
+   if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+   return -EINVAL;
+   if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
+   reg_val = &cntx->fp.f.fcsr;
+   else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
+ reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+   reg_val = &cntx->fp.f.f[reg_num];
+   else
+   return -EINVAL;
+   } else if ((rtype == KVM_REG_RISCV_FP_D) &&
+  riscv_isa_extension_available

[PATCH v8 14/19] RISC-V: KVM: FP lazy save/restore

2019-10-02 Thread Anup Patel
From: Atish Patra 

This patch adds floating point (F and D extension) context save/restore
for guest VCPUs. The FP context is saved and restored lazily, only when the
kernel enters/exits the in-kernel run loop and not during the KVM world
switch. This way FP save/restore has minimal impact on KVM performance.
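
The lazy part is essentially a check of the SSTATUS.FS field before
spilling anything. Below is a rough sketch of the idea (not the exact code
in this patch), using the save helpers declared here and assuming
SR_FS/SR_FS_DIRTY from asm/csr.h:

#include <asm/csr.h>
#include <asm/hwcap.h>
#include <asm/kvm_host.h>

/* Only spill guest FP state if the guest actually dirtied it. */
static void example_guest_fp_save(struct kvm_cpu_context *cntx,
                                  unsigned long isa)
{
        if ((cntx->sstatus & SR_FS) == SR_FS_DIRTY) {
                if (riscv_isa_extension_available(&isa, d))
                        __kvm_riscv_fp_d_save(cntx);
                else if (riscv_isa_extension_available(&isa, f))
                        __kvm_riscv_fp_f_save(cntx);
        }
}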

Signed-off-by: Atish Patra 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |   5 +
 arch/riscv/kernel/asm-offsets.c   |  72 +
 arch/riscv/kvm/vcpu.c |  81 ++
 arch/riscv/kvm/vcpu_switch.S  | 174 ++
 4 files changed, 332 insertions(+)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 9179ff019235..928c67828b1b 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -117,6 +117,7 @@ struct kvm_cpu_context {
unsigned long sepc;
unsigned long sstatus;
unsigned long hstatus;
+   union __riscv_fp_state fp;
 };
 
 struct kvm_vcpu_csr {
@@ -236,6 +237,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct 
kvm_run *run,
unsigned long scause, unsigned long stval);
 
 void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch);
+void __kvm_riscv_fp_f_save(struct kvm_cpu_context *context);
+void __kvm_riscv_fp_f_restore(struct kvm_cpu_context *context);
+void __kvm_riscv_fp_d_save(struct kvm_cpu_context *context);
+void __kvm_riscv_fp_d_restore(struct kvm_cpu_context *context);
 
 int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
 int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 711656710190..9980069a1acf 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -185,6 +185,78 @@ void asm_offsets(void)
OFFSET(KVM_ARCH_HOST_SSCRATCH, kvm_vcpu_arch, host_sscratch);
OFFSET(KVM_ARCH_HOST_STVEC, kvm_vcpu_arch, host_stvec);
 
+   /* F extension */
+
+   OFFSET(KVM_ARCH_FP_F_F0, kvm_cpu_context, fp.f.f[0]);
+   OFFSET(KVM_ARCH_FP_F_F1, kvm_cpu_context, fp.f.f[1]);
+   OFFSET(KVM_ARCH_FP_F_F2, kvm_cpu_context, fp.f.f[2]);
+   OFFSET(KVM_ARCH_FP_F_F3, kvm_cpu_context, fp.f.f[3]);
+   OFFSET(KVM_ARCH_FP_F_F4, kvm_cpu_context, fp.f.f[4]);
+   OFFSET(KVM_ARCH_FP_F_F5, kvm_cpu_context, fp.f.f[5]);
+   OFFSET(KVM_ARCH_FP_F_F6, kvm_cpu_context, fp.f.f[6]);
+   OFFSET(KVM_ARCH_FP_F_F7, kvm_cpu_context, fp.f.f[7]);
+   OFFSET(KVM_ARCH_FP_F_F8, kvm_cpu_context, fp.f.f[8]);
+   OFFSET(KVM_ARCH_FP_F_F9, kvm_cpu_context, fp.f.f[9]);
+   OFFSET(KVM_ARCH_FP_F_F10, kvm_cpu_context, fp.f.f[10]);
+   OFFSET(KVM_ARCH_FP_F_F11, kvm_cpu_context, fp.f.f[11]);
+   OFFSET(KVM_ARCH_FP_F_F12, kvm_cpu_context, fp.f.f[12]);
+   OFFSET(KVM_ARCH_FP_F_F13, kvm_cpu_context, fp.f.f[13]);
+   OFFSET(KVM_ARCH_FP_F_F14, kvm_cpu_context, fp.f.f[14]);
+   OFFSET(KVM_ARCH_FP_F_F15, kvm_cpu_context, fp.f.f[15]);
+   OFFSET(KVM_ARCH_FP_F_F16, kvm_cpu_context, fp.f.f[16]);
+   OFFSET(KVM_ARCH_FP_F_F17, kvm_cpu_context, fp.f.f[17]);
+   OFFSET(KVM_ARCH_FP_F_F18, kvm_cpu_context, fp.f.f[18]);
+   OFFSET(KVM_ARCH_FP_F_F19, kvm_cpu_context, fp.f.f[19]);
+   OFFSET(KVM_ARCH_FP_F_F20, kvm_cpu_context, fp.f.f[20]);
+   OFFSET(KVM_ARCH_FP_F_F21, kvm_cpu_context, fp.f.f[21]);
+   OFFSET(KVM_ARCH_FP_F_F22, kvm_cpu_context, fp.f.f[22]);
+   OFFSET(KVM_ARCH_FP_F_F23, kvm_cpu_context, fp.f.f[23]);
+   OFFSET(KVM_ARCH_FP_F_F24, kvm_cpu_context, fp.f.f[24]);
+   OFFSET(KVM_ARCH_FP_F_F25, kvm_cpu_context, fp.f.f[25]);
+   OFFSET(KVM_ARCH_FP_F_F26, kvm_cpu_context, fp.f.f[26]);
+   OFFSET(KVM_ARCH_FP_F_F27, kvm_cpu_context, fp.f.f[27]);
+   OFFSET(KVM_ARCH_FP_F_F28, kvm_cpu_context, fp.f.f[28]);
+   OFFSET(KVM_ARCH_FP_F_F29, kvm_cpu_context, fp.f.f[29]);
+   OFFSET(KVM_ARCH_FP_F_F30, kvm_cpu_context, fp.f.f[30]);
+   OFFSET(KVM_ARCH_FP_F_F31, kvm_cpu_context, fp.f.f[31]);
+   OFFSET(KVM_ARCH_FP_F_FCSR, kvm_cpu_context, fp.f.fcsr);
+
+   /* D extension */
+
+   OFFSET(KVM_ARCH_FP_D_F0, kvm_cpu_context, fp.d.f[0]);
+   OFFSET(KVM_ARCH_FP_D_F1, kvm_cpu_context, fp.d.f[1]);
+   OFFSET(KVM_ARCH_FP_D_F2, kvm_cpu_context, fp.d.f[2]);
+   OFFSET(KVM_ARCH_FP_D_F3, kvm_cpu_context, fp.d.f[3]);
+   OFFSET(KVM_ARCH_FP_D_F4, kvm_cpu_context, fp.d.f[4]);
+   OFFSET(KVM_ARCH_FP_D_F5, kvm_cpu_context, fp.d.f[5]);
+   OFFSET(KVM_ARCH_FP_D_F6, kvm_cpu_context, fp.d.f[6]);
+   OFFSET(KVM_ARCH_FP_D_F7, kvm_cpu_context, fp.d.f[7]);
+   OFFSET(KVM_ARCH_FP_D_F8, kvm_cpu_context, fp.d.f[8]);
+   OFFSET(KVM_ARCH_FP_D_F9, kvm_cpu_context, fp.d.f[9]);
+   OFFSET(KVM_ARCH_FP_D_F10, kvm_cpu_context, fp.d.f[10]);
+   OFFSET(KVM_ARCH_FP_D_F11, kvm_cpu_co

[PATCH v8 16/19] RISC-V: KVM: Add SBI v0.1 support

2019-10-02 Thread Anup Patel
From: Atish Patra 

The KVM host kernel running in HS-mode needs to handle SBI calls coming
from the guest kernel running in VS-mode.

This patch adds SBI v0.1 support in KVM RISC-V. All the SBI calls are
implemented except the remote TLB flushes, which are currently handled as
a full TLB flush and will be optimized in the future.
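
For context, a legacy SBI v0.1 call from the guest is just an ecall from
VS-mode with the function id in a7 and arguments in a0/a1, which is what
kvm_riscv_vcpu_sbi_ecall() below decodes from the trapped guest context.
A guest-side sketch of the single-argument form (e.g.
example_sbi_call(SBI_SET_TIMER, next_cycle) on RV64):

/* Guest-side sketch of a legacy SBI v0.1 call; it traps to the HS-mode host. */
static inline long example_sbi_call(unsigned long fid, unsigned long arg0)
{
        register unsigned long a0 asm("a0") = arg0;
        register unsigned long a7 asm("a7") = fid;

        asm volatile ("ecall"
                      : "+r" (a0)
                      : "r" (a7)
                      : "memory");

        return a0;
}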

Signed-off-by: Atish Patra 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
---
 arch/riscv/include/asm/kvm_host.h |   2 +
 arch/riscv/kvm/Makefile   |   2 +-
 arch/riscv/kvm/vcpu_exit.c|   4 ++
 arch/riscv/kvm/vcpu_sbi.c | 106 ++
 4 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/kvm/vcpu_sbi.c

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 928c67828b1b..74ccd8d00ec5 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -250,4 +250,6 @@ bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
 
+int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
 #endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 3e0c7558320d..b56dc1650d2c 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 kvm-objs := $(common-objs-y)
 
 kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
-kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
 
 obj-$(CONFIG_KVM)  += kvm.o
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index 7507b859246b..0e9b0ffa169d 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -587,6 +587,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct 
kvm_run *run,
(vcpu->arch.guest_context.hstatus & HSTATUS_STL))
ret = stage2_page_fault(vcpu, run, scause, stval);
break;
+   case EXC_SUPERVISOR_SYSCALL:
+   if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
+   ret = kvm_riscv_vcpu_sbi_ecall(vcpu, run);
+   break;
default:
break;
};
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
new file mode 100644
index ..88fa0faa3545
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * Copyright (c) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SBI_VERSION_MAJOR  0
+#define SBI_VERSION_MINOR  1
+
+static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu,
+   struct kvm_run *run, u32 type)
+{
+   int i;
+   struct kvm_vcpu *tmp;
+
+   kvm_for_each_vcpu(i, tmp, vcpu->kvm)
+   tmp->arch.power_off = true;
+   kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
+
+   memset(&run->system_event, 0, sizeof(run->system_event));
+   run->system_event.type = type;
+   run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+}
+
+int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   int i, ret = 1;
+   u64 next_cycle;
+   struct kvm_vcpu *rvcpu;
+   bool next_sepc = true;
+   ulong hmask, ut_scause = 0;
+   struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+
+   if (!cp)
+   return -EINVAL;
+
+   switch (cp->a7) {
+   case SBI_SET_TIMER:
+#if __riscv_xlen == 32
+   next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
+#else
+   next_cycle = (u64)cp->a0;
+#endif
+   kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
+   break;
+   case SBI_CLEAR_IPI:
+   kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
+   break;
+   case SBI_SEND_IPI:
+   hmask = kvm_riscv_vcpu_unpriv_read(vcpu, false, cp->a0,
+  &ut_scause);
+   if (ut_scause) {
+   kvm_riscv_vcpu_trap_redirect(vcpu, ut_scause,
+cp->a0);
+   next_sepc = false;
+   } else {
+   for_each_set_bit(i, &hmask, BITS_PER_LONG) {
+   rvcpu = kvm_get_vcpu_by_id(vcpu->kvm, i);
+   kvm_riscv_vcpu_set_interrupt(rvcpu, IRQ_S_SOFT);
+   }
+   }
+   break;
+   case SBI_SHUTDOWN:
+   kvm_sbi_system_shutdown(vcpu, run, KVM_SYSTEM_EVENT_SHUTDOWN);
+   ret = 0;
+   break;
+   case

[PATCH v8 12/19] RISC-V: KVM: Implement MMU notifiers

2019-10-02 Thread Anup Patel
This patch implements MMU notifiers for KVM RISC-V so that the Guest
physical address space stays in sync with the Host physical address space.

This will allow swapping, page migration, etc. to work transparently
with KVM RISC-V.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |   7 ++
 arch/riscv/kvm/Kconfig|   1 +
 arch/riscv/kvm/mmu.c  | 200 +-
 arch/riscv/kvm/vm.c   |   1 +
 4 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index bc27f664b443..79ceb2aa8ae6 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -193,6 +193,13 @@ static inline void kvm_arch_vcpu_block_finish(struct 
kvm_vcpu *vcpu) {}
 int kvm_riscv_setup_vsip(void);
 void kvm_riscv_cleanup_vsip(void);
 
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+int kvm_unmap_hva_range(struct kvm *kvm,
+   unsigned long start, unsigned long end);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
+
 void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long vmid,
  unsigned long gpa);
 void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index 9cca98c4673b..d8fa13b0da18 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -20,6 +20,7 @@ if VIRTUALIZATION
 config KVM
tristate "Kernel-based Virtual Machine (KVM) support (EXPERIMENTAL)"
depends on OF
+   select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select ANON_INODES
select KVM_MMIO
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 590669290139..d8a692d3e640 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -67,6 +67,66 @@ static void *stage2_cache_alloc(struct kvm_mmu_page_cache 
*pcache)
return p;
 }
 
+static int stage2_pgdp_test_and_clear_young(pgd_t *pgd)
+{
+   return ptep_test_and_clear_young(NULL, 0, (pte_t *)pgd);
+}
+
+static int stage2_pmdp_test_and_clear_young(pmd_t *pmd)
+{
+   return ptep_test_and_clear_young(NULL, 0, (pte_t *)pmd);
+}
+
+static int stage2_ptep_test_and_clear_young(pte_t *pte)
+{
+   return ptep_test_and_clear_young(NULL, 0, pte);
+}
+
+static bool stage2_get_leaf_entry(struct kvm *kvm, gpa_t addr,
+ pgd_t **pgdpp, pmd_t **pmdpp, pte_t **ptepp)
+{
+   pgd_t *pgdp;
+   pmd_t *pmdp;
+   pte_t *ptep;
+
+   *pgdpp = NULL;
+   *pmdpp = NULL;
+   *ptepp = NULL;
+
+   pgdp = &kvm->arch.pgd[pgd_index(addr)];
+   if (!pgd_val(*pgdp))
+   return false;
+   if (pgd_val(*pgdp) & _PAGE_LEAF) {
+   *pgdpp = pgdp;
+   return true;
+   }
+
+   if (stage2_have_pmd) {
+   pmdp = (void *)pgd_page_vaddr(*pgdp);
+   pmdp = &pmdp[pmd_index(addr)];
+   if (!pmd_present(*pmdp))
+   return false;
+   if (pmd_val(*pmdp) & _PAGE_LEAF) {
+   *pmdpp = pmdp;
+   return true;
+   }
+
+   ptep = (void *)pmd_page_vaddr(*pmdp);
+   } else {
+   ptep = (void *)pgd_page_vaddr(*pgdp);
+   }
+
+   ptep = &ptep[pte_index(addr)];
+   if (!pte_present(*ptep))
+   return false;
+   if (pte_val(*ptep) & _PAGE_LEAF) {
+   *ptepp = ptep;
+   return true;
+   }
+
+   return false;
+}
+
 struct local_guest_tlb_info {
struct kvm_vmid *vmid;
gpa_t addr;
@@ -450,6 +510,38 @@ int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t 
hpa,
 
 }
 
+static int handle_hva_to_gpa(struct kvm *kvm,
+unsigned long start,
+unsigned long end,
+int (*handler)(struct kvm *kvm,
+   gpa_t gpa, u64 size,
+   void *data),
+void *data)
+{
+   struct kvm_memslots *slots;
+   struct kvm_memory_slot *memslot;
+   int ret = 0;
+
+   slots = kvm_memslots(kvm);
+
+   /* we only care about the pages that the guest sees */
+   kvm_for_each_memslot(memslot, slots) {
+   unsigned long hva_start, hva_end;
+   gfn_t gpa;
+
+   hva_start = max(start, memslot->userspace_addr);
+   hva_end = min(end, memslot->userspace_addr +
+   (memslot->npages << PAGE_SHIFT));
+   if (hva_start >= hva_end)
+   continue;
+
+   gpa = hva_to_gfn_memslot(hva_start, memslot) << PAG

[PATCH v8 06/19] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls

2019-10-02 Thread Anup Patel
For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
VCPU config and registers from user-space.

We have three types of VCPU registers:
1. CONFIG - these are VCPU config and capabilities
2. CORE   - these are VCPU general purpose registers
3. CSR- these are VCPU control and status registers

The CONFIG registers available to user-space are ISA and TIMEBASE. Out
of these, TIMEBASE is a read-only register which informs user-space about
the VCPU timer base frequency. The ISA register is a read-write register;
user-space can write the desired VCPU ISA capabilities only before
running the VCPU.

The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
PC and MODE. The PC register represents the program counter whereas the
MODE register represents the VCPU privilege mode (i.e. S/U-mode).

The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.

In future, more VCPU register types will be added (such as FP) for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
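
From user-space, each register is addressed by composing an id from the
blocks defined below. A hedged sketch of reading the guest PC on an RV64
host (the vcpu fd from KVM_CREATE_VCPU and the KVM_REG_RISCV arch prefix
from <linux/kvm.h> are assumed):

#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Read the guest program counter via the CORE register block (RV64 host). */
static int example_read_guest_pc(int vcpu_fd, uint64_t *pc)
{
        struct kvm_one_reg reg = {
                .id   = KVM_REG_RISCV | KVM_REG_SIZE_U64 |
                        KVM_REG_RISCV_CORE |
                        KVM_REG_RISCV_CORE_REG(regs.pc),
                .addr = (uint64_t)(unsigned long)pc,
        };

        return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}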

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
---
 arch/riscv/include/uapi/asm/kvm.h |  53 ++-
 arch/riscv/kvm/vcpu.c | 239 +-
 2 files changed, 289 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/uapi/asm/kvm.h 
b/arch/riscv/include/uapi/asm/kvm.h
index 6dbc056d58ba..997b85f6fded 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -41,10 +41,61 @@ struct kvm_guest_debug_arch {
 struct kvm_sync_regs {
 };
 
-/* dummy definition */
+/* for KVM_GET_SREGS and KVM_SET_SREGS */
 struct kvm_sregs {
 };
 
+/* CONFIG registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_config {
+   unsigned long isa;
+   unsigned long tbfreq;
+};
+
+/* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_core {
+   struct user_regs_struct regs;
+   unsigned long mode;
+};
+
+/* Possible privilege modes for kvm_riscv_core */
+#define KVM_RISCV_MODE_S   1
+#define KVM_RISCV_MODE_U   0
+
+/* CSR registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_csr {
+   unsigned long sstatus;
+   unsigned long sie;
+   unsigned long stvec;
+   unsigned long sscratch;
+   unsigned long sepc;
+   unsigned long scause;
+   unsigned long stval;
+   unsigned long sip;
+   unsigned long satp;
+};
+
+#define KVM_REG_SIZE(id)   \
+   (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
+
+/* If you need to interpret the index values, here is the key: */
+#define KVM_REG_RISCV_TYPE_MASK0xFF00
+#define KVM_REG_RISCV_TYPE_SHIFT   24
+
+/* Config registers are mapped as type 1 */
+#define KVM_REG_RISCV_CONFIG   (0x01 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CONFIG_REG(name) \
+   (offsetof(struct kvm_riscv_config, name) / sizeof(unsigned long))
+
+/* Core registers are mapped as type 2 */
+#define KVM_REG_RISCV_CORE (0x02 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CORE_REG(name)   \
+   (offsetof(struct kvm_riscv_core, name) / sizeof(unsigned long))
+
+/* Control and status registers are mapped as type 3 */
+#define KVM_REG_RISCV_CSR  (0x03 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CSR_REG(name)\
+   (offsetof(struct kvm_riscv_csr, name) / sizeof(unsigned long))
+
 #endif
 
 #endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 3223f723f79e..c9faca14f8cd 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -165,6 +165,219 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, 
struct vm_fault *vmf)
return VM_FAULT_SIGBUS;
 }
 
+static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
+const struct kvm_one_reg *reg)
+{
+   unsigned long __user *uaddr =
+   (unsigned long __user *)(unsigned long)reg->addr;
+   unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+   KVM_REG_SIZE_MASK |
+   KVM_REG_RISCV_CONFIG);
+   unsigned long reg_val;
+
+   if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+   return -EINVAL;
+
+   switch (reg_num) {
+   case KVM_REG_RISCV_CONFIG_REG(isa):
+   reg_val = vcpu->arch.isa;
+   break;
+   case KVM_REG_RISCV_CONFIG_REG(tbfreq):
+   reg_val = riscv_timebase;
+   break;
+   default:
+   return -EINVAL;
+   };
+
+   if (copy_to_user(uaddr, ®_val, KVM_REG_SIZE(reg->id)))
+   return -EFAULT;
+
+   return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_config(struct kvm_vc

[PATCH v8 08/19] RISC-V: KVM: Handle MMIO exits for VCPU

2019-10-02 Thread Anup Patel
We will get stage2 page faults whenever the Guest/VM accesses a SW-emulated
MMIO device or unmapped Guest RAM.

This patch implements MMIO read/write emulation by extracting MMIO
details from the trapped load/store instruction and forwarding the
MMIO read/write to user-space. The actual MMIO emulation happens in
user-space and the KVM kernel module only takes care of register
updates before resuming the trapped VCPU.

The handling of stage2 page faults for unmapped Guest RAM will be
implemented in a separate patch later.
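
On the user-space side this surfaces as a regular KVM_EXIT_MMIO exit on
the mmap'ed kvm_run area. A hedged sketch of the corresponding run-loop
branch; handle_mmio_device() is a hypothetical hook standing in for
whatever device model (e.g. the user-space PLIC) is being emulated:

#include <stdint.h>
#include <linux/kvm.h>

/* Hypothetical user-space device-model hook. */
void handle_mmio_device(uint64_t addr, uint8_t *data, uint32_t len, int is_write);

static void example_handle_exit(struct kvm_run *run)
{
        switch (run->exit_reason) {
        case KVM_EXIT_MMIO:
                /* Emulate the access, then re-enter the guest with KVM_RUN. */
                handle_mmio_device(run->mmio.phys_addr, run->mmio.data,
                                   run->mmio.len, run->mmio.is_write);
                break;
        default:
                break;
        }
}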

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |  20 ++
 arch/riscv/kvm/mmu.c  |   7 +
 arch/riscv/kvm/vcpu_exit.c| 505 +-
 arch/riscv/kvm/vcpu_switch.S  |  14 +
 4 files changed, 543 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 18f1097f1d8d..2a5209fff68d 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -53,6 +53,13 @@ struct kvm_arch {
phys_addr_t pgd_phys;
 };
 
+struct kvm_mmio_decode {
+   unsigned long insn;
+   int len;
+   int shift;
+   int return_handled;
+};
+
 struct kvm_cpu_context {
unsigned long zero;
unsigned long ra;
@@ -141,6 +148,9 @@ struct kvm_vcpu_arch {
unsigned long irqs_pending;
unsigned long irqs_pending_mask;
 
+   /* MMIO instruction details */
+   struct kvm_mmio_decode mmio_decode;
+
/* VCPU power-off state */
bool power_off;
 
@@ -160,11 +170,21 @@ static inline void kvm_arch_vcpu_block_finish(struct 
kvm_vcpu *vcpu) {}
 int kvm_riscv_setup_vsip(void);
 void kvm_riscv_cleanup_vsip(void);
 
+int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
+bool is_write);
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
 int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
 
+void __kvm_riscv_unpriv_trap(void);
+
+unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu,
+bool read_insn,
+unsigned long guest_addr,
+unsigned long *trap_scause);
+void kvm_riscv_vcpu_trap_redirect(struct kvm_vcpu *vcpu,
+ unsigned long scause, unsigned long stval);
 int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long scause, unsigned long stval);
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 04dd089b86ff..2b965f9aac07 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -61,6 +61,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return 0;
 }
 
+int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
+bool is_write)
+{
+   /* TODO: */
+   return 0;
+}
+
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
 {
/* TODO: */
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index e4d7c8f0807a..f1378c0a447f 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -6,9 +6,430 @@
  * Anup Patel 
  */
 
+#include 
 #include 
 #include 
 #include 
+#include 
+
+#define INSN_MATCH_LB  0x3
+#define INSN_MASK_LB   0x707f
+#define INSN_MATCH_LH  0x1003
+#define INSN_MASK_LH   0x707f
+#define INSN_MATCH_LW  0x2003
+#define INSN_MASK_LW   0x707f
+#define INSN_MATCH_LD  0x3003
+#define INSN_MASK_LD   0x707f
+#define INSN_MATCH_LBU 0x4003
+#define INSN_MASK_LBU  0x707f
+#define INSN_MATCH_LHU 0x5003
+#define INSN_MASK_LHU  0x707f
+#define INSN_MATCH_LWU 0x6003
+#define INSN_MASK_LWU  0x707f
+#define INSN_MATCH_SB  0x23
+#define INSN_MASK_SB   0x707f
+#define INSN_MATCH_SH  0x1023
+#define INSN_MASK_SH   0x707f
+#define INSN_MATCH_SW  0x2023
+#define INSN_MASK_SW   0x707f
+#define INSN_MATCH_SD  0x3023
+#define INSN_MASK_SD   0x707f
+
+#define INSN_MATCH_C_LD0x6000
+#define INSN_MASK_C_LD 0xe003
+#define INSN_MATCH_C_SD0xe000
+#define INSN_MASK_C_SD 0xe003
+#define INSN_MATCH_C_LW0x4000
+#define INSN_MASK_C_LW 0xe003
+#define INSN_MATCH_C_SW0xc000
+#define INSN_MASK_C_SW 0xe003
+#define INSN_MATCH_C_LDSP  0x6002
+#define INSN_MASK_C_LDSP   0xe003
+#define INSN_MATCH_C_SDSP  0xe002
+#define INSN_MASK_C_SDSP   0xe003
+#define INSN_MATCH_C_LWSP  0x4002
+#define INSN_MASK_C_LWSP   0xe003
+#

[PATCH v8 07/19] RISC-V: KVM: Implement VCPU world-switch

2019-10-02 Thread Anup Patel
This patch implements the VCPU world-switch for KVM RISC-V.

The KVM RISC-V world-switch (i.e. __kvm_riscv_switch_to()) mostly
switches the general purpose registers and the SSTATUS, STVEC, SSCRATCH
and HSTATUS CSRs. Other CSRs are switched via the vcpu_load() and
vcpu_put() interface, i.e. in the kvm_arch_vcpu_load() and
kvm_arch_vcpu_put() functions respectively.
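
For the CSRs that are not touched by the low-level world-switch, the
vcpu_load() side essentially restores the guest's shadow values into the
VS-level CSRs. A hedged sketch of that idea, using the CSR_VS* defines and
struct kvm_vcpu_csr from patches 02 and 04 of this series (not the literal
code of this patch):

#include <asm/csr.h>
#include <asm/kvm_host.h>

static void example_vcpu_load_csrs(struct kvm_vcpu *vcpu)
{
        struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;

        csr_write(CSR_VSSTATUS, csr->vsstatus);
        csr_write(CSR_VSIE, csr->vsie);
        csr_write(CSR_VSTVEC, csr->vstvec);
        csr_write(CSR_VSSCRATCH, csr->vsscratch);
        csr_write(CSR_VSEPC, csr->vsepc);
        csr_write(CSR_VSCAUSE, csr->vscause);
        csr_write(CSR_VSTVAL, csr->vstval);
        csr_write(CSR_VSATP, csr->vsatp);
}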

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |   9 +-
 arch/riscv/kernel/asm-offsets.c   |  76 
 arch/riscv/kvm/Makefile   |   2 +-
 arch/riscv/kvm/vcpu.c |  32 -
 arch/riscv/kvm/vcpu_switch.S  | 194 ++
 5 files changed, 309 insertions(+), 4 deletions(-)
 create mode 100644 arch/riscv/kvm/vcpu_switch.S

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index d801216da6d0..18f1097f1d8d 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -110,6 +110,13 @@ struct kvm_vcpu_arch {
/* ISA feature bits (similar to MISA) */
unsigned long isa;
 
+   /* SSCRATCH and STVEC of Host */
+   unsigned long host_sscratch;
+   unsigned long host_stvec;
+
+   /* CPU context of Host */
+   struct kvm_cpu_context host_context;
+
/* CPU context of Guest VCPU */
struct kvm_cpu_context guest_context;
 
@@ -162,7 +169,7 @@ int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, 
struct kvm_run *run);
 int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long scause, unsigned long stval);
 
-static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
+void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch);
 
 int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
 int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 9f5628c38ac9..711656710190 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -7,7 +7,9 @@
 #define GENERATING_ASM_OFFSETS
 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 
@@ -109,6 +111,80 @@ void asm_offsets(void)
OFFSET(PT_SBADADDR, pt_regs, sbadaddr);
OFFSET(PT_SCAUSE, pt_regs, scause);
 
+   OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
+   OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
+   OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
+   OFFSET(KVM_ARCH_GUEST_GP, kvm_vcpu_arch, guest_context.gp);
+   OFFSET(KVM_ARCH_GUEST_TP, kvm_vcpu_arch, guest_context.tp);
+   OFFSET(KVM_ARCH_GUEST_T0, kvm_vcpu_arch, guest_context.t0);
+   OFFSET(KVM_ARCH_GUEST_T1, kvm_vcpu_arch, guest_context.t1);
+   OFFSET(KVM_ARCH_GUEST_T2, kvm_vcpu_arch, guest_context.t2);
+   OFFSET(KVM_ARCH_GUEST_S0, kvm_vcpu_arch, guest_context.s0);
+   OFFSET(KVM_ARCH_GUEST_S1, kvm_vcpu_arch, guest_context.s1);
+   OFFSET(KVM_ARCH_GUEST_A0, kvm_vcpu_arch, guest_context.a0);
+   OFFSET(KVM_ARCH_GUEST_A1, kvm_vcpu_arch, guest_context.a1);
+   OFFSET(KVM_ARCH_GUEST_A2, kvm_vcpu_arch, guest_context.a2);
+   OFFSET(KVM_ARCH_GUEST_A3, kvm_vcpu_arch, guest_context.a3);
+   OFFSET(KVM_ARCH_GUEST_A4, kvm_vcpu_arch, guest_context.a4);
+   OFFSET(KVM_ARCH_GUEST_A5, kvm_vcpu_arch, guest_context.a5);
+   OFFSET(KVM_ARCH_GUEST_A6, kvm_vcpu_arch, guest_context.a6);
+   OFFSET(KVM_ARCH_GUEST_A7, kvm_vcpu_arch, guest_context.a7);
+   OFFSET(KVM_ARCH_GUEST_S2, kvm_vcpu_arch, guest_context.s2);
+   OFFSET(KVM_ARCH_GUEST_S3, kvm_vcpu_arch, guest_context.s3);
+   OFFSET(KVM_ARCH_GUEST_S4, kvm_vcpu_arch, guest_context.s4);
+   OFFSET(KVM_ARCH_GUEST_S5, kvm_vcpu_arch, guest_context.s5);
+   OFFSET(KVM_ARCH_GUEST_S6, kvm_vcpu_arch, guest_context.s6);
+   OFFSET(KVM_ARCH_GUEST_S7, kvm_vcpu_arch, guest_context.s7);
+   OFFSET(KVM_ARCH_GUEST_S8, kvm_vcpu_arch, guest_context.s8);
+   OFFSET(KVM_ARCH_GUEST_S9, kvm_vcpu_arch, guest_context.s9);
+   OFFSET(KVM_ARCH_GUEST_S10, kvm_vcpu_arch, guest_context.s10);
+   OFFSET(KVM_ARCH_GUEST_S11, kvm_vcpu_arch, guest_context.s11);
+   OFFSET(KVM_ARCH_GUEST_T3, kvm_vcpu_arch, guest_context.t3);
+   OFFSET(KVM_ARCH_GUEST_T4, kvm_vcpu_arch, guest_context.t4);
+   OFFSET(KVM_ARCH_GUEST_T5, kvm_vcpu_arch, guest_context.t5);
+   OFFSET(KVM_ARCH_GUEST_T6, kvm_vcpu_arch, guest_context.t6);
+   OFFSET(KVM_ARCH_GUEST_SEPC, kvm_vcpu_arch, guest_context.sepc);
+   OFFSET(KVM_ARCH_GUEST_SSTATUS, kvm_vcpu_arch, guest_context.sstatus);
+   OFFSET(KVM_ARCH_GUEST_HSTATUS, kvm_vcpu_arch, guest_context.hstatus);
+
+   OFFSET(KVM_ARCH_HOST_ZERO, kvm_vcpu_arch, host_context.zero);
+   OFFSET(KVM_ARCH_HOST_RA, kvm_vcpu_arch, host_context.ra);
+   OFFSET(KVM_ARCH_HOST_SP, kvm_

[PATCH v8 09/19] RISC-V: KVM: Handle WFI exits for VCPU

2019-10-02 Thread Anup Patel
We get an illegal instruction trap whenever the Guest/VM executes the WFI
instruction.

This patch handles the WFI trap by blocking the trapped VCPU using the
kvm_vcpu_block() API. The blocked VCPU will be automatically
resumed whenever a VCPU interrupt is injected from user-space
or from the in-kernel IRQCHIP emulation.
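
In other words, a guest idle loop built around a plain WFI, as in the
sketch below, no longer spins a host CPU: the instruction traps, the VCPU
blocks in kvm_vcpu_block(), and it is woken when an interrupt is injected.

/* Guest-side sketch: this WFI now traps to KVM and blocks the VCPU. */
static inline void example_guest_idle_wait(void)
{
        asm volatile ("wfi" ::: "memory");
}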

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
---
 arch/riscv/kvm/vcpu_exit.c | 72 ++
 1 file changed, 72 insertions(+)

diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index f1378c0a447f..7507b859246b 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -12,6 +12,13 @@
 #include 
 #include 
 
+#define INSN_OPCODE_MASK   0x007c
+#define INSN_OPCODE_SHIFT  2
+#define INSN_OPCODE_SYSTEM 28
+
+#define INSN_MASK_WFI  0xff00
+#define INSN_MATCH_WFI 0x1050
+
 #define INSN_MATCH_LB  0x3
 #define INSN_MASK_LB   0x707f
 #define INSN_MATCH_LH  0x1003
@@ -116,6 +123,67 @@
 (s32)(((insn) >> 7) & 0x1f))
 #define MASK_FUNCT30x7000
 
+static int truly_illegal_insn(struct kvm_vcpu *vcpu,
+ struct kvm_run *run,
+ ulong insn)
+{
+   /* Redirect trap to Guest VCPU */
+   kvm_riscv_vcpu_trap_redirect(vcpu, EXC_INST_ILLEGAL, insn);
+
+   return 1;
+}
+
+static int system_opcode_insn(struct kvm_vcpu *vcpu,
+ struct kvm_run *run,
+ ulong insn)
+{
+   if ((insn & INSN_MASK_WFI) == INSN_MATCH_WFI) {
+   vcpu->stat.wfi_exit_stat++;
+   if (!kvm_arch_vcpu_runnable(vcpu)) {
+   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
+   kvm_vcpu_block(vcpu);
+   vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+   kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+   }
+   vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+   return 1;
+   }
+
+   return truly_illegal_insn(vcpu, run, insn);
+}
+
+static int illegal_inst_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ unsigned long insn)
+{
+   unsigned long ut_scause = 0;
+   struct kvm_cpu_context *ct;
+
+   if (unlikely(INSN_IS_16BIT(insn))) {
+   if (insn == 0) {
+   ct = &vcpu->arch.guest_context;
+   insn = kvm_riscv_vcpu_unpriv_read(vcpu, true,
+ ct->sepc,
+ &ut_scause);
+   if (ut_scause) {
+   if (ut_scause == EXC_LOAD_PAGE_FAULT)
+   ut_scause = EXC_INST_PAGE_FAULT;
+   kvm_riscv_vcpu_trap_redirect(vcpu, ut_scause,
+ct->sepc);
+   return 1;
+   }
+   }
+   if (INSN_IS_16BIT(insn))
+   return truly_illegal_insn(vcpu, run, insn);
+   }
+
+   switch ((insn & INSN_OPCODE_MASK) >> INSN_OPCODE_SHIFT) {
+   case INSN_OPCODE_SYSTEM:
+   return system_opcode_insn(vcpu, run, insn);
+   default:
+   return truly_illegal_insn(vcpu, run, insn);
+   }
+}
+
 static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long fault_addr)
 {
@@ -508,6 +576,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct 
kvm_run *run,
ret = -EFAULT;
run->exit_reason = KVM_EXIT_UNKNOWN;
switch (scause) {
+   case EXC_INST_ILLEGAL:
+   if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
+   ret = illegal_inst_fault(vcpu, run, stval);
+   break;
case EXC_INST_PAGE_FAULT:
case EXC_LOAD_PAGE_FAULT:
case EXC_STORE_PAGE_FAULT:
-- 
2.17.1



[PATCH v8 01/19] RISC-V: Add bitmap representing ISA features common across CPUs

2019-10-02 Thread Anup Patel
This patch adds a riscv_isa bitmap which represents the Host ISA features
common across all Host CPUs. The riscv_isa bitmap is not the same as
elf_hwcap because elf_hwcap will only have ISA features relevant for
user-space apps whereas riscv_isa will have ISA features relevant to both
the kernel and user-space apps.

One use-case for the riscv_isa bitmap is the KVM hypervisor, where we will
use it for the following operations (a short usage sketch follows the
list):

1. Check whether hypervisor extension is available
2. Find ISA features that need to be virtualized (e.g. floating
   point support, vector extension, etc.)
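
The sketch below shows how a consumer like KVM would use the new helpers
for both operations; it mirrors the checks that appear later in this
series rather than adding anything new.

#include <linux/errno.h>
#include <asm/hwcap.h>

/* Probe the hypervisor extension and compute a guest-visible ISA mask. */
static int example_probe_host_isa(unsigned long *guest_isa)
{
        if (!riscv_isa_extension_available(NULL, h))
                return -ENODEV;         /* no H-extension on this host */

        *guest_isa = riscv_isa_extension_base(NULL) &
                     (riscv_isa_extension_mask(i) |
                      riscv_isa_extension_mask(m) |
                      riscv_isa_extension_mask(a) |
                      riscv_isa_extension_mask(c));

        return 0;
}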

Signed-off-by: Anup Patel 
Signed-off-by: Atish Patra 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/hwcap.h | 22 +
 arch/riscv/kernel/cpufeature.c | 83 --
 2 files changed, 102 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 7ecb7c6a57b1..5989dd4426d1 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -8,6 +8,7 @@
 #ifndef __ASM_HWCAP_H
 #define __ASM_HWCAP_H
 
+#include 
 #include 
 
 #ifndef __ASSEMBLY__
@@ -22,5 +23,26 @@ enum {
 };
 
 extern unsigned long elf_hwcap;
+
+#define RISCV_ISA_EXT_a('a' - 'a')
+#define RISCV_ISA_EXT_c('c' - 'a')
+#define RISCV_ISA_EXT_d('d' - 'a')
+#define RISCV_ISA_EXT_f('f' - 'a')
+#define RISCV_ISA_EXT_h('h' - 'a')
+#define RISCV_ISA_EXT_i('i' - 'a')
+#define RISCV_ISA_EXT_m('m' - 'a')
+#define RISCV_ISA_EXT_s('s' - 'a')
+#define RISCV_ISA_EXT_u('u' - 'a')
+
+#define RISCV_ISA_EXT_MAX  256
+
+unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap);
+
+#define riscv_isa_extension_mask(ext) BIT_MASK(RISCV_ISA_EXT_##ext)
+
+bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit);
+#define riscv_isa_extension_available(isa_bitmap, ext) \
+   __riscv_isa_extension_available(isa_bitmap, RISCV_ISA_EXT_##ext)
+
 #endif
 #endif
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index b1ade9a49347..941aeb33f85b 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -6,21 +6,64 @@
  * Copyright (C) 2017 SiFive
  */
 
+#include 
 #include 
 #include 
 #include 
 #include 
 
 unsigned long elf_hwcap __read_mostly;
+
+/* Host ISA bitmap */
+static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
+
 #ifdef CONFIG_FPU
 bool has_fpu __read_mostly;
 #endif
 
+/**
+ * riscv_isa_extension_base() - Get base extension word
+ *
+ * @isa_bitmap: ISA bitmap to use
+ * Return: base extension word as unsigned long value
+ *
+ * NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used.
+ */
+unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap)
+{
+   if (!isa_bitmap)
+   return riscv_isa[0];
+   return isa_bitmap[0];
+}
+EXPORT_SYMBOL_GPL(riscv_isa_extension_base);
+
+/**
+ * __riscv_isa_extension_available() - Check whether given extension
+ * is available or not
+ *
+ * @isa_bitmap: ISA bitmap to use
+ * @bit: bit position of the desired extension
+ * Return: true or false
+ *
+ * NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used.
+ */
+bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit)
+{
+   const unsigned long *bmap = (isa_bitmap) ? isa_bitmap : riscv_isa;
+
+   if (bit >= RISCV_ISA_EXT_MAX)
+   return false;
+
+   return test_bit(bit, bmap) ? true : false;
+}
+EXPORT_SYMBOL_GPL(__riscv_isa_extension_available);
+
 void riscv_fill_hwcap(void)
 {
struct device_node *node;
const char *isa;
-   size_t i;
+   char print_str[BITS_PER_LONG+1];
+   size_t i, j, isa_len;
static unsigned long isa2hwcap[256] = {0};
 
isa2hwcap['i'] = isa2hwcap['I'] = COMPAT_HWCAP_ISA_I;
@@ -32,8 +75,11 @@ void riscv_fill_hwcap(void)
 
elf_hwcap = 0;
 
+   bitmap_zero(riscv_isa, RISCV_ISA_EXT_MAX);
+
for_each_of_cpu_node(node) {
unsigned long this_hwcap = 0;
+   unsigned long this_isa = 0;
 
if (riscv_of_processor_hartid(node) < 0)
continue;
@@ -43,8 +89,24 @@ void riscv_fill_hwcap(void)
continue;
}
 
-   for (i = 0; i < strlen(isa); ++i)
+   i = 0;
+   isa_len = strlen(isa);
+#if defined(CONFIG_32BIT)
+   if (!strncmp(isa, "rv32", 4))
+   i += 4;
+#elif defined(CONFIG_64BIT)
+   if (!strncmp(isa, "rv64", 4))
+   i += 4;
+#endif
+   for (; i < isa_len; ++i) {
this_hwcap |= isa2hwcap[(unsigned char)(isa[i])];
+   /*
+* TODO: X, Y and Z extension parsing for Host ISA
+* bitmap will be added in

[PATCH v8 02/19] RISC-V: Add hypervisor extension related CSR defines

2019-10-02 Thread Anup Patel
This patch extends asm/csr.h by adding RISC-V hypervisor extension
related defines.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/csr.h | 58 
 1 file changed, 58 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index a18923fa23c8..059c5cb22aaf 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -27,6 +27,8 @@
 #define SR_XS_CLEAN_AC(0x0001, UL)
 #define SR_XS_DIRTY_AC(0x00018000, UL)
 
+#define SR_MXR _AC(0x0008, UL)
+
 #ifndef CONFIG_64BIT
 #define SR_SD  _AC(0x8000, UL) /* FS/XS dirty */
 #else
@@ -59,10 +61,13 @@
 
 #define EXC_INST_MISALIGNED0
 #define EXC_INST_ACCESS1
+#define EXC_INST_ILLEGAL   2
 #define EXC_BREAKPOINT 3
 #define EXC_LOAD_ACCESS5
 #define EXC_STORE_ACCESS   7
 #define EXC_SYSCALL8
+#define EXC_HYPERVISOR_SYSCALL 9
+#define EXC_SUPERVISOR_SYSCALL 10
 #define EXC_INST_PAGE_FAULT12
 #define EXC_LOAD_PAGE_FAULT13
 #define EXC_STORE_PAGE_FAULT   15
@@ -72,6 +77,43 @@
 #define SIE_STIE   (_AC(0x1, UL) << IRQ_S_TIMER)
 #define SIE_SEIE   (_AC(0x1, UL) << IRQ_S_EXT)
 
+/* HSTATUS flags */
+#define HSTATUS_VTSR   _AC(0x0040, UL)
+#define HSTATUS_VTVM   _AC(0x0010, UL)
+#define HSTATUS_SP2V   _AC(0x0200, UL)
+#define HSTATUS_SP2P   _AC(0x0100, UL)
+#define HSTATUS_SPV_AC(0x0080, UL)
+#define HSTATUS_STL_AC(0x0040, UL)
+#define HSTATUS_SPRV   _AC(0x0001, UL)
+
+/* HGATP flags */
+#define HGATP_MODE_OFF _AC(0, UL)
+#define HGATP_MODE_SV32X4  _AC(1, UL)
+#define HGATP_MODE_SV39X4  _AC(8, UL)
+#define HGATP_MODE_SV48X4  _AC(9, UL)
+
+#define HGATP32_MODE_SHIFT 31
+#define HGATP32_VMID_SHIFT 22
+#define HGATP32_VMID_MASK  _AC(0x1FC0, UL)
+#define HGATP32_PPN_AC(0x003F, UL)
+
+#define HGATP64_MODE_SHIFT 60
+#define HGATP64_VMID_SHIFT 44
+#define HGATP64_VMID_MASK  _AC(0x03FFF000, UL)
+#define HGATP64_PPN_AC(0x0FFF, UL)
+
+#ifdef CONFIG_64BIT
+#define HGATP_PPN  HGATP64_PPN
+#define HGATP_VMID_SHIFT   HGATP64_VMID_SHIFT
+#define HGATP_VMID_MASKHGATP64_VMID_MASK
+#define HGATP_MODE (HGATP_MODE_SV39X4 << HGATP64_MODE_SHIFT)
+#else
+#define HGATP_PPN  HGATP32_PPN
+#define HGATP_VMID_SHIFT   HGATP32_VMID_SHIFT
+#define HGATP_VMID_MASKHGATP32_VMID_MASK
+#define HGATP_MODE (HGATP_MODE_SV32X4 << HGATP32_MODE_SHIFT)
+#endif
+
 #define CSR_CYCLE  0xc00
 #define CSR_TIME   0xc01
 #define CSR_INSTRET0xc02
@@ -85,6 +127,22 @@
 #define CSR_STVAL  0x143
 #define CSR_SIP0x144
 #define CSR_SATP   0x180
+
+#define CSR_VSSTATUS   0x200
+#define CSR_VSIE   0x204
+#define CSR_VSTVEC 0x205
+#define CSR_VSSCRATCH  0x240
+#define CSR_VSEPC  0x241
+#define CSR_VSCAUSE0x242
+#define CSR_VSTVAL 0x243
+#define CSR_VSIP   0x244
+#define CSR_VSATP  0x280
+
+#define CSR_HSTATUS0x600
+#define CSR_HEDELEG0x602
+#define CSR_HIDELEG0x603
+#define CSR_HGATP  0x680
+
 #define CSR_CYCLEH 0xc80
 #define CSR_TIMEH  0xc81
 #define CSR_INSTRETH   0xc82
-- 
2.17.1



[PATCH v8 04/19] RISC-V: KVM: Implement VCPU create, init and destroy functions

2019-10-02 Thread Anup Patel
This patch implements VCPU create, init and destroy functions
required by the generic KVM module. We don't have many dynamic
resources in struct kvm_vcpu_arch, so these functions are quite
simple for KVM RISC-V.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h | 68 +++
 arch/riscv/kvm/vcpu.c | 68 +--
 2 files changed, 132 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 9459709656be..dab32c9c3470 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -53,7 +53,75 @@ struct kvm_arch {
phys_addr_t pgd_phys;
 };
 
+struct kvm_cpu_context {
+   unsigned long zero;
+   unsigned long ra;
+   unsigned long sp;
+   unsigned long gp;
+   unsigned long tp;
+   unsigned long t0;
+   unsigned long t1;
+   unsigned long t2;
+   unsigned long s0;
+   unsigned long s1;
+   unsigned long a0;
+   unsigned long a1;
+   unsigned long a2;
+   unsigned long a3;
+   unsigned long a4;
+   unsigned long a5;
+   unsigned long a6;
+   unsigned long a7;
+   unsigned long s2;
+   unsigned long s3;
+   unsigned long s4;
+   unsigned long s5;
+   unsigned long s6;
+   unsigned long s7;
+   unsigned long s8;
+   unsigned long s9;
+   unsigned long s10;
+   unsigned long s11;
+   unsigned long t3;
+   unsigned long t4;
+   unsigned long t5;
+   unsigned long t6;
+   unsigned long sepc;
+   unsigned long sstatus;
+   unsigned long hstatus;
+};
+
+struct kvm_vcpu_csr {
+   unsigned long vsstatus;
+   unsigned long vsie;
+   unsigned long vstvec;
+   unsigned long vsscratch;
+   unsigned long vsepc;
+   unsigned long vscause;
+   unsigned long vstval;
+   unsigned long vsip;
+   unsigned long vsatp;
+};
+
 struct kvm_vcpu_arch {
+   /* VCPU ran atleast once */
+   bool ran_atleast_once;
+
+   /* ISA feature bits (similar to MISA) */
+   unsigned long isa;
+
+   /* CPU context of Guest VCPU */
+   struct kvm_cpu_context guest_context;
+
+   /* CPU CSR context of Guest VCPU */
+   struct kvm_vcpu_csr guest_csr;
+
+   /* CPU context upon Guest VCPU reset */
+   struct kvm_cpu_context guest_reset_context;
+
+   /* CPU CSR context upon Guest VCPU reset */
+   struct kvm_vcpu_csr guest_reset_csr;
+
/* Don't run the VCPU (blocked) */
bool pause;
 
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 48536cb0c8e7..8272b05d6ce4 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -31,10 +31,48 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
 };
 
+#define KVM_RISCV_ISA_ALLOWED  (riscv_isa_extension_mask(a) | \
+riscv_isa_extension_mask(c) | \
+riscv_isa_extension_mask(d) | \
+riscv_isa_extension_mask(f) | \
+riscv_isa_extension_mask(i) | \
+riscv_isa_extension_mask(m) | \
+riscv_isa_extension_mask(s) | \
+riscv_isa_extension_mask(u))
+
+static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+   struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+   struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr;
+   struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+   struct kvm_cpu_context *reset_cntx = &vcpu->arch.guest_reset_context;
+
+   memcpy(csr, reset_csr, sizeof(*csr));
+
+   memcpy(cntx, reset_cntx, sizeof(*cntx));
+}
+
 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 {
-   /* TODO: */
-   return NULL;
+   int err;
+   struct kvm_vcpu *vcpu;
+
+   vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+   if (!vcpu) {
+   err = -ENOMEM;
+   goto out;
+   }
+
+   err = kvm_vcpu_init(vcpu, kvm, id);
+   if (err)
+   goto free_vcpu;
+
+   return vcpu;
+
+free_vcpu:
+   kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+   return ERR_PTR(err);
 }
 
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
@@ -48,13 +86,32 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
-   /* TODO: */
+   struct kvm_cpu_context *cntx;
+
+   /* Mark this VCPU never ran */
+   vcpu->arch.ran_atleast_once = false;
+
+   /* Setup ISA features available to VCPU */
+   vcpu->arch.isa = riscv_isa_extension_base(NULL) & KVM_RISCV_ISA_ALLOWED;
+
+   /* Setup reset state of shadow SSTATUS and HSTATUS CSRs */
+   cntx = &vcpu->arch.guest_reset_context;
+   cntx->sstatus = SR_S

[PATCH v8 05/19] RISC-V: KVM: Implement VCPU interrupts and requests handling

2019-10-02 Thread Anup Patel
This patch implements VCPU interrupts and requests which are both
asynchronous events.

The VCPU interrupts can be set/unset using KVM_INTERRUPT ioctl from
user-space. In future, the in-kernel IRQCHIP emulation will use
kvm_riscv_vcpu_set_interrupt() and kvm_riscv_vcpu_unset_interrupt()
functions to set/unset VCPU interrupts.

Important VCPU requests implemented by this patch are:
KVM_REQ_SLEEP   - set whenever VCPU itself goes to sleep state
KVM_REQ_VCPU_RESET  - set whenever VCPU reset is requested

The WFI trap-n-emulate (added later) will use KVM_REQ_SLEEP request
and kvm_riscv_vcpu_has_interrupt() function.

The KVM_REQ_VCPU_RESET request will be used by the SBI emulation (added
later) to power up a VCPU that is in the power-off state. User-space can use
the GET_MPSTATE/SET_MPSTATE ioctls to get/set the power state of a VCPU.
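
On the user-space side, asserting or retracting the interrupt line is a
single ioctl. A hedged sketch (the vcpu fd from KVM_CREATE_VCPU is
assumed; KVM_INTERRUPT_SET/UNSET are the values added in the uapi hunk
below):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assert or de-assert the VCPU's external interrupt line from user-space. */
static int example_set_ext_irq(int vcpu_fd, int asserted)
{
        struct kvm_interrupt irq = {
                .irq = asserted ? KVM_INTERRUPT_SET : KVM_INTERRUPT_UNSET,
        };

        return ioctl(vcpu_fd, KVM_INTERRUPT, &irq);
}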

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |  26 
 arch/riscv/include/uapi/asm/kvm.h |   3 +
 arch/riscv/kvm/main.c |   8 ++
 arch/riscv/kvm/vcpu.c | 193 --
 4 files changed, 217 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index dab32c9c3470..d801216da6d0 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -122,6 +122,21 @@ struct kvm_vcpu_arch {
/* CPU CSR context upon Guest VCPU reset */
struct kvm_vcpu_csr guest_reset_csr;
 
+   /*
+* VCPU interrupts
+*
+* We have a lockless approach for tracking pending VCPU interrupts
+* implemented using atomic bitops. The irqs_pending bitmap represent
+* pending interrupts whereas irqs_pending_mask represent bits changed
+* in irqs_pending. Our approach is modeled around multiple producer
+* and single consumer problem where the consumer is the VCPU itself.
+*/
+   unsigned long irqs_pending;
+   unsigned long irqs_pending_mask;
+
+   /* VCPU power-off state */
+   bool power_off;
+
/* Don't run the VCPU (blocked) */
bool pause;
 
@@ -135,6 +150,9 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu 
*vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
+int kvm_riscv_setup_vsip(void);
+void kvm_riscv_cleanup_vsip(void);
+
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
 int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
@@ -146,4 +164,12 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct 
kvm_run *run,
 
 static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
 
+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu);
+bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
+
 #endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/include/uapi/asm/kvm.h 
b/arch/riscv/include/uapi/asm/kvm.h
index d15875818b6e..6dbc056d58ba 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -18,6 +18,9 @@
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
+#define KVM_INTERRUPT_SET  -1U
+#define KVM_INTERRUPT_UNSET-2U
+
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
 };
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index e1ffe6d42f39..d088247843c5 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -48,6 +48,8 @@ int kvm_arch_hardware_enable(void)
hideleg |= SIE_SEIE;
csr_write(CSR_HIDELEG, hideleg);
 
+   csr_write(CSR_VSIP, 0);
+
return 0;
 }
 
@@ -59,11 +61,17 @@ void kvm_arch_hardware_disable(void)
 
 int kvm_arch_init(void *opaque)
 {
+   int ret;
+
if (!riscv_isa_extension_available(NULL, h)) {
kvm_info("hypervisor extension not available\n");
return -ENODEV;
}
 
+   ret = kvm_riscv_setup_vsip();
+   if (ret)
+   return ret;
+
kvm_info("hypervisor extension available\n");
 
return 0;
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 8272b05d6ce4..3223f723f79e 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -40,6 +41,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 riscv_isa_extension_mask(s) | \
 riscv_isa_extension_mask(u))
 
+static unsigned long __percpu *vsip_shadow;

[PATCH v8 00/19] KVM RISC-V Support

2019-10-02 Thread Anup Patel
This series adds initial KVM RISC-V support. Currently, we are able to boot
RISC-V 64bit Linux Guests with multiple VCPUs.

Few key aspects of KVM RISC-V added by this series are:
1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
3. KVM ONE_REG interface for VCPU register access from user-space.
4. PLIC emulation is done in user-space.
5. Timer and IPI emulation is done in-kernel.
6. MMU notifiers supported.
7. FP lazy save/restore supported.
8. SBI v0.1 emulation for KVM Guest available.
9. Forward unhandled SBI calls to KVM userspace.

Here's a brief TODO list which we will work upon after this series:
1. Implement recursive stage2 page table programming
2. SBI v0.2 emulation in-kernel
3. SBI v0.2 hart hotplug emulation in-kernel
4. In-kernel PLIC emulation
5. ... and more ...

This series can be found in riscv_kvm_v8 branch at:
https://github.com/avpatel/linux.git

Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v1 branch at:
https://github.com/avpatel/kvmtool.git

The QEMU RISC-V hypervisor emulation is done by Alistair and is available
in riscv-hyp-work.next branch at:
https://github.com/alistair23/qemu.git

To play around with KVM RISC-V, refer KVM RISC-V wiki at:
https://github.com/kvm-riscv/howto/wiki
https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-QEMU

Changes since v7:
- Rebased series on Linux-5.4-rc1 and Atish's SBI v0.2 patches
- Removed PATCH1, PATCH3, and PATCH20 because these were already merged
- Use kernel doc style comments for ISA bitmap functions
- Don't parse X, Y, and Z extension in riscv_fill_hwcap() because it will
  be added in-future
- Mark KVM RISC-V kconfig option as EXPERIMENTAL
- Typo fix in commit description of PATCH6 of v7 series
- Use separate structs for CORE and CSR registers of ONE_REG interface
- Explicitly include asm/sbi.h in kvm/vcpu_sbi.c
- Removed implicit switch-case fall-through in kvm_riscv_vcpu_exit()
- No need to set VSSTATUS.MXR bit in kvm_riscv_vcpu_unpriv_read()
- Removed register for instruction length in kvm_riscv_vcpu_unpriv_read()
- Added defines for checking/decoding instruction length
- Added separate patch to forward unhandled SBI calls to userspace tool

Changes since v6:
- Rebased patches on Linux-5.3-rc7
- Added "return_handled" in struct kvm_mmio_decode to ensure that
  kvm_riscv_vcpu_mmio_return() updates SEPC only once
- Removed trap_stval parameter from kvm_riscv_vcpu_unpriv_read()
- Updated git repo URL in MAINTAINERS entry

Changes since v5:
- Renamed KVM_REG_RISCV_CONFIG_TIMEBASE register to
  KVM_REG_RISCV_CONFIG_TBFREQ register in ONE_REG interface
- Update SPEC in kvm_riscv_vcpu_mmio_return() for MMIO exits
- Use switch case instead of illegal instruction opcode table for simplicity
- Improve comments in stage2_remote_tlb_flush() for a potential remote TLB
  flush optimization
- Handle all unsupported SBI calls in default case of
  kvm_riscv_vcpu_sbi_ecall() function
- Fixed kvm_riscv_vcpu_sync_interrupts() for software interrupts
- Improved unprivilege reads to handle traps due to Guest stage1 page table
- Added separate patch to document RISC-V specific things in
  Documentation/virt/kvm/api.txt

Changes since v4:
- Rebased patches on Linux-5.3-rc5
- Added Paolo's Acked-by and Reviewed-by
- Updated mailing list in MAINTAINERS entry

Changes since v3:
- Moved patch for ISA bitmap from KVM prep series to this series
- Make vsip_shadow as run-time percpu variable instead of compile-time
- Flush Guest TLBs on all Host CPUs whenever we run-out of VMIDs

Changes since v2:
- Removed references of KVM_REQ_IRQ_PENDING from all patches
- Use kvm->srcu within in-kernel KVM run loop
- Added percpu vsip_shadow to track last value programmed in VSIP CSR
- Added comments about irqs_pending and irqs_pending_mask
- Used kvm_arch_vcpu_runnable() in-place-of kvm_riscv_vcpu_has_interrupt()
  in system_opcode_insn()
- Removed unwanted smp_wmb() in kvm_riscv_stage2_vmid_update()
- Use kvm_flush_remote_tlbs() in kvm_riscv_stage2_vmid_update()
- Use READ_ONCE() in kvm_riscv_stage2_update_hgatp() for vmid

Changes since v1:
- Fixed compile errors in building KVM RISC-V as module
- Removed unused kvm_riscv_halt_guest() and kvm_riscv_resume_guest()
- Set KVM_CAP_SYNC_MMU capability only after MMU notifiers are implemented
- Made vmid_version as unsigned long instead of atomic
- Renamed KVM_REQ_UPDATE_PGTBL to KVM_REQ_UPDATE_HGATP
- Renamed kvm_riscv_stage2_update_pgtbl() to kvm_riscv_stage2_update_hgatp()
- Configure HIDELEG and HEDELEG in kvm_arch_hardware_enable()
- Updated ONE_REG interface for CSR access to user-space
- Removed irqs_pending_lock and use atomic bitops instead
- Added separate patch for FP ONE_REG interface
- Added separate patch for updating MAINTAINERS file

Anup Patel (15):
  RISC-V: Add bitmap representing ISA features common across CPUs
  RISC-V: Add hypervisor extension related CSR defines
  RISC-V: Add initial skeletal KVM

[PATCH v8 03/19] RISC-V: Add initial skeletal KVM support

2019-10-02 Thread Anup Patel
This patch adds initial skeletal KVM RISC-V support which has:
1. A simple implementation of arch specific VM functions
   except kvm_vm_ioctl_get_dirty_log() which will be implemented
   in future as part of stage2 page logging.
2. Stubs of required arch specific VCPU functions except
   kvm_arch_vcpu_ioctl_run() which is semi-complete and
   extended by subsequent patches.
3. Stubs for required arch specific stage2 MMU functions.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/Kconfig|   2 +
 arch/riscv/Makefile   |   2 +
 arch/riscv/include/asm/kvm_host.h |  81 
 arch/riscv/include/uapi/asm/kvm.h |  47 +
 arch/riscv/kvm/Kconfig|  33 
 arch/riscv/kvm/Makefile   |  13 ++
 arch/riscv/kvm/main.c |  80 
 arch/riscv/kvm/mmu.c  |  83 
 arch/riscv/kvm/vcpu.c | 312 ++
 arch/riscv/kvm/vcpu_exit.c|  35 
 arch/riscv/kvm/vm.c   |  79 
 11 files changed, 767 insertions(+)
 create mode 100644 arch/riscv/include/asm/kvm_host.h
 create mode 100644 arch/riscv/include/uapi/asm/kvm.h
 create mode 100644 arch/riscv/kvm/Kconfig
 create mode 100644 arch/riscv/kvm/Makefile
 create mode 100644 arch/riscv/kvm/main.c
 create mode 100644 arch/riscv/kvm/mmu.c
 create mode 100644 arch/riscv/kvm/vcpu.c
 create mode 100644 arch/riscv/kvm/vcpu_exit.c
 create mode 100644 arch/riscv/kvm/vm.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 3815808f95fa..2744b50eaeea 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -327,3 +327,5 @@ menu "Power management options"
 source "kernel/power/Kconfig"
 
 endmenu
+
+source "arch/riscv/kvm/Kconfig"
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index f5e914210245..a2067cdae2cd 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -77,6 +77,8 @@ head-y := arch/riscv/kernel/head.o
 
 core-y += arch/riscv/
 
+core-$(CONFIG_KVM) += arch/riscv/kvm/
+
 libs-y += arch/riscv/lib/
 
 PHONY += vdso_install
diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
new file mode 100644
index ..9459709656be
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel 
+ */
+
+#ifndef __RISCV_KVM_HOST_H__
+#define __RISCV_KVM_HOST_H__
+
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_64BIT
+#define KVM_MAX_VCPUS  (1U << 16)
+#else
+#define KVM_MAX_VCPUS  (1U << 9)
+#endif
+
+#define KVM_USER_MEM_SLOTS 512
+#define KVM_HALT_POLL_NS_DEFAULT   50
+
+#define KVM_VCPU_MAX_FEATURES  0
+
+#define KVM_REQ_SLEEP \
+   KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(1)
+
+struct kvm_vm_stat {
+   ulong remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+   u64 halt_successful_poll;
+   u64 halt_attempted_poll;
+   u64 halt_poll_invalid;
+   u64 halt_wakeup;
+   u64 ecall_exit_stat;
+   u64 wfi_exit_stat;
+   u64 mmio_exit_user;
+   u64 mmio_exit_kernel;
+   u64 exits;
+};
+
+struct kvm_arch_memory_slot {
+};
+
+struct kvm_arch {
+   /* stage2 page table */
+   pgd_t *pgd;
+   phys_addr_t pgd_phys;
+};
+
+struct kvm_vcpu_arch {
+   /* Don't run the VCPU (blocked) */
+   bool pause;
+
+   /* SRCU lock index for in-kernel run loop */
+   int srcu_idx;
+};
+
+static inline void kvm_arch_hardware_unsetup(void) {}
+static inline void kvm_arch_sync_events(struct kvm *kvm) {}
+static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+
+void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
+int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
+void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
+void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
+
+int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+   unsigned long scause, unsigned long stval);
+
+static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
+
+#endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/include/uapi/asm/kvm.h 
b/arch/riscv/include/uapi/asm/kvm.h
new file mode 100644
index ..d15875818b6e
--- /dev/null
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel 
+ */
+
+#ifndef __LINUX_KVM_RISCV_H
+#define __LINUX_KVM_RISCV_H
+
+#ifndef __

Re: [PATCH 1/3] x86/alternatives: Teach text_poke_bp() to emulate instructions

2019-10-02 Thread Masami Hiramatsu
Hi Peter,

On Tue, 27 Aug 2019 20:06:23 +0200
Peter Zijlstra  wrote:

> In preparation for static_call and variable size jump_label support,
> teach text_poke_bp() to emulate instructions, namely:
> 
>   JMP32, JMP8, CALL, NOP2, NOP_ATOMIC5
> 
> The current text_poke_bp() takes a @handler argument which is used as
> a jump target when the temporary INT3 is hit by a different CPU.
> 
> When patching CALL instructions, this doesn't work because we'd miss
> the PUSH of the return address. Instead, teach poke_int3_handler() to
> emulate an instruction, typically the instruction we're patching in.
> 
> This fits almost all text_poke_bp() users, except
> arch_unoptimize_kprobe() which restores random text, and for that site
> we have to build an explicit emulate instruction.

OK, and in this case, I would like to change RELATIVEJUMP_OPCODE
to JMP32_INSN_OPCODE for readability. (or at least
making RELATIVEJUMP_OPCODE as an alias of JMP32_INSN_OPCODE)

Except for that, this looks good to me.

Reviewed-by: Masami Hiramatsu 

Thank you,

> 
> Cc: Masami Hiramatsu 
> Cc: Steven Rostedt 
> Cc: Daniel Bristot de Oliveira 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  arch/x86/include/asm/text-patching.h |   24 ++--
>  arch/x86/kernel/alternative.c|   98 
> ++-
>  arch/x86/kernel/jump_label.c |9 +--
>  arch/x86/kernel/kprobes/opt.c|   11 ++-
>  4 files changed, 103 insertions(+), 39 deletions(-)
> 
> --- a/arch/x86/include/asm/text-patching.h
> +++ b/arch/x86/include/asm/text-patching.h
> @@ -26,10 +26,11 @@ static inline void apply_paravirt(struct
>  #define POKE_MAX_OPCODE_SIZE 5
>  
>  struct text_poke_loc {
> - void *detour;
>   void *addr;
> - size_t len;
> - const char opcode[POKE_MAX_OPCODE_SIZE];
> + int len;
> + s32 rel32;
> + u8 opcode;
> + const char text[POKE_MAX_OPCODE_SIZE];
>  };
>  
>  extern void text_poke_early(void *addr, const void *opcode, size_t len);
> @@ -51,8 +52,10 @@ extern void text_poke_early(void *addr,
>  extern void *text_poke(void *addr, const void *opcode, size_t len);
>  extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
>  extern int poke_int3_handler(struct pt_regs *regs);
> -extern void text_poke_bp(void *addr, const void *opcode, size_t len, void 
> *handler);
> +extern void text_poke_bp(void *addr, const void *opcode, size_t len, const 
> void *emulate);
>  extern void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int 
> nr_entries);
> +extern void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
> +const void *opcode, size_t len, const void 
> *emulate);
>  extern int after_bootmem;
>  extern __ro_after_init struct mm_struct *poking_mm;
>  extern __ro_after_init unsigned long poking_addr;
> @@ -63,8 +66,17 @@ static inline void int3_emulate_jmp(stru
>   regs->ip = ip;
>  }
>  
> -#define INT3_INSN_SIZE 1
> -#define CALL_INSN_SIZE 5
> +#define INT3_INSN_SIZE   1
> +#define INT3_INSN_OPCODE 0xCC
> +
> +#define CALL_INSN_SIZE   5
> +#define CALL_INSN_OPCODE 0xE8
> +
> +#define JMP32_INSN_SIZE  5
> +#define JMP32_INSN_OPCODE0xE9
> +
> +#define JMP8_INSN_SIZE   2
> +#define JMP8_INSN_OPCODE 0xEB
>  
>  static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
>  {
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -956,16 +956,15 @@ NOKPROBE_SYMBOL(patch_cmp);
>  int poke_int3_handler(struct pt_regs *regs)
>  {
>   struct text_poke_loc *tp;
> - unsigned char int3 = 0xcc;
>   void *ip;
>  
>   /*
>* Having observed our INT3 instruction, we now must observe
>* bp_patching.nr_entries.
>*
> -  *  nr_entries != 0 INT3
> -  *  WMB RMB
> -  *  write INT3  if (nr_entries)
> +  *  nr_entries != 0 INT3
> +  *  WMB RMB
> +  *  write INT3  if (nr_entries)
>*
>* Idem for other elements in bp_patching.
>*/
> @@ -978,9 +977,9 @@ int poke_int3_handler(struct pt_regs *re
>   return 0;
>  
>   /*
> -  * Discount the sizeof(int3). See text_poke_bp_batch().
> +  * Discount the INT3. See text_poke_bp_batch().
>*/
> - ip = (void *) regs->ip - sizeof(int3);
> + ip = (void *) regs->ip - INT3_INSN_SIZE;
>  
>   /*
>* Skip the binary search if there is a single member in the vector.
> @@ -997,8 +996,22 @@ int poke_int3_handler(struct pt_regs *re
>   return 0;
>   }
>  
> - /* set up the specified breakpoint detour */
> - regs->ip = (unsigned long) tp->detour;
> + ip += tp->len;
> +
> + switch (tp->opcode) {
> + case CALL_INSN_OPCODE:
> + int3_emulate_call(regs, (long)ip + tp->rel32);
> +   
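
To make the CALL case above concrete, a minimal sketch of what emulating a
CALL from the INT3 handler amounts to, assuming the int3_emulate_push() and
int3_emulate_jmp() helpers shown in the hunk (an illustration, not
necessarily the exact final code):

static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func)
{
	/* Push the address of the instruction following the patched CALL:
	 * regs->ip currently points just past the INT3, so rewind it and
	 * add the full CALL instruction length.
	 */
	int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);

	/* Then transfer control to the call target. */
	int3_emulate_jmp(regs, func);
}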

Re: [rfc] mm, hugetlb: allow hugepage allocations to excessively reclaim

2019-10-02 Thread Michal Hocko
On Wed 02-10-19 16:37:57, Linus Torvalds wrote:
> On Wed, Oct 2, 2019 at 4:03 PM David Rientjes  wrote:
> >
> > Since hugetlb allocations have explicitly preferred to loop and do reclaim
> > and compaction, exempt them from this new behavior at least for the time
> > being.  It is not shown that hugetlb allocation success rate has been
> > impacted by commit b39d0ee2632d but hugetlb allocations are admittedly
> > beyond the scope of what the patch is intended to address (thp
> > allocations).
> 
> I'd like to see some numbers to show that this special case makes sense.

http://lkml.kernel.org/r/20191001054343.ga15...@dhcp22.suse.cz
While the test is somewhat artificial, it is not too different from
real workloads which preallocate a non-trivial share (50% in my case) of
memory for hugetlb pages. Having moderately utilized memory (by page
cache in my case) is not really unexpected.

> I understand the "this is what it used to do, and hugetlbfs wasn't the
> intended recipient of the new semantics", and I don't think the patch
> is wrong.

This is not only about this used to work. It is an expected and
documented semantic of __GFP_RETRY_MAYFAIL

 * %__GFP_RETRY_MAYFAIL: The VM implementation will retry memory reclaim
 * procedures that have previously failed if there is some indication
 * that progress has been made else where.  It can wait for other
 * tasks to attempt high level approaches to freeing memory such as
 * compaction (which removes fragmentation) and page-out.
 * There is still a definite limit to the number of retries, but it is
 * a larger limit than with %__GFP_NORETRY.
 * Allocations with this flag may fail, but only when there is
 * genuinely little unused memory. While these allocations do not
 * directly trigger the OOM killer, their failure indicates that
 * the system is likely to need to use the OOM killer soon.  The
 * caller must handle failure, but can reasonably do so by failing
 * a higher-level request, or completing it only in a much less
 * efficient manner.
 
> But at the same time, we do know that swap storms happen for other
> loads, and if we say "hugetlbfs is different" then there should at
> least be some rationale for why it's different other than "history".
> Some actual "yes, we _want_ the possibile swap storms, because load
> XYZ".
> 
> And I don't mean microbenchmark numbers for "look, behavior changed".
> I mean "look, this is a real load, and now it runs X% slower because
> it relied on this hugetlbfs behavior".

It is not about running slower. It is about not getting the expected
number of hugetlb pages requested by an admin who knows that size is
needed.
-- 
Michal Hocko
SUSE Labs
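
As a side note, the documented contract quoted above implies a caller
pattern roughly like the sketch below (illustrative only, not code from
this thread): the allocation may retry hard, but the caller must still be
prepared for NULL and fail the higher-level request rather than rely on
the OOM killer.

/* Sketch of a __GFP_RETRY_MAYFAIL caller; 'order' and the error path
 * are placeholders for whatever the real caller needs.
 */
static struct page *alloc_pool_page(unsigned int order)
{
	struct page *page;

	page = alloc_pages(GFP_KERNEL | __GFP_RETRY_MAYFAIL, order);
	if (!page)
		return NULL;	/* caller fails the higher-level request */

	return page;
}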


Re: [PATCH RFC] reboot: hotplug cpus in migrate_to_reboot_cpu()

2019-10-02 Thread Hsin-Yi Wang
On Wed, Oct 2, 2019 at 7:41 PM Hsin-Yi Wang  wrote:
>
> Currently system reboots use arch-specific code (e.g. smp_send_stop) to
> offline non-reboot cpus. Some architectures, like arm64, arm, and x86, set the
> offline mask for a cpu without really offlining it. This causes race
> conditions, and kernel warnings sometimes show up when the system reboots. We
> can do cpu hotplug in migrate_to_reboot_cpu() to avoid this issue.
>
> Signed-off-by: Hsin-Yi Wang 
> ---
> kernel warnings at reboot:
> [1] https://lore.kernel.org/lkml/20190820100843.3028-1-hsi...@chromium.org/
> [2] https://lore.kernel.org/lkml/20190727164450.ga11...@roeck-us.net/
> ---
>  kernel/cpu.c| 35 +++
>  kernel/reboot.c | 18 --
>  2 files changed, 35 insertions(+), 18 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index fc28e17940e0..2f4d51fe91e3 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #define CREATE_TRACE_POINTS
> @@ -1366,6 +1367,40 @@ int __boot_cpu_id;
>
>  #endif /* CONFIG_SMP */
>
> +void migrate_to_reboot_cpu(void)
> +{
> +   /* The boot cpu is always logical cpu 0 */
> +   int cpu = reboot_cpu;
> +
> +   /* Make certain the cpu I'm about to reboot on is online */
> +   if (!cpu_online(cpu))
> +   cpu = cpumask_first(cpu_online_mask);
> +
> +   /* Prevent races with other tasks migrating this task */
> +   current->flags |= PF_NO_SETAFFINITY;
> +
> +   /* Make certain I only run on the appropriate processor */
> +   set_cpus_allowed_ptr(current, cpumask_of(cpu));
> +
> +   /* Hotplug other cpus if possible */
> +   if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {

Should use #ifdef CONFIG_HOTPLUG_CPU here. Will fix in the next
version if this patch is reasonable.
(Reported-by: kbuild test robot )
> +   int i, err;
> +
> +   cpu_maps_update_begin();
> +
> +   for_each_online_cpu(i) {
> +   if (i == cpu)
> +   continue;
> +   err = _cpu_down(i, 0, CPUHP_OFFLINE);
> +   if (err)
> +   pr_info("Failed to offline cpu %d\n", i);
> +   }
> +   cpu_hotplug_disabled++;
> +
> +   cpu_maps_update_done();
> +   }
> +}
> +
>  /* Boot processor state steps */
>  static struct cpuhp_step cpuhp_hp_states[] = {
> [CPUHP_OFFLINE] = {
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index c4d472b7f1b4..f0046be34a60 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -215,24 +215,6 @@ void do_kernel_restart(char *cmd)
> atomic_notifier_call_chain(&restart_handler_list, reboot_mode, cmd);
>  }
>
> -void migrate_to_reboot_cpu(void)
> -{
> -   /* The boot cpu is always logical cpu 0 */
> -   int cpu = reboot_cpu;
> -
> -   cpu_hotplug_disable();
> -
> -   /* Make certain the cpu I'm about to reboot on is online */
> -   if (!cpu_online(cpu))
> -   cpu = cpumask_first(cpu_online_mask);
> -
> -   /* Prevent races with other tasks migrating this task */
> -   current->flags |= PF_NO_SETAFFINITY;
> -
> -   /* Make certain I only run on the appropriate processor */
> -   set_cpus_allowed_ptr(current, cpumask_of(cpu));
> -}
> -
>  /**
>   * kernel_restart - reboot the system
>   * @cmd: pointer to buffer containing command to execute for restart
> --
> 2.23.0.444.g18eeb5a265-goog
>


[PATCH] ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360

2019-10-02 Thread Kai-Heng Feng
The headphone on XPS 9350/9360 produces background white noise. The
noise level somehow correlates with "Headphone Mic Boost": when it is set
to 1 the noise disappears. However, doing this has the side effect of
also decreasing the overall headphone volume, so I didn't send the patch
upstream.

The noise was bearable back then, but after commit 717f43d81afc ("ALSA:
hda/realtek - Update headset mode for ALC256") the noise has worsened to
the point that it starts hurting ears.

So let's use the workaround to set "Headphone Mic Boost" to 1 and lock
it so it's not touchable by userspace.

Fixes: 717f43d81afc ("ALSA: hda/realtek - Update headset mode for ALC256")
BugLink: https://bugs.launchpad.net/bugs/1654448
BugLink: https://bugs.launchpad.net/bugs/1845810
Signed-off-by: Kai-Heng Feng 
---
 sound/pci/hda/patch_realtek.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index b000b36ac3c6..b5c225a56b98 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -5358,6 +5358,17 @@ static void alc271_hp_gate_mic_jack(struct hda_codec 
*codec,
}
 }
 
+static void alc256_fixup_dell_xps_13_headphone_noise2(struct hda_codec *codec,
+ const struct hda_fixup 
*fix,
+ int action)
+{
+   if (action != HDA_FIXUP_ACT_PRE_PROBE)
+   return;
+
+   snd_hda_codec_amp_stereo(codec, 0x1a, HDA_INPUT, 0, HDA_AMP_VOLMASK, 1);
+   snd_hda_override_wcaps(codec, 0x1a, get_wcaps(codec, 0x1a) & 
~AC_WCAP_IN_AMP);
+}
+
 static void alc269_fixup_limit_int_mic_boost(struct hda_codec *codec,
 const struct hda_fixup *fix,
 int action)
@@ -5822,6 +5833,7 @@ enum {
ALC298_FIXUP_DELL_AIO_MIC_NO_PRESENCE,
ALC275_FIXUP_DELL_XPS,
ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE,
+   ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE2,
ALC293_FIXUP_LENOVO_SPK_NOISE,
ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY,
ALC255_FIXUP_DELL_SPK_NOISE,
@@ -6558,6 +6570,12 @@ static const struct hda_fixup alc269_fixups[] = {
.chained = true,
.chain_id = ALC255_FIXUP_DELL1_MIC_NO_PRESENCE
},
+   [ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE2] = {
+   .type = HDA_FIXUP_FUNC,
+   .v.func = alc256_fixup_dell_xps_13_headphone_noise2,
+   .chained = true,
+   .chain_id = ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE
+   },
[ALC293_FIXUP_LENOVO_SPK_NOISE] = {
.type = HDA_FIXUP_FUNC,
.v.func = alc_fixup_disable_aamix,
@@ -7001,17 +7019,17 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x1028, 0x06de, "Dell", 
ALC293_FIXUP_DISABLE_AAMIX_MULTIJACK),
SND_PCI_QUIRK(0x1028, 0x06df, "Dell", 
ALC293_FIXUP_DISABLE_AAMIX_MULTIJACK),
SND_PCI_QUIRK(0x1028, 0x06e0, "Dell", 
ALC293_FIXUP_DISABLE_AAMIX_MULTIJACK),
-   SND_PCI_QUIRK(0x1028, 0x0704, "Dell XPS 13 9350", 
ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE),
+   SND_PCI_QUIRK(0x1028, 0x0704, "Dell XPS 13 9350", 
ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE2),
SND_PCI_QUIRK(0x1028, 0x0706, "Dell Inspiron 7559", 
ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER),
SND_PCI_QUIRK(0x1028, 0x0725, "Dell Inspiron 3162", 
ALC255_FIXUP_DELL_SPK_NOISE),
SND_PCI_QUIRK(0x1028, 0x0738, "Dell Precision 5820", 
ALC269_FIXUP_NO_SHUTUP),
-   SND_PCI_QUIRK(0x1028, 0x075b, "Dell XPS 13 9360", 
ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE),
+   SND_PCI_QUIRK(0x1028, 0x075b, "Dell XPS 13 9360", 
ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE2),
SND_PCI_QUIRK(0x1028, 0x075c, "Dell XPS 27 7760", 
ALC298_FIXUP_SPK_VOLUME),
SND_PCI_QUIRK(0x1028, 0x075d, "Dell AIO", ALC298_FIXUP_SPK_VOLUME),
SND_PCI_QUIRK(0x1028, 0x07b0, "Dell Precision 7520", 
ALC295_FIXUP_DISABLE_DAC3),
SND_PCI_QUIRK(0x1028, 0x0798, "Dell Inspiron 17 7000 Gaming", 
ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER),
SND_PCI_QUIRK(0x1028, 0x080c, "Dell WYSE", 
ALC225_FIXUP_DELL_WYSE_MIC_NO_PRESENCE),
-   SND_PCI_QUIRK(0x1028, 0x082a, "Dell XPS 13 9360", 
ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE),
+   SND_PCI_QUIRK(0x1028, 0x082a, "Dell XPS 13 9360", 
ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE2),
SND_PCI_QUIRK(0x1028, 0x084b, "Dell", 
ALC274_FIXUP_DELL_AIO_LINEOUT_VERB),
SND_PCI_QUIRK(0x1028, 0x084e, "Dell", 
ALC274_FIXUP_DELL_AIO_LINEOUT_VERB),
SND_PCI_QUIRK(0x1028, 0x0871, "Dell Precision 3630", 
ALC255_FIXUP_DELL_HEADSET_MIC),
-- 
2.17.1



linux-next: Tree for Oct 3

2019-10-02 Thread Stephen Rothwell
Hi all,

Changes since 20191002:

New tree: tomoyo

My fixes tree contains:

  04e6dac68d9b ("powerpc/64s/radix: fix for "tidy up TLB flushing code" and 
!CONFIG_PPC_RADIX_MMU")

The amdgpu tree gained a conflict against Linus' tree.

The keys tree with gained a conflict against mips-fixes tree.

Non-merge commits (relative to Linus' tree): 1484
 1654 files changed, 44635 insertions(+), 19751 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 314 trees (counting Linus' and 78 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (65aa35c93cc0 Merge tag 'erofs-for-5.4-rc2-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs)
Merging fixes/master (04e6dac68d9b powerpc/64s/radix: fix for "tidy up TLB 
flushing code" and !CONFIG_PPC_RADIX_MMU)
Merging kbuild-current/fixes (d159d87700e9 namespace: fix namespace.pl script 
to support relative paths)
Merging arc-current/for-curr (41277ba7eb4e ARC: mm: tlb flush optim: elide 
redundant uTLB invalidates for MMUv3)
Merging arm-current/fixes (5b3efa4f1479 ARM: 8901/1: add a criteria for 
pfn_valid of arm)
Merging arm-soc-fixes/arm/fixes (cdee3b60af59 ARM: dts: ux500: Fix up the CPU 
thermal zone)
Merging arm64-fixes/for-next/fixes (a2b99dcac36c docs: arm64: Fix indentation 
and doc formatting)
Merging m68k-current/for-linus (0f1979b402df m68k: Remove ioremap_fullcache())
Merging powerpc-fixes/fixes (253c892193ab powerpc/eeh: Fix eeh 
eeh_debugfs_break_device() with SRIOV devices)
Merging s390-fixes/fixes (9f494438d4bc s390/qdio: clarify size of the QIB parm 
area)
Merging sparc/master (038029c03e21 sparc: remove unneeded uapi/asm/statfs.h)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (53de429f4e88 net: hisilicon: Fix usage of uninitialized 
variable in function mdio_sc_cfg_reg_write())
Merging bpf/master (a2d074e4c6e8 selftests/bpf: test_progs: Don't leak 
server_fd in test_sockopt_inherit)
Merging ipsec/master (68ce6688a5ba net: sched: taprio: Fix potential integer 
overflow in taprio_set_picos_per_byte)
Merging netfilter/master (34a4c95abd25 netfilter: nft_connlimit: disable bh on 
garbage collection)
Merging ipvs/master (02dc96ef6c25 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net)
Merging wireless-drivers/master (c91a9cfe9f6d rt2x00: initialize last_reset)
Merging mac80211/master (569aad4fcd82 net: ag71xx: fix mdio subnode support)
Merging rdma-fixes/for-rc (b66f31efbdad RDMA/iwcm: Fix a lock inversion issue)
Merging sound-current/for-linus (f41f900568d9 ALSA: usb-audio: Add DSD support 
for EVGA NU Audio)
Merging sound-asoc-fixes/for-linus (bbb90c4ed03b Merge branch 'asoc-5.4' into 
asoc-linus)
Merging regmap-fixes/for-linus (54ecb8f7028c Linux 5.4-rc1)
Merging regulator-fixes/for-linus (c3f1e312854c Merge branch 'regulator-5.4' 
into regulator-linus)
Merging spi-fixes/for-linus (6efab62559b1 Merge branch 'spi-5.4' into spi-linus)
Merging pci-current/for-linus (54ecb8f7028c Linux 5.4-rc1)
Merging driver-core.current/driver-core-linus (82af5b660967 sysfs: Fixes 
__BIN_ATTR_WO() macro)
Merging tty.current/tty-linus (54ecb8f7028c Linux 5.4-rc1)
Merging usb.current/usb-linus (54ecb8f7028c Li

[PATCH] arm64: dts: qcom: sdm845: Add APSS watchdog node

2019-10-02 Thread Bjorn Andersson
Add a node describing the watchdog found in the application subsystem.

Signed-off-by: Bjorn Andersson 
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index f0b2db34ec4a..23915eab4187 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -3488,6 +3488,12 @@
status = "disabled";
};
 
+   watchdog@1798 {
+   compatible = "qcom,apss-wdt-sdm845", "qcom,kpss-wdt";
+   reg = <0 0x1798 0 0x1000>;
+   clocks = <&sleep_clk>;
+   };
+
apss_shared: mailbox@1799 {
compatible = "qcom,sdm845-apss-shared";
reg = <0 0x1799 0 0x1000>;
-- 
2.18.0



[PATCH v1 0/2] mmc: sdhci-of-arasan: Add Support for Intel LGM SDXC

2019-10-02 Thread Ramuthevar,Vadivel MuruganX
The current arasan sdhci PHY configuration isn't compatible
with the PHY on Intel's LGM (Lightning Mountain) SoC devices.

Therefore, add a new compatible to adapt Intel's LGM SDXC PHY
to the arasan-sdhci controller so that the PHY can be configured.

Linux code base : V5.4-rc1 

Ramuthevar Vadivel Murugan (2):
  dt-bindings: mmc: sdhci-of-arasan: Add new compatible for Intel LGM
SDXC
  mmc: sdhci-of-arasan: Add Support for Intel LGM SDXC

 Documentation/devicetree/bindings/mmc/arasan,sdhci.txt | 17 +
 drivers/mmc/host/sdhci-of-arasan.c | 15 +++
 2 files changed, 32 insertions(+)

-- 
2.11.0



[PATCH v1 1/2] dt-bindings: mmc: sdhci-of-arasan: Add new compatible for Intel LGM SDXC

2019-10-02 Thread Ramuthevar,Vadivel MuruganX
From: Ramuthevar Vadivel Murugan 

Add a new compatible to use the sdhc-arasan host controller driver
with the eMMC PHY on Intel's Lightning Mountain(LGM) SoC.

Signed-off-by: Ramuthevar Vadivel Murugan 

---
 Documentation/devicetree/bindings/mmc/arasan,sdhci.txt | 17 +
 1 file changed, 17 insertions(+)

diff --git a/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt 
b/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt
index 7ca0aa7ccc0b..eb78d9a28c8b 100644
--- a/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt
+++ b/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt
@@ -19,6 +19,8 @@ Required Properties:
Note: This binding has been deprecated and moved to [5].
 - "intel,lgm-sdhci-5.1-emmc", "arasan,sdhci-5.1": Intel LGM eMMC PHY
   For this device it is strongly suggested to include 
arasan,soc-ctl-syscon.
+- "intel,lgm-sdhci-5.1-sdxc", "arasan,sdhci-5.1": Intel LGM SDXC PHY
+  For this device it is strongly suggested to include 
arasan,soc-ctl-syscon.
 
   [5] Documentation/devicetree/bindings/mmc/sdhci-am654.txt
 
@@ -97,3 +99,18 @@ Example:
phy-names = "phy_arasan";
arasan,soc-ctl-syscon = <&sysconf>;
};
+
+   sdxc: sdhci@ec60 {
+   compatible = "arasan,sdhci-5.1", "intel,lgm-sdhci-5.1-sdxc";
+   reg = <0xec60 0x300>;
+   interrupt-parent = <&ioapic1>;
+   interrupts = <43 1>;
+   clocks = <&cgu0 LGM_CLK_SDIO>, <&cgu0 LGM_CLK_NGI>,
+<&cgu0 LGM_GCLK_SDXC>;
+   clock-names = "clk_xin", "clk_ahb", "gate";
+   clock-output-names = "sdxc_cardclock";
+   #clock-cells = <0>;
+   phys = <&sdxc_phy>;
+   phy-names = "phy_arasan";
+   arasan,soc-ctl-syscon = <&sysconf>;
+   };
-- 
2.11.0



[PATCH v1 2/2] mmc: sdhci-of-arasan: Add Support for Intel LGM SDXC

2019-10-02 Thread Ramuthevar,Vadivel MuruganX
From: Ramuthevar Vadivel Murugan 

The current arasan sdhci PHY configuration isn't compatible
with the PHY on Intel's LGM (Lightning Mountain) SoC devices.

Therefore, add a new compatible to adapt Intel's LGM SDXC PHY
to the arasan-sdhci controller so that the PHY can be configured.

Signed-off-by: Ramuthevar Vadivel Murugan 

---
 drivers/mmc/host/sdhci-of-arasan.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/mmc/host/sdhci-of-arasan.c 
b/drivers/mmc/host/sdhci-of-arasan.c
index 7023cbec4017..55de839a8a5e 100644
--- a/drivers/mmc/host/sdhci-of-arasan.c
+++ b/drivers/mmc/host/sdhci-of-arasan.c
@@ -120,6 +120,12 @@ static const struct sdhci_arasan_soc_ctl_map 
intel_lgm_emmc_soc_ctl_map = {
.hiword_update = false,
 };
 
+static const struct sdhci_arasan_soc_ctl_map intel_lgm_sdxc_soc_ctl_map = {
+   .baseclkfreq = { .reg = 0x80, .width = 8, .shift = 2 },
+   .clockmultiplier = { .reg = 0, .width = -1, .shift = -1 },
+   .hiword_update = false,
+};
+
 /**
  * sdhci_arasan_syscon_write - Write to a field in soc_ctl registers
  *
@@ -384,6 +390,11 @@ static struct sdhci_arasan_of_data intel_lgm_emmc_data = {
.pdata = &sdhci_arasan_cqe_pdata,
 };
 
+static struct sdhci_arasan_of_data intel_lgm_sdxc_data = {
+   .soc_ctl_map = &intel_lgm_sdxc_soc_ctl_map,
+   .pdata = &sdhci_arasan_cqe_pdata,
+};
+
 #ifdef CONFIG_PM_SLEEP
 /**
  * sdhci_arasan_suspend - Suspend method for the driver
@@ -489,6 +500,10 @@ static const struct of_device_id sdhci_arasan_of_match[] = 
{
.compatible = "intel,lgm-sdhci-5.1-emmc",
.data = &intel_lgm_emmc_data,
},
+   {
+   .compatible = "intel,lgm-sdhci-5.1-sdxc",
+   .data = &intel_lgm_sdxc_data,
+   },
/* Generic compatible below here */
{
.compatible = "arasan,sdhci-8.9a",
-- 
2.11.0



[PATCH] vfio/type1: remove hugepage checks in is_invalid_reserved_pfn()

2019-10-02 Thread Ben Luo
Currently, no hugepage split code transfers the reserved bit
from head to tail during the split, so checking the head page can't make
a difference when racing with a hugepage split.

The buddy allocator wouldn't allow a driver to allocate a hugepage if any
subpage is reserved in the e820 map at boot. If a driver sets the
reserved bit of the head page before mapping the hugepage in userland,
it needs to set the reserved bit in all subpages to be safe.

Signed-off-by: Ben Luo 
---
 drivers/vfio/vfio_iommu_type1.c | 26 --
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 054391f..e2019ba 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -287,31 +287,13 @@ static int vfio_lock_acct(struct vfio_dma *dma, long 
npage, bool async)
  * Some mappings aren't backed by a struct page, for example an mmap'd
  * MMIO range for our own or another device.  These use a different
  * pfn conversion and shouldn't be tracked as locked pages.
+ * For compound pages, any driver that sets the reserved bit in head
+ * page needs to set the reserved bit in all subpages to be safe.
  */
 static bool is_invalid_reserved_pfn(unsigned long pfn)
 {
-   if (pfn_valid(pfn)) {
-   bool reserved;
-   struct page *tail = pfn_to_page(pfn);
-   struct page *head = compound_head(tail);
-   reserved = !!(PageReserved(head));
-   if (head != tail) {
-   /*
-* "head" is not a dangling pointer
-* (compound_head takes care of that)
-* but the hugepage may have been split
-* from under us (and we may not hold a
-* reference count on the head page so it can
-* be reused before we run PageReferenced), so
-* we've to check PageTail before returning
-* what we just read.
-*/
-   smp_rmb();
-   if (PageTail(tail))
-   return reserved;
-   }
-   return PageReserved(tail);
-   }
+   if (pfn_valid(pfn))
+   return PageReserved(pfn_to_page(pfn));
 
return true;
 }
-- 
1.8.3.1



RE: [PATCH] can: m_can: add support for one shot mode

2019-10-02 Thread pankj.sharma
Gentle Ping

> -Original Message-
> From: Pankaj Sharma 
> Subject: [PATCH] can: m_can: add support for one shot mode
> 
> According to the CAN Specification (see ISO 11898-1:2015, 8.3.4 Recovery
> Management), the M_CAN provides means for automatic retransmission of
> frames that have lost arbitration or that have been disturbed by errors during
> transmission. By default automatic retransmission is enabled.
> 
> The Bosch MCAN controller has support for disabling automatic retransmission.
> 
> To support time-triggered communication as described in ISO 11898-1:2015,
> chapter 9.2, the automatic retransmission may be disabled via CCCR.DAR.
> 
> CAN_CTRLMODE_ONE_SHOT is used for disabling automatic retransmission.
> 
> Signed-off-by: Pankaj Sharma 
> Signed-off-by: Sriram Dash 
> ---
>  drivers/net/can/m_can/m_can.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> index deb274a19ba0..e7165404ba8a 100644
> --- a/drivers/net/can/m_can/m_can.c
> +++ b/drivers/net/can/m_can/m_can.c
> @@ -150,6 +150,7 @@ enum m_can_mram_cfg {
>  #define CCCR_CME_CANFD_BRS   0x2
>  #define CCCR_TXP BIT(14)
>  #define CCCR_TESTBIT(7)
> +#define CCCR_DAR BIT(6)
>  #define CCCR_MON BIT(5)
>  #define CCCR_CSR BIT(4)
>  #define CCCR_CSA BIT(3)
> @@ -1123,7 +1124,7 @@ static void m_can_chip_config(struct net_device
> *dev)
>   if (priv->version == 30) {
>   /* Version 3.0.x */
> 
> - cccr &= ~(CCCR_TEST | CCCR_MON |
> + cccr &= ~(CCCR_TEST | CCCR_MON | CCCR_DAR |
>   (CCCR_CMR_MASK << CCCR_CMR_SHIFT) |
>   (CCCR_CME_MASK << CCCR_CME_SHIFT));
> 
> @@ -1133,7 +1134,7 @@ static void m_can_chip_config(struct net_device
> *dev)
>   } else {
>   /* Version 3.1.x or 3.2.x */
>   cccr &= ~(CCCR_TEST | CCCR_MON | CCCR_BRSE | CCCR_FDOE
> |
> -   CCCR_NISO);
> +   CCCR_NISO | CCCR_DAR);
> 
>   /* Only 3.2.x has NISO Bit implemented */
>   if (priv->can.ctrlmode & CAN_CTRLMODE_FD_NON_ISO) @@ -
> 1153,6 +1154,10 @@ static void m_can_chip_config(struct net_device *dev)
>   if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
>   cccr |= CCCR_MON;
> 
> + /* Disable Auto Retransmission (all versions) */
> + if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT)
> + cccr |= CCCR_DAR;
> +
>   /* Write config */
>   m_can_write(priv, M_CAN_CCCR, cccr);
>   m_can_write(priv, M_CAN_TEST, test);
> @@ -1291,7 +1296,8 @@ static int m_can_dev_setup(struct platform_device
> *pdev, struct net_device *dev,
>   priv->can.ctrlmode_supported = CAN_CTRLMODE_LOOPBACK |
>   CAN_CTRLMODE_LISTENONLY |
>   CAN_CTRLMODE_BERR_REPORTING |
> - CAN_CTRLMODE_FD;
> + CAN_CTRLMODE_FD |
> + CAN_CTRLMODE_ONE_SHOT;
> 
>   /* Set properties depending on M_CAN version */
>   switch (priv->version) {
> --
> 2.17.1




Re: [PATCH 0/3] PCI: pciehp: Do not turn off slot if presence comes up after link

2019-10-02 Thread Lukas Wunner
On Wed, Oct 02, 2019 at 05:13:58PM -0500, Alex G. wrote:
> On 10/1/19 11:13 PM, Lukas Wunner wrote:
> > On Tue, Oct 01, 2019 at 05:14:16PM -0400, Stuart Hayes wrote:
> > > This patch set is based on a patch set [1] submitted many months ago by
> > > Alexandru Gagniuc, who is no longer working on it.
> > 
> > I'm not sure if it's appropriate to change the author and
> > omit Alex' Signed-off-by.
> 
> Legally Dell owns the patches. I have no objection on my end.

From a kernel community POV, I don't think it matters (in this case)
who legally owns the copyright to the contributed code.  It's just that
we go to great lengths to provide proper attribution even for small
contributions (e.g. Tested-by).

The benefit to the community is that we know who to cc if that portion
of the code is changed again and someone knowledgable should take a look.

The benefit to contributors is that they can change jobs and their past
contributions are still visible in the git history and attributed to
their names.  By contrast, if you've worked on closed source code and
changed jobs, that work isn't visible to future employers or even yourself,
and it may happen that someone else takes credit for your past work
without you even knowing about it or being able to stop that.
(I've seen it before.)

In this case, there should be a S-o-b line for Alex preceding that
for Stuart, and the author of the commit should be Alex unless a
significant portion of the patch was changed.

Thanks,

Lukas


KMSAN: uninit-value in cxusb_rc_query

2019-10-02 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:cebbfdbc kmsan: merge set_no_shadow_page() and set_no_orig..
git tree:   https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=1277527e60
kernel config:  https://syzkaller.appspot.com/x/.config?x=f03c659d0830ab8d
dashboard link: https://syzkaller.appspot.com/bug?extid=98730b985cad4931a552
compiler:   clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)

syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=10a648e560
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=156af54560

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+98730b985cad4931a...@syzkaller.appspotmail.com

dvb-usb: bulk message failed: -22 (1/-30591)
==
BUG: KMSAN: uninit-value in cxusb_rc_query+0x2f7/0x360  
drivers/media/usb/dvb-usb/cxusb.c:547

CPU: 0 PID: 761 Comm: kworker/0:2 Not tainted 5.3.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: events dvb_usb_read_remote_control
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x191/0x1f0 lib/dump_stack.c:113
 kmsan_report+0x162/0x2d0 mm/kmsan/kmsan_report.c:109
 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294
 cxusb_rc_query+0x2f7/0x360 drivers/media/usb/dvb-usb/cxusb.c:547
 dvb_usb_read_remote_control+0xf9/0x290  
drivers/media/usb/dvb-usb/dvb-usb-remote.c:261

 process_one_work+0x1572/0x1ef0 kernel/workqueue.c:2269
 worker_thread+0x111b/0x2460 kernel/workqueue.c:2415
 kthread+0x4b5/0x4f0 kernel/kthread.c:256
 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:355

Local variable description: ircode@cxusb_rc_query
Variable was created at:
 cxusb_rc_query+0x4d/0x360 drivers/media/usb/dvb-usb/cxusb.c:543
 dvb_usb_read_remote_control+0xf9/0x290  
drivers/media/usb/dvb-usb/dvb-usb-remote.c:261

==
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 761 Comm: kworker/0:2 Tainted: GB 5.3.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: events dvb_usb_read_remote_control
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x191/0x1f0 lib/dump_stack.c:113
 panic+0x3c9/0xc1e kernel/panic.c:219
 kmsan_report+0x2ca/0x2d0 mm/kmsan/kmsan_report.c:129
 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294
 cxusb_rc_query+0x2f7/0x360 drivers/media/usb/dvb-usb/cxusb.c:547
 dvb_usb_read_remote_control+0xf9/0x290  
drivers/media/usb/dvb-usb/dvb-usb-remote.c:261

 process_one_work+0x1572/0x1ef0 kernel/workqueue.c:2269
 worker_thread+0x111b/0x2460 kernel/workqueue.c:2415
 kthread+0x4b5/0x4f0 kernel/kthread.c:256
 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:355
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
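
A generic remedy for this class of KMSAN report, shown as an illustrative
sketch only (the helper name below is hypothetical, not the actual cxusb
code), is to zero the on-stack buffer and propagate the transfer error
before the bytes are interpreted:

	/* Hypothetical sketch; not the real driver code. */
	u8 ircode[4] = { 0 };	/* never read uninitialized stack bytes */
	int ret;

	ret = query_remote_code(d, ircode, sizeof(ircode));	/* hypothetical helper */
	if (ret < 0)
		return ret;	/* bail out instead of decoding garbage */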


KMSAN: uninit-value in sr9800_bind

2019-10-02 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:124037e0 kmsan: drop inlines, rename do_kmsan_task_create()
git tree:   https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=10f7e0cd60
kernel config:  https://syzkaller.appspot.com/x/.config?x=f03c659d0830ab8d
dashboard link: https://syzkaller.appspot.com/bug?extid=f1842130bbcfb335bac1
compiler:   clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)

syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=142acef360
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11811bbd60

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+f1842130bbcfb335b...@syzkaller.appspotmail.com

CoreChips 2-1:0.159 (unnamed net_device) (uninitialized): Error reading  
RX_CTL register:ffb9
CoreChips 2-1:0.159 (unnamed net_device) (uninitialized): Failed to enable  
software MII access
CoreChips 2-1:0.159 (unnamed net_device) (uninitialized): Failed to enable  
hardware MII access

=
BUG: KMSAN: uninit-value in usbnet_probe+0x10ae/0x3960  
drivers/net/usb/usbnet.c:1722

CPU: 1 PID: 11159 Comm: kworker/1:4 Not tainted 5.3.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: usb_hub_wq hub_event
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x191/0x1f0 lib/dump_stack.c:113
 kmsan_report+0x13a/0x2b0 mm/kmsan/kmsan_report.c:108
 __msan_warning+0x73/0xe0 mm/kmsan/kmsan_instr.c:250
 sr_get_phyid drivers/net/usb/sr9800.c:380 [inline]
 sr9800_bind+0xd39/0x1b10 drivers/net/usb/sr9800.c:800
 usbnet_probe+0x10ae/0x3960 drivers/net/usb/usbnet.c:1722
 usb_probe_interface+0xd19/0x1310 drivers/usb/core/driver.c:361
 really_probe+0x1373/0x1dc0 drivers/base/dd.c:552
 driver_probe_device+0x1ba/0x510 drivers/base/dd.c:709
 __device_attach_driver+0x5b8/0x790 drivers/base/dd.c:816
 bus_for_each_drv+0x28e/0x3b0 drivers/base/bus.c:454
 __device_attach+0x489/0x750 drivers/base/dd.c:882
 device_initial_probe+0x4a/0x60 drivers/base/dd.c:929
 bus_probe_device+0x131/0x390 drivers/base/bus.c:514
 device_add+0x25b5/0x2df0 drivers/base/core.c:2165
 usb_set_configuration+0x309f/0x3710 drivers/usb/core/message.c:2027
 generic_probe+0xe7/0x280 drivers/usb/core/generic.c:210
 usb_probe_device+0x146/0x200 drivers/usb/core/driver.c:266
 really_probe+0x1373/0x1dc0 drivers/base/dd.c:552
 driver_probe_device+0x1ba/0x510 drivers/base/dd.c:709
 __device_attach_driver+0x5b8/0x790 drivers/base/dd.c:816
 bus_for_each_drv+0x28e/0x3b0 drivers/base/bus.c:454
 __device_attach+0x489/0x750 drivers/base/dd.c:882
 device_initial_probe+0x4a/0x60 drivers/base/dd.c:929
 bus_probe_device+0x131/0x390 drivers/base/bus.c:514
 device_add+0x25b5/0x2df0 drivers/base/core.c:2165
 usb_new_device+0x23e5/0x2fb0 drivers/usb/core/hub.c:2536
 hub_port_connect drivers/usb/core/hub.c:5098 [inline]
 hub_port_connect_change drivers/usb/core/hub.c:5213 [inline]
 port_event drivers/usb/core/hub.c:5359 [inline]
 hub_event+0x581d/0x72f0 drivers/usb/core/hub.c:5441
 process_one_work+0x1572/0x1ef0 kernel/workqueue.c:2269
 process_scheduled_works kernel/workqueue.c:2331 [inline]
 worker_thread+0x189c/0x2460 kernel/workqueue.c:2417
 kthread+0x4b5/0x4f0 kernel/kthread.c:256
 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:355

Local variable description: res@sr_mdio_read
Variable was created at:
 sr_mdio_read+0x78/0x360 drivers/net/usb/sr9800.c:341
 sr_get_phyid drivers/net/usb/sr9800.c:379 [inline]
 sr9800_bind+0xce9/0x1b10 drivers/net/usb/sr9800.c:800
=


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


Re: Stop breaking the CSRNG

2019-10-02 Thread Theodore Y. Ts'o
On Wed, Oct 02, 2019 at 06:55:33PM +0200, Kurt Roeckx wrote:
> 
> But it seems people are now thinking about breaking getrandom() too,
> to let it return data when it's not initialized by default. Please
> don't.

"It's complicated"

The problem is that whether a CRNG can be considered secure is a
property of the entire system, including the hardware, and given the
large number of hardware configurations in which the kernel and OpenSSL
can be used, in practice we can't assure that getrandom(2) is
"secure" without making certain assumptions.  For example, if we
assume that the CPU is an x86 processor new enough to support RDRAND,
and that RDRAND is competently implemented (e.g., it won't disappear
after a suspend/resume) and doesn't have any backdoors implanted in
it, then it's easy to say that getrandom() will always be secure.

But if you assume that there is no hardware random number generator,
and everything is driven from a single master oscillator, with no
external input, and the CPU is utterly simple, without speculation or
anything else that might be non-deterministic, AND if we assume that
the idiots who make an IOT device use the same random seed across
millions of devices all cloned off of the same master image, there
is ***absolutely*** nothing the kernel can do to guarantee, with 100%
certainty, that the CRNG will be initialized.  (This is especially
true if the idiots who design the IOT device call OpenSSL to generate
their long-term private key the moment the device is first plugged in,
before any networking device is brought on-line.)

The point with all of this is that both the kernel and OpenSSL, and
whether or not they can be combined to create a secure overall
solution is going to be dependent on the hardware choices, and choices
of the distribution and the application programmers in terms of what
other software components are used, and when and where those
components try to request random numbers, especially super-early in
the boot process.

Historically, I've tried to work around this problem by being super
paranoid about the choices of thresholds before declaring the CRNG to
be initialized, while *also* making sure that at least on most common
x86 systems, the CRNG could be considered initialized before the root
file system was mounted read/write.

But over time, assumptions about what is common hardware change.  SSD's
replace HDD's; NAPI and other polling techniques are more common to
reduce the number of interrupts; a single master oscillator is used
to drive all of the various clocks on the system, etc.  And
software changes --- systemd running boot scripts in parallel means
that boot times are reduced, which is good, but it also means the time
to when the root is mounted read/write is much shortened.

So in the absence of a hardware RNG, or a hardware random number
generator which is considered trusted (i.e., should RDRAND
be considered trusted?), there *will* be times when we will simply fail
to be able to generate secure random numbers (at least by our
heuristics, which can potentially be overly optimistic on some
hardware platforms, and overly conservative on others).

The question is then, what do we do?  Do we hang the boot --- at which
point users will complain to Linus?  Or do we just hope that things
are "good enough", and that even if the user has elected to say that
they don't trust RDRAND, that we'll hope it's competently implement
and not backdoored?  Or do we assume that using a jitter entropy
scheme is actually secure, as opposed to security through obscurity
(and maybe is completely pointless on a simple and completely open
architecture with no speculation such as RISC-V)?

There really are no good choices here.  The one thing which Linus has
made very clear is that hanging at boot is Not Acceptable.  Long term,
the best we can do is to throw the kitchen sink at the problem.  So
we should try to use UEFI's RNG if available; use the TPM's RNG if
available; use RDRAND if available; try to use a seed file if
available (and hope it's not cloned to be identical on a million IOT
devices); and so on.  Hopefully, they won't *all* be incompetently
implemented and/or implanted with a backdoor from the NSA or MSS or
the KGB.

The only words of hope that I can give you are that it's likely that
there are so many zero day bugs in the kernel, in userspace
applications, and crypto libraries (including maybe OpenSSL), that we
don't have to make the CRNG impossible to attack in order to make a
difference.  We just have to make it harder than finding and
exploiting zero day security bugs in *other* parts of the system.

   "When a mountain bear is chasing after you, you don’t have to
   outrun the bear. You only have to outrun the person running next to
   you."  :-)

Bottom line, we can do the best we can with each of our various
components, but without control over the hardware that will be in use,
or for OpenSSL, what applications are trying to call OpenSSL for, and
when they might tr
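
For application authors weighing the above trade-offs, a minimal userspace
sketch of probing pool readiness without blocking, using the documented
GRND_NONBLOCK flag (error handling intentionally abbreviated):

#include <sys/random.h>
#include <errno.h>

static int get_seed(unsigned char *buf, size_t len)
{
	ssize_t n = getrandom(buf, len, GRND_NONBLOCK);

	if (n < 0 && errno == EAGAIN) {
		/* CRNG not yet initialized: retry with a blocking
		 * getrandom() call, or apply a policy the application
		 * explicitly accepts; never silently use weak bytes.
		 */
		return -1;
	}
	return n == (ssize_t)len ? 0 : -1;
}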

[RFC][PATCH] dt-bindings: usb: rt1711h: Add connector bindings

2019-10-02 Thread John Stultz
Add connector binding documentation for Richtek RT1711H Type-C
chip driver

It was noted by Rob Herring that the rt1711h binding docs
don't include the connector binding.

Thus this patch adds such documentation following the details
in Documentation/devicetree/bindings/usb/typec-tcpci.txt

CC: ShuFan Lee 
Cc: Greg Kroah-Hartman 
Cc: Rob Herring 
Cc: Mark Rutland 
Cc: linux-...@vger.kernel.org
Cc: devicet...@vger.kernel.org
Signed-off-by: John Stultz 
---
 .../bindings/usb/richtek,rt1711h.txt  | 29 +++
 1 file changed, 29 insertions(+)

diff --git a/Documentation/devicetree/bindings/usb/richtek,rt1711h.txt 
b/Documentation/devicetree/bindings/usb/richtek,rt1711h.txt
index d4cf53c071d9..e3fc57e605ed 100644
--- a/Documentation/devicetree/bindings/usb/richtek,rt1711h.txt
+++ b/Documentation/devicetree/bindings/usb/richtek,rt1711h.txt
@@ -6,10 +6,39 @@ Required properties:
  - interrupts :  where a is the interrupt number and b represents an
encoding of the sense and level information for the interrupt.
 
+Required sub-node:
+- connector: The "usb-c-connector" attached to the tcpci chip, the bindings
+  of connector node are specified in
+  Documentation/devicetree/bindings/connector/usb-connector.txt
+
 Example :
 rt1711h@4e {
compatible = "richtek,rt1711h";
reg = <0x4e>;
interrupt-parent = <&gpio26>;
interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
+
+   usb_con: connector {
+   compatible = "usb-c-connector";
+   label = "USB-C";
+   data-role = "dual";
+   power-role = "dual";
+   try-power-role = "sink";
+   source-pdos = ;
+   sink-pdos = ;
+   op-sink-microwatt = <1000>;
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   port@1 {
+   reg = <1>;
+   usb_con_ss: endpoint {
+   remote-endpoint = <&usb3_data_ss>;
+   };
+   };
+   };
+   };
 };
-- 
2.17.1



[PATCH 2/2] HID: google: Detect base folded usage instead of hard-coding whiskers

2019-10-02 Thread Nicolas Boichat
Some other hammer-like devices will emit a similar code, so let's look
for the folded event in the HID usage table instead of hard-coding
whiskers in many places.

Signed-off-by: Nicolas Boichat 
---
 drivers/hid/hid-google-hammer.c | 53 -
 1 file changed, 25 insertions(+), 28 deletions(-)

diff --git a/drivers/hid/hid-google-hammer.c b/drivers/hid/hid-google-hammer.c
index a52535ebc6fe92c..2aa4ed157aec875 100644
--- a/drivers/hid/hid-google-hammer.c
+++ b/drivers/hid/hid-google-hammer.c
@@ -370,7 +370,7 @@ static void hammer_unregister_leds(struct hid_device *hdev)
 
 #define HID_UP_GOOGLEVENDOR0xffd1
 #define HID_VD_KBD_FOLDED  0x0019
-#define WHISKERS_KBD_FOLDED(HID_UP_GOOGLEVENDOR | HID_VD_KBD_FOLDED)
+#define HID_USAGE_KBD_FOLDED   (HID_UP_GOOGLEVENDOR | HID_VD_KBD_FOLDED)
 
 /* HID usage for keyboard backlight (Alphanumeric display brightness) */
 #define HID_AD_BRIGHTNESS  0x00140046
@@ -380,8 +380,7 @@ static int hammer_input_mapping(struct hid_device *hdev, 
struct hid_input *hi,
struct hid_usage *usage,
unsigned long **bit, int *max)
 {
-   if (hdev->product == USB_DEVICE_ID_GOOGLE_WHISKERS &&
-   usage->hid == WHISKERS_KBD_FOLDED) {
+   if (usage->hid == HID_USAGE_KBD_FOLDED) {
/*
 * We do not want to have this usage mapped as it will get
 * mixed in with "base attached" signal and delivered over
@@ -398,8 +397,7 @@ static int hammer_event(struct hid_device *hid, struct 
hid_field *field,
 {
unsigned long flags;
 
-   if (hid->product == USB_DEVICE_ID_GOOGLE_WHISKERS &&
-   usage->hid == WHISKERS_KBD_FOLDED) {
+   if (usage->hid == HID_USAGE_KBD_FOLDED) {
spin_lock_irqsave(&cbas_ec_lock, flags);
 
/*
@@ -424,33 +422,22 @@ static int hammer_event(struct hid_device *hid, struct 
hid_field *field,
return 0;
 }
 
-static bool hammer_is_keyboard_interface(struct hid_device *hdev)
+static bool hammer_has_usage(struct hid_device *hdev, unsigned int report_type,
+   unsigned application, unsigned usage)
 {
-   struct hid_report_enum *re = &hdev->report_enum[HID_INPUT_REPORT];
-   struct hid_report *report;
-
-   list_for_each_entry(report, &re->report_list, list)
-   if (report->application == HID_GD_KEYBOARD)
-   return true;
-
-   return false;
-}
-
-static bool hammer_has_backlight_control(struct hid_device *hdev)
-{
-   struct hid_report_enum *re = &hdev->report_enum[HID_OUTPUT_REPORT];
+   struct hid_report_enum *re = &hdev->report_enum[report_type];
struct hid_report *report;
int i, j;
 
list_for_each_entry(report, &re->report_list, list) {
-   if (report->application != HID_GD_KEYBOARD)
+   if (report->application != application)
continue;
 
for (i = 0; i < report->maxfield; i++) {
struct hid_field *field = report->field[i];
 
for (j = 0; j < field->maxusage; j++)
-   if (field->usage[j].hid == HID_AD_BRIGHTNESS)
+   if (field->usage[j].hid == usage)
return true;
}
}
@@ -458,6 +445,18 @@ static bool hammer_has_backlight_control(struct hid_device 
*hdev)
return false;
 }
 
+static bool hammer_has_folded_event(struct hid_device *hdev)
+{
+   return hammer_has_usage(hdev, HID_INPUT_REPORT,
+   HID_GD_KEYBOARD, HID_USAGE_KBD_FOLDED);
+}
+
+static bool hammer_has_backlight_control(struct hid_device *hdev)
+{
+   return hammer_has_usage(hdev, HID_OUTPUT_REPORT,
+   HID_GD_KEYBOARD, HID_AD_BRIGHTNESS);
+}
+
 static int hammer_probe(struct hid_device *hdev,
const struct hid_device_id *id)
 {
@@ -473,12 +472,11 @@ static int hammer_probe(struct hid_device *hdev,
 
/*
 * We always want to poll for, and handle tablet mode events from
-* Whiskers, even when nobody has opened the input device. This also
-* prevents the hid core from dropping early tablet mode events from
-* the device.
+* devices that have folded usage, even when nobody has opened the input
+* device. This also prevents the hid core from dropping early tablet
+* mode events from the device.
 */
-   if (hdev->product == USB_DEVICE_ID_GOOGLE_WHISKERS &&
-   hammer_is_keyboard_interface(hdev)) {
+   if (hammer_has_folded_event(hdev)) {
hdev->quirks |= HID_QUIRK_ALWAYS_POLL;
error = hid_hw_open(hdev);
if (error)
@@ -500,8 +498,7 @@ static void hammer_remove(struct hid_device *hdev)
 {
unsigned long flags;
 
-   if (hdev->product == USB_DEVICE_ID

[PATCH 1/2] HID: google: add magnemite/masterball USB ids

2019-10-02 Thread Nicolas Boichat
Add 2 additional hammer-like devices.

Signed-off-by: Nicolas Boichat 
---
 drivers/hid/hid-google-hammer.c | 4 
 drivers/hid/hid-ids.h   | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/drivers/hid/hid-google-hammer.c b/drivers/hid/hid-google-hammer.c
index 31e4a39946f59ad..a52535ebc6fe92c 100644
--- a/drivers/hid/hid-google-hammer.c
+++ b/drivers/hid/hid-google-hammer.c
@@ -531,6 +531,10 @@ static void hammer_remove(struct hid_device *hdev)
 static const struct hid_device_id hammer_devices[] = {
{ HID_DEVICE(BUS_USB, HID_GROUP_GENERIC,
 USB_VENDOR_ID_GOOGLE, USB_DEVICE_ID_GOOGLE_HAMMER) },
+   { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC,
+USB_VENDOR_ID_GOOGLE, USB_DEVICE_ID_GOOGLE_MAGNEMITE) },
+   { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC,
+USB_VENDOR_ID_GOOGLE, USB_DEVICE_ID_GOOGLE_MASTERBALL) },
{ HID_DEVICE(BUS_USB, HID_GROUP_GENERIC,
 USB_VENDOR_ID_GOOGLE, USB_DEVICE_ID_GOOGLE_STAFF) },
{ HID_DEVICE(BUS_USB, HID_GROUP_GENERIC,
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index 76969a22b0f2f79..447e8db21174ae7 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -476,6 +476,8 @@
 #define USB_DEVICE_ID_GOOGLE_STAFF 0x502b
 #define USB_DEVICE_ID_GOOGLE_WAND  0x502d
 #define USB_DEVICE_ID_GOOGLE_WHISKERS  0x5030
+#define USB_DEVICE_ID_GOOGLE_MASTERBALL0x503c
+#define USB_DEVICE_ID_GOOGLE_MAGNEMITE 0x503d
 
 #define USB_VENDOR_ID_GOTOP0x08f2
 #define USB_DEVICE_ID_SUPER_Q2 0x007f
-- 
2.23.0.444.g18eeb5a265-goog



[PATCH RFC] reboot: hotplug cpus in migrate_to_reboot_cpu()

2019-10-02 Thread Hsin-Yi Wang
Currently, system reboot uses arch-specific code (e.g. smp_send_stop()) to
offline non-reboot CPUs. Some architectures, such as arm64, arm, and x86, set
the offline mask for a CPU without really offlining it. This causes race
conditions, and kernel warnings sometimes show up when the system reboots. We
can do the CPU hotplug in migrate_to_reboot_cpu() to avoid this issue.

Signed-off-by: Hsin-Yi Wang 
---
kernel warnings at reboot:
[1] https://lore.kernel.org/lkml/20190820100843.3028-1-hsi...@chromium.org/
[2] https://lore.kernel.org/lkml/20190727164450.ga11...@roeck-us.net/
---
 kernel/cpu.c| 35 +++
 kernel/reboot.c | 18 --
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index fc28e17940e0..2f4d51fe91e3 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #define CREATE_TRACE_POINTS
@@ -1366,6 +1367,40 @@ int __boot_cpu_id;
 
 #endif /* CONFIG_SMP */
 
+void migrate_to_reboot_cpu(void)
+{
+   /* The boot cpu is always logical cpu 0 */
+   int cpu = reboot_cpu;
+
+   /* Make certain the cpu I'm about to reboot on is online */
+   if (!cpu_online(cpu))
+   cpu = cpumask_first(cpu_online_mask);
+
+   /* Prevent races with other tasks migrating this task */
+   current->flags |= PF_NO_SETAFFINITY;
+
+   /* Make certain I only run on the appropriate processor */
+   set_cpus_allowed_ptr(current, cpumask_of(cpu));
+
+   /* Hotplug other cpus if possible */
+   if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
+   int i, err;
+
+   cpu_maps_update_begin();
+
+   for_each_online_cpu(i) {
+   if (i == cpu)
+   continue;
+   err = _cpu_down(i, 0, CPUHP_OFFLINE);
+   if (err)
+   pr_info("Failed to offline cpu %d\n", i);
+   }
+   cpu_hotplug_disabled++;
+
+   cpu_maps_update_done();
+   }
+}
+
 /* Boot processor state steps */
 static struct cpuhp_step cpuhp_hp_states[] = {
[CPUHP_OFFLINE] = {
diff --git a/kernel/reboot.c b/kernel/reboot.c
index c4d472b7f1b4..f0046be34a60 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -215,24 +215,6 @@ void do_kernel_restart(char *cmd)
atomic_notifier_call_chain(&restart_handler_list, reboot_mode, cmd);
 }
 
-void migrate_to_reboot_cpu(void)
-{
-   /* The boot cpu is always logical cpu 0 */
-   int cpu = reboot_cpu;
-
-   cpu_hotplug_disable();
-
-   /* Make certain the cpu I'm about to reboot on is online */
-   if (!cpu_online(cpu))
-   cpu = cpumask_first(cpu_online_mask);
-
-   /* Prevent races with other tasks migrating this task */
-   current->flags |= PF_NO_SETAFFINITY;
-
-   /* Make certain I only run on the appropriate processor */
-   set_cpus_allowed_ptr(current, cpumask_of(cpu));
-}
-
 /**
  * kernel_restart - reboot the system
  * @cmd: pointer to buffer containing command to execute for restart
-- 
2.23.0.444.g18eeb5a265-goog



[PATCH v2] kbuild: update compile-test header list for v5.4-rc2

2019-10-02 Thread Masahiro Yamada
Commit 6dc280ebeed2 ("coda: remove uapi/linux/coda_psdev.h") removed
the header in question. Some more build errors were fixed. Add more
headers into the test coverage.

Signed-off-by: Masahiro Yamada 
---

Changes in v2:
  - remove linux/coda_psdev.h as well

 usr/include/Makefile | 10 --
 1 file changed, 10 deletions(-)

diff --git a/usr/include/Makefile b/usr/include/Makefile
index c9449aaf438d..57b20f7b6729 100644
--- a/usr/include/Makefile
+++ b/usr/include/Makefile
@@ -29,13 +29,11 @@ header-test- += linux/android/binderfs.h
 header-test-$(CONFIG_CPU_BIG_ENDIAN) += linux/byteorder/big_endian.h
 header-test-$(CONFIG_CPU_LITTLE_ENDIAN) += linux/byteorder/little_endian.h
 header-test- += linux/coda.h
-header-test- += linux/coda_psdev.h
 header-test- += linux/elfcore.h
 header-test- += linux/errqueue.h
 header-test- += linux/fsmap.h
 header-test- += linux/hdlc/ioctl.h
 header-test- += linux/ivtv.h
-header-test- += linux/jffs2.h
 header-test- += linux/kexec.h
 header-test- += linux/matroxfb.h
 header-test- += linux/netfilter_ipv4/ipt_LOG.h
@@ -55,20 +53,12 @@ header-test- += linux/v4l2-mediabus.h
 header-test- += linux/v4l2-subdev.h
 header-test- += linux/videodev2.h
 header-test- += linux/vm_sockets.h
-header-test- += scsi/scsi_bsg_fc.h
-header-test- += scsi/scsi_netlink.h
-header-test- += scsi/scsi_netlink_fc.h
 header-test- += sound/asequencer.h
 header-test- += sound/asoc.h
 header-test- += sound/asound.h
 header-test- += sound/compress_offload.h
 header-test- += sound/emu10k1.h
 header-test- += sound/sfnt_info.h
-header-test- += sound/sof/eq.h
-header-test- += sound/sof/fw.h
-header-test- += sound/sof/header.h
-header-test- += sound/sof/manifest.h
-header-test- += sound/sof/trace.h
 header-test- += xen/evtchn.h
 header-test- += xen/gntdev.h
 header-test- += xen/privcmd.h
-- 
2.17.1



RE: [PATCH 0/2] peci: aspeed: Add AST2600 compatible

2019-10-02 Thread ChiaWei Wang
Hi Jae Hyun,

Thanks for the feedback.
For now, should I use a GitHub pull request to submit the PECI-related patches
to the OpenBMC dev-5.3 tree only?

Regards,
Chiawei



-Original Message-
From: Jae Hyun Yoo [mailto:jae.hyun@linux.intel.com] 
Sent: Thursday, October 3, 2019 7:43 AM
To: Joel Stanley 
Cc: ChiaWei Wang ; Jason M Biils 
; Rob Herring ; Mark Rutland 
; Andrew Jeffery ; linux-aspeed 
; OpenBMC Maillist ; 
devicetree ; Linux ARM 
; Linux Kernel Mailing List 
; Ryan Chen 
Subject: Re: [PATCH 0/2] peci: aspeed: Add AST2600 compatible

On 10/2/2019 3:05 PM, Joel Stanley wrote:
> On Wed, 2 Oct 2019 at 18:11, Jae Hyun Yoo  
> wrote:
>>
>> Hi Chia-Wei,
>>
>> On 10/1/2019 11:11 PM, Chia-Wei, Wang wrote:
>>> Update the Aspeed PECI driver with the AST2600 compatible string.
>>> A new compatible string is needed for the extended HW feature of 
>>> AST2600.
>>>
>>> Chia-Wei, Wang (2):
>>> peci: aspeed: Add AST2600 compatible string
>>> dt-bindings: peci: aspeed: Add AST2600 compatible
>>>
>>>Documentation/devicetree/bindings/peci/peci-aspeed.txt | 1 +
>>>drivers/peci/peci-aspeed.c | 1 +
>>>2 files changed, 2 insertions(+)
>>>
>>
>> PECI subsystem isn't in linux upstream yet so you should submit it 
>> into OpenBMC dev-5.3 tree only.
> 
> OpenBMC has been carrying the out of tree patches for some time now. I 
> haven't seen a new version posted for a while. Do you have a timeline 
> for when you plan to submit it upstream?

Thanks for your effort for carrying the out of tree patches in OpenBMC.
I don't have an exact timeline but I'm gonna upstream it as soon as it gets 
ready.

Thanks,

Jae


Re: [PATCH] scsi: ch: add include guard to chio.h

2019-10-02 Thread Masahiro Yamada
Hi,

On Mon, Jul 29, 2019 at 1:47 AM Masahiro Yamada
 wrote:
>
> Add a header include guard just in case.
>
> Signed-off-by: Masahiro Yamada 
> ---

Ping?


>  include/uapi/linux/chio.h | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/include/uapi/linux/chio.h b/include/uapi/linux/chio.h
> index 689fc93fafda..e1cad4c319ee 100644
> --- a/include/uapi/linux/chio.h
> +++ b/include/uapi/linux/chio.h
> @@ -3,6 +3,9 @@
>   * ioctl interface for the scsi media changer driver
>   */
>
> +#ifndef _UAPI_LINUX_CHIO_H
> +#define _UAPI_LINUX_CHIO_H
> +
>  /* changer element types */
>  #define CHET_MT   0/* media transport element (robot) */
>  #define CHET_ST   1/* storage element (media slots) */
> @@ -160,10 +163,4 @@ struct changer_set_voltag {
>  #define CHIOSVOLTAG_IOW('c',18,struct changer_set_voltag)
>  #define CHIOGVPARAMS   _IOR('c',19,struct changer_vendor_params)
>
> -/* -- */
> -
> -/*
> - * Local variables:
> - * c-basic-offset: 8
> - * End:
> - */
> +#endif /* _UAPI_LINUX_CHIO_H */
> --
> 2.17.1
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH] kasan: fix the missing underflow in memmove and memcpy with CONFIG_KASAN_GENERIC=y

2019-10-02 Thread Walter Wu
On Wed, 2019-10-02 at 15:57 +0200, Dmitry Vyukov wrote:
> On Wed, Oct 2, 2019 at 2:15 PM Walter Wu  wrote:
> >
> > On Mon, 2019-09-30 at 12:36 +0800, Walter Wu wrote:
> > > On Fri, 2019-09-27 at 21:41 +0200, Dmitry Vyukov wrote:
> > > > On Fri, Sep 27, 2019 at 4:22 PM Walter Wu  
> > > > wrote:
> > > > >
> > > > > On Fri, 2019-09-27 at 15:07 +0200, Dmitry Vyukov wrote:
> > > > > > On Fri, Sep 27, 2019 at 5:43 AM Walter Wu 
> > > > > >  wrote:
> > > > > > >
> > > > > > > memmove() and memcpy() have missing underflow issues.
> > > > > > > When -7 <= size < 0, KASAN will fail to catch the underflow issue.
> > > > > > > It looks like the shadow start address and the shadow end address
> > > > > > > are the same, so it does not actually check anything.
> > > > > > >
> > > > > > > The following test is indeed not caught by KASAN:
> > > > > > >
> > > > > > > char *p = kmalloc(64, GFP_KERNEL);
> > > > > > > memset((char *)p, 0, 64);
> > > > > > > memmove((char *)p, (char *)p + 4, -2);
> > > > > > > kfree((char*)p);
> > > > > > >
> > > > > > > It should be checked here:
> > > > > > >
> > > > > > > void *memmove(void *dest, const void *src, size_t len)
> > > > > > > {
> > > > > > > check_memory_region((unsigned long)src, len, false, 
> > > > > > > _RET_IP_);
> > > > > > > check_memory_region((unsigned long)dest, len, true, 
> > > > > > > _RET_IP_);
> > > > > > >
> > > > > > > return __memmove(dest, src, len);
> > > > > > > }
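
To make the failure mode concrete, a minimal user-space sketch (not kernel
code) of what the -2 above becomes once it is treated as a size_t:

#include <stdio.h>
#include <stddef.h>

int main(void)
{
	/* A "negative" length wraps when converted to the unsigned size_t. */
	size_t len = -2;

	/* Prints 18446744073709551614 on a 64-bit build. */
	printf("len as size_t: %zu\n", len);
	return 0;
}

With a length like that, addr + len wraps as well, so the computed shadow end
can land at the shadow start and the shadow walk checks nothing, which matches
the observation above.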
> > > > > > >
> > > > > > > We fix how the shadow end address is calculated, so that generic
> > > > > > > KASAN
> > > > > > > gets the right shadow end address and detects this underflow issue.
> > > > > > >
> > > > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=199341
> > > > > > >
> > > > > > > Signed-off-by: Walter Wu 
> > > > > > > Reported-by: Dmitry Vyukov 
> > > > > > > ---
> > > > > > >  lib/test_kasan.c   | 36 
> > > > > > >  mm/kasan/generic.c |  8 ++--
> > > > > > >  2 files changed, 42 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/lib/test_kasan.c b/lib/test_kasan.c
> > > > > > > index b63b367a94e8..8bd014852556 100644
> > > > > > > --- a/lib/test_kasan.c
> > > > > > > +++ b/lib/test_kasan.c
> > > > > > > @@ -280,6 +280,40 @@ static noinline void __init 
> > > > > > > kmalloc_oob_in_memset(void)
> > > > > > > kfree(ptr);
> > > > > > >  }
> > > > > > >
> > > > > > > +static noinline void __init 
> > > > > > > kmalloc_oob_in_memmove_underflow(void)
> > > > > > > +{
> > > > > > > +   char *ptr;
> > > > > > > +   size_t size = 64;
> > > > > > > +
> > > > > > > +   pr_info("underflow out-of-bounds in memmove\n");
> > > > > > > +   ptr = kmalloc(size, GFP_KERNEL);
> > > > > > > +   if (!ptr) {
> > > > > > > +   pr_err("Allocation failed\n");
> > > > > > > +   return;
> > > > > > > +   }
> > > > > > > +
> > > > > > > +   memset((char *)ptr, 0, 64);
> > > > > > > +   memmove((char *)ptr, (char *)ptr + 4, -2);
> > > > > > > +   kfree(ptr);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static noinline void __init kmalloc_oob_in_memmove_overflow(void)
> > > > > > > +{
> > > > > > > +   char *ptr;
> > > > > > > +   size_t size = 64;
> > > > > > > +
> > > > > > > +   pr_info("overflow out-of-bounds in memmove\n");
> > > > > > > +   ptr = kmalloc(size, GFP_KERNEL);
> > > > > > > +   if (!ptr) {
> > > > > > > +   pr_err("Allocation failed\n");
> > > > > > > +   return;
> > > > > > > +   }
> > > > > > > +
> > > > > > > +   memset((char *)ptr, 0, 64);
> > > > > > > +   memmove((char *)ptr + size, (char *)ptr, 2);
> > > > > > > +   kfree(ptr);
> > > > > > > +}
> > > > > > > +
> > > > > > >  static noinline void __init kmalloc_uaf(void)
> > > > > > >  {
> > > > > > > char *ptr;
> > > > > > > @@ -734,6 +768,8 @@ static int __init kmalloc_tests_init(void)
> > > > > > > kmalloc_oob_memset_4();
> > > > > > > kmalloc_oob_memset_8();
> > > > > > > kmalloc_oob_memset_16();
> > > > > > > +   kmalloc_oob_in_memmove_underflow();
> > > > > > > +   kmalloc_oob_in_memmove_overflow();
> > > > > > > kmalloc_uaf();
> > > > > > > kmalloc_uaf_memset();
> > > > > > > kmalloc_uaf2();
> > > > > > > diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
> > > > > > > index 616f9dd82d12..34ca23d59e67 100644
> > > > > > > --- a/mm/kasan/generic.c
> > > > > > > +++ b/mm/kasan/generic.c
> > > > > > > @@ -131,9 +131,13 @@ static __always_inline bool 
> > > > > > > memory_is_poisoned_n(unsigned long addr,
> > > > > > > size_t size)
> > > > > > >  {
> > > > > > > unsigned long ret;
> > > > > > > +   void *shadow_start = kasan_mem_to_shadow((void *)addr);
> > > > > > > +   void *shadow_end = kasan_mem_to_shadow((void *)addr + 
> >

Re: [PATCH] compiler: enable CONFIG_OPTIMIZE_INLINING forcibly

2019-10-02 Thread Masahiro Yamada
On Thu, Oct 3, 2019 at 5:46 AM Linus Torvalds
 wrote:
>
> On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven  
> wrote:
> >
> > >
> > > Then use the C preprocessor to force the inlining.  I'm sorry it's not
> > > as pretty as static inline functions.
> >
> > Which makes us lose the baby^H^H^H^Htype checking performed
> > on function parameters, requiring to add more ugly checks.
>
> I'm 100% agreed on this.
>
> If the inline change is being pushed by people who say "you should
> have used macros instead if you wanted inlining", then I will just
> revert that stupid commit that is causing problems.
>
> No, the preprocessor is not the answer.
>
> That said, code that relies on inlining for _correctness_ should use
> "__always_inline" and possibly even have a comment about why.
>
> But I am considering just undoing commit 9012d011660e ("compiler:
> allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely.

No, please do not.

Macrofying the 'inline' is a horrid mistake that makes incorrect code work.
It would eternally prevent people from writing portable, correct code.
Please do not encourage hiding problems.
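
To make the trade-off being discussed concrete, here is a minimal user-space
sketch (illustrative names; it uses the raw GCC always_inline attribute so it
builds outside the kernel, where one would write __always_inline): a
function-like macro is always expanded but silently accepts mismatched
argument types, whereas a static always-inlined function keeps the type
checking and is inlined even when the compiler's heuristics would skip it.

#include <stdio.h>

/* Macro: textual substitution, so it is always "inlined", but it performs
 * no parameter type checking and evaluates its arguments more than once. */
#define min_macro(a, b)	((a) < (b) ? (a) : (b))

/* Function: parameters are type-checked and converted as declared; the
 * attribute makes the compiler inline it regardless of its heuristics. */
static inline __attribute__((__always_inline__)) int min_int(int a, int b)
{
	return a < b ? a : b;
}

int main(void)
{
	/* The signed/unsigned mix slips through the macro silently and the
	 * usual arithmetic conversions give the wrong minimum ... */
	printf("macro:    %d\n", (int)min_macro(-1, 1u));	/* prints 1 */

	/* ... while the function converts 1u to int as declared. */
	printf("function: %d\n", min_int(-1, 1u));		/* prints -1 */
	return 0;
}

The macro prints 1 because -1 is converted to a huge unsigned value before the
comparison; the function keeps the declared int parameters and prints -1.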


> The
> advantages are questionable, and when the advantages are balanced
> against actual regressions and the arguments are "use macros", that
> just shows how badly thought out this was.
>
> Linus



-- 
Best Regards
Masahiro Yamada


Re: [PATCH] Make SPLIT_RSS_COUNTING configurable

2019-10-02 Thread Daniel Colascione
On Wed, Oct 2, 2019 at 6:56 PM Qian Cai  wrote:
> > On Oct 2, 2019, at 4:29 PM, Daniel Colascione  wrote:
> >
> > Adding the correct linux-mm address.
> >
> >
> >> On Wed, Oct 2, 2019 at 1:25 PM Daniel Colascione  wrote:
> >>
> >> Using the new config option, users can disable SPLIT_RSS_COUNTING to
> >> get increased accuracy in user-visible mm counters.
> >>
> >> Signed-off-by: Daniel Colascione 
> >> ---
> >> include/linux/mm.h|  4 ++--
> >> include/linux/mm_types_task.h |  5 ++---
> >> include/linux/sched.h |  2 +-
> >> kernel/fork.c |  2 +-
> >> mm/Kconfig| 11 +++
> >> mm/memory.c   |  6 +++---
> >> 6 files changed, 20 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index cc292273e6ba..221395de3cb4 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -1637,7 +1637,7 @@ static inline unsigned long get_mm_counter(struct 
> >> mm_struct *mm, int member)
> >> {
> >>long val = atomic_long_read(&mm->rss_stat.count[member]);
> >>
> >> -#ifdef SPLIT_RSS_COUNTING
> >> +#ifdef CONFIG_SPLIT_RSS_COUNTING
> >>/*
> >> * counter is updated in asynchronous manner and may go to minus.
> >> * But it's never be expected number for users.
> >> @@ -1723,7 +1723,7 @@ static inline void setmax_mm_hiwater_rss(unsigned 
> >> long *maxrss,
> >>*maxrss = hiwater_rss;
> >> }
> >>
> >> -#if defined(SPLIT_RSS_COUNTING)
> >> +#ifdef CONFIG_SPLIT_RSS_COUNTING
> >> void sync_mm_rss(struct mm_struct *mm);
> >> #else
> >> static inline void sync_mm_rss(struct mm_struct *mm)
> >> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
> >> index c1bc6731125c..d2adc8057e65 100644
> >> --- a/include/linux/mm_types_task.h
> >> +++ b/include/linux/mm_types_task.h
> >> @@ -48,14 +48,13 @@ enum {
> >>NR_MM_COUNTERS
> >> };
> >>
> >> -#if USE_SPLIT_PTE_PTLOCKS && defined(CONFIG_MMU)
> >> -#define SPLIT_RSS_COUNTING
> >> +#ifdef CONFIG_SPLIT_RSS_COUNTING
> >> /* per-thread cached information, */
> >> struct task_rss_stat {
> >>int events; /* for synchronization threshold */
> >>int count[NR_MM_COUNTERS];
> >> };
> >> -#endif /* USE_SPLIT_PTE_PTLOCKS */
> >> +#endif /* CONFIG_SPLIT_RSS_COUNTING */
> >>
> >> struct mm_rss_stat {
> >>atomic_long_t count[NR_MM_COUNTERS];
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index 2c2e56bd8913..22f354774540 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -729,7 +729,7 @@ struct task_struct {
> >>/* Per-thread vma caching: */
> >>struct vmacache vmacache;
> >>
> >> -#ifdef SPLIT_RSS_COUNTING
> >> +#ifdef CONFIG_SPLIT_RSS_COUNTING
> >>struct task_rss_statrss_stat;
> >> #endif
> >>int exit_state;
> >> diff --git a/kernel/fork.c b/kernel/fork.c
> >> index f9572f416126..fc5e0889922b 100644
> >> --- a/kernel/fork.c
> >> +++ b/kernel/fork.c
> >> @@ -1917,7 +1917,7 @@ static __latent_entropy struct task_struct 
> >> *copy_process(
> >>p->vtime.state = VTIME_INACTIVE;
> >> #endif
> >>
> >> -#if defined(SPLIT_RSS_COUNTING)
> >> +#ifdef CONFIG_SPLIT_RSS_COUNTING
> >>memset(&p->rss_stat, 0, sizeof(p->rss_stat));
> >> #endif
> >>
> >> diff --git a/mm/Kconfig b/mm/Kconfig
> >> index a5dae9a7eb51..372ef9449924 100644
> >> --- a/mm/Kconfig
> >> +++ b/mm/Kconfig
> >> @@ -736,4 +736,15 @@ config ARCH_HAS_PTE_SPECIAL
> >> config ARCH_HAS_HUGEPD
> >>bool
> >>
> >> +config SPLIT_RSS_COUNTING
> >> +   bool "Per-thread mm counter caching"
> >> +   depends on MMU
> >> +   default y if NR_CPUS >= SPLIT_PTLOCK_CPUS
> >> +   help
> >> + Cache mm counter updates in thread structures and
> >> + flush them to visible per-process statistics in batches.
> >> + Say Y here to slightly reduce cache contention in processes
> >> + with many threads at the expense of decreasing the accuracy
> >> + of memory statistics in /proc.
> >> +
> >> endmenu
>
> All those vague words are going to make it almost impossible for developers
> to decide on the right selection here. It sounds like we should kill
> SPLIT_RSS_COUNTING altogether to simplify the code, as the benefit is so small
> vs the side-effect?

Killing SPLIT_RSS_COUNTING would be my first choice; IME, on mobile
and a basic desktop, it doesn't make a difference. I figured making it
a knob would help allay concerns about the performance impact in more
extreme configurations.
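
To make the trade-off concrete, here is a deliberately simplified user-space
sketch of the batching scheme under discussion: per-thread deltas are flushed
into a shared counter only once an event threshold is reached. The names and
the threshold are illustrative, not the kernel's.

#include <stdatomic.h>
#include <stdio.h>

#define RSS_EVENTS_THRESH	64	/* illustrative threshold */

/* Process-wide counter: what a /proc reader would see. */
static atomic_long shared_rss;

/* Per-thread cache: cheap to update, no cache-line bouncing. */
static _Thread_local long cached_delta;
static _Thread_local int cached_events;

/* Flush the thread-local delta into the shared counter. */
static void sync_rss(void)
{
	if (cached_delta) {
		atomic_fetch_add(&shared_rss, cached_delta);
		cached_delta = 0;
	}
}

/* Fast path: touch only thread-local state, flush occasionally. */
static void add_rss(long pages)
{
	cached_delta += pages;
	if (++cached_events >= RSS_EVENTS_THRESH) {
		cached_events = 0;
		sync_rss();
	}
}

int main(void)
{
	for (int i = 0; i < 100; i++)
		add_rss(1);

	printf("before final sync: %ld\n", atomic_load(&shared_rss)); /* 64 */
	sync_rss();
	printf("after final sync:  %ld\n", atomic_load(&shared_rss)); /* 100 */
	return 0;
}

The gap visible in the first printf is the accuracy cost the Kconfig help text
tries to describe: less contention on the shared counter, but totals that can
lag by up to the threshold per thread until the next flush.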


Re: [PATCH 5/6] arm64: tegra: Add XUSB and pad controller on Tegra194

2019-10-02 Thread JC Kuo
On 10/2/19 6:11 PM, Thierry Reding wrote:
> On Wed, Oct 02, 2019 at 04:00:50PM +0800, JC Kuo wrote:
>> Adds the XUSB pad and XUSB controllers on Tegra194.
>>
>> Signed-off-by: JC Kuo 
>> ---
>>  arch/arm64/boot/dts/nvidia/tegra194.dtsi | 130 +++
>>  1 file changed, 130 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
>> b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
>> index 3c0cf54f0aab..4d3371d3a407 100644
>> --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
>> +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
>> @@ -1599,4 +1599,134 @@
>>  interrupt-parent = <&gic>;
>>  always-on;
>>  };
>> +
>> +xusb_padctl: padctl@352 {
> 
> I also noticed that this is outside of the /cbb bus node. It really
> belongs inside /cbb. Same for the XHCI controller node.
> 
> Thierry
Thanks. I will move both inside /cbb.
> 
>> +compatible = "nvidia,tegra194-xusb-padctl";
>> +reg = <0x0 0x0352 0x0 0x1000>,
>> +<0x0 0x0354 0x0 0x1000>;
>> +reg-names = "padctl", "ao";
>> +
>> +resets = <&bpmp TEGRA194_RESET_XUSB_PADCTL>;
>> +reset-names = "padctl";
>> +
>> +status = "disabled";
>> +
>> +pads {
>> +usb2 {
>> +clocks = <&bpmp TEGRA194_CLK_USB2_TRK>;
>> +clock-names = "trk";
>> +
>> +lanes {
>> +usb2-0 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb2-1 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb2-2 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb2-3 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +};
>> +};
>> +usb3 {
>> +lanes {
>> +usb3-0 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb3-1 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb3-2 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb3-3 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +};
>> +};
>> +};
>> +
>> +ports {
>> +usb2-0 {
>> +status = "disabled";
>> +};
>> +usb2-1 {
>> +status = "disabled";
>> +};
>> +usb2-2 {
>> +status = "disabled";
>> +};
>> +usb2-3 {
>> +status = "disabled";
>> +};
>> +usb3-0 {
>> +status = "disabled";
>> +};
>> +usb3-1 {
>> +status = "disabled";
>> +};
>> +usb3-2 {
>> +status = "disabled";
>> +};
>> +usb3-3 {
>> +status = "disable

Re: [PATCH 5/6] arm64: tegra: Add XUSB and pad controller on Tegra194

2019-10-02 Thread JC Kuo
On 10/2/19 6:10 PM, Thierry Reding wrote:
> On Wed, Oct 02, 2019 at 04:00:50PM +0800, JC Kuo wrote:
>> Adds the XUSB pad and XUSB controllers on Tegra194.
>>
>> Signed-off-by: JC Kuo 
>> ---
>>  arch/arm64/boot/dts/nvidia/tegra194.dtsi | 130 +++
>>  1 file changed, 130 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
>> b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
>> index 3c0cf54f0aab..4d3371d3a407 100644
>> --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
>> +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
>> @@ -1599,4 +1599,134 @@
>>  interrupt-parent = <&gic>;
>>  always-on;
>>  };
>> +
>> +xusb_padctl: padctl@352 {
>> +compatible = "nvidia,tegra194-xusb-padctl";
>> +reg = <0x0 0x0352 0x0 0x1000>,
>> +<0x0 0x0354 0x0 0x1000>;
> 
> These should generally be aligned. Use tabs first and then spaces to
> make the opening < on subsequent lines align with the opening < on the
> first line. There are a couple more like this below.
Thanks. I will make those aligned.
> 
>> +reg-names = "padctl", "ao";
>> +
>> +resets = <&bpmp TEGRA194_RESET_XUSB_PADCTL>;
>> +reset-names = "padctl";
>> +
>> +status = "disabled";
>> +
>> +pads {
>> +usb2 {
>> +clocks = <&bpmp TEGRA194_CLK_USB2_TRK>;
>> +clock-names = "trk";
>> +
>> +lanes {
>> +usb2-0 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb2-1 {
> 
> I prefer blank lines to visually separate blocks here and below.
Sure, will do.
> 
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb2-2 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb2-3 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +};
>> +};
>> +usb3 {
>> +lanes {
>> +usb3-0 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb3-1 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb3-2 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +usb3-3 {
>> +nvidia,function = "xusb";
>> +status = "disabled";
>> +#phy-cells = <0>;
>> +};
>> +};
>> +};
>> +};
>> +
>> +ports {
>> +usb2-0 {
>> +status = "disabled";
>> +};
>> +usb2-1 {
>> +status = "disabled";
>> +};
>> +usb2-2 {
>> +status = "disabled";
>> +};
>> +usb2-3 {
>> +status = "disabled";
>> +};
>> +usb3-0 {
>> +status = "disabled";
>> +};
>> +usb3-1 {
>> +status = "disabled";
>> +};
>> +usb3-2 {
>> +   

Re: [PATCH 3/6] phy: tegra: xusb: Add Tegra194 support

2019-10-02 Thread JC Kuo
On 10/2/19 6:02 PM, Thierry Reding wrote:
> On Wed, Oct 02, 2019 at 04:00:48PM +0800, JC Kuo wrote:
>> Add support for the XUSB pad controller found on Tegra194 SoCs. It is
>> mostly similar to the same IP found on Tegra186, but the number of
>> pads exposed differs, as do the programming sequences. Because most of
>> the Tegra194 XUSB PADCTL register definitions and programming sequences
>> are the same as Tegra186's, the Tegra194 XUSB PADCTL can share the same
>> driver, xusb-tegra186.c, with the Tegra186 XUSB PADCTL.
>>
>> Tegra194 XUSB PADCTL supports up to USB 3.1 Gen 2 speed; however, it
>> is possible for some platforms to have long signal traces that cannot
>> provide a sufficient electrical environment for Gen 2 speed. This patch
>> introduces a new device node property "nvidia,disable-gen2" that can
>> be used to specifically disable Gen 2 speed for a particular USB 3.0
>> port so that the port can be limited to Gen 1 speed and avoid the
>> instability.
>>
>> Signed-off-by: JC Kuo 
>> ---
>>  drivers/phy/tegra/Makefile|  1 +
>>  drivers/phy/tegra/xusb-tegra186.c | 77 +++
>>  drivers/phy/tegra/xusb.c  | 13 ++
>>  drivers/phy/tegra/xusb.h  |  4 ++
>>  4 files changed, 95 insertions(+)
>>
>> diff --git a/drivers/phy/tegra/Makefile b/drivers/phy/tegra/Makefile
>> index 320dd389f34d..89b84067cb4c 100644
>> --- a/drivers/phy/tegra/Makefile
>> +++ b/drivers/phy/tegra/Makefile
>> @@ -6,4 +6,5 @@ phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_124_SOC) += 
>> xusb-tegra124.o
>>  phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_132_SOC) += xusb-tegra124.o
>>  phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_210_SOC) += xusb-tegra210.o
>>  phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_186_SOC) += xusb-tegra186.o
>> +phy-tegra-xusb-$(CONFIG_ARCH_TEGRA_194_SOC) += xusb-tegra186.o
>>  obj-$(CONFIG_PHY_TEGRA194_P2U) += phy-tegra194-p2u.o
>> diff --git a/drivers/phy/tegra/xusb-tegra186.c 
>> b/drivers/phy/tegra/xusb-tegra186.c
>> index 6f3afaf9398f..4e27acf398b2 100644
>> --- a/drivers/phy/tegra/xusb-tegra186.c
>> +++ b/drivers/phy/tegra/xusb-tegra186.c
>> @@ -64,6 +64,13 @@
>>  #define  SSPX_ELPG_CLAMP_EN_EARLY(x)BIT(1 + (x) * 3)
>>  #define  SSPX_ELPG_VCORE_DOWN(x)BIT(2 + (x) * 3)
>>  
>> +#if IS_ENABLED(CONFIG_ARCH_TEGRA_194_SOC)
>> +#define XUSB_PADCTL_SS_PORT_CFG 0x2c
>> +#define   PORTX_SPEED_SUPPORT_SHIFT(x)  ((x) * 4)
>> +#define   PORTX_SPEED_SUPPORT_MASK  (0x3)
>> +#define PORT_SPEED_SUPPORT_GEN1 (0x0)
>> +#endif
> 
> I wouldn't bother protecting these with the #if/#endif.
It will be removed in the next revision.
> 
>> +
>>  #define XUSB_PADCTL_USB2_OTG_PADX_CTL0(x)   (0x88 + (x) * 0x40)
>>  #define  HS_CURR_LEVEL(x)   ((x) & 0x3f)
>>  #define  TERM_SEL   BIT(25)
>> @@ -635,6 +642,17 @@ static int tegra186_usb3_phy_power_on(struct phy *phy)
>>  
>>  padctl_writel(padctl, value, XUSB_PADCTL_SS_PORT_CAP);
>>  
>> +#if IS_ENABLED(CONFIG_ARCH_TEGRA_194_SOC)
>> +if (padctl->soc == &tegra194_xusb_padctl_soc && port->disable_gen2) {
>> +value = padctl_readl(padctl, XUSB_PADCTL_SS_PORT_CFG);
>> +value &= ~(PORTX_SPEED_SUPPORT_MASK <<
>> +PORTX_SPEED_SUPPORT_SHIFT(index));
>> +value |= (PORT_SPEED_SUPPORT_GEN1 <<
>> +PORTX_SPEED_SUPPORT_SHIFT(index));
>> +padctl_writel(padctl, value, XUSB_PADCTL_SS_PORT_CFG);
>> +}
>> +#endif
> 
> Same here. Also, I think you can drop the extra check for padctl->soc
> and only rely on port->disable_gen2. This is not a lot of code, so might
> as well make our life simpler by building it unconditionally.
> 
> On another note: checking the padctl->soc pointer against a SoC-specific
> structure is a neat way to check for this support. However, it's not
> very flexible. Consider what happens when the next chip is released. I
> think we can assume that it will also support gen 2 and may also require
> some boards to disable gen 2 because of long signal traces. In order to
> accomodate that, you'd have to extend this check with another comparison
> to that new SoC structure.
> 
> A better alternative would be to add this as a "feature" flag to the SoC
> structure:
> 
>   struct tegra_xusb_pad_soc {
>   ...
>   bool supports_gen2;
>   };
> 
> Presumably every SoC that supports gen 2 will also need support for
> explicitly disabling gen 2 if the board doesn't support it, so you can
> use that new feature flag to conditionalize this code.
> 
> This way, the next SoC generation can support can simply be added by
> setting supports_gen2 = true, without requiring any actual code changes
> (unless of course if it supports new features).
> 
> Multi-SoC support is also a good argument for dropping the #if/#endif
> protection, because those would need to be extended for the next SoC
> generation as well.
> 
Thanks Thierry. This implementation is better. I wi
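
To illustrate the shape of the capability-flag approach suggested above, here
is a tiny self-contained sketch; the structures and the supports_gen2 field are
stand-ins for illustration, not the actual driver types:

#include <stdbool.h>
#include <stdio.h>

/* Trimmed-down stand-ins: gate on a capability flag rather than comparing
 * against a specific SoC descriptor pointer. */
struct soc_info {
	const char *name;
	bool supports_gen2;
};

struct usb3_port {
	int index;
	bool disable_gen2;	/* from the "nvidia,disable-gen2" DT property */
};

static void usb3_port_power_on(const struct soc_info *soc,
			       const struct usb3_port *port)
{
	/* A future SoC only has to set .supports_gen2 in its descriptor;
	 * no code change is needed here. */
	if (soc->supports_gen2 && port->disable_gen2)
		printf("%s: limiting usb3-%d to Gen 1\n", soc->name, port->index);
	else
		printf("%s: usb3-%d left at default speed\n", soc->name, port->index);
}

int main(void)
{
	const struct soc_info tegra194 = { "tegra194", true };
	const struct soc_info tegra186 = { "tegra186", false };
	const struct usb3_port port0 = { 0, true };

	usb3_port_power_on(&tegra194, &port0);	/* limits to Gen 1 */
	usb3_port_power_on(&tegra186, &port0);	/* no-op: SoC has no Gen 2 */
	return 0;
}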

Re: [PATCH] Make SPLIT_RSS_COUNTING configurable

2019-10-02 Thread Qian Cai



> On Oct 2, 2019, at 4:29 PM, Daniel Colascione  wrote:
> 
> Adding the correct linux-mm address.
> 
> 
>> On Wed, Oct 2, 2019 at 1:25 PM Daniel Colascione  wrote:
>> 
>> Using the new config option, users can disable SPLIT_RSS_COUNTING to
>> get increased accuracy in user-visible mm counters.
>> 
>> Signed-off-by: Daniel Colascione 
>> ---
>> include/linux/mm.h|  4 ++--
>> include/linux/mm_types_task.h |  5 ++---
>> include/linux/sched.h |  2 +-
>> kernel/fork.c |  2 +-
>> mm/Kconfig| 11 +++
>> mm/memory.c   |  6 +++---
>> 6 files changed, 20 insertions(+), 10 deletions(-)
>> 
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index cc292273e6ba..221395de3cb4 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1637,7 +1637,7 @@ static inline unsigned long get_mm_counter(struct 
>> mm_struct *mm, int member)
>> {
>>long val = atomic_long_read(&mm->rss_stat.count[member]);
>> 
>> -#ifdef SPLIT_RSS_COUNTING
>> +#ifdef CONFIG_SPLIT_RSS_COUNTING
>>/*
>> * counter is updated in asynchronous manner and may go to minus.
>> * But it's never be expected number for users.
>> @@ -1723,7 +1723,7 @@ static inline void setmax_mm_hiwater_rss(unsigned long 
>> *maxrss,
>>*maxrss = hiwater_rss;
>> }
>> 
>> -#if defined(SPLIT_RSS_COUNTING)
>> +#ifdef CONFIG_SPLIT_RSS_COUNTING
>> void sync_mm_rss(struct mm_struct *mm);
>> #else
>> static inline void sync_mm_rss(struct mm_struct *mm)
>> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
>> index c1bc6731125c..d2adc8057e65 100644
>> --- a/include/linux/mm_types_task.h
>> +++ b/include/linux/mm_types_task.h
>> @@ -48,14 +48,13 @@ enum {
>>NR_MM_COUNTERS
>> };
>> 
>> -#if USE_SPLIT_PTE_PTLOCKS && defined(CONFIG_MMU)
>> -#define SPLIT_RSS_COUNTING
>> +#ifdef CONFIG_SPLIT_RSS_COUNTING
>> /* per-thread cached information, */
>> struct task_rss_stat {
>>int events; /* for synchronization threshold */
>>int count[NR_MM_COUNTERS];
>> };
>> -#endif /* USE_SPLIT_PTE_PTLOCKS */
>> +#endif /* CONFIG_SPLIT_RSS_COUNTING */
>> 
>> struct mm_rss_stat {
>>atomic_long_t count[NR_MM_COUNTERS];
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 2c2e56bd8913..22f354774540 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -729,7 +729,7 @@ struct task_struct {
>>/* Per-thread vma caching: */
>>struct vmacache vmacache;
>> 
>> -#ifdef SPLIT_RSS_COUNTING
>> +#ifdef CONFIG_SPLIT_RSS_COUNTING
>>struct task_rss_statrss_stat;
>> #endif
>>int exit_state;
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index f9572f416126..fc5e0889922b 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -1917,7 +1917,7 @@ static __latent_entropy struct task_struct 
>> *copy_process(
>>p->vtime.state = VTIME_INACTIVE;
>> #endif
>> 
>> -#if defined(SPLIT_RSS_COUNTING)
>> +#ifdef CONFIG_SPLIT_RSS_COUNTING
>>memset(&p->rss_stat, 0, sizeof(p->rss_stat));
>> #endif
>> 
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index a5dae9a7eb51..372ef9449924 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -736,4 +736,15 @@ config ARCH_HAS_PTE_SPECIAL
>> config ARCH_HAS_HUGEPD
>>bool
>> 
>> +config SPLIT_RSS_COUNTING
>> +   bool "Per-thread mm counter caching"
>> +   depends on MMU
>> +   default y if NR_CPUS >= SPLIT_PTLOCK_CPUS
>> +   help
>> + Cache mm counter updates in thread structures and
>> + flush them to visible per-process statistics in batches.
>> + Say Y here to slightly reduce cache contention in processes
>> + with many threads at the expense of decreasing the accuracy
>> + of memory statistics in /proc.
>> +
>> endmenu

All those vague words are going to make it almost impossible for developers to
decide on the right selection here. It sounds like we should kill
SPLIT_RSS_COUNTING altogether to simplify the code, as the benefit is so small
vs the side-effect?

>> diff --git a/mm/memory.c b/mm/memory.c
>> index b1ca51a079f2..bf557ed5ba23 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -141,7 +141,7 @@ static int __init init_zero_pfn(void)
>> core_initcall(init_zero_pfn);
>> 
>> 
>> -#if defined(SPLIT_RSS_COUNTING)
>> +#ifdef CONFIG_SPLIT_RSS_COUNTING
>> 
>> void sync_mm_rss(struct mm_struct *mm)
>> {
>> @@ -177,7 +177,7 @@ static void check_sync_rss_stat(struct task_struct *task)
>>if (unlikely(task->rss_stat.events++ > TASK_RSS_EVENTS_THRESH))
>>sync_mm_rss(task->mm);
>> }
>> -#else /* SPLIT_RSS_COUNTING */
>> +#else /* CONFIG_SPLIT_RSS_COUNTING */
>> 
>> #define inc_mm_counter_fast(mm, member) inc_mm_counter(mm, member)
>> #define dec_mm_counter_fast(mm, member) dec_mm_counter(mm, member)
>> @@ -186,7 +186,7 @@ static void check_sync_rss_stat(struct task_struct *task)
>> {
>> }

Re: [PATCH 4/6] dt-bindings: phy: tegra: Add Tegra194 support

2019-10-02 Thread JC Kuo
Thanks Thierry. I will fix the typo in the next revision.

On 10/2/19 5:44 PM, Thierry Reding wrote:
> On Wed, Oct 02, 2019 at 04:00:49PM +0800, JC Kuo wrote:
>> Extend the bindings to cover the set of features found in Tegra194.
>> Note that, technically, there are four more supplies connected to the
>> XUSB pad controller (DVDD_PEX, DVDD_PEX_PLL, HVDD_PEX and HVDD_PEX_PLL)
>> , but the power sequencing requirements of Tegra194 require these to be
>> under the control of the PMIC.
>>
>> Tegra194 XUSB PADCTL supports up to USB 3.1 Gen 2 speed; however, it is
>> possible for some platforms to have long signal traces that cannot
>> provide a sufficient electrical environment for Gen 2 speed. To deal with
>> this, a new device node property "nvidia,disable-gen2" was added for
>> Tegra194 that can be used to specifically disable Gen 2 speed for a
>> particular USB 3.0 port so that the port can be limited to Gen 1 speed
>> and avoid the instability.
>>
>> Signed-off-by: JC Kuo 
>> ---
>>  .../bindings/phy/nvidia,tegra124-xusb-padctl.txt | 16 
>>  1 file changed, 16 insertions(+)
>>
>> diff --git 
>> a/Documentation/devicetree/bindings/phy/nvidia,tegra124-xusb-padctl.txt 
>> b/Documentation/devicetree/bindings/phy/nvidia,tegra124-xusb-padctl.txt
>> index 9fb682e47c29..3bef37e7c365 100644
>> --- a/Documentation/devicetree/bindings/phy/nvidia,tegra124-xusb-padctl.txt
>> +++ b/Documentation/devicetree/bindings/phy/nvidia,tegra124-xusb-padctl.txt
>> @@ -37,6 +37,7 @@ Required properties:
>>- Tegra132: "nvidia,tegra132-xusb-padctl", "nvidia,tegra124-xusb-padctl"
>>- Tegra210: "nvidia,tegra210-xusb-padctl"
>>- Tegra186: "nvidia,tegra186-xusb-padctl"
>> +  - Tegra194: "nvidia,tegra194-xusb-padctl"
>>  - reg: Physical base address and length of the controller's registers.
>>  - resets: Must contain an entry for each entry in reset-names.
>>  - reset-names: Must include the following entries:
>> @@ -62,6 +63,10 @@ For Tegra186:
>>  - vclamp-usb-supply: Bias rail for USB pad. Must supply 1.8 V.
>>  - vddio-hsic-supply: HSIC PHY power supply. Must supply 1.2 V.
>>  
>> +For Tegra194:
>> +- avdd-usb-supply: USB I/Os, VBUS, ID, REXT, D+/D- power supply. Must supply
>> +  3.3 V.
>> +- vclamp-usb-supply: Bias rail for USB pad. Must supply 1.8 V.
>>  
>>  Pad nodes:
>>  ==
>> @@ -154,6 +159,11 @@ For Tegra210, the list of valid PHY nodes is given 
>> below:
>>  - sata: sata-0
>>- functions: "usb3-ss", "sata"
>>  
>> +For Tegra194, the list of valid PHY nodes is given below:
>> +- usb2: usb2-0, usb2-1, usb2-2, usb2-3
>> +  - functions: "xusb"
>> +- usb3: usb3-0, usb3-1, usb3-2, usb3-3
>> +  - functions: "xusb"
>>  
>>  Port nodes:
>>  ===
>> @@ -221,6 +231,9 @@ Optional properties:
>>is internal. In the absence of this property the port is considered to be
>>external.
>>  
>> +- nvidia,disable-gen2: A boolean property whose presence determines that a 
>> port
>> +  should be limited to USB 3.1 Gen 1. This properlty is only for Tegra194.
> 
> s/properlty/property/
> 
> With that:
> 
> Acked-by: Thierry Reding 
> 


Re: [PATCH] kheaders: making headers archive reproducible

2019-10-02 Thread Masahiro Yamada
Hi Dmitry,


(+CC Ben Hutchings, who might be interested)


On Sun, Sep 22, 2019 at 10:38 PM Dmitry Goldin  wrote:
>
> From: Dmitry Goldin 
>
> In commit 43d8ce9d65a5 ("Provide in-kernel headers to make
> extending kernel easier") a new mechanism was introduced, for kernels
> >=5.2, which embeds the kernel headers in the kernel image or a module
> and exposes them in procfs for use by userland tools.
>
> The archive containing the header files has nondeterminism through the
> header files metadata. This patch normalizes the metadata and utilizes
> KBUILD_BUILD_TIMESTAMP if provided and otherwise falls back to the
> default behaviour.
>
> In commit f7b101d33046 ("kheaders: Move from proc to sysfs") it was
> modified to use sysfs and the script for generation of the archive was
> renamed to what is being patched.
>
> Signed-off-by: Dmitry Goldin 
> ---
>  kernel/gen_kheaders.sh | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)


Thanks, this produced the deterministic archive for me.


While you are here, could you also update the following hunk
in Documentation/kbuild/reproducible-builds.rst

-->8---
The kernel embeds a timestamp in two places:

* The version string exposed by ``uname()`` and included in
  ``/proc/version``

* File timestamps in the embedded initramfs
-->8---


With the documentation updated, I will pick it soon.

Thank you.




> diff --git a/kernel/gen_kheaders.sh b/kernel/gen_kheaders.sh
> index 9ff449888d9c..2e154741e3b2 100755
> --- a/kernel/gen_kheaders.sh
> +++ b/kernel/gen_kheaders.sh
> @@ -71,7 +71,10 @@ done | cpio --quiet -pd $cpio_dir >/dev/null 2>&1
>  find $cpio_dir -type f -print0 |
> xargs -0 -P8 -n1 perl -pi -e 'BEGIN {undef $/;}; 
> s/\/\*((?!SPDX).)*?\*\///smg;'
>
> -tar -Jcf $tarfile -C $cpio_dir/ . > /dev/null
> +# Create archive and try to normalize metadata for reproducibility
> +tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \
> +--owner=0 --group=0 --sort=name --numeric-owner \
> +-Jcf $tarfile -C $cpio_dir/ . > /dev/null
>
>  echo "$src_files_md5" >  kernel/kheaders.md5
>  echo "$obj_files_md5" >> kernel/kheaders.md5
> --
> 2.19.2
>
>
>


-- 
Best Regards
Masahiro Yamada


[PATCH tip/core/rcu 7/9] rcutorture: Make in-kernel-loop testing more brutal

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

The rcu_torture_fwd_prog_nr() function tests the ability of RCU to tolerate
in-kernel busy loops.  It invokes rcu_torture_fwd_prog_cond_resched()
within its delay loop, which, in PREEMPT && NO_HZ_FULL kernels, results
in the occasional direct call to schedule().  Now, this direct call to
schedule() is appropriate for call_rcu() flood testing, in which either
the kernel should restrain itself or userspace transitions will supply
the needed restraint.  But in pure in-kernel loops, the occasional
cond_resched() should do the job.

This commit therefore makes rcu_torture_fwd_prog_nr() use cond_resched()
instead of rcu_torture_fwd_prog_cond_resched() in order to increase the
brutality of this aspect of rcutorture testing.

Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/rcutorture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index a9e97c3..f1339ee 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1811,7 +1811,7 @@ static void rcu_torture_fwd_prog_nr(int *tested, int 
*tested_tries)
udelay(10);
cur_ops->readunlock(idx);
if (!fwd_progress_need_resched || need_resched())
-   rcu_torture_fwd_prog_cond_resched(1);
+   cond_resched();
}
(*tested_tries)++;
if (!time_before(jiffies, stopat) &&
-- 
2.9.5



[PATCH tip/core/rcu 3/9] rcutorture: Remove CONFIG_HOTPLUG_CPU=n from scenarios

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

A number of mainstream CPU families are no longer capable of building
kernels having CONFIG_SMP=y and CONFIG_HOTPLUG_CPU=n, so this commit
removes this combination from the rcutorture scenarios having it.
People wishing to try out this combination may still do so using the
"--kconfig CONFIG_HOTPLUG_CPU=n CONFIG_SUSPEND=n CONFIG_HIBERNATION=n"
argument to the tools/testing/selftests/rcutorture/bin/kvm.sh script
that is used to run rcutorture.

Signed-off-by: Paul E. McKenney 
---
 tools/testing/selftests/rcutorture/configs/rcu/TASKS03  | 3 ---
 tools/testing/selftests/rcutorture/configs/rcu/TREE02   | 3 ---
 tools/testing/selftests/rcutorture/configs/rcu/TREE04   | 3 ---
 tools/testing/selftests/rcutorture/configs/rcu/TREE06   | 3 ---
 tools/testing/selftests/rcutorture/configs/rcu/TREE08   | 3 ---
 tools/testing/selftests/rcutorture/configs/rcu/TREE09   | 3 ---
 tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL  | 3 ---
 tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 1 -
 8 files changed, 22 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS03 
b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03
index 28568b7..ea43990 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TASKS03
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03
@@ -1,8 +1,5 @@
 CONFIG_SMP=y
 CONFIG_NR_CPUS=2
-CONFIG_HOTPLUG_CPU=n
-CONFIG_SUSPEND=n
-CONFIG_HIBERNATION=n
 CONFIG_PREEMPT_NONE=n
 CONFIG_PREEMPT_VOLUNTARY=n
 CONFIG_PREEMPT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE02 
b/tools/testing/selftests/rcutorture/configs/rcu/TREE02
index 35e639e..65daee4 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE02
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE02
@@ -9,9 +9,6 @@ CONFIG_NO_HZ_IDLE=y
 CONFIG_NO_HZ_FULL=n
 CONFIG_RCU_FAST_NO_HZ=n
 CONFIG_RCU_TRACE=n
-CONFIG_HOTPLUG_CPU=n
-CONFIG_SUSPEND=n
-CONFIG_HIBERNATION=n
 CONFIG_RCU_FANOUT=3
 CONFIG_RCU_FANOUT_LEAF=3
 CONFIG_RCU_NOCB_CPU=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE04 
b/tools/testing/selftests/rcutorture/configs/rcu/TREE04
index 24c9f60..f6d6a40 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE04
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE04
@@ -9,9 +9,6 @@ CONFIG_NO_HZ_IDLE=n
 CONFIG_NO_HZ_FULL=y
 CONFIG_RCU_FAST_NO_HZ=y
 CONFIG_RCU_TRACE=y
-CONFIG_HOTPLUG_CPU=n
-CONFIG_SUSPEND=n
-CONFIG_HIBERNATION=n
 CONFIG_RCU_FANOUT=4
 CONFIG_RCU_FANOUT_LEAF=3
 CONFIG_DEBUG_LOCK_ALLOC=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE06 
b/tools/testing/selftests/rcutorture/configs/rcu/TREE06
index 05a4eec..bf4980d 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE06
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE06
@@ -9,9 +9,6 @@ CONFIG_NO_HZ_IDLE=y
 CONFIG_NO_HZ_FULL=n
 CONFIG_RCU_FAST_NO_HZ=n
 CONFIG_RCU_TRACE=n
-CONFIG_HOTPLUG_CPU=n
-CONFIG_SUSPEND=n
-CONFIG_HIBERNATION=n
 CONFIG_RCU_FANOUT=6
 CONFIG_RCU_FANOUT_LEAF=6
 CONFIG_RCU_NOCB_CPU=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08 
b/tools/testing/selftests/rcutorture/configs/rcu/TREE08
index fb1c763..c810c52 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE08
@@ -9,9 +9,6 @@ CONFIG_NO_HZ_IDLE=y
 CONFIG_NO_HZ_FULL=n
 CONFIG_RCU_FAST_NO_HZ=n
 CONFIG_RCU_TRACE=n
-CONFIG_HOTPLUG_CPU=n
-CONFIG_SUSPEND=n
-CONFIG_HIBERNATION=n
 CONFIG_RCU_FANOUT=3
 CONFIG_RCU_FANOUT_LEAF=2
 CONFIG_RCU_NOCB_CPU=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE09 
b/tools/testing/selftests/rcutorture/configs/rcu/TREE09
index 6710e74..8523a75 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE09
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE09
@@ -8,9 +8,6 @@ CONFIG_HZ_PERIODIC=n
 CONFIG_NO_HZ_IDLE=y
 CONFIG_NO_HZ_FULL=n
 CONFIG_RCU_TRACE=n
-CONFIG_HOTPLUG_CPU=n
-CONFIG_SUSPEND=n
-CONFIG_HIBERNATION=n
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=n
 CONFIG_RCU_BOOST=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL 
b/tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL
index 4d8eb5b..5d546ef 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL
@@ -6,9 +6,6 @@ CONFIG_PREEMPT=n
 CONFIG_HZ_PERIODIC=n
 CONFIG_NO_HZ_IDLE=y
 CONFIG_NO_HZ_FULL=n
-CONFIG_HOTPLUG_CPU=n
-CONFIG_SUSPEND=n
-CONFIG_HIBERNATION=n
 CONFIG_DEBUG_LOCK_ALLOC=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt 
b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
index af6fca0..1b96d68 100644
--- a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
+++ b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
@@ -6,7 +6,6 @@ Kconfig Parameters:
 
 CONFIG_DEBUG_LOCK_ALLOC -- Do three, covering CONF

[PATCH tip/core/rcu 1/9] rcu: Remove unused function rcutorture_record_progress()

2019-10-02 Thread paulmck
From: Ethan Hansen <1ethanhan...@gmail.com>

The function rcutorture_record_progress() is declared in rcu.h, but is
never used.  This commit therefore removes rcutorture_record_progress()
to clean up the code.

Signed-off-by: Ethan Hansen <1ethanhan...@gmail.com>
Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/rcu.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 8fd4f82..aeec70f 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -455,7 +455,6 @@ enum rcutorture_type {
 #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
 void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
unsigned long *gp_seq);
-void rcutorture_record_progress(unsigned long vernum);
 void do_trace_rcu_torture_read(const char *rcutorturename,
   struct rcu_head *rhp,
   unsigned long secs,
@@ -468,7 +467,6 @@ static inline void rcutorture_get_gp_data(enum 
rcutorture_type test_type,
*flags = 0;
*gp_seq = 0;
 }
-static inline void rcutorture_record_progress(unsigned long vernum) { }
 #ifdef CONFIG_RCU_TRACE
 void do_trace_rcu_torture_read(const char *rcutorturename,
   struct rcu_head *rhp,
-- 
2.9.5



[PATCH tip/core/rcu 5/9] rcu: Remove unused variable rcu_perf_writer_state

2019-10-02 Thread paulmck
From: Ethan Hansen <1ethanhan...@gmail.com>

The variable rcu_perf_writer_state is declared and initialized,
but is never actually referenced. Remove it to clean up the code.

Signed-off-by: Ethan Hansen <1ethanhan...@gmail.com>
[ ethansen: Also removed unused macros assigned to that variable. ]
Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/rcuperf.c | 16 
 1 file changed, 16 deletions(-)

diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c
index 5a879d0..5f884d5 100644
--- a/kernel/rcu/rcuperf.c
+++ b/kernel/rcu/rcuperf.c
@@ -109,15 +109,6 @@ static unsigned long b_rcu_perf_writer_started;
 static unsigned long b_rcu_perf_writer_finished;
 static DEFINE_PER_CPU(atomic_t, n_async_inflight);
 
-static int rcu_perf_writer_state;
-#define RTWS_INIT  0
-#define RTWS_ASYNC 1
-#define RTWS_BARRIER   2
-#define RTWS_EXP_SYNC  3
-#define RTWS_SYNC  4
-#define RTWS_IDLE  5
-#define RTWS_STOPPING  6
-
 #define MAX_MEAS 1
 #define MIN_MEAS 100
 
@@ -404,25 +395,20 @@ rcu_perf_writer(void *arg)
if (!rhp)
rhp = kmalloc(sizeof(*rhp), GFP_KERNEL);
if (rhp && atomic_read(this_cpu_ptr(&n_async_inflight)) 
< gp_async_max) {
-   rcu_perf_writer_state = RTWS_ASYNC;
atomic_inc(this_cpu_ptr(&n_async_inflight));
cur_ops->async(rhp, rcu_perf_async_cb);
rhp = NULL;
} else if (!kthread_should_stop()) {
-   rcu_perf_writer_state = RTWS_BARRIER;
cur_ops->gp_barrier();
goto retry;
} else {
kfree(rhp); /* Because we are stopping. */
}
} else if (gp_exp) {
-   rcu_perf_writer_state = RTWS_EXP_SYNC;
cur_ops->exp_sync();
} else {
-   rcu_perf_writer_state = RTWS_SYNC;
cur_ops->sync();
}
-   rcu_perf_writer_state = RTWS_IDLE;
t = ktime_get_mono_fast_ns();
*wdp = t - *wdp;
i_max = i;
@@ -463,10 +449,8 @@ rcu_perf_writer(void *arg)
rcu_perf_wait_shutdown();
} while (!torture_must_stop());
if (gp_async) {
-   rcu_perf_writer_state = RTWS_BARRIER;
cur_ops->gp_barrier();
}
-   rcu_perf_writer_state = RTWS_STOPPING;
writer_n_durations[me] = i_max;
torture_kthread_stopping("rcu_perf_writer");
return 0;
-- 
2.9.5



[PATCH tip/core/rcu 2/9] locktorture: Replace strncmp() with str_has_prefix()

2019-10-02 Thread paulmck
From: Chuhong Yuan 

The strncmp() function is error-prone because it is easy to get the
length wrong, especially if the string is subject to change and given
the need to account for the terminating nul byte.  This commit
therefore substitutes the newly introduced str_has_prefix(), which
does not require a separately specified length.

Signed-off-by: Chuhong Yuan 
Signed-off-by: Paul E. McKenney 
---
 kernel/locking/locktorture.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index c513031..8dd9002 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -889,16 +889,16 @@ static int __init lock_torture_init(void)
cxt.nrealwriters_stress = 2 * num_online_cpus();
 
 #ifdef CONFIG_DEBUG_MUTEXES
-   if (strncmp(torture_type, "mutex", 5) == 0)
+   if (str_has_prefix(torture_type, "mutex"))
cxt.debug_lock = true;
 #endif
 #ifdef CONFIG_DEBUG_RT_MUTEXES
-   if (strncmp(torture_type, "rtmutex", 7) == 0)
+   if (str_has_prefix(torture_type, "rtmutex"))
cxt.debug_lock = true;
 #endif
 #ifdef CONFIG_DEBUG_SPINLOCK
-   if ((strncmp(torture_type, "spin", 4) == 0) ||
-   (strncmp(torture_type, "rw_lock", 7) == 0))
+   if ((str_has_prefix(torture_type, "spin")) ||
+   (str_has_prefix(torture_type, "rw_lock")))
cxt.debug_lock = true;
 #endif
 
-- 
2.9.5
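
As a side note, here is a small stand-alone illustration of the bug class the
commit message describes: a prefix length written once and never updated keeps
matching strings it should not. The str_has_prefix() below is a local stand-in
with the same semantics as the kernel helper (it returns the prefix length on a
match and 0 otherwise) so that the example builds outside the kernel.

#include <stdio.h>
#include <string.h>

/* Local stand-in for the kernel's str_has_prefix(): no separately
 * maintained length, returns strlen(prefix) on a match and 0 otherwise. */
static size_t str_has_prefix(const char *str, const char *prefix)
{
	size_t len = strlen(prefix);

	return strncmp(str, prefix, len) == 0 ? len : 0;
}

int main(void)
{
	/* Hypothetical torture type that is not an rtmutex test at all. */
	const char *type = "rtmutate";

	/* The length 5 was once written for "mutex" and never updated, so
	 * only "rtmut" is compared and this matches by accident. */
	if (strncmp(type, "rtmutex", 5) == 0)
		printf("strncmp with stale length: matched (wrong)\n");

	/* The full prefix is compared, so this correctly does not match. */
	if (!str_has_prefix(type, "rtmutex"))
		printf("str_has_prefix: no match (right)\n");

	return 0;
}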



[PATCH tip/core/rcu 9/9] rcu: Suppress levelspread uninitialized messages

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

New tools bring new warnings, and with v5.3 comes:

kernel/rcu/srcutree.c: warning: 'levelspread[]' may be used 
uninitialized in this function [-Wuninitialized]:  => 121:34

This commit suppresses this warning by initializing the full array
to INT_MIN, which will result in failures should any out-of-bounds
references appear.

Reported-by: Michael Ellerman 
Reported-by: Geert Uytterhoeven 
Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/rcu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index aeec70f..ab504fb 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -299,6 +299,8 @@ static inline void rcu_init_levelspread(int *levelspread, 
const int *levelcnt)
 {
int i;
 
+   for (i = 0; i < RCU_NUM_LVLS; i++)
+   levelspread[i] = INT_MIN;
if (rcu_fanout_exact) {
levelspread[rcu_num_lvls - 1] = rcu_fanout_leaf;
for (i = rcu_num_lvls - 2; i >= 0; i--)
-- 
2.9.5



[PATCH tip/core/rcu 0/9] Torture-test updates

2019-10-02 Thread Paul E. McKenney
Hello!

This series provides torture-test updates.

1.  Remove unused function rcutorture_record_progress(), courtesy
of Ethan Hansen.

2.  Replace strncmp() with str_has_prefix(), courtesy of Chuhong Yuan.

3.  Remove CONFIG_HOTPLUG_CPU=n from scenarios.

4.  Emulate dyntick aspect of userspace nohz_full sojourn.

5.  Remove unused variable rcu_perf_writer_state, courtesy of
Ethan Hansen.

6.  Separate warnings for each failure type.

7.  Make in-kernel-loop testing more brutal.

8.  locktorture: Do not include rwlock.h directly, courtesy of
Wolfgang M. Reimer.

9.  Suppress levelspread uninitialized messages.

Thanx, Paul



 include/linux/rcutiny.h |1 
 kernel/locking/locktorture.c|9 +--
 kernel/rcu/rcu.h|4 -
 kernel/rcu/rcuperf.c|   16 --
 kernel/rcu/rcutorture.c |   28 +---
 kernel/rcu/tree.c   |1 
 tools/testing/selftests/rcutorture/configs/rcu/TASKS03  |3 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE02   |3 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE04   |3 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE06   |3 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE08   |3 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE09   |3 -
 tools/testing/selftests/rcutorture/configs/rcu/TRIVIAL  |3 -
 tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt |1 
 14 files changed, 29 insertions(+), 52 deletions(-)


[PATCH tip/core/rcu 8/9] locking: locktorture: Do not include rwlock.h directly

2019-10-02 Thread paulmck
From: "Wolfgang M. Reimer" 

Including rwlock.h directly will cause kernel builds to fail
if CONFIG_PREEMPT_RT is defined. The correct header file
(rwlock_rt.h OR rwlock.h) will be included by spinlock.h which
is included by locktorture.c anyway.

Remove the include of linux/rwlock.h.

Signed-off-by: Wolfgang M. Reimer 
Signed-off-by: Sebastian Andrzej Siewior 
Acked-by: Davidlohr Bueso 
Signed-off-by: Paul E. McKenney 
---
 kernel/locking/locktorture.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 8dd9002..99475a6 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.9.5



[PATCH tip/core/rcu 4/9] rcutorture: Emulate dyntick aspect of userspace nohz_full sojourn

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

During an actual call_rcu() flood, there would be frequent trips to
userspace (in-kernel call_rcu() floods must be otherwise housebroken).
Userspace execution on nohz_full CPUs implies an RCU dyntick idle/not-idle
transition pair, so this commit adds emulation of that pair.

Signed-off-by: Paul E. McKenney 
---
 include/linux/rcutiny.h |  1 +
 kernel/rcu/rcutorture.c | 11 +++
 kernel/rcu/tree.c   |  1 +
 3 files changed, 13 insertions(+)

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 9bf1dfe..37b6f0c 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -84,6 +84,7 @@ static inline void rcu_scheduler_starting(void) { }
 #endif /* #else #ifndef CONFIG_SRCU */
 static inline void rcu_end_inkernel_boot(void) { }
 static inline bool rcu_is_watching(void) { return true; }
+static inline void rcu_momentary_dyntick_idle(void) { }
 
 /* Avoid RCU read-side critical sections leaking across. */
 static inline void rcu_all_qs(void) { barrier(); }
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 3c9feca..7dcb2b8 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1759,6 +1759,11 @@ static unsigned long rcu_torture_fwd_prog_cbfree(void)
kfree(rfcp);
freed++;
rcu_torture_fwd_prog_cond_resched(freed);
+   if (tick_nohz_full_enabled()) {
+   local_irq_save(flags);
+   rcu_momentary_dyntick_idle();
+   local_irq_restore(flags);
+   }
}
return freed;
 }
@@ -1833,6 +1838,7 @@ static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries)
 static void rcu_torture_fwd_prog_cr(void)
 {
unsigned long cver;
+   unsigned long flags;
unsigned long gps;
int i;
long n_launders;
@@ -1891,6 +1897,11 @@ static void rcu_torture_fwd_prog_cr(void)
}
cur_ops->call(&rfcp->rh, rcu_torture_fwd_cb_cr);
rcu_torture_fwd_prog_cond_resched(n_launders + n_max_cbs);
+   if (tick_nohz_full_enabled()) {
+   local_irq_save(flags);
+   rcu_momentary_dyntick_idle();
+   local_irq_restore(flags);
+   }
}
stoppedat = jiffies;
n_launders_cb_snap = READ_ONCE(n_launders_cb);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8110514..5692db5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -375,6 +375,7 @@ static void __maybe_unused rcu_momentary_dyntick_idle(void)
WARN_ON_ONCE(!(special & RCU_DYNTICK_CTRL_CTR));
rcu_preempt_deferred_qs(current);
 }
+EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
 
 /**
  * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
-- 
2.9.5



[PATCH tip/core/rcu 6/9] rcutorture: Separate warnings for each failure type

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

Currently, each of six different types of failure triggers a
single WARN_ON_ONCE(), and it is then necessary to stare at the
rcu_torture_stats(), Reader Pipe, and Reader Batch lines looking for
inappropriately non-zero values.  This can be annoying and error-prone,
so this commit provides a separate WARN_ON_ONCE() for each of the
six error conditions and adds short comments to each to ease error
identification.

Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/rcutorture.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 7dcb2b8..a9e97c3 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1442,15 +1442,18 @@ rcu_torture_stats_print(void)
n_rcu_torture_barrier_error);
 
pr_alert("%s%s ", torture_type, TORTURE_FLAG);
-   if (atomic_read(&n_rcu_torture_mberror) != 0 ||
-   n_rcu_torture_barrier_error != 0 ||
-   n_rcu_torture_boost_ktrerror != 0 ||
-   n_rcu_torture_boost_rterror != 0 ||
-   n_rcu_torture_boost_failure != 0 ||
+   if (atomic_read(&n_rcu_torture_mberror) ||
+   n_rcu_torture_barrier_error || n_rcu_torture_boost_ktrerror ||
+   n_rcu_torture_boost_rterror || n_rcu_torture_boost_failure ||
i > 1) {
pr_cont("%s", "!!! ");
atomic_inc(&n_rcu_torture_error);
-   WARN_ON_ONCE(1);
+   WARN_ON_ONCE(atomic_read(&n_rcu_torture_mberror));
+   WARN_ON_ONCE(n_rcu_torture_barrier_error);  // rcu_barrier()
+   WARN_ON_ONCE(n_rcu_torture_boost_ktrerror); // no boost kthread
+   WARN_ON_ONCE(n_rcu_torture_boost_rterror); // can't set RT prio
+   WARN_ON_ONCE(n_rcu_torture_boost_failure); // RCU boost failed
+   WARN_ON_ONCE(i > 1); // Too-short grace period
}
pr_cont("Reader Pipe: ");
for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++)
-- 
2.9.5



[PATCH tip/core/rcu 5/9] fs/afs: Replace rcu_swap_protected() with rcu_replace()

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace() as a step towards removing
rcu_swap_protected().

Link: 
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
Reported-by: Linus Torvalds 
Signed-off-by: Paul E. McKenney 
Cc: David Howells 
Cc: 
Cc: 
---
 fs/afs/vl_list.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/afs/vl_list.c b/fs/afs/vl_list.c
index 21eb0c0..e594598 100644
--- a/fs/afs/vl_list.c
+++ b/fs/afs/vl_list.c
@@ -279,8 +279,8 @@ struct afs_vlserver_list *afs_extract_vlserver_list(struct afs_cell *cell,
struct afs_addr_list *old = addrs;
 
write_lock(&server->lock);
-   rcu_swap_protected(server->addresses, old,
-  lockdep_is_held(&server->lock));
+   old = rcu_replace(server->addresses, old,
+ lockdep_is_held(&server->lock));
write_unlock(&server->lock);
afs_put_addrlist(old);
}
-- 
2.9.5



[PATCH tip/core/rcu 9/9] net/sched: Replace rcu_swap_protected() with rcu_replace()

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace() as a step towards removing
rcu_swap_protected().

Link: 
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
Reported-by: Linus Torvalds 
Signed-off-by: Paul E. McKenney 
Cc: Jamal Hadi Salim 
Cc: Cong Wang 
Cc: Jiri Pirko 
Cc: "David S. Miller" 
Cc: 
Cc: 
---
 net/sched/act_api.c| 2 +-
 net/sched/act_csum.c   | 4 ++--
 net/sched/act_ct.c | 2 +-
 net/sched/act_ctinfo.c | 4 ++--
 net/sched/act_ife.c| 2 +-
 net/sched/act_mirred.c | 4 ++--
 net/sched/act_mpls.c   | 2 +-
 net/sched/act_police.c | 6 +++---
 net/sched/act_skbedit.c| 4 ++--
 net/sched/act_tunnel_key.c | 4 ++--
 net/sched/act_vlan.c   | 2 +-
 11 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 2558f00..1ab810c 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -88,7 +88,7 @@ struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action,
 struct tcf_chain *goto_chain)
 {
a->tcfa_action = action;
-   rcu_swap_protected(a->goto_chain, goto_chain, 1);
+   goto_chain = rcu_replace(a->goto_chain, goto_chain, 1);
return goto_chain;
 }
 EXPORT_SYMBOL(tcf_action_set_ctrlact);
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index d3cfad8..ced5fe6 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -101,8 +101,8 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla,
 
spin_lock_bh(&p->tcf_lock);
goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
-   rcu_swap_protected(p->params, params_new,
-  lockdep_is_held(&p->tcf_lock));
+   params_new = rcu_replace(p->params, params_new,
+lockdep_is_held(&p->tcf_lock));
spin_unlock_bh(&p->tcf_lock);
 
if (goto_ch)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index fcc4602..500e874 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -722,7 +722,7 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla,
 
spin_lock_bh(&c->tcf_lock);
goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
-   rcu_swap_protected(c->params, params, lockdep_is_held(&c->tcf_lock));
+   params = rcu_replace(c->params, params, lockdep_is_held(&c->tcf_lock));
spin_unlock_bh(&c->tcf_lock);
 
if (goto_ch)
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
index 0dbcfd1..e6ea270 100644
--- a/net/sched/act_ctinfo.c
+++ b/net/sched/act_ctinfo.c
@@ -257,8 +257,8 @@ static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
 
spin_lock_bh(&ci->tcf_lock);
goto_ch = tcf_action_set_ctrlact(*a, actparm->action, goto_ch);
-   rcu_swap_protected(ci->params, cp_new,
-  lockdep_is_held(&ci->tcf_lock));
+   cp_new = rcu_replace(ci->params, cp_new,
+lockdep_is_held(&ci->tcf_lock));
spin_unlock_bh(&ci->tcf_lock);
 
if (goto_ch)
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 3a31e24..a6a60b8 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -594,7 +594,7 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
spin_lock_bh(&ife->tcf_lock);
/* protected by tcf_lock when modifying existing action */
goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
-   rcu_swap_protected(ife->params, p, 1);
+   p = rcu_replace(ife->params, p, 1);
 
if (exists)
spin_unlock_bh(&ife->tcf_lock);
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 9ce073a..1ed5d7e 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -178,8 +178,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
goto put_chain;
}
mac_header_xmit = dev_is_mac_header_xmit(dev);
-   rcu_swap_protected(m->tcfm_dev, dev,
-  lockdep_is_held(&m->tcf_lock));
+   dev = rcu_replace(m->tcfm_dev, dev,
+ lockdep_is_held(&m->tcf_lock));
if (dev)
dev_put(dev);
m->tcfm_mac_header_xmit = mac_header_xmit;
diff --git a/net/sched/act_mpls.c b/net/sched/act_mpls.c
index e168df0..cea8771 100644
--- a/net/sched/act_mpls.c
+++ b/net/sched/act_mpls.c
@@ -258,7 +258,7 @@ static int tcf_mpls_init(struct net *net, struct nlattr *nla,
 
spin_lock_bh(&m->tcf_lock);
goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
-   rcu_swap_protected(m->mpls_p, p, lockdep_is_held(&m->tcf_lock));
+   p = rcu_replace(m->mpls_p, p, lockdep_is_held(&m->tcf_lock));
spin_unlock_bh(&m->tcf_lock);
 
 

[PATCH tip/core/rcu 1/9] rcu: Upgrade rcu_swap_protected() to rcu_replace()

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

Although the rcu_swap_protected() macro follows the example of swap(),
the interactions with RCU make its update of its argument somewhat
counter-intuitive.  This commit therefore introduces an rcu_replace()
that returns the old value of the RCU pointer instead of doing the
argument update.  Once all the uses of rcu_swap_protected() are updated
to instead use rcu_replace(), rcu_swap_protected() will be removed.

Link: 
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
Reported-by: Linus Torvalds 
Signed-off-by: Paul E. McKenney 
Cc: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
Cc: Shane M Seymour 
Cc: Martin K. Petersen 
---
 include/linux/rcupdate.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 75a2ede..3b73287 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -383,6 +383,24 @@ do {                                                                     \
 } while (0)
 
 /**
+ * rcu_replace() - replace an RCU pointer, returning its old value
+ * @rcu_ptr: RCU pointer, whose old value is returned
+ * @ptr: regular pointer
+ * @c: the lockdep conditions under which the dereference will take place
+ *
+ * Perform a replacement, where @rcu_ptr is an RCU-annotated
+ * pointer and @c is the lockdep argument that is passed to the
+ * rcu_dereference_protected() call used to read that pointer.  The old
+ * value of @rcu_ptr is returned, and @rcu_ptr is set to @ptr.
+ */
+#define rcu_replace(rcu_ptr, ptr, c)   \
+({ \
+   typeof(ptr) __tmp = rcu_dereference_protected((rcu_ptr), (c));  \
+   rcu_assign_pointer((rcu_ptr), (ptr));   \
+   __tmp;  \
+})
+
+/**
  * rcu_swap_protected() - swap an RCU and a regular pointer
  * @rcu_ptr: RCU pointer
  * @ptr: regular pointer
-- 
2.9.5
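
To make the calling-convention change concrete outside the kernel, here is a
small userspace analogue.  fake_replace() is hypothetical, plain assignment
stands in for rcu_dereference_protected()/rcu_assign_pointer(), and the
lockdep condition is omitted; this is a sketch, not the kernel macro:

#include <stdio.h>

/* Hypothetical userspace analogue of rcu_replace(): publish a new pointer
 * and hand back the old one.  No locking, no RCU.
 */
#define fake_replace(shared_ptr, new_ptr)		\
({							\
	__typeof__(new_ptr) __old = (shared_ptr);	\
	(shared_ptr) = (new_ptr);			\
	__old;						\
})

struct cfg {
	int version;
};

static struct cfg *active_cfg;

int main(void)
{
	static struct cfg v1 = { .version = 1 };
	static struct cfg v2 = { .version = 2 };
	struct cfg *old;

	active_cfg = &v1;

	/* rcu_swap_protected(active_cfg, new, c) would have updated "new"
	 * in place; the replace form makes the hand-off explicit.
	 */
	old = fake_replace(active_cfg, &v2);

	printf("active version %d, previous version %d\n",
	       active_cfg->version, old->version);
	return 0;
}

In the kernel, the returned old pointer would typically be freed only after a
grace period (kfree_rcu() or synchronize_rcu()); the point here is just that
the data flow into the caller's variable is explicit rather than hidden
inside a swap.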



[PATCH tip/core/rcu 4/9] drivers/scsi: Replace rcu_swap_protected() with rcu_replace()

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace() as a step towards removing
rcu_swap_protected().

Link: 
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
Reported-by: Linus Torvalds 
Signed-off-by: Paul E. McKenney 
Cc: "James E.J. Bottomley" 
Cc: "Martin K. Petersen" 
Cc: 
Cc: 
---
 drivers/scsi/scsi.c   | 4 ++--
 drivers/scsi/scsi_sysfs.c | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 1f5b5c8..6a38d4a 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -434,8 +434,8 @@ static void scsi_update_vpd_page(struct scsi_device *sdev, u8 page,
return;
 
mutex_lock(&sdev->inquiry_mutex);
-   rcu_swap_protected(*sdev_vpd_buf, vpd_buf,
-  lockdep_is_held(&sdev->inquiry_mutex));
+   vpd_buf = rcu_replace(*sdev_vpd_buf, vpd_buf,
+ lockdep_is_held(&sdev->inquiry_mutex));
mutex_unlock(&sdev->inquiry_mutex);
 
if (vpd_buf)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 64c96c7..8d17779 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -466,10 +466,10 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
sdev->request_queue = NULL;
 
mutex_lock(&sdev->inquiry_mutex);
-   rcu_swap_protected(sdev->vpd_pg80, vpd_pg80,
-  lockdep_is_held(&sdev->inquiry_mutex));
-   rcu_swap_protected(sdev->vpd_pg83, vpd_pg83,
-  lockdep_is_held(&sdev->inquiry_mutex));
+   vpd_pg80 = rcu_replace(sdev->vpd_pg80, vpd_pg80,
+  lockdep_is_held(&sdev->inquiry_mutex));
+   vpd_pg83 = rcu_replace(sdev->vpd_pg83, vpd_pg83,
+  lockdep_is_held(&sdev->inquiry_mutex));
mutex_unlock(&sdev->inquiry_mutex);
 
if (vpd_pg83)
-- 
2.9.5



[PATCH tip/core/rcu 2/9] x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace()

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace() as a step towards removing
rcu_swap_protected().

Link: 
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
Reported-by: Linus Torvalds 
Signed-off-by: Paul E. McKenney 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: 
Cc: 
---
 arch/x86/kvm/pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 46875bb..4c37266 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -416,8 +416,8 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
*filter = tmp;
 
mutex_lock(&kvm->lock);
-   rcu_swap_protected(kvm->arch.pmu_event_filter, filter,
-  mutex_is_locked(&kvm->lock));
+   filter = rcu_replace(kvm->arch.pmu_event_filter, filter,
+mutex_is_locked(&kvm->lock));
mutex_unlock(&kvm->lock);
 
synchronize_srcu_expedited(&kvm->srcu);
-- 
2.9.5



[PATCH tip/core/rcu 6/9] bpf/cgroup: Replace rcu_swap_protected() with rcu_replace()

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace() as a step towards removing
rcu_swap_protected().

Link: 
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
Reported-by: Linus Torvalds 
Signed-off-by: Paul E. McKenney 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Martin KaFai Lau 
Cc: Song Liu 
Cc: Yonghong Song 
Cc: 
Cc: 
---
 kernel/bpf/cgroup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ddd8add..06a0657 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -180,8 +180,8 @@ static void activate_effective_progs(struct cgroup *cgrp,
 enum bpf_attach_type type,
 struct bpf_prog_array *old_array)
 {
-   rcu_swap_protected(cgrp->bpf.effective[type], old_array,
-  lockdep_is_held(&cgroup_mutex));
+   old_array = rcu_replace(cgrp->bpf.effective[type], old_array,
+   lockdep_is_held(&cgroup_mutex));
/* free prog array after grace period, since __cgroup_bpf_run_*()
 * might be still walking the array
 */
-- 
2.9.5



[PATCH tip/core/rcu 7/9] net/core: Replace rcu_swap_protected() with rcu_replace()

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace() as a step towards removing
rcu_swap_protected().

Link: 
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
Reported-by: Linus Torvalds 
Signed-off-by: Paul E. McKenney 
Cc: "David S. Miller" 
Cc: Jiri Pirko 
Cc: Eric Dumazet 
Cc: Ido Schimmel 
Cc: Petr Machata 
Cc: Paolo Abeni 
Cc: 
---
 net/core/dev.c| 4 ++--
 net/core/sock_reuseport.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index bf3ed41..41f6936 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1288,8 +1288,8 @@ int dev_set_alias(struct net_device *dev, const char *alias, size_t len)
}
 
mutex_lock(&ifalias_mutex);
-   rcu_swap_protected(dev->ifalias, new_alias,
-  mutex_is_locked(&ifalias_mutex));
+   new_alias = rcu_replace(dev->ifalias, new_alias,
+   mutex_is_locked(&ifalias_mutex));
mutex_unlock(&ifalias_mutex);
 
if (new_alias)
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index f3ceec9..805287b 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -356,8 +356,8 @@ int reuseport_detach_prog(struct sock *sk)
spin_lock_bh(&reuseport_lock);
reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
  lockdep_is_held(&reuseport_lock));
-   rcu_swap_protected(reuse->prog, old_prog,
-  lockdep_is_held(&reuseport_lock));
+   old_prog = rcu_replace(reuse->prog, old_prog,
+  lockdep_is_held(&reuseport_lock));
spin_unlock_bh(&reuseport_lock);
 
if (!old_prog)
-- 
2.9.5



[PATCH tip/core/rcu 8/9] net/netfilter: Replace rcu_swap_protected() with rcu_replace()

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace() as a step towards removing
rcu_swap_protected().

Link: 
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
Reported-by: Linus Torvalds 
Signed-off-by: Paul E. McKenney 
Cc: Pablo Neira Ayuso 
Cc: Jozsef Kadlecsik 
Cc: Florian Westphal 
Cc: "David S. Miller" 
Cc: 
Cc: 
Cc: 
---
 net/netfilter/nf_tables_api.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d481f9b..8499baf 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1461,8 +1461,9 @@ static void nft_chain_stats_replace(struct nft_trans *trans)
if (!nft_trans_chain_stats(trans))
return;
 
-   rcu_swap_protected(chain->stats, nft_trans_chain_stats(trans),
-  lockdep_commit_lock_is_held(trans->ctx.net));
+   nft_trans_chain_stats(trans) =
+   rcu_replace(chain->stats, nft_trans_chain_stats(trans),
+   lockdep_commit_lock_is_held(trans->ctx.net));
 
if (!nft_trans_chain_stats(trans))
static_branch_inc(&nft_counters_enabled);
-- 
2.9.5



[PATCH tip/core/rcu 3/9] drivers/gpu: Replace rcu_swap_protected() with rcu_replace()

2019-10-02 Thread paulmck
From: "Paul E. McKenney" 

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace() as a step towards removing
rcu_swap_protected().

Link: 
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
Reported-by: Linus Torvalds 
Signed-off-by: Paul E. McKenney 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Chris Wilson 
Cc: Tvrtko Ursulin 
Cc: 
Cc: 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 1cdfe05..c5c22c4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1629,7 +1629,7 @@ set_engines(struct i915_gem_context *ctx,
i915_gem_context_set_user_engines(ctx);
else
i915_gem_context_clear_user_engines(ctx);
-   rcu_swap_protected(ctx->engines, set.engines, 1);
+   set.engines = rcu_replace(ctx->engines, set.engines, 1);
mutex_unlock(&ctx->engines_mutex);
 
call_rcu(&set.engines->rcu, free_engines_rcu);
-- 
2.9.5


