date:20150514

Re: [PATCH v2 5/5] cpufreq: arm_big_little: add SCPI interface driver

2015-05-14 Thread Viresh Kumar

On 15-05-15, 00:03, Rafael J. Wysocki wrote:
> > + * SCPI CPUFreq Interface driver
> 
> It would be good to expand the TLA here IMO.
> 
> The rest I'm leaving to Viresh. :-)

The rest looks fine :)

Acked-by: Viresh Kumar 

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[git pull] drm fixes

2015-05-14 Thread Dave Airlie


Hi Linus,

radeon, one oops fix, one bug fix, one pci id addition patch
i915, one suspend/resume regression fix.

all seems quiet enough.

Dave.

The following changes since commit 030bbdbf4c833bc69f502eae58498bc5572db736:

  Linux 4.1-rc3 (2015-05-10 15:12:29 -0700)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 472313245645661c3bc710f874e660c493d313e1:

  Merge tag 'drm-intel-fixes-2015-05-13' of git://anongit.freedesktop.org/drm-intel into drm-fixes (2015-05-15 15:21:18 +1000)


Alex Deucher (1):
  drm/radeon: add new bonaire pci id

Christian König (1):
  drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling

Dave Airlie (3):
  drm/radeon: don't do mst probing if MST isn't enabled.
  Merge branch 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
  Merge tag 'drm-intel-fixes-2015-05-13' of git://anongit.freedesktop.org/drm-intel into drm-fixes

Peter Antoine (1):
  drm/i915: Avoid GPU hang when coming out of s3 or s4

 drivers/gpu/drm/i915/i915_drv.c        | 13 ++---
 drivers/gpu/drm/radeon/cik.c           |  4 ++--
 drivers/gpu/drm/radeon/evergreen.c     |  2 +-
 drivers/gpu/drm/radeon/ni.c            |  5 +++--
 drivers/gpu/drm/radeon/r600.c          |  2 +-
 drivers/gpu/drm/radeon/radeon_dp_mst.c |  3 +++
 drivers/gpu/drm/radeon/rv770.c         |  2 +-
 drivers/gpu/drm/radeon/si.c            |  4 ++--
 include/drm/drm_pciids.h               |  1 +
 9 files changed, 24 insertions(+), 12 deletions(-)

Re: [BUG] kernel panic after bpf program removed.

2015-05-14 Thread Alexei Starovoitov


On 5/14/15 8:54 PM, Wangnan (F) wrote:

> Hi Alexei Starovoitov and others,
>
> I triggered a kernel panic when developing my 'perf bpf' facility. The
> call stack is listed at the bottom of this mail.
>
> I attached two bpf programs on 'kmem_cache_free%return' and
> '__alloc_pages_nodemask'. The programs are very simple.
> The panic is raised after closing the bpf program and the perf event
> file. Looks like the panic is caused by racing between closing perf
> event fd and bpf program fd. I'm unable to reproduce this problem with
> similar operations.
>
> Following is the exact instruction that causes the panic.


thanks for the report.
Looks like pointer 'prog == 0x6c0' is passed into bpf_prog_put,
which means that event->tp_event was freed and memory reused before
free_event_rcu() was called.

I think it's not perf_event_fd racing with prog_fd, but rather
with kprobe freeing:
__free_event()
  event->destroy(event)
    perf_trace_destroy
      perf_trace_event_unreg
which is dropping event->tp_event->perf_refcount
that allows kprobe freeing to proceed in:
unregister_kprobe_event
  trace_remove_event_call
    probe_remove_event_call
and eventually tp_event to get freed.

I think calling perf_event_free_bpf_prog()
from __free_event() instead of free_event_rcu() will fix the race,
but please double check my analysis.
Also please send me a reproducer script. I'd like to see it crashing
first before the fix and not crashing afterwards.


Re: Window watchdog driver design

2015-05-14 Thread Andreas Werner

On Thu, May 14, 2015 at 05:52:38PM -0700, Guenter Roeck wrote:
> On 05/14/2015 07:09 AM, Andreas Werner wrote:
> >On Thu, May 14, 2015 at 06:30:05AM -0700, Guenter Roeck wrote:
> >>On 05/14/2015 04:56 AM, Andreas Werner wrote:
> >>>Hi,
> >>>in the next few weeks I need to write a driver for a window watchdog
> >>>implemented in a CPLD. I have some questions about the design
> >>>of the driver and the best way to write this driver to also be able
> >>>to submit it.
> >>>
> >>>The triggering and configuration of the Watchdog is done by several GPIOs
> >>>which are connected to the CPLD watchdog device. The correct GPIOs are
> >>>configurable using the Device Tree.
> >>>
> >>>1. Timeout
> >>>   The timeout values are defined in ms and start from 20ms to 2560ms.
> >>>   The timeout is set by 3 GPIOs, which means we have only 8 different
> >>>   timeout values. It is also possible that a future Watchdog CPLD device
> >>>   does have different timeout values.
> >>>
> >>>   Is it possible to set ms timeouts? It seems that the WDT API does
> >>>   only support a resolution of 1sec.
> >>>
> >>>   One idea would be to use the API timeout as something like a timeout
> >>>   index to set the different values. Of course this needs to be
> >>>   documented.
> >>>
> >>>   e.g.
> >>>   timeout (API)   timeout in device
> >>>   1               20ms
> >>>   2               100ms
> >>>   3               500ms
> >>>   ...             ...
> >>>
> >>>2. Upper/Lower Window
> >>>   There is currently no support for a windowed watchdog in the wdt core.
> >>>   The lower window can be activated by a gpio and its timeout is defined
> >>>   as "upper windows timeout/4"
> >>>
> >>>   What is the best way to implement those additional settings? Adding
> >>>   additional ioctl or export these in sysfs?
> >>>--
> >>
> >>Sorry for the maybe dumb question, but what is a window watchdog,
> >>and what is the lower window timeout for (assuming the upper window
> >>timeout causes the watchdog to expire) ?
> >>
> >>Guenter
> >>
> >
> >Oh sorry forgot to describe it in more detail.
> >
> >If you have a watchdog window you do not have just one timeout where the 
> >watchdog can expire.
> >You have a so called "window" to trigger it within.
> >
> > |                |
> >---lower timeout   upper timeout
> >
> >This means you have to trigger the watchdog not too late and not too early.
> >This kind of watchdog is often used in embedded applications or more often
> >in safety cases to fulfil requirements given e.g. by SIL1-SIL4
> >certifications.
> >
> >The lower timeout is set by a dedicated GPIO and the value will then be
> >"Upper timeout / 4". The upper timeout is set by 3 GPIOs to get different
> >timeout values.
> >
> 
> Thanks a lot for the explanation.
> 
> I would suggest to use a module parameter to enable the "lower timeout"
> functionality.
> 
> Timeouts have to be specified in seconds.
> 
> Hope this helps,
> Guenter
> 

Thanks for the answer.

The module parameter would be ok for me, but it would be better if I could
enable/disable the lower window from the application.

I know that the API defines the timeout in seconds, but what about ms? Is there
no watchdog out there which has timeout values < 1 second?

In my case I can only set 2 timeouts (1sec and 2sec), but I need to support
all 8 timeout values.

The other thing is that my Watchdog can have different timeout values depending
on the CPLD and the customer requirements. I cannot read out these values; they
are only defined in the specification.

This is why I had the idea with the table to only set some "indexes" for the
timeout to handle all the cases.

Regards
Andy
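
The "timeout index" idea and the lower window discussed in this thread could be
sketched roughly as below. This is only an illustration, not driver code: the
table values are an assumption (the thread gives only the 20ms..2560ms range
and the 3 select GPIOs), and the struct, the function names and the exact
window boundaries are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical timeout table. The thread only states the range (20 ms to
 * 2560 ms) and that 3 GPIOs select one of 8 values; the doubling steps
 * used here are an assumption for illustration.
 */
static const unsigned int wdt_timeout_ms[] = {
	20, 40, 80, 160, 320, 640, 1280, 2560,
};

#define WDT_NUM_TIMEOUTS (sizeof(wdt_timeout_ms) / sizeof(wdt_timeout_ms[0]))

struct wdt_setting {
	unsigned int upper_ms;	/* upper window timeout */
	unsigned int lower_ms;	/* lower window timeout, 0 if disabled */
	unsigned int gpio_sel;	/* 3-bit value for the select GPIOs */
};

/*
 * Translate a 1-based API "timeout index" into a device setting.
 * Returns 0 on success, -1 if the index is out of range.
 */
static int wdt_index_to_setting(unsigned int index, int lower_enabled,
				struct wdt_setting *out)
{
	if (out == NULL || index < 1 || index > WDT_NUM_TIMEOUTS)
		return -1;

	out->gpio_sel = index - 1;
	out->upper_ms = wdt_timeout_ms[index - 1];
	/* The lower window is described as "upper timeout / 4". */
	out->lower_ms = lower_enabled ? out->upper_ms / 4 : 0;
	return 0;
}

/*
 * A trigger is accepted only inside the window: not before the lower
 * timeout and before the upper timeout (boundary handling is assumed).
 */
static int wdt_trigger_in_window(const struct wdt_setting *s,
				 unsigned int elapsed_ms)
{
	return elapsed_ms >= s->lower_ms && elapsed_ms < s->upper_ms;
}
```

With this table, index 3 selects 80ms as the upper timeout and 20ms as the
lower one; a trigger after 10ms or after 80ms would be rejected, one after
50ms accepted.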


linux-next: manual merge of the rcu tree with the net-next tree

2015-05-14 Thread Stephen Rothwell

Hi Paul,

Today's linux-next merge of the rcu tree got a conflict in
net/netfilter/core.c between commit f7191483461c ("netfilter: add hook
list to nf_hook_state") from the net-next tree and commit e4dcfe3a648b
("netfilter: Fix list_entry_rcu usage") from the rcu tree.

I fixed it up (I used the net-next tree version) and can carry the fix
as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [PATCH v1 02/13] ASoC: qcom: move ipq806x specific bits out of lpass driver.

2015-05-14 Thread Kenneth Westfield

On Wed, May 13, 2015 at 05:00:26AM -0700, Srinivas Kandagatla wrote:
> This patch tries to make the lpass driver more generic by moving the
> ipq806x specific bits out of the cpu and platform driver, also allows the
> SOC specific drivers to add the correct register offsets.
> 
> This patch also renames the register definition header file into more
> generic header file.

> diff --git a/sound/soc/qcom/Kconfig b/sound/soc/qcom/Kconfig
> index 05b9840..865205e 100644
> --- a/sound/soc/qcom/Kconfig
> +++ b/sound/soc/qcom/Kconfig

> @@ -14,11 +14,16 @@ config SND_SOC_LPASS_PLATFORM
>   depends on SND_SOC_QCOM && OF
>   select REGMAP_MMIO
>  
> +config SND_SOC_LPASS_IPQ806X
> + tristate
> + depends on SND_SOC_QCOM
> + select SND_SOC_LPASS_CPU
> + select SND_SOC_LPASS_PLATFORM

Based on moving the of_device_id table from lpass-cpu.c to
lpass-ipq806x.c, shouldn't the OF dependency follow to the
SND_SOC_LPASS_IPQ806X config (and not SND_SOC_LPASS_CPU)?

> +
>  config SND_SOC_STORM
>   tristate "ASoC I2S support for Storm boards"
>   depends on (ARCH_QCOM && SND_SOC_QCOM) || COMPILE_TEST
> - select SND_SOC_LPASS_CPU
> - select SND_SOC_LPASS_PLATFORM
> + select SND_SOC_LPASS_IPQ806X
>   select SND_SOC_MAX98357A
>   help
>Say Y or M if you want add support for SoC audio on the

> diff --git a/sound/soc/qcom/lpass-ipq806x.c
> b/sound/soc/qcom/lpass-ipq806x.c
> new file mode 100644
> index 000..d1f698c
> --- /dev/null
> +++ b/sound/soc/qcom/lpass-ipq806x.c

> +static struct platform_driver ipq806x_lpass_cpu_platform_driver = {
> + .driver = {
> + .name   = "lpass-cpu",
> + .of_match_table = of_match_ptr(ipq806x_lpass_cpu_device_id),
> + },
> + .probe  = asoc_qcom_lpass_cpu_platform_probe,
> + .remove = asoc_qcom_lpass_cpu_platform_remove,
> +};
> +module_platform_driver(ipq801x_lpass_cpu_platform_driver);

Patch below fixes the above typo (which breaks compilation):

---><-
diff --git a/sound/soc/qcom/lpass-ipq806x.c b/sound/soc/qcom/lpass-ipq806x.c
index ad1d67a..2eab828 100644
--- a/sound/soc/qcom/lpass-ipq806x.c
+++ b/sound/soc/qcom/lpass-ipq806x.c
@@ -103,7 +103,7 @@ static struct platform_driver ipq806x_lpass_cpu_platform_driver = {
.probe  = asoc_qcom_lpass_cpu_platform_probe,
.remove = asoc_qcom_lpass_cpu_platform_remove,
 };
-module_platform_driver(ipq801x_lpass_cpu_platform_driver);
+module_platform_driver(ipq806x_lpass_cpu_platform_driver);
  
 MODULE_DESCRIPTION("QTi LPASS CPU Driver");
 MODULE_LICENSE("GPL v2");
---><-

> +
> +MODULE_DESCRIPTION("QTi LPASS CPU Driver");
> +MODULE_LICENSE("GPL v2");

-- 
Kenneth Westfield
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project

Re: [PATCH v1 11/13] ASoC: qcom: add apq8016 sound card support

2015-05-14 Thread Kenneth Westfield

On Wed, May 13, 2015 at 05:03:14AM -0700, Srinivas Kandagatla wrote:
> This patch adds apq8016 machine driver support. This patch was tested on
> two apq8016-sbc and msm8916-mtp board for both hdmi and analog audio
> features.

> diff --git a/sound/soc/qcom/Kconfig b/sound/soc/qcom/Kconfig
> index 9cc5ed7..e71b0f2 100644
> --- a/sound/soc/qcom/Kconfig
> +++ b/sound/soc/qcom/Kconfig
> @@ -34,3 +34,12 @@ config SND_SOC_STORM
>   help
>Say Y or M if you want add support for SoC audio on the
>Qualcomm Technologies IPQ806X-based Storm board.
> +
> +config SND_SOC_APQ8016_SBC
> + tristate "SoC Audio support for APQ8016 SBC platforms"
> + depends on SND_SOC_QCOM || ARCH_QCOM || COMPILE_TEST

I believe this should be:
depends on (SND_SOC_QCOM && ARCH_QCOM) || COMPILE_TEST

> + select SND_SOC_LPASS_APQ8016
> + help
> +  Support for Qualcomm Technologies LPASS audio block in
> +  APQ8016 SOC-based systems.
> +  Say Y if you want to use audio devices on MI2S

-- 
Kenneth Westfield
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project
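
The difference between the posted and the suggested "depends on" line can be
checked with a small sketch of Kconfig's tristate logic, where n < m < y,
"&&" yields the minimum and "||" the maximum of its operands. The enum and
function names below are hypothetical; only the two expressions come from the
thread.

```c
/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "&&": the minimum of both operands. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* Kconfig "||": the maximum of both operands. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* As posted: depends on SND_SOC_QCOM || ARCH_QCOM || COMPILE_TEST */
static enum tristate dep_as_posted(enum tristate snd_soc_qcom,
				   enum tristate arch_qcom,
				   enum tristate compile_test)
{
	return tri_or(tri_or(snd_soc_qcom, arch_qcom), compile_test);
}

/* As suggested: depends on (SND_SOC_QCOM && ARCH_QCOM) || COMPILE_TEST */
static enum tristate dep_as_suggested(enum tristate snd_soc_qcom,
				      enum tristate arch_qcom,
				      enum tristate compile_test)
{
	return tri_or(tri_and(snd_soc_qcom, arch_qcom), compile_test);
}
```

With SND_SOC_QCOM=n, ARCH_QCOM=y and COMPILE_TEST=n the posted expression
evaluates to y, so the option would be selectable without the Qualcomm ASoC
support it relies on, while the suggested expression evaluates to n.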

Re: [PATCH v1 05/13] ASoC: qcom: support bitclk and osrclk per i2s port

2015-05-14 Thread Kenneth Westfield

On Wed, May 13, 2015 at 05:00:52AM -0700, Srinivas Kandagatla wrote:
> This patch adds support to allow bitclk and osrclk per i2s dai port.
> on APQ8016 there are 4 i2s ports each one has its own bit clks.
> 
> Without this patch its not possible to support multiple i2s ports in the
> lpass driver.

> diff --git a/sound/soc/qcom/lpass-cpu.c b/sound/soc/qcom/lpass-cpu.c
> index 5965667..0d28ea7 100644
> --- a/sound/soc/qcom/lpass-cpu.c
> +++ b/sound/soc/qcom/lpass-cpu.c
> @@ -33,7 +33,7 @@ static int lpass_cpu_daiops_set_sysclk(struct
> snd_soc_dai *dai, int clk_id,
>   struct lpass_data *drvdata = snd_soc_dai_get_drvdata(dai);
>   int ret;
>  
> - ret = clk_set_rate(drvdata->mi2s_osr_clk, freq);
> + ret = clk_set_rate(drvdata->mi2s_osr_clk[dai->driver->id], freq);
>   if (ret)
>   dev_err(dai->dev, "%s() error setting mi2s osrclk to %u: %d\n",
>   __func__, freq, ret);

Audio was broken on the Storm board with this patch series.  The issue
has to do with the mismatch of the clock position in the array (which
was 0) and the dai->driver->id (which was 4).  Basically, the position of
the bit/osr clocks in their respective arrays need to match the MI2S
port number, even if the port number doesn't start at the 0 position.

I realize there are multiple ways to address this.  The quick solution I
came up with (to get audio functioning again) was to change the DT clock
entries for the ipq806x (see changes below for your reference).  The
downside to the way I did this is, that now, there is no error-checking
for clocks that should be in the DT but aren't there.

Suggestions are welcome on how to best address this issue.
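
For reference, the slot mismatch described above, and one way to keep a form
of error checking, might be sketched in plain C as follows. This is not the
driver code: MAX_PORTS stands in for LPASS_MAX_MI2S_PORTS with an assumed
value, the name list stands in for the DT clock-names property, and both
helper functions are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 8	/* hypothetical stand-in for LPASS_MAX_MI2S_PORTS */

/*
 * Return the position of name in names[], or -1 if it is absent.
 * Stand-in for looking a clock up by its DT clock-names entry.
 */
static int find_clock(const char *const *names, int count, const char *name)
{
	int i;

	for (i = 0; i < count; i++)
		if (strcmp(names[i], name) == 0)
			return i;
	return -1;
}

/*
 * Fill slots[port] with the clock-names position of "<prefix><port>",
 * or -1 when the DT has no such entry, so that the array index always
 * equals the MI2S port number. Returns how many ports had a clock,
 * which lets the caller reject a DT that provides none at all.
 */
static int map_port_clocks(const char *const *names, int count,
			   const char *prefix, int slots[MAX_PORTS])
{
	char clk_name[32];
	int port, found = 0;

	for (port = 0; port < MAX_PORTS; port++) {
		snprintf(clk_name, sizeof(clk_name), "%s%d", prefix, port);
		slots[port] = find_clock(names, count, clk_name);
		if (slots[port] >= 0)
			found++;
	}
	return found;
}
```

With clock-names of "ahbix-clk", "mi2s-osr-clk4", "mi2s-bit-clk4", only slot 4
is populated, which matches a dai->driver->id of 4. Checking the returned
count is one possible substitute for the per-clock error check that the quick
fix gives up.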

---><-
diff --git a/Documentation/devicetree/bindings/sound/qcom,lpass-cpu.txt 
b/Documentation/devicetree/bindings/sound/qcom,lpass-cpu.txt
index 21c6483..2684a4f 100644
--- a/Documentation/devicetree/bindings/sound/qcom,lpass-cpu.txt
+++ b/Documentation/devicetree/bindings/sound/qcom,lpass-cpu.txt
@@ -8,8 +8,8 @@ Required properties:
 - clocks   : Must contain an entry for each entry in clock-names.
 - clock-names  : A list which must include the following entries:
* "ahbix-clk"
-   * "mi2s-osr-clk"
-   * "mi2s-bit-clk"
+   * "mi2s-osr-clk4"
+   * "mi2s-bit-clk4"
: required clocks for "qcom,lpass-cpu-apq8016"
* "ahbix-clk"
* "mi2s-bit-clk0"
@@ -42,7 +42,7 @@ Example:
 lpass@2810 {
compatible = "qcom,lpass-cpu";
clocks = < AHBIX_CLK>, < MI2S_OSR_CLK>, < MI2S_BIT_CLK>;
-   clock-names = "ahbix-clk", "mi2s-osr-clk", "mi2s-bit-clk";
+   clock-names = "ahbix-clk", "mi2s-osr-clk4", "mi2s-bit-clk4";
interrupts = <0 85 1>;
interrupt-names = "lpass-irq-lpaif";
pinctrl-names = "default", "idle";
diff --git a/arch/arm/boot/dts/qcom-ipq8064.dtsi 
b/arch/arm/boot/dts/qcom-ipq8064.dtsi
index 5a13366..090984f 100644
--- a/arch/arm/boot/dts/qcom-ipq8064.dtsi
+++ b/arch/arm/boot/dts/qcom-ipq8064.dtsi
@@ -189,8 +189,8 @@
< MI2S_OSR_CLK>,
< MI2S_BIT_CLK>;
clock-names = "ahbix-clk",
-   "mi2s-osr-clk",
-   "mi2s-bit-clk";
+   "mi2s-osr-clk4",
+   "mi2s-bit-clk4";
interrupts = <0 85 1>;
interrupt-names = "lpass-irq-lpaif";
reg = <0x2810 0x1>;
diff --git a/sound/soc/qcom/lpass-cpu.c b/sound/soc/qcom/lpass-cpu.c
index 5053629..7b66e52 100644
--- a/sound/soc/qcom/lpass-cpu.c
+++ b/sound/soc/qcom/lpass-cpu.c
@@ -411,11 +411,8 @@ int asoc_qcom_lpass_cpu_platform_probe(struct 
platform_device *pdev)
if (variant->init)
variant->init(pdev);
 
-   for (i = 0; i < variant->num_dai; i++) {
-   if (variant->num_dai > 1)
-   sprintf(clk_name, "mi2s-osr-clk%d", i);
-   else
-   sprintf(clk_name, "mi2s-osr-clk");
+   for (i = 0; i < LPASS_MAX_MI2S_PORTS; i++) {
+   sprintf(clk_name, "mi2s-osr-clk%d", i);
 
drvdata->mi2s_osr_clk[i] = devm_clk_get(&pdev->dev,
clk_name);
@@ -427,19 +424,14 @@ int asoc_qcom_lpass_cpu_platform_probe(struct 
platform_device *pdev)
}
}
 
-   for (i = 0; i < variant->num_dai; i++) {
-
-   if (variant->num_dai > 1)
-   sprintf(clk_name, "mi2s-bit-clk%d", i);
-   else
-   sprintf(clk_name,

Re: [PATCH v1 10/13] ASoC: qcom: Add apq8016 lpass driver support

2015-05-14 Thread Kenneth Westfield

On Wed, May 13, 2015 at 05:03:06AM -0700, Srinivas Kandagatla wrote:
> This patch adds apq8016 lpass driver support. APQ8016 has 4 MI2S which
> can be routed to one internal codec and 2 external codec interfaces.
> 
> Primary, Secondary, Quaternary I2S can do Rx(playback) and Tertiary and
> Quaternary can do Tx(capture).

> diff --git a/sound/soc/qcom/Kconfig b/sound/soc/qcom/Kconfig
> index 865205e..9cc5ed7 100644
> --- a/sound/soc/qcom/Kconfig
> +++ b/sound/soc/qcom/Kconfig
> @@ -20,6 +20,12 @@ config SND_SOC_LPASS_IPQ806X
>   select SND_SOC_LPASS_CPU
>   select SND_SOC_LPASS_PLATFORM
>  
> +config SND_SOC_LPASS_APQ8016
> + tristate
> + depends on SND_SOC_QCOM
> + select SND_SOC_LPASS_CPU
> + select SND_SOC_LPASS_PLATFORM

Continuing from my comments on patch 2/13, should an OF dependency be added
here as well?

> +
>  config SND_SOC_STORM
>   tristate "ASoC I2S support for Storm boards"
>   depends on (ARCH_QCOM && SND_SOC_QCOM) || COMPILE_TEST

> diff --git a/sound/soc/qcom/lpass-apq8016.c
> b/sound/soc/qcom/lpass-apq8016.c
> new file mode 100644
> index 000..5cbf17f0
> --- /dev/null
> +++ b/sound/soc/qcom/lpass-apq8016.c

> +static int apq8016_lpass_free_dma_channel(struct lpass_data *drvdata, int chan)
> +{
> + clear_bit(chan, &drvdata->rdma_ch_bit_map);
> +
> + return 0;
> +}
> +
> +static int apq8016_lpass_init(struct platform_device *pdev)
> +{
> + struct lpass_data *drvdata = platform_get_drvdata(pdev);
> + struct device *dev = &pdev->dev;
> + int ret;
> +
> + drvdata->pcnoc_mport_clk = devm_clk_get(dev, "pcnoc-mport-clk");
> + if (IS_ERR(drvdata->pcnoc_mport_clk)) {
> + dev_err(&pdev->dev, "%s() error getting pcnoc-mport-clk: %ld\n",
> + __func__, PTR_ERR(drvdata->pcnoc_mport_clk));
> + return PTR_ERR(drvdata->pcnoc_mport_clk);
> + }
> +
> + ret = clk_prepare_enable(drvdata->pcnoc_mport_clk);
> + if (ret) {
> + dev_err(&pdev->dev, "%s() Error enabling ahbix_clk: %d\n",

Please correct the clock name in the log message ...

> + __func__, ret);
> + return ret;
> + }
> +
> + drvdata->pcnoc_sway_clk = devm_clk_get(dev, "pcnoc-sway-clk");
> + if (IS_ERR(drvdata->pcnoc_sway_clk)) {
> + dev_err(&pdev->dev, "%s() error getting pcnoc-sway-clk: %ld\n",
> + __func__, PTR_ERR(drvdata->pcnoc_sway_clk));
> + return PTR_ERR(drvdata->pcnoc_sway_clk);
> + }
> +
> + ret = clk_prepare_enable(drvdata->pcnoc_sway_clk);
> + if (ret) {
> + dev_err(&pdev->dev, "%s() Error enabling ahbix_clk: %d\n",

... here too.

> + __func__, ret);
> + return ret;
> + }

-- 
Kenneth Westfield
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project

Re: [PATCH net-next] be2net: be_hwmon_show_temp() can be static

2015-05-14 Thread Fengguang Wu

On Thu, May 14, 2015 at 05:39:46PM -0400, David Miller wrote:
> From: kbuild test robot 
> Date: Fri, 15 May 2015 03:02:35 +0800
> 
> > 
> > Signed-off-by: Fengguang Wu 
> > ---
> >  be_main.c |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
> > b/drivers/net/ethernet/emulex/benet/be_main.c
> > index dc7c0fd..76d491f 100644
> > --- a/drivers/net/ethernet/emulex/benet/be_main.c
> > +++ b/drivers/net/ethernet/emulex/benet/be_main.c
> > @@ -5612,7 +5612,7 @@ static void be_remove(struct pci_dev *pdev)
> > free_netdev(adapter->netdev);
> >  }
> >  
> > -ssize_t be_hwmon_show_temp(struct device *dev,
> > +static ssize_t be_hwmon_show_temp(struct device *dev,
> >struct device_attribute *dev_attr,
> >char *buf)
> 
> If you adjust the column of the opening parenthesis of the
> function, you have to reindent the subsequent lines so that they
> start precisely at the very next column.
> 
> You must use the appropriate number of TAB and SPACE characters
> necessary to do so.

OK. We'll improve the script to adjust indent accordingly.

Thanks,
Fengguang
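
The re-indentation rule quoted above can be sketched as a small helper that
computes how many TAB and SPACE characters a continuation line needs. This is
only an illustration of the rule, not the kbuild robot's script: the function
name is hypothetical, and it assumes 8-column tabs (as in the kernel coding
style) and a first line that itself contains no tabs.

```c
#include <string.h>

#define TAB_WIDTH 8	/* kernel coding style uses 8-column tabs */

/*
 * Compute the number of tabs and spaces needed so that continuation
 * lines start in the column right after the first '(' of first_line.
 * Returns 0 on success, -1 if first_line contains no '('.
 */
static int continuation_indent(const char *first_line, int *tabs, int *spaces)
{
	const char *paren = strchr(first_line, '(');
	int col;

	if (paren == NULL)
		return -1;

	/* 0-based column of the character following the parenthesis. */
	col = (int)(paren - first_line) + 1;
	*tabs = col / TAB_WIDTH;
	*spaces = col % TAB_WIDTH;
	return 0;
}
```

For "ssize_t be_hwmon_show_temp(" this gives 3 tabs and 3 spaces; once
"static " is prepended the arguments have to move to 4 tabs and 2 spaces.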

Re: [PATCH] pinctrl: zynq: add static to platform_driver remove callback

2015-05-14 Thread Sören Brinkmann

On Fri, 2015-05-15 at 12:31PM +0900, Masahiro Yamada wrote:
> This function is only referenced in this file.
> 
> Signed-off-by: Masahiro Yamada 
Reviewed-by: Sören Brinkmann 

Sören

Re: [PATCH v9 00/17] Tegra124 CL-DVFS / DFLL clocksource + cpufreq

2015-05-14 Thread Mikko Perttunen


On 05/15/2015 05:09 AM, Viresh Kumar wrote:

On 15 May 2015 at 01:45, Rafael J. Wysocki  wrote:

You need ACKs from Viresh for those two, then.  He's officially responsible
for ARM cpufreq drivers.


I thought an Ack for 14th is enough :)

For: 12/13/14.
Acked-by: Viresh Kumar 



Thanks! :)

It probably was, but after almost one year of this series, I'm not sure 
about anything anymore ;)


Mikko


Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-14 Thread Ming Lei

On Fri, May 15, 2015 at 8:59 AM, Rafael J. Wysocki  wrote:
> On Fri, May 15, 2015 at 2:40 AM, Ming Lei  wrote:
>> On Fri, May 15, 2015 at 8:34 AM, Rafael J. Wysocki wrote:
>>> On Friday, May 15, 2015 09:54:26 AM Dave Chinner wrote:
 On Thu, May 14, 2015 at 09:22:51AM +1000, NeilBrown wrote:
 > On Mon, 11 May 2015 11:44:28 +1000 Dave Chinner wrote:
 >
 > > On Fri, May 08, 2015 at 03:08:43AM -0400, Len Brown wrote:
 > > > From: Len Brown 
 > > >
 > > > Remove sys_sync() from the kernel's suspend flow.
 > > >
 > > > sys_sync() is extremely expensive in some configurations,
 > > > and so the kernel should not force users to pay this cost
 > > > on every suspend.
 > >
 > > Since when? Please explain what your use case is that makes this
 > > so prohibitively expensive it needs to be removed.
 > >
 > > >
 > > > The user-space utilities s2ram and s2disk choose to invoke sync() today.
 > > > A user can invoke suspend directly via /sys/power/state to skip that cost.
 > >
 > > So, you want to have s2disk write all the dirty pages in memory to
 > > the suspend image, rather than to the filesystem?
 > >
 > > Either way you have to write that dirty data to disk, but if you
 > > write it to the suspend image, it then has to be loaded again on
 > > resume, and then written again to the filesystem the system has
 > > resumed. This doesn't seem very efficient to me
 > >
 > > And, quite frankly, machines fail to resume from suspend all the
 > > time. e.g. run out of batteries when they are under s2ram
 > > conditions, or s2disk fails because a kernel upgrade was done before
 > > the s2disk and so can't be resumed. With your change, users lose all
 > > the data that was buffered in memory before suspend, whereas right
 > > now it is written to disk and so nothing is lost if the resume from
 > > suspend fails for whatever reason.
 > >
 > > IOWs, I can see several good reasons why the sys_sync() needs to
 > > remain in the suspend code. User data safety and filesystem
 > > integrity is far, far more important than a couple of seconds
 > > improvement in suspend speed
 >
 > To be honest, this sounds like superstition and fear, not science and fact.
 >
 > "filesystem integrity" is not an issue for the vast majority of filesystems
 > which use journalling to ensure continued integrity even after a crash.  I
 > think even XFS does that :-)

 It has nothing to do with journalling, and everything to do with
 bringing filesystems to an *idle state* before suspend runs.  We have a
 long history of bug reports with XFS that go: suspend, resume, XFS
 almost immediately detects corruption, shuts down.

 The problem is that "sync" doesn't make the filesystem idle - XFS
 has *lots* of background work going on, and if we aren't *real
 careful* the filesystem is still doing work while the hardware gets
 powered down and the suspend image is being taken. The result is on
 resume that the on-disk filesystem state does not match the memory
 image pulled back from resume, and we get shutdowns.

 sys_sync() does not guarantee a filesystem is idle - it guarantees
 the data in memory is recoverable, but it doesn't stop the filesystem
 from doing things like writing back metadata or running background
 cleanup tasks. If those aren't stopped properly, then we get into
 the state where in-memory and on-disk state get out of whack. And
 s2ram can have these problems too, because if there is IO in flight
 when the hardware is powered down, that IO is lost

 Every time some piece of generic infrastructure changes behaviour
 w.r.t. suspend/resume, we get a new set of problems being reported
 by users. It's extremely hard to test for these problems and it
 might take months of occasional corruption reports from a user to
 isolate it to being a suspend/resume problem.  It's a game of
 whack-a-mole, because quite often they come down to the fact that
 something changed and nobody in the XFS world knew they had to now
 set a different initialisation flag on some structure or workqueue
 to make it work the way it needed to work.

 Go back and look at the history of sys_sync() in suspend discussions
 over the past 10 years.  You'll find me saying exactly the same
 thing again and again about sys_sync(): it does not guarantee the
 filesystem is in an idle or coherent, unchanging state, and nothing
 in the suspend code tells the filesystem to enter an idle or frozen
 state. We actually have mechanisms for doing this - we use it in the
 storage layers to idle the filesystem while we do things like *take
 a snapshot*.

 What is the mechanism suspend to

Re: [PATCH v3 109/110] namei: handle absolute symlinks without dropping out of RCU mode

2015-05-14 Thread Al Viro

On Mon, May 11, 2015 at 07:08:09PM +0100, Al Viro wrote:
> @@ -499,7 +499,7 @@ struct nameidata {
>   struct path root;
>   struct inode*inode; /* path.dentry.d_inode */
>   unsigned intflags;
> - unsignedseq, m_seq;
> + unsignedseq, m_seq, root_seq;
>   int last_type;
>   unsigneddepth;
>   int total_link_count;
> @@ -780,14 +780,14 @@ static __always_inline void set_root(struct nameidata *nd)
>  static __always_inline unsigned set_root_rcu(struct nameidata *nd)
>  {
>   struct fs_struct *fs = current->fs;
> - unsigned seq, res;
> + unsigned seq;
>  
>   do {
>   seq = read_seqcount_begin(&fs->seq);
>   nd->root = fs->root;
> - res = __read_seqcount_begin(&nd->root.dentry->d_seq);
> + nd->root_seq = __read_seqcount_begin(&nd->root.dentry->d_seq);

nd->root_seq is also needed in LOOKUP_ROOT | LOOKUP_RCU case.  Fixed and
folded.

Re: [PATCH v3 105/110] namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_link

2015-05-14 Thread Al Viro

On Tue, May 12, 2015 at 05:10:01AM +0100, Al Viro wrote:

> +static int unlazy_link(struct nameidata *nd, struct path *link, unsigned seq)
> +{
> + if (unlikely(!legitimize_path(nd, link, seq))) {
> + drop_links(nd);
> + rcu_read_unlock();
> + nd->flags &= ~LOOKUP_RCU;
> + nd->path.mnt = NULL;
> + nd->path.dentry = NULL;
> + if (!(nd->flags & LOOKUP_ROOT))
> + nd->root.mnt = NULL;

... and nd->depth should be set to 0, to avoid bogus path_put() on the
stuff in nd->stack[...].link when we get to terminate_walk().  Fixed and
folded.

> + } else if (likely(unlazy_walk(nd, NULL, 0)) == 0) {
> + return 0;
> + }
> + path_put(link);
> + return -ECHILD;
> +}
> +
>  static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
>  {
>   return dentry->d_op->d_revalidate(dentry, flags);
> @@ -1537,20 +1613,6 @@ static inline int handle_dots(struct nameidata *nd, 
> int type)
>   return 0;
>  }
>  
> -static void terminate_walk(struct nameidata *nd)
> -{
> - if (!(nd->flags & LOOKUP_RCU)) {
> - path_put(&nd->path);
> - } else {
> - nd->flags &= ~LOOKUP_RCU;
> - if (!(nd->flags & LOOKUP_ROOT))
> - nd->root.mnt = NULL;
> - rcu_read_unlock();
> - }
> - while (unlikely(nd->depth))
> - put_link(nd);
> -}
> -
>  static int pick_link(struct nameidata *nd, struct path *link,
>struct inode *inode, unsigned seq)
>  {
> @@ -1561,13 +1623,12 @@ static int pick_link(struct nameidata *nd, struct 
> path *link,
>   return -ELOOP;
>   }
>   if (nd->flags & LOOKUP_RCU) {
> - if (unlikely(nd->path.mnt != link->mnt ||
> -  unlazy_walk(nd, link->dentry, seq))) {
> + if (unlikely(unlazy_link(nd, link, seq)))
>   return -ECHILD;
> - }
> + } else {
> + if (link->mnt == nd->path.mnt)
> + mntget(link->mnt);
>   }
> - if (link->mnt == nd->path.mnt)
> - mntget(link->mnt);
>   error = nd_alloc_stack(nd);
>   if (unlikely(error)) {
>   path_put(link);

Re: Crash in crc32_le during kmemleak_scan()

2015-05-14 Thread vigneshr

>
> kmemleak_disable() may be called in an atomic context, so calling
> kthread_stop() here is not safe. We have a scan_should_stop() function
> which checks for the kmemleak_enabled variable but it doesn't seem to be
> enough.
>
> Basically the object_list has some vmalloc'ed objects. Scanning such
> objects is protected by the kmemleak_object.lock and the look-up by the
> kmemleak_lock. What happens during kmemleak_disable() is that we set
> kmemleak_enable to 0 and kmemleak_free() simply exits. When this happens
> during a scan, objects in the object_list are freed/vunmap'ed but
> kmemleak doesn't know about this until the clean-up completes (which, as
> you found, may be blocked on the scanning to complete).
>
> A patch I had but never managed to test it properly (as in reproducing
> the low mem conditions during a scan) postpones the kmemleak disabling
> until after the clean-up is finished. If it works for you, I'll add a
> proper commit message:
>
> -8<-
>
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 5405aff5a590..dcba05812678 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -521,6 +521,10 @@ static struct kmemleak_object *create_object(unsigned
> long ptr, size_t size,
>   struct kmemleak_object *object, *parent;
>   struct rb_node **link, *rb_parent;
>
> + /* stop further allocations while kmemleak is being disabled */
> + if (kmemleak_error)
> + return NULL;
> +
>   object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
>   if (!object) {
>   pr_warning("Cannot allocate a kmemleak_object structure\n");
> @@ -741,6 +745,10 @@ static void add_scan_area(unsigned long ptr, size_t
> size, gfp_t gfp)
>   struct kmemleak_object *object;
>   struct kmemleak_scan_area *area;
>
> + /* stop further allocations while kmemleak is being disabled */
> + if (kmemleak_error)
> + return;
> +
>   object = find_and_get_object(ptr, 1);
>   if (!object) {
>   kmemleak_warn("Adding scan area to unknown object at 0x%08lx\n",
> @@ -1127,7 +1135,7 @@ static bool update_checksum(struct kmemleak_object
> *object)
>   */
>  static int scan_should_stop(void)
>  {
> - if (!kmemleak_enabled)
> + if (kmemleak_error)
>   return 1;
>
>   /*
> @@ -1755,6 +1763,10 @@ static void kmemleak_do_cleanup(struct work_struct
> *work)
>   pr_info("Kmemleak disabled without freeing internal data. "
>   "Reclaim the memory with \"echo clear >
> /sys/kernel/debug/kmemleak\"\n");
>   mutex_unlock(&scan_mutex);
> +
> + /* stop any memory operation tracing */
> + kmemleak_enabled = 0;
> +
>  }
>
>  static DECLARE_WORK(cleanup_work, kmemleak_do_cleanup);
> @@ -1769,12 +1781,11 @@ static void kmemleak_disable(void)
>   if (cmpxchg(&kmemleak_error, 0, 1))
>   return;
>
> - /* stop any memory operation tracing */
> - kmemleak_enabled = 0;
> -
>   /* check whether it is too early for a kernel thread */
>   if (kmemleak_initialized)
>   schedule_work(&cleanup_work);
> + else
> + kmemleak_enabled = 0;
>
>   pr_info("Kernel memory leak detector disabled\n");
>  }
>
> --
> Catalin
> --

Thank you for the explanation and the patch. I will give this a shot and
will get back with the results.

--
Thanks and regards,
Vignesh Radhakrishnan


QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of the Code Aurora Forum, hosted by The Linux Foundation.



RE: [Linux-nvdimm] [PATCH v2 18/20] libnd: infrastructure for btt devices

2015-05-14 Thread Elliott, Robert (Server Storage)

> -Original Message-
> From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf
> Of Dan Williams
> Sent: Thursday, May 14, 2015 7:42 PM
> To: Kani, Toshimitsu
> Cc: Neil Brown; Greg KH; linux-kernel@vger.kernel.org; linux-
> nvd...@lists.01.org
> Subject: Re: [Linux-nvdimm] [PATCH v2 18/20] libnd: infrastructure for
> btt devices
> 
...
> So we can fix this to be at least as stable as the backing device
> names [1], but as far as I can see we would need to start using the
> backing device name in the btt device name.  A strawman proposal is to
> append 's' to indicated 'sectored'.  So /dev/pmem0s is the btt
> instance fronting /dev/pmem0.  Other examples:
> 
> /dev/pmem0p1s
> /dev/ndblk0.0s
> /dev/ndblk0.0p1s
> ...
> 
> Thoughts?
> 
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000636.html

I like that; it also hints to the user that another driver has already
claimed /dev/pmem0, similar to how the presence of /dev/sda1, /dev/sda2,
etc. hints that a program has partitioned /dev/sda.
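
As a rough illustration of the strawman naming, assuming the rule is
simply "backing device name plus a trailing 's'" (the helper names
btt_name and backing_name are hypothetical and not part of libnd):

```python
def btt_name(backing):
    """Return the proposed btt device node for a backing device node."""
    return backing + "s"


def backing_name(btt):
    """Invert the mapping; reject names without the 's' suffix."""
    if not btt.endswith("s"):
        raise ValueError("not a sectored (btt) device name: " + btt)
    return btt[:-1]


# Backing devices taken from the examples quoted above.
examples = ["/dev/pmem0", "/dev/pmem0p1", "/dev/ndblk0.0", "/dev/ndblk0.0p1"]
names = [btt_name(dev) for dev in examples]
```

Because the mapping is a pure suffix, the backing device can always be
recovered from the btt name, which is what makes the name as stable as
the backing device name.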



[PATCH v2 2/2] arch/x86: remove pci uart early console from early_printk.c

2015-05-14 Thread Bin Gao

The arch-independent uart8250 early console driver has good
support for memory-mapped and I/O-port-based 8250 UARTs. Since
PCI is also arch independent, it is natural to extend uart8250 to
support mem, io and pci. Hence the PCI UART early console added to
arch/x86/kernel/early_printk.c by
'commit 5140fda16051 ("Specify PCI based UART for earlyprintk")'
is removed. Its equivalent functionality will be available from the
uart8250 early console driver.

Signed-off-by: Bin Gao 
---
 arch/x86/kernel/early_printk.c | 180 -
 1 file changed, 15 insertions(+), 165 deletions(-)

diff --git a/arch/x86/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
index 89427d8..00c2e2a 100644
--- a/arch/x86/kernel/early_printk.c
+++ b/arch/x86/kernel/early_printk.c
@@ -19,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 
 /* Simple VGA output */
 #define VGABASE (__ISA_IO_base + 0xb8000)
@@ -77,7 +76,7 @@ static struct console early_vga_console = {
 
 /* Serial functions loosely based on a similar package from Klaus P. Gerlicher 
*/
 
-static unsigned long early_serial_base = 0x3f8;  /* ttyS0 */
+static int early_serial_base = 0x3f8;  /* ttyS0 */
 
 #define XMTRDY  0x20
 
@@ -95,26 +94,13 @@ static unsigned long early_serial_base = 0x3f8;  /* ttyS0 */
 #define DLL 0   /*  Divisor Latch Low */
 #define DLH 1   /*  Divisor latch High*/
 
-static unsigned int io_serial_in(unsigned long addr, int offset)
-{
-   return inb(addr + offset);
-}
-
-static void io_serial_out(unsigned long addr, int offset, int value)
-{
-   outb(value, addr + offset);
-}
-
-static unsigned int (*serial_in)(unsigned long addr, int offset) = 
io_serial_in;
-static void (*serial_out)(unsigned long addr, int offset, int value) = 
io_serial_out;
-
 static int early_serial_putc(unsigned char ch)
 {
unsigned timeout = 0xffff;
 
-   while ((serial_in(early_serial_base, LSR) & XMTRDY) == 0 && --timeout)
+   while ((inb(early_serial_base + LSR) & XMTRDY) == 0 && --timeout)
cpu_relax();
-   serial_out(early_serial_base, TXR, ch);
+   outb(ch, early_serial_base + TXR);
return timeout ? 0 : -1;
 }
 
@@ -128,28 +114,13 @@ static void early_serial_write(struct console *con, const 
char *s, unsigned n)
}
 }
 
-static __init void early_serial_hw_init(unsigned divisor)
-{
-   unsigned char c;
-
-   serial_out(early_serial_base, LCR, 0x3);/* 8n1 */
-   serial_out(early_serial_base, IER, 0);  /* no interrupt */
-   serial_out(early_serial_base, FCR, 0);  /* no fifo */
-   serial_out(early_serial_base, MCR, 0x3);/* DTR + RTS */
-
-   c = serial_in(early_serial_base, LCR);
-   serial_out(early_serial_base, LCR, c | DLAB);
-   serial_out(early_serial_base, DLL, divisor & 0xff);
-   serial_out(early_serial_base, DLH, (divisor >> 8) & 0xff);
-   serial_out(early_serial_base, LCR, c & ~DLAB);
-}
-
 #define DEFAULT_BAUD 9600
 
 static __init void early_serial_init(char *s)
 {
+   unsigned char c;
unsigned divisor;
-   unsigned long baud = DEFAULT_BAUD;
+   unsigned baud = DEFAULT_BAUD;
char *e;
 
if (*s == ',')
@@ -174,138 +145,23 @@ static __init void early_serial_init(char *s)
s++;
}
 
-   if (*s) {
-   if (kstrtoul(s, 0, &baud) < 0 || baud == 0)
-   baud = DEFAULT_BAUD;
-   }
-
-   /* Convert from baud to divisor value */
-   divisor = 115200 / baud;
-
-   /* These will always be IO based ports */
-   serial_in = io_serial_in;
-   serial_out = io_serial_out;
-
-   /* Set up the HW */
-   early_serial_hw_init(divisor);
-}
-
-#ifdef CONFIG_PCI
-static void mem32_serial_out(unsigned long addr, int offset, int value)
-{
-   u32 *vaddr = (u32 *)addr;
-   /* shift implied by pointer type */
-   writel(value, vaddr + offset);
-}
-
-static unsigned int mem32_serial_in(unsigned long addr, int offset)
-{
-   u32 *vaddr = (u32 *)addr;
-   /* shift implied by pointer type */
-   return readl(vaddr + offset);
-}
-
-/*
- * early_pci_serial_init()
- *
- * This function is invoked when the early_printk param starts with "pciserial"
- * The rest of the param should be ",B:D.F,baud" where B, D & F describe the
- * location of a PCI device that must be a UART device.
- */
-static __init void early_pci_serial_init(char *s)
-{
-   unsigned divisor;
-   unsigned long baud = DEFAULT_BAUD;
-   u8 bus, slot, func;
-   u32 classcode, bar0;
-   u16 cmdreg;
-   char *e;
-
-
-   /*
-* First, part the param to get the BDF values
-*/
-   if (*s == ',')
-   ++s;
-
-   if (*s == 0)
-   return;
-
-   bus = (u8)simple_strtoul(s, &e, 16);
-   s = e;
-   if (*s != ':')
-   return;
-   ++s;
-   slot = (u8)simple_strtoul(s, ,

[PATCH v2 1/2] serial_core: add pci uart early console support

2015-05-14 Thread Bin Gao

On some Intel Atom SoCs, the legacy I/O port UART (0x3F8) is not available.
Instead, an 8250-compatible PCI UART can be used as the early console.
This patch adds PCI support to the 8250 early console driver, uart8250.
For example, to enable the PCI UART (00:21.3) as the early console on these
platforms, append the following to the kernel command line
(assuming a baud rate of 115200):
earlyprintk=uart8250,pci32,0:24.2,115200n8
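
The option layout documented in the patch (pci[32],B:D.F[,options],
with B, D and F in decimal) can be modelled outside the kernel. The
sketch below only illustrates that documented format. It does not
reproduce the patch's parse_bdf() loop, and the function name and
returned dictionary are assumptions made for the example:

```python
import re

# Documented format: pci[32],B:D.F[,options] with B, D and F in decimal.
_PCI_OPT = re.compile(
    r"(pci|pci32),([0-9]{1,3}):([0-9]{1,3})\.([0-9]{1,3})(?:,(.*))?"
)


def parse_pci_option(option):
    """Split a 'pci[32],B:D.F[,options]' string into its fields."""
    m = _PCI_OPT.fullmatch(option)
    if m is None:
        raise ValueError("invalid earlycon pci parameters: %r" % option)
    kind, bus, dev, func, rest = m.groups()
    bus, dev, func = int(bus), int(dev), int(func)
    if max(bus, dev, func) > 255:  # each field is stored in a u8
        raise ValueError("bus/device/function out of range: %r" % option)
    return {
        # pci32: 32-bit wide registers (regshift = 2); pci: 8-bit (regshift = 0)
        "regshift": 2 if kind == "pci32" else 0,
        "bus": bus,
        "dev": dev,
        "func": func,
        "options": rest,  # e.g. "115200n8", left to the earlycon framework
    }


parsed = parse_pci_option("pci32,0:24.2,115200n8")
```

The trailing options are treated as optional here because the comment in
the patch lists "pci32,0:21.3" as a valid example.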

Signed-off-by: Bin Gao 
---
 drivers/tty/serial/earlycon.c|   6 ++
 drivers/tty/serial/serial_core.c | 140 ++-
 2 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/earlycon.c b/drivers/tty/serial/earlycon.c
index 5fdc9f3..586d84b 100644
--- a/drivers/tty/serial/earlycon.c
+++ b/drivers/tty/serial/earlycon.c
@@ -196,7 +196,13 @@ static int __init param_setup_earlycon(char *buf)
}
return err;
 }
+
+/* x86 uses "earlyprintk=xxx", so we keep the compatibility here */
+#ifdef CONFIG_X86
+early_param("earlyprintk", param_setup_earlycon);
+#else
 early_param("earlycon", param_setup_earlycon);
+#endif
 
 int __init of_setup_earlycon(unsigned long addr,
 int (*setup)(struct earlycon_device *, const char 
*))
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 0b7bb12..221143c 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -34,10 +34,16 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 
+/* Only x86 has early pci access APIs */
+#if defined(CONFIG_PCI) && defined(CONFIG_X86)
+#include 
+#endif
+
 /*
  * This is used to lock changes in serial line configuration.
  */
@@ -1808,6 +1814,98 @@ uart_get_console(struct uart_port *ports, int nr, struct 
console *co)
return ports + idx;
 }
 
+#if defined(CONFIG_PCI) && defined(CONFIG_X86)
+static int parse_bdf(char *options, char **endp, char delimiter, u8 *val)
+{
+   char str[4]; /* max 3 chars, plus a NULL terminator */
+   char *p = options;
+   int i = 0;
+
+   while (*p) {
+   if (i >= 4)
+   return -EINVAL;
+
+   if (*p == delimiter) {
+   str[i++] = 0;
+   if (endp)
+   *endp = p + 1;
+   return kstrtou8(str, 10, val); /* decimal, no hex */
+   }
+
+   str[i++] = *p++;
+   }
+
+   return -EINVAL;
+}
+
+/*
+ * The whole pci option from the command line is: pci[32],B:D.F[,options]
+ * Examples:
+ * pci,0:21.3,115200n8
+ * pci32,0:21.3
+ * Here pci32 means 8250 UART registers are 32-bit width(regshift = 2).
+ * pci means 8250 UART registers are 8-bit width(regshift = 0).
+ * B,D and F are bus, device and function, in decimal(not hex).
+ * The additional options(115200n8) would be parsed by the earlycon framework.
+ *
+ * @options: the pci options
+ * @phys: the pointer to return pci mem or io address
+ * return: <0: error
+ *  0: pci mem
+ *  1: pci io
+ */
+static int parse_pci_options(char *options, unsigned long *phys)
+{
+   u8 bus, dev, func;
+   char *endp;
+   u64 bar0;
+   u16 cmd;
+   int pci_io = 0;
+
+   /* We come here with options=B:D.F[,options] */
+   if (parse_bdf(options, &endp, ':', &bus))
+   goto failed;
+
+   if (parse_bdf(endp, &endp, '.', &dev))
+   goto failed;
+
+   if (parse_bdf(endp, &endp, ',', &func))
+   goto failed;
+
+   /*
+* On these platforms class code in pci config is broken,
+* so skip checking it.
+*/
+
+   bar0 = read_pci_config(bus, dev, func, PCI_BASE_ADDRESS_0);
+
+   /* The BAR is IO or Memory? */
+   if ((bar0 & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO)
+   pci_io = 1;
+
+   if ((bar0 & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+   PCI_BASE_ADDRESS_MEM_TYPE_64)
+   bar0 |= (u64)read_pci_config(bus, dev, func,
+   PCI_BASE_ADDRESS_0 + 4) << 32;
+
+   *phys = bar0 & (pci_io ? PCI_BASE_ADDRESS_IO_MASK :
+PCI_BASE_ADDRESS_MEM_MASK);
+
+   /* Enable address decoding */
+   cmd = read_pci_config_16(bus, dev, func, PCI_COMMAND);
+   write_pci_config_16(bus, dev, func, PCI_COMMAND,
+   cmd | (pci_io ? PCI_COMMAND_IO : PCI_COMMAND_MEMORY));
+
+   pr_info("Use 8250 uart at PCI :%02u:%02u.%01u as early console\n",
+   bus, dev, func);
+   return pci_io;
+
+failed:
+   pr_err("Invalid earlycon pci parameters\n");
+   return -EINVAL;
+}
+#endif
+
 /**
  * uart_parse_earlycon - Parse earlycon options
  * @p:   ptr to 2nd field (ie., just beyond ',')
@@ -1816,8 +1914,9 @@ uart_get_console(struct uart_port *ports, int nr, struct 
console *co)
  * @options: ptr for <options> field; NULL if not present (out)
  *
  *

[RFC PATCH 1/1] perf/script: Script to display the ganged exits count on powerpc

2015-05-14 Thread Hemant Kumar

On powerpc, when a thread running in guest context needs to exit to
the hypervisor to service an interrupt (such as an external interrupt
or an hcall), all the threads running in that vcore inside the guest
exit as well. These events can be classified as ganged exits, meaning
that the exits are forced. An exit is not counted as a ganged exit only
if the other vcpus have ceded.

This script post-processes the perf.data file, looking for two
events: kvm_hv:kvmppc_run_core and kvm_hv:kvm_guest_exit. For a
kvm_hv:kvmppc_run_core tracepoint event:

- if it's an 'Entry', it gets the tgid and initializes the gang-exit
  count and the cedes count for that tgid.
- if it's an 'Exit', it gets the runnable thread count and subtracts
  the number of cedes from it, to see how many runnable threads in
  that core had not ceded. If the difference is more than 1 (1 because
  we have to exclude the running thread itself), then it's a ganged
  exit.

For a kvm_hv:kvm_guest_exit event, it checks whether the vcpu ceded. If
it did, the counter for cedes is incremented.
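
The counting rule described above can be tried out without perf. The
following is a simplified Python 3 sketch of that rule, not the script
from the patch: the function names, the dictionary keys and the event
stream are made up for illustration.

```python
# Sketch of the counting rule; names are illustrative, not perf's API.
stats = {}     # tgid -> {"gang_exits": int, "nr_cedes": int}
pid_tgid = {}  # pid of a vcpu thread -> tgid of its VM process


def on_run_core(pid, tgid, n_runnable, is_exit):
    """Handle one kvm_hv:kvmppc_run_core event."""
    if is_exit:
        if tgid in stats:
            # Runnable threads that did not cede were forced out of the guest.
            forced = n_runnable - stats[tgid]["nr_cedes"]
            if forced > 1:  # more than just the thread that caused the exit
                stats[tgid]["gang_exits"] += 1
    else:
        # Entry: start a fresh cede count for this run of the core.
        entry = stats.setdefault(tgid, {"gang_exits": 0, "nr_cedes": 0})
        entry["nr_cedes"] = 0
        pid_tgid.setdefault(pid, tgid)


def on_guest_exit(pid, ceded):
    """Handle one kvm_hv:kvm_guest_exit event."""
    if ceded and pid in pid_tgid:
        stats[pid_tgid[pid]]["nr_cedes"] += 1


# Made-up event stream: one VM (tgid 100), 4 runnable threads on the core.
on_run_core(101, 100, 4, is_exit=False)
on_guest_exit(101, ceded=0)              # nobody ceded
on_run_core(101, 100, 4, is_exit=True)   # 4 - 0 > 1: counted as ganged

on_run_core(101, 100, 4, is_exit=False)
for _ in range(3):
    on_guest_exit(101, ceded=1)          # three cedes are recorded
on_run_core(101, 100, 4, is_exit=True)   # 4 - 3 = 1: not counted
```

Only the first exit is counted as ganged. In the second run of the core,
every thread except the running one had ceded.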

Usage :
 # perf record -e kvm_hv:kvm_guest_exit -e kvm_hv:kvmppc_run_core -a sleep 10
[ perf record: Woken up 96 times to write data ]
[ perf record: Captured and wrote 26.198 MB perf.data (~1144590 samples)]

 # perf script -s gang-exits.py
Ganged exits summary

Ganged exits for process 14000 :535
Ganged exits for process 13988 :  25314
===

Signed-off-by: Hemant Kumar 
---
 tools/perf/scripts/python/gang_exits.py | 65 +
 1 file changed, 65 insertions(+)
 create mode 100644 tools/perf/scripts/python/gang_exits.py

diff --git a/tools/perf/scripts/python/gang_exits.py 
b/tools/perf/scripts/python/gang_exits.py
new file mode 100644
index 0000000..011aa56
--- /dev/null
+++ b/tools/perf/scripts/python/gang_exits.py
@@ -0,0 +1,65 @@
+# gang-exits.py: Count the ganged exits of a VM
+#
+# In case of powerpc, When a thread running inside a guest needs to exit to
+# the hypervisor to serve interrupts like the external interrupt, or the hcall
+# interrupts, etc., all the threads running in that specific vcore
+# inside the guest exit to the host. These events are called as ganged exits.
+# These exits are forced. Only if the vcpus cede, then it/they won't be counted
+# as ganged exit(s).
+#
+# Usage :
+# So, if in powerpc, first we do :
+# perf record -e kvm_hv:kvm_guest_exit -e kvm_hv:kvmppc_run_core -aR sleep 

+# Using the perf.data, we have to do :
+# perf script -s gang-exits
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+'/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+
+usage = "perf script -s gang_exits.py\n";
+
+stats = {}
+pid_tgid = {}
+
+def trace_begin():
+   print "Ganged exits summary"
+
+def trace_end():
+   print_ganged_exits()
+
+def kvm_hv__kvm_guest_exit(event_name, context, common_cpu,
+   common_secs, common_nsecs, common_pid, common_comm,
+   vcpu_id, reason, nip, msr, ceded):
+
+   if common_pid in pid_tgid:
+   if ceded:   # vcpu ceded ?
+   stats[pid_tgid[common_pid]]['nr_cedes'] += ceded
+
+def kvm_hv__kvmppc_run_core(event_name, context, common_cpu,
+   common_secs, common_nsecs, common_pid, common_comm,
+   n_runnable, runner_vcpu, where, tgid):
+
+   if (where): # kvmppc_run_core: Exit
+   if tgid in stats:
+   forced = n_runnable - stats[tgid]['nr_cedes']
+   if (forced > 1):
+   stats[tgid]['gang-exits'] += 1
+   else:   # kvmppc_run_core: Enter, init the counts
+   if tgid in stats:
+   stats[tgid]['nr_cedes'] = 0
+   else:
+   stats[tgid] = {'gang-exits': 0, 'nr_cedes': 0}
+   if common_pid not in pid_tgid:
+   pid_tgid[common_pid] = tgid
+
+def print_ganged_exits():
+   for i in stats.keys():
+   print "\nGanged exits for process %d : %20d" %(i, 
stats[i]['gang-exits'])
+
+   print "==="
-- 
1.9.3


[RFC PATCH 0/1] perf/script: Ganged exits and VM topology

2015-05-14 Thread Hemant Kumar

On powerpc, if a thread running inside a guest needs to exit to the
host to service an interrupt (such as an external interrupt or an
hcall), all the threads running in that vcore inside the guest exit to
the host as well. These events are called ganged exits.

Because of ganged exits, the other threads (if any) doing useful work
are also forced to exit to the host. The number of ganged exits can
therefore serve as a metric for relating the performance of a VM to
its topology.

Here are a couple of examples to correlate this performance metric
with the topology of a VM.

The following setup was used :
Setup 1a :
VM (with 4 vcpus and one core)
ebizzy running on 2 vcpus.
No other load on the other 2 vcpus.
Resultant throughput for ebizzy in this case : 24373 records/sec
Total gang exits : 1174

Setup 1b:
VM (with 4 vcpus and one core)
ebizzy running on 2 vcpus.
Spin loop (while 1) running on the other 2 vcpus.
Resultant throughput for ebizzy in this case : 20373 records/sec
Total gang exits : 1676

Setup 1c:
VM (with 4 vcpus and one core)
ebizzy running on 2 vcpus.
ping -f running on other 2 vcpus.
Resultant throughput for ebizzy in this case : 7841 records/sec
Total gang exits : 871073

As the number of gang exits increased, the performance of ebizzy
dropped.

To verify that ebizzy's performance degrades when other workloads run
on the same core, the same set of loads was run on the host machine
too, with SMT on. In all the following setups, ebizzy was pinned to 2
cpus and, where some other load was running, that load was pinned to
the other cpus of the same core.

Setup 2a:
ebizzy alone.
Resultant throughput for ebizzy in this case : 25099 records/sec

Setup 2b:
ebizzy and a spin loop (while 1) running on other cpus of the same
core.
Resultant throughput for ebizzy in this case : 22818 records/sec

Setup 2c:
ebizzy and ping -f (to another machine in the same subnet).
Resultant throughput for ebizzy in this case : 17982 records/sec

We can see that the performance of ebizzy drops when some load is
running on the other threads of the same core.
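
To put the guest and host numbers side by side, the relative drop in
ebizzy throughput can be computed from the figures quoted in setups 1a
to 1c and 2a to 2c. This is plain arithmetic on those figures, and the
helper name drop_pct is made up for the example:

```python
def drop_pct(baseline, loaded):
    """Throughput lost relative to the baseline, in percent."""
    return 100.0 * (baseline - loaded) / baseline


# ebizzy records/sec from the setups above (baselines: setup 1a and 2a).
guest = {
    "spin loop": drop_pct(24373, 20373),
    "ping -f": drop_pct(24373, 7841),
}
host = {
    "spin loop": drop_pct(25099, 22818),
    "ping -f": drop_pct(25099, 17982),
}

for load in guest:
    print("%-9s guest: %5.1f%%  host: %5.1f%%" % (load, guest[load], host[load]))
```

With these figures the guest loses more than the host under both loads,
and the gap is largest for ping -f, the setup with by far the most gang
exits.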

The "gang_exits" can serve as a parameter to define the topology of a
VM so that the load running on the VM can give us a maximum
throughput.

Here is an example with "redis" benchmark :

A VM running on 1 core and having two threads.
Running redis benchmark on this VM gives this throughput:
SET: 30048.08 requests per second
GET: 31806.62 requests per second
INCR: 247524.75 requests per second
LPUSH: 30284.68 requests per second
LPOP: 34036.76 requests per second
SADD: 168634.06 requests per second
SPOP: 261096.61 requests per second
MSET (10 keys): 11107.41 requests per second

For the entire run of redis :
Total gang_exits = 1192893

To see if we can reduce the number of gang_exits and increase the
throughput of redis benchmark by trying out a different topology and
system configuration, the cores were split into subcores. Each subcore
now has 2 threads (SMT 2 mode).

So, the VM was started again with 2 subcores (with 1 thread each)
in SMT 1 mode. Running redis now gives this throughput :
SET: 36231.88 requests per second
GET: 57438.25 requests per second
INCR: 292397.66 requests per second
LPUSH: 38343.56 requests per second
LPOP: 53792.36 requests per second
SADD: 267379.66 requests per second
SPOP: 247524.75 requests per second
MSET (10 keys): 9922.60 requests per second

Redis throughput improves for most operations (GET, LPOP and SADD by
more than 50%), although SPOP and MSET are slightly lower.
Total gang exits for this case : 0 (because of SMT 1)
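
The per-operation change between the two redis runs can be computed
directly from the numbers listed above. The sketch below is only
arithmetic on those figures, and the variable names are made up:

```python
# Requests per second: 1 core with 2 threads, then 2 subcores in SMT 1.
before = {
    "SET": 30048.08, "GET": 31806.62, "INCR": 247524.75, "LPUSH": 30284.68,
    "LPOP": 34036.76, "SADD": 168634.06, "SPOP": 261096.61, "MSET": 11107.41,
}
after = {
    "SET": 36231.88, "GET": 57438.25, "INCR": 292397.66, "LPUSH": 38343.56,
    "LPOP": 53792.36, "SADD": 267379.66, "SPOP": 247524.75, "MSET": 9922.60,
}

# Relative change in percent; positive means the second run was faster.
change = {op: 100.0 * (after[op] - before[op]) / before[op] for op in before}
regressed = sorted(op for op, pct in change.items() if pct < 0)

for op in before:
    print("%-6s %+6.1f%%" % (op, change[op]))
```

Listing the operations that got slower alongside the ones that improved
gives a more complete picture than a single overall statement.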

The number of vcpus allocated to VM remained the same in both the
cases.

In the host, with the help of gang_exit numbers, we can change the
configuration of the host and the topology of the VM to increase the
throughput of the load (running on a VM).

If there is a single active thread on that core, none of the exits
should be counted in gang_exits.

Do have a look at the patch and let me know your feedback.

Thanks,

---
Hemant Kumar (1):
  perf/script: Python script to display the ganged exits count on powerpc

 tools/perf/scripts/python/gang_exits.py | 65 +
 1 file changed, 65 insertions(+)
 create mode 100644 tools/perf/scripts/python/gang_exits.py

-- 
1.9.3


[PATCH v4 2/9] powerpc/powernv: Add a virtual irqchip for opal events

2015-05-14 Thread Alistair Popple

Whenever an interrupt is received for opal the linux kernel gets a
bitfield indicating certain events that have occurred and need handling
by the various device drivers. Currently this is handled using a
notifier interface where we call every device driver that has
registered to receive opal events.

This approach has several drawbacks. For example each driver has to do
its own checking to see if the event is relevant as well as event
masking. There is also no easy method of recording the number of times
we receive particular events.

This patch solves these issues by exposing opal events via the
standard interrupt APIs by adding a new interrupt chip and
domain. Drivers can then register for the appropriate events using
standard kernel calls such as irq_of_parse_and_map().
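
The idea of turning an event bitfield into per-event interrupts can be
sketched in a few lines of Python. This is a user-space illustration of
the concept only, not the irqchip code in this patch. The function
names, the handler table and the bit numbers are all assumptions:

```python
def pending_hwirqs(events, mask, max_events=64):
    """Return the bit numbers set in both events and mask, lowest first."""
    active = events & mask
    return [bit for bit in range(max_events) if active & (1 << bit)]


def dispatch(events, mask, handlers, counts):
    """Call the handler of every unmasked pending event and count it."""
    for hwirq in pending_hwirqs(events, mask):
        counts[hwirq] = counts.get(hwirq, 0) + 1
        handlers[hwirq](hwirq)


seen = []
handlers = {4: seen.append, 9: seen.append}  # hypothetical registered events
mask = sum(1 << bit for bit in handlers)     # only registered events unmasked
counts = {}

# Bits 0, 4 and 9 are pending; bit 0 has no handler and stays masked.
dispatch((1 << 0) | (1 << 4) | (1 << 9), mask, handlers, counts)
```

Centralising the mask check and the per-event counter in one place is
what removes the need for each driver to filter the bitfield itself.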

Signed-off-by: Alistair Popple 
---
 arch/powerpc/include/asm/opal.h   |   3 +
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/opal-irqchip.c | 253 ++
 arch/powerpc/platforms/powernv/opal.c |  78 ++--
 arch/powerpc/platforms/powernv/powernv.h  |   4 +
 5 files changed, 273 insertions(+), 67 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-irqchip.c

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 518a22e..520dfb2 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -242,6 +242,7 @@ extern void opal_msglog_init(void);
 extern int opal_async_comp_init(void);
 extern int opal_sensor_init(void);
 extern int opal_hmi_handler_init(void);
+extern int opal_event_init(void);
 
 extern int opal_machine_check(struct pt_regs *regs);
 extern bool opal_mce_check_early_recovery(struct pt_regs *regs);
@@ -253,6 +254,8 @@ extern int opal_resync_timebase(void);
 
 extern void opal_lpc_init(void);
 
+extern int opal_event_request(unsigned int opal_event_nr);
+
 struct opal_sg_list *opal_vmalloc_to_sg_list(void *vmalloc_addr,
 unsigned long vmalloc_size);
 void opal_free_sg_list(struct opal_sg_list *sg);
diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 33e44f3..f1d7de2 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,7 +1,7 @@
 obj-y  += setup.o opal-wrappers.o opal.o opal-async.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o 
opal-sensor.o
-obj-y  += opal-msglog.o opal-hmi.o opal-power.o
+obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c 
b/arch/powerpc/platforms/powernv/opal-irqchip.c
new file mode 100644
index 0000000..bd5125d
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -0,0 +1,253 @@
+/*
+ * This file implements an irqchip for OPAL events. Whenever there is
+ * an interrupt that is handled by OPAL we get passed a list of events
+ * that Linux needs to do something about. These basically look like
+ * interrupts to Linux so we implement an irqchip to handle them.
+ *
+ * Copyright Alistair Popple, IBM Corporation 2014.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "powernv.h"
+
+/* Maximum number of events supported by OPAL firmware */
+#define MAX_NUM_EVENTS 64
+
+struct opal_event_irqchip {
+   struct irq_chip irqchip;
+   struct irq_domain *domain;
+   unsigned long mask;
+};
+static struct opal_event_irqchip opal_event_irqchip;
+
+static unsigned int opal_irq_count;
+static unsigned int *opal_irqs;
+
+static void opal_handle_irq_work(struct irq_work *work);
+static __be64 last_outstanding_events;
+static struct irq_work opal_event_irq_work = {
+   .func = opal_handle_irq_work,
+};
+
+static void opal_event_mask(struct irq_data *d)
+{
+   clear_bit(d->hwirq, &opal_event_irqchip.mask);
+}
+
+static void opal_event_unmask(struct irq_data *d)
+{
+   set_bit(d->hwirq, &opal_event_irqchip.mask);
+
+   opal_poll_events(&last_outstanding_events);
+   if (last_outstanding_events & opal_event_irqchip.mask)
+   /* Need to retrigger the interrupt */
+   irq_work_queue(&opal_event_irq_work);
+}
+
+static int opal_event_set_type(struct irq_data *d, unsigned int flow_type)
+{
+   /*
+* For now we only support level triggered events. The irq
+

[PATCH v4 4/9] hvc: Convert to using interrupts instead of opal events

2015-05-14 Thread Alistair Popple

Convert the opal hvc driver to use the new irqchip to register for
opal events. As older firmware versions may not have device tree
bindings for the interrupt parent we just use a hardcoded hwirq based
on the event number.
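
Deriving a hwirq from an event number amounts to taking the bit index
of a single-bit event mask, which is what ilog2() does in the patch
below. A small Python model follows. The value 0x10 used for the
console input event is an assumption for illustration and is not taken
from this patch:

```python
def ilog2(value):
    """Index of the highest set bit, like the kernel's ilog2() for value > 0."""
    if value <= 0:
        raise ValueError("ilog2() needs a positive value")
    return value.bit_length() - 1


EVENT_CONSOLE_INPUT = 0x10  # assumed single-bit event mask (illustrative)
hwirq = ilog2(EVENT_CONSOLE_INPUT)
```

Because each event is a distinct bit, the mapping from event mask to
hwirq is one-to-one and needs no device tree information.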

Signed-off-by: Alistair Popple 
---
 drivers/tty/hvc/hvc_opal.c | 33 ++---
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/drivers/tty/hvc/hvc_opal.c b/drivers/tty/hvc/hvc_opal.c
index 543b234..47b54c6 100644
--- a/drivers/tty/hvc/hvc_opal.c
+++ b/drivers/tty/hvc/hvc_opal.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -61,7 +62,6 @@ static struct hvc_opal_priv 
*hvc_opal_privs[MAX_NR_HVC_CONSOLES];
 /* For early boot console */
 static struct hvc_opal_priv hvc_opal_boot_priv;
 static u32 hvc_opal_boot_termno;
-static bool hvc_opal_event_registered;
 
 static const struct hv_ops hvc_opal_raw_ops = {
.get_chars = opal_get_chars,
@@ -162,28 +162,15 @@ static const struct hv_ops hvc_opal_hvsi_ops = {
.tiocmset = hvc_opal_hvsi_tiocmset,
 };
 
-static int hvc_opal_console_event(struct notifier_block *nb,
- unsigned long events, void *change)
-{
-   if (events & OPAL_EVENT_CONSOLE_INPUT)
-   hvc_kick();
-   return 0;
-}
-
-static struct notifier_block hvc_opal_console_nb = {
-   .notifier_call  = hvc_opal_console_event,
-};
-
 static int hvc_opal_probe(struct platform_device *dev)
 {
const struct hv_ops *ops;
struct hvc_struct *hp;
struct hvc_opal_priv *pv;
hv_protocol_t proto;
-   unsigned int termno, boot = 0;
+   unsigned int termno, irq, boot = 0;
const __be32 *reg;
 
-
if (of_device_is_compatible(dev->dev.of_node, "ibm,opal-console-raw")) {
proto = HV_PROTOCOL_RAW;
ops = &hvc_opal_raw_ops;
@@ -227,18 +214,18 @@ static int hvc_opal_probe(struct platform_device *dev)
dev->dev.of_node->full_name,
boot ? " (boot console)" : "");
 
-   /* We don't do IRQ ... */
-   hp = hvc_alloc(termno, 0, ops, MAX_VIO_PUT_CHARS);
+   irq = opal_event_request(ilog2(OPAL_EVENT_CONSOLE_INPUT));
+   if (!irq) {
+   pr_err("hvc_opal: Unable to map interrupt for device %s\n",
+   dev->dev.of_node->full_name);
+   return irq;
+   }
+
+   hp = hvc_alloc(termno, irq, ops, MAX_VIO_PUT_CHARS);
if (IS_ERR(hp))
return PTR_ERR(hp);
dev_set_drvdata(&dev->dev, hp);
 
-   /* ...  but we use OPAL event to kick the console */
-   if (!hvc_opal_event_registered) {
-   opal_notifier_register(&hvc_opal_console_nb);
-   hvc_opal_event_registered = true;
-   }
-
return 0;
 }
 
-- 
1.8.3.2


[PATCH v4 5/9] powernv/eeh: Update the EEH code to use the opal irq domain

2015-05-14 Thread Alistair Popple

The eeh code currently uses the old notifier method to get eeh events
from OPAL. It also contains some logic to filter opal events which has
been moved into the virtual irqchip. This patch converts the eeh code
to the new event interface which simplifies event handling.

Signed-off-by: Alistair Popple 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 58 +++-
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index ce738ab..ca825ec 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -40,6 +41,7 @@
 #include "pci.h"
 
 static bool pnv_eeh_nb_init = false;
+static int eeh_event_irq = -EINVAL;
 
 /**
  * pnv_eeh_init - EEH platform dependent initialization
@@ -88,34 +90,22 @@ static int pnv_eeh_init(void)
return 0;
 }
 
-static int pnv_eeh_event(struct notifier_block *nb,
-unsigned long events, void *change)
+static irqreturn_t pnv_eeh_event(int irq, void *data)
 {
-   uint64_t changed_evts = (uint64_t)change;
-
/*
-* We simply send special EEH event if EEH has
-* been enabled, or clear pending events in
-* case that we enable EEH soon
+* We simply send a special EEH event if EEH has been
+* enabled. We don't care about EEH events until we've
+* finished processing the outstanding ones. Event processing
+* gets unmasked in next_error() if EEH is enabled.
 */
-   if (!(changed_evts & OPAL_EVENT_PCI_ERROR) ||
-   !(events & OPAL_EVENT_PCI_ERROR))
-   return 0;
+   disable_irq_nosync(irq);
 
if (eeh_enabled())
eeh_send_failure_event(NULL);
-   else
-   opal_notifier_update_evt(OPAL_EVENT_PCI_ERROR, 0x0ul);
 
-   return 0;
+   return IRQ_HANDLED;
 }
 
-static struct notifier_block pnv_eeh_nb = {
-   .notifier_call  = pnv_eeh_event,
-   .next   = NULL,
-   .priority   = 0
-};
-
 #ifdef CONFIG_DEBUG_FS
 static ssize_t pnv_eeh_ei_write(struct file *filp,
const char __user *user_buf,
@@ -237,16 +227,28 @@ static int pnv_eeh_post_init(void)
 
/* Register OPAL event notifier */
if (!pnv_eeh_nb_init) {
-   ret = opal_notifier_register(&pnv_eeh_nb);
-   if (ret) {
-   pr_warn("%s: Can't register OPAL event notifier (%d)\n",
-   __func__, ret);
+   eeh_event_irq = opal_event_request(ilog2(OPAL_EVENT_PCI_ERROR));
+   if (eeh_event_irq < 0) {
+   pr_err("%s: Can't register OPAL event interrupt (%d)\n",
+  __func__, eeh_event_irq);
+   return eeh_event_irq;
+   }
+
+   ret = request_irq(eeh_event_irq, pnv_eeh_event,
+   IRQ_TYPE_LEVEL_HIGH, "opal-eeh", NULL);
+   if (ret < 0) {
+   irq_dispose_mapping(eeh_event_irq);
+   pr_err("%s: Can't request OPAL event interrupt (%d)\n",
+  __func__, eeh_event_irq);
return ret;
}
 
pnv_eeh_nb_init = true;
}
 
+   if (!eeh_enabled())
+   disable_irq(eeh_event_irq);
+
list_for_each_entry(hose, &hose_list, list_node) {
phb = hose->private_data;
 
@@ -1303,12 +1305,10 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
int state, ret = EEH_NEXT_ERR_NONE;
 
/*
-* While running here, it's safe to purge the event queue.
-* And we should keep the cached OPAL notifier event sychronized
-* between the kernel and firmware.
+* While running here, it's safe to purge the event queue. The
+* event should still be masked.
 */
eeh_remove_event(NULL, false);
-   opal_notifier_update_evt(OPAL_EVENT_PCI_ERROR, 0x0ul);
 
list_for_each_entry(hose, &hose_list, list_node) {
/*
@@ -1477,6 +1477,10 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
break;
}
 
+   /* Unmask the event */
+   if (eeh_enabled())
+   enable_irq(eeh_event_irq);
+
return ret;
 }
 
-- 
1.8.3.2


[PATCH v4 3/9] ipmi/powernv: Convert to irq event interface

2015-05-14 Thread Alistair Popple

Convert the opal ipmi driver to use the new irq interface for events.

Signed-off-by: Alistair Popple 
Acked-by: Corey Minyard 
Cc: Corey Minyard 
Cc: openipmi-develo...@lists.sourceforge.net
---
 drivers/char/ipmi/ipmi_powernv.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_powernv.c b/drivers/char/ipmi/ipmi_powernv.c
index 8753b0f..9b409c0 100644
--- a/drivers/char/ipmi/ipmi_powernv.c
+++ b/drivers/char/ipmi/ipmi_powernv.c
@@ -15,6 +15,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 
@@ -23,8 +25,7 @@ struct ipmi_smi_powernv {
u64 interface_id;
struct ipmi_device_id   ipmi_id;
ipmi_smi_t  intf;
-   u64 event;
-   struct notifier_block   event_nb;
+   unsigned int irq;
 
/**
 * We assume that there can only be one outstanding request, so
@@ -197,15 +198,12 @@ static struct ipmi_smi_handlers ipmi_powernv_smi_handlers 
= {
.poll   = ipmi_powernv_poll,
 };
 
-static int ipmi_opal_event(struct notifier_block *nb,
- unsigned long events, void *change)
+static irqreturn_t ipmi_opal_event(int irq, void *data)
 {
-   struct ipmi_smi_powernv *smi = container_of(nb,
-   struct ipmi_smi_powernv, event_nb);
+   struct ipmi_smi_powernv *smi = data;
 
-   if (events & smi->event)
-   ipmi_powernv_recv(smi);
-   return 0;
+   ipmi_powernv_recv(smi);
+   return IRQ_HANDLED;
 }
 
 static int ipmi_powernv_probe(struct platform_device *pdev)
@@ -240,13 +238,16 @@ static int ipmi_powernv_probe(struct platform_device 
*pdev)
goto err_free;
}
 
-   ipmi->event = 1ull << prop;
-   ipmi->event_nb.notifier_call = ipmi_opal_event;
+   ipmi->irq = irq_of_parse_and_map(dev->of_node, 0);
+   if (!ipmi->irq) {
+   dev_info(dev, "Unable to map irq from device tree\n");
+   ipmi->irq = opal_event_request(prop);
+   }
 
-   rc = opal_notifier_register(&ipmi->event_nb);
-   if (rc) {
-   dev_warn(dev, "OPAL notifier registration failed (%d)\n", rc);
-   goto err_free;
+   if (request_irq(ipmi->irq, ipmi_opal_event, IRQ_TYPE_LEVEL_HIGH,
+   "opal-ipmi", ipmi)) {
+   dev_warn(dev, "Unable to request irq\n");
+   goto err_dispose;
}
 
ipmi->opal_msg = devm_kmalloc(dev,
@@ -271,7 +272,9 @@ static int ipmi_powernv_probe(struct platform_device *pdev)
 err_free_msg:
devm_kfree(dev, ipmi->opal_msg);
 err_unregister:
-   opal_notifier_unregister(&ipmi->event_nb);
+   free_irq(ipmi->irq, ipmi);
+err_dispose:
+   irq_dispose_mapping(ipmi->irq);
 err_free:
devm_kfree(dev, ipmi);
return rc;
@@ -282,7 +285,9 @@ static int ipmi_powernv_remove(struct platform_device *pdev)
struct ipmi_smi_powernv *smi = dev_get_drvdata(&pdev->dev);
 
ipmi_unregister_smi(smi->intf);
-   opal_notifier_unregister(&smi->event_nb);
+   free_irq(smi->irq, smi);
+   irq_dispose_mapping(smi->irq);
+
return 0;
 }
 
-- 
1.8.3.2


[PATCH v4 7/9] powernv/elog: Convert elog to opal irq domain

2015-05-14 Thread Alistair Popple

This patch converts the elog code to use the opal irq domain instead
of notifier events.

Signed-off-by: Alistair Popple 
---
 arch/powerpc/platforms/powernv/opal-elog.c | 32 +++---
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-elog.c 
b/arch/powerpc/platforms/powernv/opal-elog.c
index 38ce757..4949ef0 100644
--- a/arch/powerpc/platforms/powernv/opal-elog.c
+++ b/arch/powerpc/platforms/powernv/opal-elog.c
@@ -10,6 +10,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -276,24 +277,15 @@ static void elog_work_fn(struct work_struct *work)
 
 static DECLARE_WORK(elog_work, elog_work_fn);
 
-static int elog_event(struct notifier_block *nb,
-   unsigned long events, void *change)
+static irqreturn_t elog_event(int irq, void *data)
 {
-   /* check for error log event */
-   if (events & OPAL_EVENT_ERROR_LOG_AVAIL)
-   schedule_work(&elog_work);
-   return 0;
+   schedule_work(&elog_work);
+   return IRQ_HANDLED;
 }
 
-static struct notifier_block elog_nb = {
-   .notifier_call  = elog_event,
-   .next   = NULL,
-   .priority   = 0
-};
-
 int __init opal_elog_init(void)
 {
-   int rc = 0;
+   int rc = 0, irq;
 
/* ELOG not supported by firmware */
if (!opal_check_token(OPAL_ELOG_READ))
@@ -305,10 +297,18 @@ int __init opal_elog_init(void)
return -1;
}
 
-   rc = opal_notifier_register(&elog_nb);
+   irq = opal_event_request(ilog2(OPAL_EVENT_ERROR_LOG_AVAIL));
+   if (!irq) {
+   pr_err("%s: Can't register OPAL event irq (%d)\n",
+  __func__, irq);
+   return irq;
+   }
+
+   rc = request_irq(irq, elog_event,
+   IRQ_TYPE_LEVEL_HIGH, "opal-elog", NULL);
if (rc) {
-   pr_err("%s: Can't register OPAL event notifier (%d)\n",
-   __func__, rc);
+   pr_err("%s: Can't request OPAL event irq (%d)\n",
+  __func__, rc);
return rc;
}
 
-- 
1.8.3.2


[PATCH v4 9/9] opal: Remove events notifier

2015-05-14 Thread Alistair Popple

All users of the old opal events notifier have been converted over to
the irq domain so remove the event notifier functions.

Signed-off-by: Alistair Popple 
---
 arch/powerpc/platforms/powernv/opal-irqchip.c | 16 ++---
 arch/powerpc/platforms/powernv/opal.c | 84 +--
 arch/powerpc/platforms/powernv/powernv.h  |  1 -
 arch/powerpc/platforms/powernv/setup.c|  2 +-
 4 files changed, 8 insertions(+), 95 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c 
b/arch/powerpc/platforms/powernv/opal-irqchip.c
index bd5125d..841135f 100644
--- a/arch/powerpc/platforms/powernv/opal-irqchip.c
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -100,7 +100,6 @@ void opal_handle_events(uint64_t events)
 {
int virq, hwirq = 0;
u64 mask = opal_event_irqchip.mask;
-   u64 notifier_mask = 0;
 
if (!in_irq() && (events & mask)) {
last_outstanding_events = events;
@@ -108,19 +107,16 @@ void opal_handle_events(uint64_t events)
return;
}
 
-   while (events) {
+   while (events & mask) {
hwirq = fls64(events) - 1;
-   virq = irq_find_mapping(opal_event_irqchip.domain,
-   hwirq);
-   if (virq) {
-   if (BIT_ULL(hwirq) & mask)
+   if (BIT_ULL(hwirq) & mask) {
+   virq = irq_find_mapping(opal_event_irqchip.domain,
+   hwirq);
+   if (virq)
generic_handle_irq(virq);
-   } else
-   notifier_mask |= BIT_ULL(hwirq);
+   }
events &= ~BIT_ULL(hwirq);
}
-
-   opal_do_notifier(notifier_mask);
 }
 
 static irqreturn_t opal_interrupt(int irq, void *data)
diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index cd5718b..8403307 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -53,11 +53,7 @@ static int mc_recoverable_range_len;
 
 struct device_node *opal_node;
 static DEFINE_SPINLOCK(opal_write_lock);
-static ATOMIC_NOTIFIER_HEAD(opal_notifier_head);
 static struct atomic_notifier_head opal_msg_notifier_head[OPAL_MSG_TYPE_MAX];
-static DEFINE_SPINLOCK(opal_notifier_lock);
-static uint64_t last_notified_mask = 0x0ul;
-static atomic_t opal_notifier_hold = ATOMIC_INIT(0);
 static uint32_t opal_heartbeat;
 
 static void opal_reinit_cores(void)
@@ -223,82 +219,6 @@ static int __init opal_register_exception_handlers(void)
 }
 machine_early_initcall(powernv, opal_register_exception_handlers);
 
-int opal_notifier_register(struct notifier_block *nb)
-{
-   if (!nb) {
-   pr_warning("%s: Invalid argument (%p)\n",
-  __func__, nb);
-   return -EINVAL;
-   }
-
-   atomic_notifier_chain_register(&opal_notifier_head, nb);
-   return 0;
-}
-EXPORT_SYMBOL_GPL(opal_notifier_register);
-
-int opal_notifier_unregister(struct notifier_block *nb)
-{
-   if (!nb) {
-   pr_warning("%s: Invalid argument (%p)\n",
-  __func__, nb);
-   return -EINVAL;
-   }
-
-   atomic_notifier_chain_unregister(&opal_notifier_head, nb);
-   return 0;
-}
-EXPORT_SYMBOL_GPL(opal_notifier_unregister);
-
-void opal_do_notifier(uint64_t events)
-{
-   unsigned long flags;
-   uint64_t changed_mask;
-
-   if (atomic_read(&opal_notifier_hold))
-   return;
-
-   spin_lock_irqsave(&opal_notifier_lock, flags);
-   changed_mask = last_notified_mask ^ events;
-   last_notified_mask = events;
-   spin_unlock_irqrestore(&opal_notifier_lock, flags);
-
-   /*
-* We feed with the event bits and changed bits for
-* enough information to the callback.
-*/
-   atomic_notifier_call_chain(&opal_notifier_head,
-  events, (void *)changed_mask);
-}
-
-void opal_notifier_update_evt(uint64_t evt_mask,
- uint64_t evt_val)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(&opal_notifier_lock, flags);
-   last_notified_mask &= ~evt_mask;
-   last_notified_mask |= evt_val;
-   spin_unlock_irqrestore(&opal_notifier_lock, flags);
-}
-
-void opal_notifier_enable(void)
-{
-   int64_t rc;
-   __be64 evt = 0;
-
-   atomic_set(&opal_notifier_hold, 0);
-
-   /* Process pending events */
-   rc = opal_poll_events(&evt);
-   if (rc == OPAL_SUCCESS && evt)
-   opal_do_notifier(be64_to_cpu(evt));
-}
-
-void opal_notifier_disable(void)
-{
-   atomic_set(&opal_notifier_hold, 1);
-}
-
 /*
  * Opal message notifier based on message type. Allow subscribers to get
  * notified for specific messgae type.
@@ -570,10 +490,8 @@ int opal_handle_hmi_exception(struct pt_regs *regs)
 
local_paca->hmi_event_available = 0;
rc =

[PATCH v4 6/9] powernv/opal: Convert opal message events to opal irq domain

2015-05-14 Thread Alistair Popple

This patch converts the opal message event to use the new opal irq
domain.

Signed-off-by: Alistair Popple 
---
 arch/powerpc/platforms/powernv/opal.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 3baca71..cd5718b 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -362,33 +362,34 @@ static void opal_handle_message(void)
opal_message_do_notify(type, (void *)&msg);
 }
 
-static int opal_message_notify(struct notifier_block *nb,
- unsigned long events, void *change)
+static irqreturn_t opal_message_notify(int irq, void *data)
 {
-   if (events & OPAL_EVENT_MSG_PENDING)
-   opal_handle_message();
-   return 0;
+   opal_handle_message();
+   return IRQ_HANDLED;
 }
 
-static struct notifier_block opal_message_nb = {
-   .notifier_call  = opal_message_notify,
-   .next   = NULL,
-   .priority   = 0,
-};
-
 static int __init opal_message_init(void)
 {
-   int ret, i;
+   int ret, i, irq;
 
for (i = 0; i < OPAL_MSG_TYPE_MAX; i++)
ATOMIC_INIT_NOTIFIER_HEAD(&opal_msg_notifier_head[i]);
 
-   ret = opal_notifier_register(&opal_message_nb);
+   irq = opal_event_request(ilog2(OPAL_EVENT_MSG_PENDING));
+   if (!irq) {
+   pr_err("%s: Can't register OPAL event irq (%d)\n",
+  __func__, irq);
+   return irq;
+   }
+
+   ret = request_irq(irq, opal_message_notify,
+   IRQ_TYPE_LEVEL_HIGH, "opal-msg", NULL);
if (ret) {
-   pr_err("%s: Can't register OPAL event notifier (%d)\n",
+   pr_err("%s: Can't request OPAL event irq (%d)\n",
   __func__, ret);
return ret;
}
+
return 0;
 }
 
-- 
1.8.3.2


[PATCH v4 8/9] powernv/opal-dump: Convert to irq domain

2015-05-14 Thread Alistair Popple

Convert the opal dump driver to the new opal irq domain.

Signed-off-by: Alistair Popple 
---
 arch/powerpc/platforms/powernv/opal-dump.c | 56 +-
 1 file changed, 17 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-dump.c 
b/arch/powerpc/platforms/powernv/opal-dump.c
index 5aa9c1c..2ee9643 100644
--- a/arch/powerpc/platforms/powernv/opal-dump.c
+++ b/arch/powerpc/platforms/powernv/opal-dump.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -60,7 +61,7 @@ static ssize_t dump_type_show(struct dump_obj *dump_obj,
  struct dump_attribute *attr,
  char *buf)
 {
-   
+
return sprintf(buf, "0x%x %s\n", dump_obj->type,
   dump_type_to_string(dump_obj->type));
 }
@@ -363,7 +364,7 @@ static struct dump_obj *create_dump_obj(uint32_t id, size_t 
size,
return dump;
 }
 
-static int process_dump(void)
+static irqreturn_t process_dump(int irq, void *data)
 {
int rc;
uint32_t dump_id, dump_size, dump_type;
@@ -387,45 +388,13 @@ static int process_dump(void)
if (!dump)
return -1;
 
-   return 0;
-}
-
-static void dump_work_fn(struct work_struct *work)
-{
-   process_dump();
+   return IRQ_HANDLED;
 }
 
-static DECLARE_WORK(dump_work, dump_work_fn);
-
-static void schedule_process_dump(void)
-{
-   schedule_work(&dump_work);
-}
-
-/*
- * New dump available notification
- *
- * Once we get notification, we add sysfs entries for it.
- * We only fetch the dump on demand, and create sysfs asynchronously.
- */
-static int dump_event(struct notifier_block *nb,
- unsigned long events, void *change)
-{
-   if (events & OPAL_EVENT_DUMP_AVAIL)
-   schedule_process_dump();
-
-   return 0;
-}
-
-static struct notifier_block dump_nb = {
-   .notifier_call  = dump_event,
-   .next   = NULL,
-   .priority   = 0
-};
-
 void __init opal_platform_dump_init(void)
 {
int rc;
+   int dump_irq;
 
/* ELOG not supported by firmware */
if (!opal_check_token(OPAL_DUMP_READ))
@@ -445,10 +414,19 @@ void __init opal_platform_dump_init(void)
return;
}
 
-   rc = opal_notifier_register(&dump_nb);
+   dump_irq = opal_event_request(ilog2(OPAL_EVENT_DUMP_AVAIL));
+   if (!dump_irq) {
+   pr_err("%s: Can't register OPAL event irq (%d)\n",
+  __func__, dump_irq);
+   return;
+   }
+
+   rc = request_threaded_irq(dump_irq, NULL, process_dump,
+   IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
+   "opal-dump", NULL);
if (rc) {
-   pr_warn("%s: Can't register OPAL event notifier (%d)\n",
-   __func__, rc);
+   pr_err("%s: Can't request OPAL event irq (%d)\n",
+  __func__, rc);
return;
}
 
-- 
1.8.3.2


[PATCH v4 1/9] powerpc/powernv: Reorder OPAL subsystem initialisation

2015-05-14 Thread Alistair Popple

Most of the OPAL subsystems are always compiled in for PowerNV and
many of them need to be initialised before or after other OPAL
subsystems. Rather than trying to control this ordering through
machine initcalls it is clearer and easier to control initialisation
order with explicit calls in opal_init.

Signed-off-by: Alistair Popple 
Cc: Mahesh Jagannath Salgaonkar 
---
 arch/powerpc/include/asm/opal.h |  3 +++
 arch/powerpc/platforms/powernv/opal-async.c |  3 +--
 arch/powerpc/platforms/powernv/opal-hmi.c   |  3 +--
 arch/powerpc/platforms/powernv/opal-memory-errors.c |  2 +-
 arch/powerpc/platforms/powernv/opal-sensor.c|  3 +--
 arch/powerpc/platforms/powernv/opal.c   | 13 -
 6 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..518a22e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -239,6 +239,9 @@ extern int opal_elog_init(void);
 extern void opal_platform_dump_init(void);
 extern void opal_sys_param_init(void);
 extern void opal_msglog_init(void);
+extern int opal_async_comp_init(void);
+extern int opal_sensor_init(void);
+extern int opal_hmi_handler_init(void);
 
 extern int opal_machine_check(struct pt_regs *regs);
 extern bool opal_mce_check_early_recovery(struct pt_regs *regs);
diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
b/arch/powerpc/platforms/powernv/opal-async.c
index 693b6cd..bdc8c0c 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -151,7 +151,7 @@ static struct notifier_block opal_async_comp_nb = {
.priority   = 0,
 };
 
-static int __init opal_async_comp_init(void)
+int __init opal_async_comp_init(void)
 {
struct device_node *opal_node;
const __be32 *async;
@@ -205,4 +205,3 @@ out_opal_node:
 out:
return err;
 }
-machine_subsys_initcall(powernv, opal_async_comp_init);
diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c 
b/arch/powerpc/platforms/powernv/opal-hmi.c
index b322bfb..a8f49d3 100644
--- a/arch/powerpc/platforms/powernv/opal-hmi.c
+++ b/arch/powerpc/platforms/powernv/opal-hmi.c
@@ -170,7 +170,7 @@ static struct notifier_block opal_hmi_handler_nb = {
.priority   = 0,
 };
 
-static int __init opal_hmi_handler_init(void)
+int __init opal_hmi_handler_init(void)
 {
int ret;
 
@@ -186,4 +186,3 @@ static int __init opal_hmi_handler_init(void)
}
return 0;
 }
-machine_subsys_initcall(powernv, opal_hmi_handler_init);
diff --git a/arch/powerpc/platforms/powernv/opal-memory-errors.c 
b/arch/powerpc/platforms/powernv/opal-memory-errors.c
index 43db213..00a2943 100644
--- a/arch/powerpc/platforms/powernv/opal-memory-errors.c
+++ b/arch/powerpc/platforms/powernv/opal-memory-errors.c
@@ -144,4 +144,4 @@ static int __init opal_mem_err_init(void)
}
return 0;
 }
-machine_subsys_initcall(powernv, opal_mem_err_init);
+machine_device_initcall(powernv, opal_mem_err_init);
diff --git a/arch/powerpc/platforms/powernv/opal-sensor.c 
b/arch/powerpc/platforms/powernv/opal-sensor.c
index 6552504..a06059d 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor.c
@@ -77,7 +77,7 @@ out:
 }
 EXPORT_SYMBOL_GPL(opal_get_sensor_data);
 
-static __init int opal_sensor_init(void)
+int __init opal_sensor_init(void)
 {
struct platform_device *pdev;
struct device_node *sensor;
@@ -93,4 +93,3 @@ static __init int opal_sensor_init(void)
 
return PTR_ERR_OR_ZERO(pdev);
 }
-machine_subsys_initcall(powernv, opal_sensor_init);
diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 2241565..eb3decc 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -393,7 +393,6 @@ static int __init opal_message_init(void)
}
return 0;
 }
-machine_early_initcall(powernv, opal_message_init);
 
 int opal_get_chars(uint32_t vtermno, char *buf, int count)
 {
@@ -807,6 +806,18 @@ static int __init opal_init(void)
of_node_put(consoles);
}
 
+   /* Initialise OPAL messaging system */
+   opal_message_init();
+
+   /* Initialise OPAL asynchronous completion interface */
+   opal_async_comp_init();
+
+   /* Initialise OPAL sensor interface */
+   opal_sensor_init();
+
+   /* Initialise OPAL hypervisor maintainence interrupt handling */
+   opal_hmi_handler_init();
+
/* Create i2c platform devices */
opal_i2c_create_devs();
 
-- 
1.8.3.2


[PATCH v4 0/9] Convert OPAL notifier events to an irqchip

2015-05-14 Thread Alistair Popple

Whenever an interrupt is received for OPAL, the Linux kernel gets a
bitfield indicating which events have occurred and need handling by the
various device drivers. Currently this is handled using a notifier
interface where we call every device driver that has registered to
receive OPAL events.

This approach has several drawbacks. For example, each driver has to
check for itself whether an event is relevant and do its own event
masking. There is also no easy way to record how many times a
particular event has been received.

This series solves these issues by exposing opal events via the
standard interrupt APIs by adding a new interrupt chip and
domain. Drivers can then register for the appropriate events using
standard kernel calls such as irq_of_parse_and_map().
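
As a rough illustration of that model (not the code from this series),
here is a userspace C sketch in which every bit of the event bitfield is
treated as a hardware irq number with its own handler, mask state and
delivery counter. All names in it (event_request, event_set_masked,
event_dispatch, event_count, count_hits) are hypothetical stand-ins and
not the kernel's irq API.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_EVENTS 64

typedef void (*event_handler_t)(int hwirq, void *data);

/* One slot per event bit: who handles it, whether it is masked, how often it fired. */
struct event_slot {
	event_handler_t handler;	/* NULL when nobody requested this bit */
	void *data;
	unsigned long count;
	int masked;
};

static struct event_slot slots[NR_EVENTS];

/* Hypothetical analogue of request_irq(): claim one event bit. */
static int event_request(int hwirq, event_handler_t handler, void *data)
{
	if (hwirq < 0 || hwirq >= NR_EVENTS || !handler)
		return -1;
	if (slots[hwirq].handler)
		return -1;	/* already claimed */
	slots[hwirq].handler = handler;
	slots[hwirq].data = data;
	slots[hwirq].count = 0;
	slots[hwirq].masked = 0;
	return 0;
}

/* Hypothetical analogue of disable_irq()/enable_irq(). */
static void event_set_masked(int hwirq, int masked)
{
	if (hwirq >= 0 && hwirq < NR_EVENTS)
		slots[hwirq].masked = masked;
}

/* Deliver every set, claimed and unmasked bit; returns the number delivered. */
static int event_dispatch(uint64_t events)
{
	int delivered = 0;

	for (int hwirq = 0; hwirq < NR_EVENTS; hwirq++) {
		struct event_slot *s = &slots[hwirq];

		if (!(events & (UINT64_C(1) << hwirq)))
			continue;
		if (!s->handler || s->masked)
			continue;
		s->handler(hwirq, s->data);
		s->count++;
		delivered++;
	}
	return delivered;
}

static unsigned long event_count(int hwirq)
{
	return (hwirq >= 0 && hwirq < NR_EVENTS) ? slots[hwirq].count : 0;
}

/* Example handler: bumps the int that data points to. */
static void count_hits(int hwirq, void *data)
{
	(void)hwirq;
	(*(int *)data)++;
}
```

The point of the sketch is that relevance checking, masking and counting
live in the dispatcher, so a handler such as count_hits() never has to
look at the bitfield itself.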

Patches 2-8 of this series are the same as for v3.

Changes from v3:
 - Changed initialisation sequence to solve the following errors on
   some machines:

   irq: XICS didn't like hwirq-0xb to VIRQ17 mapping (rc=-22)
   opal: opal_message_init: Can't register OPAL event irq (0)

 - Added a poller to the OPAL heartbeat to support machines where OPAL
   can't generate an interrupt for the host (eg. mambo)

Changes from v2:
 - Addressed comments by Neelesh Gupta
 - Fixed soft-lockup bug reported by Neelesh in the opal-dump driver
 - Rebased on v4.1-rc1

Alistair Popple (9):
  powerpc/powernv: Reorder OPAL subsystem initialisation
  powerpc/powernv: Add a virtual irqchip for opal events
  ipmi/powernv: Convert to irq event interface
  hvc: Convert to using interrupts instead of opal events
  powernv/eeh: Update the EEH code to use the opal irq domain
  powernv/opal: Convert opal message events to opal irq domain
  powernv/elog: Convert elog to opal irq domain
  powernv/opal-dump: Convert to irq domain
  opal: Remove events notifier

 arch/powerpc/include/asm/opal.h|   6 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c   |  58 ++---
 arch/powerpc/platforms/powernv/opal-async.c|   3 +-
 arch/powerpc/platforms/powernv/opal-dump.c |  56 ++---
 arch/powerpc/platforms/powernv/opal-elog.c |  32 +--
 arch/powerpc/platforms/powernv/opal-hmi.c  |   3 +-
 arch/powerpc/platforms/powernv/opal-irqchip.c  | 249 +
 .../powerpc/platforms/powernv/opal-memory-errors.c |   2 +-
 arch/powerpc/platforms/powernv/opal-sensor.c   |   3 +-
 arch/powerpc/platforms/powernv/opal.c  | 196 +++-
 arch/powerpc/platforms/powernv/powernv.h   |   3 +
 arch/powerpc/platforms/powernv/setup.c |   2 +-
 drivers/char/ipmi/ipmi_powernv.c   |  39 ++--
 drivers/tty/hvc/hvc_opal.c |  33 +--
 15 files changed, 396 insertions(+), 291 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-irqchip.c

--
1.8.3.2


[BUG] kernel panic after bpf program removed.

2015-05-14 Thread Wangnan (F)


Hi Alexei Starovoitov and others,

I triggered a kernel panic while developing my 'perf bpf' facility. The
call stack is listed at the bottom of this mail.

I attached two bpf programs to 'kmem_cache_free%return' and
'__alloc_pages_nodemask'. The programs are very simple.
The panic is raised after closing the bpf program and the perf event
file. It looks like the panic is caused by a race between closing the
perf event fd and closing the bpf program fd. I'm unable to reproduce
this problem with similar operations.

The following is the exact instruction that causes the panic.

8111cf70 :

void bpf_prog_put(struct bpf_prog *prog)
{
8111cf70:   e8 fb a1 49 00  callq 815b7170 
<__fentry__>

8111cf75:   55  push   %rbp
8111cf76:   48 89 e5mov%rsp,%rbp
8111cf79:   53  push   %rbx
8111cf7a:   48 89 fbmov%rdi,%rbx
8111cf7d:   48 83 ec 08 sub$0x8,%rsp
8111cf81:   48 8b 47 10 mov 0x10(%rdi),%rax 
<-- *panic at this instruction*

8111cf85:   f0 ff 08lock decl (%rax)
8111cf88:   74 0e   je 8111cf98 


if (atomic_dec_and_test(&prog->aux->refcnt)) {
free_used_maps(prog->aux);
bpf_prog_free(prog);
}
}
8111cf8a:   48 83 c4 08 add$0x8,%rsp
8111cf8e:   5b  pop%rbx
8111cf8f:   5d  pop%rbp
8111cf90:   c3  retq

Thank you.

--- KERNEL PANIC ---

[  261.839750] BUG: unable to handle kernel NULL pointer dereference at 
06d0

[  261.839750] IP: [] bpf_prog_put+0x11/0x50
[  261.839750] PGD 7f7d0067 PUD 7f74d067 PMD 0
[  261.839750] Oops:  [#1] SMP
[  261.839750] Modules linked in:
[  261.839750] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.0.0+ #11
[  261.839750] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 
04/01/2014
[  261.839750] task: 81a114a0 ti: 81a0 task.ti: 
81a0
[  261.839750] RIP: 0010:[] [] 
bpf_prog_put+0x11/0x50

[  261.839750] RSP: 0018:88007ea03e68  EFLAGS: 0292
[  261.839750] RAX: 880076e35d20 RBX: 06c0 RCX: 
81123d60
[  261.839750] RDX: 0001000d000b RSI:  RDI: 
06c0
[  261.839750] RBP: 88007ea03e78 R08: 88007f10c3c0 R09: 
88007ea189c0
[  261.839750] R10: 88007aa68290 R11: 88007ea0800d R12: 
88007643a000
[  261.839750] R13: 000a R14: 0125 R15: 
88007ea16540
[  261.839750] FS:  () GS:88007ea0() 
knlGS:

[  261.839750] CS:  0010 DS:  ES:  CR0: 8005003b
[  261.839750] CR2: 06d0 CR3: 78aa5000 CR4: 
06f0
[  261.839750] DR0:  DR1:  DR2: 

[  261.839750] DR3:  DR6:  DR7: 


[  261.839750] Stack:
[  261.839750]  88007ea03e78 88007643a320 88007ea03e98 
81123dac
[  261.839750]  81a38380 88007f7de000 88007ea03f08 
810a2d0b
[  261.839750]  81ced238 88007b911508 88007ea16570 
81a114a0

[  261.839750] Call Trace:
[  261.839750]  
[  261.839750]  [] free_event_rcu+0x4c/0x60
[  261.839750]  [] rcu_process_callbacks+0x25b/0x5a0
[  261.839750]  [] __do_softirq+0xed/0x280
[  261.839750]  [] irq_exit+0x4d/0x60
[  261.839750]  [] smp_apic_timer_interrupt+0x4a/0x60
[  261.839750]  [] apic_timer_interrupt+0x6b/0x70
[  261.839750]  
[  261.839750]  [] ? default_idle+0x20/0xb0
[  261.839750]  [] arch_cpu_idle+0xf/0x20
[  261.839750]  [] cpu_startup_entry+0x2f7/0x400
[  261.839750]  [] rest_init+0x77/0x80
[  261.839750]  [] start_kernel+0x423/0x430
[  261.839750]  [] ? set_init_arg+0x56/0x56
[  261.839750]  [] x86_64_start_reservations+0x2a/0x2c
[  261.839750]  [] x86_64_start_kernel+0xec/0xf0
[  261.839750] Code: 24 72 e7 49 8b 7d 00 e8 8e ce 05 00 48 83 c4 08 5b 
41 5c 41 5d 5d c3 0f 1f 00 66 66 66 66 90 55 48 89 e5 53 48 89 fb 48 83 
ec 08 <48> 8b 47 10 3e ff 08 74 0e 48 83 c4 08 5b 5d c3 0f 1f 80 00 00

[  261.839750] RIP  [] bpf_prog_put+0x11/0x50
[  261.839750]  RSP 
[  261.839750] CR2: 06d0
[  261.839750] ---[ end trace dddf4ec721745b49 ]---
[  261.839750] Kernel panic - not syncing: Fatal exception in interrupt
[  261.839750] Kernel Offset: disabled
[  261.839750] ---[ end Kernel panic - not syncing: Fatal exception in 
interrupt



[PATCH 5/6] Refactor code to use new fw_event refcount

2015-05-14 Thread Calvin Owens

This refactors the fw_event code to use the new refcount.
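
For readers without the earlier patches at hand, the get/put pattern the
diff below relies on can be sketched in userspace C roughly as follows.
This is only an illustration under the assumption that allocation hands
back one reference and that whoever queues the object takes another; the
names (struct event, event_alloc, event_get, event_put, events_freed) are
hypothetical and are not the driver's helpers.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct event {
	atomic_int refcount;
	int code;
};

/* Only here so the sketch can show when the object is really freed. */
static int events_freed;

/* The allocation returns an object that already holds one reference. */
static struct event *event_alloc(int code)
{
	struct event *ev = calloc(1, sizeof(*ev));

	if (!ev)
		return NULL;
	atomic_init(&ev->refcount, 1);
	ev->code = code;
	return ev;
}

/* Take an extra reference, e.g. on behalf of a queue. */
static void event_get(struct event *ev)
{
	atomic_fetch_add(&ev->refcount, 1);
}

/* Drop a reference; the last one out frees the object. */
static void event_put(struct event *ev)
{
	if (atomic_fetch_sub(&ev->refcount, 1) == 1) {
		events_freed++;
		free(ev);
	}
}
```

With this convention the creator can call event_put() right after
handing the object to the queue, and the object stays alive until the
queue's own reference is dropped.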

Signed-off-by: Calvin Owens 
---
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c 
b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 611b34d..8d8c814 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -2863,6 +2863,7 @@ _scsih_fw_event_add(struct MPT2SAS_ADAPTER *ioc, struct 
fw_event_work *fw_event)
return;
 
spin_lock_irqsave(&ioc->fw_event_lock, flags);
+   fw_event_work_get(fw_event);
list_add_tail(&fw_event->list, &ioc->fw_event_list);
INIT_DELAYED_WORK(&fw_event->delayed_work, _firmware_event_work);
queue_delayed_work(ioc->firmware_event_thread,
@@ -2887,12 +2888,13 @@ _scsih_fw_event_free(struct MPT2SAS_ADAPTER *ioc, 
struct fw_event_work
unsigned long flags;
 
spin_lock_irqsave(&ioc->fw_event_lock, flags);
-   list_del(&fw_event->list);
-   kfree(fw_event);
+   if (!list_empty(&fw_event->list))
+   list_del_init(&fw_event->list);
+
+   fw_event_work_put(fw_event);
spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
 }
 
-
 /**
  * _scsih_error_recovery_delete_devices - remove devices not responding
  * @ioc: per adapter object
@@ -2907,13 +2909,14 @@ _scsih_error_recovery_delete_devices(struct 
MPT2SAS_ADAPTER *ioc)
if (ioc->is_driver_loading)
return;
 
-   fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+   fw_event = alloc_fw_event_work(0);
if (!fw_event)
return;
 
fw_event->event = MPT2SAS_REMOVE_UNRESPONDING_DEVICES;
fw_event->ioc = ioc;
_scsih_fw_event_add(ioc, fw_event);
+   fw_event_work_put(fw_event);
 }
 
 /**
@@ -2927,12 +2930,13 @@ mpt2sas_port_enable_complete(struct MPT2SAS_ADAPTER 
*ioc)
 {
struct fw_event_work *fw_event;
 
-   fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+   fw_event = alloc_fw_event_work(0);
if (!fw_event)
return;
fw_event->event = MPT2SAS_PORT_ENABLE_COMPLETE;
fw_event->ioc = ioc;
_scsih_fw_event_add(ioc, fw_event);
+   fw_event_work_put(fw_event);
 }
 
 /**
@@ -4439,13 +4443,14 @@ _scsih_send_event_to_turn_on_pfa_led(struct 
MPT2SAS_ADAPTER *ioc, u16 handle)
 {
struct fw_event_work *fw_event;
 
-   fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+   fw_event = alloc_fw_event_work(0);
if (!fw_event)
return;
fw_event->event = MPT2SAS_TURN_ON_PFA_LED;
fw_event->device_handle = handle;
fw_event->ioc = ioc;
_scsih_fw_event_add(ioc, fw_event);
+   fw_event_work_put(fw_event);
 }
 
 /**
@@ -7740,7 +7745,7 @@ mpt2sas_scsih_event_callback(struct MPT2SAS_ADAPTER *ioc, 
u8 msix_index,
}
 
sz = le16_to_cpu(mpi_reply->EventDataLength) * 4;
-   fw_event = kzalloc(sizeof(*fw_event) + sz, GFP_ATOMIC);
+   fw_event = alloc_fw_event_work(sz);
if (!fw_event) {
printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
ioc->name, __FILE__, __LINE__, __func__);
@@ -7753,6 +7758,7 @@ mpt2sas_scsih_event_callback(struct MPT2SAS_ADAPTER *ioc, 
u8 msix_index,
fw_event->VP_ID = mpi_reply->VP_ID;
fw_event->event = event;
_scsih_fw_event_add(ioc, fw_event);
+   fw_event_work_put(fw_event);
return;
 }
 
-- 
1.8.1


[PATCH 6/6] Fix unsafe fw_event_list usage

2015-05-14 Thread Calvin Owens

Since the fw_event deletes itself from the list, cleanup_queue() can
walk onto garbage pointers or walk off into freed memory.

This refactors the code in _scsih_fw_event_cleanup_queue() to not
iterate over the fw_event_list without a lock. 
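
The general shape of that change, sketched in userspace C with a pthread
mutex standing in for the spinlock: instead of walking the list unlocked,
pop one entry at a time under the lock and do any blocking work after the
lock is released. The types and functions here (struct work_item, struct
work_queue, queue_push, queue_pop, queue_drain) are hypothetical and only
illustrate the pattern.

```c
#include <pthread.h>
#include <stddef.h>

struct work_item {
	struct work_item *next;
	int id;
};

struct work_queue {
	pthread_mutex_t lock;
	struct work_item *head;
	struct work_item *tail;
};

static void queue_init(struct work_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = NULL;
	q->tail = NULL;
}

static void queue_push(struct work_queue *q, struct work_item *item)
{
	pthread_mutex_lock(&q->lock);
	item->next = NULL;
	if (q->tail)
		q->tail->next = item;
	else
		q->head = item;
	q->tail = item;
	pthread_mutex_unlock(&q->lock);
}

/* Unlink and return the first item, or NULL; the list is only touched under the lock. */
static struct work_item *queue_pop(struct work_queue *q)
{
	struct work_item *item;

	pthread_mutex_lock(&q->lock);
	item = q->head;
	if (item) {
		q->head = item->next;
		if (!q->head)
			q->tail = NULL;
		item->next = NULL;
	}
	pthread_mutex_unlock(&q->lock);
	return item;
}

/* Drain the queue; returns how many items were removed. */
static int queue_drain(struct work_queue *q)
{
	struct work_item *item;
	int n = 0;

	while ((item = queue_pop(q)) != NULL) {
		/* The lock is not held here, so blocking on the item would be safe. */
		n++;
	}
	return n;
}
```

Because each pass re-reads the list head under the lock, items removed
concurrently by someone else are simply never seen, rather than being
followed through a stale next pointer.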

Signed-off-by: Calvin Owens 
---
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c 
b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 8d8c814..f504e28 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -2939,6 +2939,23 @@ mpt2sas_port_enable_complete(struct MPT2SAS_ADAPTER *ioc)
fw_event_work_put(fw_event);
 }
 
+static struct fw_event_work *dequeue_next_fw_event(struct MPT2SAS_ADAPTER *ioc)
+{
+	unsigned long flags;
+	struct fw_event_work *fw_event = NULL;
+
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	if (!list_empty(&ioc->fw_event_list)) {
+		fw_event = list_first_entry(&ioc->fw_event_list,
+				struct fw_event_work, list);
+		list_del_init(&fw_event->list);
+		fw_event_work_get(fw_event);
+	}
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+
+	return fw_event;
+}
+
 /**
  * _scsih_fw_event_cleanup_queue - cleanup event queue
  * @ioc: per adapter object
@@ -2951,17 +2968,18 @@ mpt2sas_port_enable_complete(struct MPT2SAS_ADAPTER 
*ioc)
 static void
 _scsih_fw_event_cleanup_queue(struct MPT2SAS_ADAPTER *ioc)
 {
-	struct fw_event_work *fw_event, *next;
+	struct fw_event_work *fw_event;
 
 	if (list_empty(&ioc->fw_event_list) ||
 	     !ioc->firmware_event_thread || in_interrupt())
 		return;
 
-	list_for_each_entry_safe(fw_event, next, &ioc->fw_event_list, list) {
+	while ((fw_event = dequeue_next_fw_event(ioc))) {
 		if (cancel_delayed_work_sync(&fw_event->delayed_work)) {
 			_scsih_fw_event_free(ioc, fw_event);
 			continue;
 		}
+		fw_event_work_put(fw_event);
 	}
 }
 
-- 
1.8.1
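
The idea behind dequeue_next_fw_event() is that the cleanup path never walks the shared list itself: it unlinks one entry at a time under the lock and works on it after the lock is released. A minimal userspace sketch of that loop is below. It assumes a toy spinlock built on C11 atomic_flag in place of fw_event_lock and leaves out the reference counting; all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct event {
	struct event *next;
	int id;
	int handled;
};

struct queue {
	atomic_flag lock;	/* toy spinlock standing in for fw_event_lock */
	struct event *head;
};

static void queue_init(struct queue *q)
{
	atomic_flag_clear(&q->lock);
	q->head = NULL;
}

static void queue_lock(struct queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin until the flag was clear */
}

static void queue_unlock(struct queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void enqueue(struct queue *q, struct event *ev)
{
	queue_lock(q);
	ev->next = q->head;
	q->head = ev;
	queue_unlock(q);
}

/* Unlink the first event while holding the lock; NULL when empty. */
static struct event *dequeue_next(struct queue *q)
{
	struct event *ev;

	queue_lock(q);
	ev = q->head;
	if (ev) {
		q->head = ev->next;
		ev->next = NULL;
	}
	queue_unlock(q);
	return ev;
}

/* Drain the queue; each event is processed outside the lock. */
static int drain(struct queue *q)
{
	struct event *ev;
	int n = 0;

	while ((ev = dequeue_next(q)) != NULL) {
		ev->handled = 1;
		n++;
	}
	return n;
}
```

Because every iteration starts again from the list head under the lock, an entry that removes itself concurrently cannot leave the loop holding a stale "next" pointer.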


[PATCH 3/6] Fix unsafe sas_device_list usage

2015-05-14 Thread Calvin Owens

We cannot iterate over the list without holding a lock for the entire
duration, or we risk corrupting random memory if items are added or
deleted as we iterate.

This refactors code such that it always holds the lock when iterating
on or accessing the sas_device_list.

Signed-off-by: Calvin Owens 
---
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 83 +++-
 1 file changed, 62 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c 
b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index ad6ceb7e..9645055 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -7104,6 +7104,7 @@ _scsih_remove_unresponding_sas_devices(struct 
MPT2SAS_ADAPTER *ioc)
struct _raid_device *raid_device, *raid_device_next;
struct list_head tmp_list;
unsigned long flags;
+   LIST_HEAD(head);
 
printk(MPT2SAS_INFO_FMT "removing unresponding devices: start\n",
ioc->name);
@@ -7111,14 +7112,29 @@ _scsih_remove_unresponding_sas_devices(struct 
MPT2SAS_ADAPTER *ioc)
/* removing unresponding end devices */
printk(MPT2SAS_INFO_FMT "removing unresponding devices: end-devices\n",
ioc->name);
+
+	/*
+	 * Iterate, pulling off devices marked as non-responding. We become the
+	 * owner for the reference the list had on any object we prune.
+	 */
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	list_for_each_entry_safe(sas_device, sas_device_next,
-	    &ioc->sas_device_list, list) {
+			&ioc->sas_device_list, list) {
 		if (!sas_device->responding)
-			mpt2sas_device_remove_by_sas_address(ioc,
-			    sas_device->sas_address);
+			list_move_tail(&sas_device->list, &head);
 		else
 			sas_device->responding = 0;
 	}
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	/*
+	 * Now, uninitialize and remove the unresponding devices we pruned.
+	 */
+	list_for_each_entry_safe(sas_device, sas_device_next, &head, list) {
+		_scsih_remove_device(ioc, sas_device);
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
 
/* removing unresponding volumes */
if (ioc->ir_firmware) {
@@ -8055,6 +8071,37 @@ _scsih_probe_raid(struct MPT2SAS_ADAPTER *ioc)
}
 }
 
+static struct _sas_device *dequeue_next_sas_device(struct MPT2SAS_ADAPTER *ioc)
+{
+	struct _sas_device *sas_device = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	if (!list_empty(&ioc->sas_device_init_list)) {
+		sas_device = list_first_entry(&ioc->sas_device_init_list,
+				struct _sas_device, list);
+		list_del_init(&sas_device->list);
+	}
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	/*
+	 * If an item was dequeued, the caller now owns the reference that was
+	 * previously owned by the list
+	 */
+	return sas_device;
+}
+
+static void sas_device_make_active(struct MPT2SAS_ADAPTER *ioc,
+		struct _sas_device *sas_device)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device_get(sas_device);
+	list_add_tail(&sas_device->list, &ioc->sas_device_list);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+}
+
 /**
  * _scsih_probe_sas - reporting sas devices to sas transport
  * @ioc: per adapter object
@@ -8064,34 +8111,28 @@ _scsih_probe_raid(struct MPT2SAS_ADAPTER *ioc)
 static void
 _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
 {
-	struct _sas_device *sas_device, *next;
-	unsigned long flags;
-
-	/* SAS Device List */
-	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
-	    list) {
+	struct _sas_device *sas_device;
 
-		if (ioc->hide_drives)
-			continue;
+	if (ioc->hide_drives)
+		return;
 
+	while ((sas_device = dequeue_next_sas_device(ioc))) {
 		if (!mpt2sas_transport_port_add(ioc, sas_device->handle,
-		    sas_device->sas_address_parent)) {
-			list_del(&sas_device->list);
-			kfree(sas_device);
+				sas_device->sas_address_parent)) {
+			sas_device_put(sas_device);
 			continue;
 		} else if (!sas_device->starget) {
 			if (!ioc->is_driver_loading) {
 				mpt2sas_transport_port_remove(ioc,
-				    sas_device->sas_address,
-				    sas_device->sas_address_parent);
-				list_del(&sas_device->list);
-				kfree(sas_device);
+
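
The first hunk of this patch works in two phases: under the lock, non-responding devices are moved onto a private list; after the lock is dropped, that private list is torn down without racing against other threads. A minimal userspace sketch of the same shape is below. It assumes a singly linked list and an atomic_flag spinlock; the names are hypothetical and "removed = 1" stands in for the real device teardown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct dev {
	struct dev *next;
	int responding;
	int removed;
};

struct adapter {
	atomic_flag lock;	/* stands in for sas_device_lock */
	struct dev *devices;
};

static void adapter_lock(struct adapter *a)
{
	while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
		;
}

static void adapter_unlock(struct adapter *a)
{
	atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/* Returns the number of devices that were pruned and removed. */
static int remove_unresponding(struct adapter *a)
{
	struct dev *pruned = NULL, **link, *d;
	int n = 0;

	/* Phase 1 (locked): unlink non-responding devices onto a private list. */
	adapter_lock(a);
	link = &a->devices;
	while ((d = *link) != NULL) {
		if (!d->responding) {
			*link = d->next;	/* unlink from the shared list */
			d->next = pruned;	/* push onto the private list */
			pruned = d;
		} else {
			d->responding = 0;
			link = &d->next;
		}
	}
	adapter_unlock(a);

	/* Phase 2 (unlocked): nobody else can reach the private list. */
	while ((d = pruned) != NULL) {
		pruned = d->next;
		d->next = NULL;
		d->removed = 1;		/* stand-in for the device teardown */
		n++;
	}
	return n;
}
```

Keeping the teardown outside the lock matters when that work can block, since a spinlock must not be held across it.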

[PATCH 1/6] Add refcount to sas_device struct

2015-05-14 Thread Calvin Owens

These objects can be referenced concurrently throughout the driver, so we
need a way to make sure threads can't delete them out from under each
other.

Signed-off-by: Calvin Owens 
---
 drivers/scsi/mpt2sas/mpt2sas_base.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h 
b/drivers/scsi/mpt2sas/mpt2sas_base.h
index caff8d1..2e7dc33 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.h
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
@@ -376,8 +376,24 @@ struct _sas_device {
u8  phy;
u8  responding;
u8  pfa_led_on;
+   struct kref refcount;
 };
 
+static inline void sas_device_get(struct _sas_device *s)
+{
+	kref_get(&s->refcount);
+}
+
+static inline void sas_device_free(struct kref *r)
+{
+	kfree(container_of(r, struct _sas_device, refcount));
+}
+
+static inline void sas_device_put(struct _sas_device *s)
+{
+	kref_put(&s->refcount, sas_device_free);
+}
+
 /**
  * struct _raid_device - raid volume link list
  * @list: sas device list
-- 
1.8.1
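
For readers unfamiliar with kref, the get/put helpers added here behave roughly like the userspace sketch below. It is an analogue built on C11 atomics, not the kernel implementation; device_alloc, device_get and device_put are hypothetical names, and devices_freed exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct device {
	atomic_int refcount;
	int handle;
};

static int devices_freed;	/* instrumentation for this sketch only */

static struct device *device_alloc(int handle)
{
	struct device *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	atomic_init(&d->refcount, 1);	/* creator holds the first reference */
	d->handle = handle;
	return d;
}

static void device_get(struct device *d)
{
	int old = atomic_fetch_add(&d->refcount, 1);

	assert(old > 0);	/* taking a reference on a dead object is a bug */
	(void)old;
}

static void device_put(struct device *d)
{
	int old = atomic_fetch_sub(&d->refcount, 1);

	assert(old > 0);
	if (old == 1) {		/* that was the last reference */
		devices_freed++;
		free(d);
	}
}
```

The object is freed only by whichever thread drops the last reference, so no thread can delete it while another still holds a reference.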


[PATCH 0/6] Fixes for memory corruption in mpt2sas

2015-05-14 Thread Calvin Owens

Hello all,

This patchset attempts to address problems we've been having with
panics due to memory corruption from the mpt2sas driver.

I will provide a similar set of fixes for mpt3sas, since we see
similar issues there as well. "Porting" this to mpt3sas will be
trivial since the part of the driver I'm touching is nearly identical
between the two, so I thought it would be simpler to review a patch
against mpt2sas alone at first.

I've tested this for a few days on a big storage box that seemed to be
very susceptible to the panics, and so far it seems to have eliminated
them.

Thanks,
Calvin


Total diffstat:

 drivers/scsi/mpt2sas/mpt2sas_base.h  |  20 +-
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 482 +--
 drivers/scsi/mpt2sas/mpt2sas_transport.c |  12 +-
 3 files changed, 359 insertions(+), 155 deletions(-)

Patches:

* [PATCH 1/6] Add refcount to sas_device struct
* [PATCH 2/6] Refactor code to use new sas_device refcount
* [PATCH 3/6] Fix unsafe sas_device_list usage
* [PATCH 4/6] Add refcount to fw_event_work struct
* [PATCH 5/6] Refactor code to use new fw_event refcount
* [PATCH 6/6] Fix unsafe fw_event_list usage

[PATCH 4/6] Add refcount to fw_event_work struct

2015-05-14 Thread Calvin Owens

The fw_event_work struct is concurrently referenced at shutdown, so
add a refcount to protect it.

Signed-off-by: Calvin Owens 
---
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c 
b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 9645055..611b34d 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -176,9 +176,37 @@ struct fw_event_work {
 	u8			VP_ID;
 	u8			ignore;
 	u16			event;
+	struct kref		refcount;
 	char			event_data[0] __aligned(4);
 };
 
+static void fw_event_work_free(struct kref *r)
+{
+	kfree(container_of(r, struct fw_event_work, refcount));
+}
+
+static void fw_event_work_get(struct fw_event_work *fw_work)
+{
+	kref_get(&fw_work->refcount);
+}
+
+static void fw_event_work_put(struct fw_event_work *fw_work)
+{
+	kref_put(&fw_work->refcount, fw_event_work_free);
+}
+
+static struct fw_event_work *alloc_fw_event_work(int len)
+{
+	struct fw_event_work *fw_event;
+
+	fw_event = kzalloc(sizeof(*fw_event) + len, GFP_ATOMIC);
+	if (!fw_event)
+		return NULL;
+
+	kref_init(&fw_event->refcount);
+	return fw_event;
+}
+
 /* raid transport support */
 static struct raid_template *mpt2sas_raid_template;
 
-- 
1.8.1


[PATCH 2/6] Refactor code to use new sas_device refcount

2015-05-14 Thread Calvin Owens

This patch refactors the code in the driver to use the new reference
count on the sas_device struct.

Signed-off-by: Calvin Owens 
---
 drivers/scsi/mpt2sas/mpt2sas_base.h  |   4 +-
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 329 ---
 drivers/scsi/mpt2sas/mpt2sas_transport.c |  12 +-
 3 files changed, 220 insertions(+), 125 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h 
b/drivers/scsi/mpt2sas/mpt2sas_base.h
index 2e7dc33..dac0e8a 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.h
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
@@ -,7 +,9 @@ struct _sas_node 
*mpt2sas_scsih_expander_find_by_handle(struct MPT2SAS_ADAPTER *
 u16 handle);
 struct _sas_node *mpt2sas_scsih_expander_find_by_sas_address(struct 
MPT2SAS_ADAPTER
 *ioc, u64 sas_address);
-struct _sas_device *mpt2sas_scsih_sas_device_find_by_sas_address(
+struct _sas_device *mpt2sas_scsih_sas_device_get_by_sas_address(
+struct MPT2SAS_ADAPTER *ioc, u64 sas_address);
+struct _sas_device *mpt2sas_scsih_sas_device_get_by_sas_address_nolock(
 struct MPT2SAS_ADAPTER *ioc, u64 sas_address);
 
 void mpt2sas_port_enable_complete(struct MPT2SAS_ADAPTER *ioc);
diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c 
b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 3f26147..ad6ceb7e 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -526,8 +526,31 @@ _scsih_determine_boot_device(struct MPT2SAS_ADAPTER *ioc,
}
 }
 
+struct _sas_device *
+mpt2sas_scsih_sas_device_get_by_sas_address_nolock(struct MPT2SAS_ADAPTER *ioc,
+    u64 sas_address)
+{
+	struct _sas_device *sas_device;
+
+	BUG_ON(!spin_is_locked(&ioc->sas_device_lock));
+
+	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
+		if (sas_device->sas_address == sas_address)
+			goto found_device;
+
+	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
+		if (sas_device->sas_address == sas_address)
+			goto found_device;
+
+	return NULL;
+
+found_device:
+	sas_device_get(sas_device);
+	return sas_device;
+}
+
 /**
- * mpt2sas_scsih_sas_device_find_by_sas_address - sas device search
+ * mpt2sas_scsih_sas_device_get_by_sas_address - sas device search
  * @ioc: per adapter object
  * @sas_address: sas address
  * Context: Calling function should acquire ioc->sas_device_lock
@@ -536,24 +559,44 @@ _scsih_determine_boot_device(struct MPT2SAS_ADAPTER *ioc,
  * object.
  */
 struct _sas_device *
-mpt2sas_scsih_sas_device_find_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
+mpt2sas_scsih_sas_device_get_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
 u64 sas_address)
 {
 	struct _sas_device *sas_device;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = mpt2sas_scsih_sas_device_get_by_sas_address_nolock(ioc,
+			sas_address);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	return sas_device;
+}
+
+static struct _sas_device *
+_scsih_sas_device_get_by_handle_nolock(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct _sas_device *sas_device;
+
+	BUG_ON(!spin_is_locked(&ioc->sas_device_lock));
 
 	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
-		if (sas_device->sas_address == sas_address)
-			return sas_device;
+		if (sas_device->handle == handle)
+			goto found_device;
 
 	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
-		if (sas_device->sas_address == sas_address)
-			return sas_device;
+		if (sas_device->handle == handle)
+			goto found_device;
 
 	return NULL;
+
+found_device:
+	sas_device_get(sas_device);
+	return sas_device;
 }
 
 /**
- * _scsih_sas_device_find_by_handle - sas device search
+ * _scsih_sas_device_get_by_handle - sas device search
  * @ioc: per adapter object
  * @handle: sas device handle (assigned by firmware)
  * Context: Calling function should acquire ioc->sas_device_lock
@@ -562,19 +605,16 @@ mpt2sas_scsih_sas_device_find_by_sas_address(struct 
MPT2SAS_ADAPTER *ioc,
  * object.
  */
 static struct _sas_device *
-_scsih_sas_device_find_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+_scsih_sas_device_get_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
 {
 	struct _sas_device *sas_device;
+	unsigned long flags;
 
-	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
-		if (sas_device->handle == handle)
-			return sas_device;
-
-	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
-		if (sas_device->handle == handle)
-			return sas_device;
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = _scsih_sas_device_get_by_handle_nolock(ioc, handle);
+
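
The renamed lookups share one convention: the "_nolock" variant expects the caller to hold the lock and returns the object with a reference already taken, while the plain variant wraps it in lock/unlock. A minimal userspace sketch of that convention is below. It assumes an atomic_flag spinlock plus a lock_held flag in place of spin_is_locked(); all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct dev {
	struct dev *next;
	uint64_t sas_address;
	int refs;		/* protected by adapter->lock in this sketch */
};

struct adapter {
	atomic_flag lock;
	int lock_held;		/* lets the nolock helper check its contract */
	struct dev *devices;
};

static void adapter_lock(struct adapter *a)
{
	while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
		;
	a->lock_held = 1;
}

static void adapter_unlock(struct adapter *a)
{
	a->lock_held = 0;
	atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/* Caller must hold the lock; the result carries a new reference. */
static struct dev *dev_get_by_address_nolock(struct adapter *a, uint64_t addr)
{
	struct dev *d;

	assert(a->lock_held);
	for (d = a->devices; d; d = d->next) {
		if (d->sas_address == addr) {
			d->refs++;	/* taken before the lock is released */
			return d;
		}
	}
	return NULL;
}

static struct dev *dev_get_by_address(struct adapter *a, uint64_t addr)
{
	struct dev *d;

	adapter_lock(a);
	d = dev_get_by_address_nolock(a, addr);
	adapter_unlock(a);
	return d;
}

static void dev_put(struct adapter *a, struct dev *d)
{
	adapter_lock(a);
	assert(d->refs > 0);
	d->refs--;
	adapter_unlock(a);
}
```

Taking the reference while the lock is still held is the point: once the lock is dropped, the list may change, but the returned object cannot go away until the caller puts it.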

[PATCH] pinctrl: zynq: add static to platform_driver remove callback

2015-05-14 Thread Masahiro Yamada

This function is only referenced in this file.

Signed-off-by: Masahiro Yamada 
---

 drivers/pinctrl/pinctrl-zynq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/pinctrl-zynq.c b/drivers/pinctrl/pinctrl-zynq.c
index 22280bd..3d5453a 100644
--- a/drivers/pinctrl/pinctrl-zynq.c
+++ b/drivers/pinctrl/pinctrl-zynq.c
@@ -1149,7 +1149,7 @@ static int zynq_pinctrl_probe(struct platform_device 
*pdev)
return 0;
 }
 
-int zynq_pinctrl_remove(struct platform_device *pdev)
+static int zynq_pinctrl_remove(struct platform_device *pdev)
 {
struct zynq_pinctrl *pctrl = platform_get_drvdata(pdev);
 
-- 
1.9.1


[PATCH] f2fs crypto: fix incorrect release for crypto ctx

2015-05-14 Thread Chao Yu

When the encryption feature is enabled and we rmmod the f2fs module,
a stack backtrace is reported in syslog:

"BUG: Bad page state in process rmmod  pfn:aaf8a
page:f0f4f148 count:0 mapcount:129 mapping:ee2f4104 index:0x80
flags: 0xee2830a4(referenced|lru|slab|private_2|writeback|swapbacked|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x2030a0(lru|slab|private_2|writeback|mlocked)
Modules linked in: f2fs(O-) fuse bnep rfcomm bluetooth dm_crypt binfmt_misc 
snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm
snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device 
joydev ppdev mac_hid lp hid_generic i2c_piix4
parport_pc psmouse snd serio_raw parport soundcore ext4 jbd2 mbcache usbhid hid 
e1000 [last unloaded: f2fs]
CPU: 1 PID: 3049 Comm: rmmod Tainted: GB  O4.1.0-rc3+ #10
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
  c0021eb4 c15b7518 f0f4f148 c0021ed8 c112e0b7 c1779174
c9b75674 000aaf8a 01b13ce1 c17791a4 f0f4f148 ee2830a4 c0021ef8 c112e3c3
 f0f4f148 c0021f34 f0f4f148 ee2830a4 ef9f c0021f20 c112fdf8
Call Trace:
[] dump_stack+0x41/0x52
[] bad_page.part.72+0xa7/0x100
[] free_pages_prepare+0x213/0x220
[] free_hot_cold_page+0x28/0x120
[] ? try_to_wake_up+0x2b0/0x2b0
[] __free_pages+0x25/0x30
[] mempool_free_pages+0xd/0x10
[] mempool_free+0x31/0x90
[] f2fs_exit_crypto+0x6f/0xf0 [f2fs]
[] exit_f2fs_fs+0x23/0x95f [f2fs]
[] SyS_delete_module+0x130/0x180
[] ? vm_munmap+0x46/0x60
[] sysenter_do_call+0x12/0x12"

The reason is as follows.

Since commit 0827e645fd35
("f2fs crypto: shrink size of the f2fs_crypto_ctx structure") was merged,
some fields in the f2fs_crypto_ctx structure share a union, because they
are never used simultaneously on the write path, the read path or the
free list.

In f2fs_exit_crypto() we traverse each crypto ctx on the free list. At
that point only the free_list field of the union is valid, yet we still
try to release the memory pointed to by another, invalid field of the
union for each ctx.

That triggers the error above; fix it with this patch.

Signed-off-by: Chao Yu 
---
 fs/f2fs/crypto.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/fs/f2fs/crypto.c b/fs/f2fs/crypto.c
index e36eddd..0a66c9b 100644
--- a/fs/f2fs/crypto.c
+++ b/fs/f2fs/crypto.c
@@ -233,14 +233,6 @@ void f2fs_exit_crypto(void)
 	struct f2fs_crypto_ctx *pos, *n;
 
 	list_for_each_entry_safe(pos, n, &f2fs_free_crypto_ctxs, free_list) {
-		if (pos->w.bounce_page) {
-			if (pos->flags &
-			    F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL)
-				__free_page(pos->w.bounce_page);
-			else
-				mempool_free(pos->w.bounce_page,
-					f2fs_bounce_page_pool);
-		}
 		if (pos->tfm)
 			crypto_free_tfm(pos->tfm);
 		kmem_cache_free(f2fs_crypto_ctx_cachep, pos);
-- 
2.3.3
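
The aliasing described in the commit message can be reproduced in a few lines of userspace C. The sketch below uses a simplified, hypothetical layout (one write-path pointer and a list node sharing a union) rather than the real f2fs_crypto_ctx, and adds an on_free_list discriminator that the real structure does not have.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

/* Simplified, hypothetical layout: the union members share storage. */
struct crypto_ctx {
	union {
		struct {
			void *bounce_page;
		} w;				/* valid on the write path */
		struct list_node free_list;	/* valid on the free list */
	};
	int on_free_list;	/* discriminator added for this sketch */
};

/* Insert at the front of a circular list headed by head. */
static void ctx_add_to_free_list(struct crypto_ctx *ctx, struct list_node *head)
{
	ctx->free_list.next = head->next;
	ctx->free_list.prev = head;
	head->next->prev = &ctx->free_list;
	head->next = &ctx->free_list;
	ctx->on_free_list = 1;
}

/* Only trust w.bounce_page when the context is not on the free list. */
static void *ctx_bounce_page(const struct crypto_ctx *ctx)
{
	return ctx->on_free_list ? NULL : ctx->w.bounce_page;
}
```

Once the context is on the free list, w.bounce_page reads back as a list pointer, which is non-NULL. A check like "if (pos->w.bounce_page)" therefore passes and the code frees memory that was never a bounce page.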



Re: [FYI] tux3: Core changes

2015-05-14 Thread Rik van Riel

On 05/14/2015 08:06 PM, Daniel Phillips wrote:
> Hi Rik,
> 
> Added Mel, Andrea and Peterz to CC as interested parties. There are
> probably others, please just jump in.
> 
> On 05/14/2015 05:59 AM, Rik van Riel wrote:
>> On 05/14/2015 04:26 AM, Daniel Phillips wrote:
>>> Hi Rik,
>>>
>>> Our linux-tux3 tree currently carries this 652 line diff
>>> against core, to make Tux3 work. This is mainly by Hirofumi, except
>>> the fs-writeback.c hook, which is by me. The main part you may be
>>> interested in is rmap.c, which addresses the issues raised at the
>>> 2013 Linux Storage Filesystem and MM Summit 2015 in San Francisco.[1]
>>>
>>>LSFMM: Page forking
>>>http://lwn.net/Articles/548091/
>>>
>>> This is just a FYI. An upcoming Tux3 report will be a tour of the page
>>> forking design and implementation. For now, this is just to give a
>>> general sense of what we have done. We heard there are concerns about
>>> how ptrace will work. I really am not familiar with the issue, could
>>> you please explain what you were thinking of there?
>>
>> The issue is that things like ptrace, AIO, infiniband
>> RDMA, and other direct memory access subsystems can take
>> a reference to page A, which Tux3 clones into a new page B
>> when the process writes it.
>>
>> However, while the process now points at page B, ptrace,
>> AIO, infiniband, etc will still be pointing at page A.
>>
>> This causes the process and the other subsystem to each
>> look at a different page, instead of at shared state,
>> causing ptrace to do nothing, AIO and RDMA data to be
>> invisible (or corrupted), etc...
> 
> Is this a bit like page migration?

Yes. Page migration will fail if there is an "extra"
reference to the page that is not accounted for by
the migration code.

Only pages that have no extra refcount can be migrated.

Similarly, your cow code needs to fail if there is an
extra reference count pinning the page. As long as
the page has a user that you cannot migrate, you cannot
move any of the other users over. They may rely on data
written by the hidden-to-you user, and the hidden-to-you
user may write to the page when you think it is a read
only stable snapshot.

-- 
All rights reversed
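
The rule stated above (only fork or migrate a page when every reference to it is accounted for) can be written as a simple comparison. The sketch below is a deliberately simplified model with hypothetical field names; the kernel's real accounting involves more reference sources than the two shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified page accounting for illustration only. */
struct page_state {
	int refcount;	/* every reference, whether we know its owner or not */
	int mapcount;	/* references from page tables we can rewrite */
	int cache_refs;	/* references held by the page cache */
};

/* A page may be forked or migrated only if no unknown user pins it. */
static bool page_can_fork(const struct page_state *p)
{
	int expected = p->mapcount + p->cache_refs;

	return p->refcount == expected;
}
```

A single extra reference, for example one taken by a direct memory access user, makes refcount exceed the expected value and the operation has to fail.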

INFORMATIVE LETTER !!

2015-05-14 Thread ilianebettencourt2

Hello,

I am Liliane Bettencourt,confirm that you got this email as soon as you read 
it, you can read about me on:http://en.wikipedia.org/wiki/Liliane_Bettencourt
I write to you because I intend to give to you a portion of my Net-worth which 
I have been banking. I want to cede it out as gift hoping it would be of help 
to you and others too. Respond for confirmation.

With love,
Liliane Bettencourt.
Email : lilianebettencou...@hotmail.com


Good day

2015-05-14 Thread Santander Consumer Finance

Good day

This is Santander Consumer Finance. We offer secured and unsecured personal
loans to help with your business investments, household needs and budget, at
a low interest rate of 3% and in amounts from 5000 up to 10,000.000.00
dollars / euros. Contact us by filling in the form below if you find our
loan offers interesting.

Full name:
Loan amount needed:
Loan duration:
Purpose of the loan:
Do you speak English?

A quick response is required.


Rose Bakka

Processing manager

[PATCH] numa,sched: reduce conflict between fbq_classify_rq and migration

2015-05-14 Thread Rik van Riel

It is possible for fbq_classify_rq to indicate that a CPU has tasks that
should be moved to another NUMA node, but for migrate_improves_locality
and migrate_degrades_locality to not identify those tasks.

This patch always gives preference to preferred node evaluations, and
only checks the number of faults when evaluating moves between two
non-preferred nodes on a larger NUMA system.

On a two node system, the number of faults is never evaluated. Either
a task is about to be pulled off its preferred node, or migrated onto
it.

Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 60 +
 1 file changed, 33 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ffeaa4105e48..9c9f225b14fc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5467,10 +5467,15 @@ static int task_hot(struct task_struct *p, struct 
lb_env *env)
 }
 
 #ifdef CONFIG_NUMA_BALANCING
-/* Returns true if the destination node has incurred more faults */
+/*
+ * Returns true if the destination node is the preferred node.
+ * Needs to match fbq_classify_rq: if there is a runnable task
+ * that is not on its preferred node, we should identify it.
+ */
 static bool migrate_improves_locality(struct task_struct *p, struct lb_env 
*env)
 {
struct numa_group *numa_group = rcu_dereference(p->numa_group);
+   unsigned long src_faults, dst_faults;
int src_nid, dst_nid;
 
if (!sched_feat(NUMA_FAVOUR_HIGHER) || !p->numa_faults ||
@@ -5484,29 +5489,30 @@ static bool migrate_improves_locality(struct 
task_struct *p, struct lb_env *env)
if (src_nid == dst_nid)
return false;
 
-   if (numa_group) {
-   /* Task is already in the group's interleave set. */
-   if (node_isset(src_nid, numa_group->active_nodes))
-   return false;
-
-   /* Task is moving into the group's interleave set. */
-   if (node_isset(dst_nid, numa_group->active_nodes))
-   return true;
-
-   return group_faults(p, dst_nid) > group_faults(p, src_nid);
-   }
-
/* Encourage migration to the preferred node. */
if (dst_nid == p->numa_preferred_nid)
return true;
 
-   return task_faults(p, dst_nid) > task_faults(p, src_nid);
+   /* Migrating away from the preferred node is bad. */
+   if (src_nid == p->numa_preferred_nid)
+   return false;
+
+   if (numa_group) {
+   src_faults = group_faults(p, src_nid);
+   dst_faults = group_faults(p, dst_nid);
+   } else {
+   src_faults = task_faults(p, src_nid);
+   dst_faults = task_faults(p, dst_nid);
+   }
+
+   return dst_faults > src_faults;
 }
 
 
 static bool migrate_degrades_locality(struct task_struct *p, struct lb_env 
*env)
 {
struct numa_group *numa_group = rcu_dereference(p->numa_group);
+   unsigned long src_faults, dst_faults;
int src_nid, dst_nid;
 
if (!sched_feat(NUMA) || !sched_feat(NUMA_RESIST_LOWER))
@@ -5521,23 +5527,23 @@ static bool migrate_degrades_locality(struct 
task_struct *p, struct lb_env *env)
if (src_nid == dst_nid)
return false;
 
-   if (numa_group) {
-   /* Task is moving within/into the group's interleave set. */
-   if (node_isset(dst_nid, numa_group->active_nodes))
-   return false;
+   /* Migrating away from the preferred node is bad. */
+   if (src_nid == p->numa_preferred_nid)
+   return true;
 
-   /* Task is moving out of the group's interleave set. */
-   if (node_isset(src_nid, numa_group->active_nodes))
-   return true;
+   /* Encourage migration to the preferred node. */
+   if (dst_nid == p->numa_preferred_nid)
+   return false;
 
-   return group_faults(p, dst_nid) < group_faults(p, src_nid);
+   if (numa_group) {
+   src_faults = group_faults(p, src_nid);
+   dst_faults = group_faults(p, dst_nid);
+   } else {
+   src_faults = task_faults(p, src_nid);
+   dst_faults = task_faults(p, dst_nid);
}
 
-   /* Migrating away from the preferred node is always bad. */
-   if (src_nid == p->numa_preferred_nid)
-   return true;
-
-   return task_faults(p, dst_nid) < task_faults(p, src_nid);
+   return dst_faults < src_faults;
 }
 
 #else
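
The decision order after this patch can be summarised as two small pure functions. The sketch below keeps only the ordering of the checks shown in the diff; it assumes a single per-node fault array standing in for group_faults()/task_faults(), omits the sched_feat() and NULL checks, and uses hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4	/* assumption for this sketch */

/* Hypothetical stand-in for the per-task state the patch consults. */
struct task {
	int preferred_nid;
	unsigned long faults[MAX_NODES];	/* group or task faults per node */
};

static bool improves_locality(const struct task *p, int src_nid, int dst_nid)
{
	if (src_nid == dst_nid)
		return false;
	if (dst_nid == p->preferred_nid)	/* moving onto the preferred node */
		return true;
	if (src_nid == p->preferred_nid)	/* moving off the preferred node */
		return false;
	/* Only between two non-preferred nodes do fault counts matter. */
	return p->faults[dst_nid] > p->faults[src_nid];
}

static bool degrades_locality(const struct task *p, int src_nid, int dst_nid)
{
	if (src_nid == dst_nid)
		return false;
	if (src_nid == p->preferred_nid)
		return true;
	if (dst_nid == p->preferred_nid)
		return false;
	return p->faults[dst_nid] < p->faults[src_nid];
}
```

On a two-node system one of the two nodes is always the preferred one, so the fault comparison at the end of each function is never reached.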

Re: [PATCH][linux-next] coda: Do not define TRACE_SYSTEM_STRING

2015-05-14 Thread Steven Rostedt

On Thu, 14 May 2015 22:52:50 -0400
Steven Rostedt  wrote:

> TRACE_SYSTEM_STRING is used internally to the TRACE_EVENT system. It
> should not be defined by tracepoint files. I'm not even sure who
> started that (I hope it wasn't me!)

According to git history, that define first showed up in

commit 1c5d22f76dc721f3acb7a3dadc657a221e487fb7
Author: Chris Wilson 
Date:   Tue Aug 25 11:15:50 2009 +0100

drm/i915: Add tracepoints

And it must have spread with cut and paste. Now I need to start
administering a vaccine to keep developers from catching it!

-- Steve


> 
> Reported-by: kbuild test robot 
> Signed-off-by: Steven Rostedt
> ---
> diff --git a/drivers/media/platform/coda/trace.h 
> b/drivers/media/platform/coda/trace.h
> index d1d06cbd1f6a..781bf7286d53 100644
> --- a/drivers/media/platform/coda/trace.h
> +++ b/drivers/media/platform/coda/trace.h
> @@ -9,8 +9,6 @@
>  
>  #include "coda.h"
>  
> -#define TRACE_SYSTEM_STRING __stringify(TRACE_SYSTEM)
> -
>  TRACE_EVENT(coda_bit_run,
>   TP_PROTO(struct coda_ctx *ctx, int cmd),
>  


[PATCH][linux-next] coda: Do not define TRACE_SYSTEM_STRING

2015-05-14 Thread Steven Rostedt

TRACE_SYSTEM_STRING is used internally to the TRACE_EVENT system. It
should not be defined by tracepoint files. I'm not even sure who
started that (I hope it wasn't me!)

Reported-by: kbuild test robot 
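Signed-off-by: Steven Rostedt
---
diff --git a/drivers/media/platform/coda/trace.h 
b/drivers/media/platform/coda/trace.h
index d1d06cbd1f6a..781bf7286d53 100644
--- a/drivers/media/platform/coda/trace.h
+++ b/drivers/media/platform/coda/trace.h
@@ -9,8 +9,6 @@
 
 #include "coda.h"
 
-#define TRACE_SYSTEM_STRING __stringify(TRACE_SYSTEM)
-
 TRACE_EVENT(coda_bit_run,
 	TP_PROTO(struct coda_ctx *ctx, int cmd),
 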

Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

2015-05-14 Thread Al Viro

On Thu, May 14, 2015 at 07:18:16PM -0700, Linus Torvalds wrote:
> The only difference - EVER - would be if you pass in the ICASE flag.
> Nothing I suggested would change semantics without it (the _hash_
> changes, but that doesn't change semantics, it's a purely internal
> random number).
> 
> Now, *with* O_ICASE/AT_ICASE, semantics change. Obviously. At that
> point the dentry lookup would match case-insensitively.
> 
> For example, let's say that you have a directory where you already
> have both "Blah" and "blah", because you created them in a sane
> environment. They'll be two different dentries (assuming they are
> cached), but they'll have the same dentry hash.
> 
> Now, you open "blah" with O_ICASE, and the end result is that you
> would randomly open one or the other (it would be the one you find
> first on the hash chain). Tough. Mixing icase and case-insensitive is
> by definition going to cause those kinds of issues.

With c-i mount, unpacking a tarball with tar(1) will not (silently, at that)
create a situation when your c-i users will get lookups proceeding into
randomly picked variant of directory, with variants differing only in
case.  It will do the right thing and put the files where they would be
expected, giving an expected "it already exists" if you try to create
a directory with the name that matches that of existing file, etc.
With this, OTOH, you'll have to use specialized tools for creating
files in that tree, or risk random lossage, because creating /mnt/foo/bar
when /mnt/Foo/Bugger already existed will succeed just fine, leaving one
hell of a mess for c-i users.

What's the benefit compared to c-i mount?  Not hitting filesystem's
->d_hash() and ->d_compare()?
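
The "Blah"/"blah" scenario quoted above can be modelled with a case-folding hash and a single hash chain. The sketch below is not the dcache: the FNV-1a hash, the struct and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

struct dentry {
	const char *name;
	struct dentry *next;	/* next entry on the same hash chain */
};

/* Case-folding hash (FNV-1a over lowercased bytes); an assumption here. */
static unsigned int fold_hash(const char *s)
{
	unsigned int h = 2166136261u;

	for (; *s; s++) {
		h ^= (unsigned char)tolower((unsigned char)*s);
		h *= 16777619u;
	}
	return h;
}

/* Walk one hash chain; icase selects case-insensitive comparison. */
static struct dentry *chain_lookup(struct dentry *chain, const char *name,
				   int icase)
{
	struct dentry *d;

	for (d = chain; d; d = d->next) {
		if (icase ? !strcasecmp(d->name, name) : !strcmp(d->name, name))
			return d;
	}
	return NULL;
}
```

With both "Blah" and "blah" on the chain, an exact lookup of "blah" finds the exact entry, while an icase lookup returns whichever entry happens to come first. That is the "randomly picked variant" behaviour discussed in this thread.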

Re: [PATCH net-next,v2,1/1] hv_netvsc: use per_cpu stats to calculate TX/RX data

2015-05-14 Thread David Miller

From: Simon Xiao 
Date: Thu, 14 May 2015 01:00:25 -0700

> Current code does not lock anything when calculating the TX and RX stats.
> As a result, the RX and TX data reported by ifconfig are not accurate in a
> system with high network throughput and multiple CPUs (in my test,
> RX/TX = 83% between 2 HyperV VM nodes which have 8 vCPUs and 40G Ethernet).
> 
> This patch fixed the above issue by using per_cpu stats.
> netvsc_get_stats64() summarizes TX and RX data by iterating over all CPUs
> to get their respective stats.
> 
> This v2 patch addressed David's comments on the cleanup path when
> netdev_alloc_pcpu_stats() failed.
> 
> Signed-off-by: Simon Xiao 
> Reviewed-by: K. Y. Srinivasan 
> Reviewed-by: Haiyang Zhang 

Applied, thanks.
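
The per-CPU approach described in the quoted commit message can be sketched as follows: each CPU updates only its own counters, and the reader sums all slots. This is a userspace model with hypothetical names and a fixed NR_CPUS; it does not show the synchronisation the kernel's per-cpu stats helpers provide for consistent 64-bit reads.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8	/* assumption for this sketch */

struct pcpu_stats {
	uint64_t packets;
	uint64_t bytes;
};

struct totals {
	uint64_t packets;
	uint64_t bytes;
};

static struct pcpu_stats tx_stats[NR_CPUS];

/* Each CPU touches only its own slot, so writers never contend. */
static void account_tx(int cpu, uint64_t len)
{
	tx_stats[cpu].packets++;
	tx_stats[cpu].bytes += len;
}

/* The reader iterates over all CPUs and sums their counters. */
static struct totals sum_tx(void)
{
	struct totals t = { 0, 0 };
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		t.packets += tx_stats[cpu].packets;
		t.bytes += tx_stats[cpu].bytes;
	}
	return t;
}
```

Because no two CPUs increment the same counter, the lost updates that skewed the shared counters cannot occur.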

Re: [PATCH v2 1/2] cpufreq_stats: Adds sysfs file /sys/devices/system/cpu/cpufreq/current_in_state

2015-05-14 Thread Viresh Kumar

I am not commenting on the concept here, as sched maintainers are in a
better position for that, but a nit below..

On 14-05-15, 17:12, Ruchi Kandoi wrote:
> Adds the sysfs file for userspace to initialize the active current
> values for all the cores at each of the frequencies.
> 
> The format for storing the values is as follows:
> echo "CPU:= =,CPU:
> ..." > /sys/devices/system/cpu/cpufreq/current_in_state

Why this file? And not
/sys/devices/system/cpu/cpuX/cpufreq/stats/current_in_state ? That way
you don't have to replicate the same information for all CPUs, as the
stats folder can be shared by multiple CPUs (which share their
clock/voltage rails)..

-- 
viresh

Re: [RFC PATCH V2 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent

2015-05-14 Thread Gu Zheng

Hi Ishimatsu,

On 05/15/2015 12:44 AM, Yasuaki Ishimatsu wrote:

> Hi Gu,
> 
> Eight months ago, I posted the following patch to relate
> cpuid to apicid.
> 
> https://lkml.org/lkml/2014/9/3/1120
> 
> Could you try this patch?


Thanks for your reminder.
It seems similar to https://lkml.org/lkml/2015/3/25/989
"[PATCH 0/2] workqueue: fix a bug when numa mapping is changed".
That approach can also fix the issue, but it does not seem to be the
ideal solution, because self-maintained cpumask mappings (or something
like them) are very common in the kernel.
As TJ and Kame suggested, it is feasible to build the mapping
for all the possible cpus at boot, so that we can ignore the
effect of cpu/node hotplug, especially for per-cpu cases.
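
A rough userspace sketch of what a persistent cpu-id <-> apicid table means, as opposed to handing out the first free id on every hot-add. Everything here is hypothetical (`DEMO_MAX_CPUS`, `struct demo_cpu_slot`, `demo_cpu_add()`, `demo_cpu_remove()`); it only illustrates the idea and is not the code from the patch.

```c
#include <stdbool.h>

#define DEMO_MAX_CPUS 8	/* hypothetical number of possible cpus */

struct demo_cpu_slot {
	bool assigned;	/* slot has been bound to an apicid */
	bool present;	/* cpu is currently plugged in */
	int apicid;
};

static struct demo_cpu_slot demo_cpu_map[DEMO_MAX_CPUS];

/*
 * Bind an apicid to a logical cpu id. A known apicid always gets its
 * old id back; only a never-seen apicid takes the first unassigned slot.
 * Returns the cpu id, or -1 if the table is full.
 */
static int demo_cpu_add(int apicid)
{
	int cpu;

	for (cpu = 0; cpu < DEMO_MAX_CPUS; cpu++) {
		if (demo_cpu_map[cpu].assigned &&
		    demo_cpu_map[cpu].apicid == apicid) {
			demo_cpu_map[cpu].present = true;
			return cpu;
		}
	}
	for (cpu = 0; cpu < DEMO_MAX_CPUS; cpu++) {
		if (!demo_cpu_map[cpu].assigned) {
			demo_cpu_map[cpu].assigned = true;
			demo_cpu_map[cpu].present = true;
			demo_cpu_map[cpu].apicid = apicid;
			return cpu;
		}
	}
	return -1;
}

/* Hot-remove keeps the binding, so the id is never reused by another apicid. */
static void demo_cpu_remove(int cpu)
{
	demo_cpu_map[cpu].present = false;
}
```

With a first-free policy, removing cpu 0 and then adding a different processor would hand out id 0 again; here the new processor gets a fresh id and the old one keeps its id when it returns.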

Regards,
Gu

> 
> Thanks,
> Yasuaki Ishimatsu
> 
> On Thu, 14 May 2015 19:33:33 +0800
> Gu Zheng  wrote:
> 
>> Yasuaki Ishimatsu found that with node online/offline, the cpu<->node
>> relationship is established. Workqueue, however, uses info which
>> was established at boot time, and that info may be changed by node hotplugging.
>>
>> Once pool->node points to a stale node, the following allocation failure
>> happens.
>>   ==
>>  SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
>>   cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
>>   node 0: slabs: 6172, objs: 259224, free: 245741
>>   node 1: slabs: 3261, objs: 136962, free: 127656
>>   ==
>>
>> As the apicid <--> pxm and pxm <--> node relationships are persistent, the
>> apicid <--> node mapping is persistent too, so the root cause is that the
>> cpu-id <-> lapicid mapping is not persistent (because the current
>> implementation always chooses the first free cpu id for a newly added cpu).
>> If we can build a persistent cpu-id <-> lapicid relationship, this problem
>> will be fixed.
>>
>> This patch tries to build the whole mapping cpuid <-> apicid <-> pxm <-> node
>> for all possible processors at boot. The implementation takes 2 steps:
>>
>> Step1: generate a logical cpu id for every local apic (both enabled and
>>        disabled) when registering the local apic
>> Step2: map the cpu to the physical node via an additional acpi ns walk for
>>        the processor.
>>
>> Please refer to:
>> https://lkml.org/lkml/2015/2/27/145
>> https://lkml.org/lkml/2015/3/25/989
>> for the previous discussion.
>> ---
>>  V2: rebase on latest upstream.
>> ---
>>
>> Signed-off-by: Gu Zheng 
>> ---
>>  arch/ia64/kernel/acpi.c   |   2 +-
>>  arch/x86/include/asm/mpspec.h |   1 +
>>  arch/x86/kernel/acpi/boot.c   |   8 ++-
>>  arch/x86/kernel/apic/apic.c   |  73 -
>>  arch/x86/mm/numa.c|  20 ---
>>  drivers/acpi/acpi_processor.c |   2 +-
>>  drivers/acpi/bus.c|   3 ++
>>  drivers/acpi/processor_core.c | 121 
>> ++
>>  include/linux/acpi.h  |   2 +
>>  9 files changed, 172 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
>> index b1698bc..7db5563 100644
>> --- a/arch/ia64/kernel/acpi.c
>> +++ b/arch/ia64/kernel/acpi.c
>> @@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
>>   *  ACPI based hotplug CPU support
>>   */
>>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
>> -static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>> +int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>>  {
>>  #ifdef CONFIG_ACPI_NUMA
>>  /*
>> diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
>> index b07233b..db902d8 100644
>> --- a/arch/x86/include/asm/mpspec.h
>> +++ b/arch/x86/include/asm/mpspec.h
>> @@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
>>  #endif
>>  
>>  int generic_processor_info(int apicid, int version);
>> +int __generic_processor_info(int apicid, int version, bool enabled);
>>  
>>  #define PHYSID_ARRAY_SIZE   BITS_TO_LONGS(MAX_LOCAL_APIC)
>>  
>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>> index dbe76a1..c79115b 100644
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -174,15 +174,13 @@ static int acpi_register_lapic(int id, u8 enabled)
>>  return -EINVAL;
>>  }
>>  
>> -if (!enabled) {
>> +if (!enabled)
>>  ++disabled_cpus;
>> -return -EINVAL;
>> -}
>>  
>>  if (boot_cpu_physical_apicid != -1U)
>>  ver = apic_version[boot_cpu_physical_apicid];
>>  
>> -return generic_processor_info(id, ver);
>> +return __generic_processor_info(id, ver, enabled);
>>  }
>>  
>>  static int __init
>> @@ -726,7 +724,7 @@ static void __init acpi_set_irq_model_ioapic(void)
>>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
>>  #include 
>>  
>> -static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>> +void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>>  {
>>  #ifdef CONFIG_ACPI_NUMA
>>  int nid;
>> diff --git

Re: [RFC 21/23] net/xen-netback: Make it running on 64KB page granularity

2015-05-14 Thread Wei Liu

On Thu, May 14, 2015 at 06:01:01PM +0100, Julien Grall wrote:
> The PV network protocol is using 4KB page granularity. The goal of this
> patch is to allow a Linux using 64KB page granularity working as a
> network backend on a non-modified Xen.
> 
> It's only necessary to adapt the ring size and break skb data in small
> chunk of 4KB. The rest of the code is relying on the grant table code.
> 
> Although only simple workload is working (dhcp request, ping). If I try
> to use wget in the guest, it will stall until a tcpdump is started on
> the vif interface in DOM0. I wasn't able to find why.
> 

I think in wget workload you're more likely to break down 64K pages to
4K pages. Some of your calculation of mfn, offset might be wrong.

> I have not modified XEN_NETBK_RX_SLOTS_MAX because I wasn't sure what
> it's used for (I have limited knowledge on the network driver).
> 

This is the maximum number of slots a guest packet can use. AIUI the protocol
still works on 4K granularity (you break 64K page to a bunch of 4K
pages), you don't need to change this.
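
The splitting described above can be sketched in userspace C. The arithmetic mirrors the `off_grant` / `bytes` computation in the quoted patch, but the names (`DEMO_GRANT_SIZE`, `struct demo_chunk`, `demo_split_into_grants()`) are hypothetical and the sketch ignores everything else the backend has to do (mfn lookup, grant copy setup).

```c
#include <stddef.h>

#define DEMO_GRANT_SIZE 4096UL	/* 4KB granularity of the PV protocol */

struct demo_chunk {
	unsigned long grant_index;	/* which 4KB piece of the big page */
	unsigned long off;		/* offset inside that 4KB piece */
	unsigned long len;		/* bytes taken from that piece */
};

/*
 * Split the byte range [offset, offset + size) of a large (e.g. 64KB)
 * page into pieces that never cross a 4KB boundary.
 * Returns the number of chunks written, at most 'max'.
 */
static size_t demo_split_into_grants(unsigned long offset, unsigned long size,
				     struct demo_chunk *out, size_t max)
{
	size_t n = 0;

	while (size > 0 && n < max) {
		unsigned long off_grant = offset & (DEMO_GRANT_SIZE - 1);
		unsigned long bytes = DEMO_GRANT_SIZE - off_grant;

		if (bytes > size)
			bytes = size;

		out[n].grant_index = offset / DEMO_GRANT_SIZE;
		out[n].off = off_grant;
		out[n].len = bytes;
		n++;

		offset += bytes;
		size -= bytes;
	}
	return n;
}
```

Note that both the index of the 4KB piece and the offset inside it are derived from the same running `offset`; getting one of the two from the start of the 64KB page instead is the kind of mfn/offset mistake hinted at above.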

> Signed-off-by: Julien Grall 
> Cc: Ian Campbell 
> Cc: Wei Liu 
> Cc: net...@vger.kernel.org
> 
> ---
> 
> Improvement such as support of 64KB grant is not taken into
> consideration in this patch because we have the requirement to run a
> Linux using 64KB pages on a non-modified Xen.
> ---
>  drivers/net/xen-netback/common.h  |  7 ---
>  drivers/net/xen-netback/netback.c | 27 ++-
>  2 files changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 8a495b3..0eda6e9 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -44,6 +44,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  typedef unsigned int pending_ring_idx_t;
> @@ -64,8 +65,8 @@ struct pending_tx_info {
>   struct ubuf_info callback_struct;
>  };
>  
> -#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
> -#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
> +#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
> +#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
>  
>  struct xenvif_rx_meta {
>   int id;
> @@ -80,7 +81,7 @@ struct xenvif_rx_meta {
>  /* Discriminate from any valid pending_idx value. */
>  #define INVALID_PENDING_IDX 0x
>  
> -#define MAX_BUFFER_OFFSET PAGE_SIZE
> +#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
>  
>  #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
>  
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 9ae1d43..ea5ce84 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -274,7 +274,7 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
>  {
>   struct gnttab_copy *copy_gop;
>   struct xenvif_rx_meta *meta;
> - unsigned long bytes;
> + unsigned long bytes, off_grant;
>   int gso_type = XEN_NETIF_GSO_TYPE_NONE;
>  
>   /* Data must not cross a page boundary. */
> @@ -295,7 +295,8 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
>   if (npo->copy_off == MAX_BUFFER_OFFSET)
>   meta = get_next_rx_buffer(queue, npo);
>  
> - bytes = PAGE_SIZE - offset;
> + off_grant = offset & ~XEN_PAGE_MASK;
> + bytes = XEN_PAGE_SIZE - off_grant;
>   if (bytes > size)
>   bytes = size;
>  
> @@ -314,9 +315,9 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
>   } else {
>   copy_gop->source.domid = DOMID_SELF;
>   copy_gop->source.u.gmfn =
> - virt_to_mfn(page_address(page));
> + virt_to_mfn(page_address(page) + offset);
>   }
> - copy_gop->source.offset = offset;
> + copy_gop->source.offset = off_grant;
>  
>   copy_gop->dest.domid = queue->vif->domid;
>   copy_gop->dest.offset = npo->copy_off;
> @@ -747,7 +748,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
>   first->size -= txp->size;
>   slots++;
>  
> - if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
> + if (unlikely((txp->offset + txp->size) > XEN_PAGE_SIZE)) {
>   netdev_err(queue->vif->dev, "Cross page boundary, txp->offset: %x, size: %u\n",
>txp->offset, txp->size);
>   xenvif_fatal_tx_err(queue->vif);
> @@ -1241,11 +1242,11 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
>   }
>  
>   /* No crossing a page as the payload mustn't fragment. */
> - if (unlikely((txreq.offset +

Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

2015-05-14 Thread Richard Guy Briggs

On 15/05/14, Paul Moore wrote:
> On Thursday, May 14, 2015 10:57:14 AM Steve Grubb wrote:
> > On Tuesday, May 12, 2015 03:57:59 PM Richard Guy Briggs wrote:
> > > On 15/05/05, Steve Grubb wrote:
> > > > I think there needs to be some more discussion around this. It seems
> > > > like this is not exactly recording things that are useful for audit.
> > > 
> > > It seems to me that either audit has to assemble that information, or
> > > the kernel has to do so.  The kernel doesn't know about containers
> > > (yet?).
> > 
> > Auditing is something that has a lot of requirements imposed on it by
> > security standards. There was no requirement to have an auid until audit
> > came along and said that uid is not good enough to know who is issuing
> > commands because of su or sudo. There was no requirement for sessionid
> > until we had to track each action back to a login so we could see if the
> > login came from the expected place.
> > 
> > What I am saying is we have the same situation. Audit needs to track a
> > container and we need an ID. The information that is being logged is not
> > useful for auditing. Maybe someone wants that info in syslog, but I doubt
> > it. The audit trail's purpose is to allow a security officer to reconstruct
> > the events to determine what happened during some security incident.
> 
> As Eric, and others, have stated, the container concept is a userspace idea, 
> not a kernel idea; the kernel only knows, and cares about, namespaces.  This 
> is unlikely to change.
> 
> However, as Steve points out, there is precedent for the kernel to record 
> userspace tokens for the sake of audit.  Personally I'm not a big fan of this 
> in general, but I do recognize that it does satisfy a legitimate need.  Think 
> of things like auid and the sessionid as necessary evils; audit is already 
> chock full of evilness I doubt one more will doom us all to hell.
> 
> Moving forward, I'd like to see the following:
> 
> * Record the creation/removal/mgmt of the individual namespaces as Richard's 
> patchset currently does.  However, I'd suggest using an explicit namespace 
> value for the init namespace instead of the "unset" value in the V6 patchset 
> (my apologies if you've already changed this Richard, I haven't looked at V7 
> yet).

The "unset" (none) value is only there before the first namespaces have
been created.  After that, any new ones are created relative to the init
namespace of that type.

> * Create a container ID token (unsigned 32-bit integer?), similar to 
> auid/sessionid, that is set by userspace and carried by the kernel to be used 
> in audit records.  I'd like to see some discussion on how we manage this, e.g.
> how do we handle container ID inheritance, how do we handle nested containers 
> (setting the containerid when it is already set), do we care if multiple 
> different containers share the same namespace config, etc.?

(Addressed in another reply.)  Nested will need some careful thought...

> * When userspace sets the container ID, emit a new audit record with the 
> associated namespace tokens and the container ID.

That was the goal of AUDIT_VIRT_CONTROL or AUDIT_NS_INFO messages from
userspace into the kernel.

> * Look at our existing audit records to determine which records should have 
> namespace and container ID tokens added.  We may only want to add the 
> additional fields in the case where the namespace/container ID tokens are not 
> the init namespace.

If we have a record that ties a set of namespace IDs with a container
ID, then I expect we only need to list the containerID along with auid
and sessionID.

> Can we all live with this?  If not, please suggest some alternate ideas; 
> simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be true, 
> but it doesn't help us solve the problem ;)

Thanks Paul.

> paul moore

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

Re: [PATCH] net/mlx4: Avoid 'may be used uninitialized' warnings

2015-05-14 Thread David Miller

From: Bjorn Helgaas 
Date: Thu, 14 May 2015 18:17:08 -0500

> With a cross-compiler based on gcc-4.9, I see warnings like the following:
> 
>   drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_SW2HW_CQ_wrapper':
>   drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3048:10: error: 'cq' may be used uninitialized in this function [-Werror=maybe-uninitialized]
> cq->mtt = mtt;
> 
> I think the warning is spurious because we only use cq when
> cq_res_start_move_to() returns zero, and it always initializes *cq in that
> case.  The srq case is similar.  But maybe gcc isn't smart enough to figure
> that out.
> 
> Initialize cq and srq explicitly to avoid the warnings.
> 
> Signed-off-by: Bjorn Helgaas 

Applied.

The compiler, generally, is not good at determining use-before-initialized
in situations of the form:

int x;

if (foo)
x = whatever;
...
if (foo)
use(x);
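
A compilable variant of the same shape, closer to the mlx4 case in the patch, where a helper fills its output only on success and the caller explicitly initializes the variable anyway. `demo_lookup()` and `demo_use_result()` are hypothetical names made up for this sketch, and whether a given gcc version actually warns on this exact snippet is not claimed here.

```c
#include <stdbool.h>

/* Hypothetical helper: writes *out only when it succeeds (returns 0). */
static int demo_lookup(bool ok, int *out)
{
	if (!ok)
		return -1;
	*out = 42;
	return 0;
}

static int demo_use_result(bool ok)
{
	/*
	 * 'val' is only read when demo_lookup() returned 0, so it is never
	 * really used uninitialized. The explicit "= 0" is there purely so
	 * the compiler does not have to prove that on its own.
	 */
	int val = 0;
	int err = demo_lookup(ok, &val);

	if (err)
		return err;
	return val;
}
```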

[PATCH v2 2/2] coredump: add __printf attribute to cn_*printf functions

2015-05-14 Thread Nicolas Iooss

This allows detecting improper format string at build time, like:

  fs/coredump.c:225:5: warning: format '%ld' expects argument of type
  'long int', but argument 3 has type 'int' [-Wformat=]
   err = cn_printf(cn, "%ld", cprm->siginfo->si_signo);
   ^

As si_signo is always an int, the format should be %d here.

Signed-off-by: Nicolas Iooss 
---
 fs/coredump.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 833a57bc856c..e52e0064feac 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -70,7 +70,8 @@ static int expand_corename(struct core_name *cn, int size)
return 0;
 }
 
-static int cn_vprintf(struct core_name *cn, const char *fmt, va_list arg)
+static __printf(2, 0) int cn_vprintf(struct core_name *cn, const char *fmt,
+va_list arg)
 {
int free, need;
va_list arg_copy;
@@ -93,7 +94,7 @@ again:
return -ENOMEM;
 }
 
-static int cn_printf(struct core_name *cn, const char *fmt, ...)
+static __printf(2, 3) int cn_printf(struct core_name *cn, const char *fmt, ...)
 {
va_list arg;
int ret;
@@ -105,7 +106,8 @@ static int cn_printf(struct core_name *cn, const char *fmt, ...)
return ret;
 }
 
-static int cn_esc_printf(struct core_name *cn, const char *fmt, ...)
+static __printf(2, 3)
+int cn_esc_printf(struct core_name *cn, const char *fmt, ...)
 {
int cur = cn->used;
va_list arg;
@@ -225,7 +227,8 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm)
break;
/* signal that caused the coredump */
case 's':
-   err = cn_printf(cn, "%ld", cprm->siginfo->si_signo);
+   err = cn_printf(cn, "%d",
+   cprm->siginfo->si_signo);
break;
/* UNIX time of coredump */
case 't': {
-- 
2.4.0
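
For reference, a userspace sketch of the mechanism this patch relies on. In the kernel, `__printf(a, b)` expands to gcc's `format` attribute; the sketch below spells that attribute out directly. `demo_printf_attr`, `struct demo_name_buf` and `demo_nb_printf()` are hypothetical names and only loosely modelled on cn_printf().

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel's __printf(): fmt is arg 'a', varargs start at 'b'. */
#define demo_printf_attr(a, b) __attribute__((format(__printf__, a, b)))

struct demo_name_buf {
	char data[64];
	size_t used;
};

/* Append formatted text to the buffer; returns 0 on success, -1 if it would not fit. */
static demo_printf_attr(2, 3)
int demo_nb_printf(struct demo_name_buf *nb, const char *fmt, ...)
{
	size_t room = sizeof(nb->data) - nb->used;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(nb->data + nb->used, room, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= room)
		return -1;
	nb->used += (size_t)n;
	return 0;
}
```

With the attribute in place, a call such as `demo_nb_printf(&nb, "%ld", some_int)` is diagnosed by -Wformat at build time, which is the class of mistake fixed in the hunk above.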


[PATCH v2 1/2] coredump: use from_kuid/kgid when formatting corename

2015-05-14 Thread Nicolas Iooss

When adding __printf attribute to cn_printf, gcc reports some issues:

  fs/coredump.c:213:5: warning: format '%d' expects argument of type
  'int', but argument 3 has type 'kuid_t' [-Wformat=]
   err = cn_printf(cn, "%d", cred->uid);
   ^
  fs/coredump.c:217:5: warning: format '%d' expects argument of type
  'int', but argument 3 has type 'kgid_t' [-Wformat=]
   err = cn_printf(cn, "%d", cred->gid);
   ^

These warnings come from the fact that the value of uid/gid needs to be
extracted from the kuid_t/kgid_t structure before being used as an
integer.  More precisely, cred->uid and cred->gid need to be converted
to either user-namespace uid/gid or to init_user_ns uid/gid.

Use init_user_ns in order not to break existing ABI, and document this
in Documentation/sysctl/kernel.txt.

While at it, format uid and gid values with %u instead of %d because
uid_t/__kernel_uid32_t and gid_t/__kernel_gid32_t are unsigned int.

Signed-off-by: Nicolas Iooss 
---
 Documentation/sysctl/kernel.txt | 4 ++--
 fs/coredump.c   | 8 ++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index c831001c45f1..e1913f78f21e 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -197,8 +197,8 @@ core_pattern is used to specify a core dumpfile pattern name.
%P  global pid (init PID namespace)
%i  tid
%I  global tid (init PID namespace)
-   %u  uid
-   %g  gid
+   %u  uid (in initial user namespace)
+   %g  gid (in initial user namespace)
%d  dump mode, matches PR_SET_DUMPABLE and
/proc/sys/fs/suid_dumpable
%s  signal number
diff --git a/fs/coredump.c b/fs/coredump.c
index bbbe139ab280..833a57bc856c 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -209,11 +209,15 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm)
break;
/* uid */
case 'u':
-   err = cn_printf(cn, "%d", cred->uid);
+   err = cn_printf(cn, "%u",
+   from_kuid(&init_user_ns,
+ cred->uid));
break;
/* gid */
case 'g':
-   err = cn_printf(cn, "%d", cred->gid);
+   err = cn_printf(cn, "%u",
+   from_kgid(&init_user_ns,
+ cred->gid));
break;
case 'd':
err = cn_printf(cn, "%d",
-- 
2.4.0


Re: [PATCH] tools perf: set vmlinux_path__nr_entries to 0 in vmlinux_path__exit.

2015-05-14 Thread Namhyung Kim

Hello,

On Thu, May 14, 2015 at 12:22:30PM +, Wang Nan wrote:
> Original vmlinux_path__exit() doesn't revert vmlinux_path__nr_entries
> to its original state. After the while loop vmlinux_path__nr_entries
> becomes -1 instead of 0. This causes a problem: if it runs twice,
> during the second run vmlinux_path__init() will set vmlinux_path[-1]
> to strdup("vmlinux"), corrupting random memory.
> 
> This patch resets vmlinux_path__nr_entries to 0 after the while loop.
> 
> Signed-off-by: Wang Nan 

Acked-by: Namhyung Kim 

Thanks,
Namhyung


> ---
>  tools/perf/util/symbol.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 201f6c4c..451777f 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -1802,6 +1802,7 @@ static void vmlinux_path__exit(void)
>  {
>   while (--vmlinux_path__nr_entries >= 0)
>   zfree(&vmlinux_path[vmlinux_path__nr_entries]);
> + vmlinux_path__nr_entries = 0;
>  
>   zfree(&vmlinux_path);
>  }
> -- 
> 1.8.3.4
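
The off-by-one in the counter can be reproduced in a small userspace sketch. The names (`demo_paths`, `demo_nr_entries`, `demo_paths_init()`, `demo_paths_exit()`) are hypothetical stand-ins for the perf symbols, and the two path strings are only sample data.

```c
#include <stdlib.h>
#include <string.h>

static const char *const demo_defaults[] = { "vmlinux", "/boot/vmlinux" };
#define DEMO_NR_DEFAULTS (sizeof(demo_defaults) / sizeof(demo_defaults[0]))

static char **demo_paths;
static int demo_nr_entries;

static char *demo_dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

static void demo_paths_exit(void)
{
	/* The pre-decrement loop stops with the counter at -1, not 0. */
	while (--demo_nr_entries >= 0) {
		free(demo_paths[demo_nr_entries]);
		demo_paths[demo_nr_entries] = NULL;
	}
	/* Without this reset, the next init would write to demo_paths[-1]. */
	demo_nr_entries = 0;

	free(demo_paths);
	demo_paths = NULL;
}

static int demo_paths_init(void)
{
	size_t i;

	demo_paths = calloc(DEMO_NR_DEFAULTS, sizeof(*demo_paths));
	if (!demo_paths)
		return -1;

	for (i = 0; i < DEMO_NR_DEFAULTS; i++) {
		demo_paths[demo_nr_entries] = demo_dup_str(demo_defaults[i]);
		if (!demo_paths[demo_nr_entries]) {
			demo_paths_exit();
			return -1;
		}
		demo_nr_entries++;
	}
	return 0;
}
```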
> 

Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

2015-05-14 Thread Richard Guy Briggs

On 15/05/14, Eric W. Biederman wrote:
> Paul Moore  writes:
> > As Eric, and others, have stated, the container concept is a userspace idea,
> > not a kernel idea; the kernel only knows, and cares about, namespaces.  This
> > is unlikely to change.
> >
> > However, as Steve points out, there is precedent for the kernel to record
> > userspace tokens for the sake of audit.  Personally I'm not a big fan of this
> > in general, but I do recognize that it does satisfy a legitimate need.  Think
> > of things like auid and the sessionid as necessary evils; audit is already
> > chock full of evilness I doubt one more will doom us all to hell.
> >
> > Moving forward, I'd like to see the following:
> 
> > * Create a container ID token (unsigned 32-bit integer?), similar to
> > auid/sessionid, that is set by userspace and carried by the kernel to be used
> > in audit records.  I'd like to see some discussion on how we manage this, e.g.
> > how do we handle container ID inheritance, how do we handle nested containers
> > (setting the containerid when it is already set), do we care if multiple
> > different containers share the same namespace config, etc.?
> 
> 
> > Can we all live with this?  If not, please suggest some alternate ideas;
> > simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be true,
> > but it doesn't help us solve the problem ;)
> 
> Without stopping and defining what someone means by container I think it
> is pretty much nonsense.

Not complete, but this is why I'm asking for a standards document...

> Should every vsftp connection get a container?  Every chrome tab?
> 
> At some of the connections per second numbers I have seen we might
> exhaust a 32bit number in an hour or two.  Will any of that make sense
> to someone reading the audit logs?

So making it 64bits buys us some time, but sure...  I think your
definition of a container may be a bit more liberal than what we're
trying to understand...
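
To put rough numbers on the 32-bit versus 64-bit question, here is a back-of-the-envelope sketch. The allocation rate is an assumption picked for illustration (no rate is given in the thread), and `demo_seconds_until_wrap32()` / `demo_seconds_until_wrap64()` are hypothetical helper names.

```c
#include <stdint.h>

/* Seconds until a 32-bit ID space (2^32 values) is used up at a given rate. */
static uint64_t demo_seconds_until_wrap32(uint64_t ids_per_sec)
{
	if (ids_per_sec == 0)
		return UINT64_MAX;	/* never wraps */
	return (UINT64_C(1) << 32) / ids_per_sec;
}

/* Same for a 64-bit ID space; UINT64_MAX stands in for 2^64 (off by one ID). */
static uint64_t demo_seconds_until_wrap64(uint64_t ids_per_sec)
{
	if (ids_per_sec == 0)
		return UINT64_MAX;	/* never wraps */
	return UINT64_MAX / ids_per_sec;
}
```

At an assumed one million new IDs per second, the 32-bit space lasts about 4294 seconds (a bit over an hour), which is in line with the "hour or two" estimate quoted above, while the 64-bit space lasts several hundred thousand years at the same rate.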

> Without considering that container creation is an unprivileged
> operation I think it is pretty much nonsense.  Do I get to say I am any
> container I want?  That would seem to invalidate the concept of
> userspace setting a container id.

Ok, my impression was that we're dealing with a privileged application
as I alluded with the need to create a new CAP_AUDIT_CONTAINER_ID or
something...

> How does any of this interact with setns?  AKA entering a container?

You mean entering another namespace that might all be part of one
container?  Or an application attempting to enter the namespace of
another container?

> I will go as far as looking at patches.  If someone comes up with
> a mission statement about what they are actually trying to achieve and a
> mechanism that actually achieves that, and that allows for containers to
> nest we can talk about doing something like that.

I don't pretend these patches are anywhere near finished or ready for
upstream.

> But for right now I just hear proposals for things that make no sense
> and can not possibly work.  Not least because it will require modifying
> every program that creates a container and who knows how many of them
> there are.  Especially since you don't need to be root.  Modifying
> /usr/bin/unshare seems a little far out to me.

My understanding is that just spawning or changing namespace doesn't
imply spawning or changing containers.  I also don't necessarily assume
that creating a container is an atomic operation, though that concept
might make some sense to understand or predict the boundaries of
actions...

> Eric

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

[PATCH v5] livepatch: Prevent patch inconsistencies if the coming module notifier fails

2015-05-14 Thread Minfei Huang

From: Minfei Huang 

Previously registered patches can be applied once the corresponding
module is loaded. In general, the patch will do relocation (if necessary)
and obtain/verify function addresses before we start to enable the patch.

There are three different situations in which the coming module notifier
can fail:

1) relocations are not applied for some reason. In this case kallsyms
for module symbols is not called at all. The patch is not applied to the
module. If the user disables and enables the patch again, there is a
possible bug in klp_enable_func. If the user specified func->old_addr for
some function in the module (and he shouldn't do that, but nevertheless)
our warning would not catch it; ftrace will refuse to register the handler
because of the wrong address or will register the handler for the wrong
address.

2) relocations are applied successfully, but kallsyms lookup fails. In
this case func->old_addr can be correct for all previous lookups, 0 for
the current failed one, and "unspecified" for the rest. If we undergo the
same scenario as in 1, the behaviour differs for the three cases, but the
patch is not enabled anyway.

3) the object is initialized, but klp_enable_object fails in the
notifier due to a possible ftrace error. Since it is improbable that
ftrace would heal itself in the future, we would get those errors
every time the patch is enabled.

In order to fix the above situations, we can set obj->mod to NULL if the
coming module notifier fails.

Signed-off-by: Minfei Huang 
---
v4:
- remove the label "out" in function klp_init_object_loaded
v3:
- modify the code style
v2:
- add the error message to make it more friendly
- modify the commit log, based on the suggestion from mbe...@suse.cz
v1:
- modify the commit log, describe the issue more details
---
 kernel/livepatch/core.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 284e269..4a87765 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -883,7 +883,7 @@ int klp_register_patch(struct klp_patch *patch)
 }
 EXPORT_SYMBOL_GPL(klp_register_patch);
 
-static void klp_module_notify_coming(struct klp_patch *patch,
+static int klp_module_notify_coming(struct klp_patch *patch,
 struct klp_object *obj)
 {
struct module *pmod = patch->mod;
@@ -891,22 +891,23 @@ static void klp_module_notify_coming(struct klp_patch *patch,
int ret;
 
ret = klp_init_object_loaded(patch, obj);
-   if (ret)
-   goto err;
+   if (ret) {
+   pr_warn("failed to initialize patch '%s' for module '%s' (%d)\n",
+   pmod->name, mod->name, ret);
+   return ret;
+   }
 
if (patch->state == KLP_DISABLED)
-   return;
+   return 0;
 
pr_notice("applying patch '%s' to loading module '%s'\n",
  pmod->name, mod->name);
 
ret = klp_enable_object(obj);
-   if (!ret)
-   return;
-
-err:
-   pr_warn("failed to apply patch '%s' to module '%s' (%d)\n",
-   pmod->name, mod->name, ret);
+   if (ret)
+   pr_warn("failed to apply patch '%s' to module '%s' (%d)\n",
+   pmod->name, mod->name, ret);
+   return ret;
 }
 
 static void klp_module_notify_going(struct klp_patch *patch,
@@ -930,6 +931,7 @@ disabled:
 static int klp_module_notify(struct notifier_block *nb, unsigned long action,
 void *data)
 {
+   int ret;
struct module *mod = data;
struct klp_patch *patch;
struct klp_object *obj;
@@ -955,7 +957,12 @@ static int klp_module_notify(struct notifier_block *nb, unsigned long action,
 
if (action == MODULE_STATE_COMING) {
obj->mod = mod;
-   klp_module_notify_coming(patch, obj);
+   ret = klp_module_notify_coming(patch, obj);
+   if (ret) {
+   obj->mod = NULL;
+   pr_warn("patch '%s' is in an inconsistent state!\n",
+   patch->mod->name);
+   }
} else /* MODULE_STATE_GOING */
klp_module_notify_going(patch, obj);
 
-- 
2.2.2


Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

2015-05-14 Thread Linus Torvalds

On Thu, May 14, 2015 at 6:26 PM, Al Viro  wrote:
>
> Hold on.  Should
> stat("blah", )  => ENOENT, OK, let's create it
> mkdir("blah", 0)=> EEXIST, bugger, looks like a race
> stat("blah", )  => ENOENT, Whiskey, Tango, Foxtrot
> be possible?

No. What I described would not in any way change any of the above. I'm
not understanding what your point is.

The only difference - EVER - would be if you pass in the ICASE flag.
Nothing I suggested would change semantics without it (the _hash_
changes, but that doesn't change semantics, it's a purely internal
random number).

Now, *with* O_ICASE/AT_ICASE, semantics change. Obviously. At that
point the dentry lookup would match case-insensitively.

For example, let's say that you have a directory where you already
have both "Blah" and "blah", because you created them in a sane
environment. They'll be two different dentries (assuming they are
cached), but they'll have the same dentry hash.

Now, you open "blah" with O_ICASE, and the end result is that you
would randomly open one or the other (it would be the one you find
first on the hash chain). Tough. Mixing icase and case-sensitive is
by definition going to cause those kinds of issues.
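
As an illustration of the paragraph above, here is a minimal user-space sketch, not VFS code. It assumes a hash computed over the ASCII case-folded name, so "Blah" and "blah" land on the same chain, and a lookup that matches exactly unless an icase flag is passed. All toy_* names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cached dentry on one hash chain. */
struct toy_dentry {
    const char *name;
    struct toy_dentry *next;
};

/* Hash the case-folded name, so "Blah" and "blah" share a chain. */
static unsigned int toy_fold_hash(const char *name)
{
    unsigned int h = 5381;

    assert(name != NULL);
    for (; *name; name++)
        h = h * 33 + (unsigned char)tolower((unsigned char)*name);
    return h;
}

/* Compare two names, ignoring ASCII case only. */
static int toy_names_equal_icase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/*
 * Walk one chain. Without icase the match is exact; with icase the
 * first case-insensitive match on the chain wins.
 */
static struct toy_dentry *toy_lookup(struct toy_dentry *chain,
                                     const char *name, int icase)
{
    struct toy_dentry *d;

    for (d = chain; d; d = d->next) {
        if (icase ? toy_names_equal_icase(d->name, name)
                  : strcmp(d->name, name) == 0)
            return d;
    }
    return NULL;
}
```

With "Blah" ahead of "blah" on the chain, an exact lookup of "blah" still finds "blah", while an icase lookup returns whichever entry comes first, which is the "randomly open one or the other" behaviour described above.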

The nasty issue (and the case that samba apparently wants it for) is
that ICASE wouldn't be able to trust negative dentries (us having a
negative dentry in one case doesn't mean that it's negative in ICASE).
And that might be the killer part. Negative dentries are really
useful.

Now, the VFS layer support part is I think fairly simple. I might be
wrong, but I really think the hashing etc wouldn't be too painful.
After all, we already do support ->d_hash() and ->d_compare(), this is
"more of the same", just supported at a vfs level directly (and
_allowing_ aliases in case).

The real pain is that the low-level filesystem has to support it too.
That's simple for some filesystems, but it can be hard for things that
hash filenames. Because there - unlike at the VFS layer - the hashes
have meaning and you can't just change them to suit an ICASE lookup
(because they exist on-disk).

So supporting that is likely trivial on filesystems like FAT or SYSV,
which just iterate over the directory anyway at lookup() time. On ext*
with hashed directories, it's nasty (and an ICASE lookup would probably
have to just walk the whole directory, old-style). But I think all the
code to do the nonhashed lookup is still there, since it is a
filesystem feature bit. And it would only need to do that linear
search thing when the ICASE flag is set in the lookup flags.

Of course, if it ends up just walking the directory linearly anyway,
it doesn't fix the one samba performance problem that Jeremy pointed
out, so that makes this of dubious value. If we can't do this better
than samba can already do it on its own, it's kind of pointless.

Again - the filesystems (and the vfs layer) would remain case
sensitive. But I think it might be fairly straightforward to allow
per-operation ICASE handling for things that want it.

Keyword "think". Maybe there's something I didn't think of.

   Linus

Re: [GIT PULL 00/30] perf/core improvements and fixes

2015-05-14 Thread Namhyung Kim

Hi Arnaldo,

On Thu, May 14, 2015 at 10:18:27AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, May 14, 2015 at 05:23:30PM +0900, Namhyung Kim escreveu:
> > On Mon, May 11, 2015 at 11:06:26AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, May 11, 2015 at 02:09:39PM +0900, Namhyung Kim escreveu:
> > > > I'm seeing a segfault on 'perf report' with a large data file after
> > > > applying thread refcount change - it happens regardless of the atomic
> > > > operation.
> 
> > > Any specific 'perf record' command line? Does it take a long time to
> > > reproduce? Any backtraces? I'll try to repro, it's possible that we're
> > > doing one too many thread__put()...
>  
> > It's a kernel build with '-j 20' and recorded data size is ~2.1GB.
> > It takes ~30 sec to reproduce.
> > 
> >   $ perf report -i threaded/kbuild7.data --header-only
> >   # 
> >   # captured on: Thu Dec 18 12:06:35 2014
> >   # hostname : sejong
> >   # os release : 3.17.4-1-ARCH
> >   # perf version : 3.18.rc3.gcb4774b
> >   # arch : x86_64
> >   # nrcpus online : 12
> >   # nrcpus avail : 12
> >   # cpudesc : Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
> >   # cpuid : GenuineIntel,6,45,7
> >   # total memory : 24646828 kB
> >   # cmdline : /home/namhyung/project/linux/tools/perf/perf record -ag -o 
> > /home/namhyung/tmp/perf/threaded/kbuild7.data -- make -j20
> >   # event : name = cycles, , size = 104, { sample_period, sample_freq } = 
> > 4000, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD, disabled = 1, inherit
> >   # HEADER_CPU_TOPOLOGY info available, use -I to display
> >   # HEADER_NUMA_TOPOLOGY info available, use -I to display
> >   # pmu mappings: cpu = 4, software = 1, power = 24, uncore_pcu = 13, 
> > tracepoint = 2, uncore_imc_0 = 15, uncore_imc_1 = 16, uncore_imc_2 = 17, 
> > uncore_
> >   # 
> >   #
> > 
> > 
> >   $ perf data stat -i threaded/kbuild7.data
> > 
> >Total event stats for 'threaded/kbuild7.data' file:
> >   
> >  TOTAL events:   25126492
> >   MMAP events:114
> >   COMM events: 117957
> >   EXIT events: 240544
> >   THROTTLE events: 16
> > UNTHROTTLE events: 16
> >   FORK events: 120488
> > SAMPLE events:   23878219
> >  MMAP2 events: 745325
> > FINISHED_ROUND events:  23813
> >   
> >Sample event stats:
> >   
> >   20,579,564,471,104  cycles
> >   23,878,219  samples   #   sampling ratio  
> > 99.745% (3989/4000)
> > 
> >498.736917889 second time sampled
> > 
> > 
> >   $ perf report -i threaded/kbuild7.data
> 
> We need to improve this segfault backtrace, I have to always use
> addr2line to resolve those missing entries, i.e. if you try:
> 
> addr2line -fe /path/to/your/perf 0x4dd9c8
> addr2line -fe /path/to/your/perf 0x4e2580
> 
> We would have resolved those lines :-/

Right, I'll add it to my TODO list.

Anyway, this is a backtrace using gdb..

Thanks,
Namhyung

Program received signal SIGSEGV, Segmentation fault.
0x75fb229e in __strcmp_sse2_unaligned () from /usr/lib/libc.so.6
(gdb) bt
#0  0x75fb229e in __strcmp_sse2_unaligned () from /usr/lib/libc.so.6
#1  0x004d3948 in _sort__dso_cmp (map_r=, 
map_l=) at util/sort.c:142
#2  sort__dso_cmp (left=, right=) at 
util/sort.c:148
#3  0x004d7f08 in hist_entry__cmp (right=0x7fffc530, 
left=0x323a27f0) at util/hist.c:911
#4  add_hist_entry (sample_self=true, al=0x7fffc710, entry=0x7fffc530, 
hists=0x18f6690) at util/hist.c:389
#5  __hists__add_entry (hists=0x18f6690, al=0x7fffc710, 
sym_parent=, bi=bi@entry=0x0, mi=mi@entry=0x0, period=,
weight=0, transaction=0, sample_self=true) at util/hist.c:471
#6  0x004d8234 in iter_add_single_normal_entry (iter=0x7fffc740, 
al=) at util/hist.c:662
#7  0x004d8765 in hist_entry_iter__add (iter=0x7fffc740, 
al=0x7fffc710, evsel=0x18f6550, sample=,
max_stack_depth=, arg=0x7fffd0a0) at util/hist.c:871
#8  0x00436353 in process_sample_event (tool=0x7fffd0a0, 
event=, sample=0x7fffc870, evsel=0x18f6550,
machine=) at builtin-report.c:171
#9  0x004bbe23 in perf_evlist__deliver_sample (machine=0x18f4cc0, 
evsel=0x18f6550, sample=0x7fffc870, event=0x7fffe0bd3220,
tool=0x7fffd0a0, evlist=0x18f5b50) at util/session.c:972
#10 machines__deliver_event (machines=machines@entry=0x18f4cc0, 
evlist=, event=event@entry=0x7fffe0bd3220,
sample=sample@entry=0x7fffc870, tool=tool@entry=0x7fffd0a0, 
file_offset=file_offset@entry=1821434400) at util/session.c:1009
#11 0x004bc681 in perf_session__deliver_event (file_offset=1821434400, 
tool=0x7fffd0a0, sample=0x7fffc870, event=0x7fffe0bd3220,
session=) at util/session.c:1050
#12 ordered_events__deliver_event (oe=0x18f4e00, event=) at 
util/session.c:109
#13 0x004bf12b in __ordered_events__flush (oe=0x18f4e00) at

Re: [PATCH V2 3/4] watchdog: da9062: DA9062 watchdog driver

2015-05-14 Thread Guenter Roeck


On 05/14/2015 09:43 AM, S Twiss wrote:

From: S Twiss 

Add watchdog driver support for DA9062


Signed-off-by: Steve Twiss 

---

Changes in V2:
  - Removed informational dev_info() 'installed watchdog' message
  - Copyright headers GPL v2 (and later) match correct 'GPL' in MODULE_LICENSE
  - Removed the explicit 300 msecs delay from the reset_watchdog_timer()
function and replaced it with a variable delay (depending on the
difference since the last ping). A debug message is used to catch the
multiple pings trying to break the 300 msecs protection barrier.
  - Fix error paths for the functions da9062_wdt_update_timeout_register()
and da9062_wdt_stop()
  - Add error paths in the probe() and correctly clean-up the registered
device if there is a problem after registration.
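
The variable delay and the timeout selection described in the changelog can be sketched in user space roughly as follows. This is only an illustration: it works in milliseconds instead of jiffies, the toy_* names are hypothetical, and only the timeout table and the 300 ms figure are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RESET_PROTECTION_MS 300UL

/* Timeout table in seconds, copied from the patch; index 0 means disabled. */
static const unsigned int toy_wdt_timeout[] = { 0, 2, 4, 8, 16, 32, 65, 131 };
#define TOY_SEL_MIN 1U
#define TOY_SEL_MAX \
    ((unsigned int)(sizeof(toy_wdt_timeout) / sizeof(toy_wdt_timeout[0]) - 1))

/*
 * Milliseconds still to wait before the next ping is allowed, given the
 * time of the last ping and the current time (both free-running counters).
 */
static unsigned int toy_remaining_ms(unsigned long last_ping_ms,
                                     unsigned long now_ms)
{
    unsigned long deadline = last_ping_ms + TOY_RESET_PROTECTION_MS;

    /* Signed difference keeps the comparison correct across counter wrap. */
    if ((long)(now_ms - deadline) < 0)
        return (unsigned int)(deadline - now_ms);
    return 0;
}

/* Smallest table index whose timeout covers the request, else the maximum. */
static unsigned int toy_timeout_to_sel(unsigned int secs)
{
    unsigned int i;

    /* The table must have at least one usable (non-disabled) entry. */
    assert(TOY_SEL_MAX >= TOY_SEL_MIN);
    for (i = TOY_SEL_MIN; i <= TOY_SEL_MAX; i++) {
        if (toy_wdt_timeout[i] >= secs)
            return i;
    }
    return TOY_SEL_MAX;
}
```

A ping arriving 100 ms after the previous one would wait the remaining 200 ms, while one arriving after 300 ms or more would not wait at all.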

This patch applies against linux-next and v4.1-rc3



  drivers/watchdog/Kconfig  |   9 ++
  drivers/watchdog/Makefile |   1 +
  drivers/watchdog/da9062_wdt.c | 288 ++
  3 files changed, 298 insertions(+)
  create mode 100644 drivers/watchdog/da9062_wdt.c

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index e5e7c55..dfdb6c6 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -96,6 +96,15 @@ config DA9063_WATCHDOG

  This driver can be built as a module. The module name is da9063_wdt.

+config DA9062_WATCHDOG
+   tristate "Dialog DA9062 Watchdog"
+   depends on MFD_DA9062
+   select WATCHDOG_CORE
+   help
+ Support for the watchdog in the DA9062 PMIC.
+
+ This driver can be built as a module. The module name is da9062_wdt.
+
  config GPIO_WATCHDOG
tristate "Watchdog device controlled through GPIO-line"
depends on OF_GPIO
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index 5c19294..57ba815 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -179,6 +179,7 @@ obj-$(CONFIG_XEN_WDT) += xen_wdt.o
  # Architecture Independent
  obj-$(CONFIG_DA9052_WATCHDOG) += da9052_wdt.o
  obj-$(CONFIG_DA9055_WATCHDOG) += da9055_wdt.o
+obj-$(CONFIG_DA9062_WATCHDOG) += da9062_wdt.o
  obj-$(CONFIG_DA9063_WATCHDOG) += da9063_wdt.o
  obj-$(CONFIG_GPIO_WATCHDOG)   += gpio_wdt.o
  obj-$(CONFIG_WM831X_WATCHDOG) += wm831x_wdt.o
diff --git a/drivers/watchdog/da9062_wdt.c b/drivers/watchdog/da9062_wdt.c
new file mode 100644
index 000..9e6c93b
--- /dev/null
+++ b/drivers/watchdog/da9062_wdt.c
@@ -0,0 +1,288 @@
+/*
+ * da9062_wdt.c - WDT device driver for DA9062
+ * Copyright (C) 2015  Dialog Semiconductor Ltd.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static const unsigned int wdt_timeout[] = { 0, 2, 4, 8, 16, 32, 65, 131 };
+#define DA9062_TWDSCALE_DISABLE0
+#define DA9062_TWDSCALE_MIN1
+#define DA9062_TWDSCALE_MAX(ARRAY_SIZE(wdt_timeout) - 1)
+#define DA9062_WDT_MIN_TIMEOUT wdt_timeout[DA9062_TWDSCALE_MIN]
+#define DA9062_WDT_MAX_TIMEOUT wdt_timeout[DA9062_TWDSCALE_MAX]
+#define DA9062_WDG_DEFAULT_TIMEOUT wdt_timeout[DA9062_TWDSCALE_MAX-1]
+#define DA9062_RESET_PROTECTION_MS 300
+
+struct da9062_watchdog {
+   struct da9062 *hw;
+   struct watchdog_device wdtdev;
+   unsigned long j_time_stamp;
+};
+
+static void da9062_set_window_start(struct da9062_watchdog *wdt)
+{
+   wdt->j_time_stamp = jiffies;
+}
+
+static void da9062_apply_window_protection(struct da9062_watchdog *wdt)
+{
+   unsigned long delay = msecs_to_jiffies(DA9062_RESET_PROTECTION_MS);
+   unsigned long timeout = wdt->j_time_stamp + delay;
+   unsigned long now = jiffies;
+   unsigned int diff_ms;
+
+   /* if time-limit has not elapsed then wait for remainder */
+   if (time_before(now, timeout)) {
+   diff_ms = jiffies_to_msecs(timeout-now);
+   dev_dbg(wdt->hw->dev,
+   "Kicked too quickly. Delaying %u msecs\n", diff_ms);
+   msleep(diff_ms);
+   }
+
+   return;


Unnecessary return statement.


+}
+
+static unsigned int da9062_wdt_timeout_to_sel(unsigned int secs)
+{
+   unsigned int i;
+
+   for (i = DA9062_TWDSCALE_MIN; i <= DA9062_TWDSCALE_MAX; i++) {
+   if (wdt_timeout[i] >= secs)
+   return i;
+   }
+
+   return DA9062_TWDSCALE_MAX;
+}
+
+static int

Re: [PATCH 2/2] clk: divider: fix to set parent rate from CLK_DIVIDER_READ_ONLY flag

2015-05-14 Thread Joonyoung Shim

Hi Michael,

On 05/13/2015 08:57 AM, Stephen Boyd wrote:
> On 05/12, Michael Turquette wrote:
>> Quoting Joonyoung Shim (2015-04-07 00:46:46)
>>> The round_rate callback always returns the same parent clk rate for a
>>> divider with the CLK_DIVIDER_READ_ONLY flag. If CLK_SET_RATE_PARENT is
>>> used together with CLK_DIVIDER_READ_ONLY, the parent clk rate can never
>>> be changed.
>>>
>>> For this case, this patch allows the parent clk rate to be changed.
>>>
>>> Signed-off-by: Joonyoung Shim 
>>> ---
>>>  drivers/clk/clk-divider.c | 5 +
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
>>> index ce34d29a..37e285e 100644
>>> --- a/drivers/clk/clk-divider.c
>>> +++ b/drivers/clk/clk-divider.c
>>> @@ -352,6 +352,11 @@ static long clk_divider_round_rate(struct clk_hw *hw, 
>>> unsigned long rate,
>>> bestdiv = readl(divider->reg) >> divider->shift;
>>> bestdiv &= div_mask(divider->width);
>>> bestdiv = _get_div(divider->table, bestdiv, divider->flags);
>>> +
>>> +   if ((__clk_get_flags(hw->clk) & CLK_SET_RATE_PARENT))
>>> +   *prate = __clk_round_rate(__clk_get_parent(hw->clk),
>>> + rate);
>>> +
>>> return DIV_ROUND_UP(*prate, bestdiv);
>>> }
>>>  
>>> -- 
>>> 1.9.1
>>>
>>
>> Hello Joonyoung Shim,
>>
>> Thanks for reporting the bug and providing a fix!
>>
>> I've come up with an alternative solution to this. This patch should
>> replace both of your patches. Can you test this and see if it fixes the
>> problem for you?
>>

Yes, it works.

>> Thanks,
>> Mike
>>
>>
>>
>> From 655dddad2700a30aaa397cd804422e0d9195efad Mon Sep 17 00:00:00 2001
>> From: Michael Turquette 
>> Date: Tue, 12 May 2015 16:13:46 -0700
>> Subject: [PATCH] clk: divider: support read-only dividers
>>
>> An arbitrary clock rate divider may be set out of reset, or perhaps by
>> the bootloader or something other than Linux. In these cases we may want
>> to know the frequency of the clock signal, but we do not want to allow
>> Linux to change it.
>>
>> The CLK_DIVIDER_READ_ONLY flag was intended to express this, but the
>> functionality was missing in the code. Add read-only clk_ops for divider
>> clocks to handle this case.
>>
>> For hardware with fixed dividers it is still best to use the
>> fixed-factor clock type.
>>
>> Reported-by: Joonyoung Shim 
>> Signed-off-by: Michael Turquette 
>> ---
>>  drivers/clk/clk-divider.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
>> index 25006a8..5d2de26 100644
>> --- a/drivers/clk/clk-divider.c
>> +++ b/drivers/clk/clk-divider.c
>> @@ -412,6 +412,11 @@ const struct clk_ops clk_divider_ops = {
>>  };
>>  EXPORT_SYMBOL_GPL(clk_divider_ops);
>>  
>> +const struct clk_ops clk_divider_ro_ops = {
>> +.recalc_rate = clk_divider_recalc_rate,
>> +};
>> +EXPORT_SYMBOL_GPL(clk_divider_ro_ops);
>> +
>>  static struct clk *_register_divider(struct device *dev, const char *name,
>>  const char *parent_name, unsigned long flags,
>>  void __iomem *reg, u8 shift, u8 width,
>> @@ -437,7 +442,10 @@ static struct clk *_register_divider(struct device 
>> *dev, const char *name,
>>  }
>>  
>>  init.name = name;
>> -init.ops = &clk_divider_ops;
>> +if (clk_divider_flags & CLK_DIVIDER_READ_ONLY)
>> +init.ops = &clk_divider_ro_ops;
>> +else
>> +init.ops = &clk_divider_ops;
>>  init.flags = flags | CLK_IS_BASIC;
>>  init.parent_names = (parent_name ? &parent_name: NULL);
>>  init.num_parents = (parent_name ? 1 : 0);
>> -- 
> 
> Isn't this sort of reverting commit e6d5e7d90be9 (clk-divider:
> Fix READ_ONLY when divider > 1, 2014-11-14)? 
> 

So i had abandoned to retry commit e6d5e7d90be9.

Thanks.

Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

2015-05-14 Thread Richard Guy Briggs

On 15/05/14, Oren Laadan wrote:
> On Thu, May 14, 2015 at 8:48 PM, Richard Guy Briggs  wrote:
> 
> >
> > > > > Recording each instance of a name space is giving me something that I
> > > > > cannot use to do queries required by the security target. Given these
> > > > > events, how do I locate a web server event where it accesses a
> > watched
> > > > > file? That authentication failed? That an update within the container
> > > > > failed?
> > > > >
> > > > > The requirements are that we have to log the creation, suspension,
> > > > > migration, and termination of a container. The requirements are not
> > on
> > > > > the individual name space.
> > > >
> > > > Ok.  Do we have a robust definition of a container?
> > >
> > > We call the combination of name spaces, cgroups, and seccomp rules a
> > > container.
> >
> > Can you detail what information is required from each?
> >
> > > > Where is that definition managed?
> > >
> > > In the thing that invokes a container.
> >
> > I was looking for a reference to a standards document rather than an
> > application...
> >
> >
> [focusing on "containers id" - snipped the rest away]
> 
> I am unfamiliar with the audit subsystem, but work with namespaces in other
> contexts. Perhaps the term "container" is overloaded here. The definition
> suggested by Steve in this thread makes sense to me: "a combination of
> namespaces". I imagine people may want to audit subsets of namespaces.

I assume it would be a bit more than that, including cgroup and seccomp info.

> For namespaces, can use a string like "A:B:C:D:E:F" as an identifier for a
> particular combination, where A-F are respective namespaces identifiers.
> (Can be taken for example from /proc/PID/ns/{mnt,uts,ipc,user,pid,net}).
>  That will even be grep-able to locate records related to a particular
> subset
> of namespaces. So a "container" in the classic meaning would have all A-F
> unique and different from the init process, but processes separated only by
> e.g. mnt-ns and net-ns will differ from the init process in  A and F.
> 
> (If a string is a no go, then perhaps combine the IDs in a unique way into a
> super ID).

I'd be fine with either, even including the nsfs deviceID.
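
For illustration only, the "A:B:C:D:E:F" string Oren describes could be assembled in user space along these lines. The sketch assumes the inode number of each /proc/PID/ns/* entry serves as the namespace identifier; the toy_* names are hypothetical, and nothing here is a proposed kernel or audit interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define TOY_NS_COUNT 6

/* Order follows the list quoted above: mnt, uts, ipc, user, pid, net. */
static const char *const toy_ns_names[TOY_NS_COUNT] = {
    "mnt", "uts", "ipc", "user", "pid", "net"
};

/*
 * Join six namespace IDs as "A:B:C:D:E:F".
 * Returns 0 on success, -1 if buf is too small.
 */
static int toy_format_ns_id(const unsigned long ids[TOY_NS_COUNT],
                            char *buf, size_t len)
{
    size_t used = 0;
    int i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < TOY_NS_COUNT; i++) {
        int n = snprintf(buf + used, len - used, "%s%lu",
                         i ? ":" : "", ids[i]);

        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}

/*
 * Read the inode numbers of /proc/<pid>/ns/<name>; pid_str may be "self".
 * Returns 0 on success, -1 if any entry cannot be stat()ed.
 */
static int toy_read_ns_ids(const char *pid_str,
                           unsigned long ids[TOY_NS_COUNT])
{
    char path[64];
    struct stat st;
    int i;

    for (i = 0; i < TOY_NS_COUNT; i++) {
        snprintf(path, sizeof(path), "/proc/%s/ns/%s",
                 pid_str, toy_ns_names[i]);
        if (stat(path, &st) != 0)
            return -1;
        ids[i] = (unsigned long)st.st_ino;
    }
    return 0;
}
```

The device ID mentioned in the reply would be st.st_dev from the same stat() call, if it were to be included.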

> Oren.

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red 
Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

Re: [PATCH v9 00/17] Tegra124 CL-DVFS / DFLL clocksource + cpufreq

2015-05-14 Thread Viresh Kumar

On 15 May 2015 at 01:45, Rafael J. Wysocki  wrote:
> You need ACKs from Viresh for those two, then.  He's officially responsible
> for ARM cpufreq drivers.

I thought an Ack for 14th is enough :)

For: 12/13/14.
Acked-by: Viresh Kumar 

Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

2015-05-14 Thread Richard Guy Briggs

On 15/05/14, Eric W. Biederman wrote:
> Steve Grubb  writes:
> > On Tuesday, May 12, 2015 03:57:59 PM Richard Guy Briggs wrote:
> >> On 15/05/05, Steve Grubb wrote:
> >> > I think there needs to be some more discussion around this. It seems like
> >> > this is not exactly recording things that are useful for audit.
> >> 
> >> It seems to me that either audit has to assemble that information, or
> >> the kernel has to do so.  The kernel doesn't know about containers
> >> (yet?).
> >
> > Auditing is something that has a lot of requirements imposed on it by 
> > security 
> > standards. There was no requirement to have an auid until audit came along 
> > and 
> > said that uid is not good enough to know who is issuing commands because of 
> > su 
> > or sudo. There was no requirement for sessionid until we had to track each 
> > action back to a login so we could see if the login came from the expected 
> > place. 
> 
> Stop right there.
> 
> You want a global identifier in a realm where only relative identifiers
> exist, and make sense.

I am assuming he wants an identifier unique per container on one kernel
and what happens on other kernels is a matter for a management
application to take care of.  This kernel doesn't have to deal with it
other than taking information from a container management application.

> I am sorry that isn't going to happen. EVER.
> 
> Square peg, round hole.  It doesn't work, it doesn't make sense, and
> most especially it doesn't allow anyone to reconstruct anything, because
> it does not make sense and does not match what the kernel is doing.
> 
> Container IDs do not, and will not exist.  There is probably something
> reasonable in your request but until you stop talking that nonsense I
> can't see it.

I didn't see anything in any of what Steve said that suggested it was to
be unique beyond that one kernel.

> Global IDs take us into the namespace of namespaces problem and that
> isn't going to happen.  I have already bent as far in this direction as
> I can go.  Further namespace creation is not a privileged event which
> makes the request for a container ID make even less sense.  With
> anyone able to create whatever they want it will not be an identifier
> that makes any sense to someone reading an audit log.

Again, I assume this is up to a container management application that
will manage its pool of container hosts and an audit aggregator.

You keep raising an objection about the unworkability of a "namespace of
namespaces".  Just so we are all on the same page here, can you explain
exactly what you mean with "namespace of namespaces"?

> Eric

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red 
Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

Re: [PATCH 1/2] clk: divider: don't set_rate with CLK_DIVIDER_READ_ONLY flag

2015-05-14 Thread Joonyoung Shim

Hi Stephen,

On 05/13/2015 08:59 AM, Stephen Boyd wrote:
> On 04/07, Joonyoung Shim wrote:
>> Even with the CLK_DIVIDER_READ_ONLY flag, the divider setting can be
>> changed by the set_rate callback. Don't change the divider setting from
>> the set_rate callback of a divider with the CLK_DIVIDER_READ_ONLY flag.
>>
>> Signed-off-by: Joonyoung Shim 
>> ---
> 
> Is the rate actually changing? Or is it just a problem that we
> may be writing the register to the same value it already is?
> 

If rate and parent_rate are different, it can write a different value to
the register. Even if the value is the same, I think it's unnecessary
to rewrite the register.
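
To make that concrete, here is a simplified user-space sketch of how a plain one-based divider field could be derived from rate and parent_rate. It is not the clk-divider code: it ignores divider tables and flags, and the toy_* names are hypothetical.

```c
#include <assert.h>

/* Round-up division, in the spirit of the kernel's DIV_ROUND_UP. */
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Field value for a plain one-based divider: the register stores div - 1,
 * clamped to what fits in `width` bits.
 */
static unsigned int toy_divider_val(unsigned long rate,
                                    unsigned long parent_rate,
                                    unsigned int width)
{
    unsigned long max_val = (1UL << width) - 1;
    unsigned long div;

    assert(rate > 0 && width > 0 && width < 32);
    div = TOY_DIV_ROUND_UP(parent_rate, rate);
    if (div < 1)
        div = 1;
    if (div - 1 > max_val)
        div = max_val + 1;
    return (unsigned int)(div - 1);
}

/* Returns 1 if an unconditional set_rate would change the register field. */
static int toy_set_rate_would_change(unsigned int current_val,
                                     unsigned long rate,
                                     unsigned long parent_rate,
                                     unsigned int width)
{
    return toy_divider_val(rate, parent_rate, width) != current_val;
}
```

With the field currently holding 3 (divide by 4) and a parent at 100, a request for rate 25 maps to the same value, while a request for rate 50 maps to 1 and would rewrite the field of a divider that is supposed to be read-only.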

Thanks.

Re: [PATCH] block:Add proper error handling to the function, disk_add_events

2015-05-14 Thread Jens Axboe


On 05/14/2015 09:39 PM, Nicholas Krause wrote:



On May 14, 2015 9:22:22 PM EDT, Jens Axboe  wrote:

On 05/14/2015 07:57 PM, Nicholas Krause wrote:

This adds the required error checking to the function disk_add_events:
when there are no disk events, it returns the error code -EBUSY. It
also adds error checking for the call to sysfs_create_files by storing
that function's return value in a newly declared variable, ret, which
is returned at the end of the function body to tell the caller whether
the function succeeded.

Signed-off-by: Nicholas Krause 
---
   block/genhd.c | 13 +
   1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 0a536dc..3eb7ee9 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1803,15 +1803,19 @@ static void disk_alloc_events(struct gendisk

*disk)

disk->ev = ev;
   }

-static void disk_add_events(struct gendisk *disk)
+static int disk_add_events(struct gendisk *disk)
   {
+   int ret = 0;
if (!disk->ev)
-   return;
+   return -EBUSY;
+
+   ret = sysfs_create_files(_to_dev(disk)->kobj,

disk_events_attrs)


-   /* FIXME: error handling */
-   if (sysfs_create_files(_to_dev(disk)->kobj, disk_events_attrs)

< 0)

+   if (!ret) {
pr_warn("%s: failed to create sysfs files for events\n",
disk->disk_name);
+   return ret;
+   }


You didn't even test this, obviously.

It builds on my system.  I can't see anything wrong with it, please explain.


The fact that it compiles does not mean that it has been tested.
And it's definitely broken, as a test boot would have revealed.
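
The reply does not spell out the breakage. One possible reading, offered here as an assumption, is the sense of the new check: the removed lines tested the create call with "< 0", which implies 0 on success and a negative value on failure, whereas "if (!ret)" is taken when ret is 0. A user-space sketch with a hypothetical stand-in, toy_create_files(), shows the difference:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in: 0 on success, negative errno on failure. */
static int toy_create_files(int should_fail)
{
    return should_fail ? -ENOMEM : 0;
}

/* Check as written in the patch: the warning branch runs when ret == 0. */
static int toy_patch_check_warns(int ret)
{
    return !ret;
}

/* Conventional check: the warning branch runs when ret is an error. */
static int toy_conventional_check_warns(int ret)
{
    return ret < 0;
}
```

Under that assumption the patched check warns on every successful call and stays silent on a real failure, which is the opposite of the check it replaced.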


--
Jens Axboe


Re: [PATCH v4] livepatch: Prevent to apply the patch once coming module notifier fails

2015-05-14 Thread Minfei Huang

On 05/14/15 at 05:05pm, Miroslav Benes wrote:
> 
> Hi,
> 
> I have few nitpicks...
> 
> The subject is slightly misleading. We still apply the patch (or the patch 
> is already applied to be precise). Only the coming module is not patched 
> and won't be patched. So I propose something like
> 
> livepatch: prevent patch inconsistencies if the coming module notifier fails 
> 
> (or bugs, corruptions, whatever).

Will do.

> 
> On Thu, 14 May 2015, Minfei Huang wrote:
> 
> > The previous patches can be applied, once the corresponding module is
> > loaded. In general, the patch will do relocation (if necessary) and
> > obtain/verify function address before we start to enable patch.
> > 
> > There are three different situations in which the coming module notifier
> > can fail:
> > 
> > 1) relocations are not applied for some reason. In this case kallsyms
> > for module symbol is not called at all. The patch is not applied to the
> > module. If the user disable and enable patch again, there is possible
> > bug in klp_enable_func. If the user specified func->old_addr for some
> > function in the module (and he shouldn't do that, but nevertheless) our
> > warning would not catch it, there will be something wrong with the
> > ftrace.
> 
> ", there will be something wrong with the ftrace."
> 
> I would improve that...
> 
> ", ftrace will reject to register the handler because of wrong address or 
> will register the handler for wrong address." But feel free to change it 
> according to your view. Just be more specific than the changelog is right
> now.
> 

Thanks


> > 2) relocations are applied successfully, but kallsyms lookup fails. In
> > this case func->old_addr can be correct for all previous lookups, 0 for
> > current failed one, and "unspecified" for the rest. If we undergo the
> > same scenario as in 1, the behaviour differs for three cases, but the
> > patch is not enable anyway.
> 
> s/enable/enabled/

Will correct it.
> 
> But I think it would be nice to describe different behaviours for the sake 
> of the changelog. I don't have strong opinion about this though.
> 

Thanks
Minfei


> > 3) the object is initialized, but klp_enable_object fails in the
> > notifier due to possible ftrace error. Since it is improbable that
> > ftrace would heal itself in the future, we would get those errors
> > every time the patch is enabled.
> > 
> > In order to fix the above situations, we can set obj->mod to NULL if the
> > coming module notifier fails.
> > 
> > Signed-off-by: Minfei Huang 
> > ---
> > v3:
> > - modify the code style
> > v2:
> > - add the error message to make it more friendly
> > - modify the commit log, base on the mbe...@suse.cz suggesting
> > v1:
> > - modify the commit log, describe the issue more details
> > ---
> >  kernel/livepatch/core.c | 30 +++---
> >  1 file changed, 19 insertions(+), 11 deletions(-)
> > 
> > diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> > index 284e269..d4603e7 100644
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -883,7 +883,7 @@ int klp_register_patch(struct klp_patch *patch)
> >  }
> >  EXPORT_SYMBOL_GPL(klp_register_patch);
> >  
> > -static void klp_module_notify_coming(struct klp_patch *patch,
> > +static int klp_module_notify_coming(struct klp_patch *patch,
> >  struct klp_object *obj)
> >  {
> > struct module *pmod = patch->mod;
> > @@ -891,22 +891,24 @@ static void klp_module_notify_coming(struct klp_patch 
> > *patch,
> > int ret;
> >  
> > ret = klp_init_object_loaded(patch, obj);
> > -   if (ret)
> > -   goto err;
> > +   if (ret) {
> > +   pr_warn("failed to initialize patch '%s' for module '%s' 
> > (%d)\n",
> > +   pmod->name, mod->name, ret);
> > +   goto out;
> > +   }
> >  
> > if (patch->state == KLP_DISABLED)
> > -   return;
> > +   goto out;
> >  
> > pr_notice("applying patch '%s' to loading module '%s'\n",
> >   pmod->name, mod->name);
> >  
> > ret = klp_enable_object(obj);
> > -   if (!ret)
> > -   return;
> > -
> > -err:
> > -   pr_warn("failed to apply patch '%s' to module '%s' (%d)\n",
> > -   pmod->name, mod->name, ret);
> > +   if (ret)
> > +   pr_warn("failed to apply patch '%s' to module '%s' (%d)\n",
> > +   pmod->name, mod->name, ret);
> > +out:
> > +   return ret;
> >  }
> >  
> >  static void klp_module_notify_going(struct klp_patch *patch,
> > @@ -930,6 +932,7 @@ disabled:
> >  static int klp_module_notify(struct notifier_block *nb, unsigned long 
> > action,
> >  void *data)
> >  {
> > +   int ret;
> > struct module *mod = data;
> > struct klp_patch *patch;
> > struct klp_object *obj;
> > @@ -955,7 +958,12 @@ static int klp_module_notify(struct notifier_block 
> > *nb, unsigned long action,
> >  
> > if (action == MODULE_STATE_COMING) {
> >

Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

2015-05-14 Thread Eric W. Biederman

Paul Moore  writes:
> As Eric, and others, have stated, the container concept is a userspace idea, 
> not a kernel idea; the kernel only knows, and cares about, namespaces.  This 
> is unlikely to change.
>
> However, as Steve points out, there is precedent for the kernel to record 
> userspace tokens for the sake of audit.  Personally I'm not a big fan of this 
> in general, but I do recognize that it does satisfy a legitimate need.  Think 
> of things like auid and the sessionid as necessary evils; audit is already 
> chock full of evilness I doubt one more will doom us all to hell.
>
> Moving forward, I'd like to see the following:

> * Create a container ID token (unsigned 32-bit integer?), similar to 
> auid/sessionid, that is set by userspace and carried by the kernel to be used 
> in audit records.  I'd like to see some discussion on how we manage this,
> e.g. how do we handle container ID inheritance, how do we handle nested containers 
> (setting the containerid when it is already set), do we care if multiple 
> different containers share the same namespace config, etc.?

> Can we all live with this?  If not, please suggest some alternate ideas; 
> simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be true, 
> but it doesn't help us solve the problem ;)

Without stopping and defining what someone means by container I think it
is pretty much nonsense.

Should every vsftp connection get a container?  Every chrome tab?

At some of the connections per second numbers I have seen we might
exhaust a 32bit number in an hour or two.  Will any of that make sense
to someone reading the audit logs?
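
As a rough check on that estimate: a 32-bit ID has 2^32 distinct values, so using them all up in one hour takes about 1.2 million new containers per second, and in two hours about 600 thousand per second. A small sketch of the arithmetic, with hypothetical toy_* names:

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct values of an unsigned 32-bit ID. */
#define TOY_ID_SPACE (UINT64_C(1) << 32)

/*
 * Sustained creations per second (rounded down) that would use up the
 * whole ID space in `seconds`.
 */
static uint64_t toy_rate_to_exhaust(uint64_t seconds)
{
    assert(seconds > 0);
    return TOY_ID_SPACE / seconds;
}
```

toy_rate_to_exhaust(3600) gives 1193046 and toy_rate_to_exhaust(7200) gives 596523.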

Without considering that container creation is an unprivileged
operation I think it is pretty much nonsense.  Do I get to say I am any
container I want?  That would seem to invalidate the concept of
userspace setting a container id.

How does any of this interact with setns?  AKA entering a container?

I will go as far as looking at patches.  If someone comes up with
a mission statement about what they are actually trying to achieve and a
mechanism that actually achieves that, and that allows for containers to
nest we can talk about doing something like that.

But for right now I just hear proposals for things that make no sense
and can not possibly work.  Not least because it will require modifying
every program that creates a container and who knows how many of them
there are.  Especially since you don't need to be root.  Modifying
/usr/bin/unshare seems a little far out to me.

Eric
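
The wrap-around estimate above can be checked with simple arithmetic. The
following is only a sketch, assuming one ID is consumed per connection and
never reused; the helper names are illustrative and do not come from the
audit code. A 32-bit counter wraps in one hour at roughly 1.2 million
allocations per second, and in two hours at roughly 600 thousand per second.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct values a 32-bit ID can take. */
#define ID_SPACE (UINT64_C(1) << 32)

/* Seconds until a 32-bit counter wraps at a steady allocation rate. */
static uint64_t seconds_to_exhaust(uint64_t ids_per_second)
{
	if (ids_per_second == 0)
		return UINT64_MAX;	/* never wraps */
	return ID_SPACE / ids_per_second;
}

/* Allocation rate needed to wrap within 'seconds' (rounded up). */
static uint64_t rate_to_exhaust_in(uint64_t seconds)
{
	assert(seconds != 0);
	return (ID_SPACE + seconds - 1) / seconds;
}
```

For example, rate_to_exhaust_in(3600) gives the per-second rate that uses up
the whole ID space in one hour.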


[PATCH v2] MIPS64: Support of at least 48 bits of SEGBITS

2015-05-14 Thread Leonid Yegoshin

SEGBITS default is 40 bits or less, depending on the CPU type.
This patch introduces support for 48 bits of application virtual address
space (SEGBITS).
It is defined only for 16K and 64K pages and is optional (configurable).

Penalty: a small number of additional pages for generic (small) applications.
For 64K pages it also adds a 3rd level of PTE structure, which has a small
impact during software TLB refill.

This patch is needed because MIPS I6XXX and P6XXX cores have 48 bits of
virtual address in each segment (SEGBITS).

Signed-off-by: Leonid Yegoshin 
---
V2: Added correction for definition of TASK_SIZE64
---
 arch/mips/Kconfig  |   11 +++
 arch/mips/include/asm/pgtable-64.h |   18 +++---
 arch/mips/include/asm/processor.h  |6 +-
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 76efb02ae99f..3acff2f065e9 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2032,6 +2032,17 @@ config PAGE_SIZE_64KB
 
 endchoice
 
+config 48VMBITS
+   bool "48 bits virtual memory"
+   depends on PAGE_SIZE_16KB || PAGE_SIZE_64KB
+   depends on 64BIT
+   help
+ Define a maximum at least 48 bits of application virtual memory.
+ Default is 40 bits or less, depending from CPU.
+ In generic (small) application it is a small set of pages increase
+ in page tables.
+ If unsure, say N.
+
 config FORCE_MAX_ZONEORDER
int "Maximum zone order"
range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
diff --git a/arch/mips/include/asm/pgtable-64.h b/arch/mips/include/asm/pgtable-64.h
index cf661a2fb141..c6b5473440e6 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -17,7 +17,7 @@
 #include 
 #include 
 
-#ifdef CONFIG_PAGE_SIZE_64KB
+#if defined(CONFIG_PAGE_SIZE_64KB) && !defined(CONFIG_48VMBITS)
 #include 
 #else
 #include 
@@ -90,7 +90,11 @@
 #define PTE_ORDER  0
 #endif
 #ifdef CONFIG_PAGE_SIZE_16KB
-#define PGD_ORDER  0
+#ifdef CONFIG_48VMBITS
+#define PGD_ORDER   1
+#else
+#define PGD_ORDER   0
+#endif
 #define PUD_ORDER  ai_attempt_to_allocate_pud
 #define PMD_ORDER  0
 #define PTE_ORDER  0
@@ -104,7 +108,11 @@
 #ifdef CONFIG_PAGE_SIZE_64KB
 #define PGD_ORDER  0
 #define PUD_ORDER  ai_attempt_to_allocate_pud
+#ifdef CONFIG_48VMBITS
+#define PMD_ORDER  0
+#else
 #define PMD_ORDER  ai_attempt_to_allocate_pmd
+#endif
 #define PTE_ORDER  0
 #endif
 
@@ -114,11 +122,7 @@
 #endif
 #define PTRS_PER_PTE   ((PAGE_SIZE << PTE_ORDER) / sizeof(pte_t))
 
-#if PGDIR_SIZE >= TASK_SIZE64
-#define USER_PTRS_PER_PGD  (1)
-#else
-#define USER_PTRS_PER_PGD  (TASK_SIZE64 / PGDIR_SIZE)
-#endif
+#define USER_PTRS_PER_PGD   ((TASK_SIZE64 / PGDIR_SIZE)?(TASK_SIZE64 / PGDIR_SIZE):1)
 #define FIRST_USER_ADDRESS 0UL
 
 /*
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 9b3b48e21c22..bd2030f32ea4 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -69,7 +69,11 @@ extern unsigned int vced_count, vcei_count;
  * 8192EB ...
  */
 #define TASK_SIZE32 0x7fff8000UL
-#define TASK_SIZE64 0x10000000000UL
+#ifdef CONFIG_48VMBITS
+#define TASK_SIZE64 (0x1UL << ((cpu_data[0].vmbits>48)?48:cpu_data[0].vmbits))
+#else
+#define TASK_SIZE64 (0x10000000000UL)
+#endif
 #define TASK_SIZE (test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE64)
 #define STACK_TOP_MAX  TASK_SIZE64
 

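The two computations this patch changes can be tried outside the kernel.
This is a minimal user-space sketch, not the kernel code: task_size64() and
user_ptrs_per_pgd() are hypothetical stand-ins for the TASK_SIZE64 and
USER_PTRS_PER_PGD macros, and vmbits is passed in rather than read from
cpu_data[0]. It also suggests why the "#if PGDIR_SIZE >= TASK_SIZE64" test
is presumably replaced by a ternary: once TASK_SIZE64 depends on a run-time
value, the preprocessor can no longer evaluate it.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on user virtual address bits, as in the patch. */
#define MAX_USER_VMBITS 48

/* User address space size: 2^vmbits, clamped to 2^48. */
static uint64_t task_size64(unsigned int vmbits)
{
	unsigned int bits = vmbits > MAX_USER_VMBITS ? MAX_USER_VMBITS : vmbits;

	return UINT64_C(1) << bits;
}

/* PGD entries covering user space; at least one even if a single
 * PGD entry already spans more than the whole user address space. */
static uint64_t user_ptrs_per_pgd(uint64_t task_size, uint64_t pgdir_size)
{
	uint64_t n;

	assert(pgdir_size != 0);
	n = task_size / pgdir_size;
	return n ? n : 1;
}
```

With vmbits of 40 this yields the previous hard-wired 1TB limit; any value
above 48 is capped at 256TB.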

Re: [PATCH] ppc64 ftrace: mark data_access callees "notrace" (pt.1)

2015-05-14 Thread Michael Ellerman

On Wed, 2015-05-13 at 18:11 +0200, Torsten Duwe wrote:
> In order to avoid an endless recursion, functions that may get
> called from the data access handler must not call into tracing
> functions, which may cause data access faults ;-)
> 
> Advancing from my previous approach that lavishly compiled whole
> subdirs without the profiling switches, this is more fine-grained
> (but probably yet incomplete). This patch is necessary albeit not
> sufficient for FTRACE_WITH_REGS on ppc64.

There's got to be a better solution than this. The chance that you've correctly
annotated every function is basically 0, and the chance that we correctly add
it to every new or modified function in the future is also 0.

I don't mean that as a criticism of you, but rather the technique. For starters
I don't see any annotations in 32-bit code, or in the BookE code etc.

Can you give us more details on what goes wrong without these annotations?

cheers


Re: [PATCH v4 2/5] arm64: hi6220: Document devicetree bindings for Hisilicon hi6220 SoC

2015-05-14 Thread Bintian


Hello Stephen,

On 2015/5/15 8:27, Stephen Boyd wrote:

On 05/05, Bintian Wang wrote:

This patch adds documentation for the devicetree bindings used by the
DT files of Hisilicon hi6220 SoC mobile platform.

Signed-off-by: Bintian Wang 


Acked-by: Stephen Boyd 


Thanks for your ACK!

Bintian


Re: [PATCH v4] livepatch: Prevent to apply the patch once coming module notifier fails

2015-05-14 Thread Minfei Huang

On 05/14/15 at 09:30am, Josh Poimboeuf wrote:
> On Thu, May 14, 2015 at 09:51:07AM +0800, Minfei Huang wrote:
> > @@ -891,22 +891,24 @@ static void klp_module_notify_coming(struct klp_patch *patch,
> > int ret;
> >  
> > ret = klp_init_object_loaded(patch, obj);
> > -   if (ret)
> > -   goto err;
> > +   if (ret) {
> > +   pr_warn("failed to initialize patch '%s' for module '%s' 
> > (%d)\n",
> > +   pmod->name, mod->name, ret);
> > +   goto out;
> > +   }
> >  
> > if (patch->state == KLP_DISABLED)
> > -   return;
> > +   goto out;
> >  
> > pr_notice("applying patch '%s' to loading module '%s'\n",
> >   pmod->name, mod->name);
> >  
> > ret = klp_enable_object(obj);
> > -   if (!ret)
> > -   return;
> > -
> > -err:
> > -   pr_warn("failed to apply patch '%s' to module '%s' (%d)\n",
> > -   pmod->name, mod->name, ret);
> > +   if (ret)
> > +   pr_warn("failed to apply patch '%s' to module '%s' (%d)\n",
> > +   pmod->name, mod->name, ret);
> > +out:
> > +   return ret;
> 
> One more minor comment: the out label isn't needed.  Instead of "goto
> out", they can just return directly.

Ok, I will remove the label "out" in the next version.

Thanks
Minfei

> 
> Other than that, it looks good to me.
> 
> Thanks!
> 
> -- 
> Josh
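
The suggestion to drop the "out" label can be sketched as follows. This is a
stand-alone illustration with stub types, not the livepatch code: struct
fake_obj and notify_coming() are hypothetical, and the two stored return
codes stand in for the results of klp_init_object_loaded() and
klp_enable_object().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum patch_state { PATCH_DISABLED, PATCH_ENABLED };

/* Stub object: the two fields stand in for the return values of the
 * init and enable steps. */
struct fake_obj {
	int init_ret;
	int enable_ret;
};

/* Each exit path returns directly, so no "out" label is needed. */
static int notify_coming(enum patch_state state, const struct fake_obj *obj)
{
	int ret;

	assert(obj != NULL);

	ret = obj->init_ret;
	if (ret) {
		fprintf(stderr, "failed to initialize patch (%d)\n", ret);
		return ret;
	}

	if (state == PATCH_DISABLED)
		return 0;

	ret = obj->enable_ret;
	if (ret)
		fprintf(stderr, "failed to apply patch (%d)\n", ret);

	return ret;
}
```

The behaviour is the same as the version with the label: a disabled patch
returns 0 after a successful init, and any failure is propagated to the
caller.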

Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

2015-05-14 Thread Al Viro

On Thu, May 14, 2015 at 05:25:39PM -0700, Linus Torvalds wrote:

> We can easily make things per-operation, by adding another flag. We
> already have per-operation flags like LOOKUP_FOLLOW, which decides if
> we follow the last symlink or not. We could add a LOOKUP_ICASE, which
> decides whether we compare case or not. Obviously, we'd have to add the
> proper O_ICASE for open (and AT_ICASE for fstatat() and friends).
> Exactly like we do for LOOKUP_FOLLOW.

> Btw, don't get me wrong. I'm not saying it's a great idea. I think
> icase compares are stupid. Really really stupid. But samba might be
> worth jumping though a few hoops for. The real problem is that even
> with just ASCII, it does make it much easier to create nasty hash
> collisions in the dentry hashes (same hash from 256 variations of
> aAaAAaaA - just repeat the same letter in different variations of
> lower/upper case).

Hold on.  Should
stat("blah", )  => ENOENT, OK, let's create it
mkdir("blah", 0)=> EEXIST, bugger, looks like a race
stat("blah", )  => ENOENT, Whiskey, Tango, Foxtrot
be possible?  No per-operation flags passed, doesn't even know of the
case-insensitive crap.  And if fstatat() without your new flag would
find c-i matches, then what does that flag do?

Confused...
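
The hash-collision point in the quoted text can be illustrated with a small
program. This is a sketch under stated assumptions: icase_hash() is a
case-folding FNV-1a hash chosen only for illustration, not the dcache hash,
and count_colliding_variants() is a hypothetical helper. Any hash that folds
case before mixing maps all 2^8 = 256 upper/lower-case spellings of an
8-letter name to the same value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Case-insensitive FNV-1a: fold each byte to lower case before mixing. */
static uint32_t icase_hash(const char *name)
{
	uint32_t h = 2166136261u;	/* FNV-1a offset basis */

	for (; *name; name++) {
		h ^= (uint32_t)tolower((unsigned char)*name);
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

/* Enumerate every upper/lower-case spelling of 'name' (letters only,
 * at most 16 characters) and count those hashing like the original. */
static unsigned int count_colliding_variants(const char *name)
{
	char buf[17];
	size_t len = strlen(name);
	uint32_t target = icase_hash(name);
	unsigned int hits = 0;

	assert(len > 0 && len <= 16);
	for (unsigned long mask = 0; mask < (1UL << len); mask++) {
		for (size_t i = 0; i < len; i++) {
			unsigned char c = (unsigned char)name[i];

			buf[i] = (char)(((mask >> i) & 1) ? toupper(c)
							 : tolower(c));
		}
		buf[len] = '\0';
		if (icase_hash(buf) == target)
			hits++;
	}
	return hits;
}
```

Calling count_colliding_variants("aaaaaaaa") walks all 256 spellings, each
of which lands in the same hash bucket.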

[PATCH v2 1/1] iio: ltr501: Add light channel support

2015-05-14 Thread Kuppuswamy Sathyanarayanan

Add support for calculating the lux value from the visible
and IR spectrum ADC count values. Also add an IIO_LIGHT
channel so that the user can read the lux value directly
from the device using the illuminance input ABI.

Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/iio/light/ltr501.c | 57 ++
 1 file changed, 57 insertions(+)

diff --git a/drivers/iio/light/ltr501.c b/drivers/iio/light/ltr501.c
index ca4bf47..449b0fd 100644
--- a/drivers/iio/light/ltr501.c
+++ b/drivers/iio/light/ltr501.c
@@ -66,6 +66,9 @@
 
 #define LTR501_REGMAP_NAME "ltr501_regmap"
 
+#define LTR501_LUX_CONV(vis_coeff, vis_data, ir_coeff, ir_data) \
+   ((vis_coeff * vis_data) - (ir_coeff * ir_data))
+
 static const int int_time_mapping[] = {10, 5, 20, 40};
 
 static const struct reg_field reg_field_it =
@@ -298,6 +301,29 @@ static int ltr501_ps_read_samp_period(struct ltr501_data *data, int *val)
return IIO_VAL_INT;
 }
 
+/* IR and visible spectrum coeff's are given in data sheet */
+static unsigned long ltr501_calculate_lux(u16 vis_data, u16 ir_data)
+{
+   unsigned long ratio, lux;
+
+   if (vis_data == 0)
+   return 0;
+
+   /* multiply numerator by 100 to avoid handling ratio < 1 */
+   ratio = DIV_ROUND_UP(ir_data * 100, ir_data + vis_data);
+
+   if (ratio < 45)
+   lux = LTR501_LUX_CONV(1774, vis_data, -1105, ir_data);
+   else if (ratio >= 45 && ratio < 64)
+   lux = LTR501_LUX_CONV(3772, vis_data, 1336, ir_data);
+   else if (ratio >= 64 && ratio < 85)
+   lux = LTR501_LUX_CONV(1690, vis_data, 169, ir_data);
+   else
+   lux = 0;
+
+   return lux / 1000;
+}
+
 static int ltr501_drdy(struct ltr501_data *data, u8 drdy_mask)
 {
int tries = 100;
@@ -548,7 +574,20 @@ static const struct iio_event_spec ltr501_pxs_event_spec[] = {
.num_event_specs = _evsize,\
 }
 
+#define LTR501_LIGHT_CHANNEL() { \
+   .type = IIO_LIGHT, \
+   .info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED), \
+   .scan_index = -1, \
+   .scan_type = { \
+   .sign = 'u', \
+   .realbits = 16, \
+   .storagebits = 16, \
+   .endianness = IIO_CPU, \
+   }, \
+}
+
 static const struct iio_chan_spec ltr501_channels[] = {
+   LTR501_LIGHT_CHANNEL(),
LTR501_INTENSITY_CHANNEL(0, LTR501_ALS_DATA0, IIO_MOD_LIGHT_BOTH, 0,
 ltr501_als_event_spec,
 ARRAY_SIZE(ltr501_als_event_spec)),
@@ -576,6 +615,7 @@ static const struct iio_chan_spec ltr501_channels[] = {
 };
 
 static const struct iio_chan_spec ltr301_channels[] = {
+   LTR501_LIGHT_CHANNEL(),
LTR501_INTENSITY_CHANNEL(0, LTR501_ALS_DATA0, IIO_MOD_LIGHT_BOTH, 0,
 ltr501_als_event_spec,
 ARRAY_SIZE(ltr501_als_event_spec)),
@@ -596,6 +636,23 @@ static int ltr501_read_raw(struct iio_dev *indio_dev,
int ret, i;
 
switch (mask) {
+   case IIO_CHAN_INFO_PROCESSED:
+   if (iio_buffer_enabled(indio_dev))
+   return -EBUSY;
+
+   switch (chan->type) {
+   case IIO_LIGHT:
+   mutex_lock(&data->lock_als);
+   ret = ltr501_read_als(data, buf);
+   mutex_unlock(&data->lock_als);
+   if (ret < 0)
+   return ret;
+   *val = ltr501_calculate_lux(le16_to_cpu(buf[1]),
+   le16_to_cpu(buf[0]));
+   return IIO_VAL_INT;
+   default:
+   return -EINVAL;
+   }
case IIO_CHAN_INFO_RAW:
if (iio_buffer_enabled(indio_dev))
return -EBUSY;
-- 
1.9.1
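
The lux conversion in this patch can be exercised in user space. The
function below is a stand-alone sketch that mirrors the ratio thresholds and
coefficients of ltr501_calculate_lux() from the diff; lux_from_counts() is a
hypothetical name, and the negative IR coefficient of the first band is
written as an addition.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Convert visible and IR ADC counts to lux, using the coefficients
 * from the patch (scaled by 1000). */
static unsigned long lux_from_counts(uint16_t vis, uint16_t ir)
{
	unsigned long ratio, lux;

	if (vis == 0)
		return 0;

	/* IR share of the total signal, in percent, rounded up. */
	ratio = DIV_ROUND_UP((unsigned long)ir * 100, (unsigned long)ir + vis);

	if (ratio < 45)
		lux = 1774UL * vis + 1105UL * ir;
	else if (ratio < 64)
		lux = 3772UL * vis - 1336UL * ir;
	else if (ratio < 85)
		lux = 1690UL * vis - 169UL * ir;
	else
		lux = 0;

	return lux / 1000;
}
```

For instance, equal counts of 100 on both channels give a ratio of 50, which
selects the second band: (3772 * 100 - 1336 * 100) / 1000 = 243.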


[PATCH] MIPS64: Support of at least 48 bits of SEGBITS

2015-05-14 Thread Leonid Yegoshin

SEGBITS default is 40 bits or less, depending on the CPU type.
This patch introduces support for 48 bits of application virtual address
space (SEGBITS).
It is defined only for 16K and 64K pages and is optional (configurable).

Penalty: a small number of additional pages for generic (small) applications.
For 64K pages it also adds a 3rd level of PTE structure, which has a small
impact during software TLB refill.

This patch is needed because MIPS I6XXX and P6XXX cores have 48 bits of
virtual address in each segment (SEGBITS).

Signed-off-by: Leonid Yegoshin 
---
 arch/mips/Kconfig  |   10 ++
 arch/mips/include/asm/pgtable-64.h |   18 +++---
 arch/mips/include/asm/processor.h  |2 +-
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 76efb02ae99f..0a151a59a9ac 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2032,6 +2032,16 @@ config PAGE_SIZE_64KB
 
 endchoice
 
+config 48VMBITS
+   bool "48 bits virtual memory"
+   depends on PAGE_SIZE_16KB || PAGE_SIZE_64KB
+   help
+ Define a maximum at least 48 bits of application virtual memory.
+ Default is 40 bits or less, depending from CPU.
+ In generic (small) application it is a small set of pages increase
+ in page tables.
+ If unsure, say N.
+
 config FORCE_MAX_ZONEORDER
int "Maximum zone order"
range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
diff --git a/arch/mips/include/asm/pgtable-64.h b/arch/mips/include/asm/pgtable-64.h
index cf661a2fb141..c6b5473440e6 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -17,7 +17,7 @@
 #include 
 #include 
 
-#ifdef CONFIG_PAGE_SIZE_64KB
+#if defined(CONFIG_PAGE_SIZE_64KB) && !defined(CONFIG_48VMBITS)
 #include 
 #else
 #include 
@@ -90,7 +90,11 @@
 #define PTE_ORDER  0
 #endif
 #ifdef CONFIG_PAGE_SIZE_16KB
-#define PGD_ORDER  0
+#ifdef CONFIG_48VMBITS
+#define PGD_ORDER   1
+#else
+#define PGD_ORDER   0
+#endif
 #define PUD_ORDER  ai_attempt_to_allocate_pud
 #define PMD_ORDER  0
 #define PTE_ORDER  0
@@ -104,7 +108,11 @@
 #ifdef CONFIG_PAGE_SIZE_64KB
 #define PGD_ORDER  0
 #define PUD_ORDER  ai_attempt_to_allocate_pud
+#ifdef CONFIG_48VMBITS
+#define PMD_ORDER  0
+#else
 #define PMD_ORDER  ai_attempt_to_allocate_pmd
+#endif
 #define PTE_ORDER  0
 #endif
 
@@ -114,11 +122,7 @@
 #endif
 #define PTRS_PER_PTE   ((PAGE_SIZE << PTE_ORDER) / sizeof(pte_t))
 
-#if PGDIR_SIZE >= TASK_SIZE64
-#define USER_PTRS_PER_PGD  (1)
-#else
-#define USER_PTRS_PER_PGD  (TASK_SIZE64 / PGDIR_SIZE)
-#endif
+#define USER_PTRS_PER_PGD   ((TASK_SIZE64 / PGDIR_SIZE)?(TASK_SIZE64 / PGDIR_SIZE):1)
 #define FIRST_USER_ADDRESS 0UL
 
 /*
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 9b3b48e21c22..3ccb63eaa6c8 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -69,7 +69,7 @@ extern unsigned int vced_count, vcei_count;
  * 8192EB ...
  */
 #define TASK_SIZE32 0x7fff8000UL
-#define TASK_SIZE64 0x10000000000UL
+#define TASK_SIZE64 (0x1UL << cpu_data[0].vmbits)
 #define TASK_SIZE (test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE64)
 #define STACK_TOP_MAX  TASK_SIZE64
 


Re: [PATCH] block:Add proper error handling to the function, disk_add_events

2015-05-14 Thread Jens Axboe


On 05/14/2015 07:57 PM, Nicholas Krause wrote:

This adds the required error checking to the function
disk_add_events: when there are no disk events it now returns
the error code -EBUSY. It also adds error checking for the
call to sysfs_create_files by storing that function's return
value in a newly declared variable, ret, and returning it at
the end of the function body to indicate to the caller
whether the function succeeded.

Signed-off-by: Nicholas Krause 
---
  block/genhd.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 0a536dc..3eb7ee9 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1803,15 +1803,19 @@ static void disk_alloc_events(struct gendisk *disk)
disk->ev = ev;
  }

-static void disk_add_events(struct gendisk *disk)
+static int disk_add_events(struct gendisk *disk)
  {
+   int ret = 0;
if (!disk->ev)
-   return;
+   return -EBUSY;
+
+   ret = sysfs_create_files(&disk_to_dev(disk)->kobj, disk_events_attrs)

-   /* FIXME: error handling */
-   if (sysfs_create_files(&disk_to_dev(disk)->kobj, disk_events_attrs) < 0)
+   if (!ret) {
pr_warn("%s: failed to create sysfs files for events\n",
disk->disk_name);
+   return ret;
+   }


You didn't even test this, obviously.

--
Jens Axboe
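
Two problems are visible in the quoted hunk itself: the new
sysfs_create_files() line has no terminating semicolon, and "if (!ret)"
takes the warning path on success, because sysfs_create_files() returns 0
on success and a negative errno on failure. The sketch below shows the
conventional shape of such a check in user space. It is an illustration
only: fake_create_files() and add_events() are hypothetical stand-ins, and
-EBUSY is kept simply because the quoted patch uses it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Stand-in for sysfs_create_files(): 0 on success, negative errno
 * on failure. */
static int fake_create_files(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Propagate errors to the caller; warn only when creation failed. */
static int add_events(int have_events, int create_should_fail)
{
	int ret;

	if (!have_events)
		return -EBUSY;

	ret = fake_create_files(create_should_fail);
	if (ret < 0) {
		fprintf(stderr, "failed to create sysfs files for events\n");
		return ret;
	}

	return 0;
}
```

Changing a void function to return int also means every caller has to be
updated to check the result, which the quoted hunk does not show.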


Re: [PATCH v4] clk: Show correct information when fail to set clock rate

2015-05-14 Thread Stephen Boyd

On 04/27, Chanwoo Choi wrote:
> This patch shows the correct information for debugging when setting the
> clock rate fails, because the original error message shows the error value
> instead of the current clock rate.
> 
> Cc: Mike Turquette 
> Cc: Stephen Boyd 
> Cc: Sylwester Nawrocki 
> Signed-off-by: Chanwoo Choi 
> ---

Seems fine to me. Sylwester?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH v3 2/9] Documentation: bindings: move the Berlin clock documentation

2015-05-14 Thread Stephen Boyd

On 04/07, Antoine Tenart wrote:
> The Berlin clock documentation was part of the Marvell Berlin SoC
> documentation because the Berlin clock configuration was inside the
> chip controller. With the recent rework of the chip and system
> controller handling (now all sub-devices of the soc and system
> controller nodes are registered with simple-mfd, and each device has its
> own sub-node), the documentation of the Berlin clock driver can be moved
> to the generic clock documentation directory.
> 
> Signed-off-by: Antoine Tenart 
> ---

Acked-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH v3 1/9] clk: berlin: move to a dedicated sub-node

2015-05-14 Thread Stephen Boyd

On 04/07, Antoine Tenart wrote:
> The Berlin clock driver was sharing a DT node with the pin controller
> and the reset driver. All these devices are now sub-nodes of the chip
> controller. This patch reworks the Berlin clock driver to allow moving
> the Berlin clock DT bindings into their own sub-node of the chip
> controller node.
> 
> Signed-off-by: Antoine Tenart 
> ---

Acked-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: btrfs balance 4.0 regression?

2015-05-14 Thread Chris Murphy

On Thu, May 14, 2015 at 6:33 PM, Omar Sandoval  wrote:
>
>
> Yup, Chris says he has a proper fix but it hasn't hit the list yet.
>
>
> Actually, ext4 convert is broken anyways (with irrelevant output
> elided):

I'm curious how this bug ended up in mainline. Isn't there an XFS test
for both balance+convert and ext4 convert? If not, shouldn't there be?
It's not a data loss bug but Btrfs is in a transitional stretch where
functionality loss bugs are no longer minor. (I'd look but I'm lazy
and xfs tests doesn't appear to be indexed.)

-- 
Chris Murphy

Re: [PATCH 7/9] ARM: multi_v7_defconfig: Enable options for Exynos display support

2015-05-14 Thread Krzysztof Kozlowski

2015-05-15 9:58 GMT+09:00 Javier Martinez Canillas :
> Hello Krzysztof,
>
> On Fri, May 15, 2015 at 2:36 AM, Krzysztof Kozlowski
>  wrote:
>> 2015-05-15 0:40 GMT+09:00 Javier Martinez Canillas
>> :
>>> Many Exynos devices have devices attached to their display ports.
>>> This patch enables the needed Kconfig options to support different
>>> configuration such as simple panel, embedded DisplayPort (eDP) to
>>> LVDS bridges and HDMI displays.
>>
>> Enabling the display would be nice but for some quite long time we had
>> issues with DRM on Exynos. exynos_defconfig has it enabled and most of
>> boards boot fine with it. Exception is Arndale 5250:
>
> Yes, like I said in the other thread, the fact that Exynos DRM is
> working fine now on most boards is mostly because the bugs were
> exposed when the Exynos DRM options were enabled.

I saw your response in email 0/9 but let us stick to one thread.
So these are my only concerns - instability in the past.

>> http://storage.kernelci.org/next/next-20150514/arm-exynos_defconfig/lab-khilman/boot-exynos5250-arndale.html
>> [1.630290] [drm:exynos_dp_bind] *ERROR* failed: of_get_videomode() : -22
>> [1.637071] exynos-drm exynos-drm: failed to bind
>> 145b.dp-controller (ops exynos_dp_ops): -22
>> [1.646504] exynos-drm exynos-drm: master bind failed: -22
>> [1.651391] exynos-drm: probe of exynos-drm failed with error -22
>>
>
> Ajay Kumar changed the DT bindings for the Exynos DRM Display Panel
> driver some time ago but it seems that the Arndale 5250 DTS was never
> updated. Something along the lines of commit [0] is needed.

Thanks,

>
>> Anyway it is not like I am against it... just wondering. On the other
>> hand enabling it could help in early detection of errors.
>>
>
> I think that not enabling these options will just keep latent bugs from
> being exposed. As an example I found that module auto-loading was
> broken for the driver of the PTN3460 eDP to LVDS bridge used in the
> Exynos5250 Snow Chromebook and already posted a fix [1].

Right, enabling the options helps in exposing problems so they could
be spotted and fixed.


> I would never have found that bug if I hadn't tried enabling these
> options in multi_v7 as a module. Also remember that the consumer
> version of these machines don't have a serial console so for users
> building images with multi_v7, not having display support means that
> the machine is pretty useless.

That is indeed good reason.

FWIW, I tested multi_v7 with your patches on Exynos4412 Trats2 board
and it worked fine.

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof

Re: [RFC PATCH 00/11] drm/i915: Expose OA metrics via perf PMU

2015-05-14 Thread Robert Bragg

On Fri, May 8, 2015 at 5:24 PM, Peter Zijlstra  wrote:
> On Thu, May 07, 2015 at 03:15:43PM +0100, Robert Bragg wrote:
>
>> I've changed the uapi for configuring the i915_oa specific attributes
>> when calling perf_event_open(2) whereby instead of cramming lots of
>> bitfields into the perf_event_attr config members, I'm now
>> daisy-chaining a drm_i915_oa_event_attr_t structure off of a single
>> config member that's extensible and validated in the same way as the
>> perf_event_attr struct. I've found this much nicer to work with while
>> being neatly extensible too.
>
> This worries me a bit.. is there more background for this?

Would it maybe be helpful to see the before and after? I had kept this
uapi change in a separate patch for a while locally but in the end
decided to squash it before sending out my updated series.

Although I did find it a bit awkward with the bitfields, I was mainly
concerned about the extensibility of packing logically separate
attributes into the config members and had heard similar concerns from
a few others who had been experimenting with my patches too.

A few simple attributes I can think of a.t.m that we might want to add
in the future are:
- control of the OABUFFER size
- a way to ask the kernel to collect reports at the beginning and end
of batch buffers, in addition to periodic reports
- alternative ways to uniquely identify a context to support tools
profiling a single context not necessarily owned by the current
process

It could also be interesting to expose some counter configuration
through these attributes too. E.g. on Broadwell+ we have 14 'Flexible
EU' counters included in the OA unit reports, each with a 16bit
configuration.

In a more extreme case it might also be useful to allow userspace to
specify a complete counter config, which (depending on the
configuration) could be over 100 32bit values to select the counter
signals + configure the corresponding combining logic.

Since this pmu is in a device driver it also seemed reasonably
appropriate to de-couple it slightly from the core perf_event_attr
structure by allowing driver extensible attributes.

I wonder if it might be less worrisome if the i915_oa_copy_attr() code
were instead a re-usable utility perhaps maintained in events/core.c,
so if other pmu drivers were to follow suit there would be less risk
of a mistake being made here?

Regards,
- Robert
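
The "extensible and validated in the same way as the perf_event_attr struct"
idea can be sketched in user space. perf_event_attr carries a size field so
that older and newer callers can interoperate; a structure larger than the
receiver knows is accepted only if the unknown tail is all zero. The code
below is a sketch of that pattern under stated assumptions: struct
demo_oa_attr and demo_copy_attr() are hypothetical, the field names are
invented, and this is not the layout of drm_i915_oa_event_attr_t nor the
code of i915_oa_copy_attr().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical extensible attribute block; 'size' must come first. */
struct demo_oa_attr {
	uint32_t size;		/* sizeof the struct as the caller knows it */
	uint32_t format;	/* illustrative field */
	uint64_t sample_period;	/* illustrative field, added "later" */
};

/* Copy a caller-supplied block of any known or newer size into 'dst'. */
static int demo_copy_attr(struct demo_oa_attr *dst, const void *src)
{
	const unsigned char *bytes = src;
	uint32_t size;

	assert(dst != NULL && src != NULL);
	memcpy(&size, src, sizeof(size));
	if (size < sizeof(uint32_t))
		return -EINVAL;		/* too small to hold 'size' itself */

	if (size > sizeof(*dst)) {
		/* Newer caller: unknown trailing fields must be zero. */
		for (size_t i = sizeof(*dst); i < size; i++)
			if (bytes[i])
				return -E2BIG;
		size = sizeof(*dst);
	}

	/* Older caller: fields it does not know about default to zero. */
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, size);
	dst->size = sizeof(*dst);
	return 0;
}
```

New attributes can then be appended to the structure without breaking
existing callers, which is the extensibility property described above.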

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-14 Thread NeilBrown

On Fri, 15 May 2015 09:54:26 +1000 Dave Chinner  wrote:

> On Thu, May 14, 2015 at 09:22:51AM +1000, NeilBrown wrote:
> > On Mon, 11 May 2015 11:44:28 +1000 Dave Chinner  wrote:
> > 
> > > On Fri, May 08, 2015 at 03:08:43AM -0400, Len Brown wrote:
> > > > From: Len Brown 
> > > > 
> > > > Remove sys_sync() from the kernel's suspend flow.
> > > > 
> > > > sys_sync() is extremely expensive in some configurations,
> > > > and so the kernel should not force users to pay this cost
> > > > on every suspend.
> > > 
> > > Since when? Please explain what your use case is that makes this
> > > so prohibitively expensive it needs to be removed.
> > > 
> > > > 
> > > > The user-space utilities s2ram and s2disk choose to invoke sync() today.
> > > > A user can invoke suspend directly via /sys/power/state to skip that cost.
> > > 
> > > So, you want to have s2disk write all the dirty pages in memory to
> > > the suspend image, rather than to the filesystem?
> > > 
> > > Either way you have to write that dirty data to disk, but if you
> > > write it to the suspend image, it then has to be loaded again on
> > > resume, and then written again to the filesystem the system has
> > > resumed. This doesn't seem very efficient to me
> > > 
> > > And, quite frankly, machines fail to resume from suspend all the
> > > time. e.g. run out of batteries when they are under s2ram
> > > conditions, or s2disk fails because a kernel upgrade was done before
> > > the s2disk and so can't be resumed. With your change, users lose all
> > > the data that was buffered in memory before suspend, whereas right
> > > now it is written to disk and so nothing is lost if the resume from
> > > suspend fails for whatever reason.
> > > 
> > > IOWs, I can see several good reasons why the sys_sync() needs to
> > > remain in the suspend code. User data safety and filesystem
> > > integrity is far, far more important than a couple of seconds
> > > improvement in suspend speed
> > 
> > To be honest, this sounds like superstition and fear, not science and fact.
> > 
> > "filesystem integrity" is not an issue for the vast majority of filesystems
> > which use journalling to ensure continued integrity even after a crash.  I
> > think even XFS does that :-)
> 
> It has nothing to do with journalling, and everything to do with
> bringing filesystems to an *idle state* before suspend runs.  We have a
> long history of bug reports with XFS that go: suspend, resume, XFS
> almost immediately detects corruption, shuts down.
> 
> The problem is that "sync" doesn't make the filesystem idle - XFS
> has *lots* of background work going on, and if we aren't *real
> careful* the filesystem is still doing work while the hardware gets
> powered down and the suspend image is being taken. The result is on
> resume that the on-disk filesystem state does not match the memory
> image pulled back from resume, and we get shutdowns.
> 
> sys_sync() does not guarantee a filesystem is idle - it guarantees
> the data in memory is recoverable, but it doesn't stop the filesystem
> from doing things like writing back metadata or running background
> cleanup tasks. If those aren't stopped properly, then we get into
> the state where in-memory and on-disk state get out of whack. And
> s2ram can have these problems too, because if there is IO in flight
> when the hardware is powered down, that IO is lost

This seems to be the nub of your complaint - yes?

Some storage devices don't handle suspend as well as they should and lose
requests resulting in corruption.  They should obviously be fixed, but it is
you who gets the problem reports and you are not in a position to fix them.
So you want a general solution that hides those problems.
sys_sync at suspend time is a sort-of solution because it flushes and waits
so there is less in-flight IO immediately after a sys_sync and so less
opportunity for a bad device to stuff up.
But you seem to suggest that sys_sync isn't a complete solution and it
doesn't guarantee that xfs is not doing some background metadata IO.

Maybe a sensible thing to do would be to hook the "disk" devices into suspend
and have them flush their queue and possibly send a CACHE_FLUSH command.
That would provide more of a guarantee for you, and less of a cost for Len,
would it not?

Thanks,
NeilBrown



> 
> Every time some piece of generic infrastructure changes behaviour
> w.r.t. suspend/resume, we get a new set of problems being reported
> by users. It's extremely hard to test for these problems and it
> might take months of occasional corruption reports from a user to
> isolate it to being a suspend/resume problem.  It's a game of
> whack-a-mole, because quite often they come down to the fact that
> something changed and nobody in the XFS world knew they had to now
> set a different initialisation flag on some structure or workqueue
> to make it work the way it needed to work.
>
> Go back and look at the history of sys_sync() in suspend discussions
>

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-14 Thread Rafael J. Wysocki

On Fri, May 15, 2015 at 2:40 AM, Ming Lei  wrote:
> On Fri, May 15, 2015 at 8:34 AM, Rafael J. Wysocki  wrote:
>> On Friday, May 15, 2015 09:54:26 AM Dave Chinner wrote:
>>> On Thu, May 14, 2015 at 09:22:51AM +1000, NeilBrown wrote:
>>> > On Mon, 11 May 2015 11:44:28 +1000 Dave Chinner wrote:
>>> >
>>> > > On Fri, May 08, 2015 at 03:08:43AM -0400, Len Brown wrote:
>>> > > > From: Len Brown 
>>> > > >
>>> > > > Remove sys_sync() from the kernel's suspend flow.
>>> > > >
>>> > > > sys_sync() is extremely expensive in some configurations,
>>> > > > and so the kernel should not force users to pay this cost
>>> > > > on every suspend.
>>> > >
>>> > > Since when? Please explain what your use case is that makes this
>>> > > so prohibitively expensive it needs to be removed.
>>> > >
>>> > > >
>>> > > > The user-space utilities s2ram and s2disk choose to invoke sync() today.
>>> > > > A user can invoke suspend directly via /sys/power/state to skip that cost.
>>> > >
>>> > > So, you want to have s2disk write all the dirty pages in memory to
>>> > > the suspend image, rather than to the filesystem?
>>> > >
>>> > > Either way you have to write that dirty data to disk, but if you
>>> > > write it to the suspend image, it then has to be loaded again on
>>> > > resume, and then written again to the filesystem the system has
>>> > > resumed. This doesn't seem very efficient to me
>>> > >
>>> > > And, quite frankly, machines fail to resume from suspend all the
>>> > > time. e.g. run out of batteries when they are under s2ram
>>> > > conditions, or s2disk fails because a kernel upgrade was done before
>>> > > the s2disk and so can't be resumed. With your change, users lose all
>>> > > the data that was buffered in memory before suspend, whereas right
>>> > > now it is written to disk and so nothing is lost if the resume from
>>> > > suspend fails for whatever reason.
>>> > >
>>> > > IOWs, I can see several good reasons why the sys_sync() needs to
>>> > > remain in the suspend code. User data safety and filesystem
>>> > > integrity is far, far more important than a couple of seconds
>>> > > improvement in suspend speed
>>> >
>>> > To be honest, this sounds like superstition and fear, not science and fact.
>>> >
>>> > "filesystem integrity" is not an issue for the vast majority of filesystems
>>> > which use journalling to ensure continued integrity even after a crash.  I
>>> > think even XFS does that :-)
>>>
>>> It has nothing to do with journalling, and everything to do with
>>> bringing filesystems to an *idle state* before suspend runs.  We have a
>>> long history of bug reports with XFS that go: suspend, resume, XFS
>>> almost immediately detects corruption, shuts down.
>>>
>>> The problem is that "sync" doesn't make the filesystem idle - XFS
>>> has *lots* of background work going on, and if we aren't *real
>>> careful* the filesystem is still doing work while the hardware gets
>>> powered down and the suspend image is being taken. The result is on
>>> resume that the on-disk filesystem state does not match the memory
>>> image pulled back from resume, and we get shutdowns.
>>>
>>> sys_sync() does not guarantee a filesystem is idle - it guarantees
>>> the data in memory is recoverable, but it doesn't stop the filesystem
>>> from doing things like writing back metadata or running background
>>> cleanup tasks. If those aren't stopped properly, then we get into
>>> the state where in-memory and on-disk state get out of whack. And
>>> s2ram can have these problems too, because if there is IO in flight
>>> when the hardware is powered down, that IO is lost.
>>>
>>> Every time some piece of generic infrastructure changes behaviour
>>> w.r.t. suspend/resume, we get a new set of problems being reported
>>> by users. It's extremely hard to test for these problems and it
>>> might take months of occasional corruption reports from a user to
>>> isolate it to being a suspend/resume problem.  It's a game of
>>> whack-a-mole, because quite often they come down to the fact that
>>> something changed and nobody in the XFS world knew they had to now
>>> set a different initialisation flag on some structure or workqueue
>>> to make it work the way it needed to work.
>>>
>>> Go back and look at the history of sys_sync() in suspend discussions
>>> over the past 10 years.  You'll find me saying exactly the same
>>> thing again and again about sys_sync(): it does not guarantee the
>>> filesystem is in an idle or coherent, unchanging state, and nothing
>>> in the suspend code tells the filesystem to enter an idle or frozen
>>> state. We actually have mechanisms for doing this - we use it in the
>>> storage layers to idle the filesystem while we do things like *take
>>> a snapshot*.
>>>
>>> What is the mechanism suspend to disk uses? It *takes a snapshot* of
>>> system state, written to disk. It's supposed to be consistent, and
>>> the only way you can guarantee the state of an active,

Re: [PATCH 7/9] ARM: multi_v7_defconfig: Enable options for Exynos display support

2015-05-14 Thread Javier Martinez Canillas

Hello Krzysztof,

On Fri, May 15, 2015 at 2:36 AM, Krzysztof Kozlowski
 wrote:
> 2015-05-15 0:40 GMT+09:00 Javier Martinez Canillas
> :
>> Many Exynos devices have devices attached to their display ports.
>> This patch enables the needed Kconfig options to support different
>> configuration such as simple panel, embedded DisplayPort (eDP) to
>> LVDS bridges and HDMI displays.
>
> Enabling the display would be nice, but for quite a long time we had
> issues with DRM on Exynos. exynos_defconfig has it enabled and most
> boards boot fine with it. The exception is Arndale 5250:

Yes, like I said in the other thread, the fact that Exynos DRM is
working fine now on most boards is mostly because the bugs were
exposed when the Exynos DRM options were enabled.

> http://storage.kernelci.org/next/next-20150514/arm-exynos_defconfig/lab-khilman/boot-exynos5250-arndale.html
> [1.630290] [drm:exynos_dp_bind] *ERROR* failed: of_get_videomode() : -22
> [1.637071] exynos-drm exynos-drm: failed to bind
> 145b.dp-controller (ops exynos_dp_ops): -22
> [1.646504] exynos-drm exynos-drm: master bind failed: -22
> [1.651391] exynos-drm: probe of exynos-drm failed with error -22
>

Ajay Kumar changed the DT bindings for the Exynos DRM Display Panel
driver some time ago but it seems that the Arndale 5250 DTS was never
updated. Something along the lines of commit [0] is needed.

> Anyway it is not like I am against it... just wondering. On the other
> hand enabling it could help in early detection of errors.
>

I think that not enabling these options will just keep latent bugs from
being exposed. As an example, I found that module auto-loading was
broken for the driver of the PTN3460 eDP to LVDS bridge used in the
Exynos5250 Snow Chromebook and already posted a fix [1].

I would never have found that bug if I hadn't tried enabling these
options in multi_v7 as a module. Also remember that the consumer
versions of these machines don't have a serial console, so for users
building images with multi_v7, not having display support means that
the machine is pretty useless.

> Best regards,
> Krzysztof

Best regards,
Javier

[0]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0a0752c6ee58f28a29e78f1a8c38f2f1b11cba9f
[1]: https://lkml.org/lkml/2015/5/14/363

Re: [PATCHv3 1/4] phy: phy-core: Make GENERIC_PHY an invisible option

2015-05-14 Thread Felipe Balbi

Hi,

On Wed, Apr 22, 2015 at 04:04:10PM -0700, Arun Ramamurthy wrote:
> Most of the phy providers use "select" to enable GENERIC_PHY. Since select
> is only recommended when the config is not visible, GENERIC_PHY is changed
> to an invisible option. To maintain consistency, all phy providers are changed
> to "select" GENERIC_PHY and all non-phy drivers use "depends on" when the
> phy framework is explicitly required. USB_MUSB_OMAP2PLUS has a cyclic
> dependency, so it is left as "select".
> 
> Signed-off-by: Arun Ramamurthy 
> ---
>  drivers/ata/Kconfig                       | 1 -
>  drivers/media/platform/exynos4-is/Kconfig | 2 +-
>  drivers/phy/Kconfig                       | 4 ++--
>  drivers/usb/host/Kconfig                  | 4 ++--
>  drivers/video/fbdev/exynos/Kconfig        | 2 +-
>  5 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
> index 5f60155..6d2e881 100644
> --- a/drivers/ata/Kconfig
> +++ b/drivers/ata/Kconfig
> @@ -301,7 +301,6 @@ config SATA_MV
>   tristate "Marvell SATA support"
>   depends on PCI || ARCH_DOVE || ARCH_MV78XX0 || \
>  ARCH_MVEBU || ARCH_ORION5X || COMPILE_TEST
> - select GENERIC_PHY
>   help
> This option enables support for the Marvell Serial ATA family.
> Currently supports 88SX[56]0[48][01] PCI(-X) chips,
> diff --git a/drivers/media/platform/exynos4-is/Kconfig b/drivers/media/platform/exynos4-is/Kconfig
> index b7b2e47..b6f3eaa 100644
> --- a/drivers/media/platform/exynos4-is/Kconfig
> +++ b/drivers/media/platform/exynos4-is/Kconfig
> @@ -31,7 +31,7 @@ config VIDEO_S5P_FIMC
>  config VIDEO_S5P_MIPI_CSIS
>   tristate "S5P/EXYNOS MIPI-CSI2 receiver (MIPI-CSIS) driver"
>   depends on REGULATOR
> - select GENERIC_PHY
> + depends on GENERIC_PHY
>   help
> This is a V4L2 driver for Samsung S5P and EXYNOS4 SoC MIPI-CSI2
> receiver (MIPI-CSIS) devices.
> diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
> index 2962de2..edecdb1 100644
> --- a/drivers/phy/Kconfig
> +++ b/drivers/phy/Kconfig
> @@ -5,7 +5,7 @@
>  menu "PHY Subsystem"
>  
>  config GENERIC_PHY
> - bool "PHY Core"
> + bool
>   help
> Generic PHY support.
>  
> @@ -72,7 +72,7 @@ config PHY_MIPHY365X
>  config PHY_RCAR_GEN2
>   tristate "Renesas R-Car generation 2 USB PHY driver"
>   depends on ARCH_SHMOBILE
> - depends on GENERIC_PHY
> + select GENERIC_PHY

so for some you changed from depends to select...

>   help
> Support for USB PHY found on Renesas R-Car generation 2 SoCs.
>  
> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
> index 5ad60e4..e2197e2 100644
> --- a/drivers/usb/host/Kconfig
> +++ b/drivers/usb/host/Kconfig
> @@ -182,7 +182,7 @@ config USB_EHCI_HCD_SPEAR
>  config USB_EHCI_HCD_STI
>   tristate "Support for ST STiHxxx on-chip EHCI USB controller"
>   depends on ARCH_STI && OF
> - select GENERIC_PHY
> + depends on GENERIC_PHY

while others you changed from select to depends.

NAK.

-- 
balbi



Re: [PATCH v2] x86: Preserve iopl on fork and execve

2015-05-14 Thread H. Peter Anvin

On 05/14/2015 03:41 AM, Josh Triplett wrote:
> 
> I have a patch series that does exactly that, compiling out the syscalls
> as well as the underlying architecture-specific infrastructure.  (Saves
> quite a bit of space, too.)
> 
> It still needs some more detailed x86 architecture review.  Peter, Ingo?
> Would you be interested in taking (an updated version of) that patch
> series for the next merge window?
> 

I think that makes sense.

-hpa



Re: Window watchdog driver design

2015-05-14 Thread Guenter Roeck


On 05/14/2015 07:09 AM, Andreas Werner wrote:

On Thu, May 14, 2015 at 06:30:05AM -0700, Guenter Roeck wrote:

On 05/14/2015 04:56 AM, Andreas Werner wrote:

Hi,
in the next few weeks I need to write a driver for a window watchdog
implemented in a CPLD. I have some questions about the design
of the driver and the best way to write this driver to also be able
to submit it.

The triggering and configuration of the Watchdog is done by several GPIOs which
are connected to the CPLD watchdog device. The correct GPIOs are configurable
using the Device Tree.

1. Timeout
The timeout values are defined in ms and range from 20ms to 2560ms.
The timeout is set by 3 GPIOs, which means we have only 8 different
timeout values. It is also possible that a future watchdog CPLD device
will have different timeout values.

Is it possible to set ms timeouts? It seems that the WDT API only
supports a resolution of 1 sec.

One idea would be to use the API timeout as something like a timeout
index to select the different values. Of course this needs to be documented.

e.g.
timeout (API)   timeout in device
1               20ms
2               100ms
3               500ms
...             ...

2. Upper/Lower Window
There is currently no support for a windowed watchdog in the wdt core.
The lower window can be activated by a GPIO and its timeout is defined
as "upper window timeout / 4".

What is the best way to implement those additional settings? Adding an
additional ioctl or exporting these in sysfs?
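
The timeout-index idea from the mail above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the names
wwdt_timeout_ms, wwdt_index_to_ms and wwdt_select_to_gpio_levels are
hypothetical, only the three index/timeout pairs given in the mail are
filled in, and treating the three GPIOs as a plain binary code is a guess,
not something the mail states.

```c
#include <stddef.h>

/*
 * Hypothetical table: API "timeout" value -> device timeout in ms.
 * Only the three pairs listed in the mail are present; the remaining
 * slots of the 3-GPIO encoding are deliberately not guessed.
 */
static const unsigned int wwdt_timeout_ms[] = {
	0,	/* index 0 is unused in this sketch */
	20,	/* API timeout 1 */
	100,	/* API timeout 2 */
	500,	/* API timeout 3 */
};

#define WWDT_NUM_TIMEOUTS \
	(sizeof(wwdt_timeout_ms) / sizeof(wwdt_timeout_ms[0]))

/* Returns the device timeout in ms, or -1 if the index is not known. */
int wwdt_index_to_ms(unsigned int index)
{
	if (index == 0 || index >= WWDT_NUM_TIMEOUTS)
		return -1;
	return (int)wwdt_timeout_ms[index];
}

/*
 * Splits a 3-bit select value into the levels of the three timeout
 * GPIOs. Assumption: the GPIOs carry a plain binary code, LSB first.
 */
void wwdt_select_to_gpio_levels(unsigned int select, int levels[3])
{
	int i;

	for (i = 0; i < 3; i++)
		levels[i] = (select >> i) & 1;
}
```

A real driver would more likely take the table from the Device Tree, since
the mail notes that future CPLD revisions may use different values.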


Sorry for the maybe dumb question, but what is a window watchdog,
and what is the lower window timeout for (assuming the upper window
timeout causes the watchdog to expire) ?

Guenter



Oh sorry, I forgot to describe it in more detail.

If you have a window watchdog you do not have just one timeout where the
watchdog can expire.
You have a so-called "window" to trigger it within.

|-----------------------|-----------------------|
                  lower timeout           upper timeout

This means you have to trigger the watchdog not too late and not too early.
This kind of watchdog is often used in embedded applications or more often
in safety cases to fulfil requirements given e.g. by SIL1-SIL4 certifications.

The lower timeout is set by a dedicated GPIO and its value will then be
"upper timeout / 4". The upper timeout is set by 3 GPIOs to get different
timeout values.
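
As a rough sketch of the window rule described above: a trigger is only
valid if it arrives after the lower timeout and before the upper timeout,
with the lower timeout being a quarter of the upper one. The function names
are hypothetical, and the boundary handling (lower bound inclusive, upper
bound exclusive) is an assumption, since the mail does not specify it.

```c
#include <stdbool.h>

/* Lower window bound as described in the mail: upper timeout / 4. */
unsigned int wwdt_lower_ms(unsigned int upper_ms)
{
	return upper_ms / 4;
}

/*
 * Returns true if a trigger arriving elapsed_ms after the previous one
 * falls inside the window. Assumption: the lower bound is inclusive and
 * the upper bound is exclusive.
 */
bool wwdt_trigger_ok(unsigned int elapsed_ms, unsigned int upper_ms)
{
	return elapsed_ms >= wwdt_lower_ms(upper_ms) && elapsed_ms < upper_ms;
}
```

With an upper timeout of 100ms the lower bound is 25ms, so a trigger after
10ms is too early and a trigger after 100ms is too late.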



Thanks a lot for the explanation.

I would suggest using a module parameter to enable the "lower timeout"
functionality.

Timeouts have to be specified in seconds.

Hope this helps,
Guenter


Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

2015-05-14 Thread Richard Guy Briggs

On 15/05/14, Steve Grubb wrote:
> On Tuesday, May 12, 2015 03:57:59 PM Richard Guy Briggs wrote:
> > On 15/05/05, Steve Grubb wrote:
> > > I think there needs to be some more discussion around this. It seems like
> > > this is not exactly recording things that are useful for audit.
> > 
> > It seems to me that either audit has to assemble that information, or
> > the kernel has to do so.  The kernel doesn't know about containers
> > (yet?).
> 
> Auditing is something that has a lot of requirements imposed on it by
> security standards. There was no requirement to have an auid until audit
> came along and said that uid is not good enough to know who is issuing
> commands because of su or sudo. There was no requirement for sessionid
> until we had to track each action back to a login so we could see if the
> login came from the expected place.
> 
> What I am saying is we have the same situation. Audit needs to track a 
> container and we need an ID. The information that is being logged is not 
> useful for auditing. Maybe someone wants that info in syslog, but I doubt it. 
> The audit trail's purpose is to allow a security officer to reconstruct the 
> events to determine what happened during some security incident.

I agree the information being logged is not yet useful, but it is a
component of what would be.  I wasn't ever thinking about syslog...  It
is this trail that I was trying to help create.

> What they would want to know is what resources were assigned; if two 
> containers shared a resource, what resource and container was it shared with; 
> if two containers can communicate, we need to see or control information flow 
> when necessary; and we need to see termination and release of resources.

So, namespaces are a big part of this.  I understand how they are
spawned and potentially shared.  I have a more vague idea about how
cgroups contribute to this concept of a container.  So far, I have very
little idea how seccomp contributes, but I assume that it will also need
to be part of this tracking.

> Also, if the host OS cannot make sense of the information being logged
> because the pid maps to another process name, or a uid maps to another
> user, or a file access maps to something not in the host's, then we need
> the container to do its own auditing and resolve these mappings and
> optionally pass these to an aggregation server.

I'm open to both being possible.

> Nothing else makes sense.
> 
> > > On Friday, April 17, 2015 03:35:52 AM Richard Guy Briggs wrote:
> > > > Log the creation and deletion of namespace instances in all 6 types of
> > > > namespaces.
> > > > 
> > > > Twelve new audit message types have been introduced:
> > > > AUDIT_NS_INIT_MNT   1330    /* Record mount namespace instance creation */
> > > > AUDIT_NS_INIT_UTS   1331    /* Record UTS namespace instance creation */
> > > > AUDIT_NS_INIT_IPC   1332    /* Record IPC namespace instance creation */
> > > > AUDIT_NS_INIT_USER  1333    /* Record USER namespace instance creation */
> > > > AUDIT_NS_INIT_PID   1334    /* Record PID namespace instance creation */
> > > > AUDIT_NS_INIT_NET   1335    /* Record NET namespace instance creation */
> > > > AUDIT_NS_DEL_MNT    1336    /* Record mount namespace instance deletion */
> > > > AUDIT_NS_DEL_UTS    1337    /* Record UTS namespace instance deletion */
> > > > AUDIT_NS_DEL_IPC    1338    /* Record IPC namespace instance deletion */
> > > > AUDIT_NS_DEL_USER   1339    /* Record USER namespace instance deletion */
> > > > AUDIT_NS_DEL_PID    1340    /* Record PID namespace instance deletion */
> > > > AUDIT_NS_DEL_NET    1341    /* Record NET namespace instance deletion */
> > > 
> > > The requirements for auditing of containers should be derived from VPP. In
> > > it, it asks for selectable auditing, selective audit, and selective audit
> > > review. What this means is that we need the container and all its
> > > children to have one identifier that is inserted into all the events that
> > > are associated with the container.
> > 
> > Is that requirement for the records that are sent from the kernel, or
> > for the records stored by auditd, or by another facility that delivers
> > those records to a final consumer?
> 
> A little of both. Selective audit means that you can set rules to include or
> exclude an event. This is done in the kernel. Selectable review means that
> the user space tools need to be able to skip past records not of interest to
> a specific line of inquiry. Also, logging everything and letting user space
> work it out later is also not a solution because the needle is harder to
> find in a larger haystack. Or, the logs may rotate and it's gone forever
> because the partition is filled.

I agree it needs to be a balance of flexibility and efficiency.

> > > With this, its possible to do a search for all events related to a
>

Re: [Linux-nvdimm] [PATCH v2 18/20] libnd: infrastructure for btt devices

2015-05-14 Thread Dan Williams

On Tue, May 12, 2015 at 9:33 AM, Toshi Kani  wrote:
> On Tue, 2015-04-28 at 14:25 -0400, Dan Williams wrote:
>> Block devices from an nd bus, in addition to accepting "struct bio"
>> based requests, also have the capability to perform byte-aligned
>> accesses.  By default only the bio/block interface is used.  However, if
>> another driver can make effective use of the byte-aligned capability it
>> can claim/disable the block interface and use the byte-aligned "nd_io"
>> interface.
>>
>> The BTT driver is the intended first consumer of this mechanism to allow
>> layering atomic sector update guarantees on top of nd_io capable
>> nd-bus-block-devices.
>  :
>> +static int nd_btt_autodetect(struct nd_bus *nd_bus, struct nd_io *ndio,
>> + struct block_device *bdev)
>> +{
>> + char name[BDEVNAME_SIZE];
>> + struct nd_btt *nd_btt;
>> + struct btt_sb *btt_sb;
>> + u64 offset, checksum;
>> + u32 lbasize;
>> + u8 *uuid;
>> + int rc;
>> +
>> + btt_sb = kzalloc(sizeof(*btt_sb), GFP_KERNEL);
>> + if (!btt_sb)
>> + return -ENODEV;
>> +
>> + offset = nd_partition_offset(bdev);
>> + rc = ndio->rw_bytes(ndio, btt_sb, offset + SZ_4K, sizeof(*btt_sb), READ);
>> + if (rc)
>> + goto out_free_sb;
>> +
>> + if (get_capacity(bdev->bd_disk) < SZ_16M / 512)
>> + goto out_free_sb;
>> +
>> + if (memcmp(btt_sb->signature, BTT_SIG, BTT_SIG_LEN) != 0)
>> + goto out_free_sb;
>> +
>> + checksum = le64_to_cpu(btt_sb->checksum);
>> + btt_sb->checksum = 0;
>> + if (checksum != nd_btt_sb_checksum(btt_sb))
>> + goto out_free_sb;
>> + btt_sb->checksum = cpu_to_le64(checksum);
>> +
>> + uuid = kmemdup(btt_sb->uuid, 16, GFP_KERNEL);
>> + if (!uuid)
>> + goto out_free_sb;
>> +
>> + lbasize = le32_to_cpu(btt_sb->external_lbasize);
>> + nd_btt = __nd_btt_create(nd_bus, lbasize, uuid);
>
> When BTT is first set up, user binds a seed "btt0" to a block device,
> such as /dev/pmem0.  It then creates /dev/nd0 bound to /dev/pmem0.
>
> After a reboot, nd_btt_autodetect() detects the BTT setup and creates a
> new "btt1" since it is called after a seed "btt0" is created.
> Therefore, it creates /dev/nd1 bound to /dev/pmem0 this time.
>
> Is this how it is intended to work, i.e. "btt0" as the default seed btt?
> While user should not rely on the name of /dev/nd%d, I thought this
> device name change was confusing...

So we can fix this to be at least as stable as the backing device
names [1], but as far as I can see we would need to start using the
backing device name in the btt device name.  A strawman proposal is to
append 's' to indicate 'sectored'.  So /dev/pmem0s is the btt
instance fronting /dev/pmem0.  Other examples:

/dev/pmem0p1s
/dev/ndblk0.0s
/dev/ndblk0.0p1s
...

Thoughts?

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000636.html
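
The strawman naming above amounts to deriving the btt name from the backing
device name by appending 's'. A minimal sketch, assuming a hypothetical
helper btt_name_from_backing() that works on the bare device name without
the /dev/ prefix; it is not taken from the patch series:

```c
#include <stdio.h>

/*
 * Builds "<backing>s" in out, e.g. "pmem0" -> "pmem0s".
 * Returns 0 on success, -1 if the result does not fit in out_len bytes.
 */
int btt_name_from_backing(const char *backing, char *out, size_t out_len)
{
	int n = snprintf(out, out_len, "%ss", backing);

	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Because the name is a pure function of the backing device name, it stays
exactly as stable across reboots as the backing device name itself.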

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-14 Thread Ming Lei

On Fri, May 15, 2015 at 8:34 AM, Rafael J. Wysocki  wrote:
> On Friday, May 15, 2015 09:54:26 AM Dave Chinner wrote:
>> On Thu, May 14, 2015 at 09:22:51AM +1000, NeilBrown wrote:
>> > On Mon, 11 May 2015 11:44:28 +1000 Dave Chinner wrote:
>> >
>> > > On Fri, May 08, 2015 at 03:08:43AM -0400, Len Brown wrote:
>> > > > From: Len Brown 
>> > > >
>> > > > Remove sys_sync() from the kernel's suspend flow.
>> > > >
>> > > > sys_sync() is extremely expensive in some configurations,
>> > > > and so the kernel should not force users to pay this cost
>> > > > on every suspend.
>> > >
>> > > Since when? Please explain what your use case is that makes this
>> > > so prohibitively expensive it needs to be removed.
>> > >
>> > > >
>> > > > The user-space utilities s2ram and s2disk choose to invoke sync() today.
>> > > > A user can invoke suspend directly via /sys/power/state to skip that cost.
>> > >
>> > > So, you want to have s2disk write all the dirty pages in memory to
>> > > the suspend image, rather than to the filesystem?
>> > >
>> > > Either way you have to write that dirty data to disk, but if you
>> > > write it to the suspend image, it then has to be loaded again on
>> > > resume, and then written again to the filesystem the system has
>> > > resumed. This doesn't seem very efficient to me
>> > >
>> > > And, quite frankly, machines fail to resume from suspend all the
>> > > time. e.g. run out of batteries when they are under s2ram
>> > > conditions, or s2disk fails because a kernel upgrade was done before
>> > > the s2disk and so can't be resumed. With your change, users lose all
>> > > the data that was buffered in memory before suspend, whereas right
>> > > now it is written to disk and so nothing is lost if the resume from
>> > > suspend fails for whatever reason.
>> > >
>> > > IOWs, I can see several good reasons why the sys_sync() needs to
>> > > remain in the suspend code. User data safety and filesystem
>> > > integrity is far, far more important than a couple of seconds
>> > > improvement in suspend speed
>> >
>> > To be honest, this sounds like superstition and fear, not science and fact.
>> >
>> > "filesystem integrity" is not an issue for the vast majority of filesystems
>> > which use journalling to ensure continued integrity even after a crash.  I
>> > think even XFS does that :-)
>>
>> It has nothing to do with journalling, and everything to do with
>> bringing filesystems to an *idle state* before suspend runs.  We have a
>> long history of bug reports with XFS that go: suspend, resume, XFS
>> almost immediately detects corruption, shuts down.
>>
>> The problem is that "sync" doesn't make the filesystem idle - XFS
>> has *lots* of background work going on, and if we aren't *real
>> careful* the filesystem is still doing work while the hardware gets
>> powered down and the suspend image is being taken. The result is on
>> resume that the on-disk filesystem state does not match the memory
>> image pulled back from resume, and we get shutdowns.
>>
>> sys_sync() does not guarantee a filesystem is idle - it guarantees
>> the data in memory is recoverable, but it doesn't stop the filesystem
>> from doing things like writing back metadata or running background
>> cleanup tasks. If those aren't stopped properly, then we get into
>> the state where in-memory and on-disk state get out of whack. And
>> s2ram can have these problems too, because if there is IO in flight
>> when the hardware is powered down, that IO is lost.
>>
>> Every time some piece of generic infrastructure changes behaviour
>> w.r.t. suspend/resume, we get a new set of problems being reported
>> by users. It's extremely hard to test for these problems and it
>> might take months of occasional corruption reports from a user to
>> isolate it to being a suspend/resume problem.  It's a game of
>> whack-a-mole, because quite often they come down to the fact that
>> something changed and nobody in the XFS world knew they had to now
>> set a different initialisation flag on some structure or workqueue
>> to make it work the way it needed to work.
>>
>> Go back and look at the history of sys_sync() in suspend discussions
>> over the past 10 years.  You'll find me saying exactly the same
>> thing again and again about sys_sync(): it does not guarantee the
>> filesystem is in an idle or coherent, unchanging state, and nothing
>> in the suspend code tells the filesystem to enter an idle or frozen
>> state. We actually have mechanisms for doing this - we use it in the
>> storage layers to idle the filesystem while we do things like *take
>> a snapshot*.
>>
>> What is the mechanism suspend to disk uses? It *takes a snapshot* of
>> system state, written to disk. It's supposed to be consistent, and
>> the only way you can guarantee the state of an active, mounted
>> filesystem has consistent in-memory state and on-disk state and
>> that it won't get changed is to *freeze the filesystem*.
>>
>> Removing the
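
The difference between "sync" and "freeze" discussed in the quoted text can
be illustrated from user space with the FIFREEZE/FITHAW ioctls from
<linux/fs.h>. This is only an analogue under stated assumptions: the thread
is about what the kernel suspend path should do, which this sketch does not
show, and the helper names sync_and_freeze_fs() and thaw_fs() are
hypothetical. Freezing requires CAP_SYS_ADMIN and a filesystem that supports
it.

```c
#include <fcntl.h>
#include <linux/fs.h>	/* FIFREEZE, FITHAW */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Flushes dirty data, then freezes the filesystem mounted at mountpoint
 * so that no further modifications reach the disk until it is thawed.
 * Returns an open fd to pass to thaw_fs(), or -1 on error.
 */
int sync_and_freeze_fs(const char *mountpoint)
{
	int fd = open(mountpoint, O_RDONLY);

	if (fd < 0)
		return -1;

	/* sync() alone makes data recoverable but does not idle the fs. */
	sync();

	if (ioctl(fd, FIFREEZE, 0) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Thaws a filesystem frozen by sync_and_freeze_fs() and closes the fd. */
int thaw_fs(int fd)
{
	int rc = ioctl(fd, FITHAW, 0);

	close(fd);
	return rc;
}
```

Never freeze the filesystem that holds the program doing the freezing
without a guaranteed thaw, or the system can become unusable.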

[PATCH] clk: Kconfig: Move bcm Kconfig into clk menu

2015-05-14 Thread Stephen Boyd

Having this Kconfig sourced outside the clk menu means the option
is under the "Device Drivers" menu instead of the "Common Clock
Framework" menu. Move it so that the bcm clock config options are
in the right place.

Cc: Alex Elder 
Signed-off-by: Stephen Boyd 
---
 drivers/clk/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 9897f353bf1a..67e3a84d2805 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -150,11 +150,11 @@ config COMMON_CLK_CDCE706
---help---
  This driver supports TI CDCE706 programmable 3-PLL clock synthesizer.
 
+source "drivers/clk/bcm/Kconfig"
 source "drivers/clk/qcom/Kconfig"
 
 endmenu
 
-source "drivers/clk/bcm/Kconfig"
 source "drivers/clk/mvebu/Kconfig"
 
 source "drivers/clk/samsung/Kconfig"
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 7/9] ARM: multi_v7_defconfig: Enable options for Exynos display support

2015-05-14 Thread Krzysztof Kozlowski

2015-05-15 0:40 GMT+09:00 Javier Martinez Canillas
:
> Many Exynos devices have devices attached to their display ports.
> This patch enables the needed Kconfig options to support different
> configuration such as simple panel, embedded DisplayPort (eDP) to
> LVDS bridges and HDMI displays.

Enabling the display would be nice, but for quite a long time we had
issues with DRM on Exynos. exynos_defconfig has it enabled and most
boards boot fine with it. The exception is Arndale 5250:
http://storage.kernelci.org/next/next-20150514/arm-exynos_defconfig/lab-khilman/boot-exynos5250-arndale.html
[1.630290] [drm:exynos_dp_bind] *ERROR* failed: of_get_videomode() : -22
[1.637071] exynos-drm exynos-drm: failed to bind
145b.dp-controller (ops exynos_dp_ops): -22
[1.646504] exynos-drm exynos-drm: master bind failed: -22
[1.651391] exynos-drm: probe of exynos-drm failed with error -22

Anyway it is not like I am against it... just wondering. On the other
hand enabling it could help in early detection of errors.

Best regards,
Krzysztof

Re: [PATCH 0/9] multi_v7_defconfig: Enable options for Exynos Chromebooks

2015-05-14 Thread Javier Martinez Canillas

Hello Krzysztof,

On Fri, May 15, 2015 at 2:18 AM, Krzysztof Kozlowski
 wrote:
> 2015-05-15 0:40 GMT+09:00 Javier Martinez Canillas
> :
>> Hello arm-soc maintainers,
>>
>> This series is an attempt to reduce the delta between exynos_defconfig
>> and multi_v7_defconfig. Primarily to enable the needed Kconfig symbols
>> to make all Exynos Chromebooks peripherals to be working when building
>> an image using the ARMv7 multi-platform default config.
>>
>> Since the policy is now to enable as much as possible, I did build
>> as a module all the Kconfig symbols that were tristate and only enable
>> as built-in those that can't be a module because are boolean options.
>>
>> A nice side effect of this series is that I found that many drivers
>> were not working properly when built as a module because the modalias
>> information was not filled properly or at all. I've posted patches to
>> fix the issues I found when testing this series.
>>
>> The patches have been tested on Exynos5250 Snow, Exynos5420 Peach
>> Pit and Exynos5800 Peach Pi Chromebooks, but most config options will
>> be useful for other Exynos5 or other Samsung SoCs.
>
> I think enabling these config options would help in using multi_v7 on
> Exynos boards. I have doubts only about patch 7 (DRM), so for the rest
> of them:
> Reviewed-by: Krzysztof Kozlowski 
>

Thanks a lot for your review. Do you think that there is something
wrong with patch 7 (DRM), or is your doubt that, since the Exynos DRM
driver has been so unstable in the past, enabling it could cause more
harm than good?

IMHO enabling the Exynos DRM options in exynos_defconfig was worth the
trouble since a lot of bugs were exposed (and fixed!) while before we
had unnoticed broken code lying around.

> Best regards,
> Krzysztof
> --

Best regards,
Javier

Re: btrfs balance 4.0 regression?

2015-05-14 Thread Omar Sandoval

On Fri, May 15, 2015 at 12:15:06AM +, Duncan wrote:
> Josh Boyer posted on Thu, 14 May 2015 08:43:25 -0400 as excerpted:
> 
> > Hi Omar and Chris,
> > 
> > We have a bug reported [1] against 4.0 saying that btrfs balance is
> > broken.  The reporter found a revert patch that Omar sent [2] to revert
> > commit 2f0810880.  Looking in Linus' latest tree, I don't see that
> > revert and I don't immediately see a patch to fix the issue Omar
> > reported either.
> > 
> > Do either of you know if this is still an issue?  If not, which commit
> > was it fixed by?
> > 
> > josh
> > 
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1217191
> > [2] https://patchwork.kernel.org/patch/6238111/
> 
> Still an issue, officially as of dev comments a day or two ago, at least.

Yup, Chris says he has a proper fix but it hasn't hit the list yet.

> From various comments including from Chris Mason directly, the devs are 
> aware of it, but (from a non-dev list-regular perspective) there's a 
> seeming reluctance to simply apply the revert patch.  Not being a dev I 
> can't explain why tho I can speculate that the patch is logically correct 
> and simply triggers this other bug.  But further patches have yet to 
> appear.
> 
> Part of the problem may be a bit of confusion as some of the devs 
> evidently thought the revert patch fixed the problem and hadn't been 
> worrying about it until others pointed out the revert hadn't been applied 
> and the problem thus remained.
> 
> So as of now, the choice appears to be broken balance-convert with the 
> current code, or broken ext*-convert with that patch reverted.  Both 
> cases aren't entirely common, so I guess it's up to you which you want to 
> break ATM.

Actually, ext4 convert is broken anyways (with irrelevant output
elided):

# mkfs.ext4 -F /dev/vdb
# btrfs-convert /dev/vdb
# mount /dev/vdb /mnt
# btrfs fi df /mnt
Data, single: total=2.64GiB, used=163.70MiB <- single
System, single: total=32.00MiB, used=16.00KiB   <- single
Metadata, single: total=1.33GiB, used=37.13MiB  <- single
GlobalReserve, single: total=16.00MiB, used=0.00B   <- single
# btrfs device add -f /dev/vdc /mnt
# btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
Done, had to relocate 9 out of 9 chunks
# btrfs fi df /mnt
Data, single: total=832.00MiB, used=200.55MiB   <- still single
System, single: total=32.00MiB, used=16.00KiB   <- still single
Metadata, single: total=256.00MiB, used=368.00KiB   <- still single
GlobalReserve, single: total=16.00MiB, used=0.00B   <- still single

So the balance succeeds unlike before the commit that caused the
regression, but the profile is still single, which defeats the purpose.

-- 
Omar

Re: [PATCH v4 2/5] arm64: hi6220: Document devicetree bindings for Hisilicon hi6220 SoC

2015-05-14 Thread Stephen Boyd

On 05/05, Bintian Wang wrote:
> This patch adds documentation for the devicetree bindings used by the
> DT files of Hisilicon hi6220 SoC mobile platform.
> 
> Signed-off-by: Bintian Wang 

Acked-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [RFC 08/23] net/xen-netback: Remove unused code in xenvif_rx_action

2015-05-14 Thread Wei Liu

On Thu, May 14, 2015 at 06:00:48PM +0100, Julien Grall wrote:
> The variables old_req_cons and ring_slots_used are assigned but never
> used since commit 1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b "xen-netback:
> always fully coalesce guest Rx packets".
> 
> Signed-off-by: Julien Grall 
> Cc: Ian Campbell 
> Cc: Wei Liu 
> Cc: net...@vger.kernel.org

Acked-by: Wei Liu 

> ---
>  drivers/net/xen-netback/netback.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 9c6a504..9ae1d43 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -515,14 +515,9 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
>  
>   while (xenvif_rx_ring_slots_available(queue, XEN_NETBK_RX_SLOTS_MAX)
>  && (skb = xenvif_rx_dequeue(queue)) != NULL) {
> - RING_IDX old_req_cons;
> - RING_IDX ring_slots_used;
> -
>   queue->last_rx_time = jiffies;
>  
> - old_req_cons = queue->rx.req_cons;
>   XENVIF_RX_CB(skb)->meta_slots_used = xenvif_gop_skb(skb, , 
> queue);
> - ring_slots_used = queue->rx.req_cons - old_req_cons;
>  
>   __skb_queue_tail(, skb);
>   }
> -- 
> 2.1.4

Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

2015-05-14 Thread Linus Torvalds

On Thu, May 14, 2015 at 4:36 PM, Al Viro  wrote:
> On Thu, May 14, 2015 at 04:24:13PM -0700, Linus Torvalds wrote:
>
>> So ASCII-only case-insensitivity is sufficient for you guys?
>>
>> Doing case-insensitive lookups at a vfs layer level wouldn't be
>> impossible (add some new lookup flag, so it would *not* be
>> per-filesystem, it would be per-operation!),
>
> ENOPARSE.  Either two names are equivalent or they are not; it's not a
> per-operation thing.  What do you mean?

We can easily make things per-operation, by adding another flag. We
already have per-operation flags like LOOKUP_FOLLOW, which decides if
we follow the last symlink or not. We could add a LOOKUP_ICASE, which
decides whether we compare case or not. Obviously, we'd have to add the
proper O_ICASE for open (and AT_ICASE for fstatat() and friends).
Exactly like we do for LOOKUP_FOLLOW.

HOWEVER.

The reason ASCII-only matters is two-fold:

 (a) hashing needs to work, and hash all equivalent names to the same
bucket. And we need to hash the same *regardless* of whether the
operation was done with ICASE or not.

 With ASCII, this is fairly easy: we could easily make the hashing
just mask bit 5 in each byte, and that wouldn't slow us down at all,
and it would hardly change the hash effectiveness either.

 In particular, with ASCII, we can trivially still do the
word-at-a-time hashing.  So there's fairly little downside.

 (b) The *compare* needs to work too. In particular, right now we very
much try to avoid comparing the names by checking both the full hash
and the name length. Again, that's fine with ASCII - two names that
differ in case are the same length.

 And again, we can still use the word-at-a-time compare, just have
a mask (and at compare time, we can make the mask depend on ICASE).
Sure, you'll still have to do a more careful compare (because
case-insensitivity is not *just* "same except for bit 5", even in
ASCII), but we can trivially have an ICASE test up front, and keep the
fast case exactly the same as before.
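
To make points (a) and (b) concrete, here is a minimal userspace sketch
in C. It is not the dcache code: icase_hash(), name_equal() and the
multiply-and-xor mix are hypothetical stand-ins chosen for illustration.
The hash always clears bit 5 of every byte, eight bytes at a time, so a
name lands in the same bucket whether or not the lookup asked for ICASE.
The compare uses a word mask that depends on the icase flag and then
does the more careful per-byte check only when something differed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CASE_BITS UINT64_C(0x2020202020202020)	/* bit 5 of each byte */

/* Hypothetical hash: case is always folded, for ICASE and exact lookups alike. */
static uint64_t icase_hash(const char *name, size_t len)
{
	uint64_t h = UINT64_C(0xcbf29ce484222325);	/* arbitrary seed (assumption) */

	while (len) {
		uint64_t w = 0;
		size_t n = len < 8 ? len : 8;

		memcpy(&w, name, n);			/* tail is zero-padded */
		h = (h ^ (w & ~CASE_BITS)) * UINT64_C(0x100000001b3);
		name += n;
		len -= n;
	}
	return h;
}

static bool is_ascii_lower(unsigned char c)
{
	return c >= 'a' && c <= 'z';
}

/* Both names have the same length; the caller has already checked that. */
static bool name_equal(const char *a, const char *b, size_t len, bool icase)
{
	uint64_t ignore = icase ? CASE_BITS : 0;	/* mask depends on ICASE */
	bool differs = false;

	for (size_t i = 0; i < len; i += 8) {
		uint64_t wa = 0, wb = 0;
		size_t n = len - i < 8 ? len - i : 8;

		memcpy(&wa, a + i, n);
		memcpy(&wb, b + i, n);
		if ((wa ^ wb) & ~ignore)
			return false;			/* differs outside bit 5 */
		differs |= (wa != wb);
	}
	if (!differs)
		return true;				/* fast case: exact match */

	/* Careful pass: a bit-5 difference only counts for letters. */
	for (size_t i = 0; i < len; i++) {
		unsigned char x = (unsigned char)a[i], y = (unsigned char)b[i];

		if (x != y && !is_ascii_lower(x | 0x20))
			return false;
	}
	return true;
}

/* Small usage demo. */
void icase_demo(void)
{
	assert(icase_hash("Makefile", 8) == icase_hash("mAKEFILE", 8));
	assert(name_equal("Makefile", "makefile", 8, true));
	assert(!name_equal("Makefile", "makefile", 8, false));
	/* '@' and '`' hash alike but must not compare equal. */
	assert(icase_hash("a@b", 3) == icase_hash("a`b", 3));
	assert(!name_equal("a@b", "a`b", 3, true));
}
```

The '@' versus '`' case is why the careful pass exists: the two bytes
differ only in bit 5, so they share a hash and pass the masked word
compare, but neither is a letter.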

Now, doing full UTF-8 is *much* harder. Part of it is that outside of
ASCII, you literally have cases that are ambiguous. Part of it is that
outside of ASCII, now the lengths aren't even guaranteed to match. And
part of it is that now you have to do things that are much more
complex than just masking bits in parallel for multiple bytes at the
same time (although you can still have a fast-path that depends on
just masking the high bit, to at least say "this is just the ASCII
subcase").
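
The high-bit fast path mentioned here could look roughly like the
following C sketch. is_all_ascii() is a hypothetical helper name, not an
existing kernel function. It tests bit 7 of eight bytes at once, since
every byte of a multi-byte UTF-8 sequence has that bit set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIGH_BITS UINT64_C(0x8080808080808080)	/* bit 7 of each byte */

/* Returns true when no byte has its high bit set, i.e. pure ASCII. */
static bool is_all_ascii(const char *s, size_t len)
{
	size_t i = 0;

	/* Whole words first: one AND covers eight bytes. */
	for (; i + 8 <= len; i += 8) {
		uint64_t w;

		memcpy(&w, s + i, 8);
		if (w & HIGH_BITS)
			return false;
	}
	/* Remaining tail, byte by byte. */
	for (; i < len; i++)
		if ((unsigned char)s[i] & 0x80)
			return false;
	return true;
}
```

A caller would use this only to pick between the cheap ASCII path and
whatever the slow path for full UTF-8 ends up being; it says nothing
about how that slow path should fold case.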

But doing ASCII ICASE compares wouldn't be that hard, and wouldn't
affect performance.

Btw, don't get me wrong. I'm not saying it's a great idea. I think
icase compares are stupid. Really really stupid. But samba might be
worth jumping through a few hoops for. The real problem is that even
with just ASCII, it does make it much easier to create nasty hash
collisions in the dentry hashes (same hash from 256 variations of
aAaAAaaA - just repeat the same letter in different variations of
lower/upper case).
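
The 256 figure is just 2^8: an eight-letter name has two case choices
per letter. A small C sketch that counts the colliding variants, under
the assumption of a toy byte-wise hash that clears bit 5 (folded_hash()
and count_colliding_variants() are made-up names, and this is not the
real dentry hash):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy case-folding hash (FNV-1a over bytes with bit 5 cleared). */
static uint32_t folded_hash(const char *s, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)s[i] & ~0x20u;
		h *= 16777619u;
	}
	return h;
}

/*
 * Enumerate every upper/lower-case spelling of base (assumed to be
 * ASCII letters only, at most 16 of them) and count how many hash to
 * the same value as base itself.
 */
static unsigned count_colliding_variants(const char *base, size_t len)
{
	char buf[16];
	unsigned hits = 0;
	uint32_t ref;

	if (len > sizeof(buf))
		return 0;
	ref = folded_hash(base, len);

	for (uint32_t bits = 0; bits < (UINT32_C(1) << len); bits++) {
		for (size_t i = 0; i < len; i++)
			buf[i] = (bits >> i) & 1 ? (char)(base[i] & ~0x20)
						 : (char)(base[i] | 0x20);
		if (folded_hash(buf, len) == ref)
			hits++;
	}
	return hits;
}
```

With this toy hash, count_colliding_variants("aaaaaaaa", 8) returns
256, and the count doubles with every extra letter in the name.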

So even plain ASCII icase has some real problems. But it's
conceptually not that hard. True UTF-8 icase? That's an absolute
*nightmare*, and causes serious problems. OS X got it very very wrong,
for example, by messing up the normalization.

 Linus
