Re: [GIT PULL] platform-drivers-x86 for 3.19

2015-01-15 Thread Andrew Lutomirski
On Jan 15, 2015 8:43 AM, "Kirill A. Shutemov"  wrote:
>
> On Tue, Jan 13, 2015 at 10:04:55AM -0800, Andrew Lutomirski wrote:
> > On Tue, Jan 13, 2015 at 9:56 AM, Darren Hart  wrote:
> > > On Mon, Jan 12, 2015 at 02:12:44PM -0800, Andrew Lutomirski wrote:
> > >> On Mon, Jan 12, 2015 at 12:30 PM, Kirill A. Shutemov
> > >>  wrote:
> > >> > On Mon, Jan 12, 2015 at 12:05:45PM -0800, Andrew Lutomirski wrote:
> > >> >> On Mon, Jan 12, 2015 at 12:03 PM, Kirill A. Shutemov
> > >> >>  wrote:
> > >> >> > On Mon, Jan 12, 2015 at 10:42:11AM -0800, Andrew Lutomirski wrote:
> > >> >> >> On Mon, Jan 12, 2015 at 10:38 AM, Darren Hart 
> > >> >> >>  wrote:
> > >> >> >> > On Mon, Jan 12, 2015 at 12:58:02AM +0200, Kirill A. Shutemov 
> > >> >> >> > wrote:
> > >> >> >> >> On Thu, Dec 18, 2014 at 09:51:27AM -0800, Darren Hart wrote:
> > >> >> >> >> > thinkpad-acpi using software mute simplifies the driver and the user
> > >> >> >> >> > experience significantly.
> > >> >> >> >>
> > >> >> >> >> Except when it doesn't.
> > >> >> >> >>
> > >> >> >> >> I'm probably in the minority, but I don't use fancy userspace to mess
> > >> >> >> >> with my mixer, and the mute button worked just fine for me before the
> > >> >> >> >> change. Wasting half an hour to find out what happened is not a pure
> > >> >> >> >> win from a user experience point of view.
> > >> >> >> >>
> > >> >> >> >> Is it really necessary to have software_mute_requested == true by
> > >> >> >> >> default? Can fancy userspace ask for the desired behaviour instead,
> > >> >> >> >> and the kernel be changed to not send hotkey change notifications
> > >> >> >> >> until software_mute is enabled?
> > >> >> >> >>
> > >> >> >> >> --
> > >> >> >> >>  Kirill A. Shutemov
> > >> >> >> >>
> > >> >> >> >
> > >> >> >> > Thanks for the report Kirill,
> > >> >> >> >
> > >> >> >> > Andy, we're at RC4, so if we need to fix (or revert) this fix, we only
> > >> >> >> > have a couple of weeks to do so.
> > >> >> >> >
> > >> >> >> > Kirill, to define the scope of the problem, if you specify
> > >> >> >> > software_mute_requested as false on the kernel command line, does your
> > >> >> >> > system function as expected?
> > >> >> >>
> > >> >> >> If I understood Kirill's email correctly, the only issue is that he
> > >> >> >> liked the old behavior.  Kirill, is that correct?
> > >> >> >
> > >> >> > Yes. For now I use thinkpad_acpi.software_mute=0 to get the old behaviour.
> > >> >> >
> > >> >>
> > >> >> What aspect of the old behavior is better than the new default behavior?
> > >> >
> > >> > It's not necessarily better. The new behavior just broke my use case.
> > >> >
> > >> > I don't have anything in my system that would react to KEY_MUTE, and I
> > >> > used the functionality the platform provides until it suddenly stopped
> > >> > working.
> > >> >
> > >> > I personally can live with the new thinkpad_acpi. I'll probably update my
> > >> > configuration to react to the software button.
> > >> >
> > >> > But who else will get frustrated after updating to v3.19?
> > >>
> > >> Can you try the equivalent of:
> > >>
> > >> bindsym XF86AudioMute exec amixer -q set Master toggle
> > >>
> > >> for your desktop environment?  Note that this is exactly what you'd do
> > >> if you were using any laptop that wasn't a ThinkPad.
> > >
> > > Unless we hear back from Kirill on why this isn't workable, I'm inclined to
> > > leave this as is. Between the kernel module option and this keybinding,
> > > equivalent behavior can be easily achieved, and the general use case is
> > > significantly improved.
> > >
> > > The current gap appears to be the mute LED in the button itself, which can
> > > be addressed moving forward.
> >
> > I should have clarified this better.  The command:
> >
> > amixer -q set Master toggle
> >
> > should toggle the mute LED in sync with the master mixer on all
> > ThinkPad models with a mute LED.  This works on my X220, and Borislav
> > confirmed off-list that it works on his X230.  If there's a ThinkPad
> > with a mute LED on which this doesn't work, then IMO it's a bug and
> > should be fixed.
>
> Okay. Mute and the mute LED work. Any idea how to get the mic mute LED to work?
> What happened to your patch on the topic?
>
> http://permalink.gmane.org/gmane.linux.drivers.platform.x86.devel/1962

It was addressed differently -- the mic mute works like the mute
button as a result of:

b317b032d2dc ALSA: hda - Split Thinkpad ACPI-related code
b67ae3f1c97e ALSA: hda - Enable Thinkpad mute/micmute LEDs for Realtek
08cf680ccafd ALSA: hda - add connection to thinkpad_acpi to control mute/micmute LEDs

On my X220, "amixer set Capture toggle" toggles the mute LED.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please 

Re: [RFC PATCH] [media] coda: Use S_PARM to set nominal framerate for h.264 encoder

2015-01-15 Thread Frédéric Sureau

Hi Philipp,

On 22/12/2014 17:00, Philipp Zabel wrote:

The encoder needs to know the nominal framerate for the constant bitrate
control mechanism to work. Currently the only way to set the framerate is
by using VIDIOC_S_PARM on the output queue.

Signed-off-by: Philipp Zabel
---
  drivers/media/platform/coda/coda-common.c | 29 +
  1 file changed, 29 insertions(+)

diff --git a/drivers/media/platform/coda/coda-common.c b/drivers/media/platform/coda/coda-common.c
index 39330a7..63eb510 100644
--- a/drivers/media/platform/coda/coda-common.c
+++ b/drivers/media/platform/coda/coda-common.c
@@ -803,6 +803,32 @@ static int coda_decoder_cmd(struct file *file, void *fh,
return 0;
  }
  
+static int coda_g_parm(struct file *file, void *fh, struct v4l2_streamparm *a)
+{
+   struct coda_ctx *ctx = fh_to_ctx(fh);
+
+   a->parm.output.timeperframe.denominator = 1;
+   a->parm.output.timeperframe.numerator = ctx->params.framerate;
+

Maybe a->parm.output.capability should be set to V4L2_CAP_TIMEPERFRAME here.
I think it is required by the GStreamer V4L2 plugin.

+   return 0;
+}
+
+static int coda_s_parm(struct file *file, void *fh, struct v4l2_streamparm *a)
+{
+   struct coda_ctx *ctx = fh_to_ctx(fh);
+
+   if (a->type == V4L2_BUF_TYPE_VIDEO_OUTPUT &&
+   a->parm.output.timeperframe.numerator != 0) {
+   ctx->params.framerate = a->parm.output.timeperframe.denominator
+ / a->parm.output.timeperframe.numerator;
+   }
+
+   a->parm.output.timeperframe.denominator = 1;
+   a->parm.output.timeperframe.numerator = ctx->params.framerate;
+
+   return 0;
+}
+
  static int coda_subscribe_event(struct v4l2_fh *fh,
const struct v4l2_event_subscription *sub)
  {
@@ -843,6 +869,9 @@ static const struct v4l2_ioctl_ops coda_ioctl_ops = {
.vidioc_try_decoder_cmd = coda_try_decoder_cmd,
.vidioc_decoder_cmd = coda_decoder_cmd,
  
+	.vidioc_g_parm		= coda_g_parm,
+	.vidioc_s_parm		= coda_s_parm,
+
.vidioc_subscribe_event = coda_subscribe_event,
.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
  };

Thanks for the patch!
Fred
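
V4L2 defines timeperframe as seconds per frame, so a nominal 30 fps stream is described as 1/30. As posted, both coda_g_parm() and coda_s_parm() write the value back as numerator = framerate, denominator = 1, which a V4L2 client would read as "framerate seconds per frame". A minimal sketch of the conversion in both directions, using a local stand-in for struct v4l2_fract; the struct and helper names are hypothetical, not coda or V4L2 API:

```c
/* Local stand-in for struct v4l2_fract (hypothetical): timeperframe is
 * expressed in seconds per frame as numerator / denominator. */
struct tpf_fract {
	unsigned int numerator;
	unsigned int denominator;
};

/* S_PARM direction (hypothetical helper): fps = denominator / numerator.
 * Returns -1 and leaves *fps untouched if either field is zero. */
static int tpf_to_fps(const struct tpf_fract *tpf, unsigned int *fps)
{
	if (tpf->numerator == 0 || tpf->denominator == 0)
		return -1;
	*fps = tpf->denominator / tpf->numerator; /* integer division truncates */
	return 0;
}

/* G_PARM direction (hypothetical helper): report 1/fps seconds per frame. */
static void fps_to_tpf(unsigned int fps, struct tpf_fract *tpf)
{
	tpf->numerator = 1;
	tpf->denominator = fps;
}
```

For instance, a request of 1/30 gives 30 fps and is reported back as 1/30, while 1001/30000 (about 29.97 fps) truncates to 29, so a driver that wants to preserve NTSC-style rates would need to keep the full fraction.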


Re: [PATCH v3 5/8] ARM: at91: move the restart function to the system timer driver

2015-01-15 Thread Alexandre Belloni
Hi,

On 15/01/2015 at 17:39:05 +0100, Daniel Lezcano wrote :
> On 01/12/2015 04:37 PM, Alexandre Belloni wrote:
> >Restarting on an at91rm9200 is handled by using the system timer. Move that
> >function to the system timer driver.
> >
> >Signed-off-by: Alexandre Belloni 
> >Acked-by: Boris Brezillon 
> >---
> >  arch/arm/mach-at91/at91rm9200.c  | 11 ---
> >  drivers/clocksource/timer-atmel-st.c | 12 
> >  2 files changed, 12 insertions(+), 11 deletions(-)
> >
> >diff --git a/arch/arm/mach-at91/at91rm9200.c b/arch/arm/mach-at91/at91rm9200.c
> >index b52916947535..4ea889cd6091 100644
> >--- a/arch/arm/mach-at91/at91rm9200.c
> >+++ b/arch/arm/mach-at91/at91rm9200.c
> >@@ -15,7 +15,6 @@
> >
> >  #include 
> >  #include 
> >-#include 
> >  #include 
> >
> >  #include "soc.h"
> >@@ -30,15 +29,6 @@ static void at91rm9200_idle(void)
> > at91_pmc_write(AT91_PMC_SCDR, AT91_PMC_PCK);
> >  }
> >
> >-static void at91rm9200_restart(enum reboot_mode reboot_mode, const char *cmd)
> >-{
> >-/*
> >- * Perform a hardware reset with the use of the Watchdog timer.
> >- */
> >-at91_st_write(AT91_ST_WDMR, AT91_ST_RSTEN | AT91_ST_EXTEN | 1);
> >-at91_st_write(AT91_ST_CR, AT91_ST_WDRST);
> >-}
> >-
> >  /* 
> >   *  AT91RM9200 processor initialization
> >   *  */
> >@@ -51,7 +41,6 @@ static void __init at91rm9200_map_io(void)
> >  static void __init at91rm9200_initialize(void)
> >  {
> > arm_pm_idle = at91rm9200_idle;
> >-arm_pm_restart = at91rm9200_restart;
> >  }
> >
> >
> >diff --git a/drivers/clocksource/timer-atmel-st.c b/drivers/clocksource/timer-atmel-st.c
> >index 51761f8927b7..522583d6cc78 100644
> >--- a/drivers/clocksource/timer-atmel-st.c
> >+++ b/drivers/clocksource/timer-atmel-st.c
> >@@ -29,6 +29,7 @@
> >  #include 
> >
> >  #include 
> >+#include 
> >
> >  #include 
> >  #include 
> >@@ -180,6 +181,15 @@ static struct clock_event_device clkevt = {
> > .set_mode   = clkevt32k_mode,
> >  };
> >
> >+static void at91rm9200_restart(enum reboot_mode reboot_mode, const char *cmd)
> >+{
> >+/*
> >+ * Perform a hardware reset with the use of the Watchdog timer.
> >+ */
> >+at91_st_write(AT91_ST_WDMR, AT91_ST_RSTEN | AT91_ST_EXTEN | 1);
> >+at91_st_write(AT91_ST_CR, AT91_ST_WDRST);
> >+}
> >+
> >  void __iomem *at91_st_base;
> >  EXPORT_SYMBOL_GPL(at91_st_base);
> >
> >@@ -248,4 +258,6 @@ void __init at91rm9200_timer_init(void)
> >
> > /* register clocksource */
> > clocksource_register_hz(, AT91_SLOW_CLOCK);
> >+
> >+arm_pm_restart = at91rm9200_restart;
> >  }
> 
> Mmh, I can't clearly explain why but I have a problem with that.
> 
> Can you explain why the restart code belongs in the clockevents driver?
> 
> 

That is a temporary location before getting rid of it by writing a
proper reset driver that will use the same syscon.

The main goal is to remove mach/at91_st.h and direct accesses to the
system timer registers.
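
The intended direction (a reset driver using the same syscon) can be pictured with a small userspace model: the timer driver installs a restart hook at init time, analogous to assigning arm_pm_restart, and the hook arms the watchdog through a register accessor instead of writing through a global base pointer. The register indices, bit values and every helper name below are placeholders for illustration, not the AT91 ST definitions or the regmap API:

```c
#include <stddef.h>

/* Placeholder register indices and bits (NOT the real AT91 ST values). */
enum { REG_CR = 0, REG_WDMR = 1, NUM_REGS = 2 };
#define BIT_WDRST 0x1u
#define BIT_RSTEN 0x2u
#define BIT_EXTEN 0x4u

/* Hypothetical accessor standing in for a syscon/regmap handle; it also
 * records which register each write hit, in order. */
struct fake_regmap {
	unsigned int regs[NUM_REGS];
	int write_order[8];
	size_t nwrites;
};

static void fake_regmap_write(struct fake_regmap *map, int reg, unsigned int val)
{
	map->regs[reg] = val;
	if (map->nwrites < sizeof(map->write_order) / sizeof(map->write_order[0]))
		map->write_order[map->nwrites++] = reg;
}

/* Platform restart hook, analogous to arm_pm_restart. */
static void (*restart_hook)(void);
static struct fake_regmap *st_map;

/* Program the watchdog mode (reset enables, counter value 1), then
 * trigger the watchdog reset through the control register. */
static void timer_restart(void)
{
	fake_regmap_write(st_map, REG_WDMR, BIT_RSTEN | BIT_EXTEN | 1);
	fake_regmap_write(st_map, REG_CR, BIT_WDRST);
}

/* Timer init installs the hook once it owns the register map. */
static void timer_init(struct fake_regmap *map)
{
	st_map = map;
	restart_hook = timer_restart;
}
```

Calling the installed hook performs the two writes in the same order as at91rm9200_restart(): the watchdog mode register first, then the control register.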


-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


Re: [PATCH v3 7/8] clocksource: atmel-st: use syscon/regmap

2015-01-15 Thread Alexandre Belloni
On 15/01/2015 at 17:40:37 +0100, Daniel Lezcano wrote :
> >  /*
> >@@ -234,13 +201,21 @@ err:
> >   */
> >  static void __init atmel_st_timer_init(struct device_node *node)
> >  {
> >-/* For device tree enabled device: initialize here */
> >-of_at91rm9200_st_init();
> >+unsigned int val;
> >+
> >+regmap_st = syscon_node_to_regmap(node);
> >+if (IS_ERR(regmap_st))
> >+panic(pr_fmt("Unable to get regmap\n"));
> 
> Is it possible to get rid of those panics? IIRC, we discussed not panicking
> when a timer was not initialized, in case there is a definition for another
> one.
> 

Ok, I'll remove those.
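
Daniel's point is that a failed timer init should be reported rather than fatal, so that another timer described for the platform can still be used. A minimal userspace sketch of that idea, assuming hypothetical init callbacks that return 0 or a negative errno and a caller that tries each declared timer in turn (this is not the kernel's clocksource probing code):

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical timer init callback: 0 on success, negative errno on failure. */
typedef int (*timer_init_fn)(void);

static int st_timer_init(void)
{
	/* e.g. the syscon regmap lookup failed: report it instead of panicking */
	fprintf(stderr, "atmel-st: unable to get regmap\n");
	return -ENODEV;
}

static int other_timer_init(void)
{
	return 0;
}

/* Try each declared timer in order; return the index of the first one that
 * initialises, or -1 if none does. */
static int probe_timers(const timer_init_fn *fns, int n)
{
	for (int i = 0; i < n; i++) {
		if (fns[i]() == 0)
			return i;
	}
	return -1;
}
```

If the first timer fails, the second is still initialised; only when every candidate fails does the caller end up without a timer, which is the one case it still has to handle specially.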

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


[GIT PULL] at91: drivers for 3.20 #1

2015-01-15 Thread Nicolas Ferre
Arnd, Olof, Kevin,

This is a pull-request about AT91 drivers for 3.20. We took the USB gadget part
with us as it depends on the Matrix syscon part. There is no dependency anyway.

Thanks, best regards,

The following changes since commit eaa27f34e91a14cdceed26ed6c6793ec1d186115:

  linux 3.19-rc4 (2015-01-11 12:44:53 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91.git tags/at91-drivers

for you to fetch changes up to a5514d142e7f5cff8e02a6fb4cfcb3e301c0eb59:

  usb: gadget: at91_udc: Allocate udc instance (2015-01-15 14:53:23 +0100)


First batch of drivers changes for 3.20:
- Internal AHB bus matrix (Matrix) and Static Memory Controller (SMC) are now
  mfd/syscon drivers.
- USB gadget full speed (at91_udc): fixes, simplification, multi-platform
  awareness and DT enhancement.


Boris Brezillon (12):
  mfd: syscon: Add atmel-matrix registers definition
  mfd: syscon: Add Atmel Matrix bus DT binding documentation
  mfd: syscon: Add atmel-smc registers definition
  mfd: syscon: Add Atmel SMC binding doc
  usb: gadget: at91_udc: Fix clock names
  usb: gadget: at91_udc: Drop uclk clock
  usb: gadget: at91_udc: Document DT clocks and clock-names property
  usb: gadget: at91_udc: Remove non-DT handling code
  usb: gadget: at91_udc: Simplify probe and remove functions
  usb: gadget: at91_udc: Rework for multi-platform kernel support
  usb: gadget: at91_udc: Update DT binding documentation
  usb: gadget: at91_udc: Allocate udc instance

 .../devicetree/bindings/mfd/atmel-matrix.txt   |  24 +
 .../devicetree/bindings/mfd/atmel-smc.txt  |  19 +
 .../devicetree/bindings/usb/atmel-usb.txt  |  10 +-
 drivers/usb/gadget/udc/Kconfig |   1 +
 drivers/usb/gadget/udc/at91_udc.c  | 525 +++--
 drivers/usb/gadget/udc/at91_udc.h  |   9 +-
 include/linux/mfd/syscon/atmel-matrix.h| 117 +
 include/linux/mfd/syscon/atmel-smc.h   | 173 +++
 8 files changed, 623 insertions(+), 255 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/mfd/atmel-matrix.txt
 create mode 100644 Documentation/devicetree/bindings/mfd/atmel-smc.txt
 create mode 100644 include/linux/mfd/syscon/atmel-matrix.h
 create mode 100644 include/linux/mfd/syscon/atmel-smc.h

-- 
Nicolas Ferre


Re: [PATCH v2 10/11] ASoC: tegra: Add a control for the headphone switch

2015-01-15 Thread Tomeu Vizoso
On 15 January 2015 at 17:20, Mark Brown  wrote:
> On Thu, Jan 15, 2015 at 05:12:22PM +0100, Tomeu Vizoso wrote:
>> To be used by userspace when the headphones jack is plugged in.
>
> I'm missing patches 1-9 of this series, what's going on there?

Sorry, no idea. They have reached lkml though:

https://lkml.org/lkml/2015/1/15/473

Regards,

Tomeu


Re: [PATCH 2/4] gpio: max732x: Rewrite IRQ code to use irq_domain API

2015-01-15 Thread Linus Walleij
On Tue, Jan 13, 2015 at 2:41 PM, Semen Protsenko
 wrote:

> Signed-off-by: Semen Protsenko 

This makes the code *so* much better so patch applied, naturally.

But...

>  config GPIO_MAX732X
> tristate "MAX7319, MAX7320-7327 I2C Port Expanders"
> depends on I2C
> +   select IRQ_DOMAIN

The most modern way would be to take another step and convert the
driver to GPIOLIB_IRQCHIP by stating

select GPIOLIB_IRQCHIP

here.

If you look at other drivers using GPIOLIB_IRQCHIP on an
i2c expander, say gpio-stmpe.c or gpio-tc3589x.c, you
can get a grip on how that works.

So please follow up with a patch converting the driver to
GPIOLIB_IRQCHIP on top of these patches :) if you
have time.

Yours,
Linus Walleij


Re: [PATCH] lowmemorykiller: Avoid excessive/redundant calling of LMK

2015-01-15 Thread Michal Hocko
On Mon 12-01-15 21:49:14, Chintan Pandya wrote:
> The global shrinker will invoke lowmem_shrink in a loop.
> The loop will be run (total_scan_pages/batch_size) times.
> The default batch_size is 128, which makes the shrinker
> invoke LMK hundreds of times. LMK does meaningful work
> only during the first 2-3 invocations, and the rest of the
> invocations are just wasted CPU cycles. Fix that by
> returning SHRINK_STOP to the shrinker when LMK doesn't
> find any more work to do. The deciding factor here is that
> no process is found in the selected LMK bucket, or memory
> conditions are sane.

The lowmemory killer is broken by design, and this is one of the examples
which shows why. It simply doesn't fit into the shrinker concept.

The count_objects callback simply lies and tells the core that all the
reclaimable LRU pages are scannable, and gives it this as a number which
the core uses for total_scan. The scan_objects callback then happily
ignores nr_to_reclaim and does its one-time job, where it iterates over
_all_ tasks, picks a victim and returns its rss as the return value. This
is just a subset of the LRU pages, of course, so the core continues
looping until total_scan finally goes down to 0.

If this really has to be a shrinker, shouldn't it evaluate the OOM
situation in the count callback and return non-zero only if OOM, and then
the scan callback would kill and return nr_to_reclaim?

Or, even better, why not use vmpressure to wake up a kernel module which
would simply check the situation and kill something?

Please do not put only cosmetic changes on top of a broken concept; try to
think about a proper solution. That is what staging is for, AFAIU.

The code has been in this state for quite some time, and I would really
hate it if it got merged just because it has been in staging for too long
and is used out there.
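
The loop described above can be modelled in a few lines of userspace C. This is a simplified sketch, not mm/vmscan.c: it assumes total_scan is taken straight from the count callback (the real core also scales it by reclaim pressure) and uses hypothetical names throughout; only SHRINK_STOP mirrors the kernel's definition. It compares a scan callback that returns 0 when nothing is left to kill, one that returns SHRINK_STOP as the patch does, and a count callback that reports work only under pressure, as suggested above:

```c
#define SHRINK_STOP (~0UL)	/* same value as the kernel's definition */

struct model_shrinker {
	unsigned long (*count)(struct model_shrinker *s);
	unsigned long (*scan)(struct model_shrinker *s, unsigned long nr_to_scan);
	unsigned long victims;	/* tasks the scan could still kill */
	unsigned long calls;	/* how many times scan was invoked */
	int under_pressure;	/* used by the count-gated variant */
};

/* Simplified core loop: consume total_scan batch_size at a time. */
static unsigned long model_shrink_slab(struct model_shrinker *s,
				       unsigned long batch_size)
{
	unsigned long total_scan = s->count(s);
	unsigned long freed = 0;

	while (total_scan >= batch_size) {
		unsigned long ret = s->scan(s, batch_size);

		if (ret == SHRINK_STOP)
			break;
		freed += ret;
		total_scan -= batch_size;
	}
	return freed;
}

/* LMK-style count: report all reclaimable LRU pages (100 batches here). */
static unsigned long lru_count(struct model_shrinker *s)
{
	(void)s;
	return 128UL * 100;
}

/* Count-gated variant: only report work when under pressure. */
static unsigned long gated_count(struct model_shrinker *s)
{
	return s->under_pressure ? 128UL * 100 : 0;
}

/* Old behaviour: return 0 when there is nobody left to kill. */
static unsigned long scan_return_zero(struct model_shrinker *s, unsigned long nr)
{
	(void)nr;
	s->calls++;
	if (s->victims == 0)
		return 0;
	s->victims--;
	return 64;	/* pretend the victim's rss was 64 pages */
}

/* Patched behaviour: return SHRINK_STOP when there is nobody left to kill. */
static unsigned long scan_stop(struct model_shrinker *s, unsigned long nr)
{
	(void)nr;
	s->calls++;
	if (s->victims == 0)
		return SHRINK_STOP;
	s->victims--;
	return 64;
}
```

With 100 batches of 128 and two possible victims, the return-0 variant is invoked 100 times, the SHRINK_STOP variant 3 times, and the count-gated variant not at all when there is no pressure.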

> Signed-off-by: Chintan Pandya 
> ---
>  drivers/staging/android/lowmemorykiller.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
> index b545d3d..5bf483f 100644
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ b/drivers/staging/android/lowmemorykiller.c
> @@ -110,7 +110,7 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
>   if (min_score_adj == OOM_SCORE_ADJ_MAX + 1) {
>   lowmem_print(5, "lowmem_scan %lu, %x, return 0\n",
>sc->nr_to_scan, sc->gfp_mask);
> - return 0;
> + return SHRINK_STOP;
>   }
>  
>   selected_oom_score_adj = min_score_adj;
> @@ -163,6 +163,9 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
>   set_tsk_thread_flag(selected, TIF_MEMDIE);
>   send_sig(SIGKILL, selected, 0);
>   rem += selected_tasksize;
> + } else {
> + rcu_read_unlock();
> + return SHRINK_STOP;
>   }
>  
>   lowmem_print(4, "lowmem_scan %lu, %x, return %lu\n",
> -- 
> Chintan Pandya
> 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 3/4] gpio: max732x: Fix possible deadlock

2015-01-15 Thread Linus Walleij
On Tue, Jan 13, 2015 at 2:41 PM, Semen Protsenko
 wrote:

> This patch was derived from the following one:
> "gpio: fix pca953x set_type 'scheduling while atomic' bug".
>
> After adding an entry that consumes a max732x GPIO as an interrupt line to
> the dts file, a deadlock appears somewhere in the max732x probe function.

Patch applied.

Yours,
Linus Walleij


Re: [PATCH -mm v2] vmscan: move reclaim_state handling to shrink_slab

2015-01-15 Thread Vladimir Davydov
On Thu, Jan 15, 2015 at 03:48:38PM +0100, Michal Hocko wrote:
> On Thu 15-01-15 16:25:16, Vladimir Davydov wrote:
> > memcg = mem_cgroup_iter(root, NULL, );
> > do {
> > [...]
> > if (memcg && is_classzone)
> > shrink_slab(sc->gfp_mask, zone_to_nid(zone),
> > memcg, sc->nr_scanned - scanned,
> > lru_pages);
> > 
> > /*
> >  * Direct reclaim and kswapd have to scan all memory
> >  * cgroups to fulfill the overall scan target for the
> >  * zone.
> >  *
> >  * Limit reclaim, on the other hand, only cares about
> >  * nr_to_reclaim pages to be reclaimed and it will
> >  * retry with decreasing priority if one round over the
> >  * whole hierarchy is not sufficient.
> >  */
> > if (!global_reclaim(sc) &&
> > sc->nr_reclaimed >= sc->nr_to_reclaim) {
> > mem_cgroup_iter_break(root, memcg);
> > break;
> > }
> > memcg = mem_cgroup_iter(root, memcg, );
> > } while (memcg);
> > 
> > 
> > If we can ignore reclaimed slab pages here (?), let's drop this patch.
> 
> I see what you are trying to achieve, but can this lead to serious
> over-reclaim?

I think it can, but only if we shrink an inode with lots of pages
attached to its address space (they also count toward reclaim_state). In
this case, we over-reclaim anyway, though.

I agree that this is a high risk for a vague benefit. Let's drop it
until we see this problem in real life.

Thanks,
Vladimir
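
The early exit in the loop quoted above can be reduced to a simplified userspace model: global reclaim walks every group, while limit reclaim stops once nr_reclaimed reaches nr_to_reclaim. Crediting slab pages per group (modelled here as the effect of moving reclaim_state handling into shrink_slab) therefore makes limit reclaim stop after fewer groups. All names and numbers are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-group result: LRU pages plus pages credited from slab
 * shrinking (what reclaim_state would report). */
struct group {
	unsigned long lru_reclaimable;
	unsigned long slab_credit;
};

struct scan_ctl {
	bool global;		/* direct reclaim/kswapd vs limit reclaim */
	bool count_slab;	/* credit slab pages to nr_reclaimed per group */
	unsigned long nr_to_reclaim;
	unsigned long nr_reclaimed;
};

/* Walk the groups like the quoted loop; return how many were visited. */
static size_t model_shrink_zone(const struct group *groups, size_t n,
				struct scan_ctl *sc)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		visited++;
		sc->nr_reclaimed += groups[i].lru_reclaimable;
		if (sc->count_slab)
			sc->nr_reclaimed += groups[i].slab_credit;

		/* Limit reclaim only needs nr_to_reclaim pages; global reclaim
		 * has to cover every group to meet the zone's scan target. */
		if (!sc->global && sc->nr_reclaimed >= sc->nr_to_reclaim)
			break;
	}
	return visited;
}
```

With five groups each giving 40 LRU pages and 30 slab pages and a target of 100, limit reclaim visits 3 groups without the slab credit and 2 with it; global reclaim visits all 5.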


Re: [GIT PULL] platform-drivers-x86 for 3.19

2015-01-15 Thread Kirill A. Shutemov
On Thu, Jan 15, 2015 at 09:00:44AM -0800, Andrew Lutomirski wrote:

Re: [patch 2/2] mm: memcontrol: default hierarchy interface for memory

2015-01-15 Thread Michal Hocko
On Wed 14-01-15 12:19:44, Johannes Weiner wrote:
> On Wed, Jan 14, 2015 at 04:34:25PM +0100, Michal Hocko wrote:
> > On Thu 08-01-15 23:15:04, Johannes Weiner wrote:
[...]
> > > @@ -2322,6 +2325,12 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
> > >   struct lruvec *lruvec;
> > >   int swappiness;
> > >  
> > > + if (mem_cgroup_low(root, memcg)) {
> > > + if (!sc->may_thrash)
> > > + continue;
> > > + mem_cgroup_events(memcg, MEMCG_LOW, 1);
> > > + }
> > > +
> > >   lruvec = mem_cgroup_zone_lruvec(zone, memcg);
> > >   swappiness = mem_cgroup_swappiness(memcg);
> > >  
> > > @@ -2343,8 +2352,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
> > >   mem_cgroup_iter_break(root, memcg);
> > >   break;
> > >   }
> > > - memcg = mem_cgroup_iter(root, memcg, );
> > > - } while (memcg);
> > > + } while ((memcg = mem_cgroup_iter(root, memcg, )));
> > 
> > I had similar code, but then I could trigger quick priority drops
> > during parallel reclaim with multiple low-limited groups. I've tried to
> > address that by retrying shrink_zone if it hasn't called shrink_lruvec
> > at all. It's still not ideal because it can theoretically livelock, but
> > I haven't seen that in my testing.
> 
> Do you remember the circumstances and the exact configuration?

Well, I was testing a heavy parallel memory-intensive load (a combination
of anon and file) in one memcg and many (hundreds of) idle memcgs to see
how much overhead memcg traversal would cost us. And I misconfigured it by
setting the idle memcgs' low limit to -1 instead of 0. There is nothing
running in them.
I noticed that more pages were reclaimed than expected and also a higher
runtime, which turned out to be related to longer stalls during reclaim
rather than the cost of the memcg reclaim iterator. Debugging showed that
many direct reclaimers were racing over low-limited groups and dropping to
lower priorities. The race window was apparently much, much smaller than a
no-op shrink_lruvec run.

So in a sense this was a misconfigured system, because I do not expect so
many low-limited groups in real life, but there was still something
reclaimable, so the machine wasn't really overcommitted. So this points to
an issue which might happen, albeit on a smaller scale, if there are many
groups, heavy reclaim, and some reclaimers unlucky enough to race and see
only low-limited groups.
 
> I tested this with around 30 containerized kernel build jobs whose low
> boundaries pretty much added up to the available physical memory and
> never observed this.  That being said, thrashing is an emergency path
> and users should really watch the memory.events low counter.  After
> all, if global reclaim frequently has to ignore the reserve settings,
> what's the point of having them in the first place?

Sure, over-committed low limit is a misconfiguration. But this is not
what happened in my testing.

> So while I see that this might burn some cpu cycles when the system is
> misconfigured, and that we could definitely be smarter about this, I'm
> not convinced we have to rush a workaround before moving ahead with
> this patch, especially not one that is prone to livelock the system.

OK, then do not merge it into the original patch, if for nothing else then
for bisectability. I will post a patch separately. I still think we should
consider how to address it sooner or later, because the result would be
nontrivial to debug.

-- 
Michal Hocko
SUSE Labs


[GIT PULL] at91: defconfig for 3.20 #1

2015-01-15 Thread Nicolas Ferre
Arnd, Olof, Kevin,

A little defconfig update. If you want me to collect more patches before
sending you a more substantial one, let me know.

Thanks, bye,

The following changes since commit eaa27f34e91a14cdceed26ed6c6793ec1d186115:

  linux 3.19-rc4 (2015-01-11 12:44:53 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91.git tags/at91-defconfig

for you to fetch changes up to a1b282911b47a522f150eb715fb037628c5173fe:

  ARM: at91: sama5: enable atmel-isi and ov2640 in defconfig (2015-01-15 16:28:23 +0100)


First batch of defconfig update for 3.20:
- DEBUG_LL is removed from sama5_defconfig because not all SoCs are compatible with it
- enable ISI and selected sensors


Josh Wu (1):
  ARM: at91: sama5: enable atmel-isi and ov2640 in defconfig

Maxime Ripard (1):
  ARM: at91/config: sama5: Remove DEBUG_LL

 arch/arm/configs/sama5_defconfig | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

-- 
Nicolas Ferre


Re: [PATCH v2 10/11] ASoC: tegra: Add a control for the headphone switch

2015-01-15 Thread Mark Brown
On Thu, Jan 15, 2015 at 06:02:51PM +0100, Tomeu Vizoso wrote:
> On 15 January 2015 at 17:20, Mark Brown  wrote:
> > On Thu, Jan 15, 2015 at 05:12:22PM +0100, Tomeu Vizoso wrote:
> >> To be used by userspace when the headphones jack is plugged in.

> > I'm missing patches 1-9 of this series, what's going on there?

> Sorry, no idea. They have reached lkml though:

> https://lkml.org/lkml/2015/1/15/473

What's going wrong there is that you didn't CC me on either them or the
cover letter - you should always make sure that everyone is at least
getting the cover letter so they know what's going on.  Never assume
anyone is going to see anything on the list.




Re: [PATCH v2 2/7] block: rewrite __bio_copy_iov()

2015-01-15 Thread Christoph Hellwig
> +/**
> + * __bio_copy_iov - copy all pages between bio and iov_iter
> + * @bio: The  bio which describes the I/O
> + * @iter: iov_iter either as source or destination
> + * @to_iov: whether to %READ (0) or %WRITE (1)
> + *
> + * Simple wrapper around __bio_copy_iov_{write,read}().
> + * Returns 0 on success, or the error returned as-is on failure.
> + */
> +static inline int __bio_copy_iov(struct bio *bio, struct iov_iter iter,
> +  int to_iov)
> +{
> +	if (to_iov == WRITE)
> +		return __bio_copy_iov_write(bio, iter);
> +	else
> +		return __bio_copy_iov_read(bio, iter);
>  }

No need to keep this wrapper..



Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-15 Thread Petr Cermak
On Thu, Jan 15, 2015 at 01:36:30AM +0200, Kirill A. Shutemov wrote:
> On Wed, Jan 14, 2015 at 03:22:25PM +, Petr Cermak wrote:
> > On Wed, Jan 07, 2015 at 07:24:52PM +0200, Kirill A. Shutemov wrote:
> > > And how it's not an ABI break?
> > I don't think this is an ABI break because the current behaviour is not
> > changed unless you write "5" to /proc/pid/clear_refs. If you do, you are
> > explicitly requesting the new functionality.
> 
> I'm not sure if it should be considered ABI break or not. Just asking.
> 
> I would like to hear opinion from other people.

We would like to get more feedback as well.

> > > We have never-lowering VmHWM for 9+ years. How can you know that nobody
> > > expects this behaviour?
> > This is why we sent an RFC [1] several weeks ago. We expect this to be
> > used mainly by performance-related tools (e.g. profilers) and from the
> > comments in the code [2] VmHWM seems to be a best-effort counter. If this
> > is strictly a no-go, I can only think of the following two alternatives:
> > 
> >   1. Add an extra resettable field to /proc/pid/status (e.g.
> >  resettable_hiwater_rss). While this doesn't violate the current
> >  definition of VmHWM, it adds an extra line to /proc/pid/status,
> >  which I think is a much bigger issue.
> 
> I don't think extra line is bigger issue. Sane applications would look for
> a key, not line number. We do add lines there. I've posted patch which
> adds one more just today ;)

In that case, should we add an extra field to /proc/pid/status?

> >   2. Introduce a new proc fs file to task_mmu (e.g.
> >  /proc/pid/profiler_stats), but this feels like overengineering.
> 
> BTW, we have memory.max_usage_in_bytes in memory cgroup, and it's resettable.
> Wouldn't it be enough for your profiling use-case?

This is a very interesting point, but it doesn't cover all use cases.
Our specific use case is memory profiling of the Chromium browser, but I
think that the same considerations below apply to any other use case
which involves child sub-processes:

  1. The process must be added to the control group explicitly by root.
 Hence, the Chromium process cannot do this itself.
  2. All forked children are implicitly grouped in the same control
 group. This is a problem because we want to be able to measure
 memory usage of the child processes separately.

The advantage of using clear_refs instead is that Chromium would be able
to profile its memory usage without any external intervention (as is
already the case with performance profiling).

> > > And why do you reset hiwater_rss, but not hiwater_vm?
> > This is a good point. Should we reset both using the same flag, or
> > introduce a new one ("6")?
> > 
> > [1] lkml.iu.edu/hypermail/linux/kernel/1412.1/01877.html
> > [2] task_mmu.c:32: "... such snapshots can always be inconsistent."

Petr


Re: [RFC PATCH v2 0/9] simplify block layer based on immutable biovecs

2015-01-15 Thread Christoph Hellwig
> - move a patch "btrfs: make use of immutable biovecs" to the upcoming series.

which upcoming series is that?



Re: [PATCH v3 5/8] ARM: at91: move the restart function to the system timer driver

2015-01-15 Thread Alexandre Belloni
On 15/01/2015 at 18:01:34 +0100, Alexandre Belloni wrote :
> > Mmh, I can't clearly explain why but I have a problem with that.
> > 
> > Can you explain why restart code falls in the clockevents driver ?
> > 
> > 
> 
> That is a temporary location before getting rid of it by writing a
> proper reset driver that will use the same syscon.
> 
> The main goal is to remove mach/at91_st.h and direct accesses to the
> system timer registers.
> 

Just to add that it was not out of laziness but to avoid
depending on another subsystem...

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


Re: [PATCH] Repost sched-rt: Reduce rq lock contention by eliminating locking of non-feasible target

2015-01-15 Thread Tim Chen
On Tue, 2015-01-06 at 20:30 +0100, Peter Zijlstra wrote:
> On Tue, Jan 06, 2015 at 11:01:51AM -0800, Tim Chen wrote:
> > Didn't get any response for this patch probably due to the holidays.
> > Reposting it as we would like to get it merged to help our database
> > workload.
> > 
> > This patch added checks that prevent futile attempts to move rt tasks
> > to cpu with active tasks of equal or higher priority.  This reduces
> > run queue lock contention and improves the performance of a well
> > known OLTP benchmark by 0.7%.
> 
> Don't immediately see anything wrong with this, Steve?

Steve, do you have any objections to this patch?  Thanks.

Tim




Re: [PATCH v2 00/11] Improvements to Tegra-based Chromebook support

2015-01-15 Thread Mark Brown
On Thu, Jan 15, 2015 at 05:12:12PM +0100, Tomeu Vizoso wrote:

Please fix your mailer to word wrap within paragraphs so your mails are
more legible, I've reflowed.

> this started as adding support for the Nyan Blaze, but the Big is so
> similar to it that I thought it would be better to have both in the
> same series.

This seems like a series of unrelated patch series rather than a single
one and so should probably have been sent as such, they're all
independent of each other.  I'd probably have sent each of the things
you list in paragraphs here separately.




Re: [PATCH 4/4] gpio: max732x: Add DT binding documentation

2015-01-15 Thread Linus Walleij
On Tue, Jan 13, 2015 at 2:41 PM, Semen Protsenko
 wrote:

> Add a devicetree binding documentation for the max732x driver.
>
> Signed-off-by: Semen Protsenko 

Vanilla bindings, OK. Patch applied.

Yours,
Linus Walleij


Re: [PATCH] gpio: gpio-dln2: Added a Blank line after declaration

2015-01-15 Thread Linus Walleij
On Tue, Jan 13, 2015 at 4:09 PM, Mohammad Jamal
 wrote:

> Fix the coding style issue by adding a blank line after declaration
>
> Signed-off-by: Mohammad Jamal 

Patch applied.

Yours,
Linus Walleij


Re: [PATCH v2 03/11] ARM: tegra: Set the sound card model that alsaucm expects

2015-01-15 Thread Stephen Warren

On 01/15/2015 09:12 AM, Tomeu Vizoso wrote:

Patches are on their way to add a config file to alsaucm for the Nyan
boards. Use the same card ID that alsaucm will expect.

Signed-off-by: Tomeu Vizoso 
---
  arch/arm/boot/dts/tegra124-nyan-big.dts | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/tegra124-nyan-big.dts b/arch/arm/boot/dts/tegra124-nyan-big.dts
index 43e58a4..9a9cffe 100644
--- a/arch/arm/boot/dts/tegra124-nyan-big.dts
+++ b/arch/arm/boot/dts/tegra124-nyan-big.dts
@@ -1976,9 +1976,9 @@
};

sound {
-   compatible = "nvidia,tegra-audio-max98090-nyan-big",
+   compatible = "nvidia,tegra-audio-max98090-nyan",
 "nvidia,tegra-audio-max98090";


If all the boards that are derived from Nyan truly have identical audio 
HW (or at least any differences can be described by this binding), then 
it seems fine to add "nvidia,tegra-audio-max98090-nyan" to the 
compatible value.


However, I don't see a reason to remove the board-specific compatible 
value "nvidia,tegra-audio-max98090-nyan-big"; we should always include 
all the values that are relevant.



-   nvidia,model = "Acer Chromebook 13";
+   nvidia,model = "GoogleNyan";


Why not just name the UCM config file after the ASoC card name that's 
already in use? Perhaps it's not likely to be unique enough though:-(



Re: [Linaro-acpi] [PATCH v5 18/18] Documentation: ACPI for ARM64

2015-01-15 Thread Al Stone
On 01/15/2015 09:52 AM, Arnd Bergmann wrote:
> On Thursday 15 January 2015 10:51:58 Jon Masters wrote:
>> On 01/15/2015 09:10 AM, Grant Likely wrote:
>>> On Tue, Jan 6, 2015 at 1:59 PM, Arnd Bergmann  wrote:
 For drivers merged upstream, I would insist that every driver merged
 for an ARM64 platform has a documented DT binding that is used in the
 driver.
>>>
>>> That's a dumb rule. It will result in untested DT code paths being
>>> thrown into drivers just too meet the rules rather than on whether or
>>> not they will actually be used. It's fine to allow driver authors to
>>> only implement the ACPI code path if that is what they are working
>>> with. We can *always* add a DT path to the driver when it is needed.
>>
>> It gets worse. There *will* be large numbers of ACPI only ARM servers
>> landing over the coming year. Not only would DT code be untested, but
>> insisting on keeping e.g. a DSDT and DT in sync is never going to work
>> anyway. Already we have early stage servers that contain a DT used for
>> bringup that is subsequently not being updated as often as the ACPI
>> tables (those systems are now booting exclusively in labs with ACPI).
>> Eventually, I am going to push for the DT data to be removed from these
>> systems rather than have out of date unmaintained DT data in firmware.
> 
> We will of course be able to relax the rule once ACPI has stabilized on
> ARM64. At the moment, we haven't even agreed on how to represent basic
> devices, so things are in flux and there is no way for a BIOS writer
> to ship an image that we will guarantee to support in the long run.
> 
> At some point after we are reasonably sure we are able to keep supporting
> all existing systems that are working with that kernel, we can take
> support for new systems without having DT by default, and also support
> booting those without acpi=force, which is related to this question.
> 
>   Arnd
> 

Can I restate the position as I hear it, then?  I want to make sure
I'm understanding what's being said.

What I'm reading seems to say: if an ARMv8 vendor wants Linux support
in the upstream kernel, regardless of whether or not it is a mobile or
server product, they must submit DT-based patches until such time as
ACPI on arm64 is deemed "mature."  Do I have that correct?

That implies to me that if I want to build an ACPI-only product, there
is no way to predict when or if I can get Linux support.  And, that if
I do want Linux support, and need ACPI for my end-users, I have to
maintain both sets of firmware for some unknown time into the future.
Is that what was meant?

I'm not really trying to judge the position right this second, but I
am trying to make sure I understand it.  English is not really the most
precise of languages and I would prefer not to misinterpret.

-- 
ciao,
al
---
Al Stone
Software Engineer
Linaro Enterprise Group
al.st...@linaro.org
---


Re: [RFC PATCH v2 4/9] rtc/ab3100: Update driver to address y2038/y2106 issues

2015-01-15 Thread Linus Walleij
On Tue, Jan 13, 2015 at 4:44 PM, Xunlei Pang  wrote:

> From: Xunlei Pang 
>
> This driver has a number of y2038/y2106 issues.
>
> This patch resolves them by:
> - Replace rtc_tm_to_time() with rtc_tm_to_time64()
> - Replace rtc_time_to_tm() with rtc_time64_to_tm()
> - Change ab3100_rtc_set_mmss() to use rtc_class_ops's set_mmss64()
>
> After this patch, the driver should not have any remaining
> y2038/y2106 issues.
>
> Signed-off-by: Xunlei Pang 

Acked-by: Linus Walleij 

Yours,
Linus Walleij


Re: [PATCH v2 02/11] ARM: tegra: Use the generated pinmux data

2015-01-15 Thread Stephen Warren

On 01/15/2015 09:12 AM, Tomeu Vizoso wrote:

Google has submitted a board config for the pinmux programming of the
Nyan Big board. Use the whole of it as it's generated to make it easier
to update as the configuration gets fixed in the future.


Submitted to where? I assume you mean tegra-pinmux-scripts. I don't see 
a Nyan/Nyan-big configuration there yet. IIRC (and I might not) one was 
posted, but there were some review comments that aren't yet addressed?



[PATCH] pseries/le: Fix another endiannes issue in RTAS call from xmon

2015-01-15 Thread Laurent Dufour
Commit 3b8a3c010969 ("powerpc/pseries: Fix endiannes issue in RTAS
call from xmon") fixed an endianness issue in the call made from
xmon to RTAS.

However, as Michael Ellerman noticed, this fix was incomplete: the
token value was not byte-swapped. This led to calling an unexpected,
and most of the time nonexistent, RTAS function, which is silently
ignored by RTAS.

This fix addresses this hole.

Reported-by: Michael Ellerman 
Cc: sta...@vger.kernel.org
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/xmon/xmon.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 5b150f0c5df9..13c6e200b24e 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -337,6 +337,7 @@ static inline void disable_surveillance(void)
args.token = rtas_token("set-indicator");
if (args.token == RTAS_UNKNOWN_SERVICE)
return;
+   args.token = cpu_to_be32(args.token);
args.nargs = cpu_to_be32(3);
args.nret = cpu_to_be32(1);
args.rets = [3];
-- 
1.9.1



Re: [PATCH v2] mm: vmscan: fix the page state calculation in too_many_isolated

2015-01-15 Thread Vinayak Menon

On 01/14/2015 10:20 PM, Michal Hocko wrote:

On Wed 14-01-15 17:06:59, Vinayak Menon wrote:
[...]

In one such instance, zone_page_state(zone, NR_ISOLATED_FILE)
had returned 14, zone_page_state(zone, NR_INACTIVE_FILE)
returned 92, and GFP_IOFS was set, and this resulted
in too_many_isolated returning true. But one of the CPUs'
pageset vm_stat_diff had NR_ISOLATED_FILE as "-14". So the
actual isolated count was zero. As there weren't any more
updates to NR_ISOLATED_FILE and vmstat_update deferred work
had not been scheduled yet, 7 tasks were spinning in the
congestion wait loop for around 4 seconds, in the direct
reclaim path.


Not syncing for such a long time doesn't sound right. I am not familiar
with the vmstat syncing but sysctl_stat_interval is HZ so it should
happen much more often than every 4 seconds.



Though the interval is HZ, since the vmstat_work is declared as a
deferrable work, IIUC the timer trigger can be deferred to the
next non-deferrable timer expiry on the CPU which is in idle. This
results in the vmstat syncing on an idle CPU being delayed by seconds.
Maybe in most cases this behavior is fine, except in cases like this.
Even in usual cases where the timer triggers in 1-2 secs, is it fine to
let the tasks in the reclaim path wait that long unnecessarily when
there isn't any real congestion?


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH V2 18/24] irqchip: mips-gic: Stop using per-platform mapping tables

2015-01-15 Thread James Hogan
On 15/01/15 16:58, Andrew Bresticker wrote:
> Hi James, Qais,
> 
> On Thu, Jan 15, 2015 at 8:36 AM, Qais Yousef  wrote:
>> On 01/15/2015 04:29 PM, James Hogan wrote:
>>>
>>> On 15/01/15 11:59, James Hogan wrote:

 Hi Andrew,

 On 18/09/14 22:47, Andrew Bresticker wrote:
>
> Now that the GIC properly uses IRQ domains, kill off the per-platform
> routing tables that were used to make the GIC appear transparent.
>
> This includes:
>   - removing the mapping tables and the support for applying them,
>   - moving GIC IPI support to the GIC driver,
>   - properly routing the i8259 through the GIC on Malta, and
>   - updating IRQ assignments on SEAD-3 when the GIC is present.
>
> Platforms no longer will pass an interrupt mapping table to gic_init.
> Instead, they will pass the CPU interrupt vector (2 - 7) that they
> expect the GIC to route interrupts to.  Note that in EIC mode this
> value is ignored and all GIC interrupts are routed to EIC vector 1.
>
> Signed-off-by: Andrew Bresticker 
> Acked-by: Jason Cooper 
> Reviewed-by: Qais Yousef 
> Tested-by: Qais Yousef 

 This commit (18743d2781d01d34d132f952a2e16353ccb4c3de) appears to break
 boot of interAptiv, dual core, dual vpe per core, on malta with
 malta_defconfig.

 It gets to here:
 ...
 CPU1 revision is: 0001a120 (MIPS interAptiv (multi))
 FPU revision is: 0173a000
 Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
 Primary data cache 64kB, 4-way, PIPT, no aliases, linesize 32 bytes
 MIPS secondary cache 1024kB, 8-way, linesize 32 bytes.
 Synchronize counters for CPU 1: done.
 Brought up 2 CPUs

 and then appears to just hang. Passing nosmp works around it, allowing
 it to get to userland.

 Is that a problem you've already come across?

 I'll keep debugging.
>>>
>>> Right, it appears the CPU IRQ line that the GIC is using doesn't get
>>> unmasked (STATUSF_IP2) when a new VPE is brought up, so only the first
>>> CPU will actually get any interrupts after your patch (including the
>>> rather critical IPIs), i.e. hacking it in vsmp_init_secondary() in
>>> smp-mt.c allows it to boot.
>>>
>>> Hmm, I'll have a think about what the most generic fix is, since
>>> arbitrary stuff may or may not have registered handlers for the raw CPU
>>> interrupts (timer, performance counter, gic etc)...
>>>
>>> Cheers
>>> James
>>>
>>
>> Is this similar to the issue addressed by this (ff1e29ade4c6 MIPS: smp-cps:
>> Enable all hardware interrupts on secondary CPUs)?
> 
> I believe so.
> 
> James, I think you can apply a similar patch to smp-mt.c:vsmp_init_secondary.

Yes, I've come to the same conclusion. smp-cmp.c is also affected. I'll
post patches tomorrow.

Thanks
James





Re: [PATCH v2 04/11] ARM: tegra: Set spi-max-frequency property to flash node

2015-01-15 Thread Stephen Warren

On 01/15/2015 09:12 AM, Tomeu Vizoso wrote:

To silence a warning on Nyan boards.

Signed-off-by: Tomeu Vizoso 
---
  arch/arm/boot/dts/tegra124-nyan-big.dts | 1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/tegra124-nyan-big.dts b/arch/arm/boot/dts/tegra124-nyan-big.dts
index 9a9cffe..94c7ba9 100644
--- a/arch/arm/boot/dts/tegra124-nyan-big.dts
+++ b/arch/arm/boot/dts/tegra124-nyan-big.dts
@@ -1660,6 +1660,7 @@

flash@0 {
compatible = "winbond,w25q32dw";
+   spi-max-frequency = <2500>;


This property already exists in the SPI controller. Isn't the max 
frequency supposed to inherit from there? If so, shouldn't the code not 
warn when such inheritance happens, i.e. it'd be better to fix the code?



Re: [PATCH cgroup/for-3.19-fixes] cgroup: implement cgroup_subsys->unbind() callback

2015-01-15 Thread Michal Hocko
On Sat 10-01-15 16:43:16, Tejun Heo wrote:
> Currently, if a hierarchy doesn't have any live children when it's
> unmounted, the hierarchy starts dying by killing its refcnt.  The
> expectation is that even if there are lingering dead children which
> are lingering due to remaining references, they'll be put in a finite
> amount of time.  When the children are finally released, the hierarchy
> is destroyed and all controllers bound to it also are released.
> 
> However, for memcg, the premise that the lingering refs will be put in
> a finite amount of time is not true.  In the absence of memory pressure,
> dead memcgs may hang around indefinitely, pinned by their pages.  This
> unfortunately may lead to an indefinite hang on the next mount attempt
> involving memcg as the mount logic waits for it to get released.
> 
> While we can change hierarchy destruction logic such that a hierarchy
> is only destroyed when it's not mounted anywhere and all its children,
> live or dead, are gone, this makes whether the hierarchy gets
> destroyed or not to be determined by factors opaque to userland.
> Userland may or may not get a new hierarchy on the next mount attempt.
> Worse, if it explicitly wants to create a new hierarchy with different
> options or controller compositions involving memcg, it will fail in an
> essentially arbitrary manner.
> 
> We want to guarantee that a hierarchy is destroyed once the
> conditions, unmounted and no visible children, are met.  To aid it,
> this patch introduces a new callback cgroup_subsys->unbind() which is
> invoked right before the hierarchy a subsystem is bound to starts
> dying.  memcg can implement this callback and initiate draining of
> remaining refs so that the hierarchy can eventually be released in a
> finite amount of time.
> 
> Signed-off-by: Tejun Heo 
> Cc: Li Zefan 
> Cc: Johannes Weiner 
> Cc: Michal Hocko 
> Cc: Vladimir Davydov 

Ohh, I have missed this one as I wasn't on the CC list.

FWIW this approach makes sense to me. I just think that we should have a
way to fail. E.g. kmem pages are impossible to reclaim because there
might be some objects lingering somewhere not bound to a task context
and reparenting is hard as Vladimir has pointed out several times
already.
Normal LRU pages should be reclaimable or reparented to the root easily.

I cannot judge the implementation but I agree with the fact that memcg
controller should be the one to take an action.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 06/11] ARM: tegra: Move out nyan-generic parts out from the nyan-big DT

2015-01-15 Thread Stephen Warren

On 01/15/2015 09:12 AM, Tomeu Vizoso wrote:

In preparation for adding the DT for the nyan-blaze board.


"git format-patch -C" might help here; hopefully it'd highlight that 
arch/arm/boot/dts/tegra124-nyan.dtsi was a copy from 
arch/arm/boot/dts/tegra124-nyan-big.dts, with just a few small diffs?



Re: [PATCH v4 4/5] ARM: mediatek: Add EINT support to MTK pinctrl driver.

2015-01-15 Thread Linus Walleij
On Wed, Jan 14, 2015 at 3:32 AM, Yingjoe Chen  wrote:

> Let me describe my problem more clearly. On our SoC, if a pin supports
> interrupts it will have 2 different numbers for it. For example, here's
> a partial list for the gpio and EINT number mappings on mt8135:
>
>   gpio  EINT
>      0    49
>      1    48
>    ...
>     36    97
>     37    19
>    ...
>
> To control interrupt related function, we'll need EINT number to locate
> corresponding register bits. When interrupt occurs, the interrupt
> handler will know which EINT interrupt occurs. In irq_chip functions,
> only .irq_request_resources and .irq_release_resources use gpio number
> to set pinmux to EINT mode and all the others need EINT number.
>
> Because EINT number is used more frequently in interrupt related
> functions, it makes sense to use EINT number as hwirq instead of gpio
> number. That means irq_domain will translate EINT number to virq.
> So what mtk_gpio_to_irq actually do is translate gpio number to EINT
> number and use irq domain to translate it to virq.

But the EINT is not a hardware number is it?

That sounds like an abuse of the irqdomain framework.

The purpose of that code is to map hardware IRQs to Linux
IRQs nothing else.

This sounds more like mapping a Linux IRQ (the EINT) to
another Linux IRQ (whatever the irqdomain allocates for
you).

Since the EINTs are already Linux IRQs, these should be used
directly.

I would solve this by just having some cross-reference table
for mapping GPIOs to EINTs without involving the irqdomain
at all.

struct eint_map {
int gpio_offset;
int eint;
};

But also check with the irqdomain people.

Yours,
Linus Walleij


Re: [PATCH v2 08/11] mmc: pwrseq_simple: Add support for a delay

2015-01-15 Thread Stephen Warren

On 01/15/2015 09:12 AM, Tomeu Vizoso wrote:

Signed-off-by: Tomeu Vizoso 


Some explanation of why such a delay might be useful would be ... useful!


Re: [PATCH v2 08/11] mmc: pwrseq_simple: Add support for a delay

2015-01-15 Thread Stephen Warren

On 01/15/2015 09:12 AM, Tomeu Vizoso wrote:

Signed-off-by: Tomeu Vizoso 


Ah, having read the explanation in the next patch, I think ...


diff --git a/Documentation/devicetree/bindings/mmc/mmc,pwrseq-simple.txt b/Documentation/devicetree/bindings/mmc/mmc,pwrseq-simple.txt



+- delay : delay to wait after driving the reset gpio active [ms].


... delay is the wrong name. reset-pulse-width or reset-pulse-length 
would be better. delay sounds like a delay after resetting the device 
before it can be accessed.



Re: [PATCH v2 09/11] ARM: tegra: Use pwrseq-simple for the wifi in Nyan

2015-01-15 Thread Stephen Warren

On 01/15/2015 09:12 AM, Tomeu Vizoso wrote:

The Nyan boards have a Marvell 88w8897 wifi card connected through SDIO
that needs the reset line to be held active for several milliseconds.



diff --git a/arch/arm/boot/dts/tegra124-nyan.dtsi b/arch/arm/boot/dts/tegra124-nyan.dtsi



+   sdhci0_pwrseq: sdhci0_pwrseq {
+   compatible = "mmc,pwrseq-simple";
+
+reset-gpios = < TEGRA_GPIO(X, 7) GPIO_ACTIVE_LOW>;
+delay = <20>; /* ms */


Some of these new lines are indented with spaces not TABs.



Re: [RFC PATCH 0/3] Introduce IIO interface for fingerprint sensors

2015-01-15 Thread Sylwester Nawrocki
On 14/01/15 18:14, Baluta, Teodora wrote:
> On Vi, 2014-12-26 at 11:13 +, Jonathan Cameron wrote:
>> On 18/12/14 16:51, Lars-Peter Clausen wrote:
>>> Adding V4L folks to Cc for more input.
>>
>> Thanks Lars - we definitely would need the v4l guys to agree to a driver like
>> this going in IIO. (not that I'm convinced it should!)
>>
>>> On 12/08/2014 03:10 PM, Baluta, Teodora wrote:
 Hello,

 On Vi, 2014-12-05 at 02:15 +, Jonathan Cameron wrote:
> On 04/12/14 13:00, Teodora Baluta wrote:
>> This patchset adds support for fingerprint sensors through the IIO interface.
>> This way userspace applications collect information in a uniform way. All
>> processing would be done in the upper layers as suggested in [0].
>>
>> In order to test out this proposal, a minimal implementation for UPEK's
>> TouchChip Fingerprint Sensor via USB is also available. Although there is an
>> existing implementation in userspace for USB fingerprint devices, including this
>> particular device, the driver represents a proof of concept of how fingerprint
>> sensors could be integrated in the IIO framework regardless of the used bus. For
>> lower power requirements, the SPI bus is preferred and a kernel driver
>> implementation makes more sense.
>
> So why not v4l?  These are effectively image sensors..

 Well, here's why I don't think v4l would be the best option:

 - an image scanner could be implemented in the v4l subsystem, but it
 seems far more complicated for a simple fingerprint scanner - it usually
 has drivers for webcams, TVs or video streaming devices. The v4l
 subsystem (with all its support for colorspace, decoders, image
 compression, frame control) seems a bit of an overkill for a very
 straightforward fingerprint imaging sensor.
>
>> Whilst those are there, I would doubt the irrelevant bits would put much
>> burden on a fingerprint scanning driver.  Been a while since I did
>> anything in that area though so I could be wrong!

IMO V4L is a much better fit than IIO for this kind of device. You can use
just a subset of the API; it shouldn't take much effort to write a simple
v4l2 capture driver supporting a fixed (probably vendor/chip-specific) image
format.  I wonder whether it would be better to use the v4l2 controls [1],
defining a new v4l2 control class for the fingerprint scanner processing
features, rather than passing raw data to user space and interpreting it
in some library.  I know there has been resistance to allowing unknown
binary blobs to be passed to user space, due to possible abuses.

[1] Documentation/video4linux/v4l2-controls.txt
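For a fixed-format sensor, the frame-geometry arithmetic that both a simple V4L2 capture driver and the IIO proposal (height, width and bit depth attributes) rely on is small. A minimal sketch, assuming each row is padded to a whole byte; the struct and function names are hypothetical and are neither V4L2 nor IIO API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame description for a fixed-format fingerprint sensor. */
struct fp_frame_geometry {
	uint32_t width;     /* pixels per row */
	uint32_t height;    /* number of rows */
	uint32_t bit_depth; /* bits per pixel, e.g. 8 for greyscale */
};

/* Bytes per row, assuming each row is padded up to a whole byte. */
static size_t fp_bytes_per_line(const struct fp_frame_geometry *g)
{
	return ((size_t)g->width * g->bit_depth + 7) / 8;
}

/* Total buffer size user space should expect for one frame. */
static size_t fp_image_size(const struct fp_frame_geometry *g)
{
	return fp_bytes_per_line(g) * g->height;
}
```

A capture driver would report these two numbers once; user space then reads exactly that many bytes per frame.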

>>>> - a fingerprint device could also send out processed information, not
>>>> just the image of a fingerprint. This means that the processing is done
>>>> in hardware: the UPEK TouchStrip chipset in libfprint has this behavior
>>>> (see [0]). So, the IIO framework would support a uniform way of handling
>>>> fingerprint devices that do processing either in software or in
>>>> hardware.

You can use the v4l2 controls API for that, which also supports events.
The controls could be made read-only.
It would be interesting to list what kinds of features these could be.

>> This is more interesting, but does that map well to IIO style
>> channels anyway?  If not we are going to end up with a whole new
>> interface which ever subsystem is used for the image side of things.

>>>> The way I see it now, for processed fingerprint information, an IIO
>>>> device could have an IIO_FINGERPRINT channel with a modifier and only
>>>> the sensitivity threshold attribute set. We would also need two
>>>> triggers: one for enrollment and one for the verification mode to
>>>> control the device from a userspace application.

This could all be handled well with v4l2 controls; for instance, see
which features are available in the Camera Flash controls subset:

http://linuxtv.org/downloads/v4l-dvb-apis/extended-controls.html#flash-controls

>> Sure - what you proposed would work.  The question is whether it is
>> the best way to do it.
> 
> Any thoughts on this from the v4l community?

I would try it with V4L2; it seems to me the most suitable subsystem for
such devices.  The question is what ends up in the kernel and what in user
space.  In any case, IMO the V4L2 API is quite flexible in that regard, due
to the wide range of devices it needs to cover.

>>>> [0] http://www.freedesktop.org/wiki/Software/fprint/libfprint/upekts/

>>>>>> A sysfs trigger is enabled and the device starts scanning. As soon as an
>>>>>> image is available it is written to the character device /dev/iio:deviceX.
>>>>>>
>>>>>> Userspace applications will be able to calculate the expected image size
>>>>>> using the fingerprint attributes height, width and bit depth. Other
>>>>>> attributes introduced for the fingerprint channel in IIO represent information
kmsg: lseek errors confuse glibc's dprintf

2015-01-15 Thread Mike Crowe
glibc's dprintf implementation does not work correctly with /dev/kmsg file
descriptors, because glibc treats the EBADF and EINVAL that lseek returns,
when it tries to determine the current file position, as errors. See
https://sourceware.org/bugzilla/show_bug.cgi?id=17830

From what I can tell, prior to Kay Sievers' printk record commit
e11fea92e13fb91c50bacca799a6131c81929986, calling lseek(fd, 0, SEEK_CUR)
with such a file descriptor would not return an error.

Prior to Kay's change, Arnd Bergmann's commit
6038f373a3dc1f1c26496e60b6c40b164716f07e seemed to go to some lengths to
preserve the successful return code rather than returning (the perhaps more
logical) -ESPIPE.

glibc is happy with either a successful return or -ESPIPE.
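The two outcomes described as acceptable can be reproduced from user space: a pipe is a descriptor on which lseek() legitimately fails with ESPIPE. A small sketch; the acceptance rule in seek_cur_result_is_tolerated() is taken from this mail, not from glibc's source, and both function names are hypothetical:

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Classify an lseek(fd, 0, SEEK_CUR) outcome the way this mail describes
 * glibc's dprintf as tolerating it: success, or failure with ESPIPE.
 * Any other errno (e.g. EBADF, EINVAL) counts as a hard error.
 */
static bool seek_cur_result_is_tolerated(off_t ret, int err)
{
	return ret >= 0 || err == ESPIPE;
}

/* Probe a descriptor; true if its SEEK_CUR outcome would be tolerated. */
static bool probe_seek_cur(int fd)
{
	errno = 0;
	off_t ret = lseek(fd, 0, SEEK_CUR);

	return seek_cur_result_is_tolerated(ret, errno);
}
```

On a pipe the probe sees -1/ESPIPE and passes; on an invalid descriptor it sees EBADF and fails, which is the kind of result /dev/kmsg currently produces.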

For maximum compatibility it seems that success should be returned, but
given Kay's new seek interface perhaps this isn't helpful.

This patch ensures that such a seek succeeds:

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 02d6b6d..b3ff6f0 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -693,7 +693,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 	loff_t ret = 0;
 
 	if (!user)
-		return -EBADF;
+		return (whence == SEEK_CUR) ? 0 : -EBADF;
 	if (offset)
 		return -ESPIPE;
 
@@ -718,6 +718,11 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 		user->idx = log_next_idx;
 		user->seq = log_next_seq;
 		break;
+	case SEEK_CUR:
+		/* For compatibility with userspace requesting the
+		 * current file position. */
+		ret = 0;
+		break;
 	default:
 		ret = -EINVAL;
 	}

(although it could be argued that the !user case should return -ESPIPE
rather than -EBADF, since the file descriptor _is_ valid.)

and this patch causes a failure that glibc is prepared to accept:

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 02d6b6d..f6b0c93 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -693,7 +693,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 	loff_t ret = 0;
 
 	if (!user)
-		return -EBADF;
+		return -ESPIPE;
 	if (offset)
 		return -ESPIPE;
 
@@ -718,6 +718,11 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 		user->idx = log_next_idx;
 		user->seq = log_next_seq;
 		break;
+	case SEEK_CUR:
+		/* For compatibility with userspace expecting SEEK_CUR
+		 * to not yield EINVAL. */
+		ret = -ESPIPE;
+		break;
 	default:
 		ret = -EINVAL;
 	}

Either patch makes dprintf work, but is either one the right solution?
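To compare the two variants side by side, here is a hypothetical user-space model of only the return-value logic of devkmsg_llseek() after each patch. It ignores the reader repositioning and locking, omits SEEK_DATA, and uses have_user to stand for the !user case (as far as I know, write-only opens of /dev/kmsg get no reader state, which is that case):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h> /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Which of the two proposed patches the model follows. */
enum kmsg_variant { KMSG_CUR_SUCCEEDS, KMSG_CUR_ESPIPE };

static long model_devkmsg_llseek(enum kmsg_variant v, bool have_user,
				 long offset, int whence)
{
	if (!have_user) {
		if (v == KMSG_CUR_SUCCEEDS)
			return whence == SEEK_CUR ? 0 : -EBADF;
		return -ESPIPE;
	}
	if (offset)
		return -ESPIPE;

	switch (whence) {
	case SEEK_SET:
	case SEEK_END:
		return 0; /* the real code repositions the reader here */
	case SEEK_CUR:
		return v == KMSG_CUR_SUCCEEDS ? 0 : -ESPIPE;
	default:
		return -EINVAL;
	}
}

/* The rule this mail attributes to glibc: success or -ESPIPE is fine. */
static bool glibc_tolerates(long ret)
{
	return ret >= 0 || ret == -ESPIPE;
}
```

Both variants give SEEK_CUR a result that glibc_tolerates() accepts, for readers and write-only openers alike; they differ only in whether a position of 0 or -ESPIPE is reported.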

Thanks.

Mike.


Re: [PATCH v7 0/7] Support Write-Through mapping on x86

2015-01-15 Thread Toshi Kani
Hi Ingo, Peter, Thomas,

Is there anything else I need to do to get this patchset accepted?

Thanks,
-Toshi


On Tue, 2015-01-06 at 13:49 -0700, Toshi Kani wrote:
> This patchset adds support for Write-Through (WT) mapping on x86.
> The study below shows that using WT mapping may be useful for
> non-volatile memory.
> 
>   http://www.hpl.hp.com/techreports/2012/HPL-2012-236.pdf
> 
> All new/modified interfaces have been tested.
> 
> v7:
>  - Rebased to 3.19-rc3 now that Juergen's patchset for the PAT management
>    has been accepted.
>
> v6:
>  - Dropped the patch moving [set|get]_page_memtype() to pat.c
>    since the tip branch already has this change.
>  - Fixed an issue when CONFIG_X86_PAT is not defined.
>
> v5:
>  - Clarified the comment on why slot 7 is used. (Andy Lutomirski,
>    Thomas Gleixner)
>  - Moved [set|get]_page_memtype() to pat.c. (Thomas Gleixner)
>  - Removed BUG() from set_page_memtype(). (Thomas Gleixner)
>
> v4:
>  - Added set_memory_wt() by adding WT support for regular memory.
>
> v3:
>  - Dropped the set_memory_wt() patch. (Andy Lutomirski)
>  - Refactored the !pat_enabled handling. (H. Peter Anvin,
>    Andy Lutomirski)
>  - Added the picture of PTE encoding. (Konrad Rzeszutek Wilk)
>
> v2:
>  - Changed WT to use slot 7 of the PAT MSR. (H. Peter Anvin,
>    Andy Lutomirski)
>  - Changed to have conservative checks to exclude all Pentium 2, 3,
>    M, and 4 families. (Ingo Molnar, Henrique de Moraes Holschuh,
>    Andy Lutomirski)
>  - Updated documentation to cover WT interfaces and usages.
>    (Andy Lutomirski, Yigal Korman)
> ---
> Toshi Kani (7):
>   1/7 x86, mm, pat: Set WT to PA7 slot of PAT MSR
>   2/7 x86, mm, pat: Change reserve_memtype() to handle WT
>   3/7 x86, mm, asm-gen: Add ioremap_wt() for WT
>   4/7 x86, mm, pat: Add pgprot_writethrough() for WT
>   5/7 x86, mm, pat: Refactor !pat_enable handling
>   6/7 x86, mm, asm: Add WT support to set_page_memtype()
>   7/7 x86, mm: Add set_memory_wt() for WT
> 
> ---
>  Documentation/x86/pat.txt            |  13 ++-
>  arch/x86/include/asm/cacheflush.h    |   6 +-
>  arch/x86/include/asm/io.h            |   2 +
>  arch/x86/include/asm/pgtable_types.h |   3 +
>  arch/x86/mm/init.c                   |   6 +-
>  arch/x86/mm/iomap_32.c               |  12 +--
>  arch/x86/mm/ioremap.c                |  26 -
>  arch/x86/mm/pageattr.c               |  61 ++--
>  arch/x86/mm/pat.c                    | 184 ---
>  include/asm-generic/io.h             |   9 ++
>  include/asm-generic/iomap.h          |   4 +
>  include/asm-generic/pgtable.h        |   4 +
>  12 files changed, 244 insertions(+), 86 deletions(-)
> 
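For readers unfamiliar with the PAT MSR: it holds eight one-byte entries (PA0..PA7), and a 4K PTE's PAT, PCD and PWT bits select one of them as 4*PAT + 2*PCD + PWT. The sketch below composes the MSR value with WT in PA7, as patch 1/7 describes; the PA0..PA6 layout is an assumption based on Linux's setup around this release, and the helper names are hypothetical:

```c
#include <stdint.h>

/* x86 PAT memory-type encodings. */
enum pat_type {
	PAT_UC       = 0x00, /* uncacheable */
	PAT_WC       = 0x01, /* write-combining */
	PAT_WT       = 0x04, /* write-through */
	PAT_WP       = 0x05, /* write-protected */
	PAT_WB       = 0x06, /* write-back */
	PAT_UC_MINUS = 0x07, /* UC-, may be overridden by an MTRR WC range */
};

/* Each PAT entry PA0..PA7 occupies one byte of the 64-bit MSR. */
static uint64_t pat_entry(unsigned int slot, enum pat_type type)
{
	return (uint64_t)type << (slot * 8);
}

/* Assumed layout: WB, WC, UC-, UC repeated, with PA7 switched to WT. */
static uint64_t pat_msr_with_wt(void)
{
	return pat_entry(0, PAT_WB) | pat_entry(1, PAT_WC) |
	       pat_entry(2, PAT_UC_MINUS) | pat_entry(3, PAT_UC) |
	       pat_entry(4, PAT_WB) | pat_entry(5, PAT_WC) |
	       pat_entry(6, PAT_UC_MINUS) | pat_entry(7, PAT_WT);
}

/* PAT slot selected by a 4K PTE's PAT, PCD and PWT bits. */
static unsigned int pat_slot(int pat_bit, int pcd, int pwt)
{
	return (unsigned int)((pat_bit << 2) | (pcd << 1) | pwt);
}
```

Under this layout the MSR value is 0x0407010600070106, and a PTE with PAT, PCD and PWT all set maps to WT.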
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .




[PATCH] Smack: Don't build IPv6 stuff when CONFIG_IPV6=n

2015-01-15 Thread Rafał Krypa
From: Rafal Krypa 

When IPv6 is disabled, this fixes a build break in one place and removes
unused code in several other places.
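The guard this patch switches to, IS_ENABLED(CONFIG_IPV6), is true whether IPv6 is built in (CONFIG_IPV6=1) or modular (CONFIG_IPV6_MODULE=1), unlike a bare #ifdef CONFIG_IPV6. Below is a stand-alone copy of the macro trick from include/linux/kconfig.h of roughly this era (later kernels restructured it), with a made-up configuration in which IPv6 is a module; the two flag variables are hypothetical:

```c
/*
 * Copy of the kconfig.h trick: __ARG_PLACEHOLDER_1 expands to "0,", which
 * shifts the "1" into the 'val' position only when the option is defined
 * as 1; otherwise the junk token is swallowed and 'val' is 0.
 */
#define __ARG_PLACEHOLDER_1 0,
#define config_enabled(cfg) _config_enabled(cfg)
#define _config_enabled(value) __config_enabled(__ARG_PLACEHOLDER_##value)
#define __config_enabled(arg1_or_junk) ___config_enabled(arg1_or_junk 1, 0)
#define ___config_enabled(__ignored, val, ...) val
#define IS_ENABLED(option) \
	(config_enabled(option) || config_enabled(option##_MODULE))

/* Pretend configuration: IPv6 as a module, Smack netfilter disabled. */
#define CONFIG_IPV6_MODULE 1

#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
static const int smack_ipv6_port_code_built = 1;
#else
static const int smack_ipv6_port_code_built = 0;
#endif

#ifdef CONFIG_IPV6
static const int ifdef_sees_ipv6 = 1;
#else
static const int ifdef_sees_ipv6 = 0; /* misses the modular case */
#endif
```

The same macro also works in ordinary C expressions, where it evaluates to 0 or 1.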

Signed-off-by: Rafal Krypa 
---
 security/smack/smack_lsm.c | 46 ++
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index a688f7b..6fe7c6e 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -52,9 +52,9 @@
 #define SMK_RECEIVING  1
 #define SMK_SENDING2
 
-#ifndef CONFIG_SECURITY_SMACK_NETFILTER
+#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
 LIST_HEAD(smk_ipv6_port_list);
-#endif
+#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
 static struct kmem_cache *smack_inode_cache;
 int smack_enabled;
 
@@ -2218,7 +2218,7 @@ static int smack_netlabel_send(struct sock *sk, struct sockaddr_in *sap)
return smack_netlabel(sk, sk_lbl);
 }
 
-#ifndef CONFIG_SECURITY_SMACK_NETFILTER
+#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
 /**
  * smk_ipv6_port_label - Smack port access table management
  * @sock: socket
@@ -2368,7 +2368,7 @@ auditout:
rc = smk_bu_note("IPv6 port check", skp, object, MAY_WRITE, rc);
return rc;
 }
-#endif /* !CONFIG_SECURITY_SMACK_NETFILTER */
+#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
 
 /**
  * smack_inode_setsecurity - set smack xattrs
@@ -2429,10 +2429,10 @@ static int smack_inode_setsecurity(struct inode *inode, const char *name,
} else
return -EOPNOTSUPP;
 
-#ifndef CONFIG_SECURITY_SMACK_NETFILTER
+#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
if (sock->sk->sk_family == PF_INET6)
smk_ipv6_port_label(sock, NULL);
-#endif
+#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
 
return 0;
 }
@@ -2474,12 +2474,14 @@ static int smack_socket_post_create(struct socket *sock, int family,
 static int smack_socket_bind(struct socket *sock, struct sockaddr *address,
int addrlen)
 {
+#if IS_ENABLED(CONFIG_IPV6)
if (sock->sk != NULL && sock->sk->sk_family == PF_INET6)
smk_ipv6_port_label(sock, address);
+#endif
 
return 0;
 }
-#endif
+#endif /* !CONFIG_SECURITY_SMACK_NETFILTER */
 
 /**
  * smack_socket_connect - connect access check
@@ -2508,10 +2510,10 @@ static int smack_socket_connect(struct socket *sock, struct sockaddr *sap,
case PF_INET6:
if (addrlen < sizeof(struct sockaddr_in6))
return -EINVAL;
-#ifndef CONFIG_SECURITY_SMACK_NETFILTER
+#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
rc = smk_ipv6_port_check(sock->sk, (struct sockaddr_in6 *)sap,
SMK_CONNECTING);
-#endif
+#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
break;
}
return rc;
@@ -3394,9 +3396,9 @@ static int smack_socket_sendmsg(struct socket *sock, struct msghdr *msg,
int size)
 {
struct sockaddr_in *sip = (struct sockaddr_in *) msg->msg_name;
-#ifndef CONFIG_SECURITY_SMACK_NETFILTER
+#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
struct sockaddr_in6 *sap = (struct sockaddr_in6 *) msg->msg_name;
-#endif
+#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
int rc = 0;
 
/*
@@ -3410,9 +3412,9 @@ static int smack_socket_sendmsg(struct socket *sock, struct msghdr *msg,
rc = smack_netlabel_send(sock->sk, sip);
break;
case AF_INET6:
-#ifndef CONFIG_SECURITY_SMACK_NETFILTER
+#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
rc = smk_ipv6_port_check(sock->sk, sap, SMK_SENDING);
-#endif
+#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
break;
}
return rc;
@@ -3503,6 +3505,7 @@ static struct smack_known *smack_from_secattr(struct netlbl_lsm_secattr *sap,
return smack_net_ambient;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
 static int smk_skb_to_addr_ipv6(struct sk_buff *skb, struct sockaddr_in6 *sip)
 {
u8 nexthdr;
@@ -3549,6 +3552,7 @@ static int smk_skb_to_addr_ipv6(struct sk_buff *skb, struct sockaddr_in6 *sip)
}
return proto;
 }
+#endif /* CONFIG_IPV6 */
 
 /**
  * smack_socket_sock_rcv_skb - Smack packet delivery access check
@@ -3562,13 +3566,15 @@ static int smack_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
struct netlbl_lsm_secattr secattr;
struct socket_smack *ssp = sk->sk_security;
struct smack_known *skp = NULL;
-   struct sockaddr_in6 sadd;
int rc = 0;
-   int proto;
struct smk_audit_info ad;
 #ifdef CONFIG_AUDIT
struct lsm_network_audit net;
 #endif
+#if IS_ENABLED(CONFIG_IPV6)
+   struct 

Re: [PATCH cgroup/for-3.19-fixes] cgroup: implement cgroup_subsys->unbind() callback

2015-01-15 Thread Michal Hocko
On Sun 11-01-15 15:55:43, Johannes Weiner wrote:
> From d527ba1dbfdb58e1f7c7c4ee12b32ef2e5461990 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner 
> Date: Sun, 11 Jan 2015 10:29:05 -0500
> Subject: [patch] mm: memcontrol: zap outstanding cache/swap references during
>  unbind
> 
> Cgroup core assumes that any outstanding css references after
> offlining are temporary in nature, and e.g. mount waits for them to
> disappear and release the root cgroup.  But leftover page cache and
> swapout records in an offlined memcg are only dropped when the pages
> get reclaimed under pressure or the swapped out pages get faulted in
> from other cgroups, and so those cgroup operations can hang forever.
> 
> Implement the ->unbind() callback to actively get rid of outstanding
> references when cgroup core wants them gone.  Swap out records are
> deleted, such that the swap-in path will charge those pages to the
> faulting task. 

OK, that makes sense to me.

> Page cache pages are moved to the root memory cgroup.

OK, this is better than reclaiming them.

[...]
> +static void unbind_lru_list(struct mem_cgroup *memcg,
> + struct zone *zone, enum lru_list lru)
> +{
> + struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
> + struct list_head *list = >lists[lru];
> +
> + while (!list_empty(list)) {
> + unsigned int nr_pages;
> + unsigned long flags;
> + struct page *page;
> +
> + spin_lock_irqsave(>lru_lock, flags);
> + if (list_empty(list)) {
> + spin_unlock_irqrestore(>lru_lock, flags);
> + break;
> + }
> + page = list_last_entry(list, struct page, lru);

Taking lru_lock for each page is asking for trouble. The lock would bounce
like crazy. It shouldn't be a big problem to list_move the pages to a local
list and then work on that one without the lock. Those pages wouldn't be
visible to reclaim, but only temporarily. Or, if that is not acceptable,
then at least batch some number of pages (SWAP_CLUSTER_MAX is the popular
choice).
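A user-space sketch of the batching suggested above, assuming a pthread mutex as a stand-in for lru_lock and a singly linked list instead of struct list_head; all names are hypothetical and this is not the memcg code. Each lock acquisition detaches up to one batch of entries (the kernel's SWAP_CLUSTER_MAX is 32), and the entries are processed with the lock dropped:

```c
#include <pthread.h>
#include <stddef.h>

struct node {
	int val;
	struct node *next;
};

struct locked_list {
	pthread_mutex_t lock;
	struct node *head;
	unsigned long lock_acquisitions; /* for illustration only */
};

/*
 * Move up to 'batch' nodes to a private list under one lock hold, then
 * process them unlocked. Returns the number of nodes processed.
 */
static size_t drain_in_batches(struct locked_list *l, size_t batch,
			       void (*process)(struct node *))
{
	size_t done = 0;

	for (;;) {
		struct node *local = NULL, **tail = &local;
		size_t n = 0;

		pthread_mutex_lock(&l->lock);
		l->lock_acquisitions++;
		while (l->head && n < batch) {
			struct node *nd = l->head;

			l->head = nd->next;
			nd->next = NULL;
			*tail = nd;
			tail = &nd->next;
			n++;
		}
		pthread_mutex_unlock(&l->lock);

		/* Nodes on 'local' are invisible to other lock holders now. */
		while (local) {
			struct node *next = local->next;

			process(local);
			local = next;
			done++;
		}
		if (n < batch)
			break; /* list ran dry during this pass */
	}
	return done;
}

/* Example callback: accumulate the values seen. */
static long processed_sum;

static void add_to_sum(struct node *nd)
{
	processed_sum += nd->val;
}
```

With 100 entries and a batch of 32, the lock is taken 4 times instead of 100.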

> + if (!get_page_unless_zero(page)) {
> + list_move(>lru, list);
> + spin_unlock_irqrestore(>lru_lock, flags);
> + continue;
> + }
> + BUG_ON(!PageLRU(page));
> + ClearPageLRU(page);
> + del_page_from_lru_list(page, lruvec, lru);
> + spin_unlock_irqrestore(>lru_lock, flags);
> +
> + compound_lock(page);
> + nr_pages = hpage_nr_pages(page);
> +
> + if (!mem_cgroup_move_account(page, nr_pages,
> +  memcg, root_mem_cgroup)) {
> + /*
> +  * root_mem_cgroup page counters are not used,
> +  * otherwise we'd have to charge them here.
> +  */
> + page_counter_uncharge(>memory, nr_pages);
> + if (do_swap_account)
> + page_counter_uncharge(>memsw, nr_pages);
> + css_put_many(>css, nr_pages);
> + }
> +
> + compound_unlock(page);
> +
> + putback_lru_page(page);
> + }
> +}
> +
> +static void unbind_work_fn(struct work_struct *work)
> +{
> + struct cgroup_subsys_state *css;
> +retry:
> + drain_all_stock(root_mem_cgroup);
> +
> + rcu_read_lock();
> + css_for_each_child(css, _mem_cgroup->css) {
> + struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> +
> + /* Drop references from swap-out records */
> + if (do_swap_account) {
> + long zapped;
> +
> + zapped = swap_cgroup_zap_records(memcg->css.id);
> + page_counter_uncharge(>memsw, zapped);
> + css_put_many(>css, zapped);
> + }
> +
> + /* Drop references from leftover LRU pages */
> + css_get(css);
> + rcu_read_unlock();
> + atomic_inc(>moving_account);
> + synchronize_rcu();

Why do we need this? Who can migrate to/from offline memcgs? 

> + while (page_counter_read(>memory) -
> +page_counter_read(>kmem) > 0) {
> + struct zone *zone;
> + enum lru_list lru;
> +
> + lru_add_drain_all();
> +
> + for_each_zone(zone)
> + for_each_lru(lru)
> + unbind_lru_list(memcg, zone, lru);
> +
> + cond_resched();
> + }
> + atomic_dec(>moving_account);
> + rcu_read_lock();
> + css_put(css);
> + }
> + rcu_read_unlock();
> + /*
> +  * Swap-in is racy:
> +  *
> +  * #0#1
> +  *   

Re: [Linaro-acpi] [PATCH v5 18/18] Documentation: ACPI for ARM64

2015-01-15 Thread Mark Brown
On Thu, Jan 15, 2015 at 05:52:31PM +0100, Arnd Bergmann wrote:
> On Thursday 15 January 2015 10:51:58 Jon Masters wrote:

> > It gets worse. There *will* be large numbers of ACPI-only ARM servers
> > landing over the coming year. Not only would DT code be untested, but
> > insisting on keeping e.g. a DSDT and DT in sync is never going to work
> > anyway. Already we have early stage servers that contain a DT used for
> > bringup that is subsequently not being updated as often as the ACPI
> > tables (those systems are now booting exclusively in labs with ACPI).
> > Eventually, I am going to push for the DT data to be removed from these
> > systems rather than have out of date unmaintained DT data in firmware.

> We will of course be able to relax the rule once ACPI has stabilized on
> ARM64. At the moment, we haven't even agreed on how to represent basic
> devices, so things are in flux and there is no way for a BIOS writer
> to ship an image that we will guarantee to support in the long run.

> At some point after we are reasonably sure we are able to keep supporting
> all existing systems that are working with that kernel, we can take
> support for new systems without having DT by default, and also support
> booting those without acpi=force, which is related to this question.

Speaking with my subsystem maintainer hat on (admittedly for subsystems not
affected much by ARM servers, so take this with a pinch of salt), this just
sounds like it's making more work for me: it means having to force people
to write DT code and bindings which I'm then going to have to review and
which none of us really care about.  Realistically, I'm just going to take
the code if the lack of a DT binding is the only problem with it, and I
suspect others will do the same.




Re: [PATCH 1/1] arch/x86/kvm/vmx.c: Fix external interrupts inject directly bug with guestos RFLAGS.IF=0

2015-01-15 Thread Radim Krčmář
2015-01-15 20:36+0800, Li Kaihang:
> This patch fixes an external interrupt injection bug in Linux 3.19-rc4.

Was the bug introduced in an earlier 3.19 release candidate?

> The guest OS is running and handling some interrupt with RFLAGS.IF = 0
> when an external interrupt arrives, which can lead to a VM exit. In this
> case we must avoid injecting this external interrupt, or it will generate
> a processor hardware exception that crashes the virtual machine.

What is the source of this exception?  (Is there a reproducer?)

> Here are more details about this problem:
> 
> General external interrupt processing for a running virtual machine goes
> as follows:
> 
> Step 1:
>  an external interrupt generates a VM exit --> vmx_complete_interrupts -->
> __vmx_complete_interrupts --> case INTR_TYPE_EXT_INTR:
> kvm_queue_interrupt(vcpu, vector, type == INTR_TYPE_SOFT_INTR);
> 
> Step 2:
>  kvm_x86_ops->handle_external_intr(vcpu);
> 
> Step 3:
>  return to vcpu_enter_guest on the next iteration of the loop, then run
> inject_pending_event
> 
> Step 4:
>  if (vcpu->arch.interrupt.pending) {
>   kvm_x86_ops->set_irq(vcpu);
>   return 0;
>   }
> 
> Step 5:
>  kvm_x86_ops->run(vcpu) --> vm_entry inject vector to guestos IDT
> 
> In the steps above, steps 4 and 5 will cause a processor hardware exception
> if step 1 happens while the guest OS has RFLAGS.IF = 0, that is to say,
> while guest interrupts are disabled.
> So we should add logic in step 1 to decide whether an external interrupt
> needs to be queued and then injected directly. In doing so, we don't need
> to worry about losing this external interrupt, because step 2 will handle
> it and choose the best moment to inject it via the virtual interrupt
> controller.

Can you explain the relation between vectored events (Step 1) and
external interrupts (Step 2)?
(The bug happens when external interrupt arrives during event delivery?)

Why isn't the delivered event lost?
(It should be different from the external interrupt.)

Thanks.

> 
> 
> Signed-off-by: Li kaihang 
> ---
>  arch/x86/kvm/vmx.c |   20 ++--
>  1 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d4c58d8..e8311ee 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7711,10 +7711,26 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
> break;
> case INTR_TYPE_SOFT_INTR:
> vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field);
> -   /* fall through */
> -   case INTR_TYPE_EXT_INTR:
> +   /*
> +   * As software and external interrupts may all get here,
> +   * we should separate soft intr from ext intr code,and this
> +   * will ensure that software interrupts handling process is not
> +   * affected by solving external interrupt invalid injecting.
> +   */
> kvm_queue_interrupt(vcpu, vector, type == INTR_TYPE_SOFT_INTR);

(No need for 'type == INTR_TYPE_SOFT_INTR', we know it is true.)

> break;
> +   case INTR_TYPE_EXT_INTR:
> +   /*
> +   * GuestOS is running and handling some interrupt with
> +   * RFLAGS.IF = 0 while a external interrupt coming,
> +   * then can lead a vm exit getting here,in this case,
> +   * we must avoid inject this external interrupt or it will
> +   * generate a processor hardware exception causing vm crash.
> +   */
> +   if (kvm_x86_ops->interrupt_allowed(vcpu))
> +   kvm_queue_interrupt(vcpu, vector,
> +   type == INTR_TYPE_SOFT_INTR);

(And false here.)
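For reference when reading the new check: the VMX interrupt_allowed hook roughly requires RFLAGS.IF to be set and no STI or MOV-SS interrupt shadow in the guest interruptibility state. A simplified, hypothetical user-space model of that condition with the architectural bit values (the real function also looks at nested-VMX state, which is omitted here):

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IF           0x00000200u /* RFLAGS bit 9 */
#define GUEST_INTR_STATE_STI    0x00000001u /* blocking by STI */
#define GUEST_INTR_STATE_MOV_SS 0x00000002u /* blocking by MOV SS */

/* Can the guest take an external interrupt right now? */
static bool model_interrupt_allowed(uint64_t guest_rflags,
				    uint32_t interruptibility)
{
	return (guest_rflags & X86_EFLAGS_IF) &&
	       !(interruptibility &
		 (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
}
```

With IF clear the model returns false, which is the case the patch wants to keep from being queued for direct injection.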


Re: [PATCH] gpio: gpio-dln2: Added a Blank line after declaration

2015-01-15 Thread Johan Hovold
On Thu, Jan 15, 2015 at 06:20:43PM +0100, Linus Walleij wrote:
> On Tue, Jan 13, 2015 at 4:09 PM, Mohammad Jamal
>  wrote:
> 
> > Fix the coding style issue by adding a blank line after declaration
> >
> > Signed-off-by: Mohammad Jamal 
> 
> Patch applied.

This one looks bogus; it's adding a random newline within the
declarations, not after them.
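checkpatch's "Missing a blank line after declarations" warning asks for one blank line between the last local declaration and the first statement, not a blank line inside the declaration block. A generic illustration, not the gpio-dln2 code:

```c
/* All declarations first, then one blank line, then the statements. */
static int sum_first_n(int n)
{
	int total = 0;
	int i;

	for (i = 1; i <= n; i++)
		total += i;
	return total;
}
```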

Johan


[PATCH] virtio_pci_modern: validate features

2015-01-15 Thread Michael S. Tsirkin
The spec says devices must set the VIRTIO_F_VERSION_1 feature bit.
Fail gracefully if they don't.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci_modern.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 17f0228..a8fd267 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -570,6 +570,7 @@ int virtio_pci_modern_probe(struct virtio_pci_device *vp_dev)
int err, common, isr, notify, device;
u32 notify_length;
u32 notify_offset;
+   u64 features;
 
check_offsets();
 
@@ -676,12 +677,19 @@ int virtio_pci_modern_probe(struct virtio_pci_device *vp_dev)
vp_dev->vdev.config = _pci_config_nodev_ops;
}
 
+   features = vp_get_features(vdev);
+   if (!features & (1ULL << VIRTIO_F_VERSION_1))
+   goto err_valid_features;
+
vp_dev->config_vector = vp_config_vector;
vp_dev->setup_vq = setup_vq;
vp_dev->del_vq = del_vq;
 
return 0;
 
+err_valid_features:
+   if (vp_dev->device)
+   pci_iounmap(pci_dev, vp_dev->device);
 err_map_device:
if (vp_dev->notify_base)
pci_iounmap(pci_dev, vp_dev->notify_base);
-- 
MST
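Separately from whether the check is needed, the condition in this patch does not test the feature bit: "!" binds tighter than "&", so "!features & (1ULL << VIRTIO_F_VERSION_1)" computes "(!features) & (1ULL << 32)", which is always 0. A small user-space sketch contrasting it with the intended test (VIRTIO_F_VERSION_1 is feature bit 32 in the virtio 1.0 spec; the function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32 /* feature bit number */

/* As written in the patch: parses as (!features) & (1ULL << 32). */
static bool missing_version_1_as_written(uint64_t features)
{
	return !features & (1ULL << VIRTIO_F_VERSION_1);
}

/* Intended test: true when bit 32 is clear. */
static bool missing_version_1(uint64_t features)
{
	return !(features & (1ULL << VIRTIO_F_VERSION_1));
}
```

The first form never fires, even for a device offering no features at all; the parenthesized form does.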


Re: [PATCH 1/3] input: tsc2007: Add pre-calibration, flipping and rotation

2015-01-15 Thread Dmitry Torokhov
On Thu, Jan 15, 2015 at 05:14:38PM +0100, Dr. H. Nikolaus Schaller wrote:
> 
> On 15.01.2015 at 15:38, Sebastian Reichel wrote:
> 
> > Hi,
> > 
> > On Thu, Jan 15, 2015 at 08:36:44AM +0100, Dr. H. Nikolaus Schaller wrote:
> >>> 1. Perform conversion in input core rather than individual drivers. I
> >>> think we should allocate new bitmaps for some transformations and have
> >>> the code do X/Y flip/clip of the coordinates.
> >> 
> >> Do you have a suggestion where this should be (I have no clue how
> >> the input system works or is structured - we just know how to extend a
> >> driver that uses it)?
> >> 
> >>> 2. Standardize on bindings. We already have of-touchscreen.c doing
> >>> rudimentary parsing; we should look into extending it rather than
> >>> creating a myriad of driver-specific bindings.
> >> 
> >> Ok, looks reasonable.
> > 
> > Documentation is in 
> > 
> > Documentation/devicetree/bindings/input/touchscreen/touchscreen.txt
> 
> I did look into it now. Unfortunately, it does not fit well into my view of
> how bindings should be. They should describe hardware (as we are told for
> many other kernel subsystems).
> 
> Pixels and resolutions are IMHO related to the screen the touch panel is
> glued onto, and that is quite independent.

Well, I think pixels was the wrong word to be used there. It is meant to
be native units, as opposed to millimeters, inches, points, etc.

> 
> So I don’t see how they describe the different ways the touch screen can
> be wired to a tsc2007 controller.
> 
> Please can you add minimum and maximum properties for us?
> 
> Then, inverted-x and inverted-y are redundant, because they are the same as
> having a higher expected ADC value for the minimum coordinate and a lower
> one for the maximum.
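The redundancy argument above can be checked with a little arithmetic: if the raw reading at the "minimum" edge is larger than at the "maximum" edge, a linear mapping inverts the axis on its own. A hypothetical helper (not the tsc2007 driver or of-touchscreen code), without clamping of out-of-range readings:

```c
/*
 * Map a raw ADC reading onto 0..out_max given the raw readings at the
 * two panel edges. If raw_min > raw_max the slope is negative, so the
 * axis comes out inverted without a separate flag.
 */
static int scale_axis(int raw, int raw_min, int raw_max, int out_max)
{
	long span = (long)raw_max - raw_min;

	if (span == 0)
		return 0; /* degenerate calibration */
	return (int)(((long)raw - raw_min) * out_max / span);
}
```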

I'd rather not add minimum and maximum, but add the touchscreen-start-x and
touchscreen-start-y instead so that we limit the number of obsolete
properties.

Thanks.

-- 
Dmitry


Re: [PATCH v2 2/7] block: rewrite __bio_copy_iov()

2015-01-15 Thread Christoph Hellwig
On Mon, Jan 12, 2015 at 12:43:59PM +0100, Dongsu Park wrote:
> Rewrite __bio_copy_iov() so that it can call either _read() or _write()
> variant, which is determined by direction to_iov, given as either READ
> or WRITE. Moreover, make __bio_copy_iov() take its parameter iov_iter
> by value, to avoid awkward situations like ref-/dereferencing pointer
> and value repeatedly.
> 
> This commit should contain only literal replacements, without
> functional changes.

This breaks booting a simple KVM VM for me:

[2.692732] general protection fault:  [#1] SMP 
[2.696041] Modules linked in:
[2.696041] CPU: 2 PID: 1819 Comm: cdrom_id Not tainted 3.19.0-rc4+ #47
[2.696041] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[2.696041] task: 88007b318b90 ti: 88007a0b task.ti: 88007a0b
[2.696041] RIP: 0010:[]  [] bio_uncopy_user+0x60/0x160
[2.701775] RSP: 0018:88007a0b3a88  EFLAGS: 00010246
[2.701775] RAX: 0024 RBX: 20202020554d4551 RCX: 
[2.701775] RDX: 0024 RSI: 88007a6c7024 RDI: 88007cc9e304
[2.705548] RBP: 88007a0b3b08 R08: 0024 R09: 
[2.705548] R10:  R11: 0001 R12: 
[2.705548] R13: 88007cc9e280 R14: 880079cdd200 R15: 
[2.705548] FS:  7fdeb0282700() GS:88007fd0() knlGS:
[2.705548] CS:  0010 DS:  ES:  CR0: 8005003b
[2.705548] CR2: 01ebd008 CR3: 7aca6000 CR4: 06e0
[2.705548] Stack:
[2.715017]  0001  0024 88007a0b3a70
[2.716562]  0001 0001 0024
[2.717630]  88007a0b3a70 0001 88007a0b3b18 88007cc9e280
[2.717630] Call Trace:
[2.717630]  [] __blk_rq_unmap_user+0x14/0x40
[2.717630]  [] blk_rq_unmap_user+0x31/0x60
[2.717630]  [] sg_io+0x2c3/0x4a0
[2.724739]  [] scsi_cmd_ioctl+0x425/0x4a0
[2.724739]  [] scsi_cmd_blk_ioctl+0x4a/0x60
[2.726432]  [] cdrom_ioctl+0x3b/0xc10
[2.726432]  [] ? trace_hardirqs_on+0xd/0x10
[2.726432]  [] ? sr_block_ioctl+0x48/0xd0
[2.726432]  [] ? trace_hardirqs_on_caller+0x10d/0x1d0
[2.726432]  [] ? trace_hardirqs_on+0xd/0x10
[2.726432]  [] sr_block_ioctl+0x84/0xd0
[2.726432]  [] blkdev_ioctl+0x232/0x7f0
[2.726432]  [] block_ioctl+0x3c/0x40
[2.726432]  [] do_vfs_ioctl+0x83/0x5b0
[2.726432]  [] ? final_putname+0x21/0x50
[2.726432]  [] ? sysret_check+0x22/0x5d
[2.726432]  [] SyS_ioctl+0x47/0x90
[2.726432]  [] system_call_fastpath+0x12/0x17
[2.726432] Code: 48 83 b8 48 03 00 00 00 74 06 f6 47 18 01 74 63 41 8b 1e 85 db 74 30 66 41 83 7d 60 00 49 8b 5d 68 74 24 45 31 e4 0f 1f 44 00 00 <48> 8b 3b 31 f6 41 83 c4 01 48 83 c3 10 e8 7e d4 a3 ff 41 0f b7
[2.726432] RIP  [] bio_uncopy_user+0x60/0x160
[2.750102]  RSP 
[2.751775] ---[ end trace 577bd821e65932ad ]---



(gdb) l *(bio_uncopy_user+0x60/0x160)
0x81742400 is in bio_uncopy_user (../block/bio.c:1137).
1132	 *
1133	 *	Free pages allocated from bio_copy_user() and write back data
1134	 *	to user space in case of a read.
1135	 */
1136	int bio_uncopy_user(struct bio *bio)
1137	{
1138		struct bio_map_data *bmd = bio->bi_private;
1139		struct bio_vec *bvec;
1140		int ret = 0, i;
1141	



Re: [PATCH] virtio_pci_modern: validate features

2015-01-15 Thread Michael S. Tsirkin
On Thu, Jan 15, 2015 at 08:13:44PM +0200, Michael S. Tsirkin wrote:
> The spec says devices must set the VIRTIO_F_VERSION_1 feature bit.
> Fail gracefully if they don't.
> 
> Signed-off-by: Michael S. Tsirkin 

Oops, this is not needed: we already have this in finalize_features

	if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
		dev_err(&vdev->dev, "virtio: device uses modern interface "
			"but does not have VIRTIO_F_VERSION_1\n");
		return -EINVAL;
	}

Pls disregard this patch.


> ---
>  drivers/virtio/virtio_pci_modern.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_pci_modern.c 
> b/drivers/virtio/virtio_pci_modern.c
> index 17f0228..a8fd267 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -570,6 +570,7 @@ int virtio_pci_modern_probe(struct virtio_pci_device 
> *vp_dev)
>   int err, common, isr, notify, device;
>   u32 notify_length;
>   u32 notify_offset;
> + u64 features;
>  
>   check_offsets();
>  
> @@ -676,12 +677,19 @@ int virtio_pci_modern_probe(struct virtio_pci_device 
> *vp_dev)
>   vp_dev->vdev.config = &virtio_pci_config_nodev_ops;
>   }
>  
> + features = vp_get_features(vdev);
> + if (!(features & (1ULL << VIRTIO_F_VERSION_1)))
> + goto err_valid_features;
> +
>   vp_dev->config_vector = vp_config_vector;
>   vp_dev->setup_vq = setup_vq;
>   vp_dev->del_vq = del_vq;
>  
>   return 0;
>  
> +err_valid_features:
> + if (vp_dev->device)
> + pci_iounmap(pci_dev, vp_dev->device);
>  err_map_device:
>   if (vp_dev->notify_base)
>   pci_iounmap(pci_dev, vp_dev->notify_base);
> -- 
> MST
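The check that finalize_features already performs boils down to a single bit test on the 64-bit feature word. A minimal standalone sketch, assuming VIRTIO_F_VERSION_1 is feature bit 32 as in the virtio 1.0 specification; has_feature() and validate_modern_features() are hypothetical names, not the driver's functions. Note that `!` binds tighter than `&`, so an unparenthesized `!features & (1ULL << 32)` evaluates `!features` to 0 or 1 first and is therefore always 0, i.e. the check could never trigger.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* VIRTIO_F_VERSION_1 is feature bit 32 in the virtio 1.0 specification. */
#define VIRTIO_F_VERSION_1 32

/* Test one feature bit; the parentheses around the & are essential. */
static bool has_feature(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}

/* Hypothetical check: a modern device must offer VIRTIO_F_VERSION_1. */
static int validate_modern_features(uint64_t features)
{
	if (!has_feature(features, VIRTIO_F_VERSION_1))
		return -EINVAL;
	return 0;
}
```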


sysfs methods can race with ->remove

2015-01-15 Thread Alan Stern
Tejun:

The context is that we have been talking about
drivers/scsi/scsi_scan.c:scsi_rescan_device(), which is called by the
store_rescan_field() sysfs method in scsi_sysfs.c.  The problem is
this: What happens in scsi_rescan_device if the device is unbound from
its driver before the module_put call?  The dev->driver->owner
calculation would dereference a NULL pointer.

On Thu, 15 Jan 2015, Christoph Hellwig wrote:

> On Wed, Jan 14, 2015 at 10:07:00AM -0500, Alan Stern wrote:
> > and the kernfs core ensures that the underlying device won't be 
> > deallocated while a sysfs method runs.
> 
> It has a reference to keep it from being freed, but so far I can't find
> anything that prevents ->remove from being called while we are in or
> just before a method call.

There are two types of methods to think about: Those registered by the 
subsystem and those registered by the driver.

If a method is registered by the driver, then the driver will
unregister it when the ->remove routine runs.  I don't know for
certain, but I would expect that the sysfs/kernfs core will make sure
that any existing method calls complete before unregister returns.  
This would prevent races.

If a method is registered by the subsystem, and if the method runs 
entirely within the subsystem's code, then ->remove doesn't matter.  
The driver could be unbound while the method is running and it would be 
okay.

The only time we have a problem is when the method is registered by the 
subsystem and the method calls into the driver.  (Note that this is 
exactly what happens with scsi_rescan_device.)
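The problematic case, a subsystem-registered method that calls into the driver, can be modelled in userspace. In this sketch a pthread mutex plays the role of the device lock, which the driver core holds while unbinding a driver; taking the same lock around the driver call is one possible approach, assumed here for illustration and not something this thread settles on. All struct and function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for struct device_driver / struct device. */
struct drv_sketch {
	int (*rescan)(void *data);	/* driver callback used by the subsystem */
};

struct dev_sketch {
	pthread_mutex_t lock;		/* plays the role of the device lock */
	struct drv_sketch *driver;	/* becomes NULL when the driver unbinds */
	void *data;
};

/* Unbinding (->remove) clears dev->driver while holding the lock. */
static void dev_unbind(struct dev_sketch *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->driver = NULL;
	pthread_mutex_unlock(&dev->lock);
}

/*
 * Subsystem-registered method that calls into the driver. The check and
 * the call happen under the same lock that unbinding takes, so
 * dev->driver cannot turn NULL in between. Returns -1 if unbound.
 */
static int subsys_rescan(struct dev_sketch *dev)
{
	int ret = -1;

	pthread_mutex_lock(&dev->lock);
	if (dev->driver && dev->driver->rescan)
		ret = dev->driver->rescan(dev->data);
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

/* Example driver callback: counts how often it was invoked. */
static int demo_rescan(void *data)
{
	++*(int *)data;
	return 0;
}
```

Without the lock, the check of dev->driver and the later dereference could straddle an unbind, which is exactly the window described above.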

> > > But this seems like a more generic problem, and at least a quick glance at
> > > the pci_driver methods seems like others don't have a good
> > > synchronization of ->remove against random driver methods.
> > 
> > Can you give one or two examples?
> 
> I look at the sriov_configure PCI method, or the various sub-methods
> under pci_driver.err_handler.

The sriov_numvfs_store method does have the same problem, and so does 
the reset_store method (by way of pci_reset_function -> 
pci_dev_save_and_disable -> pci_reset_notify).

Tejun, is my analysis correct?  How should we fix these races?

Alan Stern



Re: [PATCH] sched,numa: do not move past the balance point if unbalanced

2015-01-15 Thread Rik van Riel

On 01/15/2015 05:45 AM, Peter Zijlstra wrote:

On Mon, Jan 12, 2015 at 04:30:39PM -0500, Rik van Riel wrote:

There is a subtle interaction between the logic introduced in commit
e63da03639cc9e6e83b62e7ef8ffdbb92421416a, the way the load balancer


   e63da03639cc ("sched/numa: Allow task switch if load imbalance improves")


Will do. Thanks for the git config :)


The fix is to check the direction of the net movement of load, and to
refuse a NUMA move if it would cause the system to move past the point
of balance.  In an unbalanced state, only moves that bring us closer
to the balance point are allowed.


Did you also test with whatever load needed the previous thing? It's far
too easy to fix one and break the other in my experience ;-)


The load that caused the need for the previous fix
was one where all CPUs were overloaded, and one
process did not have its threads converged on one
node yet.

In that case, the load from moving one task is a
small fraction of the total load, and we are unable
to move through the balance point to the other side.


orig_src_load = env->src_stats.load;
-   orig_dst_load = env->dst_stats.load;

-   if (orig_dst_load < orig_src_load)
-   swap(orig_dst_load, orig_src_load);
-
-   old_imb = orig_dst_load * src_capacity * 100 -
- orig_src_load * dst_capacity * env->imbalance_pct;
+   /*
+* In a task swap, there will be one load moving from src to dst,
+* and another moving back. This is the net sum of both moves.
+* Allow the move if it brings the system closer to a balanced
+* situation, without crossing over the balance point.
+*/


This comment seems to 'forget' about the !swap moves?


Not sure how to describe that, except by pointing out
that a task move is always from src to dst :)


+   moved_load = orig_src_load - src_load;

-   /* Would this change make things worse? */
-   return (imb > old_imb);
+   if (moved_load > 0)
+   /* Moving src -> dst. Did we overshoot balance? */
+   return src_load < dst_load;


So here we inhibit movement when the src cpu gained (moved_load > 0) and
src was smaller than dst.


moved_load > 0 means that the src cpu loses load


However there is no check the new src value is in fact bigger than dst;
src could have gained and still be smaller. And afaict that's a valid
move, even under the proposed semantics, right?


If src gains load, moved_load will be negative


+   else
+   /* Moving dst -> src. Did we overshoot balance? */
+   return dst_load < src_load;


And vs.


  }


One should use capacity muck when comparing load between CPUs.


You are right, I should use the capacities here.
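A capacity-aware version of the direction check could compare load/capacity ratios by cross-multiplying, avoiding division. This is a minimal standalone sketch, not the patch itself: numa_move_overshoots() is a hypothetical name, plain int64_t values stand in for the scheduler's types, and, as in the patch, a zero net movement falls into the second branch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if a NUMA move should be refused
 * because it would carry the system past the point of balance.
 * src_load/src_capacity < dst_load/dst_capacity is evaluated as
 * src_load * dst_capacity < dst_load * src_capacity.
 */
static bool numa_move_overshoots(int64_t orig_src_load, int64_t src_load,
				 int64_t dst_load,
				 int64_t src_capacity, int64_t dst_capacity)
{
	int64_t moved_load = orig_src_load - src_load;
	int64_t src_scaled = src_load * dst_capacity;
	int64_t dst_scaled = dst_load * src_capacity;

	if (moved_load > 0)
		/* Net movement src -> dst: did src end up below dst? */
		return src_scaled < dst_scaled;
	/* Net movement dst -> src (or none): did dst end up below src? */
	return dst_scaled < src_scaled;
}
```

With equal capacities this reduces to the comparison in the patch; with unequal capacities a CPU with twice the capacity is allowed to carry twice the load before the move counts as an overshoot.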



Re: [PATCH v7 00/17] Introduce ACPI for ARM64 based on ACPI 5.1

2015-01-15 Thread Catalin Marinas
Hi Grant,

On Thu, Jan 15, 2015 at 04:26:20PM +, Grant Likely wrote:
> On Wed, Jan 14, 2015 at 3:04 PM, Hanjun Guo  wrote:
> > This is the v7 of ACPI core patches for ARM64 based on ACPI 5.1
> 
> I'll get right to the point: Can we please have this series queued up
> for v3.20?

Before you even ask for this, please look at the patches and realise
that there is a complete lack of Reviewed-by tags on the code (well,
apart from trivial Kconfig changes). In addition, the series touches on
other subsystems like clocksource, irqchip, acpi and I don't see any
acks from the corresponding maintainers. So even if I wanted to merge
the series, there is no way it can be done without additional
reviews/acks. On the document (last patch), I'd like to see a statement
from HP as they've been vocal in private but no public endorsement of
this doc.

I also have trouble seeing the full picture. Is there a git repository
somewhere with this series and any additional patches required for a
real hardware platform?

> I really think we've hit the point where it is more valuable to merge
> it (or at least prepare to merge it) rather than keeping it out of
> mainline.

That's pretty subjective.

> Continuing to keep the patches out I think is having the opposite
> effect from what is desired. Catalin, you've told me a few times that
> saying "no" is the only leverage you have to keeping crap drivers out
> of the kernel until things mature, and by extension influence how
> firmware gets implemented. However, as far as drivers are concerned,
> there is nothing stopping maintainers from picking up ACPI drivers for
> ARM hardware regardless of whether or not the core ARM code is merged.
> If a driver depends on CONFIG_ACPI, and if the code seems to look
> good, there is nothing preventing it from being merged. There are
> already ARM related ACPI patches going into mainline.
> 
> For example: https://lkml.org/lkml/2014/12/25/120

I wasn't really referring to simple driver changes like the above but to
whole subsystems like clocks done in ACPI. My point was that before we
enable arm64 ACPI, we need to have some clear guidelines to firmware and
hardware vendors, otherwise if we don't know how to do it properly, we
shouldn't even bother (or we may end up re-creating the DT support in
ACPI; I'm not convinced that's sorted yet).

> Instead, keeping these patches out means that hardware is getting
> developed and tested against Fedora, early access RHEL and Linaro
> kernels. It means that we're abdicating on any influence mainline has
> over how those platforms are developed. The longer these patches stay
> out of mainline, the greater the potential for delta between what is
> in the vendor kernels and what we accept into mainline.

I'm not buying this argument. Putting pressure on maintainers to merge
something because Fedora or some other distro has merged them is not the
right approach. If such Linux vendors ignore arguments on the list just
for the sake of providing ACPI support, there is a high chance that they
will accept non-standard code any other time when the kernel community
disagrees.

Just to be clear, I don't block the ACPI patches for fun, reading these
long threads is not fun anymore. I don't have any religious arguments
against ACPI, longer term I see it as a first class citizen alongside
DT, but I want to make sure we do it properly and have a clear vision on
how we support it in the future. You can call this "delayed
gratification" if you want.

And it's not about code going into arch/arm64 and not even small driver
changes to enable ACPI but the longer term plans on how we reduce
(rather than eliminate) future kernel quirks because we didn't first get
to an agreement on how kernel and firmware interact. Things are getting
better and Al's to-do list is a good benchmark (more comments below).

(I have my concerns with DT as well but the requirement of compatibility
between older/newer kernels/firmware is not as strict)

> Finally, keeping them out has the practical effect of causing extra
> work to continually rebase them, while potentially running into new
> conflicts and bugs, for little if any real benefit. Whereas getting
> them into linux-next starts giving us some feedback on conflicts with
> other things that are being queued up for mainline. Not to mention
> reviewer fatigue having to go over the same set of patches again and
> again.

17 patches is really not too hard and it looks like the number is slowly
decreasing as they are picked by the corresponding maintainers.

> Right now we're at -rc4. We'll be at -rc5 this weekend, and quite
> possibly have a new merge window right at the start of Connect.
> Queuing these patches up now isn't even a 100% commitment for you to
> ask Linus to pull them. We can have further discussions at Connect. If
> you're still not satisfied then drop them out again for another cycle.
> However, if they aren't queued up now, then we're looking at mid-June
> before they show up in 

Re: [PATCH 0/2] clockevents: introduce ->set_dev_mode() and convert a few drivers

2015-01-15 Thread Kevin Hilman
Thomas,

Gentle reminder ping...

On Tue, Dec 9, 2014 at 2:03 PM, Kevin Hilman  wrote:
> From: Kevin Hilman 
>
> Currently, the ->set_mode() method of a clockevent device is not
> allowed to fail, so it has no return value.  In order to add new
> clockevent modes, and allow the setting of those modes to fail, we
> need the clockevent core to be able to detect when setting a mode
> fails.
>
> Rather than changing the current ->set_mode() and requiring all
> clockevent devices to change immediately, introduce a new mode setting
> method ->set_dev_mode() which returns 'int'.
>
> In addition, migrate a few drivers over to the new method to
> demonstrate how the new method is to be used, and how to convert.
>
> Proposal for new method originally suggested by Thomas Gleixner[1].
>
> [1] https://lkml.org/lkml/2014/5/10/86
>
> Viresh Kumar (2):
>   clockevents: introduce ->set_dev_mode() which can return error
>   clockevents: migrate some drivers to new ->set_dev_mode()
>
>  drivers/clocksource/arm_arch_timer.c | 46 
> +---
>  drivers/clocksource/bcm2835_timer.c  | 10 +++
>  drivers/clocksource/bcm_kona_timer.c | 15 ---
>  drivers/clocksource/i8253.c  | 11 +---
>  drivers/clocksource/time-armada-370-xp.c | 21 +++
>  include/linux/clockchips.h   |  5 +++-
>  kernel/time/clockevents.c| 21 ---
>  kernel/time/timer_list.c |  5 +++-
>  8 files changed, 91 insertions(+), 43 deletions(-)
>
> --
> 2.1.3
>
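The point of the proposal is that the core can prefer an int-returning callback and propagate its error, while legacy drivers keep the void one. A minimal sketch of that dispatch, assuming a cut-down device struct; ce_dev_sketch, ce_switch_mode() and the mode names are hypothetical and are not the clockchips.h definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down modes and device; not the kernel's definitions. */
enum ce_mode { CE_MODE_SHUTDOWN, CE_MODE_PERIODIC, CE_MODE_ONESHOT };

struct ce_dev_sketch {
	/* Legacy callback: has no way to report failure. */
	void (*set_mode)(enum ce_mode mode, struct ce_dev_sketch *dev);
	/* New-style callback: returns 0 on success or a negative errno. */
	int (*set_dev_mode)(enum ce_mode mode, struct ce_dev_sketch *dev);
	enum ce_mode mode;
};

/* Core side: prefer the int-returning method so errors can propagate. */
static int ce_switch_mode(struct ce_dev_sketch *dev, enum ce_mode mode)
{
	if (dev->set_dev_mode) {
		int ret = dev->set_dev_mode(mode, dev);

		if (ret)
			return ret;		/* keep the old mode on failure */
	} else if (dev->set_mode) {
		dev->set_mode(mode, dev);	/* legacy: assumed to succeed */
	} else {
		return -ENOSYS;
	}
	dev->mode = mode;
	return 0;
}

/* Example converted driver: a timer that cannot run in periodic mode. */
static int oneshot_only_set_dev_mode(enum ce_mode mode, struct ce_dev_sketch *dev)
{
	(void)dev;
	return mode == CE_MODE_PERIODIC ? -EINVAL : 0;
}
```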


Re: [PATCH] Makefile: allow building selected tests with non-NPTL toolchain

2015-01-15 Thread Clark Williams
On Thu, 15 Jan 2015 07:35:01 +
Alexey Brodkin  wrote:

> Hi Clark, John,
> 
> On Mon, 2014-11-10 at 10:16 +0300, Alexey Brodkin wrote:
> > Some architectures are still stuck with non-NPTL toolchains.
> > These are for example ARC, Blackfin, Xtensa etc.
> > 
> > Still, rt-tests are very good benchmarks and it would be good to enable use
> > of at least selected (those that will be built) tests on those architectures.
> > 
> > This change makes it possible to build only the subset of tests that don't
> > require NPTL calls.
> > 
> > By default the behavior is not modified: all tests are built. To build with a
> > non-NPTL toolchain, just add "HAVE_NPTL=no" on the command line
> > or modify the "HAVE_NPTL" variable in the Makefile and run "make".
> > 
> > Signed-off-by: Alexey Brodkin 
> > Cc: Vineet Gupta 
> > Cc: Clark Williams 
> > ---
> >  Makefile | 11 ---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Makefile b/Makefile
> > index 318a5c6..675edf7 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1,8 +1,13 @@
> >  VERSION_STRING = 0.89
> >  
> > -sources = cyclictest.c signaltest.c pi_stress.c rt-migrate-test.c  \
> > - ptsematest.c sigwaittest.c svsematest.c pmqtest.c sendme.c\
> > - pip_stress.c hackbench.c
> > +HAVE_NPTL ?= yes
> > +
> > +ifeq ($(HAVE_NPTL),yes)
> > +sources = cyclictest.c pi_stress.c pip_stress.c pmqtest.c rt-migrate-test.c
> > +endif
> > +
> > +sources += signaltest.c ptsematest.c sigwaittest.c svsematest.c sendme.c \
> > + hackbench.c
> >  
> >  TARGETS = $(sources:.c=)
> 
> I'm wondering if there's a chance to get this patch reviewed and, if there
> are no objections, applied?
> 
> Regards,
> Alexey

Changes looked good to me. I've pulled it in and it will be in the next
release.

Clark




Re: [PATCH v7 03/17] ARM64 / ACPI: Introduce sleep-arm.c

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

From: Graeme Gregory 

ACPI 5.1 does not currently support S states for ARM64 hardware but
ACPI code will call acpi_target_system_state() for device power
management, so introduce sleep-arm.c to allow other drivers to function
until S states are defined.

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Graeme Gregory 
Signed-off-by: Tomasz Nowicki 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 


Re: [PATCH v7 02/17] ARM64 / ACPI: Get RSDP and ACPI boot-time tables

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

From: Al Stone 

As we want to get ACPI tables to parse and then use the information
for system initialization, we should get the RSDP (Root System
Description Pointer) first; it then locates the Extended System
Description Table (XSDT), which contains the 64-bit physical addresses
of all the other boot-time tables.

Introduce acpi.c and its related header files in this patch to provide
the extern variables and functions the ACPI core fundamentally needs,
and then get the boot-time tables as needed:
   - asm/acenv.h for arch-specific ACPICA environments and
     implementation; it is needed unconditionally by the ACPI core;
   - asm/acpi.h for arch-specific variables and functions needed by
     the ACPI driver core;
   - acpi.c for the ARM64-related ACPI implementation for the ACPI
     driver core.

acpi_boot_table_init() is introduced to get the RSDP and boot-time
tables. It will be called in setup_arch() before paging_init(), so we
should use the early_memremap() mechanism here to get the RSDP and all
the table pointers.

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Al Stone 
Signed-off-by: Graeme Gregory 
Signed-off-by: Tomasz Nowicki 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 
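The XSDT layout described in the changelog is simple: a 36-byte standard table header followed by an array of 64-bit physical addresses, with a checksum that makes all bytes sum to zero. The patch relies on the ACPI core for the actual parsing; the sketch below is only a standalone illustration of that layout, starting from an XSDT image already in memory (locating it via the RSDP is not shown). xsdt_list_tables() is a hypothetical name, and a little-endian host is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every ACPI system description table starts with a 36-byte header. */
#define ACPI_HDR_LEN 36

/* A table is valid when all of its bytes sum to zero (mod 256). */
static int acpi_table_checksum_ok(const uint8_t *tbl, uint32_t len)
{
	uint8_t sum = 0;

	for (uint32_t i = 0; i < len; i++)
		sum = (uint8_t)(sum + tbl[i]);
	return sum == 0;
}

/*
 * Hypothetical helper: copy up to max 64-bit table addresses from an
 * XSDT image into out[] and return the total entry count, or -1 if the
 * signature, length or checksum is wrong. The caller must supply at
 * least a full header. memcpy is used because the entries are not
 * naturally aligned; ACPI tables (and this host) are little-endian.
 */
static int xsdt_list_tables(const uint8_t *xsdt, uint64_t *out, int max)
{
	uint32_t len;
	int count;

	if (memcmp(xsdt, "XSDT", 4) != 0)
		return -1;
	memcpy(&len, xsdt + 4, sizeof(len));	/* header: table length */
	if (len < ACPI_HDR_LEN || !acpi_table_checksum_ok(xsdt, len))
		return -1;

	count = (int)((len - ACPI_HDR_LEN) / sizeof(uint64_t));
	for (int i = 0; i < count && i < max; i++)
		memcpy(&out[i], xsdt + ACPI_HDR_LEN + (size_t)i * sizeof(uint64_t),
		       sizeof(uint64_t));
	return count;
}
```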



Re: [PATCH v7 01/17] arm64: allow late use of early_ioremap

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

From: Mark Salter 

Commit 0e63ea48b4d8 (arm64/efi: add missing call to early_ioremap_reset())
added a missing call to early_ioremap_reset(). This triggers a BUG if code
tries using early_ioremap() after the early_ioremap_reset(). This is a
problem for some ACPI code which needs short-lived temporary mappings
after paging_init() but before acpi_early_init() in start_kernel(). This
patch adds definitions for the __late_set_fixmap() and __late_clear_fixmap()
which avoids the BUG by allowing later use of early_ioremap().

Signed-off-by: Mark Salter 
CC: Leif Lindholm 
CC: Ard Biesheuvel 
[hj: update the change log]
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 



Re: [PATCH v7 05/17] ARM64 / ACPI: If we chose to boot from acpi then disable FDT

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

From: Graeme Gregory 

If the early boot methods of ACPI are happy that we have valid ACPI
tables and acpi=force has been passed, then do not unflatten the device
tree, effectively disabling further hardware probing from DT.

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Graeme Gregory 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 


Re: [PATCH v7 04/17] ARM64 / ACPI: Introduce early_param for "acpi" and pass acpi=force to enable ACPI

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

From: Al Stone 

Introduce an early parameter "acpi" with the values "off" and "force".
acpi=off will be the default behavior for ARM64, so introduce
acpi=force to enable ACPI on ARM64.

Disable ACPI before early parameters are parsed, and enable it only
when "acpi=force" is passed by people who want to use ACPI on ARM64.
This ensures DT is preferred if both ACPI tables and DT are provided
at this moment.

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Al Stone 
Signed-off-by: Graeme Gregory 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 
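Together with patch 05/17 above, the boot-time policy is: ACPI starts disabled, "acpi=force" enables it, and DT is only skipped when ACPI is enabled and the tables look valid. A minimal sketch of that decision logic; acpi_disabled_sketch, parse_acpi_sketch() and use_device_tree_sketch() are hypothetical names, not the series' code, and the real handler is registered with early_param() rather than called directly.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag: ACPI starts disabled on ARM64, so DT wins by default. */
static bool acpi_disabled_sketch = true;

/* Sketch of an "acpi=" early-parameter handler accepting "off" and "force". */
static int parse_acpi_sketch(const char *arg)
{
	if (!arg)
		return -EINVAL;
	if (strcmp(arg, "off") == 0)
		acpi_disabled_sketch = true;
	else if (strcmp(arg, "force") == 0)
		acpi_disabled_sketch = false;
	else
		return -EINVAL;
	return 0;
}

/* DT is unflattened unless ACPI was forced on and valid tables were found. */
static bool use_device_tree_sketch(bool acpi_tables_valid)
{
	return acpi_disabled_sketch || !acpi_tables_valid;
}
```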


Re: [PATCH v7 06/17] ARM64 / ACPI: Make PCI optional for ACPI on ARM64

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

Since PCI is not required by the ACPI spec and ARM can run without
it, introduce some stub functions to make PCI optional for ACPI,
and make the ACPI core run without CONFIG_PCI on ARM64.

When PCI is enabled on ARM64, the ACPI core will need some PCI functions
to make it functional, so introduce some empty functions here and
implement them later.

Since ACPI on X86 and IA64 depends on PCI and this patch only makes
PCI optional for ARM64, it will not break anything on X86 and IA64.

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 


Re: [PATCH v7 07/17] ARM64 / ACPI: Disable ACPI if FADT revision is less than 5.1

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

The FADT Major.Minor version was introduced in ACPI 5.1; it is the same
as the ACPI version.

In ACPI 5.1, some major gaps are fixed for ARM, such as updates in the
MADT table for GIC and SMP init. Without those updates, we cannot get
the MPIDR for SMP init or the GICv2/3-related init information, so we
can't boot arm64 ACPI properly with table versions predating 5.1.

If firmware provides ACPI tables with an ACPI version less than 5.1,
the OS will be confused by that information and have no way to init
SMP and the GIC, so disable ACPI if we get an FADT table with a version
less than 5.1.

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 
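The version gate amounts to a lexicographic Major.Minor comparison. A minimal sketch, assuming the major version comes from the FADT header revision and the minor version from the FADT minor-version field; fadt_version_ok() is a hypothetical name, not the function in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: accept only FADT Major.Minor >= 5.1. */
static bool fadt_version_ok(uint8_t major, uint8_t minor)
{
	return major > 5 || (major == 5 && minor >= 1);
}
```

A plain `major >= 5 && minor >= 1` would wrongly reject a future 6.0 table, which is why the major version is compared first.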


Re: [PATCH v7 09/17] ACPI / table: Print GIC information when MADT is parsed

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

When MADT is parsed, print GIC information to make the boot
log look pretty:

ACPI: GICC (acpi_id[0x0000] address[e112f000] MPIDR[0x0] enabled)
ACPI: GICC (acpi_id[0x0001] address[e112f000] MPIDR[0x1] enabled)
...
ACPI: GICC (acpi_id[0x0201] address[e112f000] MPIDR[0x201] enabled)

This information will be very helpful when bringing up early systems,
to check whether acpi_id and MPIDR match as the spec defines.

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Hanjun Guo 
Signed-off-by: Tomasz Nowicki 
---

Tested-by: Mark Langsdorf 
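The sample log lines can be reproduced with a single format string. This sketch assumes a cut-down GICC record; struct gicc_sketch and format_gicc() are hypothetical, and the format string is inferred from the sample output rather than taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of a MADT GICC entry, enough to print the log line. */
struct gicc_sketch {
	uint32_t acpi_id;
	uint64_t base_address;
	uint64_t mpidr;
	int enabled;
};

/* Format one GICC entry in the style of the sample boot log above. */
static int format_gicc(char *buf, size_t len, const struct gicc_sketch *g)
{
	return snprintf(buf, len,
			"ACPI: GICC (acpi_id[0x%04x] address[%llx] MPIDR[0x%llx] %s)",
			(unsigned int)g->acpi_id,
			(unsigned long long)g->base_address,
			(unsigned long long)g->mpidr,
			g->enabled ? "enabled" : "disabled");
}
```

Printing acpi_id zero-padded next to MPIDR makes a mismatch between the two easy to spot while scanning the log.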


Re: [PATCH v8 2/4] fpga manager: add sysfs interface document

2015-01-15 Thread Jason Gunthorpe
On Thu, Jan 15, 2015 at 10:34:39AM -0600, atull wrote:

> This is great!  The way I had it working was using Pantelis' devicetree
> configfs interface.

I figured you were very close to this already in your overlay work..
 
> The DT fragment described the FPGA logic and included a filename
> for firmware to load. In another branch of this thread, I see discussion
> starting on what the overlay should look like and whether it could somehow
> contain the DT itself.

It is a novel idea, my concern would be that embedding the FPGA in the
DT makes it permanent unswappable kernel memory.

Not having the kernel hold the FPGA is best for many uses.

Having the kernel hold the FPGA as a swappable file handle/mmap of some
sort is next best (obviously the fs must be operating during resume)

Unswappable kernel memory is the worst choice

> Long ago this driver started out with a /dev interface.  It didn't have
> an ioctl yet at that point, but programming the fpga was by opening
> the devnode and writing to it.  Greg KH preferred sysfs or configfs
> over adding another ioctl:

I think to justify the ioctls you need a reason to have the context
of a FD. sysfs is stateless, so if my process dies the kernel doesn't
know.

But now that we are talking about adding locking and ownership
concepts a FD is the natural anchor for that in user space.

I.e., if I open the dev node, program an FPGA and then crash, the kernel
doesn't attach drivers and immediately de-programs the chip. Userspace
has to make it all the way through to the DT bind before the FPGA
lifetime would exceed the FD.

> https://lkml.org/lkml/2013/10/8/677

I think Greg's reply makes sense in the context of the question being
asked. Thinking of the FPGA as lockable ref counted kind of resource
changes the question somewhat.

Identifying the ioctls needed would probably clarify things. My
rough start would be 

- Get status: programmed, not programmed, error?
  Bitfile type? (eg Xilinx has nearly every permutation of bit/byte
  ordering)
- Start Program with some kind of context (i.e. this is a new bit
  file, partial reconfiguration basis X, partial reconfiguration
  overlay on X)
- for (;;) write() to do programming
- Get Error to return detailed failure information (CRC error,
  auth error, etc)
- Hand over to a DT overlay (how does this work?) Lock transfers
  from FD to kernel

-  .. something something VFIO .. ?

Where start program is refused if the FPGA is already locked, and
  locks it 
Where start program -> close() returns the FPGA back to reset and
  unlocks
Where start program -> hand over -> close() keeps the FPGA loaded with
 kernel drivers attached and fpga locked (remove the overlay to
 de-program and unlock)
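The three lifetime rules above can be modelled as a small state machine. This is a minimal sketch with hypothetical names (fpga_sketch, fpga_fd_sketch, fpga_start_program() and so on); it only tracks state and lock ownership and says nothing about how the hand-over to a DT overlay would really be wired up.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the proposed FPGA lifetime rules. */
enum fpga_state { FPGA_RESET, FPGA_PROGRAMMING, FPGA_LOADED };

struct fpga_sketch {
	enum fpga_state state;
	bool locked;		/* held by one open FD or by the kernel */
};

struct fpga_fd_sketch {
	struct fpga_sketch *fpga;
	bool owns_lock;		/* this FD started programming */
};

/* "Start program": refused while the FPGA is locked; takes the lock. */
static int fpga_start_program(struct fpga_fd_sketch *fd)
{
	if (fd->fpga->locked)
		return -EBUSY;
	fd->fpga->locked = true;
	fd->fpga->state = FPGA_PROGRAMMING;
	fd->owns_lock = true;
	return 0;
}

/* "Hand over": the lock moves from the FD to the kernel (DT overlay). */
static int fpga_hand_over(struct fpga_fd_sketch *fd)
{
	if (!fd->owns_lock || fd->fpga->state != FPGA_PROGRAMMING)
		return -EINVAL;
	fd->fpga->state = FPGA_LOADED;
	fd->owns_lock = false;
	return 0;
}

/* close(): if this FD still owns the lock, reset the chip and unlock. */
static void fpga_close(struct fpga_fd_sketch *fd)
{
	if (fd->owns_lock) {
		fd->fpga->state = FPGA_RESET;
		fd->fpga->locked = false;
		fd->owns_lock = false;
	}
}

/* Removing the overlay de-programs the FPGA and drops the kernel's lock. */
static void fpga_remove_overlay(struct fpga_sketch *f)
{
	if (f->state == FPGA_LOADED) {
		f->state = FPGA_RESET;
		f->locked = false;
	}
}
```

Tying the lock to the FD is what gives crash safety: if the process dies before the hand-over, close() runs implicitly and the chip goes back to reset.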

Not sure exactly how to tie together DT overlays with the FPGA state,
but that seems the natural combination..

Not sure about partial reconfiguration - clearly the kernel needs to
know and check that the bitfiles are of the correct family, I wonder
if the approach should be to program a basis on the FPGA which then
creates a new FPGA device in the system that can accept the partial
reconfiguration - this way the locking makes sense, the basis is
locked by the kernel and devices and the overlay remains
lockable/swappable/whatever by a dedicated /dev/ node ??

Just thinking aloud, I've never had a use case for partial
reconfiguration.

Jason


[PATCH 3/6] memcg: track shared inodes with dirty pages

2015-01-15 Thread Konstantin Khebnikov
From: Konstantin Khlebnikov 

An inode is owned by only one memory cgroup, but if it is shared it might
contain pages from multiple cgroups. This patch detects this situation
in the memory reclaimer and marks the dirty inode with the flag
I_DIRTY_SHARED, which is cleared only when the data is completely
written. Memcg writeback always writes such inodes.

Signed-off-by: Konstantin Khlebnikov 
---
 fs/fs-writeback.c  |4 ++--
 include/linux/fs.h |3 +++
 include/linux/memcontrol.h |4 
 mm/memcontrol.c|   20 
 mm/vmscan.c|4 
 5 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 9034768..fda6a64 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -484,7 +484,7 @@ __writeback_single_inode(struct inode *inode, struct 
writeback_control *wbc)
 */
spin_lock(&inode->i_lock);
 
-   dirty = inode->i_state & I_DIRTY;
+   dirty = inode->i_state & (I_DIRTY | I_DIRTY_SHARED);
inode->i_state &= ~I_DIRTY;
 
/*
@@ -501,7 +501,7 @@ __writeback_single_inode(struct inode *inode, struct 
writeback_control *wbc)
smp_mb();
 
if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
-   inode->i_state |= I_DIRTY_PAGES;
+   inode->i_state |= I_DIRTY_PAGES | (dirty & I_DIRTY_SHARED);
 
spin_unlock(&inode->i_lock);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ee2e3c0..303f0ad 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1741,6 +1741,8 @@ struct super_operations {
  *
  * I_DIO_WAKEUPNever set.  Only used as a key for 
wait_on_bit().
  *
+ * I_DIRTY_SHARED  Dirty pages belong to multiple memory cgroups.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC   (1 << 0)
@@ -1757,6 +1759,7 @@ struct super_operations {
 #define __I_DIO_WAKEUP 9
 #define I_DIO_WAKEUP   (1 << __I_DIO_WAKEUP)
 #define I_LINKABLE (1 << 10)
+#define I_DIRTY_SHARED (1 << 11)
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ae05563..3f89e9b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -181,6 +181,8 @@ void mem_cgroup_forget_mapping(struct address_space 
*mapping);
 bool mem_cgroup_dirty_limits(struct address_space *mapping, unsigned long 
*dirty,
 unsigned long *thresh, unsigned long *bg_thresh);
 bool mem_cgroup_dirty_exceeded(struct inode *inode);
+void mem_cgroup_poke_writeback(struct address_space *mapping,
+  struct mem_cgroup *memcg);
 
 #else /* CONFIG_MEMCG */
 struct mem_cgroup;
@@ -358,6 +360,8 @@ static inline void mem_cgroup_forget_mapping(struct 
address_space *mapping) {}
 static inline bool mem_cgroup_dirty_limits(struct address_space *mapping, 
unsigned long *dirty,
 unsigned long *thresh, unsigned long *bg_thresh) { 
return false; }
 static inline bool mem_cgroup_dirty_exceeded(struct inode *inode) { return 
false; }
+static inline void mem_cgroup_poke_writeback(struct address_space *mapping,
+struct mem_cgroup *memcg) { }
 
 #endif /* CONFIG_MEMCG */
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 17d966a3b..d9d345c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6064,6 +6064,9 @@ bool mem_cgroup_dirty_exceeded(struct inode *inode)
if (mapping->backing_dev_info->dirty_exceeded)
return true;
 
+   if (inode->i_state & I_DIRTY_SHARED)
+   return true;
+
rcu_read_lock();
memcg = rcu_dereference(mapping->i_memcg);
for (; memcg; memcg = parent_mem_cgroup(memcg)) {
@@ -6084,6 +6087,23 @@ bool mem_cgroup_dirty_exceeded(struct inode *inode)
return memcg != NULL;
 }
 
+void mem_cgroup_poke_writeback(struct address_space *mapping,
+  struct mem_cgroup *memcg)
+{
+   struct inode *inode = mapping->host;
+
+   if (rcu_access_pointer(mapping->i_memcg) == memcg ||
+   !memcg->dirty_exceeded)
+   return;
+
+   if ((inode->i_state & (I_DIRTY_PAGES|I_DIRTY_SHARED)) == I_DIRTY_PAGES) {
+   spin_lock(&inode->i_lock);
+   if (inode->i_state & I_DIRTY_PAGES)
+   inode->i_state |= I_DIRTY_SHARED;
+   spin_unlock(&inode->i_lock);
+   }
+}
+
 /*
  * subsys_initcall() for memory controller.
  *
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ab2505c..75165fc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1013,6 +1013,10 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
inc_zone_page_state(page, NR_VMSCAN_IMMEDIATE);
SetPageReclaim(page);
 
+   if (!global_reclaim(sc))
+

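In C, `==` binds more tightly than `&`, so a test meant to ask "is the pages-dirty bit set while the shared bit is clear?" needs the masked value parenthesized before the comparison. The userspace sketch below shows both parses side by side. `DIRTY_PAGES` and `DIRTY_SHARED` are illustrative stand-ins, not the kernel's identifiers; bit 11 mirrors the `I_DIRTY_SHARED` value this series defines.

```c
#include <assert.h>

/* Illustrative flag values (hypothetical names, not the kernel's). */
#define DIRTY_PAGES  (1u << 2)
#define DIRTY_SHARED (1u << 11)

/* Intended test: PAGES set and SHARED clear. */
static int needs_shared_mark(unsigned int state)
{
	return (state & (DIRTY_PAGES | DIRTY_SHARED)) == DIRTY_PAGES;
}

/*
 * How the unparenthesized form "state & (P|S) == P" is actually parsed:
 * the comparison happens first and yields 0, so the result is state & 0.
 */
static int needs_shared_mark_as_parsed(unsigned int state)
{
	return state & ((DIRTY_PAGES | DIRTY_SHARED) == DIRTY_PAGES);
}
```

Compilers usually flag the unparenthesized spelling with `-Wparentheses`, which is a cheap way to catch this class of mistake.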
[PATCH 4/6] percpu_ratelimit: high-performance ratelimiting counter

2015-01-15 Thread Konstantin Khebnikov
From: Konstantin Khlebnikov 

Parameters:
period   - interval between refills (100ms should be fine)
quota    - number of events refilled per period
deadline - interval within which unused past quota can still be used (1s by default)
latency  - maximum injected delay (10s by default)

Quota accumulates into a shared 'budget', which is spread across CPUs
as per-cpu precharges.
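The parameters above can be modelled in userspace. `cpu_batch()` reproduces the per-CPU batch size that `__percpu_ratelimit_setup()` in this patch computes: quota divided by twice the number of possible CPUs, rounded up, clamped to 32 bits. The refill side is an assumption: the body of `percpu_ratelimit_charge()` is not part of this excerpt, so `rl_refill()` and `rl_charge()` are a hypothetical token-bucket reading of `period`, `quota` and `deadline`, not the author's algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the cpu_batch computation in __percpu_ratelimit_setup(). */
static uint32_t cpu_batch(uint64_t quota, unsigned int ncpus)
{
	uint64_t div = (uint64_t)ncpus * 2;
	uint64_t batch = quota / div;

	if (quota % div)	/* round up, like the do_div() remainder check */
		batch++;
	return batch > UINT32_MAX ? UINT32_MAX : (uint32_t)batch;
}

/*
 * Hypothetical refill model (assumption): each elapsed period adds
 * `quota` events, and unused budget older than `deadline` is dropped,
 * so the budget never exceeds quota * (deadline / period).
 */
struct rl_model {
	uint64_t period_ns;
	uint64_t quota;
	uint64_t deadline_ns;
	uint64_t budget;
};

static void rl_refill(struct rl_model *m, uint64_t elapsed_ns)
{
	uint64_t periods = elapsed_ns / m->period_ns;
	uint64_t cap = m->quota * (m->deadline_ns / m->period_ns);

	m->budget += periods * m->quota;
	if (m->budget > cap)
		m->budget = cap;
}

/* Returns 1 and consumes the events if they fit in the budget. */
static int rl_charge(struct rl_model *m, uint64_t events)
{
	if (events > m->budget)
		return 0;
	m->budget -= events;
	return 1;
}
```

With 100ms periods and a 1s deadline, at most ten periods' worth of unused quota can pile up in this model.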

Signed-off-by: Konstantin Khlebnikov 
---
 include/linux/percpu_ratelimit.h |   45 ++
 lib/Makefile                     |    1
 lib/percpu_ratelimit.c           |  168 ++
 3 files changed, 214 insertions(+)
 create mode 100644 include/linux/percpu_ratelimit.h
 create mode 100644 lib/percpu_ratelimit.c

diff --git a/include/linux/percpu_ratelimit.h b/include/linux/percpu_ratelimit.h
new file mode 100644
index 000..42c45d4
--- /dev/null
+++ b/include/linux/percpu_ratelimit.h
@@ -0,0 +1,45 @@
+#ifndef _LINUX_PERCPU_RATELIMIT_H
+#define _LINUX_PERCPU_RATELIMIT_H
+
+#include 
+
+struct percpu_ratelimit {
+   struct hrtimer  timer;
+   ktime_t target; /* time of next refill */
+   ktime_t deadline;   /* interval to utilize past budget */
+   ktime_t latency;    /* maximum injected delay */
+   ktime_t period; /* interval between refills */
+   u64 quota;  /* events refill per period */
+   u64 budget; /* amount of available events */
+   u64 total;  /* consumed and pre-charged events */
+   raw_spinlock_t  lock;   /* protect the state */
+   u32 cpu_batch;  /* events in per-cpu precharge */
+   u32 __percpu    *cpu_budget;    /* per-cpu precharge */
+};
+
+static inline bool percpu_ratelimit_blocked(struct percpu_ratelimit *rl)
+{
+   return hrtimer_active(&rl->timer);
+}
+
+static inline ktime_t percpu_ratelimit_target(struct percpu_ratelimit *rl)
+{
+   return rl->target;
+}
+
+static inline int percpu_ratelimit_wait(struct percpu_ratelimit *rl)
+{
+   ktime_t target = rl->target;
+
+   return schedule_hrtimeout_range(&target, ktime_to_ns(rl->period),
+   HRTIMER_MODE_ABS);
+}
+
+int percpu_ratelimit_init(struct percpu_ratelimit *rl, gfp_t gfp);
+void percpu_ratelimit_destroy(struct percpu_ratelimit *rl);
+void percpu_ratelimit_setup(struct percpu_ratelimit *rl, u64 quota, u64 period);
+u64 percpu_ratelimit_quota(struct percpu_ratelimit *rl, u64 period);
+bool percpu_ratelimit_charge(struct percpu_ratelimit *rl, u64 events);
+u64 percpu_ratelimit_sum(struct percpu_ratelimit *rl);
+
+#endif /* _LINUX_PERCPU_RATELIMIT_H */
diff --git a/lib/Makefile b/lib/Makefile
index 3c3b30b..b20ab47 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -21,6 +21,7 @@ lib-$(CONFIG_SMP) += cpumask.o
 
 lib-y  += kobject.o klist.o
 obj-y  += lockref.o
+obj-y   += percpu_ratelimit.o
 
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
diff --git a/lib/percpu_ratelimit.c b/lib/percpu_ratelimit.c
new file mode 100644
index 000..8254683
--- /dev/null
+++ b/lib/percpu_ratelimit.c
@@ -0,0 +1,168 @@
+#include 
+
+static void __percpu_ratelimit_setup(struct percpu_ratelimit *rl,
+u64 period, u64 quota)
+{
+   rl->period = ns_to_ktime(period);
+   rl->quota = quota;
+   rl->total += quota - rl->budget;
+   rl->budget = quota;
+   if (do_div(quota, num_possible_cpus() * 2))
+   quota++;
+   rl->cpu_batch = min_t(u64, UINT_MAX, quota);
+   rl->target = ktime_get();
+}
+
+static enum hrtimer_restart ratelimit_unblock(struct hrtimer *t)
+{
+   struct percpu_ratelimit *rl = container_of(t, struct percpu_ratelimit, timer);
+   enum hrtimer_restart ret = HRTIMER_NORESTART;
+   ktime_t now = t->base->get_time();
+
+   raw_spin_lock(&rl->lock);
+   if (ktime_after(rl->target, now)) {
+   hrtimer_set_expires_range(t, rl->target, rl->period);
+   ret = HRTIMER_RESTART;
+   }
+   raw_spin_unlock(&rl->lock);
+
+   return ret;
+}
+
+int percpu_ratelimit_init(struct percpu_ratelimit *rl, gfp_t gfp)
+{
+   memset(rl, 0, sizeof(*rl));
+   rl->cpu_budget = alloc_percpu_gfp(typeof(*rl->cpu_budget), gfp);
+   if (!rl->cpu_budget)
+   return -ENOMEM;
+   raw_spin_lock_init(&rl->lock);
+   hrtimer_init(&rl->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+   rl->timer.function = ratelimit_unblock;
+   rl->deadline = ns_to_ktime(NSEC_PER_SEC);
+   rl->latency  = ns_to_ktime(NSEC_PER_SEC * 10);
+   __percpu_ratelimit_setup(rl, NSEC_PER_SEC, ULLONG_MAX);
+   return 0;
+}
+EXPORT_SYMBOL_GPL(percpu_ratelimit_init);
+
+void percpu_ratelimit_destroy(struct percpu_ratelimit *rl)
+{
+   free_percpu(rl->cpu_budget);
+   hrtimer_cancel(&rl->timer);
+}
+EXPORT_SYMBOL_GPL(percpu_ratelimit_destroy);
+
+static void percpu_ratelimit_drain(void 

[PATCH 1/2] MIPS: OCTEON: fix kernel crash when offlining a CPU

2015-01-15 Thread Aaro Koskinen
octeon_cpu_disable() will unconditionally enable interrupts when called
with interrupts disabled. Fix that.
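The diff for this fix is not part of this excerpt. The general hazard is a helper that ends with an unconditional interrupt enable even when its caller entered with interrupts off; the usual kernel idiom is to save the caller's state with local_irq_save() and put it back with local_irq_restore(). The userspace model below uses a hypothetical global flag in place of the CPU's interrupt state to show the difference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool irqs_on = true;

static unsigned long model_irq_save(void)
{
	unsigned long flags = irqs_on;

	irqs_on = false;
	return flags;
}

static void model_irq_restore(unsigned long flags)
{
	irqs_on = flags != 0;
}

/* Buggy pattern: always re-enables, even if the caller had IRQs off. */
static void disable_path_buggy(void)
{
	irqs_on = false;	/* like local_irq_disable() */
	/* ... critical section ... */
	irqs_on = true;		/* like local_irq_enable() */
}

/* Fixed pattern: restore whatever state the caller had. */
static void disable_path_fixed(void)
{
	unsigned long flags = model_irq_save();
	/* ... critical section ... */
	model_irq_restore(flags);
}
```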

The patch fixes the following crash when offlining a CPU:

[   93.818785] [ cut here ]
[   93.823421] WARNING: CPU: 1 PID: 10 at kernel/smp.c:231 flush_smp_call_function_queue+0x1c4/0x1d0()
[   93.836215] Modules linked in:
[   93.839287] CPU: 1 PID: 10 Comm: migration/1 Not tainted 3.19.0-rc4-octeon-los_b5f0 #1
[   93.847212] Stack : 0001 81b2cf90 0004 8163
     004a
  0006 8117e550  
  81b3 81b26808 800032c77748 81627e07
  81595ec8 81b26808 000a 0001
  0001 0003 10008ce1 815030c8
  800032cbbb38 8113d42c 10008ce1 8117f36c
  800032c77300 800032cbba50 0001 81503984
     
   81121668  
  ...
[   93.912819] Call Trace:
[   93.915273] [] show_stack+0x68/0x80
[   93.920335] [] dump_stack+0x6c/0x90
[   93.925395] [] warn_slowpath_common+0x94/0xd8
[   93.931324] [] flush_smp_call_function_queue+0x1c4/0x1d0
[   93.938208] [] hotplug_cfd+0xf0/0x108
[   93.943444] [] notifier_call_chain+0x5c/0xb8
[   93.949286] [] cpu_notify+0x24/0x60
[   93.954348] [] take_cpu_down+0x38/0x58
[   93.959670] [] multi_cpu_stop+0x154/0x180
[   93.965250] [] cpu_stopper_thread+0xd8/0x160
[   93.971093] [] smpboot_thread_fn+0x1ec/0x1f8
[   93.976936] [] kthread+0xd4/0xf0
[   93.981735] [] ret_from_kernel_thread+0x14/0x1c
[   93.987835]
[   93.989326] ---[ end trace c9e3815ee655bda9 ]---
[   93.993951] Kernel bug detected[#1]:
[   93.997533] CPU: 1 PID: 10 Comm: migration/1 Tainted: GW  3.19.0-rc4-octeon-los_b5f0 #1
[   94.006591] task: 800032c77300 ti: 800032cb8000 task.ti: 800032cb8000
[   94.014081] $ 0   :  1ce1 0001 8162
[   94.022146] $ 4   : 82c72ac0  01a7 813b06f0
[   94.030210] $ 8   : 813b20d8   8163
[   94.038275] $12   : 0087  0086
[   94.046339] $16   : 81623168 0001  0008
[   94.054405] $20   : 0001 0001 0001 0003
[   94.062470] $24   : 0038 813b7f10
[   94.070536] $28   : 800032cb8000 800032cbbc20 10008ce1 811bcaf4
[   94.078601] Hi: 00f188e8
[   94.082179] Lo: d4fdf3b646c09d55
[   94.085760] epc   : 811bc9d0 irq_work_run_list+0x8/0xf8
[   94.091686] Tainted: GW
[   94.095613] ra: 811bcaf4 irq_work_run+0x34/0x60
[   94.101192] Status: 1ce3 KX SX UX KERNEL EXL IE
[   94.106235] Cause : 40808034
[   94.109119] PrId  : 000d9301 (Cavium Octeon II)
[   94.113653] Modules linked in:
[   94.116721] Process migration/1 (pid: 10, threadinfo=800032cb8000, task=800032c77300, tls=)
[   94.127168] Stack : 82c74c80 811a4128 0001 81635720
  fff2 8115bacc 8000320fbce0 8000320fbca4
  8000320fbc80 0002 0004 8113d704
  8000320fbce0 81501738 0003 811b343c
  82c72aa0 82c72aa8 8159cae8 8159caa0
  8165 8000320fbbf0 8000320fbc80 811b32e8
   811b3768 81622b80 815148a8
  800032c77300 82c73e80 815148a8 800032c77300
  81622b80 815148a8 800032c77300 81503f48
  8115ea0c 8162  81174d64
  ...
[   94.192771] Call Trace:
[   94.195222] [] irq_work_run_list+0x8/0xf8
[   94.200802] [] irq_work_run+0x34/0x60
[   94.206036] [] hotplug_cfd+0xf0/0x108
[   94.211269] [] notifier_call_chain+0x5c/0xb8
[   94.217111] [] cpu_notify+0x24/0x60
[   94.222171] [] take_cpu_down+0x38/0x58
[   94.227491] [] multi_cpu_stop+0x154/0x180
[   94.233072] [] cpu_stopper_thread+0xd8/0x160
[   94.238914] [] smpboot_thread_fn+0x1ec/0x1f8
[   94.244757] [] kthread+0xd4/0xf0
[   94.249555] [] ret_from_kernel_thread+0x14/0x1c
[   94.255654]
[   94.257146]
Code: a2423c40  40026000  30420001 <00020336> dc82  10400037    010f  010f
[   94.267183] ---[ end trace c9e3815ee655bdaa ]---
[   94.271804] Fatal exception: panic in 5 seconds

Reported-by: Hemmo Nieminen 

Re: [PATCH v7 08/17] ARM64 / ACPI: Get PSCI flags in FADT for PSCI init

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

From: Graeme Gregory 

There are two flags: PSCI_COMPLIANT and PSCI_USE_HVC. When set,
the former signals to the OS that the firmware is PSCI compliant.
The latter selects the appropriate conduit for PSCI calls by
toggling between Hypervisor Calls (HVC) and Secure Monitor Calls
(SMC).

In ACPI 5.1 the FADT contains this information. The FADT is
parsed during ACPI table init and copied to struct acpi_gbl_FADT,
so use the flags in struct acpi_gbl_FADT for PSCI init.

ACPI 5.1 doesn't support self-defined PSCI function IDs, which
means that only PSCI 0.2+ is supported with ACPI.
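The two FADT flags described above reduce to a small decision: no PSCI unless PSCI_COMPLIANT is set, then HVC or SMC depending on PSCI_USE_HVC. The sketch assumes the ACPI 5.1 bit layout (bit 0 = PSCI_COMPLIANT, bit 1 = PSCI_USE_HVC); the macro, enum and function names are hypothetical, not the kernel's or ACPICA's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ACPI 5.1 ARM boot flag bits; names are illustrative. */
#define FADT_PSCI_COMPLIANT	(1u << 0)
#define FADT_PSCI_USE_HVC	(1u << 1)

enum psci_conduit { PSCI_CONDUIT_NONE, PSCI_CONDUIT_SMC, PSCI_CONDUIT_HVC };

static enum psci_conduit fadt_psci_conduit(uint16_t arm_boot_flags)
{
	/* Without PSCI_COMPLIANT the firmware offers no PSCI 0.2+ at all. */
	if (!(arm_boot_flags & FADT_PSCI_COMPLIANT))
		return PSCI_CONDUIT_NONE;
	/* Otherwise the second flag selects HVC instead of SMC. */
	return (arm_boot_flags & FADT_PSCI_USE_HVC) ? PSCI_CONDUIT_HVC
						    : PSCI_CONDUIT_SMC;
}
```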

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Graeme Gregory 
Signed-off-by: Tomasz Nowicki 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 11/17] ACPI / processor: Make it possible to get CPU hardware ID via GICC

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

Introduce a new function map_gicc_mpidr() to allow MPIDRs to be obtained
from the GICC Structure introduced by ACPI 5.1.

MPIDR is the CPU hardware ID, analogous to the local APIC ID on x86,
so we use the MPIDR rather than the GIC CPU interface ID to identify CPUs.
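A minimal sketch of the lookup described above: scan GICC entries for the one whose ACPI processor UID matches and return its MPIDR rather than its GIC CPU interface number. The trimmed `struct gicc_entry`, the `INVALID_HWID` sentinel and the choice to skip disabled entries (bit 0 of the flags) are assumptions for illustration, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_HWID UINT64_MAX	/* hypothetical "not found" value */

/* Hypothetical, trimmed-down view of a MADT GICC entry. */
struct gicc_entry {
	uint32_t cpu_interface_number;	/* GIC CPU interface ID (not used) */
	uint32_t uid;			/* ACPI processor UID */
	uint32_t flags;			/* bit 0: enabled */
	uint64_t arm_mpidr;		/* CPU hardware ID */
};

static uint64_t lookup_mpidr(const struct gicc_entry *e, size_t n,
			     uint32_t acpi_id)
{
	for (size_t i = 0; i < n; i++) {
		if (!(e[i].flags & 1))
			continue;	/* assumption: ignore disabled CPUs */
		if (e[i].uid == acpi_id)
			return e[i].arm_mpidr;
	}
	return INVALID_HWID;
}
```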

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 


[PATCH 2/2] MIPS: fix kernel lockup or crash after CPU offline/online

2015-01-15 Thread Aaro Koskinen
From: Hemmo Nieminen 

Because a printk() call can cause, e.g., a TLB miss, printk() cannot be
called before the exception handlers have been properly initialized.
This can happen, e.g., when netconsole has been loaded as a kernel module
and the TLB has been cleared while a CPU was offline.

Call cpu_report() in start_secondary() only after the exception handlers
have been initialized to fix this.

Without the patch the kernel will randomly either lock up or crash
after a CPU is onlined and the console driver is a module.
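The fix is purely an ordering change in start_secondary(). The userspace model below treats "reporting before traps are ready" as a fault, to show why moving cpu_report() after per_cpu_trap_init() removes the hazard. Every name here is hypothetical; the real functions run on a freshly onlined CPU and cannot be exercised this way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for a userspace model of CPU bring-up. */
struct cpu_model {
	bool traps_ready;	/* set by the per_cpu_trap_init() step */
	bool faulted;		/* a "TLB miss" taken with no handler installed */
	int reports;
};

/* Stand-in for cpu_report(): printk() may touch unmapped memory. */
static void model_cpu_report(struct cpu_model *c)
{
	if (!c->traps_ready)
		c->faulted = true;
	c->reports++;
}

static void model_trap_init(struct cpu_model *c)
{
	c->traps_ready = true;
}

static void bringup_old_order(struct cpu_model *c)
{
	model_cpu_report(c);
	model_trap_init(c);
}

static void bringup_new_order(struct cpu_model *c)
{
	model_trap_init(c);
	model_cpu_report(c);
}
```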

Signed-off-by: Hemmo Nieminen 
Signed-off-by: Aaro Koskinen 
Cc: sta...@vger.kernel.org
---
 arch/mips/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index c94c4e9..1c0d8c5 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -123,10 +123,10 @@ asmlinkage void start_secondary(void)
unsigned int cpu;
 
cpu_probe();
-   cpu_report();
per_cpu_trap_init(false);
mips_clockevent_init();
mp_ops->init_secondary();
+   cpu_report();
 
/*
 * XXX parity protection should be folded in here when it's converted
-- 
2.2.0



Re: [PATCH] virtio_balloon: coding style fixes

2015-01-15 Thread Michael S. Tsirkin
On Thu, Jan 15, 2015 at 03:13:08PM +0100, Michal Hocko wrote:
> On Thu 15-01-15 15:44:12, Michael S. Tsirkin wrote:
> > On Thu, Jan 15, 2015 at 02:06:42PM +0100, Michal Hocko wrote:
> > > On Thu 15-01-15 13:39:06, Michael S. Tsirkin wrote:
> > > > Most of our code has
> > > > struct foo {
> > > > }
> > > > 
> > > > Fix two instances where balloon is inconsistent.
> > > 
> > > I hate to complain but is it really necessary to post such patches to
> > > linux-api?
> > 
> > Well it's human to err, so it seems wise to copy parties
> > interested in the ABI/API whenever we are changing files under include/uapi.
> > Whitespace changes should mostly be safe, but it's not unknown
> > e.g. to include unrelated changes in the same commit by mistake.
> > 
> > > I thought the list was primarily for API related discussions.
> > 
> > Basically this line in MAINTAINERS
> > 
> > ABI/API
> > L:  linux-...@vger.kernel.org
> > F:  Documentation/ABI/
> > F:  include/linux/syscalls.h
> > F:  include/uapi/
> > F:  kernel/sys_ni.c
> > 
> > normally means "send all patches affecting files under include/uapi/ to
> > this list", does it not?
> 
> Well, this should always be taken as a hint not a hard rule. So if there
> is a change which is adding/removing or changing signature then sure but
> not everything falls into that category.

At least for code I maintain, I really wish people would just Cc me in
any case.  There's been a bunch of cases where people don't Cc me, and
then another maintainer assumes my silence implies agreement, and
applies.  Not nice. OTOH it's easy to ignore an irrelevant patch.

> > Wasn't this the intent?
> > 
> > > This is not the only mail sent here which doesn't fall into that
> > > category IMO. It is far from low volume list for quite some time.
> > > 
> > > Please let's get back low volume and API only discussion!
> > 
> > Maybe send patch dropping include/uapi/ from there,
> > should help drive the volumes down?
> 
> This would be an overkill IMO. It would be much more preferable if
> people actually think about who from the suggested list (either from
> MAINTAINERS or ./scripts/get_maintainer.pl) should be really added.
> 
> [...]

Yea, think about it, then what?  I've no idea what is linux-abi for, and
what people subscribed there are interested in. How should I? All I know
is what's in MAINTAINERS, which say "ABI/API". So I copy all ABI/API
patches there.

> -- 
> Michal Hocko
> SUSE Labs
> ___
> Virtualization mailing list
> virtualizat...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v7 12/17] ARM64 / ACPI: Introduce ACPI_IRQ_MODEL_GIC and register device's gsi

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:05 AM, Hanjun Guo wrote:

Introduce ACPI_IRQ_MODEL_GIC, which ARM64 needs because it uses the GIC,
and then register each device's GSI with the core IRQ subsystem.

acpi_register_gsi() is similar to the DT-based irq_of_parse_and_map().
Since a GSI is unique in the system, use the hwirq number directly
for the mapping.
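Because a GSI is unique system-wide, it can serve directly as the GIC hwirq number, so only the hwirq to Linux IRQ mapping has to be created (once per hwirq). The sketch models that with a small array; `model_register_gsi()`, the table size and the allocation scheme are hypothetical and stand in for the kernel's irq domain machinery.

```c
#include <assert.h>
#include <stdint.h>

#define NR_MODEL_IRQS 64	/* hypothetical size of the model's IRQ space */

/* 0 means "not mapped yet"; Linux IRQ numbers are handed out from 1. */
static int irq_of_hwirq[NR_MODEL_IRQS];
static int next_virq = 1;

static int model_register_gsi(uint32_t gsi)
{
	uint32_t hwirq = gsi;	/* identity: no per-controller translation */

	if (hwirq >= NR_MODEL_IRQS)
		return -1;
	if (!irq_of_hwirq[hwirq])
		irq_of_hwirq[hwirq] = next_virq++;
	return irq_of_hwirq[hwirq];
}
```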

Originally-by: Amit Daniel Kachhap 
Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 


Re: [PATCH 1/3] input: tsc2007: Add pre-calibration, flipping and rotation

2015-01-15 Thread Dr. H. Nikolaus Schaller

Am 15.01.2015 um 19:16 schrieb Dmitry Torokhov :

> On Thu, Jan 15, 2015 at 05:14:38PM +0100, Dr. H. Nikolaus Schaller wrote:
>> 
>> Am 15.01.2015 um 15:38 schrieb Sebastian Reichel :
>> 
>>> Hi,
>>> 
>>> On Thu, Jan 15, 2015 at 08:36:44AM +0100, Dr. H. Nikolaus Schaller wrote:
> 1. Perform conversion in input core rather than individual drivers. I
> think we should allocate a new bitmaps for some transformations and have
> the code do X/Y flip/clip of the coordinates.
 
 Do you have a suggestion where this should be (I have no clue how
 the input system works or is structured - we just know how to extend a
 driver that uses it)?
 
> 2. Standardize on bindings. We already have of-touchscreen.c doing
> rudimentary parsing, we should look into extending it rather than
> creating myriad of driver-specific bindings.
 
 Ok, looks reasonable.
>>> 
>>> Documentation is in 
>>> 
>>> Documentation/devicetree/bindings/input/touchscreen/touchscreen.txt
>> 
>> I did look into it now. Unfortunately, it does not fit well into my view of
>> how bindings should be. They should describe hardware (as we are told for
>> many other kernel subsystems).
>> 
>> Pixels and resolutions are IMHO related to the screen it is glued on - and
>> that is quite independent.
> 
> Well, I think pixels was the wrong word to be used there. It is meant to
> be native units, as opposed to millimeters, inches, points, etc.

ok.

> 
>> 
>> So I don’t see how they describe the different ways the touch screen can
>> be wired to a tsc2007 controller.
>> 
>> Please can you add minimum and maximum properties for us?
>> 
>> Then, inverted-x and inverted-y are redundant because they are the same as
>> having an expected higher value from the ADC for the minimum coordinate and
>> a lower one for the maximum.
> 
> I'd rather not add minimum and maximum, but add the touchscreen-start-x and
> touchscreen-start-y instead so that we limit the number of obsolete
> properties.

ok, that should not be too difficult to add.

So we will modify our driver to use the new functions and align 
omap3-gta04.dtsi accordingly.
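The argument that inverted-x/inverted-y add nothing beyond minimum/maximum can be checked with plain linear scaling: if the raw value for the minimum coordinate is larger than the one for the maximum, the axis comes out inverted automatically. `scale_axis()` and its parameter names are hypothetical; they are neither the touchscreen binding properties nor the tsc2007 driver's code.

```c
#include <assert.h>

/*
 * Map a raw ADC sample linearly onto [0, out_max]. Passing raw_min >
 * raw_max inverts the axis; no separate "inverted" flag is needed.
 */
static int scale_axis(int raw, int raw_min, int raw_max, int out_max)
{
	long num;
	int v;

	assert(raw_max != raw_min);	/* avoid division by zero */
	num = (long)(raw - raw_min) * out_max;
	v = (int)(num / (raw_max - raw_min));
	if (v < 0)
		v = 0;
	if (v > out_max)
		v = out_max;
	return v;
}
```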

BR,
Nikolaus



Re: [PATCH v7 16/17] ARM64 / ACPI: Enable ARM64 in Kconfig

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:05 AM, Hanjun Guo wrote:

From: Graeme Gregory 

Add Kconfigs to build ACPI on ARM64, and make ACPI available on ARM64.

The acpi_idle driver is x86/IA64-dependent for now, so make CONFIG_ACPI_PROCESSOR
depend on X86 || IA64; ARM64 support can be implemented in the future.

Reviewed-by: Grant Likely 
Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Graeme Gregory 
Signed-off-by: Al Stone 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 


Re: [PATCH v7 15/17] ARM64 / ACPI: Select ACPI_REDUCED_HARDWARE_ONLY if ACPI is enabled on ARM64

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:05 AM, Hanjun Guo wrote:

From: Al Stone 

ACPI reduced hardware mode is disabled by default, but ARM64
can only run properly in ACPI hardware reduced mode, so select
ACPI_REDUCED_HARDWARE_ONLY if ACPI is enabled on ARM64.

Reviewed-by: Grant Likely 
Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Al Stone 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 


Re: [PATCH v7 14/17] ARM64 / ACPI: Parse GTDT to initialize arch timer

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:05 AM, Hanjun Guo wrote:

Use the information presented by the GTDT to initialize the arch
timer (not the memory-mapped timer).
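Each GTDT timer entry pairs an interrupt number (GSIV) with flags describing the trigger mode and polarity. The sketch decodes those flags assuming the ACPI layout where bit 0 set means edge-triggered and bit 1 set means active-low; the macro names and `struct timer_irq_cfg` are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GTDT timer flag bits (names are illustrative). */
#define GTDT_INT_MODE_EDGE	(1u << 0)	/* clear: level-triggered */
#define GTDT_INT_POLARITY_LOW	(1u << 1)	/* clear: active high */

struct timer_irq_cfg {
	uint32_t gsiv;
	bool edge;
	bool active_low;
};

static struct timer_irq_cfg gtdt_decode(uint32_t gsiv, uint32_t flags)
{
	struct timer_irq_cfg cfg = {
		.gsiv = gsiv,
		.edge = (flags & GTDT_INT_MODE_EDGE) != 0,
		.active_low = (flags & GTDT_INT_POLARITY_LOW) != 0,
	};
	return cfg;
}
```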

Originally-by: Amit Daniel Kachhap 
Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 


Re: [PATCH v7 13/17] ARM64 / ACPI: Add GICv2 specific ACPI boot support

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:05 AM, Hanjun Guo wrote:

From: Tomasz Nowicki 

The ACPI kernel code uses the MADT table for proper GIC initialization. It
needs to parse the GIC-related subtables, collect the CPU interface and
distributor addresses, and call the driver initialization function (which
is hardware-abstraction agnostic). FDT initializes GICv1/2 in a similar way.

NOTE: This commit allows initializing basic GICv1/2 functionality.
GICv2 virtualization extensions, GICv3/4 and ITS are considered as next
steps.
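Collecting the CPU interface and distributor entries starts with walking the MADT's variable-length subtables, each of which begins with a one-byte type and a one-byte length. The sketch assumes type 0x0B for GICC and 0x0C for GICD and counts them in a raw byte buffer; `madt_walk()` and `struct madt_counts` are hypothetical, and the toy lengths in the usage are far shorter than real GICC/GICD entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MADT_TYPE_GICC 0x0B	/* GIC CPU interface (assumed value) */
#define MADT_TYPE_GICD 0x0C	/* GIC distributor (assumed value) */

struct madt_counts {
	unsigned int gicc;
	unsigned int gicd;
};

/*
 * Walk the subtables that follow the MADT header. Returns -1 if a
 * subtable length is too small or runs past the end of the buffer.
 */
static int madt_walk(const uint8_t *p, size_t len, struct madt_counts *out)
{
	size_t off = 0;

	out->gicc = out->gicd = 0;
	while (off + 2 <= len) {
		uint8_t type = p[off];
		uint8_t sublen = p[off + 1];

		if (sublen < 2 || off + sublen > len)
			return -1;
		if (type == MADT_TYPE_GICC)
			out->gicc++;
		else if (type == MADT_TYPE_GICD)
			out->gicd++;
		off += sublen;
	}
	return 0;
}
```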

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Tomasz Nowicki 
Signed-off-by: Hanjun Guo 
---

Tested-by: Mark Langsdorf 



Re: [PATCH v7 10/17] ARM64 / ACPI: Parse MADT for SMP initialization

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:04 AM, Hanjun Guo wrote:

The MADT contains the MPIDR information that is essential for SMP
initialization. Parse the GIC CPU interface structures to get the
MPIDR values, map them into cpu_logical_map(), and add enabled CPUs
with a valid MPIDR to cpu_possible_map.

ACPI 5.1 has only two explicit methods to boot up SMP: PSCI and the
Parking protocol. The Parking protocol is currently specified only
for ARMv7, so make PSCI the only SMP boot protocol until the ACPI
spec or the Parking protocol spec is updated.

Parking protocol patches for SMP boot will be sent to upstream when
the new version of Parking protocol is ready.
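A minimal sketch of building a logical CPU map from GICC entries as described above: enabled entries with a valid MPIDR are added, duplicates are skipped, and logical CPU 0 is reserved for the boot CPU. Reserving slot 0 for the boot CPU follows the usual arm64 convention as we understand it; `MAX_CPUS`, `struct gicc_cpu` and `build_logical_map()` are hypothetical and do not reproduce the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS	8		/* hypothetical NR_CPUS */
#define INVALID_HWID	UINT64_MAX

struct gicc_cpu {
	uint32_t flags;		/* bit 0: enabled */
	uint64_t mpidr;
};

/* Returns the number of possible CPUs placed in logical_map[]. */
static unsigned int build_logical_map(const struct gicc_cpu *e, size_t n,
				      uint64_t boot_mpidr,
				      uint64_t logical_map[MAX_CPUS])
{
	unsigned int count = 1;

	logical_map[0] = boot_mpidr;	/* boot CPU is logical CPU 0 */
	for (size_t i = 0; i < n && count < MAX_CPUS; i++) {
		int dup = 0;

		if (!(e[i].flags & 1) || e[i].mpidr == INVALID_HWID)
			continue;	/* disabled or invalid entry */
		for (unsigned int j = 0; j < count; j++)
			if (logical_map[j] == e[i].mpidr)
				dup = 1;
		if (!dup)
			logical_map[count++] = e[i].mpidr;
	}
	return count;
}
```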

Tested-by: Suravee Suthikulpanit 
Tested-by: Yijing Wang 
Signed-off-by: Hanjun Guo 
Signed-off-by: Tomasz Nowicki 
---

Tested-by: Mark Langsdorf 


Re: [PATCH 3/6] memcg: track shared inodes with dirty pages

2015-01-15 Thread Tejun Heo
On Thu, Jan 15, 2015 at 09:49:14PM +0300, Konstantin Khebnikov wrote:
> From: Konstantin Khlebnikov 
> 
> Inode is owned only by one memory cgroup, but if it's shared it might
> contain pages from multiple cgroups. This patch detects this situation
> in memory reclaimer and marks dirty inode with flag I_DIRTY_SHARED
> which is cleared only when data is completely written. Memcg writeback
> always writes such inodes.
> 
> Signed-off-by: Konstantin Khlebnikov 

This conflicts with the writeback cgroup support patchset which will
solve the writeback and memcg problem a lot more comprehensively.

 http://lkml.kernel.org/g/1420579582-8516-1-git-send-email...@kernel.org

Thanks.

-- 
tejun


Re: [PATCH v7 17/17] Documentation: ACPI for ARM64

2015-01-15 Thread Mark Langsdorf

On 01/14/2015 09:05 AM, Hanjun Guo wrote:

From: Graeme Gregory 

Add documentation with guidelines on how to use ACPI
on ARM64.

Reviewed-by: Suravee Suthikulpanit 
Reviewed-by: Yi Li 
Signed-off-by: Graeme Gregory 
Signed-off-by: Al Stone 
Signed-off-by: Hanjun Guo 
---

There's enough here to get people started. Additional
information can be added in later patches as needed and
as we get more experience with ACPI on ARM64.

Reviewed-by: Mark Langsdorf 



[PATCH 5/6] delay-injection: resource management via procrastination

2015-01-15 Thread Konstantin Khebnikov
From: Konstantin Khlebnikov 

inject_delay() allows pausing the current task before it returns
to userspace, at a point where the kernel doesn't hold any locks,
so the wait doesn't introduce any priority-inversion problems.

This code abuses the existing task-work mechanism and the 'TASK_PARKED'
state. Parked tasks are killable and don't contribute to CPU load.

Together with percpu_ratelimit this could be used in this manner:

if (percpu_ratelimit_charge(, events))
inject_delay(percpu_ratelimit_target());
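The bookkeeping in inject_delay() is small: keep only the latest target, queue the work at most once, and on the way back to userspace sleep until that absolute CLOCK_MONOTONIC time. The userspace model below mirrors that logic with POSIX clock_nanosleep(); `struct task_model` and the `model_*` functions are hypothetical stand-ins for task_struct fields and task_work, not kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-task state mirroring delay_injection_target/_work. */
struct task_model {
	int64_t delay_target_ns;	/* absolute CLOCK_MONOTONIC time */
	bool work_queued;		/* like delay_injection_work.func != NULL */
	int queue_calls;
};

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Keep only the latest target and queue the work at most once. */
static void model_inject_delay(struct task_model *t, int64_t target_ns)
{
	if (target_ns > t->delay_target_ns) {
		t->delay_target_ns = target_ns;
		if (!t->work_queued) {
			t->work_queued = true;
			t->queue_calls++;
		}
	}
}

/* Runs at the modelled return to userspace: sleep until the target. */
static void model_return_to_user(struct task_model *t)
{
	struct timespec ts;

	if (!t->work_queued)
		return;
	t->work_queued = false;
	ts.tv_sec = t->delay_target_ns / 1000000000;
	ts.tv_nsec = t->delay_target_ns % 1000000000;
	while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL) == EINTR)
		;	/* retry if interrupted by a signal */
}
```

Coalescing on the maximum target means several charges between two returns to userspace cost one sleep, not one sleep per charge.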

Signed-off-by: Konstantin Khlebnikov 
---
 include/linux/sched.h        |    7
 include/trace/events/sched.h |    7
 kernel/sched/core.c          |   66 ++
 kernel/sched/fair.c          |   12
 4 files changed, 92 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db31ef..2363918 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1132,6 +1132,7 @@ struct sched_statistics {
u64 iowait_sum;
 
u64 sleep_start;
+   u64 delay_start;
u64 sleep_max;
s64 sum_sleep_runtime;
 
@@ -1662,6 +1663,10 @@ struct task_struct {
unsigned long timer_slack_ns;
unsigned long default_timer_slack_ns;
 
+   /* Pause task till this time before returning into userspace */
+   ktime_t delay_injection_target;
+   struct callback_head delay_injection_work;
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
/* Index of current stored address in ret_stack */
int curr_ret_stack;
@@ -2277,6 +2282,8 @@ extern void set_curr_task(int cpu, struct task_struct *p);
 
 void yield(void);
 
+extern void inject_delay(ktime_t target);
+
 /*
  * The default (Linux) execution domain.
  */
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 30fedaf..d35154e 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -365,6 +365,13 @@ DEFINE_EVENT(sched_stat_template, sched_stat_blocked,
 TP_ARGS(tsk, delay));
 
 /*
+ * Tracepoint for accounting delay-injection
+ */
+DEFINE_EVENT(sched_stat_template, sched_stat_delayed,
+TP_PROTO(struct task_struct *tsk, u64 delay),
+TP_ARGS(tsk, delay));
+
+/*
  * Tracepoint for accounting runtime (time the task is executing
  * on a CPU).
  */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0accc0..7a9d6a1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -65,6 +65,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -8377,3 +8378,68 @@ void dump_cpu_task(int cpu)
pr_info("Task dump for CPU %d:\n", cpu);
sched_show_task(cpu_curr(cpu));
 }
+
+#define DELAY_INJECTION_SLACK_NS   (NSEC_PER_SEC / 50)
+
+static enum hrtimer_restart delay_injection_wakeup(struct hrtimer *timer)
+{
+   struct hrtimer_sleeper *t =
+   container_of(timer, struct hrtimer_sleeper, timer);
+   struct task_struct *task = t->task;
+
+   t->task = NULL;
+   if (task)
+   wake_up_state(task, TASK_PARKED);
+
+   return HRTIMER_NORESTART;
+}
+
+/*
+ * Here delayed task sleeps in 'P'arked state.
+ */
+static void delay_injection_sleep(struct callback_head *head)
+{
+   struct task_struct *task = current;
+   struct hrtimer_sleeper t;
+
+   head->func = NULL;
+   __set_task_state(task, TASK_WAKEKILL | TASK_PARKED);
+   hrtimer_init_on_stack(&t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+   hrtimer_set_expires_range_ns(&t.timer, current->delay_injection_target,
+DELAY_INJECTION_SLACK_NS);
+
+   t.timer.function = delay_injection_wakeup;
+   t.task = task;
+
+   hrtimer_start_expires(&t.timer, HRTIMER_MODE_ABS);
+   if (!hrtimer_active(&t.timer))
+   t.task = NULL;
+
+   if (likely(t.task))
+   schedule();
+
+   hrtimer_cancel(&t.timer);
+   destroy_hrtimer_on_stack(&t.timer);
+
+   __set_task_state(task, TASK_RUNNING);
+}
+
+/*
+ * inject_delay - injects delay before returning into userspace
+ * @target: absolute monotonic timestamp to sleep until;
+ * task will not return to userspace before this time
+ */
+void inject_delay(ktime_t target)
+{
+   struct task_struct *task = current;
+
+   if (ktime_after(target, task->delay_injection_target)) {
+   task->delay_injection_target = target;
+   if (!task->delay_injection_work.func) {
+   init_task_work(&task->delay_injection_work,
+   delay_injection_sleep);
+   task_work_add(task, &task->delay_injection_work, true);
+   }
+   }
+}
+EXPORT_SYMBOL(inject_delay);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40667cb..2e3269b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2944,6 +2944,15 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, 

[PATCHSET RFC 0/6] memcg: inode-based dirty-set controller

2015-01-15 Thread Konstantin Khebnikov
This is a resurrection of my old RFC patch for a dirty-set accounting cgroup [1].
Now it's merged into the memory cgroup and got a bandwidth controller as a bonus.

This shows an alternative solution: less accurate but much less monstrous than
the accurate page-based dirty-set controller from Tejun Heo.

Memory overhead: +1 pointer in struct address_space.
Performance overhead is almost zero; no new locks are added.

The idea is straightforward: link each inode to some cgroup when its first dirty
page appears and account all its dirty pages to that cgroup. Writeback is
implemented as a single per-bdi writeback work which writes only inodes that
belong to memory cgroups whose amount of dirty memory is beyond their thresholds.

The third patch adds a trick for handling shared inodes which have dirty pages
from several cgroups: it marks the whole inode as shared and alters the
writeback filter for it.

The rest is an example of a bandwidth and IOPS controller built on top of that.
The design is completely original; I bet nobody ever used task-works for that =)

[1] [PATCH RFC] fsio: filesystem io accounting cgroup
http://marc.info/?l=linux-kernel=137331569501655=2

Patches also available here:
https://github.com/koct9i/linux.git branch memcg_dirty_control
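The writeback filter described in this cover letter can be modelled in a few lines: each inode has one owner cgroup, an inode is written if its owner or any ancestor is over its dirty threshold, and an inode marked shared is always written. The ancestor walk and the shared shortcut follow the mem_cgroup_dirty_exceeded() and mem_cgroup_poke_writeback() hunks in this series; the plain counters, thresholds and all type and function names are hypothetical simplifications, not the series' data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for memcg and inode state. */
struct memcg_model {
	unsigned long dirty;
	unsigned long thresh;
	struct memcg_model *parent;
};

struct inode_model {
	struct memcg_model *owner;	/* set when the first page is dirtied */
	bool shared;			/* like I_DIRTY_SHARED */
};

/* Is this cgroup, or any ancestor, over its dirty threshold? */
static bool over_limit(const struct memcg_model *m)
{
	for (; m; m = m->parent)
		if (m->dirty > m->thresh)
			return true;
	return false;
}

/* Writeback filter: shared inodes are always written. */
static bool should_write_inode(const struct inode_model *inode)
{
	return inode->shared || over_limit(inode->owner);
}

/*
 * Dirtying a page charges the inode's owner. A non-owner cgroup that is
 * itself over its limit marks the inode shared (cf. poke_writeback).
 */
static void dirty_page(struct inode_model *inode, struct memcg_model *cg)
{
	if (!inode->owner)
		inode->owner = cg;
	else if (cg != inode->owner && cg->dirty > cg->thresh)
		inode->shared = true;
	inode->owner->dirty++;
}
```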

---

Konstantin Khebnikov (6):
  memcg: inode-based dirty and writeback pages accounting
  memcg: dirty-set limiting and filtered writeback
  memcg: track shared inodes with dirty pages
  percpu_ratelimit: high-performance ratelimiting counter
  delay-injection: resource management via procrastination
  memcg: filesystem bandwidth controller


 block/blk-core.c                 |    2
 fs/direct-io.c                   |    2
 fs/fs-writeback.c                |   22 ++
 fs/inode.c                       |    1
 include/linux/backing-dev.h      |    1
 include/linux/fs.h               |   14 +
 include/linux/memcontrol.h       |   27 +++
 include/linux/percpu_ratelimit.h |   45
 include/linux/sched.h            |    7 +
 include/linux/writeback.h        |    1
 include/trace/events/sched.h     |    7 +
 include/trace/events/writeback.h |    1
 kernel/sched/core.c              |   66 +++
 kernel/sched/fair.c              |   12 +
 lib/Makefile                     |    1
 lib/percpu_ratelimit.c           |  168 +
 mm/memcontrol.c                  |  381 ++
 mm/page-writeback.c              |   32 +++
 mm/readahead.c                   |    2
 mm/truncate.c                    |    1
 mm/vmscan.c                      |    4
 21 files changed, 787 insertions(+), 10 deletions(-)
 create mode 100644 include/linux/percpu_ratelimit.h
 create mode 100644 lib/percpu_ratelimit.c

--
Signature


[PATCH 1/6] memcg: inode-based dirty and writeback pages accounting

2015-01-15 Thread Konstantin Khebnikov
From: Konstantin Khlebnikov 

This patch links the memory cgroup into the VFS layer and assigns an owner
memcg to each inode which has dirty or writeback pages.
The main goal of this is controlling dirty memory size.

Accounting dirty memory per inode is much easier (we get locking
for free) and more effective, because writeback can use this
information to write out only inodes which belong to cgroups
where the amount of dirty memory is beyond their thresholds.

Interface: fs_dirty and fs_writeback in memory.stat attribute.
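A minimal userspace sketch of the per-inode ownership idea described above. It assumes, as a simplification, that a clean mapping is adopted by whichever cgroup dirties it first and is released once it has no dirty or writeback pages left; the excerpt does not show exactly when the patch drops ownership. All names (`memcg_model`, `mapping_model`, `model_*`) are hypothetical; the real code uses percpu counters and `->tree_lock`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cgroup counters (the patch uses percpu_counter). */
struct memcg_model {
	long nr_dirty;
	long nr_writeback;
};

/* Hypothetical mapping: one owner for all of its pages. */
struct mapping_model {
	struct memcg_model *owner;  /* NULL while the mapping is clean */
	long dirty;
	long writeback;
};

/* A task in cgroup @cur dirties a page of @m. */
static void model_inc_page_dirty(struct mapping_model *m, struct memcg_model *cur)
{
	if (!m->owner)
		m->owner = cur;         /* assumption: first dirtier becomes owner */
	m->dirty++;
	m->owner->nr_dirty++;       /* charged to the owner, even if cur differs */
}

/* A dirty page is submitted for writeback. */
static void model_start_writeback(struct mapping_model *m)
{
	m->dirty--;
	m->owner->nr_dirty--;
	m->writeback++;
	m->owner->nr_writeback++;
}

/* Writeback of one page completes. */
static void model_end_writeback(struct mapping_model *m)
{
	m->writeback--;
	m->owner->nr_writeback--;
	if (m->dirty == 0 && m->writeback == 0)
		m->owner = NULL;        /* assumption: clean mapping loses its owner */
}
```

Note that a page dirtied on behalf of a second cgroup is still charged to the mapping's owner; that approximation is what the shared-inode tracking in patch 3/6 deals with.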

Signed-off-by: Konstantin Khlebnikov 
---
 fs/inode.c                 |    1
 include/linux/fs.h         |   11
 include/linux/memcontrol.h |   13 +
 mm/memcontrol.c            |  118
 mm/page-writeback.c        |    7 ++-
 mm/truncate.c              |    1
 6 files changed, 150 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index aa149e7..979a548 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -559,6 +559,7 @@ static void evict(struct inode *inode)
bd_forget(inode);
if (S_ISCHR(inode->i_mode) && inode->i_cdev)
cd_forget(inode);
+   mem_cgroup_forget_mapping(&inode->i_data);
 
remove_inode_hash(inode);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 42efe13..ee2e3c0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -413,6 +413,9 @@ struct address_space {
spinlock_t  private_lock;   /* for use by the address_space */
struct list_headprivate_list;   /* ditto */
void*private_data;  /* ditto */
+#ifdef CONFIG_MEMCG
+   struct mem_cgroup __rcu *i_memcg;   /* protected by ->tree_lock */
+#endif
 } __attribute__((aligned(sizeof(long))));
/*
 * On most architectures that alignment is already the case; but
@@ -489,6 +492,14 @@ static inline void i_mmap_unlock_read(struct address_space *mapping)
 }
 
 /*
+ * Returns bitmap with all page-cache radix-tree tags
+ */
+static inline unsigned mapping_tags(struct address_space *mapping)
+{
+   return (__force unsigned)mapping->page_tree.gfp_mask >> __GFP_BITS_SHIFT;
+}
+
+/*
  * Might pages of this file be mapped into userspace?
  */
 static inline int mapping_mapped(struct address_space *mapping)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7c95af8..b281333 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -173,6 +173,12 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm,
 void mem_cgroup_split_huge_fixup(struct page *head);
 #endif
 
+void mem_cgroup_inc_page_dirty(struct address_space *mapping);
+void mem_cgroup_dec_page_dirty(struct address_space *mapping);
+void mem_cgroup_inc_page_writeback(struct address_space *mapping);
+void mem_cgroup_dec_page_writeback(struct address_space *mapping);
+void mem_cgroup_forget_mapping(struct address_space *mapping);
+
 #else /* CONFIG_MEMCG */
 struct mem_cgroup;
 
@@ -340,6 +346,13 @@ static inline
 void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
 {
 }
+
+static inline void mem_cgroup_inc_page_dirty(struct address_space *mapping) {}
+static inline void mem_cgroup_dec_page_dirty(struct address_space *mapping) {}
+static inline void mem_cgroup_inc_page_writeback(struct address_space *mapping) {}
+static inline void mem_cgroup_dec_page_writeback(struct address_space *mapping) {}
+static inline void mem_cgroup_forget_mapping(struct address_space *mapping) {}
+
 #endif /* CONFIG_MEMCG */
 
 enum {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 851924f..c5655f1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -361,6 +361,9 @@ struct mem_cgroup {
struct list_head event_list;
spinlock_t event_list_lock;
 
+   struct percpu_counter nr_dirty;
+   struct percpu_counter nr_writeback;
+
struct mem_cgroup_per_node *nodeinfo[0];
/* WARNING: nodeinfo must be the last member here */
 };
@@ -3743,6 +3746,11 @@ static int memcg_stat_show(struct seq_file *m, void *v)
seq_printf(m, "total_%s %llu\n", mem_cgroup_lru_names[i], val);
}
 
+   seq_printf(m, "fs_dirty %llu\n", PAGE_SIZE *
+   percpu_counter_sum_positive(&memcg->nr_dirty));
+   seq_printf(m, "fs_writeback %llu\n", PAGE_SIZE *
+   percpu_counter_sum_positive(&memcg->nr_writeback));
+
 #ifdef CONFIG_DEBUG_VM
{
int nid, zid;
@@ -4577,6 +4585,10 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
if (!memcg)
return NULL;
 
+   if (percpu_counter_init(&memcg->nr_dirty, 0, GFP_KERNEL) ||
+   percpu_counter_init(&memcg->nr_writeback, 0, GFP_KERNEL))
+   goto out_free;
+
memcg->stat = alloc_percpu(struct mem_cgroup_stat_cpu);
if (!memcg->stat)
goto out_free;
@@ -4584,6 +4596,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void)

[PATCH 2/6] memcg: dirty-set limiting and filtered writeback

2015-01-15 Thread Konstantin Khebnikov
From: Konstantin Khlebnikov 

mem_cgroup_dirty_limits() checks the thresholds and schedules per-bdi
writeback work (with ->for_memcg set) that writes only those inodes
whose owner memcg, or the whole bdi, has exceeded its dirty limit.

Interface: memory.dirty_ratio is the percentage of the memory limit used
as the threshold (0 = unlimited, default 50). The background threshold is
half of that, and the fs_dirty_threshold line in memory.stat shows the
current threshold.
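The threshold arithmetic and the writeback filter described above can be sketched as follows. This assumes the memory limit is expressed in pages and that the filter compares the owner's dirty count against the foreground threshold; the excerpt does not show which threshold `mem_cgroup_dirty_exceeded()` actually uses. The function and struct names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct dirty_thresholds {
	unsigned long thresh;     /* pages; 0 means "no memcg threshold" */
	unsigned long bg_thresh;  /* background threshold: half of thresh */
};

/* dirty_ratio is a percentage of the memcg memory limit (0 = unlimited). */
static struct dirty_thresholds model_dirty_thresholds(unsigned long limit_pages,
						      unsigned int dirty_ratio)
{
	struct dirty_thresholds t = { 0, 0 };

	if (dirty_ratio == 0)
		return t;
	/* widen before multiplying so the product cannot overflow a 32-bit long */
	t.thresh = (unsigned long)((unsigned long long)limit_pages * dirty_ratio / 100);
	t.bg_thresh = t.thresh / 2;
	return t;
}

/* Writeback filter: only inodes whose owner is over @thresh get written. */
static bool model_dirty_exceeded(unsigned long owner_dirty, unsigned long thresh)
{
	return thresh != 0 && owner_dirty > thresh;
}
```

For example, a 1 GiB limit with 4 KiB pages (262144 pages) and the default ratio of 50 gives a threshold of 131072 pages and a background threshold of 65536 pages.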

Signed-off-by: Konstantin Khlebnikov 
---
 fs/fs-writeback.c                |   18 -
 include/linux/backing-dev.h      |    1
 include/linux/memcontrol.h       |    6 ++
 include/linux/writeback.h        |    1
 include/trace/events/writeback.h |    1
 mm/memcontrol.c                  |  145 ++
 mm/page-writeback.c              |   25 ++-
 7 files changed, 190 insertions(+), 7 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 2d609a5..9034768 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -47,6 +48,7 @@ struct wb_writeback_work {
unsigned int range_cyclic:1;
unsigned int for_background:1;
unsigned int for_sync:1;/* sync(2) WB_SYNC_ALL writeback */
+   unsigned int for_memcg:1;
enum wb_reason reason;  /* why was writeback initiated? */
 
struct list_head list;  /* pending work list */
@@ -137,6 +139,7 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
work->nr_pages  = nr_pages;
work->range_cyclic = range_cyclic;
work->reason= reason;
+   work->for_memcg = reason == WB_REASON_FOR_MEMCG;
 
bdi_queue_work(bdi, work);
 }
@@ -258,15 +261,16 @@ static int move_expired_inodes(struct list_head *delaying_queue,
LIST_HEAD(tmp);
struct list_head *pos, *node;
struct super_block *sb = NULL;
-   struct inode *inode;
+   struct inode *inode, *next;
int do_sb_sort = 0;
int moved = 0;
 
-   while (!list_empty(delaying_queue)) {
-   inode = wb_inode(delaying_queue->prev);
+   list_for_each_entry_safe(inode, next, delaying_queue, i_wb_list) {
if (work->older_than_this &&
inode_dirtied_after(inode, *work->older_than_this))
break;
+   if (work->for_memcg && !mem_cgroup_dirty_exceeded(inode))
+   continue;
list_move(&inode->i_wb_list, &tmp);
moved++;
if (sb_is_blkdev_sb(inode->i_sb))
@@ -650,6 +654,11 @@ static long writeback_sb_inodes(struct super_block *sb,
break;
}
 
+   if (work->for_memcg && !mem_cgroup_dirty_exceeded(inode)) {
+   redirty_tail(inode, wb);
+   continue;
+   }
+
/*
 * Don't bother with new inodes or inodes being freed, first
 * kind does not need periodic writeout yet, and for the latter
@@ -1014,6 +1023,9 @@ static long wb_do_writeback(struct bdi_writeback *wb)
 
wrote += wb_writeback(wb, work);
 
+   if (work->for_memcg)
+   clear_bit(BDI_memcg_writeback_running, &bdi->state);
+
/*
 * Notify the caller of completion if this is a synchronous
 * work item, otherwise just free it.
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 5da6012..91b55d8 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -32,6 +32,7 @@ enum bdi_state {
BDI_sync_congested, /* The sync queue is getting full */
BDI_registered, /* bdi_register() was done */
BDI_writeback_running,  /* Writeback is in progress */
+   BDI_memcg_writeback_running,
 };
 
 typedef int (congested_fn)(void *, int);
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b281333..ae05563 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -178,6 +178,9 @@ void mem_cgroup_dec_page_dirty(struct address_space *mapping);
 void mem_cgroup_inc_page_writeback(struct address_space *mapping);
 void mem_cgroup_dec_page_writeback(struct address_space *mapping);
 void mem_cgroup_forget_mapping(struct address_space *mapping);
+bool mem_cgroup_dirty_limits(struct address_space *mapping, unsigned long *dirty,
+unsigned long *thresh, unsigned long *bg_thresh);
+bool mem_cgroup_dirty_exceeded(struct inode *inode);
 
 #else /* CONFIG_MEMCG */
 struct mem_cgroup;
@@ -352,6 +355,9 @@ static inline void mem_cgroup_dec_page_dirty(struct address_space *mapping) {}
 static inline void mem_cgroup_inc_page_writeback(struct address_space *mapping) {}
 static inline void mem_cgroup_dec_page_writeback(struct address_space *mapping) {}
 static inline void 

[PATCH 6/6] memcg: filesystem bandwidth controller

2015-01-15 Thread Konstantin Khebnikov
From: Konstantin Khlebnikov 

This is an example of a filesystem bandwidth controller built on top of
dirty memory accounting, percpu_ratelimit and delay-injection.

The cgroup charges read/write requests to rate limiters and injects delays
that control the overall speed.

Interface:
memory.fs_bps_limit     bytes per second, 0 == unlimited
memory.fs_iops_limit    iops limit, 0 == unlimited
Statistics: fs_io_bytes and fs_io_operations in memory.stat

For small bandwidth limits, the memory limit must also be set to a
corresponding value; otherwise the delay injected after writing out the
dirty set might be enormous.
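A rough model of the "rate limiter plus injected delay" idea, assuming a simple virtual-clock limiter: each charge pushes a deadline forward by `amount * period / quota`, and the caller is delayed until that deadline. This is a swapped-in textbook technique, not the actual `percpu_ratelimit` algorithm, whose implementation is not part of this excerpt; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical limiter: @quota units per @period_ns, 0 = unlimited. */
struct ratelimit_model {
	uint64_t quota;
	uint64_t period_ns;
	uint64_t next_ns;   /* virtual time when everything charged so far is "paid" */
};

/*
 * Charge @amount units (bytes or I/O operations) at time @now_ns and
 * return the delay in ns to inject so the long-run rate stays at or
 * below quota per period. Assumes amount * period_ns fits in 64 bits.
 */
static uint64_t model_charge(struct ratelimit_model *rl, uint64_t amount,
			     uint64_t now_ns)
{
	if (rl->quota == 0)
		return 0;
	if (rl->next_ns < now_ns)
		rl->next_ns = now_ns;   /* idle time is not saved up as credit */
	rl->next_ns += amount * rl->period_ns / rl->quota;
	return rl->next_ns - now_ns;
}
```

With a 1 MiB/s limit, charging a 512 MiB dirty set at once yields a 512-second delay, which illustrates why the memory limit should be sized to match small bandwidth limits.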

Signed-off-by: Konstantin Khlebnikov 
---
 block/blk-core.c           |    2 +
 fs/direct-io.c             |    2 +
 include/linux/memcontrol.h |    4 ++
 mm/memcontrol.c            |  102 +++-
 mm/readahead.c             |    2 +
 5 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3ad4055..799f5f5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1966,6 +1966,7 @@ void submit_bio(int rw, struct bio *bio)
count_vm_events(PGPGOUT, count);
} else {
task_io_account_read(bio->bi_iter.bi_size);
+   mem_cgroup_account_bandwidth(bio->bi_iter.bi_size);
count_vm_events(PGPGIN, count);
}
 
@@ -2208,6 +2209,7 @@ void blk_account_io_start(struct request *rq, bool new_io)
}
part_round_stats(cpu, part);
part_inc_in_flight(part, rw);
+   mem_cgroup_account_ioop();
rq->part = part;
}
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b..9c60a82 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -775,6 +776,7 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 * Read accounting is performed in submit_bio()
 */
task_io_account_write(len);
+   mem_cgroup_account_bandwidth(len);
}
 
/*
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3f89e9b..633310e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -183,6 +183,8 @@ bool mem_cgroup_dirty_limits(struct address_space *mapping, unsigned long *dirty
 bool mem_cgroup_dirty_exceeded(struct inode *inode);
 void mem_cgroup_poke_writeback(struct address_space *mapping,
   struct mem_cgroup *memcg);
+void mem_cgroup_account_bandwidth(unsigned long bytes);
+void mem_cgroup_account_ioop(void);
 
 #else /* CONFIG_MEMCG */
 struct mem_cgroup;
@@ -362,6 +364,8 @@ static inline bool mem_cgroup_dirty_limits(struct address_space *mapping, unsign
 static inline bool mem_cgroup_dirty_exceeded(struct inode *inode) { return false; }
 static inline void mem_cgroup_poke_writeback(struct address_space *mapping,
 struct mem_cgroup *memcg) { }
+static inline void mem_cgroup_account_bandwidth(unsigned long bytes) {}
+static inline void mem_cgroup_account_ioop(void) {}
 
 #endif /* CONFIG_MEMCG */
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d9d345c..f49fbbf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -27,6 +27,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -368,6 +369,9 @@ struct mem_cgroup {
unsigned int dirty_exceeded;
unsigned int dirty_ratio;
 
+   struct percpu_ratelimit iobw;
+   struct percpu_ratelimit ioop;
+
struct mem_cgroup_per_node *nodeinfo[0];
/* WARNING: nodeinfo must be the last member here */
 };
@@ -3762,6 +3766,12 @@ static int memcg_stat_show(struct seq_file *m, void *v)
seq_printf(m, "fs_dirty_threshold %llu\n", (u64)PAGE_SIZE *
memcg->dirty_threshold);
 
+   seq_printf(m, "fs_io_bytes %llu\n",
+   percpu_ratelimit_sum(&memcg->iobw));
+   seq_printf(m, "fs_io_operations %llu\n",
+   percpu_ratelimit_sum(&memcg->ioop));
+
+
 #ifdef CONFIG_DEBUG_VM
{
int nid, zid;
@@ -3833,6 +3843,40 @@ static int mem_cgroup_dirty_ratio_write(struct cgroup_subsys_state *css,
return 0;
 }
 
+static u64 mem_cgroup_get_bps_limit(
+   struct cgroup_subsys_state *css, struct cftype *cft)
+{
+   struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+   return percpu_ratelimit_quota(&memcg->iobw, NSEC_PER_SEC);
+}
+
+static int mem_cgroup_set_bps_limit(
+   struct cgroup_subsys_state *css, struct cftype *cft, u64 val)
+{
+   struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+   percpu_ratelimit_setup(&memcg->iobw, val, NSEC_PER_SEC);
+   return 0;
+}
+
+static u64 mem_cgroup_get_iops_limit(
+   struct cgroup_subsys_state *css, struct cftype *cft)
+{

Re: [PATCH v7 00/17] Introduce ACPI for ARM64 based on ACPI 5.1

2015-01-15 Thread Jon Masters
On 01/14/2015 10:04 AM, Hanjun Guo wrote:
> Hi,
> 
> This is the v7 of ACPI core patches for ARM64 based on ACPI 5.1
> 
> updates from v6:
>   - Rebased on top of 3.19-rc4; added Mark Salter's patch to use
> early_ioremap after paging_init() for ACPI table mappings;
> 
>   - Two patches about converting apic_id to phys_id to make it arch
> agnostic were already merged into RC4 by Rafael.
> 
>   - Split patch "Parse FADT table to get PSCI flags for PSCI init"
> into two as Lorenzo suggested; also fixed a typo and the missing __init
> for psci_0_2_set_functions(), which Lorenzo spotted.
> 
>   - Add Tested-by from Yijing Wang.
> 
> previous version is here:
> v6: https://lkml.org/lkml/2015/1/4/40
> 
> 1. Why do we need ACPI on ARM64?
> 
>   - Grant already posted a blog about this, and stated clearly
> why we need ACPI on ARM64:
> 
> http://www.secretlab.ca/archives/151
> 
> 
> 2. What do we need to do before the arm64 ACPI core patches
>    could be merged into the kernel?
> 
>   - Al Stone posted a TODO list and a v2 update on the
> progress we made:
> http://www.spinics.net/lists/arm-kernel/msg390069.html
> 
>   - so from the progress we can see that we have already finished
> most of the items; for _OSI we have a plan to fix it, and an RFC
> patch is on the way.
> 
> 
> This patch set was tested on FVP by Fuwei, and booted ok as expected.
> (No functional change since last version)

For the entire series:

Tested-by: Jon Masters 




Re: [PATCH v7 00/17] Introduce ACPI for ARM64 based on ACPI 5.1

2015-01-15 Thread Mark Brown
On Thu, Jan 15, 2015 at 06:23:47PM +, Catalin Marinas wrote:
> On Thu, Jan 15, 2015 at 04:26:20PM +, Grant Likely wrote:

> > I'll get right to the point: Can we please have this series queued up
> > for v3.20?

> Before you even ask for this, please look at the patches and realise
> that there is a complete lack of Reviewed-by tags on the code (well,
> apart from trivial Kconfig changes). In addition, the series touches on
> other subsystems like clocksource, irqchip, acpi and I don't see any
> acks from the corresponding maintainers. So even if I wanted to merge
> the series, there is no way it can be done without additional
> reviews/acks. On the document (last patch), I'd like to see a statement

There's probably a bit of a process problem here - these patches are all
being posted as part of big and apparently controversial threads with
subject lines in the form "ARM / ACPI:" so people could be forgiven for
just not even reading the e-mails enough to notice changes to their
subsystems.  Is it worth posting those patches separately more directly
to the relevant maintainers?




Re: [PATCH 3/6] memcg: track shared inodes with dirty pages

2015-01-15 Thread Konstantin Khlebnikov
On Thu, Jan 15, 2015 at 9:55 PM, Tejun Heo  wrote:
> On Thu, Jan 15, 2015 at 09:49:14PM +0300, Konstantin Khebnikov wrote:
>> From: Konstantin Khlebnikov 
>>
>> Inode is owned only by one memory cgroup, but if it's shared it might
>> contain pages from multiple cgroups. This patch detects this situation
>> in the memory reclaimer and marks the dirty inode with the I_DIRTY_SHARED flag
>> which is cleared only when data is completely written. Memcg writeback
>> always writes such inodes.
>>
>> Signed-off-by: Konstantin Khlebnikov 
>
> This conflicts with the writeback cgroup support patchset which will
> solve the writeback and memcg problem a lot more comprehensively.
>
>  http://lkml.kernel.org/g/1420579582-8516-1-git-send-email...@kernel.org
>
> Thanks.

I know. An absolutely accurate per-page solution looks too complicated to me.
Is there any real demand for accurate handling of the dirty set in shared inodes?
Doing all of the accounting on a per-inode basis makes life so much easier.

>
> --
> tejun


[PATCH v2] fbdev: ssd1307fb: return proper error code if write command fails

2015-01-15 Thread Lad, Prabhakar
From: Prabhakar Lad 

This patch fixes the ssd1307fb_ssd1306_init() function to return
proper error codes on failure.

Signed-off-by: Lad, Prabhakar 
---
 Changes for v2:
 a: Added new line as per Maxime's suggestion.

 drivers/video/fbdev/ssd1307fb.c | 67 -
 1 file changed, 53 insertions(+), 14 deletions(-)

diff --git a/drivers/video/fbdev/ssd1307fb.c b/drivers/video/fbdev/ssd1307fb.c
index f4daa59..0ea6345 100644
--- a/drivers/video/fbdev/ssd1307fb.c
+++ b/drivers/video/fbdev/ssd1307fb.c
@@ -320,7 +320,10 @@ static int ssd1307fb_ssd1306_init(struct ssd1307fb_par *par)
 
/* Set initial contrast */
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_CONTRAST);
-   ret = ret & ssd1307fb_write_cmd(par->client, 0x7f);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, 0x7f);
if (ret < 0)
return ret;
 
@@ -336,63 +339,99 @@ static int ssd1307fb_ssd1306_init(struct ssd1307fb_par *par)
 
/* Set multiplex ratio value */
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_MULTIPLEX_RATIO);
-   ret = ret & ssd1307fb_write_cmd(par->client, par->height - 1);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, par->height - 1);
if (ret < 0)
return ret;
 
/* set display offset value */
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_DISPLAY_OFFSET);
+   if (ret < 0)
+   return ret;
+
ret = ssd1307fb_write_cmd(par->client, 0x20);
if (ret < 0)
return ret;
 
/* Set clock frequency */
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_CLOCK_FREQ);
-   ret = ret & ssd1307fb_write_cmd(par->client, 0xf0);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, 0xf0);
if (ret < 0)
return ret;
 
/* Set precharge period in number of ticks from the internal clock */
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_PRECHARGE_PERIOD);
-   ret = ret & ssd1307fb_write_cmd(par->client, 0x22);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, 0x22);
if (ret < 0)
return ret;
 
/* Set COM pins configuration */
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_COM_PINS_CONFIG);
-   ret = ret & ssd1307fb_write_cmd(par->client, 0x22);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, 0x22);
if (ret < 0)
return ret;
 
/* Set VCOMH */
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_VCOMH);
-   ret = ret & ssd1307fb_write_cmd(par->client, 0x49);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, 0x49);
if (ret < 0)
return ret;
 
/* Turn on the DC-DC Charge Pump */
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_CHARGE_PUMP);
-   ret = ret & ssd1307fb_write_cmd(par->client, 0x14);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, 0x14);
if (ret < 0)
return ret;
 
/* Switch to horizontal addressing mode */
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_ADDRESS_MODE);
-   ret = ret & ssd1307fb_write_cmd(par->client,
-   SSD1307FB_SET_ADDRESS_MODE_HORIZONTAL);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client,
+ SSD1307FB_SET_ADDRESS_MODE_HORIZONTAL);
if (ret < 0)
return ret;
 
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_COL_RANGE);
-   ret = ret & ssd1307fb_write_cmd(par->client, 0x0);
-   ret = ret & ssd1307fb_write_cmd(par->client, par->width - 1);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, 0x0);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, par->width - 1);
if (ret < 0)
return ret;
 
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_PAGE_RANGE);
-   ret = ret & ssd1307fb_write_cmd(par->client, 0x0);
-   ret = ret & ssd1307fb_write_cmd(par->client,
-   par->page_offset + (par->height / 8) - 1);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, 0x0);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client,
+ par->page_offset + (par->height / 8) - 1);
if (ret < 0)
return ret;
 
-- 
1.9.1


Re: [PATCH 3/6] memcg: track shared inodes with dirty pages

2015-01-15 Thread Tejun Heo
Hello,

On Thu, Jan 15, 2015 at 11:04:49PM +0400, Konstantin Khlebnikov wrote:
> I know. An absolutely accurate per-page solution looks too complicated to me.
> Is there any real demand for accurate handling of the dirty set in shared inodes?
> Doing all of the accounting on a per-inode basis makes life so much easier.

Ah, yeah, patch #3 arrived in isolation, so I thought it was part of
something completely different.  I definitely thought about doing it
per-inode too (and also requiring memcg to attribute pages according
to its inode rather than individual pages).  I'll look into the
patchset and try to identify the pros and cons of our approaches.

Thanks.

-- 
tejun


Re: [PATCH_V5] dm9000: Add regulator and reset support to dm9000

2015-01-15 Thread David Miller
From: Zubair Lutfullah Kakakhel 
Date: Thu, 15 Jan 2015 10:12:26 +

> On some boards, the dm9000 chip's power and reset can be controlled by GPIO.
> 
> It makes sense to add them to the dm9000 driver and let DT be used to
> enable power and reset the PHY.
> 
> Signed-off-by: Zubair Lutfullah Kakakhel 
> Signed-off-by: Paul Burton 

Applied to net-next, thanks.


Re: [PATCH v2 1/2] Input: touchscreen-iproc: Add Broadcom iProc touchscreen driver

2015-01-15 Thread Jonathan Richardson
On 15-01-14 05:08 PM, Florian Fainelli wrote:
> On 19/12/14 15:03, Jonathan Richardson wrote:
>> On 14-12-19 02:26 PM, Joe Perches wrote:
>>> On Fri, 2014-12-19 at 14:17 -0800, Jonathan Richardson wrote:
>>>> Add initial version of the Broadcom touchscreen driver.
>>>
>>> more trivia:
>>>
>>>> diff --git a/drivers/input/touchscreen/bcm_iproc_tsc.c b/drivers/input/touchscreen/bcm_iproc_tsc.c
>>> []
>>>> +static int get_tsc_config(struct device_node *np, struct iproc_ts_priv *priv)
>>>> +{
>>>> +  u32 val;
>>> []
>>>> +  if (of_property_read_u32(np, "debounce_timeout", &val) >= 0) {
>>>> +  if (val < 0 || val > 255) {
>>>> +  dev_err(dev, "debounce_timeout must be [0-255]\n");
>>>> +  return -EINVAL;
>>>> +  }
>>>> +  priv->cfg_params.debounce_timeout = val;
> 
> BTW, common practice for DT properties is to use a dash instead of an
> underscore for multi-worded properties.

ts-rotation is done that way already so I'll change the others to be
consistent. Thanks.

> 
>>>
>>> Doesn't the compiler generate a warning message
>>> about an impossible "unsigned < 0" test for all
>>> of these "val < 0" uses?
>>>
>>
>> Actually no it doesn't. The gcc output shows that neither -Wtype-limits
>> nor -Wextra are used to compile that file. I assume this is because
>> there would be just too many warnings.
>>
>>
>>>> +  }
>>>> +
>>>> +  if (of_property_read_u32(np, "settling_timeout", &val) >= 0) {
>>>> +  if (val < 0 || val > 11) {
>>> []
>>>> +  if (of_property_read_u32(np, "touch_timeout", &val) >= 0) {
>>>> +  if (val < 0 || val > 255) {
>>> []
>>>> +  if (of_property_read_u32(np, "average_data", &val) >= 0) {
>>>> +  if (val < 0 || val > 8) {
>>> []
>>>> +  if (of_property_read_u32(np, "fifo_threshold", &val) >= 0) {
>>>> +  if (val < 0 || val > 31) {
>>>
>>>

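The point Joe raises can be checked in isolation: for an unsigned `u32`, `val < 0` is always false, so each range check above reduces to its upper bound. A minimal sketch using `uint32_t` as a stand-in for `u32`; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * With an unsigned type, "val < 0 || val > 255" can only ever be true
 * through its second half; GCC flags the dead comparison only under
 * -Wtype-limits, which -Wextra enables.
 */
static bool debounce_timeout_invalid(uint32_t val)
{
	return val > 255;
}
```

A negative value written into the DT property would be read back as a large `u32` (for example, -1 becomes 4294967295), so the upper-bound check alone still rejects it.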


[PATCH 0/3] x86, fpu: kernel_fpu_begin/end initial cleanups/fix

2015-01-15 Thread Oleg Nesterov
Add cc's.

On 01/11, r...@redhat.com wrote:
>
> Currently the kernel will always load the FPU context, even
> when switching to a kernel thread, or to an idle thread. In
> the case of a task on a KVM VCPU going idle for a bit, and
> waking up again later, this creates a vastly inefficient
> chain of FPU context saves & loads:

I assume you will send v2.

Let me send the initial kernel_fpu_begin/end cleanups; I believe they make
sense anyway and won't conflict with your changes.

This is actually a resend; I sent more patches some time ago, but they were
ignored.

Note that (I hope) we can make more changes on top of this series, in
particular:

- remove all checks from irq_fpu_usable() except in_kernel_fpu

- do not abuse the FPU in kernel threads; this makes sense even if
  use_eager_fpu(), and with or without the changes you proposed.

Please review.

Oleg.



[PATCH 1/3] x86, fpu: introduce per-cpu "bool in_kernel_fpu"

2015-01-15 Thread Oleg Nesterov
interrupted_kernel_fpu_idle() tries to detect whether kernel_fpu_begin()
is safe or not. In particular, it should obviously deny a nested
kernel_fpu_begin(), and this logic looks very confusing.

If use_eager_fpu() == T we rely on a) the __thread_has_fpu() check in
interrupted_kernel_fpu_idle(), and b) the fact that _begin() does
__thread_clear_has_fpu().

Otherwise we demand that the interrupted task has no FPU if it is in
kernel mode; this works because __kernel_fpu_begin() does clts() and
interrupted_kernel_fpu_idle() checks X86_CR0_TS.

Add the per-cpu "bool in_kernel_fpu" variable, and change this code
to check/set/clear it. This allows more cleanups and fixes; see
the next changes.

The patch also moves WARN_ON_ONCE() under preempt_disable() just to
make this_cpu_read() look better; this is not really needed. In
fact, I think we should move it into __kernel_fpu_begin().
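A userspace model of the new guard, assuming a `_Thread_local` flag is an acceptable stand-in for the per-CPU `in_kernel_fpu` (the real code relies on preemption being disabled, which this sketch does not model). Only the new early-exit check is shown; the existing `use_eager_fpu()`/`X86_CR0_TS` logic is omitted. Names ending in `_model` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for DEFINE_PER_CPU(bool, in_kernel_fpu). */
static _Thread_local bool in_kernel_fpu_model;

/* Mirrors the new first check in interrupted_kernel_fpu_idle(). */
static bool fpu_usable_model(void)
{
	if (in_kernel_fpu_model)
		return false;   /* a nested kernel_fpu_begin() is always refused */
	/* ...the existing eager/lazy checks would follow here... */
	return true;
}

static void kernel_fpu_begin_model(void)
{
	/* The real code does WARN_ON_ONCE(!irq_fpu_usable()) before this. */
	in_kernel_fpu_model = true;
}

static void kernel_fpu_end_model(void)
{
	in_kernel_fpu_model = false;
}
```

The design benefit is that "am I already inside a kernel FPU section?" becomes a single flag test instead of being inferred from `__thread_has_fpu()` or CR0.TS.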

Signed-off-by: Oleg Nesterov 
---
 arch/x86/include/asm/i387.h |    2 +-
 arch/x86/kernel/i387.c      |    9 +
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index ed8089d..5e275d3 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -40,8 +40,8 @@ extern void __kernel_fpu_end(void);
 
 static inline void kernel_fpu_begin(void)
 {
-   WARN_ON_ONCE(!irq_fpu_usable());
preempt_disable();
+   WARN_ON_ONCE(!irq_fpu_usable());
__kernel_fpu_begin();
 }
 
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index a9a4229..a815723 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -19,6 +19,8 @@
 #include 
 #include 
 
+static DEFINE_PER_CPU(bool, in_kernel_fpu);
+
 /*
  * Were we in an interrupt that interrupted kernel mode?
  *
@@ -33,6 +35,9 @@
  */
 static inline bool interrupted_kernel_fpu_idle(void)
 {
+   if (this_cpu_read(in_kernel_fpu))
+   return false;
+
if (use_eager_fpu())
return __thread_has_fpu(current);
 
@@ -73,6 +78,8 @@ void __kernel_fpu_begin(void)
 {
struct task_struct *me = current;
 
+   this_cpu_write(in_kernel_fpu, true);
+
if (__thread_has_fpu(me)) {
__thread_clear_has_fpu(me);
__save_init_fpu(me);
@@ -99,6 +106,8 @@ void __kernel_fpu_end(void)
} else {
stts();
}
+
+   this_cpu_write(in_kernel_fpu, false);
 }
 EXPORT_SYMBOL(__kernel_fpu_end);
 
-- 
1.5.5.1




[PATCH 2/3] x86, fpu: don't abuse ->has_fpu in __kernel_fpu_{begin,end}()

2015-01-15 Thread Oleg Nesterov
Now that we have in_kernel_fpu we can remove __thread_clear_has_fpu()
in __kernel_fpu_begin(). This allows replacing the asymmetrical
and nontrivial use_eager_fpu + tsk_used_math check in kernel_fpu_end()
with the same __thread_has_fpu() check.

The logic becomes really simple: if _begin() does save(), then _end()
needs restore(), and this is controlled by __thread_has_fpu(). Otherwise
they do clts/stts unless use_eager_fpu().

Not only does this make begin/end symmetrical and, IMO, more
understandable; it also potentially allows changing irq_fpu_usable() to
drop all checks except "in_kernel_fpu".

Also, with this patch __kernel_fpu_end() calls restore_fpu_checking()
instead of math_state_restore(), and WARNs if it fails. I think this
looks better because we no longer need __thread_fpu_begin(), and it is
better to report the failure in this case.
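The symmetric begin/end rule described above can be modeled as a small state machine. This is a hypothetical sketch: the struct fields stand in for `use_eager_fpu()`, `__thread_has_fpu()` and CR0.TS, counters stand in for the save/restore calls, and the `fpu_owner_task` update in the lazy path is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state __kernel_fpu_{begin,end}() consult. */
struct fpu_model {
	bool eager;     /* use_eager_fpu() */
	bool has_fpu;   /* __thread_has_fpu(current) */
	bool ts_set;    /* CR0.TS: set means the next FPU use traps */
	int saves;      /* __save_init_fpu() calls */
	int restores;   /* restore_fpu_checking() calls */
};

static void fpu_begin_model(struct fpu_model *f)
{
	if (f->has_fpu)
		f->saves++;          /* has_fpu is no longer cleared here */
	else if (!f->eager)
		f->ts_set = false;   /* clts() */
}

static void fpu_end_model(struct fpu_model *f)
{
	if (f->has_fpu)
		f->restores++;       /* same condition as begin: save pairs with restore */
	else if (!f->eager)
		f->ts_set = true;    /* stts() */
}
```

Because both functions branch on the same `has_fpu` value, every save in begin is matched by exactly one restore in end, and every clts() by one stts().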

Signed-off-by: Oleg Nesterov 
---
 arch/x86/kernel/i387.c |   19 ++-
 1 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index a815723..12088a3 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -81,9 +81,7 @@ void __kernel_fpu_begin(void)
this_cpu_write(in_kernel_fpu, true);
 
if (__thread_has_fpu(me)) {
-   __thread_clear_has_fpu(me);
__save_init_fpu(me);
-   /* We do 'stts()' in __kernel_fpu_end() */
} else if (!use_eager_fpu()) {
this_cpu_write(fpu_owner_task, NULL);
clts();
@@ -93,17 +91,12 @@ EXPORT_SYMBOL(__kernel_fpu_begin);
 
 void __kernel_fpu_end(void)
 {
-   if (use_eager_fpu()) {
-   /*
-* For eager fpu, most the time, tsk_used_math() is true.
-* Restore the user math as we are done with the kernel usage.
-* At few instances during thread exit, signal handling etc,
-* tsk_used_math() is false. Those few places will take proper
-* actions, so we don't need to restore the math here.
-*/
-   if (likely(tsk_used_math(current)))
-   math_state_restore();
-   } else {
+   struct task_struct *me = current;
+
+   if (__thread_has_fpu(me)) {
+   if (WARN_ON(restore_fpu_checking(me)))
+   drop_init_fpu(me);
+   } else if (!use_eager_fpu()) {
stts();
}
 
-- 
1.5.5.1




[PATCH 3/3] x86, fpu: fix math_state_restore() race with kernel_fpu_begin()

2015-01-15 Thread Oleg Nesterov
math_state_restore() can race with kernel_fpu_begin() if an irq comes
right after __thread_fpu_begin(): __save_init_fpu() will then overwrite
the fpu->state we are going to restore.

Add two simple helpers, kernel_fpu_disable() and kernel_fpu_enable(),
which simply set/clear in_kernel_fpu, and change math_state_restore()
to use them so that kernel_fpu_begin() cannot run in between.

Alternatively we could use local_irq_save/restore, but these new
helpers will probably have more users.

Perhaps they should disable/enable preemption themselves; in that case
we could remove the preempt_disable() in __restore_xstate_sig().
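The race and the fix can be modeled in user space by calling a fake interrupt handler at the point right after __thread_fpu_begin(). This is a hypothetical sketch, not the kernel code: the `model_*` names are invented, the registers and fpu->state are single `int`s, and irq_fpu_usable() is reduced to the in_kernel_fpu test that the new helpers control.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; field and function names are illustrative. */
struct cpu_model {
	bool in_kernel_fpu;
	bool thread_has_fpu;
	int regs;    /* FPU registers (stale until restored) */
	int saved;   /* task's saved fpu->state */
};

static void model_kernel_fpu_disable(struct cpu_model *c)
{
	assert(!c->in_kernel_fpu);   /* stands in for WARN_ON() */
	c->in_kernel_fpu = true;
}

static void model_kernel_fpu_enable(struct cpu_model *c)
{
	c->in_kernel_fpu = false;
}

/* Reduced irq_fpu_usable(): only the in_kernel_fpu check is modeled. */
static bool model_irq_fpu_usable(const struct cpu_model *c)
{
	return !c->in_kernel_fpu;
}

/* An interrupt handler that uses the FPU when it is allowed to. */
static void model_irq_uses_fpu(struct cpu_model *c)
{
	if (!model_irq_fpu_usable(c))
		return;                 /* would fall back to non-FPU code */
	c->in_kernel_fpu = true;
	if (c->thread_has_fpu)
		c->saved = c->regs;     /* __save_init_fpu(): clobbers fpu->state */
	c->regs = -1;               /* the handler's own FPU work */
	if (c->thread_has_fpu)
		c->regs = c->saved;
	c->in_kernel_fpu = false;
}

/* math_state_restore(): `irq` fires right after __thread_fpu_begin();
 * `guard` selects the patched behaviour. */
static void model_math_state_restore(struct cpu_model *c,
				     void (*irq)(struct cpu_model *), bool guard)
{
	if (guard)
		model_kernel_fpu_disable(c);
	c->thread_has_fpu = true;   /* __thread_fpu_begin() */
	if (irq)
		irq(c);
	c->regs = c->saved;         /* restore_fpu_checking() */
	if (guard)
		model_kernel_fpu_enable(c);
}
```

With `guard == false` the handler's save step copies the not-yet-restored registers over the task's saved state, so the task ends up with the stale value; with `guard == true` the handler sees in_kernel_fpu set and skips the FPU entirely.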

Signed-off-by: Oleg Nesterov 
---
 arch/x86/include/asm/i387.h |    4 ++++
 arch/x86/kernel/i387.c      |   11 +++++++++++
 arch/x86/kernel/traps.c     |   12 +++++-------
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 5e275d3..6eb6fcb 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -51,6 +51,10 @@ static inline void kernel_fpu_end(void)
preempt_enable();
 }
 
+/* Must be called with preempt disabled */
+extern void kernel_fpu_disable(void);
+extern void kernel_fpu_enable(void);
+
 /*
  * Some instructions like VIA's padlock instructions generate a spurious
  * DNA fault but don't modify SSE registers. And these instructions
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 12088a3..81049ff 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -21,6 +21,17 @@
 
 static DEFINE_PER_CPU(bool, in_kernel_fpu);
 
+void kernel_fpu_disable(void)
+{
+   WARN_ON(this_cpu_read(in_kernel_fpu));
+   this_cpu_write(in_kernel_fpu, true);
+}
+
+void kernel_fpu_enable(void)
+{
+   this_cpu_write(in_kernel_fpu, false);
+}
+
 /*
  * Were we in an interrupt that interrupted kernel mode?
  *
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 88900e2..fb4cb6a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -788,18 +788,16 @@ void math_state_restore(void)
local_irq_disable();
}
 
+   /* Avoid __kernel_fpu_begin() right after __thread_fpu_begin() */
+   kernel_fpu_disable();
__thread_fpu_begin(tsk);
-
-   /*
-* Paranoid restore. send a SIGSEGV if we fail to restore the state.
-*/
if (unlikely(restore_fpu_checking(tsk))) {
drop_init_fpu(tsk);
force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
-   return;
+   } else {
+   tsk->thread.fpu_counter++;
}
-
-   tsk->thread.fpu_counter++;
+   kernel_fpu_enable();
 }
 EXPORT_SYMBOL_GPL(math_state_restore);
 
-- 
1.5.5.1




[PATCH] Revert "usb: dwc2: add bus suspend/resume for dwc2"

2015-01-15 Thread Paul Zimmerman
This reverts commit 0cf884e819e05437287a668b9bfcc198bab6329c.

Even after applying the follow-on patch at
https://patchwork.kernel.org/patch/5325111
there are still problems with device connect, at least on the Altera
SOCFPGA platform. One possible fix would be to add a whitelist to
enable suspend/resume only on platforms where it works correctly.
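A whitelist of the kind suggested could take the default-deny shape sketched below. Everything here is hypothetical: the compatible strings are placeholders, not real device-tree bindings, and in the driver itself this would more naturally be a match table (for example an `of_device_id` array) used to decide whether `.bus_suspend`/`.bus_resume` are wired up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical platforms on which bus suspend/resume has been verified.
 * The strings are placeholders, not real device-tree compatibles. */
static const char *const bus_suspend_whitelist[] = {
	"example,soc-a-dwc2",
	"example,soc-b-dwc2",
};

/* Default-deny: suspend/resume stays disabled unless the platform is listed. */
static bool bus_suspend_allowed(const char *compatible)
{
	size_t i;

	assert(compatible != NULL);
	for (i = 0; i < ARRAY_SIZE(bus_suspend_whitelist); i++)
		if (strcmp(compatible, bus_suspend_whitelist[i]) == 0)
			return true;
	return false;
}
```

Defaulting to "not allowed" keeps unlisted platforms, such as the one that regressed, on the pre-revert behaviour.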

Signed-off-by: Paul Zimmerman 
---
Based on Felipe's testing/next branch.

 drivers/usb/dwc2/hcd.c |   88 ++--
 1 files changed, 11 insertions(+), 77 deletions(-)

diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index a0cd9db..755e16b 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -1473,30 +1473,6 @@ static void dwc2_port_suspend(struct dwc2_hsotg *hsotg, u16 windex)
}
 }
 
-static void dwc2_port_resume(struct dwc2_hsotg *hsotg)
-{
-   u32 hprt0;
-
-   /* After clear the Stop PHY clock bit, we should wait for a moment
-* for PLL work stable with clock output.
-*/
-   writel(0, hsotg->regs + PCGCTL);
-   usleep_range(2000, 4000);
-
-   hprt0 = dwc2_read_hprt0(hsotg);
-   hprt0 |= HPRT0_RES;
-   writel(hprt0, hsotg->regs + HPRT0);
-   hprt0 &= ~HPRT0_SUSP;
-   /* according to USB2.0 Spec 7.1.7.7, the host must send the resume
-* signal for at least 20ms
-*/
-   usleep_range(20000, 25000);
-
-   hprt0 &= ~HPRT0_RES;
-   writel(hprt0, hsotg->regs + HPRT0);
-   hsotg->lx_state = DWC2_L0;
-}
-
 /* Handles hub class-specific requests */
 static int dwc2_hcd_hub_control(struct dwc2_hsotg *hsotg, u16 typereq,
u16 wvalue, u16 windex, char *buf, u16 wlength)
@@ -1542,7 +1518,17 @@ static int dwc2_hcd_hub_control(struct dwc2_hsotg *hsotg, u16 typereq,
case USB_PORT_FEAT_SUSPEND:
dev_dbg(hsotg->dev,
"ClearPortFeature USB_PORT_FEAT_SUSPEND\n");
-   dwc2_port_resume(hsotg);
+   writel(0, hsotg->regs + PCGCTL);
+   usleep_range(2, 4);
+
+   hprt0 = dwc2_read_hprt0(hsotg);
+   hprt0 |= HPRT0_RES;
+   writel(hprt0, hsotg->regs + HPRT0);
+   hprt0 &= ~HPRT0_SUSP;
+   usleep_range(10, 15);
+
+   hprt0 &= ~HPRT0_RES;
+   writel(hprt0, hsotg->regs + HPRT0);
break;
 
case USB_PORT_FEAT_POWER:
@@ -2315,55 +2301,6 @@ static void _dwc2_hcd_stop(struct usb_hcd *hcd)
usleep_range(1000, 3000);
 }
 
-static int _dwc2_hcd_suspend(struct usb_hcd *hcd)
-{
-   struct dwc2_hsotg *hsotg = dwc2_hcd_to_hsotg(hcd);
-   u32 hprt0;
-
-   if (!((hsotg->op_state == OTG_STATE_B_HOST) ||
-   (hsotg->op_state == OTG_STATE_A_HOST)))
-   return 0;
-
-   /* TODO: We get into suspend from 'on' state, maybe we need to do
-* something if we get here from DWC2_L1(LPM sleep) state one day.
-*/
-   if (hsotg->lx_state != DWC2_L0)
-   return 0;
-
-   hprt0 = dwc2_read_hprt0(hsotg);
-   if (hprt0 & HPRT0_CONNSTS) {
-   dwc2_port_suspend(hsotg, 1);
-   } else {
-   u32 pcgctl = readl(hsotg->regs + PCGCTL);
-
-   pcgctl |= PCGCTL_STOPPCLK;
-   writel(pcgctl, hsotg->regs + PCGCTL);
-   }
-
-   return 0;
-}
-
-static int _dwc2_hcd_resume(struct usb_hcd *hcd)
-{
-   struct dwc2_hsotg *hsotg = dwc2_hcd_to_hsotg(hcd);
-   u32 hprt0;
-
-   if (!((hsotg->op_state == OTG_STATE_B_HOST) ||
-   (hsotg->op_state == OTG_STATE_A_HOST)))
-   return 0;
-
-   if (hsotg->lx_state != DWC2_L2)
-   return 0;
-
-   hprt0 = dwc2_read_hprt0(hsotg);
-   if ((hprt0 & HPRT0_CONNSTS) && (hprt0 & HPRT0_SUSP))
-   dwc2_port_resume(hsotg);
-   else
-   writel(0, hsotg->regs + PCGCTL);
-
-   return 0;
-}
-
 /* Returns the current frame number */
 static int _dwc2_hcd_get_frame_number(struct usb_hcd *hcd)
 {
@@ -2734,9 +2671,6 @@ static struct hc_driver dwc2_hc_driver = {
.hub_status_data = _dwc2_hcd_hub_status_data,
.hub_control = _dwc2_hcd_hub_control,
.clear_tt_buffer_complete = _dwc2_hcd_clear_tt_buffer_complete,
-
-   .bus_suspend = _dwc2_hcd_suspend,
-   .bus_resume = _dwc2_hcd_resume,
 };
 
 /*
-- 
1.7.6.5



Re: [PATCH 1/2] MIPS: OCTEON: fix kernel crash when offlining a CPU

2015-01-15 Thread David Daney

On 01/15/2015 10:49 AM, Aaro Koskinen wrote:
> octeon_cpu_disable() will unconditionally enable interrupts when called
> with interrupts disabled. Fix that.

interrupts are always disabled here, so...

[...]

> Reported-by: Hemmo Nieminen 
> Signed-off-by: Aaro Koskinen 
> Cc: sta...@vger.kernel.org

NACK!

> ---
>   arch/mips/cavium-octeon/smp.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/cavium-octeon/smp.c b/arch/mips/cavium-octeon/smp.c
> index ecd903d..9673c5b 100644
> --- a/arch/mips/cavium-octeon/smp.c
> +++ b/arch/mips/cavium-octeon/smp.c
> @@ -231,6 +231,7 @@ DEFINE_PER_CPU(int, cpu_state);
>   static int octeon_cpu_disable(void)
>   {
> unsigned int cpu = smp_processor_id();
> +   unsigned long flags;
>
> if (cpu == 0)
> return -EBUSY;
> @@ -240,9 +241,9 @@ static int octeon_cpu_disable(void)
>
> set_cpu_online(cpu, false);
> cpu_clear(cpu, cpu_callin_map);
> -   local_irq_disable();
> +   local_irq_save(flags);

Just remove this...

> octeon_fixup_irqs();
> -   local_irq_enable();
> +   local_irq_restore(flags);

... and this.

>
> flush_cache_all();
> local_flush_tlb_all();



You can add an Acked-by me if you do that.

David Daney.
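The difference between the two pairs of calls can be shown with a user-space model of the interrupt-enable flag. The `model_*` functions and `fixup_irqs_stub()` are hypothetical; in particular the real local_irq_save() is a macro that writes into its `flags` argument rather than returning a value.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model of one CPU's interrupt-enable flag; illustrative only. */
static bool irqs_enabled = true;

static void model_local_irq_disable(void) { irqs_enabled = false; }
static void model_local_irq_enable(void)  { irqs_enabled = true; }

/* save/restore remember and reinstate whatever state the caller had. */
static bool model_local_irq_save(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

static void model_local_irq_restore(bool flags) { irqs_enabled = flags; }

static void fixup_irqs_stub(void) { /* stands in for octeon_fixup_irqs() */ }

/* Original sequence: re-enables interrupts no matter what the caller had. */
static void cpu_disable_original(void)
{
	model_local_irq_disable();
	fixup_irqs_stub();
	model_local_irq_enable();
}

/* As suggested in review: the caller already runs with interrupts
 * disabled, so no toggling is needed at all. */
static void cpu_disable_reviewed(void)
{
	assert(!irqs_enabled);
	fixup_irqs_stub();
}
```

The original sequence leaves interrupts enabled for a caller that had them disabled. save/restore would preserve the caller's state, but since interrupts are always disabled at this point, dropping both calls is the simpler fix.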


Re: [RFC PATCH 06/11] x86,fpu: lazily skip fpu restore with eager fpu mode, too

2015-01-15 Thread Oleg Nesterov
On 01/14, Rik van Riel wrote:
>
> On 01/14/2015 01:36 PM, Oleg Nesterov wrote:
>
> >> @@ -466,6 +462,10 @@ static inline void switch_fpu_finish(void)
> >>
> >> __thread_fpu_begin(tsk);
> >>
> >> +  /* The FPU registers already have this task's FPU state. */
> >> +  if (fpu_lazy_restore(tsk, raw_smp_processor_id()))
> >> +          return;
> >> +
> >
> > Now that this is called before return to user-mode, I am not sure
> > this is correct. Note that __kernel_fpu_begin() doesn't clear
> > fpu_owner_task if use_eager_fpu().
>
> However, __kernel_fpu_begin() does call __thread_clear_has_fpu(),
> which clears the per-cpu fpu_owner variable, which is also
> evaluated by fpu_lazy_restore(), so I think this is actually
> correct.

Sure, but only if __thread_has_fpu().

But please ignore. My comment was confusing, sorry. What I actually
tried to say is that this patch is another reason why (I think) we
should start with kernel_fpu_begin/end.

If nothing else:

1. interrupted_kernel_fpu_idle() should not fail if
   use_eager_fpu() && !__thread_has_fpu(), otherwise your
   changes will introduce a performance regression.

   And in fact I think that it should only fail if
   kernel_fpu_begin() is already in progress.

2. And in this case this_cpu_write(fpu_owner_task, NULL)
   can't depend on use_eager_fpu().

   And in fact I think it should not depend on it in any case;
   this only adds more confusion.

Please look at the initial cleanups I sent.
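Point 2 can be illustrated with a simplified model of the lazy-restore check, which skips reloading the registers only when the incoming task still owns this CPU's FPU. The names and the `eager` parameter below are hypothetical; the model only shows why clearing the owner in the `!__thread_has_fpu()` branch must not depend on the FPU mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task state: the CPU whose registers last held it. */
struct task_model {
	unsigned int last_cpu;
};

/* Per-CPU in the kernel; a single CPU is modeled here. */
static struct task_model *fpu_owner_task;

/* Simplified lazy-restore check: the registers may be reused only if
 * this task still owns the FPU of this CPU. */
static bool model_fpu_lazy_restore(const struct task_model *next, unsigned int cpu)
{
	return next == fpu_owner_task && cpu == next->last_cpu;
}

/* Task did not have the FPU; owner cleared only in lazy mode
 * (the behaviour being questioned). */
static void model_kernel_fpu_begin_conditional(bool eager)
{
	if (!eager)
		fpu_owner_task = NULL;
}

/* Owner cleared regardless of mode (the suggested behaviour). */
static void model_kernel_fpu_begin_unconditional(void)
{
	fpu_owner_task = NULL;
}
```

In eager mode the conditional variant leaves a stale owner behind, so a later check still answers "registers are valid" even though kernel code has just used them.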

Oleg.



Re: [PATCH v4 10/11] perf/x86/intel: Perform rotation on Intel CQM RMIDs

2015-01-15 Thread Matt Fleming
On Fri, 09 Jan, at 04:58:35PM, Peter Zijlstra wrote:
> 
> Yeah, that'll work, when the free+limbo count is 1/4th the total we
> should stop pulling more plugs.

Perhaps something like this? It favours stealing more RMIDs over
increasing the "dirty threshold".
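The counting and the limits used in the patch below can be modeled in isolation. This is a hypothetical sketch: `occ[]` stands in for per-RMID occupancy readings, the minimum queue time and the actual stealing step (whose code is cut off in this message) are not modeled, and only the threshold bump bounded by `threshold_limit` and the 25% steal limit are shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Count RMIDs whose occupancy is still above the threshold; return true
 * if stabilization needs to be reattempted. Note the parentheses:
 * `(*nr)++` increments the counter, while `*nr++` would advance the
 * pointer and leave the count unchanged. */
static bool model_stabilize(const unsigned int *occ, unsigned int n,
			    unsigned int threshold, unsigned int *nr)
{
	unsigned int i;

	*nr = 0;
	for (i = 0; i < n; i++)
		if (occ[i] > threshold)
			(*nr)++;
	return *nr != 0;
}

/* Raise the threshold until stabilization succeeds, but never past
 * threshold_limit. Returns the threshold reached. */
static unsigned int model_raise_threshold(const unsigned int *occ, unsigned int n,
					  unsigned int threshold,
					  unsigned int threshold_limit)
{
	unsigned int nr_available;

	while (model_stabilize(occ, n, threshold, &nr_available) &&
	       threshold < threshold_limit)
		threshold++;
	return threshold;
}

/* "Allow max 25% of RMIDs to be in limbo": cqm_max_rmid is the highest
 * RMID index, so there are cqm_max_rmid + 1 RMIDs in total. */
static unsigned int model_steal_limit(unsigned int cqm_max_rmid)
{
	return (cqm_max_rmid + 1) / 4;
}
```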

---

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index fc1a90245601..af58f233c93c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -490,29 +490,27 @@ static unsigned int __rmid_queue_time_ms = RMID_DEFAULT_QUEUE_TIME;
 
 /*
  * intel_cqm_rmid_stabilize - move RMIDs from limbo to free list
- * @available: are there freeable RMIDs on the limbo list?
+ * @nr_available: number of freeable RMIDs on the limbo list
  *
  * Quiescent state; wait for all 'freed' RMIDs to become unused, i.e. no
  * cachelines are tagged with those RMIDs. After this we can reuse them
  * and know that the current set of active RMIDs is stable.
  *
- * Return %true or %false depending on whether we were able to stabilize
- * an RMID for intel_cqm_rotation_rmid.
+ * Return %true or %false depending on whether stabilization needs to be
+ * reattempted.
  *
- * If we return %false then @available is updated to indicate the reason
- * we couldn't stabilize any RMIDs. @available is %false if no suitable
- * RMIDs were found on the limbo list to recycle, i.e. no RMIDs had been
- * on the list for the minimum queue time. If @available is %true then,
- * we found suitable RMIDs to recycle but none had an associated
- * occupancy value below __intel_cqm_threshold and the threshold should
- * be increased and stabilization reattempted.
+ * If we return %true then @nr_available is updated to indicate the
+ * number of RMIDs on the limbo list that have been queued for the
+ * minimum queue time (RMID_AVAILABLE), but whose data occupancy values
+ * are above __intel_cqm_threshold.
  */
-static bool intel_cqm_rmid_stabilize(bool *available)
+static bool intel_cqm_rmid_stabilize(unsigned int *available)
 {
struct cqm_rmid_entry *entry, *tmp;
 
lockdep_assert_held(_mutex);
 
+   *available = 0;
list_for_each_entry(entry, _rmid_limbo_lru, list) {
unsigned long min_queue_time;
unsigned long now = jiffies;
@@ -539,7 +537,7 @@ static bool intel_cqm_rmid_stabilize(bool *available)
break;
 
entry->state = RMID_AVAILABLE;
-   *available = true;
+   (*available)++;
}
 
/*
@@ -547,7 +545,7 @@ static bool intel_cqm_rmid_stabilize(bool *available)
 * sitting on the queue for the minimum queue time.
 */
if (!*available)
-   return false;
+   return true;
 
/*
 * Test whether an RMID is free for each package.
@@ -684,9 +682,10 @@ static void intel_cqm_sched_out_events(struct perf_event *event)
 static bool __intel_cqm_rmid_rotate(void)
 {
struct perf_event *group, *start = NULL;
+   unsigned int threshold_limit;
unsigned int nr_needed = 0;
+   unsigned int nr_available;
bool rotated = false;
-   bool available;
 
mutex_lock(_mutex);
 
@@ -756,14 +755,41 @@ stabilize:
 * Alternatively, if we didn't need to perform any rotation,
 * we'll have a bunch of RMIDs in limbo that need stabilizing.
 */
-   if (!intel_cqm_rmid_stabilize(&available)) {
-   unsigned int limit;
+   threshold_limit = __intel_cqm_max_threshold / cqm_l3_scale;
+
+   while (intel_cqm_rmid_stabilize(&nr_available) &&
+  __intel_cqm_threshold < threshold_limit) {
+   unsigned int steal_limit;
+
+   /* Allow max 25% of RMIDs to be in limbo. */
+   steal_limit = (cqm_max_rmid + 1) / 4;
 
-   limit = __intel_cqm_max_threshold / cqm_l3_scale;
-   if (available && __intel_cqm_threshold < limit) {
-   __intel_cqm_threshold++;
+   /*
+* We failed to stabilize any RMIDs so our rotation
+* logic is now stuck. In order to make forward progress
+* we have a few options:
+*
+*   1. rotate ("steal") another RMID
+*   2. increase the threshold
+*   3. do nothing
+*
+* We do both of 1. and 2. until we hit the steal limit.
+*
+* The steal limit prevents all RMIDs ending up on the
+* limbo list. This can happen if every RMID has a
+* non-zero occupancy above threshold_limit, and the
+* occupancy values aren't dropping fast enough.
+*
+* Note that there is prioritisation at work here - we'd
+* rather increase the number of RMIDs on the limbo list
+* than increase the threshold, 

Re: [PATCH 4/8] x86/spinlock: Leftover conversion ACCESS_ONCE->READ_ONCE

2015-01-15 Thread Oleg Nesterov
On 01/15, Christian Borntraeger wrote:
>
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -186,7 +186,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
>   __ticket_t head = ACCESS_ONCE(lock->tickets.head);
>  
>   for (;;) {
> - struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
> + struct __raw_tickets tmp = READ_ONCE(lock->tickets);

Agreed, but what about the other ACCESS_ONCE() above?
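The series this patch belongs to is motivated by ACCESS_ONCE(), a volatile cast, not working reliably on non-scalar types with some GCC versions. A hedged user-space sketch of a safe pattern for a ticket pair: read both halves with one volatile scalar load through a union. The layout and names are assumptions loosely modeled on x86 arch_spinlock_t, not the kernel's READ_ONCE() implementation, and `__typeof__` requires GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* ACCESS_ONCE() as a volatile cast, as the kernel defined it at the time. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical ticket-lock layout: two 16-bit halves overlaying one
 * 32-bit word. */
struct raw_tickets_model {
	uint16_t head, tail;
};

typedef union {
	uint32_t head_tail;
	struct raw_tickets_model tickets;
} spinlock_model_t;

/* Snapshot both halves with a single volatile scalar load instead of
 * relying on volatile for an aggregate. */
static struct raw_tickets_model read_tickets_once(spinlock_model_t *lock)
{
	spinlock_model_t tmp;

	tmp.head_tail = ACCESS_ONCE(lock->head_tail);
	return tmp.tickets;
}
```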

Oleg.



Re: sysfs methods can race with ->remove

2015-01-15 Thread Tejun Heo
Hello, Alan.

On Thu, Jan 15, 2015 at 01:22:03PM -0500, Alan Stern wrote:
> > It has a reference to keep it from being freed, but so far I can't find
> > anything that prevents ->remove from being called while we are in or
> > just before a method call.
> 
> There are two types of methods to think about: Those registered by the 
> subsystem and those registered by the driver.
> 
> If a method is registered by the driver, then the driver will
> unregister it when the ->remove routine runs.  I don't know for
> certain, but I would expect that the sysfs/kernfs core will make sure
> that any existing method calls complete before unregister returns.  
> This would prevent races.

Yes, attribute deletion blocks until the ongoing sysfs read/write
operations finish, and further read/write accesses fail.
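The behaviour described above can be modeled as an active-reference count per attribute: removal marks the attribute deactivated, new operations then fail, and removal completes once the in-flight count drains to zero. A single-threaded, hypothetical sketch; the real kernfs code uses atomics and sleeps on a wait queue instead of returning a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one sysfs attribute's active references. */
struct attr_model {
	int active;        /* read/write operations currently in flight */
	bool deactivated;  /* removal has started */
};

/* Taken at the start of a read/write; fails once removal has begun. */
static bool attr_get_active(struct attr_model *a)
{
	if (a->deactivated)
		return false;
	a->active++;
	return true;
}

static void attr_put_active(struct attr_model *a)
{
	assert(a->active > 0);
	a->active--;
}

/* Start removal. Returns true once nothing is in flight, i.e. when a
 * blocking removal would be allowed to return. */
static bool attr_deactivate(struct attr_model *a)
{
	a->deactivated = true;
	return a->active == 0;
}
```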

> If a method is registered by the subsystem, and if the method runs 
> entirely within the subsystem's code, then ->remove doesn't matter.  
> The driver could be unbound while the method is running and it would be 
> okay.
> 
> The only time we have a problem is when the method is registered by the 
> subsystem and the method calls into the driver.  (Note that this is 
> exactly what happens with scsi_rescan_device.)
> 
> > > > But this seems like a more generic problem, and at least a quick glance at
> > > > the pci_driver methods seems like others don't have a good
> > > > synchronization of ->remove against random driver methods.
> > > 
> > > Can you give one or two examples?
> > 
> > I look at the sriov_configure PCI method, or the various sub-methods
> > under pci_driver.err_handler.
> 
> The sriov_numvfs_store method does have the same problem, and so does 
> the reset_store method (by way of pci_reset_function -> 
> pci_dev_save_and_disable -> pci_reset_notify).
> 
> Tejun, is my analysis correct?  How should we fix these races?

I'm not really following what the actual problem case is.  So SCSI
subsystem store methods are dereferencing dev->driver without
synchronizing against detach events?  If that's the case, wouldn't the
solution be to synchronize against attach/detach events?  Sorry if I'm
being totally idiotic.  I'm having a bit of a hard time jumping right in.  :)
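One way to read "synchronizing against attach/detach": the store method and the unbind path serialize on the same per-device lock, and the method re-checks dev->driver under it. In the driver core, unbinding runs under device_lock(); the sketch below uses a pthread mutex in its place, and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical driver ops that a subsystem store method calls into. */
struct driver_model {
	int (*rescan)(void *dev);
};

struct device_model {
	pthread_mutex_t lock;               /* stands in for device_lock() */
	const struct driver_model *driver;  /* NULL once unbound */
};

/* Store method: call into the driver only under the lock, and only if
 * a driver is still bound. -1 is a hypothetical "no driver" error. */
static int store_rescan(struct device_model *dev)
{
	int ret = -1;

	pthread_mutex_lock(&dev->lock);
	if (dev->driver && dev->driver->rescan)
		ret = dev->driver->rescan(dev);
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

/* Unbind path: ->remove would run here, under the same lock. */
static void detach(struct device_model *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->driver = NULL;
	pthread_mutex_unlock(&dev->lock);
}

static int dummy_rescan(void *dev) { (void)dev; return 0; }
static const struct driver_model dummy_driver = { dummy_rescan };
```

A caveat worth checking before applying this pattern: if ->remove deletes sysfs attributes while holding that lock, and deletion waits for in-flight methods that are themselves waiting for the lock, the two deadlock. The model does not capture that.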

Thanks.

-- 
tejun

