date:20140429

At Tue, 29 Apr 2014 16:05:54 +0200,
Jose Ignacio Naranjo wrote:
 
 Hi,
 
 I sent a similar solution some months ago for another Toshiba model:
 
 http://permalink.gmane.org/gmane.linux.drivers.platform.x86.devel/5198
 
 I replied to the thread, but I just noticed my reply didn't make it to the list.
 I don't know why; maybe because I had attached some files :(
 
 My reply basically asked whether we could use the oem_table_id instead of DMI,
 but I don't know the DSDTs of other Toshiba models.

Yeah, that sounds feasible.  I'm also not particularly in favor of
DMI, either.  It was used just because it's the easiest way.

I cannot answer either of Matthew's questions in the thread above, as
I'm not an owner of the machine but merely a patch monkey who tried to
solve a bug on openSUSE.  Feel free to join the bugzilla thread
mentioned in the patch for more detailed information.


thanks,

Takashi

 
 Regards,
 JI
 
 
 
 On Tue, Apr 29, 2014 at 3:15 PM, Takashi Iwai ti...@suse.de wrote:
 
  Toshiba Satellite M840 laptop has a completely different keymap although
  it's bound with the same ACPI ID TOS1900.  This patch provides an
  alternative keymap specific to this machine by identifying via DMI
  matching.  The keymap table doesn't fill all entries that were used
  before since some keys aren't found on this machine at all.
 
  Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69761
  Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=812209
  Reported-and-tested-by: Federico Vecchiarelli fe...@gmx.net
  Signed-off-by: Takashi Iwai ti...@suse.de
  ---
   drivers/platform/x86/toshiba_acpi.c | 30 +-
   1 file changed, 29 insertions(+), 1 deletion(-)
 
  diff --git a/drivers/platform/x86/toshiba_acpi.c
  b/drivers/platform/x86/toshiba_acpi.c
  index 46473ca7566b..76441dcbe5ff 100644
  --- a/drivers/platform/x86/toshiba_acpi.c
  +++ b/drivers/platform/x86/toshiba_acpi.c
  @@ -56,6 +56,7 @@
   #include <linux/workqueue.h>
   #include <linux/i8042.h>
   #include <linux/acpi.h>
  +#include <linux/dmi.h>
   #include <asm/uaccess.h>
 
   MODULE_AUTHOR("John Belmonte");
  @@ -213,6 +214,30 @@ static const struct key_entry toshiba_acpi_keymap[] =
  {
  { KE_END, 0 },
   };
 
  +/* alternative keymap */
  +static const struct dmi_system_id toshiba_alt_keymap_dmi[] = {
  +   {
  +   .matches = {
  +   DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
  +   DMI_MATCH(DMI_PRODUCT_NAME, "Satellite M840"),
  +   },
  +   },
  +   {}
  +};
  +
  +static const struct key_entry toshiba_acpi_alt_keymap[] = {
  +   { KE_KEY, 0x157, { KEY_MUTE } },
  +   { KE_KEY, 0x102, { KEY_ZOOMOUT } },
  +   { KE_KEY, 0x103, { KEY_ZOOMIN } },
  +   { KE_KEY, 0x139, { KEY_ZOOMRESET } },
  +   { KE_KEY, 0x13e, { KEY_SWITCHVIDEOMODE } },
  +   { KE_KEY, 0x13c, { KEY_BRIGHTNESSDOWN } },
  +   { KE_KEY, 0x13d, { KEY_BRIGHTNESSUP } },
  +   { KE_KEY, 0x158, { KEY_WLAN } },
  +   { KE_KEY, 0x13f, { KEY_TOUCHPAD_TOGGLE } },
  +   { KE_END, 0 },
  +};
  +
   /* utility
*/
 
  @@ -1440,6 +1465,7 @@ static int toshiba_acpi_setup_keyboard(struct
  toshiba_acpi_dev *dev)
  acpi_handle ec_handle;
  int error;
  u32 hci_result;
  +   const struct key_entry *keymap = toshiba_acpi_keymap;
 
  dev->hotkey_dev = input_allocate_device();
  if (!dev->hotkey_dev)
  @@ -1449,7 +1475,9 @@ static int toshiba_acpi_setup_keyboard(struct
  toshiba_acpi_dev *dev)
  dev->hotkey_dev->phys = "toshiba_acpi/input0";
  dev->hotkey_dev->id.bustype = BUS_HOST;
 
  -   error = sparse_keymap_setup(dev->hotkey_dev, toshiba_acpi_keymap,
  NULL);
  +   if (dmi_check_system(toshiba_alt_keymap_dmi))
  +   keymap = toshiba_acpi_alt_keymap;
  +   error = sparse_keymap_setup(dev->hotkey_dev, keymap, NULL);
  if (error)
  goto err_free_dev;
 
  --
  1.9.2
 
  --
  To unsubscribe from this list: send the line unsubscribe
  platform-driver-x86 in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
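
For readers following the DMI-versus-oem_table_id discussion above, the table selection done by the patch can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct key_map_entry`, `struct machine_id`, `pick_keymap()` and `lookup_key()` are hypothetical stand-ins, not kernel APIs; the substring comparison imitates what `DMI_MATCH` does; and the single entry in `default_keymap` is a placeholder rather than a copy of the driver's table. The two scancodes in `alt_keymap` are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a sparse keymap entry; key == NULL ends a table. */
struct key_map_entry {
	unsigned int scancode;
	const char *key;
};

/* Placeholder default table (contents are NOT copied from the driver). */
static const struct key_map_entry default_keymap[] = {
	{ 0x101, "KEY_MUTE" },
	{ 0, NULL },
};

/* Two entries taken from the alternative keymap in the patch. */
static const struct key_map_entry alt_keymap[] = {
	{ 0x157, "KEY_MUTE" },
	{ 0x13f, "KEY_TOUCHPAD_TOGGLE" },
	{ 0, NULL },
};

/* Hypothetical stand-in for a DMI match table entry. */
struct machine_id {
	const char *vendor;
	const char *product;
};

static const struct machine_id alt_machines[] = {
	{ "TOSHIBA", "Satellite M840" },
	{ NULL, NULL },
};

/* Return the alternative keymap if vendor and product both match (substring). */
static const struct key_map_entry *pick_keymap(const char *vendor,
					       const char *product)
{
	const struct machine_id *m;

	assert(vendor != NULL && product != NULL);
	for (m = alt_machines; m->vendor; m++) {
		if (strstr(vendor, m->vendor) && strstr(product, m->product))
			return alt_keymap;
	}
	return default_keymap;
}

/* Look up the key name for a scancode, or NULL if the table has no entry. */
static const char *lookup_key(const struct key_map_entry *map,
			      unsigned int scancode)
{
	for (; map->key; map++) {
		if (map->scancode == scancode)
			return map->key;
	}
	return NULL;
}
```

Switching to an oem_table_id check would presumably only change the predicate inside `pick_keymap()`; the table selection itself would stay the same.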

Re: [PATCH RT 2/4] net: gianfar: do not try to cleanup TX packets if they are not done

2014-04-29 Thread Paul Gortmaker

On 14-04-28 02:37 AM, Sebastian Andrzej Siewior wrote:
 On 04/27/2014 04:31 PM, Steven Rostedt wrote:
 diff --git a/drivers/net/ethernet/freescale/gianfar.c 
 b/drivers/net/ethernet/freescale/gianfar.c
 index 5c0efcc..8aecc1d 100644
 --- a/drivers/net/ethernet/freescale/gianfar.c
 +++ b/drivers/net/ethernet/freescale/gianfar.c
 @@ -2856,10 +2855,14 @@ static int gfar_poll(struct napi_struct *napi, int 
 budget)
  tx_queue = priv->tx_queue[i];
  /* run Tx cleanup to completion */
  if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx]) {
 -gfar_clean_tx_ring(tx_queue);
 -has_tx_work = 1;
 +int ret;
 +
 +ret = gfar_clean_tx_ring(tx_queue);
 +if (ret)
 +has_tx_work++;
  }
  }
 +work_done += has_tx_work;
  
  for_each_set_bit(i, &gfargrp->rx_bit_map, priv->num_rx_queues) {
  /* skip queue if not active */
 
 The 3.14-RT version of the patch should have an additional return
 statement here which I forgot initially.

Sanity boot tested the 3.10 rc1 on a sbc8548 (UP PPC with gianfar), with
the one-liner added as follows:

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index 8aecc1d81395..b87a8c919c3e 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2574,6 +2574,7 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q 
*tx_queue)
tx_queue->dirty_tx = bdp;
 
netdev_tx_completed_queue(txq, howmany, bytes_sent);
+   return howmany;
 }
 
 static void gfar_schedule_cleanup(struct gfar_priv_grp *gfargrp)

Paul.
--


 
 Sebastian
 

Re: PROBLEM: Pulseaudio hung at schedule in 3.15-rc1

At Sun, 20 Apr 2014 14:50:12 -0400,
Bryan Quigley wrote:
 
  Does the patch below work?
 Nope, that didn't help.

I'm back now and have started looking at this again.
Could you provide the raw kernel messages with the stack trace?
The previously cited message is hard to read.

 I did determine it is caused by having my Logitech webcam plugged in.
 Here is lsusb from a working system:
 Bus 002 Device 002: ID 046d:0825 Logitech, Inc. Webcam C270

OK, there is a quirk for this device but it shouldn't affect the
suspend/resume behavior, at least.


thanks,

Takashi

Re: sched_{set,get}attr() manpage

2014-04-29 Thread Peter Zijlstra

On Tue, Apr 29, 2014 at 03:08:55PM +0200, Michael Kerrisk (man-pages) wrote:
 Hi Peter,
 
 On 04/28/2014 10:18 AM, Peter Zijlstra wrote:
  Hi Michael,
  
  find below an updated manpage, I did not apply the comments on parts
  that are identical to SCHED_SETSCHEDULER(2) in order to keep these texts
  in alignment. I feel that if we change one we should also change the
  other, and such a 'patch' is best done separate from the new manpage
  itself.
  
  I did add the missing EBUSY error, and amended the text where it said
  we'd return EINVAL in that case.
  
  I added a paragraph stating that SCHED_DEADLINE preempted anything else
  userspace can do (with the explicit mention of userspace to leave me
  wriggle room for the kernel's stop task :-).
  
  I also did a short paragraph on the deadline sched_yield(). For further
  deadline yield details we should maybe add to the SCHED_YIELD(2)
  manpage.
  
  Re juri/claudio; no I think sched_yield() as implemented for deadline
  makes sense, no other yield semantics other than NOP makes sense for it,
  and since we have the syscall already might as well make it do something
  useful.
 
 Thanks for the updated page. Would you be willing
 to revise as per the comments below.

Ok.

 
  NAME
  sched_setattr, sched_getattr - set and get scheduling policy/attributes
  
  SYNOPSIS
  #include <sched.h>
  
  struct sched_attr {
  u32 size;
  u32 sched_policy;
  u64 sched_flags;
  
  /* SCHED_NORMAL, SCHED_BATCH */
  s32 sched_nice;
  /* SCHED_FIFO, SCHED_RR */
  u32 sched_priority;
  /* SCHED_DEADLINE */
  u64 sched_runtime;
  u64 sched_deadline;
  u64 sched_period;
  };
  int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned 
  int flags);
  
  int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned 
  int size, unsigned int flags);
  
  DESCRIPTION
  sched_setattr() sets both the scheduling policy and the
  associated attributes for the process whose ID is specified in
  pid.  
 
 Around about here, I think there needs to be a sentence explaining
 that sched_setattr() provides a superset of the functionality of 
 sched_setscheduler(2) and setpriority(2). I mean, it can do all that 
 those two calls can do, right?

Almost; setpriority() has the .which argument which we don't have. So
while that syscall can change the nice value for an entire process group
or user, sched_setattr() can only change the nice value for 1 task.

But yes, I can mention something along those lines.

  If pid equals zero, the scheduling policy and attributes
  of the calling process will be set.  The interpretation of the
  argument attr depends on the selected policy.  Currently, Linux
  supports the following normal (i.e., non-real-time) scheduling
  policies:
  
  SCHED_OTHER the standard fair time-sharing policy;
  
  SCHED_BATCH for batch style execution of processes; and
  
  SCHED_IDLE  for running very low priority background jobs.
  
  The following real-time policies are also supported, for
  special time-critical applications that need precise control
  over the way in which runnable processes are selected for
  execution:
  
  SCHED_FIFO  a first-in, first-out policy;
  
  SCHED_RRa round-robin policy; and
  
  SCHED_DEADLINE  a deadline policy.
  
  The semantics of each of these policies are detailed below.
 
 The semantics of each of these policies are detailed in sched(7).

I don't appear to have SCHED(7), how new is that?

 [See my comments below]
 
  
  sched_attr::size must be set to the size of the structure, as in
  sizeof(struct sched_attr), if the provided structure is smaller
  than the kernel structure, any additional fields are assumed
  '0'. If the provided structure is larger than the kernel
  structure, the kernel verifies all additional fields are '0' if
  not the syscall will fail with -E2BIG.
  
  sched_attr::sched_policy the desired scheduling policy.
  
  sched_attr::sched_flags additional flags that can influence
  scheduling behaviour. Currently as per Linux kernel 3.14:
  
  SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
  to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
  on fork().
  
  is the only supported flag.
  
  sched_attr::sched_nice should only be set for SCHED_OTHER,
  SCHED_BATCH, the desired nice value [-20,19], see NICE(2).
  
  sched_attr::sched_priority should only be set for SCHED_FIFO,
  SCHED_RR, the desired static priority [1,99].
  
  sched_attr::sched_runtime
  sched_attr::sched_deadline
  sched_attr::sched_period should only be set for SCHED_DEADLINE
  and are the traditional sporadic task model parameters.
 
 Could you add (a lot ;-))
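
The sched_attr::size rule quoted above (a smaller structure gets its missing fields treated as 0, a larger one is accepted only if the extra bytes are all 0, otherwise E2BIG) can be modelled in userspace C. This is a sketch under assumptions, not the kernel's implementation: `struct attr_known`, `struct attr_newer` and `copy_attr_model()` are hypothetical names, the two-field layout is invented for brevity, and the -EINVAL return for a buffer too small to hold the size field is an assumption of this model rather than something the draft manpage states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "kernel view": only these two fields are known. */
struct attr_known {
	unsigned int size;
	unsigned int policy;
};

/* Hypothetical newer userspace layout with one field the kernel lacks. */
struct attr_newer {
	unsigned int size;
	unsigned int policy;
	unsigned int extra;
};

/*
 * Copy a caller-supplied buffer of usize bytes into *dst.
 * Returns 0 on success, -E2BIG if bytes beyond the known structure are
 * non-zero, and -EINVAL (model assumption) if even the size field is missing.
 */
static int copy_attr_model(struct attr_known *dst, const void *src,
			   size_t usize)
{
	const unsigned char *bytes = src;
	size_t i;

	assert(dst != NULL && src != NULL);
	if (usize < sizeof(dst->size))
		return -EINVAL;

	/* Fields the caller did not provide are assumed to be 0. */
	memset(dst, 0, sizeof(*dst));

	if (usize > sizeof(*dst)) {
		/* Unknown trailing fields must all be 0. */
		for (i = sizeof(*dst); i < usize; i++) {
			if (bytes[i] != 0)
				return -E2BIG;
		}
		usize = sizeof(*dst);
	}
	memcpy(dst, src, usize);
	return 0;
}
```

The point of the scheme is that old binaries keep working against newer kernels and new binaries work against older kernels as long as they do not actually use the fields the kernel does not know about.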

Re: [PATCH 2/5] netfilter: Fix format string mismatch in mangle_content_len()

2014-04-29 Thread Patrick McHardy

On Tue, Apr 01, 2014 at 12:43:36AM +0900, Masanari Iida wrote:
 Fix format string mismatch in mangle_content_len()

All these patches seem like pointless noise to me. In none of these
cases can the value legitimately be negative. If anything, you should
fix the types to be unsigned.

 
 Signed-off-by: Masanari Iida standby2...@gmail.com
 ---
  net/netfilter/nf_nat_sip.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/net/netfilter/nf_nat_sip.c b/net/netfilter/nf_nat_sip.c
 index b4d691d..5f98845 100644
 --- a/net/netfilter/nf_nat_sip.c
 +++ b/net/netfilter/nf_nat_sip.c
 @@ -434,7 +434,7 @@ static int mangle_content_len(struct sk_buff *skb, 
 unsigned int protoff,
 &matchoff, &matchlen) <= 0)
   return 0;
  
 - buflen = sprintf(buffer, "%u", c_len);
 + buflen = sprintf(buffer, "%d", c_len);
   return mangle_packet(skb, protoff, dataoff, dptr, datalen,
matchoff, matchlen, buffer, buflen);
  }
 -- 
 1.9.1.352.gd393d14
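
Patrick's suggestion (keep the unsigned conversion and make the type unsigned when the value can never be negative) can be sketched in standalone C. This is an illustration only: `format_content_len()` is a hypothetical helper, not the netfilter function, and it uses `snprintf()` with an explicit buffer size, which is a choice of this sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a content length that can never be negative.
 * The parameter is unsigned, so "%u" matches its type exactly.
 * Returns the number of characters written, or -1 if the buffer is too small.
 */
static int format_content_len(char *buf, size_t buflen, unsigned int c_len)
{
	int n;

	assert(buf != NULL && buflen > 0);
	n = snprintf(buf, buflen, "%u", c_len);
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	return n;
}
```

With a signed parameter and "%d", a value above INT_MAX would print as a negative number; keeping both the type and the conversion unsigned avoids that case entirely.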
 

[PATCH v2] mba6x_bl: Backlight driver for mid 2013 MacBook Air

2014-04-29 Thread Patrik Jakobsson

This driver takes control over the LP8550 backlight driver chip found
in the mid 2013 and newer MacBook Air (6,1 and 6,2). The i915 GPU driver
cannot properly restore the backlight after resume, but with this driver
we can hijack the LP8550 and get fully functional backlight support.

v2: - Dropped if ACPI in Kconfig since we already depend on it
- Added comment about brightness mapping
- Removed lp8550_init() from set_brightness()
- Always write to dev_ctl when setting brightness
- Change %Ld to standard C %lld
- Constify the backlight_ops struct

Signed-off-by: Patrik Jakobsson patrik.r.jakobs...@gmail.com
---
 MAINTAINERS |   6 +
 drivers/platform/x86/Kconfig|  13 ++
 drivers/platform/x86/Makefile   |   1 +
 drivers/platform/x86/mba6x_bl.c | 353 
 4 files changed, 373 insertions(+)
 create mode 100644 drivers/platform/x86/mba6x_bl.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e67ea24..cad3e82 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5576,6 +5576,12 @@ T:   git 
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git
 S: Maintained
 F: net/mac80211/rc80211_pid*
 
+MACBOOK AIR 6,X BACKLIGHT DRIVER
+M: Patrik Jakobsson patrik.r.jakobs...@gmail.com
+L: platform-driver-...@vger.kernel.org
+S: Maintained
+F: drivers/platform/x86/mba6x_bl.c
+
 MACVLAN DRIVER
 M: Patrick McHardy ka...@trash.net
 L: net...@vger.kernel.org
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index 27df2c5..10ac918 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -795,6 +795,19 @@ config APPLE_GMUX
  graphics as well as the backlight. Currently only backlight
  control is supported by the driver.
 
+config MBA6X_BL
+   tristate "MacBook Air 6,x backlight driver"
+   depends on ACPI
+   depends on BACKLIGHT_CLASS_DEVICE
+   select ACPI_VIDEO
+   help
+This driver takes control over the LP8550 backlight driver found in
+some MacBook Air models. Say Y here if you have a MacBook Air from mid
+2013 or newer.
+
+To compile this driver as a module, choose M here: the module will
+be called mba6x_bl.
+
 config INTEL_RST
 tristate "Intel Rapid Start Technology Driver"
depends on ACPI
diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile
index 1a2eafc..9a182fe 100644
--- a/drivers/platform/x86/Makefile
+++ b/drivers/platform/x86/Makefile
@@ -56,3 +56,4 @@ obj-$(CONFIG_INTEL_SMARTCONNECT)  += intel-smartconnect.o
 
 obj-$(CONFIG_PVPANIC)   += pvpanic.o
 obj-$(CONFIG_ALIENWARE_WMI)+= alienware-wmi.o
+obj-$(CONFIG_MBA6X_BL) += mba6x_bl.o
diff --git a/drivers/platform/x86/mba6x_bl.c b/drivers/platform/x86/mba6x_bl.c
new file mode 100644
index 000..c549667
--- /dev/null
+++ b/drivers/platform/x86/mba6x_bl.c
@@ -0,0 +1,353 @@
+/*
+ * MacBook Air 6,1 and 6,2 (mid 2013) backlight driver
+ *
+ * Copyright (C) 2014 Patrik Jakobsson (patrik.r.jakobs...@gmail.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/backlight.h>
+#include <linux/acpi.h>
+#include <acpi/acpi.h>
+#include <acpi/video.h>
+
+#define LP8550_SMBUS_ADDR  (0x58  1)
+#define LP8550_REG_BRIGHTNESS  0
+#define LP8550_REG_DEV_CTL 1
+#define LP8550_REG_FAULT   2
+#define LP8550_REG_IDENT   3
+#define LP8550_REG_DIRECT_CTL  4
+#define LP8550_REG_TEMP_MSB5 /* Must be read before TEMP_LSB */
+#define LP8550_REG_TEMP_LSB6
+
+#define INIT_BRIGHTNESS150
+
+static struct {
+   u8 brightness;  /* Brightness control */
+   u8 dev_ctl; /* Device control */
+   u8 fault;   /* Fault indication */
+   u8 ident;   /* Identification */
+   u8 direct_ctl;  /* Direct control */
+   u8 temp_msb;/* Temperature MSB  */
+   u8 temp_lsb;/* Temperature LSB */
+} lp8550_regs;
+
+static struct platform_device *platform_device;
+static struct backlight_device *backlight_device;
+
+static int lp8550_reg_read(u8 reg, u8 *val)
+{
+   acpi_status status;
+   acpi_handle handle;
+   struct acpi_object_list arg_list;
+   struct acpi_buffer buffer = {ACPI_ALLOCATE_BUFFER, NULL};
+   union acpi_object args[2];
+   union acpi_object *result;
+   int ret = 0;
+
+   status = acpi_get_handle(NULL, "\\_SB.PCI0.SBUS.SRDB", &handle);
+   if (ACPI_FAILURE(status)) {
+

Re: [RFC PATCH v2 4/9] crypto: qce: Add ablkcipher algorithms

2014-04-29 Thread Stanimir Varbanov

Thanks for the review!

On 04/28/2014 11:00 AM, Herbert Xu wrote:
 On Mon, Apr 14, 2014 at 03:48:40PM +0300, Stanimir Varbanov wrote:

 +if (IS_AES(flags)) {
 +switch (keylen) {
 +case AES_KEYSIZE_128:
 +case AES_KEYSIZE_256:
 +break;
 +default:
 +goto badkey;
 
 You need to support 192 here.  If the hardware doesn't do that
 you can work around it by using a software fallback.

Sure, I will make a software fallback. Thanks.

 
 In general you need to provide everything that is supported by
 the generic software implementation.


-- 
regards,
Stan
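
The fallback Herbert asks for amounts to a three-way decision on the key length. A minimal standalone sketch of that decision follows; it is not the qce driver code. `enum aes_path` and `aes_choose_path()` are hypothetical names, and the KEYSIZE_* constants are defined locally with the byte values of the kernel's AES_KEYSIZE_* macros (16, 24, 32).

```c
#include <assert.h>

/* Which implementation should handle a given AES key (hypothetical enum). */
enum aes_path {
	AES_PATH_HW,		/* key size handled by the crypto engine */
	AES_PATH_FALLBACK,	/* valid AES key size, handled in software */
	AES_PATH_BADKEY,	/* not a valid AES key size */
};

/* Key sizes in bytes, same values as the kernel's AES_KEYSIZE_* macros. */
#define KEYSIZE_128 16
#define KEYSIZE_192 24
#define KEYSIZE_256 32

/* Decide how to service a setkey request of keylen bytes. */
static enum aes_path aes_choose_path(unsigned int keylen)
{
	switch (keylen) {
	case KEYSIZE_128:
	case KEYSIZE_256:
		return AES_PATH_HW;
	case KEYSIZE_192:
		/* Hardware cannot do it, but the algorithm must still work. */
		return AES_PATH_FALLBACK;
	default:
		return AES_PATH_BADKEY;
	}
}
```

In a real driver the fallback branch would hand the request to a software cipher allocated for that purpose; that part is deliberately left out here.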

Re: [PATCH] perf tests: Add static build make test

2014-04-29 Thread David Ahern


On 4/29/14, 2:33 AM, Jiri Olsa wrote:

Adding test for building static perf build into the automated
suite. Also available via following commands:

   $ make -f tests/make make_static
   - make_static: cd . && make -f Makefile DESTDIR=/tmp/tmp.7u5MlB4njo 
LDFLAGS=-static
   $ make -f tests/make make_static_O
   - make_static_O: cd . && make -f Makefile O=/tmp/tmp.Ay6r3wEmtX 
DESTDIR=/tmp/tmp.vK0KQwO0Vi LDFLAGS=-static

Cc: Arnaldo Carvalho de Melo a...@kernel.org
Cc: Corey Ashford cjash...@linux.vnet.ibm.com
Cc: David Ahern dsah...@gmail.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Ingo Molnar mi...@kernel.org
Cc: Namhyung Kim namhy...@kernel.org
Cc: Paul Mackerras pau...@samba.org
Cc: Peter Zijlstra a.p.zijls...@chello.nl
Signed-off-by: Jiri Olsa jo...@kernel.org
---
  tools/perf/tests/make | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/tools/perf/tests/make b/tools/perf/tests/make
index 5daeae1..2f92d6e 100644
--- a/tools/perf/tests/make
+++ b/tools/perf/tests/make
@@ -46,6 +46,7 @@ make_install_man:= install-man
  make_install_html   := install-html
  make_install_info   := install-info
  make_install_pdf:= install-pdf
+make_static := LDFLAGS=-static

  # all the NO_* variable combined
  make_minimal:= NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1
@@ -87,6 +88,7 @@ run += make_install_bin
  # run += make_install_info
  # run += make_install_pdf
  run += make_minimal
+run += make_static

  ifneq ($(call has,ctags),)
  run += make_tags



Acked-by: David Ahern dsah...@gmail.com

Re: [PATCH RT 2/4] net: gianfar: do not try to cleanup TX packets if they are not done

On Tue, 29 Apr 2014 10:16:51 -0400
Paul Gortmaker paul.gortma...@windriver.com wrote:

 
 Sanity boot tested the 3.10 rc1 on a sbc8548 (UP PPC with gianfar), with
 the one-liner added as follows:
 
 diff --git a/drivers/net/ethernet/freescale/gianfar.c 
 b/drivers/net/ethernet/freescale/gianfar.c
 index 8aecc1d81395..b87a8c919c3e 100644
 --- a/drivers/net/ethernet/freescale/gianfar.c
 +++ b/drivers/net/ethernet/freescale/gianfar.c
 @@ -2574,6 +2574,7 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q 
 *tx_queue)
 tx_queue->dirty_tx = bdp;
  
 netdev_tx_completed_queue(txq, howmany, bytes_sent);
 +   return howmany;

That's the change I added to 3.10-rc2. I'll post it soon if you want to
test it.

-- Steve

  }
  
  static void gfar_schedule_cleanup(struct gfar_priv_grp *gfargrp)

Re: 64bit x86: NMI nesting still buggy?

On 04/29/2014 07:06 AM, Steven Rostedt wrote:
 On Tue, 29 Apr 2014 06:29:04 -0700
 H. Peter Anvin h...@linux.intel.com wrote:
 
  
 [2] A special case can occur if an SMI handler nests inside an NMI 
  handler and then another NMI occurs. During NMI interrupt 
  handling, NMI interrupts are disabled, so normally NMI interrupts 
  are serviced and completed with an IRET instruction one at a 
  time. When the processor enters SMM while executing an NMI 
  handler, the processor saves the SMRAM state save map but does 
  not save the attribute to keep NMI interrupts disabled. 
  Potentially, an NMI could be latched (while in SMM or upon exit) 
  and serviced upon exit of SMM even though the previous NMI  
  handler has still not completed.

 I believe [2] only applies if there is an IRET executing inside the SMM
 handler, which should not normally be the case.  It might also have been
 addressed since that was written, but I don't know.
 
 Bad behaving BIOS? But I'm sure there's no such thing ;-)
 

Never...

-hpa



[PATCH 3/3] kdb: Implement seq_file command

Combining the kdb seq_file infrastructure with its symbolic lookups allows
a good sub-set of files held in pseudo filesystems to be displayed by
kdb. The seq_file command does exactly this and allows a significant
subset of pseudo files to be safely examined whilst debugging (and in
the hands of a brave expert an even bigger subset can be unsafely
examined).

Good arguments to try with this command include: cpuinfo_op,
gpiolib_seq_ops and vmalloc_op.

Signed-off-by: Daniel Thompson daniel.thomp...@linaro.org
---
 kernel/debug/kdb/kdb_main.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 0b097c8..d87731c 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -1734,6 +1734,32 @@ static int kdb_mm(int argc, const char **argv)
 }
 
 /*
+ * kdb_seq_file - This function implements the 'seq_file' command.
+ * seq_file address-expression
+ */
+static int kdb_seq_file(int argc, const char **argv)
+{
+   int diag;
+   unsigned long addr;
+   int nextarg;
+   long offset;
+   char *name;
+   const struct seq_operations *ops;
+
+   nextarg = 1;
+   diag = kdbgetaddrarg(argc, argv, &nextarg, &addr, &offset, &name);
+   if (diag)
+   return diag;
+
+   if (nextarg != argc+1)
+   return KDB_ARGCOUNT;
+
+   ops = (const struct seq_operations *) (addr + offset);
+   kdb_printf("Using sequence_ops at 0x%p (%s)\n", ops, name);
+   return kdb_print_seq_file(ops);
+}
+
+/*
  * kdb_go - This function implements the 'go' command.
  * go [address-expression]
  */
@@ -2838,6 +2864,8 @@ static void __init kdb_inittab(void)
  Display per_cpu variables, 3, KDB_REPEAT_NONE);
kdb_register_repeat(grephelp, kdb_grep_help, ,
  Display help on | grep, 0, KDB_REPEAT_NONE);
+   kdb_register_repeat(seq_file, kdb_seq_file, ,
+ Show a seq_file using struct seq_operations, 3, KDB_REPEAT_NONE);
 }
 
 /* Execute any commands defined in kdb_cmds.  */
-- 
1.9.0
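
The command above hands a struct seq_operations pointer to kdb_print_seq_file(), whose body is not part of this mail. As background, the start/next/stop/show contract that any such printer has to drive can be modelled in userspace C. Everything here is a sketch under assumptions: `struct seq_ops_model` mirrors the shape of the kernel structure but takes an opaque context pointer instead of a struct seq_file, and `print_seq_model()` and the `names_*` iterator are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace model of the start/next/stop/show iterator contract. */
struct seq_ops_model {
	void *(*start)(void *ctx, long *pos);
	void *(*next)(void *ctx, void *v, long *pos);
	void (*stop)(void *ctx, void *v);
	int (*show)(void *ctx, void *v);
};

/* Walk every record once; returns the number of records shown. */
static int print_seq_model(const struct seq_ops_model *ops, void *ctx)
{
	long pos = 0;
	int shown = 0;
	void *v;

	assert(ops && ops->start && ops->next && ops->stop && ops->show);
	v = ops->start(ctx, &pos);
	while (v) {
		if (ops->show(ctx, v) < 0)
			break;
		shown++;
		v = ops->next(ctx, v, &pos);
	}
	/* stop() is always called, even if start() returned NULL. */
	ops->stop(ctx, v);
	return shown;
}

/* Example iterator over a NULL-terminated array of strings. */
static void *names_start(void *ctx, long *pos)
{
	const char **names = ctx;

	return names[*pos] ? &names[*pos] : NULL;
}

static void *names_next(void *ctx, void *v, long *pos)
{
	const char **names = ctx;

	(void)v;
	++*pos;
	return names[*pos] ? &names[*pos] : NULL;
}

static void names_stop(void *ctx, void *v)
{
	(void)ctx;
	(void)v;
}

static int names_show(void *ctx, void *v)
{
	(void)ctx;
	return printf("%s\n", *(const char **)v) < 0 ? -1 : 0;
}

static const struct seq_ops_model names_ops = {
	.start = names_start,
	.next = names_next,
	.stop = names_stop,
	.show = names_show,
};
```

In the kernel the start() and stop() callbacks often take and release locks, which is presumably why only a subset of seq_operations is safe to run from the debugger.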


[PATCH RT 1/4] net: gianfar: do not disable interrupts

3.10.37-rt38-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior bige...@linutronix.de

each per-queue lock is taken with spin_lock_irqsave() except in the case
where all of them are taken for some kind of serialisation. As an
optimisation local_irq_save() is used so that lock_tx_qs() and
lock_rx_qs() can use just the spin_lock() variant instead.
On RT local_irq_save() behaves differently so we use the nort()
variant.
Lockdep screams easily with "ethtool -K eth0 rx off tx off".

What remains is a missing lockdep annotation that makes lockdep think
lock_tx_qs() may cause a deadlock.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior bige...@linutronix.de
Signed-off-by: Steven Rostedt rost...@goodmis.org
---
 drivers/net/ethernet/freescale/gianfar.c | 16 
 drivers/net/ethernet/freescale/gianfar_ethtool.c |  8 
 drivers/net/ethernet/freescale/gianfar_sysfs.c   | 24 
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index 0343a14..5c0efcc 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -1274,7 +1274,7 @@ static int gfar_suspend(struct device *dev)
 
if (netif_running(ndev)) {
 
-   local_irq_save(flags);
+   local_irq_save_nort(flags);
lock_tx_qs(priv);
lock_rx_qs(priv);
 
@@ -1292,7 +1292,7 @@ static int gfar_suspend(struct device *dev)
 
unlock_rx_qs(priv);
unlock_tx_qs(priv);
-   local_irq_restore(flags);
+   local_irq_restore_nort(flags);
 
disable_napi(priv);
 
@@ -1334,7 +1334,7 @@ static int gfar_resume(struct device *dev)
/* Disable Magic Packet mode, in case something
 * else woke us up.
 */
-   local_irq_save(flags);
+   local_irq_save_nort(flags);
lock_tx_qs(priv);
lock_rx_qs(priv);
 
@@ -1346,7 +1346,7 @@ static int gfar_resume(struct device *dev)
 
unlock_rx_qs(priv);
unlock_tx_qs(priv);
-   local_irq_restore(flags);
+   local_irq_restore_nort(flags);
 
netif_device_attach(ndev);
 
@@ -2346,7 +2346,7 @@ void gfar_vlan_mode(struct net_device *dev, 
netdev_features_t features)
u32 tempval;
 
regs = priv->gfargrp[0].regs;
-   local_irq_save(flags);
+   local_irq_save_nort(flags);
lock_rx_qs(priv);
 
if (features & NETIF_F_HW_VLAN_CTAG_TX) {
@@ -2379,7 +2379,7 @@ void gfar_vlan_mode(struct net_device *dev, 
netdev_features_t features)
gfar_change_mtu(dev, dev->mtu);
 
unlock_rx_qs(priv);
-   local_irq_restore(flags);
+   local_irq_restore_nort(flags);
 }
 
 static int gfar_change_mtu(struct net_device *dev, int new_mtu)
@@ -3258,14 +3258,14 @@ static irqreturn_t gfar_error(int irq, void *grp_id)
dev->stats.tx_dropped++;
atomic64_inc(&priv->extra_stats.tx_underrun);
 
-   local_irq_save(flags);
+   local_irq_save_nort(flags);
lock_tx_qs(priv);
 
/* Reactivate the Tx Queues */
gfar_write(&regs->tstat, gfargrp->tstat);
 
unlock_tx_qs(priv);
-   local_irq_restore(flags);
+   local_irq_restore_nort(flags);
}
netif_dbg(priv, tx_err, dev, "Transmit Error\n");
}
diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c 
b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index 21cd881..c965c0a 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -501,7 +501,7 @@ static int gfar_sringparam(struct net_device *dev,
/* Halt TX and RX, and process the frames which
 * have already been received
 */
-   local_irq_save(flags);
+   local_irq_save_nort(flags);
lock_tx_qs(priv);
lock_rx_qs(priv);
 
@@ -509,7 +509,7 @@ static int gfar_sringparam(struct net_device *dev,
 
unlock_rx_qs(priv);
unlock_tx_qs(priv);
-   local_irq_restore(flags);
+   local_irq_restore_nort(flags);
 
for (i = 0; i < priv->num_rx_queues; i++)
gfar_clean_rx_ring(priv->rx_queue[i],
@@ -552,7 +552,7 @@ int gfar_set_features(struct net_device *dev, 
netdev_features_t features)
/* Halt TX and RX, and process the frames which
 * have already been received
 */
-   local_irq_save(flags);
+   local_irq_save_nort(flags);
lock_tx_qs(priv);
lock_rx_qs(priv);
 
@@ -560,7

[PATCH RT 0/4] Linux 3.10.37-rt38-rc2


Dear RT Folks,

This is the RT stable review cycle of patch 3.10.37-rt38-rc2.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 4/30/2014.

Enjoy,

-- Steve


To build 3.10.37-rt38-rc2 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.10.37.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/3.10/patch-3.10.37-rt38-rc2.patch.xz

You can also build from 3.10.37-rt37 by applying the incremental patch:

http://www.kernel.org/pub/linux/kernel/projects/rt/3.10/incr/patch-3.10.37-rt37-rt38-rc2.patch.xz


Changes from 3.10.37-rt37:

---


Sebastian Andrzej Siewior (3):
  net: gianfar: do not disable interrupts
  net: gianfar: do not try to cleanup TX packets if they are not done
  rcu: make RCU_BOOST default on RT

Steven Rostedt (Red Hat) (1):
  Linux 3.10.37-rt38-rc2


 drivers/net/ethernet/freescale/gianfar.c | 28 ++--
 drivers/net/ethernet/freescale/gianfar_ethtool.c |  8 +++
 drivers/net/ethernet/freescale/gianfar_sysfs.c   | 24 ++--
 init/Kconfig |  2 +-
 localversion-rt  |  2 +-
 5 files changed, 34 insertions(+), 30 deletions(-)

[PATCH RT 2/4] net: gianfar: do not try to cleanup TX packets if they are not done

3.10.37-rt38-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior bige...@linutronix.de

What I observe is that the TX queue is not empty and does not make any
progress. gfar_clean_tx_ring() does not clean up the packet because it
is not completed yet.
The root cause is that the DMA engine had not started yet (it was
preempted before doing so) and that dumb loop spins until that packet
is gone.
This is broken since c233cf4 (gianfar: Fix tx napi polling).

What remains are spurious interrupts if CPU0 cleans up TX packets and
CPU1 returns with IRQ_NONE.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior bige...@linutronix.de
[ added return howmany; ]
Signed-off-by: Steven Rostedt rost...@goodmis.org
---
 drivers/net/ethernet/freescale/gianfar.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 5c0efcc..b87a8c9 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -132,7 +132,6 @@ static int gfar_poll(struct napi_struct *napi, int budget);
 static void gfar_netpoll(struct net_device *dev);
 #endif
 int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit);
-static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue);
 static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb,
   int amount_pull, struct napi_struct *napi);
 void gfar_halt(struct net_device *dev);
@@ -2475,7 +2474,7 @@ static void gfar_align_skb(struct sk_buff *skb)
 }
 
 /* Interrupt Handler for Transmit complete */
-static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
+static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 {
 	struct net_device *dev = tx_queue->dev;
 	struct netdev_queue *txq;
@@ -2575,6 +2574,7 @@ static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 	tx_queue->dirty_tx = bdp;
 
 	netdev_tx_completed_queue(txq, howmany, bytes_sent);
+	return howmany;
 }
 
 static void gfar_schedule_cleanup(struct gfar_priv_grp *gfargrp)
@@ -2856,10 +2856,14 @@ static int gfar_poll(struct napi_struct *napi, int budget)
 		tx_queue = priv->tx_queue[i];
 		/* run Tx cleanup to completion */
 		if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx]) {
-			gfar_clean_tx_ring(tx_queue);
-			has_tx_work = 1;
+			int ret;
+
+			ret = gfar_clean_tx_ring(tx_queue);
+			if (ret)
+				has_tx_work++;
 		}
 	}
+	work_done += has_tx_work;
 
 	for_each_set_bit(i, &gfargrp->rx_bit_map, priv->num_rx_queues) {
 		/* skip queue if not active */
-- 
1.8.5.3
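
The idea in the patch above, counting TX work only when cleanup actually
reclaimed something, can be sketched outside the driver. This is a minimal
user-space model, not the gianfar code: struct toy_tx_queue,
toy_clean_tx_ring() and toy_poll_tx() are hypothetical names invented for
illustration.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a TX queue: 'pending' packets are queued,
 * 'completed' of them have already been finished by the hardware.
 */
struct toy_tx_queue {
	int pending;
	int completed;
};

/* Reclaim only the packets the hardware has completed; report how many. */
static int toy_clean_tx_ring(struct toy_tx_queue *q)
{
	int howmany = q->completed;

	q->pending -= howmany;
	q->completed = 0;
	return howmany;
}

/*
 * Visit every queue with pending packets, but count it as TX work only
 * if cleanup made progress. Returns the number of queues that did.
 */
static int toy_poll_tx(struct toy_tx_queue *queues, size_t n)
{
	int has_tx_work = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (queues[i].pending == 0)
			continue;
		if (toy_clean_tx_ring(&queues[i]))
			has_tx_work++;
	}
	return has_tx_work;
}
```

With the old behaviour (flagging work whenever a packet is pending), a queue
whose DMA has not started yet would keep the poll loop spinning; in this
model such a queue simply contributes nothing to the count.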



Re: [RFC PATCH v2 4/9] crypto: qce: Add ablkcipher algorithms

2014-04-29 Thread Stanimir Varbanov

Thanks for the review!

On 04/28/2014 11:18 AM, Herbert Xu wrote:
 On Mon, Apr 14, 2014 at 03:48:40PM +0300, Stanimir Varbanov wrote:

 +} else if (IS_DES(flags)) {
 +u32 tmp[DES_EXPKEY_WORDS];
 +
 +if (keylen != QCE_DES_KEY_SIZE)
 +goto badkey;
 
 No need to check here since you've already set min_keysize and
 max_keysize correctly.
 
 +} else if (IS_3DES(flags)) {
 +if (keylen != DES3_EDE_KEY_SIZE)
 +goto badkey;
 
 Ditto.

OK, I will delete those needless keylen checks.


-- 
regards,
Stan
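
Herbert's point can be sketched as follows: when an algorithm declares equal
minimum and maximum key sizes and the framework checks the bounds before
calling the driver, a second length check inside the driver's setkey can never
fire. The model below is a sketch under that assumption; struct
toy_cipher_alg, toy_setkey() and toy_des_setkey() are hypothetical names, not
the crypto API.

```c
#include <errno.h>

/* Hypothetical algorithm descriptor carrying the key size bounds. */
struct toy_cipher_alg {
	unsigned int min_keysize;
	unsigned int max_keysize;
	int (*setkey)(const unsigned char *key, unsigned int keylen);
};

/* Driver callback: no length check, the wrapper has already filtered. */
static int toy_des_setkey(const unsigned char *key, unsigned int keylen)
{
	(void)key;
	(void)keylen;
	return 0;
}

/* Framework-side wrapper: rejects out-of-range lengths up front. */
static int toy_setkey(const struct toy_cipher_alg *alg,
		      const unsigned char *key, unsigned int keylen)
{
	if (keylen < alg->min_keysize || keylen > alg->max_keysize)
		return -EINVAL;
	return alg->setkey(key, keylen);
}

/* DES uses an 8-byte key, so both bounds are 8. */
static const struct toy_cipher_alg toy_des = { 8, 8, toy_des_setkey };
```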

[PATCH 0/3] kdb: Infrastructure to display sequence files

This patchset started out as a simple patch to introduce the irqs
command from Android's FIQ debugger to kdb. However it has since grown
more powerful because allowing kdb to reuse existing kernel
infrastructure gives us extra opportunities.

Based on the comments at the top of irqdesc.h (plotting to take the
irq_desc structure private to kernel/irq) and the relative similarity
between FIQ debugger's irqs command and the contents of /proc/interrupts
we start by adding a kdb feature to print seq_files. This forms the 
foundation for a new command, interrupts.

I have also been able to implement a much more generic command,
seq_file, that can display a good number of files from pseudo
filesystems. This command is very powerful although that power does mean
care must be taken to deploy it safely. It is deliberately and by
default aimed at your foot!

Note that the risk associated with the seq_file command is why I
implemented the interrupts command in C (in principle it could have been
a kdb macro). Doing it in C codifies the need for show_interrupts() to
continue using spin locks as its locking strategy.

To give an idea of what can be done with this command, the following
seq_operations structures worked correctly and reported no errors:

cpuinfo_op
extfrag_op
fragmentation_op
gpiolib_seq_ops
int_seq_ops (a.k.a. /proc/interrupts)
pagetypeinfo_op
unusable_op
vmalloc_op
zoneinfo_op

The following display the information correctly but triggered errors
(sleeping function called from invalid context) with lock debugging
enabled:

consoles_op
crypto_seq_ops
diskstats_op
partitions_op
slabinfo_op
vmstat_op

All tests were run on an ARM multi_v7_defconfig kernel (plus lots of
debug features) and halted using magic SysRq so that kdb has interrupt
context. Note also that some of the seq_operations structures hook into
driver-supplied code that will only be called if that driver is enabled,
so the tests above are useful but cannot be exhaustive.

Daniel Thompson (3):
  kdb: Add framework to display sequence files
  proc: Provide access to /proc/interrupts from kdb
  kdb: Implement seq_file command

 fs/proc/interrupts.c| 10 +
 include/linux/kdb.h |  3 +++
 kernel/debug/kdb/kdb_io.c   | 51 +
 kernel/debug/kdb/kdb_main.c | 28 +
 4 files changed, 92 insertions(+)

-- 
1.9.0


[PATCH 2/3] proc: Provide access to /proc/interrupts from kdb

The contents of /proc/interrupts are useful to diagnose problems during
boot up or when the system becomes unresponsive (or at least they can be if
the failure is caused by interrupt problems). This command is also seen in
out-of-tree debug systems such as Android's FIQ debugger.

This change allows the file to be displayed from kdb.

Signed-off-by: Daniel Thompson daniel.thomp...@linaro.org
---
 fs/proc/interrupts.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/proc/interrupts.c b/fs/proc/interrupts.c
index a352d57..1f8eeaf 100644
--- a/fs/proc/interrupts.c
+++ b/fs/proc/interrupts.c
@@ -4,6 +4,7 @@
 #include <linux/irqnr.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
+#include <linux/kdb.h>
 
 /*
  * /proc/interrupts
@@ -45,9 +46,18 @@ static const struct file_operations proc_interrupts_operations = {
 	.release	= seq_release,
 };
 
+#ifdef CONFIG_KGDB_KDB
+static int kdb_interrupts(int argc, const char **argv)
+{
+	return kdb_print_seq_file(&int_seq_ops);
+}
+#endif
+
 static int __init proc_interrupts_init(void)
 {
 	proc_create("interrupts", 0, NULL, &proc_interrupts_operations);
+	kdb_register("interrupts", kdb_interrupts, "",
+		     "Show /proc/interrupts", 3);
 	return 0;
 }
 fs_initcall(proc_interrupts_init);
-- 
1.9.0


[PATCH RT 3/4] rcu: make RCU_BOOST default on RT

3.10.37-rt38-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior bige...@linutronix.de

Since it is no longer invoked from the softirq, people run into OOM more
often if the priority of the RCU thread is too low. Making boosting the
default on RT should help in those cases, and it can be switched off if
someone knows better.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior bige...@linutronix.de
Signed-off-by: Steven Rostedt rost...@goodmis.org
---
 init/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 6c3a4fd..bd3612d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -604,7 +604,7 @@ config TREE_RCU_TRACE
 config RCU_BOOST
 	bool "Enable RCU priority boosting"
 	depends on RT_MUTEXES && PREEMPT_RCU
-	default n
+	default y if PREEMPT_RT_FULL
 	help
 	  This option boosts the priority of preempted RCU readers that
 	  block the current preemptible RCU grace period for too long.
-- 
1.8.5.3



Re: [RFC PATCH 2/2 with seqcount v3] reservation: add support for read-only access using rcu

2014-04-29 Thread Maarten Lankhorst


op 23-04-14 13:15, Maarten Lankhorst schreef:

This adds 4 more functions to deal with rcu.

reservation_object_get_fences_rcu() will obtain the list of shared
and exclusive fences without obtaining the ww_mutex.

reservation_object_wait_timeout_rcu() will wait on all fences of the
reservation_object, without obtaining the ww_mutex.

reservation_object_test_signaled_rcu() will test if all fences of the
reservation_object are signaled without using the ww_mutex.

reservation_object_get_excl() is added because touching the fence_excl
member directly will trigger a sparse warning.

Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
---
Using seqcount and fixing some lockdep bugs.
Changes since v2:
- Fix some crashes, remove some unneeded barriers when provided by seqcount writes
- Fix code to work correctly with sparse's RCU annotations.
- Create a global string for the seqcount lock to make lockdep happy.

Can I get this version reviewed? If it looks correct I'll mail the full series
because it's intertwined with the TTM conversion to use this code.

Ping, can anyone review this?

Re: 64bit x86: NMI nesting still buggy?

2014-04-29 Thread Petr Tesarik

On Tue, 29 Apr 2014 06:29:04 -0700
H. Peter Anvin h...@linux.intel.com wrote:

On 04/29/2014 06:05 AM, Jiri Kosina wrote:

We were not able to come up with any other fix than avoiding using IST
completely on x86_64, and instead going back to stack switching in
software -- the same way 32bit x86 does.

This is not possible, though, because there are several windows during
which if we were to take an exception which doesn't do IST, e.g. NMI, we
are worse than dead -- we are in fact rootable. Right after SYSCALL in
particular.

Ah, right. SYSCALL does not update RSP. :-(
Hm, so anything that can fire up right after a SYSCALL must use IST.
It's possible to use an alternative IDT that gets loaded as the first
thing in an NMI handler, but this gets incredibly ugly...

So basically, I have two questions:

(1) is the above analysis correct? (if not, why?)
(2) if it is correct, is there any other option for fix than avoiding
using IST for exception stack switching, and having kernel do the
legacy task switching (the same way x86_32 is doing)?

It is not an option, see above.

[1]
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

[2] A special case can occur if an SMI handler nests inside an NMI
handler and then another NMI occurs. During NMI interrupt
handling, NMI interrupts are disabled, so normally NMI interrupts
are serviced and completed with an IRET instruction one at a
time. When the processor enters SMM while executing an NMI
handler, the processor saves the SMRAM state save map but does
not save the attribute to keep NMI interrupts disabled.
Potentially, an NMI could be latched (while in SMM or upon exit)
and serviced upon exit of SMM even though the previous NMI
handler has still not completed.

I believe [2] only applies if there is an IRET executing inside the SMM
handler, which should not normally be the case. It might also have been
addressed since that was written, but I don't know.

The trouble here is that the official Intel documentation describes how
to do this and specifically requests the OS to cope with nested NMIs.

Petr T

ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace

2014-04-29 Thread Marian Marinov


Hello,
when using user namespaces I found a bug in the capability checks done by ioctl.

If someone tries to use chattr +i while in a different user namespace it will 
get the following:

ioctl(3, EXT2_IOC_SETFLAGS, 0x7fffa4fedacc) = -1 EPERM (Operation not permitted)

I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE) check with
ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).


If you agree I can send patches for all filesystems.

I'm proposing the following patch:

---
 fs/ext4/ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index d011b69..25683d0 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -265,7 +265,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		 * This test looks nicer. Thanks to Pauline Middelink
 		 */
 		if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
-			if (!capable(CAP_LINUX_IMMUTABLE))
+			if (!ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE))
 				goto flags_out;
 		}

--
1.8.4


--
Marian Marinov
Founder & CEO of 1H Ltd.
Jabber/GTalk: hack...@jabber.org
ICQ: 7556201
Mobile: +359 886 660 270
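
Why the EPERM shows up can be illustrated with a toy model. In the kernel,
capable(cap) checks the capability against the initial user namespace,
whereas ns_capable(ns, cap) checks it against the namespace passed in. The
sketch below models only that difference; the toy_* names are hypothetical,
and the real check also has an owner rule for child namespaces that is left
out here. It says nothing about whether the proposed change is the right
policy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace: only the parent link matters for this model. */
struct toy_user_ns {
	const struct toy_user_ns *parent;
};

/* Toy task credentials: home namespace plus an effective capability mask. */
struct toy_task {
	const struct toy_user_ns *user_ns;
	unsigned int effective;
};

#define TOY_CAP_LINUX_IMMUTABLE (1u << 0)

static const struct toy_user_ns toy_init_user_ns = { NULL };

/*
 * Walk from the target namespace towards the root. The task's effective
 * set only counts if its own namespace is the target or one of the
 * target's ancestors.
 */
static bool toy_ns_capable(const struct toy_task *t,
			   const struct toy_user_ns *ns, unsigned int cap)
{
	for (; ns; ns = ns->parent)
		if (ns == t->user_ns)
			return (t->effective & cap) != 0;
	return false;
}

/* capable() always targets the initial namespace. */
static bool toy_capable(const struct toy_task *t, unsigned int cap)
{
	return toy_ns_capable(t, &toy_init_user_ns, cap);
}
```

A task that holds the capability only inside a child namespace fails
toy_capable() but passes toy_ns_capable() against its own namespace, which
mirrors the behaviour reported above.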

Re: [PATCH v2 00/10] arm64: UEFI support

On 04/29/2014 04:43 AM, Matt Fleming wrote:
 (Pulling in Peter and Stephen)
 
 On Tue, 29 Apr, at 11:28:17AM, Catalin Marinas wrote:

 The patches look fine to me, they've been through several rounds of
 review already. How do we propose these get merged as the series
 contains both generic and arm64 patches? And there are dependencies
 already in linux-next.

 Are the EFI patches in -next pulled from some non-rebaseable branch?
 
 Peter suggested a plan when he took the generic EFI stuff that's in tip
 (and hence currently in linux-next),
 
   It doesn't hurt to inform Stephen, although I think it will simply fall
   out automatically since he uses git to merge and git will recognize the
   graph.
 
   During the merge window, it means they should not push their patches
   until Linus has accepted the precondition patches from the tip tree.
   Since Ingo and I try to push most of the tip tree as early as possible
   in the merge window, this is usually not a problem.
 
 So we currently have the prerequisites in tip/x86/efi, and assuming that
 this 10-patch series gets merged into a single branch somewhere, things
 should work automatically for linux-next.
 
 It may be prudent to negotiate a plan now for when the merge window
 opens because, as Peter mentions above, the stuff in tip/x86/efi needs
 to be merged by Linus first to avoid build breakage with the arm64
 stuff.

Whoever is going to push the arm64 stuff just needs to be aware of this
constraint.  Again, since we tend to push -tip very early in the merge
window, unless there are problems or late additions, this is unlikely to
be a problem in any way.

-hpa



[PATCH RT 4/4] Linux 3.10.37-rt38-rc2

3.10.37-rt38-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Steven Rostedt (Red Hat) rost...@goodmis.org

---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index a3b2408..43245dc 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt37
+-rt38-rc2
-- 
1.8.5.3



[PATCH 1/3] kdb: Add framework to display sequence files

Lots of useful information about the system is held in pseudo filesystems
and presented using the seq_file mechanism. Unfortunately during both boot
up and kernel panic (both good times to break out kdb) it is difficult to
examine these files. This patch introduces a means to display sequence
files via kdb.

Signed-off-by: Daniel Thompson daniel.thomp...@linaro.org
---
 include/linux/kdb.h   |  3 +++
 kernel/debug/kdb/kdb_io.c | 51 +++
 2 files changed, 54 insertions(+)

diff --git a/include/linux/kdb.h b/include/linux/kdb.h
index 290db12..2607893 100644
--- a/include/linux/kdb.h
+++ b/include/linux/kdb.h
@@ -25,6 +25,7 @@ typedef int (*kdb_func_t)(int, const char **);
 #include <linux/init.h>
 #include <linux/sched.h>
 #include <linux/atomic.h>
+#include <linux/seq_file.h>
 
 #define KDB_POLL_FUNC_MAX  5
 extern int kdb_poll_idx;
@@ -117,6 +118,8 @@ extern __printf(1, 0) int vkdb_printf(const char *fmt, va_list args);
 extern __printf(1, 2) int kdb_printf(const char *, ...);
 typedef __printf(1, 2) int (*kdb_printf_t)(const char *, ...);
 
+extern int kdb_print_seq_file(const struct seq_operations *ops);
+
 extern void kdb_init(int level);
 
 /* Access to kdb specific polling devices */
diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index 14ff484..c68c223 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -850,3 +850,54 @@ int kdb_printf(const char *fmt, ...)
return r;
 }
 EXPORT_SYMBOL_GPL(kdb_printf);
+
+/*
+ * Display a seq_file on the kdb console.
+ */
+
+static int __kdb_print_seq_file(struct seq_file *m, void *v)
+{
+	int i, res;
+
+	res = m->op->show(m, v);
+	if (0 != res)
+		return KDB_BADLENGTH;
+
+	for (i = 0; i < m->count && !KDB_FLAG(CMD_INTERRUPT); i++)
+		kdb_printf("%c", m->buf[i]);
+	m->count = 0;
+
+	return 0;
+}
+
+int kdb_print_seq_file(const struct seq_operations *ops)
+{
+	static char seq_buf[4096];
+	static DEFINE_SPINLOCK(seq_buf_lock);
+	unsigned long flags;
+	struct seq_file m = {
+		.buf = seq_buf,
+		.size = sizeof(seq_buf),
+		/* .lock is deliberately uninitialized to help reveal
+		 * unsupportable show methods
+		 */
+		.op = ops,
+	};
+	loff_t pos = 0;
+	void *v;
+	int res = 0;
+
+	v = ops->start(&m, &pos);
+	while (v) {
+		spin_lock_irqsave(&seq_buf_lock, flags);
+		res = __kdb_print_seq_file(&m, v);
+		spin_unlock_irqrestore(&seq_buf_lock, flags);
+		if (res != 0 || KDB_FLAG(CMD_INTERRUPT))
+			break;
+
+		v = ops->next(&m, v, &pos);
+	}
+	ops->stop(&m, v);
+
+	return res;
+}
-- 
1.9.0
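
The start/show/next/stop walk that this patch performs can be tried in user
space with a mock iterator. The sketch below is only a model of that
protocol: struct toy_seq, struct toy_seq_ops and the three-entry table are
hypothetical stand-ins, output goes to a caller-supplied buffer instead of
kdb_printf(), and there is no locking.

```c
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for struct seq_file and struct seq_operations. */
struct toy_seq {
	char buf[64];
	size_t count;
};

struct toy_seq_ops {
	void *(*start)(struct toy_seq *m, long *pos);
	void *(*next)(struct toy_seq *m, void *v, long *pos);
	void (*stop)(struct toy_seq *m, void *v);
	int (*show)(struct toy_seq *m, void *v);
};

static const char *toy_items[] = { "timer", "uart", "eth0" };
#define TOY_NITEMS ((long)(sizeof(toy_items) / sizeof(toy_items[0])))

/* Return a cursor for record *pos, or NULL past the end. */
static void *toy_start(struct toy_seq *m, long *pos)
{
	(void)m;
	if (*pos < 0 || *pos >= TOY_NITEMS)
		return NULL;
	return (void *)&toy_items[*pos];
}

static void *toy_next(struct toy_seq *m, void *v, long *pos)
{
	(void)v;
	++*pos;
	return toy_start(m, pos);
}

static void toy_stop(struct toy_seq *m, void *v)
{
	(void)m;
	(void)v;
}

/* Format one record into the seq buffer. */
static int toy_show(struct toy_seq *m, void *v)
{
	const char *const *item = v;
	int n = snprintf(m->buf, sizeof(m->buf), "%s\n", *item);

	if (n < 0 || (size_t)n >= sizeof(m->buf))
		return -1;
	m->count = (size_t)n;
	return 0;
}

static const struct toy_seq_ops toy_ops = {
	toy_start, toy_next, toy_stop, toy_show
};

/* Walk the iterator and append each record to 'out' (outsz must be > 0). */
static int toy_print_seq(const struct toy_seq_ops *ops, char *out, size_t outsz)
{
	struct toy_seq m = { .count = 0 };
	size_t used = 0;
	long pos = 0;
	void *v;
	int res = 0;

	out[0] = '\0';
	v = ops->start(&m, &pos);
	while (v) {
		res = ops->show(&m, v);
		if (res != 0)
			break;
		if (used + m.count >= outsz) {
			res = -1;	/* output buffer too small */
			break;
		}
		memcpy(out + used, m.buf, m.count);
		used += m.count;
		out[used] = '\0';
		m.count = 0;
		v = ops->next(&m, v, &pos);
	}
	ops->stop(&m, v);
	return res;
}
```

As in the patch, stop() is called exactly once whether the walk ends
normally or breaks out early on an error.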


Re: [PATCH v2 00/10] arm64: UEFI support

On 04/29/2014 06:47 AM, Catalin Marinas wrote:
 
 Waiting for the tip/x86/efi to be merged first is not a problem. We
 also need a stable base for testing the arm64 UEFI series, so I assume
 this series can be based onto tip/x86/efi (would such branch be rebased
 before hitting mainline?).
 
 Given that Leif's series contains both generic efi and arm64 patches,
 what's your preference for merging them? I'm happy to add my ack and
 they go via your tree (or the other way around).
 

tip:x86/efi will not be rebased (barring major unforeseen events).

I'm not opposed to pushing the arm64 patches through -tip (via Matt), if
it works with your workflow, either.  Perhaps we need to rename the
branch to tip:core/efi...

-hpa


Re: [RFC PATCH v2 1/9] crypto: qce: Add core driver implementation

2014-04-29 Thread Stanimir Varbanov

Thanks for the review!

On 04/28/2014 11:50 AM, Herbert Xu wrote:
 On Mon, Apr 14, 2014 at 03:48:37PM +0300, Stanimir Varbanov wrote:

 +if (backlog)
 +backlog->complete(backlog, -EINPROGRESS);
 
 The completion function needs to be called with BH disabled.
 
 Cheers,
 

This is new for me because I saw similar code in cryptd.c where in
cryptd_queue_worker() (workqueue context) the backlog->complete() is
called outside of local_bh_disable().

-- 
regards,
Stan

Re: [PATCH v2 00/12] scsi/NCR5380: fix debugging macros and #include structure

2014-04-29 Thread James Bottomley

On Tue, 2014-04-29 at 15:15 +1200, Michael Schmitz wrote:
 Finn,
 
 On Tue, Apr 29, 2014 at 2:22 PM, Finn Thain fth...@telegraphics.com.au 
 wrote:
 
  On Sat, 26 Apr 2014, James Bottomley wrote:
 
  OK, so this is a pretty big change to an unmaintained driver.  I'll take
  it if you're willing to maintain the driver afterwards ... in which case
  I need another patch to add you to the MAINTAINERS file.
 
  Sure, I'm happy to support these patches and future work I plan to do on
  the driver.
 
  What additional responsibilities would come with adding my name to the
  MAINTAINERS file?
 
  Perhaps Michael and Sam would be interested in sharing the role, for atari
  and sun3 NCR5380 drivers (?)
 
 If you insist ...
 
 (kidding - I'm OK with it if James thinks it's worth it)

As long as you understand how it works and how to fix it, the more the
merrier.  It gives me more people to yell at if something goes wrong
with the driver.

James



Re: [PATCH v3 4/7] of: configure the platform device dma parameters

2014-04-29 Thread Grant Likely

On Thu, 24 Apr 2014 11:30:04 -0400, Santosh Shilimkar 
santosh.shilim...@ti.com wrote:
 Retrieve DMA configuration from DT and setup platform device's DMA
 parameters. The DMA configuration in DT has to be specified using
 dma-ranges and dma-coherent properties if supported.
 
 We setup dma_pfn_offset using dma-ranges and dma_coherent_ops
 using dma-coherent device tree properties.
 
 The set_arch_dma_coherent_ops macro has to be defined by arch if
 it supports coherent dma_ops. Otherwise, set_arch_dma_coherent_ops() is
 declared as nop.
 
 Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
 Cc: Russell King li...@arm.linux.org.uk
 Cc: Arnd Bergmann a...@arndb.de
 Cc: Olof Johansson o...@lixom.net
 Cc: Grant Likely grant.lik...@linaro.org
 Cc: Rob Herring robh...@kernel.org
 Cc: Catalin Marinas catalin.mari...@arm.com
 Cc: Linus Walleij linus.wall...@linaro.org
 Signed-off-by: Grygorii Strashko grygorii.stras...@ti.com
 Signed-off-by: Santosh Shilimkar santosh.shilim...@ti.com
 ---
  drivers/of/platform.c   |   48 
 ---
  include/linux/dma-mapping.h |7 +++
  2 files changed, 52 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/of/platform.c b/drivers/of/platform.c
 index 48de98f..270c0b9 100644
 --- a/drivers/of/platform.c
 +++ b/drivers/of/platform.c
 @@ -187,6 +187,50 @@ struct platform_device *of_device_alloc(struct 
 device_node *np,
  EXPORT_SYMBOL(of_device_alloc);
  
  /**
 + * of_dma_configure - Setup DMA configuration
 + * @dev: Device to apply DMA configuration
 + *
 + * Try to get devices's DMA configuration from DT and update it
 + * accordingly.
 + *
 + * In case if platform code need to use own special DMA configuration,it
 + * can use Platform bus notifier and handle BUS_NOTIFY_ADD_DEVICE event
 + * to fix up DMA configuration.
 + */
 +static void of_dma_configure(struct device *dev)
 +{
 + u64 dma_addr, paddr, size;
 + int ret;
 +
 +	dev->coherent_dma_mask = DMA_BIT_MASK(32);
 +	if (!dev->dma_mask)
 +		dev->dma_mask = &dev->coherent_dma_mask;
 +
 +	/*
 +	 * if dma-coherent property exist, call arch hook to setup
 +	 * dma coherent operations.
 +	 */
 +	if (of_dma_is_coherent(dev->of_node)) {
 +		set_arch_dma_coherent_ops(dev);
 +		dev_dbg(dev, "device is dma coherent\n");
 +	}
 +
 +	/*
 +	 * if dma-ranges property doesn't exist - just return else
 +	 * setup the dma offset
 +	 */
 +	ret = of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size);
 +	if ((ret == -ENODEV) || (ret < 0)) {
 +		dev_dbg(dev, "no dma range information to setup\n");
 +		return;
 +	}
 +
 +	/* DMA ranges found. Calculate and set dma_pfn_offset */
 +	dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
 +	dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);

I've got two concerns here. of_dma_get_range() retrieves only the first
tuple from the dma-ranges property, but it is perfectly valid for
dma-ranges to contain multiple tuples. How should we handle it if a
device has multiple ranges it can DMA from?

Second, while the pfn offset is being determined, I don't see anything
making use of either the base address or size. How is the device
constrained to only getting DMA buffers from within that range? Is the
driver expected to manage that directly?

g.

 +}
 +
 +/**
   * of_platform_device_create_pdata - Alloc, initialize and register an 
 of_device
   * @np: pointer to node to create device for
   * @bus_id: name to assign device
 @@ -214,9 +258,7 @@ static struct platform_device 
 *of_platform_device_create_pdata(
  #if defined(CONFIG_MICROBLAZE)
   dev->archdata.dma_mask = 0xffffffffUL;
  #endif
 - dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
 - if (!dev->dev.dma_mask)
 - dev->dev.dma_mask = &dev->dev.coherent_dma_mask;
 + of_dma_configure(&dev->dev);
   dev->dev.bus = &platform_bus_type;
   dev->dev.platform_data = platform_data;
  
 diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
 index fd4aee2..c7d9b1b 100644
 --- a/include/linux/dma-mapping.h
 +++ b/include/linux/dma-mapping.h
 @@ -123,6 +123,13 @@ static inline int dma_coerce_mask_and_coherent(struct 
 device *dev, u64 mask)
  
  extern u64 dma_get_required_mask(struct device *dev);
  
 +#ifndef set_arch_dma_coherent_ops
 +static inline int set_arch_dma_coherent_ops(struct device *dev)
 +{
 + return 0;
 +}
 +#endif
 +
  static inline unsigned int dma_get_max_seg_size(struct device *dev)
  {
   return dev-dma_parms ? dev-dma_parms-max_segment_size : 65536;
 -- 
 1.7.9.5
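
The dma_pfn_offset arithmetic in the quoted patch can be worked through with
concrete numbers. This is a sketch under stated assumptions: 4 KiB pages, a
single dma-ranges tuple, and a simple subtraction for the CPU-to-bus
translation. TOY_PAGE_SHIFT, toy_dma_pfn_offset() and toy_phys_to_dma() are
hypothetical names, and the example addresses are made up.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define TOY_PFN_DOWN(x)	((x) >> TOY_PAGE_SHIFT)

/* Offset, in page frames, between the CPU physical and bus views. */
static uint64_t toy_dma_pfn_offset(uint64_t paddr, uint64_t dma_addr)
{
	return TOY_PFN_DOWN(paddr - dma_addr);
}

/* Translate a CPU physical address to a bus address using the offset. */
static uint64_t toy_phys_to_dma(uint64_t paddr, uint64_t pfn_offset)
{
	return paddr - (pfn_offset << TOY_PAGE_SHIFT);
}
```

For RAM at CPU address 0x800000000 that the device sees at 0x80000000, the
offset is 0x780000 page frames. Note that nothing in this calculation uses
the size of the range, which is the second concern raised above.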
 


Re: [PATCH v4 2/7] arm64: Decouple page size from level of translation tables

Jungseok,

On Tue, Apr 29, 2014 at 05:59:20AM +0100, Jungseok Lee wrote:
 +choice
 + prompt "Level of translation tables"
 + default ARM64_3_LEVELS if ARM64_4K_PAGES
 + default ARM64_2_LEVELS if ARM64_64K_PAGES
 + help
 +   Allows level of translation tables.
 +
 +config ARM64_2_LEVELS
 + bool "2 level"
 + depends on ARM64_64K_PAGES
 + help
 +   This feature enables 2 levels of translation tables.
 +
 +config ARM64_3_LEVELS
 + bool "3 level"
 + depends on ARM64_4K_PAGES
 + help
 +   This feature enables 3 levels of translation tables.
 +
 +endchoice

As I mentioned previously
(http://www.spinics.net/linux/lists/arm-kernel/msg319552.html), just
expose options for the VA space bits rather than the number of levels.
You can still keep the number of levels config options but not visible
in menuconfig (though I think you could also hide them in some header
and avoid config altogether). The VA bits config options can be:

VA_BITS_39 if 4K (3 levels)
VA_BITS_42 if 64K (2 levels)
VA_BITS_47 if 16K (3 levels)
VA_BITS_48 if 4K || 16K || 64K (4/4/3 levels depending on page size)

That's more meaningful to people configuring the kernel.

-- 
Catalin
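
The VA_BITS values listed above follow from the page size and the number of
levels: with 8-byte table entries, each level resolves (page_shift - 3) bits,
and the page offset contributes page_shift bits. The helper below is a sketch
of that arithmetic only; toy_va_bits() is a hypothetical name, and the cap at
48 bits reflects the largest value in the list above.

```c
/*
 * Virtual address bits covered by 'levels' translation levels with pages
 * of size (1 << page_shift). Each table holds 2^(page_shift - 3) entries
 * of 8 bytes. The result is capped at 48 bits, in which case the top-level
 * table is only partially used.
 */
static int toy_va_bits(int page_shift, int levels)
{
	int bits = page_shift + levels * (page_shift - 3);

	return bits > 48 ? 48 : bits;
}
```

For example, 4K pages with 3 levels give 12 + 3 * 9 = 39 bits, and 64K pages
with 2 levels give 16 + 2 * 13 = 42 bits.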

RE: [PATCH 5/5] powercap/rapl: change floor frequency for vallewview

2014-04-29 Thread R, Durgadoss

 -Original Message-
 From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
 Sent: Tuesday, April 29, 2014 6:33 PM
 To: R, Durgadoss
 Cc: Linux PM; Wysocki, Rafael J; LKML; David E. Box; Alan Cox; Accardi, 
 Kristen C
 Subject: Re: [PATCH 5/5] powercap/rapl: change floor frequency for vallewview

 On Tue, 29 Apr 2014 02:45:22 +
 R, Durgadoss durgados...@intel.com wrote:

  Hi Jacob,

   -Original Message-
   From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
   Sent: Monday, April 28, 2014 7:35 PM
   To: Linux PM; Wysocki, Rafael J; LKML
   Cc: David E. Box; Alan Cox; R, Durgadoss; Accardi, Kristen C; Jacob
   Pan Subject: [PATCH 5/5] powercap/rapl: change floor frequency for
   vallewview

   RAPL power limit reduce power by limiting CPU P-state and
   other techniques. On Valleyview, RAPL power limit cannot
   go to LFM (low frequency mode) if we don't set the floor
   frequency via IOSF mailbox.

   This patch enables setting of floor frequency such that
   RAPL power limit is more effective.

   Signed-off-by: Jacob Pan jacob.jun@linux.intel.com
   ---
drivers/powercap/intel_rapl.c | 27 +++
1 file changed, 19 insertions(+), 8 deletions(-)

   diff --git a/drivers/powercap/intel_rapl.c
   b/drivers/powercap/intel_rapl.c index b1cda6f..13e4776 100644
   --- a/drivers/powercap/intel_rapl.c
   +++ b/drivers/powercap/intel_rapl.c
   @@ -32,6 +32,7 @@

#include asm/processor.h
#include asm/cpu_device_id.h
   +#include asm/iosf_mbi.h

/* bitmasks for RAPL MSRs, used by primitive access functions */
#define ENERGY_STATUS_MASK  0x
   @@ -336,11 +337,17 @@ static int find_nr_power_limit(struct
   rapl_domain *rd) return i;
}

   +#define VLV_CPU_POWER_BUDGET_CTL (0x2)
   +static const struct x86_cpu_id valleyview_id[] = {
   + { X86_VENDOR_INTEL, 6, 0x37},
   + {}
   +};

  There are other platforms that have this FloorFreq register as well.
  And those addresses are not '0x02'. So, we need to have a cpu_id based
  table to define the address of the floor freq register as well.
  [This is not specific to valleyview.]

 Sounds like I need to add an abstraction to capture this. So far, there
 are only two exceptions so I was hesitant to do so. Thanks for the
 input.

Yes, We at least have few platforms that need this.

  Also, is there a plan to expose this floor freq ratio through Sysfs
  for runtime configuration. ? May be through a standard thermal
  cooling device interface ?

 why would that be necessary? who will use it? floor freq only affects
 RAPL, AFAIK. In Linux there is no guaranteed freq anyway. My original
 patch to enable RAPL as cooling device was abandoned in favor of
 powercap framework, I am not sure if we should go back.

There are user space thermal controls which change RAPL Power limits
according to platform's thermal condition as you might be aware.

The floor frequency is not used only to transition to LFM ratio. We can
transition to any frequency ratio by adjusting this floor frequency
(at least on VLV and couple more platforms)

Hence while changing RAPL Power Limits, there is a need to adjust
this also, to specify which ratio is our Floor (basically we will not
go below that). That's why we need an interface for modifying this
at run time (along with Power Limits).

Thanks,
Durga

   +
static int set_domain_enable(struct powercap_zone *power_zone,
   bool mode) {
 struct rapl_domain *rd =
   power_zone_to_rapl_domain(power_zone); int nr_powerlimit;
   -
   + u32 mdata = 0;
 if (rd->state & DOMAIN_STATE_BIOS_LOCKED)
 return -EACCES;
 get_online_cpus();
   @@ -350,7 +357,16 @@ static int set_domain_enable(struct
   powercap_zone *power_zone, bool mode)
 /* always enable clamp such that p-state can go below OS
   requested
  * range. power capping priority over guranteed frequency.
  */
   - rapl_write_data_raw(rd, PL1_CLAMP, mode);
   + if (x86_match_cpu(valleyview_id)) {
   + iosf_mbi_read(BT_MBI_UNIT_PMC, BT_MBI_PMC_READ,
   + VLV_CPU_POWER_BUDGET_CTL, &mdata);
   + mdata &= ~(0x7f << 8);
   + mdata |= 1 << 8;
   + iosf_mbi_write(BT_MBI_UNIT_PMC, BT_MBI_PMC_WRITE,
   + VLV_CPU_POWER_BUDGET_CTL, mdata);
   + } else
   + rapl_write_data_raw(rd, PL1_CLAMP, mode);
   +
 /* some domains have pl2 */
 if (nr_powerlimit > 1) {
 rapl_write_data_raw(rd, PL2_ENABLE, mode);
   @@ -833,11 +849,6 @@ static int rapl_write_data_raw(struct
   rapl_domain *rd, return 0;
}

   -static const struct x86_cpu_id energy_unit_quirk_ids[] = {
   - { X86_VENDOR_INTEL, 6, 0x37},/* Valleyview */
   - {}
   -};

  Same thing here. There are other Atom platforms that need this
  conversion quirk. So, please keep the table as is.

  Thanks,
  Durga

   -
static int rapl_check_unit(struct rapl_package *rp, int cpu)
{
 u64 msr_val;
   @@ -859,7 +870,7 @@ static int

Re: [PATCH v4 3/7] arm64: Introduce a kernel configuration option for VA_BITS

On Tue, Apr 29, 2014 at 05:59:23AM +0100, Jungseok Lee wrote:
 +config ARM64_VA_BITS
 + int "Virtual address space size"
 + range 39 39 if ARM64_4K_PAGES && ARM64_3_LEVELS
 + range 42 42 if ARM64_64K_PAGES && ARM64_2_LEVELS
 + help
 +   This feature is determined by a combination of page size and
 +   level of translation tables.

OK, so you are doing the VA bits selection already. But see my other
email about exposing only this option and hiding the number of levels
(though the number of levels can be mentioned in the help).

-- 
Catalin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] nohz: Fix iowait overcounting if iowait task migrates

2014-04-29 Thread Frederic Weisbecker

On Thu, Apr 24, 2014 at 08:45:58PM +0200, Denys Vlasenko wrote:
 Before this change, if last IO-blocked task wakes up
 on a different CPU, the original CPU may stay idle for much longer,
 and the entire time it stays idle is accounted as iowait time.
 
 This change adds struct tick_sched::iowait_exittime member.
 On entry to idle, it is set to KTIME_MAX.
 Last IO-blocked task, if migrated, sets it to current time.
 Note that this can happen only once per each idle period:
 new iowaiting tasks can't magically appear on idle CPU's rq.
 
 If iowait_exittime is set, then (iowait_exittime - idle_entrytime)
 gets accounted as iowait, and the remaining (now - iowait_exittime)
 as true idle.
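
As a rough illustration of the split described above, here is a
user-space sketch (not the kernel code). It models only an idle period
that began with iowaiters on the runqueue; plain 64-bit nanosecond values
stand in for ktime_t, EXIT_UNSET stands in for KTIME_MAX, and the struct
and function names are invented for the example.

```c
#include <stdint.h>

#define EXIT_UNSET INT64_MAX	/* stands in for KTIME_MAX */

struct idle_split {
	int64_t iowait_ns;	/* portion accounted as iowait */
	int64_t idle_ns;	/* portion accounted as true idle */
};

/* Split one idle period [entry, now) at the recorded iowait exit time. */
static struct idle_split split_idle_period(int64_t entry, int64_t iowait_exit,
					   int64_t now)
{
	struct idle_split s = { 0, 0 };

	if (iowait_exit == EXIT_UNSET) {
		/* Last iowaiter never migrated: the whole period is iowait. */
		s.iowait_ns = now - entry;
	} else {
		s.iowait_ns = iowait_exit - entry;
		s.idle_ns = now - iowait_exit;
	}
	return s;
}
```

With entry = 100, iowait_exit = 250 and now = 500, the sketch accounts
150 ns as iowait and 250 ns as idle.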
 
 Run-tested: /proc/stat counters no longer go backwards.
 
 Signed-off-by: Denys Vlasenko dvlas...@redhat.com
 Cc: Frederic Weisbecker fweis...@gmail.com
 Cc: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
 Cc: Fernando Luis Vazquez Cao fernando...@lab.ntt.co.jp
 Cc: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Ingo Molnar mi...@kernel.org
 Cc: Peter Zijlstra pet...@infradead.org
 Cc: Andrew Morton a...@linux-foundation.org
 Cc: Arjan van de Ven ar...@linux.intel.com
 Cc: Oleg Nesterov o...@redhat.com
 ---
  include/linux/tick.h |  2 ++
  kernel/sched/core.c  | 14 +++
  kernel/time/tick-sched.c | 64 
 
  3 files changed, 70 insertions(+), 10 deletions(-)
 
 diff --git a/include/linux/tick.h b/include/linux/tick.h
 index 4de1f9e..1bf653e 100644
 --- a/include/linux/tick.h
 +++ b/include/linux/tick.h
 @@ -67,6 +67,7 @@ struct tick_sched {
   ktime_t idle_exittime;
   ktime_t idle_sleeptime;
   ktime_t iowait_sleeptime;
 + ktime_t iowait_exittime;
   seqcount_t  idle_sleeptime_seq;
   ktime_t sleep_length;
   unsigned long   last_jiffies;
 @@ -140,6 +141,7 @@ extern void tick_nohz_irq_exit(void);
  extern ktime_t tick_nohz_get_sleep_length(void);
  extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
  extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 +extern void tick_nohz_iowait_to_idle(int cpu);
  
  # else /* !CONFIG_NO_HZ_COMMON */
  static inline int tick_nohz_tick_stopped(void)
 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index 268a45e..ffea757 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -4218,7 +4218,14 @@ void __sched io_schedule(void)
   current->in_iowait = 1;
   schedule();
   current->in_iowait = 0;
 +#ifdef CONFIG_NO_HZ_COMMON
 + if (atomic_dec_and_test(&rq->nr_iowait)) {
 + if (raw_smp_processor_id() != cpu_of(rq))
 + tick_nohz_iowait_to_idle(cpu_of(rq));
 + }
 +#else
   atomic_dec(&rq->nr_iowait);
 +#endif
   delayacct_blkio_end();
  }
  EXPORT_SYMBOL(io_schedule);
 @@ -4234,7 +4241,14 @@ long __sched io_schedule_timeout(long timeout)
   current->in_iowait = 1;
   ret = schedule_timeout(timeout);
   current->in_iowait = 0;
 +#ifdef CONFIG_NO_HZ_COMMON
 + if (atomic_dec_and_test(&rq->nr_iowait)) {
 + if (raw_smp_processor_id() != cpu_of(rq))
 + tick_nohz_iowait_to_idle(cpu_of(rq));
 + }
 +#else
   atomic_dec(&rq->nr_iowait);
 +#endif
   delayacct_blkio_end();
   return ret;
  }
 diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
 index 47ed7cf..d78c942 100644
 --- a/kernel/time/tick-sched.c
 +++ b/kernel/time/tick-sched.c
 @@ -408,15 +408,27 @@ static void tick_nohz_update_jiffies(ktime_t now)
  
  static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
  {
 - ktime_t delta;
 + ktime_t delta, entry, end;
  
   /* Updates the per cpu time idle statistics counters */
   write_seqcount_begin(&ts->idle_sleeptime_seq);
 - delta = ktime_sub(now, ts->idle_entrytime);
 - if (ts->idle_active == 2)
 + entry = ts->idle_entrytime;
 + delta = ktime_sub(now, entry);
 + if (ts->idle_active == 2) {
 + end = ts->iowait_exittime;
 + if (end.tv64 != KTIME_MAX) {
 + /*
 +  * Last iowaiting task on our rq was woken up on other CPU
 +  * sometime in the past, it updated ts->iowait_exittime.
 +  */
 + delta = ktime_sub(now, end);
 + ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
 + delta = ktime_sub(end, entry);
 + }
   ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
 - else
 + } else {
   ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
 + }
   ts->idle_active = 0;
   write_seqcount_end(&ts->idle_sleeptime_seq);
  
 @@ -430,6 +442,7 @@ static ktime_t

Re: [PATCH v2 00/10] arm64: UEFI support

2014-04-29 Thread Matt Fleming

On Tue, 29 Apr, at 02:47:28PM, Catalin Marinas wrote:
 
 Waiting for the tip/x86/efi to be merged first is not a problem. We
 also need a stable base for testing the arm64 UEFI series, so I assume
 this series can be based onto tip/x86/efi (would such branch be rebased
 before hitting mainline?).
 
tip/x86/efi is unlikely to be rebased. Certainly with dependencies like
this there would have to be a really good reason to rebase it.

 Given that Leif's series contains both generic efi and arm64 patches,
 what's your preference for merging them? I'm happy to add my ack and
 they go via your tree (or the other way around).

I'm happy either way, though if I take them through my tree (and
subsequently through tip) you won't have to worry about the merge window
rigmarole, which is a plus.

So, everyone happy for me to take these with Catalin's Acked-by?

-- 
Matt Fleming, Intel Open Source Technology Center

Re: [PATCH] staging: line6: fix possible overrun

At Mon, 28 Apr 2014 01:44:25 +0300,
Dan Carpenter wrote:
 
 On Sun, Apr 27, 2014 at 10:00:43PM +0200, Mateusz Guzik wrote:
 and a WARN_ON + -EINVAL in line6_init_audio to catch future
 offenders.
   
   Returning -EINVAL is a bad idea because it would break the driver
   completely and make it unusable.
   
  
  Well I would vote for returning the error anyway.
 
 I'm trying to be polite, but you are talking about adding regressions
 deliberately...
 
 It's very rare for people to deliberately add regressions to the kernel.
 I have only seen it one time before.

I don't think Dan would be against returning -EINVAL once all the
offending callers have been fixed first (e.g. by truncating strings to
fit the fixed arrays).  Then it'd be a good help to catch any
future bugs.  But adding -EINVAL without fixing the caller side means
essentially that you're introducing the breakage intentionally,
even though you know it certainly breaks things, which is obviously bad.
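
To make the "truncate on the caller side first" idea concrete, here is a
small user-space sketch. It is not the line6 driver code; copy_name() and
its return convention are hypothetical, and snprintf() is used only
because it never writes past the buffer and always NUL-terminates when
the size is nonzero.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy src into a fixed-size buffer, truncating instead of overrunning.
 * Returns 1 if the name had to be truncated, 0 if it fit.
 */
static int copy_name(char *dst, size_t dst_size, const char *src)
{
	int n = snprintf(dst, dst_size, "%s", src);

	return n < 0 || (size_t)n >= dst_size;
}
```

Once every caller goes through a bounded copy like this, a later check
that warns and returns an error can only fire on new bugs, not on
devices that work today.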


Takashi

Re: [PATCH v4] drivercore: deferral race condition fix

2014-04-29 Thread Grant Likely

On Tue, Apr 29, 2014 at 2:13 PM, Greg Kroah-Hartman
gre...@linuxfoundation.org wrote:
 On Tue, Apr 29, 2014 at 01:35:09PM +0100, Grant Likely wrote:
 When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
 where all modules are loaded but some drivers are still stuck in the deferred
 list, and an external event is needed to kick the deferred queue to probe
 these drivers.
[...]
 Hi Greg,

 This change needs to go into 3.15. I've got this patch in the
 devicetree/merge branch of my tree and can ask Linus to pull it directly
 if you would like.

 Sure, that would be fine:

 Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org

Thanks Greg

I'll give it a few days in linux-next and then ask Linus to pull.

g.

[PATCH] ACPI / EC: Process rather than discard events in acpi_ec_clear

2014-04-29 Thread Kieran Clancy

Address a regression caused by commit ad332c8a4533:
(ACPI / EC: Clear stale EC events on Samsung systems)

After the earlier patch, there was found to be a race condition on some
earlier Samsung systems (N150/N210/N220). The function acpi_ec_clear was
sometimes discarding a new EC event before its GPE was triggered by the
system. In the case of these systems, this meant that the lid open
event was not registered on resume if that was the cause of the wake,
leading to problems when attempting to close the lid to suspend again.

After testing on a number of Samsung systems, both those affected by the
previous EC bug and those affected by the race condition, it seemed that
the best course of action was to process rather than discard the events.
On Samsung systems which accumulate stale EC events, there do not seem
to be any adverse side effects of running the associated _Q methods.

This patch adds an argument to the static function acpi_ec_sync_query so
that it may be used within the acpi_ec_clear loop in place of
acpi_ec_query_unlocked which was used previously.
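
The shape of that change, a query helper with an optional out-parameter
that a bounded drain loop can reuse, can be sketched in user-space C as
follows. Everything here is a stand-in: struct fake_ec, sync_query(),
drain_events() and the CLEAR_MAX bound are hypothetical and only mimic
the control flow, not the real EC transactions.

```c
#include <stddef.h>
#include <stdint.h>

#define CLEAR_MAX 100	/* upper bound on drained events (assumed value) */

struct fake_ec {
	const uint8_t *pending;	/* queued event numbers; 0 means "none" */
	size_t npending;
	size_t next;
	unsigned int handled;	/* how many events were processed */
};

/* Fetch one event and process it; report it via data if the caller asks. */
static int sync_query(struct fake_ec *ec, uint8_t *data)
{
	uint8_t value = 0;

	if (ec->next < ec->npending)
		value = ec->pending[ec->next++];
	if (data)
		*data = value;
	if (value)
		ec->handled++;	/* stands in for running the _Q handler */
	return 0;
}

/* Drain the queue, processing each event instead of dropping it. */
static unsigned int drain_events(struct fake_ec *ec)
{
	unsigned int i;
	uint8_t value = 0;

	for (i = 0; i < CLEAR_MAX; i++) {
		if (sync_query(ec, &value) || !value)
			break;
	}
	return i;
}
```

Callers that do not care which event was seen pass NULL, which is the
same convention the patch uses for the existing call sites.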

With thanks to Stefan Biereigel for reporting the issue, and for all the
people who helped test the new patch on affected systems.

References: https://lkml.kernel.org/r/532fe3b2.9060...@biereigel-wb.de
References: https://bugzilla.kernel.org/show_bug.cgi?id=44161#c173
Reported-by: Stefan Biereigel ste...@biereigel.de
Signed-off-by: Kieran Clancy clancy.kie...@gmail.com
Tested-by: Stefan Biereigel ste...@biereigel.de
Tested-by: Dennis Jansen dennis.jan...@web.de
Tested-by: Nicolas Porcel nicolasporce...@gmail.com
Tested-by: Maurizio D'Addona mauritiusd...@gmail.com
Tested-by: Juan Manuel Cabo juanmanuel.c...@gmail.com
Tested-by: Giannis Koutsou giannis.kout...@gmail.com
Tested-by: Kieran Clancy clancy.kie...@gmail.com
Cc: Lan Tianyu tianyu@intel.com
---

To maintainers: Assuming this patch is accepted, please mark this for
inclusion in all -stable trees. It should be noted that the previous
patch (ad332c8a4533) was excluded from a number of stable trees after
the regression was found, but should now be included again along with
this patch. I am not sure of the correct way to annotate this above.

 drivers/acpi/ec.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index d7d32c2..ad11ba4 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -206,13 +206,13 @@ unlock:
 	spin_unlock_irqrestore(&ec->lock, flags);
 }
 
-static int acpi_ec_sync_query(struct acpi_ec *ec);
+static int acpi_ec_sync_query(struct acpi_ec *ec, u8 *data);
 
 static int ec_check_sci_sync(struct acpi_ec *ec, u8 state)
 {
 	if (state & ACPI_EC_FLAG_SCI) {
 		if (!test_and_set_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
-			return acpi_ec_sync_query(ec);
+			return acpi_ec_sync_query(ec, NULL);
 	}
 	return 0;
 }
@@ -443,10 +443,8 @@ acpi_handle ec_get_handle(void)
 
 EXPORT_SYMBOL(ec_get_handle);
 
-static int acpi_ec_query_unlocked(struct acpi_ec *ec, u8 *data);
-
 /*
- * Clears stale _Q events that might have accumulated in the EC.
+ * Process _Q events that might have accumulated in the EC.
  * Run with locked ec mutex.
  */
 static void acpi_ec_clear(struct acpi_ec *ec)
@@ -455,7 +453,7 @@ static void acpi_ec_clear(struct acpi_ec *ec)
 	u8 value = 0;
 
 	for (i = 0; i < ACPI_EC_CLEAR_MAX; i++) {
-		status = acpi_ec_query_unlocked(ec, &value);
+		status = acpi_ec_sync_query(ec, &value);
 		if (status || !value)
 			break;
 	}
@@ -582,13 +580,18 @@ static void acpi_ec_run(void *cxt)
 	kfree(handler);
 }
 
-static int acpi_ec_sync_query(struct acpi_ec *ec)
+static int acpi_ec_sync_query(struct acpi_ec *ec, u8 *data)
 {
 	u8 value = 0;
 	int status;
 	struct acpi_ec_query_handler *handler, *copy;
-	if ((status = acpi_ec_query_unlocked(ec, &value)))
+
+	status = acpi_ec_query_unlocked(ec, &value);
+	if (data)
+		*data = value;
+	if (status)
 		return status;
+
 	list_for_each_entry(handler, &ec->list, node) {
 		if (value == handler->query_bit) {
 			/* have custom handler for this bit */
@@ -612,7 +615,7 @@ static void acpi_ec_gpe_query(void *ec_cxt)
 	if (!ec)
 		return;
 	mutex_lock(&ec->mutex);
-	acpi_ec_sync_query(ec);
+	acpi_ec_sync_query(ec, NULL);
 	mutex_unlock(&ec->mutex);
 }
 
-- 
1.9.0


Re: [RESEND PATCH V5 0/8] remove cpu_load idx

2014-04-29 Thread Morten Rasmussen

On Wed, Apr 16, 2014 at 03:43:21AM +0100, Alex Shi wrote:
 In the cpu_load decay usage, we mix the long term and short term load with
 the balance bias, and randomly pick a big or small value according to the
 balance destination or source.

I disagree that it is random. min()/max() in {source,target}_load()
provides a conservative bias to the load estimate that should prevent us
from trying to pull tasks from the source cpu if its current load is
just a temporary spike. Likewise, we don't try to pull tasks to the
target cpu if the load is just a temporary drop.
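
The conservative bias described above can be sketched in a few lines of
user-space C. This is a simplified stand-in, not the scheduler code: the
function names are made up, "history" stands for the decayed cpu_load[]
value and "instant" for the current load.

```c
/* Low estimate for a CPU we might pull from: a brief spike is ignored. */
static unsigned long source_load_est(unsigned long history,
				     unsigned long instant)
{
	return history < instant ? history : instant;
}

/* High estimate for a CPU we might pull to: a brief drop is ignored. */
static unsigned long target_load_est(unsigned long history,
				     unsigned long instant)
{
	return history > instant ? history : instant;
}

/* Nonzero if moving load from source to target still looks worthwhile. */
static int worth_pulling(unsigned long src_history, unsigned long src_instant,
			 unsigned long dst_history, unsigned long dst_instant)
{
	return source_load_est(src_history, src_instant) >
	       target_load_est(dst_history, dst_instant);
}
```

A source whose load spikes from 100 to 900 is still estimated at 100, so
nothing is pulled to a target sitting at 300; a source that has been
around 800 for a while is estimated at 800 and does qualify.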

 This mix is wrong; the balance bias should be based
 on the task moving cost between cpu groups, not on random history or instant load.

Your patch set actually changes everything to be based on the instant
load alone. rq->cfs.runnable_load_avg is updated instantaneously when
tasks are enqueued and dequeued, so this load expression is quite volatile.

What do you mean by task moving cost?

 History load may diverge a lot from the real load, which leads to incorrect bias.
 
 Take busy_idx, for example:
 We mix history load decay and bias together. The ridiculous thing is, when
 all cpu loads are continuously stable, the long and short term loads are the
 same. Then we lose the meaning of the bias, so any minimal imbalance may cause
 unnecessary task moving. To prevent this from happening, we have to reuse
 imbalance_pct again in find_busiest_group().  But that clearly causes over
 bias in normal times. If there is some burst load in the system, it is even worse.

Isn't imbalance_pct only used once in the periodic load-balance path?

It is not clear to me what the over bias problem is. If you have a
stable situation, I would expect the long and short term load to be the
same?

 As to idle_idx:
 Though I have some concerns about the usage correction,
 https://lkml.org/lkml/2014/3/12/247, since we are working on cpu
 idle migration into the scheduler, the problem will be reconsidered. We don't
 need to care about it too much now.
 
 In fact, the cpu_load decays can be replaced by the sched_avg decay, which
 also decays load over time. The balance bias part can fully use the fixed bias,
 imbalance_pct, which is already used in newly idle, wake, forkexec balancing
 and numa balancing scenarios.

As I have said previously, I agree that cpu_load[] is somewhat broken in
its current form, but I don't see how removing it and replacing it with
the instantaneous cpu load solves the problems you point out.

The current cpu_load[] averages the cpu_load over time, while
rq->cfs.runnable_load_avg is the sum of the currently runnable tasks'
load_avg_contrib. The former provides a long term view of the cpu_load,
the latter does not. It can change radically in an instant. I'm
therefore a bit concerned about the stability of the load-balance
decisions. However, since most decisions are based on cpu_load[0]
anyway, we could try setting LB_BIAS to false as Peter suggests and see
what happens.

Morten

[PATCH 1/3] percpu_ida: introduce kobject for percpu_ida pool

So that we can export some allocation/free information
for monitoring percpu_ida performance.

Signed-off-by: Ming Lei tom.leim...@gmail.com
---
 include/linux/percpu_ida.h |   16 
 lib/percpu_ida.c   |   21 ++---
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h
index f5cfdd6..463e3b3 100644
--- a/include/linux/percpu_ida.h
+++ b/include/linux/percpu_ida.h
@@ -8,6 +8,7 @@
 #include <linux/spinlock_types.h>
 #include <linux/wait.h>
 #include <linux/cpumask.h>
+#include <linux/kobject.h>
 
 struct percpu_ida_cpu;
 
@@ -52,6 +53,8 @@ struct percpu_ida {
 		unsigned		nr_free;
 		unsigned		*freelist;
 	} ____cacheline_aligned_in_smp;
+
+   struct kobject kobj;
 };
 
 /*
@@ -79,4 +82,17 @@ int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn,
 	void *data);
 
 unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu);
+
+static inline int percpu_ida_kobject_add(struct percpu_ida *pool,
+		struct kobject *parent, const char *name)
+{
+	if (pool->kobj.state_initialized)
+		return kobject_add(&pool->kobj, parent, name);
+	return 0;
+}
+static inline void percpu_ida_kobject_del(struct percpu_ida *pool)
+{
+	if (pool->kobj.state_in_sysfs)
+		kobject_del(&pool->kobj);
+}
 #endif /* __PERCPU_IDA_H__ */
diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
index 93d145e..56ae350 100644
--- a/lib/percpu_ida.c
+++ b/lib/percpu_ida.c
@@ -260,6 +260,20 @@ void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
 }
 EXPORT_SYMBOL_GPL(percpu_ida_free);
 
+static void percpu_ida_release(struct kobject *kobj)
+{
+	struct percpu_ida *pool = container_of(kobj,
+			struct percpu_ida, kobj);
+
+	free_percpu(pool->tag_cpu);
+	free_pages((unsigned long) pool->freelist,
+		   get_order(pool->nr_tags * sizeof(unsigned)));
+}
+
+static struct kobj_type percpu_ida_ktype = {
+	.release	= percpu_ida_release,
+};
+
 /**
  * percpu_ida_destroy - release a tag pool's resources
  * @pool: pool to free
@@ -268,9 +282,8 @@ EXPORT_SYMBOL_GPL(percpu_ida_free);
  */
 void percpu_ida_destroy(struct percpu_ida *pool)
 {
-	free_percpu(pool->tag_cpu);
-	free_pages((unsigned long) pool->freelist,
-		   get_order(pool->nr_tags * sizeof(unsigned)));
+	if (pool->kobj.state_initialized)
+		kobject_put(&pool->kobj);
 }
 EXPORT_SYMBOL_GPL(percpu_ida_destroy);
 
@@ -324,6 +337,8 @@ int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
 	for_each_possible_cpu(cpu)
 		spin_lock_init(&per_cpu_ptr(pool->tag_cpu, cpu)->lock);
 
+	kobject_init(&pool->kobj, &percpu_ida_ktype);
+
 	return 0;
 err:
 	percpu_ida_destroy(pool);
-- 
1.7.9.5


[PATCH 2/3] percpu_ida: support exporting allocation/free info via sysfs

With this information, it is easy to monitor percpu_ida
performance.
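
The counting scheme in this patch, per-CPU counters bumped on the hot
path and summed only when sysfs is read, looks roughly like the following
user-space sketch. A plain array indexed by CPU number stands in for the
kernel's per-CPU allocation, and the struct and function names are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-CPU counter block, loosely modelled on percpu_ida_stats. */
struct ida_stats {
	uint64_t alloc_tags;
	uint64_t freed_tags;
};

/* Hot path: each CPU bumps only its own slot, so no shared lock is taken. */
static void count_alloc(struct ida_stats *percpu, unsigned int cpu)
{
	percpu[cpu].alloc_tags++;
}

/* Slow path (e.g. a sysfs read): fold every CPU's slot into one total. */
static uint64_t total_alloc_tags(const struct ida_stats *percpu,
				 unsigned int ncpus)
{
	uint64_t sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		sum += percpu[cpu].alloc_tags;
	return sum;
}
```

The total read this way is only approximately consistent while other
CPUs keep allocating, which is usually acceptable for monitoring.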

Signed-off-by: Ming Lei tom.leim...@gmail.com
---
 include/linux/percpu_ida.h |   24 
 lib/Kconfig|7 +++
 lib/percpu_ida.c   |  130 +++-
 3 files changed, 159 insertions(+), 2 deletions(-)

diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h
index 463e3b3..be1036d 100644
--- a/include/linux/percpu_ida.h
+++ b/include/linux/percpu_ida.h
@@ -12,6 +12,27 @@
 
 struct percpu_ida_cpu;
 
+#ifdef CONFIG_PERCPU_IDA_STATS
+struct percpu_ida_stats {
+   u64 alloc_tags;
+   u64 alloc_in_fastpath;
+   u64 alloc_from_global_pool;
+   u64 alloc_by_stealing;
+   u64 alloc_after_sched;
+
+   u64 freed_tags;
+   u64 freed_empty;
+   u64 freed_full;
+};
+
+#define percpu_ida_inc(pool, ptr)		\
+do {						\
+	__this_cpu_inc(pool->stats->ptr);	\
+} while (0)
+#else
+#define percpu_ida_inc(pool, ptr)  do {} while (0)
+#endif
+
 struct percpu_ida {
/*
 * number of tags available to be allocated, as passed to
@@ -55,6 +76,9 @@ struct percpu_ida {
 	} ____cacheline_aligned_in_smp;
 
struct kobject kobj;
+#ifdef CONFIG_PERCPU_IDA_STATS
+   struct percpu_ida_stats __percpu *stats;
+#endif
 };
 
 /*
diff --git a/lib/Kconfig b/lib/Kconfig
index 325a8d4..d47a1cf 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -476,6 +476,13 @@ config OID_REGISTRY
help
  Enable fast lookup object identifier registry.
 
+config PERCPU_IDA_STATS
+	bool "Export percpu_ida status by sysfs"
+   default n
+   help
+ Export percpu_ida allocation/free information so
+ the performance can be monitored.
+
 config UCS2_STRING
 tristate
 
diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
index 56ae350..6f6c68d 100644
--- a/lib/percpu_ida.c
+++ b/lib/percpu_ida.c
@@ -42,6 +42,105 @@ struct percpu_ida_cpu {
 	unsigned			freelist[];
 };
 
+#ifdef CONFIG_PERCPU_IDA_STATS
+struct pcpu_ida_sysfs_entry {
+	struct attribute attr;
+	ssize_t (*show)(struct percpu_ida *, char *);
+};
+
+#define pcpu_ida_show(field, fmt)					\
+static ssize_t field##_show(struct percpu_ida *pool, char *buf)	\
+{									\
+	u64 val = 0;							\
+	ssize_t rc;							\
+	unsigned cpu;							\
+									\
+	for_each_possible_cpu(cpu)					\
+		val += per_cpu_ptr(pool->stats, cpu)->field;		\
+									\
+	rc = sprintf(buf, fmt, val);					\
+	return rc;							\
+}
+
+#define PERCPU_IDA_ATTR_RO(_name)					\
+	struct pcpu_ida_sysfs_entry pcpu_ida_attr_##_name = __ATTR_RO(_name)
+
+#define pcpu_ida_attr_ro(field, fmt)					\
+	pcpu_ida_show(field, fmt)					\
+	static PERCPU_IDA_ATTR_RO(field)
+
+pcpu_ida_attr_ro(alloc_tags, "%lld\n");
+pcpu_ida_attr_ro(alloc_in_fastpath, "%lld\n");
+pcpu_ida_attr_ro(alloc_from_global_pool, "%lld\n");
+pcpu_ida_attr_ro(alloc_by_stealing, "%lld\n");
+pcpu_ida_attr_ro(alloc_after_sched, "%lld\n");
+pcpu_ida_attr_ro(freed_tags, "%lld\n");
+pcpu_ida_attr_ro(freed_empty, "%lld\n");
+pcpu_ida_attr_ro(freed_full, "%lld\n");
+
+ssize_t pcpu_ida_sysfs_max_size_show(struct percpu_ida *pool, char *page)
+{
+	ssize_t rc = sprintf(page, "%u\n", pool->percpu_max_size);
+	return rc;
+}
+
+static struct pcpu_ida_sysfs_entry pcpu_ida_attr_max_size = {
+	.attr = {.name = "percpu_max_size", .mode = S_IRUGO},
+	.show = pcpu_ida_sysfs_max_size_show,
+};
+
+ssize_t pcpu_ida_sysfs_batch_size_show(struct percpu_ida *pool, char *page)
+{
+	ssize_t rc = sprintf(page, "%u\n", pool->percpu_batch_size);
+	return rc;
+}
+
+static struct pcpu_ida_sysfs_entry pcpu_ida_attr_batch_size = {
+	.attr = {.name = "percpu_batch_size", .mode = S_IRUGO},
+	.show = pcpu_ida_sysfs_batch_size_show,
+};
+
+static ssize_t percpu_ida_sysfs_show(struct kobject *kobj,
+		struct attribute *attr, char *page)
+{
+	struct pcpu_ida_sysfs_entry *entry;
+	struct percpu_ida *pool;
+	ssize_t res = -EIO;
+
+	entry = container_of(attr, struct pcpu_ida_sysfs_entry, attr);
+	pool = container_of(kobj, struct percpu_ida, kobj);
+
+	if (!entry->show)
+		return res;
+	res = entry->show(pool, page);
+	return res;
+}
+
+static struct attribute *percpu_ida_def_attrs[] = {
+

[PATCH 0/3] percpu_ida: support to export allocation/free information

Hi,

These patches add support for exporting percpu_ida allocation/free
information via sysfs, so that percpu_ida performance can be monitored.
There are at least two use cases:

- performance is very sensitive to some parameters (such as
percpu_max_size) set by its users

- the data is helpful for verifying patches which try to improve
percpu_ida


Thanks,
--
Ming Lei


Re: [PATCH v4 4/7] arm64: Add a description on 48-bit address space with 4KB pages

On Tue, Apr 29, 2014 at 05:59:27AM +0100, Jungseok Lee wrote:
 --- a/Documentation/arm64/memory.txt
 +++ b/Documentation/arm64/memory.txt
 @@ -8,10 +8,11 @@ This document describes the virtual memory layout used by the AArch64
  Linux kernel. The architecture allows up to 4 levels of translation
  tables with a 4KB page size and up to 3 levels with a 64KB page size.
  
 -AArch64 Linux uses 3 levels of translation tables with the 4KB page
 -configuration, allowing 39-bit (512GB) virtual addresses for both user
 -and kernel. With 64KB pages, only 2 levels of translation tables are
 -used but the memory layout is the same.
 +AArch64 Linux uses 3 levels and 4 levels of translation tables with
 +the 4KB page configuration, allowing 39-bit (512GB) and 48-bit (256TB)
 +virtual addresses, respectively, for both user and kernel. With 64KB
 +pages, only 2 levels of translation tables are used but the memory layout
 +is the same.

Any reason why we couldn't use 48-bit address space with 64K pages
(implying 3 levels)?
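
The arithmetic behind these numbers can be checked with a short sketch.
It assumes 8-byte translation table entries, so each level resolves
(page_shift - 3) bits, and it caps the result at the 48 bits discussed
in this thread; the function name is made up for the example.

```c
/*
 * Virtual address bits covered by `levels` translation levels with pages
 * of 2^page_shift bytes, assuming 8-byte table entries. The result is
 * capped at 48 bits, in which case the top-level table is only partly used.
 */
static unsigned int va_bits(unsigned int page_shift, unsigned int levels)
{
	unsigned int bits = page_shift + levels * (page_shift - 3);

	return bits > 48 ? 48 : bits;
}
```

This gives 39 bits for 4KB pages with 3 levels, 48 bits with 4 levels,
42 bits for 64KB pages with 2 levels, and 48 bits (capped) for 64KB
pages with 3 levels, which is the configuration asked about above.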

 -AArch64 Linux memory layout with 64KB pages:
 +AArch64 Linux memory layout with 4KB pages + 4 levels:
 +
 +StartEnd SizeUse
 +---
 +  256TB  user
 +
 + 7bfe~124TB  vmalloc

BTW, maybe as a separate patch we should change the end to be
exclusive. It becomes harder to modify (I've been through this a few
times already ;)) and even follow the changes.

-- 
Catalin

Re: [PATCH v2 00/10] arm64: UEFI support

On 04/29/2014 07:47 AM, Matt Fleming wrote:
 On Tue, 29 Apr, at 02:47:28PM, Catalin Marinas wrote:

 Waiting for the tip/x86/efi to be merged first is not a problem. We
 also need a stable base for testing the arm64 UEFI series, so I assume
 this series can be based onto tip/x86/efi (would such branch be rebased
 before hitting mainline?).
  
 tip/x86/efi is unlikely to be rebased. Certainly with dependencies like
 this there would have to be a really good reason to rebase it.
 
 Given that Leif's series contains both generic efi and arm64 patches,
 what's your preference for merging them? I'm happy to add my ack and
 they go via your tree (or the other way around).
 
 I'm happy either way, though if I take them through my tree (and
 subsequently through tip) you won't have to worry about the merge window
 rigmarole, which is a plus.
 
 So, everyone happy for me to take these with Catalin's Acked-by?
 

I'm wondering if it would be better to organize it into a separate topic
branch.  We can still take it through tip, if you want, but it would be
better than putting it all into one tree.

-hpa


[PATCH] bio: modify __bio_add_page() to accept pages that don't start a new segment

2014-04-29 Thread Maurizio Lombardi

The original behaviour is to refuse to add a new page if the maximum number
of segments has been reached, regardless of whether the page we are
going to add can be merged into the last segment or not.

Unfortunately, when the system runs under heavy memory fragmentation conditions,
a driver may try to add multiple pages to the last segment.
The original code won't accept them and EBUSY will be reported to
userspace.

This patch modifies the function so it refuses to add a page
only in case the latter starts a new segment and the maximum number
of segments has already been reached.
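
Reduced to its decision rule, the new behaviour can be sketched as
below. This is a deliberate simplification in user-space C: the patch
itself adds the page tentatively, recounts segments and undoes the
addition on failure, whereas the hypothetical may_add_page() here takes
the "merges into the last segment" answer as an input.

```c
/*
 * Returns 1 if the page may be added, 0 if it must be refused.
 * nr_segments is the current segment count, max_segments the queue limit,
 * and merges_into_last says whether the page extends the last segment.
 */
static int may_add_page(unsigned int nr_segments, unsigned int max_segments,
			int merges_into_last)
{
	if (merges_into_last)
		return 1;	/* segment count does not grow */
	return nr_segments < max_segments;
}
```

With the limit at 16 segments and 16 already in use, a mergeable page is
accepted while a page that would start a 17th segment is refused.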

The bug can be easily reproduced with the st driver:

1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE  to 16
2) modprobe st buffer_kbs=1024
3) #dd if=/dev/zero of=/dev/st0 bs=1M count=10
   dd: error writing ‘/dev/st0’: Device or resource busy

Signed-off-by: Maurizio Lombardi mlomb...@redhat.com
---
 fs/bio.c | 50 --
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 6f0362b..9a3a0b1 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -750,29 +750,31 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 		return 0;
 
 	/*
-	 * we might lose a segment or two here, but rather that than
-	 * make this too complex.
+	 * setup the new entry, we might clear it again later if we
+	 * cannot add the page
+	 */
+	bvec = &bio->bi_io_vec[bio->bi_vcnt];
+	bvec->bv_page = page;
+	bvec->bv_len = len;
+	bvec->bv_offset = offset;
+	bio->bi_vcnt++;
+	bio->bi_phys_segments++;
+
+	/*
+	 * Perform a recount if the number of segments is greater
+	 * than queue_max_segments(q).
 	 */
 
-	while (bio->bi_phys_segments >= queue_max_segments(q)) {
+	while (bio->bi_phys_segments > queue_max_segments(q)) {
 
 		if (retried_segments)
-			return 0;
+			goto failed;
 
 		retried_segments = 1;
 		blk_recount_segments(q, bio);
 	}
 
 	/*
-	 * setup the new entry, we might clear it again later if we
-	 * cannot add the page
-	 */
-	bvec = &bio->bi_io_vec[bio->bi_vcnt];
-	bvec->bv_page = page;
-	bvec->bv_len = len;
-	bvec->bv_offset = offset;
-
-	/*
 	 * if queue has other restrictions (eg varying max sector size
 	 * depending on offset), it can specify a merge_bvec_fn in the
 	 * queue to get further control
@@ -789,23 +791,27 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 		 * merge_bvec_fn() returns number of bytes it can accept
 		 * at this offset
 		 */
-		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len) {
-			bvec->bv_page = NULL;
-			bvec->bv_len = 0;
-			bvec->bv_offset = 0;
-			return 0;
-		}
+		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len)
+			goto failed;
 	}
 
 	/* If we may be able to merge these biovecs, force a recount */
-	if (bio->bi_vcnt && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
+	if (bio->bi_vcnt > 1 && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
 		bio->bi_flags &= ~(1 << BIO_SEG_VALID);
 
-	bio->bi_vcnt++;
-	bio->bi_phys_segments++;
  done:
 	bio->bi_iter.bi_size += len;
 	return len;
+
+ failed:
+	bvec->bv_page = NULL;
+	bvec->bv_len = 0;
+	bvec->bv_offset = 0;
+	bio->bi_vcnt--;
+	if (!retried_segments)
+		bio->bi_phys_segments--;
+
+	return 0;
 }
 
 /**
-- 
Maurizio Lombardi
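
For readers skimming the patch above: its core idea is to fill in the new
vector entry first, then check the limits, and undo the entry on a single
failure path. The following is only a userspace sketch of that "add
tentatively, roll back on failure" shape. The toy_* names, MAX_VECS and the
accept callback are hypothetical stand-ins, not kernel APIs, and segment
recounting is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VECS 4	/* hypothetical capacity of the toy vector table */

struct toy_vec { const void *page; unsigned len; unsigned offset; };
struct toy_bio { struct toy_vec vecs[MAX_VECS]; unsigned vcnt; unsigned size; };

/* Hypothetical limit check: returns nonzero if the bio is still acceptable. */
typedef int (*toy_accept_fn)(const struct toy_bio *);

/* Example limit used for illustration: at most two entries. */
int toy_limit_two(const struct toy_bio *b)
{
	return b->vcnt <= 2;
}

/* Returns len on success, 0 if the page could not be added. */
unsigned toy_add_page(struct toy_bio *b, const void *page,
		      unsigned len, unsigned offset, toy_accept_fn accept)
{
	struct toy_vec *v;

	assert(b->vcnt <= MAX_VECS);
	if (b->vcnt == MAX_VECS)
		return 0;

	/* Set up the new entry first; it may be cleared again below. */
	v = &b->vecs[b->vcnt];
	v->page = page;
	v->len = len;
	v->offset = offset;
	b->vcnt++;

	/* Check the limits with the new entry already counted. */
	if (accept && !accept(b))
		goto failed;

	b->size += len;
	return len;

failed:
	/* Single rollback path: undo everything done above. */
	v->page = NULL;
	v->len = 0;
	v->offset = 0;
	b->vcnt--;
	return 0;
}
```

Counting the entry before the check is what lets the limit be tested with
"greater than" rather than "greater than or equal", as in the patch.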

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/3] blk-mq: add percpu_ida kobjects

So that the percpu_ida performance can be monitored.

Signed-off-by: Ming Lei tom.leim...@gmail.com
---
 block/blk-mq-sysfs.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 8145b5b..4171ae2 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -329,6 +329,8 @@ void blk_mq_unregister_disk(struct gendisk *disk)
kobject_del(ctx-kobj);
kobject_put(ctx-kobj);
}
+   percpu_ida_kobject_del(hctx-tags-free_tags);
+   percpu_ida_kobject_del(hctx-tags-reserved_tags);
kobject_del(hctx-kobj);
kobject_put(hctx-kobj);
}
@@ -362,6 +364,11 @@ int blk_mq_register_disk(struct gendisk *disk)
if (ret)
break;
 
+   percpu_ida_kobject_add(hctx-tags-free_tags,
+   hctx-kobj, free_tags);
+   percpu_ida_kobject_add(hctx-tags-reserved_tags,
+   hctx-kobj, reserved_tags);
+
if (!hctx-nr_ctx)
continue;
 
-- 
1.7.9.5


Re: [PATCH] staging: line6: fix possible overrun

2014-04-29 Thread Dan Carpenter

Yeah.  If this were a brand new driver then returning -EINVAL would be a
good idea.

Smatch actually warns about this code as well if you turn on the
--spammy option.  But there are too many of these kinds of warnings and
even I can't check them all so the warning is basically useless.

In a few months I will have improved the Smatch code to know that the
source string is too large so this bug could have been avoided.

regards,
dan carpenter

Re: [RESEND PATCH V5 0/8] remove cpu_load idx

2014-04-29 Thread Morten Rasmussen

On Thu, Apr 24, 2014 at 05:20:29PM +0100, Peter Zijlstra wrote:
 OK, this series is a lot saner, with the exception of 3/8 and
 dependents.
 
 I do still worry a bit for losing the longer term view for the big
 domains though. Sadly I don't have any really big machines.
 
 I think the entire series is equivalent to setting LB_BIAS to false. So
 I suppose we could do that for a while and if nobody reports horrible
 things we could just do this.
 
 Anybody?

I can't say what will happen on big machines, but I think the LB_BIAS
test could be a way to see what happens. I'm not convinced that it won't
lead to more task migrations since we will use the instantaneous cpu
load (weighted_cpuload()) unfiltered.

Morten

Re: [PATCH] ARM: l2c: prima2: only call l2x0_of_init() on matching nodes

2014-04-29 Thread Barry Song

2014-04-28 22:52 GMT+08:00 Russell King - ARM Linux li...@arm.linux.org.uk:
 On Mon, Apr 28, 2014 at 10:37:09AM -0400, Matt Porter wrote:
 The fix is tested against bcm281xx and bcm21664 as that is what the
 l2c cleanup breaks in -next. As mentioned, I don't have the sirfsoc h/w
 so this first attempt at a fix also breaks their platform. It can be
 addressed by adding those platform specific compatibles back to the dts,
 of course.

 I'd much prefer that the sirfsoc folks fix this...it's going to break
 other platforms in a multi v7 build.

 Well, it's about time we got rid of this from platform specific code
 anyway, taking it away from platform maintainers to mess around with.
 So that's what I'm doing.

 It's worth noting that if you build a single zImage with exynos also
 enabled, then you also end up with an unconditional call from that
 code to l2x0_of_init() with its own magic numbers - and that applies
 before my changes.

 So let's fix this properly and yank this crap from platform maintainers
 fingers.

I mentioned in the mail thread "ARM: prima2: remove L2 cache size
override" that dropping the specific dts compatible prop will break
non-CSR platforms, and I said I was going to send a v2. You said you
needed it before rc6. It has now been sent, but I am sorry that it is
not against next-20140424.


 --
 FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
 improving, and getting towards what was expected from it.

-barry

Re: [PATCH v2] rwsem: Support optimistic spinning

On Mon, Apr 28, 2014 at 05:50:49PM -0700, Tim Chen wrote:
 On Mon, 2014-04-28 at 16:10 -0700, Paul E. McKenney wrote:
 
   +#ifdef CONFIG_SMP
   +static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
   +{
   + int retval;
   + struct task_struct *owner;
   +
   + rcu_read_lock();
   + owner = ACCESS_ONCE(sem->owner);
  
  OK, I'll bite...
  
  Why ACCESS_ONCE() instead of rcu_dereference()?
 
 We're using it as a speculative check on the sem->owner to see
 if the owner is running on the cpu.  The rcu_read_lock
 is used for ensuring that the owner->on_cpu memory is
 still valid.

OK, so if we read complete garbage, all that happens is that we
lose a bit of performance?  If so, I am OK with it as long as there
is a comment (which Davidlohr suggested later in this thread).

Thanx, Paul

  (My first question was where is the update side, but this is covered
  by task_struct allocation and deallocation.)
 
 Tim
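
To make the exchange above concrete: the owner pointer is read exactly once
and only used as a hint, so a stale value costs some performance rather than
correctness. The following is a userspace sketch only. The macro has the same
shape as the kernel's ACCESS_ONCE() of that era and relies on the GCC/Clang
typeof extension; the toy_* types are hypothetical, and returning "spin" when
no owner is recorded is an assumption of this sketch. There is no RCU here,
so nothing in the sketch keeps the owner memory valid the way
rcu_read_lock() does in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Force a single volatile load of x (GCC/Clang extension: __typeof__). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct toy_task { int on_cpu; };		/* hypothetical stand-in */
struct toy_sem { struct toy_task *owner; };	/* hypothetical stand-in */

/*
 * Speculative check: is it worth spinning on this semaphore?
 * The answer may be stale by the time the caller acts on it.
 */
int toy_can_spin_on_owner(struct toy_sem *sem)
{
	struct toy_task *owner;

	assert(sem != NULL);

	/* Read the pointer once so the test and the use see one value. */
	owner = ACCESS_ONCE(sem->owner);
	if (!owner)
		return 1;	/* assumption: no recorded owner, allow spinning */

	/* In the kernel, rcu_read_lock() keeps *owner valid for this read. */
	return owner->on_cpu;
}
```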
 


Re: [PATCH] ARM: l2c: prima2: only call l2x0_of_init() on matching nodes

2014-04-29 Thread Barry Song

2014-04-28 21:40 GMT+08:00 Matt Porter mpor...@linaro.org:
 On Mon, Apr 28, 2014 at 10:15:33AM +0100, Russell King wrote:
 On Sun, Apr 27, 2014 at 08:27:40PM -0400, Matt Porter wrote:
  l2x0_of_init() is executed unconditionally within the sirfsoc_l2x0_init()
  early initcall. In a multi v7 kernel this causes bcm281xx and bcm21664
  platform to fail boot since they have their own pre l2x0 init sequence
  that is required. Fix this by checking that a matching OF ID is present
  before calling l2x0_of_init().
 
  Reported-by: Kevin Hilman khil...@linaro.org
  Signed-off-by: Matt Porter mpor...@linaro.org
  ---
  Applies against next-20140424 to fix the issue introduced in
  50655e6 ARM: l2c: prima2: remove cache size override

 Err, this only fixes it because it effectively disables the L2 cache
 _entirely_ - in the above commit, I kill your private compatible strings.

 This doesn't make sense.  If the cache is already enabled, then the L2C
 code won't try to enable it again.

 Ok, please suggest an alternative. You merged this commit..it looks like
 it had no ack from the platform maintainer..and I don't have hardware.
 The commit is wrong, we can't have every platform executing sirfsoc's
 l2x0_of_init() call/parameters by having this stuff in an early initcall
 like that.

 It would be pretty straightforward to add those private compatibles
 back so the approach works. If not, we need to move this to
 .init_machine where it's guaranteed to only run on sirfsoc.

There has been a v1 patch at
http://permalink.gmane.org/gmane.linux.ports.arm.kernel/316312
My v2 has moved this to init_irq(), as Russell suggested.


 -Matt

-barry

Re: [PATCH v2 0/7] Generic serial earlycon

2014-04-29 Thread Rob Herring

On Tue, Apr 29, 2014 at 6:09 AM, Catalin Marinas
catalin.mari...@arm.com wrote:
 On Fri, Apr 18, 2014 at 11:19:53PM +0100, Rob Herring wrote:
 Rob Herring (7):
   x86: move FIX_EARLYCON_MEM kconfig into x86
   tty/serial: add generic serial earlycon
   tty/serial: convert 8250 to generic earlycon
   tty/serial: pl011: add generic earlycon support
   tty/serial: add arm/arm64 semihosting earlycon
   arm64: enable FIX_EARLYCON_MEM kconfig
   arm64: remove arch specific earlyprintk

 The series looks fine, you can add:

 Acked-by: Catalin Marinas catalin.mari...@arm.com


Thanks.

 BTW, are you merging all of them via some other tree or would prefer me
 to take the arm64-specific patches?

Greg has taken it, but there were a few issues, so it may get reposted.

Rob

RE: OFD (file private) locks and NFS

2014-04-29 Thread Frank Filz

 On Tue, 29 Apr 2014 07:40:08 -0400 (EDT) Matt W. Benjamin
 m...@linuxbox.com wrote:
 
  Hi Jeff,
 
  Something which came up on the last Ganesha conn call is that we have
  a pretty strong need for some ability to wait on a set of locks, and
  perhaps receive events.  Frank Filz believed that you had made a
  proposal which would cover this.  Can you elaborate on that?
 
  Thanks,
 
  Matt
 
 
 No, there's no mechanism to wait on a set of locks from within the context
of
 a single thread of execution or to receive events. Again, that would be
a
 new API beyond what I've been proposing over the last several months.

Some kind of facility to enable one user space thread to wait on multiple
blocked locks would definitely be helpful to user space servers.

Our current plan is to have a pool of threads, and dispatch blocking locks
to them. If that pool is exhausted, all further locks would be dispatched to
a single thread that would poll for locks.

Frank
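
The dispatch rule described above (workers first, one polling thread as the
overflow) can be sketched as a small decision function. This is only an
illustration under stated assumptions: the toy_* names are hypothetical, no
threads or real locks are involved, and it is not Ganesha code.

```c
#include <assert.h>

enum toy_target { TOY_WORKER, TOY_POLLER };

/* Hypothetical pool bookkeeping: how many workers exist and are busy. */
struct toy_pool { unsigned capacity; unsigned busy; };

/* Decide where a blocking lock request should go. */
enum toy_target toy_dispatch(struct toy_pool *p)
{
	assert(p->busy <= p->capacity);
	if (p->busy < p->capacity) {
		p->busy++;		/* a worker blocks on this lock */
		return TOY_WORKER;
	}
	return TOY_POLLER;		/* pool exhausted: poll instead */
}

/* Called when a worker's blocking lock request has completed. */
void toy_worker_done(struct toy_pool *p)
{
	assert(p->busy > 0);
	p->busy--;
}
```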



Re: [PATCH 1/2] nohz: move NOHZ code bits out of io_schedule{,_timeout} into a helper

2014-04-29 Thread Frederic Weisbecker

On Fri, Apr 25, 2014 at 08:57:29PM +0200, Denys Vlasenko wrote:
 Signed-off-by: Denys Vlasenko dvlas...@redhat.com
 Cc: Frederic Weisbecker fweis...@gmail.com
 Cc: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
 Cc: Fernando Luis Vazquez Cao fernando...@lab.ntt.co.jp
 Cc: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Ingo Molnar mi...@kernel.org
 Cc: Peter Zijlstra pet...@infradead.org
 Cc: Andrew Morton a...@linux-foundation.org
 Cc: Arjan van de Ven ar...@linux.intel.com
 Cc: Oleg Nesterov o...@redhat.com
 ---
  kernel/sched/core.c | 33 +
  1 file changed, 17 insertions(+), 16 deletions(-)
 
 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index ffea757..3137980 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -4208,6 +4208,21 @@ EXPORT_SYMBOL_GPL(yield_to);
   * This task is about to go to sleep on IO. Increment rq-nr_iowait so
   * that process accounting knows that this is a task in IO wait state.
   */
 +#ifdef CONFIG_NO_HZ_COMMON
 +static __sched void io_wait_end(struct rq *rq)
 +{
 + if (atomic_dec_and_test(rq-nr_iowait)) {
 + if (raw_smp_processor_id() != cpu_of(rq))
 + tick_nohz_iowait_to_idle(cpu_of(rq));
 + }
 +}
 +#else
 +static inline void io_wait_end(struct rq *rq)
 +{
 + atomic_dec(rq-nr_iowait);
 +}
 +#endif
 +
  void __sched io_schedule(void)
  {
   struct rq *rq = raw_rq();
 @@ -4218,14 +4233,7 @@ void __sched io_schedule(void)
   current-in_iowait = 1;
   schedule();
   current-in_iowait = 0;
 -#ifdef CONFIG_NO_HZ_COMMON
 - if (atomic_dec_and_test(rq-nr_iowait)) {
 - if (raw_smp_processor_id() != cpu_of(rq))
 - tick_nohz_iowait_to_idle(cpu_of(rq));
 - }
 -#else
 - atomic_dec(rq-nr_iowait);
 -#endif
 + io_wait_end(rq);
   delayacct_blkio_end();

There is much more to unify than the iowait accounting between all
the io_schedule() variants.

Peterz, I think you had a patch to unify that a few months ago?

Re: [PATCH] ARM: l2c: prima2: only call l2x0_of_init() on matching nodes

2014-04-29 Thread Russell King - ARM Linux

On Tue, Apr 29, 2014 at 11:05:06PM +0800, Barry Song wrote:
 2014-04-28 22:52 GMT+08:00 Russell King - ARM Linux li...@arm.linux.org.uk:
  On Mon, Apr 28, 2014 at 10:37:09AM -0400, Matt Porter wrote:
  The fix is tested against bcm281xx and bcm21664 as that is what the
  l2c cleanup breaks in -next. As mentioned, I don't have the sirfsoc h/w
  so this first attempt at a fix also breaks their platform. It can be
  addressed by adding those platform specific compatibles back to the dts,
  of course.
 
  I'd much prefer that the sirfsoc folks fix this...it's going to break
  other platforms in a multi v7 build.
 
  Well, it's about time we got rid of this from platform specific code
  anyway, taking it away from platform maintainers to mess around with.
  So that's what I'm doing.
 
  It's worth noting that if you build a single zImage with exynos also
  enabled, then you also end up with an unconditional call from that
  code to l2x0_of_init() with its own magic numbers - and that applies
  before my changes.
 
  So let's fix this properly and yank this crap from platform maintainers
  fingers.
 
 i mentioned dropping specific dts compatible prop will break non-csr
 platforms in the mail thread ARM: prima2: remove L2 cache size
 override and i said i was going to send v2. you said you need it
 before rc6. now it has been sent, but i am sorry it is not against
 next-20140424.

FFS.  IT HASN'T BEEN SENT.  All that I did was drop it into linux-next
so that more people would get off their fat backsides and test this
fscking patch set - something which hasn't happened because no one
pays attention to emails sent to mailing lists.

I also told you that this was what I was going to do.  But... is it
really on to hold up such a large patch set which impacts virtually
everyone because _you_ don't have time to sort out your small special
requirements - no it is not, that's just fscking selfish.

Anyway, I've had it with dealing with platform maintainers, I've yanked
this patch set, and I'm no longer planning to do anything with it -
platform maintainers have destroyed my will to get any of this series
into the kernel.

So, the L2 cache code is going to remain in its current state, and it's
going to rot because it's _FAR_ too much effort dealing with slow people
like yourselves, or people who want the series split up, or people who
whinge that there aren't any acks there (WELL GET OFF YOUR FAT BACKSIDES
AND SEND ME SOME IF YOU CARE ABOUT THIS - no, don't, I'm no longer pushing
this series.)

This is the last time I'm going to ever try cleaning up any core ARM code.
Core ARM maintenance is impossible in this environment with arm-soc split
from core ARM stuff, because core ARM stuff /always/ impacts on SoC
specific code.  You can't get away from that.

My position in this community has been made impossible and obsolete by
Linaro.  I'm at the point of walking away from this crap.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

Re: [PATCH v2 3/7] tty/serial: convert 8250 to generic earlycon

2014-04-29 Thread Rob Herring

On Mon, Apr 28, 2014 at 9:56 PM, Yinghai Lu ying...@kernel.org wrote:
 On Mon, Apr 28, 2014 at 4:24 PM, Rob Herring robherri...@gmail.com wrote:
 On Sat, Apr 26, 2014 at 1:29 AM, Yinghai Lu ying...@kernel.org wrote:

 Thanks for finding these. I missed them in my build tests. This should fix 
 them:

 diff --git a/drivers/tty/serial/8250/8250_early.c
 b/drivers/tty/serial/8250/8250_early.c
 index e83c9db..2094c3b 100644
 --- a/drivers/tty/serial/8250/8250_early.c
 +++ b/drivers/tty/serial/8250/8250_early.c
 @@ -156,6 +156,11 @@ static int __init early_serial8250_setup(struct
 earlycon_device *device,
  EARLYCON_DECLARE(uart8250, early_serial8250_setup);
  EARLYCON_DECLARE(uart, early_serial8250_setup);

 +int __init setup_early_serial8250_console(char *cmdline)
 +{
 +   return setup_earlycon(cmdline, uart8250, early_serial8250_setup);
 +}
 +
  int serial8250_find_port_for_earlycon(void)
  {
 struct earlycon_device *device = early_device;

 that only handle uart8250,, may need to add more lines to handle uart,


That is on purpose because the only 2 users use uart8250. I consider
this a legacy interface and use of uart is horrible because there
are lots of uarts which are not 8250.

Rob


 +int __init setup_early_serial8250_console(char *cmdline)
 +{
 +   char *options;
 +   options = strstr(cmdline, uart8250,);
 +   if (options)
 +   return setup_earlycon(cmdline, uart8250,
 early_serial8250_setup);
 +
 +   options = strstr(cmdline, uart,);
 +   if (options)
 +  return setup_earlycon(cmdline, uart, early_serial8250_setup);
 +
 +  return 0;
 +}
 +

 Thanks

 Yinghai
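
As an aside on the "uart8250," versus "uart," question above: the two are
distinct option names, and an exact "name followed by a comma" match at the
start of the string keeps them apart, whereas a substring search would not
anchor the match. The helper below is a hypothetical userspace sketch, not
the kernel's setup_earlycon(), and the option strings used with it are made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * If opt starts with "<name>,", return a pointer to the text after the
 * comma; otherwise return NULL.
 */
const char *toy_match_earlycon(const char *opt, const char *name)
{
	size_t n;

	assert(opt != NULL && name != NULL);
	n = strlen(name);

	/* The name must match in full and be followed directly by ','. */
	if (strncmp(opt, name, n) != 0 || opt[n] != ',')
		return NULL;

	return opt + n + 1;
}
```

With this rule, "uart8250,io,0x3f8" matches the name "uart8250" but not the
name "uart", because the character after "uart" is '8' rather than ','.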

[3.15-rc3] rtmutex-debug assertion.

2014-04-29 Thread Dave Jones

Just hit this while fuzzing the futex() syscall.


WARNING: CPU: 2 PID: 6202 at kernel/locking/rtmutex-debug.c:151 
debug_rt_mutex_proxy_unlock+0x4e/0x60()
DEBUG_LOCKS_WARN_ON(!rt_mutex_owner(lock))
Modules linked in:
 tun fuse ipt_ULOG nfnetlink bnep can_bcm scsi_transport_iscsi nfc caif_socket 
caif af_802154 ieee802154 phonet af_rxrpc can_raw can pppoe pppox ppp_generic 
slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc 
ax25 cfg80211 coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm xfs libcrc32c 
btusb bluetooth snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic 
snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device 
snd_pcm e1000e crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_timer snd 
microcode serio_raw pcspkr usb_debug 6lowpan_iphc rfkill shpchp ptp pps_core 
soundcore
CPU: 2 PID: 6202 Comm: trinity-c63 Not tainted 3.15.0-rc3+ #201
 0009 de725d52 880099befbd8 92746dad
 880099befc20 880099befc10 9206d46d 88020951c010
 88009d718000 88009d718000 c90011408680 c90011408688
Call Trace:
 [92746dad] dump_stack+0x4e/0x7a
 [9206d46d] warn_slowpath_common+0x7d/0xa0
 [9206d4ec] warn_slowpath_fmt+0x5c/0x80
 [920c533e] debug_rt_mutex_proxy_unlock+0x4e/0x60
 [920c4d77] rt_mutex_proxy_unlock+0x17/0x40
 [920ead7a] free_pi_state+0x6a/0xb0
 [920eade0] unqueue_me_pi+0x20/0x40
 [920ebfc2] futex_lock_pi.isra.18+0x262/0x3f0
 [92096910] ? hrtimer_get_res+0x50/0x50
 [920edb2c] do_futex+0x2ec/0xb60
 [92349897] ? debug_smp_processor_id+0x17/0x20
 [920bf3ee] ? put_lock_stats.isra.23+0xe/0x30
 [920bf756] ? lock_release_holdtime.part.24+0xe6/0x160
 [920a3cdd] ? get_parent_ip+0xd/0x50
 [9275698b] ? preempt_count_sub+0x6b/0xf0
 [92751f51] ? _raw_spin_unlock+0x31/0x50
 [920ee420] SyS_futex+0x80/0x180
 [9275b0e4] tracesys+0xdd/0xe2


Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

2014-04-29 Thread Suravee Suthikulanit


On 4/29/2014 5:20 AM, Borislav Petkov wrote:

On Tue, Apr 29, 2014 at 09:33:09AM +0200, Andreas Herrmann wrote:

I am sure, it's because some server systems had MMIO ECS access not
enabled in BIOS. I can't remember which systems were affected.


If you are referring to accessing PCI ECS ranges via 0xCF8, then yes, 
the BIOS disables this as described below in the BKDG.


The BIOS may use either configuration space access mechanism during 
boot. Before booting the OS, BIOS must disable IO access to ECS, enable 
MMIO configuration and build an ACPI defined MCFG table. BIOS ACPI code 
must use MMIO to access configuration space.



Ok, now AMD people: what's the story with IO ECS, can we assume that on
everything after F10h, BIOS has a sensible MCFG and we can limit this to
F10h only? I like Bjorn's idea but we need to make sure a working MCFG
is ubiquitous.

Which begs the real question: Suravee, why are you even touching IO ECS
provided F15h and later have a MCFG? Or, do they?



As I was trying to generalize the logic inside amd_bus.c, which seems to 
be used mainly as a fallback mechanism, I tried to maintain the existing 
code, which does many things:

1. Setup numa_node information (if PXM doesn't exist)
2. Probe NB for MMIO resources (if MCFG doesn't exist)
3. Probe NB for IO resources
4. Setup IO ECS

In the new code, the IO ECS was needed to retrieve the 
AMD_NB_F1_MMIO_BASE_LIMIT_HI_REG (offset 0x180) during the early 
initialization as part of (2) logic. However, this register exists only 
on the newer systems.  However, as you mentioned, for (2) we can assume 
that the MCFG exists for most of the systems (family10h and later), and 
should be used instead.


The main purpose of this patch set is to deal with the node 
information (1).  So, we might need to split these all up and handle 
them separately as needed, where (2) and (3) will be used as fallbacks for 
older systems where MCFG does not exist. I am not sure whether we need (4).


Suravee
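
For context on why offset 0x180 needs IO ECS at all: a plain 0xCF8 access
only encodes register bits 7:2, so offsets above 0xFF need the extended
form, which carries register bits 11:8 in address bits 27:24. The sketch
below only computes the 32-bit value that would be written to port 0xCF8;
it performs no port I/O. The function name is hypothetical, and using bus 0,
device 0x18, function 1 for the northbridge is an assumption of the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a type 1 configuration address with the extended register bits:
 *   bit 31      enable
 *   bits 27:24  register bits 11:8 (extended configuration space)
 *   bits 23:16  bus
 *   bits 15:11  device
 *   bits 10:8   function
 *   bits 7:2    register bits 7:2
 */
uint32_t toy_conf1_address(unsigned bus, unsigned dev, unsigned fn,
			   unsigned reg)
{
	assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);

	return 0x80000000u |
	       ((uint32_t)(reg & 0xF00) << 16) |
	       ((uint32_t)bus << 16) |
	       ((uint32_t)dev << 11) |
	       ((uint32_t)fn << 8) |
	       (uint32_t)(reg & 0xFC);
}
```

For offset 0x180 the extended bits are what separate it from offset 0x80 of
the same function; without them the two would produce the same address.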


Re: 64bit x86: NMI nesting still buggy?

2014-04-29 Thread Jiri Kosina

On Tue, 29 Apr 2014, Steven Rostedt wrote:

  According to 38.4 of [1], when SMM mode is entered while the CPU is 
  handling NMI, the end result might be that upon exit from SMM, NMIs will 
  be re-enabled and latched NMI delivered as nested [2].
 
 Note, if this were true, then the x86_64 hardware would be extremely
 buggy. That's because NMIs are not made to be nested. If SMM's come in
 during an NMI and re-enables the NMI, then *all* software would break.
 That would basically make NMIs useless.
 
 The only time I've ever witnessed problems (and I stress NMIs all the
 time), is when the NMI itself does a fault. Which my patch set handles
 properly. 

Yes, it indeed does. 

In the scenario I have outlined, the race window is extremely small, plus 
NMIs don't happen that often, plus SMIs don't happen that often, plus 
(hopefully) many BIOSes don't enable NMIs upon SMM exit.

The problem is, that Intel documentation is clear in this respect, and 
explicitly states it can happen. And we are violating that, which makes me 
rather nervous -- it'd be very nice to know what is the background of 38.4 
section text in the Intel docs.

-- 
Jiri Kosina
SUSE Labs

Re: [PATCH] staging: line6: fix possible overrun

2014-04-29 Thread Mateusz Guzik

On Tue, Apr 29, 2014 at 04:47:11PM +0200, Takashi Iwai wrote:
 At Mon, 28 Apr 2014 01:44:25 +0300,
 Dan Carpenter wrote:
  
  On Sun, Apr 27, 2014 at 10:00:43PM +0200, Mateusz Guzik wrote:
  and a WARN_ON + -EINVAL in line6_init_audio to catch future
  offenders.

Returning -EINVAL is a bad idea because it would break the driver
completely and make it unusable.

   
   Well I would vote for returning the error anyway.
  
  I'm trying to be polite, but you are talking about adding regressions
  deliberately...
  
  It's very rare for people to deliberately add regressions to the kernel.
  I have only seen it one time before.
 
 I don't think Dan would be against returning -EINVAL if all the
 offender codes have been fixed (e.g. truncating strings to fit with
 the fixed arrays) at first.  Then it'd be a good help to catch any
 future bugs.  But, having -EINVAL without fixing the caller side means
 essentially that you're introducing the breakage intentionally
 although you know it certainly breaks, which is obviously bad.
 
 

We clearly have a serious miscommunication here (and apparently it
started with me not addressing the concern of complete driver breakage).

line6_init_audio consumers have to be fixed first, no doubt about that.

I was only commenting on catching *future* offenders, which I thought
would implicitly mean *afterwards*.

With that in mind it would seem we are in agreement after all. :-)

As for getting this done, maybe the OP is interested.

-- 
Mateusz Guzik
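
To illustrate the ordering agreed on above (fix the callers so strings fit
the fixed arrays first, and only then consider rejecting oversized input), a
truncating copy that reports whether it had to truncate might look like the
sketch below. The helper and the names passed to it are hypothetical; this is
userspace C, not the line6 driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy src into the fixed-size buffer dst, always NUL-terminating.
 * Returns true if src did not fit and was truncated.
 */
bool toy_copy_name(char *dst, size_t dst_size, const char *src)
{
	int n;

	assert(dst != NULL && src != NULL && dst_size > 0);

	/* snprintf() returns the length the full string would have had. */
	n = snprintf(dst, dst_size, "%s", src);
	return n >= 0 && (size_t)n >= dst_size;
}
```

A caller could log a warning when this returns true, which flags the
offender without making the device unusable.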

Re: [PATCH v2 00/10] arm64: UEFI support

2014-04-29 Thread Matt Fleming

On Tue, 29 Apr, at 07:56:20AM, H. Peter Anvin wrote:
 
 I'm wondering if it would be better to organize it into a separate topic
 branch.  We can still take it through tip, if you want, but it would be
 better than putting it all into one tree.

Sure, that makes sense. I'll do that.

-- 
Matt Fleming, Intel Open Source Technology Center

Re: [Patch V2 0/9] I2C ACPI operation region handler support

2014-04-29 Thread Rafael J. Wysocki

On Tuesday, April 29, 2014 09:54:46 AM Lan Tianyu wrote:
 On 2014-04-29 06:51, Rafael J. Wysocki wrote:
  On Monday, April 28, 2014 10:27:39 PM Lan Tianyu wrote:
  ACPI 5.0 spec(5.5.2.4.5) defines GenericSerialBus(i2c, spi, uart) operation
  region. It allows ACPI aml code able to access such kind of devices to
  implement some ACPI standard method.
 
  On the Asus T100TA, Bios use GenericSerialBus operation region to access
  i2c device to get battery info. So battery function depends on the I2C
  operation region support. Here is the bug link.
  https://bugzilla.kernel.org/show_bug.cgi?id=69011
 
  This patchset is to add I2C ACPI operation region handler support.
 
  Change Since V1:
 Fix some code style and memory leak issues in Patch 7
  
  Is it the only patch that has changed from v1?
 
 
 I also remove a redundant semicolon in the PATCH 8. Sorry. I didn't
 notice these patches are already in your tree. I will produce divergence
 patches based on your bleeding-edge branch.

No need for that, I'll use the new versions.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

[PATCH V4] Add support for flag status register on Micron chips.

2014-04-29 Thread Graham Moore

Some new Micron flash chips require reading the flag
status register to determine when operations have completed.

Furthermore, chips with multi-die stacks of the 65nm 256Mb QSPI also
require reading the status register before reading the flag status register.

This patch adds support for the flag status register in the n25q512ax3 and
n25q00 Micron QSPI flash chips.

Signed-off-by: Graham Moore grmo...@altera.com
---
V4:
Do not set nor->wait_till_ready if driver has already set it.
V3:
Rebase to l2-mtd spinor branch.
V2:
Remove leading underscore in function names.
Remove type cast in dev_err call and use the proper format
specifier instead.
---
 drivers/mtd/spi-nor/spi-nor.c |   52 +
 include/linux/mtd/spi-nor.h   |4 
 2 files changed, 56 insertions(+)

diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
index d6f44d5..7e2817e 100644
--- a/drivers/mtd/spi-nor/spi-nor.c
+++ b/drivers/mtd/spi-nor/spi-nor.c
@@ -48,6 +48,25 @@ static int read_sr(struct spi_nor *nor)
 }
 
 /*
+ * Read the flag status register, returning its value in the location
+ * Return the status register value.
+ * Returns negative if error occurred.
+ */
+static int read_fsr(struct spi_nor *nor)
+{
+	int ret;
+	u8 val;
+
+	ret = nor->read_reg(nor, SPINOR_OP_RDFSR, &val, 1);
+	if (ret < 0) {
+		pr_err("error %d reading FSR\n", ret);
+		return ret;
+	}
+
+	return val;
+}
+
+/*
  * Read configuration register, returning its value in the
  * location. Return the configuration register value.
  * Returns negative if error occured.
@@ -165,6 +184,32 @@ static int spi_nor_wait_till_ready(struct spi_nor *nor)
return -ETIMEDOUT;
 }
 
+static int spi_nor_wait_till_fsr_ready(struct spi_nor *nor)
+{
+	unsigned long deadline;
+	int sr;
+	int fsr;
+
+	deadline = jiffies + MAX_READY_WAIT_JIFFIES;
+
+	do {
+		cond_resched();
+
+		sr = read_sr(nor);
+		if (sr < 0) {
+			break;
+		} else if (!(sr & SR_WIP)) {
+			fsr = read_fsr(nor);
+			if (fsr < 0)
+				break;
+			if (fsr & FSR_READY)
+				return 0;
+		}
+	} while (!time_after_eq(jiffies, deadline));
+
+	return -ETIMEDOUT;
+}
+
 /*
  * Service routine to read status register until ready, or timeout occurs.
  * Returns non-zero if error.
@@ -402,6 +447,7 @@ struct flash_info {
 #defineSECT_4K_PMC 0x10/* SPINOR_OP_BE_4K_PMC works 
uniformly */
 #defineSPI_NOR_DUAL_READ   0x20/* Flash supports Dual Read */
 #defineSPI_NOR_QUAD_READ   0x40/* Flash supports Quad Read */
+#defineUSE_FSR 0x80/* use flag status register */
 };
 
 #define INFO(_jedec_id, _ext_id, _sector_size, _n_sectors, _flags) \
@@ -488,6 +534,8 @@ const struct spi_device_id spi_nor_ids[] = {
{ n25q128a13,  INFO(0x20ba18, 0, 64 * 1024,  256, 0) },
{ n25q256a,INFO(0x20ba19, 0, 64 * 1024,  512, SECT_4K) },
{ n25q512a,INFO(0x20bb20, 0, 64 * 1024, 1024, SECT_4K) },
+   { n25q512ax3,  INFO(0x20ba20, 0, 64 * 1024, 1024, USE_FSR) },
+   { n25q00,  INFO(0x20ba21, 0, 64 * 1024, 2048, USE_FSR) },
 
/* PMC */
{ pm25lv512,   INFO(0,0, 32 * 1024,2, SECT_4K_PMC) },
@@ -965,6 +1013,10 @@ int spi_nor_scan(struct spi_nor *nor, const struct 
spi_device_id *id,
else
mtd-_write = spi_nor_write;
 
+	if ((info->flags & USE_FSR) &&
+	    nor->wait_till_ready == spi_nor_wait_till_ready)
+		nor->wait_till_ready = spi_nor_wait_till_fsr_ready;
+
/* prefer small sector erase if possible */
if (info-flags  SECT_4K) {
nor-erase_opcode = SPINOR_OP_BE_4K;
diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h
index 5324184..9e6294f 100644
--- a/include/linux/mtd/spi-nor.h
+++ b/include/linux/mtd/spi-nor.h
@@ -34,6 +34,7 @@
 #define SPINOR_OP_SE		0xd8	/* Sector erase (usually 64KiB) */
 #define SPINOR_OP_RDID		0x9f	/* Read JEDEC ID */
 #define SPINOR_OP_RDCR		0x35	/* Read configuration register */
+#define SPINOR_OP_RDFSR		0x70	/* Read flag status register */
 
 /* 4-byte address opcodes - used on Spansion and some Macronix flashes. */
 #define SPINOR_OP_READ4		0x13	/* Read data bytes (low frequency) */
@@ -66,6 +67,9 @@
 
 #define SR_QUAD_EN_MX		0x40	/* Macronix Quad I/O */
 
+/* Flag Status Register bits */
+#define FSR_READY		0x80
+
 /* Configuration Register bits. */
 #define CR_QUAD_EN_SPAN		0x2	/* Spansion Quad I/O */
 
-- 
1.7.9.5
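
For readers who want to follow the wait logic outside the kernel tree, the two-stage poll in spi_nor_wait_till_fsr_ready() can be modelled in plain C. This is only a user-space sketch under stated assumptions: struct flash_ops, struct sim_flash and the sim_* helpers are hypothetical, a poll counter stands in for the jiffies deadline, and SR_WIP is assumed to be bit 0. Only FSR_READY (0x80) is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SR_WIP     0x01  /* write in progress (status register), assumed bit 0 */
#define FSR_READY  0x80  /* ready bit in the flag status register, per the patch */

/* Hypothetical register accessors: return the register value or a negative errno. */
struct flash_ops {
	int (*read_sr)(void *ctx);
	int (*read_fsr)(void *ctx);
};

/*
 * Poll at most max_polls times.  Returns 0 once the status register shows
 * no write in progress and the flag status register shows ready.  Returns
 * -ETIMEDOUT otherwise, including when a register read fails (this mirrors
 * the "break" paths in the patch).
 */
static int wait_till_fsr_ready(const struct flash_ops *ops, void *ctx,
			       int max_polls)
{
	int i;

	assert(ops != NULL && max_polls >= 0);

	for (i = 0; i < max_polls; i++) {
		int sr = ops->read_sr(ctx);

		if (sr < 0)
			break;
		if (!(sr & SR_WIP)) {
			int fsr = ops->read_fsr(ctx);

			if (fsr < 0)
				break;
			if (fsr & FSR_READY)
				return 0;
		}
	}
	return -ETIMEDOUT;
}

/* Simulated device: reports busy for a fixed number of SR reads, then ready. */
struct sim_flash {
	int busy_reads_left;
};

static int sim_read_sr(void *ctx)
{
	struct sim_flash *f = ctx;

	if (f->busy_reads_left > 0) {
		f->busy_reads_left--;
		return SR_WIP;
	}
	return 0;
}

static int sim_read_fsr(void *ctx)
{
	struct sim_flash *f = ctx;

	return f->busy_reads_left > 0 ? 0 : FSR_READY;
}
```

The flag status register is only consulted after the status register stops reporting a write in progress, which is the ordering the patch uses.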


Re: [PATCH 04/16] perf, mmap: Factor out perf_get_fd()

2014-04-29 Thread Robert Richter

On 25.04.14 16:52:05, Peter Zijlstra wrote:
 But no, I don't think that helps, it's still true that the moment you get
 a fd another thread can immediately close(). That would drop the last
 ref and free it, meanwhile perf_event_open() is happily poking at it.
 
 Now I think you could cure this by adding an extra ref before calling
 your perf_get_fd() and dropping that extra ref at the end, where we used
 to have fd_install().

Yes, right. I have a solution now which increments the event's ref
count before creating the file descriptor using try_get_event()/
put_event().

The patch also does not remove get_unused_fd_flags() and the err_fd
error handler.

I already have an updated, rebased version but still need to test it.

Would it be ok to split the patch set and send in a first step only
the first 4 patches that refactor the perf mmap code?

Thanks,

-Robert
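
The extra-reference idea discussed above can be illustrated with a small single-threaded model. This is a sketch, not the perf code: struct event, get_event(), put_event() and open_then_racing_close() are hypothetical names, and the "other thread" calling close() is simulated by an in-line put_event().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an event object; not the kernel's perf_event. */
struct event {
	int refcount;
	bool freed;
};

static void get_event(struct event *e)
{
	assert(!e->freed);
	e->refcount++;
}

static void put_event(struct event *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0)
		e->freed = true;	/* a real implementation would free here */
}

/*
 * Models "open": the file descriptor owns one reference.  If hold_extra is
 * set, the opener pins the event across the window in which another thread
 * may close() the freshly published fd.  Returns true if the event was
 * still alive while the opener touched it after publication.
 */
static bool open_then_racing_close(struct event *e, bool hold_extra)
{
	bool alive_during_setup;

	e->refcount = 1;		/* reference owned by the fd */
	e->freed = false;

	if (hold_extra)
		get_event(e);		/* extra reference for the opener */

	put_event(e);			/* simulated close() from another thread */

	alive_during_setup = !e->freed;	/* opener is still poking at the event */

	if (hold_extra)
		put_event(e);		/* drop the extra reference at the end */

	return alive_during_setup;
}
```

With the extra reference held, the simulated close() cannot drop the last reference, so the object outlives the opener's remaining setup work.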
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH tip/core/rcu 2/3] documentation: Record rcu_dereference() value mishandling

On Tue, Apr 29, 2014 at 01:42:13AM -0400, Pranith Kumar wrote:
 Minor nits below:
 
 Other than that Acked-by: Pranith Kumar bobby.pr...@gmail.com
 
 On Tue, Apr 29, 2014 at 1:04 AM, Andev debian...@gmail.com wrote:
  From: Paul E. McKenney paul...@linux.vnet.ibm.com
 
  Recent LKML discussions (see http://lwn.net/Articles/586838/ and
  http://lwn.net/Articles/588300/ for the LWN writeups) brought out
  some ways of misusing the return value from rcu_dereference() that
  are not necessarily completely intuitive.  This commit therefore
  documents what can and cannot safely be done with these values.
 
  Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 <snip>
  +
  +   o   The pointer is never dereferenced after being compared.
  +   Since there are no subsequent dereferences, the compiler
  +   cannot use anything it learned from the comparison
  +   to reorder the non-existent subsequent dereferences.
  +   This sort of comparison occurs frequently when scanning
  +   RCU-protected circular linked lists.
  +
  +   o   The comparison is against a pointer pointer that
 
 duplicate pointer, remove one

Good catch, fixed!

  +   references memory that was initialized a long time ago.
  +   The reason this is safe is that even if misordering
  +   occurs, the misordering will not affect the accesses
  +   that follow the comparison.  So exactly how long ago is
  +   a long time ago?  Here are some possibilities:
 <snip>
  +   o   All of the accesses following the comparison are stores,
  +   so that a control dependency preserves the needed ordering.
  +   That said, it is easy to get control dependencies wrong.
  +   Please see the CONTROL DEPENDENCIES section of
  +   Documentation/memory-barriers.txt for more details.
  +
  +   o   The pointers compared not-equal -and- the compiler does
 
 add in "are" -> The pointers compared are not-equal...

Actually, "compared" is a verb here.  But that use is a bit obscure, so
taking your suggestion as a bug report.  I changed it to read:

The pointers are not equal -and- the compiler does not have
enough information to deduce the value of the pointer.

Fair enough?

  +   not have enough information to deduce the value of the
  +   pointer.  Note that the volatile cast in rcu_dereference()
  +   will normally prevent the compiler from knowing too much.
  +
  +o  Disable any value-speculation optimizations that your compiler
  +   might provide, especially if you are making use of feedback-based
  +   optimizations that take data collected from prior runs.  Such
  +   value-speculation optimizations reorder operations by design.
  +
  +   There is one exception to this rule:  Value-speculation
  +   optimizations that leverage the branch-prediction hardware are
  +   safe on strongly ordered systems (such as x86), but not on weakly
  +   ordered systems (such as ARM or Power).  Choose your compiler
  +   command-line options wisely!
  +
  +
  +EXAMPLE OF AMPLIFIED RCU-USAGE BUG
  +
  +Because updaters can run concurrently with RCU readers, RCU readers can
  +see stale and/or inconsistent values.  If RCU readers need fresh or
  +consistent values, which they sometimes do, they need to take proper
  +precautions.  To see this, consider the following code fragment:
  +
  +	struct foo {
  +		int a;
  +		int b;
  +		int c;
  +	};
  +	struct foo *gp1;
  +	struct foo *gp2;
  +
  +	void updater(void)
  +	{
  +		struct foo *p;
  +
  +		p = kmalloc(...);
  +		if (p == NULL)
  +			deal_with_it();
  +		p->a = 42;  /* Each field in its own cache line. */
  +		p->b = 43;
  +		p->c = 44;
  +		rcu_assign_pointer(gp1, p);
  +		p->b = 143;
  +		p->c = 144;
  +		rcu_assign_pointer(gp2, p);
  +	}
  +
  +	void reader(void)
  +	{
  +		struct foo *p;
  +		struct foo *q;
  +		int r1, r2;
  +
  +		p = rcu_dereference(gp2);
  +		r1 = p->b;  /* Guaranteed to get 143. */
  +		q = rcu_dereference(gp1);
  +		if (p == q) {
  +			/* The compiler decides that q->c is same as p->c. */
  +			r2 = p->c; /* Could get 44 on weakly ordered system. */
  +		}
  +	}
  +
  +You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible,
  +but you should not be.  After all, the updater might have been invoked
  +a second time between the time reader() loaded into r1 and the time
  +that it loaded into r2.  The fact that this same

Re: [PATCH] ARM: l2c: prima2: only call l2x0_of_init() on matching nodes

2014-04-29 Thread Barry Song

2014-04-29 23:14 GMT+08:00 Russell King - ARM Linux li...@arm.linux.org.uk:
 On Tue, Apr 29, 2014 at 11:05:06PM +0800, Barry Song wrote:
 2014-04-28 22:52 GMT+08:00 Russell King - ARM Linux li...@arm.linux.org.uk:
  On Mon, Apr 28, 2014 at 10:37:09AM -0400, Matt Porter wrote:
  The fix is tested against bcm281xx and bcm21664 as that is what the
  l2c cleanup breaks in -next. As mentioned, I don't have the sirfsoc h/w
  so this first attempt at a fix also breaks their platform. It can be
  addressed by adding those platform specific compatibles back to the dts,
  of course.
 
  I'd much prefer that the sirfsoc folks fix this...it's going to break
  other platforms in a multi v7 build.
 
  Well, it's about time we got rid of this from platform specific code
  anyway, taking it away from platform maintainers to mess around with.
  So that's what I'm doing.
 
  It's worth noting that if you build a single zImage with exynos also
  enabled, then you also end up with an unconditional call from that
  code to l2x0_of_init() with its own magic numbers - and that applies
  before my changes.
 
  So let's fix this properly and yank this crap from platform maintainers'
  fingers.

 i mentioned dropping specific dts compatible prop will break non-csr
 platforms in the mail thread "ARM: prima2: remove L2 cache size
 override" and i said i was going to send v2. you said you need it
 before rc6. now it has been sent, but i am sorry it is not against
 next-20140424.

 FFS.  IT HASN'T BEEN SENT.  All that I did was drop it into linux-next
 so that more people would get off their fat backsides and test this
 fscking patch set - something which hasn't happened because no one
 pays attention to emails sent to mailing lists.

so your point is people don't pay attention to your mails? or you are
ignored? i think that is 100% not real. i think your opinions and
mails are always respected as you are the chief arm linux expert.


 I also told you that this was what I was going to do.  But... is it
 really on to hold up such a large patch set which impacts virtually
 everyone because _you_ don't have time to sort out your small special
 requirements - no it is not, that's just fscking selfish.

 Anyway, I've had it with dealing with platform maintainers, I've yanked
 this patch set, and I'm no longer planning to do anything with it -
 platform maintainers have destroyed my will to get any of this series
 into the kernel.

no, i am trying to follow your suggestion to make patch set merged and
l2 codes cleaned.
i have been trying to follow your will until now, and from the beginning.


 So, the L2 cache code is going to remain in its current state, and it's
 going to rot because it's _FAR_ too much effort dealing with slow people
 like yourselves, or people who want the series split up, or people who
 whinge that there aren't any acks there (WELL GET OFF YOUR FAT BACKSIDES
 AND SEND ME SOME IF YOU CARE ABOUT THIS - no, don't, I'm no longer pushing
 this series.)

people might be selfish, but people might have some reasons to
respond slowly, like holidays or family issues.
how about taking it easy? it doesn't prove you are not respected by
platform maintainers.


 This is the last time I'm going to ever try cleaning up any core ARM code.
 Core ARM maintenance is impossible in this environment with arm-soc split
 from core ARM stuff, because core ARM stuff /always/ impacts on SoC
 specific code.  You can't get away from that.

 My position in this community has been made impossible and obsolete by
 Linaro.  I'm at the point of walking away from this crap.

just fix the relationship and communication, that is good enough. you
have done things so well, there is no reason to give up.


 --
 FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
 improving, and getting towards what was expected from it.

-barry

Re: [PATCH tip/core/rcu 0/3] Miscellaneous fixes for 3.16

On Mon, Apr 28, 2014 at 08:23:45PM -0700, Josh Triplett wrote:
 On Mon, Apr 28, 2014 at 04:56:00PM -0700, Paul E. McKenney wrote:
  Hello!
  
  This series provides miscellaneous fixes:
  
  1.  Apply ACCESS_ONCE() to unprotected ->gp_flags accesses.
  
  2.  Fix typo in comment, courtesy of Liu Ping Fan.
  
  3.  Make RCU CPU stall warnings print grace-period numbers in
  signed format to improve readability of stall-warning output.
  
  4.  Make cpu_needs_another_gp() take future grace-period needs
  into account.
  
  5.  Remove unused ->preemptible field from the rcu_data structure,
  courtesy of Iulia Manda.
  
  6.  Apply ACCESS_ONCE() to unprotected ->jiffies_stall accesses,
  courtesy of Iulia Manda.
  
  7.  Make callers responsible for grace-period kthread wakeup in
  order to avoid potential silent grace-period stalls.
  
  8.  Remove "extern" from RCU function declarations, courtesy of
  Iulia Manda.
  
  9.  Apply ACCESS_ONCE() to additional ->jiffies_stall accesses,
  courtesy of Himangi Saraogi.
  
  10. Add event tracing to dyntick_save_progress_counter(), courtesy
  of Andreea-Cristina Bernat.
  
  11. Make rcu_init_one() use nr_cpu_ids instead of NR_CPUS for
  data-structure setup limit check, courtesy of Himangi Saraogi.
  
  12. Remove redundant kfree_call_rcu() definition by using the
  rcu_state pointer, courtesy of Andreea-Cristina Bernat.
  
  13. Merge rcu_sched_force_quiescent_state() definition with
  rcu_force_quiescent_state() by using the rcu_state pointer,
  courtesy of Andreea-Cristina Bernat.
  
  14. Document RCU_INIT_POINTER()'s lack of ordering guarantees.
  
  15. Automatically bind RCU's grace-period kthreads to timekeeping
  CPU for NO_HZ_FULL builds.
  
  16. Make large and small sysidle systems use equivalent state machine.
  
  17. Remove duplicate resched_cpu() declaration, courtesy of
  Pranith Kumar.
  
  18. Replace deprecated __this_cpu_ptr() uses with raw_cpu_ptr(),
  courtesy of Christoph Lameter.
  
  19. Make softirq processing provide a quiescent state only once
  per full pass over all softirqs rather than once per action,
  courtesy of Eric Dumazet.
 
 For all 19:
 Reviewed-by: Josh Triplett j...@joshtriplett.org

And thank you for these reviews as well, applied!

Thanx, Paul


Re: [PATCH RT 0/3] Linux 3.2.57-rt84-rc1

2014-04-29 Thread Pavel Vasilyev


28.04.2014 17:39, Steven Rostedt wrote:

On Mon, 28 Apr 2014 02:15:28 +0400
Pavel Vasilyev pa...@pavlinux.ru wrote:


27.04.2014 18:39, Steven Rostedt wrote:

Dear RT Folks,

This is the RT stable review cycle of patch 3.2.57-rt84-rc1.

Please scream at me if I messed something up. Please test the patches too.



For more than two years our thin clients (about 5000 machines, Intel Atom, x86_32)
have been running with RCU_BOOST.

CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=80
CONFIG_RCU_BOOST_DELAY=400



Is this just a confirmation that having RCU_BOOST default to y for
PREEMPT_RT is a good thing?



Only 3.2-rt


--

 Pavel.

Re: [PATCH] Input: implement managed polled input devices

2014-04-29 Thread Dmitry Torokhov

On Tue, Apr 29, 2014 at 08:09:39AM +0200, David Herrmann wrote:
 Hi
 
 On Tue, Apr 29, 2014 at 5:23 AM, Dmitry Torokhov
 dmitry.torok...@gmail.com wrote:
  Managed resources are becoming more and more popular in drivers. Let's
  implement managed polled input devices, to complement managed regular input
  devices.
 
  Similarly to managed regular input devices only one new call
  devm_input_allocate_polled_device() is added and the rest of APIs is
  modified to work with both managed and non-managed devices.
 
  Signed-off-by: Dmitry Torokhov dmitry.torok...@gmail.com
  ---
   drivers/input/input-polldev.c | 113 
  +-
   include/linux/input-polldev.h |   3 ++
   2 files changed, 115 insertions(+), 1 deletion(-)
 
  diff --git a/drivers/input/input-polldev.c b/drivers/input/input-polldev.c
  index 4b19190..27961fc 100644
  --- a/drivers/input/input-polldev.c
  +++ b/drivers/input/input-polldev.c
  @@ -176,6 +176,90 @@ struct input_polled_dev *input_allocate_polled_device(void)
   }
   EXPORT_SYMBOL(input_allocate_polled_device);
 
  +struct input_polled_devres {
  +	struct input_polled_dev *polldev;
  +};
  +
  +static int devm_input_polldev_match(struct device *dev, void *res, void *data)
  +{
  +	struct input_polled_devres *devres = res;
  +
  +	return devres->polldev == data;
  +}
  +
  +static void devm_input_polldev_release(struct device *dev, void *res)
  +{
  +	struct input_polled_devres *devres = res;
  +	struct input_polled_dev *polldev = devres->polldev;
  +
  +	dev_dbg(dev, "%s: dropping reference/freeing %s\n",
  +		__func__, dev_name(&polldev->input->dev));
  +
  +	input_put_device(polldev->input);
  +	kfree(polldev);
  +}
  +
  +static void devm_input_polldev_unregister(struct device *dev, void *res)
  +{
  +	struct input_polled_devres *devres = res;
  +	struct input_polled_dev *polldev = devres->polldev;
  +
  +	dev_dbg(dev, "%s: unregistering device %s\n",
  +		__func__, dev_name(&polldev->input->dev));
  +	input_unregister_device(polldev->input);
  +
  +	/*
  +	 * Note that we are still holding extra reference to the input
  +	 * device so it will stick around until devm_input_polldev_release()
  +	 * is called.
  +	 */
  +}
  +
  +/**
  + * devm_input_allocate_polled_device - allocate managed polled device
  + * @dev: device owning the polled device being created
  + *
  + * Returns prepared struct input_polled_dev or %NULL.
  + *
  + * Managed polled input devices do not need to be explicitly unregistered
  + * or freed as it will be done automatically when owner device unbinds
  + * from its driver (or binding fails). Once such managed polled device
  + * is allocated, it is ready to be set up and registered in the same
  + * fashion as regular polled input devices (using
  + * input_register_polled_device() function).
  + *
  + * If you want to manually unregister and free such managed polled devices,
  + * it can be still done by calling input_unregister_polled_device() and
  + * input_free_polled_device(), although it is rarely needed.
  + *
  + * NOTE: the owner device is set up as parent of input device and users
  + * should not override it.
  + */
  +struct input_polled_dev *devm_input_allocate_polled_device(struct device *dev)
  +{
  +	struct input_polled_dev *polldev;
  +	struct input_polled_devres *devres;
  +
  +	devres = devres_alloc(devm_input_polldev_release, sizeof(*devres),
  +			      GFP_KERNEL);
  +	if (!devres)
  +		return NULL;
  +
  +	polldev = input_allocate_polled_device();
  +	if (!polldev) {
  +		devres_free(devres);
  +		return NULL;
  +	}
  +
  +	polldev->input->dev.parent = dev;
  +	polldev->devres_managed = true;
  +
  +	devres->polldev = polldev;
  +	devres_add(dev, devres);
  +
  +	return polldev;
  +}
  +
   /**
    * input_free_polled_device - free memory allocated for polled device
    * @dev: device to free
  @@ -186,7 +270,12 @@ EXPORT_SYMBOL(input_allocate_polled_device);
   void input_free_polled_device(struct input_polled_dev *dev)
   {
   	if (dev) {
  -		input_free_device(dev->input);
  +		if (dev->devres_managed)
  +			WARN_ON(devres_destroy(dev->input->dev.parent,
  +						devm_input_polldev_release,
  +						devm_input_polldev_match,
  +						dev));
  +		input_put_device(dev->input);
   		kfree(dev);
   	}
   }
  @@ -204,9 +293,19 @@ EXPORT_SYMBOL(input_free_polled_device);
    */
   int input_register_polled_device(struct input_polled_dev *dev)
   {
  +	struct input_polled_devres *devres = NULL;
   	struct input_dev *input = dev->input;

Re: usermodehelper lock error at resume

2014-04-29 Thread Rafael J. Wysocki


On 4/29/2014 5:14 PM, Takashi Iwai wrote:

At Fri, 18 Apr 2014 10:28:05 +0200,
Takashi Iwai wrote:

[my previous post didn't seem to go out for some reason, so I'm just
  resending this; please disregard if you already received it.]

Hmm, I still can't see this in LKML archives...
Did you guys receive my previous post below?



I did, sorry for not responding, I'm buried under stuff at the moment.

Rafael



Hi,

we've received a bug report with 3.14.x kernel regarding the firmware
loading of intel BT device at suspend/resume:
   https://bugzilla.novell.com/show_bug.cgi?id=873790

It's a WARN_ON() that was recently introduced.  And, it turned out
that the problem basically comes from a small window between the
process resume and the clear of usermodehelper lock.

The request_firmware() function checks the UMH lock and gives up when
it's in the DISABLE state.  This is to avoid invalid f/w loading
during the suspend/resume phase.  The problem is that
usermodehelper_enable() is called at the end of thaw_processes().
Thus, a process thawed in between can kick off the f/w loader code
path (in this case, via btusb_setup_intel()) even before the call to
usermodehelper_enable().  Then usermodehelper_read_trylock() returns
an error and request_firmware() spews WARN_ON() in the end.

The one-liner patch below seems to fix the problem.  But I'm not quite
sure whether it's the best approach; could usermodehelper_enable() rather
be moved there, or would it be better to define yet another state, e.g.
UMH_THAWING, instead of reusing UMH_FREEZING?

Suggestions?

Once we agree, I'll cook up a proper patch.


thanks,

Takashi

---
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 06ec8869dbf1..9c7552f092f2 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -181,6 +181,8 @@ void thaw_processes(void)
 	pm_nosig_freezing = false;
 
 	oom_killer_enable();
+	/* allow request_firmware() at this point */
+	__usermodehelper_set_disable_depth(UMH_FREEZING);
 
 	printk("Restarting tasks ... ");
 



[PATCH V2] bio: modify __bio_add_page() to accept pages that don't start a new segment

2014-04-29 Thread Maurizio Lombardi

The original behaviour is to refuse to add a new page if the maximum number
of segments has been reached, regardless of whether the page we are
going to add can be merged into the last segment.

Unfortunately, when the system runs under heavy memory fragmentation conditions,
a driver may try to add multiple pages to the last segment.
The original code won't accept them and EBUSY will be reported to
userspace.

This patch modifies the function so it refuses to add a page
only in case the latter starts a new segment and the maximum number
of segments has already been reached.

The bug can be easily reproduced with the st driver:

1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE  to 16
2) modprobe st buffer_kbs=1024
3) #dd if=/dev/zero of=/dev/st0 bs=1M count=10
   dd: error writing ‘/dev/st0’: Device or resource busy

Signed-off-by: Maurizio Lombardi mlomb...@redhat.com
---
 fs/bio.c | 51 +--
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 6f0362b..a31e12b 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -699,6 +699,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 			  unsigned int max_sectors)
 {
 	int retried_segments = 0;
+	unsigned int bi_phys_segments_orig;
 	struct bio_vec *bvec;
 
/*
@@ -750,29 +751,32 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 		return 0;
 
 	/*
-	 * we might lose a segment or two here, but rather that than
-	 * make this too complex.
+	 * setup the new entry, we might clear it again later if we
+	 * cannot add the page
+	 */
+	bvec = &bio->bi_io_vec[bio->bi_vcnt];
+	bvec->bv_page = page;
+	bvec->bv_len = len;
+	bvec->bv_offset = offset;
+	bio->bi_vcnt++;
+	bi_phys_segments_orig = bio->bi_phys_segments;
+	bio->bi_phys_segments++;
+
+	/*
+	 * Perform a recount if the number of segments is greater
+	 * than queue_max_segments(q).
 	 */
 
-	while (bio->bi_phys_segments >= queue_max_segments(q)) {
+	while (bio->bi_phys_segments > queue_max_segments(q)) {
 
 		if (retried_segments)
-			return 0;
+			goto failed;
 
 		retried_segments = 1;
 		blk_recount_segments(q, bio);
 	}
 
 	/*
-	 * setup the new entry, we might clear it again later if we
-	 * cannot add the page
-	 */
-	bvec = &bio->bi_io_vec[bio->bi_vcnt];
-	bvec->bv_page = page;
-	bvec->bv_len = len;
-	bvec->bv_offset = offset;
-
-	/*
 	 * if queue has other restrictions (eg varying max sector size
 	 * depending on offset), it can specify a merge_bvec_fn in the
 	 * queue to get further control
@@ -789,23 +793,26 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 		 * merge_bvec_fn() returns number of bytes it can accept
 		 * at this offset
 		 */
-		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len) {
-			bvec->bv_page = NULL;
-			bvec->bv_len = 0;
-			bvec->bv_offset = 0;
-			return 0;
-		}
+		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len)
+			goto failed;
 	}
 
 	/* If we may be able to merge these biovecs, force a recount */
-	if (bio->bi_vcnt && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
+	if (bio->bi_vcnt > 1 && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
 		bio->bi_flags &= ~(1 << BIO_SEG_VALID);
 
-	bio->bi_vcnt++;
-	bio->bi_phys_segments++;
  done:
 	bio->bi_iter.bi_size += len;
 	return len;
+
+ failed:
+	bvec->bv_page = NULL;
+	bvec->bv_len = 0;
+	bvec->bv_offset = 0;
+	bio->bi_vcnt--;
+	bio->bi_phys_segments = bi_phys_segments_orig;
+
+	return 0;
 }
 
 /**
-- 
Maurizio Lombardi
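
The behavioural change described in the commit message can be modelled in a few lines of plain C. This is a deliberately simplified sketch, not the block-layer code: struct req and both add_page_* helpers are hypothetical, and "mergeable" is reduced to "the page frame number directly follows the previous one", whereas the real code recounts segments with blk_recount_segments().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model: a request is a run of pages identified by page frame
 * numbers, and physically adjacent pages share a segment.  Hypothetical
 * types, not the kernel's struct bio.
 */
struct req {
	unsigned long last_pfn;		/* frame number of the most recent page */
	unsigned int nr_pages;
	unsigned int nr_segments;
	unsigned int max_segments;
};

static bool starts_new_segment(const struct req *r, unsigned long pfn)
{
	return r->nr_pages == 0 || pfn != r->last_pfn + 1;
}

/* Old behaviour: refuse as soon as the segment limit has been reached. */
static bool add_page_old(struct req *r, unsigned long pfn)
{
	if (r->nr_segments >= r->max_segments)
		return false;
	if (starts_new_segment(r, pfn))
		r->nr_segments++;
	r->last_pfn = pfn;
	r->nr_pages++;
	return true;
}

/* New behaviour: refuse only if the page would need an extra segment. */
static bool add_page_new(struct req *r, unsigned long pfn)
{
	bool new_seg = starts_new_segment(r, pfn);

	if (new_seg && r->nr_segments >= r->max_segments)
		return false;
	if (new_seg)
		r->nr_segments++;
	r->last_pfn = pfn;
	r->nr_pages++;
	return true;
}
```

With a limit of one segment, the old check rejects a second, physically adjacent page, while the new check accepts it and still rejects a page that would open a second segment.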


Re: [PATCH v2 3/4] thermal: Added Bang-bang thermal governor

2014-04-29 Thread Javi Merino

On Tue, Apr 29, 2014 at 10:17:56AM +0100, Peter Feuerer wrote:
 The bang-bang thermal governor uses a hysteresis to switch abruptly on
 or off a cooling device.  It is intended to control fans, which can
 not be throttled but just switched on or off.
 Bang-bang cannot be set as default governor as it is intended for
 special devices only.  For those special devices the driver needs to
 explicitly request it.

I don't really understand why step-wise doesn't work for you (AIUI,
this governor should be a subset of it).  I'll let others comment on
that, just a minor comment below.

[...]
 diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
 new file mode 100644
 index 000..328dde0
 --- /dev/null
 +++ b/drivers/thermal/gov_bang_bang.c
 @@ -0,0 +1,124 @@
 +/*
 + *  gov_bang_bang.c - A simple thermal throttling governor using hysteresis
 + *
 + *  Copyright (C) 2014 Peter Feuerer pe...@piie.net
 + *
 + *  Based on step_wise.c with following Copyrights:
 + *  Copyright (C) 2012 Intel Corp
 + *  Copyright (C) 2012 Durgadoss R durgados...@intel.com
 + *
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License as published by
 + * the Free Software Foundation, version 2.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
 + * the GNU General Public License for more details.
 + *
 + */
 +
 +#include <linux/thermal.h>
 +
 +#include "thermal_core.h"
 +
 +static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
 +{
 +	long trip_temp;
 +	unsigned long trip_hyst;
 +	struct thermal_instance *instance;
 +
 +	tz->ops->get_trip_temp(tz, trip, &trip_temp);
 +	tz->ops->get_trip_hyst(tz, trip, &trip_hyst);
 +
 +	dev_dbg(&tz->device, "Trip%d[temp=%ld]:temp=%d:hyst=%ld\n",
 +				trip, trip_temp, tz->temperature,
 +				trip_hyst);
 +
 +	mutex_lock(&tz->lock);
 +
 +	list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
 +		if (instance->trip != trip)
 +			continue;
 +
 +		/* in case fan is neither on nor off set the fan to active */
 +		if (instance->target != 0 && instance->target != 1)
 +			instance->target = 1;

I think you should add a pr_warn() here to warn the user that the
governor is being used with a cooling device that seems to support
more than one cooling state.

Cheers,
Javi
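
As background for the discussion, a generic on/off controller with hysteresis can be sketched as follows. This is an illustration of the bang-bang idea in general, not the code from the patch (the decision logic is not quoted in this mail): struct bang_bang, bang_bang_update() and the exact switching thresholds are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-state controller; temperatures in millidegrees Celsius. */
struct bang_bang {
	long trip_temp;	/* switch on at or above this temperature */
	long hyst;	/* switch off only below trip_temp - hyst */
	bool fan_on;
};

/* Feed one temperature sample and return the resulting fan state. */
static bool bang_bang_update(struct bang_bang *bb, long temp)
{
	assert(bb->hyst >= 0);

	if (!bb->fan_on && temp >= bb->trip_temp)
		bb->fan_on = true;
	else if (bb->fan_on && temp < bb->trip_temp - bb->hyst)
		bb->fan_on = false;
	/* inside the hysteresis band the previous state is kept */

	return bb->fan_on;
}
```

The band between trip_temp - hyst and trip_temp is what keeps the fan from toggling on every small temperature fluctuation around the trip point.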


Re: [PATCH] bio: modify __bio_add_page() to accept pages that don't start a new segment

2014-04-29 Thread Maurizio Lombardi

Sorry, I made a mistake in this patch: on failure I should restore the
original value of bi_phys_segments.

I'm going to send a new version.

Maurizio Lombardi

On Tue, Apr 29, 2014 at 04:58:18PM +0200, Maurizio Lombardi wrote:
 The original behaviour is to refuse to add a new page if the maximum number
 of segments has been reached, regardless of whether the page we are
 going to add can be merged into the last segment.
 
 Unfortunately, when the system runs under heavy memory fragmentation conditions,
 a driver may try to add multiple pages to the last segment.
 The original code won't accept them and EBUSY will be reported to
 userspace.
 
 This patch modifies the function so it refuses to add a page
 only in case the latter starts a new segment and the maximum number
 of segments has already been reached.
 
 The bug can be easily reproduced with the st driver:
 
 1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE  to 16
 2) modprobe st buffer_kbs=1024
 3) #dd if=/dev/zero of=/dev/st0 bs=1M count=10
dd: error writing ‘/dev/st0’: Device or resource busy
 
 Signed-off-by: Maurizio Lombardi mlomb...@redhat.com
 ---
  fs/bio.c | 50 --
  1 file changed, 28 insertions(+), 22 deletions(-)
 
 diff --git a/fs/bio.c b/fs/bio.c
 index 6f0362b..9a3a0b1 100644
 --- a/fs/bio.c
 +++ b/fs/bio.c
 @@ -750,29 +750,31 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
  		return 0;
  
  	/*
 -	 * we might lose a segment or two here, but rather that than
 -	 * make this too complex.
 +	 * setup the new entry, we might clear it again later if we
 +	 * cannot add the page
 +	 */
 +	bvec = &bio->bi_io_vec[bio->bi_vcnt];
 +	bvec->bv_page = page;
 +	bvec->bv_len = len;
 +	bvec->bv_offset = offset;
 +	bio->bi_vcnt++;
 +	bio->bi_phys_segments++;
 +
 +	/*
 +	 * Perform a recount if the number of segments is greater
 +	 * than queue_max_segments(q).
  	 */
  
 -	while (bio->bi_phys_segments >= queue_max_segments(q)) {
 +	while (bio->bi_phys_segments > queue_max_segments(q)) {
  
  		if (retried_segments)
 -			return 0;
 +			goto failed;
  
  		retried_segments = 1;
  		blk_recount_segments(q, bio);
  	}
  
  	/*
 -	 * setup the new entry, we might clear it again later if we
 -	 * cannot add the page
 -	 */
 -	bvec = &bio->bi_io_vec[bio->bi_vcnt];
 -	bvec->bv_page = page;
 -	bvec->bv_len = len;
 -	bvec->bv_offset = offset;
 -
 -	/*
  	 * if queue has other restrictions (eg varying max sector size
  	 * depending on offset), it can specify a merge_bvec_fn in the
  	 * queue to get further control
 @@ -789,23 +791,27 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
  		 * merge_bvec_fn() returns number of bytes it can accept
  		 * at this offset
  		 */
 -		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len) {
 -			bvec->bv_page = NULL;
 -			bvec->bv_len = 0;
 -			bvec->bv_offset = 0;
 -			return 0;
 -		}
 +		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len)
 +			goto failed;
  	}
  
  	/* If we may be able to merge these biovecs, force a recount */
 -	if (bio->bi_vcnt && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
 +	if (bio->bi_vcnt > 1 && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
  		bio->bi_flags &= ~(1 << BIO_SEG_VALID);
  
 -	bio->bi_vcnt++;
 -	bio->bi_phys_segments++;
   done:
  	bio->bi_iter.bi_size += len;
  	return len;
 +
 + failed:
 +	bvec->bv_page = NULL;
 +	bvec->bv_len = 0;
 +	bvec->bv_offset = 0;
 +	bio->bi_vcnt--;
 +	if (!retried_segments)
 +		bio->bi_phys_segments--;
 +
 +	return 0;
  }
  
  /**
 -- 
 Maurizio Lombardi
 

Re: [PATCH v4 00/24] input: Introduce ff-memless-next as an improved replacement for ff-memless

2014-04-29 Thread simon

 This patch series:
 1) Adds ff-memless-next module [1]
 2) Ports all hardware-specific drivers to MLNX's API [2-23]
 3) Removes FFML and replaces it with MLNX [24]

 Signed-off-by: Michal Malý madcatxs...@devoid-pointer.net

 v4:
  - Add a summary of changes between MLNX and FFML to the last patch
  - Remove a stale empty line in hid-sony.c
  - Add Tested-by: Elias Vanderstuyft elias@gmail.com to hid-lg4ff patch.

 v3:
  - Rebase against latest linux-next. Fixes conflict in hid-sony.c and
    max8997_haptic.c
  - Updated documentation in ff-memless-next.h. The documentation now describes
    parameters of the callback function and specifically mentions that
    HW-specific drivers must not keep a reference to mlnx_effect_command struct
    to which a pointer is passed in the callback function.
  - Fix a minor brace inconsistency in hid-lgff
  I believe that all concerns regarding v2 have been resolved as false alarms.

 v2:
  - Add missing msecs to jiffies conversion in ff-memless-next
  - lgff: Properly convert force on Y axis from MLNX to device range
          Support periodic effects for joystick_ac device class
  - lg3ff: Properly convert forces from MLNX to device range
  - Very minor coding style issues fixed

Hi all,
I can confirm that I built v2 and tested it on a number of devices (1), and it
appears to work OK.

The only slight hiccup was with an older version (Xubuntu 12.10)
'ffcfstress' application which did not correctly detect the CF
capabilities of my gaming wheel(s). This is believed to be a fault with
the application not using correct bit-field testing and appears to have
been fixed on later versions (Xubuntu 13.10).

I also built v4, but have not yet had time/access to all the devices
(other than DS4) to test.

Cheers,
Simon

Tested-by: Simon Wood si...@mungewell.org

(1) Devices:
Logitech Momo Red, Momo Black, DFGT, WiiWheel, G27
Sony DS3-SA, DS4, Intec 3rd Party PS3 controller
Nintendo Wii Remote



Re: usermodehelper lock error at resume

At Tue, 29 Apr 2014 17:34:32 +0200,
Rafael J. Wysocki wrote:
 
 On 4/29/2014 5:14 PM, Takashi Iwai wrote:
  At Fri, 18 Apr 2014 10:28:05 +0200,
  Takashi Iwai wrote:
  [my previous post didn't seem to go out by some reason, so I just
resend this; please disregard if you already received it.]
  Hmm, I still can't see this in LKML archives...
  Did you guys receive my previous post below?
 
 
 I did, sorry for not responding, I'm buried under stuff at the moment.

Don't worry, this isn't an urgent issue.  (And I was off the whole of
last week anyway :)

I just wondered why this didn't come up in LKML archive.  But if the
post went out actually, it's fine.


thanks,

Takashi

 
 Rafael
 
 
  Hi,
 
  we've received a bug report with 3.14.x kernel regarding the firmware
  loading of intel BT device at suspend/resume:
 https://bugzilla.novell.com/show_bug.cgi?id=873790
 
  It's a WARN_ON() that was recently introduced.  And, it turned out
  that the problem basically comes from a small window between the
  process resume and the clear of usermodehelper lock.
 
  The request_firmware() function checks the UMH lock and gives up when
  it's in DISABLE state.  This is for avoiding the invalid  f/w loading
  during suspend/resume phase.  The problem is that
  usermodehelper_enable() is called at the end of thaw_processes().
  Thus, a thawed process in between can kick off the f/w loader code
  path (in this case, via btusb_setup_intel()) even before the call of
  usermodehelper_enable().  Then usermodehelper_read_trylock() returns
  an error and request_firmware() spews WARN_ON() in the end.
 
  The one-liner patch below seems to fix the problem.  But I'm not quite
  sure whether it's the best approach; should usermodehelper_enable() rather
  be moved there, or is it better to define yet another state, e.g.
  UMH_THAWING, instead of reusing UMH_FREEZING?
 
  Suggestions?
 
  Once we agree, I'll cook up a proper patch.
 
 
  thanks,
 
  Takashi
 
  ---
  diff --git a/kernel/power/process.c b/kernel/power/process.c
  index 06ec8869dbf1..9c7552f092f2 100644
  --- a/kernel/power/process.c
  +++ b/kernel/power/process.c
  @@ -181,6 +181,8 @@ void thaw_processes(void)
 pm_nosig_freezing = false;

 oom_killer_enable();
  +  /* allow request_firmware() at this point */
  +  __usermodehelper_set_disable_depth(UMH_FREEZING);

 printk("Restarting tasks ... ");

 

Re: [PATCH v2 4/4] acerhdf: Use bang-bang thermal governor

2014-04-29 Thread Javi Merino

On Tue, Apr 29, 2014 at 10:17:57AM +0100, Peter Feuerer wrote:
 acerhdf has been doing an on-off fan control using hysteresis by
 post-manipulating the outcome of thermal subsystem trip point handling.
 This patch enables acerhdf to use the bang-bang governor, which is
 intended for on-off controlled fans.
 
 CC: Zhang Rui rui.zh...@intel.com
 Cc: Andreas Mohr a...@lisas.de
 Cc: Borislav Petkov b...@suse.de
 Signed-off-by: Peter Feuerer pe...@piie.net
 ---
  drivers/platform/x86/Kconfig   |  2 +-
  drivers/platform/x86/acerhdf.c | 48 
 +++---
  2 files changed, 41 insertions(+), 9 deletions(-)
 
 diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
 index 27df2c5..0c15d89 100644
 --- a/drivers/platform/x86/Kconfig
 +++ b/drivers/platform/x86/Kconfig
 @@ -38,7 +38,7 @@ config ACER_WMI
  
  config ACERHDF
   tristate "Acer Aspire One temperature and fan driver"
 - depends on THERMAL && ACPI
 + depends on ACPI && THERMAL_GOV_BANG_BANG
   ---help---
 This is a driver for Acer Aspire One netbooks. It allows to access
 the temperature sensor and to control the fan.
 diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
 index 176edbd..f3884f9 100644
 --- a/drivers/platform/x86/acerhdf.c
 +++ b/drivers/platform/x86/acerhdf.c
 @@ -50,7 +50,7 @@
   */
  #undef START_IN_KERNEL_MODE
  
 -#define DRV_VER "0.5.30"
 +#define DRV_VER "0.5.31"
  
  /*
   * According to the Atom N270 datasheet,
 @@ -135,8 +135,8 @@ struct bios_settings_t {
   const char *vendor;
   const char *product;
   const char *version;
 - unsigned char fanreg;
 - unsigned char tempreg;
 + u8 fanreg;
 + u8 tempreg;
   struct fancmd cmd;
   int mcmd_enable;
  };
 @@ -259,6 +259,17 @@ static const struct bios_settings_t bios_tbl[] = {
  
  static const struct bios_settings_t *bios_cfg __read_mostly;
  
 +/*
 + * this struct is used to instruct thermal layer to use bang_bang instead of
 + * default governor for acerhdf
 + */
 +static struct thermal_zone_params acerhdf_zone_params = {
 + .governor_name = "bang_bang",
 + .no_hwmon = 0,
 + .num_tbps = 0,
 + .tbp = 0,
 +};

You don't need to initialize statics to 0.  checkpatch only considers
it an error if it finds it in a variable, but I think it also applies
to fields in struct.

 +
  static int acerhdf_get_temp(int *temp)
  {
   u8 read_temp;
 @@ -436,6 +447,17 @@ static int acerhdf_get_trip_type(struct 
 thermal_zone_device *thermal, int trip,
  {
   if (trip == 0)
   *type = THERMAL_TRIP_ACTIVE;
 + if (trip == 1)
 + *type = THERMAL_TRIP_CRITICAL;

This looks like an unrelated change that should be in a patch of its
own.

 +
 + return 0;
 +}
 +
 +static int acerhdf_get_trip_hyst(struct thermal_zone_device *thermal, int 
 trip,
 +  unsigned long *temp)
 +{
 + if (trip == 0)
 + *temp = fanon - fanoff;
  
   return 0;
  }
 @@ -445,6 +467,8 @@ static int acerhdf_get_trip_temp(struct 
 thermal_zone_device *thermal, int trip,
  {
   if (trip == 0)
   *temp = fanon;
 + else if (trip == 1)
 + *temp = ACERHDF_TEMP_CRIT;
  
   return 0;
  }
 @@ -464,8 +488,10 @@ static struct thermal_zone_device_ops acerhdf_dev_ops = {
   .get_mode = acerhdf_get_mode,
   .set_mode = acerhdf_set_mode,
   .get_trip_type = acerhdf_get_trip_type,
 + .get_trip_hyst = acerhdf_get_trip_hyst,
   .get_trip_temp = acerhdf_get_trip_temp,
   .get_crit_temp = acerhdf_get_crit_temp,
 + .notify = NULL,

Same as before, no need to initialize static to NULL.

Cheers,
Javi


dcache shrink list corruption?

2014-04-29 Thread Miklos Szeredi

This was reported by IBM for 3.12, but if my analysis is right, it affects
current kernel as well as older ones.

So the question is: does anything protect the shrink list from concurrent
modification by one or more dput() instances?

E.g. two dentries are on the shrink list, for both dget(), d_drop() and dput()
are called.  dput() -> dentry_kill() -> dentry_lru_del() -> d_shrink_del() ->
list_del_init().  Unlike the LRU list this is only protected with d_lock on the
individual dentries, which is not enough to prevent list corruption:

list->next = a, list->prev = b
a->next = b, a->prev = list
b->next = list, b->prev = a

CPU1: list_del_init(b)
        __list_del(a, list)
          a->next = list ...
CPU2: list_del_init(a)
        __list_del(list, list)
          list->next = list
          list->prev = list
CPU1: (continuing list_del_init(b))
          list->prev = a

Attached patch is just a starting point (untested).  Not sure how to minimize
contention without adding too much complexity.

Thanks,
Miklos


diff --git a/fs/dcache.c b/fs/dcache.c
index 40707d88a945..5e0719292e3e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -357,10 +357,14 @@ static void d_lru_del(struct dentry *dentry)
    WARN_ON_ONCE(!list_lru_del(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
 }
 
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_shrink_lock);
+
 static void d_shrink_del(struct dentry *dentry)
 {
    D_FLAG_VERIFY(dentry, DCACHE_SHRINK_LIST | DCACHE_LRU_LIST);
+   spin_lock(&dcache_shrink_lock);
    list_del_init(&dentry->d_lru);
+   spin_unlock(&dcache_shrink_lock);
    dentry->d_flags &= ~(DCACHE_SHRINK_LIST | DCACHE_LRU_LIST);
    this_cpu_dec(nr_dentry_unused);
 }
@@ -368,7 +372,9 @@ static void d_shrink_del(struct dentry *dentry)
 static void d_shrink_add(struct dentry *dentry, struct list_head *list)
 {
    D_FLAG_VERIFY(dentry, 0);
+   spin_lock(&dcache_shrink_lock);
    list_add(&dentry->d_lru, list);
+   spin_unlock(&dcache_shrink_lock);
    dentry->d_flags |= DCACHE_SHRINK_LIST | DCACHE_LRU_LIST;
    this_cpu_inc(nr_dentry_unused);
 }
@@ -391,7 +397,9 @@ static void d_lru_shrink_move(struct dentry *dentry, struct list_head *list)
 {
    D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
    dentry->d_flags |= DCACHE_SHRINK_LIST;
+   spin_lock(&dcache_shrink_lock);
    list_move_tail(&dentry->d_lru, list);
+   spin_unlock(&dcache_shrink_lock);
 }
 
 /*

Re: [PATCH v5 0/8] Introduce new cpufreq helper macros

2014-04-29 Thread Stratos Karafotis

On 29/04/2014 07:17 AM, Viresh Kumar wrote:
 On 26 April 2014 01:45, Stratos Karafotis strat...@semaphore.gr wrote:
 This patch set introduces two freq_table helper macros which
 can be used for iteration over cpufreq_frequency_table and
 makes the necessary changes to cpufreq core and drivers that
 use such an iteration procedure.

 The motivation was a usage of common procedure to iterate over
 cpufreq_frequency_table across all drivers and cpufreq core.

 This was tested on a x86_64 platform.
 Most files compiled successfully but unfortunately I was not
 able to compile sh_sir.c pasemi_cpufreq.c and ppc_cbe_cpufreq.c
 due to lack of cross compiler.

 Changelog

 v4 -> v5
 - Fix warnings in printk format specifier for 32 bit
   architectures in freq_table.c, longhaul, pasemi, ppc_cbe
 
 Doesn't look like much has changed, so it stays as is:
 
 Acked-by: Viresh Kumar viresh.ku...@linaro.org
 

Thank you very much!


Stratos Karafotis

Re: [PATCH v4 23/24] hid: Port hid-lg4ff to ff-memless-next

2014-04-29 Thread simon

 Port hid-lg4ff to ff-memless-next

 Signed-off-by: Michal Malý madcatxs...@devoid-pointer.net
 Tested-by: Elias Vanderstuyft elias@gmail.com

Signed-off-by: Simon Wood si...@mungewell.org



Re: sched_{set,get}attr() manpage

2014-04-29 Thread Peter Zijlstra

On Tue, Apr 29, 2014 at 03:08:55PM +0200, Michael Kerrisk (man-pages) wrote:

Juri, Dario, Can you have a look at the 2nd part; I'm not at all sure I
got the activate/release the right way around.

My current thinking was that we activate first, and then release it to
go run. But googling the terms only confused me more. I suppose it's one
of those things that's not actually _that_ well defined. And I hope the
ASCII art actually clarifies things better than the terms used.

 [1] A page describing the sched_setattr() and sched_getattr() APIs

NAME
sched_setattr, sched_getattr - set and get scheduling policy/attributes

SYNOPSIS
#include <sched.h>

struct sched_attr {
u32 size;
u32 sched_policy;
u64 sched_flags;

/* SCHED_NORMAL, SCHED_BATCH */
s32 sched_nice;

/* SCHED_FIFO, SCHED_RR */
u32 sched_priority;

/* SCHED_DEADLINE */
u64 sched_runtime;
u64 sched_deadline;
u64 sched_period;
};

int sched_setattr(pid_t pid, const struct sched_attr *attr,
                  unsigned int flags);

int sched_getattr(pid_t pid, struct sched_attr *attr,
                  unsigned int size, unsigned int flags);

DESCRIPTION
sched_setattr() sets both the scheduling policy and the
associated attributes for the process whose ID is specified in
pid.

sched_setattr() replaces sched_setscheduler(), sched_setparam(),
nice() and some of setpriority().

If pid equals zero, the scheduling policy and attributes
of the calling process will be set.  The interpretation of the
argument attr depends on the selected policy.  Currently, Linux
supports the following normal (i.e., non-real-time) scheduling
policies:

SCHED_OTHER the standard fair time-sharing policy;

SCHED_BATCH for batch style execution of processes; and

SCHED_IDLE  for running very low priority background jobs.

The following real-time policies are also supported, for
special time-critical applications that need precise control
over the way in which runnable processes are selected for
execution:

SCHED_FIFO  a static priority first-in, first-out policy;

SCHED_RR    a static priority round-robin policy; and

SCHED_DEADLINE  a dynamic priority deadline policy.

The semantics of each of these policies are detailed in
sched(7).

sched_attr::size must be set to the size of the structure, as in
sizeof(struct sched_attr). If the provided structure is smaller
than the kernel structure, any additional fields are assumed to
be '0'. If the provided structure is larger than the kernel
structure, the kernel verifies that all additional fields are
'0'; if not, the syscall will fail with -E2BIG.

sched_attr::sched_policy the desired scheduling policy.

sched_attr::sched_flags additional flags that can influence
scheduling behaviour. As of Linux kernel 3.14, the only
supported flag is:

SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
on fork().

sched_attr::sched_nice should only be set for SCHED_OTHER and
SCHED_BATCH; it is the desired nice value [-20,19], see sched(7).

sched_attr::sched_priority should only be set for SCHED_FIFO and
SCHED_RR; it is the desired static priority [1,99], see sched(7).

sched_attr::sched_runtime
sched_attr::sched_deadline
sched_attr::sched_period should only be set for SCHED_DEADLINE
and are the traditional sporadic task model parameters, see
sched(7).

The flags argument should be 0.

sched_getattr() queries the scheduling policy currently applied
to the process identified by pid.

Similar to sched_setattr(), sched_getattr() replaces
sched_getscheduler(), sched_getparam() and some of
getpriority().

If pid equals zero, the policy of the calling process will be
retrieved.

The size argument should reflect the size of struct sched_attr
as known to userspace. The kernel fills out sched_attr::size to
the size of its sched_attr structure. If the user provided
structure is larger, additional fields are not touched. If the
user provided structure is smaller, but the kernel needs to
return values outside the provided space, the syscall will fail
with -E2BIG.

The flags argument should be 0.

The other sched_attr fields are filled out as described in
sched_setattr().

RETURN VALUE
On success, sched_setattr() and sched_getattr() return 0. On
error, -1 is returned, and errno is set

Re: [PATCH v2] rwsem: Support optimistic spinning

2014-04-29 Thread Tim Chen

On Tue, 2014-04-29 at 08:11 -0700, Paul E. McKenney wrote:
 On Mon, Apr 28, 2014 at 05:50:49PM -0700, Tim Chen wrote:
  On Mon, 2014-04-28 at 16:10 -0700, Paul E. McKenney wrote:
  
+#ifdef CONFIG_SMP
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+{
+   int retval;
+   struct task_struct *owner;
+
+   rcu_read_lock();
+   owner = ACCESS_ONCE(sem->owner);
   
   OK, I'll bite...
   
   Why ACCESS_ONCE() instead of rcu_dereference()?
  
  We're using it as a speculative check on the sem->owner to see
  if the owner is running on the cpu.  The rcu_read_lock
  is used for ensuring that the owner->on_cpu memory is
  still valid.
 
 OK, so if we read complete garbage, all that happens is that we
 lose a bit of performance?  

Correct.

 If so, I am OK with it as long as there
 is a comment (which Davidlohr suggested later in this thread).
 
Yes, we should add some comments to clarify things.

Thanks.

Tim



Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 17:24:32 +0200 (CEST)
Jiri Kosina jkos...@suse.cz wrote:

 On Tue, 29 Apr 2014, Steven Rostedt wrote:
 
   According to 38.4 of [1], when SMM mode is entered while the CPU is 
   handling NMI, the end result might be that upon exit from SMM, NMIs will 
   be re-enabled and latched NMI delivered as nested [2].
  
  Note, if this were true, then the x86_64 hardware would be extremely
  buggy. That's because NMIs are not made to be nested. If SMMs come in
  during an NMI and re-enable NMIs, then *all* software would break.
  That would basically make NMIs useless.
  
  The only time I've ever witnessed problems (and I stress NMIs all the
  time), is when the NMI itself does a fault. Which my patch set handles
  properly. 
 
 Yes, it indeed does. 
 
 In the scenario I have outlined, the race window is extremely small, plus 
 NMIs don't happen that often, plus SMIs don't happen that often, plus 
 (hopefully) many BIOSes don't enable NMIs upon SMM exit.
 
 The problem is, that Intel documentation is clear in this respect, and 
 explicitly states it can happen. And we are violating that, which makes me 
 rather nervous -- it'd be very nice to know what is the background of 38.4 
 section text in the Intel docs.
 

You keep saying 38.4, but I don't see any 38.4. Perhaps you meant 34.8?

Which BTW is this:


34.8 NMI HANDLING WHILE IN SMM

NMI interrupts are blocked upon entry to the SMI handler. If an NMI
request occurs during the SMI handler, it is latched and serviced after
the processor exits SMM. Only one NMI request will be latched during
the SMI handler. If an NMI request is pending when the processor
executes the RSM instruction, the NMI is serviced before the next
instruction of the interrupted code sequence. This assumes that NMIs
were not blocked before the SMI occurred. If NMIs were blocked before
the SMI occurred, they are blocked after execution of RSM.

Although NMI requests are blocked when the processor enters SMM, they
may be enabled through software by executing an IRET instruction. If
the SMI handler requires the use of NMI interrupts, it should invoke a
dummy interrupt service routine for the purpose of executing an IRET
instruction. Once an IRET instruction is executed, NMI interrupt
requests are serviced in the same “real mode” manner in which they are
handled outside of SMM.

A special case can occur if an SMI handler nests inside an NMI handler
and then another NMI occurs. During NMI interrupt handling, NMI
interrupts are disabled, so normally NMI interrupts are serviced and
completed with an IRET instruction one at a time. When the processor
enters SMM while executing an NMI handler, the processor saves the
SMRAM state save map but does not save the attribute to keep NMI
interrupts disabled. Potentially, an NMI could be latched (while in SMM
or upon exit) and serviced upon exit of SMM even though the previous
NMI handler has still not completed. One or more NMIs could thus be
nested inside the first NMI handler. The NMI interrupt handler should
take this possibility into consideration.

Also, for the Pentium processor, exceptions that invoke a trap or fault
handler will enable NMI interrupts from inside of SMM. This behavior is
implementation specific for the Pentium processor and is not part of
the IA-32 architecture.


Read the first paragraph. That sounds like normal operation. The SMM
should use the RSM to return and that does not re-enable NMIs if the
SMM triggered during an NMI.

The above is just stating that the SMM can enable NMIs if it wants to
by executing an IRET. Which to me sounds rather buggy to do.

Now the third paragraph is rather ambiguous. It sounds like it's still
talking about doing an IRET in the SMI handler. As the IRET will enable
NMIs, and if the SMI happened while an NMI was happening, the new NMI
will happen. In this case, the NMI handler needs to address this. But
this really sounds like if you have control of both SMM handlers and
NMI handlers, which the Linux kernel certainly does not. Again, I label
this as a bug in the BIOS.

And again, if the SMM were to trigger a fault, it too would enable
NMIs. That is something that the SMM handler should not do.


Can you reproduce your problem on different platforms, or is this just
one box that exhibits this behavior? If it's only one box, I'm betting
it has a BIOS doing nasty things.

Nowhere in the Intel text do I see that the operating system is to
handle nested NMIs. It needs to handle it if you control the SMMs,
which the operating system does not. Sounds like they are talking to
the firmware folks.

-- Steve



Re: [PATCH] usb: dwc3: debugfs: add snapshot to dump requests trbs events

2014-04-29 Thread Felipe Balbi

Hi,

On Tue, Apr 29, 2014 at 05:21:42PM -0400, Zhuang Jin Can wrote:
 On Mon, Apr 28, 2014 at 10:55:36AM -0500, Felipe Balbi wrote:
  On Mon, Apr 28, 2014 at 04:49:23PM -0400, Zhuang Jin Can wrote:
   Adds a debugfs file "snapshot" to dump dwc3 requests, trbs and events.
  
  you need to explain what are you trying to provide to our users here.
  
  What problem are you trying to solve ?
  
 The interface enables users to easily peek into requests, trbs and
 events to know the current transfer state of each request.
 If a transfer is stuck, user can use the interface to check why it's
 stuck (e.g. Is it because a gadget didn't queue the request? Or it's
 queued but it's not primed to the controller? Or It's primed to the
 controller but the TRBs and events indicate the transfer never completes?).
 User can immediately narrow down the issue without enabling verbose log or
 reproduce the issue again. It's helpful when we need to deal with some
 hard-to-reproduce bugs or timing sensitive bugs can't be reproduced with
 verbose log enabled.

this should be part of the commit log in some shape or form.

   As ep0 requests are more complex than others. It's not included in this
   patch.
  
  For ep0, you could at least print the endpoint phase we are currently
  in and if we have requests in flight or not.
  
 Agree. Will add it in [PATCH v2].

tks

   + seq_puts(s, "busy_slot--|\n");
   + seq_puts(s, "\\\n");
   + }
   + if (i == (dep->free_slot & DWC3_TRB_MASK)) {
   + seq_puts(s, "free_slot--|\n");
   + seq_puts(s, "\\\n");
   + }
   + seq_printf(s, "trb[%02d](dma@0x%pad): %08x(bpl), %08x(bph), %08x(size), %08x(ctrl)\n",
  
  I'm not sure you need to print out the TRB address. bpl, bph, size and
  ctrl are desired though.
  
 printing out the TRB DMA address helps user to locate the start TRB of a
 request. I admit that we can achieve the same purpose using the start_slot
 of the request. I'll remove it in [PATCH v2].

thanks

   + i, dep->trb_pool_dma + i * sizeof(*trb),
   + trb->bpl, trb->bph, trb->size, trb->ctrl);
  
  this will be pretty difficult to parse by a human. I would rather see
  you creating one directory per TRB (and also one directory per
  endpoint) which holds the details for that entity, so that it looks
  like:
  
  dwc3
  |-- current_state   (or perhaps a better name, but snapshot isn't very good 
  either)
 Actually, it's hard to find a perfect name. current_state or snapshot 
 doesn't
 make too much difference to me. If current_state makes more sense to you, I
 can change to use this name. Or let me know if you have a better suggestion.

the name is important as we will have to deal with it for the next 50
years. We also need to think about someone starting out on dwc3 5 years
from now or a QA engineer in whatever OEM trying to provide details of
the failure for the development team. It needs to be well thought out.

I don't have a better idea but snapshot gives me the idea that we will
end up with a copy of everything which we can revisit at any time and
that's not true. If we read this file twice there's no guarantee it'll
contain the same information.

  |-- ep2
  |   |-- direction
  |   |-- maxpacket
  |   |-- number
  |   |-- state
  |   |-- stream_capable
  |   |-- type
  |   |-- trbs
  |   |   |-- trb0
  |   |   |   |-- bph
  |   |   |   |-- bpl
  |   |   |   |-- ctrl
  |   |   |   |-- size
  |   |   |-- trb1
  |   |   |   |-- bph
  |   |   |   |-- bpl
  |   |   |   |-- ctrl
  |   |   |   |-- size
  |   |   |-- trb2
  |   |   |   |-- bph
  |   |   |   |-- bpl
  |   |   |   |-- ctrl
  |   |   |   |-- size
  |   |   |-- trb3
  |   |   |   |-- bph
  |   |   |   |-- bpl
  |   |   |   |-- ctrl
  |   |   |   |-- size
  .   .   .
  .   .   .
  .   .   .
  |   |-- request0
  |   |   |-- direction
  |   |   |-- mapped
  |   |   |-- queued
  |   |   |-- trb0(symlink to actual trb directory)
  |   |   |-- ep2 (symlink to actual ep2 directory)
  |   |   |-- usbrequest
  |   |   |-- actual
  |   |   |-- length
  |   |   |-- no_interrupt
  |   |   |-- num_mapped_sgs
  |   |   |-- num_sgs
  |   |   |-- short_not_ok
  |   |   |-- status
  |   |   |-- stream_id
  |   |   |-- zero
  |   |-- request1
  |   |   |-- direction
  |   |   |-- mapped
  |   |   |-- queued
  |   |   |-- trb1(symlink to actual trb directory)
  |   |   |-- ep2 (symlink to actual ep2 directory)
  |   |   |-- usbrequest
  |   |   |-- actual
  |   |   |-- length
  |   |   |-- no_interrupt
  |   |   |-- num_mapped_sgs
  |   |   |-- num_sgs
  |   |   |-- short_not_ok

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 12:09:08 -0400
Steven Rostedt rost...@goodmis.org wrote:

 Can you reproduce your problem on different platforms, or is this just
 one box that exhibits this behavior? If it's only one box, I'm betting
 it has a BIOS doing nasty things.

This box probably crashes on all kernels too. My NMI nesting changes
did not fix a bug (well, it did as a side effect, see below). It was
done to allow NMIs to use IRET so that we could remove stopmachine from
ftrace, and instead have it use breakpoints (which return with IRET).

The bug that was fixed by this was the ability to do stack traces
(sysrq-t) from NMI context. Stack traces can page fault, and when I was
debugging hard lock ups and having the NMI do a stack dump of all
tasks, another NMI would trigger and corrupt the stack of the NMI doing
the dumps. But that was something that would only be seen while
debugging, and not something seen in normal operation.

I don't see a bug to fix in the kernel. I see a bug to fix in the
vendor's BIOS.

-- Steve

Re: [PATCH V3 1/2] FS: Add generic data flush to fsync

2014-04-29 Thread Jan Kara

On Mon 28-04-14 23:12:39, Fabian Frederick wrote:
 This patch issues a flush in generic_file_fsync.
 (Modern filesystems already do it)
 
 -Behaviour can be reversed using /sys/devices/.../cache_type
 -Filesystems can also call __generic_file_fsync with bool flush false
  The patch looks good. You can add:
Reviewed-by: Jan Kara j...@suse.cz

Honza

 
 Suggested-by: Jan Kara j...@suse.cz
 Suggested-by: Christoph Hellwig h...@infradead.org
 Cc: Jan Kara j...@suse.cz
 Cc: Christoph Hellwig h...@infradead.org
 Cc: Alexander Viro v...@zeniv.linux.org.uk
 Cc: Theodore Ts'o ty...@mit.edu
 Cc: Andrew Morton a...@linux-foundation.org
 Signed-off-by: Fabian Frederick f...@skynet.be
 ---
 V3: __generic_file_fsync = no flush
 V2: No flag
 V1: First version with MS_BARRIER flag
 
  fs/libfs.c | 36 +---
  include/linux/fs.h |  1 +
  2 files changed, 34 insertions(+), 3 deletions(-)
 
 diff --git a/fs/libfs.c b/fs/libfs.c
 index a184424..4877906 100644
 --- a/fs/libfs.c
 +++ b/fs/libfs.c
 @@ -3,6 +3,7 @@
   *   Library for filesystems writers.
   */
  
 +#include <linux/blkdev.h>
  #include <linux/export.h>
  #include <linux/pagemap.h>
  #include <linux/slab.h>
 @@ -923,16 +924,19 @@ struct dentry *generic_fh_to_parent(struct super_block 
 *sb, struct fid *fid,
  EXPORT_SYMBOL_GPL(generic_fh_to_parent);
  
  /**
 - * generic_file_fsync - generic fsync implementation for simple filesystems
 + * __generic_file_fsync - generic fsync implementation for simple filesystems
 + *
   * @file:file to synchronize
 + * @start:   start offset in bytes
 + * @end: end offset in bytes (inclusive)
   * @datasync:only synchronize essential metadata if true
   *
   * This is a generic implementation of the fsync method for simple
   * filesystems which track all non-inode metadata in the buffers list
   * hanging off the address_space structure.
   */
 -int generic_file_fsync(struct file *file, loff_t start, loff_t end,
 -int datasync)
 +int __generic_file_fsync(struct file *file, loff_t start, loff_t end,
 +  int datasync)
  {
   struct inode *inode = file->f_mapping->host;
   int err;
 @@ -952,10 +956,36 @@ int generic_file_fsync(struct file *file, loff_t start, 
 loff_t end,
   err = sync_inode_metadata(inode, 1);
   if (ret == 0)
   ret = err;
 +
  out:
   mutex_unlock(&inode->i_mutex);
   return ret;
  }
 +EXPORT_SYMBOL(__generic_file_fsync);
 +
 +/**
 + * generic_file_fsync - generic fsync implementation for simple filesystems
 + *   with flush
 + * @file:file to synchronize
 + * @start:   start offset in bytes
 + * @end: end offset in bytes (inclusive)
 + * @datasync:only synchronize essential metadata if true
 + *
 + */
 +
 +int generic_file_fsync(struct file *file, loff_t start, loff_t end,
 +int datasync)
 +{
 + struct inode *inode = file->f_mapping->host;
 + int err;
 +
 + err = __generic_file_fsync(file, start, end, datasync);
 + if (err)
 + return err;
 +
 + return blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
 +
 +}
  EXPORT_SYMBOL(generic_file_fsync);
  
  /**
 diff --git a/include/linux/fs.h b/include/linux/fs.h
 index 8780312..c3f46e4 100644
 --- a/include/linux/fs.h
 +++ b/include/linux/fs.h
 @@ -2590,6 +2590,7 @@ extern ssize_t simple_read_from_buffer(void __user *to, 
 size_t count,
  extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t 
 *ppos,
   const void __user *from, size_t count);
  
 +extern int __generic_file_fsync(struct file *, loff_t, loff_t, int);
  extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
  
  extern int generic_check_addressable(unsigned, u64);
 -- 
 1.8.4.5
-- 
Jan Kara j...@suse.cz
SUSE Labs, CR
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Cocci] [PATCH v2 1/1] scripts/coccinelle: use BIT macro if used

2014-04-29 Thread Lars-Peter Clausen


On 04/27/2014 12:50 PM, Javier Martinez Canillas wrote:

Using the BIT() macro instead of manually shifting bits
makes the code less error prone.

Whether it is more readable is a matter of taste, so only replace
if the file is already using this macro.

Signed-off-by: Javier Martinez Canillas jav...@dowhile0.org


I don't think this should be enabled by default. It will generate a ton of 
false positives: not everything that is 1 shifted by something is a 
single-bit field. E.g. imagine a device with multi-bit fields:


#define FOOBAR_A (0 << FOOBAR_OFFSET)
#define FOOBAR_B (1 << FOOBAR_OFFSET)
#define FOOBAR_C (2 << FOOBAR_OFFSET)
#define FOOBAR_D (3 << FOOBAR_OFFSET)

The script will now suggest replacing FOOBAR_B (1 << FOOBAR_OFFSET) with 
FOOBAR_B BIT(FOOBAR_OFFSET), which is technically correct, but not semantically.
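
To make the distinction concrete, here is a minimal userspace sketch. The two-bit field at offset 4, the FOOBAR_MASK macro, the helper names and the local BIT() definition are all assumptions made up for illustration; only the FOOBAR_A..FOOBAR_D names come from the example above.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's BIT() macro (assumption). */
#define BIT(n)		(1UL << (n))

/* Hypothetical register layout: a two-bit FOOBAR field at bit 4. */
#define FOOBAR_OFFSET	4
#define FOOBAR_MASK	(3UL << FOOBAR_OFFSET)

/* Field *values*, not flags: B only happens to equal a single bit. */
#define FOOBAR_A	(0UL << FOOBAR_OFFSET)
#define FOOBAR_B	(1UL << FOOBAR_OFFSET)
#define FOOBAR_C	(2UL << FOOBAR_OFFSET)
#define FOOBAR_D	(3UL << FOOBAR_OFFSET)

/* Read the field value (0..3) out of a register word. */
static unsigned long foobar_get(unsigned long reg)
{
	return (reg & FOOBAR_MASK) >> FOOBAR_OFFSET;
}

/* Replace the whole field, leaving all other bits untouched. */
static unsigned long foobar_set(unsigned long reg, unsigned long val)
{
	assert((val & ~FOOBAR_MASK) == 0);	/* val must be a FOOBAR_x value */
	return (reg & ~FOOBAR_MASK) | val;
}
```

FOOBAR_B is numerically equal to BIT(FOOBAR_OFFSET), but reading it as a flag is misleading: OR-ing BIT(FOOBAR_OFFSET) into a register that holds FOOBAR_C produces FOOBAR_D, not FOOBAR_B, whereas foobar_set() replaces the whole field.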


- Lars


---

Changes since v1:
  - Add a rule that checks if the file is already using this macro
as suggested by Julia Lawall

  scripts/coccinelle/api/bit.cocci | 30 ++
  1 file changed, 30 insertions(+)
  create mode 100644 scripts/coccinelle/api/bit.cocci

diff --git a/scripts/coccinelle/api/bit.cocci b/scripts/coccinelle/api/bit.cocci
new file mode 100644
index 000..a02cfd3
--- /dev/null
+++ b/scripts/coccinelle/api/bit.cocci
@@ -0,0 +1,30 @@
+// Use the BIT() macro if is already used
+//
+// Confidence: High
+// Copyright (C) 2014 Javier Martinez Canillas.  GPLv2.
+// URL: http://coccinelle.lip6.fr/
+// Options: --include-headers
+
+@hasbitops@
+@@
+
+#include <linux/bitops.h>
+
+@usesbit@
+@@
+
+BIT(...)
+
+@depends on hasbitops && usesbit@
+expression E;
+@@
+
+- 1 << E
++ BIT(E)
+
+@depends on hasbitops && usesbit@
+expression E;
+@@
+
+- BIT((E))
++ BIT(E)




Re: [PATCH 4/4] nohz: Fix iowait overcounting if iowait task migrates

2014-04-29 Thread Frederic Weisbecker

On Thu, Apr 24, 2014 at 08:45:58PM +0200, Denys Vlasenko wrote:
 Before this change, if last IO-blocked task wakes up
 on a different CPU, the original CPU may stay idle for much longer,
 and the entire time it stays idle is accounted as iowait time.
 
 This change adds struct tick_sched::iowait_exittime member.
 On entry to idle, it is set to KTIME_MAX.
 Last IO-blocked task, if migrated, sets it to current time.
 Note that this can happen only once per each idle period:
 new iowaiting tasks can't magically appear on idle CPU's rq.
 
 If iowait_exittime is set, then (iowait_exittime - idle_entrytime)
 gets accounted as iowait, and the remaining (now - iowait_exittime)
 as true idle.
 
 Run-tested: /proc/stat counters no longer go backwards.
 
 Signed-off-by: Denys Vlasenko dvlas...@redhat.com
 Cc: Frederic Weisbecker fweis...@gmail.com
 Cc: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
 Cc: Fernando Luis Vazquez Cao fernando...@lab.ntt.co.jp
 Cc: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Ingo Molnar mi...@kernel.org
 Cc: Peter Zijlstra pet...@infradead.org
 Cc: Andrew Morton a...@linux-foundation.org
 Cc: Arjan van de Ven ar...@linux.intel.com
 Cc: Oleg Nesterov o...@redhat.com
 ---
  include/linux/tick.h |  2 ++
  kernel/sched/core.c  | 14 +++
  kernel/time/tick-sched.c | 64 
 
  3 files changed, 70 insertions(+), 10 deletions(-)
 
 diff --git a/include/linux/tick.h b/include/linux/tick.h
 index 4de1f9e..1bf653e 100644
 --- a/include/linux/tick.h
 +++ b/include/linux/tick.h
 @@ -67,6 +67,7 @@ struct tick_sched {
   ktime_t idle_exittime;
   ktime_t idle_sleeptime;
   ktime_t iowait_sleeptime;
 + ktime_t iowait_exittime;
   seqcount_t  idle_sleeptime_seq;
   ktime_t sleep_length;
   unsigned long   last_jiffies;
 @@ -140,6 +141,7 @@ extern void tick_nohz_irq_exit(void);
  extern ktime_t tick_nohz_get_sleep_length(void);
  extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
  extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 +extern void tick_nohz_iowait_to_idle(int cpu);
  
  # else /* !CONFIG_NO_HZ_COMMON */
  static inline int tick_nohz_tick_stopped(void)
 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index 268a45e..ffea757 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -4218,7 +4218,14 @@ void __sched io_schedule(void)
  current->in_iowait = 1;
  schedule();
  current->in_iowait = 0;
 +#ifdef CONFIG_NO_HZ_COMMON
 + if (atomic_dec_and_test(&rq->nr_iowait)) {
 + if (raw_smp_processor_id() != cpu_of(rq))
 + tick_nohz_iowait_to_idle(cpu_of(rq));

Note that using a seqlock alone doesn't fix the preemption issue, where
the above may overwrite the exittime of the next last iowait task from
the old rq.
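
For reference, the accounting split described in the quoted changelog can be sketched in plain C. This is only an illustration under assumptions: the struct, the function name, the nanosecond integer timestamps and the EXITTIME_UNSET sentinel (standing in for KTIME_MAX) are made up here, and none of the locking or preemption concerns raised above are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel for "no iowait exit recorded"; stands in for KTIME_MAX. */
#define EXITTIME_UNSET	INT64_MAX

/* How one idle period is divided between the two counters. */
struct idle_split {
	int64_t iowait_ns;
	int64_t idle_ns;
};

/*
 * Split the idle period [idle_entry, now] into iowait and true idle.
 * nr_iowait is the number of IO-blocked tasks still on this CPU's rq.
 */
static struct idle_split split_idle_period(int64_t idle_entry, int64_t now,
					   int64_t iowait_exit, int nr_iowait)
{
	struct idle_split s = { 0, 0 };

	assert(idle_entry <= now);

	if (iowait_exit != EXITTIME_UNSET) {
		/* The last IO-blocked task migrated away at iowait_exit. */
		assert(idle_entry <= iowait_exit && iowait_exit <= now);
		s.iowait_ns = iowait_exit - idle_entry;
		s.idle_ns = now - iowait_exit;
	} else if (nr_iowait > 0) {
		/* Still waiting on IO: the whole period is iowait. */
		s.iowait_ns = now - idle_entry;
	} else {
		/* No IO involved: the whole period is true idle. */
		s.idle_ns = now - idle_entry;
	}
	return s;
}
```

With an idle entry at 100, an iowait exit at 250 and "now" at 500, this accounts 150 as iowait and 250 as idle, instead of charging all 400 to iowait.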

Re: [PATCH V3 2/2] fs/ext4/fsync.c: generic_file_fsync call based on barrier flag

2014-04-29 Thread Jan Kara

On Mon 28-04-14 23:15:08, Fabian Frederick wrote:
 generic_file_fsync has been updated to issue a flush for
 older filesystems.
 
 This patch tests for barrier flag in ext4 mount flags
 and calls the right function.
 
 Suggested-by: Jan Kara j...@suse.cz
 Suggested-by: Christoph Hellwig h...@infradead.org
 Cc: Jan Kara j...@suse.cz
 Cc: Christoph Hellwig h...@infradead.org
 Cc: Alexander Viro v...@zeniv.linux.org.uk
 Cc: Theodore Ts'o ty...@mit.edu
 Cc: Andrew Morton a...@linux-foundation.org
 Signed-off-by: Fabian Frederick f...@skynet.be
  The patch looks good. You can add:
Reviewed-by: Jan Kara j...@suse.cz

Honza
 ---
  fs/ext4/fsync.c | 4 
  1 file changed, 4 insertions(+)
 
 diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
 index a8bc47f..fa82c0a 100644
 --- a/fs/ext4/fsync.c
 +++ b/fs/ext4/fsync.c
 @@ -108,6 +108,10 @@ int ext4_sync_file(struct file *file, loff_t start, 
 loff_t end, int datasync)
  
   if (!journal) {
   ret = generic_file_fsync(file, start, end, datasync);
 + if (test_opt(inode->i_sb, BARRIER))
 + ret = generic_file_fsync(file, start, end, datasync);
 + else
 + ret = __generic_file_fsync(file, start, end, datasync);
  if (!ret && !hlist_empty(&inode->i_dentry))
   ret = ext4_sync_parent(inode);
   goto out;
 -- 
 1.8.4.5
 
-- 
Jan Kara j...@suse.cz
SUSE Labs, CR

Re: [PATCH V3 1/2] FS: Add generic data flush to fsync

2014-04-29 Thread Fabian Frederick


On Tue, 29 Apr 2014 18:19:07 +0200
Jan Kara j...@suse.cz wrote:

 On Mon 28-04-14 23:12:39, Fabian Frederick wrote:
  This patch issues a flush in generic_file_fsync.
  (Modern filesystems already do it)
  
  -Behaviour can be reversed using /sys/devices/.../cache_type
  -Filesystems can also call __generic_file_fsync with bool flush false
   The patch looks good. You can add:
 Reviewed-by: Jan Kara j...@suse.cz
 
   Honza

Just noticed I forgot to remove "with bool flush false" from the patch
description above
(-Filesystems can also call __generic_file_fsync with bool flush false).
Tell me if I need to send the patch again.

Fabian



Re: [PATCH 5/5] powercap/rapl: change floor frequency for vallewview

2014-04-29 Thread Jacob Pan

On Tue, 29 Apr 2014 14:40:37 +
R, Durgadoss durgados...@intel.com wrote:

  -Original Message-
  From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
  Sent: Tuesday, April 29, 2014 6:33 PM
  To: R, Durgadoss
  Cc: Linux PM; Wysocki, Rafael J; LKML; David E. Box; Alan Cox;
  Accardi, Kristen C Subject: Re: [PATCH 5/5] powercap/rapl: change
  floor frequency for vallewview

  On Tue, 29 Apr 2014 02:45:22 +
  R, Durgadoss durgados...@intel.com wrote:

   Hi Jacob,

-Original Message-
From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
Sent: Monday, April 28, 2014 7:35 PM
To: Linux PM; Wysocki, Rafael J; LKML
Cc: David E. Box; Alan Cox; R, Durgadoss; Accardi, Kristen C;
Jacob Pan Subject: [PATCH 5/5] powercap/rapl: change floor
frequency for vallewview

RAPL power limit reduce power by limiting CPU P-state and
other techniques. On Valleyview, RAPL power limit cannot
go to LFM (low frequency mode) if we don't set the floor
frequency via IOSF mailbox.

This patch enables setting of floor frequency such that
RAPL power limit is more effective.

Signed-off-by: Jacob Pan jacob.jun@linux.intel.com
---
 drivers/powercap/intel_rapl.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/powercap/intel_rapl.c
b/drivers/powercap/intel_rapl.c index b1cda6f..13e4776 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -32,6 +32,7 @@

 #include <asm/processor.h>
 #include <asm/cpu_device_id.h>
+#include <asm/iosf_mbi.h>

 /* bitmasks for RAPL MSRs, used by primitive access functions
*/ #define ENERGY_STATUS_MASK  0x
@@ -336,11 +337,17 @@ static int find_nr_power_limit(struct
rapl_domain *rd) return i;
 }

+#define VLV_CPU_POWER_BUDGET_CTL (0x2)
+static const struct x86_cpu_id valleyview_id[] = {
+   { X86_VENDOR_INTEL, 6, 0x37},
+   {}
+};

   There are other platforms that have this FloorFreq register as
   well. And those addresses are not '0x02'. So, we need to have a
   cpu_id based table to define the address of the floor freq
   register as well. [This is not specific to valleyview.]

  Sounds like I need to add an abstraction to capture this. So far,
  there are only two exceptions so I was hesitant to do so. Thanks
  for the input.

 Yes, we have at least a few platforms that need this.

   Also, is there a plan to expose this floor freq ratio through
   sysfs for runtime configuration? Maybe through a standard
   thermal cooling device interface?

  why would that be necessary? who will use it? floor freq only
  affects RAPL, AFAIK. In Linux there is no guaranteed freq anyway.
  My original patch to enable RAPL as cooling device was abandoned in
  favor of powercap framework, I am not sure if we should go back.

 There are user space thermal controls which change RAPL Power limits
 according to platform's thermal condition as you might be aware.

 The floor frequency is not used only to transition to LFM ratio. We
 can transition to any frequency ratio by adjusting this floor
 frequency (at least on VLV and couple more platforms)

 Hence while changing RAPL Power Limits, there is a need to adjust
 this also, to specify which ratio is our Floor (basically we will not
 go below that). That's why we need an interface for modifying this
 at run time (along with Power Limits).

I understand. What I am proposing here is to have a single knob for
user power control, instead of two knobs (power limit and floor freq)
which may conflict. When thermal throttling is needed, the user only
cares about the power limit, which is why I think it is better to set the
floor to LFM and let the power limit be the only knob. It is simpler. If
freq is a constraint, the user should use the cpufreq interface.

 Thanks,
 Durga

+
 static int set_domain_enable(struct powercap_zone *power_zone,
bool mode) {
struct rapl_domain *rd =
power_zone_to_rapl_domain(power_zone); int nr_powerlimit;
-
+   u32 mdata = 0;
if (rd->state & DOMAIN_STATE_BIOS_LOCKED)
return -EACCES;
get_online_cpus();
@@ -350,7 +357,16 @@ static int set_domain_enable(struct
powercap_zone *power_zone, bool mode)
/* always enable clamp such that p-state can go below
OS requested
 * range. power capping priority over guranteed
frequency. */
-   rapl_write_data_raw(rd, PL1_CLAMP, mode);
+   if (x86_match_cpu(valleyview_id)) {
+   iosf_mbi_read(BT_MBI_UNIT_PMC, BT_MBI_PMC_READ,
+   VLV_CPU_POWER_BUDGET_CTL, &mdata);
+   mdata &= ~(0x7f << 8);
+   mdata |= 1 << 8;
+   iosf_mbi_write(BT_MBI_UNIT_PMC,
BT_MBI_PMC_WRITE,
+   VLV_CPU_POWER_BUDGET_CTL,

Re: pid ns feature request

2014-04-29 Thread Andy Lutomirski

On Mon, Apr 28, 2014 at 6:39 AM, Serge Hallyn serge.hal...@ubuntu.com wrote:
 Quoting Andy Lutomirski (l...@amacapital.net):
 On Fri, Apr 25, 2014 at 12:37 PM, Eric W. Biederman
 ebied...@xmission.com wrote:
  Andy Lutomirski l...@amacapital.net writes:
 
  Unless I'm missing some trick, it's currently rather painful to mount
  a namespace /proc.  You have to actually be in the pid namespace to
  mount the correct /proc instance, and you can't unmount the old /proc
  until you've mounted the new /proc.  This means that you have to fork
  into the new pid namespace before you can finish setting it up.
 
  Yes.  You have to be inside just about all namespaces before you can
  finish setting them up.
 
  I don't know the context in which needing to be inside the pid namespace
  is a burden.

 I'm trying to sandbox myself.  I unshare everything, set up new
 mounts, pivot_root, umount the old stuff, fork, and wait around for
 the child to finish.

 This doesn't work: the parent can't mount the new /proc, and the child
 can't either because it's too late.

 I'm probably not thinking it through enough...  But can't the parent, before
 forking, do

 mkdir -p /childproc/proc
 mount --bind /childproc /childproc
 mount --make-rshared /childproc

 then the child mounts its proc under /childproc/proc and have that show
 up in the parent's tree?

Yes, and the --make-rshared /childproc isn't necessary.  This is still
a bit annoying, since the parent now needs to wait for the child to
set up mounts if it wants to do anything that requires all the mounts
to be fully set up.

This issue certainly isn't a show-stopper, but it might be nice to
address if anyone ever adds options to proc to do other sensible
namespacy things (e.g. turning off sysctls).

--Andy

Warning from kernel/printk/printk.c in linux-next

2014-04-29 Thread Fabio Estevam

Jan,

I am running linux-next 20140429 on a mx6 board (ARM 32-bit) and after commit
5dc90cb49691755faa (printk: enable interrupts before calling
console_trylock_for_printk()) I get the following warning:


[ INFO: possible recursive locking detected ]
3.15.0-rc3-next-20140429-1-gac246a5 #1074 Not tainted
-
swapper/0/0 is trying to acquire lock:
 (console_lock){+.+...}, at: [808c1358] con_init+0x14/0x29c

but task is already holding lock:
 (console_lock){+.+...}, at: [8006deac] vprintk_emit+0x194/0x514

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   
  lock(console_lock);
  lock(console_lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by swapper/0/0:
 #0:  (console_lock){+.+...}, at: [8006deac] vprintk_emit+0x194/0x514

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.15.0-rc3-next-20140429-1-gac2464
Backtrace:
[80011cbc] (dump_backtrace) from [80011e58] (show_stack+0x18/0x1c)
 r6:8114b4fc r5: r4: r3:
[80011e40] (show_stack) from [8065e65c] (dump_stack+0x88/0xa4)
[8065e5d4] (dump_stack) from [80065518] (__lock_acquire+0x1494/0x1c10)
 r5:808f3f70 r4:80a7d6c0
[80064084] (__lock_acquire) from [80066178] (lock_acquire+0x68/0x7c)
 r10: r9:412fc09a r8:8000406a r7:0001 r6:6153 r5:808e8000
 r4:
[80066110] (lock_acquire) from [8006c91c] (console_lock+0x54/0x68)
 r7:befffbc0 r6:808d6db0 r5:808c1358 r4:0001
[8006c8c8] (console_lock) from [808c1358] (con_init+0x14/0x29c)
 r5:808e37a4 r4:808e37a4
[808c1344] (con_init) from [808c0ae4] (console_init+0x24/0x38)
 r6:808d6db0 r5:808e37a4 r4:808e37a4
[808c0ac0] (console_init) from [80894bb4] (start_kernel+0x26c/0x3a4)
 r5:8094d5c0 r4:
[80894948] (start_kernel) from [80008074] (0x80008074)
 r7:808f5644 r6:808d6dac r5:808f0928 r4:10c5387d

Reverting this commit causes the warning to go away.

Any suggestions?

Regards,

Fabio Estevam

Re: [RFC v3 1/9] sysrq: Implement __handle_sysrq_nolock to avoid recursive locking in kdb

2014-04-29 Thread Colin Cross

On Tue, Apr 29, 2014 at 1:59 AM, Daniel Thompson
daniel.thomp...@linaro.org wrote:
 On 28/04/14 18:44, Colin Cross wrote:
 Is that case documented somewhere in the code comments?

 Perhaps not near enough to the _nolock but the primary bit of comment is
 here (and in same file as kdb_sr).
 --- cut here ---
  * kdb_main_loop - After initial setup and assignment of the
  *  controlling cpu, all cpus are in this loop.  One cpu is in
  *  control and will issue the kdb prompt, the others will spin
  *  until 'go' or cpu switch.
 --- cut here ---

 The mechanism kgdb uses to quiesce other CPUs means other CPUs cannot be
 in irqsave critical sections.



 One of the advantages of FIQ debugger is that it can be triggered from
 an FIQ (NMI for those in x86 land), and Jason and I have discussed
 using FIQs for kgdb to allow interrupting cpus stuck in critical
 sections.  If that gets implemented the above assumption will no
 longer be correct.

 Reviewing this I realized I missed one of the most critical points in
 the above.

 Today kdb, even if triggered by FIQ/NMI, would still be likely to wedge
 waiting for the IPI interrupts to be delivered to other processors.

 Did you and Jason discuss getting the active CPU to quiesce the other
 processors using FIQ/NMI, or to allow the active CPU to time out while
 waiting for them to stop?


 Daniel.

Yes, all cpus would have to get an FIQ/NMI.

Re: [PATCH] cpufreq: intel_pstate: Change the calculation of next pstate

2014-04-29 Thread Stratos Karafotis

On 29/04/2014 07:58 AM, Viresh Kumar wrote:
 Cc'd Dirk,
 
 On 28 April 2014 03:42, Stratos Karafotis strat...@semaphore.gr wrote:
 Currently the driver calculates the next pstate proportional to
 core_busy factor and inversely proportional to current pstate.

 Change the above method and calculate the next pstate independently
 of current pstate.
 
 We must mention why the change is required.
 

Hi Viresh,

Actually, I can't say that it's required. :)
I just believe that calculation of next p-state should be independent
from current one. In my opinion we can't scale the load across different
p-states, because it's not always equivalent.

For example suppose a load of 100% because of a tight for loop in the
current p-state. It will be also a 100% load in any other p-state.
It will be wrong if we scale the load in the calculation formula
according to the current p-state.
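
A small sketch of the difference, in userspace C. Both functions are simplified illustrations with made-up names and integer percentages; neither is the driver's actual code nor the exact formula from the patch.

```c
#include <assert.h>

/*
 * Busy percentage scaled by max/current p-state: the dependence on the
 * current p-state that is being questioned.
 */
static int scaled_busy(int busy_pct, int max_pstate, int cur_pstate)
{
	assert(cur_pstate > 0);
	return busy_pct * max_pstate / cur_pstate;
}

/*
 * Hypothetical alternative: map the raw busy percentage linearly onto
 * [min_pstate, max_pstate], ignoring the current p-state entirely.
 */
static int next_pstate_linear(int busy_pct, int min_pstate, int max_pstate)
{
	assert(busy_pct >= 0 && busy_pct <= 100);
	assert(min_pstate <= max_pstate);
	return min_pstate + (max_pstate - min_pstate) * busy_pct / 100;
}
```

With a tight loop reporting 100% busy, scaled_busy(100, 24, 8) gives 300 while scaled_busy(100, 24, 24) gives 100, even though the load is the same. next_pstate_linear(100, 8, 24) picks 24 regardless of where the CPU currently sits.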

I included the test results in the change log to point out an improvement
because of this patch.

I will expand the changelog as you suggested.

Thanks,
Stratos Karafotis


Re: [PATCH v2 00/10] arm64: UEFI support

On Tue, Apr 29, 2014 at 03:47:13PM +0100, Matt Fleming wrote:
 On Tue, 29 Apr, at 02:47:28PM, Catalin Marinas wrote:
  Given that Leif's series contains both generic efi and arm64 patches,
  what's your preference for merging them? I'm happy to add my ack and
  they go via your tree (or the other way around).
 
 I'm happy either way, though if I take them through my tree (and
 subsequently through tip) you won't have to worry about the merge window
 rigmarole, which is a plus.
 
 So, everyone happy for me to take these with Catalin's Acked-by?

Fine by me. Just in case I haven't stated it explicitly for this series:

Acked-by: Catalin Marinas catalin.mari...@arm.com

Thanks.

-- 
Catalin

Re: mmotm 2014-04-24-13-07 uploaded