Re: [PATCH] memcg: deprecate memory.force_empty knob

2014-05-16 Thread Greg Thelen

On Tue, May 13 2014, Michal Hocko wrote:

> force_empty has been introduced primarily to drop memory before it gets
> reparented on the group removal. This alone doesn't sound fully
> justified because reparented pages which are not in use can be reclaimed
> also later when there is a memory pressure on the parent level.
>
> Mark the knob CFTYPE_INSANE which tells the cgroup core that it
> shouldn't create the knob with the experimental sane_behavior. Other
> users will get informed about the deprecation and asked to tell us more
> because I do not expect most users will use sane_behavior cgroups mode
> very soon.
> Anyway I expect that most users will be simply cgroup remove handlers
> which do that since ever without having any good reason for it.
>
> If somebody really cares because reparented pages, which would be
> dropped otherwise, push out more important ones then we should fix the
> reparenting code and put pages to the tail.

I should mention a case where I've needed to use memory.force_empty: to
synchronously flush stats from child to parent.  Without force_empty,
memory.stat is temporarily inconsistent until the asynchronous
css_offline reparents the charges.  Here is an example on v3.14 showing
that parent/memory.stat contents are in flux immediately after rmdir of
parent/child.

$ cat /test
#!/bin/bash

# Create parent and child.  Add some non-reclaimable anon rss to child,
# then move running task to parent.
mkdir p p/c
(echo $BASHPID > p/c/cgroup.procs && exec sleep 1d) &
pid=$!
sleep 1
echo $pid > p/cgroup.procs 

grep 'rss ' {p,p/c}/memory.stat
if [[ $1 == force ]]; then
  echo 1 > p/c/memory.force_empty
fi
rmdir p/c

echo 'For a small time the p/c memory has not been reparented to p.'
grep 'rss ' {p,p/c}/memory.stat

sleep 1
echo 'After waiting all memory has been reparented'
grep 'rss ' {p,p/c}/memory.stat

kill $pid
rmdir p


-- First, demonstrate that just rmdir, without memory.force_empty,
   temporarily hides reparented child memory stats.

$ /test
p/memory.stat:rss 0
p/memory.stat:total_rss 69632
p/c/memory.stat:rss 69632
p/c/memory.stat:total_rss 69632
For a small time the p/c memory has not been reparented to p.
p/memory.stat:rss 0
p/memory.stat:total_rss 0
grep: p/c/memory.stat: No such file or directory
After waiting all memory has been reparented
p/memory.stat:rss 69632
p/memory.stat:total_rss 69632
grep: p/c/memory.stat: No such file or directory
/test: Terminated  ( echo $BASHPID > p/c/cgroup.procs && exec sleep 1d )

-- Demonstrate that using memory.force_empty before rmdir behaves more
   sensibly.  Stats for reparented child memory are not hidden.

$ /test force
p/memory.stat:rss 0
p/memory.stat:total_rss 69632
p/c/memory.stat:rss 69632
p/c/memory.stat:total_rss 69632
For a small time the p/c memory has not been reparented to p.
p/memory.stat:rss 69632
p/memory.stat:total_rss 69632
grep: p/c/memory.stat: No such file or directory
After waiting all memory has been reparented
p/memory.stat:rss 69632
p/memory.stat:total_rss 69632
grep: p/c/memory.stat: No such file or directory
/test: Terminated  ( echo $BASHPID > p/c/cgroup.procs && exec sleep 1d )
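
The same comparison can be made mechanically.  What follows is only a
minimal Python sketch, not part of the test script above: it parses the
grep output from the two runs and reports whether the parent's total_rss
shrank right after rmdir.  The helper names parse_stat_lines and
charge_hidden are made up for illustration.

```python
def parse_stat_lines(lines):
    """Parse 'path:key value' lines as printed by grep over memory.stat files."""
    stats = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("grep:"):
            continue  # skip errors such as "No such file or directory"
        path_key, value = line.rsplit(" ", 1)
        path, key = path_key.split(":", 1)
        stats.setdefault(path, {})[key] = int(value)
    return stats


def charge_hidden(before, after, parent="p/memory.stat"):
    """Return True if the parent's total_rss shrank between two samples."""
    return after[parent]["total_rss"] < before[parent]["total_rss"]


# Samples copied from the two runs above (before rmdir, right after rmdir).
before = parse_stat_lines([
    "p/memory.stat:rss 0",
    "p/memory.stat:total_rss 69632",
    "p/c/memory.stat:rss 69632",
    "p/c/memory.stat:total_rss 69632",
])
after_plain = parse_stat_lines([
    "p/memory.stat:rss 0",
    "p/memory.stat:total_rss 0",
    "grep: p/c/memory.stat: No such file or directory",
])
after_force = parse_stat_lines([
    "p/memory.stat:rss 69632",
    "p/memory.stat:total_rss 69632",
    "grep: p/c/memory.stat: No such file or directory",
])

print("plain rmdir hides charge:", charge_hidden(before, after_plain))
print("force_empty hides charge:", charge_hidden(before, after_force))
```

With the numbers from the two runs, only the plain rmdir case shows the
parent's total_rss dropping.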
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] ACPI and power management fixes for v3.15-rc6

2014-05-16 Thread Rafael J. Wysocki
Hi Linus,

Sorry for this last-minute update, but it has just turned out that
one of the new ACPI video blacklist entries was added overzealously,
so a commit reverting it has been added.

Please pull from

 git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
 pm+acpi-3.15-rc6

to receive ACPI and power management fixes for v3.15-rc6
with top-most commit 658a0f4e661a6c07395de318a58f9058ba2faf8f

  Merge branch 'acpi-video'

on top of commit d6d211db37e75de2ddc3a4f979038c40df7cc79c

  Linux 3.15-rc5

Still fixing regressions (partly by reverting commits that broke
things for people), fixing other stable-candidate bugs and adding
some blacklist entries for ACPI video and _OSI.  Two ACPICA regression
fixes (one recent and one for a 3.14 commit), a fix for an ACPI-related
regression in TPM (introduced in 3.14), a revert of the ACPI AC
driver conversion in 3.13 that went wrong for an unknown reason,
two reverts of commits that attempted to remove an old user space
interface in /proc and broke some utilities, in 3.13 too, a fix for
a CPU hotplug bug in the ACPI processor driver (stable material),
two (stable candidate) fixes for intel_pstate and a few new blacklist
entries, mostly for systems that shipped with Windows 8.

Specifics:

 - ACPICA fix for a stale pointer access introduced by a recent
   commit in the XSDT validation code from Lv Zheng.

 - ACPICA fix for the default value of the command line switch
   to favor 32-bit FADT addresses (in case there's a conflict
   between a 64-bit and a 32-bit address).  The previous default
   was that the 32-bit version would take precedence and we tried
   to change it to the other way around and it didn't work.
   From Lv Zheng.

 - A TPM commit related to ACPI _DSM in 3.14 caused the driver to
   refuse to load if a specific _DSM was missing and that broke
   resume from system suspend on Chromebooks that require the TPM
   hardware to be restored to a working state during resume by the
   OS.  Restore the old behavior to load the driver if the _DSM
   in question is not present, but prevent it from using the
   feature the _DSM is for.

 - ACPI AC driver conversion in 3.13 broke thermal management on
   at least one machine and has to be reverted.  From Guenter Roeck.

 - Two reverts of 3.13 commits that attempted to remove the old ACPI
   battery interface in /proc, but turned out to break some utilities
   still using that interface.  From Lan Tianyu.

 - ACPI processor driver fix to prevent acpi_processor_add() from
   modifying the CPU device's .offline field which leads to breakage
   if the initial online of the CPU fails.  From Igor Mammedov.

 - Two intel_pstate fixes, one to take a BayTrail documentation update
   into account and one to avoid forcing the maximum P-state on init
   which causes CPU PM trouble on systems with P-states coordination
   when one of the CPU cores is initialized after an offline/online
   cycle triggered by user space.  Both stable candidates, from
   Dirk Brandewie.

 - Fix for the ACPI video DMI blacklist entry for Dell Inspiron 7520
   from Aaron Lu.

 - Two new ACPI video blacklist entries for machines shipping with
   Win8 that need to use native backlight so that it can be controlled
   in a usual way (which doesn't work otherwise due to bugs in the ACPI
   tables) from Hans de Goede.

 - Two ACPI _OSI quirks for systems that need them to work correctly
   with Linux from Edward Lin and Hans de Goede.

Thanks!


---

Aaron Lu (1):
  ACPI / video: correct DMI tag for Dell Inspiron 7520

Dirk Brandewie (2):
  intel_pstate: Set turbo VID for BayTrail
  intel_pstate: remove setting P state to MAX on init

Edward Lin (1):
  ACPI: blacklist win8 OSI for Dell Inspiron 7737

Guenter Roeck (1):
  ACPI: Revert "ACPI / AC: convert ACPI ac driver to platform bus"

Hans de Goede (3):
  ACPI / video: Add use_native_backlight quirks for more systems
  ACPI / blacklist: Add dmi_enable_osi_linux quirk for Asus EEE PC 1015PX
  ACPI / video: Revert native brightness quirk for ThinkPad T530

Igor Mammedov (1):
  ACPI / processor: do not mark present at boot but not onlined CPU as onlined

Lan Tianyu (2):
  ACPI: Revert "ACPI: Remove CONFIG_ACPI_PROCFS_POWER and cm_sbsc.c"
  ACPI: Revert "ACPI / Battery: Remove battery's proc directory"

Lv Zheng (2):
  ACPICA: Tables: Fix invalid pointer accesses in acpi_tb_parse_root_table().
  ACPICA: Tables: Restore old behavor to favor 32-bit FADT addresses.

Rafael J. Wysocki (2):
  ACPI / proc: Do not say when /proc interfaces will be deleted in Kconfig
  ACPI / TPM: Fix resume regression on Chromebooks

---

 drivers/acpi/Kconfig   |  17 +++
 drivers/acpi/Makefile  |   1 +
 drivers/acpi/ac.c  | 117 ---
 drivers/acpi/acpi_platform.c   |   1 -
 drivers/acpi/acpi_processor.c  |   1 -
 drivers/acpi/acpica/acglobal.h |   4 +-
 drivers/acpi/acpica/tbutils.c  |   7 +-
 

[PATCH 3/8] list: Out of line INIT_LIST_HEAD and list_del

2014-05-16 Thread Andi Kleen
From: Andi Kleen 

Out of lining these two inlines saves ~21k on my vmlinux

    text    data     bss      dec     hex filename
14152713 2003976 1507328 17664017 10d8811 vmlinux-before-list
14131431 2008136 1507328 17646895 10d452f vmlinux-list
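
As a rough plausibility check of the ~21k figure, here is a small Python
sketch.  It takes the Avg and Num values that the inline-account example
output in patch 8/8 of this series lists for INIT_LIST_HEAD and
list_del, and assumes that every inlined copy shrinks to a single 5-byte
x86-64 call instruction.  That call size is an assumption made for
illustration only; argument setup at the call sites and the size of the
new out-of-line bodies are ignored.

```python
CALL_BYTES = 5  # assumed size of an x86-64 "call rel32"; not stated in the patch

# (average inlined bytes per site, number of sites), from the patch 8/8 example output
inlines = {
    "INIT_LIST_HEAD": (11, 1747),
    "list_del": (16, 1176),
}


def estimated_saving(avg_bytes, sites, call_bytes=CALL_BYTES):
    """Bytes saved if every inlined copy shrinks to a single call instruction."""
    return (avg_bytes - call_bytes) * sites


estimate = sum(estimated_saving(avg, num) for avg, num in inlines.values())
measured = 14152713 - 14131431  # text column: before minus after

print("estimated:", estimate)
print("measured: ", measured)
```

The estimate lands in the same range as the measured text delta, which
is all such a back-of-the-envelope model can be expected to show.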

Signed-off-by: Andi Kleen 
---
 include/linux/list.h | 15 ---
 lib/Makefile |  2 +-
 lib/list.c   | 22 ++
 3 files changed, 27 insertions(+), 12 deletions(-)
 create mode 100644 lib/list.c

diff --git a/include/linux/list.h b/include/linux/list.h
index ef95941..8297885 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -21,11 +21,8 @@
 #define LIST_HEAD(name) \
struct list_head name = LIST_HEAD_INIT(name)
 
-static inline void INIT_LIST_HEAD(struct list_head *list)
-{
-   list->next = list;
-   list->prev = list;
-}
+/* Out of line to save space */
+void INIT_LIST_HEAD(struct list_head *list);
 
 /*
  * Insert a new entry between two known consecutive entries.
@@ -101,12 +98,8 @@ static inline void __list_del_entry(struct list_head *entry)
__list_del(entry->prev, entry->next);
 }
 
-static inline void list_del(struct list_head *entry)
-{
-   __list_del(entry->prev, entry->next);
-   entry->next = LIST_POISON1;
-   entry->prev = LIST_POISON2;
-}
+/* Out of line to save space */
+void list_del(struct list_head *entry);
 #else
 extern void __list_del_entry(struct list_head *entry);
 extern void list_del(struct list_head *entry);
diff --git a/lib/Makefile b/lib/Makefile
index 0cd7b68..8b744f7 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-earlycpio.o
+earlycpio.o list.o
 
 obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
 lib-$(CONFIG_MMU) += ioremap.o
diff --git a/lib/list.c b/lib/list.c
new file mode 100644
index 000..298768f
--- /dev/null
+++ b/lib/list.c
@@ -0,0 +1,22 @@
+#include 
+#include 
+
+/*
+ * Out of line versions of common list.h functions that bloat the
+ * kernel too much.
+ */
+
+void INIT_LIST_HEAD(struct list_head *list)
+{
+   list->next = list;
+   list->prev = list;
+}
+EXPORT_SYMBOL(INIT_LIST_HEAD);
+
+void list_del(struct list_head *entry)
+{
+   __list_del(entry->prev, entry->next);
+   entry->next = LIST_POISON1;
+   entry->prev = LIST_POISON2;
+}
+EXPORT_SYMBOL(list_del);
-- 
1.9.0



[PATCH 7/8] radeon: Out of line radeon_get_ib_value

2014-05-16 Thread Andi Kleen
From: Andi Kleen 

Saves about 5k of text

    text    data     bss      dec     hex filename
14080360 2008168 1507328 17595856 10c7dd0 vmlinux-before-radeon
14074978 2008168 1507328 17590474 10c68ca vmlinux-radeon

Cc: alexander.deuc...@amd.com
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Andi Kleen 
---
 drivers/gpu/drm/radeon/radeon.h| 10 +-
 drivers/gpu/drm/radeon/radeon_device.c |  9 +
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 6852861..8cae409 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -1032,15 +1032,7 @@ struct radeon_cs_parser {
struct ww_acquire_ctx   ticket;
 };
 
-static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
-{
-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
-
-	if (ibc->kdata)
-		return ibc->kdata[idx];
-	return p->ib.ptr[idx];
-}
-
+u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx);
 
 struct radeon_cs_packet {
 	unsigned		idx;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 0e770bb..1cbd171 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -112,6 +112,15 @@ bool radeon_is_px(struct drm_device *dev)
return false;
 }
 
+u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
+{
+	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
+
+	if (ibc->kdata)
+		return ibc->kdata[idx];
+	return p->ib.ptr[idx];
+}
+
 /**
  * radeon_program_register_sequence - program an array of registers.
  *
-- 
1.9.0



[PATCH 6/8] ftrace: Out of line ftrace_trigger_soft_disabled

2014-05-16 Thread Andi Kleen
From: Andi Kleen 

Out of lining this function saves about 14k text

    text    data     bss      dec     hex filename
14094629 2004040 1507328 17605997 10ca56d vmlinux-before-ftrace
14079650 2008136 1507328 17595114 10c7aea vmlinux-ftrace

Signed-off-by: Andi Kleen 
---
 include/linux/ftrace_event.h| 23 +--
 kernel/trace/trace_events_trigger.c | 25 +
 2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index d16da3e..70be665 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -416,28 +416,7 @@ extern enum event_trigger_type event_triggers_call(struct ftrace_event_file *fil
 extern void event_triggers_post_call(struct ftrace_event_file *file,
 enum event_trigger_type tt);
 
-/**
- * ftrace_trigger_soft_disabled - do triggers and test if soft disabled
- * @file: The file pointer of the event to test
- *
- * If any triggers without filters are attached to this event, they
- * will be called here. If the event is soft disabled and has no
- * triggers that require testing the fields, it will return true,
- * otherwise false.
- */
-static inline bool
-ftrace_trigger_soft_disabled(struct ftrace_event_file *file)
-{
-	unsigned long eflags = file->flags;
-
-	if (!(eflags & FTRACE_EVENT_FL_TRIGGER_COND)) {
-		if (eflags & FTRACE_EVENT_FL_TRIGGER_MODE)
-			event_triggers_call(file, NULL);
-		if (eflags & FTRACE_EVENT_FL_SOFT_DISABLED)
-			return true;
-	}
-	return false;
-}
+extern bool ftrace_trigger_soft_disabled(struct ftrace_event_file *file);
 
 /*
  * Helper function for event_trigger_unlock_commit{_regs}().
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 4747b47..136c181 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -28,6 +28,31 @@
 static LIST_HEAD(trigger_commands);
 static DEFINE_MUTEX(trigger_cmd_mutex);
 
+
+/**
+ * ftrace_trigger_soft_disabled - do triggers and test if soft disabled
+ * @file: The file pointer of the event to test
+ *
+ * If any triggers without filters are attached to this event, they
+ * will be called here. If the event is soft disabled and has no
+ * triggers that require testing the fields, it will return true,
+ * otherwise false.
+ */
+bool
+ftrace_trigger_soft_disabled(struct ftrace_event_file *file)
+{
+	unsigned long eflags = file->flags;
+
+	if (!(eflags & FTRACE_EVENT_FL_TRIGGER_COND)) {
+		if (eflags & FTRACE_EVENT_FL_TRIGGER_MODE)
+			event_triggers_call(file, NULL);
+		if (eflags & FTRACE_EVENT_FL_SOFT_DISABLED)
+			return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL(ftrace_trigger_soft_disabled);
+
 static void
 trigger_data_free(struct event_trigger_data *data)
 {
-- 
1.9.0



[PATCH 2/8] radeonfb: Out of line errata workarounds

2014-05-16 Thread Andi Kleen
From: Andi Kleen 

Out of lining _radeon_msleep and radeon_pll_errata_* saves about 40k text.

    text    data     bss      dec     hex filename
14193673 2003976 1507328 17704977 10e2811 vmlinux-before-radeon
14152713 2003976 1507328 17664017 10d8811 vmlinux-radeon

Cc: Benjamin Herrenschmidt 
Cc: linux-fb...@vger.kernel.org
Signed-off-by: Andi Kleen 
---
 drivers/video/fbdev/aty/radeon_base.c | 57 ++
 drivers/video/fbdev/aty/radeonfb.h| 58 ++-
 2 files changed, 60 insertions(+), 55 deletions(-)

diff --git a/drivers/video/fbdev/aty/radeon_base.c b/drivers/video/fbdev/aty/radeon_base.c
index 26d80a4..abd89a9 100644
--- a/drivers/video/fbdev/aty/radeon_base.c
+++ b/drivers/video/fbdev/aty/radeon_base.c
@@ -282,6 +282,63 @@ static int backlight = 1;
 static int backlight = 0;
 #endif
 
+/* Note about this function: we have some rare cases where we must not schedule,
+ * this typically happen with our special "wake up early" hook which allows us to
+ * wake up the graphic chip (and thus get the console back) before everything else
+ * on some machines that support that mechanism. At this point, interrupts are off
+ * and scheduling is not permitted
+ */
+void _radeon_msleep(struct radeonfb_info *rinfo, unsigned long ms)
+{
+	if (rinfo->no_schedule || oops_in_progress)
+		mdelay(ms);
+	else
+		msleep(ms);
+}
+
+/*
+ * Note about PLL register accesses:
+ *
+ * I have removed the spinlock on them on purpose. The driver now
+ * expects that it will only manipulate the PLL registers in normal
+ * task environment, where radeon_msleep() will be called, protected
+ * by a semaphore (currently the console semaphore) so that no conflict
+ * will happen on the PLL register index.
+ *
+ * With the latest changes to the VT layer, this is guaranteed for all
+ * calls except the actual drawing/blits which aren't supposed to use
+ * the PLL registers anyway
+ *
+ * This is very important for the workarounds to work properly. The only
+ * possible exception to this rule is the call to unblank(), which may
+ * be done at irq time if an oops is in progress.
+ */
+void radeon_pll_errata_after_index(struct radeonfb_info *rinfo)
+{
+	if (!(rinfo->errata & CHIP_ERRATA_PLL_DUMMYREADS))
+		return;
+
+	(void)INREG(CLOCK_CNTL_DATA);
+	(void)INREG(CRTC_GEN_CNTL);
+}
+
+void radeon_pll_errata_after_data(struct radeonfb_info *rinfo)
+{
+	if (rinfo->errata & CHIP_ERRATA_PLL_DELAY) {
+		/* we can't deal with posted writes here ... */
+		_radeon_msleep(rinfo, 5);
+	}
+	if (rinfo->errata & CHIP_ERRATA_R300_CG) {
+		u32 save, tmp;
+		save = INREG(CLOCK_CNTL_INDEX);
+		tmp = save & ~(0x3f | PLL_WR_EN);
+		OUTREG(CLOCK_CNTL_INDEX, tmp);
+		tmp = INREG(CLOCK_CNTL_DATA);
+		OUTREG(CLOCK_CNTL_INDEX, save);
+	}
+}
+
+
 /*
  * prototypes
  */
diff --git a/drivers/video/fbdev/aty/radeonfb.h b/drivers/video/fbdev/aty/radeonfb.h
index cb84604..bb73446 100644
--- a/drivers/video/fbdev/aty/radeonfb.h
+++ b/drivers/video/fbdev/aty/radeonfb.h
@@ -370,20 +370,7 @@ struct radeonfb_info {
  * IO macros
  */
 
-/* Note about this function: we have some rare cases where we must not schedule,
- * this typically happen with our special "wake up early" hook which allows us to
- * wake up the graphic chip (and thus get the console back) before everything else
- * on some machines that support that mechanism. At this point, interrupts are off
- * and scheduling is not permitted
- */
-static inline void _radeon_msleep(struct radeonfb_info *rinfo, unsigned long ms)
-{
-	if (rinfo->no_schedule || oops_in_progress)
-		mdelay(ms);
-	else
-		msleep(ms);
-}
-
+void _radeon_msleep(struct radeonfb_info *rinfo, unsigned long ms);
 
 #define INREG8(addr)   readb((rinfo->mmio_base)+addr)
 #define OUTREG8(addr,val)  writeb(val, (rinfo->mmio_base)+addr)
@@ -408,47 +395,8 @@ static inline void _OUTREGP(struct radeonfb_info *rinfo, u32 addr,
 
 #define OUTREGP(addr,val,mask) _OUTREGP(rinfo, addr, val,mask)
 
-/*
- * Note about PLL register accesses:
- *
- * I have removed the spinlock on them on purpose. The driver now
- * expects that it will only manipulate the PLL registers in normal
- * task environment, where radeon_msleep() will be called, protected
- * by a semaphore (currently the console semaphore) so that no conflict
- * will happen on the PLL register index.
- *
- * With the latest changes to the VT layer, this is guaranteed for all
- * calls except the actual drawing/blits which aren't supposed to use
- * the PLL registers anyway
- *
- * This is very important for the workarounds to work properly. The only
- * possible exception to this rule is the call to unblank(), which may
- * be done at irq time if an oops is in progress.
- */
-static inline void 

Fix some common inline bloat

2014-05-16 Thread Andi Kleen
It's very easy to bloat the kernel code significantly by adding
code to commonly called inlines. Often these inlines start small,
but later when new code is added they don't get moved out-of-line.

I wrote a new tool to account for inline bloat. Addressing selected
occurrences in the top-20 of my kernel config saved about
145k.

    text    data     bss      dec     hex filename
14220873 2008072 1507328 17736273 10ea251 vmlinux-before-anything
14074978 2008168 1507328 17590474 10c68ca vmlinux-inline
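
For anyone who wants to redo the arithmetic, the following is a minimal
Python sketch that parses two rows of `size` output and computes the
text delta.  The helper name parse_size_row is hypothetical; the two
rows are the before/after numbers from the table above, with the
columns in the order text, data, bss, dec, hex, filename.

```python
def parse_size_row(row):
    """Split one row of `size` output into named integer fields plus filename."""
    text, data, bss, dec, hex_total, filename = row.split()
    return {
        "text": int(text),
        "data": int(data),
        "bss": int(bss),
        "dec": int(dec),
        "hex": int(hex_total, 16),
        "filename": filename,
    }


before = parse_size_row(
    "14220873 2008072 1507328 17736273 10ea251 vmlinux-before-anything")
after = parse_size_row(
    "14074978 2008168 1507328 17590474 10c68ca vmlinux-inline")

# Sanity check: dec (and hex) should equal text + data + bss.
for row in (before, after):
    assert row["text"] + row["data"] + row["bss"] == row["dec"] == row["hex"]

saved = before["text"] - after["text"]
print("text bytes saved:", saved)
print("in KiB: %.1f" % (saved / 1024))
```

The difference works out to 145895 bytes of text, which matches the
"about 145k" above when k is read as 1000 bytes.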

-Andi


[PATCH 4/8] e1000e: Out of line __ew32_prepare/__ew32

2014-05-16 Thread Andi Kleen
From: Andi Kleen 

Out of lining these two common inlines saves about 30k text size,
due to their errata workarounds.

    text    data     bss      dec     hex filename
14131431 2008136 1507328 17646895 10d452f vmlinux-before-e1000e
14101415 2004040 1507328 17612783 10cbfef vmlinux-e1000e

Cc: jeffrey.t.kirs...@intel.com
Cc: net...@vger.kernel.org
Signed-off-by: Andi Kleen 
---
 drivers/net/ethernet/intel/e1000e/e1000.h  | 31 ++
 drivers/net/ethernet/intel/e1000e/netdev.c | 30 +
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index 1471c54..cbe25bb 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -573,35 +573,8 @@ static inline u32 __er32(struct e1000_hw *hw, unsigned long reg)
 
 #define er32(reg)  __er32(hw, E1000_##reg)
 
-/**
- * __ew32_prepare - prepare to write to MAC CSR register on certain parts
- * @hw: pointer to the HW structure
- *
- * When updating the MAC CSR registers, the Manageability Engine (ME) could
- * be accessing the registers at the same time.  Normally, this is handled in
- * h/w by an arbiter but on some parts there is a bug that acknowledges Host
- * accesses later than it should which could result in the register to have
- * an incorrect value.  Workaround this by checking the FWSM register which
- * has bit 24 set while ME is accessing MAC CSR registers, wait if it is set
- * and try again a number of times.
- **/
-static inline s32 __ew32_prepare(struct e1000_hw *hw)
-{
-	s32 i = E1000_ICH_FWSM_PCIM2PCI_COUNT;
-
-	while ((er32(FWSM) & E1000_ICH_FWSM_PCIM2PCI) && --i)
-		udelay(50);
-
-	return i;
-}
-
-static inline void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val)
-{
-	if (hw->adapter->flags2 & FLAG2_PCIM2PCI_ARBITER_WA)
-		__ew32_prepare(hw);
-
-	writel(val, hw->hw_addr + reg);
-}
+s32 __ew32_prepare(struct e1000_hw *hw);
+void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val);
 
 #define ew32(reg, val) __ew32(hw, E1000_##reg, (val))
 
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 3e69386..9b6cd9a 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -124,6 +124,36 @@ static const struct e1000_reg_info e1000_reg_info_tbl[] = {
 };
 
 /**
+ * __ew32_prepare - prepare to write to MAC CSR register on certain parts
+ * @hw: pointer to the HW structure
+ *
+ * When updating the MAC CSR registers, the Manageability Engine (ME) could
+ * be accessing the registers at the same time.  Normally, this is handled in
+ * h/w by an arbiter but on some parts there is a bug that acknowledges Host
+ * accesses later than it should which could result in the register to have
+ * an incorrect value.  Workaround this by checking the FWSM register which
+ * has bit 24 set while ME is accessing MAC CSR registers, wait if it is set
+ * and try again a number of times.
+ **/
+s32 __ew32_prepare(struct e1000_hw *hw)
+{
+	s32 i = E1000_ICH_FWSM_PCIM2PCI_COUNT;
+
+	while ((er32(FWSM) & E1000_ICH_FWSM_PCIM2PCI) && --i)
+		udelay(50);
+
+	return i;
+}
+
+void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val)
+{
+	if (hw->adapter->flags2 & FLAG2_PCIM2PCI_ARBITER_WA)
+		__ew32_prepare(hw);
+
+	writel(val, hw->hw_addr + reg);
+}
+
+/**
  * e1000_regdump - register printout routine
  * @hw: pointer to the HW structure
  * @reginfo: pointer to the register info table
-- 
1.9.0



[PATCH 8/8] Kbuild: add inline-account tool to find inline bloat

2014-05-16 Thread Andi Kleen
From: Andi Kleen 

Add a tool to hunt for inline bloat. It uses objdump -S to account
inlines.

Example output:

Total code bytes seen 10463206

Code bytes by functions:
Function                   Total          Avg   Num
kmalloc                    37132 (0.00%)   11  3310
ixgbe_read_reg             35440 (0.00%)   24  1444
spin_lock                  28975 (0.00%)   11  2575
constant_test_bit          26387 (0.00%)    5  4642
arch_spin_unlock           24986 (0.00%)    7  3364
spin_unlock_irqrestore     24928 (0.00%)   11  2258
readl                      24584 (0.00%)    4  5344
writel                     23199 (0.00%)    6  3643
perf_fetch_caller_regs     22436 (0.00%)   27   821
get_current                22076 (0.00%)    9  2288
_radeon_msleep             19680 (0.00%)   55   353
INIT_LIST_HEAD             19410 (0.00%)   11  1747
list_del                   19270 (0.00%)   16  1176
__ew32_prepare             19080 (0.00%)   25   740
__list_add                 17830 (0.00%)   12  1406
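
To make the relationship between the columns concrete, here is a short
Python sketch over the example rows above.  It assumes that Avg is the
integer quotient of Total and Num; that reading is inferred from the
numbers and is not stated by the tool.  It then picks the function with
the largest per-site footprint.

```python
# (function, total bytes, avg bytes per site, number of sites),
# copied from the example output above
rows = [
    ("kmalloc", 37132, 11, 3310),
    ("ixgbe_read_reg", 35440, 24, 1444),
    ("spin_lock", 28975, 11, 2575),
    ("constant_test_bit", 26387, 5, 4642),
    ("arch_spin_unlock", 24986, 7, 3364),
    ("spin_unlock_irqrestore", 24928, 11, 2258),
    ("readl", 24584, 4, 5344),
    ("writel", 23199, 6, 3643),
    ("perf_fetch_caller_regs", 22436, 27, 821),
    ("get_current", 22076, 9, 2288),
    ("_radeon_msleep", 19680, 55, 353),
    ("INIT_LIST_HEAD", 19410, 11, 1747),
    ("list_del", 19270, 16, 1176),
    ("__ew32_prepare", 19080, 25, 740),
    ("__list_add", 17830, 12, 1406),
]


def avg_bytes(total, num):
    """Average bytes per inlined copy, rounded down (assumed rounding)."""
    return total // num


# Rows where the printed Avg does not match the assumed formula.
mismatches = [name for name, total, avg, num in rows
              if avg_bytes(total, num) != avg]
print("rows whose Avg differs from Total // Num:", mismatches)

# The function with the largest per-site cost is a good out-of-line candidate.
largest = max(rows, key=lambda r: r[2])
print("largest per-site inline:", largest[0], largest[2], "bytes")
```

Functions with a large Avg and a large Num are the interesting ones:
small helpers such as readl cost only a few bytes per site, so moving
them out of line would not pay for the call.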

Cc: linux-kbu...@vger.kernel.org
Cc: mma...@suse.cz
Signed-off-by: Andi Kleen 
---
 scripts/inline-account.py | 164 ++
 1 file changed, 164 insertions(+)
 create mode 100755 scripts/inline-account.py

diff --git a/scripts/inline-account.py b/scripts/inline-account.py
new file mode 100755
index 000..2dfbf7c
--- /dev/null
+++ b/scripts/inline-account.py
@@ -0,0 +1,164 @@
+#!/usr/bin/python
+# account code bytes per source code / functions from objdump -Sl output
+# useful to find inline bloat
+# Author: Andi Kleen
+import os, sys, re, argparse, multiprocessing
+from collections import Counter
+
+p = argparse.ArgumentParser(
+    description="""
+Account code bytes per source code / functions from objdump.
+Useful to find inline bloat.
+
+The line numbers are the beginning of a block, so the actual code can be later.
+Line numbers can also be a little off due to objdump bugs.
+Some misaccounting can also happen due to inexact gcc debug information.
+The number output for functions may account a single large function multiple
+times.  Program/object files need to be built with -g.
+
+This is somewhat slow due to objdump -S being slow. It helps to have
+plenty of cores.""")
+p.add_argument('--min-bytes', type=int, help='minimum bytes to report', default=100)
+p.add_argument('--threads', '-t', type=int, default=multiprocessing.cpu_count(),
+               help='Number of objdump processes to run')
+p.add_argument('file', help='object file/program as input')
+args = p.parse_args()
+
+def get_syms(fn):
+    f = os.popen("nm  --print-size " + fn)
+    syms = []
+    pc = None
+    for l in f:
+        n = l.split()
+        if len(n) > 2 and n[2].upper() == "T":
+            pc = int(n[0], 16)
+            syms.append(pc)
+            ln = int(n[1], 16)
+    f.close()
+    if not pc:
+        sys.exit(fn + " has no symbols")
+    syms.append(pc + ln)
+    return syms
+
+class Account:
+    pass
+
+def add_account(a, b):
+    a.funcbytes += b.funcbytes
+    a.linebytes += b.linebytes
+    a.funccount += b.funccount
+    a.nolinebytes += b.nolinebytes
+    a.nofuncbytes += b.nofuncbytes
+    a.total += b.total
+    return a
+
+# dont add sys.exit here, causes deadlocks
+def account_range(r):
+    a = Account()
+    a.funcbytes = Counter()
+    a.linebytes = Counter()
+    a.funccount = Counter()
+    a.nolinebytes = 0
+    a.nofuncbytes = 0
+    a.total = 0
+
+    line = None
+    func = None
+    codefunc = None
+
+    cmd = ("objdump -Sl %s --start-address=%#x --stop-address=%#x" %
+                (args.file, r[0], r[1]))
+    f = os.popen(cmd)
+    for l in f:
+        #  250:   e8 00 00 00 00  callq  255
+        m = re.match(r'\s*([0-9a-fA-F]+):\s+(.*)', l)
+        if m:
+            #print "iscode", func, l,
+            bytes = len(re.findall(r'[0-9a-f][0-9a-f] ', m.group(2)))
+            if not func:
+                a.nofuncbytes += bytes
+                continue
+            if not line:
+                a.nolinebytes += bytes
+                continue
+            a.total += bytes
+            a.funcbytes[func] += bytes
+            a.linebytes[(file, line)] += bytes
+            codefunc = func
+            continue
+
+        # sysctl_init():
+        m = re.match(r'([a-zA-Z_][a-zA-Z0-9_]*)\(\):$', l)
+        if m:
+            if codefunc and m.group(1) != codefunc:
+                a.funccount[codefunc] += 1
+                codefunc = None
+            func = m.group(1)
+            continue
+
+        # /sysctl.c:1666
+        m = 

[PATCH 1/8] ixgbe: Out of line ixgbe_read/write_reg

2014-05-16 Thread Andi Kleen
From: Andi Kleen 

ixgbe_read_reg and ixgbe_write_reg are frequently called and are very big
because they have complex error handling code.

Moving them out of line saves ~27k text in the ixgbe driver.

    text    data     bss      dec     hex filename
14220873 2008072 1507328 17736273 10ea251 vmlinux-before-ixgbe
14193673 2003976 1507328 17704977 10e2811 vmlinux-ixgbe

Cc: net...@vger.kernel.org
Cc: Jeff Kirsher 
Signed-off-by: Andi Kleen 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.h | 22 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 22 ++
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
index f12c40f..05f094d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
@@ -162,28 +162,10 @@ static inline void writeq(u64 val, void __iomem *addr)
 }
 #endif
 
-static inline void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value)
-{
-   u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
+void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value);
+u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg);
 
-	if (ixgbe_removed(reg_addr))
-		return;
-	writeq(value, reg_addr + reg);
-}
 #define IXGBE_WRITE_REG64(a, reg, value) ixgbe_write_reg64((a), (reg), (value))
-
-static inline u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg)
-{
-	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
-	u32 value;
-
-	if (ixgbe_removed(reg_addr))
-		return IXGBE_FAILED_READ_REG;
-	value = readl(reg_addr + reg);
-	if (unlikely(value == IXGBE_FAILED_READ_REG))
-		ixgbe_check_remove(hw, reg);
-	return value;
-}
 #define IXGBE_READ_REG(a, reg) ixgbe_read_reg((a), (reg))
 
 #define IXGBE_WRITE_REG_ARRAY(a, reg, offset, value) \
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d62e7a2..5f81f62 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -371,6 +371,28 @@ void ixgbe_write_pci_cfg_word(struct ixgbe_hw *hw, u32 reg, u16 value)
pci_write_config_word(adapter->pdev, reg, value);
 }
 
+void ixgbe_write_reg64(struct ixgbe_hw *hw, u32 reg, u64 value)
+{
+	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
+
+	if (ixgbe_removed(reg_addr))
+		return;
+	writeq(value, reg_addr + reg);
+}
+
+u32 ixgbe_read_reg(struct ixgbe_hw *hw, u32 reg)
+{
+	u8 __iomem *reg_addr = ACCESS_ONCE(hw->hw_addr);
+	u32 value;
+
+	if (ixgbe_removed(reg_addr))
+		return IXGBE_FAILED_READ_REG;
+	value = readl(reg_addr + reg);
+	if (unlikely(value == IXGBE_FAILED_READ_REG))
+		ixgbe_check_remove(hw, reg);
+	return value;
+}
+
 static void ixgbe_service_event_complete(struct ixgbe_adapter *adapter)
 {
BUG_ON(!test_bit(__IXGBE_SERVICE_SCHED, >state));
-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/8] x86: Out of line get_dma_ops

2014-05-16 Thread Andi Kleen
From: Andi Kleen 

Out of lining the complex version of get_dma_ops saves about 6.8k on
my kernel.

14101415 2004040 1507328 17612783 10cbfef vmlinux-before-dma
14094629 2004040 1507328 17605997 10ca56d vmlinux-dma

Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/dma-mapping.h |  7 +++
 arch/x86/lib/Makefile  |  2 ++
 arch/x86/lib/dma.c | 11 +++
 3 files changed, 16 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/lib/dma.c

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 808dae6..314e4bd 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -29,15 +29,14 @@ extern int panic_on_overflow;
 
 extern struct dma_map_ops *dma_ops;
 
+struct dma_map_ops *__get_dma_ops(struct device *dev);
+
 static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 {
 #ifndef CONFIG_X86_DEV_DMA_OPS
 	return dma_ops;
 #else
-	if (unlikely(!dev) || !dev->archdata.dma_ops)
-		return dma_ops;
-	else
-		return dev->archdata.dma_ops;
+	return __get_dma_ops(dev);
 #endif
 }
 
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index eabcb6e..44dae40 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -44,3 +44,5 @@ else
 lib-y += copy_user_64.o copy_user_nocache_64.o
lib-y += cmpxchg16b_emu.o
 endif
+
+lib-y += dma.o
diff --git a/arch/x86/lib/dma.c b/arch/x86/lib/dma.c
new file mode 100644
index 0000000..c97b5ae
--- /dev/null
+++ b/arch/x86/lib/dma.c
@@ -0,0 +1,11 @@
+#include 
+#include 
+
+struct dma_map_ops *__get_dma_ops(struct device *dev)
+{
+	if (unlikely(!dev) || !dev->archdata.dma_ops)
+		return dma_ops;
+	else
+		return dev->archdata.dma_ops;
+}
+EXPORT_SYMBOL(__get_dma_ops);
-- 
1.9.0
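
A minimal userspace sketch of the fallback lookup that the patch moves out of line, for readers who want to experiment with the rule. The types `struct ops` and `struct dev` and the names `default_ops` and `lookup_ops` are hypothetical stand-ins for `struct dma_map_ops`, `struct device`, `dma_ops` and `__get_dma_ops`; this is not kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct dma_map_ops and struct device. */
struct ops { const char *name; };
struct dev { struct ops *ops; };	/* per-device override, may be NULL */

/* Plays the role of the global dma_ops pointer's target. */
static struct ops default_ops = { "default" };

/*
 * Out-of-line lookup: a missing device or a device without its own
 * ops falls back to the global default.
 */
struct ops *lookup_ops(struct dev *d)
{
	if (!d || !d->ops)
		return &default_ops;
	return d->ops;
}
```

With the lookup kept in one place, each caller only needs a function call rather than its own copy of the two checks, which is presumably where the reported size reduction comes from.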



Re: SCSI "staging" tree for linux-next?

2014-05-16 Thread Nicholas A. Bellinger
On Fri, 2014-05-16 at 15:38 +0200, Hannes Reinecke wrote:
> On 05/15/2014 07:26 AM, Christoph Hellwig wrote:
> > Hi James,
> >
> > we're past -rc5 and no SCSI patches have been collected for 3.16 yet,
> > despite a lot of patches including a lot of reviewed ones pending on the
> > list.
> >
> > I'd really love to get at least some testing for all the work that
> > sometimes has been pending for months in linux-next and would offer to
> > put together a tree of reviewed patches for linux-next.  Is this fine
> > with you?
> >
> Seconded. Having a staging tree would make my life _so_ much easier.
> 

+1.  I thought this was already agreed upon at LSF anyways..?

--nab



[GIT PULL] target fixes for v3.15-rc6

2014-05-16 Thread Nicholas A. Bellinger
Hello Linus,

Here are the target-pending fixes for v3.15-rc6.  Please go ahead and
pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master

This series includes:

  - Close race between iser-target network portal shutdown + accepting 
new connection logins (sagi)
  - Fix free-after-use regression in tcm_fc post conversion to 
percpu-ida pre-allocation (nab)
  - Explicitly disable Immediate + Unsolicited Data for iser-target 
connections when T10-PI is enabled (sagi + nab)
  - Allow pi_prot_type + emulate_write_cache attributes to be set to
zero regardless of backend support (andy)

Thank you,

--nab

Andy Grover (2):
  target: Allow non-supporting backends to set pi_prot_type to 0
  target: Don't allow setting WC emulation if device doesn't support

Nicholas Bellinger (3):
  iscsi-target: Change BUG_ON to REJECT in iscsit_process_nop_out
  tcm_fc: Fix free-after-use regression in ft_free_cmd
  iscsi-target: Disable Immediate + Unsolicited Data with ISER
Protection

Sagi Grimberg (3):
  Target/iser: Fix wrong connection requests list addition
  Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
  Target/iscsi,iser: Avoid accepting transport connections during stop
stage

 drivers/infiniband/ulp/isert/ib_isert.c   |   38 +
 drivers/infiniband/ulp/isert/ib_isert.h   |2 +-
 drivers/target/iscsi/iscsi_target.c   |4 ++-
 drivers/target/iscsi/iscsi_target_core.h  |1 +
 drivers/target/iscsi/iscsi_target_login.c |   28 -
 drivers/target/iscsi/iscsi_target_tpg.c   |1 +
 drivers/target/target_core_device.c   |   12 ++---
 drivers/target/tcm_fc/tfc_cmd.c   |8 +++---
 8 files changed, 62 insertions(+), 32 deletions(-)

-- 
1.7.10.4



[PATCH] sched/dl: Fix race between dl_task_timer() and sched_setaffinity()

2014-05-16 Thread Kirill Tkhai
The race is in the unlocked task_rq() access. Combined with a parallel
call to sched_setaffinity(), it may corrupt the rq's internal
data.

Signed-off-by: Kirill Tkhai 
CC: Juri Lelli 
CC: Peter Zijlstra 
CC: Ingo Molnar 
Cc:  # v3.14
---
 kernel/sched/deadline.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 800e99b..ffb023a 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -513,9 +513,16 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 						     struct sched_dl_entity,
 						     dl_timer);
 	struct task_struct *p = dl_task_of(dl_se);
-	struct rq *rq = task_rq(p);
+	struct rq *rq;
+again:
+	rq = task_rq(p);
 	raw_spin_lock(&rq->lock);
 
+	if (unlikely(rq != task_rq(p))) {
+		raw_spin_unlock(&rq->lock);
+		goto again;
+	}
+
 	/*
 	 * We need to take care of a possible races here. In fact, the
 	 * task might have changed its scheduling policy to something
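
The retry loop in this patch follows a general lock-then-recheck pattern, which can be sketched in userspace roughly as follows. This is an illustration only: `struct runqueue`, `struct task` and `lock_task_rq` are hypothetical names, a pthread mutex stands in for the raw spinlock, and the sketch assumes that whoever migrates a task updates its `rq` pointer while holding the old queue's lock.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for struct rq and struct task_struct. */
struct runqueue { pthread_mutex_t lock; };
struct task { _Atomic(struct runqueue *) rq; };

/*
 * Lock the queue the task currently belongs to. The task may move
 * between reading its queue pointer and acquiring that queue's lock,
 * so recheck under the lock and retry if it moved.
 * Returns with the returned queue's lock held.
 */
struct runqueue *lock_task_rq(struct task *t)
{
	struct runqueue *rq;

	for (;;) {
		rq = atomic_load(&t->rq);
		pthread_mutex_lock(&rq->lock);
		if (rq == atomic_load(&t->rq))
			return rq;	/* still the right queue */
		pthread_mutex_unlock(&rq->lock);
	}
}
```

The caller is responsible for unlocking the returned queue. The recheck is only meaningful under the stated assumption that the pointer cannot change while the matching lock is held.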



Kernel Failure - 3.4.24 Similar USB MO To 3.4.89 Kernel Failure

2014-05-16 Thread John L. Males
reen.  My
> research after the kernel oops suggests one has to write down
> the information on the screen from a kernel oops, which I did
> not do as I did not think I would need to anymore.  The
> reason I mention this is that the kernel oops was with a USB
> device as well.  The difference was it was a USB Wireless BGN
> device that I have used many times over the last 12 months
> with a number of 3.2.x kernels with no kernel oops/failure,
> just odd functional issues that seem to resolve in later
> kernel versions. The kernel oops that occurred with this
> Wireless BGN device only occurred once with that exact older
> 3.2.x kernel version and I have no clue why.  I have no
> information I know of about that kernel oops that might help
> with this kernel failure.  I did not know I needed to write
> down the screen from the oops.  I therefore cannot provide
> the kernel oops information that might share some common
> findings with the kernel failure of this issue.  I suspect
> there may be nothing in common, but without the kernel oops
> information we will not know for certain.
> 
> The USB device was a MP3 player that acts like a flash USB
> drive when it is plugged into a computer.  This means one can
> copy to/from, rename, delete files using the command line or
> any file manager one uses.
> 
> > - Is this *new* meaning is there a kernel where did not
> > happen?
> 
> I am not sure where the "new" reference you are referring to
> is from.  That said, the only time this person's MP3
> player/USB flash was used was with the kernel.org 3.2.24
> kernel I noted.
> 
> The only other USB problem I had was once with a USB Wireless
> BGN device that has seen a lot of activity on my system and had
> one oops on a 3.2.x kernel prior to 3.2.24 and again only once
> on that kernel version.
> 
> > 
> > Sebastian
> 
> I know you know, but for those that do not, I am not on the
> LKML.  It would be appreciated if I was copied in on any LKML
> replies.
> 
> As always if there is more information or clarification needed
> please ask.
> 
> 
> Regards,
> 
> John L. Males
> Toronto, Ontario
> Canada
> 30 January 2013 13:58
> 
> 
> ==
> 2013-01-30 13:09:05.479017366-0500-EST
> 
> 30 Jan 13:09:05 ntpdate[17854]: ntpdate 4.2.6p2@1.2194-o Sun
> Oct 17 13:35:14 UTC 2010 (1)
> 
> 30 Jan 13:09:32 ntpdate[17863]: step time server 142.4.209.106
> offset -7.350323 sec
> 
> Linux 3.4.24-kernel.org-jlm-010-amd64 #1 SMP PREEMPT Sun Dec
> 23 10:06:41 EST 2012
> 
> Modified Debian GNU/Linux 6.0.3 (squeeze)
> (Evaluating alternatives to Debian)
> 
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.10 (GNU/Linux)
> 
> iEYEARECAAYFAlEJbUcACgkQ
> +V/XUtB6aBAh4ACeKQIM7vMWliG9iHpUfmhwQPKo
> 58sAoMiUS1AgNtfj0oBBPydcP60m3dyH =8sUO
> -END PGP SIGNATURE-
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAlN2g1gACgkQ+V/XUtB6aBBBWACgyCfzfETF9d1GNI6Ci2MIbIvA
nwEAn1Q+k+ogNczAoBOvZGWEQp2YUdhs
=MxB8
-END PGP SIGNATURE-
20140516 14:49 A window in LXDE popped up reporting a kernel failure.

The failure occurred yet again, as it has a handful of times with prior
versions over the past couple of years, and is related to USB.

The last time, a USB TV adaptor was inserted after it had been removed.

This time I unmounted the WIP 8GB ArchLinux USB stick (created on the T5730),
which had been in the system since the early hours of this morning,
then inserted the 2GB USB stick holding the ArchLinux boot/install image used
on the T5730 to create the WIP ArchLinux on the 8GB flash,
at which point the kernel failure occurred.

unmount was done via console which already has console logging enabled:

20140516 14:47:26 -0400 EDT 
keypunch@pwsdhhuesloejsgegsjwilastwhsk:/vm/ISOs/linux/ArchLinux/archlinux/2014.05.01
 tty0 $ pumount /media/0ad46398-13a0-4c3d-93f9-34e67510f053
20140516 14:47:42 -0400 EDT 
keypunch@pwsdhhuesloejsgegsjwilastwhsk:/vm/ISOs/linux/ArchLinux/archlinux/2014.05.01
 tty0 $ cd /vm/ISOs/linux/debian/debian/debian-cd/7.5.0-live/i386/iso-hybrid


I chose not to send the report, but to show the details:


Kernel failure message 1:
[619141.142769] ------------[ cut here ]------------
[619141.142784] WARNING: at block/genhd.c:1573 disk_clear_events+0x11f/0x130()
[619141.142788] Hardware name: HP Compaq nc6400 (RM100AW#ABA)
[619141.142791] Modules linked in: ext4 jbd2 ufs isofs nls_iso8859_1 nls_utf8 
nls_cp437 vfat fat cryptd aes_x86_64 aes_generic snd_hrtimer kvm_intel kvm 
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables 
x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative 
bridge stp bnep rfcomm bluetooth crc16 ppdev lp binfmt_misc i915 drm_kms_helper 
drm i2c_algo_bit i2c_core uinput fuse loop snd_hda_codec_si3054 
snd_hda_codec_analog snd_hda_intel 

Re: [PATCH v2] ARM: OMAP: replace checks for CONFIG_USB_GADGET_OMAP

2014-05-16 Thread Tony Lindgren
* Paul Bolle  [140516 03:01]:
> Commit 193ab2a60700 ("usb: gadget: allow multiple gadgets to be built")
> apparently required that checks for CONFIG_USB_GADGET_OMAP would be
> replaced with checks for CONFIG_USB_OMAP. Do so now for the remaining
> checks for CONFIG_USB_GADGET_OMAP, even though these checks have
> basically been broken since v3.1.
> 
> And, since we're touching this code, use the IS_ENABLED() macro, so
> things will now (hopefully) also work if USB_OMAP is modular.
> 
> Fixes: 193ab2a60700 ("usb: gadget: allow multiple gadgets to be built")
> Signed-off-by: Paul Bolle 
> ---
> v2: use IS_ENABLED() as Felipe requested. Still untested.

Thanks, applying into omap-for-v3.16/board.

Tony


[PATCH 3/3] perf,x86: use common PMU interrupt disabled code

2014-05-16 Thread Vince Weaver

Make the x86 perf code use the new common PMU interrupt disabled code.

Typically most x86 machines have working PMU interrupts, although
some older p6-class machines had this problem.

Signed-off-by: Vince Weaver 

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index ae407f7..adba966 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -303,15 +303,6 @@ int x86_setup_perfctr(struct perf_event *event)
 		hwc->sample_period = x86_pmu.max_period;
 		hwc->last_period = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
-	} else {
-		/*
-		 * If we have a PMU initialized but no APIC
-		 * interrupts, we cannot sample hardware
-		 * events (user-space has to fall back and
-		 * sample via a hrtimer based software event):
-		 */
-		if (!x86_pmu.apic)
-			return -EOPNOTSUPP;
 	}
 
 	if (attr->type == PERF_TYPE_RAW)
@@ -1365,6 +1356,15 @@ static void __init pmu_check_apic(void)
 	x86_pmu.apic = 0;
 	pr_info("no APIC, boot with the \"lapic\" boot parameter to force-enable it.\n");
 	pr_info("no hardware sampling interrupt available.\n");
+
+	/*
+	 * If we have a PMU initialized but no APIC
+	 * interrupts, we cannot sample hardware
+	 * events (user-space has to fall back and
+	 * sample via a hrtimer based software event):
+	 */
+	pmu.capabilities |= PERF_PMU_NO_INTERRUPT;
+
 }
 
 static struct attribute_group x86_pmu_format_group = {


Re: Building sh4 without CONFIG_EXPERT.

2014-05-16 Thread Randy Dunlap
On 05/11/2014 04:53 PM, Rob Landley wrote:
> I got sh4 to work under QEMU years ago as part of my aboriginal linux
> project, which builds the smallest Linux system capable of rebuilding
> itself natively from source code. (You can download and run the system
> images from http://landley.net/aboriginal/bin if you're curious.)
> 
> One of the goals of Aboriginal is to make different architectures behave
> the same way, and one of the ways I do that is by having a basic kernel
> miniconfig file defining the config symbols common across platforms:
> 
>   http://landley.net/hg/aboriginal/file/1651/sources/baseconfig-linux
> 
> And then append target-specific chunks, ala the LINUX_CONFIG sections
> from each of:
> 
>   http://landley.net/hg/aboriginal/file/1651/sources/targets
> 
> The problem is, the sh4 target's target-specific chunk is an INSANE 45
> config symbols (armv5l needs 15 symbols, powerpc needs 16, i686 needs 7,
> mips needs 6...) and the reason for the verbosity is that sh4 forces on
> CONFIG_EXPERT.
> 
> As far as I can tell, the only reason sh4 is forcing on CONFIG_EXPERT is
> to get CONFIG_PATA_PLATFORM. The patch to make sh4 _not_ force on
> CONFIG_EXPERT is just:
> 
> diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
> index 834b67c..7d0d44d 100644
> --- a/arch/sh/Kconfig
> +++ b/arch/sh/Kconfig
> @@ -1,7 +1,7 @@
>  config SUPERH
>   def_bool y
>   select ARCH_MIGHT_HAVE_PC_PARPORT
> - select EXPERT
> + select HAVE_PATA_PLATFORM
>   select CLKDEV_LOOKUP
>   select HAVE_IDE if HAS_IOPORT_MAP
>   select HAVE_MEMBLOCK

Since PATA_PLATFORM is:

config PATA_PLATFORM
	tristate "Generic platform device PATA support"
	depends on EXPERT || PPC || HAVE_PATA_PLATFORM

then any of EXPERT, PPC, or HAVE_PATA_PLATFORM should be sufficient,
and using HAVE_PATA_PLATFORM is more direct and obvious.

Acked-by: Randy Dunlap 


> Which swaps EXPERT for HAVE_PATA_PLATFORM, so I can still provide a hard
> drive qemu can see as /dev/sda. The result builds, boots, and works for
> me in very basic smoke testing, and lets me get my sh4-specific config
> symbol set down to 27 symbols.
> 
> (P.S. I have no idea why hitting ctrl-C kills the _emulator_ rather than
> passing it along to the emulated system. The other qemu targets don't do
> that...)
> 
> Rob
> 
> P.S. I'm using linux 3.14, qemu 2.0, and the sh4 cross compiler
> toolchain from the URL at the top of the post. I can be more explicit if
> you care...


-- 
~Randy


Re: [PATCH] ARM: OMAP: AM3517EVM: remove check for CONFIG_PANEL_SHARP_LQ043T1DG01

2014-05-16 Thread Tony Lindgren
* Paul Bolle  [140515 12:55]:
> The Kconfig symbol PANEL_SHARP_LQ043T1DG01 was removed in v2.6.38. The
> check for CONFIG_PANEL_SHARP_LQ043T1DG01 and its MODULE variant has
> evaluated to false ever since. Remove that check.

Thanks, applying into omap-for-v3.16/board.

Tony


Re: [PATCH] ARM: OMAP: SX1: remove check for CONFIG_SX1_OLD_FLASH

2014-05-16 Thread Tony Lindgren
* Paul Bolle  [140515 12:42]:
> A check for CONFIG_SX1_OLD_FLASH was added in v2.6.24. But the related
> Kconfig symbol was never part of the tree. So we can remove some dead
> code.

Thanks, applying into omap-for-v3.16/board.

Tony


[PATCH 2/3] perf, ARM: use common PMU interrupt disabled code

2014-05-16 Thread Vince Weaver

Make the ARM perf code use the new common PMU interrupt disabled code.

This allows perf to work on ARM machines without a working PMU
interrupt (for example, raspberry pi).

Signed-off-by: Vince Weaver 

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index a6bc431..4238bcb 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -410,7 +410,7 @@ __hw_perf_event_init(struct perf_event *event)
 	 */
 	hwc->config_base	    |= (unsigned long)mapping;
 
-	if (!hwc->sample_period) {
+	if (!is_sampling_event(event)) {
 		/*
 		 * For non-sampling runs, limit the sample_period to half
 		 * of the counter width. That way, the new counter value
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index 51798d7..63d95fa 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -126,8 +126,8 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
 
 	irqs = min(pmu_device->num_resources, num_possible_cpus());
 	if (irqs < 1) {
-		pr_err("no irqs for PMUs defined\n");
-		return -ENODEV;
+		printk_once("no irqs for PMU defined, sampled events not supported\n");
+		return 0;
 	}
 
 	irq = platform_get_irq(pmu_device, 0);
@@ -191,6 +191,11 @@ static void cpu_pmu_init(struct arm_pmu *cpu_pmu)
 	/* Ensure the PMU has sane values out of reset. */
 	if (cpu_pmu->reset)
 		on_each_cpu(cpu_pmu->reset, cpu_pmu, 1);
+
+	/* If no interrupts available, set the corresponding capability flag */
+	if (platform_get_irq(cpu_pmu->plat_device, 0) <= 0) {
+		cpu_pmu->pmu.capabilities |= PERF_PMU_NO_INTERRUPT;
+	}
 }
 
 /*


Re: [PATCH] x86: fix page fault tracing when KVM guest support enabled

2014-05-16 Thread Dave Hansen
On 05/16/2014 02:01 PM, Paolo Bonzini wrote:
> Yes, of course.  Dave, ok to only have it in 3.16?

Sure, it's been broken for a long time, so there's no hurry to get it fixed.


Re: [PATCH] ARM: OMAP: remove some dead code

2014-05-16 Thread Tony Lindgren
* Aaro Koskinen  [140515 12:48]:
> On Thu, May 15, 2014 at 09:16:21PM +0200, Paul Bolle wrote:
> > A check for CONFIG_CBUS_TAHVO_USB was added in v2.6.17. The related
> > Kconfig symbol has never been part of the tree. Remove that check.
> > 
> > Replace the while (...) loop with a simple if (...) statement, while
> > we're at it.
> > 
> > Signed-off-by: Paul Bolle 
> 
> Acked-by: Aaro Koskinen 
> 
> > ---
> > Untested, as usual.
> > 
> > A quick search across the history of the tree suggests CBUS_TAHVO_USB
> > was N770 related. Not that this matters much.
> 
> Yes, Tahvo USB is only used on Nokia 770 (which is the correct name,
> not N770). The driver is nowadays behind TAHVO_USB, but like you said
> the old symbol was never part of the mainline so it's OK to delete it.

Thanks, applying into omap-for-v3.16/board.

Tony


Re: [PATCH] ARM: OMAP: omap3stalker: remove two Kconfig macros

2014-05-16 Thread Tony Lindgren
* Paul Bolle  [140515 11:38]:
> Checks for CONFIG_OMAP2_VENC_OUT_TYPE_SVIDEO and
> CONFIG_OMAP2_VENC_OUT_TYPE_COMPOSITE were added in v2.6.35. But the
> related Kconfig symbols have never been added to the tree. Remove these
> checks.
> 
> Initialize connector_type to OMAP_DSS_VENC_TYPE_COMPOSITE explicitly.
> That's what's happening currently: OMAP_DSS_VENC_TYPE_COMPOSITE equals
> zero and connector_type remains zero since both checks currently fail.

Thanks, applying into omap-for-v3.16/board.

Tony
 
> Signed-off-by: Paul Bolle 
> ---
> Untested.
> 
>  arch/arm/mach-omap2/board-omap3stalker.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/arch/arm/mach-omap2/board-omap3stalker.c b/arch/arm/mach-omap2/board-omap3stalker.c
> index 119efaf5808a..a2e035e0792a 100644
> --- a/arch/arm/mach-omap2/board-omap3stalker.c
> +++ b/arch/arm/mach-omap2/board-omap3stalker.c
> @@ -121,11 +121,7 @@ static struct platform_device omap3stalker_tfp410_device = {
>  static struct connector_atv_platform_data omap3stalker_tv_pdata = {
>   .name = "tv",
>   .source = "venc.0",
> -#if defined(CONFIG_OMAP2_VENC_OUT_TYPE_SVIDEO)
> - .connector_type = OMAP_DSS_VENC_TYPE_SVIDEO,
> -#elif defined(CONFIG_OMAP2_VENC_OUT_TYPE_COMPOSITE)
>   .connector_type = OMAP_DSS_VENC_TYPE_COMPOSITE,
> -#endif
>   .invert_polarity = false,
>  };
>  
> -- 
> 1.9.0
> 
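
The claim in the quoted changelog, that dropping both preprocessor branches left connector_type at the composite value, rests on C zero-initializing every member that a designated initializer does not name. A minimal standalone sketch of that rule follows; `enum venc_type`, `struct tv_pdata` and `same_connector_type` are hypothetical stand-ins for the OMAP DSS types, with the composite value assumed to be 0 as the changelog states.

```c
/* Hypothetical stand-in for enum omap_dss_venc_type; composite is 0. */
enum venc_type { VENC_TYPE_COMPOSITE = 0, VENC_TYPE_SVIDEO = 1 };

/* Hypothetical stand-in for struct connector_atv_platform_data. */
struct tv_pdata {
	const char *name;
	const char *source;
	enum venc_type connector_type;
	int invert_polarity;
};

/* connector_type is not named, so C initializes it to zero. */
static const struct tv_pdata implicit_pdata = {
	.name = "tv",
	.source = "venc.0",
	.invert_polarity = 0,
};

/* Same data with the value spelled out, as the patch does. */
static const struct tv_pdata explicit_pdata = {
	.name = "tv",
	.source = "venc.0",
	.connector_type = VENC_TYPE_COMPOSITE,
	.invert_polarity = 0,
};

/* Returns 1 when both initializers yield the same connector type. */
int same_connector_type(void)
{
	return implicit_pdata.connector_type == explicit_pdata.connector_type;
}
```

Spelling the value out changes nothing at run time; it only makes the intent visible to the next reader.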


Re: [PATCH 3/5] dma-mapping: pci: Add devm_ interface for pci_map_single

2014-05-16 Thread Tejun Heo
On Fri, May 16, 2014 at 11:26:37AM +0300, Eli Billauer wrote:
> 
> Signed-off-by: Eli Billauer 
> ---
>  Documentation/driver-model/devres.txt |2 ++
>  include/asm-generic/pci-dma-compat.h  |   17 +
>  2 files changed, 19 insertions(+), 0 deletions(-)

The patch looks fine to me but can you please cc PCI subsystem
maintainer and please do so for other patches too.

Thanks.

-- 
tejun


[PATCH 1/3] perf: disable sampled events if no PMU interrupt

2014-05-16 Thread Vince Weaver

Add common code to generate ENOTSUPP at event creation time if an 
architecture attempts to create a sampled event and PERF_PMU_NO_INTERRUPT
is set.

This adds a new pmu->capabilities flag.  
Initially we only support PERF_PMU_NO_INTERRUPT (to indicate a PMU
has no support for generating hardware interrupts) but there are 
other capabilities that can be added later.

Signed-off-by: Vince Weaver 

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3356abc..2164763 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -251,9 +251,20 @@ struct pmu {
 	 * flush branch stack on context-switches (needed in cpu-wide mode)
 	 */
 	void (*flush_branch_stack)	(void);
+
+	/*
+	 * various common per-pmu feature flags
+	 */
+	int capabilities;
+
 };
 
 /**
+ * struct pmu->capabilites flags
+ */
+#define PERF_PMU_NO_INTERRUPT  1
+
+/**
  * enum perf_event_active_state - the states of a event
  */
 enum perf_event_active_state {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f83a71a..f5d8554 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7072,6 +7072,13 @@ SYSCALL_DEFINE5(perf_event_open,
 		}
 	}
 
+	if (is_sampling_event(event)) {
+		if (event->pmu->capabilities & PERF_PMU_NO_INTERRUPT) {
+			err = -ENOTSUPP;
+			goto err_alloc;
+		}
+	}
+
 	account_event(event);
 
 	/*
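
The check this patch adds to perf_event_open() can be modeled outside the kernel as below. This is a sketch under stated assumptions, not the kernel implementation: `struct pmu_model`, `struct event_model` and `event_open_check` are hypothetical names, a nonzero `sample_period` is taken to mean "sampling event", and `EOPNOTSUPP` stands in for the kernel-internal `ENOTSUPP`, which userspace `<errno.h>` does not define.

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the value of PERF_PMU_NO_INTERRUPT in the patch. */
#define PMU_NO_INTERRUPT 0x1

/* Hypothetical stand-ins for struct pmu and struct perf_event. */
struct pmu_model { int capabilities; };
struct event_model {
	const struct pmu_model *pmu;
	uint64_t sample_period;	/* 0 means a pure counting event */
};

/* Assumption: an event samples when it has a nonzero sample period. */
static int is_sampling(const struct event_model *e)
{
	return e->sample_period != 0;
}

/*
 * Reject sampling events on a PMU that cannot raise interrupts;
 * counting events are still allowed.
 */
int event_open_check(const struct event_model *e)
{
	if (is_sampling(e) && (e->pmu->capabilities & PMU_NO_INTERRUPT))
		return -EOPNOTSUPP;
	return 0;
}
```

Keeping the test in common code means an architecture only has to set one flag, rather than carry its own copy of the rejection logic.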


Re: [PATCH 2/5] dma-mapping: Add devm_ interface for dma_map_single()

2014-05-16 Thread Tejun Heo
Hello,

On Fri, May 16, 2014 at 11:26:36AM +0300, Eli Billauer wrote:
> +dma_addr_t dmam_map_single(struct device *dev, void *ptr, size_t size,
> +enum dma_data_direction direction)
> +
> +{
> + struct dma_devres *dr;
> + dma_addr_t dma_handle;
> +
> + dr = devres_alloc(dmam_map_single_release, sizeof(*dr), GFP_KERNEL);
> + if (!dr)
> + return 0;
> +
> + dma_handle = dma_map_single(dev, ptr, size, direction);

Don't we wanna map the underlying operation - dma_map_single_attrs() -
instead?

> + if (dma_mapping_error(dev, dma_handle)) {
> + devres_free(dr);
> + return 0;

Can't we just keep returning dma_handle?  Even if that means invoking
->mapping_error() twice?  It's yucky to have subtly different error
return especially because in most cases it won't fail.

Thanks.

-- 
tejun
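
The second review point, returning the handle unchanged so that callers keep a single error convention, might look like this in a userspace mock. Everything here is hypothetical and only illustrates the calling convention: `handle_t`, `BAD_HANDLE`, `struct tracker`, `backend_map`, `mapping_error` and `managed_map` are invented for the sketch and are not the DMA API or the code that was eventually merged.

```c
#include <stdint.h>

typedef uintptr_t handle_t;		/* stand-in for dma_addr_t */
#define BAD_HANDLE ((handle_t)-1)	/* hypothetical error cookie */

/* Stand-in for the devres record that remembers what to unmap. */
struct tracker { handle_t handle; int active; };

/* Mock backend: fails for NULL buffers, else uses the address. */
static handle_t backend_map(void *buf)
{
	return buf ? (handle_t)buf : BAD_HANDLE;
}

/* The one error test callers use for both variants. */
int mapping_error(handle_t h)
{
	return h == BAD_HANDLE;
}

/*
 * Managed variant: on failure, track nothing and pass the backend's
 * return value through instead of translating it to 0.
 */
handle_t managed_map(struct tracker *t, void *buf)
{
	handle_t h = backend_map(buf);

	if (mapping_error(h)) {
		t->active = 0;	/* nothing to release later */
		return h;
	}
	t->handle = h;
	t->active = 1;
	return h;
}
```

A caller then writes the same `if (mapping_error(h))` test after either the plain or the managed call, at the cost of evaluating the error check twice on the managed path.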


[PATCH 0/3] perf: disable sampled events if no PMU interrupt

2014-05-16 Thread Vince Weaver
Hello

This patch series adds a common shared interface for returning ENOTSUPP
if a user tries to create a sampled event (one with sample_period set)
on a machine that has no usable PMU interrupt.

Currently only x86 and ARM are implemented but if the patch is accepted
then we can move other architectures over to use the interface.

This patch also has the side effect of enabling perf to work on 
raspberry-pi machines.

Consideration should also be given to disabling sampling support on 
machines with buggy PMU interrupts (such as Cortex-A8 and Cortex-A9
ARM platforms).

Vince


Re: [PATCH] x86: fix page fault tracing when KVM guest support enabled

2014-05-16 Thread Paolo Bonzini

On 16/05/2014 22:53, H. Peter Anvin wrote:
> On 05/16/2014 12:45 PM, Dave Hansen wrote:
>> From: Dave Hansen
>>
>> I noticed on some of my systems that page fault tracing doesn't
>> work:
>>
>> 	cd /sys/kernel/debug/tracing
>> 	echo 1 > events/exceptions/enable
>> 	cat trace;
>> 	# nothing shows up
>>
>> I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
>> KVM VM, enabling that option breaks page fault tracing, and
>> disabling fixes it.  I tried on some old kernels and this does
>> not appear to be a regression: it never worked.
>>
>> There are two page-fault entry functions today.  One when tracing
>> is on and another when it is off.  The KVM code calls do_page_fault()
>> directly instead of calling the traced version:
>>
>>> dotraplinkage void __kprobes
>>> do_async_page_fault(struct pt_regs *regs, unsigned long
>>> error_code)
>>> {
>>> 	enum ctx_state prev_state;
>>>
>>> 	switch (kvm_read_and_reset_pf_reason()) {
>>> 	default:
>>> 		do_page_fault(regs, error_code);
>>> 		break;
>>> 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
>>
>> I'm also having problems with the page fault tracing on bare
>> metal (same symptom of no trace output).  I'm unsure if it's
>> related.
>>
>> Steven had an alternative to this which has zero overhead when
>> tracing is off where this includes the standard noops even when
>> tracing is disabled.  I'm unconvinced that the extra complexity
>> of his approach:
>>
>> 	http://lkml.kernel.org/r/20140508194508.561ed...@gandalf.local.home
>>
>> is worth it, especially considering that the KVM code is already
>> making page fault entry slower here.  This solution is
>> dirt-simple.
>>
>> Gleb, please apply.
>>
>> Signed-off-by: Dave Hansen
>> Cc: Thomas Gleixner
>> Cc: x...@kernel.org
>> Cc: Peter Zijlstra
>> Cc: Gleb Natapov
>> Cc: "H. Peter Anvin"
>> Cc: k...@vger.kernel.org
>> Cc: Paolo Bonzini
>> Cc: Steven Rostedt
>
> Acked-by: H. Peter Anvin
>
> If Gleb and Paolo are okay with it, I am.

Yes, of course.  Dave, ok to only have it in 3.16?

Paolo



Re: [PATCH 1/5] devres: Add devm_get_free_pages API

2014-05-16 Thread Tejun Heo
On Fri, May 16, 2014 at 11:26:35AM +0300, Eli Billauer wrote:
> devm_get_free_pages() and devm_free_pages() are the managed counterparts
> for __get_free_pages() and free_pages().
> 
> Signed-off-by: Eli Billauer 

Acked-by: Tejun Heo 

Thanks.

-- 
tejun


Re: [PATCH] x86: fix page fault tracing when KVM guest support enabled

2014-05-16 Thread H. Peter Anvin
On 05/16/2014 12:45 PM, Dave Hansen wrote:
> From: Dave Hansen 
> 
> I noticed on some of my systems that page fault tracing doesn't
> work:
> 
>   cd /sys/kernel/debug/tracing
>   echo 1 > events/exceptions/enable
>   cat trace;
>   # nothing shows up
> 
> I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
> KVM VM, enabling that option breaks page fault tracing, and
> disabling fixes it.  I tried on some old kernels and this does
> not appear to be a regression: it never worked.
> 
> There are two page-fault entry functions today.  One when tracing
> is on and another when it is off.  The KVM code calls do_page_fault()
> directly instead of calling the traced version:
> 
>> dotraplinkage void __kprobes
>> do_async_page_fault(struct pt_regs *regs, unsigned long
>> error_code)
>> {
>> enum ctx_state prev_state;
>>
>> switch (kvm_read_and_reset_pf_reason()) {
>> default:
>> do_page_fault(regs, error_code);
>> break;
>> case KVM_PV_REASON_PAGE_NOT_PRESENT:
> 
> I'm also having problems with the page fault tracing on bare
> metal (same symptom of no trace output).  I'm unsure if it's
> related.
> 
> Steven had an alternative to this which has zero overhead when
> tracing is off where this includes the standard noops even when
> tracing is disabled.  I'm unconvinced that the extra complexity
> of his approach:
> 
>   http://lkml.kernel.org/r/20140508194508.561ed...@gandalf.local.home
> 
> is worth it, especially considering that the KVM code is already
> making page fault entry slower here.  This solution is
> dirt-simple.
> 
> Gleb, please apply.
> 
> Signed-off-by: Dave Hansen 
> Cc: Thomas Gleixner 
> Cc: x...@kernel.org
> Cc: Peter Zijlstra 
> Cc: Gleb Natapov 
> Cc: "H. Peter Anvin" 
> Cc: k...@vger.kernel.org
> Cc: Paolo Bonzini 
> Cc: Steven Rostedt 

Acked-by: H. Peter Anvin 

If Gleb and Paolo are okay with it, I am.

-hpa






Re: [PATCH 1/3] PM / sleep: Mechanism to avoid resuming runtime-suspended devices unnecessarily

2014-05-16 Thread Rafael J. Wysocki
On Friday, May 16, 2014 10:27:37 AM Alan Stern wrote:
> On Fri, 16 May 2014, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki 
> > 
> > Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
> > resume all runtime-suspended devices during system suspend, mostly
> > because those devices may need to be reprogrammed due to different
> > wakeup settings for system sleep and for runtime PM.
> > 
> > For some devices, though, it's OK to remain in runtime suspend 
> > throughout a complete system suspend/resume cycle (if the device was in
> > runtime suspend at the start of the cycle).  We would like to do this
> > whenever possible, to avoid the overhead of extra power-up and power-down
> > events.
> > 
> > However, problems may arise because the device's descendants may require
> > it to be at full power at various points during the cycle.  Therefore the
> > most straightforward way to do this safely is if the device and all its
> > descendants can remain runtime suspended until the complete stage of
> > system resume.
> > 
> > To this end, introduce a new device PM flag, power.direct_complete
> > and modify the PM core to use that flag as follows.
> > 
> > If the ->prepare() callback of a device returns a positive number,
> > the PM core will regard that as an indication that it may leave the
> > device runtime-suspended.  It will then check if the system power
> > transition in progress is a suspend (and not hibernation in particular)
> > and if the device is, indeed, runtime-suspended.  In that case, the PM
> > core will set the device's power.direct_complete flag.  Otherwise it
> > will clear power.direct_complete for the device and it also will later
> > clear it for the device's parent (if there's one).
> > 
> > Next, the PM core will not invoke the ->suspend(), ->suspend_late(),
> > ->suspend_noirq(), ->resume_noirq(), ->resume_early(), or ->resume()
> > callbacks for all devices having power.direct_complete set.  It
> > will invoke their ->complete() callbacks, however, and those
> > callbacks are then responsible for resuming the devices as
> > appropriate, if necessary.  For example, in some cases they may
> > need to queue up runtime resume requests for the devices using
> > pm_request_resume().
> > 
> > Changelog partly based on Alan Stern's description of the idea
> > (http://marc.info/?l=linux-pm&m=139940466625569&w=2).
> > 
> > Signed-off-by: Rafael J. Wysocki 
> 
> Acked-by: Alan Stern 
> 
> And likewise for the documentation patches.

Thanks a lot!



Re: [PATCH 5/5] workqueue: Allow modifying low level unbound workqueue cpumask

2014-05-16 Thread Tejun Heo
Hello, Frederic.

On Fri, May 16, 2014 at 06:16:55PM +0200, Frederic Weisbecker wrote:
> @@ -3643,6 +3643,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
>  {
>   struct workqueue_attrs *new_attrs, *tmp_attrs;
>   struct pool_workqueue **pwq_tbl, *dfl_pwq;
> + cpumask_var_t saved_cpumask;
>   int node, ret;
>  
>   /* only unbound workqueues can change attributes */
> @@ -3653,15 +3654,25 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
>   if (WARN_ON((wq->flags & __WQ_ORDERED) && !attrs->no_numa))
>   return -EINVAL;
>  
> + if (!alloc_cpumask_var(&saved_cpumask, GFP_KERNEL))
> + goto enomem;
> +
>   pwq_tbl = kzalloc(wq_numa_tbl_len * sizeof(pwq_tbl[0]), GFP_KERNEL);
>   new_attrs = alloc_workqueue_attrs(GFP_KERNEL);
>   tmp_attrs = alloc_workqueue_attrs(GFP_KERNEL);
> +
>   if (!pwq_tbl || !new_attrs || !tmp_attrs)
>   goto enomem;
>  
>   /* make a copy of @attrs and sanitize it */
>   copy_workqueue_attrs(new_attrs, attrs);
> - cpumask_and(new_attrs->cpumask, new_attrs->cpumask, unbounds_cpumask);
> +
> + /*
> +  * Apply unbounds_cpumask on the new attrs for pwq and worker pools
> +  * creation but save the wq proper cpumask for unbound attrs backup.
> +  */
> + cpumask_and(saved_cpumask, new_attrs->cpumask, cpu_possible_mask);
> + cpumask_and(new_attrs->cpumask, saved_cpumask, unbounds_cpumask);
>  
>   /*
>* We may create multiple pwqs with differing cpumasks.  Make a
> @@ -3693,6 +3704,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
>   /* all pwqs have been created successfully, let's install'em */
>   mutex_lock(&wq->mutex);
>  
> + cpumask_copy(new_attrs->cpumask, saved_cpumask);
>   copy_workqueue_attrs(wq->unbound_attrs, new_attrs);

Yeah, this seems like the correct behavior but it's a bit nasty.
Wouldn't creating another application copy be cleaner?  If not, can we
at least add more comment explaining why we're doing this?
Hmmm... shouldn't we be able to apply the mask to tmp_attrs?

Also, isn't the code block involving wq_calc_node_cpumask() kinda
broken for this?  It uses @attrs which is not masked by
@unbounds_cpumask.  This used to be okay as whatever it calculates
would fall in @cpu_possible_mask anyway but that no longer is the
case, right?

Another one, why is @unbounds_cpumask passed in as an argument?  Can't
it use the global variable directly?

> +static int unbounds_cpumask_apply(cpumask_var_t cpumask)
> +{
> + struct workqueue_struct *wq;
> + int ret;
> +
> + lockdep_assert_held(&wq_pool_mutex);
> +
> + list_for_each_entry(wq, &workqueues, list) {
> + struct workqueue_attrs *attrs;
> +
> + if (!(wq->flags & WQ_UNBOUND))
> + continue;
> +
> + attrs = wq_sysfs_prep_attrs(wq);
> + if (!attrs)
> + return -ENOMEM;
> +
> + ret = apply_workqueue_attrs_locked(wq, attrs, cpumask);
> + free_workqueue_attrs(attrs);
> + if (ret)
> + break;
> + }
> +
> + return 0;
> +}
> +
> +static ssize_t unbounds_cpumask_store(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t count)
> +{
> + cpumask_var_t cpumask;
> + int ret = -EINVAL;
> +
> + if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
> + return -ENOMEM;
> +
> + ret = cpumask_parse(buf, cpumask);
> + if (ret)
> + goto out;
> +
> + get_online_cpus();
> + if (cpumask_intersects(cpumask, cpu_online_mask)) {
> + mutex_lock(&wq_pool_mutex);
> + ret = unbounds_cpumask_apply(cpumask);
> + if (ret < 0) {
> + /* Warn if rollback itself fails */
> + WARN_ON_ONCE(unbounds_cpumask_apply(wq_unbound_cpumask));

Hmmm... but there's nothing which makes rolling back more likely to
succeed compared to the original applications.  It's gonna allocate
more pwqs.  Triggering WARN_ON_ONCE() seems weird.

That said, yeah, short of splitting apply_workqueue_attrs_locked()
into two stages - alloc and commit, I don't think proper error
recovery is possible.

There are a couple options, I guess.

1. Split apply_workqueue_attrs_locked() into two stages.  The first
   stage creates new pwqs as necessary and caches them.  Each @wq would
   need a pointer to remember its staged pwq_tbl.  The second stage
   commits them, which obviously can't fail.

2. Proper error handling is hard.  Just do pr_warn() on each failure
   and continue to try to apply and always return 0.

If #1 isn't too complicated (would it be?), it'd be the better option;
otherwise, well, #2 should work most of the time, eh?
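
To make option #1 concrete, here is a minimal userspace sketch of the
alloc/commit split.  It is only an illustration under stated assumptions:
struct fake_wq and every fake_* helper are hypothetical stand-ins, not
workqueue code, and a single int stands in for a staged pwq_tbl.  The
fail_at parameter exists only to simulate an allocation failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a workqueue; not the kernel's struct. */
struct fake_wq {
	int *active;	/* currently installed state */
	int *staged;	/* prepared but not yet committed state */
};

/* Stage 1: allocate the new state.  May fail; changes nothing visible. */
static bool fake_wq_prepare(struct fake_wq *wq, int value, bool fail)
{
	if (fail)
		return false;	/* simulated allocation failure */
	wq->staged = malloc(sizeof(*wq->staged));
	if (!wq->staged)
		return false;
	*wq->staged = value;
	return true;
}

/* Drop staged state without touching what is installed. */
static void fake_wq_abort(struct fake_wq *wq)
{
	free(wq->staged);
	wq->staged = NULL;
}

/* Stage 2: install the staged state.  Only a pointer swap, cannot fail. */
static void fake_wq_commit(struct fake_wq *wq)
{
	free(wq->active);
	wq->active = wq->staged;
	wq->staged = NULL;
}

/* Apply @value to all @n queues, or to none if any prepare step fails. */
static int fake_apply_all(struct fake_wq *wqs, size_t n, int value,
			  size_t fail_at)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!fake_wq_prepare(&wqs[i], value, i == fail_at)) {
			/* Undo the queues staged so far; nothing was installed. */
			while (i--)
				fake_wq_abort(&wqs[i]);
			return -1;
		}
	}
	for (i = 0; i < n; i++)
		fake_wq_commit(&wqs[i]);
	return 0;
}
```

Because every failure happens before any commit, there is no rollback
path that could itself fail, which is the property the WARN_ON_ONCE()
above tries to paper over.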

Thanks.

-- 
tejun

Re: [RFC][PATCH 2/3] PM / sleep: Mechanism to avoid resuming runtime-suspended devices unnecessarily

2014-05-16 Thread Rafael J. Wysocki
On Friday, May 16, 2014 08:20:55 AM Jacob Pan wrote:
> On Thu, 15 May 2014 11:58:55 -0400 (EDT)
> Alan Stern  wrote:
> 
> > On Thu, 15 May 2014, Jacob Pan wrote:
> > 
> > > On Thu, 15 May 2014 10:29:42 -0400 (EDT)
> > > Alan Stern  wrote:
> > > 
> > > > On Thu, 15 May 2014, Jacob Pan wrote:
> > > > 
> > > > > > > should we respect ignore_children flag here? not all parent
> > > > > > > devices create children with proper .prepare() function.
> > > > > > > this allows parents override children.
> > > > > > > I am looking at USB, a USB device could have logical
> > > > > > > children such as ep_xx, they don't go through the same
> > > > > > > subsystem .prepare().
> > > > > > 
> > > > > > Well, I'm not sure about that.  Let me consider that for a
> > > > > > while.
> > > > > OK. let me be more clear about the situation i see in USB.
> > > > > Correct me if I am wrong, a USB device will always has at least
> > > > > one endpoint/ep_00 as a kid for control pipe, it is a logical
> > > > > device. So when device_prepare() is called, its call back is
> > > > > NULL which makes .direct_complete = 0. Since children device
> > > > > suspend is called before parents, the parents .direct_complete
> > > > > flag will always get cleared.
> > > > > 
> > > > > What i am trying to achieve here is to see if we avoid resuming
> > > > > built-in (hardwired connect_type) non-hub USB devices based on
> > > > > this new patchset. E.g. we don't want to resume/suspend USB
> > > > > camera every time in system suspend/resume cycle if they are
> > > > > already rpm suspended. We can save ~100ms resume time for the
> > > > > devices we have tested.
> > > > 
> > > > This is a good point, but I don't think it is at all related to 
> > > > ignore_children.
> > > > 
> > > > Instead, it seems that the best way to solve it would be to add a 
> > > > ->prepare() handler for usb_ep_device_type that would always turn 
> > > > on direct_complete.
> > > > 
> > > yeah, that would solve the problem with EP device type. But what
> > > about other subdevices. e.g. for USB camera, uvcvideo device? We
> > > can add .prepare(return 1;) for each level but would it be better
> > > to have a flag similar to ignore_children if not ignore_children
> > > itself.
> > 
> > Something like that could always be added.
> or, how about if a device's .prepare() is NULL, we could
> assume .direct_resume() should be set. i.e.

You mean direct_complete (which is a flag, not a function), I suppose?

Wouldn't that go a bit too far?  It seems to be based on the assumption that
all devices having no ->prepare() callback can be safely left in runtime
suspend over a system suspend/resume cycle, but is that assumption actually
satisfied for all such devices?
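
To spell out what changes, here is a small userspace model of the two
rules.  It is a sketch only: struct fake_dev and both helpers are
hypothetical names, and the model assumes the only inputs are whether a
->prepare() callback exists, its return value, whether the transition is
a suspend (state.event == PM_EVENT_SUSPEND) and whether the device is
runtime-suspended (pm_runtime_suspended(dev)).

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the inputs to device_prepare(). */
struct fake_dev {
	bool has_prepare;	/* a ->prepare() callback exists */
	int prepare_ret;	/* its return value, when it exists */
	bool runtime_suspended;	/* models pm_runtime_suspended(dev) */
};

/* Rule in the posted patch: only a positive ->prepare() return opts in. */
static bool direct_complete_current(const struct fake_dev *d, bool is_suspend)
{
	return d->has_prepare && d->prepare_ret > 0 &&
	       is_suspend && d->runtime_suspended;
}

/* Proposed rule: a missing ->prepare() callback opts in as well. */
static bool direct_complete_proposed(const struct fake_dev *d, bool is_suspend)
{
	return (!d->has_prepare || d->prepare_ret > 0) &&
	       is_suspend && d->runtime_suspended;
}
```

The two rules differ only for devices without a ->prepare() callback,
such as the USB endpoint devices mentioned earlier in the thread, so the
question is whether every such device is safe to skip.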

> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1539,7 +1539,7 @@ static int device_prepare(struct device *dev,
> pm_message_t state) pm_runtime_put(dev);
> return ret;
> }
> -   dev->power.direct_complete = ret > 0 && state.event ==
> PM_EVENT_SUSPEND
> +   dev->power.direct_complete = (!callback || ret > 0) &&
> state.event == PM_EVENT_SUSPEND && pm_runtime_suspended(dev);
> dev_dbg(dev, "%s:direct_complete %d, info %s\n", __func__,
> dev->power.direct_complete, info);
> 
> > 
> > > Actually, I don't understand why this is not related to
> > > ignore_children. Could you explain?
> > 
> > It's hard to explain why two things are totally separate.  Much
> > better for you to describe why you think they _are_ related, so that
> > I can explain how you are wrong.
> > 
> > > If the parent knows it can ignore children and already rpm
> > > suspended, why do we still ask children?
> > 
> > The "ignore_children" flag doesn't mean that the parent can ignore
> > its children.  It means that the PM core is allowed to do a runtime
> > suspend of the parent while leaving the children at full power.
> > 
> > In particular, it doesn't mean that the children's ->suspend()
> > callback will work correctly if it is called while the parent is
> > runtime suspended.
> that explains my question about ignore_children flag. thanks.
> > 
> > Alan Stern
> > 
> 
> [Jacob Pan]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [PATCH] x86: fix page fault tracing when KVM guest support enabled

2014-05-16 Thread Steven Rostedt
On Fri, 16 May 2014 12:45:15 -0700
Dave Hansen  wrote:
> 
> Steven had an alternative to this which has zero overhead when
> tracing is off where this includes the standard noops even when
> tracing is disabled.  I'm unconvinced that the extra complexity
> of his approach:
> 
>   http://lkml.kernel.org/r/20140508194508.561ed...@gandalf.local.home
> 
> is worth it, especially considering that the KVM code is already
> making page fault entry slower here.  This solution is
> dirt-simple.

I just threw it out there as a suggestion. I don't care either way.

Acked-by: Steven Rostedt 

You probably need an acked-by from hpa or one of the other x86
maintainers as it touches the generic traps.h header.

-- Steve

> 
> Gleb, please apply.
> 
> Signed-off-by: Dave Hansen 
> Cc: Thomas Gleixner 
> Cc: x...@kernel.org
> Cc: Peter Zijlstra 
> Cc: Gleb Natapov 
> Cc: "H. Peter Anvin" 
> Cc: k...@vger.kernel.org
> Cc: Paolo Bonzini 
> Cc: Steven Rostedt 
> ---
> 
>  b/arch/x86/include/asm/traps.h |5 +
>  b/arch/x86/kernel/kvm.c|2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff -puN arch/x86/include/asm/traps.h~muck-with-kvm-guest-code 
> arch/x86/include/asm/traps.h
> --- a/arch/x86/include/asm/traps.h~muck-with-kvm-guest-code   2014-05-16 
> 12:29:23.900429347 -0700
> +++ b/arch/x86/include/asm/traps.h2014-05-16 12:29:23.905429570 -0700
> @@ -74,6 +74,11 @@ dotraplinkage void do_general_protection
>  dotraplinkage void do_page_fault(struct pt_regs *, unsigned long);
>  #ifdef CONFIG_TRACING
>  dotraplinkage void trace_do_page_fault(struct pt_regs *, unsigned long);
> +#else
> +static inline void trace_do_page_fault(struct pt_regs *regs, unsigned long 
> error)
> +{
> + do_page_fault(regs, error);
> +}
>  #endif
>  dotraplinkage void do_spurious_interrupt_bug(struct pt_regs *, long);
>  dotraplinkage void do_coprocessor_error(struct pt_regs *, long);
> diff -puN arch/x86/kernel/kvm.c~muck-with-kvm-guest-code arch/x86/kernel/kvm.c
> --- a/arch/x86/kernel/kvm.c~muck-with-kvm-guest-code  2014-05-16 
> 12:29:23.902429437 -0700
> +++ b/arch/x86/kernel/kvm.c   2014-05-16 12:29:23.906429615 -0700
> @@ -259,7 +259,7 @@ do_async_page_fault(struct pt_regs *regs
>  
>   switch (kvm_read_and_reset_pf_reason()) {
>   default:
> - do_page_fault(regs, error_code);
> + trace_do_page_fault(regs, error_code);
>   break;
>   case KVM_PV_REASON_PAGE_NOT_PRESENT:
>   /* page is swapped out by the host. */
> _



Re: [PATCH] bnx2x: Convert return 0 to return rc

2014-05-16 Thread David Miller
From: Joe Perches 
Date: Fri, 16 May 2014 13:12:24 -0700

> Couple things actually:
> o Could you please update the MAINTAINER entry for
>   BNX2X?  Ariel Elior's email address is still listed
>   as @broadcom and that seems to bounce.

Let's please give the Broadcom folks a reasonable opportunity to update
the MAINTAINERS entry, thanks.


RE: [PATCH v1 4/5] pci: dw: add common functions to support old hw based pci driver

2014-05-16 Thread Karicheri, Muralidharan
Adding more people to the list for review.
>-----Original Message-----
>From: Karicheri, Muralidharan
>Sent: Thursday, May 15, 2014 12:02 PM
>To: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; linux-arm-
>ker...@lists.infradead.org
>Cc: Karicheri, Muralidharan; Shilimkar, Santosh; Mohit Kumar; Jingoo Han; 
>Bjorn Helgaas
>Subject: [PATCH v1 4/5] pci: dw: add common functions to support old hw based 
>pci driver
>
>The older version of DW hw has application space registers for MSI controller
>and inbound/outbound access configuration. Also the legacy interrupt has
>registers in the application space. Drivers such as keystone pci use these
>common functions to implement the driver.
>These are re-factored from the original driver to separate files to allow
>re-use for the next driver that is based on old dw pci hw such as that found
>on keystone.
>
>CC: Santosh Shilimkar 
>CC: Mohit Kumar 
>CC: Jingoo Han 
>CC: Bjorn Helgaas 
>
>Signed-off-by: Murali Karicheri 
>---
> drivers/pci/host/Kconfig  |6 +-
> drivers/pci/host/Makefile |1 +
> drivers/pci/host/pci-dw-old-msi.c |  150 +++
> drivers/pci/host/pci-dw-old.c |  371
>+
> drivers/pci/host/pci-dw-old.h |   30 +++
> 5 files changed, 557 insertions(+), 1 deletion(-)  create mode 100644 
> drivers/pci/host/pci-
>dw-old-msi.c  create mode 100644 drivers/pci/host/pci-dw-old.c  create mode 
>100644
>drivers/pci/host/pci-dw-old.h
>
>diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig index 
>a6f67ec..c4f4732
>100644
>--- a/drivers/pci/host/Kconfig
>+++ b/drivers/pci/host/Kconfig
>@@ -9,6 +9,11 @@ config PCI_MVEBU
> config PCIE_DW
>   bool
>
>+config PCI_DW_OLD
>+  bool "Designware Old PCIe h/w"
>+  help
>+ Say Y here if the DW h/w is old version (3.65)
>+
> config PCI_EXYNOS
>   bool "Samsung Exynos PCIe controller"
>   depends on SOC_EXYNOS5440
>@@ -32,5 +37,4 @@ config PCI_RCAR_GEN2
> Say Y here if you want internal PCI support on R-Car Gen2 SoC.
> There are 3 internal PCI controllers available with a single
> built-in EHCI/OHCI host controller present on each one.
>-
> endmenu
>diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile index 
>13fb333..be5d939
>100644
>--- a/drivers/pci/host/Makefile
>+++ b/drivers/pci/host/Makefile
>@@ -4,3 +4,4 @@ obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
> obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
> obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
> obj-$(CONFIG_PCI_RCAR_GEN2) += pci-rcar-gen2.o
>+obj-$(CONFIG_PCI_DW_OLD) += pci-dw-old-msi.o pci-dw-old.o
>diff --git a/drivers/pci/host/pci-dw-old-msi.c 
>b/drivers/pci/host/pci-dw-old-msi.c
>new file mode 100644
>index 000..450bb2f
>--- /dev/null
>+++ b/drivers/pci/host/pci-dw-old-msi.c
>@@ -0,0 +1,150 @@
>+/*
>+ * Designware(dw) old MSI controller (v3.65 or similar)
>+ *
>+ * Copyright (C) 2013-2014 Texas Instruments., Ltd.
>+ *http://www.ti.com
>+ *
>+ * Author: Murali Karicheri 
>+ *
>+ *
>+ * This program is free software; you can redistribute it and/or modify
>+ * it under the terms of the GNU General Public License version 2 as
>+ * published by the Free Software Foundation.
>+ */
>+
>+#include 
>+#include 
>+#include 
>+#include 
>+
>+#include "pcie-designware.h"
>+#include "pci-dw-old.h"
>+
>+#define MSI_IRQ   0x054
>+#define MSI0_IRQ_STATUS   0x104
>+#define MSI0_IRQ_ENABLE_SET   0x108
>+#define MSI0_IRQ_ENABLE_CLR   0x10c
>+#define IRQ_STATUS0x184
>+#define IRQ_EOI 0x050
>+#define MSI_IRQ_OFFSET4
>+
>+static inline struct pcie_port *sys_to_pcie(struct pci_sys_data *sys) {
>+  return sys->private_data;
>+}
>+
>+static inline void update_reg_offset_bit_pos(u32 offset, u32 *reg_offset,
>+  u32 *bit_pos)
>+{
>+  *reg_offset = offset % 8;
>+  *bit_pos = offset >> 3;
>+}
>+
>+inline u32 dw_old_get_msi_data(struct pcie_port *pp) {
>+  return pp->app_base + MSI_IRQ;
>+}
>+
>+void dw_old_handle_msi_irq(struct pcie_port *pp, int offset) {
>+  u32 pending, vector;
>+  int src, virq;
>+
>+  pending = readl(pp->va_app_base + MSI0_IRQ_STATUS + (offset << 4));
>+  /*
>+   * MSI0, Status bit 0-3 shows vectors 0, 8, 16, 24, MSI1 status bit
>+   * shows 1, 9, 17, 25 and so forth
>+   */
>+  for (src = 0; src < 4; src++) {
>+  if (BIT(src) & pending) {
>+  vector = offset + (src << 3);
>+  virq = irq_linear_revmap(pp->irq_domain, vector);
>+  dev_dbg(pp->dev,
>+  "irq: bit %d, vector %d, virq %d\n",
>+   src, vector, virq);
>+  generic_handle_irq(virq);
>+  }
>+  }
>+}
>+
>+static void dw_old_msi_irq_ack(struct irq_data *d) {
>+  u32 offset, 

RE: [PATCH v1 5/5] pci: keystone: add pcie driver based on designware core driver

2014-05-16 Thread Karicheri, Muralidharan
Adding more people to the list for review.

>-----Original Message-----
>From: Karicheri, Muralidharan
>Sent: Thursday, May 15, 2014 12:02 PM
>To: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; linux-arm-
>ker...@lists.infradead.org
>Cc: Karicheri, Muralidharan; Shilimkar, Santosh; Mohit Kumar; Jingoo Han; 
>Bjorn Helgaas;
>Strashko, Grygorii
>Subject: [PATCH v1 5/5] pci: keystone: add pcie driver based on designware 
>core driver
>
>keystone pcie hardware is based on designware version 3.65.
>This driver makes use of the functions from pci-dw-old.c and pci-dw-old-msi.c
>to implement the driver.
>
>The driver mainly handles the platform specific part of the PCI driver and
>depends on the DW Old driver to configure application specific registers. It
>also routes the irq events and acks the interrupt after the same is acked by
>the end point device driver. This requires an irqchip implementation for
>legacy and MSI irq handling. This patch adds a quirk to override the max read
>request size as the PCI controller has a limit of 256 bytes.
>
>CC: Santosh Shilimkar 
>CC: Mohit Kumar 
>CC: Jingoo Han 
>CC: Bjorn Helgaas 
>
>Signed-off-by: Murali Karicheri 
>Signed-off-by: Grygorii Strashko 
>---
> .../devicetree/bindings/pci/pcie-keystone.txt  |   68 
> drivers/pci/host/Kconfig   |8 +
> drivers/pci/host/Makefile  |1 +
> drivers/pci/host/pci-keystone.c|  400 
> drivers/pci/quirks.c   |   13 +
> 5 files changed, 490 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/pci/pcie-keystone.txt
> create mode 100644 drivers/pci/host/pci-keystone.c
>
>diff --git a/Documentation/devicetree/bindings/pci/pcie-keystone.txt
>b/Documentation/devicetree/bindings/pci/pcie-keystone.txt
>new file mode 100644
>index 000..17cf261
>--- /dev/null
>+++ b/Documentation/devicetree/bindings/pci/pcie-keystone.txt
>@@ -0,0 +1,68 @@
>+Keystone PCIE Root complex device tree bindings
>+---
>+
>+Sample bindings shown below:-
>+
>+ - Remove ti,enable-linktrain if boot loader already does Link training and 
>do EP
>+   configuration.
>+ - Remove ti,init-phy if boot loader already initialize the phy and sets up 
>pcie
>+   link
>+
>+  pcie0_phy: pciephy@232 {
>+  #address-cells = <1>;
>+  #size-cells = <1>;
>+  #phy-cells = <0>;
>+  compatible = "ti,keystone-phy";
>+  reg = <0x0232 0x4000>;
>+  reg-names = "reg_serdes";
>+  };
>+
>+  pcie@2180 {
>+  compatible = "ti,keystone-pcie";
>+  device_type = "pci";
>+  clocks = <>;
>+  clock-names = "pcie";
>+  #address-cells = <3>;
>+  #size-cells = <2>;
>+  reg =  <0x2180 0x1000>, <0x0262014c 4>;
>+  reg-names = "reg_rc_app", "reg_devcfg";
>+  interrupts = ,
>+  ,
>+  ,
>+  ,
>+  ,
>+  ,
>+  ,
>+  ,
>+  ; /* 
>Error IRQ */
>+
>+  ranges = <0x0800 0 0x21801000 0x21801000 0 
>0x0002000/*
>Configuration space */
>+0x8100 0 0  0x2400 0 0x4000   
>/* downstream
>I/O */
>+0x8200 0 0x5000 0x5000 0 
>0x1000>; /*
>+non-prefetchable memory */
>+
>+  num-lanes = <2>;
>+  ti,enable-linktrain;
>+  ti,init-phy;
>+
>+  /* PCIE phy */
>+  phys = <&pcie0_phy>;
>+  phy-names = "pcie-phy";
>+
>+  #interrupt-cells = <1>;
>+  interrupt-map-mask = <0 0 0 0>;
>+  interrupt-map = <0 0 0 1 &pcie_intc 1>, // INT A
>+  <0 0 0 2 &pcie_intc 2>, // INT B
>+  <0 0 0 3 &pcie_intc 3>, // INT C
>+  <0 0 0 4 &pcie_intc 4>; // INT D
>+
>+  pcie_intc: legacy-interrupt-controller {
>+  interrupt-controller;
>+  #interrupt-cells = <1>;
>+  interrupt-parent = <>;
>+  interrupts = ,
>+  ,
>+  ,
>+  ;
>+  };
>+  };
>+
>diff --git a/drivers/pci/host/Kconfig 

RE: [PATCH v1 2/5] pci: designware: enhancements to support keystone pcie

2014-05-16 Thread Karicheri, Muralidharan
Adding more to list.

>-----Original Message-----
>From: Karicheri, Muralidharan
>Sent: Thursday, May 15, 2014 12:01 PM
>To: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; linux-arm-
>ker...@lists.infradead.org
>Cc: Karicheri, Muralidharan; Mohit Kumar; Jingoo Han; Bjorn Helgaas; 
>Shilimkar, Santosh
>Subject: [PATCH v1 2/5] pci: designware: enhancements to support keystone pcie
>
>keystone pcie hardware is based on designware hw version 3.65.
>It has no support for the ATU port and instead has registers in application
>space to configure inbound/outbound access. It also doesn't support the PCI
>PVM option. The MSI IRQ registers available in application space are used to
>mask/unmask/enable the MSI IRQs.
>
>DW core driver is a set of common functions that are abstracted to support DW
>pci drivers. To allow re-use of these functions for keystone pci driver, the
>core driver is to be enhanced.
>
>Following are done to allow re-use of the functions on keystone pci driver.
>
> 1. Some of the variables in pcie_port struct are folded inside
>a union that now contains both new DW hw related variables as well
>as old hardware related variables such as application reg base.
> 2. Added a dw_pcie_common_host_init() function that holds common
>host initialization code for old and new hw.
> 3. dw_pcie_parse_resource() is used for parsing resource related
>information from DT bindings.
> 4. dw_pcie_host_init() is called by new DW hw drivers as before.
>Added dw_old_pcie_host_init() as its counterpart on old dw hw.
>Both these functions now call dw_pcie_common_host_init().
> 5. Some of the static functions are made global to allow use from
>dw old pci drivers such as pci-keystone.
>
>CC: Mohit Kumar 
>CC: Jingoo Han 
>CC: Bjorn Helgaas 
>CC: Santosh Shilimkar 
>
>Signed-off-by: Murali Karicheri 
>---
> drivers/pci/host/pcie-designware.c |  101 
> drivers/pci/host/pcie-designware.h |   42 ---
> 2 files changed, 103 insertions(+), 40 deletions(-)
>
>diff --git a/drivers/pci/host/pcie-designware.c 
>b/drivers/pci/host/pcie-designware.c
>index c4e3732..9ea8e79 100644
>--- a/drivers/pci/host/pcie-designware.c
>+++ b/drivers/pci/host/pcie-designware.c
>@@ -277,11 +277,15 @@ static int assign_irq(int no_irqs, struct msi_desc 
>*desc, int *pos)
>   }
>   set_bit(pos0 + i, pp->msi_irq_in_use);
>   /*Enable corresponding interrupt in MSI interrupt controller */
>-  res = ((pos0 + i) / 32) * 12;
>-  bit = (pos0 + i) % 32;
>-  dw_pcie_rd_own_conf(pp, PCIE_MSI_INTR0_ENABLE + res, 4, &val);
>-  val |= 1 << bit;
>-  dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_ENABLE + res, 4, val);
>+  if (!(pp->version & DW_VERSION_OLD)) {
>+  res = ((pos0 + i) / 32) * 12;
>+  bit = (pos0 + i) % 32;
>+  dw_pcie_rd_own_conf(pp, PCIE_MSI_INTR0_ENABLE + res,
>+   4, &val);
>+  val |= 1 << bit;
>+  dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_ENABLE + res,
>+   4, val);
>+  }
>   }
>
>   *pos = pos0;
>@@ -349,7 +353,10 @@ static int dw_msi_setup_irq(struct msi_chip *chip, struct 
>pci_dev
>*pdev,
>*/
>   desc->msi_attrib.multiple = msgvec;
>
>-  msg.address_lo = virt_to_phys((void *)pp->msi_data);
>+  if (pp->ops->get_msi_data)
>+  msg.address_lo = pp->ops->get_msi_data(pp);
>+  else
>+  msg.address_lo = virt_to_phys((void *)pp->msi_data);
>   msg.address_hi = 0x0;
>   msg.data = pos;
>   write_msi_msg(irq, &msg);
>@@ -389,13 +396,11 @@ static const struct irq_domain_ops msi_domain_ops = {
>   .map = dw_pcie_msi_map,
> };
>
>-int __init dw_pcie_host_init(struct pcie_port *pp)
>+int __init dw_pcie_parse_resource(struct pcie_port *pp)
> {
>   struct device_node *np = pp->dev->of_node;
>-  struct of_pci_range range;
>   struct of_pci_range_parser parser;
>-  u32 val;
>-  int i;
>+  struct of_pci_range range;
>
>   if (of_pci_range_parser_init(&parser, np)) {
>   dev_err(pp->dev, "missing ranges property\n");
>@@ -440,23 +445,17 @@ int __init dw_pcie_host_init(struct pcie_port *pp)
>   return -ENOMEM;
>   }
>   }
>-
>-  pp->cfg0_base = pp->cfg.start;
>-  pp->cfg1_base = pp->cfg.start + pp->config.cfg0_size;
>   pp->mem_base = pp->mem.start;
>
>-  pp->va_cfg0_base = devm_ioremap(pp->dev, pp->cfg0_base,
>-  pp->config.cfg0_size);
>-  if (!pp->va_cfg0_base) {
>-  dev_err(pp->dev, "error with ioremap in function\n");
>-  return -ENOMEM;
>-  }
>-  pp->va_cfg1_base = devm_ioremap(pp->dev, pp->cfg1_base,
>-  pp->config.cfg1_size);
>-  if 

[PATCH v4 1/5] efi: Introduce EFI_DIRECT flag

2014-05-16 Thread Daniel Kiper
Introduce EFI_DIRECT flag. If it is set, the Linux kernel has direct
access to the EFI infrastructure. If it is not set, the kernel runs on
an EFI platform but does not have direct control over EFI. This
functionality is used in Xen dom0.

Signed-off-by: Daniel Kiper 
---
 arch/x86/kernel/setup.c |2 ++
 arch/x86/platform/efi/efi.c |   58 ---
 include/linux/efi.h |   13 +-
 3 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 09c76d2..f41f648 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -926,9 +926,11 @@ void __init setup_arch(char **cmdline_p)
if (!strncmp((char *)&boot_params.efi_info.efi_loader_signature,
 "EL32", 4)) {
set_bit(EFI_BOOT, &efi.flags);
+   set_bit(EFI_DIRECT, &efi.flags);
} else if (!strncmp((char *)&boot_params.efi_info.efi_loader_signature,
 "EL64", 4)) {
set_bit(EFI_BOOT, &efi.flags);
+   set_bit(EFI_DIRECT, &efi.flags);
set_bit(EFI_64BIT, &efi.flags);
}
 
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 3781dd3..7fcef06 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -352,6 +352,9 @@ int __init efi_memblock_x86_reserve_range(void)
struct efi_info *e = &boot_params.efi_info;
unsigned long pmap;
 
+   if (!efi_enabled(EFI_DIRECT))
+   return 0;
+
 #ifdef CONFIG_X86_32
/* Can't handle data above 4GB at this time */
if (e->efi_memmap_hi) {
@@ -457,6 +460,21 @@ void __init efi_free_boot_services(void)
efi_unmap_memmap();
 }
 
+static void __init __iomem *efi_early_ioremap(resource_size_t phys_addr,
+   unsigned long size)
+{
+   if (efi_enabled(EFI_DIRECT))
+   return early_ioremap(phys_addr, size);
+
+   return (__force void __iomem *)phys_addr;
+}
+
+static void __init efi_early_iounmap(void __iomem *addr, unsigned long size)
+{
+   if (efi_enabled(EFI_DIRECT))
+   early_iounmap(addr, size);
+}
+
 static int __init efi_systab_init(void *phys)
 {
if (efi_enabled(EFI_64BIT)) {
@@ -469,8 +487,8 @@ static int __init efi_systab_init(void *phys)
if (!data)
return -ENOMEM;
}
-   systab64 = early_ioremap((unsigned long)phys,
-sizeof(*systab64));
+   systab64 = efi_early_ioremap((unsigned long)phys,
+   sizeof(*systab64));
if (systab64 == NULL) {
pr_err("Couldn't map the system table!\n");
if (data)
@@ -506,7 +524,7 @@ static int __init efi_systab_init(void *phys)
   systab64->tables;
tmp |= data ? data->tables : systab64->tables;
 
-   early_iounmap(systab64, sizeof(*systab64));
+   efi_early_iounmap(systab64, sizeof(*systab64));
if (data)
early_iounmap(data, sizeof(*data));
 #ifdef CONFIG_X86_32
@@ -518,8 +536,8 @@ static int __init efi_systab_init(void *phys)
} else {
efi_system_table_32_t *systab32;
 
-   systab32 = early_ioremap((unsigned long)phys,
-sizeof(*systab32));
+   systab32 = efi_early_ioremap((unsigned long)phys,
+   sizeof(*systab32));
if (systab32 == NULL) {
pr_err("Couldn't map the system table!\n");
return -ENOMEM;
@@ -539,7 +557,7 @@ static int __init efi_systab_init(void *phys)
efi_systab.nr_tables = systab32->nr_tables;
efi_systab.tables = systab32->tables;
 
-   early_iounmap(systab32, sizeof(*systab32));
+   efi_early_iounmap(systab32, sizeof(*systab32));
}
 
efi.systab = _systab;
@@ -619,13 +637,16 @@ static int __init efi_runtime_init(void)
 * address of several of the EFI runtime functions, needed to
 * set the firmware into virtual mode.
 */
-   if (efi_enabled(EFI_64BIT))
-   rv = efi_runtime_init64();
-   else
-   rv = efi_runtime_init32();
 
-   if (rv)
-   return rv;
+   if (efi_enabled(EFI_DIRECT)) {
+   if (efi_enabled(EFI_64BIT))
+   rv = efi_runtime_init64();
+   else
+   rv = efi_runtime_init32();
+
+   if (rv)
+   return rv;
+   }
 
set_bit(EFI_RUNTIME_SERVICES, );
 
@@ -634,6 +655,9 @@ static int __init efi_runtime_init(void)
 
 static int __init efi_memmap_init(void)
 {
+   if (!efi_enabled(EFI_DIRECT))
+ 

[PATCH v4 5/5] arch/x86: Remove redundant set_bit() call

2014-05-16 Thread Daniel Kiper
Remove redundant set_bit(EFI_SYSTEM_TABLES, ) call.
It is executed earlier in efi_systab_init().

Signed-off-by: Daniel Kiper 
---
 arch/x86/platform/efi/efi.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 7fcef06..e7008f0 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -754,8 +754,6 @@ void __init efi_init(void)
if (efi_systab_init(efi_phys.systab))
return;
 
-   set_bit(EFI_SYSTEM_TABLES, );
-
efi.config_table = (unsigned long)efi.systab->tables;
efi.fw_vendor= (unsigned long)efi.systab->fw_vendor;
efi.runtime  = (unsigned long)efi.systab->runtime;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 0/5] xen: Add EFI support

2014-05-16 Thread Daniel Kiper
Hey,

This patch series adds EFI support for Xen dom0 guests.
It is based on work by Jan Beulich and Tang Liang. I have
tried to take all previous comments into account; if I
missed something, I apologize.

I am still not sure what to do with the /sys/firmware/efi/config_table and
/sys/firmware/efi/{fw_vendor,runtime,systab} files. On bare metal
they contain physical addresses of the relevant structures. However,
in the Xen case they do not make sense. So maybe they should contain
invalid values (e.g. 0) or should not appear at all on Xen (I prefer
the latter). What do you think?

Daniel

 arch/x86/kernel/setup.c  |6 +-
 arch/x86/platform/efi/efi.c  |   60 ++
 arch/x86/xen/enlighten.c |   26 ++
 drivers/xen/Kconfig  |3 +
 drivers/xen/Makefile |1 +
 drivers/xen/efi.c|  374 +++
 include/linux/efi.h  |   13 +--
 include/xen/interface/platform.h |  123 +++
 8 files changed, 582 insertions(+), 24 deletions(-)

Daniel Kiper (5):
  efi: Introduce EFI_DIRECT flag
  xen: Define EFI related stuff
  xen: Put EFI machinery in place
  arch/x86: Replace plain strings with constants
  arch/x86: Remove redundant set_bit() call



[PATCH v4 2/5] xen: Define EFI related stuff

2014-05-16 Thread Daniel Kiper
Define EFI related stuff for Xen.

This patch is based on work by Jan Beulich and Tang Liang.

v4 - suggestions/fixes:
   - change some types from generic to Xen specific ones
 (suggested by Stefano Stabellini),
   - do some formatting changes
 (suggested by Jan Beulich).

Signed-off-by: Jan Beulich 
Signed-off-by: Tang Liang 
Signed-off-by: Daniel Kiper 
---
 include/xen/interface/platform.h |  123 ++
 1 file changed, 123 insertions(+)

diff --git a/include/xen/interface/platform.h b/include/xen/interface/platform.h
index f1331e3..55deeb7 100644
--- a/include/xen/interface/platform.h
+++ b/include/xen/interface/platform.h
@@ -108,11 +108,113 @@ struct xenpf_platform_quirk {
 };
 DEFINE_GUEST_HANDLE_STRUCT(xenpf_platform_quirk_t);
 
+#define XENPF_efi_runtime_call                49
+#define XEN_EFI_get_time                       1
+#define XEN_EFI_set_time                       2
+#define XEN_EFI_get_wakeup_time                3
+#define XEN_EFI_set_wakeup_time                4
+#define XEN_EFI_get_next_high_monotonic_count  5
+#define XEN_EFI_get_variable                   6
+#define XEN_EFI_set_variable                   7
+#define XEN_EFI_get_next_variable_name         8
+#define XEN_EFI_query_variable_info            9
+#define XEN_EFI_query_capsule_capabilities    10
+#define XEN_EFI_update_capsule                11
+
+struct xenpf_efi_runtime_call {
+   uint32_t function;
+   /*
+* This field is generally used for per sub-function flags (defined
+* below), except for the XEN_EFI_get_next_high_monotonic_count case,
+* where it holds the single returned value.
+*/
+   uint32_t misc;
+   xen_ulong_t status;
+   union {
+#define XEN_EFI_GET_TIME_SET_CLEARS_NS 0x0001
+   struct {
+   struct xenpf_efi_time {
+   uint16_t year;
+   uint8_t month;
+   uint8_t day;
+   uint8_t hour;
+   uint8_t min;
+   uint8_t sec;
+   uint32_t ns;
+   int16_t tz;
+   uint8_t daylight;
+   } time;
+   uint32_t resolution;
+   uint32_t accuracy;
+   } get_time;
+
+   struct xenpf_efi_time set_time;
+
+#define XEN_EFI_GET_WAKEUP_TIME_ENABLED 0x0001
+#define XEN_EFI_GET_WAKEUP_TIME_PENDING 0x0002
+   struct xenpf_efi_time get_wakeup_time;
+
+#define XEN_EFI_SET_WAKEUP_TIME_ENABLE  0x0001
+#define XEN_EFI_SET_WAKEUP_TIME_ENABLE_ONLY 0x0002
+   struct xenpf_efi_time set_wakeup_time;
+
+#define XEN_EFI_VARIABLE_NON_VOLATILE   0x0001
+#define XEN_EFI_VARIABLE_BOOTSERVICE_ACCESS 0x0002
+#define XEN_EFI_VARIABLE_RUNTIME_ACCESS 0x0004
+   struct {
+   GUEST_HANDLE(void) name;  /* UCS-2/UTF-16 string */
+   xen_ulong_t size;
+   GUEST_HANDLE(void) data;
+   struct xenpf_efi_guid {
+   uint32_t data1;
+   uint16_t data2;
+   uint16_t data3;
+   uint8_t data4[8];
+   } vendor_guid;
+   } get_variable, set_variable;
+
+   struct {
+   xen_ulong_t size;
+   GUEST_HANDLE(void) name;  /* UCS-2/UTF-16 string */
+   struct xenpf_efi_guid vendor_guid;
+   } get_next_variable_name;
+
+   struct {
+   uint32_t attr;
+   uint64_t max_store_size;
+   uint64_t remain_store_size;
+   uint64_t max_size;
+   } query_variable_info;
+
+   struct {
+   GUEST_HANDLE(void) capsule_header_array;
+   xen_ulong_t capsule_count;
+   uint64_t max_capsule_size;
+   uint32_t reset_type;
+   } query_capsule_capabilities;
+
+   struct {
+   GUEST_HANDLE(void) capsule_header_array;
+   xen_ulong_t capsule_count;
+   uint64_t sg_list; /* machine address */
+   } update_capsule;
+   } u;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_efi_runtime_call);
+
+#define  XEN_FW_EFI_VERSION        0
+#define  XEN_FW_EFI_CONFIG_TABLE   1
+#define  XEN_FW_EFI_VENDOR         2
+#define  XEN_FW_EFI_MEM_INFO       3
+#define  XEN_FW_EFI_RT_VERSION     4
+
 #define XENPF_firmware_info       50
 #define XEN_FW_DISK_INFO          1 /* from int 13 AH=08/41/48 */
 #define XEN_FW_DISK_MBR_SIGNATURE 2 /* from MBR offset 0x1b8 */
 #define XEN_FW_VBEDDC_INFO        3 /* from int 10 AX=4f15 */
+#define XEN_FW_EFI_INFO           4 /* from EFI */
 #define XEN_FW_KBD_SHIFT_FLAGS    5 /* Int16, Fn02: Get keyboard shift flags. */
+
 struct xenpf_firmware_info {
/* IN variables. */
uint32_t type;
@@ -144,6 +246,26 @@ struct xenpf_firmware_info {
GUEST_HANDLE(uchar) edid;
} vbeddc_info; /* 

[PATCH v4 4/5] arch/x86: Replace plain strings with constants

2014-05-16 Thread Daniel Kiper
Signed-off-by: Daniel Kiper 
---
 arch/x86/kernel/setup.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f41f648..7a67f5d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -924,11 +924,11 @@ void __init setup_arch(char **cmdline_p)
 #endif
 #ifdef CONFIG_EFI
if (!strncmp((char *)_params.efi_info.efi_loader_signature,
-"EL32", 4)) {
+   EFI32_LOADER_SIGNATURE, 4)) {
set_bit(EFI_BOOT, );
set_bit(EFI_DIRECT, );
} else if (!strncmp((char *)_params.efi_info.efi_loader_signature,
-"EL64", 4)) {
+   EFI64_LOADER_SIGNATURE, 4)) {
set_bit(EFI_BOOT, );
set_bit(EFI_DIRECT, );
set_bit(EFI_64BIT, );
-- 
1.7.10.4



[PATCH v4 3/5] xen: Put EFI machinery in place

2014-05-16 Thread Daniel Kiper
Put EFI machinery for Xen in place.

This patch is based on work by Jan Beulich and Tang Liang.

v4 - suggestions/fixes:
   - "just populate an efi_system_table_t object"
 (suggested by Matt Fleming).

Signed-off-by: Jan Beulich 
Signed-off-by: Tang Liang 
Signed-off-by: Daniel Kiper 
---
 arch/x86/xen/enlighten.c |   26 
 drivers/xen/Kconfig  |3 +
 drivers/xen/Makefile |1 +
 drivers/xen/efi.c|  374 ++
 4 files changed, 404 insertions(+)
 create mode 100644 drivers/xen/efi.c

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c34bfc4..d5cc21f 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -150,6 +151,15 @@ struct shared_info *HYPERVISOR_shared_info = 
_dummy_shared_info;
  */
 static int have_vcpu_info_placement = 1;
 
+#ifdef CONFIG_XEN_EFI
+extern efi_system_table_t __init *xen_efi_probe(void);
+#else
+static efi_system_table_t __init *xen_efi_probe(void)
+{
+   return NULL;
+}
+#endif
+
 struct tls_descs {
struct desc_struct desc[3];
 };
@@ -1519,6 +1529,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
 {
struct physdev_set_iopl set_iopl;
int rc;
+   efi_system_table_t *efi_systab_xen;
 
if (!xen_start_info)
return;
@@ -1714,6 +1725,21 @@ asmlinkage __visible void __init xen_start_kernel(void)
 
xen_setup_runstate_info(0);
 
+   efi_systab_xen = xen_efi_probe();
+
+   if (efi_systab_xen) {
+   strncpy((char *)_params.efi_info.efi_loader_signature, "Xen",
+   sizeof(boot_params.efi_info.efi_loader_signature));
+   boot_params.efi_info.efi_systab = (__u32)((__u64)efi_systab_xen);
+   boot_params.efi_info.efi_systab_hi = (__u32)((__u64)efi_systab_xen >> 32);
+
+   x86_platform.get_wallclock = efi_get_time;
+   x86_platform.set_wallclock = efi_set_rtc_mmss;
+
+   set_bit(EFI_BOOT, );
+   set_bit(EFI_64BIT, );
+   }
+
/* Start the world */
 #ifdef CONFIG_X86_32
i386_start_kernel();
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 38fb36e..cead283 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -240,4 +240,7 @@ config XEN_MCE_LOG
 config XEN_HAVE_PVMMU
bool
 
+config XEN_EFI
+   def_bool X86_64 && EFI
+
 endmenu
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 45e00af..c35de02 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_XEN_STUB)+= xen-stub.o
 obj-$(CONFIG_XEN_ACPI_HOTPLUG_MEMORY)  += xen-acpi-memhotplug.o
 obj-$(CONFIG_XEN_ACPI_HOTPLUG_CPU) += xen-acpi-cpuhotplug.o
 obj-$(CONFIG_XEN_ACPI_PROCESSOR)   += xen-acpi-processor.o
+obj-$(CONFIG_XEN_EFI)  += efi.o
 xen-evtchn-y   := evtchn.o
 xen-gntdev-y   := gntdev.o
 xen-gntalloc-y := gntalloc.o
diff --git a/drivers/xen/efi.c b/drivers/xen/efi.c
new file mode 100644
index 000..d81458a
--- /dev/null
+++ b/drivers/xen/efi.c
@@ -0,0 +1,374 @@
+/*
+ * EFI support for Xen.
+ *
+ * Copyright (C) 1999 VA Linux Systems
+ * Copyright (C) 1999 Walt Drummond 
+ * Copyright (C) 1999-2002 Hewlett-Packard Co.
+ * David Mosberger-Tang 
+ * Stephane Eranian 
+ * Copyright (C) 2005-2008 Intel Co.
+ * Fenghua Yu 
+ * Bibo Mao 
+ * Chandramouli Narayanan 
+ * Huang Ying 
+ * Copyright (C) 2011 Novell Co.
+ * Jan Beulich 
+ * Copyright (C) 2011-2012 Oracle Co.
+ * Liang Tang 
+ * Copyright (c) 2014 Daniel Kiper, Oracle Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+
+#define call (op.u.efi_runtime_call)
+#define DECLARE_CALL(what) \
+   struct xen_platform_op op; \
+   op.cmd = XENPF_efi_runtime_call; \
+   call.function = XEN_EFI_##what; \
+   call.misc = 0
+
+static efi_status_t xen_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
+{
+   int err;
+   DECLARE_CALL(get_time);
+
+   err = HYPERVISOR_dom0_op();
+   if (err)
+   return EFI_UNSUPPORTED;
+
+   if (tm) {
+   BUILD_BUG_ON(sizeof(*tm) != sizeof(call.u.get_time.time));
+   memcpy(tm, _time.time, sizeof(*tm));
+   }
+
+   if (tc) {
+   tc->resolution = call.u.get_time.resolution;
+   tc->accuracy = call.u.get_time.accuracy;
+   tc->sets_to_zero = !!(call.misc &
+ XEN_EFI_GET_TIME_SET_CLEARS_NS);
+   }
+
+   return call.status;
+}
+
+static efi_status_t xen_efi_set_time(efi_time_t *tm)
+{
+   DECLARE_CALL(set_time);
+
+   BUILD_BUG_ON(sizeof(*tm) != sizeof(call.u.set_time));
+   memcpy(_time, tm, sizeof(*tm));
+
+   return 

RE: [PATCH v1 0/5] Add Keystone PCIe controller driver

2014-05-16 Thread Karicheri, Muralidharan
>-Original Message-
>From: Jingoo Han [mailto:jg1@samsung.com]
>Sent: Thursday, May 15, 2014 8:49 PM
>To: Karicheri, Muralidharan
>Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org;
>linux-arm-ker...@lists.infradead.org; Shilimkar, Santosh; 'Russell King';
>'Grant Likely'; 'Rob Herring'; 'Mohit Kumar'; 'Bjorn Helgaas'; 'Jingoo Han';
>'Pratyush Anand'; 'Richard Zhu'; ABRAHAM, KISHON VIJAY; 'Marek Vasut'
>Subject: Re: [PATCH v1 0/5] Add Keystone PCIe controller driver
>
>On Friday, May 16, 2014 1:01 AM, Murali Karicheri wrote:
>>
>> This patch adds a PCIe controller driver for Keystone SoCs. This is
>> based on the original RFC patch that I had sent earlier. I have
>> incorporated the following comments:
>>
>>  - Add an interrupt controller node for the legacy IRQ chip and use the
>>    interrupt map/map-mask property to map legacy IRQs A/B/C/D
>>  - Add a PHY driver to replace the original serdes driver
>>  - Move common application register handling code to a separate
>>    file to allow re-use across other platforms that use older
>>    DW PCIe h/w
>>  - PCI quirk for maximum read request size. Check and override only
>>    if the maximum is higher than what the controller can handle.
>>  - Converted to a module platform driver.
>>
>> CC: Santosh Shilimkar 
>> CC: Russell King 
>> CC: Grant Likely 
>> CC: Rob Herring 
>> CC: Mohit Kumar 
>> CC: Jingoo Han 
>> CC: Bjorn Helgaas 
>>
>
>Your patches modify 'pcie-designware.c' and affect other PCIe drivers
>using the DesignWare PCIe core IP. Please add the following people to
>the CC list. They are also involved with the DesignWare PCIe core.
>
>  Pratyush Anand 
>  Richard Zhu 
>  Kishon Vijay Abraham I 
>  Marek Vasut 
>
Will forward the patches to the above list as well.

Thanks

Murali
>Best regards,
>Jingoo Han
>
>>
>> Murali Karicheri (5):
>>   ARM: keystone: add pcie related options
>>   pci: designware: enhancements to support keystone pcie
>>   phy: pci serdes phy driver for keystone
>>   pci: dw: add common functions to support old hw based pci driver
>>   pci: keystone: add pcie driver based on designware core driver
>>
>>  .../devicetree/bindings/pci/pcie-keystone.txt  |   68 
>>  arch/arm/mach-keystone/Kconfig |2 +
>>  drivers/pci/host/Kconfig   |   12 +
>>  drivers/pci/host/Makefile  |2 +
>>  drivers/pci/host/pci-dw-old-msi.c  |  150 
>>  drivers/pci/host/pci-dw-old.c  |  371 ++
>>  drivers/pci/host/pci-dw-old.h  |   30 ++
>>  drivers/pci/host/pci-keystone.c|  400 
>> 
>>  drivers/pci/host/pcie-designware.c |  101 +++--
>>  drivers/pci/host/pcie-designware.h |   42 +-
>>  drivers/pci/quirks.c   |   13 +
>>  drivers/phy/Kconfig|6 +
>>  drivers/phy/Makefile   |1 +
>>  drivers/phy/phy-keystone.c |  230 +++
>>  14 files changed, 1388 insertions(+), 40 deletions(-)
>>  create mode 100644 Documentation/devicetree/bindings/pci/pcie-keystone.txt
>>  create mode 100644 drivers/pci/host/pci-dw-old-msi.c
>>  create mode 100644 drivers/pci/host/pci-dw-old.c
>>  create mode 100644 drivers/pci/host/pci-dw-old.h
>>  create mode 100644 drivers/pci/host/pci-keystone.c
>>  create mode 100644 drivers/phy/phy-keystone.c
>>
>> --
>> 1.7.9.5



[PATCH V2 mmots] fs/hfsplus/wrapper.c: replace shift loop by fls

2014-05-16 Thread Fabian Frederick
Replace the while loop that shifts blocksize to count alloc_blksz_shift
with fls(blocksize) - 1.

Suggested-By: Joe Perches 
Cc: Andrew Morton 
Signed-off-by: Fabian Frederick 
---
v2: rebased on top of mmots
compiles without including bitops

 fs/hfsplus/wrapper.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/hfsplus/wrapper.c b/fs/hfsplus/wrapper.c
index 284c90f..8e8cf50d 100644
--- a/fs/hfsplus/wrapper.c
+++ b/fs/hfsplus/wrapper.c
@@ -231,9 +231,7 @@ reread:
if (blocksize < HFSPLUS_SECTOR_SIZE || ((blocksize - 1) & blocksize))
goto out_free_backup_vhdr;
sbi->alloc_blksz = blocksize;
-   sbi->alloc_blksz_shift = 0;
-   while ((blocksize >>= 1) != 0)
-   sbi->alloc_blksz_shift++;
+   sbi->alloc_blksz_shift = fls(blocksize) - 1;
blocksize = min_t(u32, sbi->alloc_blksz, PAGE_SIZE);
 
/*
-- 
1.8.4.5
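
For readers following the thread, here is a minimal userspace sketch of why
the two forms give the same shift. It is not part of the patch: fls_sketch()
is a hypothetical stand-in for the kernel's fls() (1-based index of the most
significant set bit, 0 for an input of 0), and the starting block size of 512
in shifts_agree() is an assumption made for illustration. The check just
above the changed lines already rejects block sizes that are too small or not
a power of two, so a zero blocksize never reaches this code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's fls(): returns the
 * 1-based position of the most significant set bit, or 0 when x == 0.
 */
static int fls_sketch(uint32_t x)
{
	int pos = 0;

	while (x != 0) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* The open-coded loop that the patch removes. */
static int shift_by_loop(uint32_t blocksize)
{
	int shift = 0;

	while ((blocksize >>= 1) != 0)
		shift++;
	return shift;
}

/* The replacement: fls() - 1. Only meaningful for a non-zero blocksize. */
static int shift_by_fls(uint32_t blocksize)
{
	assert(blocksize != 0);
	return fls_sketch(blocksize) - 1;
}

/* Returns 1 if both forms agree for every power of two from 512 to 2^31. */
static int shifts_agree(void)
{
	uint32_t bs;

	/* bs wraps to 0 after 2^31, which ends the loop. */
	for (bs = 512; bs != 0; bs <<= 1)
		if (shift_by_loop(bs) != shift_by_fls(bs))
			return 0;
	return 1;
}
```

Calling shifts_agree() from a small test program is enough to convince
oneself that the conversion does not change the computed shift for any
block size that passes the earlier power-of-two check.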



Re: Status of Power Supply Subsystem

2014-05-16 Thread Rafael J. Wysocki
On Friday, May 16, 2014 08:38:14 PM Sebastian Reichel wrote:
> Hi,

Hi,

> It seems the maintainer of the power supply subsystem, Dmitry, has gone
> missing in action since about mid-February. I couldn't find any mail from
> him on the usual mailing lists, he did not reply to any of my mails, and
> the power supply subsystem git tree mentioned in the MAINTAINERS file has
> not been touched since 2014-02-01 [0].
> 
> Is there a standard procedure for handling subsystem maintainers who are
> missing in action? I have patches for the power supply subsystem and have
> seen more patches from other developers on the mailing lists.
> 
> [0] git://git.infradead.org/battery-2.6.git

You can volunteer to take over the maintainership, but then I guess it would
make sense to push this through the PM tree anyway, so please feel free to
send a pull request with those fixes to me.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.



Re: [Qemu-devel] [PATCH v1 RFC 3/6] KVM: s390: use facilities and cpu_id per KVM

2014-05-16 Thread Alexander Graf


On 16.05.14 18:09, Michael Mueller wrote:

On Fri, 16 May 2014 16:49:37 +0200
Alexander Graf  wrote:


On 16.05.14 16:46, Michael Mueller wrote:

On Fri, 16 May 2014 13:55:41 +0200
Alexander Graf  wrote:


On 13.05.14 16:58, Michael Mueller wrote:

The patch introduces facilities and cpu_ids per virtual machine.
Different virtual machines may want to expose different facilities and
cpu ids to the guest, so let's make them per-vm instead of global.

In addition this patch renames all occurrences of *facilities to *fac_list,
similar to the already existing symbol stfl_fac_list in lowcore.

Signed-off-by: Michael Mueller 
Acked-by: Cornelia Huck 
Reviewed-by: Christian Borntraeger 
---
arch/s390/include/asm/kvm_host.h |   7 +++
arch/s390/kvm/gaccess.c  |   4 +-
arch/s390/kvm/kvm-s390.c | 107 +++
arch/s390/kvm/kvm-s390.h |  23 +++--
arch/s390/kvm/priv.c |  13 +++--
5 files changed, 113 insertions(+), 41 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 38d487a..b4751ba 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -414,6 +414,12 @@ struct kvm_s390_config {
struct kvm_s390_attr_name name;
};

+struct kvm_s390_cpu_model {

+   unsigned long *sie_fac;
+   struct cpuid cpu_id;
+   unsigned long *fac_list;
+};
+
struct kvm_arch{
struct sca_block *sca;
debug_info_t *dbf;
@@ -427,6 +433,7 @@ struct kvm_arch{
wait_queue_head_t ipte_wq;
struct kvm_s390_config *cfg;
spinlock_t start_stop_lock;
+   struct kvm_s390_cpu_model model;
};

#define KVM_HVA_ERR_BAD		(-1UL)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index db608c3..4c7ca40 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -358,8 +358,8 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
	union asce asce;

	ctlreg0.val = vcpu->arch.sie_block->gcr[0];

-   edat1 = ctlreg0.edat && test_vfacility(8);
-   edat2 = edat1 && test_vfacility(78);
+   edat1 = ctlreg0.edat && test_kvm_facility(vcpu->kvm, 8);
+   edat2 = edat1 && test_kvm_facility(vcpu->kvm, 78);
asce.val = get_vcpu_asce(vcpu);
if (asce.r)
goto real_address;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 01a5212..a53652f 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1,5 +1,5 @@
/*
- * hosting zSeries kernel virtual machines
+ * Hosting zSeries kernel virtual machines
 *
 * Copyright IBM Corp. 2008, 2009
 *
@@ -30,7 +30,6 @@
#include 
#include 
#include 
-#include 
#include 
#include
#include "kvm-s390.h"
@@ -92,15 +91,33 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
};

-unsigned long *vfacilities;

-static struct gmap_notifier gmap_notifier;
+/* upper facilities limit for kvm */
+unsigned long kvm_s390_fac_list_mask[] = {
+   0xff82fff3f47c2000UL,
+   0x005cUL,
+};
+
+unsigned long kvm_s390_fac_list_mask_size(void)
+{
+   BUILD_BUG_ON(ARRAY_SIZE(kvm_s390_fac_list_mask) >
+S390_ARCH_FAC_MASK_SIZE_U64);
+   return ARRAY_SIZE(kvm_s390_fac_list_mask);
+}

-/* test availability of vfacility */

-int test_vfacility(unsigned long nr)
+void kvm_s390_apply_fac_list_mask(unsigned long fac_list[])
{
-   return __test_facility(nr, (void *) vfacilities);
+   unsigned int i;
+
+   for (i = 0; i < S390_ARCH_FAC_LIST_SIZE_U64; i++) {
+   if (i < kvm_s390_fac_list_mask_size())
+   fac_list[i] &= kvm_s390_fac_list_mask[i];
+   else
+   fac_list[i] &= 0UL;
+   }
}

+static struct gmap_notifier gmap_notifier;

+
/* Section: not file related */
int kvm_arch_hardware_enable(void *garbage)
{
@@ -485,6 +502,30 @@ long kvm_arch_vm_ioctl(struct file *filp,
return r;
}

+/* make sure the memory used for fac_list is zeroed */

+void kvm_s390_get_hard_fac_list(unsigned long *fac_list, int size)

Hard? Wouldn't "host" make more sense here?

Renaming "*hard_fac_list" to "*host_fac_list" here and wherever it appears is
ok with me.


I also think it makes sense to expose the native host facility list to
user space via an ioctl somehow.


In which situation do you need the full facility list? Do you have an example?

If you want to just implement -cpu host to get the exact feature set
that the host gives you, how do you know which set that is?

During qemu machine initialization I call:

kvm_s390_get_machine_props();

which returns the following information:

typedef struct S390MachineProps {
 uint64_t cpuid;
 uint32_t ibc_range;
 uint8_t  pad[4];
 uint64_t fac_mask[S390_ARCH_FAC_MASK_SIZE_UINT64];
 uint64_t 

Re: [PATCH v1 RFC 6/6] KVM: s390: add cpu model support

2014-05-16 Thread Alexander Graf


On 16.05.14 17:39, Michael Mueller wrote:

On Fri, 16 May 2014 14:08:24 +0200
Alexander Graf  wrote:


On 13.05.14 16:58, Michael Mueller wrote:

This patch enables cpu model support in kvm/s390 via the vm attribute
interface.

During KVM initialization, the host properties cpuid, IBC value and the
facility list are stored in the architecture specific cpu model structure.

During vcpu setup, these properties are used to initialize the related SIE
state. This mechanism allows the properties to be adjusted from user space and
thus different selectable cpu models to be implemented.

This patch uses the IBC functionality to block instructions that have not
been implemented at the requested CPU type and GA level compared to the
full host capability.

Userspace has to initialize the cpu model before vcpu creation. A cpu model
change of running vcpus is currently not possible.

Why is this VM global? It usually fits a lot better modeling wise when
CPU types are vcpu properties.

It simplifies the code substantially because it inherently guarantees that the
vcpus are configured identically. In addition, there is no S390 hardware
implementation containing inhomogeneous processor types. Thus I consider the
properties machine specific.


Signed-off-by: Michael Mueller 
---
   arch/s390/include/asm/kvm_host.h |   4 +-
   arch/s390/include/uapi/asm/kvm.h |  23 ++
   arch/s390/kvm/kvm-s390.c | 146 ++-
   arch/s390/kvm/kvm-s390.h |   1 +
   4 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index b4751ba..6b826cb 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -84,7 +84,8 @@ struct kvm_s390_sie_block {
atomic_t cpuflags;  /* 0x */
__u32 : 1;  /* 0x0004 */
__u32 prefix : 18;
-   __u32 : 13;
+   __u32 : 1;
+   __u32 ibc : 12;
__u8reserved08[4];  /* 0x0008 */
   #define PROG_IN_SIE (1<<0)
__u32   prog0c; /* 0x000c */
@@ -418,6 +419,7 @@ struct kvm_s390_cpu_model {
unsigned long *sie_fac;
struct cpuid cpu_id;
unsigned long *fac_list;
+   unsigned short ibc;
   };
   
   struct kvm_arch{

diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 313100a..82ef1b5 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -58,12 +58,35 @@ struct kvm_s390_io_adapter_req {
   
   /* kvm attr_group  on vm fd */

   #define KVM_S390_VM_MEM_CTRL 0
+#define KVM_S390_VM_CPU_MODEL  1
   
   /* kvm attributes for mem_ctrl */

   #define KVM_S390_VM_MEM_ENABLE_CMMA  0
   #define KVM_S390_VM_MEM_CLR_CMMA 1
   #define KVM_S390_VM_MEM_CLR_PAGES2
   
+/* kvm attributes for cpu_model */

+
+/* the s390 processor related attributes are r/w */
+#define KVM_S390_VM_CPU_PROCESSOR  0
+struct kvm_s390_vm_cpu_processor {
+   __u64 cpuid;
+   __u16 ibc;
+   __u8  pad[6];
+   __u64 fac_list[256];
+};
+
+/* the machine related attributes are read only */
+#define KVM_S390_VM_CPU_MACHINE1
+struct kvm_s390_vm_cpu_machine {
+   __u64 cpuid;
+   __u32 ibc_range;
+   __u8  pad[4];
+   __u64 fac_mask[256];
+   __u64 hard_fac_list[256];
+   __u64 soft_fac_list[256];
+};
+
   /* for KVM_GET_REGS and KVM_SET_REGS */
   struct kvm_regs {
/* general purpose regs for s390 */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index a53652f..9965d8b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -369,6 +369,110 @@ static int kvm_s390_mem_control(struct kvm *kvm, struct kvm_device_attr *attr)
	return ret;
   }
   
+static int kvm_s390_set_processor(struct kvm *kvm, struct kvm_device_attr *attr)

+{
+   struct kvm_s390_vm_cpu_processor *proc;
+
+   if (atomic_read(>online_vcpus))
+   return -EBUSY;
+
+   proc = kzalloc(sizeof(*proc), GFP_KERNEL);
+   if (!proc)
+   return -ENOMEM;
+
+   if (copy_from_user(proc, (void __user *)attr->addr,
+  sizeof(*proc))) {
+   kfree(proc);
+   return -EFAULT;
+   }
+
+   mutex_lock(>lock);
+   memcpy(>arch.model.cpu_id, >cpuid,
+  sizeof(struct cpuid));
+   kvm->arch.model.ibc = proc->ibc;
+   kvm_s390_apply_fac_list_mask((long unsigned *)proc->fac_list);
+   memcpy(kvm->arch.model.fac_list, proc->fac_list,
+  S390_ARCH_FAC_LIST_SIZE_BYTE);
+   mutex_unlock(>lock);
+   kfree(proc);
+
+   return 0;
+}
+
+static int kvm_s390_set_cpu_model(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+   int ret = -ENXIO;
+
+   switch (attr->attr) {
+   case KVM_S390_VM_CPU_PROCESSOR:
+   ret = kvm_s390_set_processor(kvm, attr);
+   break;
+   }
+   return 

RE: [PATCH v1 5/5] pci: keystone: add pcie driver based on designware core driver

2014-05-16 Thread Karicheri, Muralidharan
>-Original Message-
>From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
>Sent: Thursday, May 15, 2014 4:52 PM
>To: Karicheri, Muralidharan
>Cc: Arnd Bergmann; linux-arm-ker...@lists.infradead.org; Strashko, Grygorii;
>linux-p...@vger.kernel.org; Jingoo Han; linux-kernel@vger.kernel.org;
>Shilimkar, Santosh; Mohit Kumar; Bjorn Helgaas
>Subject: Re: [PATCH v1 5/5] pci: keystone: add pcie driver based on designware
>core driver
>
>On Thu, May 15, 2014 at 04:04:47PM -0400, Murali Karicheri wrote:
>
>> Jason, what do you mean by "The PCI core handles setting the maximum read
>> request size already"? I see there is a function pcie_write_mrrs() in
>> drivers/pci/probe.c that reads the MPS using pcie_get_mps() and then
>> sets the MRRS to the MPS. But this function is called only from
>> pcie_bus_configure_set(), which is called by
>> pcie_bus_configure_settings().
>
>Right, that is the common code that correctly sets the MRRS that you should be
>using instead of quirks.
>
>> None of them gets called on the ARM platform.
>
>Hmm, a cursory glance tells me the same as well.
>
>That seems to be the root problem here, ARM needs to do the PCIE setup just as
>much as any other arch.
>
>So, I would prefer to see you fix ARM common code to call
>pcie_bus_configure_settings() properly, that seems very simple and is
>obviously needed for any PCI-E host driver on ARM.
>

But pcie_bus_configure_settings() just makes sure the MRRS for a device is not
greater than the max payload size. The quirk that I need limits the MRRS to
256 bytes, as required by the Keystone PCI controller. So I still need to
implement a quirk to enforce this limit.

In my reply to Arnd, I have agreed to move the quirk to the keystone driver.
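
To make the limit concrete, here is a minimal userspace sketch of the clamp
such a quirk performs. It assumes the usual PCIe encoding of
Max_Read_Request_Size in bits 14:12 of the Device Control register
(bytes = 128 << field). The names devctl_get_readrq(), devctl_set_readrq()
and limit_readrq() are made up for this illustration; they are not the
kernel's pcie_get_readrq()/pcie_set_readrq() and operate on a plain 16-bit
value rather than on a device.

```c
#include <assert.h>
#include <stdint.h>

#define DEVCTL_READRQ_MASK  0x7000u	/* bits 14:12 of Device Control */
#define DEVCTL_READRQ_SHIFT 12

/* Decode the Max_Read_Request_Size field into bytes: 128 << field. */
static unsigned int devctl_get_readrq(uint16_t devctl)
{
	return 128u << ((devctl & DEVCTL_READRQ_MASK) >> DEVCTL_READRQ_SHIFT);
}

/*
 * Encode a read request size into the register value.  rq must be a
 * power of two between 128 and 4096.  Returns 0 on success and -1 for
 * an invalid size, in which case *devctl is left untouched.
 */
static int devctl_set_readrq(uint16_t *devctl, unsigned int rq)
{
	unsigned int field = 0;

	if (rq < 128 || rq > 4096 || (rq & (rq - 1)))
		return -1;

	while ((128u << field) != rq)
		field++;

	*devctl = (uint16_t)((*devctl & ~DEVCTL_READRQ_MASK) |
			     (field << DEVCTL_READRQ_SHIFT));
	return 0;
}

/*
 * Lower the read request size to 'limit' only if it is currently larger;
 * smaller settings are left alone.  An invalid limit changes nothing.
 */
static void limit_readrq(uint16_t *devctl, unsigned int limit)
{
	if (devctl_get_readrq(*devctl) > limit)
		devctl_set_readrq(devctl, limit);
}
```

With a limit of 256, a device configured for 4096-byte read requests ends up
with field value 1, while a device already at 128 bytes is not raised. All
other bits of the register value are preserved.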

Murali
>Thoughts? Arnd? Bjorn?
>
>Jason


Re: [PATCH v1 5/5] pci: keystone: add pcie driver based on designware core driver

2014-05-16 Thread Murali Karicheri

On 5/15/2014 2:20 PM, Arnd Bergmann wrote:

On Thursday 15 May 2014 13:45:08 Murali Karicheri wrote:

+#ifdef CONFIG_PCI_KEYSTONE
+/*
+ * The KeyStone PCIe controller has maximum read request size of 256 bytes.
+ */
+static void quirk_limit_readrequest(struct pci_dev *dev)
+{
+int readrq = pcie_get_readrq(dev);
+
+if (readrq > 256)
+pcie_set_readrq(dev, 256);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, quirk_limit_readrequest);
+#endif /* CONFIG_PCI_KEYSTONE */

This doesn't work: you can't just do this for all devices based on
PCI_KEYSTONE being enabled; you have to check whether you are actually
using this controller.

   Arnd

I assume I need to check whether the PCI controller's vendor ID / device ID
match the keystone PCI controller's ID, and call pcie_set_readrq() for all
of the slave PCI devices to do this fixup. Is this understanding correct?
If you can point me to example code for this, that would be really helpful
so that I can avoid re-inventing the wheel.

I think it would be best to move the quirk into the keystone pci driver
and compare the dev->driver pointer of the PCI controller device.

Arnd

Arnd,

I will move this quirk to the keystone pci driver. In that case, I guess
your original comment is not applicable, as this quirk gets enabled only
with the PCI keystone driver enabled. Right?


Murali


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-16 Thread Seth Forshee
On Fri, May 16, 2014 at 12:28:35PM -0700, James Bottomley wrote:
> On Fri, 2014-05-16 at 11:57 -0700, Greg Kroah-Hartman wrote:
> > On Fri, May 16, 2014 at 09:06:07AM -0500, Seth Forshee wrote:
> > > On Thu, May 15, 2014 at 09:35:32PM -0700, Greg Kroah-Hartman wrote:
> > > > > On Fri, May 16, 2014 at 01:49:59AM +0000, Serge Hallyn wrote:
> > > > > > I think having to pick and choose what device nodes you want in a
> > > > > > container is a good thing.  Besides, you would have to do the same
> > > > > > thing in the kernel anyway, what's wrong with userspace making the
> > > > > > decision here, especially as it knows exactly what it wants to do
> > > > > > much more so than the kernel ever can.
> > > > > 
> > > > > For 'real' devices that sounds sensible.  The thing about loop devices
> > > > > is that we simply want to allow a container to say "give me a loop
> > > > > device to use" and have it receive a unique loop device (or 3), without
> > > > > having to pre-assign them.  I think that would be cleaner to do using
> > > > > a pseudofs and loop-control device, rather than having to have a
> > > > > daemon in userspace on the host farming those out in response to
> > > > > some, I don't know, dbus request?
> > > > 
> > > > I agree that loop devices would be nice to have in a container, and that
> > > > the existing loop interface doesn't really lend itself to that.  So
> > > > create a new type of thing that acts like a loop device in a container.
> > > > But don't try to mess with the whole driver core just for a single type
> > > > of device.
> > > 
> > > No matter what I don't think we get out of this without driver core
> > > changes, whether this was done in loop or by creating something new.
> > > Not unless the whole thing is punted to userspace, anyway.
> > > 
> > > The first problem is that many block device ioctls check for
> > > CAP_SYS_ADMIN. Most of these might not ever be used on loop devices, I'm
> > > not really sure. But loop does at minimum support partitions, and to get
> > > that functionality in an unprivileged container at least the block layer
> > > needs to know the namespace which has privileges for that device.
> > 
> > That's fine, you should have those permissions in a container if you
> > want to do something like that on a loop device, right?
> 
> Really, no.  CAP_SYS_ADMIN is effectively a pseudo root security hole.
> Any user possessing CAP_SYS_ADMIN can do about as much damage as real
> root can, whether or not you use user namespaces, so it would compromise
> a lot of the security we're just bringing to containers.
> 
> > > The second is that all block devices automatically appear in devtmpfs.
> > > The scenario I'm concerned about is that the host could unknowingly use
> > > a loop device exposed to a container, then the container could see data
> > > from the host.
> > 
> > I don't think that's a real issue, the host should know not to do that.
> > 
> > > So we either need a flag to tell the driver core not to create a node
> > > in devtmpfs, or we need a privileged manager in userspace to remove
> > > them (which kind of defeats the purpose). And it gets more complicated
> > > when partition block devs are mixed in, because they can be created
> > > without involvement from the driver - they would need to inherit the
> > > "no devtmpfs node" property from their parent, and if the driver uses
> > > a pseudo fs to create device nodes for userspace then it needs to be
> > > informed about the partitions too so it can create those nodes.
> > 
> > I don't think that will be needed.  Root in a host can do whatever it
> > wants in the containers, so mixing up block devices is the least of the
> > issues involved :)
> > 
> > > So maybe we could get by without the privileged ioctls, as long as it
> > > was understood that unprivileged containers can't do partitioning. But I
> > > do think the devtmpfs problem would need to be addressed.
> > 
> > I don't think unprivileged containers should be able to do partitioning.
> > An unprivileged user can't do that, so why should a container be any
> > different?
> 
> To make sure we're on the same page with terminology, there's an
> unprivileged container and a secure container.  In the former, there's
> no root user (all the processes run as non-root), so the container isn't
> expected to perform any actions root would ... that's easy.  In a secure
> container, root is mapped to a nobody user in the host, so is
> effectively unprivileged, but root in the container expects to look like
> a real root within the VPS (and thus may expect to partition things,
> depending on how they've been given access to the block device).  The
> big problem is giving back capabilities to the container root such that
> a) it loses them if it escapes the container and b) it doesn't get
> sufficient capabilities to damage the system.

Based on your description what I was talking about is a secure

Re: [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues

2014-05-16 Thread Tejun Heo
Hello,

On Fri, May 16, 2014 at 06:16:51PM +0200, Frederic Weisbecker wrote:
> From: Lai Jiangshan 
> 
> Changing the attributes of a workqueue implies the addition of new pwqs
> to replace the old ones. But the current implementation doesn't handle
> ordered workqueues because they can't carry multi-pwqs without breaking
> ordering. Hence ordered workqueues currently aren't allowed to call
> apply_workqueue_attrs().
...
> Signed-off-by: Lai Jiangshan 
> Cc: Christoph Lameter 
> Cc: Kevin Hilman 
> Cc: Lai Jiangshan 
> Cc: Mike Galbraith 
> Cc: Paul E. McKenney 
> Cc: Tejun Heo 
> Cc: Viresh Kumar 
> Signed-off-by: Frederic Weisbecker 

Do you mind touching up the description and comment a bit as it's
going through you?  They have gotten a lot better (kudos to Lai :) but
I still feel the need to touch them up a bit before applying.  I'd
really appreciate if you can do it as part of your series.

> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c3f076f..c68e84f 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1355,8 +1355,16 @@ retry:
>* If @work was previously on a different pool, it might still be
>* running there, in which case the work needs to be queued on that
>* pool to guarantee non-reentrancy.
> +  *
> +  * We guarantee that only one pwq is active on an ordered workqueue.
> +  * That alone enforces non-reentrancy for works. So ordered works don't

Let's use "work items" instead of "works".

> +  * need to be requeued to their previous pool. Not to mention that
> +  * an ordered work requeing itself over and over on the same pool may
> +  * prevent a pwq from being released in case of a pool switch. The
> +  * newest pool in that case couldn't switch in and its pending works
> +  * would starve.
>*/
> - last_pool = get_work_pool(work);
> + last_pool = wq->flags & __WQ_ORDERED ? NULL : get_work_pool(work);
>   if (last_pool && last_pool != pwq->pool) {
>   struct worker *worker;

I'm not a big fan of the fact that ordered queues need to be handled
differently when queueing, but as the code is currently written, this
is pretty much necessary to maintain execution order, right?
Otherwise, you end up with requeueing work items targeting the pwq it
was executing on and new ones targeting the newest one screwing up the
ordering.  I think that'd be a lot more important to note in the
comment.  This is a correctness measure.  Back-to-back requeueing
being affected by this is just a side-effect.

Also, can you please use plain if/else instead of "?:"?  This isn't
simple logic and I don't think compressing it with ?: is a good idea.
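
For illustration, here is the "?:" expression from the hunk above written as
a plain if/else, in a userspace model. Only the ordered-flag test mirrors the
patch; MODEL_WQ_ORDERED, struct model_wq, struct model_pool and
model_last_pool() are made-up names and the flag value is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_WQ_ORDERED (1u << 0) /* arbitrary bit standing in for __WQ_ORDERED */

struct model_pool { int id; };
struct model_wq { unsigned int flags; };

/*
 * Ordered workqueues never look at the pool the work item last ran on:
 * queueing must always target the current pwq so execution order is kept.
 * Every other workqueue keeps the previous pool to guarantee non-reentrancy.
 */
static struct model_pool *model_last_pool(const struct model_wq *wq,
					  struct model_pool *work_pool)
{
	struct model_pool *last_pool;

	assert(wq != NULL);

	if (wq->flags & MODEL_WQ_ORDERED)
		last_pool = NULL;
	else
		last_pool = work_pool;

	return last_pool;
}
```

The longer form also leaves room for a comment on each branch, which is
where the ordering rationale could live.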

> @@ -3708,6 +3712,13 @@ static void rcu_free_pwq(struct rcu_head *rcu)
>   container_of(rcu, struct pool_workqueue, rcu));
>  }
>  
> +static struct pool_workqueue *oldest_pwq(struct workqueue_struct *wq)
> +{
> + return list_last_entry(&wq->pwqs, struct pool_workqueue, pwqs_node);
> +}
> +
> +static void pwq_adjust_max_active(struct pool_workqueue *pwq);
> +
>  /*
>   * Scheduled on system_wq by put_pwq() when an unbound pwq hits zero refcnt
>   * and needs to be destroyed.
> @@ -3723,14 +3734,12 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
>   if (WARN_ON_ONCE(!(wq->flags & WQ_UNBOUND)))
>   return;
>  
> - /*
> -  * Unlink @pwq.  Synchronization against wq->mutex isn't strictly
> -  * necessary on release but do it anyway.  It's easier to verify
> -  * and consistent with the linking path.
> -  */
>   mutex_lock(&wq->mutex);
>   list_del_rcu(&pwq->pwqs_node);
>   is_last = list_empty(&wq->pwqs);
> + /* try to activate the oldest pwq when needed */
> + if (!is_last && (wq->flags & __WQ_ORDERED))

Why bother with @is_last when it's used only once and the test is
trivial?  Is the test even necessary?  Invoking
pwq_adjust_max_active() directly should work, no?  Also, this needs
a whole lot more comments explaining what's going on.
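
A rough userspace model of the release path being discussed, assuming (as
the patch does) that only the oldest pwq of an ordered workqueue is active.
It runs the adjust step unconditionally after unlinking, and for simplicity
it always releases the oldest pwq. model_wq, model_link_pwq(),
model_release_oldest() and model_adjust_active() are hypothetical names, and
an array stands in for the wq->pwqs list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PWQS 4

struct model_pwq {
	int id;
	bool active; /* stands in for a non-zero max_active */
};

/* pwqs[0] is the newest entry, pwqs[nr - 1] the oldest. */
struct model_wq {
	struct model_pwq *pwqs[MODEL_MAX_PWQS];
	size_t nr;
};

/* Activate only the oldest pwq; a no-op on an empty workqueue. */
static void model_adjust_active(struct model_wq *wq)
{
	for (size_t i = 0; i < wq->nr; i++)
		wq->pwqs[i]->active = (i == wq->nr - 1);
}

/* Link a new pwq at the head, as a pool switch would. */
static void model_link_pwq(struct model_wq *wq, struct model_pwq *pwq)
{
	assert(wq->nr < MODEL_MAX_PWQS);
	for (size_t i = wq->nr; i > 0; i--)
		wq->pwqs[i] = wq->pwqs[i - 1];
	wq->pwqs[0] = pwq;
	wq->nr++;
	model_adjust_active(wq);
}

/* Unlink the oldest pwq and let the next oldest take over. */
static void model_release_oldest(struct model_wq *wq)
{
	assert(wq->nr > 0);
	wq->pwqs[wq->nr - 1]->active = false;
	wq->nr--;
	model_adjust_active(wq); /* no is_last test needed in this model */
}
```

In this model the empty case falls out of the loop bound, so no separate
"is this the last one" test is required before adjusting.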

> + pwq_adjust_max_active(oldest_pwq(wq));
>   mutex_unlock(&wq->mutex);
>  
>   mutex_lock(&wq_pool_mutex);
> @@ -3749,6 +3758,16 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
>   }
>  }
>  
> +static bool pwq_active(struct pool_workqueue *pwq)
> +{
> + /* Only the oldest pwq is active in the ordered wq */
> + if (pwq->wq->flags & __WQ_ORDERED)
> + return pwq == oldest_pwq(pwq->wq);
> +
> + /* All pwqs in the non-ordered wq are active */
> + return true;
> +}

Just collapse it into the calling function.  This obfuscates more than
helps.

Thanks.

-- 
tejun


Re: [PATCH] bnx2x: Convert return 0 to return rc

2014-05-16 Thread Joe Perches
On Fri, 2014-05-16 at 12:02 +0000, Dmitry Kravkov wrote:
> > -----Original Message-----
> > From: netdev-ow...@vger.kernel.org [mailto:netdev-
> > ow...@vger.kernel.org] On Behalf Of Joe Perches
> > Sent: Friday, May 16, 2014 9:52 AM
> > To: Ariel Elior; Dmitry Kravkov
> > Cc: netdev; linux-kernel
> > Subject: [PATCH] bnx2x: Convert return 0 to return rc
> >
> > These "return 0;" uses seem wrong as there are
> > rc variables where error return values are set
> > but unused.
[]
> Thanks Joe for catching this!
> 
> Acked-by: Dmitry Kravkov 

Hello Dmitry.

Couple things actually:
o Could you please update the MAINTAINERS entry for
  BNX2X?  Ariel Elior's email address is still listed
  as @broadcom and that seems to bounce.

o I found this via coccinelle actually.
  Julia Lawall submitted a patch to remove unused
  netdev_priv calls.  I modified that to check for
  any expression that returns an unused value and
  found this and many others.  There are a lot of
  false positives though.

@@
local idexpression x;
expression E;
@@
-x = E;
... when != x
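
The kind of code this semantic patch flags looks roughly like the sketch
below: rc is assigned an error value that is never read, because the
function returns a literal 0. The functions are invented for illustration
and are not taken from bnx2x.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: fails on a NULL buffer. */
static int model_load(const char *buf)
{
	return buf ? 0 : -EINVAL;
}

/* Flagged pattern: rc is set but the function still returns 0. */
static int model_init_buggy(const char *buf)
{
	int rc;

	rc = model_load(buf); /* value assigned, never used */
	return 0;
}

/* Converted form: propagate the stored error to the caller. */
static int model_init_fixed(const char *buf)
{
	int rc;

	rc = model_load(buf);
	return rc;
}
```

A compiler may also warn about the first form (set but unused variable),
which is one way the false positives can be told apart from real bugs.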




Re: [lxc-devel] Mount and other notifiers, was: [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-16 Thread Michael H. Warfield
On Fri, 2014-05-16 at 12:52 -0700, James Bottomley wrote:
> On Fri, 2014-05-16 at 15:42 -0400, Michael H. Warfield wrote:
> > > As an aside (probably requiring a new thread) we were wondering about
> > > some type of notifier on the mount call that we could vector into the
> > > host to perform the action.  The main issue for us is mount of procfs,
> > > which really needs to be a bind mount in a container.  All of this led
> > > me to speculate that we could use some type of syscall notifier
> > > mechanism to manage capabilities in the host and even intercept and
> > > complete the syscall action within the host rather than having to keep
> > > evolving more and more complex kernel drivers to do this.
> > 
> > Interesting.  That could be very useful.  That might even help with the
> > loop device case where the mounts have to go through loop devices for
> > things like file system images and builds.  Very interesting...

> Right, it might even make the loop case go away because now we can
> present a dummy device in the container and when the host sees an
> attempted mount on this, it just projects a bind mount into the
> container and says I've *wink* mounted your "device" for you.

Nice.  That idea has prospects.  I like the concept.

> This idea is extremely rough, it came from a conversation I had with
> Pavel (cc'd) just before OpenStack about how we might go about
> eliminating our OpenVZ interception of the mount system call which
> currently does all of this in kernel, so we have no code and no proof
> that it's actually feasible (yet).

K.  I look forward to hearing more.

I switched from OpenVZ years ago to LXC because OpenVZ was falling too
far behind in kernel support and patches for the leading edge kernels.
At the time, I was working on the MD5 signature code for the Quagga
routing suite for BGP and couldn't maintain my hosts with OpenVZ and
maintain my BGP connections (I have a public ASN and peer on both IPv4
and IPv6) with MD5 signatures at the same time.  At the time LXC had
just matured enough to serve my needs.  That's interesting to note that
OpenVZ did this by intercepting the mount call.  Very interesting...

> James

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 978-7061 |  m...@wittsend.com
   /\/\|=mhw=|\/\/  | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9  | An optimist believes we live in the best of all
 PGP Key: 0x674627FF| possible worlds.  A pessimist is sure of it!





[GIT PULL] parisc updates for v3.15

2014-05-16 Thread Helge Deller
Hi Linus,

please pull the latest parisc architecture fixes for kernel 3.15 from
  git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git parisc-3.15-4

There are two patches in here:

The first patch greatly improves latency and corrects the memory ordering in
our light-weight atomic locking syscall.

The second patch ratelimits printing of userspace segfaults in the same way as
it's done on other platforms. This fixes a possible DoS on parisc since it
prevents the syslog from growing too fast. For example, when the debian acl2
package was built on our debian buildd servers, it produced many gigabytes of
syslog output in a very short time and thus filled our hard disks, which then
made the server almost completely inaccessible and unresponsive.

Thanks,
Helge


Helge Deller (1):
  parisc: ratelimit userspace segfault printing

John David Anglin (1):
  parisc: Improve LWS-CAS performance

 arch/parisc/Kconfig  |  1 +
 arch/parisc/kernel/syscall.S | 12 +++---
 arch/parisc/kernel/traps.c   | 54 
 arch/parisc/mm/fault.c   | 44 
 4 files changed, 65 insertions(+), 46 deletions(-)


[GIT PULL] Ceph fixes for -rc6

2014-05-16 Thread Sage Weil
Hi Linus,

Please pull the following Ceph fixes from

  git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus

The first patch fixes a problem when we have a page count of 0 for 
sendpage which is triggered by zfs.  The second fixes a bug in CRUSH that 
was resolved in the userland code a while back but fell through the cracks 
on the kernel side.

Thanks!
sage



Chunwei Chen (1):
  libceph: fix corruption when using page_count 0 page in rbd

Ilya Dryomov (1):
  crush: decode and initialize chooseleaf_vary_r

 net/ceph/messenger.c | 20 +++-
 net/ceph/osdmap.c|  5 +
 2 files changed, 24 insertions(+), 1 deletion(-)


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-16 Thread Mike Turquette
Quoting Stephen Warren (2014-05-15 13:20:21)
> On 05/15/2014 04:52 AM, Peter De Schrijver wrote:
> > On Wed, May 14, 2014 at 04:27:40PM +0200, Thierry Reding wrote:
> >> On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> >>> On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> >>>> Add shared and cbus clocks to the Tegra124 clock implementation.
> >>>
> >>>> diff --git a/include/dt-bindings/clock/tegra124-car.h b/include/dt-bindings/clock/tegra124-car.h
> >>>
> >>>> +#define TEGRA124_CLK_C2BUS 401
> >>>> +#define TEGRA124_CLK_C3BUS 402
> >>>> +#define TEGRA124_CLK_GR3D_CBUS 403
> >>>> +#define TEGRA124_CLK_GR2D_CBUS 404
> >>> ...
> >>>
> >>> I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> >>> but are more a way of SW applying policy to the clocks that do exist in
> >>> HW. As such, I'm not convinced it's a good idea to expose these clock
> >>> IDs to DT, since DT is supposed to represent the HW, and not be
> >>> influenced by internal SW implementation details.
> >>>
> >>> Do any DTs actually need to use these new clock IDs? I don't think we
> >>> could actually use these values in e.g. tegra124.dtsi's clocks
> >>> properties, since these clocks don't exist in HW. Was it your intent to
> >>> do that? If not, can't we just define these SW-internal clock IDs in the
> >>> header inside the Tegra clock driver, so the values are invisible to DT?
> >>
> >> I'm beginning to wonder if abusing clocks in this way is really the best
> >> solution. From what I understand there are two problems here that are
> >> mostly orthogonal though they're implemented using similar techniques.
> >>
> >> The reason for introducing cbus clocks is still unclear to me. From the
> >> cover letter of this patch series it seems like these should be
> >> completely hidden from drivers and as such they don't belong in device
> >> tree. Also if they are an implementation detail, why are they even
> >> implemented as clocks? Perhaps an example use-case would help illustrate
> >> the need for this.
> > 
> > We don't have a PLL per engine, hence we have to use a PLL as parent for
> > several module clocks. However you can't change a PLL's rate with
> > active clients. So for scaling the PLL clocking e.g. VIC or MSENC, you need
> > to change their parent to a different PLL, change the original PLL rate and
> > change the parent back to the original PLL, all while ensuring you never
> > exceed the maximum allowed clock at the current voltage. You also want to
> > take into account if a module is clocked so you don't bother handling
> > clocks which are disabled (e.g. if only the VIC clock is enabled, there is
> > no point in changing the MSENC parent). All this is handled by the 'cbus'
> > clock.
> 
> Presumably though we can handle this "cbus" concept entirely inside the
> clock driver.
> 
> What happens right now is that when a DT node references a clock, the
> driver gets a clock and then manipulates it directly. What if the clock
> core was reworked a bit such that every single clock was a "cbus" clock.
> clk_get() wouldn't return the raw clock object itself, but rather a
> "clock client" object, which would forward requests on to the underlying
> clk. If there's only 1 clk_get(), there's only 1 client, so all requests
> get forwarded automatically. If there are n clk_get() requests, the
> clock object gets to implement the appropriate voting/... algorithm to
> mediate the requests.

This was proposed before[1][2] and is something that would be great to
have. The scary thing is to start introducing policy into the clock
framework, which I'd like to avoid as much as possible. But arbitration
of resources (with requisite reference counting) is pretty much
non-existent for clock rates (last call to clk_set_rate always wins),
and is very rudimentary for prepare/enable (we have use counting, but it
does not track individual clients/clock consumers).
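
To make the arbitration idea concrete, here is a small userspace model of
per-consumer rate requests, assuming a simple "highest request wins" vote.
That policy is only one possibility and is picked here purely for
illustration; nothing below is the clk framework API, and model_clk,
model_clk_client and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_CLIENTS 8

/* One handle per consumer, in the spirit of a "clock client" object. */
struct model_clk_client {
	unsigned long requested_rate; /* 0 means "no request" */
};

struct model_clk {
	struct model_clk_client clients[MODEL_MAX_CLIENTS];
	size_t nr_clients;
	unsigned long rate; /* rate currently programmed */
};

/* Register a new consumer; returns NULL when the table is full. */
static struct model_clk_client *model_clk_get(struct model_clk *clk)
{
	if (clk->nr_clients == MODEL_MAX_CLIENTS)
		return NULL;
	clk->clients[clk->nr_clients].requested_rate = 0;
	return &clk->clients[clk->nr_clients++];
}

/* Re-run the vote: the highest outstanding request is programmed. */
static void model_clk_vote(struct model_clk *clk)
{
	unsigned long best = 0;

	for (size_t i = 0; i < clk->nr_clients; i++)
		if (clk->clients[i].requested_rate > best)
			best = clk->clients[i].requested_rate;
	clk->rate = best;
}

/* A consumer's request no longer overwrites the others; it is mediated. */
static void model_clk_set_rate(struct model_clk *clk,
			       struct model_clk_client *client,
			       unsigned long rate)
{
	assert(client >= clk->clients &&
	       client < clk->clients + clk->nr_clients);
	client->requested_rate = rate;
	model_clk_vote(clk);
}
```

Compared with "last call wins", a later, lower request from one consumer
does not pull the rate out from under another consumer that still needs the
higher one.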

Revisiting Rabin's patches has been at the bottom of my todo list for a
while now. I'm happy for someone else to take a crack at it.

Regards,
Mike

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/135290.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/135574.html

> 
> That way, we don't have to expose any of this logic in the device tree,
> or hopefully/mostly even outside the HW clock's implementation.


Re: Replace strings across all the files using script

2014-05-16 Thread Joe Perches
On Fri, 2014-05-16 at 12:07 -0700, anish singh wrote:
> I am planning to do some cleanup and want to
> replace some string as below across all the files
> in kernel.
>   pr_err("%s "
> TO
>   pr_err("%s: "
> 
> Basically adding a colon after the %s.
> How can i do it across all the files? I don't
> want to individually go to each file and do it.

Presumably, this would be done only for files that use
__func__ as the first argument to printk.

Something like this might work moderately well.

$ git ls-files | grep '\.[ch]$' | \
  xargs perl -i -e 'local $/; while (<>) {
    s/\b(printk|pr_(?:emerg|crit|alert|err|notice|warn|warning|info))\s*\(\s*\"\%s(?:\s*\(\s*\)\s*)?(?:\s*:\s*)?\s*(?:\-\s*)?([^"]+)"\s*,(\s*)__func__\b/$1("%s: $2",$3__func__/g; print; }'
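
As a userspace illustration of what the converted format string produces,
the sketch below uses snprintf() in place of pr_err(). That substitution is
an assumption made only so the example can run outside the kernel, and
model_log() and model_probe() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a message the way a converted call would: "<function>: <text>". */
static int model_log(char *out, size_t len, const char *func, const char *msg)
{
	assert(out != NULL && len > 0);
	/* after the conversion: "%s: ..." instead of "%s ..." */
	return snprintf(out, len, "%s: %s\n", func, msg);
}

/* Caller passing __func__ as the first argument, like the matched sites. */
static int model_probe(char *out, size_t len)
{
	return model_log(out, len, __func__, "probe failed");
}
```

Calling model_probe() on a 64-byte buffer leaves the string
"model_probe: probe failed" followed by a newline in it.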




Re: [PATCH] pinctrl: add params in disable_setting for different usage

2014-05-16 Thread Stephen Warren
On 05/16/2014 10:21 AM, Linus Walleij wrote:
> On Wed, May 14, 2014 at 4:01 AM,   wrote:
> 
>> From: Fan Wu 
>>
>> This patch adds a parameter to disable_setting to distinguish two possible
>> uses:
>> 1. Disable the pin setting in SW only: the param can be set to "0".
>> 2. Disable the pin setting in both HW and SW: the param can be set to "1".
>>
>> The reason for doing this is to avoid repeated enable_setting operations
>> without a matching disable, which would let the pin's desc->mux_usecount
>> keep increasing.
>>
>> The issue can be reproduced in the following case:
>> 1) A driver needs to switch pin state dynamically, e.g. between the "sleep"
>> and "default" states.
>> 2) The pin setting configuration is the same in the two states, like the
>> following one:
>> component a {
>> pinctrl-names = "default", "sleep";
>> pinctrl-0 = <_grp_setting _grp_setting>;
>> pinctrl-1 = <_grp_setting _grp_setting>;
>> }
>> The "c_grp_setting" config node is exactly the same, maybe like the following one:
> 
> Hm this is a quite interesting thing if we can get it in place, but
> I need Stephen's consent, also Tony should have a look at this as
> I know he's had the same problem as you in pinctrl-single.

I only briefly looked at the patch, but it probably solves/hides the
immediate problem.

However, rather than doing this, why not just remove
pinmux_disable_setting() completely? It doesn't make sense to "disable a
mux selection" (some value is always selected in the mux register field)
any more than it does to "disable a drive strength selection". We don't
have a pinconf_disable_setting(), and couldn't really add one if we
wanted. For consistency, let's just remove pinmux_disable_setting(). Do
you agree?
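
The argument can be pictured with a toy model of a mux register field: the
field always holds some function selector, so applying a setting is just a
write and repeating it changes nothing. This is a sketch under that
assumption, with hypothetical names (model_pin, model_mux_get,
model_mux_apply) and an arbitrary field layout; it is not pinctrl core code.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MUX_SHIFT 4
#define MODEL_MUX_MASK  (0x7u << MODEL_MUX_SHIFT)

struct model_pin {
	uint32_t reg; /* stands in for the pin's control register */
};

/* Read back whichever function is currently selected; there is always one. */
static unsigned int model_mux_get(const struct model_pin *pin)
{
	return (pin->reg & MODEL_MUX_MASK) >> MODEL_MUX_SHIFT;
}

/* Selecting a function is idempotent: repeating it changes nothing. */
static void model_mux_apply(struct model_pin *pin, unsigned int func)
{
	assert(func <= (MODEL_MUX_MASK >> MODEL_MUX_SHIFT));
	pin->reg = (pin->reg & ~MODEL_MUX_MASK) | (func << MODEL_MUX_SHIFT);
}
```

In such a model there is no state that corresponds to "disabled", only a
different selection, which is the same situation as for a drive strength
field.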


Re: [lxc-devel] Mount and other notifiers, was: [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-16 Thread James Bottomley
On Fri, 2014-05-16 at 15:42 -0400, Michael H. Warfield wrote:
> > As an aside (probably requiring a new thread) we were wondering about
> > some type of notifier on the mount call that we could vector into the
> > host to perform the action.  The main issue for us is mount of procfs,
> > which really needs to be a bind mount in a container.  All of this led
> > me to speculate that we could use some type of syscall notifier
> > mechanism to manage capabilities in the host and even intercept and
> > complete the syscall action within the host rather than having to keep
> > evolving more and more complex kernel drivers to do this.
> 
> Interesting.  That could be very useful.  That might even help with the
> loop device case where the mounts have to go through loop devices for
> things like file system images and builds.  Very interesting...

Right, it might even make the loop case go away because now we can
present a dummy device in the container and when the host sees an
attempted mount on this, it just projects a bind mount into the
container and says I've *wink* mounted your "device" for you.

This idea is extremely rough, it came from a conversation I had with
Pavel (cc'd) just before OpenStack about how we might go about
eliminating our OpenVZ interception of the mount system call which
currently does all of this in kernel, so we have no code and no proof
that it's actually feasible (yet).

James




Re: uprobes && shmem (Was: uprobes: Shift ->readpage check from __copy_insn() to uprobe_register())

2014-05-16 Thread Oleg Nesterov
On 05/16, Hugh Dickins wrote:
>
> On Fri, 16 May 2014, Oleg Nesterov wrote:
> > On 05/16, Oleg Nesterov wrote:
> > >
> > > copy_insn() fails with -EIO if ->readpage == NULL
> >
> > In particular, this means that we can not probe the binaries on tmpfs.
> > This is a pity.
>
> Yes, that is a pity: thanks for noticing.

Thanks to Denys ;)

> > It seems that the potential fix is trivial, copy_insn() could use
> > shmem_getpage_gfp(). But, is there any way to figure out that this
>
> shmem_getpage_gfp() itself is static: please use
> shmem_read_mapping_page(mapping, pgoff): inline in linux/shmem_fs.h,
> calls shmem_read_mapping_page_gfp() in mm/shmem.c (a very few places
> need to override gfp_mask too: you do not), calls shmem_getpage_gfp().

Even better, thanks,

> > inode/mapping/aops/whatever is actually shmem?
> >
> > I am looking at shmem_get_inode() and I see nothing which could help,
> > and shmem_aops/etc are all static.
>
> On 3.15 and later, you're in luck: Hannes added bool shmem_mapping(mapping)
> in his 0cd6144aadd2 "mm + fs: prepare for non-page entries in page cache
> radix trees"; and I just checked, it builds for "tiny" !CONFIG_SHMEM too.

Heh. I need to do git pull more often, I guess. Great.
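
Put together, the page lookup in copy_insn() could take roughly the shape
below. This is a userspace model only: the function pointers stand in for a
->readpage based read and for shmem_read_mapping_page(), the is_shmem flag
stands in for shmem_mapping(mapping), and all model_* names are invented for
the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page { int dummy; };

struct model_mapping {
	/* stands in for a_ops->readpage; NULL on tmpfs */
	struct model_page *(*readpage)(struct model_mapping *m, unsigned long idx);
	/* stands in for shmem_mapping(mapping) */
	bool is_shmem;
	/* stands in for shmem_read_mapping_page(mapping, pgoff) */
	struct model_page *(*shmem_read)(struct model_mapping *m, unsigned long idx);
};

/* Trivial reader used to exercise the model; always returns the same page. */
static struct model_page model_the_page;

static struct model_page *model_reader(struct model_mapping *m, unsigned long idx)
{
	(void)m;
	(void)idx;
	return &model_the_page;
}

/* Returns 0 and stores the page on success, -EIO when no reader applies. */
static int model_get_insn_page(struct model_mapping *m, unsigned long idx,
			       struct model_page **pagep)
{
	assert(m != NULL && pagep != NULL);

	if (m->readpage)
		*pagep = m->readpage(m, idx);
	else if (m->is_shmem && m->shmem_read)
		*pagep = m->shmem_read(m, idx);
	else
		return -EIO;

	return *pagep ? 0 : -EIO;
}
```

A mapping with neither a readpage method nor shmem backing still fails with
-EIO, as before.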

> If you're backporting to an earlier kernel, it would probably be best
> to add in a very small patch, extracting just shmem_mapping() and its
> linux/mm.h declarations from 0cd6144aadd2.
>
> I notice shmem_mapping() checks backing_dev_info,
> whereas shmem_get_mapping_page_gfp() checks a_ops: no problem in that.
> But it reminds me that you should test uprobe on SysV SHM when you're
> done: again I think no problem, but there's an incestuous relationship
> between shm and shmem that can catch us out when adding such checks.

Hugh, thanks a lot!

Oleg.



Re: [PATCH] netdev: add support for interface name retrieval from DT aliases

2014-05-16 Thread Florian Fainelli
2014-05-09 1:26 GMT-07:00 Boris BREZILLON :
> Hi David,
>
> On 09/05/2014 04:42, David Miller wrote:
>> From: Boris BREZILLON 
>> Date: Tue,  6 May 2014 17:36:34 +0200
>>
>>> There is currently no proper way to bind a net interface to a specific
>>> name. The interface name is chosen based on the interface type (eth,
>>> wlan, ...) and the interfaces already registered (the core code takes
>>> the first unused interface id of the given type).
>>>
>>> Add support for DT retrieval of the interface id based on DT aliases.
>>> The alias name must match the interface type (e.g. ethX if you're defining
>>> an ethernet dev alias).
>>>
>>> Signed-off-by: Boris BREZILLON 
>> This really isn't kosher at all.
>
> Just for my personal knowledge, what is wrong with this code ?
> Is it because I'm using "of_" functions in the core code, and you want
> to keep it DT agnostic ?
> Or is it something else ?

I think the major problem is that you are using DT to name interfaces
and enforcing a naming policy within the kernel, while this should be
left solely to user-space. I know that coming from an embedded
use-case this might sound appealing, but the interface naming policy
had better remain in user-space to avoid mixing policy with
mechanisms.

>
>>
>> And there absolutely is a proper way to bind a net interface to
>> a specific name, udev has provided this facility for years.
>
> Thanks for pointing this out.
>
> But, what if the system does not use udev (this is often the case on
> embedded systems where udev is replaced by mdev) ?

Traditional embedded systems are also using a lot of custom software,
why not write a small device mapper program that looks at aliases in
the device-tree and matches it with sysfs entries for these
corresponding network interfaces?

> Moreover, on embedded systems, most users rely on the default interface
> name provided by the kernel.
>
> IIRC (tell me if I'm wrong), before moving to DT we could control the
> probe order of net interfaces derived from platform devices by modifying
> the platform dev registration order (okay, this is only true if the
> platform devices are controlled by the same driver, which is often the
> case when a SoC provides several net interfaces).

I do not quite agree with this, before moving to DT, we were mostly
relying on the linking order imposed by the lines in the Makefile,
which is still the case for a few things. It is sometimes fragile, and
it is sometimes very convenient, and it also provides some perceived
probing order stability, but that's no longer true with e.g. deferred
probing which can happen regardless of DT.

> With DT we can't know for sure the exact probe order because it depends
> on the net interface node position in the DT, and this node position
> might change over the time (or at least it used to change, now that
> we're forced to declare DT nodes in strict memory address order it should
> not change that much).

Which is precisely where aliases come in handy, and I understand
why it is tempting to use them, but aliases are nothing more than the
mechanism to help you, not the policy.

>
> Another issue: what if I want to rename eth0 into eth1 and eth1 into eth0 ?
> I guess I'll have to execute this sequence: eth1 -> eth2, eth0 -> eth1,
> eth2 -> eth0, otherwise the SIOCSIFNAME ioctl will return an error.

Just like you swap two variables, use a temporary name.
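
The three-step rename can be sketched against a toy name table. This is a userspace simulation, not the SIOCSIFNAME path itself; the demo_* helpers are hypothetical and the -EEXIST return for a name already in use is an assumption about how the real ioctl reports the collision:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_NAME_LEN 16	/* same size as IFNAMSIZ */
#define DEMO_MAX_IFS  8

/* Toy interface table standing in for the kernel's list of names. */
struct demo_if_table {
	char names[DEMO_MAX_IFS][DEMO_NAME_LEN];
	int count;
};

static int demo_if_find(const struct demo_if_table *t, const char *name)
{
	for (int i = 0; i < t->count; i++)
		if (strcmp(t->names[i], name) == 0)
			return i;
	return -1;
}

/* Rename one interface; refuse if the target name is already taken. */
static int demo_if_rename(struct demo_if_table *t, const char *from,
			  const char *to)
{
	int idx = demo_if_find(t, from);

	if (idx < 0)
		return -ENODEV;
	if (strlen(to) >= DEMO_NAME_LEN)
		return -EINVAL;
	if (demo_if_find(t, to) >= 0)
		return -EEXIST;
	strcpy(t->names[idx], to);
	return 0;
}

/* Swap two names through a temporary, exactly like swapping two variables. */
static int demo_if_swap(struct demo_if_table *t, const char *a,
			const char *b, const char *tmp)
{
	int ret;

	ret = demo_if_rename(t, a, tmp);
	if (ret)
		return ret;
	ret = demo_if_rename(t, b, a);
	if (ret) {
		demo_if_rename(t, tmp, a);	/* undo the first step */
		return ret;
	}
	return demo_if_rename(t, tmp, b);
}
```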
-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pull request: wireless 2014-05-15

2014-05-16 Thread David Miller
From: "John W. Linville" 
Date: Thu, 15 May 2014 11:45:46 -0400

> Please pull this batch of fixes for the 3.15 stream...

Pulled, thanks a lot John.


[PATCH] x86: fix page fault tracing when KVM guest support enabled

2014-05-16 Thread Dave Hansen

From: Dave Hansen 

I noticed on some of my systems that page fault tracing doesn't
work:

cd /sys/kernel/debug/tracing
echo 1 > events/exceptions/enable
cat trace;
# nothing shows up

I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
KVM VM, enabling that option breaks page fault tracing, and
disabling fixes it.  I tried on some old kernels and this does
not appear to be a regression: it never worked.

There are two page-fault entry functions today.  One when tracing
is on and another when it is off.  The KVM code calls do_page_fault()
directly instead of calling the traced version:

> dotraplinkage void __kprobes
> do_async_page_fault(struct pt_regs *regs, unsigned long
> error_code)
> {
> enum ctx_state prev_state;
>
> switch (kvm_read_and_reset_pf_reason()) {
> default:
> do_page_fault(regs, error_code);
> break;
> case KVM_PV_REASON_PAGE_NOT_PRESENT:

I'm also having problems with the page fault tracing on bare
metal (same symptom of no trace output).  I'm unsure if it's
related.

Steven had an alternative to this which has zero overhead when
tracing is off, whereas this patch includes the standard noops even when
tracing is disabled.  I'm unconvinced that the extra complexity
of his approach:

http://lkml.kernel.org/r/20140508194508.561ed...@gandalf.local.home

is worth it, especially considering that the KVM code is already
making page fault entry slower here.  This solution is
dirt-simple.
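
For readers skimming the thread, the shape of the fix can be sketched in plain userspace C. The DEMO_TRACING switch and the demo_* functions are stand-ins invented for this illustration; the real patch keys off CONFIG_TRACING and the real page-fault handlers:

```c
#include <assert.h>

static int demo_plain_calls;	/* times the untraced handler ran */
static int demo_traced_calls;	/* times the traced wrapper ran */

static void demo_do_fault(unsigned long addr)
{
	(void)addr;
	demo_plain_calls++;
}

#ifdef DEMO_TRACING
/* Tracing built in: the traced entry point records, then handles. */
static void demo_trace_do_fault(unsigned long addr)
{
	demo_traced_calls++;
	demo_do_fault(addr);
}
#else
/* Tracing compiled out: the traced name collapses to the plain handler. */
static inline void demo_trace_do_fault(unsigned long addr)
{
	demo_do_fault(addr);
}
#endif

/* Callers always use the traced name and never need their own #ifdef. */
static void demo_async_fault_entry(unsigned long addr)
{
	demo_trace_do_fault(addr);
}
```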

Gleb, please apply.

Signed-off-by: Dave Hansen 
Cc: Thomas Gleixner 
Cc: x...@kernel.org
Cc: Peter Zijlstra 
Cc: Gleb Natapov 
Cc: "H. Peter Anvin" 
Cc: k...@vger.kernel.org
Cc: Paolo Bonzini 
Cc: Steven Rostedt 
---

 b/arch/x86/include/asm/traps.h |    5 +++++
 b/arch/x86/kernel/kvm.c        |    2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff -puN arch/x86/include/asm/traps.h~muck-with-kvm-guest-code arch/x86/include/asm/traps.h
--- a/arch/x86/include/asm/traps.h~muck-with-kvm-guest-code 2014-05-16 12:29:23.900429347 -0700
+++ b/arch/x86/include/asm/traps.h  2014-05-16 12:29:23.905429570 -0700
@@ -74,6 +74,11 @@ dotraplinkage void do_general_protection
 dotraplinkage void do_page_fault(struct pt_regs *, unsigned long);
 #ifdef CONFIG_TRACING
 dotraplinkage void trace_do_page_fault(struct pt_regs *, unsigned long);
+#else
+static inline void trace_do_page_fault(struct pt_regs *regs, unsigned long error)
+{
+   do_page_fault(regs, error);
+}
 #endif
 dotraplinkage void do_spurious_interrupt_bug(struct pt_regs *, long);
 dotraplinkage void do_coprocessor_error(struct pt_regs *, long);
diff -puN arch/x86/kernel/kvm.c~muck-with-kvm-guest-code arch/x86/kernel/kvm.c
--- a/arch/x86/kernel/kvm.c~muck-with-kvm-guest-code 2014-05-16 12:29:23.902429437 -0700
+++ b/arch/x86/kernel/kvm.c 2014-05-16 12:29:23.906429615 -0700
@@ -259,7 +259,7 @@ do_async_page_fault(struct pt_regs *regs
 
switch (kvm_read_and_reset_pf_reason()) {
default:
-   do_page_fault(regs, error_code);
+   trace_do_page_fault(regs, error_code);
break;
case KVM_PV_REASON_PAGE_NOT_PRESENT:
/* page is swapped out by the host. */
_


Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask

2014-05-16 Thread Tejun Heo
Hello, again.

On Fri, May 16, 2014 at 3:34 PM, Tejun Heo  wrote:
> On Fri, May 16, 2014 at 3:32 PM, Christoph Lameter  wrote:
>> It sets a bad precedent. So move to /sys/kernel/workqueue and let's have a
>> symlink that goes back?
>
> Hmm... I don't think it's a good idea to lose uevent. It's an integral
> part in configuring sysfs. Wouldn't it make more sense to move
> /sys/kernel under /sys/devices?

So, the thing is sysfs has been collecting everything under
/sys/devices because other top-level directories added complexity
while missing out on the basic event mechanism. If you look at other
top-level directories, other than /sys/module and /sys/kernel,
everything else is a symlink into the /sys/devices hierarchy and just kept
around for compatibility. For static knobs, it may not matter but for
things like slab and workqueue which can be dynamically created and
destroyed, being hooked up into the uevent mechanism is a necessity, so I
really think we better sort it out properly. Maybe it can use
/sys/devices/virtual or maybe we'll need a separate directory such as
/sys/devices/kernel but I really don't find moving workqueue to
/sys/kernel at this point a good idea.

Thanks.

-- 
tejun


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-16 Thread Michael H. Warfield
On Fri, 2014-05-16 at 12:20 -0700, James Bottomley wrote:
> On Thu, 2014-05-15 at 21:42 -0400, Michael H. Warfield wrote:
> > On Thu, 2014-05-15 at 15:15 -0700, Greg Kroah-Hartman wrote:
> > > > PS - Apparently both parallels and Michael independently
> > > > project devices which are hot-plugged on the host into containers.
> > > > That also seems like something worth talking about (best practices,
> > > > shortcomings, use cases not met by it, any ways that the kernel can
> > > > help out) at ksummit/linuxcon.
> > 
> > > I was told that containers would never want devices hotplugged into
> > > them.
> > 
> > Interesting.  You were told they (who they?) would never want them?  Who
> > said that?  I would have never thought that given that other
> > implementations can provide that.  I would certainly want them.  Seems
> > strange to explicitly relegate LXC containers to being second class
> > citizens behind OpenVZ, Parallels, BSD Gaols, and Solaris Zones.

> That would probably be me.  Running hotplug inside a container is a
> security problem and, since containers are easily entered by the host,
> it's very easy to listen for the hotplug in the host and inject it into
> the container using nsenter.

In all virtualization...  The host, particularly root on the host,
exists as deus ex machina, the "god outside the machine".  They are at
my mercy.  Even hardware virtualization can not protect you from the
host.  You wanna hear some frightening talks on virtualization, catch
Joanna (miss little blue pill) Rutkowska some time.  I'm particularly
interested in her takes on the "anti evil-maid attacks" and I sat in on
her talks on the "north bridge" and "south bridge" malware evasion
techniques.  She's a good speaker who makes powerful points that make
you sweat but is pleasant in face to face conversation.  I've played
with her Qubes distribution a couple of times and the way it works with
the TPM to ensure a secure boot is interesting.  But that's a completely
different topic on trusted computing.

OTOH, there are plenty of other things to worry about in all forms of
virtualization.  At Internet Security Systems, where I was a founder,
fellow, and "X-Force Senior Wizard", we were looking at the ability to
leak information through the USB subsystem.  No isolation is perfect,
especially when you have USB enabled.

But that's my turf.

> I don't think the intention is to label anyone's implementation as
> preferred.  What this shows, I think, is that we all have different
> practises when it comes to setting up containers.  Some are necessary
> because our containers are different.  Some could do with serious
> examination to see if there's really a best way to do the action which
> we would then all use.

And I hope to contribute to the discussion of said actions.

> > I might believe you were never told they would need them, but that's a
> > totally different sense.  Are we going to tell RedHat and the Docker
> > people that LXC is an inferior technology that is complex and unreliable
> > (to quote another poster) compared to these others?  They're saying this
> > will be enterprise technology.  If I go to Amazon AWS or other VPS
> > services and compare, are we not going to stand on a level playing
> > field?  Admittedly, I don't expect Amazon AWS to provide me with serial
> > consoles, but I do expect to be able to mount file system images within
> > my VPS.

> Well, that's another nasty, isn't it.  We all have different ways of
> coping with mount in the container.  I think at plumbers we need to sit
> down with some of this plumbing and work out which pipes carry the same
> fluids and whether we could unify them.

Concur

> As an aside (probably requiring a new thread) we were wondering about
> some type of notifier on the mount call that we could vector into the
> host to perform the action.  The main issue for us is mount of procfs,
> which really needs to be a bind mount in a container.  All of this led
> me to speculate that we could use some type of syscall notifier
> mechanism to manage capabilities in the host and even intercept and
> complete the syscall action within the host rather than having to keep
> evolving more and more complex kernel drivers to do this.

Interesting.  That could be very useful.  That might even help with the
loop device case where the mounts have to go through loop devices for
things like file system images and builds.  Very interesting...

> James

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 978-7061 |  m...@wittsend.com
   /\/\|=mhw=|\/\/  | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9  | An optimist believes we live in the best of all
 PGP Key: 0x674627FF| possible worlds.  A pessimist is sure of it!





Re: [PATCH V2 3/3] cpufreq: Tegra: implement intermediate frequency callbacks

2014-05-16 Thread Stephen Warren
On 05/16/2014 03:07 AM, Viresh Kumar wrote:
> Tegra has always switched to an intermediate frequency (pll_p_clk). The
> CPUFreq core has better support for handling notifications for these
> frequencies and so we can adapt Tegra's driver to it.

You need to squash in the patch below in order for this series to work.
Once that's done,

Tested-by: Stephen Warren 

> Signed-off-by: Stephen Warren 
> 
> diff --git a/drivers/cpufreq/tegra-cpufreq.c b/drivers/cpufreq/tegra-cpufreq.c
> index 10b29ec99bdc..c04fec02ac6a 100644
> --- a/drivers/cpufreq/tegra-cpufreq.c
> +++ b/drivers/cpufreq/tegra-cpufreq.c
> @@ -49,13 +49,22 @@ static struct clk *emc_clk;
>  static unsigned int
>  tegra_get_intermediate(struct cpufreq_policy *policy, unsigned int index)
>  {
> -   return clk_get_rate(pll_p_clk);
> +   return clk_get_rate(pll_p_clk) / 1000; /* kHz */
>  }
>  
>  static int
>  tegra_target_intermediate(struct cpufreq_policy *policy, unsigned int frequency)
>  {
> +   WARN_ON(frequency != clk_get_rate(pll_p_clk) / 1000);
> +
> +   /*
> +* Take an extra reference to the main pll so it doesn't turn
> +* off when we move the cpu off of it
> +*/
> +   clk_prepare_enable(pll_x_clk);
> +
> return clk_set_parent(cpu_clk, pll_p_clk);
> +   /* FIXME: if error, remove pll_x reference */
>  }
>  
>  static int tegra_target(struct cpufreq_policy *policy, unsigned int index)
> @@ -74,16 +83,10 @@ static int tegra_target(struct cpufreq_policy *policy, unsigned int index)
> else
> clk_set_rate(emc_clk, 1);  /* emc 50Mhz */
>  
> -   /*
> -* Take an extra reference to the main pll so it doesn't turn
> -* off when we move the cpu off of it
> -*/
> -   clk_prepare_enable(pll_x_clk);
> -
> if (rate == clk_get_rate(pll_p_clk))
> goto out;
>  
> -   ret = clk_set_rate(pll_x_clk, rate);
> +   ret = clk_set_rate(pll_x_clk, rate * 1000);
> if (ret) {
> pr_err("Failed to change pll_x to %lu\n", rate);
> goto out;
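
The unit mismatch this fixup addresses (cpufreq deals in kHz, the clk API in Hz) can be sketched with two small helpers. These demo_* functions are hypothetical and not part of either API, and the 216 MHz figure used with them below is just an example value:

```c
#include <assert.h>

#define DEMO_HZ_PER_KHZ 1000UL

/* clk rates are in Hz; cpufreq tables and callbacks use kHz. */
static unsigned int demo_hz_to_khz(unsigned long hz)
{
	return (unsigned int)(hz / DEMO_HZ_PER_KHZ);
}

static unsigned long demo_khz_to_hz(unsigned int khz)
{
	return (unsigned long)khz * DEMO_HZ_PER_KHZ;
}

/* Compare a cpufreq frequency (kHz) with a clock rate (Hz) in one unit. */
static int demo_rate_matches(unsigned int freq_khz, unsigned long clk_hz)
{
	return freq_khz == demo_hz_to_khz(clk_hz);
}
```

Converting at the boundary, as the hunks above do with "/ 1000" and "* 1000", keeps each side of the driver in its own unit.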



Re: [PATCH v6] mm: support madvise(MADV_FREE)

2014-05-16 Thread Kirill A. Shutemov
On Fri, May 16, 2014 at 03:34:27PM +0900, Minchan Kim wrote:
> > > +static inline unsigned long lazyfree_pmd_range(struct mmu_gather *tlb,
> > > + struct vm_area_struct *vma, pud_t *pud,
> > > + unsigned long addr, unsigned long end)
> > > +{
> > > + pmd_t *pmd;
> > > + unsigned long next;
> > > +
> > > + pmd = pmd_offset(pud, addr);
> > > + do {
> > > + next = pmd_addr_end(addr, end);
> > > + if (pmd_trans_huge(*pmd))
> > > + split_huge_page_pmd(vma, addr, pmd);
> > 
> > /* XXX */ as well? :)
> 
> You meant huge page unit lazyfree rather than 4K page unit?
> If so, I will add.

Please free the huge page if the range covers it.
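
A minimal sketch of the coverage test, in userspace C with made-up demo_* names and a fixed 2 MiB huge page size (the x86-64 PMD size, assumed here). It only shows the arithmetic and is not the mm code:

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_HPAGE_SIZE (2UL << 20)		/* 2 MiB, assumed */
#define DEMO_HPAGE_MASK (~(DEMO_HPAGE_SIZE - 1))

/* End of the huge-page-sized slot containing addr, clamped to end. */
static unsigned long demo_pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + DEMO_HPAGE_SIZE) & DEMO_HPAGE_MASK;

	return boundary < end ? boundary : end;
}

/* True when [addr, next) spans one whole, aligned huge page. */
static bool demo_covers_huge_page(unsigned long addr, unsigned long next)
{
	return (addr & ~DEMO_HPAGE_MASK) == 0 &&
	       next - addr == DEMO_HPAGE_SIZE;
}
```

With a check like this, the walker would split the huge page only when the range covers part of it, and could handle it as one unit otherwise.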

-- 
 Kirill A. Shutemov


Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask

2014-05-16 Thread Tejun Heo
On Fri, May 16, 2014 at 3:32 PM, Christoph Lameter  wrote:
> It sets a bad precedent. So move to /sys/kernel/workqueue and let's have a
> symlink that goes back?

Hmm... I don't think it's a good idea to lose uevent. It's an integral
part in configuring sysfs. Wouldn't it make more sense to move
/sys/kernel under /sys/devices?

Thanks.

-- 
tejun


Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask

2014-05-16 Thread Christoph Lameter
On Fri, 16 May 2014, Tejun Heo wrote:

> > Could we fix that? A workqueue is not a device but more a kernel setting.
> >
> > /sys/kernel/workqueue/ ?
>
> Right, that could have been more in line with slab files.  It's
> already too late tho.  This has been exposed for quite a while now.
> Urgh...

It sets a bad precedent. So move to /sys/kernel/workqueue and let's have a
symlink that goes back?



Re: [PATCH v2] ARM: imx: fix error handling

2014-05-16 Thread Uwe Kleine-König
Hello Walter,

On Fri, May 16, 2014 at 01:49:10PM +0200, walter harms wrote:
> Am 16.05.2014 13:16, schrieb Emil Goode:
> > Hello Walter,
> > 
> > On Fri, May 16, 2014 at 12:40:19PM +0200, walter harms wrote:
> >>
> >>
> >> Am 16.05.2014 11:54, schrieb Emil Goode:
> >>> If we fail to allocate struct platform_device pdev we
> >>> dereference it after the goto label err.
> >>>
> >>> I have rearranged the error handling a bit to fix the issue
> >>> and also make it more clear.
> >>>
> >>> Signed-off-by: Emil Goode 
> >>> ---
> >>> v2: Changed to return -ENOMEM instead of ret where possible and
> >>> updated the subject line.
> >>>
> >>>  arch/arm/mach-imx/devices/platform-ipu-core.c |   22 +-
> >>>  1 file changed, 13 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/arch/arm/mach-imx/devices/platform-ipu-core.c b/arch/arm/mach-imx/devices/platform-ipu-core.c
> >>> index fc4dd7c..68f2a4a 100644
> >>> --- a/arch/arm/mach-imx/devices/platform-ipu-core.c
> >>> +++ b/arch/arm/mach-imx/devices/platform-ipu-core.c
> >>> @@ -77,34 +77,38 @@ struct platform_device *__init imx_alloc_mx3_camera(
> >>>  
> >>>   pdev = platform_device_alloc("mx3-camera", 0);
> >>>   if (!pdev)
> >>> - goto err;
> >>> + return ERR_PTR(-ENOMEM);
> >>>  
> >>>   pdev->dev.dma_mask = kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL);
> >>>   if (!pdev->dev.dma_mask)
> >>> - goto err;
> >>> + goto put_pdev;
> >>>  
> >>>   *pdev->dev.dma_mask = DMA_BIT_MASK(32);
> >>>   pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
> >>>  
> >>>   ret = platform_device_add_resources(pdev, res, ARRAY_SIZE(res));
> >>>   if (ret)
> >>> - goto err;
> >>> + goto free_dma_mask;
> >>>  
> >>>   if (pdata) {
> >>>   struct mx3_camera_pdata *copied_pdata;
> >>>  
> >>>   ret = platform_device_add_data(pdev, pdata, sizeof(*pdata));
> >>> - if (ret) {
> >>> -err:
> >>> - kfree(pdev->dev.dma_mask);
> >>> - platform_device_put(pdev);
> >>> - return ERR_PTR(-ENODEV);
> >>> - }
> >>> + if (ret)
> >>> + goto free_dma_mask;
> >>> +
> >>>   copied_pdata = dev_get_platdata(&pdev->dev);
> >>>   copied_pdata->dma_dev = &imx_ipu_coredev->dev;
> >>
> >>
> >> the patch is fine, but what use is this copied_pdata ?
> >> It scope ends next line ?
> >>
> >> re,
> >>  wh
> > 
> > I also thought that looked a bit odd, but copied_pdata is a temporary
> > pointer to platform_data of the dev struct.
> > 
> > dev_get_platdata looks like this:
> > 
> > static inline void *dev_get_platdata(const struct device *dev)
> > {
> > return dev->platform_data;
> > }
> > 
> > So I believe it's a more compact way of writing:
> > 
> > pdev->dev->platform_data->dma_dev = &imx_ipu_coredev->dev;
It's not about compactness. The dev_get_platdata accessor exists to be
used instead of directly accessing dev->platform_data. I admit a comment
would be nice ...

Anyhow this is all ugly, actually you'd want to have the dma_dev member
already fixed when calling platform_device_add_data. But you cannot
simply do

pdata->dma_dev = &imx_ipu_coredev->dev;
ret = platform_device_add_data(pdev, pdata, sizeof(*pdata));

because *pdata is const.
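
The copy-then-patch pattern under discussion can be sketched in userspace C. The demo_* types and functions below are simplified stand-ins, not the driver-core API; demo_add_data only mimics the fact that the platform data is duplicated, so the const template is never written to:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct demo_pdata {
	int flags;
	const void *dma_dev;
};

struct demo_device {
	void *platform_data;
};

/* Duplicate the caller's data into the device, replacing any old copy. */
static int demo_add_data(struct demo_device *dev, const void *data,
			 size_t size)
{
	void *copy = malloc(size);

	if (!copy)
		return -ENOMEM;
	memcpy(copy, data, size);
	free(dev->platform_data);
	dev->platform_data = copy;
	return 0;
}

/* Accessor used instead of touching dev->platform_data directly. */
static void *demo_get_platdata(const struct demo_device *dev)
{
	return dev->platform_data;
}

/* pdata is const, so the fix-up has to happen on the device's own copy. */
static int demo_register(struct demo_device *dev,
			 const struct demo_pdata *pdata, const void *dma_dev)
{
	struct demo_pdata *copied;
	int ret = demo_add_data(dev, pdata, sizeof(*pdata));

	if (ret)
		return ret;
	copied = demo_get_platdata(dev);
	copied->dma_dev = dma_dev;
	return 0;
}
```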

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-16 Thread James Bottomley
On Fri, 2014-05-16 at 11:57 -0700, Greg Kroah-Hartman wrote:
> On Fri, May 16, 2014 at 09:06:07AM -0500, Seth Forshee wrote:
> > On Thu, May 15, 2014 at 09:35:32PM -0700, Greg Kroah-Hartman wrote:
> > > On Fri, May 16, 2014 at 01:49:59AM +, Serge Hallyn wrote:
> > > > > I think having to pick and choose what device nodes you want in a
> > > > > > container is a good thing.  Besides, you would have to do the same thing
> > > > > in the kernel anyway, what's wrong with userspace making the decision
> > > > > here, especially as it knows exactly what it wants to do much more so
> > > > > than the kernel ever can.
> > > > 
> > > > For 'real' devices that sounds sensible.  The thing about loop devices
> > > > is that we simply want to allow a container to say "give me a loop
> > > > device to use" and have it receive a unique loop device (or 3), without
> > > > having to pre-assign them.  I think that would be cleaner to do using
> > > > a pseudofs and loop-control device, rather than having to have a
> > > > daemon in userspace on the host farming those out in response to
> > > > some, I don't know, dbus request?
> > > 
> > > I agree that loop devices would be nice to have in a container, and that
> > > the existing loop interface doesn't really lend itself to that.  So
> > > create a new type of thing that acts like a loop device in a container.
> > > But don't try to mess with the whole driver core just for a single type
> > > of device.
> > 
> > No matter what I don't think we get out of this without driver core
> > changes, whether this was done in loop or by creating something new.
> > Not unless the whole thing is punted to userspace, anyway.
> > 
> > The first problem is that many block device ioctls check for
> > CAP_SYS_ADMIN. Most of these might not ever be used on loop devices, I'm
> > not really sure. But loop does at minimum support partitions, and to get
> > that functionality in an unprivileged container at least the block layer
> > needs to know the namespace which has privileges for that device.
> 
> That's fine, you should have those permissions in a container if you
> want to do something like that on a loop device, right?

Really, no.  CAP_SYS_ADMIN is effectively a pseudo root security hole.
Any user possessing CAP_SYS_ADMIN can do about as much damage as real
root can, whether or not you use user namespaces, so it would compromise
a lot of the security we're just bringing to containers.

> > The second is that all block devices automatically appear in devtmpfs.
> > The scenario I'm concerned about is that the host could unknowingly use
> > a loop device exposed to a container, then the container could see data
> > from the host.
> 
> I don't think that's a real issue, the host should know not to do that.
> 
> > So we either need a flag to tell the driver core not to create a node
> > in devtmpfs, or we need a privileged manager in userspace to remove
> > them (which kind of defeats the purpose). And it gets more complicated
> > when partition block devs are mixed in, because they can be created
> > without involvement from the driver - they would need to inherit the
> > "no devtmpfs node" property from their parent, and if the driver uses
> > a pseudo fs to create device nodes for userspace then it needs to be
> > informed about the partitions too so it can create those nodes.
> 
> I don't think that will be needed.  Root in a host can do whatever it
> wants in the containers, so mixing up block devices is the least of the
> issues involved :)
> 
> > So maybe we could get by without the privileged ioctls, as long as it
> > was understood that unprivileged containers can't do partitioning. But I
> > do think the devtmpfs problem would need to be addressed.
> 
> I don't think unprivileged containers should be able to do partitioning.
> An unprivileged user can't do that, so why should a container be any
> different?

To make sure we're on the same page with terminology, there's an
unprivileged container and a secure container.  In the former, there's
no root user (all the processes run as non-root), so the container isn't
expected to perform any actions root would ... that's easy.  In a secure
container, root is mapped to a nobody user in the host, so is
effectively unprivileged, but root in the container expects to look like
a real root within the VPS (and thus may expect to partition things,
depending on how they've been given access to the block device).  The
big problem is giving back capabilities to the container root such that
a) it loses them if it escapes the container and b) it doesn't get
sufficient capabilities to damage the system.

James




[PATCH] Input: Introduce the use of the managed version of kzalloc

2014-05-16 Thread Himangi Saraogi
This patch moves data allocated using kzalloc to managed data allocated
using devm_kzalloc and cleans now unnecessary kfrees in probe and remove
functions.

The following Coccinelle semantic patch was used for making the change:

@platform@
identifier p, probefn, removefn;
@@
struct platform_driver p = {
  .probe = probefn,
  .remove = removefn,
};

@prb@
identifier platform.probefn, pdev;
expression e, e1, e2;
@@
probefn(struct platform_device *pdev, ...) {
  <+...
- e = kzalloc(e1, e2)
+ e = devm_kzalloc(&pdev->dev, e1, e2)
  ...
?-kfree(e);
  ...+>
}

@rem depends on prb@
identifier platform.removefn;
expression e;
@@
removefn(...) {
  <...
- kfree(e);
  ...>
}

Signed-off-by: Himangi Saraogi 
Acked-by: Julia Lawall 
---
Can I make the code simpler by changing:
poll_dev = input_allocate_polled_device();
if (!poll_dev) {
error = -ENOMEM;
goto failed;
}
to just return -ENOMEM? There is no error message for the failure of
kzalloc, and the call to input_free_polled_device does nothing when
poll_dev is NULL.

 drivers/input/keyboard/jornada680_kbd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/input/keyboard/jornada680_kbd.c b/drivers/input/keyboard/jornada680_kbd.c
index 69b1f00..02c8f5b 100644
--- a/drivers/input/keyboard/jornada680_kbd.c
+++ b/drivers/input/keyboard/jornada680_kbd.c
@@ -185,7 +185,8 @@ static int jornada680kbd_probe(struct platform_device *pdev)
struct input_dev *input_dev;
int i, error;
 
-   jornadakbd = kzalloc(sizeof(struct jornadakbd), GFP_KERNEL);
+   jornadakbd = devm_kzalloc(&pdev->dev, sizeof(struct jornadakbd),
+ GFP_KERNEL);
if (!jornadakbd)
return -ENOMEM;
 
@@ -233,7 +234,6 @@ static int jornada680kbd_probe(struct platform_device *pdev)
printk(KERN_ERR "Jornadakbd: failed to register driver, error: %d\n",
error);
input_free_polled_device(poll_dev);
-   kfree(jornadakbd);
return error;
 
 }
@@ -244,7 +244,6 @@ static int jornada680kbd_remove(struct platform_device *pdev)
 
input_unregister_polled_device(jornadakbd->poll_dev);
input_free_polled_device(jornadakbd->poll_dev);
-   kfree(jornadakbd);
 
return 0;
 }
-- 
1.9.1



Re: [PATCH] af_rxrpc: Fix XDR length check in rxrpc key demarshalling.

2014-05-16 Thread David Miller
From: David Howells 
Date: Thu, 15 May 2014 15:51:22 +0100

> From: Nathaniel W Filardo 
> 
> There may be padding on the ticket contained in the key payload, so just ensure
> that the claimed token length is large enough, rather than exactly the right
> size.
> 
> Signed-off-by: Nathaniel Wesley Filardo 
> Signed-off-by: David Howells 

Applied, thanks.


Re: [PATCH v2] ARM: imx: fix error handling

2014-05-16 Thread Uwe Kleine-König
Hello Emil,

IMHO the subject is too general. Maybe better use:

ARM: imx: fix error handling in ipu device registration

On Fri, May 16, 2014 at 11:54:05AM +0200, Emil Goode wrote:
> If we fail to allocate struct platform_device pdev we
> dereference it after the goto label err.
> 
> I have rearranged the error handling a bit to fix the issue
> and also make it more clear.
> 
> Signed-off-by: Emil Goode 
> ---
> v2: Changed to return -ENOMEM instead of ret where possible and
> updated the subject line.
> 
>  arch/arm/mach-imx/devices/platform-ipu-core.c |   22 +-
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm/mach-imx/devices/platform-ipu-core.c b/arch/arm/mach-imx/devices/platform-ipu-core.c
> index fc4dd7c..68f2a4a 100644
> --- a/arch/arm/mach-imx/devices/platform-ipu-core.c
> +++ b/arch/arm/mach-imx/devices/platform-ipu-core.c
> @@ -77,34 +77,38 @@ struct platform_device *__init imx_alloc_mx3_camera(
>  
>   pdev = platform_device_alloc("mx3-camera", 0);
>   if (!pdev)
> - goto err;
> + return ERR_PTR(-ENOMEM);
>  
>   pdev->dev.dma_mask = kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL);
>   if (!pdev->dev.dma_mask)
> - goto err;
> + goto put_pdev;
>  
>   *pdev->dev.dma_mask = DMA_BIT_MASK(32);
>   pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
>  
>   ret = platform_device_add_resources(pdev, res, ARRAY_SIZE(res));
>   if (ret)
> - goto err;
> + goto free_dma_mask;
>  
>   if (pdata) {
>   struct mx3_camera_pdata *copied_pdata;
>  
>   ret = platform_device_add_data(pdev, pdata, sizeof(*pdata));
> - if (ret) {
> -err:
> - kfree(pdev->dev.dma_mask);
> - platform_device_put(pdev);
> - return ERR_PTR(-ENODEV);
> - }
> + if (ret)
> + goto free_dma_mask;
> +
>   copied_pdata = dev_get_platdata(&pdev->dev);
>   copied_pdata->dma_dev = &imx_ipu_coredev->dev;
>   }
>  
>   return pdev;
> +
> +free_dma_mask:
> + kfree(pdev->dev.dma_mask);
> +put_pdev:
> + platform_device_put(pdev);
> +
> + return ERR_PTR(ret);
>  }
I didn't check if it is easily possible, but converting this file to use
platform_device_register_full might simplify it considerably.

I'm not sure this fix is critical, because the problem happens if an
allocation during boot fails. But still, if you want to get this fix
into a stable release, you should simplify it, i.e. don't do the code
reorganisations. (Also the "more clear" part seems to be subjective, I
like the error handling better as it is now. But that might only be me.)
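
For reference, the staged-label style the patch moves to can be sketched in userspace C. Everything here (demo_dev, demo_alloc, the fail_step knob used to force a failure at a given stage) is hypothetical and exists only to make the unwinding order visible:

```c
#include <assert.h>
#include <stdlib.h>

struct demo_dev {
	unsigned long long *dma_mask;
	void *resources;
};

/*
 * Allocate in stages; fail_step (1..3) forces that stage to fail so
 * each unwind path can be exercised. 0 means no forced failure.
 */
static struct demo_dev *demo_alloc(int fail_step)
{
	struct demo_dev *dev;

	dev = (fail_step == 1) ? NULL : calloc(1, sizeof(*dev));
	if (!dev)
		return NULL;	/* nothing allocated yet, nothing to undo */

	dev->dma_mask = (fail_step == 2) ? NULL
					 : malloc(sizeof(*dev->dma_mask));
	if (!dev->dma_mask)
		goto put_dev;

	dev->resources = (fail_step == 3) ? NULL : malloc(64);
	if (!dev->resources)
		goto free_dma_mask;

	return dev;

	/* Labels undo the stages in reverse order of acquisition. */
free_dma_mask:
	free(dev->dma_mask);
put_dev:
	free(dev);
	return NULL;
}
```

The first failure returns directly because there is no object to dereference yet, which is the case the original single "err" label got wrong.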

Are you using this code? I thought arch/arm/mach-imx/devices to be dead
as it is unused on dt platforms.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [RFC] Linux common clock framework: device with many clocks

2014-05-16 Thread Mike Turquette
On Wed, Apr 30, 2014 at 4:13 PM, Mark Rutland  wrote:
> Hi,

Thanks for Cc'ing me Mark.

>
> On Wed, Apr 30, 2014 at 09:39:11PM +0100, Jim Quinlan wrote:
>> In most examples of .dtsi files I have perused, a device is associated with
>> typically one clock, maybe two.  In the SoC I'm working on, some devices
>> need to turn off multiple clocks for PM, as many as 13.   The driver gets
>> the clocks from the device tree, and when the driver wants to turn off
>> clocks to the device, it loops through all 13 clocks.
>>
>> I'm wondering if it is possible to abstract a group of many clocks into one
>> "software clock". Invoking clk_disable() on said software clock would
>> effect the iteration of clk_disable() on all 13 of the clocks it governs.
>>  Enabling would effect clk_enable() on all 13.  This would make the driver
>> writer's life a little simpler.
>>
>> I've looked at the Linux Common Clock Framework, and it doesn't really
>> accommodate multiple active parents as it's somewhat contrary to its
>> design.  Also, playing with the innards of clk.c is ill-advised.  Should I
>> just stick to putting iteration over the clocks in all my drivers, or is
>> there a better way?
>
> This doesn't strike me as a DT issue. The DT should describe all the
> clocks that a given block takes, and the representation of said clocks
> in the DT is completely separate matter from the management of said
> clocks in any given driver.
>
> If you want a helpful abstraction for combining clocks for management
> purposes you'd be better off talking to Mike Turquette (CC'd), as he's
> in charge of the common clock framework.

Jim emailed me privately. Here is my response for posterity:



On Wed, May 7, 2014 at 8:59 AM, Jim Quinlan  wrote:
> Hi Mike,

Hi Jim,



> In most examples of .dtsi files I have perused, a device is associated with
> typically one clock, maybe two.  In the SoC I'm working on, some devices
> need to turn off multiple clocks for PM, as many as 13.   The driver gets
> the clocks from the device tree, and when the driver wants to turn on/off
> clocks to the device, it loops through all 13 clocks.

Is it possible for you to share a data sheet or TRM for this part? I'd
like to better understand your 13 clock requirement.

Is your device driver upstream? If not, can you share a link to bcom's
public vendor git tree with the driver in question?

> I'm wondering if it is possible to abstract a group of many clocks into one
> "software clock". Invoking clk_disable() on said software clock would
> effect the iteration of clk_disable() on all 13 of the clocks it governs.

I oppose "software clocks", "virtual clocks" or any kind of struct clk
that doesn't map onto real clock hardware directly.

> Enabling would effect clk_enable() on all 13.  This would make the driver
> writer's life a little simpler.

What you are asking for is an abstraction. We already have this in the
form of Runtime PM where a driver calls pm_runtime_get() and
pm_runtime_put() without having to worry about the details of enabling
and disabling those 13 clocks. Runtime PM is *the* abstraction you are
looking for.

You should check out the RPM stuff as well as the power_domain and
gen_pd stuff. OMAP, SH-Mobile and Ux500 have interesting
implementations of all of this stuff for rather complex SoCs.
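
As a rough illustration of that abstraction (a user-space model, not
kernel code): the driver only takes and drops a usage reference, and a
pair of suspend/resume hooks walks the clock list. Every name here
(struct dev_model, model_get(), model_put(), NR_CLKS) is a hypothetical
stand-in; a real driver would call pm_runtime_get()/pm_runtime_put() and
do the clock handling in its runtime PM callbacks.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CLKS 13              /* the 13 clocks from Jim's example */

/* Hypothetical stand-in for a device with a usage count and its clocks. */
struct dev_model {
	int usage;                  /* number of active users */
	bool clk_on[NR_CLKS];       /* state of each modelled clock */
};

/* "resume" hook: turn every clock on, in order. */
static void model_resume(struct dev_model *d)
{
	for (int i = 0; i < NR_CLKS; i++)
		d->clk_on[i] = true;
}

/* "suspend" hook: turn every clock off, in reverse order. */
static void model_suspend(struct dev_model *d)
{
	for (int i = NR_CLKS - 1; i >= 0; i--)
		d->clk_on[i] = false;
}

/* The first user powers the device up. */
void model_get(struct dev_model *d)
{
	if (d->usage++ == 0)
		model_resume(d);
}

/* The last user powers it down. */
void model_put(struct dev_model *d)
{
	assert(d->usage > 0);
	if (--d->usage == 0)
		model_suspend(d);
}

/* Helper for inspection: how many clocks are currently on. */
int model_clocks_on(const struct dev_model *d)
{
	int n = 0;

	for (int i = 0; i < NR_CLKS; i++)
		n += d->clk_on[i];
	return n;
}
```

The point of the model is that the loop over the clocks lives in one
place, behind get/put, instead of being repeated at every call site in
the driver.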

> I've looked at the Linux Common Clock Framework, and it doesn't really
> accommodate multiple active parents as it's somewhat contrary to its
> design.  Also, playing with the innards of clk.c is ill-advised.  Should I
> just stick to putting iteration over the clocks in all my drivers, or is
> there a better way?

Can you elaborate on "multiple active parents"? What does that mean?

Regards,
Mike



>
> Cheers,
> Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Dell Latitude E6440 & i8k

2014-05-16 Thread Pali Rohár
On Friday 16 May 2014 21:11:17 Jean Delvare wrote:
> Hi Pali,
> 
> On Fri, 16 May 2014 20:37:41 +0200, Pali Rohár wrote:
> > Hello,
> > 
> > on Dell Latitude E6440 driver i8k reporting total nonsense
> > values
> 
> That's kind of excessive wording, the output isn't that bad.
> 

I mean fan RPM & temp4. Those are for sure incorrect.

> > $ sensors
> > i8k-virtual-0
> > Adapter: Virtual device
> > Right Fan:   93450 RPM
> > CPU:  +57.0°C
> > temp2:+57.0°C
> > temp3:+40.0°C
> > temp4:   +127.0°C
> > 
> > Right Fan and temp4 are for sure incorrect.
> 
> Driver is reverse-engineered so this is best effort and some
> tweaking may be needed.
> 

OK, if the driver was developed without any documentation, then it
makes sense that it does not work correctly on new machines...

So is there no documentation at all? I think that Dell released
some SMM/BIOS code, but I'm not sure about it.

> > Value temp4 is always 127 and is never changing, but value
> > for Right Fan is increasing when fan is more noisy. So it
> > looks like value for Right Fan is not correctly normalized
> > or multiplier is incorrect.
> > 
> > And name "Right" is incorrect too. Fan is on left side of
> > this notebook, not right as reported by driver.
> > 
> > It is possible to fix these problems?
> 
> Load the i8k driver with fan_mult=1.
> 

It now reports a more plausible value for the fan. When the fan is
at low speed it reports between 3000 and 3100 RPM.

> Add the following to /etc/sensors.d/i8k.conf:
> 
> chip "i8k-virtual-0"
> 
>    label fan2 "Left Fan"
>    ignore temp4

And this fixes the output of the sensors program:

$ sensors
i8k-virtual-0
Adapter: Virtual device
Left Fan:3088 RPM
CPU:  +54.0°C  
temp2:+57.0°C  
temp3:+40.0°C
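
A quick arithmetic check of why fan_mult=1 helps, as a sketch only. It
assumes the driver reports raw * fan_mult and that the default
multiplier is 30; fan_rpm() is a hypothetical helper, not a function
from the driver.

```c
#include <assert.h>

/* Scale a raw fan reading by the module's multiplier. */
int fan_rpm(int raw, int fan_mult)
{
	assert(fan_mult > 0);
	return raw * fan_mult;
}
```

Under those assumptions a raw reading of 3115 becomes 93450 with a
multiplier of 30, which is the value from the first report, and stays
at 3115 with fan_mult=1, in line with the roughly 3000 to 3100 RPM seen
now.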

But Right Fan name is still present in kernel sysfs:

$ grep "" /sys/class/hwmon/hwmon1/*
/sys/class/hwmon/hwmon1/fan2_input:3091
/sys/class/hwmon/hwmon1/fan2_label:Right Fan
/sys/class/hwmon/hwmon1/name:i8k
/sys/class/hwmon/hwmon1/pwm2:128
/sys/class/hwmon/hwmon1/temp1_input:56000
/sys/class/hwmon/hwmon1/temp1_label:CPU
/sys/class/hwmon/hwmon1/temp2_input:57000
/sys/class/hwmon/hwmon1/temp3_input:4
/sys/class/hwmon/hwmon1/temp4_input:127000

-- 
Pali Rohár
pali.ro...@gmail.com




Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask

2014-05-16 Thread Tejun Heo
On Fri, May 16, 2014 at 03:00:42PM -0400, Tejun Heo wrote:
> > /sys/kernel/workqueue/ ?
> 
> Right, that could have been more in line with slab files.  It's
> already too late tho.  This has been exposed for quite a while now.
> Urgh...

Okay, another difference, so things under /sys/devices generate
uevents and can thus be configured on boot or dynamically as the nodes
are created, which doesn't work for other /sys directories and
top-level directories take more boilerplate code.  Maybe the right
thing to do is moving /sys/kernel under /sys/devices and making
/sys/kernel a symlink?  It kinda sucks that the whole thing sits
outside the usual notification mechanism.

Thanks.

-- 
tejun


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-16 Thread James Bottomley
On Thu, 2014-05-15 at 21:42 -0400, Michael H. Warfield wrote:
> On Thu, 2014-05-15 at 15:15 -0700, Greg Kroah-Hartman wrote:
> > > PS - Apparently both parallels and Michael independently
> > > project devices which are hot-plugged on the host into containers.
> > > That also seems like something worth talking about (best practices,
> > > shortcomings, use cases not met by it, any ways that the kernel can
> > > help out) at ksummit/linuxcon.
> 
> > I was told that containers would never want devices hotplugged into
> > them.
> 
> Interesting.  You were told they (who they?) would never want them?  Who
> said that?  I would have never thought that given that other
> implementations can provide that.  I would certainly want them.  Seems
> strange to explicitly relegate LXC containers to being second class
> citizens behind OpenVZ, Parallels, BSD Gaols, and Solaris Zones.

That would probably be me.  Running hotplug inside a container is a
security problem and, since containers are easily entered by the host,
it's very easy to listen for the hotplug in the host and inject it into
the container using nsenter.

I don't think the intention is to label anyone's implementation as
preferred.  What this shows, I think, is that we all have different
practises when it comes to setting up containers.  Some are necessary
because our containers are different.  Some could do with serious
examination to see if there's really a best way to do the action which
we would then all use.

> I might believe you were never told they would need them, but that's a
> totally different sense.  Are we going to tell RedHat and the Docker
> people that LXC is an inferior technology that is complex and unreliable
> (to quote another poster) compared to these others?  They're saying this
> will be enterprise technology.  If I go to Amazon AWS or other VPS
> services and compare, are we not going to stand on a level playing
> field?  Admittedly, I don't expect Amazon AWS to provide me with serial
> consoles, but I do expect to be able to mount file system images within
> my VPS.

Well, that's another nasty, isn't it.  We all have different ways of
coping with mount in the container.  I think at plumbers we need to sit
down with some of this plumbing and work out which pipes carry the same
fluids and whether we could unify them.

As an aside (probably requiring a new thread) we were wondering about
some type of notifier on the mount call that we could vector into the
host to perform the action.  The main issue for us is mount of procfs,
which really needs to be a bind mount in a container.  All of this led
me to speculate that we could use some type of syscall notifier
mechanism to manage capabilities in the host and even intercept and
complete the syscall action within the host rather than having to keep
evolving more and more complex kernel drivers to do this.

James




Re: Dell Latitude E6440 & i8k

2014-05-16 Thread Jean Delvare
Hi Pali,

On Fri, 16 May 2014 20:37:41 +0200, Pali Rohár wrote:
> Hello,
> 
> on Dell Latitude E6440 driver i8k reporting total nonsense values

That's kind of excessive wording, the output isn't that bad.

> $ sensors
> i8k-virtual-0
> Adapter: Virtual device
> Right Fan:   93450 RPM
> CPU:  +57.0°C  
> temp2:+57.0°C  
> temp3:+40.0°C  
> temp4:   +127.0°C
> 
> Right Fan and temp4 are for sure incorrect.

Driver is reverse-engineered so this is best effort and some tweaking
may be needed.

> Value temp4 is always 127 and is never changing, but value for 
> Right Fan is increasing when fan is more noisy. So it looks like 
> value for Right Fan is not correctly normalized or multiplier is 
> incorrect.
> 
> And name "Right" is incorrect too. Fan is on left side of this 
> notebook, not right as reported by driver.
> 
> It is possible to fix these problems?

Load the i8k driver with fan_mult=1.

Add the following to /etc/sensors.d/i8k.conf:

chip "i8k-virtual-0"

   label fan2 "Left Fan"
   ignore temp4

-- 
Jean Delvare
SUSE L3 Support


Re: [PATCH] Staging: gdm72xx: gdm_wimax: Fixed coding style issues.

2014-05-16 Thread Greg KH
On Mon, Apr 28, 2014 at 01:50:11PM -0500, Junsu Shin wrote:
> Fixed following coding style issues.
>  - No space is necessary after a cast
>  - Alignment should match open parenthesis
>  - Braces {} should be used on all arms of this statement

You are doing 3 things, so this should be 3 patches at the least.
Remember, one patch per "thing".

> Signed-off-by: Junsu Shin x

What is the trailing 'x' for?

thanks,

greg k-h


Re: [PATCH v6 12/17] ARM: configs: enable XHCI mvebu support in mvebu_v7_defconfig

2014-05-16 Thread Jason Cooper
On Thu, May 15, 2014 at 12:17:37PM +0200, Gregory CLEMENT wrote:
> The Marvell Armada 38x platform needs the xhci_mvebu driver enabled
> for the xHCI USB hosts, so this commit enables the corresponding
> Kconfig option in mvebu_v7_defconfig.
> 
> Signed-off-by: Gregory CLEMENT 
> Signed-off-by: Thomas Petazzoni 
> ---
>  arch/arm/configs/mvebu_v7_defconfig | 1 +
>  1 file changed, 1 insertion(+)

Applied to mvebu/defconfig

thx,

Jason.


Re: [PATCH v6 14/17] ARM: mvebu: add Device Tree description of xHCI controllers on Armada 38x

2014-05-16 Thread Jason Cooper
On Thu, May 15, 2014 at 12:17:39PM +0200, Gregory CLEMENT wrote:
> The Marvell Armada 38x SoCs contains two xHCI controllers. This commit
> adds the Device Tree description of those interfaces at the SoC level,
> and also enables the two USB3 ports on the Armada 385 DB platform and
> one USB3 port on the Armada 385 RD platform.
> 
> Signed-off-by: Gregory CLEMENT 
> Signed-off-by: Thomas Petazzoni 
> ---
>  arch/arm/boot/dts/armada-385-db.dts |  8 
>  arch/arm/boot/dts/armada-385-rd.dts |  4 
>  arch/arm/boot/dts/armada-38x.dtsi   | 17 +
>  3 files changed, 29 insertions(+)

Patches 14 through 17 applied to mvebu/dt.  Patch 17 amended as Sergei
suggested.

thx,

Jason.


Re: [PATCH v6 10/17] ARM: mvebu: add USB3 support for Armada 38x

2014-05-16 Thread Jason Cooper
On Thu, May 15, 2014 at 12:17:35PM +0200, Gregory CLEMENT wrote:
> This patch adds the selection of the config symbol needed to build the
> USB3 support for Armada 38x into mvebu_v7_defconfig.
> 
> Signed-off-by: Gregory CLEMENT 
> Signed-off-by: Thomas Petazzoni 
> ---
>  arch/arm/mach-mvebu/Kconfig | 1 +
>  1 file changed, 1 insertion(+)

Patches 10 and 11 applied to mvebu/soc

thx,

Jason.


Re: [PATCH] time: Provide full featured jiffies_to_nsecs() function

2014-05-16 Thread Steven Rostedt
On Fri, 16 May 2014 17:17:46 +
"Luck, Tony"  wrote:

> Is this function safe to call in every context (including NMI & machine check)?
> [it uses read_seqcount_begin/read_seqcount_retry ... which I *think* is
> safe ... but this stuff is tricky, so I'd like some reassurance].

No, read_seqcount_begin() is not safe in NMI context. If it interrupts
a write, it goes into an infinite spin (see __read_seqcount_begin()).
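
To illustrate the hazard, here is a single-threaded model of a sequence
counter. It is a sketch, not the kernel implementation: struct
seq_model and the model_*() helpers are hypothetical, and the max_spins
limit exists only so the model can return where the real reader would
keep spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of a sequence counter. */
struct seq_model {
	unsigned int sequence;      /* odd while a write is in progress */
};

void model_write_begin(struct seq_model *s) { s->sequence++; }
void model_write_end(struct seq_model *s)   { s->sequence++; }

/*
 * Reader entry: spin while the count is odd.  The real reader spins
 * without a limit; max_spins exists only so this model can return.
 * Returns true if a stable (even) count was observed.
 */
bool model_read_begin(const struct seq_model *s, unsigned int max_spins,
		      unsigned int *seq)
{
	assert(seq);
	for (unsigned int i = 0; i < max_spins; i++) {
		*seq = s->sequence;
		if (!(*seq & 1))
			return true;
	}
	return false;               /* writer never finished: would spin forever */
}
```

If the reader runs in an NMI that interrupted the writer between
model_write_begin() and model_write_end() on the same CPU, the writer
cannot make progress until the reader returns, so the count stays odd
for as long as the reader waits.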

> 
> Mauro, Steven: Did we just do math on jiffies because we wanted less overhead
> in a tracepoint?

As I mentioned, read_seqcount_begin() is not safe for tracing; it
had to be reimplemented.

-- Steve

> 
> Bigger question (mostly for Mauro) ... what was the motivation for the
> "uptime" tracer to begin with?  The rasdaemon code that is using it converts
> the times from traces into absolute times (by adding an offset it computes by
> comparing uptime and gettimeofday() when it starts).  But this would seem to
> be fraught with problems:
> 1) Do we get this right for events that happen in daylight saving time shift
>    windows?
> 2) Is there a "drift" problem for systems that stay up for months and rely on
>    ntp to keep wall clock time in line with reality?
> 
> -Tony



Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask

2014-05-16 Thread Tejun Heo
On Fri, May 16, 2014 at 01:52:48PM -0500, Christoph Lameter wrote:
> On Fri, 16 May 2014, Tejun Heo wrote:
> 
> > On Fri, May 16, 2014 at 12:52:26PM -0500, Christoph Lameter wrote:
> > > On Fri, 16 May 2014, Frederic Weisbecker wrote:
> > >
> > > > It works on a lower-level than the per WQ_SYSFS workqueues cpumask files
> > > > such that the effective cpumask applied for a given unbound workqueue is
> > > > the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
> > > > the new /sys/devices/virtual/workqueue/cpumask_unbounds file.
> > >
> > > Why is there "virtual" directory in the path? Workqueues are "virtual"
> > > devices?
> >
> > It ain't an actual one.  That apparently is the preferred place to
> > present software constructs like workqueues in sysfs.
> 
> Could we fix that? A workqueue is not a device but more a kernel setting.
> 
> /sys/kernel/workqueue/ ?

Right, that could have been more in line with slab files.  It's
already too late tho.  This has been exposed for quite a while now.
Urgh...

Thanks.

-- 
tejun


Re: [PATCH 2/2] tlan: Enable link monitoring

2014-05-16 Thread Ondrej Zary
On Friday 16 May 2014 20:00:30 Ondrej Zary wrote:
> Enable old link monitoring code and modify it:
>  - control LINK LED
>  - use separate timer so it does not interfere with ACT LED
>  - reset adapter on link loss to restart autonegotiation
>(required to switch between 10/100 Mbps on OC-2326)

Oops, sorry. This part is broken - the link comes up properly but no packets 
can be transferred then.

-- 
Ondrej Zary


Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching

2014-05-16 Thread Steven Rostedt
On Fri, 16 May 2014 18:27:27 +0200 (CEST)
Jiri Kosina  wrote:


> > ftrace did the stop_machine (and still does for some archs), and slowly
> > moved to a more efficient method. kpatch/kgraft should follow suit.
> 
> I don't really agree here.
> 
> I actually believe that "lazy" switching kgraft is doing provides a little 
> bit more in the sense of consistency than stop_machine()-based approach.
> 
> Consider this scenario:
> 
>   void foo()
>   {
>   for (i=0; i<1; i++) {
>   bar(i);
>   something_else(i);
>   }
>   }
> 
> Let's say you want to live-patch bar(). With stop_machine()-based approach,
> you can easily end-up with old bar() and new bar() being called in two 
> consecutive iterations before the loop is even exited, right? (especially 
> on preemptible kernel, or if something_else() goes to sleep).

And bar() should still produce the same result. Otherwise you would
expect that foo() needs to change too.

> 
> With lazy-switching implemented in kgraft, this can never happen.
> 
> So I'd like to ask for a little bit more explanation why you think the 
> stop_machine()-based patching provides more sanity/consistency assurance 
> than the lazy switching we're doing.

Here's what I'm more concerned with. With "lazy" switching you can have
two tasks running two different versions of bar(). What happens if the
locking of data within bar changes? Say the data was protected
incorrectly with mutex(X) and you now need to protect it with mutex(Y).

With stop machine, you can make sure everyone is out of bar() and all
tasks will use the same mutex to access the data. But with a lazy
approach, one task can be protecting the data with mutex(X) and the
other with mutex(Y) causing both tasks to be accessing the data at the
same time.

*That* is what I'm more concerned about.
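
A deterministic sketch of that scenario, with no real threads: the
"mutex" is only a held flag, and old_bar_enter()/new_bar_enter() are
hypothetical stand-ins for the two versions of bar(). It shows that
holding X does nothing to keep out a task that takes Y.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a mutex: just a "held" flag, no real threading. */
struct lock_model { bool held; };

static bool model_trylock(struct lock_model *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void model_unlock(struct lock_model *l)
{
	assert(l->held);
	l->held = false;
}

struct lock_model mutex_x, mutex_y;
int tasks_in_critical_section;

/* Old bar(): protects the data with mutex X.  Returns true once inside. */
bool old_bar_enter(void)
{
	if (!model_trylock(&mutex_x))
		return false;
	tasks_in_critical_section++;
	return true;
}

/* Patched bar(): protects the same data with mutex Y. */
bool new_bar_enter(void)
{
	if (!model_trylock(&mutex_y))
		return false;
	tasks_in_critical_section++;
	return true;
}

void old_bar_exit(void) { tasks_in_critical_section--; model_unlock(&mutex_x); }
void new_bar_exit(void) { tasks_in_critical_section--; model_unlock(&mutex_y); }
```

One task entering through old_bar_enter() and another through
new_bar_enter() both succeed, leaving tasks_in_critical_section at 2,
while a second task using the same version is correctly kept out.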

I believe there are more issues with running the two different versions
of the same function at the same time than there are with iterations of
two different versions of the call. One would expect that the results
should stay the same and if not, then the callers would need to be
changed too.

-- Steve


Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

2014-05-16 Thread Greg Kroah-Hartman
On Fri, May 16, 2014 at 09:06:07AM -0500, Seth Forshee wrote:
> On Thu, May 15, 2014 at 09:35:32PM -0700, Greg Kroah-Hartman wrote:
> > On Fri, May 16, 2014 at 01:49:59AM +, Serge Hallyn wrote:
> > > > I think having to pick and choose what device nodes you want in a
> > > > container is a good thing.  Besides, you would have to do the same thing
> > > > in the kernel anyway, what's wrong with userspace making the decision
> > > > here, especially as it knows exactly what it wants to do much more so
> > > > than the kernel ever can.
> > > 
> > > For 'real' devices that sounds sensible.  The thing about loop devices
> > > is that we simply want to allow a container to say "give me a loop
> > > device to use" and have it receive a unique loop device (or 3), without
> > > having to pre-assign them.  I think that would be cleaner to do using
> > > a pseudofs and loop-control device, rather than having to have a
> > > daemon in userspace on the host farming those out in response to
> > > some, I don't know, dbus request?
> > 
> > I agree that loop devices would be nice to have in a container, and that
> > the existing loop interface doesn't really lend itself to that.  So
> > create a new type of thing that acts like a loop device in a container.
> > But don't try to mess with the whole driver core just for a single type
> > of device.
> 
> No matter what I don't think we get out of this without driver core
> changes, whether this was done in loop or by creating something new.
> Not unless the whole thing is punted to userspace, anyway.
> 
> The first problem is that many block device ioctls check for
> CAP_SYS_ADMIN. Most of these might not ever be used on loop devices, I'm
> not really sure. But loop does at minimum support partitions, and to get
> that functionality in an unprivileged container at least the block layer
> needs to know the namespace which has privileges for that device.

That's fine, you should have those permissions in a container if you
want to do something like that on a loop device, right?

> The second is that all block devices automatically appear in devtmpfs.
> The scenario I'm concerned about is that the host could unknowingly use
> a loop device exposed to a container, then the container could see data
> from the host.

I don't think that's a real issue, the host should know not to do that.

> So we either need a flag to tell the driver core not to create a node
> in devtmpfs, or we need a privileged manager in userspace to remove
> them (which kind of defeats the purpose). And it gets more complicated
> when partition block devs are mixed in, because they can be created
> without involvement from the driver - they would need to inherit the
> "no devtmpfs node" property from their parent, and if the driver uses
> a pseudo fs to create device nodes for userspace then it needs to be
> informed about the partitions too so it can create those nodes.

I don't think that will be needed.  Root in a host can do whatever it
wants in the containers, so mixing up block devices is the least of the
issues involved :)

> So maybe we could get by without the privileged ioctls, as long as it
> was understood that unprivileged containers can't do partitioning. But I
> do think the devtmpfs problem would need to be addressed.

I don't think unprivileged containers should be able to do partitioning.
An unprivileged user can't do that, so why should a container be any
different?

thanks,

greg k-h


Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask

2014-05-16 Thread Christoph Lameter
On Fri, 16 May 2014, Tejun Heo wrote:

> On Fri, May 16, 2014 at 12:52:26PM -0500, Christoph Lameter wrote:
> > On Fri, 16 May 2014, Frederic Weisbecker wrote:
> >
> > > It works on a lower-level than the per WQ_SYSFS workqueues cpumask files
> > > such that the effective cpumask applied for a given unbound workqueue is
> > > the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
> > > the new /sys/devices/virtual/workqueue/cpumask_unbounds file.
> >
> > Why is there "virtual" directory in the path? Workqueues are "virtual"
> > devices?
>
> It ain't an actual one.  That apparently is the preferred place to
> present software constructs like workqueues in sysfs.

Could we fix that? A workqueue is not a device but more a kernel setting.

/sys/kernel/workqueue/ ?
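
For reference, the intersection rule from the quoted patch description
can be sketched in a few lines. This is only a model: CPUs are bits in
an unsigned long, and cpumask_model_t and effective_cpumask() are
hypothetical names, not the kernel's cpumask API.

```c
#include <assert.h>

/* CPUs are modelled as bits in an unsigned long; bit n set = CPU n allowed. */
typedef unsigned long cpumask_model_t;

/* Effective mask = per-workqueue mask AND the global unbound mask. */
cpumask_model_t effective_cpumask(cpumask_model_t wq_mask,
				  cpumask_model_t unbound_mask)
{
	cpumask_model_t eff = wq_mask & unbound_mask;

	/* The result can never allow a CPU that either input forbids. */
	assert((eff & ~wq_mask) == 0 && (eff & ~unbound_mask) == 0);
	return eff;
}
```

With a per-workqueue mask of 0x0f (CPUs 0-3) and a global mask of 0x3c
(CPUs 2-5), the workqueue would effectively run on 0x0c (CPUs 2-3).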




Re: [PATCH V2 3/3] cpufreq: Tegra: implement intermediate frequency callbacks

2014-05-16 Thread Stephen Warren
On 05/16/2014 03:07 AM, Viresh Kumar wrote:
> Tegra had always been switching to intermediate frequency (pll_p_clk) since
> ever. CPUFreq core has better support for handling notifications for these
> frequencies and so we can adapt Tegra's driver to it.

> diff --git a/drivers/cpufreq/tegra-cpufreq.c b/drivers/cpufreq/tegra-cpufreq.c

> +static int
> +tegra_target_intermediate(struct cpufreq_policy *policy, unsigned int frequency)
> +{
> + return clk_set_parent(cpu_clk, pll_p_clk);
> +}

I think you also need to move the following code from
tegra_cpu_clk_set_rate() to the start of tegra_target_intermediate().
Otherwise, pll_x will turn off, which judging by the comment in
tegra_cpu_clk_set_rate(), shouldn't be allowed to happen:

/*
 * Take an extra reference to the main pll so it doesn't turn
 * off when we move the cpu off of it
 */
clk_prepare_enable(pll_x_clk);
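
The effect of that extra reference can be shown with a small model. It
is a simplification under stated assumptions: a clock is just an enable
count, reparenting an enabled child moves one reference from the old
parent to the new one, and struct clk_model and the helper names are
hypothetical rather than the clock framework's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a clock with an enable count. */
struct clk_model {
	int enable_count;           /* > 0 means the clock is running */
};

static void model_enable(struct clk_model *c)  { c->enable_count++; }
static void model_disable(struct clk_model *c)
{
	assert(c->enable_count > 0);
	c->enable_count--;
}

bool model_is_on(const struct clk_model *c) { return c->enable_count > 0; }

/*
 * Move an enabled child from old_parent to new_parent: the new parent
 * gains the child's reference and the old parent loses it.
 */
static void model_set_parent(struct clk_model *old_parent,
			     struct clk_model *new_parent)
{
	model_enable(new_parent);
	model_disable(old_parent);
}

/* Intermediate switch as posted: pll_x may drop to zero users. */
void switch_without_extra_ref(struct clk_model *pll_x, struct clk_model *pll_p)
{
	model_set_parent(pll_x, pll_p);
}

/* Intermediate switch with the extra reference taken first. */
void switch_with_extra_ref(struct clk_model *pll_x, struct clk_model *pll_p)
{
	model_enable(pll_x);        /* models clk_prepare_enable(pll_x_clk) */
	model_set_parent(pll_x, pll_p);
}
```

In this model, if the CPU clock is the only user of pll_x, the first
variant leaves pll_x with a count of zero after the switch, while the
second keeps it running.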

I'll go try this version anyway in a minute...


Re: uprobes && shmem (Was: uprobes: Shift ->readpage check from __copy_insn() to uprobe_register())

2014-05-16 Thread Hugh Dickins
On Fri, 16 May 2014, Oleg Nesterov wrote:
> On 05/16, Oleg Nesterov wrote:
> >
> > copy_insn() fails with -EIO if ->readpage == NULL
> 
> In particular, this means that we can not probe the binaries on tmpfs.
> This is pity.

Yes, that is a pity: thanks for noticing.

> 
> It seems that the potential fix is trivial, copy_insn() could use
> shmem_getpage_gfp(). But, is there any way to figure out that this

shmem_getpage_gfp() itself is static: please use
shmem_read_mapping_page(mapping, pgoff): inline in linux/shmem_fs.h,
calls shmem_read_mapping_page_gfp() in mm/shmem.c (a very few places
need to override gfp_mask too: you do not), calls shmem_getpage_gfp().

> inode/mapping/aops/whatever is actually shmem?
> 
> I am looking at shmem_get_inode() and I see nothing which could help,
> and shmem_aops/etc are all static.

On 3.15 and later, you're in luck: Hannes added bool shmem_mapping(mapping)
in his 0cd6144aadd2 "mm + fs: prepare for non-page entries in page cache
radix trees"; and I just checked, it builds for "tiny" !CONFIG_SHMEM too.

If you're backporting to an earlier kernel, it would probably be best
to add in a very small patch, extracting just shmem_mapping() and its
linux/mm.h declarations from 0cd6144aadd2.
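
The resulting decision in copy_insn() might look roughly like the model
below. This is a user-space sketch with hypothetical types: struct
mapping_model is not the kernel's address_space, is_shmem stands in for
shmem_mapping(mapping), and model_shmem_read() stands in for a
successful shmem_read_mapping_page(mapping, pgoff).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct mapping_model {
	int (*readpage)(unsigned long pgoff);   /* NULL on tmpfs */
	bool is_shmem;                          /* models shmem_mapping(mapping) */
};

/* Models shmem_read_mapping_page(mapping, pgoff) succeeding. */
static int model_shmem_read(unsigned long pgoff)
{
	(void)pgoff;
	return 0;
}

/* Pick a way to read the page holding the probed instruction. */
int model_copy_insn(const struct mapping_model *m, unsigned long pgoff)
{
	assert(m);
	if (m->readpage)
		return m->readpage(pgoff);      /* regular filesystems */
	if (m->is_shmem)
		return model_shmem_read(pgoff); /* tmpfs / shmem */
	return -EIO;                            /* nothing can read the page */
}
```

The -EIO case is kept for mappings that have neither a ->readpage nor
shmem behind them, which matches the failure Oleg described.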

I notice shmem_mapping() checks backing_dev_info,
whereas shmem_get_mapping_page_gfp() checks a_ops: no problem in that.
But it reminds me that you should test uprobe on SysV SHM when you're
done: again I think no problem, but there's an incestuous relationship
between shm and shmem that can catch us out when adding such checks.

Hugh


Re: [PATCH V2 2/3] cpufreq: add support for intermediate (stable) frequencies

2014-05-16 Thread Stephen Warren
On 05/16/2014 03:07 AM, Viresh Kumar wrote:
> Douglas Anderson, recently pointed out an interesting problem due to which
> udelay() was expiring earlier than it should.
> 
> While transitioning between frequencies few platforms may temporarily switch
> to a stable frequency, waiting for the main PLL to stabilize.
> 
> For example: When we transition between very low frequencies on exynos, like
> between 200MHz and 300MHz, we may temporarily switch to a PLL running at
> 800MHz. No CPUFREQ notification is sent for that. That means there's a period
> of time when we're running at 800MHz but loops_per_jiffy is calibrated at
> between 200MHz and 300MHz. And so udelay behaves badly.
> 
> To get this fixed in a generic way, lets introduce another set of callbacks
> get_intermediate() and target_intermediate(), only for drivers with
> target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
> 
> get_intermediate should return a stable intermediate frequency platform wants
> to switch to, and target_intermediate() should set CPU to that frequency,
> before jumping to the frequency corresponding to 'index'. Core will take care
> of sending notifications and driver doesn't have to handle them in
> target_intermediate() or target_index().
> 
> NOTE: Once set to intermediate frequency, driver isn't expected to fail for
> the following ->target_index() call, if it fails core will issue a WARN().

> diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt

> +cpufreq_driver.get_intermediate
> +and target_intermediate  Uset to switch to stable frequency while
> + changing CPU frequency.

s/Uset/Used/

> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h

> @@ -226,6 +226,21 @@ struct cpufreq_driver {

> + unsigned int (*get_intermediate)(struct cpufreq_policy *policy,
> +  unsigned int index);

Should get_intermediate be passed a struct cpufreq_freqs freqs rather
than just the target index? That way, if the intermediate frequency
varies depending on old/new frequencies, then the driver won't have to
go look up the current frequency in order to implement that logic.
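
For a sense of how large the udelay() error from the quoted description
is, here is a small arithmetic sketch. It assumes the busy-wait loop
count scales linearly with CPU frequency; actual_delay_us() is a
hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Real time taken by a busy-wait loop that was calibrated for
 * calibrated_khz but actually runs at actual_khz.  Assumes the loop
 * count scales linearly with CPU frequency.
 */
unsigned long actual_delay_us(unsigned long requested_us,
			      unsigned long calibrated_khz,
			      unsigned long actual_khz)
{
	assert(actual_khz > 0);
	return requested_us * calibrated_khz / actual_khz;
}
```

Under that assumption, a 100 us delay calibrated at 200MHz finishes in
about 25 us while the CPU sits on the 800MHz intermediate PLL, which is
the early expiry the notifications are meant to prevent.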


Status of Power Supply Subsystem

2014-05-16 Thread Sebastian Reichel
Hi,

It seems the maintainer of the power supply subsystem, Dmitry, has gone missing
in action since about mid-February. I couldn't find any mail from him on the
usual mailing lists, he did not reply to any of my mails and the power supply
subsystem git tree mentioned in the MAINTAINERS file has not been touched since
2014-02-01 [0].

Is there a standard procedure for handling subsystem maintainers who are missing
in action? I have patches for the power supply subsystem and have seen more
patches from other developers on the mailing lists.

[0] git://git.infradead.org/battery-2.6.git

-- Sebastian




Dell Latitude E6440 & i8k

2014-05-16 Thread Pali Rohár
Hello,

On the Dell Latitude E6440, the i8k driver reports total nonsense values:

$ sensors
i8k-virtual-0
Adapter: Virtual device
Right Fan:   93450 RPM
CPU:  +57.0°C  
temp2:+57.0°C  
temp3:+40.0°C  
temp4:   +127.0°C

Right Fan and temp4 are for sure incorrect.

The temp4 value is always 127 and never changes, but the Right Fan
value increases when the fan gets noisier. So it looks like the
Right Fan value is not normalized correctly or the multiplier is
incorrect.

The name "Right" is incorrect too. The fan is on the left side of
this notebook, not the right as reported by the driver.

Is it possible to fix these problems?

-- 
Pali Rohár
pali.ro...@gmail.com




Re: Cleanup console loglevels

2014-05-16 Thread Randy Dunlap
On 05/16/2014 10:51 AM, Borislav Petkov wrote:
> On Fri, May 16, 2014 at 07:49:21PM +0200, Borislav Petkov wrote:
>> Hi,
>>
>> so I was staring at
>>
>> 12544697f12e ("x86_64: be less annoying on boot, v2")
>>
>> and how naked numbers mean sh*t and how I have to grep sources to find
>> out what this 10 thing means. So how about the following cleanup? We can
>> do it this way, we can do accessors and stuff, whatever. But the naked
>> numbers are plain misleading.
>>
>> So how about it? I'm asking whether it makes sense first before I go
>> and replace all tests of console_loglevel with naked numbers around the
>> tree.
>>
>> Thanks.
>>
>> ---
>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>> index 068054f4bf20..0029d974e431 100644
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -172,7 +172,7 @@ asmlinkage __visible void __init 
>> x86_64_start_kernel(char * real_mode_data)
>>   */
>>  load_ucode_bsp();
>>  
>> -if (console_loglevel == 10)
>> +if (console_loglevel >= CONSOLE_LOGLEVEL_QUIET)
> 
> That's CONSOLE_LOGLEVEL_DEBUG, of course.
> 
> See, misleading. :-P
> 

Absolutely.  I'll ack it with that change.
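
For reference, a minimal userspace sketch of the corrected test. Only
CONSOLE_LOGLEVEL_DEBUG == 10 follows from the thread (it replaces the naked
10); the value given for CONSOLE_LOGLEVEL_QUIET and the helper name
console_is_debug() are assumptions made for illustration.

```c
#include <stdbool.h>

enum console_loglevel_sketch {
	CONSOLE_LOGLEVEL_QUIET = 4,	/* assumed value, not stated in the thread */
	CONSOLE_LOGLEVEL_DEBUG = 10,	/* the naked "10" from the original test */
};

/* Hypothetical helper: true when the console is at debug verbosity or above. */
static bool console_is_debug(int console_loglevel)
{
	return console_loglevel >= CONSOLE_LOGLEVEL_DEBUG;
}
```

With a named constant, comparing against QUIET where DEBUG was meant is easy
to spot in review, which is the point of the cleanup.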

-- 
~Randy


[PATCH 1/2] x86: insn decoder: create artificial 3rd byte for 2-byte VEX

2014-05-16 Thread Denys Vlasenko
Before this patch, users need to do this to fetch vex.vvvv:

	if (insn->vex_prefix.nbytes == 2) {
		vex_vvvv = ((insn->vex_prefix.bytes[1] >> 3) & 0xf) ^ 0xf;
	}
	if (insn->vex_prefix.nbytes == 3) {
		vex_vvvv = ((insn->vex_prefix.bytes[2] >> 3) & 0xf) ^ 0xf;
	}

Make it so that insn->vex_prefix.bytes[2] always contains vex.wvvvvLpp bits.
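
As a rough userspace sketch of what this buys callers: once byte #2 is
always populated, each field can be read the same way for both prefix
lengths. The struct and helper names below are hypothetical stand-ins, not
the kernel's types, and the bit layout (W in bit 7, the inverted 4-bit
register field in bits 6..3, L in bit 2, pp in bits 1..0) is assumed from
the VEX encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder's vex_prefix field. */
struct vex_prefix_sketch {
	uint8_t bytes[3];
	uint8_t nbytes;		/* 2 or 3 */
};

/* Mirror the patch: give a 2-byte VEX prefix a VEX3-like byte #2. */
static void vex2_fake_byte2(struct vex_prefix_sketch *p)
{
	assert(p->nbytes == 2);
	p->bytes[2] = p->bytes[1] & 0x7f;	/* clears bit 7, so vex.W == 0 */
}

/* Field accessors that no longer need to look at nbytes. */
static unsigned int vex_w(const struct vex_prefix_sketch *p)
{
	return p->bytes[2] >> 7;
}

static unsigned int vex_reg(const struct vex_prefix_sketch *p)
{
	/* The 4-bit register field is stored inverted, hence the XOR. */
	return ((p->bytes[2] >> 3) & 0xf) ^ 0xf;
}

static unsigned int vex_l(const struct vex_prefix_sketch *p)
{
	return (p->bytes[2] >> 2) & 1;
}

static unsigned int vex_pp(const struct vex_prefix_sketch *p)
{
	return p->bytes[2] & 3;
}
```

For a 3-byte prefix nothing needs faking, since bytes[2] already holds these
fields, so the accessors work unchanged.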

Signed-off-by: Denys Vlasenko 
Cc: Masami Hiramatsu 
Cc: Frank Ch. Eigler 
Cc: Srikar Dronamraju 
Cc: Ananth N Mavinakayanahalli 
Cc: Jim Keniston 
Cc: Oleg Nesterov 
Cc: Andi Kleen 
Cc: Ingo Molnar 
---
 arch/x86/lib/insn.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 54fcffe..829ca4c 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -163,6 +163,12 @@ found:
 				/* VEX.W overrides opnd_size */
 				insn->opnd_bytes = 8;
 		} else {
+			/*
+			 * For VEX2, fake VEX3-like byte#2.
+			 * Makes it easier to decode vex.W, vex.vvvv,
+			 * vex.L and vex.pp. Masking with 0x7f sets vex.W == 0.
+			 */
+			insn->vex_prefix.bytes[2] = b2 & 0x7f;
 			insn->vex_prefix.nbytes = 2;
 			insn->next_byte += 2;
 		}
-- 
1.8.1.4



Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask

2014-05-16 Thread Tejun Heo
On Fri, May 16, 2014 at 12:52:26PM -0500, Christoph Lameter wrote:
> On Fri, 16 May 2014, Frederic Weisbecker wrote:
> 
> > It works on a lower-level than the per WQ_SYSFS workqueues cpumask files
> > such that the effective cpumask applied for a given unbound workqueue is
> > the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
> > the new /sys/devices/virtual/workqueue/cpumask_unbounds file.
> 
> Why is there "virtual" directory in the path? Workqueues are "virtual"
> devices?

It ain't an actual one.  That apparently is the preferred place to
present software constructs like workqueues in sysfs.
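
As a side note on the intersection rule quoted above, a minimal sketch of
how the effective mask could be computed. It models a cpumask as a plain
64-bit bitmap rather than the kernel's struct cpumask; the type and function
names are hypothetical, and what happens when the intersection is empty is
deliberately left out because the quoted text does not say.

```c
#include <stdint.h>

/* Model a cpumask as a 64-bit bitmap: bit n set means CPU n is allowed. */
typedef uint64_t cpumask_sketch_t;

/*
 * Effective mask for one unbound workqueue: the CPUs allowed by both the
 * per-workqueue cpumask file and the global cpumask_unbounds file.
 */
static cpumask_sketch_t effective_unbound_mask(cpumask_sketch_t wq_mask,
					       cpumask_sketch_t global_mask)
{
	return wq_mask & global_mask;
}
```

For example, a workqueue allowed on CPUs 0-3 with a global mask of CPUs 2-5
would end up on CPUs 2-3 only.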

Thanks.

-- 
tejun

