date:20131210

Re: [B.A.T.M.A.N.] [PATCH -next 2/3] batman-adv: Use seq_overflow

2013-12-10 Thread Al Viro

On Wed, Dec 11, 2013 at 08:31:35AM +0100, Antonio Quartulli wrote:
> Joe,
> 
> we have other places in the batman-adv code where we use seq_printf, but
> at the moment we don't check the return value and we always return 0 at
> the end of the function.
> 
> I think we could use seq_overflow here as well?

Not if you want correctly behaving code...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH -next 2/3] batman-adv: Use seq_overflow

2013-12-10 Thread Al Viro

On Tue, Dec 10, 2013 at 09:12:43PM -0800, Joe Perches wrote:

> diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c
> index 2449afa..dfa5d2d 100644
> --- a/net/batman-adv/gateway_client.c
> +++ b/net/batman-adv/gateway_client.c
> @@ -517,29 +517,28 @@ static int batadv_write_buffer_text(struct batadv_priv *bat_priv,
>  {
>   struct batadv_gw_node *curr_gw;
>   struct batadv_neigh_node *router;
> - int ret = -1;
>  
>   router = batadv_orig_node_get_router(gw_node->orig_node);
>   if (!router)
> - goto out;
> + return -1;

This (as well as the original) means "fail read(2) with -EINVAL", which
might or might not be correct behaviour.

>   curr_gw = batadv_gw_get_selected_gw_node(bat_priv);
>  
> - ret = seq_printf(seq, "%s %pM (%3i) %pM [%10s]: %u.%u/%u.%u MBit\n",
> -  (curr_gw == gw_node ? "=>" : "  "),
> -  gw_node->orig_node->orig,
> -  router->bat_iv.tq_avg, router->addr,
> -  router->if_incoming->net_dev->name,
> -  gw_node->bandwidth_down / 10,
> -  gw_node->bandwidth_down % 10,
> -  gw_node->bandwidth_up / 10,
> -  gw_node->bandwidth_up % 10);
> + seq_printf(seq, "%s %pM (%3i) %pM [%10s]: %u.%u/%u.%u MBit\n",
> +(curr_gw == gw_node ? "=>" : "  "),
> +gw_node->orig_node->orig,
> +router->bat_iv.tq_avg, router->addr,
> +router->if_incoming->net_dev->name,
> +gw_node->bandwidth_down / 10,
> +gw_node->bandwidth_down % 10,
> +gw_node->bandwidth_up / 10,
> +gw_node->bandwidth_up % 10);
>  
>   batadv_neigh_node_free_ref(router);
>   if (curr_gw)
>   batadv_gw_node_free_ref(curr_gw);
> -out:
> - return ret;
> +
> + return seq_overflow(seq);

... and this is utter junk.

This sucker should return 0.  Insufficiently large buffer will be handled
by caller, TYVM, if you give that caller a chance to do so.  Returning 1
from ->show() is a bug in almost all cases, and definitely so in this one.

Just in case somebody decides that above is worth copying: It Is Not.
Original code is buggy, plain and simple.  This one trades the older
bug ("fail with -EINVAL whenever the buffer is too small") for the just as buggy
"silently skip an entry entirely whenever the buffer is too small".

Don't Do That.
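
A minimal userspace sketch of the behaviour described above; it is not the kernel's seq_file code. The ->show() analogue always returns 0 and only marks the buffer as full when the text does not fit, and the caller throws the partial output away, grows the buffer and calls it again. Every name here (struct seq_model, model_printf, show_entry, read_entry) is made up for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct seq_file: a buffer plus a fill count. */
struct seq_model {
	char *buf;
	size_t size;   /* capacity of buf */
	size_t count;  /* bytes used; count == size marks an overflow */
};

/* Append formatted text; on truncation mark the buffer as full. */
static void model_printf(struct seq_model *m, const char *fmt, ...)
{
	va_list ap;
	int len;

	if (m->count >= m->size)
		return;
	va_start(ap, fmt);
	len = vsnprintf(m->buf + m->count, m->size - m->count, fmt, ap);
	va_end(ap);
	if (len < 0 || (size_t)len >= m->size - m->count)
		m->count = m->size;  /* overflow: let the caller retry */
	else
		m->count += (size_t)len;
}

static int model_overflowed(const struct seq_model *m)
{
	return m->count == m->size;
}

/* The ->show() analogue: emit one entry and always return 0. */
static int show_entry(struct seq_model *m, const char *name, unsigned int bw)
{
	model_printf(m, "%s: %u.%u MBit\n", name, bw / 10, bw % 10);
	return 0;
}

/* The caller: retry with a larger buffer until the entry fits. */
static char *read_entry(const char *name, unsigned int bw, size_t initial)
{
	struct seq_model m = { NULL, initial, 0 };

	assert(initial > 0);
	for (;;) {
		m.buf = malloc(m.size);
		if (!m.buf)
			return NULL;
		m.count = 0;
		if (show_entry(&m, name, bw) == 0 && !model_overflowed(&m))
			return m.buf;  /* NUL-terminated by vsnprintf */
		free(m.buf);
		m.size *= 2;  /* entry did not fit: double and retry */
	}
}
```

The point of the sketch is that the overflow is detected and handled in one place, the caller, so show_entry() never needs to report it through its return value.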

Re: [PATCH] gpu: host1x: clk_round_rate() can return a zero upon error

2013-12-10 Thread Paul Walmsley


On 12/10/2013 11:51 PM, Sascha Hauer wrote:
> On Mon, Dec 09, 2013 at 06:00:12PM -0800, Paul Walmsley wrote:
> > Treat both negative and zero return values from clk_round_rate() as
> > errors.  This is needed since subsequent patches will convert
> > clk_round_rate()'s return value to be an unsigned type, rather than a
> > signed type, since some clock sources can generate rates higher than
> > (2^31)-1 Hz.
> >
> > Eventually, when calling clk_round_rate(), only a return value of zero
> > will be considered an error.  All other values will be considered valid
> > rates.  The comparison against values less than 0 is kept to preserve
> > the correct behavior in the meantime.
>
> Shouldn't it be an error when the result is not within sensible limits
> instead? What do you do with a rate of 1Hz?


It's up to the caller of clk_round_rate() to decide what doesn't make
sense for its use case.  The caller can certainly react to non-zero
rates as it likes.

The 0 return value (like the negative return values used previously)
is just intended for the clock framework to signal explicit errors
encountered during clk_round_rate()'s execution.


- Paul
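
A small sketch of the convention discussed in this message, under stated assumptions: the rounding helper returns an unsigned rate and uses 0 only for errors, while the caller applies its own idea of a sensible range. The functions model_round_rate() and pick_rate(), and the rounding policy itself, are hypothetical and do not reproduce the clock framework.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a round-rate helper with an unsigned return type:
 * 0 signals an error, every other value is a valid rate in Hz.
 */
static unsigned long model_round_rate(const unsigned long *rates, size_t n,
				      unsigned long requested)
{
	unsigned long best = 0;
	size_t i;

	/* Pick the highest supported rate not above the request. */
	for (i = 0; i < n; i++)
		if (rates[i] <= requested && rates[i] > best)
			best = rates[i];
	return best;  /* stays 0 if nothing matched, i.e. an error */
}

/*
 * Caller-side policy: the helper only reports explicit errors; whether
 * a non-zero rate is "sensible" is decided here, per use case.
 */
static int pick_rate(const unsigned long *rates, size_t n,
		     unsigned long requested, unsigned long min_ok,
		     unsigned long *out)
{
	unsigned long r = model_round_rate(rates, n, requested);

	if (r == 0)
		return -1;  /* error reported by the rounding helper */
	if (r < min_ok)
		return -2;  /* valid rate, but too low for this caller */
	*out = r;
	return 0;
}
```

With an unsigned return type a rate such as 3000000000 Hz, which is above (2^31)-1, can be returned without being mistaken for a negative error code.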

Re: [PATCH] gpu: host1x: clk_round_rate() can return a zero upon error

2013-12-10 Thread Sascha Hauer

On Mon, Dec 09, 2013 at 06:00:12PM -0800, Paul Walmsley wrote:
> 
> Treat both negative and zero return values from clk_round_rate() as
> errors.  This is needed since subsequent patches will convert
> clk_round_rate()'s return value to be an unsigned type, rather than a
> signed type, since some clock sources can generate rates higher than
> (2^31)-1 Hz.
> 
> Eventually, when calling clk_round_rate(), only a return value of zero
> will be considered an error.  All other values will be considered valid
> rates.  The comparison against values less than 0 is kept to preserve
> the correct behavior in the meantime.

Shouldn't it be an error when the result is not within sensible limits
instead? What do you do with a rate of 1Hz?

Sascha

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

Re: [PATCH] Staging: TIDSPBRIDGE: Use vm_iomap_memory for mmap-ing instead of remap_pfn_range

2013-12-10 Thread Ivajlo Dimitrov



On 08.12.2013 01:49, Steven Luo wrote:
> This patch causes problems with DSP codecs on OMAP3 devices running
> Android -- specifically, when the decoder is cleaning up after itself,
> munmap() of the mapped area fails, leading to a memory leak which
> eventually crashes the system.
>
> As far as I can tell, the code with this patch applied reduces to
> (ignoring checks and such)
>
> remap_pfn_range(vma, vma->vm_start,
>         (pdata->phys_mempool_base >> PAGE_SHIFT) + vma->vm_pgoff,
>         vma->vm_end - vma->vm_start,
>         vma->vm_page_prot);
>
> whereas the original was
>
> -   status = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> -                            vma->vm_end - vma->vm_start,
> -                            vma->vm_page_prot);
>
> We're subtracting (pdata->phys_mempool_base >> PAGE_SHIFT) from
> vma->vm_pgoff before calling vm_iomap_memory() to address the issue --
> if that's satisfactory to everyone involved, I can submit the following
> patch.
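
The page-frame arithmetic in the quoted report can be sketched as below. This is a userspace model with invented names (struct vma_model, pfn_original, pfn_patched, pfn_fixed) and an assumed 4 KiB page size; it also assumes, as the original code implies, that user space passes an offset that is already an absolute page frame number.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12  /* assumed 4 KiB pages */

/* Only the field this sketch needs; not the kernel's vm_area_struct. */
struct vma_model {
	unsigned long vm_pgoff;  /* page offset passed to mmap() */
};

/* Original code: vm_pgoff is used directly as the first PFN. */
static unsigned long pfn_original(const struct vma_model *vma)
{
	return vma->vm_pgoff;
}

/* Patched code, as reduced above: the pool's base PFN is added on top. */
static unsigned long pfn_patched(const struct vma_model *vma,
				 unsigned long phys_mempool_base)
{
	return (phys_mempool_base >> MODEL_PAGE_SHIFT) + vma->vm_pgoff;
}

/* Proposed fix: subtract the base PFN from vm_pgoff before mapping. */
static unsigned long pfn_fixed(struct vma_model vma,
			       unsigned long phys_mempool_base)
{
	vma.vm_pgoff -= phys_mempool_base >> MODEL_PAGE_SHIFT;
	return pfn_patched(&vma, phys_mempool_base);
}
```

With a pool base of 0x87000000 and an offset of 0x87010 pages, the patched mapping starts at PFN 0x10E010 instead of 0x87010, while the fixed variant lands on the same PFN as the original code.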



Hi,

I can pick your changes and re-send the original patch with them 
incorporated if there are no objections. Are you fine with that?


Regards,
Ivo

Re: [PATCH -next 2/3] batman-adv: Use seq_overflow

2013-12-10 Thread Antonio Quartulli

On 11/12/13 06:12, Joe Perches wrote:
> Convert the uses of the return of seq_printf to
> instead check seq_overflow to determine if a buffer
> overflow has occurred.
> 
> This will eventually allow seq_printf & seq_puts to
> be converted to a void return instead of the often
> misused return that is often assumed to be an int for
> the number of bytes emitted ala printk.
> 
> Signed-off-by: Joe Perches 

I assume this patch is going to be merged with the others in some tree.
In that case:

Acked-by: Antonio Quartulli 

Thanks,

-- 
Antonio Quartulli




Re: [PATCH][RESEND] ARM: pxa: remove IRQF_DISABLED

2013-12-10 Thread Haojian Zhuang


On 12/10/2013 01:43 AM, Eric Miao wrote:

Haojian, could you help take this via your tree to arm-soc?




Applied.

Thanks
Haojian

Re: [B.A.T.M.A.N.] [PATCH -next 2/3] batman-adv: Use seq_overflow

2013-12-10 Thread Antonio Quartulli

Joe,

we have other places in the batman-adv code where we use seq_printf, but
at the moment we don't check the return value and we always return 0 at
the end of the function.

I think we could use seq_overflow here as well?


Thanks,


-- 
Antonio Quartulli




Re: [PATCH] ARM: pxa: prevent PXA270 occasional reboot freezes

2013-12-10 Thread Haojian Zhuang


On 12/11/2013 02:31 AM, Marek Vasut wrote:

On Tuesday, December 10, 2013 at 11:48:59 AM, Daniel Mack wrote:

On 12/10/2013 09:43 AM, Haojian Zhuang wrote:

On 12/10/2013 12:39 PM, Sergei Ianovich wrote:

Erratum 71 of PXA270M Processor Family Specification Update
(April 19, 2010) explains that watchdog reset time is just
8us instead of 10ms in EMTS.

If SDRAM is not reset, it causes memory bus congestion and
the device hangs. We put SDRAM in self-refresh mode before watchdog
reset, removing potential freezes.

Without this patch PXA270-based ICP DAS LP-8x4x hangs after up to 40
reboots. With this patch it has successfully rebooted 500 times.

Signed-off-by: Sergei Ianovich 
---

   arch/arm/mach-pxa/reset.c | 8 +++-
   1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-pxa/reset.c b/arch/arm/mach-pxa/reset.c
index 0d5dd64..263b152 100644
--- a/arch/arm/mach-pxa/reset.c
+++ b/arch/arm/mach-pxa/reset.c
@@ -13,6 +13,7 @@

   #include 
   #include 

+#include 

   unsigned int reset_status;
   EXPORT_SYMBOL(reset_status);

@@ -81,6 +82,12 @@ static void do_hw_reset(void)

writel_relaxed(OSSR_M3, OSSR);
/* ... in 100 ms */
writel_relaxed(readl_relaxed(OSCR) + 368640, OSMR3);

+   /*
+* SDRAM hangs on watchdog reset on Marvell PXA270 (erratum 71)
+* we put SDRAM into self-refresh to prevent that
+*/
+   while (1)
+   writel_relaxed(MDREFR_SLFRSH, MDREFR);

   }

   void pxa_restart(enum reboot_mode mode, const char *cmd)

@@ -104,4 +111,3 @@ void pxa_restart(enum reboot_mode mode, const char *cmd)

break;

}

   }

-


Hi Daniel/Marek/Igor,

Could you help test this patch? I don't have a PXA27x board.


I don't have any either right now ...


On VPAC270

Tested-by: Marek Vasut 

Best regards,
Marek Vasut



Applied.

Thanks
Haojian

Re: [PATCH RFC] net: of_mdio: Scan PHYs which have device_type set to ethernet-phy

2013-12-10 Thread Florian Fainelli

On Thursday, 14 November 2013, 15:05:56, Srinivas Kandagatla wrote:
> According to Documentation/devicetree/bindings/net/phy.txt device_type
> property of PHY nodes is mandatory, which should be set to
> "ethernet-phy". This patch adds a check when scanning PHYs and only scans
> nodes which have device_type set to "ethernet-phy".

Please CC net...@vger.kernel.org as there might be networking folks not 
actively following devicetree-discuss.

> 
> Signed-off-by: Srinivas Kandagatla 
> ---
>  drivers/of/of_mdio.c |7 +++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
> index d5a57a9..78c53c7 100644
> --- a/drivers/of/of_mdio.c
> +++ b/drivers/of/of_mdio.c
> @@ -57,6 +57,9 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
> 
>   /* Loop over the child nodes and register a phy_device for each one */
>   for_each_available_child_of_node(np, child) {
> + /* A PHY must have device_type set to "ethernet-phy" */
> + if (of_node_cmp(child->type, "ethernet-phy"))
> + continue;

As already stated by Grant this will break quite a lot of platforms out there. 
Technically speaking, ePAPR v1.1 only specifies that "cpu" and "memory" nodes 
should have a "device_type" property for compatibility. Although I do agree 
that it is nice to have a properly set "device_type", we can't always rely on 
that. 
-- 
Florian
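
To illustrate the concern, here is a simplified sketch, not the of_mdio code: a strict scan skips every child node whose device_type is missing, while the existing behaviour treats each child as a PHY candidate. The struct node_model type and both counting functions are hypothetical, and strcmp() stands in for the kernel's node comparison helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified device-tree node: name plus optional type. */
struct node_model {
	const char *name;
	const char *device_type;  /* NULL when the property is absent */
};

/* Strict scan, as in the RFC: skip nodes without the exact type. */
static size_t count_phys_strict(const struct node_model *nodes, size_t n)
{
	size_t i, found = 0;

	for (i = 0; i < n; i++) {
		const char *type = nodes[i].device_type;

		if (!type || strcmp(type, "ethernet-phy") != 0)
			continue;
		found++;
	}
	return found;
}

/* Lenient scan: every child node is treated as a PHY candidate. */
static size_t count_phys_lenient(const struct node_model *nodes, size_t n)
{
	(void)nodes;
	return n;
}
```

A device tree that describes two PHYs but sets device_type on only one of them would lose a PHY under the strict scan, which is the kind of breakage the reply warns about.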

[PATCH 02/15] perf, x86: Basic Haswell LBR call stack support

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

The new HSW call stack feature provides a facility such that
unfiltered call data will be collected as normal, but as return
instructions are executed the last captured branch record is
popped from the LBR stack. Thus, branch information relative to
leaf functions will not be captured, while preserving the call
stack information of the main line execution path.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.h   |  7 ++-
 arch/x86/kernel/cpu/perf_event_intel.c |  2 +-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 93 +++---
 3 files changed, 78 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 745f6fb..1213641 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -460,7 +460,10 @@ struct x86_pmu {
 };
 
 enum {
-   PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE = PERF_SAMPLE_BRANCH_MAX_SHIFT,
+   PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = PERF_SAMPLE_BRANCH_MAX_SHIFT,
+   PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE,
+
+   PERF_SAMPLE_BRANCH_CALL_STACK = 1U << PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT,
 };
 
 #define x86_add_quirk(func_)   \
@@ -695,6 +698,8 @@ void intel_pmu_lbr_init_atom(void);
 
 void intel_pmu_lbr_init_snb(void);
 
+void intel_pmu_lbr_init_hsw(void);
+
 int intel_pmu_setup_lbr_filter(struct perf_event *event);
 
 int p4_pmu_init(void);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 0fa4f24..b477bfc 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2506,7 +2506,7 @@ __init int intel_pmu_init(void)
memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, sizeof(hw_cache_event_ids));
memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
 
-   intel_pmu_lbr_init_snb();
+   intel_pmu_lbr_init_hsw();
 
x86_pmu.event_constraints = intel_hsw_event_constraints;
x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 1ae2ec5..1bb844e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -39,6 +39,7 @@ static enum {
 #define LBR_IND_JMP_BIT    6 /* do not capture indirect jumps */
 #define LBR_REL_JMP_BIT    7 /* do not capture relative jumps */
 #define LBR_FAR_BIT        8 /* do not capture far branches */
+#define LBR_CALL_STACK_BIT 9 /* enable call stack */
 
 #define LBR_KERNEL (1 << LBR_KERNEL_BIT)
 #define LBR_USER   (1 << LBR_USER_BIT)
@@ -49,6 +50,7 @@ static enum {
 #define LBR_REL_JMP    (1 << LBR_REL_JMP_BIT)
 #define LBR_IND_JMP    (1 << LBR_IND_JMP_BIT)
 #define LBR_FAR        (1 << LBR_FAR_BIT)
+#define LBR_CALL_STACK (1 << LBR_CALL_STACK_BIT)
 
 #define LBR_PLM (LBR_KERNEL | LBR_USER)
 
@@ -74,24 +76,25 @@ static enum {
  * x86control flow changes include branches, interrupts, traps, faults
  */
 enum {
-   X86_BR_NONE = 0,  /* unknown */
-
-   X86_BR_USER = 1 << 0, /* branch target is user */
-   X86_BR_KERNEL   = 1 << 1, /* branch target is kernel */
-
-   X86_BR_CALL = 1 << 2, /* call */
-   X86_BR_RET  = 1 << 3, /* return */
-   X86_BR_SYSCALL  = 1 << 4, /* syscall */
-   X86_BR_SYSRET   = 1 << 5, /* syscall return */
-   X86_BR_INT  = 1 << 6, /* sw interrupt */
-   X86_BR_IRET = 1 << 7, /* return from interrupt */
-   X86_BR_JCC  = 1 << 8, /* conditional */
-   X86_BR_JMP  = 1 << 9, /* jump */
-   X86_BR_IRQ  = 1 << 10,/* hw interrupt or trap or fault */
-   X86_BR_IND_CALL = 1 << 11,/* indirect calls */
-   X86_BR_ABORT= 1 << 12,/* transaction abort */
-   X86_BR_IN_TX= 1 << 13,/* in transaction */
-   X86_BR_NO_TX= 1 << 14,/* not in transaction */
+   X86_BR_NONE = 0,  /* unknown */
+
+   X86_BR_USER = 1 << 0, /* branch target is user */
+   X86_BR_KERNEL   = 1 << 1, /* branch target is kernel */
+
+   X86_BR_CALL = 1 << 2, /* call */
+   X86_BR_RET  = 1 << 3, /* return */
+   X86_BR_SYSCALL  = 1 << 4, /* syscall */
+   X86_BR_SYSRET   = 1 << 5, /* syscall return */
+   X86_BR_INT  = 1 << 6, /* sw interrupt */
+   X86_BR_IRET = 1 << 7, /* return from interrupt */
+   X86_BR_JCC  = 1 << 8, /* conditional */
+   X86_BR_JMP  = 1 << 9, /* jump */
+   X86_BR_IRQ  = 1 << 10,/* hw interrupt or trap or fault */
+   X86_BR_IND_CALL = 1 << 11,/* indirect calls */
+   X86_BR_ABORT= 1 << 12,/* transaction abort */
+   X86_BR_IN_TX

[PATCH 04/15] perf, x86: Use context switch callback to flush LBR stack

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

Enable the pmu context switch callback if LBR is used. Use the callback
to flush the LBR stack when a process is scheduled in.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c   |  7 ---
 arch/x86/kernel/cpu/perf_event.h   |  2 -
 arch/x86/kernel/cpu/perf_event_intel.c | 14 +-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 32 +-
 include/linux/perf_event.h |  5 ---
 kernel/events/core.c   | 71 --
 6 files changed, 21 insertions(+), 110 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 6703d17..69e2095 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1852,12 +1852,6 @@ static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
x86_pmu.sched_task(ctx, sched_in);
 }
 
-static void x86_pmu_flush_branch_stack(void)
-{
-   if (x86_pmu.flush_branch_stack)
-   x86_pmu.flush_branch_stack();
-}
-
 void perf_check_microcode(void)
 {
if (x86_pmu.check_microcode)
@@ -1884,7 +1878,6 @@ static struct pmu pmu = {
.commit_txn = x86_pmu_commit_txn,
 
.event_idx  = x86_pmu_event_idx,
-   .flush_branch_stack = x86_pmu_flush_branch_stack,
.sched_task = x86_pmu_sched_task,
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 5eb3a58..3ef4b79 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -150,7 +150,6 @@ struct cpu_hw_events {
 * Intel LBR bits
 */
int lbr_users;
-   void                        *lbr_context;
struct perf_branch_stack    lbr_stack;
struct perf_branch_entry    lbr_entries[MAX_LBR_ENTRIES];
struct er_account   *lbr_sel;
@@ -416,7 +415,6 @@ struct x86_pmu {
void    (*cpu_dead)(int cpu);
 
void    (*check_microcode)(void);
-   void    (*flush_branch_stack)(void);
void    (*sched_task)(struct perf_event_context *ctx,
                      bool sched_in);
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index b477bfc..84a1c09 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2038,18 +2038,6 @@ static void intel_pmu_cpu_dying(int cpu)
fini_debug_store_on_cpu(cpu);
 }
 
-static void intel_pmu_flush_branch_stack(void)
-{
-   /*
-* Intel LBR does not tag entries with the
-* PID of the current task, then we need to
-* flush it on ctxsw
-* For now, we simply reset it
-*/
-   if (x86_pmu.lbr_nr)
-   intel_pmu_lbr_reset();
-}
-
 PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
 
 PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -2101,7 +2089,7 @@ static __initconst const struct x86_pmu intel_pmu = {
.cpu_starting   = intel_pmu_cpu_starting,
.cpu_dying  = intel_pmu_cpu_dying,
.guest_get_msrs = intel_guest_get_msrs,
-   .flush_branch_stack = intel_pmu_flush_branch_stack,
+   .sched_task = intel_pmu_lbr_sched_task,
 };
 
 static __init void intel_clovertown_quirk(void)
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 1bb844e..c33bf84c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -187,24 +187,32 @@ void intel_pmu_lbr_reset(void)
intel_pmu_lbr_reset_64();
 }
 
-void intel_pmu_lbr_enable(struct perf_event *event)
+void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
 {
-   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-
if (!x86_pmu.lbr_nr)
return;
 
/*
-* Reset the LBR stack if we changed task context to
-* avoid data leaks.
+* It is necessary to flush the stack on context switch. This happens
+* when the branch stack does not tag its entries with the pid of the
+* current task.
 */
-   if (event->ctx->task && cpuc->lbr_context != event->ctx) {
+   if (sched_in)
intel_pmu_lbr_reset();
-   cpuc->lbr_context = event->ctx;
-   }
+}
+
+void intel_pmu_lbr_enable(struct perf_event *event)
+{
+   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+   if (!x86_pmu.lbr_nr)
+   return;
+
cpuc->br_sel = event->hw.branch_reg.reg;
 
cpuc->lbr_users++;
+   if (cpuc->lbr_users == 1)
+   perf_sched_cb_enable(event->ctx->pmu);
 }
 
 void intel_pmu_lbr_disable(struct perf_event *event)
@@ -217,10 +225,10 @@ void intel_pmu_lbr_disable(struct perf_event *event)

[PATCH 03/15] perf, core: Introduce pmu context switch callback

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

The callback is invoked when a process is scheduled in/out. To avoid
unnecessary overhead, the callback can be enabled/disabled.
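
The enable/disable idea can be sketched in userspace as follows. This is a single-CPU model under stated assumptions, not the perf core: the patch keeps the usage counter per CPU, and the place where the counter gates the callback is not visible in the excerpt, so the check in model_context_switch() is a guess at the intent. All names (struct pmu_model, model_sched_cb_enable, model_sched_cb_disable, model_context_switch, count_sched_in) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-CPU model; the patch keeps the counter per CPU. */
struct pmu_model {
	int sched_cb_usages;                           /* enable count */
	void (*sched_task)(void *ctx, bool sched_in);  /* optional callback */
};

static void model_sched_cb_enable(struct pmu_model *pmu)
{
	pmu->sched_cb_usages++;
}

static void model_sched_cb_disable(struct pmu_model *pmu)
{
	assert(pmu->sched_cb_usages > 0);
	pmu->sched_cb_usages--;
}

/*
 * Context-switch hook: returns true if the callback ran. It is skipped
 * when nobody enabled it, when no callback is set, or when the task did
 * not actually change.
 */
static bool model_context_switch(struct pmu_model *pmu, int prev, int next,
				 void *ctx, bool sched_in)
{
	if (pmu->sched_cb_usages == 0 || !pmu->sched_task || prev == next)
		return false;
	pmu->sched_task(ctx, sched_in);
	return true;
}

/* Example callback: count "scheduled in" events, e.g. to flush state. */
static void count_sched_in(void *ctx, bool sched_in)
{
	if (sched_in)
		(*(int *)ctx)++;
}
```

Counting users rather than keeping a single flag lets several events share the callback: it stays active until the last user disables it.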

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c |  7 +
 arch/x86/kernel/cpu/perf_event.h |  4 +++
 include/linux/perf_event.h   |  8 ++
 kernel/events/core.c | 59 
 4 files changed, 78 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8e13293..6703d17 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1846,6 +1846,12 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
NULL,
 };
 
+static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
+{
+   if (x86_pmu.sched_task)
+   x86_pmu.sched_task(ctx, sched_in);
+}
+
 static void x86_pmu_flush_branch_stack(void)
 {
if (x86_pmu.flush_branch_stack)
@@ -1879,6 +1885,7 @@ static struct pmu pmu = {
 
.event_idx  = x86_pmu_event_idx,
.flush_branch_stack = x86_pmu_flush_branch_stack,
+   .sched_task = x86_pmu_sched_task,
 };
 
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 1213641..5eb3a58 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -417,6 +417,8 @@ struct x86_pmu {
 
void    (*check_microcode)(void);
void    (*flush_branch_stack)(void);
+   void    (*sched_task)(struct perf_event_context *ctx,
+                         bool sched_in);
 
/*
 * Intel Arch Perfmon v2+
@@ -678,6 +680,8 @@ void intel_pmu_pebs_disable_all(void);
 
 void intel_ds_init(void);
 
+void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in);
+
 void intel_pmu_lbr_reset(void);
 
 void intel_pmu_lbr_enable(struct perf_event *event);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8f4a70f..6a3e603 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -251,6 +251,12 @@ struct pmu {
 * flush branch stack on context-switches (needed in cpu-wide mode)
 */
void (*flush_branch_stack)  (void);
+
+   /*
+* PMU callback for context-switches. optional
+*/
+   void (*sched_task)  (struct perf_event_context *ctx,
+bool sched_in);
 };
 
 /**
@@ -546,6 +552,8 @@ extern void perf_event_delayed_put(struct task_struct *task);
 extern void perf_event_print_debug(void);
 extern void perf_pmu_disable(struct pmu *pmu);
 extern void perf_pmu_enable(struct pmu *pmu);
+extern void perf_sched_cb_disable(struct pmu *pmu);
+extern void perf_sched_cb_enable(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
 extern int perf_event_refresh(struct perf_event *event, int refresh);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 403b781..11c63d6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -141,6 +141,7 @@ enum event_type_t {
 struct static_key_deferred perf_sched_events __read_mostly;
 static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
 static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
+static DEFINE_PER_CPU(int, perf_sched_cb_usages);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
@@ -2327,6 +2328,59 @@ unlock:
}
 }
 
+void perf_sched_cb_disable(struct pmu *pmu)
+{
+   __get_cpu_var(perf_sched_cb_usages)--;
+}
+
+void perf_sched_cb_enable(struct pmu *pmu)
+{
+   __get_cpu_var(perf_sched_cb_usages)++;
+}
+
+/*
+ * This function provides the context switch callback to the lower code
+ * layer. It is invoked ONLY when the context switch callback is enabled.
+ */
+static void perf_pmu_sched_task(struct task_struct *prev,
+   struct task_struct *next,
+   bool sched_in)
+{
+   struct perf_cpu_context *cpuctx;
+   struct pmu *pmu;
+   unsigned long flags;
+   int count = 0;
+
+   if (prev == next)
+   return;
+
+   local_irq_save(flags);
+
+   rcu_read_lock();
+
+   list_for_each_entry_rcu(pmu, &pmus, entry) {
+   if (!pmu->sched_task)
+   continue;
+
+   cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+   pmu = cpuctx->ctx.pmu;
+
+   perf_ctx_lock(cpuctx, cpuctx->task_ctx);
+
+   perf_pmu_disable(pmu);
+
+   pmu->sched_task(cpuctx->task_ctx, sched_in);
+
+   perf_pmu_enable(pmu);
+
+   perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
+   }
+
+   rcu_read_unlock();
+
+   local_irq_restore(flags);
+}
+
 #define for_each_task_context_nr(ctxn)

[PATCH 00/15] perf, x86: Haswell LBR call stack support

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

Haswell has a new feature that utilizes the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
calls will be collected as normal, but as return instructions are executed
the last captured branch record is popped from the on-chip LBR registers.
The LBR call stack facility can help perf get call chains of programs
without frame pointers. This feature can be disabled/enabled through an
attribute file in the cpu pmu sysfs directory.

The LBR call stack has the following known limitations
 - Zero length calls are not filtered out by hardware
 - Exception handling such as setjmp/longjmp will have calls/returns not
   match
 - Pushing different return address onto the stack will have calls/returns
   not match
 - If callstack is deeper than the LBR, only the last entries are captured
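
The push/pop behaviour and the depth limitation listed above can be modelled with a bounded stack. This is only an illustration: struct lbr_model, lbr_call, lbr_ret and the depth of 4 are assumptions, and the real hardware records more than a call-site address.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_LBR_NR 4  /* assumed depth; real hardware has more entries */

/* Hypothetical model of LBR call-stack mode as a bounded stack. */
struct lbr_model {
	unsigned long from[MODEL_LBR_NR];  /* call sites, oldest first */
	size_t depth;                      /* number of valid entries */
};

/* A call pushes a record; when full, the oldest record is dropped. */
static void lbr_call(struct lbr_model *lbr, unsigned long call_site)
{
	size_t i;

	if (lbr->depth == MODEL_LBR_NR) {
		for (i = 1; i < MODEL_LBR_NR; i++)
			lbr->from[i - 1] = lbr->from[i];
		lbr->depth--;
	}
	lbr->from[lbr->depth++] = call_site;
}

/* A return pops the most recent record, if there is one. */
static void lbr_ret(struct lbr_model *lbr)
{
	if (lbr->depth > 0)
		lbr->depth--;
}
```

After a call into a leaf function and its return, the leaf's record is gone and only the callers remain; once more than MODEL_LBR_NR calls are outstanding, the oldest callers fall off the bottom.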

Changes since previous version
 - split change into more patches
 - introduce context switch callback and use it to flush LBR
 - use the context switch callback to save/restore LBR
 - dynamically allocate memory area for storing LBR stack, always switch the
   memory area during context switch
 - disable this feature by default

[PATCH 01/15] perf, x86: Reduce lbr_sel_map size

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

The index of lbr_sel_map is bit value of perf branch_sample_type.
PERF_SAMPLE_BRANCH_MAX is 1024 at present, so each lbr_sel_map
uses 4096 bytes. If we use bit shift as index, each lbr_sel_map
only uses 40 bytes.
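
The size difference follows directly from the array bounds, as this sketch shows. The enum and table names are illustrative rather than the kernel's; the figures of 4096 and 40 bytes assume a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>

/* Shift values mirroring the idea in the patch; names are illustrative. */
enum {
	MODEL_BRANCH_USER_SHIFT = 0,
	MODEL_BRANCH_KERNEL_SHIFT = 1,
	MODEL_BRANCH_MAX_SHIFT = 10,  /* assumed: ten flag bits in total */
};

#define MODEL_BRANCH_MAX (1U << MODEL_BRANCH_MAX_SHIFT)

/* Indexed by flag value: 1024 ints, almost all of them unused. */
static const int map_by_value[MODEL_BRANCH_MAX] = {
	[1U << MODEL_BRANCH_USER_SHIFT] = 1,
	[1U << MODEL_BRANCH_KERNEL_SHIFT] = 2,
};

/* Indexed by bit position: one int per flag. */
static const int map_by_shift[MODEL_BRANCH_MAX_SHIFT] = {
	[MODEL_BRANCH_USER_SHIFT] = 1,
	[MODEL_BRANCH_KERNEL_SHIFT] = 2,
};

/* Look up a single-bit flag in the large table. */
static int lookup_by_value(unsigned int flag)
{
	return flag < MODEL_BRANCH_MAX ? map_by_value[flag] : -1;
}

/* Look up the same flag in the compact table by finding its bit. */
static int lookup_by_shift(unsigned int flag)
{
	int i;

	for (i = 0; i < MODEL_BRANCH_MAX_SHIFT; i++)
		if (flag == (1U << i))
			return map_by_shift[i];
	return -1;
}
```

Both lookups return the same entry for a given flag; only the table that backs them shrinks from 1024 slots to 10.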

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.h   |  4 +++
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 50 ++
 include/uapi/linux/perf_event.h| 42 +
 3 files changed, 56 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index fd00bb2..745f6fb 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -459,6 +459,10 @@ struct x86_pmu {
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
 };
 
+enum {
+   PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE = PERF_SAMPLE_BRANCH_MAX_SHIFT,
+};
+
 #define x86_add_quirk(func_)   \
 do {   \
static struct x86_pmu_quirk __quirk __initdata = {  \
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..1ae2ec5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -69,10 +69,6 @@ static enum {
 #define LBR_FROM_FLAG_IN_TX    (1ULL << 62)
 #define LBR_FROM_FLAG_ABORT    (1ULL << 61)
 
-#define for_each_branch_sample_type(x) \
-   for ((x) = PERF_SAMPLE_BRANCH_USER; \
-(x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
-
 /*
  * x86control flow change classification
  * x86control flow changes include branches, interrupts, traps, faults
@@ -400,14 +396,14 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event)
 {
struct hw_perf_event_extra *reg;
u64 br_type = event->attr.branch_sample_type;
-   u64 mask = 0, m;
-   u64 v;
+   u64 mask = 0, v;
+   int i;
 
-   for_each_branch_sample_type(m) {
-   if (!(br_type & m))
+   for (i = 0; i < PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE; i++) {
+   if (!(br_type & (1ULL << i)))
continue;
 
-   v = x86_pmu.lbr_sel_map[m];
+   v = x86_pmu.lbr_sel_map[i];
if (v == LBR_NOT_SUPP)
return -EOPNOTSUPP;
 
@@ -662,33 +658,33 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 /*
  * Map interface branch filters onto LBR filters
  */
-static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
-   [PERF_SAMPLE_BRANCH_ANY]= LBR_ANY,
-   [PERF_SAMPLE_BRANCH_USER]   = LBR_USER,
-   [PERF_SAMPLE_BRANCH_KERNEL] = LBR_KERNEL,
-   [PERF_SAMPLE_BRANCH_HV] = LBR_IGN,
-   [PERF_SAMPLE_BRANCH_ANY_RETURN] = LBR_RETURN | LBR_REL_JMP
-   | LBR_IND_JMP | LBR_FAR,
+static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE] = {
+   [PERF_SAMPLE_BRANCH_ANY_SHIFT]  = LBR_ANY,
+   [PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
+   [PERF_SAMPLE_BRANCH_KERNEL_SHIFT]   = LBR_KERNEL,
+   [PERF_SAMPLE_BRANCH_HV_SHIFT]   = LBR_IGN,
+   [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT]   = LBR_RETURN | LBR_REL_JMP
+   | LBR_IND_JMP | LBR_FAR,
/*
 * NHM/WSM erratum: must include REL_JMP+IND_JMP to get CALL branches
 */
-   [PERF_SAMPLE_BRANCH_ANY_CALL] =
+   [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] =
 LBR_REL_CALL | LBR_IND_CALL | LBR_REL_JMP | LBR_IND_JMP | LBR_FAR,
/*
 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 */
-   [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+   [PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL | LBR_IND_JMP,
 };
 
-static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
-   [PERF_SAMPLE_BRANCH_ANY]= LBR_ANY,
-   [PERF_SAMPLE_BRANCH_USER]   = LBR_USER,
-   [PERF_SAMPLE_BRANCH_KERNEL] = LBR_KERNEL,
-   [PERF_SAMPLE_BRANCH_HV] = LBR_IGN,
-   [PERF_SAMPLE_BRANCH_ANY_RETURN] = LBR_RETURN | LBR_FAR,
-   [PERF_SAMPLE_BRANCH_ANY_CALL]   = LBR_REL_CALL | LBR_IND_CALL
-   | LBR_FAR,
-   [PERF_SAMPLE_BRANCH_IND_CALL]   = LBR_IND_CALL,
+static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE] = {
+   [PERF_SAMPLE_BRANCH_ANY_SHIFT]  = LBR_ANY,
+   [PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
+   [PERF_SAMPLE_BRANCH_KERNEL_SHIFT]   = LBR_KERNEL,
+   [PERF_SAMPLE_BRANCH_HV_SHIFT]   = LBR_IGN,
+   [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT]   = LBR_RETURN | LBR_FAR,
+   [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = LBR_REL_CALL | LBR_IND_CALL
+   | LBR_FAR,
+   [PERF_SAMPLE_BRANCH_IND_CALL_SHIFT]

[PATCH 06/15] perf, core: PMU specific data for perf task context

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

Allow allocating PMU specific data for the perf task context.

Signed-off-by: Yan, Zheng 
---
 include/linux/perf_event.h |  5 +
 kernel/events/core.c   | 19 ++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 96cb88b..147f9d3 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -252,6 +252,10 @@ struct pmu {
 */
void (*sched_task)  (struct perf_event_context *ctx,
 bool sched_in);
+   /*
+* PMU specific data size
+*/
+   size_t  task_ctx_size;
 };
 
 /**
@@ -496,6 +500,7 @@ struct perf_event_context {
int pin_count;
int nr_cgroups;  /* cgroup evts */
int nr_branch_stack; /* branch_stack evt */
+   void*task_ctx_data; /* pmu specific data */
struct rcu_head rcu_head;
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c281265..6499dae 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -882,6 +882,15 @@ static void get_ctx(struct perf_event_context *ctx)
WARN_ON(!atomic_inc_not_zero(&ctx->refcount));
 }
 
+static void free_ctx(struct rcu_head *head)
+{
+   struct perf_event_context *ctx;
+
+   ctx = container_of(head, struct perf_event_context, rcu_head);
+   kfree(ctx->task_ctx_data);
+   kfree(ctx);
+}
+
 static void put_ctx(struct perf_event_context *ctx)
 {
if (atomic_dec_and_test(&ctx->refcount)) {
@@ -889,7 +898,7 @@ static void put_ctx(struct perf_event_context *ctx)
put_ctx(ctx->parent_ctx);
if (ctx->task)
put_task_struct(ctx->task);
-   kfree_rcu(ctx, rcu_head);
+   call_rcu(&ctx->rcu_head, free_ctx);
}
 }
 
@@ -3025,6 +3034,14 @@ alloc_perf_context(struct pmu *pmu, struct task_struct *task)
if (!ctx)
return NULL;
 
+   if (task && pmu->task_ctx_size > 0) {
+   ctx->task_ctx_data = kzalloc(pmu->task_ctx_size, GFP_KERNEL);
+   if (!ctx->task_ctx_data) {
+   kfree(ctx);
+   return NULL;
+   }
+   }
+
__perf_event_init_context(ctx);
if (task) {
ctx->task = task;
-- 
1.8.1.4
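
The pairing of allocation and teardown in this patch can be illustrated outside the kernel. What follows is a minimal userspace sketch, assuming calloc()/free() in place of kzalloc()/kfree() and leaving out the RCU deferral entirely; demo_ctx, demo_alloc_ctx and demo_free_ctx are hypothetical names, not kernel symbols.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct perf_event_context. */
struct demo_ctx {
	void *task_ctx_data;	/* PMU specific data, may be NULL */
};

/*
 * Allocate a context. A zeroed private area is attached only for
 * per-task contexts of a PMU that declares a non-zero task_ctx_size.
 */
struct demo_ctx *demo_alloc_ctx(size_t task_ctx_size, int has_task)
{
	struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;

	if (has_task && task_ctx_size > 0) {
		ctx->task_ctx_data = calloc(1, task_ctx_size);
		if (!ctx->task_ctx_data) {
			free(ctx);	/* do not leak the half-built context */
			return NULL;
		}
	}
	return ctx;
}

/* Release the private area before the context itself. */
void demo_free_ctx(struct demo_ctx *ctx)
{
	if (!ctx)
		return;
	free(ctx->task_ctx_data);	/* free(NULL) is a no-op */
	free(ctx);
}
```

In the patch itself the release runs from an RCU callback, which is presumably why kfree_rcu() is replaced by call_rcu() with a dedicated function: two objects have to be freed, not one.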


[PATCH 07/15] perf, core: Always switch pmu specific data during context switch

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

Switch pmu specific data even if switching the perf task context is
optimized out.

Signed-off-by: Yan, Zheng 
---
 kernel/events/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6499dae..974f7c7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2318,6 +2318,8 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
next->perf_event_ctxp[ctxn] = ctx;
ctx->task = next;
next_ctx->task = task;
+   ctx->task_ctx_data = xchg(&next_ctx->task_ctx_data,
+ ctx->task_ctx_data);
do_switch = 0;
 
perf_event_sync_stat(ctx, next_ctx);
-- 
1.8.1.4
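
The exchange performed by the added lines can be shown in isolation. This is a minimal sketch, assuming a plain non-atomic helper in place of the kernel's xchg(); demo_ctx, demo_xchg and demo_swap_task_ctx_data are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct perf_event_context. */
struct demo_ctx {
	void *task_ctx_data;
};

/* Non-atomic stand-in for xchg(): store v in *slot, return the old value. */
static void *demo_xchg(void **slot, void *v)
{
	void *old = *slot;

	*slot = v;
	return old;
}

/* Swap the PMU specific data pointers of two contexts. */
void demo_swap_task_ctx_data(struct demo_ctx *ctx, struct demo_ctx *next_ctx)
{
	ctx->task_ctx_data = demo_xchg(&next_ctx->task_ctx_data,
				       ctx->task_ctx_data);
}
```

The reading assumed here is that, in the optimized path, the two contexts trade owners, so the private data has to be traded back for each task to keep its own copy.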


[PATCH 10/15] perf, x86: Save/restore LBR stack during context switch

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

When the LBR call stack is enabled, it is necessary to save/restore
the LBR stack on context switch. The solution is saving/restoring
the LBR stack to/from task's perf event context.

Don't save/restore the LBR stack if task has no perf event context
or there are system-wide users of LBR stack. Flush the LBR stack
instead.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 83 +-
 1 file changed, 69 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index f57960d..80bb097 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -187,18 +187,85 @@ void intel_pmu_lbr_reset(void)
intel_pmu_lbr_reset_64();
 }
 
+/*
+ * TOS = most recently recorded branch
+ */
+static inline u64 intel_pmu_lbr_tos(void)
+{
+   u64 tos;
+
+   rdmsrl(x86_pmu.lbr_tos, tos);
+
+   return tos;
+}
+
+enum {
+   LBR_UNINIT,
+   LBR_NONE,
+   LBR_VALID,
+};
+
+static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx)
+{
+   int i;
+   unsigned lbr_idx, mask = x86_pmu.lbr_nr - 1;
+   u64 tos = intel_pmu_lbr_tos();
+
+   for (i = 0; i < x86_pmu.lbr_nr; i++) {
+   lbr_idx = (tos - i) & mask;
+   wrmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
+   wrmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
+   }
+   task_ctx->lbr_stack_state = LBR_NONE;
+}
+
+static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
+{
+   int i;
+   unsigned lbr_idx, mask = x86_pmu.lbr_nr - 1;
+   u64 tos = intel_pmu_lbr_tos();
+
+   for (i = 0; i < x86_pmu.lbr_nr; i++) {
+   lbr_idx = (tos - i) & mask;
+   rdmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
+   rdmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
+   }
+   task_ctx->lbr_stack_state = LBR_VALID;
+}
+
+
 void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
 {
+   struct cpu_hw_events *cpuc;
+   struct x86_perf_task_context *task_ctx;
+
if (!x86_pmu.lbr_nr)
return;
 
+   cpuc = &__get_cpu_var(cpu_hw_events);
+   task_ctx = ctx ? ctx->task_ctx_data : NULL;
+
+
/*
 * It is necessary to flush the stack on context switch. This happens
 * when the branch stack does not tag its entries with the pid of the
 * current task.
 */
-   if (sched_in)
-   intel_pmu_lbr_reset();
+   if (sched_in) {
+   if (cpuc->lbr_sys_users > 0 ||
+   !task_ctx ||
+   !task_ctx->lbr_callstack_users ||
+   task_ctx->lbr_stack_state != LBR_VALID)
+   intel_pmu_lbr_reset();
+   else
+   __intel_pmu_lbr_restore(task_ctx);
+   } else if (task_ctx) {
+   if (task_ctx->lbr_callstack_users &&
+   task_ctx->lbr_stack_state != LBR_UNINIT)
+   __intel_pmu_lbr_save(task_ctx);
+   else
+   task_ctx->lbr_stack_state = LBR_NONE;
+   }
 }
 
 static inline bool branch_user_callstack(unsigned br_sel)
@@ -271,18 +338,6 @@ void intel_pmu_lbr_disable_all(void)
__intel_pmu_lbr_disable();
 }
 
-/*
- * TOS = most recently recorded branch
- */
-static inline u64 intel_pmu_lbr_tos(void)
-{
-   u64 tos;
-
-   rdmsrl(x86_pmu.lbr_tos, tos);
-
-   return tos;
-}
-
 static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
 {
unsigned long mask = x86_pmu.lbr_nr - 1;
-- 
1.8.1.4
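
The index arithmetic shared by __intel_pmu_lbr_save() and __intel_pmu_lbr_restore() can be tried in userspace. The sketch below models the LBR "from" registers as a plain array, under the assumption that the number of entries is a power of two so that masking works as a modulo; demo_lbr_hw, demo_lbr_save and demo_lbr_restore are hypothetical names, and no MSR access is involved.

```c
#include <stdint.h>

#define DEMO_LBR_NR 4	/* assumed power of two, like x86_pmu.lbr_nr */

/* Hypothetical model of the LBR "from" registers and the TOS index. */
struct demo_lbr_hw {
	uint64_t from[DEMO_LBR_NR];
	unsigned int tos;	/* slot of the most recently recorded branch */
};

/* Copy the ring into saved[], newest entry first. */
void demo_lbr_save(const struct demo_lbr_hw *hw, uint64_t *saved)
{
	unsigned int i, mask = DEMO_LBR_NR - 1;

	for (i = 0; i < DEMO_LBR_NR; i++)
		saved[i] = hw->from[(hw->tos - i) & mask];
}

/* Write saved[] back relative to whatever TOS the hardware has now. */
void demo_lbr_restore(struct demo_lbr_hw *hw, const uint64_t *saved)
{
	unsigned int i, mask = DEMO_LBR_NR - 1;

	for (i = 0; i < DEMO_LBR_NR; i++)
		hw->from[(hw->tos - i) & mask] = saved[i];
}
```

Because the saved copy is ordered by age rather than by register number, it can be restored on a CPU whose TOS differs from the one seen at save time.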


[PATCH 05/15] perf, core: Optimize context switch callback invoking

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

So far only the CPU pmu uses the context switch callback, and it is the
first registered pmu, so we can avoid iterating over the whole pmu list.

Signed-off-by: Yan, Zheng 
---
 kernel/events/core.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index fc97ed6..c281265 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2373,6 +2373,10 @@ static void perf_pmu_sched_task(struct task_struct *prev,
perf_pmu_enable(pmu);
 
perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
+
+   /* so far only CPU pmu uses context switch callback */
+   if (++count >= __get_cpu_var(perf_sched_cb_usages))
+   break;
}
 
rcu_read_unlock();
@@ -6492,7 +6496,7 @@ got_cpu_context:
if (!pmu->event_idx)
pmu->event_idx = perf_event_idx_default;
 
-   list_add_rcu(>entry, );
+   list_add_tail_rcu(>entry, );
ret = 0;
 unlock:
mutex_unlock(_lock);
-- 
1.8.1.4


Re: [PATCH v2] Documentation: Add MDIO bus node to PHY binding document

2013-12-10 Thread Florian Fainelli

On Wednesday, 13 November 2013, 15:07:49, Jonas Jensen wrote:
> Add MDIO bus node segment and update the example,
> allowing trivial bindings to break out boilerplate.
> 
> Signed-off-by: Jonas Jensen 
> ---
> 
> Notes:
> Thanks Mark,
> 
> This should have the changes from your comments. It also adds optional
> properties "compatible" and "reg", were those overlooked or left out
> intentionally?

Please CC net...@vger.kernel.org too as there might be some interest from 
networking folks not actively following devicetree-discuss.

This does look good to me. There is so little to say that I wonder whether
this even deserves such an example, but it cannot hurt.

> 
> Changes since v1:
> 
> 1. reformat "MDIO bus node" description and add, node name should be "mdio"
> 2. reformat property descriptions, describe what the cells represent
> 3. add optional properties
> 4. add a description after "PHY nodes"
> 
> Applies to next-20131113
> 
>  Documentation/devicetree/bindings/net/phy.txt | 48 ++-
>  1 file changed, 40 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
> index 7cd18fb..4179a70 100644
> --- a/Documentation/devicetree/bindings/net/phy.txt
> +++ b/Documentation/devicetree/bindings/net/phy.txt
> @@ -1,5 +1,26 @@
> +MDIO Bus Nodes
> +
> +An MDIO bus node describes an MDIO bus, and is a container for PHY nodes
> +as described below. An MDIO bus node should be named "mdio".
> +
> +Required properties:
> +
> +- #address-cells = Should be <1>, specifies the number of cells needed
> +  to encode the PHY address
> +- #size-cells = Should be <0>
> +
> +Optional Properties:
> +
> +- compatible : Should contain a specific name for the MDIO bus,
> +  if known, followed by "-mdio"
> +- reg : Should contain register location and length
> +
> +
>  PHY nodes
> 
> +Describes the PHY chip. A MAC connecting the PHY may use a phandle to
> +this node.
> +
>  Required properties:
> 
>   - device_type : Should be "ethernet-phy"
> @@ -23,13 +44,24 @@ Optional Properties:
>assume clause 22. The compatible list may also contain other
>elements.
> 
> +
>  Example:
> 
> -ethernet-phy@0 {
> - compatible = "ethernet-phy-ieee802.3-c22";
> - linux,phandle = <2452000>;
> - interrupt-parent = <4>;
> - interrupts = <35 1>;
> - reg = <0>;
> - device_type = "ethernet-phy";
> -};
> +mdio {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + ethernet-phy@0 {
> + device_type = "ethernet-phy";
> + compatible = "...", "ethernet-phy-ieee802.3-c22";
> + reg = <0>;
> + interrupts = <24 0>;
> + }
> +
> + ethernet-phy@1 {
> + device_type = "ethernet-phy";
> + compatible = "...";
> + reg = <1>;
> + interrupts = <35 1>;
> + }
> +}

-- 
Florian

Re: [PATCH 2/2 v2] f2fs: fix the location of tracepoint

2013-12-10 Thread Jaegeuk Kim

Change log from v1:
 o fix a mistake

From 59df095b83e601b1b317ce97a5f767a8c3303902 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim 
Date: Wed, 11 Dec 2013 14:29:39 +0900
Subject: [PATCH] f2fs: fix the location of tracepoint
Cc: linux-fsde...@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-f2fs-de...@lists.sourceforge.net

We need to get a trace before submit_bio, since the bio's bi_sector is
remapped during submit_bio.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index ebc9177..15956fa 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -104,11 +104,12 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
rw = fio->rw | fio->rw_flag;
 
if (is_read_io(rw)) {
-   submit_bio(rw, io->bio);
trace_f2fs_submit_read_bio(io->sbi->sb, rw, fio->type, io->bio);
+   submit_bio(rw, io->bio);
io->bio = NULL;
return;
}
+   trace_f2fs_submit_write_bio(io->sbi->sb, rw, fio->type, io->bio);
 
/*
 * META_FLUSH is only from the checkpoint procedure, and we should wait
@@ -122,7 +123,6 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
} else {
submit_bio(rw, io->bio);
}
-   trace_f2fs_submit_write_bio(io->sbi->sb, rw, fio->type, io->bio);
io->bio = NULL;
 }
 
-- 
1.8.4.474.g128a96c


-- 
Jaegeuk Kim
Samsung


[PATCH 08/15] perf, x86: Allocate memory for saving LBR stack

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

Allocate memory for perf task context, use the memory to save
LBR stack.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c | 1 +
 arch/x86/kernel/cpu/perf_event.h | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 69e2095..2e43f1b 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1879,6 +1879,7 @@ static struct pmu pmu = {
 
.event_idx  = x86_pmu_event_idx,
.sched_task = x86_pmu_sched_task,
+   .task_ctx_size  = sizeof(struct x86_perf_task_context),
 };
 
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3ef4b79..3ed9629 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -459,6 +459,13 @@ struct x86_pmu {
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
 };
 
+struct x86_perf_task_context {
+   u64 lbr_from[MAX_LBR_ENTRIES];
+   u64 lbr_to[MAX_LBR_ENTRIES];
+   int lbr_callstack_users;
+   int lbr_stack_state;
+};
+
 enum {
PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = PERF_SAMPLE_BRANCH_MAX_SHIFT,
PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE,
-- 
1.8.1.4


[PATCH 11/15] perf, core: Simplify need branch stack check

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

Use event->attr.branch_sample_type to check if the branch stack is needed.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel.c | 20 +++-
 include/linux/perf_event.h |  5 +
 kernel/events/core.c   | 11 +++
 3 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 84a1c09..722171c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1030,20 +1030,6 @@ static __initconst const u64 slm_hw_cache_event_ids
  },
 };
 
-static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
-{
-   /* user explicitly requested branch sampling */
-   if (has_branch_stack(event))
-   return true;
-
-   /* implicit branch sampling to correct PEBS skid */
-   if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
-   x86_pmu.intel_cap.pebs_format < 2)
-   return true;
-
-   return false;
-}
-
 static void intel_pmu_disable_all(void)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -1208,7 +1194,7 @@ static void intel_pmu_disable_event(struct perf_event *event)
 * must disable before any actual event
 * because any event may be combined with LBR
 */
-   if (intel_pmu_needs_lbr_smpl(event))
+   if (needs_branch_stack(event))
intel_pmu_lbr_disable(event);
 
if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
@@ -1269,7 +1255,7 @@ static void intel_pmu_enable_event(struct perf_event *event)
 * must enabled before any actual event
 * because any event may be combined with LBR
 */
-   if (intel_pmu_needs_lbr_smpl(event))
+   if (needs_branch_stack(event))
intel_pmu_lbr_enable(event);
 
if (event->attr.exclude_host)
@@ -1741,7 +1727,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (event->attr.precise_ip && x86_pmu.pebs_aliases)
x86_pmu.pebs_aliases(event);
 
-   if (intel_pmu_needs_lbr_smpl(event)) {
+   if (needs_branch_stack(event)) {
ret = intel_pmu_setup_lbr_filter(event);
if (ret)
return ret;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 147f9d3..0d88eb8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -766,6 +766,11 @@ static inline bool has_branch_stack(struct perf_event *event)
return event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK;
 }
 
+static inline bool needs_branch_stack(struct perf_event *event)
+{
+   return event->attr.branch_sample_type != 0;
+}
+
 extern int perf_output_begin(struct perf_output_handle *handle,
 struct perf_event *event, unsigned int size);
 extern void perf_output_end(struct perf_output_handle *handle);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 974f7c7..e039dbc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1137,7 +1137,7 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
if (is_cgroup_event(event))
ctx->nr_cgroups++;
 
-   if (has_branch_stack(event))
+   if (needs_branch_stack(event))
ctx->nr_branch_stack++;
 
list_add_rcu(&event->event_entry, &ctx->event_list);
@@ -1302,7 +1302,7 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
cpuctx->cgrp = NULL;
}
 
-   if (has_branch_stack(event))
+   if (needs_branch_stack(event))
ctx->nr_branch_stack--;
 
ctx->nr_events--;
@@ -3207,7 +3207,7 @@ static void unaccount_event(struct perf_event *event)
atomic_dec(_freq_events);
if (is_cgroup_event(event))
static_key_slow_dec_deferred(_sched_events);
-   if (has_branch_stack(event))
+   if (needs_branch_stack(event))
static_key_slow_dec_deferred(_sched_events);
 
unaccount_event_cpu(event, event->cpu);
@@ -6619,7 +6619,7 @@ static void account_event(struct perf_event *event)
if (atomic_inc_return(_freq_events) == 1)
tick_nohz_full_kick_all();
}
-   if (has_branch_stack(event))
+   if (needs_branch_stack(event))
static_key_slow_inc(_sched_events.key);
if (is_cgroup_event(event))
static_key_slow_inc(_sched_events.key);
@@ -6727,6 +6727,9 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
goto err_ns;
 
+   if (!has_branch_stack(event))
+   event->attr.branch_sample_type = 0;
+
pmu = perf_init_event(event);
if (!pmu)
goto err_ns;
-- 
1.8.1.4
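
The relationship between has_branch_stack() and the new needs_branch_stack() can be sketched in userspace. The block below is an illustration only: demo_attr, the demo_* functions and the DEMO_SAMPLE_BRANCH_STACK bit are hypothetical stand-ins, and the bit value is arbitrary rather than the uapi one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit only; not the real PERF_SAMPLE_BRANCH_STACK value. */
#define DEMO_SAMPLE_BRANCH_STACK (1ULL << 0)

/* Hypothetical subset of struct perf_event_attr. */
struct demo_attr {
	uint64_t sample_type;
	uint64_t branch_sample_type;
};

/* The user explicitly asked for branch records in the sample. */
static bool demo_has_branch_stack(const struct demo_attr *attr)
{
	return (attr->sample_type & DEMO_SAMPLE_BRANCH_STACK) != 0;
}

/* Mirrors the clearing step the patch adds to perf_event_alloc(). */
void demo_normalize(struct demo_attr *attr)
{
	if (!demo_has_branch_stack(attr))
		attr->branch_sample_type = 0;
}

/* After normalization, any non-zero value means the LBR is in use. */
bool demo_needs_branch_stack(const struct demo_attr *attr)
{
	return attr->branch_sample_type != 0;
}
```

With stale user values cleared up front, architecture code can later set branch_sample_type on its own (as the PEBS fixup path does) and a single non-zero test covers both the explicit and the implicit case.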


[PATCH 12/15] perf, core: Pass perf_sample_data to perf_callchain()

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

New Intel CPUs can record call chains by using the existing last branch
record facility. perf_callchain_user() can make use of the call
chains recorded by hardware when there is no frame pointer.

Signed-off-by: Yan, Zheng 
---
 arch/arm/kernel/perf_event.c | 4 ++--
 arch/powerpc/perf/callchain.c| 4 ++--
 arch/sparc/kernel/perf_event.c   | 4 ++--
 arch/x86/kernel/cpu/perf_event.c | 4 ++--
 include/linux/perf_event.h   | 3 ++-
 kernel/events/callchain.c| 8 +---
 kernel/events/core.c | 2 +-
 kernel/events/internal.h | 3 ++-
 8 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index bc3f2ef..fdcc654 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -566,8 +566,8 @@ user_backtrace(struct frame_tail __user *tail,
return buftail.fp - 1;
 }
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
struct frame_tail __user *tail;
 
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 74d1e78..b379ebc 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -482,8 +482,8 @@ static void perf_callchain_user_32(struct perf_callchain_entry *entry,
}
 }
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
if (current_is_64bit())
perf_callchain_user_64(entry, regs);
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index b5c38fa..cba0306 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1785,8 +1785,8 @@ static void perf_callchain_user_32(struct perf_callchain_entry *entry,
} while (entry->nr < PERF_MAX_STACK_DEPTH);
 }
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
perf_callchain_store(entry, regs->tpc);
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 2e43f1b..49128e6 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2009,8 +2009,8 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
 }
 #endif
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
struct stack_frame frame;
const void __user *fp;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0d88eb8..c442276 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -709,7 +709,8 @@ extern void perf_event_fork(struct task_struct *tsk);
 /* Callchains */
 DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
 
-extern void perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs);
+extern void perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs,
+   struct perf_sample_data *data);
 extern void perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs);
 
 static inline void perf_callchain_store(struct perf_callchain_entry *entry, u64 ip)
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 97b67df..19d497c 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -30,7 +30,8 @@ __weak void perf_callchain_kernel(struct perf_callchain_entry *entry,
 }
 
 __weak void perf_callchain_user(struct perf_callchain_entry *entry,
-   struct pt_regs *regs)
+   struct pt_regs *regs,
+   struct perf_sample_data *data)
 {
 }
 
@@ -157,7 +158,8 @@ put_callchain_entry(int rctx)
 }
 
 struct perf_callchain_entry *
-perf_callchain(struct perf_event *event, struct pt_regs *regs)
+perf_callchain(struct perf_event *event, struct pt_regs *regs,
+  struct perf_sample_data *data)
 {
int rctx;
struct perf_callchain_entry *entry;
@@ -198,7 +200,7 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
goto exit_put;
 
perf_callchain_store(entry, PERF_CONTEXT_USER);
-   perf_callchain_user(entry, regs);
+   perf_callchain_user(entry, regs, data);
}
}
 
diff --git a/kernel/events/core.c

[PATCH 13/15] perf, x86: Use LBR call stack to get user callchain

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

Try to use the LBR call stack to get the user callchain when
there is no frame pointer.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c   | 33 ++
 arch/x86/kernel/cpu/perf_event_intel.c | 12 ++-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |  2 ++
 include/linux/perf_event.h |  1 +
 4 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 49128e6..1509340 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1965,12 +1965,28 @@ static unsigned long get_segment_base(unsigned int segment)
return get_desc_base(desc + idx);
 }
 
+static inline void
+perf_callchain_lbr_callstack(struct perf_callchain_entry *entry,
+struct perf_sample_data *data)
+{
+   struct perf_branch_stack *br_stack = data->br_stack;
+
+   if (br_stack && br_stack->user_callstack) {
+   int i = 0;
+   while (i < br_stack->nr && entry->nr < PERF_MAX_STACK_DEPTH) {
+   perf_callchain_store(entry, br_stack->entries[i].from);
+   i++;
+   }
+   }
+}
+
 #ifdef CONFIG_COMPAT
 
 #include 
 
 static inline int
-perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
+perf_callchain_user32(struct perf_callchain_entry *entry,
+ struct pt_regs *regs, struct perf_sample_data *data)
 {
/* 32-bit process in 64-bit kernel. */
unsigned long ss_base, cs_base;
@@ -1999,11 +2015,16 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
perf_callchain_store(entry, cs_base + frame.return_address);
fp = compat_ptr(ss_base + frame.next_frame);
}
+
+   if (fp == compat_ptr(regs->bp))
+   perf_callchain_lbr_callstack(entry, data);
+
return 1;
 }
 #else
 static inline int
-perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
+perf_callchain_user32(struct perf_callchain_entry *entry,
+ struct pt_regs *regs, struct perf_sample_data *data)
 {
 return 0;
 }
@@ -2033,12 +2054,12 @@ void perf_callchain_user(struct perf_callchain_entry *entry,
if (!current->mm)
return;
 
-   if (perf_callchain_user32(regs, entry))
+   if (perf_callchain_user32(entry, regs, data))
return;
 
while (entry->nr < PERF_MAX_STACK_DEPTH) {
unsigned long bytes;
-   frame.next_frame = NULL;
+   frame.next_frame = NULL;
frame.return_address = 0;
 
bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
@@ -2051,6 +2072,10 @@ void perf_callchain_user(struct perf_callchain_entry *entry,
perf_callchain_store(entry, frame.return_address);
fp = frame.next_frame;
}
+
+   /* try LBR callstack if there is no frame pointer */
+   if (fp == (void __user *)regs->bp)
+   perf_callchain_lbr_callstack(entry, data);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 722171c..e0f658a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1030,6 +1030,15 @@ static __initconst const u64 slm_hw_cache_event_ids
  },
 };
 
+static inline bool intel_pmu_needs_lbr_callstack(struct perf_event *event)
+{
+   if ((event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) &&
+   (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK))
+   return true;
+
+   return false;
+}
+
 static void intel_pmu_disable_all(void)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -1398,7 +1407,8 @@ again:
 
perf_sample_data_init(&data, 0, event->hw.last_period);
 
-   if (has_branch_stack(event))
+   if (has_branch_stack(event) ||
+   (event->ctx->task && intel_pmu_needs_lbr_callstack(event)))
data.br_stack = &cpuc->lbr_stack;
 
if (perf_event_overflow(event, &data, regs))
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 80bb097..a879910 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -722,6 +722,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
int i, j, type;
bool compress = false;
 
+   cpuc->lbr_stack.user_callstack = branch_user_callstack(br_sel);
+
/* if sampling all branches, then nothing to filter */
if ((br_sel & X86_BR_ALL) == X86_BR_ALL)
return;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c442276..d2f0488 100644
--- a/include/linux/perf_event.h
+++
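
The fallback added in perf_callchain_lbr_callstack() amounts to appending branch source addresses to the callchain until either side runs out. Here is a minimal userspace sketch of that loop; the struct layouts, the small depth cap and the demo_* names are hypothetical and chosen only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_STACK_DEPTH 8	/* small illustrative cap */

/* Hypothetical stand-in for struct perf_callchain_entry. */
struct demo_callchain {
	size_t nr;
	uint64_t ip[DEMO_MAX_STACK_DEPTH];
};

/* Hypothetical stand-in for one recorded branch. */
struct demo_branch_entry {
	uint64_t from;
	uint64_t to;
};

/*
 * Append the "from" address of each recorded branch to the chain,
 * stopping when the branches run out or the chain is full.
 * Returns the number of entries appended.
 */
size_t demo_append_lbr(struct demo_callchain *chain,
		       const struct demo_branch_entry *br, size_t nr_br)
{
	size_t i = 0;

	while (i < nr_br && chain->nr < DEMO_MAX_STACK_DEPTH) {
		chain->ip[chain->nr++] = br[i].from;
		i++;
	}
	return i;
}
```

In the patch this path is taken only when the frame pointer walk made no progress, that is, when fp still equals regs->bp after the loop.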

[PATCH 15/15] perf, x86: Discard zero length call entries in LBR call stack

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

A "zero length call" uses the call instruction to push the address of
the next instruction onto the stack and then pops that address off
into a register. This is accomplished without any matching return
instruction. It confuses the hardware and makes the recorded call
stack incorrect. Try to fix the call stack by discarding zero length
call entries.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index a879910..56ff99d 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -94,7 +94,8 @@ enum {
X86_BR_ABORT= 1 << 12,/* transaction abort */
X86_BR_IN_TX= 1 << 13,/* in transaction */
X86_BR_NO_TX= 1 << 14,/* not in transaction */
-   X86_BR_CALL_STACK   = 1 << 15,/* call stack */
+   X86_BR_ZERO_CALL= 1 << 15,/* zero length call */
+   X86_BR_CALL_STACK   = 1 << 16,/* call stack */
 };
 
 #define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
@@ -111,13 +112,15 @@ enum {
 X86_BR_JMP  |\
 X86_BR_IRQ  |\
 X86_BR_ABORT|\
-X86_BR_IND_CALL)
+X86_BR_IND_CALL |\
+X86_BR_ZERO_CALL)
 
 #define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
 
 #define X86_BR_ANY_CALL \
(X86_BR_CALL|\
 X86_BR_IND_CALL|\
+X86_BR_ZERO_CALL   |\
 X86_BR_SYSCALL |\
 X86_BR_IRQ |\
 X86_BR_INT)
@@ -656,6 +659,12 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
ret = X86_BR_INT;
break;
case 0xe8: /* call near rel */
+   insn_get_immediate(&insn);
+   if (insn.immediate1.value == 0) {
+   /* zero length call */
+   ret = X86_BR_ZERO_CALL;
+   break;
+   }
case 0x9a: /* call far absolute */
ret = X86_BR_CALL;
break;
-- 
1.8.1.4
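
The test for a zero length call can be expressed on raw instruction bytes. The sketch below assumes the instruction starts directly with the 0xe8 opcode followed by a 32-bit displacement, and deliberately ignores prefixes; the patch relies on the kernel instruction decoder instead. demo_is_zero_length_call is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if buf holds a near relative CALL (opcode 0xe8) whose
 * 32-bit displacement is zero, i.e. a call to the very next instruction.
 * Prefix bytes are not handled in this sketch.
 */
bool demo_is_zero_length_call(const uint8_t *buf, size_t len)
{
	if (len < 5 || buf[0] != 0xe8)
		return false;

	return (buf[1] | buf[2] | buf[3] | buf[4]) == 0;
}
```

Such a call is typically followed by a pop of the return address into a register, which is why no matching return ever shows up in the LBR call stack.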


[PATCH 14/15] perf, x86: Enable LBR callstack when recording callchain

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

Try to enable the LBR call stack feature if the event requests
callchain recording. Also add a cpu pmu attribute to enable/disable
this feature (disabled by default).

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c | 94 +++-
 arch/x86/kernel/cpu/perf_event.h |  7 +++
 2 files changed, 72 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 1509340..7fbb751 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -399,37 +399,49 @@ int x86_pmu_hw_config(struct perf_event *event)
 
if (event->attr.precise_ip > precise)
return -EOPNOTSUPP;
+   }
+   /*
+* check that PEBS LBR correction does not conflict with
+* whatever the user is asking with attr->branch_sample_type
+*/
+   if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format < 2) {
+   u64 *br_type = &event->attr.branch_sample_type;
+
+   if (has_branch_stack(event)) {
+   if (!precise_br_compat(event))
+   return -EOPNOTSUPP;
+
+   /* branch_sample_type is compatible */
+
+   } else {
+   /*
+* user did not specify  branch_sample_type
+*
+* For PEBS fixups, we capture all
+* the branches at the priv level of the
+* event.
+*/
+   *br_type = PERF_SAMPLE_BRANCH_ANY;
+
+   if (!event->attr.exclude_user)
+   *br_type |= PERF_SAMPLE_BRANCH_USER;
+
+   if (!event->attr.exclude_kernel)
+   *br_type |= PERF_SAMPLE_BRANCH_KERNEL;
+   }
+   } else if ((event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) &&
+  !has_branch_stack(event) &&
+  x86_pmu.attr_lbr_callstack &&
+  !event->attr.exclude_user &&
+  (event->attach_state & PERF_ATTACH_TASK)) {
/*
-* check that PEBS LBR correction does not conflict with
-* whatever the user is asking with attr->branch_sample_type
+* user did not specify branch_sample_type,
+* try using the LBR call stack facility to
+* record call chains of user program.
 */
-   if (event->attr.precise_ip > 1 &&
-   x86_pmu.intel_cap.pebs_format < 2) {
-   u64 *br_type = &event->attr.branch_sample_type;
-
-   if (has_branch_stack(event)) {
-   if (!precise_br_compat(event))
-   return -EOPNOTSUPP;
-
-   /* branch_sample_type is compatible */
-
-   } else {
-   /*
-* user did not specify  branch_sample_type
-*
-* For PEBS fixups, we capture all
-* the branches at the priv level of the
-* event.
-*/
-   *br_type = PERF_SAMPLE_BRANCH_ANY;
-
-   if (!event->attr.exclude_user)
-   *br_type |= PERF_SAMPLE_BRANCH_USER;
-
-   if (!event->attr.exclude_kernel)
-   *br_type |= PERF_SAMPLE_BRANCH_KERNEL;
-   }
-   }
+   event->attr.branch_sample_type =
+   PERF_SAMPLE_BRANCH_USER |
+   PERF_SAMPLE_BRANCH_CALL_STACK;
}
 
/*
@@ -1828,10 +1840,34 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
return count;
 }
 
+static ssize_t get_attr_lbr_callstack(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+   return snprintf(buf, 40, "%d\n", x86_pmu.attr_lbr_callstack);
+}
+
+static ssize_t set_attr_lbr_callstack(struct device *cdev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   unsigned long val = simple_strtoul(buf, NULL, 0);
+
+   if (x86_pmu.attr_lbr_callstack != !!val) {
+   if (val && !x86_pmu_has_lbr_callstack())
+   return -EOPNOTSUPP;
+   x86_pmu.attr_lbr_callstack = !!val;
+   }
+   return count;
+}
+
 static DEVICE_ATTR(rdpmc, S_IRUSR | S_IWUSR, get_attr_rdpmc, set_attr_rdpmc);
+static DEVICE_ATTR(lbr_callstack, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH,
+

[PATCH 09/15] perf: Track system-wide LBR users and LBR callstack users

2013-12-10 Thread Yan, Zheng

From: "Yan, Zheng" 

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.h   |  1 +
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 25 -
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3ed9629..bcc8e1f 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -150,6 +150,7 @@ struct cpu_hw_events {
 * Intel LBR bits
 */
int lbr_users;
+   int lbr_sys_users;
struct perf_branch_stack   lbr_stack;
struct perf_branch_entry   lbr_entries[MAX_LBR_ENTRIES];
struct er_account   *lbr_sel;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index c33bf84c..f57960d 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -201,15 +201,29 @@ void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
intel_pmu_lbr_reset();
 }
 
+static inline bool branch_user_callstack(unsigned br_sel)
+{
+   return (br_sel & X86_BR_USER) && (br_sel & X86_BR_CALL_STACK);
+}
+
 void intel_pmu_lbr_enable(struct perf_event *event)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+   struct x86_perf_task_context *task_ctx;
 
if (!x86_pmu.lbr_nr)
return;
 
+   cpuc = &__get_cpu_var(cpu_hw_events);
+   task_ctx = event->ctx ? event->ctx->task_ctx_data : NULL;
+
cpuc->br_sel = event->hw.branch_reg.reg;
 
+   if (!(event->attach_state & PERF_ATTACH_TASK))
+   cpuc->lbr_sys_users++;
+   else if (branch_user_callstack(cpuc->br_sel))
+   task_ctx->lbr_callstack_users++;
+
cpuc->lbr_users++;
if (cpuc->lbr_users == 1)
perf_sched_cb_enable(event->ctx->pmu);
@@ -217,11 +231,20 @@ void intel_pmu_lbr_enable(struct perf_event *event)
 
 void intel_pmu_lbr_disable(struct perf_event *event)
 {
-   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+   struct cpu_hw_events *cpuc;
+   struct x86_perf_task_context *task_ctx;
 
if (!x86_pmu.lbr_nr)
return;
 
+   cpuc = &__get_cpu_var(cpu_hw_events);
+   task_ctx = event->ctx ? event->ctx->task_ctx_data : NULL;
+
+   if (!(event->attach_state & PERF_ATTACH_TASK))
+   cpuc->lbr_sys_users--;
+   else if (branch_user_callstack(cpuc->br_sel))
+   task_ctx->lbr_callstack_users--;
+
cpuc->lbr_users--;
WARN_ON_ONCE(cpuc->lbr_users < 0);
 
-- 
1.8.1.4


Re: [PATCH 2/2] f2fs: fix the location of tracepoint

2013-12-10 Thread Jaegeuk Kim

Oops,
Please ignore the second patch.
Sorry for the noise.

2013-12-11 (Wed), 15:05 +0900, Jaegeuk Kim:
> We need to get a trace before submit_bio, since its bi_sector is remapped
> during the submit_bio.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/data.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index ebc9177..969df55 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -103,9 +103,10 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
>  
>   rw = fio->rw | fio->rw_flag;
>  
> + trace_f2fs_submit_write_bio(io->sbi->sb, rw, fio->type, io->bio);
> +
>   if (is_read_io(rw)) {
>   submit_bio(rw, io->bio);
> - trace_f2fs_submit_read_bio(io->sbi->sb, rw, fio->type, io->bio);
>   io->bio = NULL;
>   return;
>   }
> @@ -122,7 +123,6 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
>   } else {
>   submit_bio(rw, io->bio);
>   }
> - trace_f2fs_submit_write_bio(io->sbi->sb, rw, fio->type, io->bio);
>   io->bio = NULL;
>  }
>  

-- 
Jaegeuk Kim
Samsung


Re: [PATCH 1/3] of: Fix early OF builtup on kobj-ification

2013-12-10 Thread Koen Kooi


On 10 Dec 2013, at 15:13, Pantelis Antoniou wrote:

> When booting platforms that do very early OF initialization before
> core_initcalls are performed of_init is called too late.
> 
> This results in a hard-hard without getting a chance to output anything.

I think that should be 'hard-hang'.

regards,

Koen

> 
> Fixed by adding a flag that marks when init has been done, and
> performing the per-node init at that time.
> 
> Signed-off-by: Pantelis Antoniou 
> ---
> drivers/of/base.c | 28 +++-
> 1 file changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 734689b..a4f3dda 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -98,7 +98,7 @@ int __weak of_node_to_nid(struct device_node *np)
>  */
> struct device_node *of_node_get(struct device_node *node)
> {
> - if (node)
> + if (node && of_kset)
>   kobject_get(&node->kobj);
>   return node;
> }
> @@ -156,7 +156,7 @@ static void of_node_release(struct kobject *kobj)
>  */
> void of_node_put(struct device_node *node)
> {
> - if (node)
> + if (node && of_kset)
>   kobject_put(&node->kobj);
> }
> EXPORT_SYMBOL(of_node_put);
> @@ -202,9 +202,16 @@ static const char *safe_name(struct kobject *kobj, const char *orig_name)
> static int __of_add_property(struct device_node *np, struct property *pp)
> {
>   int rc;
> + bool secure;
> +
> + /* note that we don't take a lock here */
> +
> + /* fake the return while of_init is not yet called */
> + if (!of_kset)
> + return 0;
> 
>   /* Important: Don't leak passwords */
> - bool secure = strncmp(pp->name, "security-", 9) == 0;
> + secure = strncmp(pp->name, "security-", 9) == 0;
> 
>   pp->attr.attr.name = safe_name(&np->kobj, pp->name);
>   pp->attr.attr.mode = secure ? S_IRUSR : S_IRUGO;
> @@ -222,6 +229,12 @@ static int __of_node_add(struct device_node *np)
>   struct property *pp;
>   int rc;
> 
> + /* note that we don't take a lock here */
> +
> + /* fake the return while of_init is not yet called */
> + if (!of_kset)
> + return 0;
> +
>   np->kobj.kset = of_kset;
>   if (!np->parent) {
>   /* Nodes without parents are new top level trees */
> @@ -245,11 +258,14 @@ static int __of_node_add(struct device_node *np)
> int of_node_add(struct device_node *np)
> {
>   int rc = 0;
> - kobject_init(&np->kobj, &of_node_ktype);
> +
> + /* fake the return while of_init is not yet called */
>   mutex_lock(&of_aliases_mutex);
> + kobject_init(&np->kobj, &of_node_ktype);
>   if (of_kset)
>   rc = __of_node_add(np);
>   mutex_unlock(&of_aliases_mutex);
> +
>   return rc;
> }
> 
> @@ -275,14 +291,16 @@ static int __init of_init(void)
> 
>   /* Make sure all nodes added before this time get added to sysfs */
>   mutex_lock(&of_aliases_mutex);
> +
>   for_each_of_allnodes(np)
>   __of_node_add(np);
> - mutex_unlock(&of_aliases_mutex);
> 
>   /* Symlink in /proc as required by userspace ABI */
>   if (of_allnodes)
>   proc_symlink("device-tree", NULL, "/sys/firmware/devicetree/base");
> 
> + mutex_unlock(&of_aliases_mutex);
> +
>   return 0;
> }
> core_initcall(of_init);
> -- 
> 1.7.12
> 
> 


Re: [PATCH v1 1/6] net: mv643xx_eth: properly start/stop phy device

2013-12-10 Thread Sebastian Hesselbarth

On 12/11/2013 03:56 AM, David Miller wrote:

> From: David Miller
> Date: Tue, 10 Dec 2013 21:52:53 -0500 (EST)
>
>> This series looks good, applied to net-next, thanks.
>
> Actually, I had to revert.
>
> You cannot use late_initcall_sync() from code that is potentially
> built modular, this caused my build to fail at the mdio_bus code.
>
> drivers/net/phy/mdio_bus.c:344:1: warning: data definition has no type or storage class [enabled by default]
> drivers/net/phy/mdio_bus.c:344:1: error: type defaults to ‘int’ in declaration of ‘late_initcall_sync’ [-Werror=implicit-int]
> drivers/net/phy/mdio_bus.c:344:1: warning: parameter names (without types) in function declaration [enabled by default]
> drivers/net/phy/mdio_bus.c:339:12: warning: ‘mdio_bus_class_suspend_unused’ defined but not used [-Wunused-function]

Hmm, I see. What about you drop patch 5 ("net: phy: suspend unused PHYs
on mdio_bus in late_initcall"), take the rest, and I'll have a look at
suspending unused PHYs later?

Sebastian

Re: sched/timekeeping: lockdep spew

2013-12-10 Thread John Stultz

On 12/08/2013 07:36 PM, John Stultz wrote:
> On 12/08/2013 03:45 PM, Sasha Levin wrote:
>> Hi all,
>>
>> I've stumbled on this spew inside a KVM tools guest running latest
>> -next kernel:
>>
>>
>> [  251.100221] ==
>> [  251.100221] [ INFO: possible circular locking dependency detected ]
>> [  251.100221] 3.13.0-rc2-next-20131206-sasha-5-g8be2375-dirty
>> #4053 Not tainted
>> [  251.101967] ---
>> [  251.101967] kworker/10:1/4506 is trying to acquire lock:
>> [  251.101967]  (timekeeper_seq){..}, at: []
>> retrigger_next_event+0x56/0x70
>> [  251.101967]
>> [  251.101967] but task is already holding lock:
>> [  251.101967]  (hrtimer_bases.lock#11){-.-...}, at:
>> [] retrigger_next_event+0x3c/0x70
>> [  251.101967]
>> [  251.101967] which lock already depends on the new lock.
>> [  251.101967]
>> [  251.101967]
>> [  251.101967] the existing dependency chain (in reverse order) is:
>> [  251.101967]
>> -> #5 (hrtimer_bases.lock#11){-.-...}:
>> [  251.101967][] validate_chain+0x6c3/0x7b0
>> [  251.101967][] __lock_acquire+0x4ad/0x580
>> [  251.101967][] lock_acquire+0x182/0x1d0
>> [  251.101967][] _raw_spin_lock+0x40/0x80
>> [  251.101967][]
>> __hrtimer_start_range_ns+0x208/0x4c0
>> [  251.101967][]
>> start_bandwidth_timer+0x50/0x60
>> [  251.101967][]
>> __enqueue_rt_entity+0x23e/0x290
>> [  251.101967][] enqueue_rt_entity+0x6b/0x80
>> [  251.101967][] enqueue_task_rt+0x36/0xb0
>> [  251.101967][] enqueue_task+0x52/0x60
>> [  251.101967][] activate_task+0x22/0x30
>> [  251.101967][] ttwu_activate+0x21/0x50
>> [  251.101967][] T.2138+0x3c/0x60
>> [  251.101967][] ttwu_queue+0xb6/0xe0
>> [  251.101967][] try_to_wake_up+0x264/0x2a0
>> [  251.101967][] wake_up_process+0x3f/0x50
>> [  251.101967][] watchdog_timer_fn+0x4c/0x1b0
>> [  251.101967][] __run_hrtimer+0x1f6/0x310
>> [  251.101967][] hrtimer_run_queues+0x196/0x1d0
>> [  251.101967][] run_local_timers+0xe/0x20
>> [  251.101967][] update_process_times+0x3d/0x80
>> [  251.101967][] tick_periodic+0xab/0xc0
>> [  251.101967][] tick_handle_periodic+0x24/0x80
>> [  251.101967][]
>> local_apic_timer_interrupt+0x5d/0x70
>> [  251.101967][]
>> smp_apic_timer_interrupt+0x45/0x60
>> [  251.101967][] apic_timer_interrupt+0x72/0x80
>> [  251.101967][] default_idle+0x11d/0x260
>> [  251.101967][] arch_cpu_idle+0x18/0x50
>> [  251.101967][] cpu_idle_loop+0x351/0x460
>> [  251.101967][] cpu_startup_entry+0x70/0x80
>> [  251.101967][] start_secondary+0xf3/0x100
>> [  251.101967]
>> -> #4 (&rt_b->rt_runtime_lock){-.-...}:
>> [  251.101967][] validate_chain+0x6c3/0x7b0
>> [  251.101967][] __lock_acquire+0x4ad/0x580
>> [  251.101967][] lock_acquire+0x182/0x1d0
>> [  251.101967][] _raw_spin_lock+0x40/0x80
>> [  251.101967][]
>> __enqueue_rt_entity+0x229/0x290
>> [  251.101967][] enqueue_rt_entity+0x6b/0x80
>> [  251.101967][] enqueue_task_rt+0x36/0xb0
>> [  251.101967][] enqueue_task+0x52/0x60
>> [  251.101967][]
>> __sched_setscheduler+0x33b/0x3f0
>> [  251.101967][]
>> sched_setscheduler_nocheck+0x10/0x20
>> [  251.101967][]
>> rcu_cpu_kthread_setup+0x2b/0x30
>> [  251.101967][] smpboot_thread_fn+0x1ed/0x2c0
>> [  251.101967][] kthread+0x105/0x110
>> [  251.101967][] ret_from_fork+0x7c/0xb0
>> [  251.101967]
>> -> #3 (&rq->lock){-.-.-.}:
>> [  251.101967][] validate_chain+0x6c3/0x7b0
>> [  251.101967][] __lock_acquire+0x4ad/0x580
>> [  251.101967][] lock_acquire+0x182/0x1d0
>> [  251.101967][] _raw_spin_lock+0x40/0x80
>> [  251.101967][] wake_up_new_task+0x149/0x270
>> [  251.101967][] do_fork+0x1ba/0x270
>> [  251.101967][] kernel_thread+0x26/0x30
>> [  251.101967][] rest_init+0x26/0x150
>> [  251.101967][] start_kernel+0x3b9/0x3c0
>> [  251.101967][]
>> x86_64_start_reservations+0x2a/0x2c
>> [  251.101967][]
>> x86_64_start_kernel+0x186/0x195
>> [  251.101967]
>> -> #2 (&p->pi_lock){-.-.-.}:
>> [  251.101967][] validate_chain+0x6c3/0x7b0
>> [  251.101967][] __lock_acquire+0x4ad/0x580
>> [  251.101967][] lock_acquire+0x182/0x1d0
>> [  251.101967][]
>> _raw_spin_lock_irqsave+0x91/0xd0
>> [  251.101967][] try_to_wake_up+0x39/0x2a0
>> [  251.101967][] wake_up_process+0x3f/0x50
>> [  251.101967][] start_worker+0x2a/0x40
>> [  251.101967][]
>> create_and_start_worker+0x4d/0x90
>> [  251.101967][] init_workqueues+0x192/0x3cb
>> [  251.101967][] do_one_initcall+0xca/0x1e0
>> [  251.101967][]
>> kernel_init_freeable+0x2b4/0x354
>> [  251.101967]

Re: [RFC part2 PATCH 4/9] ARM64 / ACPI: Use Parked Address in GIC structure for spin table SMP initialisation

2013-12-10 Thread Hanjun Guo

On 2013-12-10 21:03, Grant Likely wrote:
[...]
>> +/* Parked Address in ACPI GIC structure can be used as cpu release addr */
>> +int acpi_get_parked_address_with_gic_id(u32 gic_id, u64 *parked_address)
>> +{
>> +struct acpi_table_header *table_header = NULL;
>> +struct acpi_subtable_header *entry;
>> +int err = 0;
>> +unsigned long table_end;
>> +acpi_size tbl_size;
>> +struct acpi_madt_generic_interrupt *processor = NULL;
>> +
>> +if (!parked_address)
>> +return -EINVAL;
>> +
>> +acpi_get_table_with_size(ACPI_SIG_MADT, 0, &table_header, &tbl_size);
>> +if (!table_header) {
>> +pr_warn(PREFIX "MADT table not present\n");
>> +return -ENODEV;
>> +}
>> +
>> +table_end = (unsigned long)table_header + table_header->length;
>> +
>> +/* Parse all entries looking for a match. */
>> +entry = (struct acpi_subtable_header *)
>> +((unsigned long)table_header + sizeof(struct acpi_table_madt));
>> +
>> +while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) <
>> +   table_end) {
>> +if (entry->type != ACPI_MADT_TYPE_GENERIC_INTERRUPT
>> +|| BAD_MADT_ENTRY(entry, table_end))
>> +continue;
>> +
>> +processor = (struct acpi_madt_generic_interrupt *)entry;
>> +
>> +if (processor->gic_id == gic_id) {
>> +*parked_address = processor->parked_address;
>> +goto out;
>> +}
>> +
>> +entry = (struct acpi_subtable_header *)
>> +((unsigned long)entry + entry->length);
> 
> All of the casting in this table looks suspicious. If you have to resort
> to casting, then the variable types are very likely wrong.
> 
> In the case immediately above, it seems that the entry size doesn't
> necessarily equal the acpi_subtable_header size, in which case you
> should cast the values to a void* instead of an unsigned long. That
> would mean you can do this:
> 
>   entry = ((void*)entry) + entry->length;
> 
> In fact, if I were writing the code, I would have two variables; the
> iterator pointer as a void* and a header pointer as a struct
> acpi_subtable_header*. Like so:
> 
>   void *entry, *table_end;
>   struct acpi_subtable_header *header;
> 
>   entry = ((void*)table_header) + sizeof(struct acpi_table_madt);
>   table_end = ((void*)table_header) + table_header->length;
>   while (entry + sizeof(*header) < table_end) {
>   header = entry;
> 
>   if (header->type != ACPI_MADT_TYPE_GENERIC_INTERRUPT ||
>   BAD_MADT_ENTRY(entry, table_end))
>   continue;
>   processor = entry;
> 
>   if (processor->gic_id == gic_id) {
>   *parked_address = processor->parked_address;
>   goto out;
>   }
> 
>   entry += header->length;
>   }
> 
> See? Much cleaner code.

Aha, much much cleaner, thanks for the guidance, will rework my patch
and test it.

Hanjun
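
For illustration, the iteration Grant describes can be sketched in plain
userspace C. This is only a sketch under stated assumptions: struct
sub_header, struct gic_entry, SUB_TYPE_GIC and find_parked_address() are
hypothetical stand-ins, not the ACPICA definitions. It also places the
pointer advance so that it runs on every path; in both versions quoted
above, the "continue" skips the advance, which would appear to spin on the
first non-matching entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: every entry starts with a (type, length) header. */
struct sub_header {
	uint8_t type;
	uint8_t length;		/* whole entry, header included */
};

#define SUB_TYPE_GIC 11		/* illustrative value only */

struct gic_entry {
	struct sub_header header;
	uint8_t reserved[2];
	uint32_t gic_id;
	uint64_t parked_address;
};

/* Returns 0 and fills *parked_address on a match, -1 otherwise. */
static int find_parked_address(const void *table, size_t table_len,
			       size_t first_entry_off, uint32_t gic_id,
			       uint64_t *parked_address)
{
	const unsigned char *entry, *end;
	struct sub_header hdr;
	struct gic_entry gic;

	assert(table != NULL);
	if (!parked_address || first_entry_off > table_len)
		return -1;

	/* Byte pointers play the role of the void * iterator. */
	entry = (const unsigned char *)table + first_entry_off;
	end = (const unsigned char *)table + table_len;

	while ((size_t)(end - entry) >= sizeof(hdr)) {
		memcpy(&hdr, entry, sizeof(hdr));

		/* A zero or overlong length would loop forever or overrun. */
		if (hdr.length < sizeof(hdr) || hdr.length > (size_t)(end - entry))
			return -1;

		if (hdr.type == SUB_TYPE_GIC && hdr.length >= sizeof(gic)) {
			memcpy(&gic, entry, sizeof(gic));
			if (gic.gic_id == gic_id) {
				*parked_address = gic.parked_address;
				return 0;
			}
		}

		/* Advance on every path, matching entry or not. */
		entry += hdr.length;
	}
	return -1;
}
```

The header is copied out with memcpy() rather than read through a cast so
that the sketch does not depend on the alignment of the table buffer.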


Re: [PATCH 0/3] Add Legacy PM OPS usage checks to class, bus, and driver register functions

2013-12-10 Thread Greg KH

On Thu, Nov 14, 2013 at 10:50:29PM +0100, Rafael J. Wysocki wrote:
> On Thursday, November 14, 2013 08:43:27 AM Shuah Khan wrote:
> > On 11/07/2013 05:03 PM, Shuah Khan wrote:
> > > Add Legacy PM OPS usage checks to class, bus, and driver register
> > > functions. If Legacy PM OPS usage is found, print warning message to
> > > indicate the driver code needs updating to use Dev PM OPS interfaces.
> > > This will help serve as a way to track drivers that still use Legacy
> > > PM OPS and fix them.
> > >
> > > This patch set adds Legacy PM OPS usage check and warning to
> > > bus_register(), __class_register(), and driver_register() functions.
> > >
> > > Individual patches in this series are not dependent on each other. The
> > > only reason this is a series is for context and facilitating discussion
> > > on the overall change as opposed to individual patches.
> > >
> > > Shuah Khan (3):
> > >drivers/bus: Add Legacy PM OPS usage check and warning to
> > >  bus_register()
> > >drivers/class: Add Legacy PM OPS usage check and warning to
> > >  __class_register()
> > >driver: Add Legacy PM OPS usage check and warning to driver_register()
> > >
> > >   drivers/base/bus.c| 3 +++
> > >   drivers/base/class.c  | 4 
> > >   drivers/base/driver.c | 4 
> > >   3 files changed, 11 insertions(+)
> > >
> > 
> > Greg/Rafael,
> > 
> > Any feedback on this series? Haven't gotten to it or don't like it?
> 
> I haven't had the time to review this, sorry.

ping?

RE: [PATCH] usb: phy: initilize the notifier when add a new phy

2013-12-10 Thread Neil Zhang


> -Original Message-
> From: Neil Zhang [mailto:zhan...@marvell.com]
> Sent: 2013-12-11 14:18
> To: ba...@ti.com; gre...@linuxfoundation.org
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Neil Zhang
> Subject: [PATCH] usb: phy: initilize the notifier when add a new phy
> 
> We need to initialize the notifier before using it.
> So let's initialize it when adding a new phy device to reduce code redundancy.
> 
> Signed-off-by: Neil Zhang 
> ---
>  drivers/usb/phy/phy-ab8500-usb.c|2 --
>  drivers/usb/phy/phy-generic.c   |1 -
>  drivers/usb/phy/phy-gpio-vbus-usb.c |2 --
>  drivers/usb/phy/phy-mxs-usb.c   |2 --
>  drivers/usb/phy/phy.c   |4 
>  5 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/usb/phy/phy-ab8500-usb.c
> b/drivers/usb/phy/phy-ab8500-usb.c
> index 0874023..11ab2c4 100644
> --- a/drivers/usb/phy/phy-ab8500-usb.c
> +++ b/drivers/usb/phy/phy-ab8500-usb.c
> @@ -1415,8 +1415,6 @@ static int ab8500_usb_probe(struct platform_device
> *pdev)
> 
>   platform_set_drvdata(pdev, ab);
> 
> - ATOMIC_INIT_NOTIFIER_HEAD(&ab->phy.notifier);
> -
>   /* all: Disable phy when called from set_host and set_peripheral */
>   INIT_WORK(&ab->phy_dis_work, ab8500_usb_phy_disable_work);
> 
> diff --git a/drivers/usb/phy/phy-generic.c b/drivers/usb/phy/phy-generic.c 
> index
> aa6d37b..bb39498 100644
> --- a/drivers/usb/phy/phy-generic.c
> +++ b/drivers/usb/phy/phy-generic.c
> @@ -241,7 +241,6 @@ int usb_phy_gen_create_phy(struct device *dev, struct
> usb_phy_gen_xceiv *nop,
>   nop->phy.otg->set_host  = nop_set_host;
>   nop->phy.otg->set_peripheral= nop_set_peripheral;
> 
> - ATOMIC_INIT_NOTIFIER_HEAD(&nop->phy.notifier);
>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(usb_phy_gen_create_phy);
> diff --git a/drivers/usb/phy/phy-gpio-vbus-usb.c
> b/drivers/usb/phy/phy-gpio-vbus-usb.c
> index 02799a5..69462e0 100644
> --- a/drivers/usb/phy/phy-gpio-vbus-usb.c
> +++ b/drivers/usb/phy/phy-gpio-vbus-usb.c
> @@ -314,8 +314,6 @@ static int gpio_vbus_probe(struct platform_device
> *pdev)
>   goto err_irq;
>   }
> 
> - ATOMIC_INIT_NOTIFIER_HEAD(&gpio_vbus->phy.notifier);
> -
>   INIT_DELAYED_WORK(&gpio_vbus->work, gpio_vbus_work);
> 
>   gpio_vbus->vbus_draw = regulator_get(&pdev->dev, "vbus_draw"); diff
> --git a/drivers/usb/phy/phy-mxs-usb.c b/drivers/usb/phy/phy-mxs-usb.c index
> 545844b..9df81f0 100644
> --- a/drivers/usb/phy/phy-mxs-usb.c
> +++ b/drivers/usb/phy/phy-mxs-usb.c
> @@ -160,8 +160,6 @@ static int mxs_phy_probe(struct platform_device
> *pdev)
>   mxs_phy->phy.notify_disconnect  = mxs_phy_on_disconnect;
>   mxs_phy->phy.type   = USB_PHY_TYPE_USB2;
> 
> - ATOMIC_INIT_NOTIFIER_HEAD(&mxs_phy->phy.notifier);
> -
>   mxs_phy->clk = clk;
> 
>   platform_set_drvdata(pdev, mxs_phy);
> diff --git a/drivers/usb/phy/phy.c b/drivers/usb/phy/phy.c index
> 1b74523..6689bd0 100644
> --- a/drivers/usb/phy/phy.c
> +++ b/drivers/usb/phy/phy.c
> @@ -329,6 +329,8 @@ int usb_add_phy(struct usb_phy *x, enum
> usb_phy_type type)
>   return -EINVAL;
>   }
> 
> + ATOMIC_INIT_NOTIFIER_HEAD(&x->phy.notifier);
> +
>   spin_lock_irqsave(&phy_lock, flags);
> 
>   list_for_each_entry(phy, &phy_list, head) { @@ -367,6 +369,8 @@ int
> usb_add_phy_dev(struct usb_phy *x)
>   return -EINVAL;
>   }
> 
> + ATOMIC_INIT_NOTIFIER_HEAD(&x->phy.notifier);
> +
>   spin_lock_irqsave(&phy_lock, flags);
>   list_for_each_entry(phy_bind, &phy_bind_list, list)
>   if (!(strcmp(phy_bind->phy_dev_name, dev_name(x->dev
> --
> 1.7.9.5

Please ignore this one and check patch v2.
Thanks.

Best Regards,
Neil Zhang

[PATCH v2] usb: phy: initilize the notifier when add a new phy

2013-12-10 Thread Neil Zhang

We need to initialize the notifier before using it.
So let's initialize it when adding a new phy device to reduce code
redundancy.

Signed-off-by: Neil Zhang 
---
 drivers/usb/phy/phy-ab8500-usb.c|2 --
 drivers/usb/phy/phy-generic.c   |1 -
 drivers/usb/phy/phy-gpio-vbus-usb.c |2 --
 drivers/usb/phy/phy-mxs-usb.c   |2 --
 drivers/usb/phy/phy.c   |4 
 5 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/phy/phy-ab8500-usb.c b/drivers/usb/phy/phy-ab8500-usb.c
index 0874023..11ab2c4 100644
--- a/drivers/usb/phy/phy-ab8500-usb.c
+++ b/drivers/usb/phy/phy-ab8500-usb.c
@@ -1415,8 +1415,6 @@ static int ab8500_usb_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, ab);
 
-   ATOMIC_INIT_NOTIFIER_HEAD(&ab->phy.notifier);
-
/* all: Disable phy when called from set_host and set_peripheral */
INIT_WORK(&ab->phy_dis_work, ab8500_usb_phy_disable_work);
 
diff --git a/drivers/usb/phy/phy-generic.c b/drivers/usb/phy/phy-generic.c
index aa6d37b..bb39498 100644
--- a/drivers/usb/phy/phy-generic.c
+++ b/drivers/usb/phy/phy-generic.c
@@ -241,7 +241,6 @@ int usb_phy_gen_create_phy(struct device *dev, struct usb_phy_gen_xceiv *nop,
nop->phy.otg->set_host  = nop_set_host;
nop->phy.otg->set_peripheral= nop_set_peripheral;
 
-   ATOMIC_INIT_NOTIFIER_HEAD(&nop->phy.notifier);
return 0;
 }
 EXPORT_SYMBOL_GPL(usb_phy_gen_create_phy);
diff --git a/drivers/usb/phy/phy-gpio-vbus-usb.c b/drivers/usb/phy/phy-gpio-vbus-usb.c
index 02799a5..69462e0 100644
--- a/drivers/usb/phy/phy-gpio-vbus-usb.c
+++ b/drivers/usb/phy/phy-gpio-vbus-usb.c
@@ -314,8 +314,6 @@ static int gpio_vbus_probe(struct platform_device *pdev)
goto err_irq;
}
 
-   ATOMIC_INIT_NOTIFIER_HEAD(&gpio_vbus->phy.notifier);
-
INIT_DELAYED_WORK(&gpio_vbus->work, gpio_vbus_work);
 
gpio_vbus->vbus_draw = regulator_get(&pdev->dev, "vbus_draw");
diff --git a/drivers/usb/phy/phy-mxs-usb.c b/drivers/usb/phy/phy-mxs-usb.c
index 545844b..9df81f0 100644
--- a/drivers/usb/phy/phy-mxs-usb.c
+++ b/drivers/usb/phy/phy-mxs-usb.c
@@ -160,8 +160,6 @@ static int mxs_phy_probe(struct platform_device *pdev)
mxs_phy->phy.notify_disconnect  = mxs_phy_on_disconnect;
mxs_phy->phy.type   = USB_PHY_TYPE_USB2;
 
-   ATOMIC_INIT_NOTIFIER_HEAD(&mxs_phy->phy.notifier);
-
mxs_phy->clk = clk;
 
platform_set_drvdata(pdev, mxs_phy);
diff --git a/drivers/usb/phy/phy.c b/drivers/usb/phy/phy.c
index 1b74523..e6f61e4 100644
--- a/drivers/usb/phy/phy.c
+++ b/drivers/usb/phy/phy.c
@@ -329,6 +329,8 @@ int usb_add_phy(struct usb_phy *x, enum usb_phy_type type)
return -EINVAL;
}
 
+   ATOMIC_INIT_NOTIFIER_HEAD(&x->notifier);
+
spin_lock_irqsave(&phy_lock, flags);
 
list_for_each_entry(phy, &phy_list, head) {
@@ -367,6 +369,8 @@ int usb_add_phy_dev(struct usb_phy *x)
return -EINVAL;
}
 
+   ATOMIC_INIT_NOTIFIER_HEAD(&x->notifier);
+
spin_lock_irqsave(&phy_lock, flags);
list_for_each_entry(phy_bind, &phy_bind_list, list)
if (!(strcmp(phy_bind->phy_dev_name, dev_name(x->dev
-- 
1.7.9.5


Re: [question] sched: idle_avg and migration latency

2013-12-10 Thread Mike Galbraith

On Tue, 2013-12-10 at 19:31 +0100, Daniel Lezcano wrote:

> I think I am a bit puzzled with the 'idle_avg' name. I am guessing the 
> semantic of this variable is "how long this cpu has been idle".

Average distance between idles.

> The idle duration, with the no_hz, could be long, several seconds if the 
> work queues have been migrated and if the timer affinity is set to 
> another cpu. So if we fall in this case and there is a burst of activity 
> + micro-idle and idle_avg is not leverage to max, it will stay high 
> during an amount of time, thus pulling tasks at each micro idle period, 
> right ?

Yeah, it cares about shutting the thing down when idle distance is too
small to be affordable, but cranking it back up quickly so as to not damage
generic bursty load utilization too much.  It tries to be dirt simple
and cheap, not perfect.

For nohz_full loads, you'll likely want to kill most if not all wake and
idle balancing, or at least put some serious roadblocks up.. but then
you'll have isolated and pinned everything anyway if you deeply care
about perturbation.  All load balancing totally sucks in that regard, as
do those darn workqueues you mentioned.

-Mike
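
The behaviour Mike describes can be sketched in userspace C as follows.
This is a simplified model under stated assumptions, not the scheduler
source: struct rq_sketch, account_wakeup(), should_idle_balance() and the
MIGRATION_COST_NS value are hypothetical names and numbers chosen for
illustration. The average decays slowly toward short idle samples, while a
single long idle period jumps it straight back to the cap.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cost of one migration in nanoseconds; illustrative value. */
#define MIGRATION_COST_NS 500000ULL

struct rq_sketch {
	uint64_t avg_idle;	/* running average of idle durations, ns */
	uint64_t idle_stamp;	/* time the CPU went idle, 0 if busy */
};

/* Moving average: step 1/8 of the way toward the new sample. */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += (uint64_t)(diff / 8);
}

/* Called when a wakeup ends an idle period at time `now`. */
static void account_wakeup(struct rq_sketch *rq, uint64_t now)
{
	uint64_t delta, max = 2 * MIGRATION_COST_NS;

	assert(rq != 0);
	if (!rq->idle_stamp)
		return;

	delta = now - rq->idle_stamp;
	if (delta > max)
		rq->avg_idle = max;	/* long idle: crank back up at once */
	else
		update_avg(&rq->avg_idle, delta);
	rq->idle_stamp = 0;
}

/* Pull tasks on idle only when the average idle can pay for a migration. */
static int should_idle_balance(const struct rq_sketch *rq)
{
	return rq->avg_idle >= MIGRATION_COST_NS;
}
```

With these numbers, a run of very short idle periods takes several samples
to push the average under the threshold, whereas one idle period longer
than twice the migration cost restores it immediately.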


Re: [PATCH v2] debugobject: add support for kref

2013-12-10 Thread Greg KH

On Sun, Nov 03, 2013 at 08:33:08PM +0100, Sebastian Andrzej Siewior wrote:
> I ran a couple of times into the case where "put" was called on an already
> cleaned-up object. The result was either nothing, because "0" went back to
> 0xff…ff and release was not called a second time, or something like this
> with SLAB once that memory region was used again:
> 
> |edma-dma-engine edma-dma-engine.0: freeing channel for 57
> |Slab corruption (Not tainted): kmalloc-256 start=edc4c8c0, len=256
> |070: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  jkkk
> |Single bit error detected. Probably bad RAM.
> |Run a memory test tool.
> |Prev obj: start=edc4c7c0, len=256
> |000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> |010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> |Next obj: start=edc4c9c0, len=256
> |000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> |010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 
> which got me a little confused about the bit flip at first.
> The reason for this was resource counting that went wrong followed by a "put"
> called from two places.
> After the second time running into the same problem using the same driver,
> I was looking for something to help me find at least one caller (and the
> kind of object) a little quicker. Here is a debugobject extension to kref
> which seems to do the job.
> At kref_init() the debugobject is initialized and activated.
> At kref_get() / kref_sub() time it is checked if the kref is active. If
> it is, the request is completed otherwise the "usual" debugobject is
> printed. Here an example of kref_put() on an already gone object:
> 
> |edma-dma-engine edma-dma-engine.0: freeing channel for 57
> |[ cut here ]
> |WARNING: CPU: 0 PID: 2053 at lib/debugobjects.c:260 debug_print_object+0x94/0xc4()
> |ODEBUG: active_state not available (active state 0) object type: kref hint: (null)
> |Modules linked in: ti_am335x_adc(-)
> |[] (unwind_backtrace+0x0/0xf4) from [] 
> (show_stack+0x14/0x1c)
> |[] (show_stack+0x14/0x1c) from [] 
> (warn_slowpath_common+0x64/0x84)
> |[] (warn_slowpath_common+0x64/0x84) from [] 
> (warn_slowpath_fmt+0x30/0x40)
> |[] (warn_slowpath_fmt+0x30/0x40) from [] 
> (debug_print_object+0x94/0xc4)
> |[] (debug_print_object+0x94/0xc4) from [] 
> (debug_object_active_state+0xe4/0x148)
> |[] (debug_object_active_state+0xe4/0x148) from [] 
> (iio_buffer_put+0x24/0x70 [industrialio])
> |[] (iio_buffer_put+0x24/0x70 [industrialio]) from [] 
> (iio_dev_release+0x44/0x64 [industrialio])
> |[] (iio_dev_release+0x44/0x64 [industrialio]) from [] 
> (device_release+0x2c/0x94)
> |[] (device_release+0x2c/0x94) from [] 
> (kobject_release+0x44/0x78)
> |[] (kobject_release+0x44/0x78) from [] 
> (release_nodes+0x1a0/0x248)
> |[] (release_nodes+0x1a0/0x248) from [] 
> (__device_release_driver+0x98/0xf0)
> |[] (__device_release_driver+0x98/0xf0) from [] 
> (driver_detach+0xb4/0xb8)
> |[] (driver_detach+0xb4/0xb8) from [] 
> (bus_remove_driver+0x98/0xec)
> |[] (bus_remove_driver+0x98/0xec) from [] 
> (SyS_delete_module+0x1e0/0x24c)
> |[] (SyS_delete_module+0x1e0/0x24c) from [] 
> (ret_fast_syscall+0x0/0x48)
> |---[ end trace bc5e9551626b178a ]---
> 
> This should help to notice that there is a second "put" and the
> call chain should point to the object. The "hint" callback is empty because
> in the "double free" case we don't have any information and the release
> function is passed as an argument to kref_put(). I think the information
> is quite good and it is not worth extending debugobject to somehow add
> this information.
> The "get/put unless" macros aren't traced completely because it is hard
> to get it correct without a false positive and without touching each
> user. The object might be added to a list which is browsed by someone.
> That someone holds the same lock that is required in the cleanup path
> and so the cleanup is blocked. That means that the kref object is
> gone from debugobject point of view but the release function has not yet
> completed so it is valid to check the current counter.
> 
> v1…v2:
> - not an RFC anymore
> - addressed tglx review:
>   - use debug_obj_descr with state active
>   - use debug_object_active_state() to check for active object instead of the
> other hack I had.
>   - added debug_object_free() in a way that does not interfere with the
> NSA sniffer API so it does not get removed from the patch by accident.
> 
> Signed-off-by: Sebastian Andrzej Siewior 

I need an ack from Thomas or some other debugobjects-knowledgeable
developer before I can take this...

thanks,

greg k-h
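
As a rough illustration of the double-put detection described in the quoted changelog, here is a minimal userspace sketch in C. It is not the kernel's kref or debugobjects code: kref_sketch, its "active" flag (standing in for the debugobjects active state) and the "warnings" counter are names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's kref or debugobjects API. */
struct kref_sketch {
	int refcount;
	bool active;	/* models the debugobjects "active" state */
	int warnings;	/* counts puts on an already released object */
};

static void kref_sketch_init(struct kref_sketch *k)
{
	k->refcount = 1;
	k->active = true;
	k->warnings = 0;
}

static void kref_sketch_get(struct kref_sketch *k)
{
	k->refcount++;
}

/* Returns true when this put released the object. */
static bool kref_sketch_put(struct kref_sketch *k)
{
	if (!k->active) {
		/* A second "put" after release: record it instead of freeing twice. */
		k->warnings++;
		return false;
	}
	if (--k->refcount == 0) {
		k->active = false;
		return true;
	}
	return false;
}
```

In this model a put on an object that is no longer active only bumps the warning counter, which is roughly where the real patch would print the backtrace shown in the changelog.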

Re: [PATCH RFC 1/4] phy: Add provision for tuning phy.

2013-12-10 Thread Vivek Gautam

Hi,


On Tue, Dec 10, 2013 at 7:31 PM, Heikki Krogerus
 wrote:
> Hi,

Thanks for reviewing this.

>
> On Tue, Dec 10, 2013 at 04:25:23PM +0530, Vivek Gautam wrote:
>> Some PHY controllers may need to tune PHY post-initialization,
>> so that the PHY consumers can call phy-tuning at appropriate
>> point of time.
>>
>> Signed-off-by: vivek Gautam 
>> ---
>>  drivers/phy/phy-core.c  |   20 
>>  include/linux/phy/phy.h |7 +++
>>  2 files changed, 27 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/phy/phy-core.c b/drivers/phy/phy-core.c
>> index 03cf8fb..68dbb90 100644
>> --- a/drivers/phy/phy-core.c
>> +++ b/drivers/phy/phy-core.c
>> @@ -239,6 +239,26 @@ out:
>>  }
>>  EXPORT_SYMBOL_GPL(phy_power_off);
>>
>> +int phy_tune(struct phy *phy)
>> +{
>> + int ret = -ENOTSUPP;
>> +
>> + mutex_lock(&phy->mutex);
>> + if (phy->ops->tune) {
>> + ret =  phy->ops->tune(phy);
>> + if (ret < 0) {
>> + dev_err(&phy->dev, "phy tuning failed --> %d\n", ret);
>> + goto out;
>> + }
>> + }
>> +
>> +out:
>> + mutex_unlock(&phy->mutex);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(phy_tune);
>
> I think "setup" instead of "tune" is much more clear and reusable.

I think "setup" will look more like setting up the phy for the first
time, which is already served by the "init" callback.
I thought this would serve the purpose of overriding certain PHY
parameters, which would not have been possible at "init" time.
Please correct me if I have misunderstood your point here.
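
To make the distinction concrete, the optional-callback pattern in the quoted phy_tune() can be sketched in userspace C as below. This is only an illustration under assumptions: phy_sketch, phy_ops_sketch and demo_tune are invented names, locking is omitted, and the userspace ENOTSUP stands in for the kernel-internal ENOTSUPP used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct phy_sketch;

/* Hypothetical ops table: "init" is the first-time setup, "tune" is optional. */
struct phy_ops_sketch {
	int (*init)(struct phy_sketch *phy);
	int (*tune)(struct phy_sketch *phy);
};

struct phy_sketch {
	const struct phy_ops_sketch *ops;
	int tuned;
};

/* Calls the optional tune callback; reports "not supported" when it is absent. */
static int phy_tune_sketch(struct phy_sketch *phy)
{
	if (!phy->ops || !phy->ops->tune)
		return -ENOTSUP;
	return phy->ops->tune(phy);
}

/* Example callback that overrides a parameter after init. */
static int demo_tune(struct phy_sketch *phy)
{
	phy->tuned = 1;
	return 0;
}
```

A consumer would call phy_tune_sketch() after its own initialization and treat -ENOTSUP as "this PHY has nothing to tune".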

>
> Thanks,
>
> --
> heikki
> --
> To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" 
> in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards
Vivek Gautam
Samsung R&D Institute, Bangalore
India

[RFC PATCH] x86_64: double the x86_64 kernel stack size?

2013-12-10 Thread Zhouyi Zhou

Sometimes I compile the Linux network modules (especially bridge and
netfilter) without optimization, and the kernel then always crashes
because the kernel stack is exhausted.

A similar problem has been discussed in
http://lists.linuxfoundation.org/pipermail/bridge/2005-January/004402.html

Is it OK to double the x86_64 kernel stack size?
 

Signed-off-by: Zhouyi Zhou 
Tested-by: Zhouyi Zhou  
---
 arch/x86/include/asm/page_64_types.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/page_64_types.h 
b/arch/x86/include/asm/page_64_types.h
index 43dcd80..dc4c629 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -1,7 +1,7 @@
 #ifndef _ASM_X86_PAGE_64_DEFS_H
 #define _ASM_X86_PAGE_64_DEFS_H
 
-#define THREAD_SIZE_ORDER  1
+#define THREAD_SIZE_ORDER  2
 #define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)
 #define CURRENT_MASK (~(THREAD_SIZE - 1))
 
-- 
1.7.10.4
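
For readers checking the arithmetic of the one-line change above, a small sketch of the two macros follows. It assumes the usual 4096-byte x86_64 page size; SKETCH_PAGE_SIZE, thread_size() and current_mask() are names made up for this example and only mirror THREAD_SIZE and CURRENT_MASK from the diff.

```c
#include <assert.h>

/* Assumed page size for x86_64; hypothetical name for this sketch. */
#define SKETCH_PAGE_SIZE 4096UL

/* Mirrors THREAD_SIZE = PAGE_SIZE << THREAD_SIZE_ORDER. */
static unsigned long thread_size(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Mirrors CURRENT_MASK = ~(THREAD_SIZE - 1): rounds an address down to the stack base. */
static unsigned long current_mask(unsigned int order)
{
	return ~(thread_size(order) - 1);
}
```

With these assumptions, order 1 gives an 8 KiB stack and order 2 gives 16 KiB, so the patch doubles the per-thread stack allocation.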


nohz_full left a periodic tick cpu issue

2013-12-10 Thread Alex Shi

Hi Frederic,

Sorry for my ignorance about nohz_full. When we use this feature on my
mobile devices, we find that it keeps cpu0 in periodic tick mode, so the
timer interrupt rate on cpu0 is much higher than in normal nohz mode.
That causes high power consumption.

I found you mentioned this in commit a382bf934449
nohz: Assign timekeeping duty to a CPU outside the full dynticks range

In fact, if all full dynticks CPUs are idle, it should be safe for cpu0
to go idle too. Do you have a plan or an idea for implementing this?
Otherwise, the power cost is too high to enable nohz_full on mobile platforms.

-- 
Thanks
Alex

Re: [PATCH 1/1] AX88179_178A: Enable the hardware pseudo header in case of the NET_IP_ALIGN equals 0

2013-12-10 Thread Freddy Xin

On 2013-12-10 09:01, David Miller wrote:
> From: fre...@asix.com.tw
> Date: Fri,  6 Dec 2013 17:58:18 +0800
>
>> From: Freddy Xin
>>
>> The AX88179_178A has a hardware feature that can insert a 2-byte pseudo
>> header in front of each received frame by setting the AX_RX_CTL_IPE bit.
>> This feature is used to let the IP header be aligned on a
>> doubleword-aligned address, but NET_IP_ALIGN may equal 2, and
>> __netdev_alloc_skb_ip_align in USBNET will then also reserve 2 bytes,
>> so in this case the driver shouldn't enable this bit.
>>
>> This patch modifies the driver to set AX_RX_CTL_IPE only when
>> NET_IP_ALIGN equals 0.
>>
>> Signed-off-by: Freddy Xin
>
> Please avoid larger than 80 column lines in your commit messages,
> people use text-only tools to view these.
>
> Next, it makes no sense to restrict your change to NET_IP_ALIGN==0
>
> Simply handle any case, by undoing the reservation if it's getting
> in the way.  If there isn't an appropriate helper for this, add one.

I think there is no way of undoing the reservation in the driver.
Can I add a flag to driver_info and use it to decide whether to undo
the reservation in rx_submit of usbnet?
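
The alignment problem under discussion can be checked with a few lines of arithmetic. The sketch below assumes a receive buffer that starts on a 4-byte boundary and a plain 14-byte Ethernet header; ip_header_offset() and ip_header_aligned() are hypothetical helpers written for this note, not usbnet or ax88179 functions.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_HLEN_SKETCH 14u	/* Ethernet header length in bytes */

/* Offset of the IP header from the (assumed 4-byte aligned) buffer start. */
static unsigned int ip_header_offset(unsigned int skb_reserve, unsigned int hw_pad)
{
	return skb_reserve + hw_pad + ETH_HLEN_SKETCH;
}

/* True when the IP header lands on a 4-byte boundary. */
static bool ip_header_aligned(unsigned int skb_reserve, unsigned int hw_pad)
{
	return ip_header_offset(skb_reserve, hw_pad) % 4 == 0;
}
```

Either the 2-byte software reservation or the 2-byte hardware pseudo header alone aligns the IP header, while applying both (or neither) leaves it misaligned, which is why one of the two has to be undone.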

Re: Race in memcg kmem?

2013-12-10 Thread Vladimir Davydov

On 12/11/2013 03:13 AM, Glauber Costa wrote:
> On Tue, Dec 10, 2013 at 5:59 PM, Vladimir Davydov
>  wrote:
>> Hi,
>>
>> Looking through the per-memcg kmem_cache initialization code, I have a
>> bad feeling that it is prone to a race. Before getting to fixing it, I'd
>> like to ensure this race is not only in my imagination. Here it goes.
>>
>> We keep per-memcg kmem caches in the memcg_params field of each root
>> cache. The memcg_params array is grown dynamically by
>> memcg_update_cache_size(). I guess that if this function is executed
>> concurrently with memcg_create_kmem_cache() we can get a race resulting
>> in a memory leak.
>>
> Ok, let's see.
>
>> -- memcg_create_kmem_cache(memcg, cachep) --
>> creates a new kmem_cache corresponding to a memcg and assigns it to the
>> root cache; called in the background - it is OK to have several such
>> functions trying to create a cache for the same memcg concurrently, but
>> only one of them should succeed.
> Yes.
>
>> @cachep is the root cache
>> @memcg is the memcg we want to create a cache for.
>>
>> The function:
>>
>> A1) assures there is no cache corresponding to the memcg (if it is we
>> have nothing to do):
>> idx = memcg_cache_id(memcg);
>> if (cachep->memcg_params[idx])
>> goto out;
>>
>> A2) creates and assigns a new cache:
>> new_cachep = kmem_cache_dup(memcg, cachep);
> Please note this cannot proceed in parallel with memcg_update_cache_size(),
> because they both take the slab mutex.

Oh, I see. memcg_create_kmem_cache() takes and releases the slab mutex
in kmem_cache_create_memcg(), which is called by kmem_cache_dup(). This
effectively works as a barrier that does not allow A2 to proceed until
memcg_update_cache_sizes() finishes, which makes the race implausible.
Did not notice that at first. Thanks for clarification!

>
>> // init new_cachep
>> cachep->memcg_params->memcg_caches[idx] = new_cachep;
>>
>>
>> -- memcg_update_cache_size(s, num_groups) --
>> grows s->memcg_params to accomodate data for num_groups memcg's
>> @s is the root cache whose memcg_params we want to grow
>> @num_groups is the new number of kmem-active cgroups (defines the new
>> size of memcg_params array).
>>
>> The function:
>>
>> B1) allocates and assigns a new cache:
>> cur_params = s->memcg_params;
>> s->memcg_params = kzalloc(size, GFP_KERNEL);
>>
>> B2) copies per-memcg cache ptrs from the old memcg_params array to the
>> new one:
>> for (i = 0; i < memcg_limited_groups_array_size; i++) {
>> if (!cur_params->memcg_caches[i])
>> continue;
>> s->memcg_params->memcg_caches[i] =
>> cur_params->memcg_caches[i];
>> }
>>
>> B3) frees the old array:
>> kfree(cur_params);
>>
>>
>> Since these two functions do not share any mutexes, we can get the
> They do share a mutex, the slab mutex.
>
>> following race:
>>
>> Assume, by the time Cpu0 gets to memcg_create_kmem_cache(), the memcg
>> cache has already been created by another thread, so this function
>> should do nothing.
>>
>> Cpu0    Cpu1
>> ----    ----
>>         B1
>> A1              we haven't initialized memcg_params yet so Cpu0 will
>>                 proceed to A2 to alloc and assign a new cache
>> A2
>>         B2      Cpu1 rewrites the memcg cache ptr set by Cpu0 at A2
>>                 - a memory leak?
>>         B3
>>
>> I'd like to add that even if I'm right about the race, this is rather
>> not critical, because memcg_update_cache_sizes() is called very rarely.
>>
> Every race is critical.
>
> So, I am a bit lost by your description. Get back to me if I got anything
> wrong, but I think the point you're missing is that all heavy slab
> operations take the slab_mutex underneath, and that includes cache
> creation and update.
>
>
>> BTW, it seems to me that the way we update memcg_params in
>> memcg_update_cache_size() make cache_from_memcg_idx() prone to
>> use-after-free:
>>
>>> static inline struct kmem_cache *
>>> cache_from_memcg_idx(struct kmem_cache *s, int idx)
>>> {
>>> if (!s->memcg_params)
>>> return NULL;
>>> return s->memcg_params->memcg_caches[idx];
>>> }
>> This is equivalent to
>>
>> 1) struct memcg_cache_params *params = s->memcg_params;
>> 2) return params->memcg_caches[idx];
>>
>> If memcg_update_cache_size() is executed between steps 1 and 2 on
>> another CPU, at step 2 we will dereference memcg_params that has already
>> been freed. This is very unlikely, but still possible. Perhaps, we
>> should free old memcg params only after a sync_rcu()?
>>
> You seem to be right on this one. Indeed, if my memory does not betray
> me, that is how I freed the LRUs (with synchronize_rcu()).

Yes, you freed LRUs only after a sync_rcu, that's why the way
memcg_params is updated looks suspicious to me. I'll try to fix it then.

Thanks.
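
The grow-and-copy steps B1 to B3 and the deferred free suggested at the end of this thread can be modelled in userspace roughly as follows. This is a single-threaded sketch under assumptions, not the memcg code: params_sketch, root_sketch and grow_params() are invented names, and instead of waiting for an RCU grace period the old array is simply handed back to the caller, who decides when freeing it is safe. Unlike step B1 in the description, the sketch copies the entries first and publishes the new array last.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-root-cache parameter array. */
struct params_sketch {
	size_t nr;
	void *caches[];		/* flexible array of per-group cache pointers */
};

struct root_sketch {
	struct params_sketch *params;
};

/*
 * Grows the array to new_nr slots. On success returns 0 and stores the old
 * array in *old_out so the caller can free it once no reader can hold it.
 */
static int grow_params(struct root_sketch *s, size_t new_nr,
		       struct params_sketch **old_out)
{
	struct params_sketch *old = s->params;
	struct params_sketch *new_params;
	size_t i;

	new_params = calloc(1, sizeof(*new_params) + new_nr * sizeof(void *));
	if (!new_params)
		return -1;
	new_params->nr = new_nr;

	/* Copy existing entries before the new array becomes visible. */
	for (i = 0; old && i < old->nr && i < new_nr; i++)
		new_params->caches[i] = old->caches[i];

	s->params = new_params;	/* publish */
	*old_out = old;		/* free deferred to the caller */
	return 0;
}
```

In the kernel the deferred step would be the place for the synchronize_rcu() mentioned above, so that a reader which loaded the old pointer can finish before the memory goes away.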

RE: [PATCH] usb: phy: Initilize the spinlock in notifier

2013-12-10 Thread Neil Zhang


> -----Original Message-----
> From: Neil Zhang [mailto:zhan...@marvell.com]
> Sent: December 11, 2013 13:05
> To: ba...@ti.com; gre...@linuxfoundation.org
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Neil Zhang
> Subject: [PATCH] usb: phy: Initilize the spinlock in notifier
> 
> We need to initialize every spinlock before using it.
> So let's initialize the spinlock in the notifier when adding a new phy device.
> 
> Signed-off-by: Neil Zhang 
> ---
>  drivers/usb/phy/phy.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/usb/phy/phy.c b/drivers/usb/phy/phy.c index
> 1b74523..479ceb8 100644
> --- a/drivers/usb/phy/phy.c
> +++ b/drivers/usb/phy/phy.c
> @@ -367,6 +367,8 @@ int usb_add_phy_dev(struct usb_phy *x)
>   return -EINVAL;
>   }
> 
> + spin_lock_init(&x->notifier.lock);
> +
>   spin_lock_irqsave(&phy_lock, flags);
>   list_for_each_entry(phy_bind, &phy_bind_list, list)
>   if (!(strcmp(phy_bind->phy_dev_name, dev_name(x->dev
> --
> 1.7.9.5

Please ignore this patch and check another one.
Sorry for this noise.

Best Regards,
Neil Zhang

[PATCH] usb: phy: initilize the notifier when add a new phy

2013-12-10 Thread Neil Zhang

We need to initialize the notifier before using it.
So let's initialize it when adding a new phy device, to reduce code
redundancy.

Signed-off-by: Neil Zhang 
---
 drivers/usb/phy/phy-ab8500-usb.c|2 --
 drivers/usb/phy/phy-generic.c   |1 -
 drivers/usb/phy/phy-gpio-vbus-usb.c |2 --
 drivers/usb/phy/phy-mxs-usb.c   |2 --
 drivers/usb/phy/phy.c   |4 
 5 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/phy/phy-ab8500-usb.c b/drivers/usb/phy/phy-ab8500-usb.c
index 0874023..11ab2c4 100644
--- a/drivers/usb/phy/phy-ab8500-usb.c
+++ b/drivers/usb/phy/phy-ab8500-usb.c
@@ -1415,8 +1415,6 @@ static int ab8500_usb_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, ab);
 
-   ATOMIC_INIT_NOTIFIER_HEAD(&ab->phy.notifier);
-
    /* all: Disable phy when called from set_host and set_peripheral */
    INIT_WORK(&ab->phy_dis_work, ab8500_usb_phy_disable_work);
 
diff --git a/drivers/usb/phy/phy-generic.c b/drivers/usb/phy/phy-generic.c
index aa6d37b..bb39498 100644
--- a/drivers/usb/phy/phy-generic.c
+++ b/drivers/usb/phy/phy-generic.c
@@ -241,7 +241,6 @@ int usb_phy_gen_create_phy(struct device *dev, struct 
usb_phy_gen_xceiv *nop,
nop->phy.otg->set_host  = nop_set_host;
nop->phy.otg->set_peripheral= nop_set_peripheral;
 
-   ATOMIC_INIT_NOTIFIER_HEAD(&nop->phy.notifier);
return 0;
 }
 EXPORT_SYMBOL_GPL(usb_phy_gen_create_phy);
diff --git a/drivers/usb/phy/phy-gpio-vbus-usb.c 
b/drivers/usb/phy/phy-gpio-vbus-usb.c
index 02799a5..69462e0 100644
--- a/drivers/usb/phy/phy-gpio-vbus-usb.c
+++ b/drivers/usb/phy/phy-gpio-vbus-usb.c
@@ -314,8 +314,6 @@ static int gpio_vbus_probe(struct platform_device *pdev)
goto err_irq;
}
 
-   ATOMIC_INIT_NOTIFIER_HEAD(&gpio_vbus->phy.notifier);
-
    INIT_DELAYED_WORK(&gpio_vbus->work, gpio_vbus_work);
 
    gpio_vbus->vbus_draw = regulator_get(&pdev->dev, "vbus_draw");
diff --git a/drivers/usb/phy/phy-mxs-usb.c b/drivers/usb/phy/phy-mxs-usb.c
index 545844b..9df81f0 100644
--- a/drivers/usb/phy/phy-mxs-usb.c
+++ b/drivers/usb/phy/phy-mxs-usb.c
@@ -160,8 +160,6 @@ static int mxs_phy_probe(struct platform_device *pdev)
mxs_phy->phy.notify_disconnect  = mxs_phy_on_disconnect;
mxs_phy->phy.type   = USB_PHY_TYPE_USB2;
 
-   ATOMIC_INIT_NOTIFIER_HEAD(&mxs_phy->phy.notifier);
-
mxs_phy->clk = clk;
 
platform_set_drvdata(pdev, mxs_phy);
diff --git a/drivers/usb/phy/phy.c b/drivers/usb/phy/phy.c
index 1b74523..6689bd0 100644
--- a/drivers/usb/phy/phy.c
+++ b/drivers/usb/phy/phy.c
@@ -329,6 +329,8 @@ int usb_add_phy(struct usb_phy *x, enum usb_phy_type type)
return -EINVAL;
}
 
+   ATOMIC_INIT_NOTIFIER_HEAD(>phy.notifier);
+
spin_lock_irqsave(_lock, flags);
 
list_for_each_entry(phy, _list, head) {
@@ -367,6 +369,8 @@ int usb_add_phy_dev(struct usb_phy *x)
return -EINVAL;
}
 
+   ATOMIC_INIT_NOTIFIER_HEAD(>phy.notifier);
+
spin_lock_irqsave(_lock, flags);
list_for_each_entry(phy_bind, _bind_list, list)
if (!(strcmp(phy_bind->phy_dev_name, dev_name(x->dev
-- 
1.7.9.5


[PATCH] security: ima: new helper: file_inode(file)

2013-12-10 Thread Libo Chen


Signed-off-by: Libo Chen 
---
 security/integrity/ima/ima_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c38bbce..6d76d4a 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -132,7 +132,7 @@ void ima_add_violation(struct file *file, const unsigned 
char *filename,
   const char *op, const char *cause)
 {
struct ima_template_entry *entry;
-   struct inode *inode = file->f_dentry->d_inode;
+   struct inode *inode = file_inode(file);
int violation = 1;
int result;

-- 
1.8.2.2
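
For context on what these file_inode() conversions replace, here is a reduced userspace model of the accessor. It is only modelled on the kernel helper, which as far as I know returns the inode pointer cached in struct file; the *_sketch types below are cut-down stand-ins invented for this example, not the real kernel structures.

```c
#include <assert.h>

/* Cut-down stand-ins for the kernel structures; hypothetical names. */
struct inode_sketch { unsigned long i_ino; };
struct dentry_sketch { struct inode_sketch *d_inode; };
struct path_sketch { struct dentry_sketch *dentry; };

struct file_sketch {
	struct path_sketch f_path;
	struct inode_sketch *f_inode;	/* cached when the file is opened */
};

/* One accessor instead of open-coding file->f_path.dentry->d_inode everywhere. */
static inline struct inode_sketch *file_inode_sketch(const struct file_sketch *f)
{
	return f->f_inode;
}
```

The point of the helper is that callers stop depending on how the inode is reached from the file, so the representation can change in one place.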


[PATCH] fs: new helper: file_inode(file)

2013-12-10 Thread Libo Chen


Signed-off-by: Libo Chen 
---
 include/linux/fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 121f11f..a146e2a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2780,7 +2780,7 @@ static inline bool dir_emit(struct dir_context *ctx,
 static inline bool dir_emit_dot(struct file *file, struct dir_context *ctx)
 {
return ctx->actor(ctx, ".", 1, ctx->pos,
- file->f_path.dentry->d_inode->i_ino, DT_DIR) == 0;
+ file_inode(file)->i_ino, DT_DIR) == 0;
 }
 static inline bool dir_emit_dotdot(struct file *file, struct dir_context *ctx)
 {
-- 
1.8.2.2


[PATCH 2/2] f2fs: fix the location of tracepoint

2013-12-10 Thread Jaegeuk Kim

We need to get a trace before submit_bio, since its bi_sector is remapped during
the submit_bio.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index ebc9177..969df55 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -103,9 +103,10 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
 
rw = fio->rw | fio->rw_flag;
 
+   trace_f2fs_submit_write_bio(io->sbi->sb, rw, fio->type, io->bio);
+
if (is_read_io(rw)) {
submit_bio(rw, io->bio);
-   trace_f2fs_submit_read_bio(io->sbi->sb, rw, fio->type, io->bio);
io->bio = NULL;
return;
}
@@ -122,7 +123,6 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
} else {
submit_bio(rw, io->bio);
}
-   trace_f2fs_submit_write_bio(io->sbi->sb, rw, fio->type, io->bio);
io->bio = NULL;
 }
 
-- 
1.8.4.474.g128a96c
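
The ordering issue this patch addresses (tracing after submission sees a remapped sector) can be shown with a toy model. Everything here is an assumption for illustration: bio_sketch, trace_bio_sketch() and submit_bio_sketch() are invented, and the "remap" is modelled as adding a partition start offset.

```c
#include <assert.h>

/* Toy bio with only the field that matters for the example. */
struct bio_sketch {
	unsigned long sector;
};

/* Last sector value seen by the (hypothetical) tracepoint. */
static unsigned long traced_sector;

static void trace_bio_sketch(const struct bio_sketch *bio)
{
	traced_sector = bio->sector;
}

/* Models submission remapping the sector, e.g. by a partition offset. */
static void submit_bio_sketch(struct bio_sketch *bio, unsigned long partition_start)
{
	bio->sector += partition_start;
}
```

Tracing before submit_bio_sketch() records the filesystem-relative sector, while tracing afterwards records the remapped one, which is the behaviour the patch moves away from.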


[PATCH 1/2] f2fs: refactor bio->rw handling

2013-12-10 Thread Jaegeuk Kim

This patch introduces f2fs_io_info to mitigate the complex parameter list.

struct f2fs_io_info {
enum page_type type;/* contains DATA/NODE/META/META_FLUSH */
int rw; /* contains R/RS/W/WS */
int rw_flag;/* contains REQ_META/REQ_PRIO */
}

1. f2fs_write_data_pages
 - DATA
 - WRITE_SYNC is set when wbc->WB_SYNC_ALL.

2. sync_node_pages
 - NODE
 - WRITE_SYNC all the time

3. sync_meta_pages
 - META
 - WRITE_SYNC all the time
 - REQ_META | REQ_PRIO all the time

 ** f2fs_submit_merged_bio() handles META_FLUSH.

4. ra_nat_pages, ra_sit_pages, ra_sum_pages
 - META
 - READ_SYNC
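
The four cases listed above can be pictured with a small sketch of how the structure's fields combine into the final request flags. The flag values and all *_SKETCH / *_sketch names are invented for this example; only the "rw | rw_flag" combination is taken from the patch itself.

```c
#include <assert.h>

enum page_type_sketch { DATA_SKETCH, NODE_SKETCH, META_SKETCH, META_FLUSH_SKETCH };

/* Flag values below are made up for the sketch, not the kernel's. */
#define RW_WRITE_SKETCH   0x01
#define RW_SYNC_SKETCH    0x02
#define REQ_META_SKETCH   0x10
#define REQ_PRIO_SKETCH   0x20

/* Mirrors the three fields of the struct shown in the changelog. */
struct io_info_sketch {
	enum page_type_sketch type;	/* DATA/NODE/META/META_FLUSH */
	int rw;				/* read/write and sync bits */
	int rw_flag;			/* extra request flags */
};

/* The submit path only needs the OR of the two flag fields. */
static int effective_rw(const struct io_info_sketch *fio)
{
	return fio->rw | fio->rw_flag;
}
```

Bundling the three values this way is what lets the submit helpers drop the separate type, sync and rw parameters.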

Cc: Fan Li 
Cc: Changman Lee 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 11 ---
 fs/f2fs/data.c   | 85 ++--
 fs/f2fs/f2fs.h   | 22 +-
 fs/f2fs/gc.c | 10 ---
 fs/f2fs/node.c   | 22 ++
 fs/f2fs/segment.c| 70 ++-
 fs/f2fs/super.c  |  7 -
 7 files changed, 132 insertions(+), 95 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index cf505eb..f8c0749 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -164,8 +164,7 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum 
page_type type,
}
 
if (nwritten)
-   f2fs_submit_merged_bio(sbi, type, nr_to_write == LONG_MAX,
-   WRITE);
+   f2fs_submit_merged_bio(sbi, type, WRITE);
 
return nwritten;
 }
@@ -598,7 +597,7 @@ retry:
 * We should submit bio, since it exists several
 * wribacking dentry pages in the freeing inode.
 */
-   f2fs_submit_merged_bio(sbi, DATA, true, WRITE);
+   f2fs_submit_merged_bio(sbi, DATA, WRITE);
}
goto retry;
 }
@@ -804,9 +803,9 @@ void write_checkpoint(struct f2fs_sb_info *sbi, bool 
is_umount)
 
trace_f2fs_write_checkpoint(sbi->sb, is_umount, "finish block_ops");
 
-   f2fs_submit_merged_bio(sbi, DATA, true, WRITE);
-   f2fs_submit_merged_bio(sbi, NODE, true, WRITE);
-   f2fs_submit_merged_bio(sbi, META, true, WRITE);
+   f2fs_submit_merged_bio(sbi, DATA, WRITE);
+   f2fs_submit_merged_bio(sbi, NODE, WRITE);
+   f2fs_submit_merged_bio(sbi, META, WRITE);
 
/*
 * update checkpoint pack index
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index fb5e5c2..ebc9177 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -93,37 +93,28 @@ static void f2fs_write_end_io(struct bio *bio, int err)
bio_put(bio);
 }
 
-static void __submit_merged_bio(struct f2fs_sb_info *sbi,
-   struct f2fs_bio_info *io,
-   enum page_type type, bool sync, int rw)
+static void __submit_merged_bio(struct f2fs_bio_info *io)
 {
-   enum page_type btype = PAGE_TYPE_OF_BIO(type);
+   struct f2fs_io_info *fio = &io->fio;
+   int rw;
 
if (!io->bio)
return;
 
-   if (btype == META)
-   rw |= REQ_META;
+   rw = fio->rw | fio->rw_flag;
 
if (is_read_io(rw)) {
-   if (sync)
-   rw |= READ_SYNC;
submit_bio(rw, io->bio);
-   trace_f2fs_submit_read_bio(sbi->sb, rw, type, io->bio);
+   trace_f2fs_submit_read_bio(io->sbi->sb, rw, fio->type, io->bio);
io->bio = NULL;
return;
}
 
-   if (sync)
-   rw |= WRITE_SYNC;
-   if (type >= META_FLUSH)
-   rw |= WRITE_FLUSH_FUA;
-
/*
 * META_FLUSH is only from the checkpoint procedure, and we should wait
 * this metadata bio for FS consistency.
 */
-   if (type == META_FLUSH) {
+   if (fio->type == META_FLUSH) {
DECLARE_COMPLETION_ONSTACK(wait);
    io->bio->bi_private = &wait;
submit_bio(rw, io->bio);
@@ -131,12 +122,12 @@ static void __submit_merged_bio(struct f2fs_sb_info *sbi,
} else {
submit_bio(rw, io->bio);
}
-   trace_f2fs_submit_write_bio(sbi->sb, rw, btype, io->bio);
+   trace_f2fs_submit_write_bio(io->sbi->sb, rw, fio->type, io->bio);
io->bio = NULL;
 }
 
 void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
-   enum page_type type, bool sync, int rw)
+   enum page_type type, int rw)
 {
enum page_type btype = PAGE_TYPE_OF_BIO(type);
struct f2fs_bio_info *io;
@@ -144,7 +135,13 @@ void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
    io = is_read_io(rw) ? &sbi->read_io : &sbi->write_io[btype];
 
    mutex_lock(&io->io_mutex);
-   __submit_merged_bio(sbi, io, type, sync, rw);
+
+   /* change META to META_FLUSH in the checkpoint procedure */
+   if (type >= META_FLUSH) {
+

Re: sched: RT throttling activated, 3.12.3

2013-12-10 Thread Howard Chu


Howard Chu wrote:

Howard Chu wrote:

Li Zefan wrote:

On 2013/12/11 10:59, Howard Chu wrote:

I just upgraded a system from a 3.5 kernel to 3.12.3 and attempted to run some 
new benchmarks on it. I see my test program ramps up in CPU usage for a few 
seconds and then it gradually tails off. There's nothing obvious in the user 
code to trigger this behavior, so I check dmesg, and see this:

[   55.037057] JFS: nTxBlock = 8192, nTxLock = 65536
[163591.807470] perf samples too long (2758 > 2500), lowering 
kernel.perf_event_max_sample_rate to 5
[164061.362762] perf samples too long (5204 > 5000), lowering 
kernel.perf_event_max_sample_rate to 25000
[167969.339513] [sched_delayed] sched: RT throttling activated
[182741.484637] perf samples too long (294588 > 1), lowering 
kernel.perf_event_max_sample_rate to 12500
[182741.484726] INFO: NMI handler (perf_event_nmi_handler) took too long to 
run: 36.665 msecs
[182822.633084] perf samples too long (292359 > 2), lowering 
kernel.perf_event_max_sample_rate to 6250
[182905.606119] perf samples too long (290291 > 4), lowering 
kernel.perf_event_max_sample_rate to 3250
[199384.293514] perf samples too long (288142 > 76923), lowering 
kernel.perf_event_max_sample_rate to 1750
[208507.301027] perf samples too long (285964 > 142857), lowering 
kernel.perf_event_max_sample_rate to 1000
[208528.976208] perf samples too long (283799 > 25), lowering 
kernel.perf_event_max_sample_rate to 500

Why is the kernel throttling my server?



Because that is the default setting of the kernel.


Apparently a "new" default that didn't exist in 3.5? The code in question is
not a realtime process. This behavior also wasn't seen in 3.10 or any older
kernels.


I just downgraded to 3.10.23 to doublecheck - everything is running normally
there, although a few percent slower than I expected. (Last time I tried 3.10
it was 3.10.11.)


For comparison, here's a "normally" behaving benchmark run:
http://highlandsun.com/hyc/linux3.10/

The result is a fairly steady 15,000 ops/sec and CPU usage is around 190% 
(this is a quadcore machine).


On the 3.12.3 kernel:
http://highlandsun.com/hyc/linux3.12/

The CPU usage is initially around 180% but quickly plummets to about 7% and 
stays there. This is a pretty major regression for a "default" kernel setting. 
And given that the target process isn't running with realtime scheduling 
priority, this can only be considered a bug. (Btw, setting both 
sched_rt_period_us and sched_rt_runtime_us to -1 has no effect on this behavior.)
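
As background on the two sysctls mentioned above, this sketch shows what they are documented to mean: realtime tasks may run for sched_rt_runtime_us out of every sched_rt_period_us, and a runtime of -1 disables the limit. The helper name and the permille unit are choices made for this example, and it says nothing about why the settings had no effect in the report.

```c
#include <assert.h>

/*
 * Share of each period available to realtime tasks, in permille.
 * Returns 1000 when throttling is disabled (runtime < 0) and -1 for an
 * invalid period. Hypothetical helper, not a kernel function.
 */
static long rt_share_permille(long period_us, long runtime_us)
{
	if (runtime_us < 0)
		return 1000;
	if (period_us <= 0)
		return -1;
	return runtime_us * 1000 / period_us;
}
```

With the commonly documented defaults of 1000000 and 950000 this gives 950, that is, realtime tasks are limited to 95% of each one-second period.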


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

[PATCH RESEND] fs: btrfs: new helper: file_inode(file)

2013-12-10 Thread Libo Chen


Signed-off-by: Libo Chen 
---
 fs/btrfs/ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

 - just change style

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a111622..fdfc0d7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2690,7 +2690,7 @@ static long btrfs_ioctl_file_extent_same(struct file 
*file,
struct btrfs_ioctl_same_args tmp;
struct btrfs_ioctl_same_args *same;
struct btrfs_ioctl_same_extent_info *info;
-   struct inode *src = file->f_dentry->d_inode;
+   struct inode *src = file_inode(file);
struct file *dst_file = NULL;
struct inode *dst;
u64 off;
@@ -2779,7 +2779,7 @@ static long btrfs_ioctl_file_extent_same(struct file 
*file,
if (file->f_path.mnt != dst_file->f_path.mnt)
goto next;

-   dst = dst_file->f_dentry->d_inode;
+   dst = file_inode(dst_file);
if (src->i_sb != dst->i_sb)
goto next;

-- 
1.8.2.2




[PATCH 1/1] dmaengine: remove obsolete comment reference to dma_data_direction

2013-12-10 Thread Alexander Popov

enum dma_transfer_direction is currently used
in struct dma_slave_config, so update the comment

Signed-off-by: Alexander Popov 
---
 include/linux/dmaengine.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 41cf0c3..bd6b882 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -305,9 +305,8 @@ enum dma_slave_buswidth {
 /**
  * struct dma_slave_config - dma slave channel runtime config
  * @direction: whether the data shall go in or out on this slave
- * channel, right now. DMA_TO_DEVICE and DMA_FROM_DEVICE are
- * legal values, DMA_BIDIRECTIONAL is not acceptable since we
- * need to differentiate source and target addresses.
+ * channel, right now. DMA_MEM_TO_DEV and DMA_DEV_TO_MEM are
+ * legal values.
  * @src_addr: this is the physical address where DMA slave data
  * should be read (RX), if the source is memory this argument is
  * ignored.
-- 
1.8.4.2
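
The rule stated in the updated comment can be expressed as a tiny check. The enum below is a reduced model with invented SK_ names, not the kernel's enum dma_transfer_direction; only the rule that the two memory/device directions are the legal values for a slave config comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, hypothetical model of transfer directions. */
enum dma_dir_sketch {
	SK_MEM_TO_MEM,
	SK_MEM_TO_DEV,
	SK_DEV_TO_MEM,
	SK_DEV_TO_DEV
};

/* A slave config must say whether data goes to or comes from the device. */
static bool slave_direction_valid(enum dma_dir_sketch dir)
{
	return dir == SK_MEM_TO_DEV || dir == SK_DEV_TO_MEM;
}
```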


[PATCH] fs: btrfs: new helper: file_inode(file)

2013-12-10 Thread Libo Chen


Signed-off-by: Libo Chen 
---
[...]
-   struct inode *src = file->f_dentry->d_inode;
+   struct inode *src = file_inode(file);
struct file *dst_file = NULL;
struct inode *dst;
u64 off;
@@ -2779,7 +2779,7 @@ static long btrfs_ioctl_file_extent_same(struct file 
*file,
if (file->f_path.mnt != dst_file->f_path.mnt)
goto next;

-   dst = dst_file->f_dentry->d_inode;
+   dst = file_inode(dst_file);
if (src->i_sb != dst->i_sb)
goto next;

-- 
1.8.2.2


Re: [PATCH RESEND 1/2] proc: change return values while get_proc_task err

2013-12-10 Thread Rui Xiang

On 2013/12/11 7:02, Andrew Morton wrote:
> On Mon, 9 Dec 2013 20:11:54 +0800 Rui Xiang  wrote:
> 
>> When getting the proc task fails, it should return -ESRCH.
>>
>> ...
>>
>> --- a/fs/proc/base.c
>> +++ b/fs/proc/base.c
>> @@ -174,9 +174,10 @@ static int get_task_root(struct task_struct *task, 
>> struct path *root)
>>  static int proc_cwd_link(struct dentry *dentry, struct path *path)
>>  {
>>  struct task_struct *task = get_proc_task(dentry->d_inode);
>> -int result = -ENOENT;
>> +int result = -ESRCH;
> 
> I suppose so.  But there is risk here of breaking existing applications
> and I don't think this problem is serious enough to justify that risk?
> 
Got it. Thanks for your comment.

Thanks
 Rui



Re: [PATCH] fs: ceph: new helper: file_inode(file)

2013-12-10 Thread Sage Weil

Applied, thanks!

On Wed, 11 Dec 2013, Libo Chen wrote:

> 
> Signed-off-by: Libo Chen 
> ---
>  fs/ceph/file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 3de8982..7549bd6 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1018,7 +1018,7 @@ static long ceph_fallocate(struct file *file, int mode,
>   loff_t offset, loff_t length)
>  {
>   struct ceph_file_info *fi = file->private_data;
> - struct inode *inode = file->f_dentry->d_inode;
> + struct inode *inode = file_inode(file);
>   struct ceph_inode_info *ci = ceph_inode(inode);
>   struct ceph_osd_client *osdc =
>   &ceph_inode_to_client(inode)->client->osdc;
> -- 
> 1.8.2.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

[PATCH] fs: ceph: new helper: file_inode(file)

2013-12-10 Thread Libo Chen


Signed-off-by: Libo Chen 
---
 fs/ceph/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3de8982..7549bd6 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1018,7 +1018,7 @@ static long ceph_fallocate(struct file *file, int mode,
loff_t offset, loff_t length)
 {
struct ceph_file_info *fi = file->private_data;
-   struct inode *inode = file->f_dentry->d_inode;
+   struct inode *inode = file_inode(file);
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_osd_client *osdc =
&ceph_inode_to_client(inode)->client->osdc;
-- 
1.8.2.2


Re: [PATCH 02/11] arm: pxa27x: support ICP DAS LP-8x4x

2013-12-10 Thread Sergei Ianovich

On Wed, 2013-12-11 at 06:11 +0100, Arnd Bergmann wrote:
> It depends: if the driver is for the entire FPGA and does
> the irqchip stuff in addition, it should probably live
> in drivers/mfd. If it's a pure irqchip driver, drivers/irqchip
> is better. You have to be careful in the second case though
> because devices pointing to this irqchip in DT won't get
> an IRQ resource assigned automatically but have to use
> irq_of_parse_and_map instead. This may have been fixed since
> I last looked though, I would consider that behavior a
> bug in the of_platform handling.

Thanks again for explaining.

Although the FPGA is an MFD, its IRQs are used mostly by 8250 serial ports. It
will be enough to embed the irqchip into the serial driver for now.

If there is a need to use the chip in another driver, it will be
possible to introduce an artificial dependency there for the serial
driver to ensure the serial driver is loaded before the other one.

Is this plan acceptable?


Re: [PATCH driver-core-next] sysfs: bail early from kernfs_file_mmap() to avoid spurious lockdep warning

2013-12-10 Thread Greg KH

On Tue, Dec 10, 2013 at 09:50:04AM -0500, Tejun Heo wrote:
> Hello,
> 
> Yeap, I was planning to send this out earlier but was completely
> passed out yesterday after the first snowboarding in over a decade. :)
> 
> The offending commit a8b14744429f isn't applicable to
> driver-core-next.  This was done this way because no matter what we
> do, conflict is inevitable and keeping things minimal is the least
> painful.  The following git branch pulls driver-core-linus into
> driver-core-next, resolves the conflict by ignoring the offending
> commit and applies this patch on top of it to implement the equivalent
> fix.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git 
> driver-core-fix-lockdep-class

I've pulled from this and pushed it out as my driver-next branch, thanks
for doing this (note, I amended the patch with --amend and added my
signed-off-by, so you can't just pull and not get a conflict.)

thanks,

greg k-h

Re: [PATCH] mm,x86: fix span coverage in e820_all_mapped()

2013-12-10 Thread H. Peter Anvin

Is that an actual requirement of the API?

Xishi Qiu  wrote:
>On 2013/12/11 12:02, H. Peter Anvin wrote:
>
>> On 12/10/2013 07:55 PM, Xishi Qiu wrote:
>>>
>>> I think there is a problem.
>>> e.g.
>>> [start, end)=[8, 12), and [A, B)=[0, 10), [B, C)=[10,20),
>>> then e820_all_mapped() will return 1, it spans two regions.
>>>
>> 
>> Why is that a problem?
>> 
>
>[start, end) should be included in one region ?
>
>Thanks,
>Xishi Qiu
>
>>  -hpa
>> 
>> 
>> 
>> 

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
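
The [8, 12) example quoted above can be traced with a small userspace sketch.
This is not the kernel's e820_all_mapped(); struct range and all_mapped() are
hypothetical names, and the map is assumed to be sorted by start address. It
only illustrates how a coverage walk that advances start past each overlapping
entry ends up accepting a span that crosses two adjacent regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memory-map entry: the range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return 1 if [start, end) is fully covered by the union of the entries
 * in map[], 0 otherwise. Assumes map[] is sorted by start address.
 */
static int all_mapped(const struct range *map, size_t n,
		      uint64_t start, uint64_t end)
{
	size_t i;

	assert(start < end);

	for (i = 0; i < n; i++) {
		/* Skip entries that do not overlap the remaining span. */
		if (map[i].start >= end || map[i].end <= start)
			continue;
		/* Entry covers the front of the span: advance past it. */
		if (map[i].start <= start)
			start = map[i].end;
		/* Nothing left to cover. */
		if (start >= end)
			return 1;
	}
	return 0;
}
```

With the entries [0, 10) and [10, 20), a query for [8, 12) returns 1 because
the first entry moves start to 10 and the second covers the rest. A hole
between the entries, for example [0, 10) and [11, 20), makes the same query
return 0.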

Re: [RFC part2 PATCH 8/9] ACPI / ARM64: Update acpi_register_gsi to register with the core IRQ subsystem

2013-12-10 Thread Arnd Bergmann

On Tuesday 10 December 2013, Grant Likely wrote:
> > --- a/drivers/acpi/plat/arm-core.c
> > +++ b/drivers/acpi/plat/arm-core.c
> > @@ -90,7 +90,7 @@ enum acpi_irq_model_id acpi_irq_model = 
> > ACPI_IRQ_MODEL_GIC;
> >  
> >  static unsigned int gsi_to_irq(unsigned int gsi)
> >  {
> > - int irq = irq_create_mapping(NULL, gsi);
> > + int irq = irq_find_mapping(NULL, gsi);
> 
> I suspect this will break FDT users that depend on the old behaviour.

I think not, given this is only in drivers/acpi and gets added in one
of the prior patches of the same series.

Arnd

Re: [PATCH -next 0/3] seq_printf/puts/putc: Start to convert to return void

2013-12-10 Thread David Miller

From: Joe Perches 
Date: Tue, 10 Dec 2013 21:12:41 -0800

> Many uses of the return value of seq_printf/seq_puts/seq_putc are
> incorrect.  Many assume that the return value is the number of
> chars emitted into a buffer like printf/puts/putc.
> 
> It would be better to make the return value of these functions void
> to avoid these misuses.
> 
> Start to do so.
> Convert seq_overflow to a public function from a static function.
> 
> Remove the return uses of seq_printf/seq_puts/seq_putc from net.
> Add a seq_overflow function call instead.

I'm fine with this going in whatever tree is appropriate for
the seq_overflow un-static change:

Acked-by: David S. Miller 

[PATCH -next 2/3] batman-adv: Use seq_overflow

2013-12-10 Thread Joe Perches

Convert the uses of the return of seq_printf to
instead check seq_overflow to determine if a buffer
overflow has occurred.

This will eventually allow seq_printf & seq_puts to
be converted to a void return instead of the often
misused return that is often assumed to be an int for
the number of bytes emitted ala printk.

Signed-off-by: Joe Perches 
---
 net/batman-adv/gateway_client.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c
index 2449afa..dfa5d2d 100644
--- a/net/batman-adv/gateway_client.c
+++ b/net/batman-adv/gateway_client.c
@@ -517,29 +517,28 @@ static int batadv_write_buffer_text(struct batadv_priv 
*bat_priv,
 {
struct batadv_gw_node *curr_gw;
struct batadv_neigh_node *router;
-   int ret = -1;
 
router = batadv_orig_node_get_router(gw_node->orig_node);
if (!router)
-   goto out;
+   return -1;
 
curr_gw = batadv_gw_get_selected_gw_node(bat_priv);
 
-   ret = seq_printf(seq, "%s %pM (%3i) %pM [%10s]: %u.%u/%u.%u MBit\n",
-(curr_gw == gw_node ? "=>" : "  "),
-gw_node->orig_node->orig,
-router->bat_iv.tq_avg, router->addr,
-router->if_incoming->net_dev->name,
-gw_node->bandwidth_down / 10,
-gw_node->bandwidth_down % 10,
-gw_node->bandwidth_up / 10,
-gw_node->bandwidth_up % 10);
+   seq_printf(seq, "%s %pM (%3i) %pM [%10s]: %u.%u/%u.%u MBit\n",
+  (curr_gw == gw_node ? "=>" : "  "),
+  gw_node->orig_node->orig,
+  router->bat_iv.tq_avg, router->addr,
+  router->if_incoming->net_dev->name,
+  gw_node->bandwidth_down / 10,
+  gw_node->bandwidth_down % 10,
+  gw_node->bandwidth_up / 10,
+  gw_node->bandwidth_up % 10);
 
batadv_neigh_node_free_ref(router);
if (curr_gw)
batadv_gw_node_free_ref(curr_gw);
-out:
-   return ret;
+
+   return seq_overflow(seq);
 }
 
 int batadv_gw_client_seq_print_text(struct seq_file *seq, void *offset)
-- 
1.8.1.2.459.gbcd45b4.dirty


[PATCH -next 0/3] seq_printf/puts/putc: Start to convert to return void

2013-12-10 Thread Joe Perches

Many uses of the return value of seq_printf/seq_puts/seq_putc are
incorrect.  Many assume that the return value is the number of
chars emitted into a buffer like printf/puts/putc.

It would be better to make the return value of these functions void
to avoid these misuses.

Start to do so.
Convert seq_overflow to a public function from a static function.

Remove the return uses of seq_printf/seq_puts/seq_putc from net.
Add a seq_overflow function call instead.

Joe Perches (3):
  seq: Add a seq_overflow test.
  batman-adv: Use seq_overflow
  netfilter: Use seq_overflow

 fs/seq_file.c  | 15 
 include/linux/seq_file.h   |  2 +
 include/net/netfilter/nf_conntrack_acct.h  |  3 +-
 net/batman-adv/gateway_client.c| 25 ++--
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |  6 ++-
 .../netfilter/nf_conntrack_l3proto_ipv4_compat.c   | 42 +
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |  6 ++-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c | 10 +++--
 net/netfilter/nf_conntrack_acct.c  | 11 +++---
 net/netfilter/nf_conntrack_expect.c|  4 +-
 net/netfilter/nf_conntrack_proto_dccp.c| 12 --
 net/netfilter/nf_conntrack_proto_gre.c | 15 +---
 net/netfilter/nf_conntrack_proto_sctp.c| 12 --
 net/netfilter/nf_conntrack_proto_tcp.c | 11 --
 net/netfilter/nf_conntrack_proto_udp.c |  7 ++--
 net/netfilter/nf_conntrack_proto_udplite.c |  7 ++--
 net/netfilter/nf_conntrack_standalone.c| 44 +-
 net/netfilter/nf_log.c | 26 ++---
 net/netfilter/nfnetlink_log.c  | 12 +++---
 net/netfilter/nfnetlink_queue_core.c   | 14 ---
 net/netfilter/x_tables.c   |  8 ++--
 net/netfilter/xt_hashlimit.c   | 34 +
 22 files changed, 191 insertions(+), 135 deletions(-)

-- 
1.8.1.2.459.gbcd45b4.dirty


[PATCH -next 1/3] seq: Add a seq_overflow test.

2013-12-10 Thread Joe Perches

seq_printf and seq_puts returns are often misused.

Instead of checking the seq_printf or seq_puts return,
add a new seq_overflow function to test if a seq_file has
overflowed the available buffer space.

This will eventually allow seq_printf and seq_puts to be
converted to have a void return instead of the int return
that is often assumed to have a size_t value instead of an
error/no-error value.

Signed-off-by: Joe Perches 
---
 fs/seq_file.c| 15 ---
 include/linux/seq_file.h |  2 ++
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 1d641bb..aab0736 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -14,16 +14,17 @@
 #include 
 #include 
 
-
-/*
- * seq_files have a buffer which can may overflow. When this happens a larger
- * buffer is reallocated and all the data will be printed again.
- * The overflow state is true when m->count == m->size.
+/**
+ * seq_overflow - test if a seq_file has overflowed the space available
+ * @m: the seq_file handle
+ *
+ * Returns -1 when an overflow has occurred, 0 otherwise.
  */
-static bool seq_overflow(struct seq_file *m)
+int seq_overflow(struct seq_file *m)
 {
-   return m->count == m->size;
+   return m->count == m->size ? -1 : 0;
 }
+EXPORT_SYMBOL(seq_overflow);
 
 static void seq_set_overflow(struct seq_file *m)
 {
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index 52e0097..f8f9dc0 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -107,6 +107,8 @@ int seq_write(struct seq_file *seq, const void *data, 
size_t len);
 __printf(2, 3) int seq_printf(struct seq_file *, const char *, ...);
 __printf(2, 0) int seq_vprintf(struct seq_file *, const char *, va_list args);
 
+int seq_overflow(struct seq_file *m);
+
 int seq_path(struct seq_file *, const struct path *, const char *);
 int seq_dentry(struct seq_file *, struct dentry *, const char *);
 int seq_path_root(struct seq_file *m, const struct path *path,
-- 
1.8.1.2.459.gbcd45b4.dirty
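
For readers who want to try the convention outside the kernel, here is a
minimal userspace sketch of it. struct mock_seq, mock_seq_printf(),
mock_seq_overflow() and mock_show() are hypothetical stand-ins, not kernel
APIs. The only thing taken from the patch is the rule that the buffer counts
as overflowed when count == size and that the test returns -1 in that case.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct seq_file. */
struct mock_seq {
	char *buf;
	size_t size;	/* capacity of buf */
	size_t count;	/* bytes used; count == size marks overflow */
};

/* Same convention as the patched seq_overflow(): -1 on overflow, else 0. */
int mock_seq_overflow(const struct mock_seq *m)
{
	return m->count == m->size ? -1 : 0;
}

/* Format into the remaining space; on truncation, mark the buffer full. */
void mock_seq_printf(struct mock_seq *m, const char *fmt, ...)
{
	va_list args;
	int len;

	assert(m->count <= m->size);
	if (m->count == m->size)
		return;		/* already overflowed, nothing to add */

	va_start(args, fmt);
	len = vsnprintf(m->buf + m->count, m->size - m->count, fmt, args);
	va_end(args);

	if (len >= 0 && (size_t)len < m->size - m->count)
		m->count += (size_t)len;
	else
		m->count = m->size;	/* record the overflow */
}

/* A show-style helper: emit unconditionally, test for overflow at the end. */
int mock_show(struct mock_seq *m, const char *name, unsigned int value)
{
	mock_seq_printf(m, "%s=%u ", name, value);
	return mock_seq_overflow(m);
}
```

The point of the shape is that the printing call has no return value to
misread as a byte count; the caller asks one explicit question afterwards.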


[PATCH -next 3/3] netfilter: Use seq_overflow

2013-12-10 Thread Joe Perches

Convert the uses of the return of seq_printf/seq_puts/seq_putc to
instead check seq_overflow to determine if a buffer overflow has occurred.

This will eventually allow seq_printf & seq_puts to be converted to a
void return instead of the often misused return that is often assumed
to be an int for the number of bytes emitted ala printk.

Convert seq_print_acct to return int.

Signed-off-by: Joe Perches 
---
 include/net/netfilter/nf_conntrack_acct.h  |  3 +-
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |  6 ++-
 .../netfilter/nf_conntrack_l3proto_ipv4_compat.c   | 42 +
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |  6 ++-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c | 10 +++--
 net/netfilter/nf_conntrack_acct.c  | 11 +++---
 net/netfilter/nf_conntrack_expect.c|  4 +-
 net/netfilter/nf_conntrack_proto_dccp.c| 12 --
 net/netfilter/nf_conntrack_proto_gre.c | 15 +---
 net/netfilter/nf_conntrack_proto_sctp.c| 12 --
 net/netfilter/nf_conntrack_proto_tcp.c | 11 --
 net/netfilter/nf_conntrack_proto_udp.c |  7 ++--
 net/netfilter/nf_conntrack_proto_udplite.c |  7 ++--
 net/netfilter/nf_conntrack_standalone.c| 44 +-
 net/netfilter/nf_log.c | 26 ++---
 net/netfilter/nfnetlink_log.c  | 12 +++---
 net/netfilter/nfnetlink_queue_core.c   | 14 ---
 net/netfilter/x_tables.c   |  8 ++--
 net/netfilter/xt_hashlimit.c   | 34 +
 19 files changed, 169 insertions(+), 115 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_acct.h 
b/include/net/netfilter/nf_conntrack_acct.h
index 79d8d16..939bffe 100644
--- a/include/net/netfilter/nf_conntrack_acct.h
+++ b/include/net/netfilter/nf_conntrack_acct.h
@@ -46,8 +46,7 @@ struct nf_conn_acct *nf_ct_acct_ext_add(struct nf_conn *ct, 
gfp_t gfp)
return acct;
 };
 
-unsigned int seq_print_acct(struct seq_file *s, const struct nf_conn *ct,
-   int dir);
+int seq_print_acct(struct seq_file *s, const struct nf_conn *ct, int dir);
 
 /* Check if connection tracking accounting is enabled */
 static inline bool nf_ct_acct_enabled(struct net *net)
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c 
b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index ecd8bec..aa07f0b 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -59,8 +59,10 @@ static bool ipv4_invert_tuple(struct nf_conntrack_tuple 
*tuple,
 static int ipv4_print_tuple(struct seq_file *s,
const struct nf_conntrack_tuple *tuple)
 {
-   return seq_printf(s, "src=%pI4 dst=%pI4 ",
- &tuple->src.u3.ip, &tuple->dst.u3.ip);
+   seq_printf(s, "src=%pI4 dst=%pI4 ",
+  &tuple->src.u3.ip, &tuple->dst.u3.ip);
+
+   return seq_overflow(s);
 }
 
 static int ipv4_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c 
b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
index 4c48e43..67ba510 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
@@ -104,10 +104,10 @@ static int ct_show_secctx(struct seq_file *s, const 
struct nf_conn *ct)
if (ret)
return 0;
 
-   ret = seq_printf(s, "secctx=%s ", secctx);
+   seq_printf(s, "secctx=%s ", secctx);
 
security_release_secctx(secctx, len);
-   return ret;
+   return seq_overflow(s);
 }
 #else
 static inline int ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
@@ -141,10 +141,11 @@ static int ct_seq_show(struct seq_file *s, void *v)
NF_CT_ASSERT(l4proto);
 
ret = -ENOSPC;
-   if (seq_printf(s, "%-8s %u %ld ",
- l4proto->name, nf_ct_protonum(ct),
- timer_pending(&ct->timeout)
- ? (long)(ct->timeout.expires - jiffies)/HZ : 0) != 0)
+   seq_printf(s, "%-8s %u %ld ",
+  l4proto->name, nf_ct_protonum(ct),
+  timer_pending(&ct->timeout)
+  ? (long)(ct->timeout.expires - jiffies)/HZ : 0);
+   if (seq_overflow(s))
goto release;
 
if (l4proto->print_conntrack && l4proto->print_conntrack(s, ct))
@@ -154,35 +155,44 @@ static int ct_seq_show(struct seq_file *s, void *v)
l3proto, l4proto))
goto release;
 
-   if (seq_print_acct(s, ct, IP_CT_DIR_ORIGINAL))
+   seq_print_acct(s, ct, IP_CT_DIR_ORIGINAL);
+   if (seq_overflow(s))
goto release;
 
-   if (!(test_bit(IPS_SEEN_REPLY_BIT, &ct->status)))
-   if (seq_printf(s, "[UNREPLIED] "))
+   if (!(test_bit(IPS_SEEN_REPLY_BIT, &ct->status))) {
+

Re: [PATCH 02/11] arm: pxa27x: support ICP DAS LP-8x4x

2013-12-10 Thread Arnd Bergmann

On Wednesday 11 December 2013, Sergei Ianovich wrote:
> > It probably makes sense to have a single driver file for the
> > FPGA device that does this, and only split out the other devices
> > from it that consume the irqs.
> 
> > Is drivers/irqchip/ the right place for this driver?
> > 
> > I am asking because there are no tristate config options in
> > drivers/irqchip/Kconfig at the moment.
> 

It depends: if the driver is for the entire FPGA and does
the irqchip stuff in addition, it should probably live
in drivers/mfd. If it's a pure irqchip driver, drivers/irqchip
is better. You have to be careful in the second case though
because devices pointing to this irqchip in DT won't get
an IRQ resource assigned automatically but have to use
irq_of_parse_and_map instead. This may have been fixed since
I last looked though, I would consider that behavior a
bug in the of_platform handling.

Arnd

[PATCH net V3 1/2] tun: unbreak truncated packet signalling

2013-12-10 Thread Jason Wang

Commit 6680ec68eff47d36f67b4351bc9836fd6cba9532
(tuntap: hardware vlan tx support) breaks the truncated packet signal by never
returning a length greater than the iov length in tun_put_user(). This patch
fixes this by always returning the length of the packet plus a possible vlan
header. The caller can detect a truncated packet by comparing the return value
with the iov length.

Cc: Zhi Yong Wu 
Cc: Michael S. Tsirkin 
Signed-off-by: Vlad Yasevich 
Signed-off-by: Jason Wang 
---
Changes from V2:
- use "copied" as the variable name instead to be more descriptive
Changes from v1:
- increase total unconditionally
- do not move veth structure out of the vlan handling block
---
 drivers/net/tun.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e26cbea..7c8343a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1184,7 +1184,7 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 {
struct tun_pi pi = { 0, skb->protocol };
ssize_t total = 0;
-   int vlan_offset = 0;
+   int vlan_offset = 0, copied;
 
if (!(tun->flags & TUN_NO_PI)) {
if ((len -= sizeof(pi)) < 0)
@@ -1248,6 +1248,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,
total += tun->vnet_hdr_sz;
}
 
+   copied = total;
+   total += skb->len;
if (!vlan_tx_tag_present(skb)) {
len = min_t(int, skb->len, len);
} else {
@@ -1262,24 +1264,24 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 
vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
len = min_t(int, skb->len + VLAN_HLEN, len);
+   total += VLAN_HLEN;
 
copy = min_t(int, vlan_offset, len);
-   ret = skb_copy_datagram_const_iovec(skb, 0, iv, total, copy);
+   ret = skb_copy_datagram_const_iovec(skb, 0, iv, copied, copy);
len -= copy;
-   total += copy;
+   copied += copy;
if (ret || !len)
goto done;
 
copy = min_t(int, sizeof(veth), len);
-   ret = memcpy_toiovecend(iv, (void *)&veth, total, copy);
+   ret = memcpy_toiovecend(iv, (void *)&veth, copied, copy);
len -= copy;
-   total += copy;
+   copied += copy;
if (ret || !len)
goto done;
}
 
-   skb_copy_datagram_const_iovec(skb, vlan_offset, iv, total, len);
-   total += len;
+   skb_copy_datagram_const_iovec(skb, vlan_offset, iv, copied, len);
 
 done:
tun->dev->stats.tx_packets++;
-- 
1.8.3.2
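
The signalling convention described in the commit message can be shown with a
small userspace sketch. put_packet() and is_truncated() are hypothetical names
and are not part of tun.c. The sketch only shows the idea of copying what fits
while returning the full length, so that the caller can compare the two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical put_user-style helper: copy at most buf_len bytes of the
 * packet, but return the full packet length so the caller can tell that
 * some data did not fit.
 */
size_t put_packet(char *buf, size_t buf_len, const char *pkt, size_t pkt_len)
{
	size_t copied = pkt_len < buf_len ? pkt_len : buf_len;

	assert(buf && pkt);
	memcpy(buf, pkt, copied);
	return pkt_len;		/* total length, not the number copied */
}

/* Caller side: the packet was truncated if it was longer than the buffer. */
int is_truncated(size_t ret, size_t buf_len)
{
	return ret > buf_len;
}
```

If the helper returned the number of bytes copied instead, the return value
could never exceed buf_len and the comparison in is_truncated() would never
fire, which is the breakage the patch describes.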


[PATCH net V3 2/2] macvtap: signal truncated packets

2013-12-10 Thread Jason Wang

macvtap_put_user() never returns a value greater than the iov length, which in
fact bypasses the truncation check in macvtap_recvmsg(). Fix this by always
returning the size of the packet plus the possible vlan header so that the
truncation check works.

Cc: Vlad Yasevich 
Cc: Zhi Yong Wu 
Cc: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
Changes from V2:
- use total as the bytes of packet returned
Changes from V1:
- increase total unconditionally
- do not move the structure veth out of the vlan handling block
---
 drivers/net/macvtap.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 957cc5c..2a89da0 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -770,7 +770,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
int ret;
int vnet_hdr_len = 0;
int vlan_offset = 0;
-   int copied;
+   int copied, total;
 
if (q->flags & IFF_VNET_HDR) {
struct virtio_net_hdr vnet_hdr;
@@ -785,7 +785,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, 
sizeof(vnet_hdr)))
return -EFAULT;
}
-   copied = vnet_hdr_len;
+   total = copied = vnet_hdr_len;
+   total += skb->len;
 
if (!vlan_tx_tag_present(skb))
len = min_t(int, skb->len, len);
@@ -800,6 +801,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
 
vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
len = min_t(int, skb->len + VLAN_HLEN, len);
+   total += VLAN_HLEN;
 
copy = min_t(int, vlan_offset, len);
ret = skb_copy_datagram_const_iovec(skb, 0, iv, copied, copy);
@@ -817,10 +819,9 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
}
 
ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, copied, len);
-   copied += len;
 
 done:
-   return ret ? ret : copied;
+   return ret ? ret : total;
 }
 
 static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
@@ -875,7 +876,7 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const 
struct iovec *iv,
}
 
ret = macvtap_do_read(q, iocb, iv, len, file->f_flags & O_NONBLOCK);
-   ret = min_t(ssize_t, ret, len); /* XXX copied from tun.c. Why? */
+   ret = min_t(ssize_t, ret, len);
if (ret > 0)
iocb->ki_pos = ret;
 out:
-- 
1.8.3.2


[PATCH] usb: phy: Initilize the spinlock in notifier

2013-12-10 Thread Neil Zhang

We need to initialize every spinlock before using it.
So let's initialize the spinlock in the notifier when adding a new phy device.

Signed-off-by: Neil Zhang 
---
 drivers/usb/phy/phy.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/phy/phy.c b/drivers/usb/phy/phy.c
index 1b74523..479ceb8 100644
--- a/drivers/usb/phy/phy.c
+++ b/drivers/usb/phy/phy.c
@@ -367,6 +367,8 @@ int usb_add_phy_dev(struct usb_phy *x)
return -EINVAL;
}
 
+   spin_lock_init(&x->notifier.lock);
+
spin_lock_irqsave(&phy_lock, flags);
list_for_each_entry(phy_bind, &phy_bind_list, list)
if (!(strcmp(phy_bind->phy_dev_name, dev_name(x->dev))))
-- 
1.7.9.5


Re: [PATCH 6/6] clocksource: dw_apb_timer_of: Fix support for dts binding "snps, dw-apb-timer"

2013-12-10 Thread Baruch Siach

Hi Daniel, Dinh,

On Tue, Dec 10, 2013 at 08:16:12PM +0100, Daniel Lezcano wrote:
> From: Dinh Nguyen 
> 
> In commit 620f5e1cbf (dts: Rename DW APB timer compatible strings), both
> "snps,dw-apb-timer-sp" and "snps,dw-apb-timer-osc" were deprecated in place
> of "snps,dw-apb-timer". But the driver also needs to be updated in order to
> support this new binding "snps,dw-apb-timer".
> 
> Signed-off-by: Dinh Nguyen 
> Signed-off-by: Daniel Lezcano 
> ---

I guess this (and the previous one) should also be pushed to v3.12-stable, 
isn't it?

baruch

>  drivers/clocksource/dw_apb_timer_of.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/clocksource/dw_apb_timer_of.c 
> b/drivers/clocksource/dw_apb_timer_of.c
> index b29d7cd..2a2ea27 100644
> --- a/drivers/clocksource/dw_apb_timer_of.c
> +++ b/drivers/clocksource/dw_apb_timer_of.c
> @@ -113,7 +113,6 @@ static u64 read_sched_clock(void)
>  
>  static const struct of_device_id sptimer_ids[] __initconst = {
>   { .compatible = "picochip,pc3x2-rtc" },
> - { .compatible = "snps,dw-apb-timer-sp" },
>   { /* Sentinel */ },
>  };
>  
> @@ -151,4 +150,6 @@ static void __init dw_apb_timer_init(struct device_node 
> *timer)
>   num_called++;
>  }
>  CLOCKSOURCE_OF_DECLARE(pc3x2_timer, "picochip,pc3x2-timer", 
> dw_apb_timer_init);
> -CLOCKSOURCE_OF_DECLARE(apb_timer, "snps,dw-apb-timer-osc", 
> dw_apb_timer_init);
> +CLOCKSOURCE_OF_DECLARE(apb_timer_osc, "snps,dw-apb-timer-osc", 
> dw_apb_timer_init);
> +CLOCKSOURCE_OF_DECLARE(apb_timer_sp, "snps,dw-apb-timer-sp", 
> dw_apb_timer_init);
> +CLOCKSOURCE_OF_DECLARE(apb_timer, "snps,dw-apb-timer", dw_apb_timer_init);
> -- 
> 1.7.9.5

-- 
 http://baruch.siach.name/blog/  ~. .~   Tk Open Systems
=}ooO--U--Ooo{=
   - bar...@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

Re: linux-next: Tree for Dec 11

2013-12-10 Thread Stephen Rothwell

On Wed, 11 Dec 2013 15:56:48 +1100 Stephen Rothwell  
wrote:
>
> Non-merge commits (relative to Linus' tree): 3141
>  3634 files changed, 143080 insertions(+), 83685 deletions(-)

Actually:

Non-merge commits (relative to Linus' tree): 3236
 3756 files changed, 155884 insertions(+), 86331 deletions(-)

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



[PATCH] net: macb: Fix build warning

2013-12-10 Thread Soren Brinkmann

When adjusting the link speed, the target frequency is determined by a
'switch (LINK_SPEED)' statement, that assigns the target rate only for
valid and expected LINK_SPEED values. This incomplete switch statement
leads to the following build warning:
 drivers/net/ethernet/cadence/macb.c: In function 'macb_handle_link_change':
  >> drivers/net/ethernet/cadence/macb.c:241:14: warning: 'rate' may be used 
uninitialized in this function [-Wmaybe-uninitialized]
netdev_warn(dev, "unable to generate target frequency: %ld Hz\n",
   ^
 drivers/net/ethernet/cadence/macb.c:215:13: note: 'rate' was declared here
   long ferr, rate, rate_rounded;

Fixing this by bailing out of that function in the switch's default case
before the rate variable is used.

Reported-by: kbuild test robot 
Signed-off-by: Soren Brinkmann 
---
 drivers/net/ethernet/cadence/macb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb.c 
b/drivers/net/ethernet/cadence/macb.c
index 419529a9309d..3190d38e16fb 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -225,7 +225,7 @@ static void macb_set_tx_clk(struct clk *clk, int speed, 
struct net_device *dev)
rate = 12500;
break;
default:
-   break;
+   return;
}
 
rate_rounded = clk_round_rate(clk, rate);
-- 
1.8.5.1
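
The pattern behind the fix can be seen in isolation in the userspace sketch
below. tx_clk_rate() and the LINK_* constants are hypothetical, and the rates
are the conventional 2.5/25/125 MHz TX clock values, assumed here for
illustration rather than taken from the driver. What matters is that the
default case leaves the function before rate can be read.

```c
#include <assert.h>

/* Link speeds in Mbit/s; hypothetical stand-ins for SPEED_* constants. */
enum { LINK_10 = 10, LINK_100 = 100, LINK_1000 = 1000 };

/*
 * Pick a TX clock rate for the given speed. Returns 0 and stores the rate
 * in *rate_out on success. Returns -1 for an unexpected speed and leaves
 * *rate_out untouched, so no later code can read an unset value.
 */
int tx_clk_rate(int speed, long *rate_out)
{
	long rate;

	assert(rate_out);

	switch (speed) {
	case LINK_10:
		rate = 2500000;
		break;
	case LINK_100:
		rate = 25000000;
		break;
	case LINK_1000:
		rate = 125000000;
		break;
	default:
		return -1;	/* bail out before rate is used */
	}

	*rate_out = rate;
	return 0;
}
```

With "break" in the default case the compiler has to assume that rate may
reach the assignment uninitialized, which is what -Wmaybe-uninitialized
reports. Returning early removes that path.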


Re: sched: RT throttling activated, 3.12.3

2013-12-10 Thread Howard Chu


Howard Chu wrote:
> Li Zefan wrote:
>> On 2013/12/11 10:59, Howard Chu wrote:
>>> I just upgraded a system from a 3.5 kernel to 3.12.3 and attempted to run some
>>> new benchmarks on it. I see my test program ramps up in CPU usage for a few
>>> seconds and then it gradually tails off. There's nothing obvious in the user
>>> code to trigger this behavior, so I check dmesg, and see this:
>>>
>>> [   55.037057] JFS: nTxBlock = 8192, nTxLock = 65536
>>> [163591.807470] perf samples too long (2758 > 2500), lowering kernel.perf_event_max_sample_rate to 5
>>> [164061.362762] perf samples too long (5204 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
>>> [167969.339513] [sched_delayed] sched: RT throttling activated
>>> [182741.484637] perf samples too long (294588 > 1), lowering kernel.perf_event_max_sample_rate to 12500
>>> [182741.484726] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 36.665 msecs
>>> [182822.633084] perf samples too long (292359 > 2), lowering kernel.perf_event_max_sample_rate to 6250
>>> [182905.606119] perf samples too long (290291 > 4), lowering kernel.perf_event_max_sample_rate to 3250
>>> [199384.293514] perf samples too long (288142 > 76923), lowering kernel.perf_event_max_sample_rate to 1750
>>> [208507.301027] perf samples too long (285964 > 142857), lowering kernel.perf_event_max_sample_rate to 1000
>>> [208528.976208] perf samples too long (283799 > 25), lowering kernel.perf_event_max_sample_rate to 500
>>>
>>> Why is the kernel throttling my server?
>>
>> Because that is the default setting of the kernel.
>
> Apparently a "new" default that didn't exist in 3.5? The code in question is
> not a realtime process. This behavior also wasn't seen in 3.10 or any older
> kernels.

I just downgraded to 3.10.23 to doublecheck - everything is running normally
there, although a few percent slower than I expected. (Last time I tried 3.10
it was 3.10.11.)


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

linux-next: Tree for Dec 11

2013-12-10 Thread Stephen Rothwell

Hi all,

Changes since 20131210:

Removed tree: arm-v7-cache-opt (merged into the arm tree)

The powerpc tree still had its build failure for which I applied a
supplied patch.

The sound-asoc tree lost its build failure.

The usb-gadget tree still has its build failure so I used the version from
next-20131206.

The pinctrl tree lost its build failure.

Non-merge commits (relative to Linus' tree): 3141
 3634 files changed, 143080 insertions(+), 83685 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (minus CONFIG_PROFILE_ALL_BRANCHES - this fails its final
link) and i386, sparc, sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

I am currently merging 209 trees (counting Linus' and 29 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell 

$ git checkout master
$ git reset --hard stable
Merging origin/master (5e0af24cee56 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32)
Merging fixes/master (8ae516aa8b81 Merge tag 'trace-fixes-v3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace)
Merging kbuild-current/rc-fixes (19514fc665ff arm, kbuild: make "make install" not depend on vmlinux)
Merging arc-current/for-curr (da990a4f2d5a ARC: [perf] Fix a few thinkos)
Merging arm-current/fixes (b31459adeab0 ARM: 7917/1: cacheflush: correctly limit range of memory region being flushed)
Merging m68k-current/for-linus (77a42796786c m68k: Remove deprecated IRQF_DISABLED)
Merging metag-fixes/fixes (3b2f64d00c46 Linux 3.11-rc2)
Merging powerpc-merge/merge (e641eb03ab2b powerpc: Fix up the kdump base cap to 128M)
Merging sparc/master (1de425c7b271 sparc64: Fix build regression)
Merging net/master (e9c56f8d2f85 net: allwinner: emac: Add missing free_irq)
Merging ipsec/master (239c78db9c41 net: clear local_df when passing skb between namespaces)
Merging sound-current/for-linus (ebb93c057dda ALSA: hda - Mute all aamix inputs as default)
Merging pci-current/for-linus (4fc9bbf98fd6 PCI: Disable Bus Master only on kexec reboot)
Merging wireless/master (bbf807bc0697 ath9k: fix duration calculation for non-aggregated packets)
Merging driver-core.current/driver-core-linus (a8b14744429f sysfs: give different locking key to regular and bin files)
Merging tty.current/tty-linus (39434abd942c n_tty: Fix missing newline echo)
Merging usb.current/usb-linus (8820784203ac phy: kconfig: add depends on "USB_PHY" to OMAP_USB2 and TWL4030_USB)
Merging staging.current/staging-linus (55ef003e4ae6 Merge tag 'iio-fixes-for-3.13b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus)
Merging char-misc.current/char-misc-linus (76a9635979e5 mei: add 9 series PCH mei device ids)
Merging input-current/for-linus (241ecf1ce528 Input: adxl34x - Fix bug in definition of ADXL346_2D_ORIENT)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" stripe)
Merging crypto-current/master (389a5390583a crypto: scatterwalk - Use sg_chain_ptr on chain entries)
Merging ide/master (c2f7d1e103ef ide: pmac: remove unnecessary pci_set_drvdata())
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging sh-current/sh-fixes-for-linus (44033109e99c SH: Convert out[bwl] macros to inline functions)
Merging devicetree-current/devicetree/merge (1931ee143b0a Revert "drivers: of: add initialization code for dma

[PATCH 1/2] perf tools: Drop strdup in get_filename_for_perf_kvm().

2013-12-10 Thread Dongsheng Yang

As we need a const char * as a result of get_filename_for_perf_kvm(),
there is no need to use strdup() for the return value.

This patch drops the strdup() to save memory in get_filename_for_perf_kvm().

Signed-off-by: Dongsheng Yang 
---
 tools/perf/builtin-kvm.c | 8 +---
 tools/perf/util/util.c   | 6 +++---
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index c6fa3cb..03bd946 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1712,15 +1712,9 @@ int cmd_kvm(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (!perf_host)
 		perf_guest = 1;
 
-	if (!file_name) {
+	if (!file_name)
 		file_name = get_filename_for_perf_kvm();
 
-		if (!file_name) {
-			pr_err("Failed to allocate memory for filename\n");
-			return -ENOMEM;
-		}
-	}
-
 	if (!strncmp(argv[0], "rec", 3))
 		return __cmd_record(file_name, argc, argv);
 	else if (!strncmp(argv[0], "rep", 3))
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 4a57609..e9cb136 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -488,11 +488,11 @@ const char *get_filename_for_perf_kvm(void)
 	const char *filename;
 
 	if (perf_host && !perf_guest)
-		filename = strdup("perf.data.host");
+		filename = "perf.data.host";
 	else if (!perf_host && perf_guest)
-		filename = strdup("perf.data.guest");
+		filename = "perf.data.guest";
 	else
-		filename = strdup("perf.data.kvm");
+		filename = "perf.data.kvm";
 
 	return filename;
 }
-- 
1.8.2.1
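
The ownership point behind this change can be illustrated with a minimal
standalone sketch. It is not the perf source: pick_kvm_filename() and
dup_kvm_filename() are hypothetical names, and the host/guest flags are
passed as parameters here rather than read from the perf_host/perf_guest
globals. Returning a string literal needs no allocation and cannot fail,
but the caller must treat the result as read-only and copy it before
modifying it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for get_filename_for_perf_kvm(). The returned
 * pointer refers to a string literal with static storage duration, so
 * it must not be modified or passed to free().
 */
static const char *pick_kvm_filename(int host, int guest)
{
	if (host && !guest)
		return "perf.data.host";
	if (!host && guest)
		return "perf.data.guest";
	return "perf.data.kvm";
}

/*
 * A caller that needs a private, writable string duplicates it
 * explicitly. Returns NULL on allocation failure; the caller frees
 * the result with free().
 */
static char *dup_kvm_filename(int host, int guest)
{
	const char *name = pick_kvm_filename(host, guest);
	char *copy;

	/* A literal is never NULL, so no error path is needed for it. */
	assert(name);

	copy = malloc(strlen(name) + 1);
	if (copy)
		strcpy(copy, name);
	return copy;
}
```

With this split, only the callers that really need to edit the name pay
for an allocation and have to handle its failure.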


[PATCH 2/2] perf tools: Change the default filenames for perf kvm diff to perf.data.xxx and perf.data.xxx.old

2013-12-10 Thread Dongsheng Yang

The perf kvm diff command currently diffs perf.data.host against
perf.data.guest by default, which is not a good default behavior.

Example:
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules diff
failed to open perf.data.host: No such file or directory
Failed to open perf.data.host

We should keep the style of 'perf kvm diff' the same as that of 'perf diff',
which diffs files of the same type captured at different times, perf.data
and perf.data.old. So perf kvm diff should diff perf.data.guest and
perf.data.guest.old by default.

What's worse, as we have changed the behavior of perf kvm record, we
cannot get a perf.data.host easily; we have to pass --no-guest to get
one. That makes perf.data.host an odd default input for 'perf kvm diff'.

This patch removes the hard-coded default filenames in builtin-diff.c
and generates suitable filenames from the current options of the perf kvm
diff command. This makes the default behavior of perf kvm diff more useful.

Verification:
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules diff
  # Event 'cycles'
  #
  # Baseline    Delta  Shared Object            Symbol
  # ........  .......  .......................  ...................................
  #
      92.00%           [guest.kernel.kallsyms]  [g] rb_insert_color
       7.51%           [guest.kernel.kallsyms]  [g] smp_apic_timer_interrupt
       0.48%   +0.60%  [guest.kernel.kallsyms]  [g] apic_timer_interrupt
              +16.35%  [guest.kernel.kallsyms]  [g] kvm_clock_get_cycles
              +82.56%  [guest.kernel.kallsyms]  [g] irqtime_account_process_tick.isra.2

Signed-off-by: Dongsheng Yang 
---
 tools/perf/builtin-diff.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 2a85cc9..a1d59d8 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -17,6 +17,7 @@
 #include "util/symbol.h"
 #include "util/util.h"
 #include "util/data.h"
+#include "linux/string.h"
 
 #include 
 #include 
@@ -1001,8 +1002,28 @@ static int data_init(int argc, const char **argv)
 			use_default = false;
 		}
 	} else if (perf_guest) {
-		defaults[0] = "perf.data.host";
-		defaults[1] = "perf.data.guest";
+		char *file_name;
+		int len, ret;
+
+		file_name = strdup(get_filename_for_perf_kvm());
+		if (!file_name) {
+			pr_err("Failed to allocate memory for filename\n");
+			return -ENOMEM;
+		}
+
+		defaults[0] = strdup(file_name);
+		if (!defaults[0]) {
+			pr_err("Failed to allocate memory for defaults[0]\n");
+			return -ENOMEM;
+		}
+
+		len = strlen(file_name);
+		ret = str_append(&file_name, &len, ".old");
+		if (ret) {
+			pr_err("Failed to allocate memory for defaults[1]\n");
+			return -ENOMEM;
+		}
+		defaults[1] = file_name;
 	}
 
 	if (sort_compute >= (unsigned int) data__files_cnt) {
-- 
1.8.2.1


Re: [PATCH] mm,x86: fix span coverage in e820_all_mapped()

2013-12-10 Thread Xishi Qiu

On 2013/12/11 12:02, H. Peter Anvin wrote:

> On 12/10/2013 07:55 PM, Xishi Qiu wrote:
>>
>> I think there is a problem.
>> e.g.
>> [start, end)=[8, 12), and [A, B)=[0, 10), [B, C)=[10,20),
>> then e820_all_mapped() will return 1, it spans two regions.
>>
> 
> Why is that a problem?
> 

Shouldn't [start, end) be contained within a single region?

Thanks,
Xishi Qiu

>   -hpa
> 
> 
> 
> 




[PATCH] perf tool: Fix bug in thread__fork

2013-12-10 Thread David Ahern

Commit 1902efe7f for the new comm infra added the wrong check for return
code on thread__set_comm. err == 0 is normal, so don't return at that
point unless err != 0.

Signed-off-by: David Ahern 
Cc: Frederic Weisbecker 
---
 tools/perf/util/thread.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 49eaf1d7d89d..e3948612543e 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -126,7 +126,7 @@ int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp)
 		if (!comm)
 			return -ENOMEM;
 		err = thread__set_comm(thread, comm, timestamp);
-		if (!err)
+		if (err)
 			return err;
 		thread->comm_set = true;
 	}
-- 
1.8.3.4 (Apple Git-47)
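
The return-code convention behind this fix can be shown in a small
standalone sketch. The names struct thread_sketch, set_comm_sketch() and
fork_sketch() are hypothetical and only mirror the shape of the code in
the diff; they are not the perf implementation. The assumption is the
usual one: 0 means success and a negative errno value means failure, so
the early return has to test "err", not "!err".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a thread object; not perf's struct thread. */
struct thread_sketch {
	const char *comm;
	bool comm_set;
};

/* Returns 0 on success or a negative errno value on failure. */
static int set_comm_sketch(struct thread_sketch *t, const char *comm)
{
	assert(t);
	if (!comm)
		return -EINVAL;
	t->comm = comm;
	return 0;
}

/*
 * Copy the parent's comm to the child. The early return must trigger
 * on a non-zero error code; testing "!err" instead would bail out on
 * success and leave comm_set false.
 */
static int fork_sketch(struct thread_sketch *child,
		       const struct thread_sketch *parent)
{
	assert(child && parent);

	if (parent->comm_set) {
		int err = set_comm_sketch(child, parent->comm);

		if (err)
			return err;
		child->comm_set = true;
	}
	return 0;
}
```

With the inverted test, a successful set_comm_sketch() call would return
0 from fork_sketch() before comm_set is updated, which matches the
symptom described in the commit message.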


Re: [PATCH 02/11] arm: pxa27x: support ICP DAS LP-8x4x

2013-12-10 Thread Sergei Ianovich

On Tue, 2013-12-10 at 22:57 +0100, Arnd Bergmann wrote:
> It should be possible to make it a loadable module, with deferred
> probing etc. You wouldn't use IRQCHIP_DECLARE() for this though,
> but instead have a platform driver that sets up the irq domain.

Thanks for explaining.

> It probably makes sense to have a single driver file for the
> FPGA device that does this, and only split out the other devices
> from it that consume the irqs.

Is drivers/irqchip/ the right place for this driver?

I am asking because there are no tristate config options in
drivers/irqchip/Kconfig at the moment.



Re: XFS security fix never sent to -stable?

2013-12-10 Thread Dave Chinner

On Tue, Dec 10, 2013 at 06:45:54PM -0800, Kees Cook wrote:
> On Tue, Dec 10, 2013 at 6:00 PM, Dave Chinner  wrote:
> > On Tue, Dec 10, 2013 at 08:10:51PM -0500, Josh Boyer wrote:
> >> On Tue, Dec 10, 2013 at 8:03 PM, Dave Chinner  wrote:
> >> > Security processes are not something that should be hidden away in
> >> > it's own private corner - if there's a problem upstream needs to
> >> > take action on, then direct contact with upstream is necessary. We
> >> > need to know about security issues - even ones that are classified
> >> > post-commit as security issues - so we are operating with full
> >> > knowledge of the issues in our code and the impact of our fixes
> >>
> >> Agreed.  I'm going to interpret your comments at being directed to the
> >> general audience because otherwise you're just shooting the messenger
> >> :).
> >
> > Right, they are not aimed at you - they are aimed at those on the
> > security side of the fence. I'm tired of learning about CVEs in XFS
> > code through chinese whispers and/or luck.
> 
> Mostly I try to shield anyone not interested in CVEs from the boring
> process, and try to focus on just getting things marked as needing to
> go into stable. I don't think anyone needs to read the oss-security
> list if they don't want to.

Which is how it should be. ;) All I want is some kind of
notification when a CVE is raised for an XFS issue. It may be telling
us something we already know, but if:

a) it has not yet been pushed upstream; or
b) it was not marked for stable kernels at commit time; or
c) we don't have a fix for it yet

then it's an indication that we need to pay a little more attention
to this class of problem when we review similar fixes.

> In this case, the fix Dan sent was part of a larger collection of
> security issues reported by Nico. I think the communication error here
> was Dan accidentally forgetting to add the Cc: stable tag. But beyond
> that, it was sent to the xfs list and Cc: to security, so I'm not sure
> it's fair to say it was hidden away. :)

Right - this falls into the above category a) because of that. There
didn't appear to be any urgency because of the level of exposure of
the problem (i.e. need CAP_SYS_ADMIN to trip over it) and the fact
it's been like this for the past 10 years

> Besides the missing Cc: stable tag, what should future patch senders
> do to call attention to an issue being a security problem at the time
> it is being reported?

Well, it may not be known at the time that it's considered a security
issue, so I think that the best thing to do is make sure that when
a CVE is actually raised, a note is sent to the relevant list just to
indicate 'CVE 1024-3267 has been raised for commit abcd1234
("xfs: knabgraddle the frobnozzle")'.

At least that way everyone - including XFS users - knows that there is an
issue that they might want to look out for and plan to upgrade their
stable kernels in the not-too-distant future...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones

On Tue, Dec 10, 2013 at 02:48:52PM -0800, Linus Torvalds wrote:
 
 > Dave, can you re-create that trinity run and test that patch?

Looks ok so far, but I'll leave it run overnight to be sure.

Dave


Re: [PATCH] mm,x86: fix span coverage in e820_all_mapped()

2013-12-10 Thread H. Peter Anvin

On 12/10/2013 07:55 PM, Xishi Qiu wrote:
> 
> I think there is a problem.
> e.g.
> [start, end)=[8, 12), and [A, B)=[0, 10), [B, C)=[10,20),
> then e820_all_mapped() will return 1, it spans two regions.
> 

Why is that a problem?

-hpa



Re: [PATCH] mm,x86: fix span coverage in e820_all_mapped()

2013-12-10 Thread Xishi Qiu

On 2013/12/11 10:55, H. Peter Anvin wrote:

> On 12/10/2013 05:35 PM, Xishi Qiu wrote:
>>
>> In this case, old code is right, but I discuss in another one that
>> you wrote above.
>>
> 
> So is there a problem or not?  I have lost track...
> 

I think there is a problem.
e.g.
[start, end)=[8, 12), and [A, B)=[0, 10), [B, C)=[10,20),
then e820_all_mapped() will return 1, it spans two regions.
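
To make the example concrete, here is a small userspace sketch of a
coverage check of the kind being discussed. It is only loosely modeled
on the behavior described in this thread and is not the kernel source:
struct region and all_mapped() are hypothetical names, there is no type
filtering, and the entries are assumed to be sorted by address. Each
entry that covers the current start advances start to the end of that
entry, so a range that spans two adjacent entries is reported as fully
mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified map entry: the range [addr, addr + size). */
struct region {
	uint64_t addr;
	uint64_t size;
};

/*
 * Return 1 if every address in [start, end) lies in some entry of
 * map[], else 0. Entries are assumed to be sorted by address.
 */
static int all_mapped(const struct region *map, size_t n,
		      uint64_t start, uint64_t end)
{
	size_t i;

	assert(map || n == 0);

	for (i = 0; i < n; i++) {
		uint64_t r_end = map[i].addr + map[i].size;

		/* No overlap with what is left of [start, end). */
		if (map[i].addr >= end || r_end <= start)
			continue;
		/* Entry covers the current start: advance past it. */
		if (map[i].addr <= start)
			start = r_end;
		/* Nothing left to cover. */
		if (start >= end)
			return 1;
	}
	return 0;
}
```

With the entries [0, 10) and [10, 20), a query for [8, 12) first moves
start to 10 and then to 20, so the sketch returns 1. If the second entry
started at 11 instead, the hole at address 10 would make it return 0.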

Thanks,
Xishi Qiu

>   -hpa
> 
> 
> 
> 




Re: [PATCH v2 0/5] net: macb updates

2013-12-10 Thread David Miller

From: Soren Brinkmann 
Date: Tue, 10 Dec 2013 16:07:18 -0800

> I'd really like to have Ethernet working for Zynq, so I want to at least
> revive this discussion regarding this patchset. And the first four
> patches should not even be too controversial.
> I didn't change anything compared to my original RFC submission, except
> for a typo in one of the commit messages.
> Handling the tx_clk as optional clock input seems a little bit weird,
> but it works on my Zynq platform and should be compatible with other
> users of macb and their DT descriptions.

Series applied to net-next, thanks.

Re: [PATCH V3] perf tools: Change the default filenames for perf kvm diff to perf.data.xxx and perf.data.xxx.old

2013-12-10 Thread Dongsheng Yang


On 12/10/2013 10:38 PM, David Ahern wrote:
> On 12/11/13, 9:30 AM, Dongsheng Yang wrote:
>> @@ -1001,8 +1002,28 @@ static int data_init(int argc, const char **argv)
>>  			use_default = false;
>>  		}
>>  	} else if (perf_guest) {
>> -		defaults[0] = "perf.data.host";
>> -		defaults[1] = "perf.data.guest";
>> +		char *file_name;
>> +		int len, ret;
>> +
>> +		file_name = (char *)get_filename_for_perf_kvm();
>> +		if (!file_name) {
>> +			pr_err("Failed to allocate memory for filename\n");
>> +			return -ENOMEM;
>> +		}
>> +
>
> The need for a typecast should tell you something is wrong. Why is
> get_filename_for_perf_kvm returning a const char * when it is
> allocated memory?

Yes, I think something is wrong. It returns const char * because I
assumed the file_name would never be changed. But now this assumption
seems outdated.

I will add a new patch to change the return value to char *, as the
other two patches are already applied.

http://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit/?h=perf/core&id=e1a2b174dbbe08dce12bde9f05f64dbbae652bed

Thanx

Yang

> David



Re: [PATCH V3] perf tools: Change the default filenames for perf kvm diff to perf.data.xxx and perf.data.xxx.old

2013-12-10 Thread David Ahern


On 12/11/13, 9:30 AM, Dongsheng Yang wrote:
> @@ -1001,8 +1002,28 @@ static int data_init(int argc, const char **argv)
>  			use_default = false;
>  		}
>  	} else if (perf_guest) {
> -		defaults[0] = "perf.data.host";
> -		defaults[1] = "perf.data.guest";
> +		char *file_name;
> +		int len, ret;
> +
> +		file_name = (char *)get_filename_for_perf_kvm();
> +		if (!file_name) {
> +			pr_err("Failed to allocate memory for filename\n");
> +			return -ENOMEM;
> +		}
> +


The need for a typecast should tell you something is wrong. Why is 
get_filename_for_perf_kvm returning a const char * when it is allocated 
memory?
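
One way to avoid the cast altogether is sketched below, under stated
assumptions: make_default_names() is a hypothetical helper, not part of
perf, and it reports failure with -1 rather than an errno value. It only
reads the const input, so the input may be a string literal, and it
builds both the base name and the ".old" name as separate heap strings
that the caller owns.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build heap copies of "base" and "base.old" in out[0] and out[1].
 * The input is only read, so it may safely be a string literal.
 * Returns 0 on success, or -1 on allocation failure, in which case
 * both outputs are NULL and nothing is leaked. The caller releases
 * both results with free().
 */
static int make_default_names(const char *base, char *out[2])
{
	static const char suffix[] = ".old";
	size_t len;

	assert(base && out);
	len = strlen(base);

	out[0] = malloc(len + 1);
	out[1] = malloc(len + sizeof(suffix));	/* sizeof counts the NUL */
	if (!out[0] || !out[1]) {
		free(out[0]);
		free(out[1]);
		out[0] = out[1] = NULL;
		return -1;
	}

	memcpy(out[0], base, len + 1);
	snprintf(out[1], len + sizeof(suffix), "%s%s", base, suffix);
	return 0;
}
```

Because nothing here writes through the input pointer, the producer of
the name can keep returning const char * and no typecast is needed at
the call site.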


David

Re: [RFC PATCH tip 0/5] tracing filters with BPF

2013-12-10 Thread Masami Hiramatsu

(2013/12/11 11:32), Alexei Starovoitov wrote:
> On Tue, Dec 10, 2013 at 7:47 AM, Ingo Molnar  wrote:
>>
>> * Alexei Starovoitov  wrote:
>>
>>>> I'm fine if it becomes a requirement to have a vmlinux built with
>>>> DEBUG_INFO to use BPF and have a tool like perf to translate the
>>>> filters. But it that must not replace what the current filters do
>>>> now. That is, it can be an add on, but not a replacement.
>>>
>>> Of course. tracing filters via bpf is an additional tool for kernel
>>> debugging. bpf by itself has use cases beyond tracing.
>>
>> Well, Steve has a point: forcing DEBUG_INFO is a big showstopper for
>> most people.
> 
> there is a misunderstanding here.
> I was saying 'of course' to 'not replace current filter infra'.
> 
> bpf does not depend on debug info.
> That's the key difference between 'perf probe' approach and bpf filters.
> 
> Masami is right that what I was trying to achieve with bpf filters
> is similar to 'perf probe': insert a dynamic probe anywhere
> in the kernel, walk pointers, data structures, print interesting stuff.
> 
> 'perf probe' does it via scanning vmlinux with debug info.
> bpf filters don't need it.
> tools/bpf/trace/*_orig.c examples only depend on linux headers
> in /lib/modules/../build/include/
> Today bpf compiler struct layout is the same as x86_64.
> 
> Tomorrow bpf compiler will have flags to adjust endianness, pointer size, etc
> of the front-end. Similar to -m32/-m64 and -m*-endian flags.
> Neat part is that I don't need to do any work, just enable it properly in
> the bpf backend. From gcc/llvm point of view, bpf is yet another 'hw'
> architecture that compiler is emitting code for.
> So when C code of filter_ex1_orig.c does 'skb->dev', compiler determines
> field offset by looking at /lib/modules/.../include/skbuff.h
> whereas for 'perf probe' 'skb->dev' means walk debug info.

Right, the field offsets within the data structure can be obtained from the headers.

However, how would the bpf get the register or stack assignment of
skb itself? In the tracepoint macro, it will be able to get it from
function parameters (it needs a trick, like jprobe does).
I doubt you can do that on kprobes/uprobes without any debuginfo
support. :(
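
The distinction can be illustrated with a short userspace sketch. The
structures below are hypothetical stand-ins, not kernel definitions, and
load_dev() is an invented helper. It shows the part that headers alone
can answer: once the address of the object is known, the compiler turns
a field access into a fixed offset taken from the struct definition.
What it does not show is how that address is found at an arbitrary probe
point, which is the open question above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel structures; layouts are illustrative. */
struct device_sketch {
	char name[16];
};

struct packet_sketch {
	unsigned int len;
	struct device_sketch *dev;
};

/*
 * Read the "dev" field from a raw object address using only the offset
 * that the compiler derived from the struct definition. No debug info
 * is involved; the offset is a compile-time constant.
 */
static struct device_sketch *load_dev(const void *pkt)
{
	struct device_sketch *dev;

	assert(pkt);
	memcpy(&dev, (const char *)pkt + offsetof(struct packet_sketch, dev),
	       sizeof(dev));
	return dev;
}
```

The caller still has to supply pkt. At a tracepoint that value arrives
as a function parameter; at an arbitrary instruction it may live in a
register or a stack slot that only debug info describes.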

And is it possible to trace a field in a data structure which is
defined locally in somewhere.c ? :) (maybe it's just a corner case)

> Something like: cc1 -mlayout_x86_64 filter.c will produce bpf code that
> walks all data structures in the same way x86_64 does it.
> Even if the user makes a mistake and uses -mlayout_aarch64, it won't crash.
> Note that all -m* flags will be in one compiler. It won't grow any bigger
> because of that. All of it already supported by C front-ends.
> It may sound complex, but really very little code for the bpf backend.
> 
> I didn't look inside systemtap/ktap enough to say how much they're
> relying on presence of debug info to make a comparison.
> 
> I see two main use cases for bpf tracing filters: debugging live kernel
> and collecting stats. Same tricks that [sk]tap do with their maps.
> Or may be some of the stats that 'perf record' collects in userspace
> can be collected by bpf filter in kernel and stored into generic bpf table?
> 
>> Would it be possible to make BFP filters recognize exposed details
>> like the current filters do, without depending on the vmlinux?
> 
> Well, if you say that presence of linux headers is also too much to ask,
> I can hook bpf after probes stored all the args.
> 
> This way current simple filter syntax can move to userspace.
> 'arg1==x || arg2!=y' can be parsed by userspace, bpf code
> generated and fed into kernel. It will be faster than walk_pred_tree(),
> but if we cannot remove 2k lines from trace_events_filter.c
> because of backward compatibility, extra performance becomes
> the only reason to have two different implementations.
> 
> Another use case is to optimize fetch sequences of dynamic probes
> as Masami suggested, but backward compatibility requirement
> would preserve to ways of doing it as well.

The backward compatibility issue is only for the interface, but not
for the implementation, I think. :) The fetch method and filter
pred do already parse the argument into a syntax tree. IMHO, bpf
can optimize that tree to just a simple opcode stream.
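
As a rough illustration of flattening a predicate tree into an opcode
stream, the sketch below evaluates a postfix program on a tiny stack
machine. Everything in it is hypothetical: the opcodes, struct insn and
run_filter() are invented for this example and are neither BPF
instructions nor the kernel's filter code. A filter such as
'arg0 == 5 || arg1 != 7' becomes three instructions: two comparisons
followed by OP_OR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine; not BPF instructions. */
enum op { OP_ARG_EQ, OP_ARG_NE, OP_OR, OP_AND };

struct insn {
	enum op op;
	int arg;	/* argument index, used by the comparison opcodes */
	long val;	/* constant to compare against */
};

/*
 * Evaluate a postfix opcode stream against args[]. Comparisons push
 * 0 or 1; OP_OR and OP_AND pop two values and push the result.
 * Returns the final truth value. The program is assumed to be well
 * formed and to need at most 16 stack slots.
 */
static int run_filter(const struct insn *prog, size_t n, const long *args)
{
	int stack[16];
	size_t sp = 0, i;

	assert(prog && args);

	for (i = 0; i < n; i++) {
		const struct insn *in = &prog[i];

		switch (in->op) {
		case OP_ARG_EQ:
			assert(sp < 16);
			stack[sp++] = args[in->arg] == in->val;
			break;
		case OP_ARG_NE:
			assert(sp < 16);
			stack[sp++] = args[in->arg] != in->val;
			break;
		case OP_OR:
			assert(sp >= 2);
			sp--;
			stack[sp - 1] = stack[sp - 1] || stack[sp];
			break;
		case OP_AND:
			assert(sp >= 2);
			sp--;
			stack[sp - 1] = stack[sp - 1] && stack[sp];
			break;
		}
	}
	assert(sp == 1);
	return stack[0];
}
```

The user-visible filter syntax stays the same in such a scheme; only the
evaluation changes from walking a tree to running a linear instruction
sequence.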

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com



[PATCH V3] perf tools: Change the default filenames for perf kvm diff to perf.data.xxx and perf.data.xxx.old

2013-12-10 Thread Dongsheng Yang

The perf kvm diff command currently diffs perf.data.host against
perf.data.guest by default, which is not a good default behavior.

Example:
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules diff
failed to open perf.data.host: No such file or directory
Failed to open perf.data.host

More often, we use it to diff perf data files of the same type that
were captured at different times, such as perf.data.guest and
perf.data.guest.old.

This patch removes the hard-coded default filenames in builtin-diff.c
and generates suitable filenames from the current options of the perf kvm
diff command. This makes the default behavior of perf kvm diff more useful.

Verification:
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules diff
  # Event 'cycles'
  #
  # Baseline    Delta  Shared Object            Symbol
  # ........  .......  .......................  ...................................
  #
      92.00%           [guest.kernel.kallsyms]  [g] rb_insert_color
       7.51%           [guest.kernel.kallsyms]  [g] smp_apic_timer_interrupt
       0.48%   +0.60%  [guest.kernel.kallsyms]  [g] apic_timer_interrupt
              +16.35%  [guest.kernel.kallsyms]  [g] kvm_clock_get_cycles
              +82.56%  [guest.kernel.kallsyms]  [g] irqtime_account_process_tick.isra.2

Signed-off-by: Dongsheng Yang 
---
Changelog:
changes for v3:
*fix a copy-paste bug.
changes for v2:
*Add more commit message.

 tools/perf/builtin-diff.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 2a85cc9..8d1b666 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -17,6 +17,7 @@
 #include "util/symbol.h"
 #include "util/util.h"
 #include "util/data.h"
+#include "linux/string.h"
 
 #include 
 #include 
@@ -1001,8 +1002,28 @@ static int data_init(int argc, const char **argv)
 			use_default = false;
 		}
 	} else if (perf_guest) {
-		defaults[0] = "perf.data.host";
-		defaults[1] = "perf.data.guest";
+		char *file_name;
+		int len, ret;
+
+		file_name = (char *)get_filename_for_perf_kvm();
+		if (!file_name) {
+			pr_err("Failed to allocate memory for filename\n");
+			return -ENOMEM;
+		}
+
+		defaults[0] = strdup(file_name);
+		if (!defaults[0]) {
+			pr_err("Failed to allocate memory for defaults[0]\n");
+			return -ENOMEM;
+		}
+
+		len = strlen(file_name);
+		ret = str_append(&file_name, &len, ".old");
+		if (ret) {
+			pr_err("Failed to allocate memory for defaults[1]\n");
+			return -ENOMEM;
+		}
+		defaults[1] = file_name;
 	}
 
 	if (sort_compute >= (unsigned int) data__files_cnt) {
-- 
1.8.2.1


Re: [PATCH net V2 2/2] macvtap: signal truncated packets

2013-12-10 Thread Jason Wang

On 12/10/2013 11:29 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 10, 2013 at 01:49:46PM +0800, Jason Wang wrote:
>> > macvtap_put_user() never return a value grater than iov length, this in 
>> > fact
>> > bypasses the truncated checking in macvtap_recvmsg(). Fix this by always
>> > returning the size of packet plus the possible vlan header to let the 
>> > trunca
>> > checking work.
>> > 
>> > Cc: Vlad Yasevich 
>> > Cc: Zhi Yong Wu 
>> > Cc: Michael S. Tsirkin 
>> > Signed-off-by: Jason Wang 
>> > ---
>> > Changes from v1:
>> > - increase total unconditionally
>> > - do not move the structure veth out of the vlan handling block
>> > ---
>> >  drivers/net/macvtap.c | 19 ++-
>> >  1 file changed, 10 insertions(+), 9 deletions(-)
>> > 
>> > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> > index 957cc5c..ded4b2c 100644
>> > --- a/drivers/net/macvtap.c
>> > +++ b/drivers/net/macvtap.c
>> > @@ -770,7 +770,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue 
>> > *q,
>> >int ret;
>> >int vnet_hdr_len = 0;
>> >int vlan_offset = 0;
>> > -  int copied;
>> > +  int copied, offset;
>> >  
>> >if (q->flags & IFF_VNET_HDR) {
>> >struct virtio_net_hdr vnet_hdr;
>> > @@ -785,7 +785,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue 
>> > *q,
>> >if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
>> >return -EFAULT;
>> >}
>> > -  copied = vnet_hdr_len;
>> > +  offset = copied = vnet_hdr_len;
>> > +  copied += skb->len;
>> >  
>> >if (!vlan_tx_tag_present(skb))
>> >len = min_t(int, skb->len, len);
>> > @@ -800,24 +801,24 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
>> >  
>> >vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
>> >len = min_t(int, skb->len + VLAN_HLEN, len);
>> > +  copied += VLAN_HLEN;
>> >  
>> >copy = min_t(int, vlan_offset, len);
>> > -  ret = skb_copy_datagram_const_iovec(skb, 0, iv, copied, copy);
>> > +  ret = skb_copy_datagram_const_iovec(skb, 0, iv, offset, copy);
>> >len -= copy;
>> > -  copied += copy;
>> > +  offset += copy;
>> >if (ret || !len)
>> >goto done;
>> >  
>> >copy = min_t(int, sizeof(veth), len);
>> > -  ret = memcpy_toiovecend(iv, (void *)&veth, copied, copy);
>> > +  ret = memcpy_toiovecend(iv, (void *)&veth, offset, copy);
>> >len -= copy;
>> > -  copied += copy;
>> > +  offset += copy;
>> >if (ret || !len)
>> >goto done;
>> >}
>> >  
>> > -  ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, copied, len);
>> > -  copied += len;
>> > +  ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, offset, len);
>> >  
>> >  done:
>> >return ret ? ret : copied;
> I commented on this already. copied is how much we copied,
> so its value is correct.
> You want to name the new one total_len or something along
> these lines and return it.
>

OK, to be consistent with tun, I will use "total" in V3.

Thanks
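
The naming point above (keep "copied" for the bytes actually written and return a separate "total") can be illustrated with a small userspace sketch. This is not the macvtap code: put_frame() and its parameters are hypothetical stand-ins, and plain buffers replace the skb and iovec.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VLAN_HLEN   4   /* size of an 802.1Q tag */
#define VLAN_OFFSET 12  /* the tag follows the two 6-byte MAC addresses */

/*
 * Hypothetical stand-in for macvtap_put_user(): write a frame, optionally
 * with a VLAN tag inserted, into a buffer of buf_len bytes.  "copied" is
 * how much was actually written; "total" is the full frame length and is
 * returned even when the buffer is too small, so the caller can notice
 * the truncation.
 */
static long put_frame(unsigned char *buf, size_t buf_len,
                      const unsigned char *frame, size_t frame_len,
                      const unsigned char *vlan_tag, /* NULL if untagged */
                      size_t *copied_out)
{
    size_t total = frame_len + (vlan_tag ? VLAN_HLEN : 0);
    size_t len = total < buf_len ? total : buf_len; /* bytes we may write */
    size_t copied = 0, copy, src = 0;

    assert(buf && frame);

    if (vlan_tag) {
        assert(frame_len >= VLAN_OFFSET);

        /* MAC addresses that precede the tag */
        copy = len < VLAN_OFFSET ? len : VLAN_OFFSET;
        memcpy(buf + copied, frame, copy);
        copied += copy;
        len -= copy;
        src = copy;

        /* the tag itself */
        copy = len < VLAN_HLEN ? len : VLAN_HLEN;
        memcpy(buf + copied, vlan_tag, copy);
        copied += copy;
        len -= copy;
    }

    /* whatever still fits of the rest of the frame */
    memcpy(buf + copied, frame + src, len);
    copied += len;

    if (copied_out)
        *copied_out = copied;
    return (long)total;
}
```

With a 20-byte tagged frame and a 16-byte buffer, this returns 24 while reporting 16 bytes copied; the difference is what a truncation check would look at.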

Re: [PATCH net V2 1/2] tun: unbreak truncated packet signalling

2013-12-10 Thread Jason Wang

On 12/10/2013 11:32 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 10, 2013 at 01:49:45PM +0800, Jason Wang wrote:
>> > Commit 6680ec68eff47d36f67b4351bc9836fd6cba9532
>> > (tuntap: hardware vlan tx support) breaks the truncated packet signal by
>> > never returning a length greater than the iov length in tun_put_user(). This
>> > patch fixes this by always returning the length of the packet plus the
>> > possible vlan header. The caller can detect a truncated packet by comparing
>> > the return value with the iov length.
>> > 
>> > Cc: Zhi Yong Wu 
>> > Cc: Michael S. Tsirkin 
>> > Signed-off-by: Vlad Yasevich 
>> > Signed-off-by: Jason Wang 
>> > ---
>> > Changes from v1:
>> > - increase total unconditionally
>> > - do not move veth structure out of the vlan handing block
>> > ---
>> >  drivers/net/tun.c | 16 +---
>> >  1 file changed, 9 insertions(+), 7 deletions(-)
>> > 
>> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> > index e26cbea..cd142134 100644
>> > --- a/drivers/net/tun.c
>> > +++ b/drivers/net/tun.c
>> > @@ -1184,7 +1184,7 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>> >  {
>> >struct tun_pi pi = { 0, skb->protocol };
>> >ssize_t total = 0;
>> > -  int vlan_offset = 0;
>> > +  int vlan_offset = 0, offset;
>> >  
>> >if (!(tun->flags & TUN_NO_PI)) {
>> >if ((len -= sizeof(pi)) < 0)
>> > @@ -1248,6 +1248,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>> >total += tun->vnet_hdr_sz;
>> >}
>> >  
>> > +  offset = total;
>> > +  total += skb->len;
>> >if (!vlan_tx_tag_present(skb)) {
>> >len = min_t(int, skb->len, len);
>> >} else {
>> > @@ -1262,24 +1264,24 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>> >  
>> >vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
>> >len = min_t(int, skb->len + VLAN_HLEN, len);
>> > +  total += VLAN_HLEN;
>> >  
>> >copy = min_t(int, vlan_offset, len);
>> > -  ret = skb_copy_datagram_const_iovec(skb, 0, iv, total, copy);
>> > +  ret = skb_copy_datagram_const_iovec(skb, 0, iv, offset, copy);
>> >len -= copy;
>> > -  total += copy;
>> > +  offset += copy;
>> >if (ret || !len)
>> >goto done;
>> >  
>> >copy = min_t(int, sizeof(veth), len);
>> > -  ret = memcpy_toiovecend(iv, (void *)&veth, total, copy);
>> > +  ret = memcpy_toiovecend(iv, (void *)&veth, offset, copy);
>> >len -= copy;
>> > -  total += copy;
>> > +  offset += copy;
>> >if (ret || !len)
>> >goto done;
>> >}
>> >  
>> > -  skb_copy_datagram_const_iovec(skb, vlan_offset, iv, total, len);
>> > -  total += len;
>> > +  skb_copy_datagram_const_iovec(skb, vlan_offset, iv, offset, len);
>> >  
>> >  done:
>> >tun->dev->stats.tx_packets++;
> offset is not descriptive. I would call new variable "copied",
> and change all code to use that, do
> total = copied + skb->len, total is then the total length.
>

Ok, will do it in v3.
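
For reference, the caller-side check described in the commit message (compare the returned length with the iov length) might look roughly like the sketch below. It is a userspace illustration only: put_packet() and recv_packet() are hypothetical names, and the reporting policy is an assumption that mirrors the MSG_TRUNC semantics documented in recv(2).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h> /* MSG_TRUNC */

/* Hypothetical stand-in for tun_put_user(): copy what fits, return the
 * full packet length regardless of the buffer size. */
static long put_packet(void *buf, size_t buf_len,
                       const void *pkt, size_t pkt_len)
{
    memcpy(buf, pkt, pkt_len < buf_len ? pkt_len : buf_len);
    return (long)pkt_len;
}

/* Caller side: a return value larger than the buffer means truncation. */
static long recv_packet(void *buf, size_t buf_len,
                        const void *pkt, size_t pkt_len,
                        int flags, int *msg_flags)
{
    long ret;

    assert(buf && pkt && msg_flags);

    ret = put_packet(buf, buf_len, pkt, pkt_len);
    if (ret > (long)buf_len) {
        *msg_flags |= MSG_TRUNC;
        /* report the real length only if the caller asked for it */
        ret = (flags & MSG_TRUNC) ? ret : (long)buf_len;
    }
    return ret;
}
```

If put_packet() capped its return value at buf_len, the comparison could never be true and MSG_TRUNC would never be set, which is the breakage the patch describes.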

Re: [PATCH net V2 1/2] tun: unbreak truncated packet signalling

2013-12-10 Thread Jason Wang

On 12/11/2013 11:11 AM, David Miller wrote:
> You and Michael are still discussing these changes it seems.
>
> I accidentally committed the first version of these patches, but then
> immediately reverted that after I saw the followups.
>
> Let me know when you have something both of you are happy with.
>
> Thanks.

Sure, I will post v3 which should address all comments from Michael.

Thanks

Re: [PATCH v3] net: handle error more gracefully in socketpair()

2013-12-10 Thread David Miller

From: Yann Droneaud 
Date: Mon,  9 Dec 2013 22:42:20 +0100

> This patch makes socketpair() use error paths which do not
> rely on a heavy-weight call to sys_close(): it's better to try
> to push the file descriptor to userspace before installing
> the socket file to the file descriptor, so that errors are
> caught earlier and are easier to handle.
> 
> Using sys_close() seems to be the exception, while writing the
> file descriptor before installing it looks like it's more or less
> the norm: e.g. except for code used in init/, error handling
> involves fput() and put_unused_fd(), but not sys_close().
> 
> This makes socketpair()'s usage of sys_close() quite unusual.
> So it deserves to be replaced by the common pattern relying on
> fput() and put_unused_fd() just like, for example, the one used
> in pipe(2) or recvmsg(2).
> 
> Three distinct error paths are still needed since calling
> fput() on file structure returned by sock_alloc_file() will
> implicitly call sock_release() on the associated socket
> structure.
> 
> Cc: David S. Miller 
> Cc: Al Viro 
> Signed-off-by: Yann Droneaud 
> Link: 
> http://marc.info/?i=1385979146-13825-1-git-send-email-ydrone...@opteya.com

Applied to net-next, thanks.
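
The ordering described in the commit message (reserve the descriptor numbers, report them to userspace, and only then install the files) can be modelled in userspace roughly as follows. Everything here is a simplified, hypothetical model: the fd table, struct file and the helpers only imitate the roles of get_unused_fd_flags(), put_user(), fd_install(), put_unused_fd() and fput(), and the three error paths of the real patch are collapsed into one unwind chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FD 8

struct file { int refs; };

struct fd_table {
    bool reserved[MAX_FD];          /* number handed out, maybe not visible */
    struct file *installed[MAX_FD]; /* visible to the process once set */
};

/* Reserve a number without publishing a file (role of get_unused_fd_flags()). */
static int reserve_fd(struct fd_table *t)
{
    for (int fd = 0; fd < MAX_FD; fd++) {
        if (!t->reserved[fd]) {
            t->reserved[fd] = true;
            return fd;
        }
    }
    return -1;
}

/* Give back a reserved, never-installed number (role of put_unused_fd()). */
static void unreserve_fd(struct fd_table *t, int fd) { t->reserved[fd] = false; }

/* Drop a file reference (role of fput()). */
static void drop_file(struct file *f) { f->refs--; }

/* Publish the file under its number (role of fd_install()); cannot fail. */
static void install_fd(struct fd_table *t, int fd, struct file *f) { t->installed[fd] = f; }

/* Stand-in for put_user(): fails when the destination is unusable. */
static int report_fds(int *dst, int fd1, int fd2)
{
    if (!dst)
        return -1;
    dst[0] = fd1;
    dst[1] = fd2;
    return 0;
}

static int make_pair(struct fd_table *t, struct file *f1, struct file *f2,
                     int *user_vec)
{
    int fd1, fd2;

    assert(t && f1 && f2);

    fd1 = reserve_fd(t);
    if (fd1 < 0)
        goto out_files;
    fd2 = reserve_fd(t);
    if (fd2 < 0)
        goto out_fd1;

    /* Report first: nothing is visible yet, so failure is cheap to undo. */
    if (report_fds(user_vec, fd1, fd2))
        goto out_fd2;

    install_fd(t, fd1, f1);
    install_fd(t, fd2, f2);
    return 0;

out_fd2:
    unreserve_fd(t, fd2);
out_fd1:
    unreserve_fd(t, fd1);
out_files:
    drop_file(f1);
    drop_file(f2);
    return -1;
}
```

Because the files are installed last, no failure path ever has to close a descriptor that the process could already see.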

Re: [PATCH 2/2] tun: remove useless codes in tun_chr_aio_read() and tun_recvmsg()

2013-12-10 Thread Zhi Yong Wu

On Wed, Dec 11, 2013 at 11:19 AM, David Miller  wrote:
> From: Zhi Yong Wu 
> Date: Wed, 11 Dec 2013 11:14:04 +0800
>
>> Only one reminder, since David has committed the two patches, you
>> maybe need to take their impact on your patches into account.
>
> I reverted these changes from net-next.
So rapid:), thanks for your reminder.



-- 
Regards,

Zhi Yong Wu

Re: [PATCH] net-tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0

2013-12-10 Thread David Miller

From: Michael Chan 
Date: Tue, 10 Dec 2013 10:49:39 -0800

> On Tue, 2013-12-10 at 13:43 -0500, David Miller wrote:
> 
>> What if the kernel is booted via kexec, and the driver in the kernel
>> we are kexec'ing from left indirect access enabled in MISC_HOST_CTRL?
> 
> That should be ok.  The driver will only use valid register offsets in
> indirect mode during run time, so register 0x78 should point to a valid
> register.  If indirect mode is never used by the driver, it will point
> to zero with this patch.  So register 0x78 will be valid (or zero) in
> the kexec'ed kernel.

Ok, that may be true, but I'd like to consider the much larger issue
at hand.

If the indirect mechanism is enabled, some of the offsets that may be
in there might be valid, but it would be entirely undesirable to read
them because the read has side effects.

What if the interrupt status register is what gets read if the user
scans config space at just the wrong moment, and therefore an
interrupt gets lost?
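
To make the concern concrete, here is a toy model of a read with side effects. It is purely hypothetical: struct fake_nic, its two registers and the read-to-clear behaviour are assumptions for illustration and do not describe the tg3 hardware.

```c
#include <assert.h>
#include <stdint.h>

enum { REG_PLAIN = 0, REG_INT_STATUS = 1, NUM_REGS = 2 };

/* Hypothetical device with an indirect window into its register file. */
struct fake_nic {
    uint32_t regs[NUM_REGS];
    uint32_t window_base; /* register currently selected by the window */
};

/* Read through the window.  In this model the interrupt status register
 * is read-to-clear, so even a "harmless" scan consumes pending status. */
static uint32_t window_read(struct fake_nic *nic)
{
    uint32_t v;

    assert(nic && nic->window_base < NUM_REGS);

    v = nic->regs[nic->window_base];
    if (nic->window_base == REG_INT_STATUS)
        nic->regs[REG_INT_STATUS] = 0; /* side effect of the read */
    return v;
}
```

If the window happens to select REG_INT_STATUS when something scans config space, the scan sees the pending bit and the driver no longer does; with the window pointing at a plain register the same scan is harmless.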

I understand that the patch we are discussing is a serious improvement
from the current situation, so I will apply it and queue it up for
-stable.

However I think we need to do something reasonable to prevent the
kinds of situations I described above.

Thanks.

Re: [PATCH 06/13] staging/lustre/llite: remove ll_d_root_ops

2013-12-10 Thread Peng Tao

On Wed, Dec 11, 2013 at 10:32 AM, Greg Kroah-Hartman
 wrote:
> On Mon, Dec 09, 2013 at 10:56:58PM +0800, Peng Tao wrote:
>> From: Lai Siyao 
>>
>> Mnt root dentry will never be revalidated, but its d_op->d_compare
>> will be called for its children, to simplify code, we use the same
>> ll_d_ops as normal dentries.
>> But its attribute may be invalid before access, this won't cause
>> any issue because it always exists, and the only operation depends
>> on its attribute is .permission, which has revalidated it in lustre
>> code.
>>
>> So it's okay to remove ll_d_root_ops, and remove unnecessary checks
>> in lookup/revalidate/statahead.
>
> This breaks the build.
>
> It's as if you aren't testing this.
>
err... I tested the series as a whole but it turns out this breakage
got fixed in the next patch. Looks like I should really have tested
every single patch instead.

sorry...

> I'm really tired of this, I really don't want to take any more patches
> from you for a while, as I dread them every time I see a series sent by
> you for this codebase.
>
I'll give you a break then... sorry for getting you annoyed.

Thanks,
Tao

[PATCH V2] perf tools: Change the default filenames for perf kvm diff to perf.data.xxx and perf.data.xxx.old

2013-12-10 Thread Dongsheng Yang

By default, perf kvm diff currently compares perf.data.host and
perf.data.guest, which is not a good default.

Example:
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules diff
failed to open perf.data.host: No such file or directory
Failed to open perf.data.host

'perf kvm diff' should behave like 'perf diff', which compares two files
of the same type captured at different times, perf.data and perf.data.old.
By default, perf kvm diff should therefore compare perf.data.guest and
perf.data.guest.old.

What's worse, since the behavior of perf kvm record has changed,
a perf.data.host file is no longer easy to obtain: it requires passing
--no-guest. Using perf.data.host as a default input of 'perf kvm diff'
is therefore an odd design.

This patch removes the hard-coded default filenames in builtin-diff.c
and generates suitable filenames from the current options of the perf kvm
diff command, which makes the default behavior of perf kvm diff more useful.
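
As a rough userspace illustration of the naming scheme described above (not the code in this patch): pick a base name from the host/guest options and derive the second default by appending ".old". The helper names are hypothetical, and the option-to-name mapping in kvm_base_name() is an assumption about what get_filename_for_perf_kvm() returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed mapping from the perf kvm options to a base file name. */
static const char *kvm_base_name(bool host, bool guest)
{
    if (host && !guest)
        return "perf.data.host";
    if (!host && guest)
        return "perf.data.guest";
    return "perf.data.kvm";
}

/*
 * Fill defaults[0] with the base name and defaults[1] with base + ".old".
 * Returns 0 on success, -1 on allocation failure.  The caller frees both.
 */
static int kvm_diff_defaults(bool host, bool guest, char *defaults[2])
{
    const char *base = kvm_base_name(host, guest);
    size_t base_len = strlen(base);
    size_t old_size = base_len + sizeof(".old"); /* sizeof counts the NUL */

    assert(defaults);

    defaults[0] = malloc(base_len + 1);
    defaults[1] = malloc(old_size);
    if (!defaults[0] || !defaults[1]) {
        free(defaults[0]);
        free(defaults[1]);
        defaults[0] = defaults[1] = NULL;
        return -1;
    }

    memcpy(defaults[0], base, base_len + 1);
    snprintf(defaults[1], old_size, "%s.old", base);
    return 0;
}
```

Note that each allocation is checked through the pointer that was just assigned, so a failed copy of the first name cannot go unnoticed.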

Verification:
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.669 MB perf.data.guest (~29207 samples) ]
# perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules diff
  # Event 'cycles'
  #
  # Baseline    Delta  Shared Object            Symbol
  # ........  .......  .......................  ...................................
  #
      92.00%           [guest.kernel.kallsyms]  [g] rb_insert_color
       7.51%           [guest.kernel.kallsyms]  [g] smp_apic_timer_interrupt
       0.48%   +0.60%  [guest.kernel.kallsyms]  [g] apic_timer_interrupt
              +16.35%  [guest.kernel.kallsyms]  [g] kvm_clock_get_cycles
              +82.56%  [guest.kernel.kallsyms]  [g] irqtime_account_process_tick.isra.2

Signed-off-by: Dongsheng Yang 
---
Change since v1:
Add more commit message.

 tools/perf/builtin-diff.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 2a85cc9..6a32213 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -17,6 +17,7 @@
 #include "util/symbol.h"
 #include "util/util.h"
 #include "util/data.h"
+#include "linux/string.h"
 
 #include 
 #include 
@@ -1001,8 +1002,28 @@ static int data_init(int argc, const char **argv)
use_default = false;
}
} else if (perf_guest) {
-   defaults[0] = "perf.data.host";
-   defaults[1] = "perf.data.guest";
+   char *file_name;
+   int len, ret;
+
+   file_name = (char *)get_filename_for_perf_kvm();
+   if (!file_name) {
+   pr_err("Failed to allocate memory for filename\n");
+   return -ENOMEM;
+   }
+
+   defaults[0] = strdup(file_name);
+   if (!file_name) {
+   pr_err("Failed to allocate memory for defaults[0]\n");
+   return -ENOMEM;
+   }
+
+   len = strlen(file_name);
+   ret = str_append(&file_name, &len, ".old");
+   if (ret) {
+   pr_err("Failed to allocate memory for defaults[1]\n");
+   return -ENOMEM;
+   }
+   defaults[1] = file_name;
}
 
if (sort_compute >= (unsigned int) data__files_cnt) {
-- 
1.8.2.1


Re: 50 Watt idle power regression bisected to Linux-3.10

2013-12-10 Thread Mike Galbraith

Alakazam..

pk cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6 CTMP   %pc3   %pc6
             0.17 2.01 2.26   0   0.02  99.82   0.00   49  99.55   0.00
 0   0   0   0.95 1.45 2.26   2   0.43  98.62   0.00   48  98.48   0.00
 1   0   8   0.24 1.99 2.26   2   0.02  99.75   0.00   38  99.68   0.00
 2   0  16   0.17 1.97 2.26   2   0.02  99.81   0.00   40  99.65   0.00
 3   0  24   0.18 1.92 2.26   2   0.02  99.80   0.00   41  99.68   0.00
 4   0  32   0.18 1.95 2.26   2   0.02  99.80   0.00   36  99.66   0.00
 5   0  40   0.15 1.85 2.26   0   0.03  99.83   0.00   35  99.70   0.00
 6   0  48   0.10 1.83 2.26   0   0.01  99.89   0.00   36  99.79   0.00
 7   0  56   0.10 1.97 2.26   0   0.01  99.89   0.00   43  99.75   0.00

Yup, magical gremlin repellent works on 8 socket DL980 too.

> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..b6399af 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -118,7 +118,7 @@ int cpuidle_idle_call(void)
>   struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
>   struct cpuidle_driver *drv;
>   int next_state, entered_state;
> - bool broadcast;
> + bool broadcast, coupled = false;
>  
>   if (off || !initialized)
>   return -ENODEV;
> @@ -147,15 +147,18 @@ int cpuidle_idle_call(void)
>   if (broadcast)
>   clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu);
>  
> - if (cpuidle_state_is_coupled(dev, drv, next_state))
> + if (cpuidle_state_is_coupled(dev, drv, next_state)) {
>   entered_state = cpuidle_enter_state_coupled(dev, drv,
>   next_state);
> - else
> + coupled = true;
> + } else
>   entered_state = cpuidle_enter_state(dev, drv, next_state);
>  
>   if (broadcast)
>   clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
>  
> + trace_printk("coupled %d: entered state %d\n", coupled, entered_state);
> +
>   trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>  
>   /* give the governor an opportunity to reflect on the outcome */
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..9de7ee2 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -309,7 +309,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   data->expected_us =
>   t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;
>  
> -
>   data->bucket = which_bucket(data->expected_us);
>  
>   multiplier = performance_multiplier();
> @@ -330,6 +329,9 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>data->correction_factor[data->bucket],
>RESOLUTION * DECAY);
>  
> + trace_printk("expected_us: %d predicted_us: %d\n", data->expected_us,
> +  data->predicted_us);
> +
>   get_typical_interval(data);
>  
>   /*
> @@ -349,10 +351,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   struct cpuidle_state *s = &drv->states[i];
>   struct cpuidle_state_usage *su = &dev->states_usage[i];
>  
> + trace_printk("Trying idle state %d s->dis %d su->dis %d\n", i,
> +  s->disabled, su->disable);
>   if (s->disabled || su->disable)
>   continue;
> + trace_printk("residency %d\n", s->target_residency);
>   if (s->target_residency > data->predicted_us)
>   continue;
> + trace_printk("exit_latency %d vs. %d multiplier %d\n",
> +  s->exit_latency, latency_req, multiplier);
>   if (s->exit_latency > latency_req)
>   continue;
>   if (s->exit_latency * multiplier > data->predicted_us)
> 



Re: [PATCH v1 9/9] staging: android: binder: Add binder compat layer

2013-12-10 Thread Arve Hjønnevåg

On Mon, Dec 9, 2013 at 7:01 PM, Octavian Purdila  wrote:
> On Thu, Dec 5, 2013 at 4:02 AM, Arve Hjønnevåg  wrote:
>> On Wed, Dec 4, 2013 at 2:02 PM, Greg KH  wrote:
>>> On Wed, Dec 04, 2013 at 01:55:34PM -0800, Colin Cross wrote:
>>>> On Wed, Dec 4, 2013 at 1:43 PM, Greg KH  wrote:
>>>> > On Wed, Dec 04, 2013 at 12:46:42PM -0800, Colin Cross wrote:
>>>> >> On Wed, Dec 4, 2013 at 10:35 AM, Greg KH  wrote:
>>>> >> 
>>>> >>
>>>> >> > And finally, is this all really needed?  Why not just fix the structures
>>>> >> > to be "correct", and then fix userspace to use the correct structures as
>>>> >> > well, thereby not needing a compat layer at all?
>>>> >>
>>>> >> Some of the binder ioctls take userspace pointers.  Are you suggesting
>>>> >> storing those pointers in a __u64 to avoid having to have a
>>>> >> compat_ioctl?
>>>> >
>>>> > Yes, that's the best way to solve the issue, right?
>>>>
>>>> It's the least code, but in exchange you lose all the type safety and
>>>> warnings when copying in and out of the pointers, as well as sparse
>>>> checking on the __user attribute.
>>>
>>> Not if you make the cast right at the beginning, when you first "touch"
>>> the data, but yes, it does take some of the type safety away, at the
>>> expense of simpler code to mess up :)
>>>
>>>> That doesn't seem like a good tradeoff to me.  In addition it requires
>>>> modifying the existing heavily used 32 bit api, which means a
>>>> mostly-equivalent compat layer added in libbinder to support old
>>>> kernels.
>>>
>>> Wait, I thought that libbinder would have to be changed anyway here, to
>>> handle 64bit kernels (in both 32 and 64bit userspace).  Since you are
>>> already changing it, why not just "do it correctly"?
>>>
>>
>> Yes libbinder will have to be changed to support calls between 32 bit
>> and 64 bit processes, so I don't see much value in a patchset that
>> only supports all 32 bit or all 64 bit processes. If user space is
>> fixed to use 64 bit pointers on a 64 bit system, then much of the code
>> added in this patchset becomes useless (and probably harmful as it
>> appears to prevent 32 bit processes from communicating with 64 bit
>> processes).
>>
>
> Hi,
>
> Coincidentally, I have been working on a compat layer myself lately.
> It is implemented in the binder driver with no changes in libbinder
> and it includes support for mixed mode.
>
> Unless you think that the kernel compat layer is a dead end, I can
> post the patches here for review. IMO the kernel compat layer gives
> you greater flexibility because it keeps the 32bit ABI unchanged. Of
> course it comes with the price of increased complexity.
>
> Thanks,
> Tavi

Assuming you are talking about a kernel compat layer that translates
the flat_binder_object structs as they pass between 32 bit and 64 bit
processes, that will not always work. The data portion of the message
sometimes contains size values that are invisible to the kernel, and
these values will be wrong if the kernel moves data to make room for a
different size flat_binder_object.
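
For comparison, the fixed-layout alternative raised earlier in the thread (carrying user pointers in a __u64 and casting once, where the data is first touched) could look like this in a userspace sketch. struct write_read_arg and the helper names are hypothetical and are not the binder ABI; uint64_t stands in for __u64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ioctl argument built only from fixed-width fields, so its
 * layout is the same for 32-bit and 64-bit userspace. */
struct write_read_arg {
    uint64_t write_size;
    uint64_t write_buffer; /* user pointer carried as an integer */
};

/* Widen a pointer into the 64-bit field. */
static inline uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Narrow the field back to a native pointer. */
static inline void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}

/* Convert once, at the point where the argument is first touched; the
 * rest of the code then works with a typed pointer again. */
static const char *arg_buffer(const struct write_read_arg *arg)
{
    assert(arg);
    return u64_to_ptr(arg->write_buffer);
}
```

This keeps the structure size independent of the pointer width, at the cost of the compiler and sparse no longer seeing a pointer type in the structure itself, which is the tradeoff debated above.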

-- 
Arve Hjønnevåg

Re: [PATCH 2/2] tun: remove useless codes in tun_chr_aio_read() and tun_recvmsg()

2013-12-10 Thread David Miller

From: Zhi Yong Wu 
Date: Wed, 11 Dec 2013 11:14:04 +0800

> Only one reminder, since David has committed the two patches, you
> maybe need to take their impact on your patches into account.

I reverted these changes from net-next.
