Re: [PATCH] sched/numa: Restore sched feature NUMA to its earlier avatar.

2015-07-08 Thread Rik van Riel
On 07/08/2015 09:20 AM, Srikar Dronamraju wrote:
> In commit:8a9e62a "sched/numa: Prefer NUMA hotness over cache hotness"
> sched feature NUMA was always set to true. However this sched feature was
> supposed to be enabled on NUMA boxes only through set_numabalancing_state().
> 
> To get back to the above behaviour, bring back NUMA_FAVOUR_HIGHER feature.
> Signed-off-by: Srikar Dronamraju 

Reviewed-by: Rik van Riel 

-- 
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] hwmon: (f71882fg): Add support for f81768d

2015-07-08 Thread George Joseph
On Tue, Jul 7, 2015 at 8:34 PM, Guenter Roeck  wrote:
> Hi George,
>
> On 07/07/2015 06:16 PM, George Joseph wrote:
>>
>> Add f81768d (id 0x1210) currently found on Jetway motherboards.
>> It has 11 voltages but otherwise needed no special handling
>> in this driver.
>>
>> Signed-off-by: George Joseph 
>> ---
>>   drivers/hwmon/f71882fg.c | 44
>> 
>>   1 file changed, 28 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/hwmon/f71882fg.c b/drivers/hwmon/f71882fg.c
>> index e4ff21f..4c9ec5c 100644
>> --- a/drivers/hwmon/f71882fg.c
>> +++ b/drivers/hwmon/f71882fg.c
>> @@ -59,6 +59,7 @@
>>   #define SIO_F71889E_ID0x0909  /* Chipset ID */
>>   #define SIO_F71889A_ID0x1005  /* Chipset ID */
>>   #define SIO_F8000_ID  0x0581  /* Chipset ID */
>> +#define SIO_F81768D_ID 0x1210  /* Chipset ID */
>>   #define SIO_F81865_ID 0x0704  /* Chipset ID */
>>   #define SIO_F81866_ID 0x1010  /* Chipset ID */
>>
>> @@ -107,7 +108,7 @@
>>
>>   #define   F71882FG_REG_START  0x01
>>
>> -#define F71882FG_MAX_INS   10
>> +#define F71882FG_MAX_INS   11
>>
>>   #define FAN_MIN_DETECT366 /* Lowest detectable
>> fanspeed */
>>
>> @@ -116,7 +117,7 @@ module_param(force_id, ushort, 0);
>>   MODULE_PARM_DESC(force_id, "Override the detected device ID");
>>
>>   enum chips { f71808e, f71808a, f71858fg, f71862fg, f71868a, f71869,
>> f71869a,
>> -   f71882fg, f71889fg, f71889ed, f71889a, f8000, f81865f, f81866a};
>> +   f71882fg, f71889fg, f71889ed, f71889a, f8000, f81768d, f81865f,
>> f81866a};
>
>
> line longer than 80 characters. No need to resend, I fixed it up.

Really?  I didn't find any.  Oh well, thanks!

>
> Applied to -next.
>
> Thanks,
> Guenter
>


Re: [PATCH v2 5/5] usb: gadget: atmel_usba_udc: add missing ret value check

2015-07-08 Thread Michal Nazarewicz
On Wed, Jul 08 2015, Robert Baldyga wrote:
> Add missing return value check. In case of error print debug message
> and return error code.
>
> Signed-off-by: Robert Baldyga 
> Acked-by: Nicolas Ferre 

Acked-by: Michal Nazarewicz 

> ---
>  drivers/usb/gadget/udc/atmel_usba_udc.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/usb/gadget/udc/atmel_usba_udc.c 
> b/drivers/usb/gadget/udc/atmel_usba_udc.c
> index 4095cce0..37d414e 100644
> --- a/drivers/usb/gadget/udc/atmel_usba_udc.c
> +++ b/drivers/usb/gadget/udc/atmel_usba_udc.c
> @@ -1989,6 +1989,10 @@ static struct usba_ep * atmel_udc_of_init(struct 
> platform_device *pdev,
>   ep->can_isoc = of_property_read_bool(pp, "atmel,can-isoc");
>  
>   ret = of_property_read_string(pp, "name", &name);
> + if (ret) {
> + dev_err(&pdev->dev, "of_probe: name error(%d)\n", ret);
> + goto err;
> + }
>   ep->ep.name = name;
>  
>   ep->ep_regs = udc->regs + USBA_EPT_BASE(i);
> -- 
> 1.9.1
>

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +--ooO--(_)--Ooo--


Re: [PATCH v2 4/5] staging: emxx_udc: add missing usb_ep_set_maxpacket_limit()

2015-07-08 Thread Michal Nazarewicz
On Wed, Jul 08 2015, Robert Baldyga wrote:
> Since maxpacket_limit was introduced all UDC drivers should use
> usb_ep_set_maxpacket_limit() function instead of setting maxpacket value
> manually. ep.maxpacket_limit contains actual maximum maxpacket value
> supported by hardware which is needed by epautoconf.
>
> Signed-off-by: Robert Baldyga 

Acked-by: Michal Nazarewicz 

> ---
>  drivers/staging/emxx_udc/emxx_udc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/staging/emxx_udc/emxx_udc.c 
> b/drivers/staging/emxx_udc/emxx_udc.c
> index 4178d96..3b7aa36 100644
> --- a/drivers/staging/emxx_udc/emxx_udc.c
> +++ b/drivers/staging/emxx_udc/emxx_udc.c
> @@ -3203,7 +3203,8 @@ static void __init nbu2ss_drv_ep_init(struct nbu2ss_udc 
> *udc)
>   ep->ep.name = gp_ep_name[i];
>   ep->ep.ops = _ep_ops;
>  
> - ep->ep.maxpacket = (i == 0 ? EP0_PACKETSIZE : EP_PACKETSIZE);
> + usb_ep_set_maxpacket_limit(>ep,
> + i == 0 ? EP0_PACKETSIZE : EP_PACKETSIZE);

I would break line just after ( like so:

+   usb_ep_set_maxpacket_limit(
+   >ep, i ? EP_PACKETSIZE : EP0_PACKETSIZE);

>  
>   list_add_tail(>ep.ep_list, >gadget.ep_list);
>   INIT_LIST_HEAD(>queue);
> -- 
> 1.9.1
>
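
For readers who have not met the helper, here is a minimal user-space sketch of
the idea described in the commit message above. struct mock_ep,
mock_ep_set_maxpacket_limit() and mock_ep_fits() are hypothetical stand-ins,
not the kernel's struct usb_ep or the epautoconf code. The only assumption is
that the helper records the hardware ceiling and starts maxpacket at the same
value, so a later matcher can compare a request against the ceiling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two endpoint fields discussed above. */
struct mock_ep {
	unsigned short maxpacket;       /* value currently in use */
	unsigned short maxpacket_limit; /* largest value the hardware supports */
};

/* Record the hardware ceiling and start with maxpacket equal to it. */
static void mock_ep_set_maxpacket_limit(struct mock_ep *ep, unsigned short limit)
{
	assert(ep != NULL);
	ep->maxpacket_limit = limit;
	ep->maxpacket = limit;
}

/* Toy matcher: a request fits only if it does not exceed the ceiling. */
static int mock_ep_fits(const struct mock_ep *ep, unsigned short wanted)
{
	assert(ep != NULL);
	return wanted <= ep->maxpacket_limit;
}
```

A driver that only assigns maxpacket by hand leaves the limit at zero in this
model, so every request would be rejected by the toy matcher.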

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +--ooO--(_)--Ooo--


Re: [PATCH v2 3/5] usb: isp1760: udc: add missing usb_ep_set_maxpacket_limit()

2015-07-08 Thread Michal Nazarewicz
On Wed, Jul 08 2015, Robert Baldyga wrote:
> Since maxpacket_limit was introduced all UDC drivers should use
> usb_ep_set_maxpacket_limit() function instead of setting maxpacket value
> manually. ep.maxpacket_limit contains actual maximum maxpacket value
> supported by hardware which is needed by epautoconf.
>
> Signed-off-by: Robert Baldyga 

Acked-by: Michal Nazarewicz 

> ---
>  drivers/usb/isp1760/isp1760-udc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/isp1760/isp1760-udc.c 
> b/drivers/usb/isp1760/isp1760-udc.c
> index 18ebf5b1..3699962 100644
> --- a/drivers/usb/isp1760/isp1760-udc.c
> +++ b/drivers/usb/isp1760/isp1760-udc.c
> @@ -1382,11 +1382,11 @@ static void isp1760_udc_init_eps(struct isp1760_udc 
> *udc)
>* This fits in the 8kB FIFO without double-buffering.
>*/
>   if (ep_num == 0) {
> - ep->ep.maxpacket = 64;
> + usb_ep_set_maxpacket_limit(>ep, 64);
>   ep->maxpacket = 64;
>   udc->gadget.ep0 = >ep;
>   } else {
> - ep->ep.maxpacket = 512;
> + usb_ep_set_maxpacket_limit(>ep, 512);
>   ep->maxpacket = 0;
>   list_add_tail(>ep.ep_list, >gadget.ep_list);
>   }
> -- 
> 1.9.1
>

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +--ooO--(_)--Ooo--


Re: Clarification for the use of additional fields in the message body

2015-07-08 Thread SF Markus Elfring
> Is there truly no way to simplify that process?

I see some software development possibilities which could improve
the communication with high volume mailing lists.


> You should be sending the patches directly with SMTP using git-send-email,

This tool is also fine for the publishing of a lot of patches.


> if you're not, then you're making things overly complicated for yourself.

But I prefer a graphical user interface for my mail handling so far.


> Having a feature doesn't mean that it should be used.

Does any of the "questionable functionality" get occasionally overlooked
a bit too often?

Regards,
Markus


Re: [PATCH v2 2/5] usb: gadget: midi: avoid redundant f_midi_set_alt() call

2015-07-08 Thread Michal Nazarewicz
On Wed, Jul 08 2015, Robert Baldyga wrote:
> Function midi registers two interfaces with single set_alt() function
> which means that f_midi_set_alt() is called twice when configuration
> is set. That means that endpoint initialization and ep request allocation
> is done two times. To avoid this problem we do such things only once,
> for interface number 1 (MIDI Streaming interface).
>
> Signed-off-by: Robert Baldyga 

Acked-by: Michal Nazarewicz 

> ---
>  drivers/usb/gadget/function/f_midi.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/usb/gadget/function/f_midi.c 
> b/drivers/usb/gadget/function/f_midi.c
> index 6316aa5..4cef222 100644
> --- a/drivers/usb/gadget/function/f_midi.c
> +++ b/drivers/usb/gadget/function/f_midi.c
> @@ -329,6 +329,10 @@ static int f_midi_set_alt(struct usb_function *f, 
> unsigned intf, unsigned alt)
>   unsigned i;
>   int err;
>  
> + /* For Control Device interface we do nothing */
> + if (intf == 0)
> + return 0;
> +
>   err = f_midi_start_ep(midi, f, midi->in_ep);
>   if (err)
>   return err;
> -- 
> 1.9.1
>

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +--ooO--(_)--Ooo--


Re: [PATCH] MIPS, IRQCHIP: Move i8259 irqchip driver to drivers/irqchip.

2015-07-08 Thread Sergei Shtylyov

Hello.

On 7/8/2015 3:46 PM, Ralf Baechle wrote:


  arch/mips/Kconfig   |   4 -
  arch/mips/kernel/Makefile   |   1 -
  arch/mips/kernel/i8259.c| 384 
  drivers/irqchip/Kconfig |   4 +
  drivers/irqchip/Makefile|   1 +
  drivers/irqchip/irq-i8259.c | 383 +++
  6 files changed, 388 insertions(+), 389 deletions(-)


[...]


diff --git a/drivers/irqchip/irq-i8259.c b/drivers/irqchip/irq-i8259.c
new file mode 100644
index 000..a29638a
--- /dev/null
+++ b/drivers/irqchip/irq-i8259.c
@@ -0,0 +1,383 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Code to handle x86 style IRQs plus some generic interrupt stuff.
+ *
+ * Copyright (C) 1992 Linus Torvalds
+ * Copyright (C) 1994 - 2000 Ralf Baechle
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+/*
+ * This is the 'legacy' 8259A Programmable Interrupt Controller,
+ * present in the majority of PC/AT boxes.
+ * plus some generic x86 specific things if generic specifics makes
+ * any sense at all.
+ * this file should become arch/i386/kernel/irq.c when the old irq.c
+ * moves to arch independent land


   This comment doesn't make sense anymore, does it?


+static struct irq_chip i8259A_chip = {
+   .name   = "XT-PIC",


   This name is wrong, wrong, wrong. XT only had a single 8259 (and is not
supported by Linux anyway) while here we drive the AT specific two cascaded
8259s (which is just wrong in my opinion as well).



+   .irq_mask   = disable_8259A_irq,
+   .irq_disable= disable_8259A_irq,
+   .irq_unmask = enable_8259A_irq,
+   .irq_mask_ack   = mask_and_ack_8259A,


   I have always thought 8259 was the "fast-EOI" class interrupt chip, I've 
never quite understood all this mask-and-ACK type handling for 8259...


[...]

+/*
+ * Careful! The 8259A is a fragile beast, it pretty
+ * much _has_ to be done exactly like this (mask it
+ * first, _then_ send the EOI, and the order of EOI
+ * to the two 8259s is important!
+ */
+static void mask_and_ack_8259A(struct irq_data *d)
+{

[...]

+handle_real_irq:
+   if (irq & 8) {
+   inb(PIC_SLAVE_IMR); /* DUMMY - (do we need this?) */


   Hardly.


+   outb(cached_slave_mask, PIC_SLAVE_IMR);
+   outb(0x60+(irq&7), PIC_SLAVE_CMD);/* 'Specific EOI' to slave */


  Need spaces around ops.


+   outb(0x60+PIC_CASCADE_IR, PIC_MASTER_CMD); /* 'Specific EOI' to 
master-IRQ2 */
+   } else {
+   inb(PIC_MASTER_IMR);/* DUMMY - (do we need this?) */
+   outb(cached_master_mask, PIC_MASTER_IMR);
+   outb(0x60+irq, PIC_MASTER_CMD); /* 'Specific EOI to master */


   Same here...


+   }
+   raw_spin_unlock_irqrestore(_lock, flags);
+   return;



+   {
+   static int spurious_irq_mask;
+   /*
+* At this point we can be sure the IRQ is spurious,
+* lets ACK and report it. [once per IRQ]


   There's no point in ACKing a spurious interrupt, if my memory serves. The
in-service register doesn't have the bit set, so no need to clear it.



+*/
+   if (!(spurious_irq_mask & irqmask)) {
+   printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", 
irq);
+   spurious_irq_mask |= irqmask;
+   }
+   atomic_inc(_err_count);
+   /*
+* Theoretically we do not have to handle this IRQ,
+* but in Linux this does not cause problems and is
+* simpler for us.
+*/
+   goto handle_real_irq;


   Hum... only because it doesn't cause issues?

WBR, Sergei
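
As an illustration of the 'Specific EOI' bytes discussed above, here is a
minimal user-space sketch that only computes the command values and touches no
I/O ports. The 0x60 base and the irq & 7 masking come from the quoted code.
CASCADE_IR = 2 is an assumption (the usual PC/AT wiring), and struct eoi_cmds
and specific_eoi() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define OCW2_SPECIFIC_EOI 0x60u /* same constant as in the quoted outb() calls */
#define CASCADE_IR        2u    /* assumption: slave sits on master input 2 */

struct eoi_cmds {
	int     count;  /* number of command bytes to issue */
	uint8_t slave;  /* byte for the slave command port, valid if count == 2 */
	uint8_t master; /* byte for the master command port */
};

/* Compute the EOI bytes for one of the 16 cascaded interrupt lines. */
static struct eoi_cmds specific_eoi(unsigned int irq)
{
	struct eoi_cmds c = {0, 0, 0};

	assert(irq < 16);
	if (irq & 8) {
		/* Slave line: EOI the slave first, then the cascade input. */
		c.slave = (uint8_t)(OCW2_SPECIFIC_EOI + (irq & 7));
		c.master = (uint8_t)(OCW2_SPECIFIC_EOI + CASCADE_IR);
		c.count = 2;
	} else {
		c.master = (uint8_t)(OCW2_SPECIFIC_EOI + irq);
		c.count = 1;
	}
	return c;
}
```

For example, IRQ 3 needs the single byte 0x63 on the master, while IRQ 10
needs 0x62 on the slave followed by 0x62 on the master.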



Re: [PATCH v2 1/5] usb: gadget: ffs: call functionfs_unbind() if _ffs_func_bind() fails

2015-07-08 Thread Michal Nazarewicz
On Wed, Jul 08 2015, Robert Baldyga wrote:
> Function ffs_do_functionfs_bind() calls functionfs_bind() which allocates
> usb request and increments refcounts. These things need to be cleaned
> up by calling functionfs_unbind() if further steps of initialization fail.
>
> Signed-off-by: Robert Baldyga 

Acked-by: Michal Nazarewicz 

> ---
>  drivers/usb/gadget/function/f_fs.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/gadget/function/f_fs.c 
> b/drivers/usb/gadget/function/f_fs.c
> index 6e7be91..6516187 100644
> --- a/drivers/usb/gadget/function/f_fs.c
> +++ b/drivers/usb/gadget/function/f_fs.c
> @@ -2897,11 +2897,18 @@ static int ffs_func_bind(struct usb_configuration *c,
>struct usb_function *f)
>  {
>   struct f_fs_opts *ffs_opts = ffs_do_functionfs_bind(f, c);
> + struct ffs_function *func = ffs_func_from_usb(f);
> + int ret;
>  
>   if (IS_ERR(ffs_opts))
>   return PTR_ERR(ffs_opts);
>  
> - return _ffs_func_bind(c, f);
> + ret = _ffs_func_bind(c, f);
> + if (ret)
> + if(!--ffs_opts->refcnt)
> + functionfs_unbind(func->ffs);
> +
> + return ret;
>  }
>  
>  
> -- 
> 1.9.1
>
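
The error path in the patch above can be modelled in user space as follows.
This is only a sketch under stated assumptions: struct mock_opts and the
mock_*() functions are hypothetical, and the first stage is assumed to allocate
shared state only when the reference count goes from zero to one.

```c
#include <assert.h>
#include <stddef.h>

struct mock_opts {
	int refcnt; /* number of functions currently holding a reference */
	int bound;  /* 1 while the shared resources are allocated */
};

/* First stage: take a reference, allocating shared state on first use. */
static void mock_do_bind(struct mock_opts *o)
{
	if (o->refcnt++ == 0)
		o->bound = 1;
}

/* Release the shared state. */
static void mock_unbind(struct mock_opts *o)
{
	o->bound = 0;
}

/*
 * Full bind: if the second stage fails (non-zero second_stage_ret), drop
 * the reference taken by the first stage and unbind when it was the last.
 */
static int mock_func_bind(struct mock_opts *o, int second_stage_ret)
{
	assert(o != NULL);
	mock_do_bind(o);
	if (second_stage_ret && !--o->refcnt)
		mock_unbind(o);
	return second_stage_ret;
}
```

Without the decrement on failure, a failed first bind would leave refcnt at 1
and the shared state allocated with no owner.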

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +--ooO--(_)--Ooo--


Re: [PATCH 15/20] staging/lustre/libcfs: get rid of debugfs/lnet/console_backoff

2015-07-08 Thread Oleg Drokin

On Jul 8, 2015, at 4:28 AM, Dan Carpenter wrote:

> On Mon, Jul 06, 2015 at 12:48:53PM -0400, gr...@linuxhacker.ru wrote:
>> +static int param_set_uint_minmax(const char *val,
>> + const struct kernel_param *kp,
>> + unsigned int min, unsigned int max)
>> +{
>> +unsigned int num;
>> +int ret;
>> +
>> +if (!val)
>> +return -EINVAL;
>> +ret = kstrtouint(val, 0, &num);
>> +if (ret == -EINVAL || num < min || num > max)
>  ^^^
> Smatch is smart enough to know that "num" can be uninitialized here on
> some paths.  It doesn't generate a warning yet because a lot of the
> kernel has error paths where we mostly assume things won't fail.
> 
> It should probably be:
> 
>   ret = kstrtouint(val, 0, &num);
>   if (ret)
>   return ret;
>   if (num < min || num > max)
>   return -EINVAL;

Hm, indeed kstrtouint can return errors other than -EINVAL.
In reality this code comes from net/sunrpc/xprtsock.c
and I failed to see the problem there while copying.

This also suggests that the type might be enough in demand
to make it generic so that people don't reimplement it themselves?
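
A generic helper along those lines could look roughly like the following
user-space sketch. It uses strtoul() instead of kstrtouint(), and the name
parse_uint_minmax() and its signature are hypothetical. The point it shows is
the ordering suggested above: report any parse error first, and compare
against min/max only once the number is known to be valid.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse val into *out; return 0 on success or a negative errno value. */
static int parse_uint_minmax(const char *val, unsigned int min,
			     unsigned int max, unsigned int *out)
{
	unsigned long num;
	char *end;

	assert(out != NULL);
	/* Require a leading digit: rejects NULL, "", signs and whitespace. */
	if (!val || !isdigit((unsigned char)*val))
		return -EINVAL;

	errno = 0;
	num = strtoul(val, &end, 0);
	if (*end != '\0')
		return -EINVAL; /* trailing garbage */
	if (errno == ERANGE || num > UINT_MAX)
		return -ERANGE; /* parsed, but does not fit */
	if (num < min || num > max)
		return -EINVAL; /* num is valid here, so the compare is safe */

	*out = (unsigned int)num;
	return 0;
}
```

On any error the output variable is left untouched, so a caller never sees a
half-initialised value.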

Thanks!

Bye,
Oleg



Re: [RFC 4/8] lsm: smack: smack callbacks for kdbus security hooks

2015-07-08 Thread Stephen Smalley
On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
> This adds implementation of three smack callbacks sitting behind kdbus
> security hooks as proposed by Karol Lewandowski.
> 
> Originates from:
> 
> git://git.infradead.org/users/pcmoore/selinux (branch: working-kdbus)
> commit: fc3505d058c001fe72a6f66b833e0be5b2d118f3
> 
> https://github.com/lmctl/linux.git (branch: kdbus-lsm-v4.for-systemd-v212)
> commit: 103c26fd27d1ec8c32d85dd3d85681f936ac66fb
> 
> Signed-off-by: Karol Lewandowski 
> Signed-off-by: Paul Osmialowski 
> ---
>  security/smack/smack_lsm.c | 68 
> ++
>  1 file changed, 68 insertions(+)
> 
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index a143328..033b756 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -41,6 +41,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "smack.h"
>  
>  #define TRANS_TRUE   "TRUE"
> @@ -3336,6 +3337,69 @@ static int smack_setprocattr(struct task_struct *p, 
> char *name,
>  }
>  
>  /**
> + * smack_kdbus_connect - Set the security blob for a KDBus connection
> + * @conn: the connection
> + * @secctx: smack label
> + * @seclen: smack label length
> + *
> + * Returns 0
> + */
> +static int smack_kdbus_connect(struct kdbus_conn *conn,
> +const char *secctx, u32 seclen)
> +{
> + struct smack_known *skp;
> +
> + if (secctx && seclen > 0)
> + skp = smk_import_entry(secctx, seclen);
> + else
> + skp = smk_of_current();
> + conn->security = skp;
> +
> + return 0;
> +}
> +
> +/**
> + * smack_kdbus_conn_free - Clear the security blob for a KDBus connection
> + * @conn: the connection
> + *
> + * Clears the blob pointer
> + */
> +static void smack_kdbus_conn_free(struct kdbus_conn *conn)
> +{
> + conn->security = NULL;
> +}
> +
> +/**
> + * smack_kdbus_talk - Smack access on KDBus
> + * @src: source kdbus connection
> + * @dst: destination kdbus connection
> + *
> + * Return 0 if a subject with the smack of sock could access
> + * an object with the smack of other, otherwise an error code
> + */
> +static int smack_kdbus_talk(const struct kdbus_conn *src,
> + const struct kdbus_conn *dst)
> +{
> + struct smk_audit_info ad;
> + struct smack_known *sskp = src->security;
> + struct smack_known *dskp = dst->security;
> + int ret;
> +
> + BUG_ON(sskp == NULL);
> + BUG_ON(dskp == NULL);
> +
> + if (smack_privileged(CAP_MAC_OVERRIDE))
> + return 0;
> +
> + smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_NONE);
> +
> + ret = smk_access(sskp, dskp, MAY_WRITE, &ad);
> + if (ret)
> + return ret;
> + return 0;
> +}
> +
> +/**
>   * smack_unix_stream_connect - Smack access on UDS
>   * @sock: one sock
>   * @other: the other sock
> @@ -4393,6 +4457,10 @@ struct security_hook_list smack_hooks[] = {
>   LSM_HOOK_INIT(inode_notifysecctx, smack_inode_notifysecctx),
>   LSM_HOOK_INIT(inode_setsecctx, smack_inode_setsecctx),
>   LSM_HOOK_INIT(inode_getsecctx, smack_inode_getsecctx),
> +
> + LSM_HOOK_INIT(kdbus_connect, smack_kdbus_connect),
> + LSM_HOOK_INIT(kdbus_conn_free, smack_kdbus_conn_free),
> + LSM_HOOK_INIT(kdbus_talk, smack_kdbus_talk),
>  };

If Smack only truly needs 3 hooks, then it begs the question of why
there are so many other hooks defined.  Are the other hooks just to
support finer-grained distinctions, or is Smack's coverage incomplete?



Re: [PATCHv3 08/16] staging: vme_user: provide DMA functionality

2015-07-08 Thread Martyn Welch

On 07/07/15 13:51, Alessio Igor Bogani wrote:

>> Current VME stack links windows not to the boards, but to device drivers.
>> Driver could potentially minimise window usage within its scope (any sort
>> of window reusing, like mapping whole A16 once to be used with all boards),
>> but this won’t work across multiple drivers. Even if all of your drivers
>> are window-wise economic, they will still need some amount of windows per
>> each driver. Not that we have that many kernel drivers...
>
> Yes you can share a window/image between all boards of the same type
> (in effect we are porting our drivers in this way) *but* it isn't the
> expected way to work (see Documentation/vme_api.txt struct
> vme_driver's probe() and match() functions and the GE PIO2 VME
> driver).


I think it's perfectly valid to use a single window to dynamically map 
to the address space belonging to one of a number of devices supported 
by a single driver. I think this is almost preferable to mapping a large 
window over a large portion of the VME address space to drive a number 
of devices as (depending on their spacing in the VME address space) the 
latter could cause issues with filling available PCI address space. 
Admittedly this is more of a problem on 32-bit systems, but...
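
To put rough numbers on the trade-off, here is a small sketch that compares
the two approaches. struct vme_dev_range and both functions are hypothetical,
and the model ignores any alignment or granularity constraints a real bridge
may impose on window sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vme_dev_range {
	uint64_t base; /* VME base address of the device */
	uint64_t size; /* bytes of address space it decodes */
};

/* One large window must span from the lowest base to the highest end. */
static uint64_t span_window_size(const struct vme_dev_range *d, size_t n)
{
	uint64_t lo = UINT64_MAX, hi = 0;
	size_t i;

	assert(d != NULL && n > 0);
	for (i = 0; i < n; i++) {
		if (d[i].base < lo)
			lo = d[i].base;
		if (d[i].base + d[i].size > hi)
			hi = d[i].base + d[i].size;
	}
	return hi - lo;
}

/* A single remappable window only has to cover the largest device. */
static uint64_t remap_window_size(const struct vme_dev_range *d, size_t n)
{
	uint64_t max = 0;
	size_t i;

	assert(d != NULL && n > 0);
	for (i = 0; i < n; i++)
		if (d[i].size > max)
			max = d[i].size;
	return max;
}
```

With three small boards scattered between 0x00100000 and 0x40000000, the
spanning window needs close to 1 GiB of PCI address space, while the remapped
window needs only as much as the largest board decodes.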


--
Martyn Welch (Lead Software Engineer)  | Registered in England and Wales
GE Intelligent Platforms   | (3828642) at 100 Barbirolli Square
T +44(0)1327322748 | Manchester, M2 3AB
E martyn.we...@ge.com  | VAT:GB 927559189


Re: [PATCH v3 3/3] dt-binding:Documents the mbigen bindings

2015-07-08 Thread Mark Rutland
On Mon, Jul 06, 2015 at 08:09:08AM +0100, Ma Jun wrote:
> Add the mbigen msi interrupt controller bindings document
> 
> Change in v3:
> ---Change the compatible string
> ---Change the interrupt cells definition.
> 
> Signed-off-by: Ma Jun 
> ---
>  Documentation/devicetree/bindings/arm/mbigen.txt |   65 
> ++
>  1 files changed, 65 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/arm/mbigen.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/mbigen.txt 
> b/Documentation/devicetree/bindings/arm/mbigen.txt
> new file mode 100644
> index 000..cf92ef8
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/mbigen.txt
> @@ -0,0 +1,65 @@
> +Hisilicon mbigen device tree bindings.
> +===
> +
> +Mbigen means: message based interrupt generator.
> +
> +MBI is kind of msi interrupt only used on Non-PCI devices.
> +
> +To reduce the wired interrupt number connected to GIC,
> +Hisilicon designed mbigen to collect and generate interrupt.
> +
> +
> +Non-pci devices can connect to mbigen and gnerate the inteerrupt
> +by wirtting ITS register.

Please run this through a spell checker to get rid of typos.

> +
> +The mbigen and devices connect to mbigen have the following properties:
> +
> +
> +Mbigen required properties:
> +---
> +-compatible: Should be "hisilicon,mbigen-v2"
> +-msi-parent: should specified the ITS mbigen connected
> +-interrupt controller: Identifies the node as an interrupt controller
> +- #interrupt-cells : Specifies the number of cells needed to encode an
> +  interrupt source. The value is 5 now.
> +
> +  The 1st cell is the device id.

Does a given mbigen block generate interrupts with different ITS device
IDs? Or does a given mbigen block have a single device ID shared by all
interrupts it generates?

> +  The 2nd cell is the totall interrupt number of this device?

I don't follow. What is a "total interrupt number"?

> +  The 3rd cell is the hardware pin number of the interrupt.
> +  This value depends on the Soc design.

This property seems sane.

> +  The 4th cell is the mbigen node number. This value should refer to the
> +  vendor soc specification.

What is this, and why do you think you need it?

Surely the address of the mbigen node is a sufficient unique identifier?

> +  The 5th cell is the interrupt trigger type, encoded as follows:
> + 1 = edge triggered
> + 4 = level triggered

Hmm. How are level-triggered interrupts expected to be handled by the
mbigen?

> +
> +- reg: Specifies the base physical address and size of the ITS
> +  registers.

NAK. You should not describe ITS properties here given this is not the
ITS.

Perhaps you mean "the mbigen registers"?
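
For reference while discussing the cell layout, here is a sketch of how a
consumer might unpack the five cells as the quoted text currently describes
them. The struct and function names are hypothetical, the field for the 2nd
cell is deliberately left unnamed because its meaning is the open question
above, and nothing here implies the layout should stay as proposed. The
trigger values 1 and 4 are taken from the binding text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trigger values exactly as listed for the 5th cell in the quoted binding. */
#define MBIGEN_TRIG_EDGE  1u
#define MBIGEN_TRIG_LEVEL 4u

/* Hypothetical field names; the binding text only numbers the cells. */
struct mbigen_spec {
	uint32_t dev_id;  /* 1st cell: device id */
	uint32_t cell2;   /* 2nd cell: meaning questioned in the review */
	uint32_t pin;     /* 3rd cell: hardware pin number */
	uint32_t node;    /* 4th cell: mbigen node number */
	uint32_t trigger; /* 5th cell: 1 = edge, 4 = level */
};

/* Returns 0 on success, -1 if the trigger cell holds an unlisted value. */
static int mbigen_decode(const uint32_t cells[5], struct mbigen_spec *out)
{
	assert(cells != NULL && out != NULL);
	if (cells[4] != MBIGEN_TRIG_EDGE && cells[4] != MBIGEN_TRIG_LEVEL)
		return -1;
	out->dev_id = cells[0];
	out->cell2 = cells[1];
	out->pin = cells[2];
	out->node = cells[3];
	out->trigger = cells[4];
	return 0;
}
```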

Thanks,
Mark.


Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code

2015-07-08 Thread Stephen Smalley
On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
> Originates from:
> 
> https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
> commit: aa0885489d19be92fa41c6f0a71df28763228a40
> 
> Signed-off-by: Karol Lewandowski 
> Signed-off-by: Paul Osmialowski 
> ---
>  ipc/kdbus/bus.c| 12 ++-
>  ipc/kdbus/bus.h|  3 +++
>  ipc/kdbus/connection.c | 54 
> ++
>  ipc/kdbus/connection.h |  4 
>  ipc/kdbus/domain.c |  9 -
>  ipc/kdbus/domain.h |  2 ++
>  ipc/kdbus/endpoint.c   | 11 ++
>  ipc/kdbus/names.c  | 11 ++
>  ipc/kdbus/queue.c  | 30 ++--
>  9 files changed, 124 insertions(+), 12 deletions(-)
> 
>

> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
> index 9993753..b85cdc7 100644
> --- a/ipc/kdbus/connection.c
> +++ b/ipc/kdbus/connection.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "bus.h"
>  #include "connection.h"
> @@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep 
> *ep, bool privileged,
>   bool is_activator;
>   bool is_monitor;
>   struct kvec kvec;
> + u32 sid, len;
> + char *label;
>   int ret;
>  
>   struct {
> @@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep 
> *ep, bool privileged,
>   }
>   }
>  
> + security_task_getsecid(current, &sid);
> + security_secid_to_secctx(sid, &label, &len);
> + ret = security_kdbus_connect(conn, label, len);
> + if (ret) {
> + ret = -EPERM;
> + goto exit_unref;
> + }

This seems convoluted and expensive.  If you always want the label of
the current task here, then why not just have security_kdbus_connect()
internally extract the label of the current task?

> @@ -1107,6 +1119,12 @@ static int kdbus_conn_reply(struct kdbus_conn *src, 
> struct kdbus_kmsg *kmsg)
>   if (ret < 0)
>   goto exit;
>  
> + ret = security_kdbus_talk(src, dst);
> + if (ret) {
> + ret = -EPERM;
> + goto exit;
> + }

Where does kdbus apply its uid-based or other restrictions on
connections?  Why do we need to insert separate hooks into each of these
functions?  Is there no central chokepoint already for permission
checking that we can hook?

> diff --git a/ipc/kdbus/connection.h b/ipc/kdbus/connection.h
> index d1ffe90..1f91d39 100644
> --- a/ipc/kdbus/connection.h
> +++ b/ipc/kdbus/connection.h
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "limits.h"
>  #include "metadata.h"
> @@ -73,6 +74,7 @@ struct kdbus_kmsg;
>   * @names_queue_list:Well-known names this connection waits for
>   * @privileged:  Whether this connection is privileged on the bus
>   * @faked_meta:  Whether the metadata was faked on HELLO
> + * @security:LSM security blob
>   */
>  struct kdbus_conn {
>   struct kref kref;
> @@ -113,6 +115,8 @@ struct kdbus_conn {
>  
>   bool privileged:1;
>   bool faked_meta:1;
> +
> + void *security;
>  };

Unless I missed it, you may have missed the most important thing of all:
 controlling kdbus's notion of "privileged".  kdbus sets privileged to
true if the process has CAP_IPC_OWNER or the process euid matches the
uid of the bus creator, and then it allows those processes to do many
dangerous things, including monitoring all traffic, impersonating
credentials, pids, or seclabel, etc.

I don't believe we should ever permit impersonating seclabel information.
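
To make the rule I am describing concrete, here it is as a stand-alone
predicate. This is a sketch of the behaviour as summarised above, not the
kdbus source: mock_uid_t and mock_conn_privileged() are hypothetical names and
the capability check is reduced to a boolean input.

```c
#include <stdbool.h>

typedef unsigned int mock_uid_t; /* stand-in for the kernel's uid types */

/*
 * Privileged if the caller holds CAP_IPC_OWNER (passed in as a flag here)
 * or its effective uid equals the uid of the bus creator.
 */
static bool mock_conn_privileged(bool has_cap_ipc_owner,
				 mock_uid_t euid, mock_uid_t bus_creator_uid)
{
	return has_cap_ipc_owner || euid == bus_creator_uid;
}
```

Note that no LSM input appears anywhere in this decision, which is the gap
pointed out above.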


Re: [PATCH v2] add stealth mode

2015-07-08 Thread Austin S Hemmelgarn

On 2015-07-06 15:44, Matteo Croce wrote:

> 2015-07-06 12:49 GMT+02:00  :
>> On Thu, 02 Jul 2015 10:56:01 +0200, Matteo Croce said:
>>> Add option to disable any reply not related to a listening socket,
>>> like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
>>> Also disables ICMP replies to echo request and timestamp.
>>> The stealth mode can be enabled selectively for a single interface.
>>
>> A few notes.
>>
>> 2) You *do* realize that this isn't anywhere near sufficient in order
>> to actually make your machine "invisible", right?  (Hint: What *other*
>> packets can be sent to a machine to provoke a response?)
>
> Other than ICMP, UDP and TCP excluding open TCP/UDP ports?


Just to name a few that I know of off the top of my head:
1. IP packets with any protocol number not supported by your current
   kernel (these return a special ICMP message).
2. SCTP INIT and COOKIE_ECHO chunks when you have SCTP enabled in the
   kernel.
3. Theoretically, some IGMP messages.
4. NDP messages.
5. ARP queries looking for the machine's IP addresses.
6. Certain odd flag combinations on single TCP packets (check the
   documentation for Nmap for more info regarding these), which I believe
   (although I may be reading the code wrong) you aren't accounting for.
7. DAD queries.
8. ICMP address mask queries (which you also don't appear to account for).

This is by no means an exhaustive list, but all of them really should be 
addressed if you want to do this properly.







Re: [PATCH] MIPS, IRQCHIP: Move i8259 irqchip driver to drivers/irqchip.

2015-07-08 Thread Ralf Baechle
On Wed, Jul 08, 2015 at 02:57:38PM +0200, Thomas Gleixner wrote:

> >  arch/mips/Kconfig   |   4 -
> >  arch/mips/kernel/Makefile   |   1 -
> >  arch/mips/kernel/i8259.c| 384 
> > 
> >  drivers/irqchip/Kconfig |   4 +
> >  drivers/irqchip/Makefile|   1 +
> >  drivers/irqchip/irq-i8259.c | 383 
> > +++
> >  6 files changed, 388 insertions(+), 389 deletions(-)
> 
> Should I carry it, or want you merge it via the mips tree?
> 
> In the latter case: Acked-by-me.

I guess the conflict potential will be lower if you carry it - and if
somebody wants to merge it with one of the other i8259.c littered through
the tree it probably also is better if you carry it.

Thanks!

  Ralf


Re: [tip:sched/core] sched/numa: Prefer NUMA hotness over cache hotness

2015-07-08 Thread Srikar Dronamraju
* Srikar Dronamraju  [2015-07-07 05:49:31]:

> * tip-bot for Srikar Dronamraju  [2015-07-06 08:50:28]:
> 
> > Commit-ID:  8a9e62a238a3033158e0084d8df42ea116d69ce1
> > Gitweb: 
> > http://git.kernel.org/tip/8a9e62a238a3033158e0084d8df42ea116d69ce1
> > Author: Srikar Dronamraju 
> > AuthorDate: Tue, 16 Jun 2015 17:25:59 +0530
> > Committer:  Ingo Molnar 
> > CommitDate: Mon, 6 Jul 2015 15:29:55 +0200
> >
> > sched/numa: Prefer NUMA hotness over cache hotness
> 
> In the above commit, I missed the fact that sched feature NUMA was used to
> enable/disable NUMA_BALANCING. The below version of the same patch takes
> care of this fact. While I am posting the fixed version, it would need a
> revert of the above commit. Please let me know if you just want the
> differential patch that can apply on top of the above commit.

Posted the differential patch just in case you are looking for it.
http://mid.gmane.org/1436361633-4970-1-git-send-email-sri...@linux.vnet.ibm.com



Re: [PATCHv3 08/16] staging: vme_user: provide DMA functionality

2015-07-08 Thread Martyn Welch

On 07/07/15 08:08, Alessio Igor Bogani wrote:

I would be glad to try it if the maintainer is willing to receive this
type of changes.


Such requirements have come up in the past. I'd welcome such support 
being contributed to the kernel. My view has been that such an API could 
be built on top of the existing API. Does that seem workable to you?


--
Martyn Welch (Lead Software Engineer)  | Registered in England and Wales
GE Intelligent Platforms   | (3828642) at 100 Barbirolli Square
T +44(0)1327322748 | Manchester, M2 3AB
E martyn.we...@ge.com  | VAT:GB 927559189


Re: [PATCH v3 2/3] megaraid_sas : use dev_printk when possible

2015-07-08 Thread Bjorn Helgaas
On Wed, Jul 8, 2015 at 5:47 AM, Hannes Reinecke  wrote:
> On 07/07/2015 10:52 PM, Bjorn Helgaas wrote:
>> Use dev_printk() when possible to make messages more useful.
>>
>> Signed-off-by: Bjorn Helgaas 
>> ---
>>  drivers/scsi/megaraid/megaraid_sas_base.c   |  304 
>> +--
>>  drivers/scsi/megaraid/megaraid_sas_fusion.c |   95 
>>  2 files changed, 196 insertions(+), 203 deletions(-)
>>
> [ .. ]
>> @@ -1873,8 +1872,8 @@ static int megasas_get_ld_vf_affiliation_111(struct 
>> megasas_instance *instance,
>>   cmd = megasas_get_cmd(instance);
>>
>>   if (!cmd) {
>> - printk(KERN_DEBUG "megasas: megasas_get_ld_vf_affiliation_111:"
>> -"Failed to get cmd for scsi%d.\n",
>> + dev_printk(KERN_DEBUG, &instance->pdev->dev, "megasas_get_ld_vf_affiliation_111:"
>> +"Failed to get cmd for scsi%d\n",
>>   instance->host->host_no);
>>   return -ENOMEM;
>>   }
> Makes one wonder why we don't have a 'dev_debug'; dev_notice() and
> dev_warn() are there ...

There actually is a 'dev_dbg()' but when CONFIG_DYNAMIC_DEBUG is set,
I think dev_dbg() generates no output by default.  So to preserve the
previous behavior of "this message always appears in the dmesg log no
matter what the dynamic debug setting," I used dev_printk(KERN_DEBUG).

Somebody who maintains these drivers could probably go through and
convert these to either dev_info() or dev_dbg() depending on what they
need.  That would require more judgment than I wanted to get into :)

Thanks for taking a look at these!

Bjorn


Re: [PATCH V3 1/4] perf: Add PERF_RECORD_SWITCH to indicate context switches

2015-07-08 Thread Arnaldo Carvalho de Melo
On Wed, Jul 08, 2015 at 12:52:40AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 07, 2015 at 01:13:59PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Jul 07, 2015 at 05:36:14PM +0200, Peter Zijlstra wrote:
> > > > To help userspace in places where all it has is the union perf_event, we
> > > > can reuse one bit in misc to state that, i.e.

> > > >   #define PERF_RECORD_MISC_SWITCH_NEXT_PREV_PID 14
 
> > > > For instance.

> > > The other option would be a separate RECORD type, which might be
> > > simpler.

> > Humm, do we really need it?

> > I think this is just us wanting to, since we are going to add a new
> > record, to make it more useful for other, not right now needed,
> > situations, i.e. if the user is privileged, there are two other options
> > to get this info, right?
 
> I was just thinking that 2 records, each with a fixed layout would be
> easier to parse than 1 record with variable layout.
 
> The record space is immense, so from that point it really doesn't
> matter.

We could do a land grab at some point there, if/when we find some reason
for that... :-)
 
> Do whatever is easiest, less mistakes get made etc. :-)
 
> No real preference either way, as long as we've thought about it.

Right, I just don't want to have two u32 carrying -1 for no reason.

- Arnaldo


[PATCH v11 06/39] bpf tools: Read eBPF object from buffer

2015-07-08 Thread Wang Nan
To support dynamic compiling, this patch allows the caller to pass an
in-memory buffer to libbpf via bpf_object__open_buffer(). libbpf calls
elf_memory() to open it as an ELF object file.

Because __bpf_object__open() collects all required data and won't need
that buffer anymore, libbpf uses the buffer directly instead of cloning
it. The caller can free the buffer or reuse it for other purposes
after bpf_object__open_buffer() returns.

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-7-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 62 --
 tools/lib/bpf/libbpf.h |  2 ++
 2 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9e44608..36dfbc1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -84,6 +84,8 @@ struct bpf_object {
 */
struct {
int fd;
+   void *obj_buf;
+   size_t obj_buf_sz;
Elf *elf;
GElf_Ehdr ehdr;
} efile;
@@ -91,7 +93,9 @@ struct bpf_object {
 };
 #define obj_elf_valid(o)   ((o)->efile.elf)
 
-static struct bpf_object *bpf_object__new(const char *path)
+static struct bpf_object *bpf_object__new(const char *path,
+ void *obj_buf,
+ size_t obj_buf_sz)
 {
struct bpf_object *obj;
 
@@ -103,6 +107,16 @@ static struct bpf_object *bpf_object__new(const char *path)
 
strcpy(obj->path, path);
obj->efile.fd = -1;
+
+   /*
+* Caller of this function should also calls
+* bpf_object__elf_finish() after data collection to return
+* obj_buf to user. If not, we should duplicate the buffer to
+* avoid user freeing them before elf finish.
+*/
+   obj->efile.obj_buf = obj_buf;
+   obj->efile.obj_buf_sz = obj_buf_sz;
+
return obj;
 }
 
@@ -116,6 +130,8 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
obj->efile.elf = NULL;
}
zclose(obj->efile.fd);
+   obj->efile.obj_buf = NULL;
+   obj->efile.obj_buf_sz = 0;
 }
 
 static int bpf_object__elf_init(struct bpf_object *obj)
@@ -128,16 +144,26 @@ static int bpf_object__elf_init(struct bpf_object *obj)
return -EEXIST;
}
 
-   obj->efile.fd = open(obj->path, O_RDONLY);
-   if (obj->efile.fd < 0) {
-   pr_warning("failed to open %s: %s\n", obj->path,
-   strerror(errno));
-   return -errno;
+   if (obj->efile.obj_buf_sz > 0) {
+   /*
+* obj_buf should have been validated by
+* bpf_object__open_buffer().
+*/
+   obj->efile.elf = elf_memory(obj->efile.obj_buf,
+   obj->efile.obj_buf_sz);
+   } else {
+   obj->efile.fd = open(obj->path, O_RDONLY);
+   if (obj->efile.fd < 0) {
+   pr_warning("failed to open %s: %s\n", obj->path,
+   strerror(errno));
+   return -errno;
+   }
+
+   obj->efile.elf = elf_begin(obj->efile.fd,
+   LIBBPF_ELF_C_READ_MMAP,
+   NULL);
}
 
-   obj->efile.elf = elf_begin(obj->efile.fd,
-LIBBPF_ELF_C_READ_MMAP,
-NULL);
if (!obj->efile.elf) {
pr_warning("failed to open %s as ELF file\n",
obj->path);
@@ -167,7 +193,7 @@ errout:
 }
 
 static struct bpf_object *
-__bpf_object__open(const char *path)
+__bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz)
 {
struct bpf_object *obj;
 
@@ -176,7 +202,7 @@ __bpf_object__open(const char *path)
return NULL;
}
 
-   obj = bpf_object__new(path);
+   obj = bpf_object__new(path, obj_buf, obj_buf_sz);
if (!obj)
return NULL;
 
@@ -198,7 +224,19 @@ struct bpf_object *bpf_object__open(const char *path)
 
pr_debug("loading %s\n", path);
 
-   return __bpf_object__open(path);
+   return __bpf_object__open(path, NULL, 0);
+}
+
+struct bpf_object *bpf_object__open_buffer(void *obj_buf,
+  size_t obj_buf_sz)
+{
+   /* param validation */
+   if (!obj_buf || obj_buf_sz <= 0)
+   return NULL;
+
+   pr_debug("loading object from buffer\n");
+
+   return __bpf_object__open("[buffer]", obj_buf, 

[BUG ?] brcmsmac: condition with no effect

2015-07-08 Thread Nicholas Mc Guire
From: Nicholas Mc Guire   

scanning for trivial bug patterns with coccinelle spatches returned:
drivers/net/wireless/brcm80211/brcmsmac/phy/phy_lcn.c:3391
WARNING: condition with no effect (if branch == else)

added in 'commit 5b435de0d786 ("net: wireless: add brcm80211 drivers")'

drivers/net/wireless/brcm80211/brcmsmac/phy/phy_lcn.c:wlc_lcnphy_deaf_mode()
(line numbers are from linux-next v4.2-rc2)
3391 if (LCNREV_LT(pi->pubpi.phy_rev, 2)) {
3392 mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
3393 mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
3394 } else {
3395 mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
3396 mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
3397 }

Can't figure out what the intent of this condition is but it currently has
no effect as if == else and this most likely is not the intent.


[PATCH v11 20/39] bpf tools: Introduce accessors for struct bpf_program

2015-07-08 Thread Wang Nan
This patch introduces accessors that let users of libbpf retrieve the
section name and fd of an opened/loaded eBPF program. 'struct bpf_prog_handler'
is used for that purpose. Accessors for a program's section name and file
descriptor are provided. Setting and getting private data are also implemented.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: a...@kernel.org
Link: 
http://lkml.kernel.org/r/1435716878-189507-21-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 82 ++
 tools/lib/bpf/libbpf.h | 25 +++
 2 files changed, 107 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d826d5b..b1575c4 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -98,6 +98,10 @@ struct bpf_program {
int nr_reloc;
 
int fd;
+
+   struct bpf_object *obj;
+   void *priv;
+   bpf_program_clear_priv_t clear_priv;
 };
 
 struct bpf_object {
@@ -150,6 +154,12 @@ static void bpf_program__clear(struct bpf_program *prog)
if (!prog)
return;
 
+   if (prog->clear_priv)
+   prog->clear_priv(prog, prog->priv);
+
+   prog->priv = NULL;
+   prog->clear_priv = NULL;
+
bpf_program__unload(prog);
zfree(&prog->section_name);
zfree(&prog->insns);
@@ -224,6 +234,7 @@ bpf_program__new(struct bpf_object *obj, void *data, size_t 
size,
 
obj->programs = progs;
obj->nr_programs = nr_progs + 1;
+   prog.obj = obj;
progs[nr_progs] = prog;
return &progs[nr_progs];
 }
@@ -934,3 +945,74 @@ void bpf_object__close(struct bpf_object *obj)
 
free(obj);
 }
+
+struct bpf_program *
+bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
+{
+   size_t idx;
+
+   if (!obj->programs)
+   return NULL;
+   /* First handler */
+   if (prev == NULL)
+   return &obj->programs[0];
+
+   if (prev->obj != obj) {
+   pr_warning("error: program handler doesn't match object\n");
+   return NULL;
+   }
+
+   idx = (prev - obj->programs) + 1;
+   if (idx >= obj->nr_programs)
+   return NULL;
+   return &obj->programs[idx];
+}
+
+int bpf_program__set_private(struct bpf_program *prog,
+void *priv,
+bpf_program_clear_priv_t clear_priv)
+{
+   if (prog->priv && prog->clear_priv)
+   prog->clear_priv(prog, prog->priv);
+
+   prog->priv = priv;
+   prog->clear_priv = clear_priv;
+   return 0;
+}
+
+int bpf_program__get_private(struct bpf_program *prog, void **ppriv)
+{
+   *ppriv = prog->priv;
+   return 0;
+}
+
+int bpf_program__get_title(struct bpf_program *prog,
+  const char **ptitle, bool dup)
+{
+   const char *title;
+
+   if (!ptitle)
+   return -EINVAL;
+
+   title = prog->section_name;
+   if (dup) {
+   title = strdup(title);
+   if (!title) {
+   pr_warning("failed to strdup program title\n");
+   *ptitle = NULL;
+   return -ENOMEM;
+   }
+   }
+
+   *ptitle = title;
+   return 0;
+}
+
+int bpf_program__get_fd(struct bpf_program *prog, int *pfd)
+{
+   if (!pfd)
+   return -EINVAL;
+
+   *pfd = prog->fd;
+   return 0;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 3e69600..9e0e102 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -9,6 +9,7 @@
 #define __BPF_LIBBPF_H
 
 #include 
+#include 
 
 /*
  * In include/linux/compiler-gcc.h, __printf is defined. However
@@ -34,6 +35,30 @@ void bpf_object__close(struct bpf_object *object);
 int bpf_object__load(struct bpf_object *obj);
 int bpf_object__unload(struct bpf_object *obj);
 
+/* Accessors of bpf_program. */
+struct bpf_program;
+struct bpf_program *bpf_program__next(struct bpf_program *prog,
+ struct bpf_object *obj);
+
+#define bpf_object__for_each_program(pos, obj) \
+   for ((pos) = bpf_program__next(NULL, (obj));\
+(pos) != NULL; \
+(pos) = bpf_program__next((pos), (obj)))
+
+typedef void (*bpf_program_clear_priv_t)(struct bpf_program *,
+void *);
+
+int bpf_program__set_private(struct bpf_program *prog, void *priv,
+bpf_program_clear_priv_t clear_priv);
+
+int bpf_program__get_private(struct bpf_program *prog,
+void **ppriv);
+
+int bpf_program__get_title(struct bpf_program *prog,
+  const char **ptitle, bool dup);
+
+int 
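
For illustration only (not part of the patch above, which is cut off in this
archive): a minimal sketch of the iterator pattern that bpf_program__next()
and bpf_object__for_each_program() describe. Every demo_* name is hypothetical,
and the two structs are stand-ins that carry only the fields the iterator
touches; the real libbpf structures are opaque to callers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the opaque libbpf types. */
struct demo_object;

struct demo_program {
	struct demo_object *obj;	/* back pointer, used to validate 'prev' */
	int fd;
};

struct demo_object {
	struct demo_program *programs;	/* contiguous array of programs */
	size_t nr_programs;
};

/* Return the program following 'prev', or the first one when prev is NULL. */
struct demo_program *
demo_program__next(struct demo_program *prev, struct demo_object *obj)
{
	size_t idx;

	if (!obj->programs)
		return NULL;
	if (prev == NULL)
		return &obj->programs[0];
	if (prev->obj != obj)		/* handle belongs to another object */
		return NULL;

	idx = (size_t)(prev - obj->programs) + 1;
	if (idx >= obj->nr_programs)
		return NULL;
	return &obj->programs[idx];
}

#define demo_object__for_each_program(pos, obj)		\
	for ((pos) = demo_program__next(NULL, (obj));		\
	     (pos) != NULL;					\
	     (pos) = demo_program__next((pos), (obj)))

/* Walk all programs with the iterator macro and count them. */
size_t demo_count_programs(struct demo_object *obj)
{
	struct demo_program *pos;
	size_t n = 0;

	demo_object__for_each_program(pos, obj)
		n++;
	return n;
}
```

Because the programs live in one array, the next element can be found with
pointer arithmetic on 'prev'; the back pointer is what lets the iterator
reject a handle that came from a different object.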

[PATCH v11 04/39] bpf tools: Allow caller to set printing function

2015-07-08 Thread Wang Nan
With libbpf_set_print(), users of libbpf can register their own
debug, info and warning printing functions. Libbpf will use those
functions to print messages. If none are provided, the default info and
warning printing functions are fprintf(stderr, ...); the default debug
printing function is NULL.

This API is designed to be used by perf, letting it register its own
logging functions so that all logs are uniform, instead of having separate
logging level control.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-5-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 40 
 tools/lib/bpf/libbpf.h | 12 
 2 files changed, 52 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index c08d6bc..6f0c13a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -7,8 +7,48 @@
  */
 
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
 
 #include "libbpf.h"
+
+#define __printf(a, b) __attribute__((format(printf, a, b)))
+
+__printf(1, 2)
+static int __base_pr(const char *format, ...)
+{
+   va_list args;
+   int err;
+
+   va_start(args, format);
+   err = vfprintf(stderr, format, args);
+   va_end(args);
+   return err;
+}
+
+static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
+static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
+static __printf(1, 2) libbpf_print_fn_t __pr_debug;
+
+#define __pr(func, fmt, ...)   \
+do {   \
+   if ((func)) \
+   (func)("libbpf: " fmt, ##__VA_ARGS__); \
+} while (0)
+
+#define pr_warning(fmt, ...)   __pr(__pr_warning, fmt, ##__VA_ARGS__)
+#define pr_info(fmt, ...)  __pr(__pr_info, fmt, ##__VA_ARGS__)
+#define pr_debug(fmt, ...) __pr(__pr_debug, fmt, ##__VA_ARGS__)
+
+void libbpf_set_print(libbpf_print_fn_t warn,
+ libbpf_print_fn_t info,
+ libbpf_print_fn_t debug)
+{
+   __pr_warning = warn;
+   __pr_info = info;
+   __pr_debug = debug;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index a6f46d9..8d1eeba 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -8,4 +8,16 @@
 #ifndef __BPF_LIBBPF_H
 #define __BPF_LIBBPF_H
 
+/*
+ * In include/linux/compiler-gcc.h, __printf is defined. However
+ * it should be better if libbpf.h doesn't depend on Linux header file.
+ * So instead of __printf, here we use gcc attribute directly.
+ */
+typedef int (*libbpf_print_fn_t)(const char *, ...)
+   __attribute__((format(printf, 1, 2)));
+
+void libbpf_set_print(libbpf_print_fn_t warn,
+ libbpf_print_fn_t info,
+ libbpf_print_fn_t debug);
+
 #endif
-- 
1.8.3.4
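
For illustration only (not part of the patch above): a minimal sketch of the
pluggable-printer pattern that libbpf_set_print() uses, assuming GCC or clang
(it relies on the format attribute and on ##__VA_ARGS__). All demo_* names are
hypothetical; the call counter exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* printf-style callback type, checked by the compiler like printf. */
typedef int (*demo_print_fn_t)(const char *, ...)
	__attribute__((format(printf, 1, 2)));

static int demo_calls;	/* how many times the callback actually ran */

/* Example callback: count the call, then forward to stderr. */
__attribute__((format(printf, 1, 2)))
int demo_counting_print(const char *format, ...)
{
	va_list args;
	int n;

	demo_calls++;
	va_start(args, format);
	n = vfprintf(stderr, format, args);
	va_end(args);
	return n;
}

static demo_print_fn_t demo_pr_warning = demo_counting_print;
static demo_print_fn_t demo_pr_debug;	/* NULL: debug is silent by default */

/* Call the registered function only if one is set. */
#define demo_pr(func, fmt, ...)				\
do {								\
	if ((func))						\
		(func)("demo: " fmt, ##__VA_ARGS__);		\
} while (0)

/* Replace both printers; passing NULL silences that level. */
void demo_set_print(demo_print_fn_t warn, demo_print_fn_t debug)
{
	demo_pr_warning = warn;
	demo_pr_debug = debug;
}

/* Emit one message per level and report the running callback count. */
int demo_emit(void)
{
	demo_pr(demo_pr_warning, "warning %d\n", 1);
	demo_pr(demo_pr_debug, "debug %d\n", 2);
	return demo_calls;
}
```

The NULL check in the macro is what makes "no debug output by default"
cheap: a level with no registered function costs one pointer test.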



Re: [PATCH RESEND] Documentation: Add MCB documentation

2015-07-08 Thread Johannes Thumshirn
Jonathan Corbet  writes:

> On Tue,  7 Jul 2015 10:37:08 +0200
> Johannes Thumshirn  wrote:
>
>> The 1st version of this patch was sent on Feb 16
>> (https://lkml.org/lkml/2015/2/16/31) and it seems like it missed v4.1 and I
>> couldn't find it in the docs-next tree as well, that's the reason for the
>> re-send.
>
> Very weird.  I replied at the time that I had applied it, but it's sure
> not there now.  Unless, chameleon-like, it's hiding?
>
> Now that I'm paying attention again, could I ask for a name change for the
> file?  MCB.txt doesn't really tell anybody anything.  How about
> chameleon-bus.txt or something like that?

Sure I can, but as the source for it is in drivers/mcb/ I think the link
is there? OTOH you're the docs maintainer, so I'll do whatever you want me to
do. Just tell me how you want it named and I'll re-spin (and include the
documentation into MAINTAINERS as well).

> Thanks (and sorry),

No problem, it's not an ultra important bug fix after all ;-)

   Johannes


[PATCH v11 05/39] bpf tools: Open eBPF object file and do basic validation

2015-07-08 Thread Wang Nan
This patch defines the basic interface of libbpf. 'struct bpf_object' will
be the handler of each object file. Its internal structure is hidden from
the user. eBPF object files are compiled by LLVM into ELF format. In this
patch, libelf is used to open those files, read the EHDR and do basic
validation according to e_type and e_machine.

All ELF-related stuff is grouped together and resides in the efile field of
'struct bpf_object'. bpf_object__elf_finish() is introduced to clear it.

After all eBPF programs in an object file are loaded, the related ELF
information is useless. Close the object file and free that memory.

The zfree() and zclose() functions are introduced to ensure that pointers
are set to NULL and file descriptors to negative values after resources
are released.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-6-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 158 +
 tools/lib/bpf/libbpf.h |   8 +++
 2 files changed, 166 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6f0c13a..9e44608 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -11,8 +11,12 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
+#include 
+#include 
 
 #include "libbpf.h"
 
@@ -52,3 +56,157 @@ void libbpf_set_print(libbpf_print_fn_t warn,
__pr_info = info;
__pr_debug = debug;
 }
+
+/* Copied from tools/perf/util/util.h */
+#ifndef zfree
+# define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
+#endif
+
+#ifndef zclose
+# define zclose(fd) ({ \
+   int ___err = 0; \
+   if ((fd) >= 0)  \
+   ___err = close((fd));   \
+   fd = -1;\
+   ___err; })
+#endif
+
+#ifdef HAVE_LIBELF_MMAP_SUPPORT
+# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ_MMAP
+#else
+# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ
+#endif
+
+struct bpf_object {
+   /*
+* Information when doing elf related work. Only valid if fd
+* is valid.
+*/
+   struct {
+   int fd;
+   Elf *elf;
+   GElf_Ehdr ehdr;
+   } efile;
+   char path[];
+};
+#define obj_elf_valid(o)   ((o)->efile.elf)
+
+static struct bpf_object *bpf_object__new(const char *path)
+{
+   struct bpf_object *obj;
+
+   obj = calloc(1, sizeof(struct bpf_object) + strlen(path) + 1);
+   if (!obj) {
+   pr_warning("alloc memory failed for %s\n", path);
+   return NULL;
+   }
+
+   strcpy(obj->path, path);
+   obj->efile.fd = -1;
+   return obj;
+}
+
+static void bpf_object__elf_finish(struct bpf_object *obj)
+{
+   if (!obj_elf_valid(obj))
+   return;
+
+   if (obj->efile.elf) {
+   elf_end(obj->efile.elf);
+   obj->efile.elf = NULL;
+   }
+   zclose(obj->efile.fd);
+}
+
+static int bpf_object__elf_init(struct bpf_object *obj)
+{
+   int err = 0;
+   GElf_Ehdr *ep;
+
+   if (obj_elf_valid(obj)) {
+   pr_warning("elf init: internal error\n");
+   return -EEXIST;
+   }
+
+   obj->efile.fd = open(obj->path, O_RDONLY);
+   if (obj->efile.fd < 0) {
+   pr_warning("failed to open %s: %s\n", obj->path,
+   strerror(errno));
+   return -errno;
+   }
+
+   obj->efile.elf = elf_begin(obj->efile.fd,
+LIBBPF_ELF_C_READ_MMAP,
+NULL);
+   if (!obj->efile.elf) {
+   pr_warning("failed to open %s as ELF file\n",
+   obj->path);
+   err = -EINVAL;
+   goto errout;
+   }
+
+   if (!gelf_getehdr(obj->efile.elf, &obj->efile.ehdr)) {
+   pr_warning("failed to get EHDR from %s\n",
+   obj->path);
+   err = -EINVAL;
+   goto errout;
+   }
+   ep = &obj->efile.ehdr;
+
+   if ((ep->e_type != ET_REL) || (ep->e_machine != 0)) {
+   pr_warning("%s is not an eBPF object file\n",
+   obj->path);
+   err = -EINVAL;
+   goto errout;
+   }
+
+   return 0;
+errout:
+   bpf_object__elf_finish(obj);
+   return err;
+}
+
+static struct bpf_object *
+__bpf_object__open(const char *path)
+{
+   struct bpf_object *obj;
+
+   if (elf_version(EV_CURRENT) == EV_NONE) {
+   pr_warning("failed to init libelf for %s\n", path);
+   return NULL;
+   }
+
+   obj = bpf_object__new(path);
+   if (!obj)
+   return NULL;
+
+   
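
For illustration only (not part of the patch above, which is cut off in this
archive): a minimal sketch of why the zfree()/zclose() helpers make a release
path safe to run twice. The macros have the same shape as those in the patch
and need GNU statement expressions (GCC or clang); struct demo_res and
demo_release() are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Free the pointee and reset the pointer, so a second zfree() frees NULL. */
#define zfree(ptr) ({ free(*(ptr)); *(ptr) = NULL; })

/* Close a valid descriptor and reset it to -1; evaluates to close()'s result. */
#define zclose(fd) ({			\
	int ___err = 0;			\
	if ((fd) >= 0)			\
		___err = close((fd));	\
	fd = -1;			\
	___err; })

/* Hypothetical resource holder. */
struct demo_res {
	char *buf;
	int fd;
};

/* Release everything; deliberately does the work twice to show it is safe. */
int demo_release(struct demo_res *r)
{
	int err;

	zfree(&r->buf);
	err = zclose(r->fd);

	/* Second pass: free(NULL) is a no-op and fd is already -1. */
	zfree(&r->buf);
	zclose(r->fd);

	return err;
}
```

Without the reset, the second pass would be a double free and a close() on a
descriptor number that may already belong to something else.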

[PATCH v11 15/39] bpf tools: Add bpf.c/h for common bpf operations

2015-07-08 Thread Wang Nan
This patch introduces bpf.c and bpf.h, which hold common functions
issuing the bpf syscall. The goal of these two files is to hide the syscall
completely from the user. Note that bpf.c and bpf.h deal with the kernel
interface only. Things like the structure of the 'map' section in the ELF
object are not the concern of bpf.[ch].

We first introduce bpf_create_map().

Note that, since functions in bpf.[ch] are wrappers of sys_bpf, they
don't use OO style naming.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-16-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/Build |  2 +-
 tools/lib/bpf/bpf.c | 51 +++
 tools/lib/bpf/bpf.h | 16 
 3 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/bpf/bpf.c
 create mode 100644 tools/lib/bpf/bpf.h

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index a316484..d874975 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o
+libbpf-y := libbpf.o bpf.o
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
new file mode 100644
index 000..208de7c3
--- /dev/null
+++ b/tools/lib/bpf/bpf.c
@@ -0,0 +1,51 @@
+/*
+ * common eBPF ELF operations.
+ *
+ * Copyright (C) 2013-2015 Alexei Starovoitov 
+ * Copyright (C) 2015 Wang Nan 
+ * Copyright (C) 2015 Huawei Inc.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf.h"
+
+/*
+ * When building perf, unistd.h is override. Define __NR_bpf is
+ * required to be defined.
+ */
+#ifndef __NR_bpf
+# if defined(__i386__)
+#  define __NR_bpf 357
+# elif defined(__x86_64__)
+#  define __NR_bpf 321
+# elif defined(__aarch64__)
+#  define __NR_bpf 280
+# else
+#  error __NR_bpf not defined. libbpf does not support your arch.
+# endif
+#endif
+
+static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
+  unsigned int size)
+{
+   return syscall(__NR_bpf, cmd, attr, size);
+}
+
+int bpf_create_map(enum bpf_map_type map_type, int key_size,
+  int value_size, int max_entries)
+{
+   union bpf_attr attr;
+
+   memset(&attr, '\0', sizeof(attr));
+
+   attr.map_type = map_type;
+   attr.key_size = key_size;
+   attr.value_size = value_size;
+   attr.max_entries = max_entries;
+
+   return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
new file mode 100644
index 000..28f7942
--- /dev/null
+++ b/tools/lib/bpf/bpf.h
@@ -0,0 +1,16 @@
+/*
+ * common eBPF ELF operations.
+ *
+ * Copyright (C) 2013-2015 Alexei Starovoitov 
+ * Copyright (C) 2015 Wang Nan 
+ * Copyright (C) 2015 Huawei Inc.
+ */
+#ifndef __BPF_BPF_H
+#define __BPF_BPF_H
+
+#include 
+
+int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
+  int max_entries);
+
+#endif
-- 
1.8.3.4



[PATCH v11 14/39] bpf tools: Record map accessing instructions for each program

2015-07-08 Thread Wang Nan
This patch records the indices of instructions that need to be
relocated. That information is saved in the 'reloc_desc' field in
'struct bpf_program'. In the loading phase (this patch takes effect in
the opening phase), the collected instructions will be replaced by map
loading instructions.

Since we are going to close the ELF file and clear all data at the end
of the 'opening' phase, the ELF information will no longer be valid in
the 'loading' phase. We have to locate the instructions before maps are
loaded, instead of directly modifying the instruction.

'struct bpf_map_def' is introduced in this patch to let us know how many
maps are defined in the object.

This is the third part of map relocation. The principle of map relocation
is described in commit message of 'bpf tools: Collect symbol table from
SHT_SYMTAB section'.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-15-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 124 +
 tools/lib/bpf/libbpf.h |  13 ++
 2 files changed, 137 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5f12fa6..4f13772 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -88,6 +89,12 @@ struct bpf_program {
char *section_name;
struct bpf_insn *insns;
size_t insns_cnt;
+
+   struct {
+   int insn_idx;
+   int map_idx;
+   } *reloc_desc;
+   int nr_reloc;
 };
 
 struct bpf_object {
@@ -127,6 +134,9 @@ static void bpf_program__clear(struct bpf_program *prog)
 
zfree(&prog->section_name);
zfree(&prog->insns);
+   zfree(&prog->reloc_desc);
+
+   prog->nr_reloc = 0;
prog->insns_cnt = 0;
prog->idx = -1;
 }
@@ -484,6 +494,118 @@ out:
return err;
 }
 
+static struct bpf_program *
+bpf_object__find_prog_by_idx(struct bpf_object *obj, int idx)
+{
+   struct bpf_program *prog;
+   size_t i;
+
+   for (i = 0; i < obj->nr_programs; i++) {
+   prog = &obj->programs[i];
+   if (prog->idx == idx)
+   return prog;
+   }
+   return NULL;
+}
+
+static int
+bpf_program__collect_reloc(struct bpf_program *prog,
+  size_t nr_maps, GElf_Shdr *shdr,
+  Elf_Data *data, Elf_Data *symbols)
+{
+   int i, nrels;
+
+   pr_debug("collecting relocating info for: '%s'\n",
+prog->section_name);
+   nrels = shdr->sh_size / shdr->sh_entsize;
+
+   prog->reloc_desc = malloc(sizeof(*prog->reloc_desc) * nrels);
+   if (!prog->reloc_desc) {
+   pr_warning("failed to alloc memory in relocation\n");
+   return -ENOMEM;
+   }
+   prog->nr_reloc = nrels;
+
+   for (i = 0; i < nrels; i++) {
+   GElf_Sym sym;
+   GElf_Rel rel;
+   unsigned int insn_idx;
+   struct bpf_insn *insns = prog->insns;
+   size_t map_idx;
+
+   if (!gelf_getrel(data, i, &rel)) {
+   pr_warning("relocation: failed to get %d reloc\n", i);
+   return -EINVAL;
+   }
+
+   insn_idx = rel.r_offset / sizeof(struct bpf_insn);
+   pr_debug("relocation: insn_idx=%u\n", insn_idx);
+
+   if (!gelf_getsym(symbols,
+GELF_R_SYM(rel.r_info),
+&sym)) {
+   pr_warning("relocation: symbol %"PRIx64" not found\n",
+  GELF_R_SYM(rel.r_info));
+   return -EINVAL;
+   }
+
+   if (insns[insn_idx].code != (BPF_LD | BPF_IMM | BPF_DW)) {
+   pr_warning("bpf: relocation: invalid relo for insns[%d].code 0x%x\n",
+  insn_idx, insns[insn_idx].code);
+   return -EINVAL;
+   }
+
+   map_idx = sym.st_value / sizeof(struct bpf_map_def);
+   if (map_idx >= nr_maps) {
+   pr_warning("bpf relocation: map_idx %d large than %d\n",
+  (int)map_idx, (int)nr_maps - 1);
+   return -EINVAL;
+   }
+
+   prog->reloc_desc[i].insn_idx = insn_idx;
+   prog->reloc_desc[i].map_idx = map_idx;
+   }
+   return 0;
+}
+
+static int bpf_object__collect_reloc(struct bpf_object *obj)
+{
+   int i, err;
+
+   if (!obj_elf_valid(obj)) {
+   pr_warning("Internal error: elf object 
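
For illustration only (not part of the patch above, which is cut off in this
archive): a minimal sketch of how one relocation entry is turned into an
(insn_idx, map_idx) pair. The demo_* structs are hypothetical stand-ins: the
instruction is assumed to be 8 bytes and the map definition four unsigned
ints. The insn_idx bounds check is an addition of this sketch and does not
appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layouts, for illustration only. */
struct demo_insn {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

struct demo_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

struct demo_reloc {
	int insn_idx;
	int map_idx;
};

#define DEMO_LD_IMM64 0x18	/* BPF_LD | BPF_IMM | BPF_DW */

/*
 * r_offset is the byte offset of the instruction inside the program
 * section; st_value is the byte offset of the symbol inside the maps
 * section. Dividing by the element size gives the array indices.
 */
int demo_collect_reloc(const struct demo_insn *insns, size_t insns_cnt,
		       size_t nr_maps, uint64_t r_offset, uint64_t st_value,
		       struct demo_reloc *out)
{
	size_t insn_idx = r_offset / sizeof(struct demo_insn);
	size_t map_idx = st_value / sizeof(struct demo_map_def);

	if (insn_idx >= insns_cnt)	/* extra check, not in the patch */
		return -1;
	if (insns[insn_idx].code != DEMO_LD_IMM64)
		return -1;		/* only 64-bit immediate loads qualify */
	if (map_idx >= nr_maps)
		return -1;

	out->insn_idx = (int)insn_idx;
	out->map_idx = (int)map_idx;
	return 0;
}
```

Storing indices rather than pointers into the ELF data is the point of the
patch: the indices stay valid after the ELF file has been closed.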

[PATCH v11 21/39] bpf tools: Introduce accessors for struct bpf_object

2015-07-08 Thread Wang Nan
This patch adds an accessor which allows the caller to get the count of
programs in an object file.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-22-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 9 +
 tools/lib/bpf/libbpf.h | 3 +++
 2 files changed, 12 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b1575c4..b58b13b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -946,6 +946,15 @@ void bpf_object__close(struct bpf_object *obj)
free(obj);
 }
 
+int bpf_object__get_prog_cnt(struct bpf_object *obj, size_t *pcnt)
+{
+   if (!obj || !pcnt)
+   return -EINVAL;
+
+   *pcnt = obj->nr_programs;
+   return 0;
+}
+
 struct bpf_program *
 bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 9e0e102..a20ae2e 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -35,6 +35,9 @@ void bpf_object__close(struct bpf_object *object);
 int bpf_object__load(struct bpf_object *obj);
 int bpf_object__unload(struct bpf_object *obj);
 
+/* Accessors of bpf_object */
+int bpf_object__get_prog_cnt(struct bpf_object *obj, size_t *pcnt);
+
 /* Accessors of bpf_program. */
 struct bpf_program;
 struct bpf_program *bpf_program__next(struct bpf_program *prog,
-- 
1.8.3.4



Re: [PATCHv3 08/16] staging: vme_user: provide DMA functionality

2015-07-08 Thread Martyn Welch



On 06/07/15 18:24, Dmitry Kalinkin wrote:

>> Some functionality was dropped as it was not good practice
>> (such as receiving VME interrupts in user space, it's not really doable if
>> the slave card is Release On Register Access rather than Release on
>> Acknowledge),
>
> Didn't know about RORA. I wonder how different this is compared to the
> PCI bus case.


Little I suspect. What it does mean is that there's no generic mechanism 
for clearing down an interrupt, so a device specific interrupt routine 
is required, which needs to be in kernel space.



>> so the interface became more of a debug mechanism for me.
>> Others have clearly found it provides enough for them to allow drivers to be
>> written in user space.
>>
>> I was thinking that the opposite might be better, no windows were mapped at
>> module load, windows could be allocated and mapped using the control device.
>> This would ensure that unused resources were still available for kernel
>> based drivers and would mean the driver wouldn't be pre-allocating a bunch
>> of fairly substantially sized slave window buffers (the buffers could also
>> be allocated to match the size of the slave window requested). What do you
>> think?
>
> I'm not a VME expert, but it seems that VME windows are a quite limited resource
> no matter how you allocate your resources. Theoretically we could put up to 32
> different boards in a single crate, so there won't be enough windows for each
> driver to allocate. That said, there is no way around this when putting together
> a really heterogeneous VME system. To overcome such a problem, one could
> develop a different kernel API that would not provide windows to the drivers, but
> handle reads and writes by reconfiguring windows on the fly, which in turn would
> introduce more latency. Those who need such an API are welcome to develop it :)



The aim of the existing APIs is to provide a mechanism for allocating 
resources. You're right, the resources are limited when scaling to a 32 
slot crate. There's a number of ways to share the resources, though they 
tend to all have trade offs. My experience has been that the majority of 
VME systems don't tend to stretch up to 32 cards.



> As for dynamic vme_user device allocation, I don't see the point in this.
> The only existing kernel VME driver allocates windows in advance; the user just
> has to make sure to leave one free window if she wants to use that. A module
> parameter for the window count will be dynamic enough to handle that.


If vme_user grabs all the VME windows, there are no windows available
for any kernel-level VME drivers to use. If a kernel-level driver loads
before vme_user and is allocated a window, and vme_user then demands 8
windows, vme_user fails to load (it does not yet cope gracefully with
some windows already having been allocated). Dynamic allocation would
leave "unused" resources available rather than prospectively hogging them.


--
Martyn Welch (Lead Software Engineer)  | Registered in England and Wales
GE Intelligent Platforms   | (3828642) at 100 Barbirolli Square
T +44(0)1327322748 | Manchester, M2 3AB
E martyn.we...@ge.com  | VAT:GB 927559189


[PATCH v11 19/39] bpf tools: Load eBPF programs in object files into kernel

2015-07-08 Thread Wang Nan
This patch uses the previously introduced bpf_load_program() to load
programs in the ELF file into the kernel. The result is stored in the 'fd'
field of 'struct bpf_program'.

During loading, it allocates a log buffer and frees it before returning.
Note that the buffer is not passed to bpf_load_program() if the first
loading try is successful. A statically allocated log buffer is not used,
to avoid potential multi-threading problems.

Instructions collected during opening are cleared after loading.

load_program() is created for loading a 'struct bpf_insn' array into the
kernel; bpf_program__load() calls it. With this design we have a function
that loads instructions into the kernel. It will be used by further
patches, which create different instances from a program and load them
into the kernel.
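
As a rough sketch of the per-call log buffer idea described above (not
the libbpf code itself): the loader is passed in as a function pointer
so the example stays self-contained. 'demo_load_fn', the demo_* helpers
and DEMO_LOG_BUF_SIZE are all hypothetical names; the stub loaders
merely imitate a success and a verifier rejection.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_LOG_BUF_SIZE 4096	/* hypothetical size, not BPF_LOG_BUF_SIZE */

/*
 * Hypothetical loader standing in for bpf_load_program(): returns a
 * non-negative "fd" on success, or -1 after filling the log on failure.
 */
typedef int (*demo_load_fn)(const void *insns, int insns_cnt,
			    char *log_buf, size_t log_buf_sz);

static int demo_load_program(demo_load_fn load, const void *insns,
			     int insns_cnt, int *pfd)
{
	char *log_buf;
	int ret;

	if (!load || !insns || insns_cnt <= 0 || !pfd)
		return -EINVAL;

	/* Heap buffer per call: no static state shared between threads. */
	log_buf = malloc(DEMO_LOG_BUF_SIZE);
	if (log_buf)
		log_buf[0] = '\0';

	ret = load(insns, insns_cnt, log_buf, log_buf ? DEMO_LOG_BUF_SIZE : 0);
	if (ret >= 0) {
		*pfd = ret;
		ret = 0;
	} else {
		if (log_buf)
			fprintf(stderr, "load failed, log:\n%s\n", log_buf);
		ret = -EINVAL;
	}

	free(log_buf);	/* free(NULL) is a no-op */
	return ret;
}

/* Stub that pretends the kernel accepted the program and returned fd 7. */
static int demo_loader_ok(const void *insns, int cnt, char *log, size_t sz)
{
	(void)insns; (void)cnt; (void)log; (void)sz;
	return 7;
}

/* Stub that pretends the verifier rejected the program. */
static int demo_loader_fail(const void *insns, int cnt, char *log, size_t sz)
{
	(void)insns; (void)cnt;
	if (log && sz)
		snprintf(log, sz, "demo verifier rejected the program");
	return -1;
}
```

The output fd is only written on success, so a caller can keep its
own "-1 means not loaded" convention across a failed attempt.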

Signed-off-by: Wang Nan 
Cc: 
Cc: 
Cc: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Link: http://lkml.kernel.org/r/1435716878-189507-20-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 90 ++
 1 file changed, 90 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index cd40ae0..d826d5b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -96,6 +96,8 @@ struct bpf_program {
int map_idx;
} *reloc_desc;
int nr_reloc;
+
+   int fd;
 };
 
 struct bpf_object {
@@ -135,11 +137,20 @@ struct bpf_object {
 };
 #define obj_elf_valid(o)   ((o)->efile.elf)
 
+static void bpf_program__unload(struct bpf_program *prog)
+{
+   if (!prog)
+   return;
+
+   zclose(prog->fd);
+}
+
 static void bpf_program__clear(struct bpf_program *prog)
 {
if (!prog)
return;
 
+   bpf_program__unload(prog);
zfree(&prog->section_name);
zfree(&prog->insns);
zfree(&prog->reloc_desc);
@@ -176,6 +187,7 @@ __bpf_program__new(void *data, size_t size, char *name, int idx,
memcpy(prog->insns, data,
   prog->insns_cnt * sizeof(struct bpf_insn));
prog->idx = idx;
+   prog->fd = -1;
 
return 0;
 errout:
@@ -721,6 +733,79 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
return 0;
 }
 
+static int
+load_program(struct bpf_insn *insns, int insns_cnt,
+char *license, u32 kern_version, int *pfd)
+{
+   int ret;
+   char *log_buf;
+
+   if (!insns || !insns_cnt)
+   return -EINVAL;
+
+   log_buf = malloc(BPF_LOG_BUF_SIZE);
+   if (!log_buf)
+   pr_warning("Alloc log buffer for bpf loader error, continue without log\n");
+
+   ret = bpf_load_program(BPF_PROG_TYPE_KPROBE, insns,
+  insns_cnt, license, kern_version,
+  log_buf, BPF_LOG_BUF_SIZE);
+
+   if (ret >= 0) {
+   *pfd = ret;
+   ret = 0;
+   goto out;
+   }
+
+   ret = -EINVAL;
+   pr_warning("load bpf program failed: %s\n", strerror(errno));
+
+   if (log_buf) {
+   pr_warning("-- BEGIN DUMP LOG ---\n");
+   pr_warning("\n%s\n", log_buf);
+   pr_warning("-- END LOG --\n");
+   }
+
+out:
+   free(log_buf);
+   return ret;
+}
+
+static int
+bpf_program__load(struct bpf_program *prog,
+ char *license, u32 kern_version)
+{
+   int err, fd;
+
+   err = load_program(prog->insns, prog->insns_cnt,
+                      license, kern_version, &fd);
+   if (!err)
+   prog->fd = fd;
+
+   if (err)
+   pr_warning("failed to load program '%s'\n",
+  prog->section_name);
+   zfree(&prog->insns);
+   prog->insns_cnt = 0;
+   return err;
+}
+
+static int
+bpf_object__load_progs(struct bpf_object *obj)
+{
+   size_t i;
+   int err;
+
+   for (i = 0; i < obj->nr_programs; i++) {
+   err = bpf_program__load(&obj->programs[i],
+   obj->license,
+   obj->kern_version);
+   if (err)
+   return err;
+   }
+   return 0;
+}
+
 static int bpf_object__validate(struct bpf_object *obj)
 {
if (obj->kern_version == 0) {
@@ -798,6 +883,9 @@ int bpf_object__unload(struct bpf_object *obj)
zfree(&obj->map_fds);
obj->nr_map_fds = 0;
 
+   for (i = 0; i < obj->nr_programs; i++)
+   bpf_program__unload(&obj->programs[i]);
+
return 0;
 }
 
@@ -816,6 +904,8 @@ int bpf_object__load(struct bpf_object *obj)
goto out;
if (bpf_object__relocate(obj))
goto out;
+   if (bpf_object__load_progs(obj))
+   goto out;
 
return 0;
 out:
-- 
1.8.3.4


[PATCH v11 09/39] bpf tools: Collect version and license from ELF sections

2015-07-08 Thread Wang Nan
Expand bpf_object__elf_collect() to collect license and kernel version
information from the eBPF object file. An eBPF object file should have a
section named 'license', which contains a string. It should also have a
section named 'version', which contains a u32 LINUX_VERSION_CODE.

bpf_object__validate() is introduced to validate the object file after it
is loaded. Currently it only checks for the existence of the 'version'
section.
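
A minimal sketch of the two section handlers and the validation step,
under the assumption that the object starts out zeroed. 'struct
demo_object' and the demo_* functions are hypothetical stand-ins; the
explicit NUL termination is an addition of this sketch rather than
something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two fields the patch adds to struct bpf_object. */
struct demo_object {
	char license[64];
	uint32_t kern_version;
};

/* Copy at most sizeof(license) - 1 bytes and terminate explicitly. */
static int demo_init_license(struct demo_object *obj, const void *data,
			     size_t size)
{
	size_t max = sizeof(obj->license) - 1;
	size_t n = size < max ? size : max;

	memcpy(obj->license, data, n);
	obj->license[n] = '\0';
	return 0;
}

/* The 'version' section must be exactly one u32. */
static int demo_init_kversion(struct demo_object *obj, const void *data,
			      size_t size)
{
	uint32_t kver;

	if (size != sizeof(kver))
		return -EINVAL;
	memcpy(&kver, data, sizeof(kver));	/* section data may be unaligned */
	obj->kern_version = kver;
	return 0;
}

/* A zero version is treated as "the 'version' section was missing". */
static int demo_validate(const struct demo_object *obj)
{
	return obj->kern_version == 0 ? -EINVAL : 0;
}
```

Using memcpy() for the version avoids dereferencing a possibly
misaligned pointer into the ELF section data.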

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-10-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 53 ++
 1 file changed, 53 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d8d6eb5..95c8d8e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -78,6 +79,8 @@ void libbpf_set_print(libbpf_print_fn_t warn,
 #endif
 
 struct bpf_object {
+   char license[64];
+   u32 kern_version;
/*
 * Information when doing elf related work. Only valid if fd
 * is valid.
@@ -220,6 +223,33 @@ mismatch:
return -EINVAL;
 }
 
+static int
+bpf_object__init_license(struct bpf_object *obj,
+void *data, size_t size)
+{
+   memcpy(obj->license, data,
+  min(size, sizeof(obj->license) - 1));
+   pr_debug("license of %s is %s\n", obj->path, obj->license);
+   return 0;
+}
+
+static int
+bpf_object__init_kversion(struct bpf_object *obj,
+ void *data, size_t size)
+{
+   u32 kver;
+
+   if (size != sizeof(kver)) {
+   pr_warning("invalid kver section in %s\n", obj->path);
+   return -EINVAL;
+   }
+   memcpy(&kver, data, sizeof(kver));
+   obj->kern_version = kver;
+   pr_debug("kernel version of %s is %x\n", obj->path,
+obj->kern_version);
+   return 0;
+}
+
 static int bpf_object__elf_collect(struct bpf_object *obj)
 {
Elf *elf = obj->efile.elf;
@@ -266,11 +296,32 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 name, (unsigned long)data->d_size,
 (int)sh.sh_link, (unsigned long)sh.sh_flags,
 (int)sh.sh_type);
+
+   if (strcmp(name, "license") == 0)
+   err = bpf_object__init_license(obj,
+  data->d_buf,
+  data->d_size);
+   else if (strcmp(name, "version") == 0)
+   err = bpf_object__init_kversion(obj,
+   data->d_buf,
+   data->d_size);
+   if (err)
+   goto out;
}
 out:
return err;
 }
 
+static int bpf_object__validate(struct bpf_object *obj)
+{
+   if (obj->kern_version == 0) {
+   pr_warning("%s doesn't provide kernel version\n",
+  obj->path);
+   return -EINVAL;
+   }
+   return 0;
+}
+
 static struct bpf_object *
 __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz)
 {
@@ -291,6 +342,8 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz)
goto out;
if (bpf_object__elf_collect(obj))
goto out;
+   if (bpf_object__validate(obj))
+   goto out;
 
bpf_object__elf_finish(obj);
return obj;
-- 
1.8.3.4



Re: [PATCH V3 0/5] Allow user to request memory to be locked on page fault

2015-07-08 Thread Eric B Munson
On Tue, 07 Jul 2015, Andrew Morton wrote:

> On Tue,  7 Jul 2015 13:03:38 -0400 Eric B Munson  wrote:
> 
> > mlock() allows a user to control page out of program memory, but this
> > comes at the cost of faulting in the entire mapping when it is
> > allocated.  For large mappings where the entire area is not necessary
> > this is not ideal.  Instead of forcing all locked pages to be present
> > when they are allocated, this set creates a middle ground.  Pages are
> > marked to be placed on the unevictable LRU (locked) when they are first
> > used, but they are not faulted in by the mlock call.
> > 
> > This series introduces a new mlock() system call that takes a flags
> > argument along with the start address and size.  This flags argument
> > gives the caller the ability to request memory be locked in the
> > traditional way, or to be locked after the page is faulted in.  New
> > calls are added for munlock() and munlockall() which give the caller a
> > way to specify which flags are supposed to be cleared.  A new MCL flag
> > is added to mirror the lock on fault behavior from mlock() in
> > mlockall().  Finally, a flag for mmap() is added that allows a user to
> > specify that the covered area should not be paged out, but only after the
> > memory has been used the first time.
> 
> Thanks for sticking with this.  Adding new syscalls is a bit of a
> hassle but I do think we end up with a better interface - the existing
> mlock/munlock/mlockall interfaces just aren't appropriate for these
> things.
> 
> I don't know whether these syscalls should be documented via new
> manpages, or if we should instead add them to the existing
> mlock/munlock/mlockall manpages.  Michael, could you please advise?
> 

Thanks for adding the series.  I owe you several updates (getting the
new syscall right for all architectures and a set of tests for the new
syscalls).  Would you prefer a new pair of patches, or should I update
this set?

Eric




[PATCH v11 17/39] bpf tools: Relocate eBPF programs

2015-07-08 Thread Wang Nan
If an eBPF program accesses a map, LLVM generates a load instruction
which loads an absolute address into a register, like this:

  ld_64   r1, 
  ...
  call    2

That ld_64 instruction will be recorded in the relocation section.
To enable the usage of that map, relocation must be done by replacing
the immediate value with the real map file descriptor so it can be found
by eBPF map functions.

This patch does the relocation work based on information collected by
patches:

'bpf tools: Collect symbol table from SHT_SYMTAB section',
'bpf tools: Collect relocation sections from SHT_REL sections'
and
'bpf tools: Record map accessing instructions for each program'.

For each instruction which needs relocation, it injects the corresponding
file descriptor into the imm field. As part of the protocol, src_reg is set
to BPF_PSEUDO_MAP_FD to notify the kernel that this is a map loading
instruction.

This is the final part of the map relocation patches. The principle of map
relocation is described in the commit message of 'bpf tools: Collect symbol
table from SHT_SYMTAB section'.
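
The patching step can be sketched roughly as follows. 'struct
demo_insn' is a simplified copy of the eBPF instruction layout,
DEMO_PSEUDO_MAP_FD is assumed to equal BPF_PSEUDO_MAP_FD from the UAPI
header, and the map_idx bounds check is an extra of this sketch; all
demo_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match BPF_PSEUDO_MAP_FD in the UAPI header. */
#define DEMO_PSEUDO_MAP_FD 1

/* Simplified copy of the eBPF instruction layout. */
struct demo_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* One recorded relocation: which instruction refers to which map. */
struct demo_reloc {
	int insn_idx;
	int map_idx;
};

static int demo_relocate(struct demo_insn *insns, size_t insns_cnt,
			 const struct demo_reloc *relocs, size_t nr_reloc,
			 const int *map_fds, size_t nr_maps)
{
	size_t i;

	for (i = 0; i < nr_reloc; i++) {
		int insn_idx = relocs[i].insn_idx;
		int map_idx = relocs[i].map_idx;

		if (insn_idx < 0 || (size_t)insn_idx >= insns_cnt)
			return -ERANGE;
		if (map_idx < 0 || (size_t)map_idx >= nr_maps)
			return -ERANGE;

		/* Mark the load as "imm holds a map fd" and store the fd. */
		insns[insn_idx].src_reg = DEMO_PSEUDO_MAP_FD;
		insns[insn_idx].imm = map_fds[map_idx];
	}
	return 0;
}
```

Instructions without a relocation entry are left untouched, which is
why the descriptors can be dropped once this pass has run.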

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-18-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 52 ++
 1 file changed, 52 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index c214e1c..cd40ae0 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -633,6 +633,56 @@ bpf_object__create_maps(struct bpf_object *obj)
return 0;
 }
 
+static int
+bpf_program__relocate(struct bpf_program *prog, int *map_fds)
+{
+   int i;
+
+   if (!prog || !prog->reloc_desc)
+   return 0;
+
+   for (i = 0; i < prog->nr_reloc; i++) {
+   int insn_idx, map_idx;
+   struct bpf_insn *insns = prog->insns;
+
+   insn_idx = prog->reloc_desc[i].insn_idx;
+   map_idx = prog->reloc_desc[i].map_idx;
+
+   if (insn_idx >= (int)prog->insns_cnt) {
+   pr_warning("relocation out of range: '%s'\n",
+  prog->section_name);
+   return -ERANGE;
+   }
+   insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+   insns[insn_idx].imm = map_fds[map_idx];
+   }
+
+   zfree(&prog->reloc_desc);
+   prog->nr_reloc = 0;
+   return 0;
+}
+
+
+static int
+bpf_object__relocate(struct bpf_object *obj)
+{
+   struct bpf_program *prog;
+   size_t i;
+   int err;
+
+   for (i = 0; i < obj->nr_programs; i++) {
+   prog = &obj->programs[i];
+
+   err = bpf_program__relocate(prog, obj->map_fds);
+   if (err) {
+   pr_warning("failed to relocate '%s'\n",
+  prog->section_name);
+   return err;
+   }
+   }
+   return 0;
+}
+
 static int bpf_object__collect_reloc(struct bpf_object *obj)
 {
int i, err;
@@ -764,6 +814,8 @@ int bpf_object__load(struct bpf_object *obj)
obj->loaded = true;
if (bpf_object__create_maps(obj))
goto out;
+   if (bpf_object__relocate(obj))
+   goto out;
 
return 0;
 out:
-- 
1.8.3.4



[PATCH v11 23/39] perf tools: Introduce llvm config options

2015-07-08 Thread Wang Nan
This patch introduces an [llvm] config section with 5 options. Following
patches will use them to configure dynamic LLVM compiling.

'llvm-utils.[ch]' is introduced in this patch to hold all
llvm/clang related code.

Example:

  [llvm]
        # Path to clang. If omitted, search for it in $PATH.
        clang-path = "/path/to/clang"

        # Cmdline template. The following lines show its default value.
        # Environment variables are used to pass options.
        clang-bpf-cmd-template = "$CLANG_EXEC $CLANG_OPTIONS \
                                  $KERNEL_INC_OPTIONS -Wno-unused-value \
                                  -Wno-pointer-sign -working-directory \
                                  $WORKING_DIR  -c $CLANG_SOURCE -target \
                                  bpf -O2 -o -"

        # Options passed to clang, will be passed to cmdline by
        # $CLANG_OPTIONS.
        clang-opt = "-Wno-unused-value -Wno-pointer-sign"

        # kbuild directory. If not set, use /lib/modules/`uname -r`/build.
        # If set to "" deliberately, skip kernel header auto-detector.
        kbuild-dir = "/path/to/kernel/build"

        # Options passed to 'make' when detecting kernel header options.
        kbuild-opts = "ARCH=x86_64"
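
How such "llvm.<key>" variables might be dispatched can be sketched as
below. This is a simplified illustration with hypothetical demo_* names
and only three of the five keys; it uses strncmp() instead of perf's
prefixcmp() and stores the caller's pointer instead of duplicating the
value, so the caller must keep the string alive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down counterpart of struct llvm_param. */
struct demo_llvm_param {
	const char *clang_path;
	const char *clang_opt;
	const char *kbuild_dir;
};

/*
 * Returns 0 when the variable was stored or is not an "llvm." variable,
 * -1 when it is an unknown "llvm." key.
 */
static int demo_llvm_config(struct demo_llvm_param *p, const char *var,
			    const char *value)
{
	static const char prefix[] = "llvm.";

	if (strncmp(var, prefix, sizeof(prefix) - 1))
		return 0;			/* some other section */
	var += sizeof(prefix) - 1;		/* skip "llvm." */

	if (!strcmp(var, "clang-path"))
		p->clang_path = value;
	else if (!strcmp(var, "clang-opt"))
		p->clang_opt = value;
	else if (!strcmp(var, "kbuild-dir"))
		p->kbuild_dir = value;		/* "" is a valid, meaningful value */
	else
		return -1;
	return 0;
}
```

Keeping NULL (unset) distinct from "" (set but empty) matters for
kbuild-dir, where the empty string disables header auto-detection.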

Signed-off-by: Wang Nan 
---
 tools/perf/util/Build|  1 +
 tools/perf/util/config.c |  4 
 tools/perf/util/llvm-utils.c | 45 
 tools/perf/util/llvm-utils.h | 36 +++
 4 files changed, 86 insertions(+)
 create mode 100644 tools/perf/util/llvm-utils.c
 create mode 100644 tools/perf/util/llvm-utils.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 601d114..bc24293 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -14,6 +14,7 @@ libperf-y += find_next_bit.o
 libperf-y += help.o
 libperf-y += kallsyms.o
 libperf-y += levenshtein.o
+libperf-y += llvm-utils.o
 libperf-y += parse-options.o
 libperf-y += parse-events.o
 libperf-y += path.o
diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index e18f653..2e452ac 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -12,6 +12,7 @@
 #include "cache.h"
 #include "exec_cmd.h"
 #include "util/hist.h"  /* perf_hist_config */
+#include "util/llvm-utils.h"   /* perf_llvm_config */
 
 #define MAXNAME (256)
 
@@ -408,6 +409,9 @@ int perf_default_config(const char *var, const char *value,
if (!prefixcmp(var, "call-graph."))
return perf_callchain_config(var, value);
 
+   if (!prefixcmp(var, "llvm."))
+   return perf_llvm_config(var, value);
+
/* Add other config variables here. */
return 0;
 }
diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c
new file mode 100644
index 000..fd5b1bc
--- /dev/null
+++ b/tools/perf/util/llvm-utils.c
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2015, Wang Nan 
+ * Copyright (C) 2015, Huawei Inc.
+ */
+
+#include 
+#include "util.h"
+#include "debug.h"
+#include "llvm-utils.h"
+#include "cache.h"
+
+#define CLANG_BPF_CMD_DEFAULT_TEMPLATE \
+   "$CLANG_EXEC $CLANG_OPTIONS $KERNEL_INC_OPTIONS " \
+   "-Wno-unused-value -Wno-pointer-sign "  \
+   "-working-directory $WORKING_DIR "  \
+   " -c \"$CLANG_SOURCE\" -target bpf -O2 -o -"
+
+struct llvm_param llvm_param = {
+   .clang_path = "clang",
+   .clang_bpf_cmd_template = CLANG_BPF_CMD_DEFAULT_TEMPLATE,
+   .clang_opt = NULL,
+   .kbuild_dir = NULL,
+   .kbuild_opts = NULL,
+};
+
+int perf_llvm_config(const char *var, const char *value)
+{
+   if (prefixcmp(var, "llvm."))
+   return 0;
+   var += sizeof("llvm.") - 1;
+
+   if (!strcmp(var, "clang-path"))
+   llvm_param.clang_path = strdup(value);
+   else if (!strcmp(var, "clang-bpf-cmd-template"))
+   llvm_param.clang_bpf_cmd_template = strdup(value);
+   else if (!strcmp(var, "clang-opt"))
+   llvm_param.clang_opt = strdup(value);
+   else if (!strcmp(var, "kbuild-dir"))
+   llvm_param.kbuild_dir = strdup(value);
+   else if (!strcmp(var, "kbuild-opts"))
+   llvm_param.kbuild_opts = strdup(value);
+   else
+   return -1;
+   return 0;
+}
diff --git a/tools/perf/util/llvm-utils.h b/tools/perf/util/llvm-utils.h
new file mode 100644
index 000..504b799
--- /dev/null
+++ b/tools/perf/util/llvm-utils.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright (C) 2015, Wang Nan 
+ * Copyright (C) 2015, Huawei Inc.
+ */
+#ifndef __LLVM_UTILS_H
+#define __LLVM_UTILS_H
+
+#include "debug.h"
+
+struct llvm_param {
+   /* Path of clang executable */
+   const char *clang_path;
+   /*
+* Template of clang bpf compiling. 5 env variables
+* can be used:
+*   $CLANG_EXEC:   Path to clang.
+*   $CLANG_OPTIONS:Extra options to clang.
+

[PATCH v11 31/39] perf tools: Parse probe points of eBPF programs during preparation

2015-07-08 Thread Wang Nan
This patch parses the section name of each program and creates a
corresponding 'struct perf_probe_event' structure.

parse_perf_probe_command() is used to do the main parsing work.
Parsing results are stored in a global array, because
add_perf_probe_events() is not reentrant. A following patch will call
add_perf_probe_events() to insert kprobes; it accepts an array of
'struct perf_probe_event' and does all the work in one call.

Define PERF_BPF_PROBE_GROUP as "perf_bpf_probe", which will be used
as the group name of all eBPF probing points.

This patch uses bpf_program__set_private() to bind a perf_probe_event
to a bpf program through its private field.
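
A rough sketch of the two ideas above: slots handed out from a fixed
global array, and the group-name rule. 'struct demo_probe_event',
DEMO_MAX_PROBES and the demo_* functions are hypothetical; the real
code works on 'struct perf_probe_event' and duplicates the group
string rather than pointing at a literal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PROBES  4			/* hypothetical limit */
#define DEMO_PROBE_GROUP "perf_bpf_probe"	/* group name from the commit message */

/* Hypothetical, reduced stand-in for struct perf_probe_event. */
struct demo_probe_event {
	const char *group;
	const char *event;
};

static struct demo_probe_event demo_events[DEMO_MAX_PROBES];
static size_t demo_nr_events;

/* Hand out the next zeroed slot of the global array, or NULL when full. */
static struct demo_probe_event *demo_alloc_event(void)
{
	struct demo_probe_event *pev;

	if (demo_nr_events >= DEMO_MAX_PROBES)
		return NULL;
	pev = &demo_events[demo_nr_events++];
	memset(pev, 0, sizeof(*pev));
	return pev;
}

/* Default a missing group, reject a foreign one, require an event name. */
static int demo_check_event(struct demo_probe_event *pev)
{
	if (pev->group && strcmp(pev->group, DEMO_PROBE_GROUP))
		return -EINVAL;
	if (!pev->group)
		pev->group = DEMO_PROBE_GROUP;
	if (!pev->event)
		return -EINVAL;
	return 0;
}
```

Because all events live in one contiguous array, they can later be
passed to a single probing call in one go.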

Signed-off-by: Wang Nan 
---
 tools/perf/util/bpf-loader.c | 126 ++-
 tools/perf/util/bpf-loader.h |   2 +
 2 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 61d3adf..e810d05 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -10,6 +10,8 @@
 #include "debug.h"
 #include "bpf-loader.h"
 #include "llvm-utils.h"
+#include "probe-event.h"
+#include "probe-finder.h"
 
 #define DEFINE_PRINT_FN(name, level) \
 static int libbpf_##name(const char *fmt, ...) \
@@ -29,9 +31,122 @@ DEFINE_PRINT_FN(debug, 1)
 
 static bool libbpf_initialized;
 
+static struct perf_probe_event probe_event_array[MAX_PROBES];
+static size_t nr_probe_events;
+
+static struct perf_probe_event *
+alloc_perf_probe_event(void)
+{
+   struct perf_probe_event *pev;
+   int n = nr_probe_events;
+
+   if (n >= MAX_PROBES) {
+   pr_err("bpf: too many events, increase MAX_PROBES\n");
+   return NULL;
+   }
+
+   nr_probe_events = n + 1;
+   pev = &probe_event_array[n];
+   bzero(pev, sizeof(*pev));
+   return pev;
+}
+
+struct bpf_prog_priv {
+   struct perf_probe_event *pev;
+};
+
+static void
+bpf_prog_priv__clear(struct bpf_program *prog __maybe_unused,
+ void *_priv)
+{
+   struct bpf_prog_priv *priv = _priv;
+
+   if (priv->pev)
+   clear_perf_probe_event(priv->pev);
+   free(priv);
+}
+
+static int
+config_bpf_program(struct bpf_program *prog)
+{
+   struct perf_probe_event *pev = alloc_perf_probe_event();
+   struct bpf_prog_priv *priv = NULL;
+   const char *config_str;
+   int err;
+
+   /* pr_err has been done by alloc_perf_probe_event */
+   if (!pev)
+   return -ENOMEM;
+
+   err = bpf_program__get_title(prog, &config_str, false);
+   if (err || !config_str) {
+   pr_err("bpf: unable to get title for program\n");
+   return -EINVAL;
+   }
+
+   pr_debug("bpf: config program '%s'\n", config_str);
+   err = parse_perf_probe_command(config_str, pev);
+   if (err < 0) {
+   pr_err("bpf: '%s' is not a valid config string\n",
+  config_str);
+   /* parse failed, don't need clear pev. */
+   return -EINVAL;
+   }
+
+   if (pev->group && strcmp(pev->group, PERF_BPF_PROBE_GROUP)) {
+   pr_err("bpf: '%s': group for event is set and not '%s'.\n",
+  config_str, PERF_BPF_PROBE_GROUP);
+   err = -EINVAL;
+   goto errout;
+   } else if (!pev->group)
+   pev->group = strdup(PERF_BPF_PROBE_GROUP);
+
+   if (!pev->group) {
+   pr_err("bpf: strdup failed\n");
+   err = -ENOMEM;
+   goto errout;
+   }
+
+   if (!pev->event) {
+   pr_err("bpf: '%s': event name is missing\n",
+  config_str);
+   err = -EINVAL;
+   goto errout;
+   }
+
+   pr_debug("bpf: config '%s' is ok\n", config_str);
+
+   priv = calloc(1, sizeof(*priv));
+   if (!priv) {
+   pr_err("bpf: failed to alloc memory\n");
+   err = -ENOMEM;
+   goto errout;
+   }
+
+   priv->pev = pev;
+
+   err = bpf_program__set_private(prog, priv,
+  bpf_prog_priv__clear);
+   if (err) {
+   pr_err("bpf: set program private failed\n");
+   err = -ENOMEM;
+   goto errout;
+   }
+   return 0;
+
+errout:
+   if (pev)
+   clear_perf_probe_event(pev);
+   if (priv)
+   free(priv);
+   return err;
+}
+
 int bpf__prepare_load(const char *filename, bool source)
 {
struct bpf_object *obj;
+   struct bpf_program *prog;
+   int err = 0;
 
if (!libbpf_initialized)
libbpf_set_print(libbpf_warning,
@@ -41,7 +156,6 @@ int bpf__prepare_load(const char *filename, bool source)
if (source) {
void *obj_buf;
size_t obj_buf_sz;
-   int err;
 
err = llvm__compile_bpf(filename, &obj_buf, &obj_buf_sz);
if (err)
@@ -56,12 +170,20 

[PATCH] sched/numa: Restore sched feature NUMA to its earlier avatar.

2015-07-08 Thread Srikar Dronamraju
In commit:8a9e62a "sched/numa: Prefer NUMA hotness over cache hotness"
sched feature NUMA was always set to true. However, this sched feature was
supposed to be enabled only on NUMA boxes, through set_numabalancing_state().

To get back to the above behaviour, bring back the NUMA_FAVOUR_HIGHER feature.

Signed-off-by: Srikar Dronamraju 

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 587a2f6..aea72d5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5676,10 +5676,10 @@ static int migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
unsigned long src_faults, dst_faults;
int src_nid, dst_nid;
 
-   if (!p->numa_faults || !(env->sd->flags & SD_NUMA))
+   if (!sched_feat(NUMA) || !sched_feat(NUMA_FAVOUR_HIGHER))
return -1;
 
-   if (!sched_feat(NUMA))
+   if (!p->numa_faults || !(env->sd->flags & SD_NUMA))
return -1;
 
src_nid = cpu_to_node(env->src_cpu);
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 83a50e7..d4d4726 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -79,12 +79,13 @@ SCHED_FEAT(LB_MIN, false)
  * numa_balancing=
  */
 #ifdef CONFIG_NUMA_BALANCING
+SCHED_FEAT(NUMA,   false)
 
 /*
- * NUMA will favor moving tasks towards nodes where a higher number of
- * hinting faults are recorded during active load balancing. It will
- * resist moving tasks towards nodes where a lower number of hinting
- * faults have been recorded.
+ * NUMA_FAVOUR_HIGHER will favor moving tasks towards nodes where a
+ * higher number of hinting faults are recorded during active load
+ * balancing. It will resist moving tasks towards nodes where a lower
+ * number of hinting faults have been recorded.
  */
-SCHED_FEAT(NUMA,   true)
+SCHED_FEAT(NUMA_FAVOUR_HIGHER, true)
 #endif



Re: [RFC PATCH v3 4/4] trace: Trace log handler for logging into STM blocks

2015-07-08 Thread Ingo Molnar

* Chunyan Zhang  wrote:

> >> +++ b/include/trace/perf.h
> >> @@ -175,6 +175,7 @@ trace_event_raw_event_##call(void *__data, proto)  \
> >>   { assign; } \
> >>   \
> >>   trace_event_buffer_commit();\
> >> + trace_event_stm_log();  \
> >
> > This makes every trace event slower.
> 
> It doesn't actually, you may decide if enable this feature, the trace event
> will not be slowed if STM_TRACE_EVENT is not selected.

It slows down everyone if a distro enables this feature - that's not acceptable.

Thanks,

Ingo


[PATCH v11 29/39] perf record: Enable passing bpf object file to --event

2015-07-08 Thread Wang Nan
By introducing new rules in tools/perf/util/parse-events.[ly], this
patch enables 'perf record --event bpf_file.o' to select events by
an eBPF object file. It calls parse_events_load_bpf() to load that
file, which uses bpf__prepare_load() and finally calls
bpf_object__open() for the object files.

Instead of introducing evsel to evlist during parsing, events
selected by eBPF object files are appended separately. The reasons
are:

 1. During parsing, the probing points have not been initialized.

 2. Currently we are unable to call add_perf_probe_events() twice,
    therefore we have to wait until all such events are collected,
    then probe all points in one call.

The actual probing and selecting reside in following patches.

'bpf-loader.[ch]' are introduced in this patch and will be the
interface between perf and libbpf. bpf__prepare_load() resides in
bpf-loader.c. Dummy functions are provided because bpf-loader.c is
available only when CONFIG_LIBBPF is on.
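
The dummy-function arrangement can be sketched like this.
DEMO_HAVE_LIBBPF is a hypothetical feature macro standing in for
HAVE_LIBBPF_SUPPORT, and the demo_* names are illustrative; the sketch
is compiled here with the macro undefined, so the stub is what runs.

```c
#include <assert.h>
#include <stdio.h>

#ifdef DEMO_HAVE_LIBBPF
/* Real implementation would live in a separately built source file. */
int demo_prepare_load(const char *filename);
#else
/* Stub with the same signature: callers compile either way. */
static inline int demo_prepare_load(const char *filename)
{
	(void)filename;
	fprintf(stderr, "eBPF object loading was disabled at build time\n");
	return -1;
}
#endif

/* Call sites stay free of #ifdef: they only look at the return value. */
static int demo_handle_event_arg(const char *arg)
{
	return demo_prepare_load(arg) < 0 ? -1 : 0;
}
```

With this layout the parser code does not need to know whether libbpf
support was built in; it just reports the failure it gets back.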

Signed-off-by: Wang Nan 
---
 tools/perf/util/Build  |  1 +
 tools/perf/util/bpf-loader.c   | 60 ++
 tools/perf/util/bpf-loader.h   | 24 +
 tools/perf/util/debug.c|  5 
 tools/perf/util/debug.h|  1 +
 tools/perf/util/parse-events.c | 16 +++
 tools/perf/util/parse-events.h |  2 ++
 tools/perf/util/parse-events.l |  3 +++
 tools/perf/util/parse-events.y | 18 -
 9 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/bpf-loader.c
 create mode 100644 tools/perf/util/bpf-loader.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index bc24293..3357e5a 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -79,6 +79,7 @@ libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-y += parse-branch-options.o
 
+libperf-$(CONFIG_LIBBPF) += bpf-loader.o
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
 libperf-$(CONFIG_LIBELF) += probe-event.o
 
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
new file mode 100644
index 000..7c750b6
--- /dev/null
+++ b/tools/perf/util/bpf-loader.c
@@ -0,0 +1,60 @@
+/*
+ * bpf-loader.c
+ *
+ * Copyright (C) 2015 Wang Nan 
+ * Copyright (C) 2015 Huawei Inc.
+ */
+
+#include 
+#include "perf.h"
+#include "debug.h"
+#include "bpf-loader.h"
+
+#define DEFINE_PRINT_FN(name, level) \
+static int libbpf_##name(const char *fmt, ...) \
+{  \
+   va_list args;   \
+   int ret;\
+   \
+   va_start(args, fmt);\
+   ret = veprintf(level, verbose, pr_fmt(fmt), args);\
+   va_end(args);   \
+   return ret; \
+}
+
+DEFINE_PRINT_FN(warning, 0)
+DEFINE_PRINT_FN(info, 0)
+DEFINE_PRINT_FN(debug, 1)
+
+static bool libbpf_initialized;
+
+int bpf__prepare_load(const char *filename)
+{
+   struct bpf_object *obj;
+
+   if (!libbpf_initialized)
+   libbpf_set_print(libbpf_warning,
+libbpf_info,
+libbpf_debug);
+
+   obj = bpf_object__open(filename);
+   if (!obj) {
+   pr_err("bpf: failed to load %s\n", filename);
+   return -EINVAL;
+   }
+
+   /*
+* Throw object pointer away: it will be retrieved using
+* bpf_objects iterator.
+*/
+
+   return 0;
+}
+
+void bpf__clear(void)
+{
+   struct bpf_object *obj, *tmp;
+
+   bpf_object__for_each_safe(obj, tmp)
+   bpf_object__close(obj);
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
new file mode 100644
index 000..39d8d1a
--- /dev/null
+++ b/tools/perf/util/bpf-loader.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2015, Wang Nan 
+ * Copyright (C) 2015, Huawei Inc.
+ */
+#ifndef __BPF_LOADER_H
+#define __BPF_LOADER_H
+
+#include 
+#include "debug.h"
+
+#ifdef HAVE_LIBBPF_SUPPORT
+int bpf__prepare_load(const char *filename);
+
+void bpf__clear(void);
+#else
+static inline int bpf__prepare_load(const char *filename __maybe_unused)
+{
+   pr_err("ERROR: eBPF object loading is disabled during compiling.\n");
+   return -1;
+}
+
+static inline void bpf__clear(void) { }
+#endif
+#endif
diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index 2da5581..86d9c73 100644
--- a/tools/perf/util/debug.c
+++ b/tools/perf/util/debug.c
@@ -36,6 +36,11 @@ static int _eprintf(int level, int var, const char *fmt, 
va_list args)
return ret;
 }
 
+int veprintf(int level, int var, const char *fmt, va_list args)
+{
+   return _eprintf(level, var, fmt, args);
+}
+
 int eprintf(int level, int var, const char *fmt, ...)
 {
va_list args;
diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h
index caac2fd..8b9a088 100644
--- a/tools/perf/util/debug.h
+++ 

[PATCH v11 16/39] bpf tools: Create eBPF maps defined in an object file

2015-07-08 Thread Wang Nan
This patch creates maps based on the 'map' section in the object file
using bpf_create_map(), and stores the fds in an array in 'struct
bpf_object'.

Previous patches parse the ELF object file and collect the required
data, but don't interact with the kernel. They belong to the 'opening'
phase. This patch is the first patch in the 'loading' phase. The
'loaded' field is introduced in 'struct bpf_object' to avoid loading an
object twice, because the loading phase frees resources collected
during opening that become useless after loading. In this patch,
maps_buf is freed.
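
A minimal sketch of the loading phase described above: a 'loaded' guard
plus rollback of already created fds on failure. The sketch_* names and
the fake creation function are assumptions for illustration; the real
code calls bpf_create_map() and closes real file descriptors.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct sketch_object {
	int *fds;
	size_t nr_fds;
	bool loaded;
};

/* Stand-in for bpf_create_map(): fails for the index given by fail_at. */
static int sketch_create(size_t idx, size_t fail_at)
{
	return idx == fail_at ? -1 : (int)(100 + idx);
}

/* Counts fds released during rollback (stand-in for close()). */
static int sketch_closed;

static void sketch_close(int fd)
{
	if (fd >= 0)
		sketch_closed++;
}

static int sketch_load(struct sketch_object *obj, size_t nr, size_t fail_at)
{
	size_t i, j;

	if (obj->loaded)
		return -1;		/* refuse to load twice */
	obj->loaded = true;

	if (nr == 0)
		return 0;

	obj->fds = malloc(sizeof(int) * nr);
	if (!obj->fds)
		return -1;
	memset(obj->fds, 0xff, sizeof(int) * nr);	/* every fd starts as -1 */

	for (i = 0; i < nr; i++) {
		obj->fds[i] = sketch_create(i, fail_at);
		if (obj->fds[i] < 0) {
			/* Roll back everything created so far. */
			for (j = 0; j < i; j++)
				sketch_close(obj->fds[j]);
			free(obj->fds);
			obj->fds = NULL;
			obj->nr_fds = 0;
			return -1;
		}
	}
	obj->nr_fds = nr;
	return 0;
}
```

As in the patch, the guard is set before any resource is created, so a
failed load also blocks a second attempt on the same object.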

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-17-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 102 +
 tools/lib/bpf/libbpf.h |   4 ++
 2 files changed, 106 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4f13772..c214e1c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -21,6 +21,7 @@
 #include 
 
 #include "libbpf.h"
+#include "bpf.h"
 
 #define __printf(a, b) __attribute__((format(printf, a, b)))
 
@@ -105,6 +106,13 @@ struct bpf_object {
 
struct bpf_program *programs;
size_t nr_programs;
+   int *map_fds;
+   /*
+* This field is required because maps_buf will be freed and
+* maps_buf_sz will be set to 0 after loaded.
+*/
+   size_t nr_map_fds;
+   bool loaded;
 
/*
 * Information when doing elf related work. Only valid if fd
@@ -232,6 +240,7 @@ static struct bpf_object *bpf_object__new(const char *path,
obj->efile.obj_buf = obj_buf;
obj->efile.obj_buf_sz = obj_buf_sz;
 
+   obj->loaded = false;
return obj;
 }
 
@@ -568,6 +577,62 @@ bpf_program__collect_reloc(struct bpf_program *prog,
return 0;
 }
 
+static int
+bpf_object__create_maps(struct bpf_object *obj)
+{
+   unsigned int i;
+   size_t nr_maps;
+   int *pfd;
+
+   nr_maps = obj->maps_buf_sz / sizeof(struct bpf_map_def);
+   if (!obj->maps_buf || !nr_maps) {
+   pr_debug("don't need create maps for %s\n",
+obj->path);
+   return 0;
+   }
+
+   obj->map_fds = malloc(sizeof(int) * nr_maps);
+   if (!obj->map_fds) {
+   pr_warning("realloc perf_bpf_map_fds failed\n");
+   return -ENOMEM;
+   }
+   obj->nr_map_fds = nr_maps;
+
+   /* fill all fd with -1 */
+   memset(obj->map_fds, 0xff, sizeof(int) * nr_maps);
+
+   pfd = obj->map_fds;
+   for (i = 0; i < nr_maps; i++) {
+   struct bpf_map_def def;
+
+   def = *(struct bpf_map_def *)(obj->maps_buf +
+   i * sizeof(struct bpf_map_def));
+
+   *pfd = bpf_create_map(def.type,
+ def.key_size,
+ def.value_size,
+ def.max_entries);
+   if (*pfd < 0) {
+   size_t j;
+   int err = *pfd;
+
+   pr_warning("failed to create map: %s\n",
+  strerror(errno));
+   for (j = 0; j < i; j++)
+   zclose(obj->map_fds[j]);
+   obj->nr_map_fds = 0;
+   zfree(&obj->map_fds);
+   return err;
+   }
+   pr_debug("create map: fd=%d\n", *pfd);
+   pfd++;
+   }
+
+   zfree(&obj->maps_buf);
+   obj->maps_buf_sz = 0;
+   return 0;
+}
+
 static int bpf_object__collect_reloc(struct bpf_object *obj)
 {
int i, err;
@@ -671,6 +736,42 @@ struct bpf_object *bpf_object__open_buffer(void *obj_buf,
return __bpf_object__open("[buffer]", obj_buf, obj_buf_sz);
 }
 
+int bpf_object__unload(struct bpf_object *obj)
+{
+   size_t i;
+
+   if (!obj)
+   return -EINVAL;
+
+   for (i = 0; i < obj->nr_map_fds; i++)
+   zclose(obj->map_fds[i]);
+   zfree(&obj->map_fds);
+   obj->nr_map_fds = 0;
+
+   return 0;
+}
+
+int bpf_object__load(struct bpf_object *obj)
+{
+   if (!obj)
+   return -EINVAL;
+
+   if (obj->loaded) {
+   pr_warning("object should not be loaded twice\n");
+   return -EINVAL;
+   }
+
+   obj->loaded = true;
+   if (bpf_object__create_maps(obj))
+   goto out;
+
+   return 0;
+out:
+   bpf_object__unload(obj);
+   pr_warning("failed to load object '%s'\n", obj->path);
+   return -EINVAL;
+}
+
 void bpf_object__close(struct bpf_object *obj)
 {
size_t i;
@@ 

[PATCH v11 18/39] bpf tools: Introduce bpf_load_program() to bpf.c

2015-07-08 Thread Wang Nan
bpf_load_program() can be used to load a bpf program into the kernel. To
make loading faster, it first tries to load without a log buffer, and
tries again with the log buffer only if the first attempt fails.
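
The two-pass strategy can be sketched independently of the bpf syscall.
The loader callback and the sketch_* names here are hypothetical
stand-ins for the BPF_PROG_LOAD call made by the patch:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical loader: returns an fd (>= 0) or a negative error. */
typedef int (*sketch_load_fn)(const void *prog, char *log, size_t log_sz);

static int sketch_load_program(sketch_load_fn load, const void *prog,
			       char *log_buf, size_t log_buf_sz)
{
	/* Fast path: no log buffer, so no log has to be generated. */
	int fd = load(prog, NULL, 0);

	if (fd >= 0 || !log_buf || !log_buf_sz)
		return fd;

	/* Slow path: retry only to collect the diagnostic log. */
	log_buf[0] = '\0';
	return load(prog, log_buf, log_buf_sz);
}

/* Fake loader for illustration: always rejects the program. */
static int sketch_calls;

static int sketch_always_fail(const void *prog, char *log, size_t log_sz)
{
	(void)prog;
	sketch_calls++;
	if (log && log_sz) {
		strncpy(log, "rejected", log_sz - 1);
		log[log_sz - 1] = '\0';
	}
	return -1;
}
```

A caller that passes no log buffer pays for a single attempt; a caller
that passes one pays for the second attempt only on failure.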

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-19-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/bpf.c | 34 ++
 tools/lib/bpf/bpf.h |  7 +++
 2 files changed, 41 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 208de7c3..a633105 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -29,6 +29,11 @@
 # endif
 #endif
 
+static __u64 ptr_to_u64(void *ptr)
+{
+   return (__u64) (unsigned long) ptr;
+}
+
 static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
   unsigned int size)
 {
@@ -49,3 +54,32 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
 
return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
 }
+
+int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
+size_t insns_cnt, char *license,
+u32 kern_version, char *log_buf, size_t log_buf_sz)
+{
+   int fd;
+   union bpf_attr attr;
+
+   bzero(&attr, sizeof(attr));
+   attr.prog_type = type;
+   attr.insn_cnt = (__u32)insns_cnt;
+   attr.insns = ptr_to_u64(insns);
+   attr.license = ptr_to_u64(license);
+   attr.log_buf = ptr_to_u64(NULL);
+   attr.log_size = 0;
+   attr.log_level = 0;
+   attr.kern_version = kern_version;
+
+   fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+   if (fd >= 0 || !log_buf || !log_buf_sz)
+   return fd;
+
+   /* Try again with log */
+   attr.log_buf = ptr_to_u64(log_buf);
+   attr.log_size = log_buf_sz;
+   attr.log_level = 1;
+   log_buf[0] = 0;
+   return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 28f7942..854b736 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -13,4 +13,11 @@
 int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
   int max_entries);
 
+/* Recommend log buffer size */
+#define BPF_LOG_BUF_SIZE 65536
+int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
+size_t insns_cnt, char *license,
+u32 kern_version, char *log_buf,
+size_t log_buf_sz);
+
 #endif
-- 
1.8.3.4



[PATCH v11 28/39] perf tools: Make perf depend on libbpf

2015-07-08 Thread Wang Nan
By adding libbpf to perf's Makefile, this patch makes perf build
libbpf as part of its own build if libelf is found and neither
NO_LIBELF nor NO_LIBBPF is set. The newly introduced code is similar to
the libapi and libtraceevent build rules in Makefile.perf.

MANIFEST is also updated for 'make perf-*-src-pkg'.

make_no_libbpf is appended to tools/perf/tests/make.

The 'bpf' feature check is appended to the default FEATURE_TESTS and
FEATURE_DISPLAY, so perf will check the API version of bpf in
/path/to/kernel/include/uapi/linux/bpf.h. This check should not fail
except when we are trying to port this code to an old kernel.

Error messages are also updated to notify users that BPF support in
'perf record' is disabled if libelf is missing or the BPF API check
fails.

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-24-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/build/Makefile.feature |  6 --
 tools/perf/MANIFEST  |  3 +++
 tools/perf/Makefile.perf | 19 +--
 tools/perf/config/Makefile   | 19 ++-
 tools/perf/tests/make|  4 +++-
 5 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 2975632..5ec6b37 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -51,7 +51,8 @@ FEATURE_TESTS ?=  \
timerfd \
libdw-dwarf-unwind  \
zlib\
-   lzma
+   lzma\
+   bpf
 
 FEATURE_DISPLAY ?= \
dwarf   \
@@ -67,7 +68,8 @@ FEATURE_DISPLAY ?=\
libunwind   \
libdw-dwarf-unwind  \
zlib\
-   lzma
+   lzma\
+   bpf
 
 # Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
 # If in the future we need per-feature checks/flags for features not
diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
index 09dc0aa..1b42f7c 100644
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@@ -17,6 +17,7 @@ tools/build
 tools/arch/x86/include/asm/atomic.h
 tools/arch/x86/include/asm/rmwcc.h
 tools/lib/traceevent
+tools/lib/bpf
 tools/lib/api
 tools/lib/rbtree.c
 tools/lib/symbol/kallsyms.c
@@ -67,6 +68,8 @@ arch/*/lib/memset*.S
 include/linux/poison.h
 include/linux/hw_breakpoint.h
 include/uapi/linux/perf_event.h
+include/uapi/linux/bpf.h
+include/uapi/linux/bpf_common.h
 include/uapi/linux/const.h
 include/uapi/linux/swab.h
 include/uapi/linux/hw_breakpoint.h
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 7a4b549..4857129 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -125,6 +125,7 @@ STRIP   = strip
 
 LIB_DIR  = $(srctree)/tools/lib/api/
 TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
+BPF_DIR = $(srctree)/tools/lib/bpf/
 
 # include config/Makefile by default and rule out
 # non-config cases
@@ -160,6 +161,7 @@ strip-libs = $(filter-out -l%,$(1))
 
 ifneq ($(OUTPUT),)
   TE_PATH=$(OUTPUT)
+  BPF_PATH=$(OUTPUT)
 ifneq ($(subdir),)
   LIB_PATH=$(OUTPUT)/../lib/api/
 else
@@ -168,6 +170,7 @@ endif
 else
   TE_PATH=$(TRACE_EVENT_DIR)
   LIB_PATH=$(LIB_DIR)
+  BPF_PATH=$(BPF_DIR)
 endif
 
 LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
@@ -179,6 +182,8 @@ LIBTRACEEVENT_DYNAMIC_LIST_LDFLAGS = -Xlinker 
--dynamic-list=$(LIBTRACEEVENT_DYN
 LIBAPI = $(LIB_PATH)libapi.a
 export LIBAPI
 
+LIBBPF = $(BPF_PATH)libbpf.a
+
 # python extension build directories
 PYTHON_EXTBUILD := $(OUTPUT)python_ext_build/
 PYTHON_EXTBUILD_LIB := $(PYTHON_EXTBUILD)lib/
@@ -231,6 +236,9 @@ export PERL_PATH
 LIB_FILE=$(OUTPUT)libperf.a
 
 PERFLIBS = $(LIB_FILE) $(LIBAPI) $(LIBTRACEEVENT)
+ifndef NO_LIBBPF
+  PERFLIBS += $(LIBBPF)
+endif
 
 # We choose to avoid "if .. else if .. else .. endif endif"
 # because maintaining the nesting to match is a pain.  If
@@ -400,6 +408,13 @@ $(LIBAPI)-clean:
$(call QUIET_CLEAN, libapi)
$(Q)$(MAKE) -C $(LIB_DIR) O=$(OUTPUT) clean >/dev/null
 
+$(LIBBPF): FORCE
+   $(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) $(OUTPUT)libbpf.a
+
+$(LIBBPF)-clean:
+   $(call QUIET_CLEAN, libbpf)
+   $(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) clean >/dev/null
+
 help:
@echo 'Perf make targets:'
@echo '  doc- make *all* documentation (see below)'
@@ -439,7 +454,7 @@ INSTALL_DOC_TARGETS += quick-install-doc quick-install-man 
quick-install-html
 $(DOC_TARGETS):
$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) $(@:doc=all)
 
-TAG_FOLDERS= . ../lib/traceevent ../lib/api ../lib/symbol

[PATCH v11 37/39] perf tools: Suppress probing messages when probing by BPF loading

2015-07-08 Thread Wang Nan
This patch suppresses the messages output by add_perf_probe_events() and
del_perf_probe_events() if they are triggered by BPF loading. Before
this patch, when using 'perf record' with a BPF object/source as event
selector, the following messages are output:

 Added new event:
   perf_bpf_probe:lock_page_ret (on __lock_page%return)
You can now use it in all perf tools, such as:
perf record -e perf_bpf_probe:lock_page_ret -aR sleep 1
 ...
 Removed event: perf_bpf_probe:lock_page_ret

This is misleading, especially 'use it in all perf tools', because the
events are removed when 'perf record' exits.

In this patch, a 'silent' field is added to probe_conf to control the
output. bpf__{,un}probe() set it to true when calling
{add,del}_perf_probe_events(). While it is set, the messages are
demoted to pr_debug().
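
The save/set/restore handling of the flag can be sketched as follows.
The sketch_* names are hypothetical stand-ins for probe_conf and
add_perf_probe_events():

```c
#include <stdbool.h>
#include <stdio.h>

struct sketch_conf {
	bool silent;
};

static struct sketch_conf sketch_conf;

/* Counts user-visible messages, for illustration. */
static int sketch_messages;

/* Stand-in for add_perf_probe_events(): chatty unless silenced. */
static void sketch_add_events(void)
{
	if (!sketch_conf.silent) {
		printf("Added new event\n");
		sketch_messages++;
	}
}

/* Silences the call, then restores whatever the caller had set. */
static void sketch_probe_quietly(void)
{
	bool old_silent = sketch_conf.silent;

	sketch_conf.silent = true;
	sketch_add_events();
	sketch_conf.silent = old_silent;
}
```

Restoring the saved value rather than writing false keeps the helper
safe to use from a context that was already silent.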

Signed-off-by: Wang Nan 
---
 tools/perf/util/bpf-loader.c  |  6 ++
 tools/perf/util/probe-event.c | 22 --
 tools/perf/util/probe-event.h |  1 +
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 63077b9..4aa372b 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -199,6 +199,7 @@ static bool is_probing;
 int bpf__unprobe(void)
 {
struct strfilter *delfilter;
+   bool old_silent = probe_conf.silent;
int ret;
 
if (!is_probing)
@@ -210,7 +211,9 @@ int bpf__unprobe(void)
return -ENOMEM;
}
 
+   probe_conf.silent = true;
ret = del_perf_probe_events(delfilter);
+   probe_conf.silent = old_silent;
strfilter__delete(delfilter);
if (ret < 0 && is_probing)
pr_err("Error: failed to delete events: %s\n",
@@ -223,15 +226,18 @@ int bpf__unprobe(void)
 int bpf__probe(void)
 {
int err;
+   bool old_silent = probe_conf.silent;
 
if (nr_probe_events <= 0)
return 0;
 
+   probe_conf.silent = true;
probe_conf.max_probes = MAX_PROBES;
/* Let add_perf_probe_events keeps probe_trace_event */
err = add_perf_probe_events(probe_event_array,
nr_probe_events,
false);
+   probe_conf.silent = old_silent;
 
/* add_perf_probe_events return negative when fail */
if (err < 0)
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 083e8b4..b9573c5 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -51,7 +51,9 @@
 #define PERFPROBE_GROUP "probe"
 
 bool probe_event_dry_run;  /* Dry run flag */
-struct probe_conf probe_conf;
+struct probe_conf probe_conf = {
+   .silent = false,
+};
 
 #define semantic_error(msg ...) pr_err("Semantic error :" msg)
 
@@ -2250,10 +2252,12 @@ static int show_perf_probe_event(const char *group, 
const char *event,
 
ret = perf_probe_event__sprintf(group, event, pev, module, &buf);
if (ret >= 0) {
-   if (use_stdout)
+   if (use_stdout && !probe_conf.silent)
printf("%s\n", buf.buf);
-   else
+   else if (!probe_conf.silent)
pr_info("%s\n", buf.buf);
+   else
+   pr_debug("%s\n", buf.buf);
}
strbuf_release(&buf);
 
@@ -2512,7 +2516,10 @@ static int __add_probe_trace_events(struct 
perf_probe_event *pev,
 
safename = (pev->point.function && !strisglob(pev->point.function));
ret = 0;
-   pr_info("Added new event%s\n", (ntevs > 1) ? "s:" : ":");
+   if (!probe_conf.silent)
+   pr_info("Added new event%s\n", (ntevs > 1) ? "s:" : ":");
+   else
+   pr_debug("Added new event%s\n", (ntevs > 1) ? "s:" : ":");
for (i = 0; i < ntevs; i++) {
tev = &tevs[i];
/* Skip if the symbol is out of .text or blacklisted */
@@ -2569,7 +2576,7 @@ static int __add_probe_trace_events(struct 
perf_probe_event *pev,
warn_uprobe_event_compat(tev);
 
/* Note that it is possible to skip all events because of blacklist */
-   if (ret >= 0 && event) {
+   if (ret >= 0 && event && !probe_conf.silent) {
/* Show how to use the event. */
pr_info("\nYou can now use it in all perf tools, such as:\n\n");
pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
@@ -2865,7 +2872,10 @@ static int __del_trace_probe_event(int fd, struct 
str_node *ent)
goto error;
}
 
-   pr_info("Removed event: %s\n", ent->s);
+   if (!probe_conf.silent)
+   pr_info("Removed event: %s\n", ent->s);
+   else
+   pr_debug("Removed event: %s\n", ent->s);
return 0;
 error:
pr_warning("Failed to delete event: %s\n",
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 4b7a951..116b0aa 100644
--- a/tools/perf/util/probe-event.h

[PATCH v11 03/39] bpf tools: Introduce 'bpf' library and add bpf feature check

2015-07-08 Thread Wang Nan
This is the first patch of libbpf. The goal of libbpf is to create a
standard way of accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, so that 'make' builds libbpf.a and
libbpf.so and 'make install' puts them into the proper directories.
Most of the Makefile is borrowed from traceevent.

Before building, the Makefile checks for libelf and refuses to build if
it is not found. Instead of raising the error as soon as libelf is
found to be missing, the error is raised in a phony target "elfdep".
This design ensures that 'make clean' still works even if libelf is not
found.

Because libbpf requires the 'kern_version' field of 'union bpf_attr'
to be set ("bpfdep" is used for that dependency), the kernel BPF API is
also checked by introducing a new feature check 'bpf' in
tools/build/feature, which checks the existence and version of
linux/bpf.h. When building libbpf, that file is searched for in
include/uapi/linux of the kernel source tree (controlled by
FEATURE_CHECK_CFLAGS-bpf). Since it searches the kernel source tree it
resides in, installing the newest kernel headers is not required,
unless we are trying to port these files to an old kernel.

To avoid checking that file when building perf, the newly introduced
'bpf' feature check is not added to FEATURE_TESTS and FEATURE_DISPLAY
by default in tools/build/Makefile.feature, but only to libbpf's own
checks.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Bcc: pi3or...@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/build/feature/Makefile   |   6 +-
 tools/build/feature/test-bpf.c |  18 
 tools/lib/bpf/.gitignore   |   2 +
 tools/lib/bpf/Build|   1 +
 tools/lib/bpf/Makefile | 195 +
 tools/lib/bpf/libbpf.c |  14 +++
 tools/lib/bpf/libbpf.h |  11 +++
 7 files changed, 246 insertions(+), 1 deletion(-)
 create mode 100644 tools/build/feature/test-bpf.c
 create mode 100644 tools/lib/bpf/.gitignore
 create mode 100644 tools/lib/bpf/Build
 create mode 100644 tools/lib/bpf/Makefile
 create mode 100644 tools/lib/bpf/libbpf.c
 create mode 100644 tools/lib/bpf/libbpf.h

diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 463ed8f..1c0d69f 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -33,7 +33,8 @@ FILES=\
test-compile-32.bin \
test-compile-x32.bin\
test-zlib.bin   \
-   test-lzma.bin
+   test-lzma.bin   \
+   test-bpf.bin
 
 CC := $(CROSS_COMPILE)gcc -MD
 PKG_CONFIG := $(CROSS_COMPILE)pkg-config
@@ -156,6 +157,9 @@ test-zlib.bin:
 test-lzma.bin:
$(BUILD) -llzma
 
+test-bpf.bin:
+   $(BUILD)
+
 -include *.d
 
 ###
diff --git a/tools/build/feature/test-bpf.c b/tools/build/feature/test-bpf.c
new file mode 100644
index 000..062bac8
--- /dev/null
+++ b/tools/build/feature/test-bpf.c
@@ -0,0 +1,18 @@
+#include 
+
+int main(void)
+{
+   union bpf_attr attr;
+
+   attr.prog_type = BPF_PROG_TYPE_KPROBE;
+   attr.insn_cnt = 0;
+   attr.insns = 0;
+   attr.license = 0;
+   attr.log_buf = 0;
+   attr.log_size = 0;
+   attr.log_level = 0;
+   attr.kern_version = 0;
+
+   attr = attr;
+   return 0;
+}
diff --git a/tools/lib/bpf/.gitignore b/tools/lib/bpf/.gitignore
new file mode 100644
index 000..812aeed
--- /dev/null
+++ b/tools/lib/bpf/.gitignore
@@ -0,0 +1,2 @@
+libbpf_version.h
+FEATURE-DUMP
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
new file mode 100644
index 000..a316484
--- /dev/null
+++ b/tools/lib/bpf/Build
@@ -0,0 +1 @@
+libbpf-y := libbpf.o
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
new file mode 100644
index 000..f68d23a
--- /dev/null
+++ b/tools/lib/bpf/Makefile
@@ -0,0 +1,195 @@
+# Most of this file is copied from tools/lib/traceevent/Makefile
+
+BPF_VERSION = 0
+BPF_PATCHLEVEL = 0
+BPF_EXTRAVERSION = 1
+
+MAKEFLAGS += --no-print-directory
+
+
+# Makefiles suck: This macro sets a default value of $(2) for the
+# variable named by $(1), unless the variable has been set by
+# environment or command line. This is necessary for CC and AR
+# because make sets default values, so the simpler ?= approach
+# won't work as expected.
+define allow-override
+  $(if $(or $(findstring environment,$(origin $(1))),\
+$(findstring command line,$(origin $(1)))),,\
+$(eval $(1) = $(2)))
+endef
+
+# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
+$(call allow-override,CC,$(CROSS_COMPILE)gcc)
+$(call allow-override,AR,$(CROSS_COMPILE)ar)
+
+INSTALL = install
+
+# Use DESTDIR 

[PATCH v11 32/39] perf probe: Attach trace_probe_event with perf_probe_event

2015-07-08 Thread Wang Nan
This patch drops 'struct __event_package'. Instead, it adds the
probe_trace_event array into 'struct perf_probe_event'.

The probe_trace_event information gives later patches a chance to
access the actual probe points and actual arguments. Using them,
bpf_loader will be able to attach one bpf program to the different
probing points of an inline function (which has multiple probing
points) and of glob functions. Moreover, by reading argument
information, bpf code for reading those arguments can be generated.
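
A minimal sketch of the ownership change: each probe event carries its
own array of trace events, and the caller decides whether they are
freed right away. All demo_* names are hypothetical simplifications of
the perf structures.

```c
#include <stdbool.h>
#include <stdlib.h>

struct demo_tev {
	char *symbol;
};

struct demo_pev {
	const char *event;
	struct demo_tev *tevs;	/* owned by the probe event itself */
	int ntevs;
};

/* Stand-in for the conversion step: allocates n trace events. */
static int demo_convert(struct demo_pev *pev, int n)
{
	pev->tevs = calloc((size_t)n, sizeof(*pev->tevs));
	if (!pev->tevs)
		return -1;
	pev->ntevs = n;
	return n;
}

static void demo_cleanup(struct demo_pev *pev)
{
	int i;

	if (!pev || !pev->ntevs)
		return;
	for (i = 0; i < pev->ntevs; i++)
		free(pev->tevs[i].symbol);
	free(pev->tevs);
	pev->tevs = NULL;
	pev->ntevs = 0;
}

/* cleanup == false leaves the trace events attached for later use. */
static int demo_add(struct demo_pev *pevs, int npevs, bool cleanup)
{
	int i, ret = 0;

	for (i = 0; i < npevs; i++) {
		ret = demo_convert(&pevs[i], 2);
		if (ret < 0) {
			cleanup = true;	/* never keep partial results */
			break;
		}
	}
	for (i = 0; cleanup && i < npevs; i++)
		demo_cleanup(&pevs[i]);
	return ret < 0 ? ret : 0;
}
```

Passing false mirrors what a BPF loader would want, since it still needs
the trace events after the call returns; passing true mirrors the
'perf probe' command, which has no further use for them.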

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-probe.c|  4 ++-
 tools/perf/util/probe-event.c | 60 +--
 tools/perf/util/probe-event.h |  6 -
 3 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index b81cec3..826d452 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -496,7 +496,9 @@ __cmd_probe(int argc, const char **argv, const char *prefix 
__maybe_unused)
usage_with_options(probe_usage, options);
}
 
-   ret = add_perf_probe_events(params.events, params.nevents);
+   ret = add_perf_probe_events(params.events,
+   params.nevents,
+   true);
if (ret < 0) {
pr_err_with_code("  Error: Failed to add events.", ret);
return ret;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 381f23a..083e8b4 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1930,6 +1930,9 @@ void clear_perf_probe_event(struct perf_probe_event *pev)
struct perf_probe_arg_field *field, *next;
int i;
 
+   if (pev->ntevs)
+   cleanup_perf_probe_event(pev);
+
free(pev->event);
free(pev->group);
free(pev->target);
@@ -2778,61 +2781,58 @@ static int convert_to_probe_trace_events(struct 
perf_probe_event *pev,
return find_probe_trace_events_from_map(pev, tevs);
 }
 
-struct __event_package {
-   struct perf_probe_event *pev;
-   struct probe_trace_event*tevs;
-   int ntevs;
-};
-
-int add_perf_probe_events(struct perf_probe_event *pevs, int npevs)
+int cleanup_perf_probe_event(struct perf_probe_event *pev)
 {
-   int i, j, ret;
-   struct __event_package *pkgs;
+   int i;
 
-   ret = 0;
-   pkgs = zalloc(sizeof(struct __event_package) * npevs);
+   if (!pev || !pev->ntevs)
+   return 0;
 
-   if (pkgs == NULL)
-   return -ENOMEM;
+   for (i = 0; i < pev->ntevs; i++)
+   clear_probe_trace_event(&pev->tevs[i]);
+
+   zfree(&pev->tevs);
+   pev->ntevs = 0;
+   return 0;
+}
+
+int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
+ bool cleanup)
+{
+   int i, ret;
 
ret = init_symbol_maps(pevs->uprobes);
-   if (ret < 0) {
-   free(pkgs);
+   if (ret < 0)
return ret;
-   }
 
/* Loop 1: convert all events */
for (i = 0; i < npevs; i++) {
-   pkgs[i].pev = &pevs[i];
/* Init kprobe blacklist if needed */
-   if (!pkgs[i].pev->uprobes)
+   if (!pevs[i].uprobes)
kprobe_blacklist__init();
/* Convert with or without debuginfo */
-   ret  = convert_to_probe_trace_events(pkgs[i].pev,
-&pkgs[i].tevs);
-   if (ret < 0)
+   ret  = convert_to_probe_trace_events(&pevs[i], &pevs[i].tevs);
+   if (ret < 0) {
+   cleanup = true;
goto end;
-   pkgs[i].ntevs = ret;
+   }
+   pevs[i].ntevs = ret;
}
/* This just release blacklist only if allocated */
kprobe_blacklist__release();
 
/* Loop 2: add all events */
for (i = 0; i < npevs; i++) {
-   ret = __add_probe_trace_events(pkgs[i].pev, pkgs[i].tevs,
-  pkgs[i].ntevs,
+   ret = __add_probe_trace_events(&pevs[i], pevs[i].tevs,
+  pevs[i].ntevs,
   probe_conf.force_add);
if (ret < 0)
break;
}
 end:
/* Loop 3: cleanup and free trace events  */
-   for (i = 0; i < npevs; i++) {
-   for (j = 0; j < pkgs[i].ntevs; j++)
-   clear_probe_trace_event(&pkgs[i].tevs[j]);
-   zfree(&pkgs[i].tevs);
-   }
-   free(pkgs);
+   for (i = 0; cleanup && (i < npevs); i++)
+   cleanup_perf_probe_event(&pevs[i]);
exit_symbol_maps();
 
return ret;
diff --git a/tools/perf/util/probe-event.h 

[PATCH v11 35/39] perf tools: Add bpf_fd field to evsel and config it

2015-07-08 Thread Wang Nan
This patch adds a bpf_fd field to 'struct evsel' and introduces a method
to configure it. In bpf-loader, a bpf__foreach_tev() function is added,
which calls the callback function for each 'struct probe_trace_event'
of each bpf program, passing the program's file descriptor. In evlist.c,
perf_evlist__add_bpf() is introduced to add all bpf events to the evlist.
The event names are taken from the probe_trace_event structure.
'perf record' calls perf_evlist__add_bpf().

Since bpf-loader.c will not be built if libbpf is turned off, an empty
bpf__foreach_tev() is defined in bpf-loader.h to avoid a compile
error.

This patch iterates over 'struct probe_trace_event' instead of
'struct perf_probe_event' during the loop to prepare for further
patches, which will generate multiple instances from one BPF program and
install them onto different 'struct probe_trace_event'.
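
The iteration contract (call the callback for every trace event of
every program, stop at the first error) can be sketched like this. The
it_* types are hypothetical and much simpler than the perf and libbpf
structures:

```c
#include <stddef.h>

struct it_tev {
	const char *event;
};

struct it_prog {
	struct it_tev *tevs;
	int ntevs;
	int fd;
};

typedef int (*it_callback_t)(struct it_tev *tev, int fd, void *arg);

/* Visits every trace event of every program; stops on the first error. */
static int it_foreach_tev(struct it_prog *progs, size_t nprogs,
			  it_callback_t func, void *arg)
{
	size_t p;
	int i, err;

	for (p = 0; p < nprogs; p++) {
		if (progs[p].fd < 0)
			return -1;	/* program was never loaded */
		for (i = 0; i < progs[p].ntevs; i++) {
			err = func(&progs[p].tevs[i], progs[p].fd, arg);
			if (err)
				return err;
		}
	}
	return 0;
}

/* Example callback: counts the trace events it is shown. */
static int it_count(struct it_tev *tev, int fd, void *arg)
{
	(void)tev;
	(void)fd;
	++*(int *)arg;
	return 0;
}
```

The opaque 'arg' pointer is what lets a caller such as an evlist
builder carry its own state through the loop without global variables.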

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-record.c  |  6 ++
 tools/perf/util/bpf-loader.c | 42 ++
 tools/perf/util/bpf-loader.h | 13 +
 tools/perf/util/evlist.c | 41 +
 tools/perf/util/evlist.h |  1 +
 tools/perf/util/evsel.c  |  1 +
 tools/perf/util/evsel.h  |  1 +
 7 files changed, 105 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6d943a7..bd189b1 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1140,6 +1140,12 @@ int cmd_record(int argc, const char **argv, const char 
*prefix __maybe_unused)
goto out_symbol_exit;
}
 
+   err = perf_evlist__add_bpf(rec->evlist);
+   if (err < 0) {
+   pr_err("Failed to add events from BPF object(s)\n");
+   goto out_symbol_exit;
+   }
+
symbol__init(NULL);
 
if (symbol_conf.kptr_restrict)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index f68ba33..63077b9 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -260,3 +260,45 @@ errout:
bpf_object__unload(obj);
return err;
 }
+
+int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
+{
+   struct bpf_object *obj, *tmp;
+   struct bpf_program *prog;
+   int err;
+
+   bpf_object__for_each_safe(obj, tmp) {
+   bpf_object__for_each_program(prog, obj) {
+   struct probe_trace_event *tev;
+   struct perf_probe_event *pev;
+   struct bpf_prog_priv *priv;
+   int i, fd;
+
+   err = bpf_program__get_private(prog,
+  (void **)&priv);
+   if (err || !priv) {
+   pr_err("bpf: failed to get private field\n");
+   return -EINVAL;
+   }
+
+   pev = priv->pev;
+   for (i = 0; i < pev->ntevs; i++) {
+   tev = &pev->tevs[i];
+
+   err = bpf_program__get_fd(prog, &fd);
+
+   if (err || fd < 0) {
+   pr_err("bpf: failed to get file descriptor\n");
+   return -EINVAL;
+   }
+   err = func(tev, fd, arg);
+   if (err) {
+   pr_err("bpf: call back failed, stop iterate\n");
+   return err;
+   }
+   }
+   }
+   }
+
+   return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index ae0dc9b..ef9b3bb 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -6,10 +6,14 @@
 #define __BPF_LOADER_H
 
 #include 
+#include "probe-event.h"
 #include "debug.h"
 
 #define PERF_BPF_PROBE_GROUP "perf_bpf_probe"
 
+typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,
+   int fd, void *arg);
+
 #ifdef HAVE_LIBBPF_SUPPORT
 int bpf__prepare_load(const char *filename, bool source);
 int bpf__probe(void);
@@ -17,6 +21,8 @@ int bpf__unprobe(void);
 int bpf__load(void);
 
 void bpf__clear(void);
+
+int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg);
 #else
 static inline int bpf__prepare_load(const char *filename __maybe_unused,
bool source __maybe_unused)
@@ -29,5 +35,12 @@ static inline int bpf__probe(void) { return 0; }
 static inline int bpf__unprobe(void) { return 0; }
 static inline int bpf__load(void) { return 0; }
 static inline void bpf__clear(void) { }
+
+static inline int
+bpf__foreach_tev(bpf_prog_iter_callback_t func __maybe_unused,
+void *arg __maybe_unused)
+{
+   return 0;
+}
 #endif
 #endif
diff --git 

[PATCH v11 30/39] perf record: Compile scriptlets if pass '.c' to --event

2015-07-08 Thread Wang Nan
This patch enables passing source files to --event directly using:

 # perf record --event bpf-file.c command

This patch does the following:
 1) Allows passing a '.c' file to '--event'. parse_events_load_bpf() is
extended so that the caller can tell it whether the passed file is a
source file or an object file.

 2) Calls llvm__compile_bpf() to compile the '.c' file and keeps the
result in memory, then uses bpf_object__open_buffer() to load the
in-memory object.
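
The source/object split can be sketched as below. The src_* helpers are
hypothetical: the fake compiler just copies the file name, and the open
function is assumed to keep its own copy of the bytes, which is why the
temporary buffer can be freed right after opening.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct src_object {
	char *data;
	size_t size;
};

/* Hypothetical compiler: "compiles" by copying the name into a buffer. */
static int src_compile(const char *filename, void **buf, size_t *buf_sz)
{
	size_t len = strlen(filename) + 1;

	*buf = malloc(len);
	if (!*buf)
		return -1;
	memcpy(*buf, filename, len);
	*buf_sz = len;
	return 0;
}

/* Hypothetical open-from-memory: keeps a private copy of the bytes. */
static struct src_object *src_open_buffer(const void *buf, size_t sz)
{
	struct src_object *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	obj->data = malloc(sz);
	if (!obj->data) {
		free(obj);
		return NULL;
	}
	memcpy(obj->data, buf, sz);
	obj->size = sz;
	return obj;
}

/* Hypothetical open-from-file, reduced to the same in-memory path. */
static struct src_object *src_open_file(const char *filename)
{
	return src_open_buffer(filename, strlen(filename) + 1);
}

static struct src_object *src_prepare(const char *filename, bool source)
{
	struct src_object *obj;

	if (source) {
		void *buf;
		size_t sz;

		if (src_compile(filename, &buf, &sz))
			return NULL;
		obj = src_open_buffer(buf, sz);
		free(buf);	/* the object holds its own copy */
	} else {
		obj = src_open_file(filename);
	}
	return obj;
}
```

Both branches end with the same kind of object, so the rest of the
loader does not need to know whether the input was source or a prebuilt
object.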

Signed-off-by: Wang Nan 
---
 tools/perf/util/bpf-loader.c   | 17 +++--
 tools/perf/util/bpf-loader.h   |  5 +++--
 tools/perf/util/parse-events.c |  4 ++--
 tools/perf/util/parse-events.h |  2 +-
 tools/perf/util/parse-events.l |  3 +++
 tools/perf/util/parse-events.y | 15 +--
 6 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 7c750b6..61d3adf 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -9,6 +9,7 @@
 #include "perf.h"
 #include "debug.h"
 #include "bpf-loader.h"
+#include "llvm-utils.h"
 
 #define DEFINE_PRINT_FN(name, level) \
 static int libbpf_##name(const char *fmt, ...) \
@@ -28,7 +29,7 @@ DEFINE_PRINT_FN(debug, 1)
 
 static bool libbpf_initialized;
 
-int bpf__prepare_load(const char *filename)
+int bpf__prepare_load(const char *filename, bool source)
 {
struct bpf_object *obj;
 
@@ -37,7 +38,19 @@ int bpf__prepare_load(const char *filename)
 libbpf_info,
 libbpf_debug);
 
-   obj = bpf_object__open(filename);
+   if (source) {
+   void *obj_buf;
+   size_t obj_buf_sz;
+   int err;
+
+   err = llvm__compile_bpf(filename, &obj_buf, &obj_buf_sz);
+   if (err)
+   return err;
+   obj = bpf_object__open_buffer(obj_buf, obj_buf_sz);
+   free(obj_buf);
+   } else
+   obj = bpf_object__open(filename);
+
if (!obj) {
pr_err("bpf: failed to load %s\n", filename);
return -EINVAL;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 39d8d1a..5566be0 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -9,11 +9,12 @@
 #include "debug.h"
 
 #ifdef HAVE_LIBBPF_SUPPORT
-int bpf__prepare_load(const char *filename);
+int bpf__prepare_load(const char *filename, bool source);
 
 void bpf__clear(void);
 #else
-static inline int bpf__prepare_load(const char *filename __maybe_unused)
+static inline int bpf__prepare_load(const char *filename __maybe_unused,
+   bool source __maybe_unused)
 {
pr_err("ERROR: eBPF object loading is disabled during compiling.\n");
return -1;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index a2829ef..8f3644f 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -478,7 +478,7 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
 
 int parse_events_load_bpf(struct list_head *list __maybe_unused,
  int *idx __maybe_unused,
- char *bpf_file_name)
+ char *bpf_file_name, bool source)
 {
/*
 * Currently don't link any event to list. BPF object files
@@ -488,7 +488,7 @@ int parse_events_load_bpf(struct list_head *list __maybe_unused,
 * problem. After that probe events file by file is possible.
 * However, probing cost is still need to be considered.
 */
-   return bpf__prepare_load(bpf_file_name);
+   return bpf__prepare_load(bpf_file_name, source);
 }
 
 static int
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 41b962a..5841d4f 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -115,7 +115,7 @@ int parse_events_name(struct list_head *list, char *name);
 int parse_events_add_tracepoint(struct list_head *list, int *idx,
char *sys, char *event);
 int parse_events_load_bpf(struct list_head *list, int *idx,
- char *bpf_file_name);
+ char *bpf_file_name, bool source);
 int parse_events_add_numeric(struct parse_events_evlist *data,
 struct list_head *list,
 u32 type, u64 config,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 8328b28..556fa21 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -116,6 +116,7 @@ group   [^,{}/]*[{][^}]*[}][^,{}/]*
 event_pmu  [^,{}/]+[/][^/]*[/][^,{}/]*
 event  [^,{}/]+
 bpf_object .*\.(o|bpf)
+bpf_source .*\.c
 
 num_dec[0-9]+
 num_hex0x[a-fA-F0-9]+
@@ -161,6 +162,7 @@ modifier_bp [rwx]{1,3}
 
 {event_pmu}|
 {bpf_object}  

[PATCH v11 25/39] perf tools: Auto detecting kernel build directory

2015-07-08 Thread Wang Nan
This patch detects the kernel build directory using an embedded shell
script, 'kbuild_detector', which checks for the existence of
include/generated/autoconf.h.

If the kbuild directory is found, clang's working directory is changed
to it so that users can use relative include paths. A following patch
will detect the kernel include directories, which are expressed as
relative include paths, so this change of working directory is needed.

Users can set 'kbuild-dir = ""' manually to disable this detection.

Signed-off-by: Wang Nan 
---
 tools/perf/util/llvm-utils.c | 56 +++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c
index dca16e7..2ca2bd6 100644
--- a/tools/perf/util/llvm-utils.c
+++ b/tools/perf/util/llvm-utils.c
@@ -204,6 +204,51 @@ version_notice(void)
 );
 }
 
+static const char *kbuild_detector =
+"#!/usr/bin/env sh\n"
+"DEFAULT_KBUILD_DIR=/lib/modules/`uname -r`/build\n"
+"if test -z \"$KBUILD_DIR\"\n"
+"then\n"
+"KBUILD_DIR=$DEFAULT_KBUILD_DIR\n"
+"fi\n"
+"if test -f $KBUILD_DIR/include/generated/autoconf.h\n"
+"then\n"
+"  echo -n $KBUILD_DIR\n"
+"  exit 0\n"
+"fi\n"
+"exit -1\n";
+
+static inline void
+get_kbuild_opts(char **kbuild_dir)
+{
+   int err;
+
+   if (!kbuild_dir)
+   return;
+
+   *kbuild_dir = NULL;
+
+   if (llvm_param.kbuild_dir && !llvm_param.kbuild_dir[0]) {
+   pr_debug("[llvm.kbuild-dir] is set to \"\" deliberately.\n");
+   pr_debug("Skip kbuild options detection.\n");
+   return;
+   }
+
+   force_set_env("KBUILD_DIR", llvm_param.kbuild_dir);
+   force_set_env("KBUILD_OPTS", llvm_param.kbuild_opts);
+   err = read_from_pipe(kbuild_detector,
+((void **)kbuild_dir),
+NULL);
+   if (err) {
+   pr_warning(
+"WARNING:\tunable to get correct kernel building directory.\n"
+"Hint:\tSet correct kbuild directory using 'kbuild-dir' option in [llvm]\n"
+" \tsection of ~/.perfconfig or set it to \"\" to suppress kbuild\n"
+" \tdetection.\n\n");
+   return;
+   }
+}
+
 int llvm__compile_bpf(const char *path, void **p_obj_buf,
  size_t *p_obj_buf_sz)
 {
@@ -211,6 +256,7 @@ int llvm__compile_bpf(const char *path, void **p_obj_buf,
char clang_path[PATH_MAX];
const char *clang_opt = llvm_param.clang_opt;
const char *template = llvm_param.clang_bpf_cmd_template;
+   char *kbuild_dir = NULL;
void *obj_buf = NULL;
size_t obj_buf_sz;
 
@@ -228,10 +274,16 @@ int llvm__compile_bpf(const char *path, void **p_obj_buf,
return -ENOENT;
}
 
+   /*
+* This is an optional work. Even it fail we can continue our
+* work. Needn't to check error return.
+*/
+   get_kbuild_opts(&kbuild_dir);
+
force_set_env("CLANG_EXEC", clang_path);
force_set_env("CLANG_OPTIONS", clang_opt);
force_set_env("KERNEL_INC_OPTIONS", NULL);
-   force_set_env("WORKING_DIR", ".");
+   force_set_env("WORKING_DIR", kbuild_dir ? : ".");
 
/*
 * Since we may reset clang's working dir, path of source file
@@ -252,6 +304,7 @@ int llvm__compile_bpf(const char *path, void **p_obj_buf,
goto errout;
}
 
+   free(kbuild_dir);
if (!p_obj_buf)
free(obj_buf);
else
@@ -261,6 +314,7 @@ int llvm__compile_bpf(const char *path, void **p_obj_buf,
*p_obj_buf_sz = obj_buf_sz;
return 0;
 errout:
+   free(kbuild_dir);
free(obj_buf);
if (p_obj_buf)
*p_obj_buf = NULL;
-- 
1.8.3.4



[PATCH v11 39/39] bpf tools: Load a program with different instance using preprocessor

2015-07-08 Thread Wang Nan
With this patch, a caller of libbpf can control how programs are loaded
by installing a preprocessor callback for a BPF program. With a
preprocessor, multiple instances can be created from one BPF program.

perf will use this to generate a different prologue for each
'struct probe_trace_event' instance matched by one
'struct perf_probe_event'.

bpf_program__set_prep() is added to support this. The caller passes
libbpf the number of instances to create and a preprocessor function,
which is called when the program is actually loaded. The callback
returns the instruction array for each instance.

nr_instance and instance_fds are added to struct bpf_program to support
multiple instances. bpf_program__get_nth_fd() is introduced to read the
fd of an instance. The old interface, bpf_program__get_fd(), does not
work for a program that has a preprocessor installed.
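
For illustration only (not part of this patch): a minimal, self-contained
sketch of the per-instance callback pattern described above. All demo_*
names and types are hypothetical stand-ins, and the "fd" is a placeholder
number rather than the result of a real BPF load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one BPF instruction. */
struct demo_insn { int code; };

/* What the callback hands back; a NULL/empty result means "skip". */
struct demo_prep_result {
	struct demo_insn *new_insns;
	size_t new_cnt;
};

typedef int (*demo_prep_t)(int instance, struct demo_insn *orig, size_t cnt,
			   struct demo_prep_result *res);

/*
 * Call 'prep' once per instance. Instances that return instructions get a
 * placeholder fd (100 + index); skipped instances get -1. The first
 * callback error aborts the loop and is returned.
 */
static int demo_load_instances(struct demo_insn *orig, size_t cnt,
			       demo_prep_t prep, int nr_instance, int *fds)
{
	int i, err;

	for (i = 0; i < nr_instance; i++) {
		struct demo_prep_result res = { NULL, 0 };

		err = prep(i, orig, cnt, &res);
		if (err)
			return err;
		if (!res.new_insns || !res.new_cnt) {
			fds[i] = -1;
			continue;
		}
		fds[i] = 100 + i;	/* placeholder for a real load */
	}
	return 0;
}

/* Example callback: skip odd instances, reuse the original insns otherwise. */
static int demo_skip_odd(int instance, struct demo_insn *orig, size_t cnt,
			 struct demo_prep_result *res)
{
	if (instance % 2)
		return 0;
	res->new_insns = orig;
	res->new_cnt = cnt;
	return 0;
}

/* Helper for quick checks: fd slot 'idx' after loading three instances. */
static int demo_fd_of(int idx)
{
	struct demo_insn prog[2] = { { 1 }, { 2 } };
	int fds[3] = { 0, 0, 0 };

	if (idx < 0 || idx >= 3)
		return -2;
	if (demo_load_instances(prog, 2, demo_skip_odd, 3, fds))
		return -2;
	return fds[idx];
}
```

The point of the design is that one program body yields several loaded
instances, each looked up by index rather than through a single fd.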

Signed-off-by: Wang Nan 
Signed-off-by: He Kuang 
---
 tools/lib/bpf/libbpf.c | 135 +++--
 tools/lib/bpf/libbpf.h |  22 
 2 files changed, 152 insertions(+), 5 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 83a3b06..d467fcb 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -100,6 +100,10 @@ struct bpf_program {
 
int fd;
 
+   int nr_instance;
+   int *instance_fds;
+   bpf_program_prep_t preprocessor;
+
struct bpf_object *obj;
void *priv;
bpf_program_clear_priv_t clear_priv;
@@ -156,6 +160,19 @@ static void bpf_program__unload(struct bpf_program *prog)
return;
 
zclose(prog->fd);
+
+   if (prog->preprocessor) {
+   int i;
+
+   if (prog->nr_instance <= 0)
+   pr_warning("Internal error when unloading: instance is %d\n",
+  prog->nr_instance);
+   else
+   for (i = 0; i < prog->nr_instance; i++)
+   zclose(prog->instance_fds[i]);
+   prog->nr_instance = -1;
+   zfree(&prog->instance_fds);
+   }
 }
 
 static void bpf_program__clear(struct bpf_program *prog)
@@ -207,6 +224,8 @@ __bpf_program__new(void *data, size_t size, char *name, int idx,
   prog->insns_cnt * sizeof(struct bpf_insn));
prog->idx = idx;
prog->fd = -1;
+   prog->nr_instance = -1;
+   prog->instance_fds = NULL;
 
return 0;
 errout:
@@ -798,13 +817,57 @@ static int
 bpf_program__load(struct bpf_program *prog,
  char *license, u32 kern_version)
 {
-   int err, fd;
+   int err = 0, fd, i;
+
+   if (!prog->preprocessor) {
+   err = load_program(prog->insns, prog->insns_cnt,
+   license, kern_version, &fd);
+   if (!err)
+   prog->fd = fd;
+   goto out;
+   }
 
-   err = load_program(prog->insns, prog->insns_cnt,
-  license, kern_version, &fd);
-   if (!err)
-   prog->fd = fd;
+   if (prog->nr_instance <= 0 || !prog->instance_fds) {
+   pr_warning("Internal error when loading '%s'\n",
+   prog->section_name);
+   return -EINVAL;
+   }
+
+   for (i = 0; i < prog->nr_instance; i++) {
+   struct bpf_prog_prep_result result;
+   bpf_program_prep_t preprocessor = prog->preprocessor;
+
+   bzero(&result, sizeof(result));
+   err = preprocessor(prog, i, prog->insns,
+  prog->insns_cnt, &result);
+   if (err) {
+   pr_warning("Preprocessing %dth instance of program '%s' failed\n",
+   i, prog->section_name);
+   goto out;
+   }
+
+   if (!result.new_insn_ptr || !result.new_insn_cnt) {
+   pr_debug("Skip loading %dth instance of program '%s'\n",
+   i, prog->section_name);
+   prog->instance_fds[i] = -1;
+   continue;
+   }
 
+   err = load_program(result.new_insn_ptr,
+  result.new_insn_cnt,
+  license, kern_version, &fd);
+
+   if (err) {
+   pr_warning("Loading %dth instance of program '%s' failed\n",
+   i, prog->section_name);
+   goto out;
+   }
+
+   if (result.pfd)
+   *result.pfd = fd;
+   prog->instance_fds[i] = fd;
+   }
+out:
if (err)
pr_warning("failed to load program '%s'\n",
   prog->section_name);
@@ -1054,6 +1117,68 @@ int bpf_program__get_fd(struct bpf_program *prog, int *pfd)
if (!pfd)
return -EINVAL;
 
+   if 

[PATCH v11 27/39] perf tests: Add LLVM test for eBPF on-the-fly compiling

2015-07-08 Thread Wang Nan
Previous patches introduced llvm__compile_bpf() to compile a source file
into an eBPF object. This patch adds a test case for it. The test also
exercises libbpf by opening the generated object.

Signed-off-by: Wang Nan 
---
 tools/perf/tests/Build  |  1 +
 tools/perf/tests/builtin-test.c |  4 ++
 tools/perf/tests/llvm.c | 85 +
 tools/perf/tests/tests.h|  1 +
 4 files changed, 91 insertions(+)
 create mode 100644 tools/perf/tests/llvm.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index d20d6e6..c1518bd 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -32,6 +32,7 @@ perf-y += sample-parsing.o
 perf-y += parse-no-sample-id-all.o
 perf-y += kmod-path.o
 perf-y += thread-map.o
+perf-y += llvm.o
 
 perf-$(CONFIG_X86) += perf-time-to-tsc.o
 
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index c1dde73..6a3fb54 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -174,6 +174,10 @@ static struct test {
.desc = "Test thread map",
.func = test__thread_map,
},
+   {
+   .desc = "Test LLVM searching and compiling",
+   .func = test__llvm,
+   },
{
.func = NULL,
},
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
new file mode 100644
index 000..9984631
--- /dev/null
+++ b/tools/perf/tests/llvm.c
@@ -0,0 +1,85 @@
+#include 
+#include 
+#include 
+#include 
+#include "tests.h"
+#include "debug.h"
+
+static int perf_config_cb(const char *var, const char *val,
+ void *arg __maybe_unused)
+{
+   return perf_default_config(var, val, arg);
+}
+
+/*
+ * Randomly give it a "version" section since we don't really load it
+ * into kernel
+ */
+static const char test_bpf_prog[] =
+   "__attribute__((section(\"do_fork\"), used)) "
+   "int fork(void *ctx) {return 0;} "
+   "char _license[] __attribute__((section(\"license\"), used)) = \"GPL\";"
+   "int _version __attribute__((section(\"version\"), used)) = 0x40100;";
+
+#ifdef HAVE_LIBBPF_SUPPORT
+static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz)
+{
+   struct bpf_object *obj;
+
+   obj = bpf_object__open_buffer(obj_buf, obj_buf_sz);
+   if (!obj)
+   return -1;
+   bpf_object__close(obj);
+   return 0;
+}
+#else
+static int test__bpf_parsing(void *obj_buf __maybe_unused,
+size_t obj_buf_sz __maybe_unused)
+{
+   fprintf(stderr, " (skip bpf parsing)");
+   return 0;
+}
+#endif
+
+#define new_string(n, fmt, ...) \
+do {   \
+   n##_sz = snprintf(NULL, 0, fmt, __VA_ARGS__);\
+   if (n##_sz == 0)\
+   return -1;  \
+   n##_new = malloc(n##_sz + 1);   \
+   if (!n##_new)   \
+   return -1;  \
+   snprintf(n##_new, n##_sz + 1, fmt, __VA_ARGS__);\
+   n##_new[n##_sz] = '\0'; \
+} while (0)
+
+int test__llvm(void)
+{
+   char *tmpl_new, *clang_opt_new;
+   size_t tmpl_sz, clang_opt_sz;
+   void *obj_buf;
+   size_t obj_buf_sz;
+   int err;
+
+   perf_config(perf_config_cb, NULL);
+
+   if (!llvm_param.clang_bpf_cmd_template)
+   return -1;
+
+   if (!llvm_param.clang_opt)
+   llvm_param.clang_opt = strdup("");
+
+   new_string(tmpl, "echo '%s' | %s", test_bpf_prog,
+  llvm_param.clang_bpf_cmd_template);
+   new_string(clang_opt, "-xc %s", llvm_param.clang_opt);
+
+   llvm_param.clang_bpf_cmd_template = tmpl_new;
+   llvm_param.clang_opt = clang_opt_new;
+   err = llvm__compile_bpf("-", &obj_buf, &obj_buf_sz);
+   if (err)
+   return -1;
+
+   err = test__bpf_parsing(obj_buf, obj_buf_sz);
+   free(obj_buf);
+   return err;
+}
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index ebb47d9..bf113a2 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -62,6 +62,7 @@ int test__fdarray__filter(void);
 int test__fdarray__add(void);
 int test__kmod_path__parse(void);
 int test__thread_map(void);
+int test__llvm(void);
 
 #if defined(__x86_64__) || defined(__i386__) || defined(__arm__) || defined(__aarch64__)
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
-- 
1.8.3.4



Re: [RFC PATCH v3 4/4] trace: Trace log handler for logging into STM blocks

2015-07-08 Thread Steven Rostedt
On Wed, 8 Jul 2015 14:31:48 +0200
Peter Zijlstra  wrote:

> On Tue, Jul 07, 2015 at 06:10:43PM +0800, Chunyan Zhang wrote:
> > Add the function 'trace_event_stm_output_##call' for printing events
> > trace log into STM blocks.
> > 
> > This patch also adds a function call at where the events have been
> > committed to ring buffer to export the trace event information to
> > STM blocks.
> 
> So then you have two copies of the data, why that? Would a scheme were
> data either goes to the STM or the regular buffer not make much more
> sense?
> 
> > +++ b/include/trace/perf.h
> > @@ -175,6 +175,7 @@ trace_event_raw_event_##call(void *__data, proto)   
> > \
> > { assign; } \
> > \
> > trace_event_buffer_commit();\
> > +   trace_event_stm_log();  \
> 
> This makes every trace event slower.

Of course this could use a jump label.

But I agree, I think a switch to which buffer it should be sent to is
better. I could come up with a way to make the buffer more generic, and
have it switch between where the event is recorded.

-- Steve

> 
> >  }
> >  /*
> >   * The ftrace_test_probe is compiled out, it is only here as a build time 
> > check



[PATCH v11 24/39] perf tools: Call clang to compile C source to object code

2015-07-08 Thread Wang Nan
This is the core patch for supporting eBPF on-the-fly compiling. It does
the following:

 1. Searches for the clang compiler using search_program().

 2. Runs the command template defined by the llvm-bpf-cmd-template option
in the [llvm] config section using read_from_pipe(). The path of clang
and the path of the source code are injected into the shell command
through environment variables set by force_set_env().
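
For illustration only (not part of this patch): a minimal sketch of the
directory search in step 1, assuming a colon-separated directory list such
as $PATH. The demo_* names are hypothetical; search_program() in the patch
additionally handles a configured default path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Look for 'name' in each directory of the colon-separated list 'dirs'.
 * On success the full path is left in 'out' and 0 is returned; otherwise
 * 'out' is emptied and -1 is returned. 'out_sz' must be at least 1.
 */
static int demo_find_in_dirs(const char *dirs, const char *name,
			     char *out, size_t out_sz)
{
	char *copy, *dir, *save = NULL;
	int ret = -1;

	copy = strdup(dirs);	/* strtok_r() modifies its input */
	if (!copy)
		return -1;

	for (dir = strtok_r(copy, ":", &save); dir;
	     dir = strtok_r(NULL, ":", &save)) {
		int n = snprintf(out, out_sz, "%s/%s", dir, name);

		if (n < 0 || (size_t)n >= out_sz)
			continue;	/* candidate path does not fit */
		if (access(out, F_OK) == 0) {
			ret = 0;
			break;
		}
	}

	free(copy);
	if (ret)
		out[0] = '\0';
	return ret;
}

/* Helper for quick checks: 1 if 'name' is found in 'dirs', else 0. */
static int demo_found(const char *dirs, const char *name)
{
	char buf[PATH_MAX];

	return demo_find_in_dirs(dirs, name, buf, sizeof(buf)) == 0;
}
```

Typical use would be demo_find_in_dirs(getenv("PATH"), "clang", buf,
sizeof(buf)) after checking that getenv() did not return NULL.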

Signed-off-by: Wang Nan 
---
 tools/perf/util/llvm-utils.c | 225 +++
 tools/perf/util/llvm-utils.h |   3 +
 2 files changed, 228 insertions(+)

diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c
index fd5b1bc..dca16e7 100644
--- a/tools/perf/util/llvm-utils.c
+++ b/tools/perf/util/llvm-utils.c
@@ -43,3 +43,228 @@ int perf_llvm_config(const char *var, const char *value)
return -1;
return 0;
 }
+
+static int
+search_program(const char *def, const char *name,
+  char *output)
+{
+   char *env, *path, *tmp;
+   char buf[PATH_MAX];
+   int ret;
+
+   output[0] = '\0';
+   if (def && def[0] != '\0') {
+   if (def[0] == '/') {
+   if (access(def, F_OK) == 0) {
+   strlcpy(output, def, PATH_MAX);
+   return 0;
+   }
+   } else if (def[0] != '\0')
+   name = def;
+   }
+
+   env = getenv("PATH");
+   if (!env)
+   return -1;
+   env = strdup(env);
+   if (!env)
+   return -1;
+
+   ret = -ENOENT;
+   path = strtok_r(env, ":", &tmp);
+   while (path) {
+   scnprintf(buf, sizeof(buf), "%s/%s", path, name);
+   if (access(buf, F_OK) == 0) {
+   strlcpy(output, buf, PATH_MAX);
+   ret = 0;
+   break;
+   }
+   path = strtok_r(NULL, ":", &tmp);
+   }
+
+   free(env);
+   return ret;
+}
+
+#define READ_SIZE  4096
+static int
+read_from_pipe(const char *cmd, void **p_buf, size_t *p_read_sz)
+{
+   int err = 0;
+   void *buf = NULL;
+   FILE *file = NULL;
+   size_t read_sz = 0, buf_sz = 0;
+
+   file = popen(cmd, "r");
+   if (!file) {
+   pr_err("ERROR: unable to popen cmd: %s\n",
+  strerror(errno));
+   return -EINVAL;
+   }
+
+   while (!feof(file) && !ferror(file)) {
+   /*
+* Make buf_sz always have one byte extra space so we
+* can put '\0' there.
+*/
+   if (buf_sz - read_sz < READ_SIZE + 1) {
+   void *new_buf;
+
+   buf_sz = read_sz + READ_SIZE + 1;
+   new_buf = realloc(buf, buf_sz);
+
+   if (!new_buf) {
+   pr_err("ERROR: failed to realloc memory\n");
+   err = -ENOMEM;
+   goto errout;
+   }
+
+   buf = new_buf;
+   }
+   read_sz += fread(buf + read_sz, 1, READ_SIZE, file);
+   }
+
+   if (buf_sz - read_sz < 1) {
+   pr_err("ERROR: internal error\n");
+   err = -EINVAL;
+   goto errout;
+   }
+
+   if (ferror(file)) {
+   pr_err("ERROR: error occurred when reading from pipe: %s\n",
+  strerror(errno));
+   err = -EIO;
+   goto errout;
+   }
+
+   err = WEXITSTATUS(pclose(file));
+   file = NULL;
+   if (err) {
+   err = -EINVAL;
+   goto errout;
+   }
+
+   /*
+* If buf is string, give it terminal '\0' to make our life
+* easier. If buf is not string, that '\0' is out of space
+* indicated by read_sz so caller won't even notice it.
+*/
+   ((char *)buf)[read_sz] = '\0';
+
+   if (!p_buf)
+   free(buf);
+   else
+   *p_buf = buf;
+
+   if (p_read_sz)
+   *p_read_sz = read_sz;
+   return 0;
+
+errout:
+   if (file)
+   pclose(file);
+   free(buf);
+   if (p_buf)
+   *p_buf = NULL;
+   if (p_read_sz)
+   *p_read_sz = 0;
+   return err;
+}
+
+static inline void
+force_set_env(const char *var, const char *value)
+{
+   if (value) {
+   setenv(var, value, 1);
+   pr_debug("set env: %s=%s\n", var, value);
+   } else {
+   unsetenv(var);
+   pr_debug("unset env: %s\n", var);
+   }
+}
+
+static void
+version_notice(void)
+{
+   pr_err(
+" \tLLVM 3.7 or newer is required. Which can be found from http://llvm.org\n"
+" \tYou may want to try git trunk:\n"
+" \t\tgit clone http://llvm.org/git/llvm.git\n"
+" \t\t and\n"
+" \t\tgit clone http://llvm.org/git/clang.git\n\n"

[PATCH v11 02/39] tracing, perf: Implement BPF programs attached to uprobes

2015-07-08 Thread Wang Nan
By copying the BPF-related operations to the uprobe processing path, this
patch allows users to attach BPF programs to uprobes, as they already can
with kprobes.

After this patch, users can use PERF_EVENT_IOC_SET_BPF on a uprobe perf
event, which makes it possible to profile user space programs and kernel
events together using BPF.

Because of this patch, CONFIG_BPF_EVENTS must also be available when only
CONFIG_UPROBE_EVENT is set, to ensure trace_call_bpf() is compiled even if
KPROBE_EVENT is not set.
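
For illustration only (not part of this patch): a minimal sketch of the
combined flag mask used to accept both probe types. The DEMO_* names are
hypothetical stand-ins for the TRACE_EVENT_FL_* bits.

```c
#include <assert.h>

/* Bit positions, in the style of the TRACE_EVENT_FL_*_BIT enum. */
enum {
	DEMO_FL_TRACEPOINT_BIT,
	DEMO_FL_KPROBE_BIT,
	DEMO_FL_UPROBE_BIT,
};

/* Flag values derived from the bit positions. */
enum {
	DEMO_FL_TRACEPOINT = 1 << DEMO_FL_TRACEPOINT_BIT,
	DEMO_FL_KPROBE     = 1 << DEMO_FL_KPROBE_BIT,
	DEMO_FL_UPROBE     = 1 << DEMO_FL_UPROBE_BIT,
};

/* One mask that matches either probe type. */
#define DEMO_FL_UKPROBE (DEMO_FL_KPROBE | DEMO_FL_UPROBE)

/* Return 1 if a BPF program may be attached to an event with 'flags'. */
static int demo_can_attach_bpf(int flags)
{
	return (flags & DEMO_FL_UKPROBE) != 0;
}
```

Testing against the combined mask keeps the attach check a single
comparison no matter which of the two probe types the event is.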

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-3-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 include/linux/trace_events.h | 5 +
 kernel/events/core.c | 4 ++--
 kernel/trace/Kconfig | 2 +-
 kernel/trace/trace_uprobe.c  | 5 +
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 180dbf8..ed27917 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -243,6 +243,7 @@ enum {
TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
TRACE_EVENT_FL_TRACEPOINT_BIT,
TRACE_EVENT_FL_KPROBE_BIT,
+   TRACE_EVENT_FL_UPROBE_BIT,
 };
 
 /*
@@ -257,6 +258,7 @@ enum {
  *  USE_CALL_FILTER - For trace internal events, don't use file filter
  *  TRACEPOINT- Event is a tracepoint
  *  KPROBE- Event is a kprobe
+ *  UPROBE- Event is a uprobe
  */
 enum {
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
@@ -267,8 +269,11 @@ enum {
TRACE_EVENT_FL_USE_CALL_FILTER  = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
TRACE_EVENT_FL_TRACEPOINT   = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
TRACE_EVENT_FL_KPROBE   = (1 << TRACE_EVENT_FL_KPROBE_BIT),
+   TRACE_EVENT_FL_UPROBE   = (1 << TRACE_EVENT_FL_UPROBE_BIT),
 };
 
+#define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE)
+
 struct trace_event_call {
struct list_headlist;
struct trace_event_class *class;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae34..0b1d564 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6749,8 +6749,8 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
if (event->tp_event->prog)
return -EEXIST;
 
-   if (!(event->tp_event->flags & TRACE_EVENT_FL_KPROBE))
-   /* bpf programs can only be attached to kprobes */
+   if (!(event->tp_event->flags & TRACE_EVENT_FL_UKPROBE))
+   /* bpf programs can only be attached to u/kprobes */
return -EINVAL;
 
prog = bpf_prog_get(prog_fd);
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 3b9a48a..1153c43 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -434,7 +434,7 @@ config UPROBE_EVENT
 
 config BPF_EVENTS
depends on BPF_SYSCALL
-   depends on KPROBE_EVENT
+   depends on KPROBE_EVENT || UPROBE_EVENT
bool
default y
help
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index aa1ea7b..f97479f 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1095,11 +1095,15 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
 {
struct trace_event_call *call = &tu->tp.call;
struct uprobe_trace_entry_head *entry;
+   struct bpf_prog *prog = call->prog;
struct hlist_head *head;
void *data;
int size, esize;
int rctx;
 
+   if (prog && !trace_call_bpf(prog, regs))
+   return;
+
esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
 
size = esize + tu->tp.size + dsize;
@@ -1289,6 +1293,7 @@ static int register_uprobe_event(struct trace_uprobe *tu)
return -ENODEV;
}
 
+   call->flags = TRACE_EVENT_FL_UPROBE;
call->class->reg = trace_uprobe_register;
call->data = tu;
ret = trace_add_event_call(call);
-- 
1.8.3.4



[PATCH v11 38/39] perf record: Add clang options for compiling BPF scripts

2015-07-08 Thread Wang Nan
Although the previous patch allows setting BPF compiler related options
in perfconfig, some ad-hoc situations still require passing options on
the command line. This patch introduces two options to 'perf record' for
this purpose: --clang-path and --clang-opt.

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-record.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index bd189b1..e89c045 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -30,6 +30,7 @@
 #include "util/auxtrace.h"
 #include "util/parse-branch-options.h"
 #include "util/bpf-loader.h"
+#include "util/llvm-utils.h"
 
 #include 
 #include 
@@ -1073,6 +1074,12 @@ struct option __record_options[] = {
  "opts", "AUX area tracing Snapshot Mode", ""),
OPT_UINTEGER(0, "proc-map-timeout", _map_timeout,
"per thread proc mmap processing timeout in ms"),
+#ifdef HAVE_LIBBPF_SUPPORT
+   OPT_STRING(0, "clang-path", &llvm_param.clang_path, "clang path",
+  "clang binary to use for compiling BPF scriptlets"),
+   OPT_STRING(0, "clang-opt", &llvm_param.clang_opt, "clang options",
+  "options passed to clang when compiling BPF scriptlets"),
+#endif
OPT_END()
 };
 
-- 
1.8.3.4



[PATCH v11 34/39] perf record: Load all eBPF object into kernel

2015-07-08 Thread Wang Nan
This patch uses bpf_object__load(), provided by libbpf, to load all
objects into the kernel.

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-record.c  | 12 
 tools/perf/util/bpf-loader.c | 19 +++
 tools/perf/util/bpf-loader.h |  2 ++
 3 files changed, 33 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 33b213a..6d943a7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1128,6 +1128,18 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
goto out_symbol_exit;
}
 
+   /*
+* bpf__probe() also calls symbol__init() if there are probe
+* events in bpf objects, so calling symbol_exit when failure
+* is safe. If there is no probe event, bpf__load() always
+* success.
+*/
+   err = bpf__load();
+   if (err) {
+   pr_err("Loading BPF programs failed\n");
+   goto out_symbol_exit;
+   }
+
symbol__init(NULL);
 
if (symbol_conf.kptr_restrict)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index df3f471..f68ba33 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -241,3 +241,22 @@ int bpf__probe(void)
 
return err < 0 ? err : 0;
 }
+
+int bpf__load(void)
+{
+   struct bpf_object *obj, *tmp;
+   int err = 0;
+
+   bpf_object__for_each_safe(obj, tmp) {
+   err = bpf_object__load(obj);
+   if (err) {
+   pr_err("bpf: load objects failed\n");
+   goto errout;
+   }
+   }
+   return 0;
+errout:
+   bpf_object__for_each_safe(obj, tmp)
+   bpf_object__unload(obj);
+   return err;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 374aec0..ae0dc9b 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -14,6 +14,7 @@
 int bpf__prepare_load(const char *filename, bool source);
 int bpf__probe(void);
 int bpf__unprobe(void);
+int bpf__load(void);
 
 void bpf__clear(void);
 #else
@@ -26,6 +27,7 @@ static inline int bpf__prepare_load(const char *filename __maybe_unused,
 
 static inline int bpf__probe(void) { return 0; }
 static inline int bpf__unprobe(void) { return 0; }
+static inline int bpf__load(void) { return 0; }
 static inline void bpf__clear(void) { }
 #endif
 #endif
-- 
1.8.3.4



[PATCH v11 10/39] bpf tools: Collect map definitions from 'maps' section

2015-07-08 Thread Wang Nan
If maps are used by eBPF programs, the corresponding object file(s)
should contain a section named 'maps', which holds the map definitions.
This patch copies the data of the whole section. Parsing of the map data
is deferred until just before map loading.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-11-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 95c8d8e..87f5054 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -81,6 +81,9 @@ void libbpf_set_print(libbpf_print_fn_t warn,
 struct bpf_object {
char license[64];
u32 kern_version;
+   void *maps_buf;
+   size_t maps_buf_sz;
+
/*
 * Information when doing elf related work. Only valid if fd
 * is valid.
@@ -250,6 +253,28 @@ bpf_object__init_kversion(struct bpf_object *obj,
return 0;
 }
 
+static int
+bpf_object__init_maps(struct bpf_object *obj, void *data,
+ size_t size)
+{
+   if (size == 0) {
+   pr_debug("%s doesn't need map definition\n",
+obj->path);
+   return 0;
+   }
+
+   obj->maps_buf = malloc(size);
+   if (!obj->maps_buf) {
+   pr_warning("malloc maps failed: %s\n", obj->path);
+   return -ENOMEM;
+   }
+
+   obj->maps_buf_sz = size;
+   memcpy(obj->maps_buf, data, size);
+   pr_debug("maps in %s: %ld bytes\n", obj->path, (long)size);
+   return 0;
+}
+
 static int bpf_object__elf_collect(struct bpf_object *obj)
 {
Elf *elf = obj->efile.elf;
@@ -305,6 +330,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
err = bpf_object__init_kversion(obj,
data->d_buf,
data->d_size);
+   else if (strcmp(name, "maps") == 0)
+   err = bpf_object__init_maps(obj, data->d_buf,
+   data->d_size);
if (err)
goto out;
}
@@ -382,5 +410,6 @@ void bpf_object__close(struct bpf_object *obj)
 
bpf_object__elf_finish(obj);
 
+   zfree(&obj->maps_buf);
free(obj);
 }
-- 
1.8.3.4



[PATCH v11 01/39] bpf: Use correct #ifdef controller for trace_call_bpf()

2015-07-08 Thread Wang Nan
Commit e1abf2cc8d5d80b41c4419368ec743ccadbb131e ("bpf: Fix the build on
BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable")
changed the build condition of bpf_trace.o from CONFIG_BPF_SYSCALL to
CONFIG_BPF_EVENTS, but the corresponding #ifdef guard in trace_events.h
for trace_call_bpf() was not changed, which is, in theory, incorrect.

With the current Kconfigs, we can create a .config with
CONFIG_BPF_SYSCALL=y and CONFIG_BPF_EVENTS=n by unselecting
CONFIG_KPROBE_EVENT and selecting CONFIG_BPF_SYSCALL. With these options,
trace_call_bpf() is declared as an extern function, but any caller would
trigger a missing-symbol error because bpf_trace.o is not built.

This patch changes the #ifdef guard for trace_call_bpf() from
CONFIG_BPF_SYSCALL to CONFIG_BPF_EVENTS. I'll show its correctness:

Before this patch:

   BPF_SYSCALL   BPF_EVENTS   trace_call_bpf   bpf_trace.o
   y             y            normal           compiled
   n             n            inline           not compiled
   y             n            normal           not compiled (incorrect)
   n             y            impossible (BPF_EVENTS depends on BPF_SYSCALL)

After this patch:

   BPF_SYSCALL   BPF_EVENTS   trace_call_bpf   bpf_trace.o
   y             y            normal           compiled
   n             n            inline           not compiled
   y             n            inline           not compiled (fixed)
   n             y            impossible (BPF_EVENTS depends on BPF_SYSCALL)

So this patch doesn't break anything. QED.
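
For readers unfamiliar with the pattern, here is a minimal user-space
sketch of a declaration guarded by the same option that builds its
implementation. DEMO_CONFIG_BPF_EVENTS and demo_call_bpf() are
hypothetical stand-ins for CONFIG_BPF_EVENTS and trace_call_bpf(), and
the stub's return value of 1 is an assumption of this sketch:

```c
#include <stddef.h>

struct demo_prog;	/* opaque; only pointers to it are used here */

#ifdef DEMO_CONFIG_BPF_EVENTS
/* Option on: the real function lives in the object this option builds. */
unsigned int demo_call_bpf(struct demo_prog *prog, void *ctx);
#else
/*
 * Option off: callers get an inline stub, so no external symbol is
 * referenced and the link cannot fail. Returning 1 is an assumption.
 */
static inline unsigned int demo_call_bpf(struct demo_prog *prog, void *ctx)
{
	(void)prog;
	(void)ctx;
	return 1;
}
#endif
```

The point is that the #ifdef condition and the Makefile condition must
name the same option; otherwise the "y n" row of the table above
declares an extern function that nothing defines.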

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-2-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 include/linux/trace_events.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 1063c85..180dbf8 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -542,7 +542,7 @@ event_trigger_unlock_commit_regs(struct trace_event_file *file,
event_triggers_post_call(file, tt);
 }
 
-#ifdef CONFIG_BPF_SYSCALL
+#ifdef CONFIG_BPF_EVENTS
 unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx);
 #else
 static inline unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx)
-- 
1.8.3.4



[PATCH v11 22/39] bpf tools: Link all bpf objects onto a list

2015-07-08 Thread Wang Nan
To allow enumeration of all bpf_objects, keep them in a list (hidden from
the caller). bpf_object__for_each_safe() is introduced to do this
iteration. It is safe even if the user closes the object during iteration.
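
The "safe" part can be sketched as follows. This is not the libbpf code:
struct demo_obj, demo_head and the demo_obj__* helpers are hypothetical,
and a singly linked list is used instead of list_head to keep the sketch
self-contained. The macro fetches the successor before the loop body
runs, so the body may free the current element:

```c
#include <stdlib.h>

struct demo_obj {
	int id;
	struct demo_obj *next;
};

static struct demo_obj *demo_head;	/* list head, hidden from callers */

static struct demo_obj *demo_obj__new(int id)
{
	struct demo_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	obj->id = id;
	obj->next = demo_head;		/* link onto the hidden list */
	demo_head = obj;
	return obj;
}

static void demo_obj__close(struct demo_obj *obj)
{
	struct demo_obj **pp;

	/* Unlink obj from the list, then release it. */
	for (pp = &demo_head; *pp; pp = &(*pp)->next) {
		if (*pp == obj) {
			*pp = obj->next;
			break;
		}
	}
	free(obj);
}

/* NULL means "give me the first object". */
static struct demo_obj *demo_obj__next(struct demo_obj *prev)
{
	return prev ? prev->next : demo_head;
}

/* tmp always holds the successor, fetched before the body can free pos. */
#define demo_obj__for_each_safe(pos, tmp)			\
	for ((pos) = demo_obj__next(NULL),			\
	     (tmp) = (pos) ? demo_obj__next(pos) : NULL;	\
	     (pos) != NULL;					\
	     (pos) = (tmp),					\
	     (tmp) = (tmp) ? demo_obj__next(tmp) : NULL)
```

A caller can then write "demo_obj__for_each_safe(pos, tmp)
demo_obj__close(pos);" to drop every object in one pass.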

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-23-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 32 
 tools/lib/bpf/libbpf.h |  7 +++
 2 files changed, 39 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b58b13b..83a3b06 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -104,6 +105,8 @@ struct bpf_program {
bpf_program_clear_priv_t clear_priv;
 };
 
+static LIST_HEAD(bpf_objects_list);
+
 struct bpf_object {
char license[64];
u32 kern_version;
@@ -137,6 +140,12 @@ struct bpf_object {
} *reloc;
int nr_reloc;
} efile;
+   /*
+* All loaded bpf_object is linked in a list, which is
+* hidden to caller. bpf_objects__ handlers deal with
+* all objects.
+*/
+   struct list_head list;
char path[];
 };
 #define obj_elf_valid(o)   ((o)->efile.elf)
@@ -264,6 +273,9 @@ static struct bpf_object *bpf_object__new(const char *path,
obj->efile.obj_buf_sz = obj_buf_sz;
 
obj->loaded = false;
+
+   INIT_LIST_HEAD(&obj->list);
+   list_add(&obj->list, &bpf_objects_list);
return obj;
 }
 
@@ -943,6 +955,7 @@ void bpf_object__close(struct bpf_object *obj)
}
zfree(&obj->programs);
 
+   list_del(&obj->list);
free(obj);
 }
 
@@ -955,6 +968,25 @@ int bpf_object__get_prog_cnt(struct bpf_object *obj, size_t *pcnt)
return 0;
 }
 
+struct bpf_object *
+bpf_object__next(struct bpf_object *prev)
+{
+   struct bpf_object *next;
+
+   if (!prev)
+   next = list_first_entry(&bpf_objects_list,
+   struct bpf_object,
+   list);
+   else
+   next = list_next_entry(prev, list);
+
+   /* Empty list is noticed here so don't need checking on entry. */
+   if (&next->list == &bpf_objects_list)
+   return NULL;
+
+   return next;
+}
+
 struct bpf_program *
 bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index a20ae2e..1643b57 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -38,6 +38,13 @@ int bpf_object__unload(struct bpf_object *obj);
 /* Accessors of bpf_object */
 int bpf_object__get_prog_cnt(struct bpf_object *obj, size_t *pcnt);
 
+struct bpf_object *bpf_object__next(struct bpf_object *prev);
+#define bpf_object__for_each_safe(pos, tmp)\
+   for ((pos) = bpf_object__next(NULL),\
+   (tmp) = bpf_object__next(pos);  \
+(pos) != NULL; \
+(pos) = (tmp), (tmp) = bpf_object__next(tmp))
+
 /* Accessors of bpf_program. */
 struct bpf_program;
 struct bpf_program *bpf_program__next(struct bpf_program *prog,
-- 
1.8.3.4



[PATCH v11 11/39] bpf tools: Collect symbol table from SHT_SYMTAB section

2015-07-08 Thread Wang Nan
This patch collects the symbol table section. This section is useful
when linking BPF maps.

What 'bpf_map_xxx()' functions actually require are map's file
descriptors (and the internal verifier converts fds into pointers to
'struct bpf_map'), which we don't know when compiling. Therefore, we
should make the compiler generate a 'ldr_64 r1, <imm>' instruction, and
fill the 'imm' field with the actual file descriptor when loading in
libbpf.

BPF programs should be written in this way:

 struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(unsigned long),
.value_size = sizeof(unsigned long),
.max_entries = 100,
 };

 SEC("my_func=sys_write")
 int my_func(void *ctx)
 {
 ...
 bpf_map_update_elem(&my_map, &key, &value, BPF_ANY);
 ...
 }

The compiler should convert '&my_map' into a 'ldr_64 r1, <imm>'
instruction, where imm should be the address of 'my_map'. From that
address, libbpf knows which map is actually referenced, and then fills
the imm field with the 'fd' of the map it created.

However, since we never really 'link' the object file, the imm field is
only a record in the relocation section. Therefore libbpf should do the
relocation:

 1. In the relocation section (type == SHT_REL), the position of each
such 'ld_64' instruction is recorded with a reference to an entry in
the symbol table (SHT_SYMTAB);

 2. From records in the symbol table we can find the indices of map
variables.

Libbpf first records SHT_SYMTAB and the position of each instruction
that requires such an operation. Then it creates the map file
descriptors. Finally, after map creation completes, it replaces the imm
field.
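
The final step can be sketched roughly as below. This is only an
illustration under stated assumptions, not the code added later in this
series: struct demo_insn mirrors the 8-byte eBPF instruction layout,
DEMO_LD_IMM64 is the opcode value for a 64-bit immediate load, and
struct demo_reloc and demo_relocate() are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the 8-byte eBPF instruction layout. */
struct demo_insn {
	uint8_t code;	/* opcode */
	uint8_t regs;	/* destination and source register nibbles */
	int16_t off;
	int32_t imm;
};

#define DEMO_LD_IMM64 0x18	/* BPF_LD | BPF_IMM | BPF_DW */

/* Hypothetical record: which instruction refers to which map. */
struct demo_reloc {
	size_t insn_idx;
	size_t map_idx;
};

/* Replace the imm field of each recorded instruction with a map fd. */
static int demo_relocate(struct demo_insn *insns, size_t insns_cnt,
			 const struct demo_reloc *relocs, size_t nr_relocs,
			 const int *map_fds, size_t nr_maps)
{
	size_t i;

	for (i = 0; i < nr_relocs; i++) {
		size_t idx = relocs[i].insn_idx;
		size_t map = relocs[i].map_idx;

		if (idx >= insns_cnt || map >= nr_maps)
			return -1;	/* record points outside our data */
		if (insns[idx].code != DEMO_LD_IMM64)
			return -1;	/* not a 64-bit immediate load */
		insns[idx].imm = map_fds[map];
	}
	return 0;
}
```

Rejecting records that do not point at a 64-bit immediate load keeps a
corrupted relocation section from silently patching unrelated
instructions.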

This is the first patch of BPF map related stuff. It records SHT_SYMTAB
into object's efile field for further use.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-12-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 87f5054..9b016c0 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -94,6 +94,7 @@ struct bpf_object {
size_t obj_buf_sz;
Elf *elf;
GElf_Ehdr ehdr;
+   Elf_Data *symbols;
} efile;
char path[];
 };
@@ -135,6 +136,7 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
elf_end(obj->efile.elf);
obj->efile.elf = NULL;
}
+   obj->efile.symbols = NULL;
zclose(obj->efile.fd);
obj->efile.obj_buf = NULL;
obj->efile.obj_buf_sz = 0;
@@ -333,6 +335,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
else if (strcmp(name, "maps") == 0)
err = bpf_object__init_maps(obj, data->d_buf,
data->d_size);
+   else if (sh.sh_type == SHT_SYMTAB) {
+   if (obj->efile.symbols) {
+   pr_warning("bpf: multiple SYMTAB in %s\n",
+  obj->path);
+   err = -EEXIST;
+   } else
+   obj->efile.symbols = data;
+   }
if (err)
goto out;
}
-- 
1.8.3.4



[PATCH v11 36/39] perf tools: Attach eBPF program to perf event

2015-07-08 Thread Wang Nan
This is the final patch which makes basic BPF filter work. After
applying this patch, users are allowed to use BPF filter like:

 # perf record --event ./hello_world.c ls

In this patch PERF_EVENT_IOC_SET_BPF ioctl is used to attach eBPF
program to a newly created perf event. The file descriptor of the
eBPF program is passed to perf record using previous patches, and
stored into evsel->bpf_fd.

It is possible that different perf events are created for one kprobe
event on different CPUs. In this case, when trying to call the
ioctl, EEXIST will be returned. This patch doesn't treat it as an error.
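
The error policy described above can be isolated in a small helper. This
is a sketch only: demo_check_attach() is a hypothetical name, and it
takes the ioctl() return value and the saved errno as plain arguments so
it can be exercised without a perf event:

```c
#include <errno.h>

/*
 * Map the result of the attach ioctl to the value the caller should
 * act on: success and EEXIST both count as "attached", anything else
 * is reported as -EINVAL, as in the hunk below.
 */
static int demo_check_attach(int ioctl_ret, int saved_errno)
{
	if (ioctl_ret == 0)
		return 0;
	if (saved_errno == EEXIST)
		return 0;	/* program already attached to this kprobe */
	return -EINVAL;
}
```

A caller would use it as "err = demo_check_attach(ioctl(...), errno);"
right after the ioctl.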

Signed-off-by: Wang Nan 
---
 tools/perf/util/evsel.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index fe80047..73c17f3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1216,6 +1216,22 @@ retry_open:
  err);
goto try_fallback;
}
+
+   if (evsel->bpf_fd >= 0) {
+   int evt_fd = FD(evsel, cpu, thread);
+   int bpf_fd = evsel->bpf_fd;
+
+   err = ioctl(evt_fd,
+   PERF_EVENT_IOC_SET_BPF,
+   bpf_fd);
+   if (err && errno != EEXIST) {
+   pr_err("failed to attach bpf fd %d: %s\n",
+  bpf_fd, strerror(errno));
+   err = -EINVAL;
+   goto out_close;
+   }
+   }
+
set_rlimit = NO_CHANGE;
 
/*
-- 
1.8.3.4



[PATCH v11 00/39] perf tools: filtering events using eBPF programs - part1

2015-07-08 Thread Wang Nan
Hi Arnaldo Carvalho de Melo,

   I rearranged the first 39 patches of this patchset according to
your comments. After applying all of them you can use a hello world
BPF program for testing. They are based on your 'tmp.perf/ebpf', commit
60cd37eb100c4880b28078a47f3062fac7572095.

  I hope I can set up a publicly available git repository for you
tomorrow (that is, about 24 hours from now). What about a repository on
github? However, I have to do this outside my office because of my
company's IT policy.

 In this v11 you can see the following improvements:

 Commit messages improvements:
 'bpf tools: Collect symbol table from SHT_SYMTAB section'
 'bpf tools: Collect relocation sections from SHT_REL sections'
 'bpf tools: Record map accessing instructions for each program'
 'bpf tools: Relocate eBPF programs'
 'bpf tools: Link all bpf objects onto a list'

 Decoupling:
 'bpf tools: Collect eBPF programs from their own sections'
 'bpf tools: Introduce accessors for struct bpf_program'

 Renaming: bpf_object__for_each -> bpf_object__for_each_safe
 'bpf tools: Link all bpf objects onto a list'

 Patch ordering:
 'perf tools: Make perf depend on libbpf'

 Error message improvement (refer to http://llvm.org/apt):
 'perf tools: Call clang to compile C source to object code'

In this v11 part 1 patch set, I haven't followed your comment on
'bpf tools: Introduce accessors for struct bpf_object' asking me to
change the accessor API from returning an error code to returning the
actual value and indicating errors with invalid values. I prefer the
current API because I saw and fixed many bugs related to error codes in
perf's code (like commit ed30775). Those bugs come from misuse of error
codes: some parts of the code return a negative value on error, some
return non-zero on error, and developers forget which is which. I don't
want libbpf to introduce more bugs like them. But if you insist on
it, I'll change it.
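
To make the convention concrete, here is a minimal sketch of an accessor
in the style I prefer, modelled on the bpf_object__get_prog_cnt()
prototype. struct demo_object and demo_object__get_prog_cnt() are
hypothetical names used only for this illustration:

```c
#include <errno.h>
#include <stddef.h>

struct demo_object {
	size_t nr_programs;
};

/*
 * Return 0 on success or a negative errno on failure. The value itself
 * travels through *pcnt, so no "invalid value" has to be reserved.
 */
static int demo_object__get_prog_cnt(const struct demo_object *obj,
				     size_t *pcnt)
{
	if (!obj || !pcnt)
		return -EINVAL;

	*pcnt = obj->nr_programs;
	return 0;
}
```

With this shape every caller checks "if (err)" the same way, and a
count of zero stays distinguishable from a failed call.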

Wang Nan (39):
  bpf: Use correct #ifdef controller for trace_call_bpf()
  tracing, perf: Implement BPF programs attached to uprobes
  bpf tools: Introduce 'bpf' library and add bpf feature check
  bpf tools: Allow caller to set printing function
  bpf tools: Open eBPF object file and do basic validation
  bpf tools: Read eBPF object from buffer
  bpf tools: Check endianness and make libbpf fail early
  bpf tools: Iterate over ELF sections to collect information
  bpf tools: Collect version and license from ELF sections
  bpf tools: Collect map definitions from 'maps' section
  bpf tools: Collect symbol table from SHT_SYMTAB section
  bpf tools: Collect eBPF programs from their own sections
  bpf tools: Collect relocation sections from SHT_REL sections
  bpf tools: Record map accessing instructions for each program
  bpf tools: Add bpf.c/h for common bpf operations
  bpf tools: Create eBPF maps defined in an object file
  bpf tools: Relocate eBPF programs
  bpf tools: Introduce bpf_load_program() to bpf.c
  bpf tools: Load eBPF programs in object files into kernel
  bpf tools: Introduce accessors for struct bpf_program
  bpf tools: Introduce accessors for struct bpf_object
  bpf tools: Link all bpf objects onto a list
  perf tools: Introduce llvm config options
  perf tools: Call clang to compile C source to object code
  perf tools: Auto detecting kernel build directory
  perf tools: Auto detecting kernel include options
  perf tests: Add LLVM test for eBPF on-the-fly compiling
  perf tools: Make perf depend on libbpf
  perf record: Enable passing bpf object file to --event
  perf record: Compile scriptlets if pass '.c' to --event
  perf tools: Parse probe points of eBPF programs during preparation
  perf probe: Attach trace_probe_event with perf_probe_event
  perf record: Probe at kprobe points
  perf record: Load all eBPF object into kernel
  perf tools: Add bpf_fd field to evsel and config it
  perf tools: Attach eBPF program to perf event
  perf tools: Suppress probing messages when probing by BPF loading
  perf record: Add clang options for compiling BPF scripts
  bpf tools: Load a program with different instance using preprocessor

 include/linux/trace_events.h|7 +-
 kernel/events/core.c|4 +-
 kernel/trace/Kconfig|2 +-
 kernel/trace/trace_uprobe.c |5 +
 tools/build/Makefile.feature|6 +-
 tools/build/feature/Makefile|6 +-
 tools/build/feature/test-bpf.c  |   18 +
 tools/lib/bpf/.gitignore|2 +
 tools/lib/bpf/Build |1 +
 tools/lib/bpf/Makefile  |  195 +++
 tools/lib/bpf/bpf.c |   85 +++
 tools/lib/bpf/bpf.h |   23 +
 tools/lib/bpf/libbpf.c  | 1184 +++
 tools/lib/bpf/libbpf.h  |  107 
 tools/perf/MANIFEST |3 +
 tools/perf/Makefile.perf|   19 +-
 tools/perf/builtin-probe.c  |4 +-
 tools/perf/builtin-record.c |   43 +-
 tools/perf/config/Makefile  |   19 +-
 tools/perf/tests/Build  |1 +
 tools/perf/tests/builtin-test.c 

[PATCH v11 12/39] bpf tools: Collect eBPF programs from their own sections

2015-07-08 Thread Wang Nan
This patch collects all programs in an object file into an array of
'struct bpf_program' for further processing. That structure is for
representing each eBPF program. 'bpf_prog' should be a better name, but
it has been used by linux/filter.h. Although it is a kernel space name,
I still prefer to call it 'bpf_program' to prevent possible confusion.

bpf_program__new() creates a new 'struct bpf_program' object. It first
initializes a variable on the stack using __bpf_program__new(); then, on
success, it enlarges the obj->programs array and copies the new object in.
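
The build-then-append sequence can be sketched in isolation as below.
The names (struct demo_prog, struct demo_set, demo_set__add()) are
hypothetical and the element is simplified; the point is that a failed
realloc() leaves the old array untouched, so the container stays valid:

```c
#include <stdlib.h>
#include <string.h>

struct demo_prog {
	int idx;
	char *name;
};

struct demo_set {
	struct demo_prog *progs;
	size_t nr;
};

static struct demo_prog *
demo_set__add(struct demo_set *set, const char *name, int idx)
{
	struct demo_prog prog, *progs;
	size_t len = strlen(name) + 1;

	/* Step 1: build the new element in a stack variable. */
	prog.idx = idx;
	prog.name = malloc(len);
	if (!prog.name)
		return NULL;
	memcpy(prog.name, name, len);

	/* Step 2: grow the array; on failure set->progs is still valid. */
	progs = realloc(set->progs, sizeof(progs[0]) * (set->nr + 1));
	if (!progs) {
		free(prog.name);
		return NULL;
	}

	/* Step 3: copy the element in and publish the new array. */
	progs[set->nr] = prog;
	set->progs = progs;
	set->nr++;
	return &progs[set->nr - 1];
}
```

Note that a pointer returned by an earlier call may be invalidated by a
later one, because realloc() is free to move the array.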

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-13-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 117 +
 1 file changed, 117 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9b016c0..3b717de 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -78,12 +78,27 @@ void libbpf_set_print(libbpf_print_fn_t warn,
 # define LIBBPF_ELF_C_READ_MMAP ELF_C_READ
 #endif
 
+/*
+ * bpf_prog should be a better name but it has been used in
+ * linux/filter.h.
+ */
+struct bpf_program {
+   /* Index in elf obj file, for relocation use. */
+   int idx;
+   char *section_name;
+   struct bpf_insn *insns;
+   size_t insns_cnt;
+};
+
 struct bpf_object {
char license[64];
u32 kern_version;
void *maps_buf;
size_t maps_buf_sz;
 
+   struct bpf_program *programs;
+   size_t nr_programs;
+
/*
 * Information when doing elf related work. Only valid if fd
 * is valid.
@@ -100,6 +115,84 @@ struct bpf_object {
 };
 #define obj_elf_valid(o)   ((o)->efile.elf)
 
+static void bpf_program__clear(struct bpf_program *prog)
+{
+   if (!prog)
+   return;
+
+   zfree(&prog->section_name);
+   zfree(&prog->insns);
+   prog->insns_cnt = 0;
+   prog->idx = -1;
+}
+
+static int
+__bpf_program__new(void *data, size_t size, char *name, int idx,
+  struct bpf_program *prog)
+{
+   if (size < sizeof(struct bpf_insn)) {
+   pr_warning("corrupted section '%s'\n", name);
+   return -EINVAL;
+   }
+
+   bzero(prog, sizeof(*prog));
+
+   prog->section_name = strdup(name);
+   if (!prog->section_name) {
+   pr_warning("failed to alloc name for prog %s\n",
+  name);
+   goto errout;
+   }
+
+   prog->insns = malloc(size);
+   if (!prog->insns) {
+   pr_warning("failed to alloc insns for %s\n", name);
+   goto errout;
+   }
+   prog->insns_cnt = size / sizeof(struct bpf_insn);
+   memcpy(prog->insns, data,
+  prog->insns_cnt * sizeof(struct bpf_insn));
+   prog->idx = idx;
+
+   return 0;
+errout:
+   bpf_program__clear(prog);
+   return -ENOMEM;
+}
+
+static struct bpf_program *
+bpf_program__new(struct bpf_object *obj, void *data, size_t size,
+char *name, int idx)
+{
+   struct bpf_program prog, *progs;
+   int nr_progs, err;
+
+   err = __bpf_program__new(data, size, name, idx, &prog);
+   if (err)
+   return NULL;
+
+   progs = obj->programs;
+   nr_progs = obj->nr_programs;
+
+   progs = realloc(progs, sizeof(progs[0]) * (nr_progs + 1));
+   if (!progs) {
+   /*
+* In this case the original obj->programs
+* is still valid, so don't need special treat for
+* bpf_close_object().
+*/
+   pr_warning("failed to alloc a new program '%s'\n",
+  name);
+   bpf_program__clear(&prog);
+   return NULL;
+   }
+
+   obj->programs = progs;
+   obj->nr_programs = nr_progs + 1;
+   progs[nr_progs] = prog;
+   return &progs[nr_progs];
+}
+
 static struct bpf_object *bpf_object__new(const char *path,
  void *obj_buf,
  size_t obj_buf_sz)
@@ -342,6 +435,21 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
err = -EEXIST;
} else
obj->efile.symbols = data;
+   } else if ((sh.sh_type == SHT_PROGBITS) &&
+  (sh.sh_flags & SHF_EXECINSTR) &&
+  (data->d_size > 0)) {
+   struct bpf_program *prog;
+
+   prog = bpf_program__new(obj, data->d_buf,
+   data->d_size, name,
+   idx);
+  

[PATCH v11 33/39] perf record: Probe at kprobe points

2015-07-08 Thread Wang Nan
In this patch, kprobe points are created using add_perf_probe_events.
Since all events are already grouped together in an array, calling
add_perf_probe_events() creates all of them.

probe_conf.max_probes is set to MAX_PROBES to support glob matching.

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-record.c  | 18 -
 tools/perf/util/bpf-loader.c | 48 
 tools/perf/util/bpf-loader.h |  4 
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 283fe96..33b213a 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -29,6 +29,7 @@
 #include "util/data.h"
 #include "util/auxtrace.h"
 #include "util/parse-branch-options.h"
+#include "util/bpf-loader.h"
 
 #include 
 #include 
@@ -,7 +1112,21 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
if (err)
return err;
 
-   err = -ENOMEM;
+   /*
+* bpf__probe must be called before symbol__init() because we
+* need init_symbol_maps. If called after symbol__init,
+* symbol_conf.sort_by_name won't take effect.
+*
+* bpf__unprobe() is safe even if bpf__probe() failed, and it
+* also calls symbol__init. Therefore, goto out_symbol_exit
+* is safe when probe failed.
+*/
+   err = bpf__probe();
+   if (err) {
+   pr_err("Probing at events in BPF object failed.\n");
+   pr_err("Try perf probe -d '*' to remove existing probe events.\n");
+   goto out_symbol_exit;
+   }
 
symbol__init(NULL);
 
@@ -1172,6 +1187,7 @@ out_symbol_exit:
perf_evlist__delete(rec->evlist);
symbol__exit();
auxtrace_record__free(rec->itr);
+   bpf__unprobe();
return err;
 }
 
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index e810d05..df3f471 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -193,3 +193,51 @@ void bpf__clear(void)
bpf_object__for_each_safe(obj, tmp)
bpf_object__close(obj);
 }
+
+static bool is_probing;
+
+int bpf__unprobe(void)
+{
+   struct strfilter *delfilter;
+   int ret;
+
+   if (!is_probing)
+   return 0;
+
+   delfilter = strfilter__new(PERF_BPF_PROBE_GROUP ":*", NULL);
+   if (!delfilter) {
+   pr_err("Failed to create delfilter when unprobing\n");
+   return -ENOMEM;
+   }
+
+   ret = del_perf_probe_events(delfilter);
+   strfilter__delete(delfilter);
+   if (ret < 0 && is_probing)
+   pr_err("Error: failed to delete events: %s\n",
+   strerror(-ret));
+   else
+   is_probing = false;
+   return ret < 0 ? ret : 0;
+}
+
+int bpf__probe(void)
+{
+   int err;
+
+   if (nr_probe_events <= 0)
+   return 0;
+
+   probe_conf.max_probes = MAX_PROBES;
+   /* Let add_perf_probe_events keeps probe_trace_event */
+   err = add_perf_probe_events(probe_event_array,
+   nr_probe_events,
+   false);
+
+   /* add_perf_probe_events return negative when fail */
+   if (err < 0)
+   pr_err("bpf probe: failed to probe events\n");
+   else
+   is_probing = true;
+
+   return err < 0 ? err : 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 5a3c954..374aec0 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -12,6 +12,8 @@
 
 #ifdef HAVE_LIBBPF_SUPPORT
 int bpf__prepare_load(const char *filename, bool source);
+int bpf__probe(void);
+int bpf__unprobe(void);
 
 void bpf__clear(void);
 #else
@@ -22,6 +24,8 @@ static inline int bpf__prepare_load(const char *filename __maybe_unused,
return -1;
 }
 
+static inline int bpf__probe(void) { return 0; }
+static inline int bpf__unprobe(void) { return 0; }
 static inline void bpf__clear(void) { }
 #endif
 #endif
-- 
1.8.3.4



[PATCH v11 13/39] bpf tools: Collect relocation sections from SHT_REL sections

2015-07-08 Thread Wang Nan
This patch collects relocation sections into 'struct bpf_object'. Such
sections are used for connecting maps to bpf programs. The 'reloc' field
in 'struct bpf_object' is introduced for storing such information.

This patch simply stores the data into the 'reloc' field. A following
patch will parse it to find the exact instructions that need to be
relocated.

Note that the collected data will be invalid after ELF object file is
closed.

This is the second patch related to map relocation. The first one is
'bpf tools: Collect symbol table from SHT_SYMTAB section'. The
principle of map relocation is described in its commit message.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-14-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 3b717de..5f12fa6 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -110,6 +110,11 @@ struct bpf_object {
Elf *elf;
GElf_Ehdr ehdr;
Elf_Data *symbols;
+   struct {
+   GElf_Shdr shdr;
+   Elf_Data *data;
+   } *reloc;
+   int nr_reloc;
} efile;
char path[];
 };
@@ -230,6 +235,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
obj->efile.elf = NULL;
}
obj->efile.symbols = NULL;
+
+   zfree(&obj->efile.reloc);
+   obj->efile.nr_reloc = 0;
zclose(obj->efile.fd);
obj->efile.obj_buf = NULL;
obj->efile.obj_buf_sz = 0;
@@ -450,6 +458,24 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
} else
pr_debug("found program %s\n",
 prog->section_name);
+   } else if (sh.sh_type == SHT_REL) {
+   void *reloc = obj->efile.reloc;
+   int nr_reloc = obj->efile.nr_reloc + 1;
+
+   reloc = realloc(reloc,
+   sizeof(*obj->efile.reloc) * nr_reloc);
+   if (!reloc) {
+   pr_warning("realloc failed\n");
+   err = -ENOMEM;
+   } else {
+   int n = nr_reloc - 1;
+
+   obj->efile.reloc = reloc;
+   obj->efile.nr_reloc = nr_reloc;
+
+   obj->efile.reloc[n].shdr = sh;
+   obj->efile.reloc[n].data = data;
+   }
}
if (err)
goto out;
-- 
1.8.3.4



[PATCH v11 26/39] perf tools: Auto detecting kernel include options

2015-07-08 Thread Wang Nan
To help users find the correct kernel include options, this patch
extracts them from the kbuild system with an embedded script,
kinc_fetch_script, which creates a temporary directory, generates a
Makefile and an empty dummy.c, then uses the Makefile to fetch the
$(NOSTDINC_FLAGS), $(LINUXINCLUDE) and $(EXTRA_CFLAGS) options. The
result is passed to the compiler script using the 'KERNEL_INC_OPTIONS'
environment variable.

Because options from kbuild contain relative paths like
'Iinclude/generated/uapi', the working directory must be changed. This
is done by the previous patch.

Signed-off-by: Wang Nan 
---
 tools/perf/util/llvm-utils.c | 59 
 1 file changed, 54 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c
index 2ca2bd6..b658896 100644
--- a/tools/perf/util/llvm-utils.c
+++ b/tools/perf/util/llvm-utils.c
@@ -218,15 +218,42 @@ static const char *kbuild_detector =
 "fi\n"
 "exit -1\n";
 
+static const char *kinc_fetch_script =
+"#!/usr/bin/env sh\n"
+"if ! test -d \"$KBUILD_DIR\"\n"
+"then\n"
+"  exit -1\n"
+"fi\n"
+"if ! test -f \"$KBUILD_DIR/include/generated/autoconf.h\"\n"
+"then\n"
+"  exit -1\n"
+"fi\n"
+"TMPDIR=`mktemp -d`\n"
+"if test -z \"$TMPDIR\"\n"
+"then\n"
+"exit -1\n"
+"fi\n"
+"cat << EOF > $TMPDIR/Makefile\n"
+"obj-y := dummy.o\n"
+"\\$(obj)/%.o: \\$(src)/%.c\n"
+"\t@echo -n \"\\$(NOSTDINC_FLAGS) \\$(LINUXINCLUDE) \\$(EXTRA_CFLAGS)\"\n"
+"EOF\n"
+"touch $TMPDIR/dummy.c\n"
+"make -s -C $KBUILD_DIR M=$TMPDIR $KBUILD_OPTS dummy.o 2>/dev/null\n"
+"RET=$?\n"
+"rm -rf $TMPDIR\n"
+"exit $RET\n";
+
 static inline void
-get_kbuild_opts(char **kbuild_dir)
+get_kbuild_opts(char **kbuild_dir, char **kbuild_include_opts)
 {
int err;
 
-   if (!kbuild_dir)
+   if (!kbuild_dir || !kbuild_include_opts)
return;
 
*kbuild_dir = NULL;
+   *kbuild_include_opts = NULL;
 
if (llvm_param.kbuild_dir && !llvm_param.kbuild_dir[0]) {
pr_debug("[llvm.kbuild-dir] is set to \"\" deliberately.\n");
@@ -247,6 +274,26 @@ get_kbuild_opts(char **kbuild_dir)
 " \tdetection.\n\n");
return;
}
+
+   pr_debug("Kernel build dir is set to %s\n", *kbuild_dir);
+   force_set_env("KBUILD_DIR", *kbuild_dir);
+   err = read_from_pipe(kinc_fetch_script,
+(void **)kbuild_include_opts,
+NULL);
+   if (err) {
+   pr_warning(
+"WARNING:\tunable to get kernel include directories from '%s'\n"
+"Hint:\tTry set clang include options using 'clang-bpf-cmd-template'\n"
+" \toption in [llvm] section of ~/.perfconfig and set 'kbuild-dir'\n"
+" \toption in [llvm] to \"\" to suppress this detection.\n\n",
+   *kbuild_dir);
+
+   free(*kbuild_dir);
+   *kbuild_dir = NULL;
+   return;
+   }
+
+   pr_debug("include option is set to %s\n", *kbuild_include_opts);
 }
 
 int llvm__compile_bpf(const char *path, void **p_obj_buf,
@@ -256,7 +303,7 @@ int llvm__compile_bpf(const char *path, void **p_obj_buf,
char clang_path[PATH_MAX];
const char *clang_opt = llvm_param.clang_opt;
const char *template = llvm_param.clang_bpf_cmd_template;
-   char *kbuild_dir = NULL;
+   char *kbuild_dir = NULL, *kbuild_include_opts = NULL;
void *obj_buf = NULL;
size_t obj_buf_sz;
 
@@ -278,11 +325,11 @@ int llvm__compile_bpf(const char *path, void **p_obj_buf,
 * This is an optional work. Even it fail we can continue our
 * work. Needn't to check error return.
 */
-   get_kbuild_opts(&kbuild_dir);
+   get_kbuild_opts(&kbuild_dir, &kbuild_include_opts);
 
force_set_env("CLANG_EXEC", clang_path);
force_set_env("CLANG_OPTIONS", clang_opt);
-   force_set_env("KERNEL_INC_OPTIONS", NULL);
+   force_set_env("KERNEL_INC_OPTIONS", kbuild_include_opts);
force_set_env("WORKING_DIR", kbuild_dir ? : ".");
 
/*
@@ -305,6 +352,7 @@ int llvm__compile_bpf(const char *path, void **p_obj_buf,
}
 
free(kbuild_dir);
+   free(kbuild_include_opts);
if (!p_obj_buf)
free(obj_buf);
else
@@ -315,6 +363,7 @@ int llvm__compile_bpf(const char *path, void **p_obj_buf,
return 0;
 errout:
free(kbuild_dir);
+   free(kbuild_include_opts);
free(obj_buf);
if (p_obj_buf)
*p_obj_buf = NULL;
-- 
1.8.3.4



[PATCH v11 07/39] bpf tools: Check endianness and make libbpf fail early

2015-07-08 Thread Wang Nan
Check endianness according to EHDR. Code is taken from
tools/perf/util/symbol-elf.c.

Libbpf doesn't magically convert mismatched endianness. Even if we swap
eBPF instructions to the correct byte order, we are unable to deal with
endianness in the code logic generated by LLVM.

Therefore, libbpf should simply reject a mismatched ELF object, and let
LLVM create good code.
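
The check itself is small. Here is a stand-alone sketch of the idea,
assuming the usual ELF encoding values (1 for little endian, 2 for big
endian) and using hypothetical demo_* names so it does not depend on
libelf headers:

```c
/* Values of e_ident[EI_DATA], mirroring ELFDATA2LSB and ELFDATA2MSB. */
#define DEMO_ELFDATA2LSB 1
#define DEMO_ELFDATA2MSB 2

/* Probe the host: the first byte of the integer 1 is 1 only on LE. */
static int demo_host_is_little_endian(void)
{
	static const unsigned int probe = 1;

	return *(const unsigned char *)&probe == 1;
}

/* Return 0 if the object's byte order matches the host, -1 otherwise. */
static int demo_check_endianness(unsigned char ei_data)
{
	switch (ei_data) {
	case DEMO_ELFDATA2LSB:
		return demo_host_is_little_endian() ? 0 : -1;
	case DEMO_ELFDATA2MSB:
		return demo_host_is_little_endian() ? -1 : 0;
	default:
		return -1;	/* unknown or invalid encoding */
	}
}
```

Exactly one of the two encodings is accepted on any given host, which
is the "fail early" behaviour this patch wants.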

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-8-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 36dfbc1..15b3e82 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -192,6 +192,34 @@ errout:
return err;
 }
 
+static int
+bpf_object__check_endianness(struct bpf_object *obj)
+{
+   static unsigned int const endian = 1;
+
+   switch (obj->efile.ehdr.e_ident[EI_DATA]) {
+   case ELFDATA2LSB:
+   /* We are big endian, BPF obj is little endian. */
+   if (*(unsigned char const *)&endian != 1)
+   goto mismatch;
+   break;
+
+   case ELFDATA2MSB:
+   /* We are little endian, BPF obj is big endian. */
+   if (*(unsigned char const *)&endian != 0)
+   goto mismatch;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+
+mismatch:
+   pr_warning("Error: endianness mismatch.\n");
+   return -EINVAL;
+}
+
 static struct bpf_object *
 __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz)
 {
@@ -208,6 +236,8 @@ __bpf_object__open(const char *path, void *obj_buf, size_t 
obj_buf_sz)
 
if (bpf_object__elf_init(obj))
goto out;
+   if (bpf_object__check_endianness(obj))
+   goto out;
 
bpf_object__elf_finish(obj);
return obj;
-- 
1.8.3.4



[PATCH v11 08/39] bpf tools: Iterate over ELF sections to collect information

2015-07-08 Thread Wang Nan
bpf_object__elf_collect() is introduced to iterate over each ELF section to
collect information in eBPF object files. This function will be further
enhanced to collect license, kernel version, programs, configs and map
information.

Signed-off-by: Wang Nan 
Acked-by: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Kaixu Xia 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Zefan Li 
Cc: pi3or...@163.com
Link: 
http://lkml.kernel.org/r/1435716878-189507-9-git-send-email-wangn...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 53 ++
 1 file changed, 53 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 15b3e82..d8d6eb5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -220,6 +220,57 @@ mismatch:
return -EINVAL;
 }
 
+static int bpf_object__elf_collect(struct bpf_object *obj)
+{
+   Elf *elf = obj->efile.elf;
+   GElf_Ehdr *ep = &obj->efile.ehdr;
+   Elf_Scn *scn = NULL;
+   int idx = 0, err = 0;
+
+   /* Elf is corrupted/truncated, avoid calling elf_strptr. */
+   if (!elf_rawdata(elf_getscn(elf, ep->e_shstrndx), NULL)) {
+   pr_warning("failed to get e_shstrndx from %s\n",
+  obj->path);
+   return -EINVAL;
+   }
+
+   while ((scn = elf_nextscn(elf, scn)) != NULL) {
+   char *name;
+   GElf_Shdr sh;
+   Elf_Data *data;
+
+   idx++;
+   if (gelf_getshdr(scn, &sh) != &sh) {
+   pr_warning("failed to get section header from %s\n",
+  obj->path);
+   err = -EINVAL;
+   goto out;
+   }
+
+   name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
+   if (!name) {
+   pr_warning("failed to get section name from %s\n",
+  obj->path);
+   err = -EINVAL;
+   goto out;
+   }
+
+   data = elf_getdata(scn, 0);
+   if (!data) {
+   pr_warning("failed to get section data from %s(%s)\n",
+  name, obj->path);
+   err = -EINVAL;
+   goto out;
+   }
+   pr_debug("section %s, size %ld, link %d, flags %lx, type=%d\n",
+name, (unsigned long)data->d_size,
+(int)sh.sh_link, (unsigned long)sh.sh_flags,
+(int)sh.sh_type);
+   }
+out:
+   return err;
+}
+
 static struct bpf_object *
 __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz)
 {
@@ -238,6 +289,8 @@ __bpf_object__open(const char *path, void *obj_buf, size_t 
obj_buf_sz)
goto out;
if (bpf_object__check_endianness(obj))
goto out;
+   if (bpf_object__elf_collect(obj))
+   goto out;
 
bpf_object__elf_finish(obj);
return obj;
-- 
1.8.3.4



Re: [PATCH] MIPS, IRQCHIP: Move i8259 irqchip driver to drivers/irqchip.

2015-07-08 Thread Thomas Gleixner
On Wed, 8 Jul 2015, Ralf Baechle wrote:
> On Wed, Jul 08, 2015 at 02:57:38PM +0200, Thomas Gleixner wrote:
> 
> > >  arch/mips/Kconfig   |   4 -
> > >  arch/mips/kernel/Makefile   |   1 -
> > >  arch/mips/kernel/i8259.c| 384 
> > > 
> > >  drivers/irqchip/Kconfig |   4 +
> > >  drivers/irqchip/Makefile|   1 +
> > >  drivers/irqchip/irq-i8259.c | 383 
> > > +++
> > >  6 files changed, 388 insertions(+), 389 deletions(-)
> > 
> > Should I carry it, or do you want to merge it via the mips tree?
> > 
> > In the latter case: Acked-by-me.
> 
> I guess the conflict potential will be lower if you carry it - and if
> somebody wants to merge it with one of the other i8259.c littered through
> the tree it probably also is better if you carry it.

Good point. That move might be just the trigger for me to sit down and
do it :)

Thanks,

tglx


Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS

2015-07-08 Thread Stephen Smalley
On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins  wrote:
> It appears that, at some point last year, XFS made directory handling
> changes which bring it into lockdep conflict with shmem_zero_setup():
> it is surprising that mmap() can clone an inode while holding mmap_sem,
> but that has been so for many years.
>
> Since those few lockdep traces that I've seen all implicated selinux,
> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
> v3.13's commit c7277090927a ("security: shmem: implement kernel private
> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>
> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
> (and MAP_SHARED|MAP_ANONYMOUS).  I thought there were also drivers
> which cloned inode in mmap(), but if so, I cannot locate them now.

This causes a regression for SELinux (please, in the future, cc
selinux list and Paul Moore on SELinux-related changes).  In
particular, this change disables SELinux checking of mprotect
PROT_EXEC on shared anonymous mappings, so we lose the ability to
control executable mappings.  That said, we are only getting that
check today as a side effect of our file execute check on the tmpfs
inode, whereas it would be better (and more consistent with the
mmap-time checks) to apply an execmem check in that case, in which
case we wouldn't care about the inode-based check.  However, I am
unclear on how to correctly detect that situation from
selinux_file_mprotect() -> file_map_prot_check(), because we do have a
non-NULL vma->vm_file so we treat it as a file execute check.  In
contrast, if directly creating an anonymous shared mapping with
PROT_EXEC via mmap(...PROT_EXEC...),  selinux_mmap_file is called with
a NULL file and therefore we end up applying an execmem check.
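
To make the two paths concrete, here is a minimal userspace sketch of the
sequences being compared. The function names are invented for this example
and the return convention (0 or an errno value) is an assumption of the
sketch; which of the two calls is denied depends entirely on the loaded
policy.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Path 1: create a shared anonymous mapping, then ask for PROT_EXEC with
 * mprotect(). The vma has a non-NULL vm_file here, so per the description
 * above the request is treated as a file execute check.
 * Returns 0 on success or the errno of the failing call.
 */
static int exec_via_mprotect(size_t len)
{
	void *p;
	int err = 0;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return errno;
	if (mprotect(p, len, PROT_READ | PROT_EXEC))
		err = errno;
	munmap(p, len);
	return err;
}

/*
 * Path 2: request PROT_EXEC directly at mmap() time. No file is passed,
 * which is the case that ends up as an execmem check.
 */
static int exec_via_mmap(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_EXEC,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return errno;
	munmap(p, len);
	return 0;
}
```

Calling both with the same length under a given policy shows whether the
mprotect() path and the mmap() path are being treated consistently.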

>
> Reported-and-tested-by: Prarit Bhargava 
> Reported-by: Daniel Wagner 
> Reported-by: Morten Stevens 
> Signed-off-by: Hugh Dickins 
> ---
>
>  mm/shmem.c |8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> --- 4.1-rc7/mm/shmem.c  2015-04-26 19:16:31.352191298 -0700
> +++ linux/mm/shmem.c  2015-06-14 09:26:49.461120166 -0700
> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru
> struct file *file;
> loff_t size = vma->vm_end - vma->vm_start;
>
> -   file = shmem_file_setup("dev/zero", size, vma->vm_flags);
> +   /*
> +* Cloning a new file under mmap_sem leads to a lock ordering conflict
> +* between XFS directory reading and selinux: since this file is only
> +* accessible to the user through its mapping, use S_PRIVATE flag to
> +* bypass file security, in the same way as shmem_kernel_file_setup().
> +*/
> +   file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE);
> if (IS_ERR(file))
> return PTR_ERR(file);
>


Re: Re: perf, kprobes: fuzzer generates huge number of WARNings

2015-07-08 Thread Masami Hiramatsu
On 2015/07/08 6:21, Alexei Starovoitov wrote:
> On Tue, Jul 07, 2015 at 05:08:51PM -0400, Vince Weaver wrote:
>> On Tue, 7 Jul 2015, Alexei Starovoitov wrote:
>>
>>> On Tue, Jul 07, 2015 at 12:00:12AM -0400, Vince Weaver wrote:

>>>> Well the BPF hack is in the fuzzer, not the kernel.  And it's not really a
>>>> hack, it just turned out to be a huge pain to figure out how to
>>>> manually create a valid BPF program in conjunction with a valid kprobe
>>>> event.
>>>
>>> You mean automatically generating valid bpf program? That's definitely hard.
>>> If you mean just few hardcoded programs then take them from samples or
>>> from test_bpf ?
>>
>> there's already code in trinity that in theory autogenerates bpf programs, 
>> but for now I was just trying to hook up a short known valid one.
>>
>> it might not be possible to really test things though, as you need to be 
>> root to create a kprobe and attach a BPF program, but my fuzzer when run 
>> as root often does all kinds of other stuff that will crash a machine.
>> Is it ever planned to allow using bpf/kprobes without requiring full 
>> CAP_ADMIN privileges?
> 
> I suspect kprobes will forever be root only, whereas for bpf I'm thinking
> to introduce CAP_BPF, but before that we need to finish constant blinding
> and add address leak prevention. So not soon.

Currently I don't plan to do that. Actually systemtap allows that, but
with a much bigger blacklist. I think we can make a tool which also
allows users to add new events on a limited set of functions (whitelist).
But anyway, since these can expose kernel function addresses to users,
it is highly recommended to limit users by some capabilities.

>>>> I did have to sprinkle printks in the kprobe and bpf code to find out
>>>> where various EINVAL returns were coming from, so potentially this is just
>>>> a problem of printks happening where they shouldn't.  I'll remove those
>>>> changes and try to reproduce this tomorrow.
>>>
>>> could you please elaborate on this further. Which EINVALs are you talking about?
>>
>> When you are trying to create a kprobe and bpf file there's about 10 
>> different ways to get EINVAL as a return value and no way of knowing which 
>> one you are hitting.  I added printks so I could know what issue was 
>> causing the einval.  (from memory, the problems I hit were not zeroing out 
>> the attr structure, having a wrong instruction count, and a few others).

Hmm, I must fix some parts of kprobes by changing the return value or
showing more precise messages.

Thanks!

> I see. I guess anyone trying to use syscall directly will be facing such
> issues, but libbpf that is being developed to be used by perf and others
> should solve these problems.



-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu...@hitachi.com


Re: [PATCH v3 9/9] usb: dwc3: core: don't break during suspend/resume while we're dual-role

2015-07-08 Thread Sergei Shtylyov

Hello.

On 7/8/2015 1:37 PM, Roger Quadros wrote:


We can't rely just on dr_mode to decide if we're in host or gadget
mode when we're configured as otg/dual-role. So while dr_mode is
OTG, we find out from  the otg state machine if we're in host
or gadget mode and take the necessary actions during suspend/resume.



Also make sure that we disable OTG irq and events during system suspend
so that we don't lockup the system during system suspend/resume.



Signed-off-by: Roger Quadros 
---
  drivers/usb/dwc3/core.c | 27 +--
  1 file changed, 9 insertions(+), 18 deletions(-)



diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index e3c094d..3784287 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1444,18 +1444,15 @@ static int dwc3_suspend(struct device *dev)
dwc->octl = dwc3_readl(dwc->regs, DWC3_OCTL);
dwc->oevt = dwc3_readl(dwc->regs, DWC3_OEVT);
dwc->oevten = dwc3_readl(dwc->regs, DWC3_OEVTEN);
+   dwc3_writel(dwc->regs, DWC3_OEVTEN, 0);
+   disable_irq(dwc->otg_irq);
}

-   switch (dwc->dr_mode) {
-   case USB_DR_MODE_PERIPHERAL:
-   case USB_DR_MODE_OTG:
+   if (dwc->dr_mode == USB_DR_MODE_PERIPHERAL ||
+   ((dwc->dr_mode == USB_DR_MODE_OTG) && dwc->fsm->protocol == 
PROTO_GADGET))


   Hum, enclosing the first == op into parens and not doing it to the second 
== op doesn't look very consistent... :-)


[...]

@@ -1495,18 +1492,12 @@ static int dwc3_resume(struct device *dev)
dwc3_writel(dwc->regs, DWC3_OCTL, dwc->octl);
dwc3_writel(dwc->regs, DWC3_OEVT, dwc->oevt);
dwc3_writel(dwc->regs, DWC3_OEVTEN, dwc->oevten);
+   enable_irq(dwc->otg_irq);
}

-   switch (dwc->dr_mode) {
-   case USB_DR_MODE_PERIPHERAL:
-   case USB_DR_MODE_OTG:
+   if (dwc->dr_mode == USB_DR_MODE_PERIPHERAL ||
+   ((dwc->dr_mode == USB_DR_MODE_OTG) && dwc->fsm->protocol == 
PROTO_GADGET))


   Same here...

[...]

WBR, Sergei



Re: [RFC PATCH v3 1/4] STM trace event: Adding generic buffer interface driver

2015-07-08 Thread Alexander Shishkin
Peter Zijlstra  writes:

> On Tue, Jul 07, 2015 at 06:10:40PM +0800, Chunyan Zhang wrote:
>> +config STM_TRACE_EVENT
>> +tristate "Redirect/copy the output from kernel trace event to STM 
>> engine"
>
> How does tristate make sense here? You're using it unconditionally for
> in-kernel tracepoints. This must be bool, which in turn makes the whole
> STM thing bool afaiu.

That would make the whole STM thing a bool. I'd rather we used stm
output *conditionally* by somehow plugging into tracepoint output, say
with a jump label or something. Haven't had time to think about it yet.

Regards,
--
Alex


Re: [RFC PATCH v3 4/4] trace: Trace log handler for logging into STM blocks

2015-07-08 Thread Chunyan Zhang
On Wed, Jul 8, 2015 at 8:31 PM, Peter Zijlstra  wrote:
> On Tue, Jul 07, 2015 at 06:10:43PM +0800, Chunyan Zhang wrote:
>> Add the function 'trace_event_stm_output_##call' for printing events
>> trace log into STM blocks.
>>
>> This patch also adds a function call at where the events have been
>> committed to ring buffer to export the trace event information to
>> STM blocks.
>
> So then you have two copies of the data, why that? Would a scheme were
> data either goes to the STM or the regular buffer not make much more
> sense?

We don't have two copies when we export the trace logs to STM, because
the event trace logs that we can see by catting the Ftrace files
haven't been generated at that moment.

>
>> +++ b/include/trace/perf.h
>> @@ -175,6 +175,7 @@ trace_event_raw_event_##call(void *__data, proto)
>>  \
>>   { assign; } \
>>   \
>>   trace_event_buffer_commit();\
>> + trace_event_stm_log();  \
>
> This makes every trace event slower.

It doesn't, actually. You may decide whether to enable this feature; trace
events will not be slowed down if STM_TRACE_EVENT is not selected.
But if this feature is enabled, it will indeed take more time than
without it.

Best regards,
Chunyan

>
>>  }
>>  /*
>>   * The ftrace_test_probe is compiled out, it is only here as a build time 
>> check


[PATCH v4 0/7]powerpc/powernv: Nest Instrumentation support

2015-07-08 Thread Madhavan Srinivasan
This patchset enables Nest Instrumentation support on powerpc.
POWER8 has per-chip Nest Instrumentation which provides various
per-chip metrics like memory, powerbus, Xlink and Alink
bandwidth.

Nest Instrumentation provides an interface (via the PORE Engine)
to configure and move the nest counter data to memory. From the
kernel side, the OPAL call interface is used to activate/deactivate
the PORE Engine for nest data collection.

At boot, OPAL detects the feature, initializes it and passes
the nest units and other related information, such as memory
regions and supported events, to the kernel via the device tree.

The kernel code then parses the device tree for nest pmu support
and registers nest pmus with the events available. The PORE Engine collects
and accumulates nest counter data in a per-chip reserved memory region, hence
the device tree also exports the per-chip nest accumulation memory region.
Individual event offsets are used as event configuration values.

Here is sample perf usage to explain the interface.

#./perf list

  iTLB-load-misses   [Hardware cache event]

  Nest_Alink_BW/Alink0/  [Kernel PMU event]
  Nest_Alink_BW/Alink1/  [Kernel PMU event]
  Nest_Alink_BW/Alink2/  [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_00/   [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_01/   [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_02/   [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_03/   [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_00/  [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_01/  [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_02/  [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_03/  [Kernel PMU event]
  Nest_PowerBus_BW/External/ [Kernel PMU event]
  Nest_PowerBus_BW/Internal/ [Kernel PMU event]
  Nest_Xlink_BW/Xlink0/  [Kernel PMU event]
  Nest_Xlink_BW/Xlink1/  [Kernel PMU event]
  Nest_Xlink_BW/Xlink2/  [Kernel PMU event]

  rNNN                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier  [Raw hardware event descriptor]
.

# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1

 Performance counter stats for 'system wide':

CPU0     15,913.18 MiB  Nest_Xlink_BW/Xlink1/
CPU32    11,955.88 MiB  Nest_Xlink_BW/Xlink1/
CPU64    11,042.43 MiB  Nest_Xlink_BW/Xlink1/
CPU96    14,065.27 MiB  Nest_Xlink_BW/Xlink1/

   1.001062038 seconds time elapsed

# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5

 Performance counter stats for 'system wide':

CPU0         0.00 MiB  Nest_Alink_BW/Alink0/  (100.00%)
CPU32        0.00 MiB  Nest_Alink_BW/Alink0/  (100.00%)
CPU64        0.00 MiB  Nest_Alink_BW/Alink0/  (100.00%)
CPU96        0.00 MiB  Nest_Alink_BW/Alink0/  (100.00%)
CPU0     1,430.43 MiB  Nest_Alink_BW/Alink1/  (100.00%)
CPU32      320.99 MiB  Nest_Alink_BW/Alink1/  (100.00%)
CPU64    3,443.83 MiB  Nest_Alink_BW/Alink1/  (100.00%)
CPU96    1,904.41 MiB  Nest_Alink_BW/Alink1/  (100.00%)
CPU0     2,856.85 MiB  Nest_Alink_BW/Alink2/
CPU32        7.50 MiB  Nest_Alink_BW/Alink2/
CPU64    4,034.29 MiB  Nest_Alink_BW/Alink2/
CPU96      288.49 MiB  Nest_Alink_BW/Alink2/
.

OPAL side patches are posted in the skiboot mailing list.

Changelog from v3:

No logic change, just a rebase to latest upstream kernel.

Changelog from v2:

1) Changed variable and macro names to be consistent.
2) Made changes to commit message and code comment messages
3) Moved "format attribute" related code from patch 6 to 5
4) Added check for pmu register function
5) Changed cpu_init and cpu_exit functions to use the first online
   cpu of the chip, thereby making the code a lot simpler.

Changelog from v1:

1) No logic changes, re-ordered patches to make each patch compile
   without errors
2) Added comments based on the review feedback.
3) removed perf_event_del function and replaced it with perf_event_stop.
4) Moved Nest feature detection code out of parser function.
5) Optimized functions and removed some variables.
6) squashed the makefile changes, instead of the separate patch
7) squashed the cpumask and hotplug patches as single patch
8) Added cpu checks in 

[PATCH 3/4] mm, oom: organize oom context into struct

2015-07-08 Thread Michal Hocko
From: David Rientjes 

There are essential elements to an oom context that are passed around to
multiple functions.

Organize these elements into a new struct, struct oom_context, that
specifies the context for an oom condition.

This patch introduces no functional change.

[mho...@suse.cz: s@oom_control@oom_context@]
[mho...@suse.cz: do not initialize on stack oom_context with NULL or 0]
Signed-off-by: David Rientjes 
Signed-off-by: Michal Hocko 
---
 drivers/tty/sysrq.c |  10 -
 include/linux/oom.h |  25 +++-
 mm/memcontrol.c |  13 +++---
 mm/oom_kill.c   | 115 +++-
 mm/page_alloc.c |   9 +++-
 5 files changed, 89 insertions(+), 83 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index b20d2c0ec451..865b837a9aee 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -356,9 +356,15 @@ static struct sysrq_key_op sysrq_term_op = {
 
 static void moom_callback(struct work_struct *ignored)
 {
+   const gfp_t gfp_mask = GFP_KERNEL;
+   struct oom_context oc = {
+   .zonelist = node_zonelist(first_memory_node, gfp_mask),
+   .gfp_mask = gfp_mask,
+   .force_kill = true,
+   };
+
mutex_lock(&oom_lock);
-   if (!out_of_memory(node_zonelist(first_memory_node, GFP_KERNEL),
-  GFP_KERNEL, 0, NULL, true))
+   if (!out_of_memory(&oc))
pr_info("OOM request ignored because killer is disabled\n");
mutex_unlock(&oom_lock);
 }
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 7deecb7bca5e..094407cb2d2e 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -12,6 +12,14 @@ struct notifier_block;
 struct mem_cgroup;
 struct task_struct;
 
+struct oom_context {
+   struct zonelist *zonelist;
+   nodemask_t  *nodemask;
+   gfp_t   gfp_mask;
+   int order;
+   bool            force_kill;
+};
+
 /*
  * Types of limitations to the nodes from which allocations may occur
  */
@@ -57,21 +65,18 @@ extern unsigned long oom_badness(struct task_struct *p,
 
 extern int oom_kills_count(void);
 extern void note_oom_kill(void);
-extern void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
+extern void oom_kill_process(struct oom_context *oc, struct task_struct *p,
 unsigned int points, unsigned long totalpages,
-struct mem_cgroup *memcg, nodemask_t *nodemask,
-const char *message);
+struct mem_cgroup *memcg, const char *message);
 
-extern void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
-  int order, const nodemask_t *nodemask,
+extern void check_panic_on_oom(struct oom_context *oc,
+  enum oom_constraint constraint,
   struct mem_cgroup *memcg);
 
-extern enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
-   unsigned long totalpages, const nodemask_t *nodemask,
-   bool force_kill);
+extern enum oom_scan_t oom_scan_process_thread(struct oom_context *oc,
+   struct task_struct *task, unsigned long totalpages);
 
-extern bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
-   int order, nodemask_t *mask, bool force_kill);
+extern bool out_of_memory(struct oom_context *oc);
 
 extern void exit_oom_victim(void);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index acb93c554f6e..7ad5352bd3f0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1545,6 +1545,10 @@ static unsigned long mem_cgroup_get_limit(struct 
mem_cgroup *memcg)
 static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 int order)
 {
+   struct oom_context oc = {
+   .gfp_mask = gfp_mask,
+   .order = order,
+   };
struct mem_cgroup *iter;
unsigned long chosen_points = 0;
unsigned long totalpages;
@@ -1563,7 +1567,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup 
*memcg, gfp_t gfp_mask,
goto unlock;
}
 
-   check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, order, NULL, memcg);
+   check_panic_on_oom(&oc, CONSTRAINT_MEMCG, memcg);
totalpages = mem_cgroup_get_limit(memcg) ? : 1;
for_each_mem_cgroup_tree(iter, memcg) {
struct css_task_iter it;
@@ -1571,8 +1575,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup 
*memcg, gfp_t gfp_mask,
 
css_task_iter_start(&iter->css, &it);
while ((task = css_task_iter_next(&it))) {
-   switch (oom_scan_process_thread(task, totalpages, NULL,
-   false)) {
+   switch (oom_scan_process_thread(&oc, task, totalpages)) {
case OOM_SCAN_SELECT:
 

[PATCH 2/4] oom: Do not invoke oom notifiers on sysrq+f

2015-07-08 Thread Michal Hocko
From: Michal Hocko 

A github user rfjakob has reported the following issue via IRC.
 Manually triggering the OOM killer does not work anymore in 4.0.5
 This is what it looks like: https://gist.github.com/rfjakob/346b7dc611fc3cdf4011
 Basically, what happens is that the GPU driver frees some memory, that satisfies the OOM killer
 But the memory is allocated immediately again, and in the, no processes are killed no matter how often you trigger the oom killer
 "in the end"

Quoting from the github:
"
[19291.202062] sysrq: SysRq : Manual OOM execution
[19291.208335] Purging GPU memory, 74399744 bytes freed, 8728576 bytes still pinned.
[19291.390767] sysrq: SysRq : Manual OOM execution
[19291.396792] Purging GPU memory, 74452992 bytes freed, 8728576 bytes still pinned.
[19291.560349] sysrq: SysRq : Manual OOM execution
[19291.566018] Purging GPU memory, 75489280 bytes freed, 8728576 bytes still pinned.
[19291.729944] sysrq: SysRq : Manual OOM execution
[19291.735686] Purging GPU memory, 74399744 bytes freed, 8728576 bytes still pinned.
[19291.918637] sysrq: SysRq : Manual OOM execution
[19291.924299] Purging GPU memory, 74403840 bytes freed, 8728576 bytes still pinned.
"

The issue is that sysrq+f (force_kill) gets confused by the regular OOM
heuristic, which tries to avoid the OOM killer if some of the oom
notifiers can release memory. The heuristic doesn't make much sense for
the sysrq+f path because this one is used by the administrator to kill
a memory hog.
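
As a rough userspace model of the intended behaviour (a sketch only:
should_select_victim(), oom_notifier_fn and gpu_purge() are hypothetical
names, not kernel interfaces), the notifier heuristic is consulted only
when the kill was not forced:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an OOM notifier: returns the bytes it freed. */
typedef unsigned long (*oom_notifier_fn)(void);

/*
 * Returns true when a victim would be selected, false when the request is
 * considered satisfied by the notifiers alone.
 */
static bool should_select_victim(const oom_notifier_fn *notifiers, size_t n,
				 bool force_kill)
{
	unsigned long freed = 0;
	size_t i;

	assert(notifiers != NULL || n == 0);

	/* sysrq+f: the administrator asked for a kill, skip the heuristic. */
	if (!force_kill) {
		for (i = 0; i < n; i++)
			freed += notifiers[i]();
		if (freed > 0)
			return false;	/* got some memory back, no kill */
	}
	return true;
}

/* Example notifier modelled on the GPU purge lines in the log above. */
static unsigned long gpu_purge(void)
{
	return 74399744UL;
}
```

With gpu_purge() registered, a regular OOM gives up after the purge, while
a forced request still goes on to pick a victim.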

Reported-by: Jakob Unterwurzacher 
Signed-off-by: Michal Hocko 
---
 mm/oom_kill.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f2737d66f66a..0b1b0b25f928 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -661,10 +661,12 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t 
gfp_mask,
if (oom_killer_disabled)
return false;
 
-   blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
-   if (freed > 0)
-   /* Got some memory back in the last second. */
-   goto out;
+   if (!force_kill) {
+   blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
+   if (freed > 0)
+   /* Got some memory back in the last second. */
+   goto out;
+   }
 
/*
 * If current has a pending SIGKILL or is exiting, then automatically
-- 
2.1.4



Re: [PATCH v3 4/9] usb: dwc3: core: Adapt to named interrupts

2015-07-08 Thread Sergei Shtylyov

Hello.

On 7/8/2015 1:36 PM, Roger Quadros wrote:


From: Felipe Balbi 



Add support to use interrupt names,



Following are the interrupt names



Peripheral Interrupt - peripheral
HOST Interrupt - host
OTG Interrupt - otg



[Roger Q]
- If any of these are missing we use the
first available IRQ resource so that we don't
break with older DTBs.



- Use gadget_irq in gadget driver.



Signed-off-by: Felipe Balbi 
Signed-off-by: Roger Quadros 
---
  drivers/usb/dwc3/core.c   | 12 
  drivers/usb/dwc3/core.h   |  7 +++
  drivers/usb/dwc3/gadget.c |  2 +-
  3 files changed, 20 insertions(+), 1 deletion(-)



diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index a7498e0..7b33d7b 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -941,6 +941,18 @@ static int dwc3_probe(struct platform_device *pdev)
dwc->xhci_resources[1].flags = res->flags;
dwc->xhci_resources[1].name = res->name;

+   dwc->otg_irq = platform_get_irq_byname(pdev, "otg");
+   if (!dwc->otg_irq)


   The usual mistake repeated again: that function returns a negative
error number on failure, not 0.



+   dwc->otg_irq = res->start;
+
+   dwc->gadget_irq = platform_get_irq_byname(pdev, "peripheral");
+   if (!dwc->gadget_irq)
+   dwc->gadget_irq = res->start;


   Likewise.


+
+   dwc->xhci_irq = platform_get_irq_byname(pdev, "host");
+   if (!dwc->xhci_irq)


   Likewise.
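
   To spell out the pattern being asked for, here is a small userspace
model. It is only a sketch: get_irq_byname(), irq_or_fallback() and struct
named_irq are hypothetical names, and -ENXIO is just a representative
negative error value chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct named_irq {
	const char *name;
	int irq;
};

/*
 * Hypothetical model of a by-name IRQ lookup: returns the IRQ number on
 * success and a negative errno value on failure, never 0.
 */
static int get_irq_byname(const struct named_irq *tbl, size_t n,
			  const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return tbl[i].irq;
	return -ENXIO;
}

/* Fall back to the first available IRQ when the named one is missing. */
static int irq_or_fallback(const struct named_irq *tbl, size_t n,
			   const char *name, int fallback)
{
	int irq = get_irq_byname(tbl, n, name);

	if (irq < 0)	/* not "!irq": failure is negative, not zero */
		irq = fallback;
	return irq;
}
```

   With a "!irq" test the negative error value would be kept as if it were
a valid IRQ and the fallback for older DTBs would never be taken.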

[...]

WBR, Sergei



[PATCH v4 1/7] powerpc/powernv: Data structure and macros definition

2015-07-08 Thread Madhavan Srinivasan
Create new header file "nest-pmu.h" to add the data structures
and macros needed for the nest pmu support.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/nest-pmu.h | 53 
 1 file changed, 53 insertions(+)
 create mode 100644 arch/powerpc/perf/nest-pmu.h

diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h
new file mode 100644
index 000..ecb5d26
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.h
@@ -0,0 +1,53 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define P8_NEST_MAX_CHIPS              32
+#define P8_NEST_MAX_PMUS               32
+#define P8_NEST_MAX_PMU_NAME_LEN       256
+#define P8_NEST_MAX_EVENTS_SUPPORTED   256
+#define P8_NEST_ENGINE_START           1
+#define P8_NEST_ENGINE_STOP            0
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+   uint32_t chip_id;
+   uint64_t pbase;
+   uint64_t vbase;
+   uint32_t size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct nest_ima_events {
+   const char *ev_name;
+   const char *ev_value;
+};
+
+/*
+ * Device tree parser code detects nest pmu support and
+ * registers new nest pmus. This structure will
+ * hold the pmu functions and attrs for each nest pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct nest_pmu {
+   struct pmu pmu;
+   const struct attribute_group *attr_groups[4];
+};
-- 
1.9.1



[PATCH v4 7/7] powerpc/powernv: nest pmu cpumask and cpu hotplug support

2015-07-08 Thread Madhavan Srinivasan
Add a cpumask attribute to be used by each nest pmu, since nest
units are per-chip. Only one cpu (the first online cpu) from each
node/chip is designated to read the counters.

On cpu hotplug, the dying cpu is checked to see whether it is one of
the designated cpus. If it is, the next online cpu from the same
node/chip is designated as the new cpu to read the counters.
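
The hand-over on hotplug can be sketched in userspace as below. This is only
a toy model under stated assumptions: CPUs are bits in a 64-bit word, and
TOY_NR_CPUS, toy_next_cpu() and toy_exit_cpu() are hypothetical names that do
not exist in the patch or in the kernel cpumask API.

```c
#include <stdint.h>

/* Hypothetical toy model: each bit of a uint64_t stands for one CPU. */
#define TOY_NR_CPUS 64

/* Lowest set bit strictly above 'cpu', or TOY_NR_CPUS if there is none. */
static int toy_next_cpu(int cpu, uint64_t mask)
{
	for (int i = cpu + 1; i < TOY_NR_CPUS; i++)
		if (mask & (UINT64_C(1) << i))
			return i;
	return TOY_NR_CPUS;
}

/*
 * Called when 'cpu' goes offline. If it was the designated reader for its
 * node, hand the role to the next CPU of the same node.
 * Returns the new designated CPU, or -1 if nothing was handed over.
 */
static int toy_exit_cpu(int cpu, uint64_t node_cpus, uint64_t *designated)
{
	uint64_t bit = UINT64_C(1) << cpu;
	int target;

	if (!(*designated & bit))
		return -1;		/* not a designated CPU, nothing to do */
	*designated &= ~bit;

	target = toy_next_cpu(cpu, node_cpus);
	if (target >= TOY_NR_CPUS)	/* no candidate left on this node */
		return -1;

	*designated |= UINT64_C(1) << target;
	return target;
}
```

Like the patch, the sketch only searches upwards from the dying cpu and does
not wrap around to lower-numbered cpus of the same node.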

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Cc: Preeti U Murthy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/nest-pmu.c | 146 +++
 1 file changed, 146 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index c2ada13..31943c5 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -12,6 +12,7 @@
 
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
 static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+static cpumask_t nest_pmu_cpu_mask;
 
 PMU_FORMAT_ATTR(event, "config:0-20");
 struct attribute *p8_nest_format_attrs[] = {
@@ -24,6 +25,147 @@ struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
 };
 
+static ssize_t nest_pmu_cpumask_get_attr(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   return cpumap_print_to_pagebuf(true, buf, &nest_pmu_cpu_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, nest_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *nest_pmu_cpumask_attrs[] = {
+   &dev_attr_cpumask.attr,
+   NULL,
+};
+
+static struct attribute_group nest_pmu_cpumask_attr_group = {
+   .attrs = nest_pmu_cpumask_attrs,
+};
+
+static void nest_init(void *dummy)
+{
+   opal_nest_ima_control(P8_NEST_ENGINE_START);
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+   int i;
+
+   for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
+   perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+   old_cpu, new_cpu);
+}
+
+static void nest_exit_cpu(int cpu)
+{
+   int nid, target = -1;
+   struct cpumask *l_cpumask;
+
+   /*
+* Check in the designated list for this cpu. Don't bother
+* if not one of them.
+*/
+   if (!cpumask_test_and_clear_cpu(cpu, &nest_pmu_cpu_mask))
+   return;
+
+   /*
+* Now that this cpu is one of the designated,
+* find a next cpu a) which is online and b) in same chip.
+*/
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+   target = cpumask_next(cpu, l_cpumask);
+
+   /*
+* Update the cpumask with the target cpu and
+* migrate the context if needed
+*/
+   if (target >= 0 && target < nr_cpu_ids) {
+   cpumask_set_cpu(target, &nest_pmu_cpu_mask);
+   nest_change_cpu_context(cpu, target);
+   }
+}
+
+static void nest_init_cpu(int cpu)
+{
+   int nid, fcpu, ncpu;
+   struct cpumask *l_cpumask, tmp_mask;
+
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+
+   /*
+* if empty cpumask, just add incoming cpu and move on.
+*/
+   if (!cpumask_and(&tmp_mask, l_cpumask, &nest_pmu_cpu_mask)) {
+   cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+   return;
+   }
+
+   /*
+* Always have the first online cpu of a chip as designated one.
+*/
+   fcpu = cpumask_first(l_cpumask);
+   ncpu = cpumask_next(cpu, l_cpumask);
+   if (cpu == fcpu) {
+   if (cpumask_test_and_clear_cpu(ncpu, &nest_pmu_cpu_mask)) {
+   cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+   nest_change_cpu_context(ncpu, cpu);
+   }
+   }
+}
+
+static int nest_pmu_cpu_notifier(struct notifier_block *self,
+   unsigned long action, void *hcpu)
+{
+   long cpu = (long)hcpu;
+
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_ONLINE:
+   nest_init_cpu(cpu);
+   break;
+   case CPU_DOWN_PREPARE:
+  nest_exit_cpu(cpu);
+  break;
+   default:
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block nest_pmu_cpu_nb = {
+   .notifier_call  = nest_pmu_cpu_notifier,
+   .priority   = CPU_PRI_PERF + 1,
+};
+
+void nest_pmu_cpumask_init(void)
+{
+   const struct cpumask *l_cpumask;
+   int cpu, nid;
+
+   cpu_notifier_register_begin();
+
+   /*
+* Nest PMUs are per-chip counters. So designate a cpu
+* from each chip for counter collection.
+*/
+   for_each_online_node(nid) {
+   l_cpumask = cpumask_of_node(nid);
+
+   /* designate first online cpu in this node */
+   cpu = cpumask_first(l_cpumask);
+   

[PATCH v4 3/7] powerpc/powernv: Nest PMU detection and device tree parser

2015-07-08 Thread Madhavan Srinivasan
Create a file "nest-pmu.c" to contain nest pmu related functions. Add code
to detect nest pmu support and a parser to collect per-chip reserved memory
region information from the device tree (DT).

The detection mechanism looks for the specific property "ibm,ima-chip" in
the DT. For the Nest pmu, the device tree will have two sets of information:
1) Per-chip reserved memory region for nest pmu counter collection area.
2) Supported Nest PMUs and events

The device tree layout for the Nest PMU is as follows.

  / -- DT root folder
  |
  -nest-ima -- Nest PMU folder
   |

   -ima-chip@  -- Per-chip folder for reserved region information
|
-ibm,chip-id-- Chip id
-ibm,ima-chip
-reg-- HOMER PORE Nest Counter collection Address (RA)
-size   -- size to map in kernel space

   -Alink_BW-- Nest PMU folder
|
-Alink0 -- Nest PMU Alink Event file
-scale.Alink0.scale -- Event scale file
-unit.Alink0.unit   -- Event unit file
-device_type-- "nest-ima-unit" marker
  

A subsequent patch will parse the next part of the DT to find the various
Nest PMUs and their events.
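
As a rough illustration of what the per-chip parsing amounts to, here is a
userspace sketch that decodes big-endian property cells into a per-chip
table. It is not the patch code: struct toy_chip_info, toy_be32(), toy_be64()
and toy_parse_chip() are hypothetical names, and the range check on the chip
id is an assumption added for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chip record, loosely modelled on perchip_nest_info. */
struct toy_chip_info {
	uint32_t chip_id;
	uint64_t pbase;
	uint64_t size;
};

/* Device tree cells are stored big-endian; decode them byte by byte. */
static uint32_t toy_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t toy_be64(const unsigned char *p)
{
	return ((uint64_t)toy_be32(p) << 32) | toy_be32(p + 4);
}

/*
 * Fill table[chip id] from the raw "ibm,chip-id", "reg" and "size" values.
 * Returns 0 on success, -1 if a property is missing or the chip id does
 * not fit in the table.
 */
static int toy_parse_chip(const unsigned char *chip_id_prop,
			  const unsigned char *reg_prop,
			  const unsigned char *size_prop,
			  struct toy_chip_info *table, uint32_t max_chips)
{
	uint32_t idx;

	if (!chip_id_prop || !reg_prop || !size_prop)
		return -1;

	idx = toy_be32(chip_id_prop);
	if (idx >= max_chips)
		return -1;

	table[idx].chip_id = idx;
	table[idx].pbase = toy_be64(reg_prop);
	table[idx].size = toy_be64(size_prop);
	return 0;
}
```

Indexing the table by chip id keeps the later lookup in the event code a
plain array access, at the cost of needing a bound on the chip id.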

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/Makefile   |  2 +-
 arch/powerpc/perf/nest-pmu.c | 85 
 2 files changed, 86 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/perf/nest-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a..6da656b 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS)   += callchain.o
 obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)  += power4-pmu.o ppc970-pmu.o power5-pmu.o \
   power5+-pmu.o power6-pmu.o power7-pmu.o \
-  power8-pmu.o
+  power8-pmu.o nest-pmu.o
 obj32-$(CONFIG_PPC_PERF_CTRS)  += mpc7450-pmu.o
 
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
new file mode 100644
index 000..e7d45ed
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -0,0 +1,85 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "nest-pmu.h"
+
+static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+
+static int nest_ima_dt_parser(void)
+{
+   const __be32 *gcid;
+   const __be64 *chip_ima_reg;
+   const __be64 *chip_ima_size;
+   struct device_node *dev;
+   struct perchip_nest_info *p8ni;
+   int idx;
+
+   /*
+* "nest-ima" folder contains two things,
+* a) per-chip reserved memory region for Nest PMU Counter data
+* b) Support Nest PMU units and their event files
+*/
+   for_each_node_with_property(dev, "ibm,ima-chip") {
+   gcid = of_get_property(dev, "ibm,chip-id", NULL);
+   chip_ima_reg = of_get_property(dev, "reg", NULL);
+   chip_ima_size = of_get_property(dev, "size", NULL);
+
+   if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
+   pr_err("Nest_PMU: device %s missing property\n",
+   dev->full_name);
+   return -ENODEV;
+   }
+
+   /* chip id to save reserve memory region */
+   idx = (uint32_t)be32_to_cpup(gcid);
+
+   /*
+* Using a local variable to make it compact and
+* easier to read
+*/
+   p8ni = &p8_nest_perchip_info[idx];
+   p8ni->pbase = be64_to_cpup(chip_ima_reg);
+   p8ni->size = be64_to_cpup(chip_ima_size);
+   p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
+   }
+
+   return 0;
+}
+
+static int __init nest_pmu_init(void)
+{
+   int ret = -ENODEV;
+
+   /*
+* Lets do this only if we are hypervisor
+*/
+   if (!cur_cpu_spec->oprofile_cpu_type ||
+   !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) ||
+   !cpu_has_feature(CPU_FTR_HVMODE))
+   return ret;
+
+   /*
+* Nest PMU information is grouped under "nest-ima" node
+* of the top-level device-tree directory. Detect Nest PMU
+* by the "ibm,ima-chip" property.
+*/
+   if (!of_find_node_with_property(NULL, "ibm,ima-chip"))
+   return ret;
+
+

Re: [RFC PATCH v10 23/50] perf tools: Make perf depend on libbpf

2015-07-08 Thread Arnaldo Carvalho de Melo
Em Tue, Jul 07, 2015 at 07:03:50PM -0700, Alexei Starovoitov escreveu:
> On 7/7/15 1:16 PM, Arnaldo Carvalho de Melo wrote:
> >So, please move this to just before we can use it, wiring it up should
> >mean, hey, try this "hello, world" eBPF program right now!
> 
> btw, since bpf is now stable llvm backend, one can just get the latest
> clang/llvm 3.7 from pre-built llvm packages for debian/ubuntu:
> http://llvm.org/apt/
> and bpf backend will be there by default.
> No need to build clang/llvm from sources.

Are there any for RHEL7 or Fedora21/22? 8-)

- Arnaldo


[PATCH 1/4] oom: Do not panic when OOM killer is sysrq triggered

2015-07-08 Thread Michal Hocko
From: Michal Hocko 

OOM killer might be triggered explicitly via sysrq+f. This is supposed
to kill a task no matter what, e.g. a task is selected even though there
is an OOM victim on the way to exit. This is a big hammer for an admin
to help resolve a memory-short condition when the system is not able
to cope with it on its own in a reasonable time frame (e.g. when the
system is thrashing or the OOM killer cannot make sufficient progress).

E.g. it doesn't make any sense to obey panic_on_oom setting because
a) administrator could have used other sysrqs to achieve the
panic/reboot and b) the policy would break an existing usecase to
kill a memory hog which would be recoverable unlike the panic which
might be configured for the real OOM condition.

It also doesn't make much sense to panic the system when there is no
OOM killable task because administrator might choose to do additional
steps before rebooting/panicking the system.

While we are there also add a comment explaining why
sysctl_oom_kill_allocating_task doesn't apply to sysrq triggered OOM
killer even though there is no explicit check and we subtly rely
on current->mm being NULL for the context from which it is triggered.

Also be more explicit about sysrq+f behavior in the documentation.
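
The resulting policy can be summarized with a small userspace sketch. It is
a deliberate simplification, not the kernel code: enum toy_oom_action and
toy_oom_decide() are hypothetical, and the real check_panic_on_oom() also
depends on the allocation constraint, which is ignored here.

```c
#include <stdbool.h>

/* Hypothetical outcome of one OOM killer invocation. */
enum toy_oom_action { TOY_OOM_KILL, TOY_OOM_PANIC, TOY_OOM_NOTHING };

/*
 * force_kill:   true for the sysrq+f path
 * panic_on_oom: the administrator's panic policy (simplified to a bool)
 * have_victim:  whether a killable task was found
 */
static enum toy_oom_action toy_oom_decide(bool force_kill, bool panic_on_oom,
					   bool have_victim)
{
	/* panic_on_oom is only honoured for a real OOM condition */
	if (!force_kill && panic_on_oom)
		return TOY_OOM_PANIC;

	if (have_victim)
		return TOY_OOM_KILL;

	/* no killable task: a forced kill only logs, a real OOM panics */
	return force_kill ? TOY_OOM_NOTHING : TOY_OOM_PANIC;
}
```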

Signed-off-by: Michal Hocko 
---
 Documentation/sysrq.txt |  5 -
 mm/oom_kill.c   | 15 ---
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index 0e307c94809a..7664e93411d2 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -75,7 +75,10 @@ On other - If you know of the key combos for other architectures, please
 
 'e' - Send a SIGTERM to all processes, except for init.
 
-'f'- Will call oom_kill to kill a memory hog process.
+'f'- Will call oom_kill to kill a memory hog process. Please note that
+ parallel OOM killer is ignored and a task is killed even though
+ there was an oom victim selected already. panic_on_oom is ignored
+ and the system doesn't panic if there are no oom killable tasks.
 
 'g'- Used by kgdb (kernel debugger)
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dff991e0681e..f2737d66f66a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -687,8 +687,13 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
&totalpages);
mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
-   check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);
+   if (!force_kill)
+   check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);
 
+   /*
+* not affecting force_kill because sysrq triggered OOM killer runs from
+* the workqueue context so current->mm will be NULL
+*/
if (sysctl_oom_kill_allocating_task && current->mm &&
!oom_unkillable_task(current, NULL, nodemask) &&
current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
@@ -702,8 +707,12 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
p = select_bad_process(&points, totalpages, mpol_mask, force_kill);
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!p) {
-   dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
-   panic("Out of memory and no killable processes...\n");
+   if (!force_kill) {
+   dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
+   panic("Out of memory and no killable processes...\n");
+   } else {
+   pr_info("Sysrq triggered out of memory. No killable task found...\n");
+   }
}
if (p != (void *)-1UL) {
oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
-- 
2.1.4



[PATCH 0/4] oom: sysrq+f fixes + cleanups

2015-07-08 Thread Michal Hocko
Hi,
some of these patches have been posted already:
http://marc.info/?l=linux-mm&m=143462145830969&w=2
This series contains an additional fix for another sysrq+f issue
reported off mailing list (patch #2).

First two patches are clear fixes. The third patch is from David with
my minor changes. The last patch is a cleanup but I have put it after
others because it has seen some opposition in the past.

Shortlog says:
David Rientjes (1):
  mm, oom: organize oom context into struct

Michal Hocko (3):
  oom: Do not panic when OOM killer is sysrq triggered
  oom: Do not invoke oom notifiers on sysrq+f
  oom: split out forced OOM killer

Thanks!

And diffstat:
 Documentation/sysrq.txt |   5 +-
 drivers/tty/sysrq.c |   3 +-
 include/linux/oom.h |  26 +
 mm/memcontrol.c |  13 +++--
 mm/oom_kill.c   | 141 +++-
 mm/page_alloc.c |   9 +++-
 6 files changed, 116 insertions(+), 81 deletions(-)




[PATCH 4/4] oom: split out forced OOM killer

2015-07-08 Thread Michal Hocko
From: Michal Hocko 

The forced OOM killing is currently wired into out_of_memory() call
even though their objective is different which makes the code ugly
and harder to follow. Generic out_of_memory path has to deal with
configuration settings and heuristics which are completely irrelevant
to the forced OOM killer (e.g. sysctl_oom_kill_allocating_task or
OOM killer prevention for already dying tasks). All of them are
either relying on explicit force_kill check or indirectly by checking
current->mm which is always NULL for sysrq+f. This is not nice, hard
to follow and error prone.

Let's pull forced OOM killer code out into a separate function
(force_out_of_memory) which is really trivial now.
As a bonus we can clearly state that this is a forced OOM killer
in the OOM message which is helpful to distinguish it from the
regular OOM killer.

Signed-off-by: Michal Hocko 
---
 drivers/tty/sysrq.c |  9 +
 include/linux/oom.h |  1 +
 mm/oom_kill.c   | 57 -
 3 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 865b837a9aee..6a3def693ded 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -356,15 +356,8 @@ static struct sysrq_key_op sysrq_term_op = {
 
 static void moom_callback(struct work_struct *ignored)
 {
-   const gfp_t gfp_mask = GFP_KERNEL;
-   struct oom_context oc = {
-   .zonelist = node_zonelist(first_memory_node, gfp_mask),
-   .gfp_mask = gfp_mask,
-   .force_kill = true,
-   };
-
mutex_lock(&oom_lock);
-   if (!out_of_memory(&oc))
+   if (!force_out_of_memory())
pr_info("OOM request ignored because killer is disabled\n");
mutex_unlock(&oom_lock);
 }
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 094407cb2d2e..6af2d12d6134 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -77,6 +77,7 @@ extern enum oom_scan_t oom_scan_process_thread(struct oom_context *oc,
struct task_struct *task, unsigned long totalpages);
 
 extern bool out_of_memory(struct oom_context *oc);
+extern bool force_out_of_memory(void);
 
 extern void exit_oom_victim(void);
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 01aa4cb86857..6a0b09296236 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -627,6 +627,38 @@ int unregister_oom_notifier(struct notifier_block *nb)
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
 /**
+ * force_out_of_memory - forces OOM killer to kill a process
+ *
+ * Explicitly trigger the OOM killer. The system doesn't have to be under
+ * OOM condition (e.g. sysrq+f).
+ */
+bool force_out_of_memory(void)
+{
+   struct task_struct *p;
+   unsigned long totalpages;
+   unsigned int points;
+   const gfp_t gfp_mask = GFP_KERNEL;
+   struct oom_context oc = {
+   .zonelist = node_zonelist(first_memory_node, gfp_mask),
+   .gfp_mask = gfp_mask,
+   .force_kill = true,
+   };
+
+   if (oom_killer_disabled)
+   return false;
+
+   constrained_alloc(&oc, &totalpages);
+   p = select_bad_process(&oc, &points, totalpages);
+   if (p != (void *)-1UL)
+   oom_kill_process(&oc, p, points, totalpages, NULL,
+"Forced out of memory killer");
+   else
+   pr_warn("Sysrq triggered out of memory. No killable task found...\n");
+
+   return true;
+}
+
+/**
  * out_of_memory - kill the "best" process when we run out of memory
  * @oc: pointer to struct oom_context
  *
@@ -647,12 +679,10 @@ bool out_of_memory(struct oom_context *oc)
if (oom_killer_disabled)
return false;
 
-   if (!oc->force_kill) {
-   blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
-   if (freed > 0)
-   /* Got some memory back in the last second. */
-   goto out;
-   }
+   blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
+   if (freed > 0)
+   /* Got some memory back in the last second. */
+   goto out;
 
/*
 * If current has a pending SIGKILL or is exiting, then automatically
@@ -675,13 +705,8 @@ bool out_of_memory(struct oom_context *oc)
constraint = constrained_alloc(oc, &totalpages);
if (constraint != CONSTRAINT_MEMORY_POLICY)
oc->nodemask = NULL;
-   if (!oc->force_kill)
-   check_panic_on_oom(oc, constraint, NULL);
+   check_panic_on_oom(oc, constraint, NULL);
 
-   /*
-* not affecting force_kill because sysrq triggered OOM killer runs from
-* the workqueue context so current->mm will be NULL
-*/
if (sysctl_oom_kill_allocating_task && current->mm &&
!oom_unkillable_task(current, NULL, oc->nodemask) &&
current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
@@ -694,12 +719,8 @@ bool out_of_memory(struct oom_context *oc)
p = 

[PATCH v4 6/7] powerpc/powernv: generic nest pmu event functions

2015-07-08 Thread Madhavan Srinivasan
Add a set of generic nest pmu related event functions to be used by
each nest pmu. Add code to register nest pmus.
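
The start/update pair boils down to snapshotting a free-running big-endian
counter and accumulating deltas. A minimal userspace sketch of that idea
follows; struct toy_event and the toy_* helpers are hypothetical stand-ins
for the perf structures, and the counter is modelled as a plain byte buffer.

```c
#include <stdint.h>

/* Hypothetical stand-in for the prev_count/count pair kept per event. */
struct toy_event {
	uint64_t prev_count;	/* last raw counter value seen */
	uint64_t count;		/* accumulated delta reported to the user */
};

/* The counter lives in memory in big-endian byte order. */
static uint64_t toy_read_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Start: remember where the counter currently stands. */
static void toy_event_start(struct toy_event *ev, const unsigned char *counter)
{
	ev->prev_count = toy_read_be64(counter);
}

/* Update: add the distance travelled since the last snapshot. */
static void toy_event_update(struct toy_event *ev, const unsigned char *counter)
{
	uint64_t now = toy_read_be64(counter);

	/* unsigned subtraction stays correct across a counter wrap */
	ev->count += now - ev->prev_count;
	ev->prev_count = now;
}
```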

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/nest-pmu.c | 104 +++
 1 file changed, 104 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 20ed9f8..c2ada13 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -24,6 +24,100 @@ struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
 };
 
+static int p8_nest_event_init(struct perf_event *event)
+{
+   int chip_id;
+
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   /* Sampling not supported yet */
+   if (event->hw.sample_period)
+   return -EINVAL;
+
+   /* unsupported modes and filters */
+   if (event->attr.exclude_user   ||
+   event->attr.exclude_kernel ||
+   event->attr.exclude_hv ||
+   event->attr.exclude_idle   ||
+   event->attr.exclude_host   ||
+   event->attr.exclude_guest)
+   return -EINVAL;
+
+   if (event->cpu < 0)
+   return -EINVAL;
+
+   chip_id = topology_physical_package_id(event->cpu);
+   event->hw.event_base = event->attr.config +
+   p8_nest_perchip_info[chip_id].vbase;
+
+   return 0;
+}
+
+static void p8_nest_read_counter(struct perf_event *event)
+{
+   uint64_t *addr;
+   u64 data = 0;
+
+   addr = (u64 *)event->hw.event_base;
+   data = __be64_to_cpu(*addr);
+   local64_set(&event->hw.prev_count, data);
+}
+
+static void p8_nest_perf_event_update(struct perf_event *event)
+{
+   u64 counter_prev, counter_new, final_count;
+   uint64_t *addr;
+
+   addr = (uint64_t *)event->hw.event_base;
+   counter_prev = local64_read(&event->hw.prev_count);
+   counter_new = __be64_to_cpu(*addr);
+   final_count = counter_new - counter_prev;
+
+   local64_set(&event->hw.prev_count, counter_new);
+   local64_add(final_count, &event->count);
+}
+
+static void p8_nest_event_start(struct perf_event *event, int flags)
+{
+   event->hw.state = 0;
+   p8_nest_read_counter(event);
+}
+
+static void p8_nest_event_stop(struct perf_event *event, int flags)
+{
+   if (flags & PERF_EF_UPDATE)
+   p8_nest_perf_event_update(event);
+}
+
+static int p8_nest_event_add(struct perf_event *event, int flags)
+{
+   if (flags & PERF_EF_START)
+   p8_nest_event_start(event, flags);
+
+   return 0;
+}
+
+/*
+ * Populate pmu ops in the structure
+ */
+static int update_pmu_ops(struct nest_pmu *pmu)
+{
+   if (!pmu)
+   return -EINVAL;
+
+   pmu->pmu.task_ctx_nr = perf_invalid_context;
+   pmu->pmu.event_init = p8_nest_event_init;
+   pmu->pmu.add = p8_nest_event_add;
+   pmu->pmu.del = p8_nest_event_stop;
+   pmu->pmu.start = p8_nest_event_start;
+   pmu->pmu.stop = p8_nest_event_stop;
+   pmu->pmu.read = p8_nest_perf_event_update;
+   pmu->pmu.attr_groups = pmu->attr_groups;
+
+   return 0;
+}
+
 static int nest_event_info(struct property *pp, char *start,
struct nest_ima_events *p8_events, int flg, u32 val)
 {
@@ -179,6 +273,16 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
update_events_in_group(
(struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
 
+   update_pmu_ops(pmu_ptr);
+   /* Register the pmu */
+   ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+   if (ret) {
+   pr_err("Nest PMU %s Register failed\n", pmu_ptr->pmu.name);
+   return ret;
+   }
+
+   pr_info("%s performance monitor hardware support registered\n",
+   pmu_ptr->pmu.name);
return 0;
 }
 
-- 
1.9.1



[PATCH v4 5/7] powerpc/powernv: add event attribute and group to nest pmu

2015-07-08 Thread Madhavan Srinivasan
Add code to create event/format attributes and attribute groups for
each nest pmu.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/nest-pmu.c | 57 
 1 file changed, 57 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 6116ff3..20ed9f8 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -13,6 +13,17 @@
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
 static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
 
+PMU_FORMAT_ATTR(event, "config:0-20");
+struct attribute *p8_nest_format_attrs[] = {
+   &format_attr_event.attr,
+   NULL,
+};
+
+struct attribute_group p8_nest_format_group = {
+   .name = "format",
+   .attrs = p8_nest_format_attrs,
+};
+
 static int nest_event_info(struct property *pp, char *start,
struct nest_ima_events *p8_events, int flg, u32 val)
 {
@@ -45,6 +56,48 @@ static int nest_event_info(struct property *pp, char *start,
return 0;
 }
 
+/*
+ * Populate event name and string in attribute
+ */
+struct attribute *dev_str_attr(const char *name, const char *str)
+{
+   struct perf_pmu_events_attr *attr;
+
+   attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+   attr->event_str = str;
+   attr->attr.attr.name = name;
+   attr->attr.attr.mode = 0444;
+   attr->attr.show = perf_event_sysfs_show;
+
+   return &attr->attr.attr;
+}
+
+int update_events_in_group(
+   struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu)
+{
+   struct attribute_group *attr_group;
+   struct attribute **attrs;
+   int i;
+
+   /* Allocate memory for event attribute group */
+   attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
+   sizeof(*attr_group)), GFP_KERNEL);
+   if (!attr_group)
+   return -ENOMEM;
+
+   attrs = (struct attribute **)(attr_group + 1);
+   attr_group->name = "events";
+   attr_group->attrs = attrs;
+
+   for (i = 0; i < idx; i++, p8_events++)
+   attrs[i] = dev_str_attr((char *)p8_events->ev_name,
+   (char *)p8_events->ev_value);
+
+   pmu->attr_groups[0] = attr_group;
+   return 0;
+}
+
 static int nest_pmu_create(struct device_node *dev, int pmu_index)
 {
struct nest_ima_events **p8_events_arr, *p8_events;
@@ -91,6 +144,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
/* Save the name to register it later */
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
+   pmu_ptr->attr_groups[1] = &p8_nest_format_group;
continue;
}
 
@@ -122,6 +176,9 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
idx++;
}
 
+   update_events_in_group(
+   (struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
+
return 0;
 }
 
-- 
1.9.1



[PATCH v4 4/7] powerpc/powernv: detect supported nest pmus and its events

2015-07-08 Thread Madhavan Srinivasan
Parse device tree to detect supported nest pmu units. Traverse
through each nest pmu unit folder to find supported events and
corresponding unit/scale files (if any).

The nest unit event file from the DT contains the offset in the
reserved memory region at which the counter data for a given event is
found. Kernel code uses this offset as the event configuration value.

Device tree parser code also looks for scale/unit in the file name and
passes on the file as an event attr for perf tool to use in the post
processing.
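
To make the naming rule concrete, here is a userspace sketch of how a
property name and value could be turned into an event attribute. It is an
illustration only: toy_strip_prefix() and toy_event_attr() are hypothetical
helpers, and bounding the output with snprintf() is a choice made for the
sketch rather than something taken from the patch.

```c
#include <stdio.h>
#include <string.h>

/* Return the part of 'name' after 'prefix', or NULL if it does not match. */
static const char *toy_strip_prefix(const char *name, const char *prefix)
{
	size_t n = strlen(prefix);

	return strncmp(name, prefix, n) == 0 ? name + n : NULL;
}

/*
 * "unit.X" and "scale.X" properties carry a string that is passed through
 * as-is under the name X; any other property is an event whose value is
 * an offset, rendered as "event=0x<offset>".
 * Returns 0 on success, -1 on a missing string or a too-small buffer.
 */
static int toy_event_attr(const char *prop_name, const char *str_value,
			  unsigned int offset, const char **ev_name,
			  char *ev_value, size_t len)
{
	const char *rest = toy_strip_prefix(prop_name, "unit.");
	int n;

	if (!rest)
		rest = toy_strip_prefix(prop_name, "scale.");

	if (rest) {
		if (!str_value)
			return -1;
		*ev_name = rest;
		n = snprintf(ev_value, len, "%s", str_value);
	} else {
		*ev_name = prop_name;
		n = snprintf(ev_value, len, "event=0x%x", offset);
	}

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```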

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/nest-pmu.c | 124 ++-
 1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index e7d45ed..6116ff3 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -11,6 +11,119 @@
 #include "nest-pmu.h"
 
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+
+static int nest_event_info(struct property *pp, char *start,
+   struct nest_ima_events *p8_events, int flg, u32 val)
+{
+   char *buf;
+
+   /* memory for event name */
+   buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   strncpy(buf, start, strlen(start));
+   p8_events->ev_name = buf;
+
+   /* memory for content */
+   buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   if (flg) {
+   /* string content*/
+   if (!pp->value ||
+  (strnlen(pp->value, pp->length) == pp->length))
+   return -EINVAL;
+
+   strncpy(buf, (const char *)pp->value, pp->length);
+   } else
+   sprintf(buf, "event=0x%x", val);
+
+   p8_events->ev_value = buf;
+   return 0;
+}
+
+static int nest_pmu_create(struct device_node *dev, int pmu_index)
+{
+   struct nest_ima_events **p8_events_arr, *p8_events;
+   struct nest_pmu *pmu_ptr;
+   struct property *pp;
+   char *buf, *start;
+   const __be32 *lval;
+   u32 val;
+   int idx = 0, ret;
+
+   if (!dev)
+   return -EINVAL;
+
+   /* memory for nest pmus */
+   pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
+   if (!pmu_ptr)
+   return -ENOMEM;
+
+   /* Needed for hotplug/migration */
+   per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+   /* memory for nest pmu events */
+   p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
+   GFP_KERNEL);
+   if (!p8_events_arr)
+   return -ENOMEM;
+   p8_events = (struct nest_ima_events *)p8_events_arr;
+
+   /*
+* Loop through each property
+*/
+   for_each_property_of_node(dev, pp) {
+   start = pp->name;
+
+   if (!strcmp(pp->name, "name")) {
+   if (!pp->value ||
+  (strnlen(pp->value, pp->length) == pp->length))
+   return -EINVAL;
+
+   buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   /* Save the name to register it later */
+   sprintf(buf, "Nest_%s", (char *)pp->value);
+   pmu_ptr->pmu.name = (char *)buf;
+   continue;
+   }
+
+   /* Skip these, we dont need it */
+   if (!strcmp(pp->name, "phandle") ||
+   !strcmp(pp->name, "device_type") ||
+   !strcmp(pp->name, "linux,phandle"))
+   continue;
+
+   if (strncmp(pp->name, "unit.", 5) == 0) {
+   /* Skip first few chars in the name */
+   start += 5;
+   ret = nest_event_info(pp, start, p8_events++, 1, 0);
+   } else if (strncmp(pp->name, "scale.", 6) == 0) {
+   /* Skip first few chars in the name */
+   start += 6;
+   ret = nest_event_info(pp, start, p8_events++, 1, 0);
+   } else {
+   lval = of_get_property(dev, pp->name, NULL);
+   val = (uint32_t)be32_to_cpup(lval);
+
+   ret = nest_event_info(pp, start, p8_events++, 0, val);
+   }
+
+   if (ret)
+   return ret;
+
+   /* book keeping */
+   idx++;
+   }
+
+   return 0;
+}
 
 static int nest_ima_dt_parser(void)
 {
@@ -19,7 +132,7 @@ static 

Re: [RFC PATCH v10 23/50] perf tools: Make perf depend on libbpf

2015-07-08 Thread Arnaldo Carvalho de Melo
Em Wed, Jul 08, 2015 at 07:45:34PM +0800, Wangnan (F) escreveu:
> On 2015/7/8 4:16, Arnaldo Carvalho de Melo wrote:
> >Em Tue, Jul 07, 2015 at 04:54:52PM -0300, Arnaldo Carvalho de Melo escreveu:
> >>Em Wed, Jul 01, 2015 at 02:14:11AM +, Wang Nan escreveu:
> >>>Error messages are also updated to notify users about the disable of
> >>>BPF support of 'perf record' if libelf is missed or BPF API check
> >>>failed.

> >>Much better!

> >But... I was all happy about this being linked with perf, went straight
> >ahead to try to use it! No, its not possible, I have to go thru a series
> >of other patches first... anticlimactic :-(

> >So, please move this to just before we can use it, wiring it up should
> >mean, hey, try this "hello, world" eBPF program right now!

> It is not easy work, since there is a bulk of code in
> tools/perf/utils/bpf-loader.c that depends on HAVE_LIBBPF_SUPPORT and
> CONFIG_LIBBPF. If we make this patch the final one, we will make hundreds
> of lines of code available in one patch.  It is not good.

Understood, makes sense, you are building the perf support step by step,
so it is not possible to test it at first, only to compile-test the
initial wiring.
 
> I have an idea that, put this patch after the llvm tester:
 
> $ git log --oneline
> d011a28 perf tools: Make perf depend on libbpf
> 57ad12f perf tests: Add LLVM test for eBPF on-the-fly compiling
> 8c7e20b perf tools: Auto detecting kernel include options
> 442675f perf tools: Auto detecting kernel build directory
> dcd9304 perf tools: Call clang to compile C source to object code
> 864e2fb perf tools: Introduce llvm config options
> 8558c38 bpf tools: Link all bpf objects onto a list
> ...
 
> Then before this patch, you can test llvm on-the-fly compiling without
> parsing the
> result .o:
 
> $ perf test 38
> 38: Test LLVM searching and compiling: (skip bpf
> parsing) Ok
 
> After this patch, libbpf should be compiled so basic libbpf parsing can be
> tested:
> 
> $ perf test 38
> 38: Test LLVM searching and compiling: Ok

That looks enough for me, 'perf test' should exercise the code, making
sure that what was done up to that point got some testing.

We need to try to avoid adding too much code that doesn't get exercised
as soon as possible, otherwise we lose a lot of bisectability, i.e. when
trying to find a problem using 'git bisect' we will hit the first place
when all that code gets used and will be left with having to look at all
the code up to that point instead of landing in a more likely culprit.

- Arnaldo


[PATCH v4 2/7] powerpc/powernv: Add OPAL support for Nest PMU

2015-07-08 Thread Madhavan Srinivasan
Nest Counters can be configured via PORE Engine and OPAL
provides an interface to start/stop it.

OPAL side patches are posted in the skiboot mailing.

Cc: Stewart Smith 
Cc: Jeremy Kerr 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/opal-api.h| 3 ++-
 arch/powerpc/include/asm/opal.h| 1 +
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index e9e4c52..4cd8128 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -154,7 +154,8 @@
 #define OPAL_FLASH_WRITE   111
 #define OPAL_FLASH_ERASE   112
 #define OPAL_PRD_MSG   113
-#define OPAL_LAST  113
+#define OPAL_NEST_IMA_CONTROL  116
+#define OPAL_LAST  116
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 958e941..7cb6215 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -202,6 +202,7 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, 
uint64_t buf,
uint64_t size, uint64_t token);
 int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
+int64_t opal_nest_ima_control(uint32_t value);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index d6a7b82..c475c04 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -297,3 +297,4 @@ OPAL_CALL(opal_flash_read,  
OPAL_FLASH_READ);
 OPAL_CALL(opal_flash_write,OPAL_FLASH_WRITE);
 OPAL_CALL(opal_flash_erase,OPAL_FLASH_ERASE);
 OPAL_CALL(opal_prd_msg,OPAL_PRD_MSG);
+OPAL_CALL(opal_nest_ima_control,   OPAL_NEST_IMA_CONTROL);
-- 
1.9.1



Re: [PATCH] MIPS, IRQCHIP: Move i8259 irqchip driver to drivers/irqchip.

2015-07-08 Thread Thomas Gleixner
On Wed, 8 Jul 2015, Ralf Baechle wrote:

>  arch/mips/Kconfig   |   4 -
>  arch/mips/kernel/Makefile   |   1 -
>  arch/mips/kernel/i8259.c| 384 
> 
>  drivers/irqchip/Kconfig |   4 +
>  drivers/irqchip/Makefile|   1 +
>  drivers/irqchip/irq-i8259.c | 383 +++
>  6 files changed, 388 insertions(+), 389 deletions(-)

Should I carry it, or do you want to merge it via the mips tree?

In the latter case: Acked-by-me.
 


Re: [PATCH v2 1/3] x86/pci/intel_mid_pci: work around for IRQ0 assignment

2015-07-08 Thread Thomas Gleixner
On Wed, 8 Jul 2015, Andy Shevchenko wrote:
> A few devices on Intel Edison board (Intel Tangier) has IRQ0 as an IRQ line in
> the PCI configuration. The actual one which is using that is a first eMMC host
> controller.

You fail to explain that the other devices have a bogus PCI configuration.

> In case we compile sdhci-pci as a module and leave serial driver built-in,
> first serial device not in use and has IRQ0 assigned as well, the latter takes
> the interrupt allocation.

We are really not interested in the details of what's compiled in or
not and which other device is acquiring the interrupt. What matters
is: It's an init ordering problem.
 
> The result of such behaviour is impossibility to
> allocate the interrupt by sdhci-pci driver.
> 
> This patch introduces a quirk inside intel_mid_pci_irq_enable() to avoid
> described behaviour.

That's pretty useless. You tell the reader that you add a quirk, which
is hardly interesting because the subject line already talks about a
workaround. You fail to tell WHAT the quirk is doing.

Aside of that, starting a sentence in a changelog with "This patch" is
silly. We already know that THIS is a patch.

Let me rephrase the whole thing:

"On Intel Tangier the MMC host controller is wired up to irq 0. But
 several other devices have irq 0 associated as well due to a bogus
 PCI configuration.

 The first initialized driver will acquire irq 0 and make it
 unavailable for other devices. If the sdhci driver is not the first
 one it will fail to acquire the interrupt and therefore be non
 functional.

 Add a quirk to the pci irq enable function which denies irq 0 to
 anything else than the MMC host controller driver on Tangier
 platforms."

Can you see the difference?
 
> +#define PCI_DEVICE_ID_INTEL_MRFL_MMC 0x1190
> +

Please add defines at the top of the file, not just randomly in the
middle of the code.

>  static int intel_mid_pci_irq_enable(struct pci_dev *dev)
>  {
>   struct irq_alloc_info info;
>   int polarity;
>   int ret;
>  
> - if (dev->irq_managed && dev->irq > 0)
> + if (dev->irq_managed && dev->irq >= 0)
>   return 0;

What's the point here? Can dev->irq_managed be set and dev->irq be < 0?

> + /* Special treatment for IRQ0 */
> + if (dev->irq == 0) {
> + switch (intel_mid_identify_cpu()) {
> + case INTEL_MID_CPU_CHIP_TANGIER:
> + /*
> +  * TNG has IRQ0 assigned to eMMC controller. This makes
> +  * it happy to get an interrupt.

It's nice that you want to make the eMMC controller happy, but I doubt
that the silicon actually cares.

Please add a proper comment explaining the issue at hand.

> +  */
> + if (dev->device != PCI_DEVICE_ID_INTEL_MRFL_MMC)
> + return -EBUSY;
> + break;
> + default:
> + break;
> + }
> + }
> +
>   /* Set IRQ polarity */
>   if (intel_mid_identify_cpu() == INTEL_MID_CPU_CHIP_TANGIER)
>   polarity = 0; /* active high */

So now we have:

   if (dev->irq == 0) {
 switch(intel_mid_identify_cpu()) {
 case INTEL_MID_CPU_CHIP_TANGIER:
 
   }

and right after that:

   if (intel_mid_identify_cpu() == INTEL_MID_CPU_CHIP_TANGIER)
  

That's just silly. What's wrong with:

   switch (intel_mid_identify_cpu()) {
   case INTEL_MID_CPU_CHIP_TANGIER:
polarity = 0;
if (dev->irq == 0) {
   
}
break;
   default:
polarity = 1;
   }

Hmm?

tglx
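
The single-switch layout suggested above can be sketched outside the kernel
roughly as follows. This is only an illustration under stated assumptions:
the enum, the struct and the function name are hypothetical stand-ins for
intel_mid_identify_cpu(), struct pci_dev and intel_mid_pci_irq_enable(); only
the device ID 0x1190 and the -EBUSY return come from the quoted patch.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel definitions quoted above. */
enum mid_cpu { MID_CPU_OTHER, MID_CPU_TANGIER };

#define MRFL_MMC_DEVICE_ID 0x1190 /* value taken from the quoted patch */

struct fake_pci_dev {
	unsigned short device;
	int irq;
};

/*
 * Pick the polarity and apply the IRQ 0 quirk in one switch.
 * Returns 0 and stores the polarity, or -EBUSY when a device other
 * than the MMC host controller asks for IRQ 0 on Tangier.
 */
static int mid_irq_setup(enum mid_cpu cpu, const struct fake_pci_dev *dev,
			 int *polarity)
{
	switch (cpu) {
	case MID_CPU_TANGIER:
		*polarity = 0; /* active high */
		/* Only the MMC host controller is really wired to IRQ 0. */
		if (dev->irq == 0 && dev->device != MRFL_MMC_DEVICE_ID)
			return -EBUSY;
		break;
	default:
		*polarity = 1;
		break;
	}
	return 0;
}
```

With this shape the CPU type is identified once, and the quirk sits next to
the only platform it applies to.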


Re: [PATCH] iio: accel: kxcjk-1013: Remove blank lines

2015-07-08 Thread Daniel Baluta
On Wed, Jul 8, 2015 at 3:44 PM, Ana Calinov  wrote:
> This patch fixes the following errors given by
> checkpatch.pl with --strict:
> Please don't use multiple blank lines.
> Blank lines aren't necessary after an open brace '{'.
>
> Signed-off-by: Ana Calinov 


Looks good to me.

Reviewed-by: Daniel Baluta 

Thanks Ana!

Daniel.


Re: [PATCH 2/3] kmod: Add up-to-date explanations on the purpose of each asynchronous levels

2015-07-08 Thread Frederic Weisbecker
On Tue, Jul 07, 2015 at 04:07:58PM -0700, Andrew Morton wrote:
> On Mon,  6 Jul 2015 17:33:40 +0200 Frederic Weisbecker  
> wrote:
> 
> > There seem to be quite some confusions on the comments, likely due to
> > changes that came after them.
> > 
> > Now since it's very non obvious why we have 3 levels of asynchronous
> > code to implement usermodehelpers, it's important to comment in detail
> > the reason of this layout.
> 
> There are still a few references to keventd in there.  One of them is
> simply wrong: "runs as a child of keventd".  The userspace code is
> actually a child of the khelper thread, yes?
> 
> I guess we should remove all kernel references to "keventd".  It got
> renamed to "kworker".

Right, I think I missed them because I confused khelper with keventd.
In fact here they are all children of khelper, which is the singlethread
workqueue tied to kmod.

But I'm working on a new iteration that makes use of a global no numa
workqueue.


Re: [RFC PATCH] irq: IRQ bypass manager

2015-07-08 Thread Alex Williamson
On Wed, 2015-07-08 at 12:22 +, Wu, Feng wrote:
> 
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Wednesday, July 08, 2015 5:40 AM
> > To: linux-kernel@vger.kernel.org; k...@vger.kernel.org
> > Cc: eric.au...@st.com; eric.au...@linaro.org; j...@8bytes.org;
> > avi.kiv...@gmail.com; pbonz...@redhat.com; Wu, Feng
> > Subject: [RFC PATCH] irq: IRQ bypass manager
> > 
> > When a physical I/O device is assigned to a virtual machine through
> > facilities like VFIO and KVM, the interrupt for the device generally
> > bounces through the host system before being injected into the VM.
> > However, hardware technologies exist that often allow the host to be
> > bypassed for some of these scenarios.  Intel Posted Interrupts allow
> > the specified physical edge interrupts to be directly injected into a
> > guest when delivered to a physical processor while the vCPU is
> > running.  ARM IRQ Forwarding allows the hypervisor to handle level
> > triggered device interrupts as edge interrupts, by giving the guest
> > control of de-asserting and unmasking the interrupt line.
> > 
> > The IRQ bypass manager here is meant to provide the shim to connect
> > interrupt producers, generally the host physical device driver, with
> > interrupt consumers, generally the hypervisor, in order to configure
> > these bypass mechanism.  To do this, we base the connection on a
> > shared, opaque token.  For KVM-VFIO this is expected to be an
> > eventfd_ctx since this is the connection we already use to connect an
> > eventfd to an irqfd on the in-kernel path.  When a producer and
> > consumer with matching tokens is found, callbacks via both registered
> > participants allow the bypass facilities to be automatically enabled.
> 
> My Pi patches can work well based on this one and the one Eric sent
> out earlier. Alex, what should we do in the next step to speed up the
> upstreaming process?

Hi Feng,

Post the patches.  Define how the update() callback is used.  Help to
address the issues I've outlined below.  Thanks,

Alex
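
For readers following the thread, the token matching described in the quoted
changelog can be sketched in plain C as below. This is not the code from the
patch: the fixed-size tables, the "connected" field and the function names
are assumptions made for illustration, and the real manager also deals with
locking and with the stop/resume/add/del callbacks.

```c
#include <stddef.h>

#define MAX_ENTRIES 8 /* arbitrary table size for this sketch */

/* Simplified stand-ins; the real structures also carry callbacks. */
struct producer { void *token; int connected; };
struct consumer { void *token; int connected; };

static struct producer *producers[MAX_ENTRIES];
static struct consumer *consumers[MAX_ENTRIES];

/* Register a producer; connect it if a consumer holds the same token. */
static int register_producer(struct producer *p)
{
	size_t i;

	for (i = 0; i < MAX_ENTRIES && producers[i]; i++)
		;
	if (i == MAX_ENTRIES)
		return -1; /* table full */
	producers[i] = p;

	for (i = 0; i < MAX_ENTRIES; i++) {
		if (consumers[i] && consumers[i]->token == p->token) {
			p->connected = 1;
			consumers[i]->connected = 1;
			break;
		}
	}
	return 0;
}

/* Register a consumer; connect it if a producer holds the same token. */
static int register_consumer(struct consumer *c)
{
	size_t i;

	for (i = 0; i < MAX_ENTRIES && consumers[i]; i++)
		;
	if (i == MAX_ENTRIES)
		return -1; /* table full */
	consumers[i] = c;

	for (i = 0; i < MAX_ENTRIES; i++) {
		if (producers[i] && producers[i]->token == c->token) {
			c->connected = 1;
			producers[i]->connected = 1;
			break;
		}
	}
	return 0;
}
```

Either side may register first; the connection is made by whichever
registration completes the pair.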

> > 
> > Signed-off-by: Alex Williamson 
> > Cc: Eric Auger 
> > ---
> > 
> > This is the current draft of the IRQ bypass manager, I've made the
> > following changes:
> > 
> >  - Incorporated Eric's extensions (I would welcome Sign-offs from all
> >involved in the development, especially Eric - I've gone ahead and
> >added Linaro copyright for the contributions so far)
> >  - Module support with module reference tracking
> >  - might_sleep() as suggested by Paolo
> >  - kerneldoc as suggested by Paolo
> >  - Renamed file s/bypass/irqbypass/ because a module named "bypass"
> >is strange
> > 
> > Issues:
> >  - The update() callback is defined but not used
> >  - We can't have *all* the callbacks be optional.  I assume add/del
> >are required
> >  - Naming consistency, stop is to start as suspend is to resume, not
> >stop/resume
> >  - Callback descriptions including why we need separate stop/start
> >hooks when it seems like the callee could reasonably assume such
> >around the add/del callbacks
> >  - Need functional prototypes for both PI and forwarding
> > 
> >  include/linux/irqbypass.h |   75 
> >  kernel/irq/Kconfig|3 +
> >  kernel/irq/Makefile   |1
> >  kernel/irq/irqbypass.c|  206
> > +
> >  4 files changed, 285 insertions(+)
> >  create mode 100644 include/linux/irqbypass.h
> >  create mode 100644 kernel/irq/irqbypass.c
> > 
> > diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
> > new file mode 100644
> > index 000..cc7ce45
> > --- /dev/null
> > +++ b/include/linux/irqbypass.h
> > @@ -0,0 +1,75 @@
> > +/*
> > + * IRQ offload/bypass manager
> > + *
> > + * Copyright (C) 2015 Red Hat, Inc.
> > + * Copyright (c) 2015 Linaro Ltd.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +#ifndef IRQBYPASS_H
> > +#define IRQBYPASS_H
> > +
> > +#include 
> > +
> > +struct irq_bypass_consumer;
> > +
> > +/**
> > + * struct irq_bypass_producer - IRQ bypass producer definition
> > + * @node: IRQ bypass manager private list management
> > + * @token: opaque token to match between producer and consumer
> > + * @irq: Linux IRQ number for the producer device
> > + * @stop:
> > + * @resume:
> > + * @add_consumer:
> > + * @del_consumer:
> > + *
> > + * The IRQ bypass producer structure represents an interrupt source for
> > + * participation in possible host bypass, for instance an interrupt vector
> > + * for a physical device assigned to a VM.
> > + */
> > +struct irq_bypass_producer {
> > +   struct list_head node;
> > +   void *token;
> > +   int irq; /* linux irq */
> > +   void (*stop)(struct irq_bypass_producer *);
> > +   void (*resume)(struct 

[PATCH v2 0/2] arm: kernel: implement cpuidle_ops with psci backend

2015-07-08 Thread Jisheng Zhang
We'd like to use cpuidle-arm.c for both arm and arm64 with psci as backend.
For arm64, it works. But for arm, we miss cpuidle_ops, these patches try to
address this issue.

Has been tested on Marvell Berlin SoCs. These patches are rebased on the Mark
Rutland's latest psci unification work and Lorenzo Pieralisi's psci related
work[1].

[PATCH 1/2] move psci's cpu_suspend handling to generic code as suggested
by Lorenzo.
[PATCH 2/2] implement cpuidle_ops with psci finally.

Since v1:
 - add a new file psci_cpuidle.c to implement cpuidle_ops

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/347352.html

Jisheng Zhang (2):
  firmware: psci: move cpu_suspend handling to generic code
  arm: kernel: implement cpuidle_ops with psci backend

 arch/arm/kernel/Makefile   |  1 +
 arch/arm/kernel/psci_cpuidle.c | 29 +
 arch/arm64/kernel/psci.c   | 95 --
 drivers/firmware/psci.c| 95 ++
 include/linux/psci.h   |  2 +
 5 files changed, 127 insertions(+), 95 deletions(-)
 create mode 100644 arch/arm/kernel/psci_cpuidle.c

-- 
2.1.4



Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console

2015-07-08 Thread Prarit Bhargava


On 07/08/2015 07:55 AM, Peter Hurley wrote:
> Hi Taichi,
> 
> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>> This patch provides a new parameter as a workaround of the following
>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>
>> There're cases where autoconfig_irq() fails during boot.
>> In these cases, the console doesn't work in interrupt mode,
>> the mode cannot be changed anymore, and "input overrun"
>> (which can make operation mistakes) happens easily.
>> This problem happens with high rate every boot once it occurs
>> because the boot sequence is always almost same.
>>
>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>> during the waiting time, but there're some cases where the CPU cannot
>> handle the interrupt for longer than the time. It completely depends on
>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>> fixed
> 
> It completely depends on how long some other driver has interrupts disabled,
> which is a problem that needs fixed _in that driver_. autoconfig_irq() does 
> not
> need fixing.
> 
> A typical cause of this behavior is printk spew from overly-instrumented
> debugging of some driver (trace_printk is better suited to heavy 
> instrumentation).
> 

Peter, I understand what you're saying, however the problem is that this is the
serial driver which is one of, if not _the_ main debug output mechanism in the
kernel.

I'm not sure I agree with doing this patch, but perhaps an easier approach would
be to add a debug kernel parameter like "serial.force_irq=4" to force the irq to
a previously known good value?  That way the issue of the broken driver can be
debugged.

P.
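
A rough userspace sketch of the "force_irq" idea above, for illustration
only. No such parameter exists in the quoted patch; the variable, the flag
constant and the struct below are hypothetical stand-ins, and treating 0 as
"not set" is an assumption.

```c
#define FAKE_UPF_AUTO_IRQ 0x1u /* stand-in for UPF_AUTO_IRQ */

struct fake_port {
	unsigned int flags;
	int irq;
	int is_console;
};

/* Hypothetical parameter; 0 is assumed to mean "not set". */
static unsigned int force_irq;

/*
 * If a forced IRQ was given, use it for the console port and clear the
 * auto-probe flag so that autoconfig_irq() would be skipped.
 */
static void apply_force_irq(struct fake_port *port)
{
	if (!port->is_console || force_irq == 0)
		return;
	port->irq = (int)force_irq;
	port->flags &= ~FAKE_UPF_AUTO_IRQ;
}
```

Non-console ports are left alone, which matches the point made in the quoted
changelog that setserial can re-run the probe for them.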

> Regards,
> Peter Hurley
> 
> 
>> to control these cases but as long as auto_irq algorithm is used,
>> it's hard to know which CPU can handle an interrupt from a serial and
>> to assure the interrupt of the CPU is enabled during auto_irq.
>> Meanwhile for legacy consoles, they actually don't require auto_irq
>> because they basically use well-known irq number.
>> For non-console serials, this workaround is not required
>> because setserial command can kick autoconfig_irq() again for them.
>>
>> Signed-off-by: Taichi Kageyama 
>> Cc: Naoya Horiguchi 
>> ---
>>   drivers/tty/serial/8250/8250_core.c |   17 +
>>   1 files changed, 17 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/tty/serial/8250/8250_core.c 
>> b/drivers/tty/serial/8250/8250_core.c
>> index 6bf31f2..60fda28 100644
>> --- a/drivers/tty/serial/8250/8250_core.c
>> +++ b/drivers/tty/serial/8250/8250_core.c
>> @@ -65,6 +65,8 @@ static int serial_index(struct uart_port *port)
>>
>>   static unsigned int skip_txen_test; /* force skip of txen test at init 
>> time */
>>
>> +static unsigned int skip_cons_autoirq; /* force skip autoirq for console */
>> +
>>   /*
>>* Debugging.
>>*/
>> @@ -3336,6 +3338,9 @@ serial8250_register_ports(struct uart_driver *drv, 
>> struct device *dev)
>>  if (skip_txen_test)
>>  up->port.flags |= UPF_NO_TXEN_TEST;
>>
>> +if (uart_console(&up->port) && skip_cons_autoirq)
>> +up->port.flags &= ~UPF_AUTO_IRQ;
>> +
>>  uart_add_one_port(drv, &up->port);
>>  }
>>   }
>> @@ -3875,6 +3880,9 @@ int serial8250_register_8250_port(struct 
>> uart_8250_port *up)
>>  if (skip_txen_test)
>>  uart->port.flags |= UPF_NO_TXEN_TEST;
>>
>> +if (uart_console(&uart->port) && skip_cons_autoirq)
>> +uart->port.flags &= ~UPF_AUTO_IRQ;
>> +
>>  if (up->port.flags & UPF_FIXED_TYPE)
>>  uart->port.type = up->port.type;
>>
>> @@ -3936,6 +3944,10 @@ void serial8250_unregister_port(int line)
>>  uart->port.flags &= ~UPF_BOOT_AUTOCONF;
>>  if (skip_txen_test)
>>  uart->port.flags |= UPF_NO_TXEN_TEST;
>> +
>> +if (uart_console(&uart->port) && skip_cons_autoirq)
>> +uart->port.flags &= ~UPF_AUTO_IRQ;
>> +
>>  uart->port.type = PORT_UNKNOWN;
>>  uart->port.dev = &serial8250_isa_devs->dev;
>>  uart->capabilities = 0;
>> @@ -4044,6 +4056,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs 
>> supported. (1-" __MODULE_STR
>>   module_param(skip_txen_test, uint, 0644);
>>   MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init 
>> time");
>>
>> +module_param(skip_cons_autoirq, uint, 0644);
>> +MODULE_PARM_DESC(skip_cons_autoirq, "Skip auto irq for console during 
>> boot");
>> +
>>   #ifdef CONFIG_SERIAL_8250_RSA
>>   module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
>>   MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
>> @@ -4070,6 +4085,8 @@ static void __used s8250_options(void)
>>  module_param_cb(share_irqs, _ops_uint, _irqs, 0644);
>>  module_param_cb(nr_uarts, _ops_uint, 

[PATCH v2 1/2] firmware: psci: move cpu_suspend handling to generic code

2015-07-08 Thread Jisheng Zhang
Functions implemented on arm64 to suspend cpu and translate the idle
state index passed by the cpu_suspend core call to a valid PSCI state
are not arm64 specific and should be moved to generic code so that they
can be reused on arm systems too.

This patch moves these functions to generic PSCI firmware layer code.

Signed-off-by: Jisheng Zhang 
---
 arch/arm64/kernel/psci.c | 95 
 drivers/firmware/psci.c  | 95 
 include/linux/psci.h |  2 +
 3 files changed, 97 insertions(+), 95 deletions(-)

diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
index 4be5972..06e5c9a 100644
--- a/arch/arm64/kernel/psci.c
+++ b/arch/arm64/kernel/psci.c
@@ -20,7 +20,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 
@@ -28,73 +27,6 @@
 #include 
 #include 
 #include 
-#include 
-
-static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
-
-static int __maybe_unused cpu_psci_cpu_init_idle(unsigned int cpu)
-{
-   int i, ret, count = 0;
-   u32 *psci_states;
-   struct device_node *state_node, *cpu_node;
-
-   cpu_node = of_get_cpu_node(cpu, NULL);
-   if (!cpu_node)
-   return -ENODEV;
-
-   /*
-* If the PSCI cpu_suspend function hook has not been initialized
-* idle states must not be enabled, so bail out
-*/
-   if (!psci_ops.cpu_suspend)
-   return -EOPNOTSUPP;
-
-   /* Count idle states */
-   while ((state_node = of_parse_phandle(cpu_node, "cpu-idle-states",
- count))) {
-   count++;
-   of_node_put(state_node);
-   }
-
-   if (!count)
-   return -ENODEV;
-
-   psci_states = kcalloc(count, sizeof(*psci_states), GFP_KERNEL);
-   if (!psci_states)
-   return -ENOMEM;
-
-   for (i = 0; i < count; i++) {
-   u32 state;
-
-   state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
-
-   ret = of_property_read_u32(state_node,
-  "arm,psci-suspend-param",
-  &state);
-   if (ret) {
-   pr_warn(" * %s missing arm,psci-suspend-param property\n",
-   state_node->full_name);
-   of_node_put(state_node);
-   goto free_mem;
-   }
-
-   of_node_put(state_node);
-   pr_debug("psci-power-state %#x index %d\n", state, i);
-   if (!psci_power_state_is_valid(state)) {
-   pr_warn("Invalid PSCI power state %#x\n", state);
-   ret = -EINVAL;
-   goto free_mem;
-   }
-   psci_states[i] = state;
-   }
-   /* Idle states parsed correctly, initialize per-cpu pointer */
-   per_cpu(psci_power_state, cpu) = psci_states;
-   return 0;
-
-free_mem:
-   kfree(psci_states);
-   return ret;
-}
 
 #ifdef CONFIG_SMP
 
@@ -181,33 +113,6 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
 #endif
 #endif
 
-static int psci_suspend_finisher(unsigned long index)
-{
-   u32 *state = __this_cpu_read(psci_power_state);
-
-   return psci_ops.cpu_suspend(state[index - 1],
-   virt_to_phys(cpu_resume));
-}
-
-static int __maybe_unused cpu_psci_cpu_suspend(unsigned long index)
-{
-   int ret;
-   u32 *state = __this_cpu_read(psci_power_state);
-   /*
-* idle state index 0 corresponds to wfi, should never be called
-* from the cpu_suspend operations
-*/
-   if (WARN_ON_ONCE(!index))
-   return -EINVAL;
-
-   if (!psci_power_state_loses_context(state[index - 1]))
-   ret = psci_ops.cpu_suspend(state[index - 1], 0);
-   else
-   ret = cpu_suspend(index, psci_suspend_finisher);
-
-   return ret;
-}
-
 const struct cpu_operations cpu_psci_ops = {
.name   = "psci",
 #ifdef CONFIG_CPU_IDLE
diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
index 2f5d611..5da8aa2 100644
--- a/drivers/firmware/psci.c
+++ b/drivers/firmware/psci.c
@@ -20,12 +20,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 #include 
 #include 
 #include 
+#include 
 
 /*
  * While a 64-bit OS can make calls with SMC32 calling conventions, for some
@@ -81,6 +83,8 @@ static u32 psci_function_id[PSCI_FN_MAX];
 
 static u32 psci_cpu_suspend_feature;
 
+static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
+
 static inline bool psci_has_ext_power_state(void)
 {
return psci_cpu_suspend_feature &
@@ -217,6 +221,97 @@ static void psci_sys_poweroff(void)
invoke_psci_fn(PSCI_0_2_FN_SYSTEM_OFF, 0, 0, 0);
 }
 
+int cpu_psci_cpu_init_idle(unsigned int cpu)
+{
+   int i, ret, count = 0;
+   u32 *psci_states;
+   struct device_node 

Re: + kmod-remove-unecessary-explicit-wide-cpu-affinity-setting.patch added to -mm tree

2015-07-08 Thread Frederic Weisbecker
On Wed, Jul 08, 2015 at 01:32:26AM +0200, Oleg Nesterov wrote:
> Well, sorry for noise.
> 
> Let me repeat that I agree with this change, but...
> 
> On 07/07, Andrew Morton wrote:
> >
> > From: Frederic Weisbecker 
> > Subject: kmod: remove unecessary explicit wide CPU affinity setting
> >
> > Not only useless it even breaks nohz full.  The housekeeping work (general
> > kernel internal code that user doesn't care much about) is handled by a
> > reduced set of CPUs in nohz full, precisely those that are not included by
> > nohz_full= kernel parameters.  For example unbound workqueues are handled
> > by housekeeping CPUs.
> 
> I still think this part of the changelog looks confusing and just wrong.

I agree!

> 
> It is not that it breaks nohz full, unbound workqueues have nothing to
> do with housekeeping_mask from the kernel pov. But yes, people can change
> ->cpumask and this can connect to housekeeping_mask.

Right. In fact that's the motivation of the patch but the connection is
much more indirect than what the changelog suggests. So I'll fix the changelog.

> 
> Frederic, may I ask you to update the changelog? Although perhaps it was
> just me who was confused...

Sure! I think Andrew applied the patches to keep track of them and make sure
they don't get lost. But I'm working on a new iteration to replace them.

Thanks!

> Oleg.
> 


[PATCH v2 2/2] arm: kernel: implement cpuidle_ops with psci backend

2015-07-08 Thread Jisheng Zhang
This patch implements cpuidle_ops using psci. After this patch, we can
use cpuidle-arm.c with psci backend for both arm and arm64.

Signed-off-by: Jisheng Zhang 
---
 arch/arm/kernel/Makefile   |  1 +
 arch/arm/kernel/psci_cpuidle.c | 29 +
 2 files changed, 30 insertions(+)
 create mode 100644 arch/arm/kernel/psci_cpuidle.c

diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 3b995f5..96383d8 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_ARM_VIRT_EXT)+= hyp-stub.o
 ifeq ($(CONFIG_ARM_PSCI),y)
 obj-y  += psci-call.o
 obj-$(CONFIG_SMP)  += psci_smp.o
+obj-$(CONFIG_CPU_IDLE) += psci_cpuidle.o
 endif
 
 extra-y := $(head-y) vmlinux.lds
diff --git a/arch/arm/kernel/psci_cpuidle.c b/arch/arm/kernel/psci_cpuidle.c
new file mode 100644
index 000..ae85d97
--- /dev/null
+++ b/arch/arm/kernel/psci_cpuidle.c
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2015 Marvell Technology Group Ltd.
+ * Author: Jisheng Zhang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+
+#include 
+
+static int psci_cpuidle_suspend(int cpu, unsigned long arg)
+{
+   return cpu_psci_cpu_suspend(arg);
+}
+
+static int psci_cpuidle_init(struct device_node *node, int cpu)
+{
+   return cpu_psci_cpu_init_idle(cpu);
+}
+
+static struct cpuidle_ops psci_cpuidle_ops __initdata = {
+   .suspend = psci_cpuidle_suspend,
+   .init = psci_cpuidle_init,
+};
+CPUIDLE_METHOD_OF_DECLARE(psci_cpuidle, "psci", &psci_cpuidle_ops);
-- 
2.1.4



Re: [PATCH v4 0/2] vhost: support more than 64 memory regions

2015-07-08 Thread Igor Mammedov
On Thu,  2 Jul 2015 15:08:09 +0200
Igor Mammedov  wrote:

> changes since v3:
>   * rebased on top of vhost-next branch
> changes since v2:
>   * drop cache patches for now as suggested
>   * add max_mem_regions module parameter instead of unconditionally
> increasing limit
>   * drop bsearch patch since it's already queued
> 
> References to previous versions:
> v2: https://lkml.org/lkml/2015/6/17/276
> v1: http://www.spinics.net/lists/kvm/msg117654.html
> 
> Series allows to tweak vhost's memory regions count limit.
> 
> It fixes VM crashing on memory hotplug due to vhost refusing
> accepting more than 64 memory regions with max_mem_regions
> set to more than 262 slots in default QEMU configuration.
> 
> Igor Mammedov (2):
>   vhost: extend memory regions allocation to vmalloc
>   vhost: add max_mem_regions module parameter
> 
>  drivers/vhost/vhost.c | 28 ++--
>  1 file changed, 22 insertions(+), 6 deletions(-)
> 

ping

