date:20150515

Re: [PATCH v2] crypto: omap-sham: Check for return value from pm_runtime_get_sync

2015-05-15 Thread Herbert Xu

On Thu, May 14, 2015 at 11:40:37PM +0200, Pali Rohár wrote:
> On Sunday 08 March 2015 11:01:01 Pali Rohár wrote:
> > Function pm_runtime_get_sync could fail and we need to check return
> > value to prevent kernel crash.
> > 
> > Signed-off-by: Pali Rohár 

Applied.
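
For readers skimming the archive, the pattern named in the patch title can be sketched in user space as below. This is only an illustration: the actual omap-sham change is not quoted in this thread, and `mock_pm_runtime_get_sync()`, `mock_pm_runtime_put_noidle()` and `mock_hw_init()` are hypothetical stand-ins that let the example build outside the kernel. The sketch assumes the documented runtime-PM behaviour that the usage count is raised even when the get fails, so the error path drops it again.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel runtime-PM calls so that the
 * sketch builds in user space. They are NOT the kernel implementation. */
static int mock_usage_count;
static int mock_resume_fails;

static int mock_pm_runtime_get_sync(void)
{
	mock_usage_count++;	/* the count is raised even on failure */
	return mock_resume_fails ? -EIO : 0;
}

static void mock_pm_runtime_put_noidle(void)
{
	assert(mock_usage_count > 0);
	mock_usage_count--;
}

/* Hypothetical init path: bail out instead of touching hardware that
 * could not be powered up. */
static int mock_hw_init(void)
{
	int err = mock_pm_runtime_get_sync();

	if (err < 0) {
		mock_pm_runtime_put_noidle();	/* balance the failed get */
		return err;
	}
	/* ... register accesses would be safe from here on ... */
	return 0;
}
```

With `mock_resume_fails` set, `mock_hw_init()` returns the negative error code and leaves the usage count balanced, which is the behaviour the patch description asks for.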
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Patch "block: destroy bdi before blockdev is unregistered." has been added to the 4.0-stable tree

2015-05-15 Thread Sergey Senozhatsky

On (05/14/15 19:18), gre...@linuxfoundation.org wrote:
> This is a note to let you know that I've just added the patch titled
> 
> block: destroy bdi before blockdev is unregistered.
> 
> to the 4.0-stable tree which can be found at:
> 
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> 
> The filename of the patch is:
>  block-destroy-bdi-before-blockdev-is-unregistered.patch
> and it can be found in the queue-4.0 subdirectory.
> 
> If you, or anyone else, feels it should not be added to the stable tree,
> please let  know about it.
> 
> 

Hello Greg,

JFYI, I think this commit will trigger a WARN_ON(). It is fixed by
https://lkml.org/lkml/2015/5/8/29


(https://lkml.org/lkml/2015/4/28/568)

-ss

> From 6cd18e711dd8075da9d78cfc1239f912ff28968a Mon Sep 17 00:00:00 2001
> From: NeilBrown 
> Date: Mon, 27 Apr 2015 14:12:22 +1000
> Subject: block: destroy bdi before blockdev is unregistered.
> 
> From: NeilBrown 
> 
> commit 6cd18e711dd8075da9d78cfc1239f912ff28968a upstream.
> 
> Because of the peculiar way that md devices are created (automatically
> when the device node is opened), a new device can be created and
> registered immediately after the
>   blk_unregister_region(disk_devt(disk), disk->minors);
> call in del_gendisk().
> 
> Therefore it is important that all visible artifacts of the previous
> device are removed before this call.  In particular, the 'bdi'.
> 
> Since:
> commit c4db59d31e39ea067c32163ac961e9c80198fd37
> Author: Christoph Hellwig 
> fs: don't reassign dirty inodes to default_backing_dev_info
> 
> moved the
>   device_unregister(bdi->dev);
> call from bdi_unregister() to bdi_destroy() it has been quite easy to
> lose a race and have a new (e.g.) "md127" be created after the
> blk_unregister_region() call and before bdi_destroy() is ultimately
> called by the final 'put_disk', which must come after del_gendisk().
> 
> The new device finds that the bdi name is already registered in sysfs
> and complains
> 
> > [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 
> > sysfs_warn_dup+0x5a/0x70()
> > [ 9627.630032] sysfs: cannot create duplicate filename 
> > '/devices/virtual/bdi/9:127'
> 
> We can fix this by moving the bdi_destroy() call out of
> blk_release_queue() (which can happen very late when a refcount
> reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
> device driver calls it.
> 
> Then it is only necessary for md to call blk_cleanup_queue() before
> del_gendisk().  As loop.c devices are also created on demand by
> opening the device node, we make the same change there.
> 
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> Reported-by: Azat Khuzhin 
> Cc: Christoph Hellwig 
> Signed-off-by: NeilBrown 
> Reviewed-by: Christoph Hellwig 
> Signed-off-by: Jens Axboe 
> Signed-off-by: Greg Kroah-Hartman 
> 
> ---
>  block/blk-core.c     |    2 ++
>  block/blk-sysfs.c    |    2 --
>  drivers/block/loop.c |    2 +-
>  drivers/md/md.c      |    4 ++--
>  4 files changed, 5 insertions(+), 5 deletions(-)
> 
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_qu
>   q->queue_lock = &q->__queue_lock;
>   spin_unlock_irq(lock);
>  
> + bdi_destroy(&q->backing_dev_info);
> +
>   /* @q is and will stay empty, shutdown and put */
>   blk_put_queue(q);
>  }
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -522,8 +522,6 @@ static void blk_release_queue(struct kob
>  
>   blk_trace_shutdown(q);
>  
> - bdi_destroy(&q->backing_dev_info);
> -
>   ida_simple_remove(&blk_queue_ida, q->id);
>   call_rcu(&q->rcu_head, blk_free_queue_rcu);
>  }
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1672,8 +1672,8 @@ out:
>  
>  static void loop_remove(struct loop_device *lo)
>  {
> - del_gendisk(lo->lo_disk);
>   blk_cleanup_queue(lo->lo_queue);
> + del_gendisk(lo->lo_disk);
>   blk_mq_free_tag_set(&lo->tag_set);
>   put_disk(lo->lo_disk);
>   kfree(lo);
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4754,12 +4754,12 @@ static void md_free(struct kobject *ko)
>   if (mddev->sysfs_state)
>   sysfs_put(mddev->sysfs_state);
>  
> + if (mddev->queue)
> + blk_cleanup_queue(mddev->queue);
>   if (mddev->gendisk) {
>   del_gendisk(mddev->gendisk);
>   put_disk(mddev->gendisk);
>   }
> - if (mddev->queue)
> - blk_cleanup_queue(mddev->queue);
>  
>   kfree(mddev);
>  }
> 
> 
> Patches currently in stable-queue which might be from ne...@suse.de are
> 
> queue-4.0/block-destroy-bdi-before-blockdev-is-unregistered.patch

Re: [PATCH] crypto: vmx - fix two mistyped texts

2015-05-15 Thread Herbert Xu

On Thu, May 14, 2015 at 12:21:04PM -0300, Paulo Flabiano Smorigo wrote:
> One mistyped description and another mistyped target were corrected.
> 
> Signed-off-by: Paulo Flabiano Smorigo 

Applied.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [̈́PATCHv5 00/12] usb: ulpi bus

2015-05-15 Thread Heikki Krogerus

Hi Al,

> How did you end up with that in subject lines?  "[\u0344PATCH ", that is...

I don't see anything like that in the subject lines? Is someone else
seeing it?


Thanks,

-- 
heikki

[PATCH] drm/ttm: dma: Don't crash on memory in the vmalloc range

2015-05-15 Thread Alexandre Courbot

dma_alloc_coherent() can return memory in the vmalloc range.
virt_to_page() cannot handle such addresses and crashes. This
patch detects such cases and obtains the struct page * using
vmalloc_to_page() instead.

Signed-off-by: Alexandre Courbot 
---
This patch is a follow-up of the following discussion:

https://www.marc.info/?l=dri-devel&m=141579595431254&w=3

It works for me on both 32-bit and 64-bit Tegra, so I am not convinced
that Thierry's initial change from virt_to_page() to phys_to_page() is
still required - Thierry, can you confirm whether your patch is still
relevant after this one?

 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index 01e1d27eb078..3077f1554099 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -342,9 +342,12 @@ static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool)
 	d_page->vaddr = dma_alloc_coherent(pool->dev, pool->size,
 					   &d_page->dma,
 					   pool->gfp_flags);
-	if (d_page->vaddr)
-		d_page->p = virt_to_page(d_page->vaddr);
-	else {
+	if (d_page->vaddr) {
+		if (is_vmalloc_addr(d_page->vaddr))
+			d_page->p = vmalloc_to_page(d_page->vaddr);
+		else
+			d_page->p = virt_to_page(d_page->vaddr);
+	} else {
 		kfree(d_page);
 		d_page = NULL;
 	}
-- 
2.4.0


[GIT PULL] irq fix

2015-05-15 Thread Ingo Molnar

Linus,

Please pull the latest irq-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq-urgent-for-linus

   # HEAD: 9cf82e72ec449b4516843377ac7a20abe300c64f irqchip: tegra: Set the proper base address in irq chip data

An irqchip driver memory corruption fix.

 Thanks,

Ingo

-->
Lucas Stach (1):
  irqchip: tegra: Set the proper base address in irq chip data


 drivers/irqchip/irq-tegra.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-tegra.c b/drivers/irqchip/irq-tegra.c
index 51c485d9a877..f67bbd80433e 100644
--- a/drivers/irqchip/irq-tegra.c
+++ b/drivers/irqchip/irq-tegra.c
@@ -264,7 +264,7 @@ static int tegra_ictlr_domain_alloc(struct irq_domain *domain,
 
irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
  &tegra_ictlr_chip,
- &info->base[ictlr]);
+ info->base[ictlr]);
}
 
parent_args = *args;
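
The one-character change in the hunk above is easy to misread, so here is a small user-space sketch of the difference between the two expressions. It assumes, based only on the `info->base[ictlr]` expression in the diff, that the driver keeps one register base pointer per controller in an array; `struct ictlr_info`, `NUM_ICTLRS` and the helper names are hypothetical. Storing the address of the array slot as chip data means a later register write would land inside the driver's own bookkeeping structure instead of the device registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ICTLRS 2	/* hypothetical controller count for the sketch */

/* Hypothetical stand-in for the driver's private data: one register
 * base pointer per interrupt controller. */
struct ictlr_info {
	uint32_t *base[NUM_ICTLRS];
};

/* What the old code stored as chip data: the address of the array slot. */
static void *chip_data_buggy(struct ictlr_info *info, int ictlr)
{
	return &info->base[ictlr];
}

/* What the fixed code stores: the register base held in that slot. */
static void *chip_data_fixed(struct ictlr_info *info, int ictlr)
{
	return info->base[ictlr];
}

/* Models a chip callback that writes a register through its chip data. */
static void write_reg(void *chip_data, uint32_t val)
{
	assert(chip_data != NULL);
	*(uint32_t *)chip_data = val;
}
```

Passing `chip_data_fixed()` to `write_reg()` updates the register block; passing `chip_data_buggy()` would overwrite the stored pointer itself, which matches the "memory corruption" wording of the pull request.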

Re: [Open-FCoE] [PATCH] scsi: fix Wunused-but-set-variable buildwarning

2015-05-15 Thread Nicholas Mc Guire

On Thu, 14 May 2015, Prasad Gondi wrote:

> It seems like rpriv is used to set the fsp->tgt_flags originally
> 
> >   fsp->tgt_flags = rpriv->flags 
> 
> And fsp->tgt_flags are used in "fc_fcp_cmd_send" like this
> 
> setup_timer(&fsp->timer, fc_fcp_timeout, (unsigned long)fsp);
> if (rpriv->flags & FC_RP_FLAGS_REC_SUPPORTED)
> fc_fcp_timer_set(fsp, get_fsp_rec_tov(fsp));
> 
> Main purpose of this flags used is to set the correct TimeOut Value for 
> fc_fcp_timer. 
> 
> So is the removal of the "fsp->tgt_flags = rpriv->flags" in fc_queuecommand() 
> is intentional? Or by mistake?
> 
That's something I can't say, but the commit message indicated that the
removal of tgt_flags was intentional.

> Once we clear that out we can see whether this change make sense?
>
yup - many thanks !

hofrat
 

[GIT PULL] perf fixes

2015-05-15 Thread Ingo Molnar

Linus,

Please pull the latest perf-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-for-linus

   # HEAD: 60d5ddeabdda2d6453280efcf172d2429da10eac Merge branch 'liblockdep-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux into perf/urgent

Mostly tooling fixes, but also a lockdep annotation fix, a PMU event 
list fix and a new model addition.

 Thanks,

Ingo

-->
Andi Kleen (1):
  tools: Fix tools/vm build

Eunbong Song (2):
  tools/liblockdep: Fix linker error in case of cross compile
  tools/liblockdep: Fix compilation error

Kan Liang (1):
  perf/x86/intel: Fix SLM cache event list

Peter Zijlstra (1):
  perf: Annotate inherited event ctx->mutex recursion

Stephane Eranian (1):
  perf/x86/rapl: Enable Broadwell-U RAPL support

Will Deacon (1):
  perf tools: Use getconf to determine number of online CPUs


 arch/x86/kernel/cpu/perf_event_intel.c      |  7 +++--
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |  1 +
 kernel/events/core.c                        | 41 +++--
 tools/lib/lockdep/Makefile                  |  3 ++-
 tools/lib/lockdep/uinclude/linux/kernel.h   |  3 +++
 tools/perf/Makefile                         |  2 +-
 tools/vm/Makefile                           |  2 +-
 7 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 960e85de13fb..3998131d1a68 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1134,7 +1134,7 @@ static __initconst const u64 slm_hw_cache_extra_regs
  [ C(LL  ) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = SLM_DMND_READ|SLM_LLC_ACCESS,
-   [ C(RESULT_MISS)   ] = SLM_DMND_READ|SLM_LLC_MISS,
+   [ C(RESULT_MISS)   ] = 0,
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = SLM_DMND_WRITE|SLM_LLC_ACCESS,
@@ -1184,8 +1184,7 @@ static __initconst const u64 slm_hw_cache_event_ids
[ C(OP_READ) ] = {
/* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
[ C(RESULT_ACCESS) ] = 0x01b7,
-   /* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
-   [ C(RESULT_MISS)   ] = 0x01b7,
+   [ C(RESULT_MISS)   ] = 0,
},
[ C(OP_WRITE) ] = {
/* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
@@ -1217,7 +1216,7 @@ static __initconst const u64 slm_hw_cache_event_ids
  [ C(ITLB) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x00c0, /* INST_RETIRED.ANY_P */
-   [ C(RESULT_MISS)   ] = 0x0282, /* ITLB.MISSES */
+   [ C(RESULT_MISS)   ] = 0x40205, /* PAGE_WALKS.I_SIDE_WALKS */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
diff --git a/arch/x86/kernel/cpu/perf_event_intel_rapl.c b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
index 999289b94025..358c54ad20d4 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -722,6 +722,7 @@ static int __init rapl_pmu_init(void)
break;
case 60: /* Haswell */
case 69: /* Haswell-Celeron */
+   case 61: /* Broadwell */
rapl_cntr_mask = RAPL_IDX_HSW;
rapl_pmu_events_group.attrs = rapl_events_hsw_attr;
break;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 81aa3a4ece9f..1a3bf48743ce 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -913,10 +913,30 @@ static void put_ctx(struct perf_event_context *ctx)
  * Those places that change perf_event::ctx will hold both
  * perf_event_ctx::mutex of the 'old' and 'new' ctx value.
  *
- * Lock ordering is by mutex address. There is one other site where
- * perf_event_context::mutex nests and that is put_event(). But remember that
- * that is a parent<->child context relation, and migration does not affect
- * children, therefore these two orderings should not interact.
+ * Lock ordering is by mutex address. There are two other sites where
+ * perf_event_context::mutex nests and those are:
+ *
+ *  - perf_event_exit_task_context()   [ child , 0 ]
+ *  __perf_event_exit_task()
+ *sync_child_event()
+ *  put_event()[ parent, 1 ]
+ *
+ *  - perf_event_init_context()[ parent, 0 ]
+ *  inherit_task_group()
+ *inherit_group()
+ *  inherit_event()
+ *perf_event_alloc()
+ *  perf_init_event()
+ *perf_try_init_event()[ child , 1 ]
+ *
+ * While it appears there is an obvious deadlock here -- the parent and child
+ * nesting levels are inverted between the two. This is in fact safe because
+ * life-time rules separate them. That is an exiting task cannot fork, and a
+ * spawning task cannot (yet) exit.
+ *
+ * But remembe

[GIT PULL] scheduler fixes

2015-05-15 Thread Ingo Molnar

Linus,

Please pull the latest sched-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-urgent-for-linus

   # HEAD: 533445c6e53368569e50ab3fb712230c03d523f3 sched/core: Fix regression in cpuset_cpu_inactive() for suspend

Two fixes: a suspend/resume related regression fix, and an RT priority 
boosting fix.

 Thanks,

Ingo

-->
Omar Sandoval (1):
  sched/core: Fix regression in cpuset_cpu_inactive() for suspend

Thomas Gleixner (1):
  sched: Handle priority boosted tasks proper in setscheduler()


 include/linux/sched/rt.h |  7 ---
 kernel/locking/rtmutex.c | 12 ++-
 kernel/sched/core.c      | 54 +++-
 3 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index 6341f5be6e24..a30b172df6e1 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -18,7 +18,7 @@ static inline int rt_task(struct task_struct *p)
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
-extern int rt_mutex_check_prio(struct task_struct *task, int newprio);
+extern int rt_mutex_get_effective_prio(struct task_struct *task, int newprio);
 extern struct task_struct *rt_mutex_get_top_task(struct task_struct *task);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
 static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
@@ -31,9 +31,10 @@ static inline int rt_mutex_getprio(struct task_struct *p)
return p->normal_prio;
 }
 
-static inline int rt_mutex_check_prio(struct task_struct *task, int newprio)
+static inline int rt_mutex_get_effective_prio(struct task_struct *task,
+ int newprio)
 {
-   return 0;
+   return newprio;
 }
 
 static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index b73279367087..b025295f4966 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -265,15 +265,17 @@ struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 }
 
 /*
- * Called by sched_setscheduler() to check whether the priority change
- * is overruled by a possible priority boosting.
+ * Called by sched_setscheduler() to get the priority which will be
+ * effective after the change.
  */
-int rt_mutex_check_prio(struct task_struct *task, int newprio)
+int rt_mutex_get_effective_prio(struct task_struct *task, int newprio)
 {
if (!task_has_pi_waiters(task))
-   return 0;
+   return newprio;
 
-   return task_top_pi_waiter(task)->task->prio <= newprio;
+   if (task_top_pi_waiter(task)->task->prio <= newprio)
+   return task_top_pi_waiter(task)->task->prio;
+   return newprio;
 }
 
 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe22f7510bce..57bd333bc4ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3300,15 +3300,18 @@ static void __setscheduler_params(struct task_struct *p,
 
 /* Actually do priority change: must hold pi & rq lock. */
 static void __setscheduler(struct rq *rq, struct task_struct *p,
-  const struct sched_attr *attr)
+  const struct sched_attr *attr, bool keep_boost)
 {
__setscheduler_params(p, attr);
 
/*
-* If we get here, there was no pi waiters boosting the
-* task. It is safe to use the normal prio.
+* Keep a potential priority boosting if called from
+* sched_setscheduler().
 */
-   p->prio = normal_prio(p);
+   if (keep_boost)
+   p->prio = rt_mutex_get_effective_prio(p, normal_prio(p));
+   else
+   p->prio = normal_prio(p);
 
if (dl_prio(p->prio))
p->sched_class = &dl_sched_class;
@@ -3408,7 +3411,7 @@ static int __sched_setscheduler(struct task_struct *p,
int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 :
  MAX_RT_PRIO - 1 - attr->sched_priority;
int retval, oldprio, oldpolicy = -1, queued, running;
-   int policy = attr->sched_policy;
+   int new_effective_prio, policy = attr->sched_policy;
unsigned long flags;
const struct sched_class *prev_class;
struct rq *rq;
@@ -3590,15 +3593,14 @@ static int __sched_setscheduler(struct task_struct *p,
oldprio = p->prio;
 
/*
-* Special case for priority boosted tasks.
-*
-* If the new priority is lower or equal (user space view)
-* than the current (boosted) priority, we just store the new
+* Take priority boosted tasks into account. If the new
+* effective priority is unchanged, we just store the new
 * normal parameters and do not touch the scheduler class and
 * the runqueue. This will
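
The rule introduced by the rtmutex.c hunk in this pull request can be modelled in user space as follows. This is a sketch rather than the kernel code: `effective_prio()` and its flattened parameters are hypothetical, standing in for `task_has_pi_waiters()` and the top waiter's `prio` field. It keeps the kernel convention that a lower number means a higher priority.

```c
#include <stdbool.h>

/* User-space model of the selection rule: if a priority-inheritance
 * waiter with an equal or higher priority (numerically lower or equal)
 * is boosting the task, the boosted priority stays in effect; otherwise
 * the newly requested priority wins. */
static int effective_prio(bool has_pi_waiters, int top_waiter_prio, int newprio)
{
	if (!has_pi_waiters)
		return newprio;

	if (top_waiter_prio <= newprio)
		return top_waiter_prio;	/* keep the boost */

	return newprio;
}
```

The old helper only reported whether a boost overruled the change (0 or 1); returning the priority itself lets the caller compare it against the current value, which is how the `new_effective_prio` variable in the sched/core.c hunk appears to be used.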

Re: [PATCH] force inlining of spinlock ops

2015-05-15 Thread Heiko Carstens

On Wed, May 13, 2015 at 04:09:18PM +0200, Denys Vlasenko wrote:
> On 05/13/2015 12:43 PM, Ingo Molnar wrote:
> > We only know that the net effect is +70 bytes. Does that come out of:
> > 
> >  - large fluctuations such as -1000-1000+1000+1070, which happens to 
> >net out into a small net number?
> > 
> >  - or does it come from much smaller fluctuations?
> > 
> > So to make an informed decision we need to know those details.
> 
> Fair enough. Let's investigate.
> 
> I produced a list of functions with their sizes from each vmlinux,
> and diffed them:
> 
> $ nm --size-sort vmlinux | sed 's/\.[0-9]*.*/.NNN/' >vmlinux.nm
> $ nm --size-sort vmlinuxO2.before | sed 's/\.[0-9]*.*/.NNN/' >vmlinuxO2.before.nm
> $ diff -u vmlinuxO2.before.nm vmlinux.nm | grep -v '^[ @]' >vmlinux.nm.dif

FWIW, scripts/bloat-o-meter is a nice tool to examine the size differences
of two vmlinux images.


Re: [PATCH v1] lib/sort: Add 64 bit swap function

2015-05-15 Thread Rasmus Villemoes

On Wed, May 13 2015, Daniel Wagner  wrote:

>  
> - if (!swap_func)
> - swap_func = (size == 4 ? u32_swap : generic_swap);
> + if (!swap_func) {
> +#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
> + switch (size) {
> + case 4:
> + swap_func = u32_swap;
> + break;
> + case 8:
> + swap_func = u64_swap;
> + break;
> + }
> +#else
> + switch (size) {
> + case 4:
> + if (((unsigned long)base & 3) == 0)
> + swap_func = u32_swap;
> + break;
> + case 8:
> + if (((unsigned long)base & 7) == 0)
> + swap_func = u64_swap;
> + break;
> + }
> +#endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
> +
> + if (!swap_func)
> + swap_func = generic_swap;
> + }

I was more thinking of something like

static int alignment_ok(const void *base, int align)
{
return IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
((unsigned long)base & (align - 1)) == 0;
}

...

if (!swap_func) {
if (size == 4 && alignment_ok(base, 4))
swap_func = u32_swap;
else if (size == 8 && alignment_ok(base, 8))
swap_func = u64_swap;
else
swap_func = generic_swap;
}

It seems to generate the same code (I usually worry about how gcc messes
up switches), so this is just a readability thing.
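
A self-contained user-space version of the suggestion above, for anyone who wants to try it outside the kernel. The swap helpers are written from scratch for the sketch, `EFFICIENT_UNALIGNED_ACCESS` is a hypothetical stand-in for `IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)`, and `pick_swap()` is a made-up name for the selection step.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel config test; set to 1 to model
 * an architecture with cheap unaligned accesses. */
#define EFFICIENT_UNALIGNED_ACCESS 0

typedef void (*swap_fn)(void *a, void *b, int size);

static void u32_swap(void *a, void *b, int size)
{
	uint32_t t = *(uint32_t *)a;

	(void)size;
	*(uint32_t *)a = *(uint32_t *)b;
	*(uint32_t *)b = t;
}

static void u64_swap(void *a, void *b, int size)
{
	uint64_t t = *(uint64_t *)a;

	(void)size;
	*(uint64_t *)a = *(uint64_t *)b;
	*(uint64_t *)b = t;
}

/* Byte-wise fallback that works for any size and alignment. */
static void generic_swap(void *a, void *b, int size)
{
	char *x = a, *y = b;

	while (size-- > 0) {
		char t = *x;

		*x++ = *y;
		*y++ = t;
	}
}

static int alignment_ok(const void *base, int align)
{
	return EFFICIENT_UNALIGNED_ACCESS ||
	       ((uintptr_t)base & (uintptr_t)(align - 1)) == 0;
}

/* Pick the widest swap that is safe for this element size and base. */
static swap_fn pick_swap(const void *base, size_t size)
{
	if (size == 4 && alignment_ok(base, 4))
		return u32_swap;
	if (size == 8 && alignment_ok(base, 8))
		return u64_swap;
	return generic_swap;
}
```

Checking only the base pointer is sufficient here because every element sits at a multiple of `size` from the base, so an aligned base implies aligned elements for sizes 4 and 8.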

Rasmus

[GIT PULL] x86 fix

2015-05-15 Thread Ingo Molnar

Linus,

Please pull the latest x86-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-for-linus

   # HEAD: ef7254a595912b026d80a4116b8c4cd5b79d9c62 x86/vdso: Fix 'make bzImage' on older distros

A bzImage build fix on older distros.

 Thanks,

Ingo

-->
Oleg Nesterov (1):
  x86/vdso: Fix 'make bzImage' on older distros


 arch/x86/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 275a3a8b78af..e97032069f88 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
$(call if_changed,vdso)
 
-HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
+HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/x86/include/uapi
 hostprogs-y+= vdso2c
 
 quiet_cmd_vdso2c = VDSO2C  $@

Re: [RFC PATCH] x86, espfix: use spin_lock rather than mutex

2015-05-15 Thread H. Peter Anvin


On 05/14/2015 11:54 PM, Ingo Molnar wrote:
> The only slightly subtle detail with that is to use alloc_pages_node()
> with the secondary CPU's node, to make sure the espfix stack is
> NUMA-local to the CPU that is going to use it.

It doesn't hurt, although it isn't super critical as each page will be 
shared among 64 CPUs.  The whole espfix stack is only a single cacheline 
long.
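
The two figures in the reply above are consistent with each other, as the arithmetic below shows. The 4 KiB page size and 64-byte cache line are assumptions typical of x86-64 and are not stated in the message; the constant and function names are made up for the sketch.

```c
/* Assumed figures for the sketch: 4 KiB pages and 64-byte cache lines.
 * Only the "64 CPUs per page" figure comes from the message itself. */
enum {
	PAGE_BYTES      = 4096,
	CPUS_PER_PAGE   = 64,
	CACHELINE_BYTES = 64,
};

/* Bytes of a shared page available to each CPU's espfix stack. */
static int espfix_stack_bytes(void)
{
	return PAGE_BYTES / CPUS_PER_PAGE;
}

/* Under these assumptions one per-CPU stack is exactly one cache line. */
_Static_assert(PAGE_BYTES / CPUS_PER_PAGE == CACHELINE_BYTES,
	       "one espfix stack fits exactly one cache line");
```

Under these assumptions each CPU touches a single cache line of the shared page, which is why NUMA-local placement of the page is described as helpful but not critical.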


-hpa


[PATCH 1/1] mtd: nand_bbt: separate struct nand_chip from nand_bbt.c

2015-05-15 Thread peterpandong

Currently nand_bbt.c is tied to struct nand_chip, which makes it hard for
other NAND family chips to use nand_bbt.c. This may be the reason why
onenand has its own BBT implementation (onenand_bbt.c).

Parameterize the relevant device details in a new nand_bbt struct, and
add hooks for the chip-specific parts. Allocate and initialize struct
nand_bbt in nand_base.c.

Most of the patch is borrowed from Brian Norris:
http://git.infradead.org/users/norris/linux-mtd.git/shortlog/refs/heads/nand-bbt

Signed-off-by: Peter Pan 
Signed-off-by: Brian Norris 
---
 drivers/mtd/nand/docg4.c     |   8 +-
 drivers/mtd/nand/nand_base.c | 145 +++-
 drivers/mtd/nand/nand_bbt.c  | 518 +--
 include/linux/mtd/bbm.h      |  96 +---
 include/linux/mtd/nand.h     |  11 +-
 include/linux/mtd/nand_bbt.h | 160 +
 6 files changed, 516 insertions(+), 422 deletions(-)
 create mode 100644 include/linux/mtd/nand_bbt.h

diff --git a/drivers/mtd/nand/docg4.c b/drivers/mtd/nand/docg4.c
index 510c12d..ec83d9a 100644
--- a/drivers/mtd/nand/docg4.c
+++ b/drivers/mtd/nand/docg4.c
@@ -1037,7 +1037,7 @@ static int __init read_factory_bbt(struct mtd_info *mtd)
 * operation after device power-up.  The above read ensures it never is.
 * Ugly, I know.
 */
-   if (nand->bbt == NULL)  /* no memory-based bbt */
+   if (nand->nand_bbt == NULL)  /* no memory-based bbt */
goto exit;
 
if (mtd->ecc_stats.failed > eccfailed_stats) {
@@ -1064,7 +1064,11 @@ static int __init read_factory_bbt(struct mtd_info *mtd)
unsigned long bits = ~buf[i];
for_each_set_bit(bitnum, &bits, 8) {
int badblock = block + 7 - bitnum;
-   nand->bbt[badblock / 4] |=
+   /*
+* Should we create a mark factory bad block interface
+* for this?
+*/
+   nand->nand_bbt->bbt[badblock / 4] |=
0x03 << ((badblock % 4) * 2);
mtd->ecc_stats.badblocks++;
dev_notice(doc->dev, "factory-marked bad block: %d\n",
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 0afe763..8bd8aa5 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -98,6 +98,9 @@ static int nand_get_device(struct mtd_info *mtd, int new_state);
 static int nand_do_write_oob(struct mtd_info *mtd, loff_t to,
 struct mtd_oob_ops *ops);
 
+static int nand_erase_nand(struct mtd_info *mtd, struct erase_info *instr,
+  int allowbbt);
+
 /*
  * For devices which display every fart in the system on a separate LED. Is
  * compiled away when LED support is disabled.
@@ -451,8 +454,8 @@ static int nand_block_markbad_lowlevel(struct mtd_info *mtd, loff_t ofs)
}
 
/* Mark block bad in BBT */
-   if (chip->bbt) {
-   res = nand_markbad_bbt(mtd, ofs);
+   if (chip->nand_bbt) {
+   res = nand_bbt_markbad(chip->nand_bbt, ofs);
if (!ret)
ret = res;
}
@@ -494,10 +497,10 @@ static int nand_block_isreserved(struct mtd_info *mtd, loff_t ofs)
 {
struct nand_chip *chip = mtd->priv;
 
-   if (!chip->bbt)
+   if (!chip->nand_bbt)
return 0;
/* Return info from the table */
-   return nand_isreserved_bbt(mtd, ofs);
+   return nand_bbt_isreserved(chip->nand_bbt, ofs);
 }
 
 /**
@@ -514,12 +517,18 @@ static int nand_block_checkbad(struct mtd_info *mtd, loff_t ofs, int getchip,
   int allowbbt)
 {
struct nand_chip *chip = mtd->priv;
+   struct nand_bbt *bbt = chip->nand_bbt;
 
-   if (!chip->bbt)
+   if (!bbt)
return chip->block_bad(mtd, ofs, getchip);
 
/* Return info from the table */
-   return nand_isbad_bbt(mtd, ofs, allowbbt);
+   if (nand_bbt_isbad(bbt, ofs))
+   return 1;
+   else if (allowbbt)
+   return 0;
+   else
+   return nand_bbt_isreserved(bbt, ofs);
 }
 
 /**
@@ -2730,7 +2739,7 @@ static int nand_erase(struct mtd_info *mtd, struct erase_info *instr)
  *
  * Erase one ore more blocks.
  */
-int nand_erase_nand(struct mtd_info *mtd, struct erase_info *instr,
+static int nand_erase_nand(struct mtd_info *mtd, struct erase_info *instr,
int allowbbt)
 {
int page, status, pages_per_block, ret, chipnr;
@@ -2837,6 +2846,122 @@ erase_exit:
return ret;
 }
 
+/* NAND BBT helper - erase a block, including reserved blocks */
+static int nand_bbt_erase_block(struct mtd_info *mtd, loff_t addr)
+{
+   struct erase_info einfo;
+   struct nand_chip *chip = mtd->priv;
+
+   memset(&einfo, 0, sizeof(einfo));
+   einfo.mtd = mtd;
+   einfo.addr = addr;
+   einfo.len =

Re: [PATCH v9 1/4] ARM: sun7i: dt: Add Security System to A20 SoC DTS

2015-05-15 Thread Maxime Ripard

On Thu, May 14, 2015 at 02:58:58PM +0200, LABBE Corentin wrote:
> The Security System is a hardware cryptographic accelerator that support
> AES/MD5/SHA1/DES/3DES/PRNG algorithms.
> It could be found on many Allwinner SoC.
> 
> This patch enable the Security System on the Allwinner A20 SoC Device-tree.
> 
> Signed-off-by: LABBE Corentin 
> ---
>  arch/arm/boot/dts/sun7i-a20.dtsi | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/sun7i-a20.dtsi b/arch/arm/boot/dts/sun7i-a20.dtsi
> index fdd1817..9a823ce 100644
> --- a/arch/arm/boot/dts/sun7i-a20.dtsi
> +++ b/arch/arm/boot/dts/sun7i-a20.dtsi
> @@ -679,6 +679,14 @@
>   status = "disabled";
>   };
>  
> + crypto: crypto-engine@01c15000 {
> + compatible = "allwinner,sun4i-a20-crypto";

This looks wrong. It's either sun4i-a10 or sun7i-a20.

It looks good otherwise.

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



Re: [PATCH v2] Documentation/arch: Add kernel feature descriptions and arch support status under Documentation/features/

2015-05-15 Thread Ingo Molnar


* Stephen Rothwell  wrote:

> Hi Ingo,
> 
> Thanks for this.  The concept is certainly good.
> 
> On Thu, 14 May 2015 21:59:25 +0200 Ingo Molnar  wrote:
> >
> > The patch that adds a new architecture to all these files would give 
> > us a good overview about how complete an initial port is.
> 
> If you want to test how hard that is (and what sort of patch it
> produces), the h8300 architecture was added to linux-next recently.

I tried this out, this is the resulting feature support matrix for the 
new h8300 architecture:

#
# Kernel feature support matrix of architecture 'h8300':
#
  arch-tick-broadcast: |   h8300: | TODO |
              BPF-JIT: |   h8300: | TODO |
          clockevents: |   h8300: |  ok  |
        cmpxchg-local: |   h8300: | TODO |
     context-tracking: |   h8300: | TODO |
        dma-api-debug: |   h8300: | TODO |
       dma-contiguous: |   h8300: | TODO |
        dma_map_attrs: |   h8300: |  ok  |
             ELF-ASLR: |   h8300: | TODO |
     gcov-profile-all: |   h8300: | TODO |
  generic-idle-thread: |   h8300: | TODO |
            huge-vmap: |   h8300: | TODO |
         ioremap_prot: |   h8300: | TODO |
        irq-time-acct: |   h8300: | TODO |
          jump-labels: |   h8300: | TODO |
                KASAN: |   h8300: | TODO |
                 kgdb: |   h8300: | TODO |
              kprobes: |   h8300: | TODO |
        kprobes-event: |   h8300: | TODO |
    kprobes-on-ftrace: |   h8300: | TODO |
           kretprobes: |   h8300: | TODO |
              lockdep: |   h8300: | TODO |
   modern-timekeeping: |   h8300: |  ok  |
       numa-balancing: |   h8300: |  ..  |
        numa-memblock: |   h8300: |  ..  |
            optprobes: |   h8300: | TODO |
            perf-regs: |   h8300: | TODO |
       perf-stackdump: |   h8300: | TODO |
          PG_uncached: |   h8300: | TODO |
 pmdp_splitting_flush: |   h8300: | TODO |
          pte_special: |   h8300: | TODO |
       queued-rwlocks: |   h8300: | TODO |
     queued-spinlocks: |   h8300: | TODO |
      rwsem-optimized: |   h8300: |  ok  |
       seccomp-filter: |   h8300: | TODO |
             sg-chain: |   h8300: | TODO |
       stackprotector: |   h8300: | TODO |
          strncasecmp: |   h8300: | TODO |
                  THP: |   h8300: |  ..  |
            tracehook: |   h8300: | TODO |
              uprobes: |   h8300: | TODO |
    user-ret-profiler: |   h8300: | TODO |
         virt-cpuacct: |   h8300: | TODO |

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] spi: mediatek: Add spi bus for Mediatek MT8173

2015-05-15 Thread leilk liu

Dear Mark,

Thanks for your reply.

On Tue, 2015-05-12 at 17:05 +0100, Mark Brown wrote:
> On Tue, May 12, 2015 at 08:39:16PM +0800, leilk liu wrote:
> > On Fri, 2015-05-08 at 18:53 +0100, Mark Brown wrote:
> > > On Fri, May 08, 2015 at 04:55:42PM +0800, leilk@mediatek.com wrote:
> 
> > Could you tell me more details about "You should also be using the core
> > helpers for DMA mapping"?
> 
> Implement can_dma() - look for drivers providing that for examples.
> 

The MTK SPI hardware uses a DMA engine built into the SPI controller.
According to the datasheet, the SPI driver only needs to set the DMA
enable bit and write a physical address to the relevant DMA address
register, so I think supporting can_dma() would make the driver more
complex than necessary.

> > > > +static const struct of_device_id mtk_spi_of_match[] = {
> > > > +   { .compatible = "mediatek,mt6589-spi", .data = (void *)COMPAT_MT6589},
> > > > +   { .compatible = "mediatek,mt8173-spi", .data = (void *)COMPAT_MT8173},
> > > > +   {}
> > > > +};
> > > > +MODULE_DEVICE_TABLE(of, mtk_spi_of_match);
> 
> > > There were three compatible strings listed in the DT binding but only
> > > two here.
> 
> > MT6589 and MT8135 is compatible; 
> > For MT8135 IC, it can use the follow way in dts to probe:
> >   compatible = "mediatek,mt8135-spi", 
> >"mediatek,mt6589-spi";
> 
> > And I test it's ok on MT8135 platform. So I add struct of_device_id
> > mtk_spi_of_match like this in spi driver code.
> 
> You should list all the compatibles documented in the binding here, if
> some of them are the same just have them map to a single constant.
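
A minimal userspace sketch of the mapping Mark describes, assuming the
third documented compatible is "mediatek,mt8135-spi". The compat_entry
struct and lookup_compat() helper are hypothetical stand-ins for
struct of_device_id and the kernel's OF matching, and the enum values
are invented for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's COMPAT_* constants. */
enum mtk_spi_compat { COMPAT_NONE = 0, COMPAT_MT6589, COMPAT_MT8173 };

/* Simplified model of struct of_device_id: compatible string plus data. */
struct compat_entry {
	const char *compatible;
	enum mtk_spi_compat data;
};

/* Every documented compatible is listed; identical IPs share a constant. */
static const struct compat_entry mtk_spi_compat_table[] = {
	{ "mediatek,mt6589-spi", COMPAT_MT6589 },
	{ "mediatek,mt8135-spi", COMPAT_MT6589 },	/* same as mt6589 */
	{ "mediatek,mt8173-spi", COMPAT_MT8173 },
	{ NULL, COMPAT_NONE }				/* sentinel */
};

/* Return the constant for a compatible string, or COMPAT_NONE. */
static enum mtk_spi_compat lookup_compat(const char *compatible)
{
	const struct compat_entry *e;

	for (e = mtk_spi_compat_table; e->compatible; e++)
		if (strcmp(e->compatible, compatible) == 0)
			return e->data;
	return COMPAT_NONE;
}
```

With a table like this, the driver matches on every string the binding
documents and no longer relies on the dts carrying a fallback compatible.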



Re: [̈́PATCHv5 00/12] usb: ulpi bus

2015-05-15 Thread Tal Shorer

Yes,
"[\xcd\x84PATCHv5 00/12] usb: ulpi bus"

Re: [PATCH] kernel/printk/printk.c: check_syslog_permissions() cleanup

2015-05-15 Thread Vasily Averin

On 15.05.2015 01:01, Andrew Morton wrote:
> On Sun, 10 May 2015 09:35:53 +0300 Vasily Averin  wrote:
> 
>> Fixes: 637241a900cb ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg")
>>
>> Final version of patch 637241a900cb ("kmsg: honor dmesg_restrict sysctl
>> on /dev/kmsg") lost few hooks. As result security_syslog() is not checked
>> inside check_syslog_permissions() if dmesg_restrict is set,
>> or it can be called twice in do_syslog().
> 
> I'm not seeing how security_syslog() is called twice from do_syslog(). 
> Put more details in the changelog, please.

For example, take the case where dmesg_restrict is not set and
SYSLOG_ACTION_OPEN is requested. do_syslog() calls
check_syslog_permissions(), which calls security_syslog() the first time
and approves the operation. do_syslog() then calls security_syslog()
itself a second time.
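
The double call can be shown with a small userspace model. This is only
a sketch built from the description above: the function bodies are
simplified assumptions (dmesg_restrict unset, the hook always approves),
do_syslog_checks() is a hypothetical name for the entry checks of
do_syslog(), and none of it is the kernel's actual code:

```c
#include <stdbool.h>

#define SYSLOG_ACTION_OPEN 1	/* value used only in this model */

static int security_syslog_calls;	/* how often the hook ran */

/* Model of the security hook: always approves, counts invocations. */
static int security_syslog(int type)
{
	(void)type;
	security_syslog_calls++;
	return 0;
}

/* Model of the pre-patch helper with dmesg_restrict unset. */
static int check_syslog_permissions(int type, bool from_file)
{
	if (from_file && type != SYSLOG_ACTION_OPEN)
		return 0;		/* hook is skipped on this path */
	return security_syslog(type);	/* first call */
}

/* Model of the pre-patch checks at the top of do_syslog(). */
static int do_syslog_checks(int type, bool from_file)
{
	int error = check_syslog_permissions(type, from_file);

	if (error)
		return error;
	return security_syslog(type);	/* second call */
}
```

In this model an open request reaches the hook twice, while a read on an
already opened file reaches it once, from do_syslog_checks() only.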

>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -484,11 +484,11 @@ int check_syslog_permissions(int type, bool from_file)
>>   * already done the capabilities checks at open time.
>>   */
>>  if (from_file && type != SYSLOG_ACTION_OPEN)
>> -return 0;
>> +goto ok;
> 
> This seems wrong - we should only call security_syslog() for opens?

Are you sure?
I saw Linus's comment in the thread about the old patch,
and I agree that the usual kernel checks can be skipped.

However, I believe security hooks should be called anyway.
In the general case they can have their own rules about access, logging,
or additional things that must happen before the requested operation
is executed.

Am I wrong?

Re: [PATCH v2 RT 3.18] irq_work: Provide a soft-irq based queue

2015-05-15 Thread Jan Kiszka

On 2015-05-14 21:58, Sebastian Andrzej Siewior wrote:
> * Jan Kiszka | 2015-04-23 09:35:59 [+0200]:
> 
>> Instead of turning all irq_work requests into lazy ones on -rt, just
>> move their execution from hard into soft-irq context.
>>
>> This resolves deadlocks of ftrace which will queue work from arbitrary
>> contexts, including those that have locks held that are needed for
>> raising a soft-irq.
>>
>> Signed-off-by: Jan Kiszka 
> Applied

The thread went on, and Mike suggested an alternative implementation [1]
that works fine and is even cleaner. Let's pick his.

Jan

[1] http://thread.gmane.org/gmane.linux.kernel/1937960





Re: [PATCHv5 03/28] memcg: adjust to support new THP refcounting

2015-05-15 Thread Vlastimil Babka


On 04/23/2015 11:03 PM, Kirill A. Shutemov wrote:

> As with rmap, with the new refcounting we cannot rely on PageTransHuge()
> to check whether we need to charge the size of a huge page from the
> cgroup. We need to get information from the caller to know whether it
> was mapped with a PMD or a PTE.
>
> We uncharge when the last reference on the page is gone. At that point,
> if we see PageTransHuge(), it means we need to uncharge the whole huge
> page.
>
> The tricky part is partial unmap -- when we try to unmap part of a huge
> page. We don't do any special handling of this situation, meaning we
> don't uncharge the part of the huge page unless the last user is gone or
> split_huge_page() is triggered. If cgroup memory pressure happens, the
> partially unmapped page will be split through the shrinker. This should
> be good enough.
>
> Signed-off-by: Kirill A. Shutemov 
> Tested-by: Sasha Levin 


Acked-by: Vlastimil Babka 

But same question about whether it should be using hpage_nr_pages() 
instead of a constant.



Re: [PATCH v4 4/5] clk: hi6220: Clock driver support for Hisilicon hi6220 SoC

2015-05-15 Thread Bintian


Hello Stephen,

On 2015/5/15 8:25, Stephen Boyd wrote:

On 05/05, Bintian Wang wrote:

diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 9897f35..935c44b 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -152,6 +152,8 @@ config COMMON_CLK_CDCE706

  source "drivers/clk/qcom/Kconfig"

+source "drivers/clk/hisilicon/Kconfig"
+


Please move this above qcom to maintain alphabet sort.

OK, fix in next version.




  endmenu

  source "drivers/clk/bcm/Kconfig"
diff --git a/drivers/clk/hisilicon/Kconfig b/drivers/clk/hisilicon/Kconfig
new file mode 100644
index 000..8034739
--- /dev/null
+++ b/drivers/clk/hisilicon/Kconfig
@@ -0,0 +1,6 @@
+config COMMON_CLK_HI6220
+   bool "Hi6220 Clock Driver"
+   depends on OF && ARCH_HISI
+   default y


Can this be

depends on ARCH_HISI || COMPILE_TEST
default ARCH_HISI

instead? I'd like to increase build coverage.

No problem, will fix in next version.




+   help
+ Build the Hisilicon Hi6220 clock driver based on the common clock framework.
diff --git a/drivers/clk/hisilicon/clk-hi6220.c b/drivers/clk/hisilicon/clk-hi6220.c
new file mode 100644
index 000..91b1cd7
--- /dev/null
+++ b/drivers/clk/hisilicon/clk-hi6220.c
@@ -0,0 +1,292 @@
+/*
+ * Hisilicon Hi6220 clock driver
+ *
+ * Copyright (c) 2015 Hisilicon Limited.
+ *
+ * Author: Bintian Wang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 


Do we need to include linux/clk.h? I don't see any consumer
usage here.

You are right, remove in next version.




+
+#include 
+
+#include "clk.h"
+
diff --git a/drivers/clk/hisilicon/clk.c b/drivers/clk/hisilicon/clk.c
index a078e84..5d2305c 100644
--- a/drivers/clk/hisilicon/clk.c
+++ b/drivers/clk/hisilicon/clk.c
@@ -108,4 +123,6 @@ void __init hisi_clk_register_gate(struct hisi_gate_clock *,
int, struct hisi_clock_data *);
  void __init hisi_clk_register_gate_sep(struct hisi_gate_clock *,
int, struct hisi_clock_data *);
+void __init hi6220_clk_register_divider(struct hi6220_divider_clock *,
+   int, struct hisi_clock_data *);


__init markings on function prototypes are useless. Please remove them.

OK




  #endif/* __HISI_CLK_H */
diff --git a/drivers/clk/hisilicon/clkdivider-hi6220.c b/drivers/clk/hisilicon/clkdivider-hi6220.c
+
+/**
+ * struct hi6220_clk_divider - divider clock for hi6220
+ *
+ * @hw:handle between common and hardware-specific interfaces
+ * @reg:   register containing divider
+ * @shift: shift to the divider bit field
+ * @width: width of the divider bit field
+ * @mask:  mask for setting divider rate
+ * @table: the div table that the divider supports
+ * @lock:  register lock
+ */
+struct hi6220_clk_divider {
+   struct clk_hw   hw;
+   void __iomem*reg;
+   u8  shift;
+   u8  width;
+   u32 mask;
+   const struct clk_div_table *table;
+   spinlock_t  *lock;
+};


The clk-divider.c code has been made "reusable". Can you please
try to use the functions that it now exposes instead of
copy/pasting it and modifying it to suit your needs? A lot of
this code looks the same.

In fact, I discussed this problem with Rob Herring and Mike Turquette
on the 96boards internal mailing list before.

The divider in hi6220 has a mask bit that guarantees the correct bits
are written to the register when setting the rate, but the position of
this mask bit follows no rule (e.g. it cannot be derived by left-shifting
by a fixed number of bits). That is why I added this divider clock to
handle it; hi6220_clk_divider can be regarded as a special case of the
generic divider clock.

If I don't add this divider clock for the hi6220 chip, I would have to
change the core APIs "clk_register_divider" and
"clk_register_divider_table", and many other drivers would then need to
be updated.
So I think adding this divider clock is a good solution for now.
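
To illustrate the point, here is a minimal sketch of how a set_rate path
could compose the register value. It is an assumption for illustration,
not the code from this patch: hi6220_div_reg_value() is a hypothetical
helper, and it assumes the per-clock mask is passed in as an already
shifted bit that must be set together with the divider field:

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the value to write when updating a
 * divider. 'shift' and 'width' describe the divider bit field; 'mask'
 * is the per-clock mask bit, supplied by the clock table because its
 * position cannot be derived from 'shift'. Assumes width < 32.
 */
static uint32_t hi6220_div_reg_value(uint32_t old, uint32_t div_val,
				     uint8_t shift, uint8_t width,
				     uint32_t mask)
{
	uint32_t field = ((1u << width) - 1u) << shift;

	old &= ~field;				/* clear the divider field */
	old |= (div_val << shift) & field;	/* insert the new divider */
	old |= mask;				/* set the per-clock mask bit */
	return old;
}
```

The generic divider has no slot for such an extra per-clock value, which
is the part that would require changing the core registration APIs.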




+
+#define to_hi6220_clk_divider(_hw) \

[..]

+
+static struct clk_ops hi6220_clkdiv_ops = {


const?

Add in next version.




+   .recalc_rate = hi6220_clkdiv_recalc_rate,
+   .round_rate = hi6220_clkdiv_round_rate,
+   .set_rate = hi6220_clkdiv_set_rate,
+};
+
+struct clk *hi6220_register_clkdiv(struct device *dev, const char *name,
+   const char *parent_name, unsigned long flags, void __iomem *reg,
+   u8 shift, u8 width, u32 mask_bit, spinlock_t *lock)
+{
+   struct hi6220_clk_divider *div;
+   struct clk *clk;
+   struct clk_init_data init;
+   struct clk_div_table *table;
+   u32 max_div, min_div;
+   int i;
+
+   /* allocate the divider */
+   div = kzalloc(sizeof(struct hi6220_clk_divider), GFP_KE

Re: [PATCH v2] Documentation/arch: Add kernel feature descriptions and arch support status under Documentation/features/

2015-05-15 Thread Ingo Molnar


* Michael Ellerman  wrote:

> On Thu, 2015-05-14 at 12:38 -0700, Andrew Morton wrote:
> > > Add arch support matrices for more than 40 generic kernel features
> > > that need per architecture support.
> > > 
> > > Each feature has its own directory under 
> > > Documentation/features/feature_name/,
> > > and the arch-support.txt file shows its current arch porting status.
> > 
> > It would be nice to provide people with commit IDs to look at, but the
> > IDs won't be known at the time the documentation file is created.  We
> > could provide patch titles.
> 
> +1 on patch titles.

Ok, I'll solve this.

> > But still, let's not overdo it - get something in there, see how 
> > well it works, evolve it over time.
> > 
> > I don't think we've heard from any (non-x86) arch maintainers?  Do 
> > they consider this useful at all?  Poke.
> 
> Yes it is. I have my own version I've cobbled together for powerpc, 
> but this is much better.

Please double check the PowerPC support matrix for correctness (if you 
haven't yet):

#
# Kernel feature support matrix of architecture 'powerpc':
#
   arch-tick-broadcast:  |  ok  |  ARCH_HAS_TICK_BROADCAST  #  arch provides tick_broadcast()
               BPF-JIT:  |  ok  |  HAVE_BPF_JIT  #  arch supports BPF JIT optimizations
           clockevents:  |  ok  |  GENERIC_CLOCKEVENTS  #  arch supports generic clock events
         cmpxchg-local:  | TODO |  HAVE_CMPXCHG_LOCAL  #  arch supports the this_cpu_cmpxchg() API
      context-tracking:  |  ok  |  HAVE_CONTEXT_TRACKING  #  arch supports context tracking for NO_HZ_FULL
         dma-api-debug:  |  ok  |  HAVE_DMA_API_DEBUG  #  arch supports DMA debug facilities
        dma-contiguous:  | TODO |  HAVE_DMA_CONTIGUOUS  #  arch supports the DMA CMA (contiguous memory allocator)
         dma_map_attrs:  |  ok  |  HAVE_DMA_ATTRS  #  arch provides dma_*map*_attrs() APIs
              ELF-ASLR:  |  ok  |  ARCH_HAS_ELF_RANDOMIZE  #  arch randomizes the stack, heap and binary images of ELF binaries
      gcov-profile-all:  |  ok  |  ARCH_HAS_GCOV_PROFILE_ALL  #  arch supports whole-kernel GCOV code coverage profiling
   generic-idle-thread:  |  ok  |  GENERIC_SMP_IDLE_THREAD  #  arch makes use of the generic SMP idle thread facility
             huge-vmap:  | TODO |  HAVE_ARCH_HUGE_VMAP  #  arch supports the ioremap_pud_enabled() and ioremap_pmd_enabled() VM APIs
          ioremap_prot:  |  ok  |  HAVE_IOREMAP_PROT  #  arch has ioremap_prot()
         irq-time-acct:  |  ok  |  HAVE_IRQ_TIME_ACCOUNTING  #  arch supports precise IRQ time accounting
           jump-labels:  |  ok  |  HAVE_ARCH_JUMP_LABEL  #  arch supports live patched high efficiency branches
                 KASAN:  | TODO |  HAVE_ARCH_KASAN  #  arch supports the KASAN runtime memory checker
                  kgdb:  |  ok  |  HAVE_ARCH_KGDB  #  arch supports the kGDB kernel debugger
               kprobes:  |  ok  |  HAVE_KPROBES  #  arch supports live patched kernel probe
         kprobes-event:  |  ok  |  HAVE_REGS_AND_STACK_ACCESS_API  #  arch supports kprobes with perf events
     kprobes-on-ftrace:  | TODO |  HAVE_KPROBES_ON_FTRACE  #  arch supports combined kprobes and ftrace live patching
            kretprobes:  |  ok  |  HAVE_KRETPROBES  #  arch supports kernel function-return probes
               lockdep:  |  ok  |  LOCKDEP_SUPPORT  #  arch supports the runtime locking correctness debug facility
    modern-timekeeping:  |  ok  |  !ARCH_USES_GETTIMEOFFSET  #  arch does not use arch_gettimeoffset() anymore
        numa-balancing:  |  ok  |  ARCH_SUPPORTS_NUMA_BALANCING && 64BIT && NUMA  #  arch supports NUMA balancing
         numa-memblock:  |  ok  |  HAVE_MEMBLOCK_NODE_MAP  #  arch supports NUMA aware memblocks
             optprobes:  | TODO |  HAVE_OPTPROBES  #  arch supports live patched optprobes
             perf-regs:  | TODO |  HAVE_PERF_REGS  #  arch supports perf events register access
        perf-stackdump:  | TODO |  HAVE_PERF_USER_STACK_DUMP  #  arch supports perf events stack dumps
           PG_uncached:  | TODO |  ARCH_USES_PG_UNCACHED  #  arch supports the PG_uncached page flag
  pmdp_splitting_flush:  |  ok  |

Re: [PATCH] net: macb: Add better comment for RXUBR handling

2015-05-15 Thread Nicolas Ferre

Le 14/05/2015 00:01, Nathan Sullivan a écrit :
> Describe the handler for RXUBR better with a new comment.
> 
> Signed-off-by: Nathan Sullivan 
> Reviewed-by: Josh Cartwright 
> Reviewed-by: Ben Shelton 

Thanks Nathan: good that you've added this comment!

Acked-by: Nicolas Ferre 

> ---
>  drivers/net/ethernet/cadence/macb.c |6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index 61aa570..5f10dfc 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -1037,6 +1037,12 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
>* add that if/when we get our hands on a full-blown MII PHY.
>*/
>  
> + /* There is a hardware issue under heavy load where DMA can
> +  * stop, this causes endless "used buffer descriptor read"
> +  * interrupts but it can be cleared by re-enabling RX. See
> +  * the at91 manual, section 41.3.1 or the Zynq manual
> +  * section 16.7.4 for details.
> +  */
>   if (status & MACB_BIT(RXUBR)) {
>   ctrl = macb_readl(bp, NCR);
>   macb_writel(bp, NCR, ctrl & ~MACB_BIT(RE));
> 


-- 
Nicolas Ferre

Re: [PATCH v2] Documentation/arch: Add kernel feature descriptions and arch support status under Documentation/features/

2015-05-15 Thread Ingo Molnar

* Ingo Molnar  wrote:

> 
> * Stephen Rothwell  wrote:
> 
> > Hi Ingo,
> > 
> > Thanks for this.  The concept is certainly good.
> > 
> > On Thu, 14 May 2015 21:59:25 +0200 Ingo Molnar  wrote:
> > >
> > > The patch that adds a new architecture to all these files would give 
> > > us a good overview about how complete an initial port is.
> > 
> > If you want to test how hard that is (and what sort of patch it
> > produces), the h8300 architecture was added to linux-next recently.
> 
> I tried this out, this is the resulting feature support matrix for the 
> new h8300 architecture:
> 
> #
> # Kernel feature support matrix of architecture 'h8300':
> #
>arch-tick-broadcast: |   h8300: | TODO |
>BPF-JIT: |   h8300: | TODO |

Perhaps the more verbose table is more useful:

#
# Kernel feature support matrix of architecture 'h8300':
#
   arch-tick-broadcast:  | TODO |  ARCH_HAS_TICK_BROADCAST  #  arch provides tick_broadcast()
               BPF-JIT:  | TODO |  HAVE_BPF_JIT  #  arch supports BPF JIT optimizations
           clockevents:  |  ok  |  GENERIC_CLOCKEVENTS  #  arch supports generic clock events
         cmpxchg-local:  | TODO |  HAVE_CMPXCHG_LOCAL  #  arch supports the this_cpu_cmpxchg() API
      context-tracking:  | TODO |  HAVE_CONTEXT_TRACKING  #  arch supports context tracking for NO_HZ_FULL
         dma-api-debug:  | TODO |  HAVE_DMA_API_DEBUG  #  arch supports DMA debug facilities
        dma-contiguous:  | TODO |  HAVE_DMA_CONTIGUOUS  #  arch supports the DMA CMA (contiguous memory allocator)
         dma_map_attrs:  |  ok  |  HAVE_DMA_ATTRS  #  arch provides dma_*map*_attrs() APIs
              ELF-ASLR:  | TODO |  ARCH_HAS_ELF_RANDOMIZE  #  arch randomizes the stack, heap and binary images of ELF binaries
      gcov-profile-all:  | TODO |  ARCH_HAS_GCOV_PROFILE_ALL  #  arch supports whole-kernel GCOV code coverage profiling
   generic-idle-thread:  | TODO |  GENERIC_SMP_IDLE_THREAD  #  arch makes use of the generic SMP idle thread facility
             huge-vmap:  | TODO |  HAVE_ARCH_HUGE_VMAP  #  arch supports the ioremap_pud_enabled() and ioremap_pmd_enabled() VM APIs
          ioremap_prot:  | TODO |  HAVE_IOREMAP_PROT  #  arch has ioremap_prot()
         irq-time-acct:  | TODO |  HAVE_IRQ_TIME_ACCOUNTING  #  arch supports precise IRQ time accounting
           jump-labels:  | TODO |  HAVE_ARCH_JUMP_LABEL  #  arch supports live patched high efficiency branches
                 KASAN:  | TODO |  HAVE_ARCH_KASAN  #  arch supports the KASAN runtime memory checker
                  kgdb:  | TODO |  HAVE_ARCH_KGDB  #  arch supports the kGDB kernel debugger
               kprobes:  | TODO |  HAVE_KPROBES  #  arch supports live patched kernel probe
         kprobes-event:  | TODO |  HAVE_REGS_AND_STACK_ACCESS_API  #  arch supports kprobes with perf events
     kprobes-on-ftrace:  | TODO |  HAVE_KPROBES_ON_FTRACE  #  arch supports combined kprobes and ftrace live patching
            kretprobes:  | TODO |  HAVE_KRETPROBES  #  arch supports kernel function-return probes
               lockdep:  | TODO |  LOCKDEP_SUPPORT  #  arch supports the runtime locking correctness debug facility
    modern-timekeeping:  |  ok  |  !ARCH_USES_GETTIMEOFFSET  #  arch does not use arch_gettimeoffset() anymore
        numa-balancing:  |  ..  |  ARCH_SUPPORTS_NUMA_BALANCING && 64BIT && NUMA  #  arch supports NUMA balancing
         numa-memblock:  |  ..  |  HAVE_MEMBLOCK_NODE_MAP  #  arch supports NUMA aware memblocks
             optprobes:  | TODO |  HAVE_OPTPROBES  #  arch supports live patched optprobes
             perf-regs:  | TODO |  HAVE_PERF_REGS  #  arch supports perf events register access
        perf-stackdump:  | TODO |  HAVE_PERF_USER_STACK_DUMP  #  arch supports perf events stack dumps
           PG_uncached:  | TODO |  ARCH_USES_PG_UNCACHED  #  arch supports the PG_uncached page flag
  pmdp_splitting_flush:  | TODO |  #define __HAVE_ARCH_PMDP_SPLITTING_FLUSH  #  arch supports the pmdp_splitting_flush() VM API
           pte_special:  | TODO |  #define __HAVE_ARCH_PTE_SPECIAL  #  arch supports t

Re: [PATCH RFC] Documentation/infiniband: Add docs for rdma-helpers

2015-05-15 Thread Michael Wang



On 05/13/2015 05:59 PM, Jason Gunthorpe wrote:
> On Wed, May 13, 2015 at 03:24:32PM +0200, Michael Wang wrote:
>> This is the following patch for:
>>   https://lkml.org/lkml/2015/5/5/417
>> which try to document the settled rdma_cap_XX().
>>
>> Highlights:
>>   There could be many missing/mistakes/misunderstanding, please don't
>>   be hesitate to point out the issues, any suggestions to improve or
>>   complete the description are very welcomed ;-)
> 
> I'd rather see this in the kdoc for each function.

I had thought you meant kernel documentation like this. That was my
misunderstanding, but isn't this the usual way to document kernel code?

BTW, could you give more details on the kdoc?

Regards,
Michael Wang

> 
> Thanks,
> Jason
> 

[RFC PATCH v2 22/37] tools lib bpf: introduce bpf_load_program to bpf.c.

2015-05-15 Thread Wang Nan

bpf_load_program() can be used to load a BPF program into the kernel. To
make loading faster, it first tries to load without a log buffer, and
retries with the log buffer only if the first attempt fails.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/bpf.c | 34 ++
 tools/lib/bpf/bpf.h |  7 +++
 2 files changed, 41 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 3dbe30d..e4eed22 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -33,6 +33,11 @@
 #endif
 #endif
 
+static __u64 ptr_to_u64(void *ptr)
+{
+   return (__u64) (unsigned long) ptr;
+}
+
 static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
 {
return syscall(__NR_bpf, cmd, attr, size);
@@ -51,3 +56,32 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
 
return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
 }
+
+int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
+size_t insns_cnt, char *license,
+u32 kern_version, char *log_buf, size_t log_buf_sz)
+{
+   int fd;
+   union bpf_attr attr;
+
+   bzero(&attr, sizeof(attr));
+   attr.prog_type = type;
+   attr.insn_cnt = (__u32)insns_cnt;
+   attr.insns = ptr_to_u64((void *) insns);
+   attr.license = ptr_to_u64((void *) license);
+   attr.log_buf = ptr_to_u64(NULL);
+   attr.log_size = 0;
+   attr.log_level = 0;
+   attr.kern_version = kern_version;
+
+   fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+   if ((fd >= 0) || !log_buf || !log_buf_sz)
+   return fd;
+
+   /* Try again with log */
+   attr.log_buf = ptr_to_u64(log_buf);
+   attr.log_size = log_buf_sz;
+   attr.log_level = 1;
+   log_buf[0] = 0;
+   return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 1e9a53b..fb2a613 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -14,4 +14,11 @@
 int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
   int max_entries);
 
+/* Recommend log buffer size */
+#define BPF_LOG_BUF_SIZE 65536
+int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
+size_t insns_cnt, char *license,
+u32 kern_version, char *log_buf,
+size_t log_buf_sz);
+
 #endif
-- 
1.8.3.4


[RFC PATCH v2 01/37] tools perf: set vmlinux_path__nr_entries to 0 in vmlinux_path__exit.

2015-05-15 Thread Wang Nan

The original vmlinux_path__exit() doesn't restore
vmlinux_path__nr_entries to its initial state. After the while loop,
vmlinux_path__nr_entries is -1 instead of 0. As a result, if the
init/exit pair runs twice, the second vmlinux_path__init() writes
strdup("vmlinux") to vmlinux_path[-1], which corrupts memory.

This patch resets vmlinux_path__nr_entries to 0 after the while loop.

Signed-off-by: Wang Nan 
Acked-by: Namhyung Kim 
---
 tools/perf/util/symbol.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 201f6c4c..451777f 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1802,6 +1802,7 @@ static void vmlinux_path__exit(void)
 {
while (--vmlinux_path__nr_entries >= 0)
zfree(&vmlinux_path[vmlinux_path__nr_entries]);
+   vmlinux_path__nr_entries = 0;
 
zfree(&vmlinux_path);
 }
-- 
1.8.3.4
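
The off-by-one can be reproduced outside of perf with a small model. This
is a sketch under assumptions, not the perf code: paths, nr_entries,
paths_init() and paths_exit() are hypothetical stand-ins for
vmlinux_path, vmlinux_path__nr_entries and the init/exit pair, and the
'reset' flag only exists to compare behaviour with and without the fix:

```c
#include <stdlib.h>
#include <string.h>

static char **paths;	/* stand-in for vmlinux_path */
static int nr_entries;	/* stand-in for vmlinux_path__nr_entries */

/* Local string copy, to avoid depending on strdup(). */
static char *dup_string(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Appends one entry at index nr_entries, which must be 0 on entry. */
static int paths_init(void)
{
	paths = calloc(2, sizeof(*paths));
	if (!paths)
		return -1;
	paths[nr_entries] = dup_string("vmlinux");
	if (!paths[nr_entries]) {
		free(paths);
		paths = NULL;
		return -1;
	}
	nr_entries++;
	return 0;
}

/* Frees all entries; the pre-decrement loop leaves the counter at -1. */
static void paths_exit(int reset)
{
	while (--nr_entries >= 0) {
		free(paths[nr_entries]);
		paths[nr_entries] = NULL;
	}
	if (reset)
		nr_entries = 0;	/* the one-line fix */
	free(paths);
	paths = NULL;
}
```

Calling paths_exit(0) leaves nr_entries at -1, so a following
paths_init() would index paths[-1]; paths_exit(1) leaves it at 0.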


[RFC PATCH v2 26/37] tools perf: Add new 'perf bpf' command.

2015-05-15 Thread Wang Nan

Add a new 'perf bpf' command to provide eBPF program management
operations. This patch only creates the basic 'perf bpf' command;
subcommands will be introduced in following patches.

To reuse the existing usage_with_options() code while still printing the
subcommand list after the 'Usage ...' line, this patch adds a
usage_with_options_noexit() function. It does the same thing without
exiting, which lets the caller print more information before quitting.

With this patch, the 'perf bpf' command is not built if libelf is not
found.

Signed-off-by: Wang Nan 
---
 tools/perf/Build  |  1 +
 tools/perf/Documentation/perf-bpf.txt | 18 
 tools/perf/builtin-bpf.c  | 85 +++
 tools/perf/builtin.h  |  1 +
 tools/perf/command-list.txt   |  1 +
 tools/perf/perf.c |  3 ++
 tools/perf/util/parse-options.c   |  8 +++-
 tools/perf/util/parse-options.h   |  2 +
 8 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/Documentation/perf-bpf.txt
 create mode 100644 tools/perf/builtin-bpf.c

diff --git a/tools/perf/Build b/tools/perf/Build
index b77370e..c3c6cc3 100644
--- a/tools/perf/Build
+++ b/tools/perf/Build
@@ -19,6 +19,7 @@ perf-y += builtin-kvm.o
 perf-y += builtin-inject.o
 perf-y += builtin-mem.o
 perf-y += builtin-data.o
+perf-$(CONFIG_LIBELF) += builtin-bpf.o
 
 perf-$(CONFIG_AUDIT) += builtin-trace.o
 perf-$(CONFIG_LIBELF) += builtin-probe.o
diff --git a/tools/perf/Documentation/perf-bpf.txt b/tools/perf/Documentation/perf-bpf.txt
new file mode 100644
index 000..0e8b590
--- /dev/null
+++ b/tools/perf/Documentation/perf-bpf.txt
@@ -0,0 +1,18 @@
+perf-bpf(1)
+===========
+
+NAME
+----
+perf-bpf - Management of eBPF programs.
+
+SYNOPSIS
+--------
+[verse]
+'perf bpf' []  []",
+
+DESCRIPTION
+-----------
+Management of eBPF programs.
+
+OPTIONS
+-------
diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
new file mode 100644
index 000..a8858e2
--- /dev/null
+++ b/tools/perf/builtin-bpf.c
@@ -0,0 +1,85 @@
+/*
+ * builtin-bpf.c
+ *
+ * Copyright (C) 2015, Wang Nan 
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ *
+ * Builtin bpf command: management of bpf programs.
+ */
+#include "builtin.h"
+#include "perf.h"
+#include "debug.h"
+#include "parse-options.h"
+
+typedef int (*bpf_cmd_fn_t)(int argc, const char **argv, const char *prefix);
+
+struct bpf_cmd {
+   const char  *name;
+   const char  *summary;
+   bpf_cmd_fn_tfn;
+};
+
+static struct bpf_cmd bpf_cmds[];
+
+#define for_each_cmd(cmd) \
+   for (cmd = bpf_cmds; cmd && cmd->name; cmd++)
+
+struct option bpf_options[] = {
+   OPT_INCR('v', "verbose", &verbose, "be more verbose "
+  "(show debug information)"),
+   OPT_END()
+};
+
+static const char *bpf_usage[] = {
+   "perf bpf []  []",
+   NULL
+};
+
+static void print_usage(void)
+{
+   struct bpf_cmd *cmd;
+
+   usage_with_options_noexit(bpf_usage, bpf_options);
+   printf("\tAvailable commands:\n");
+   for_each_cmd(cmd)
+   printf("\t %s\t- %s\n", cmd->name, cmd->summary);
+   exit(129);
+}
+
+static const char * const bpf_subcommands[] = { NULL };
+
+static struct bpf_cmd bpf_cmds[] = {
+   { .name = NULL, },
+};
+
+int cmd_bpf(int argc, const char **argv,
+   const char *prefix __maybe_unused)
+{
+   struct bpf_cmd *cmd;
+   const char *cmdstr;
+
+   /* No command specified. */
+   if (argc < 2)
+   goto usage;
+
+   argc = parse_options_subcommand(argc, argv, bpf_options, bpf_subcommands, bpf_usage,
+                                   PARSE_OPT_STOP_AT_NON_OPTION);
+   if (argc < 1)
+   goto usage;
+
+   cmdstr = argv[0];
+
+   for_each_cmd(cmd) {
+   if (strcmp(cmd->name, cmdstr))
+   continue;
+
+   return cmd->fn(argc, argv, prefix);
+   }
+
+   pr_err("Unknown command %s\n", cmdstr);
+usage:
+   print_usage();
+   return -1;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 3688ad2..c2c4a0d 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -38,6 +38,7 @@ extern int cmd_trace(int argc, const char **argv, const char *prefix);
 extern int cmd_inject(int argc, const char **argv, const char *prefix);
 extern int cmd_mem(int argc, const char **argv, const char *prefix);
 extern int cmd_data(int argc, const char **argv, const char *prefix);
+extern int cmd_bpf(int argc, const char **argv, const char *prefix);
 
 extern int find_scripts(char **scripts_array, char **scripts_path_array);
 #endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 00fcaf8..1000463 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -5,6 +5,7 @@
 perf-annotate  mainporcelain common
 perf-arc

[RFC PATCH v2 17/37] tools lib bpf: collect relocation instructions for each program.

2015-05-15 Thread Wang Nan

This patch records the indices of the instructions that need to be
relocated. This information is saved in the 'reloc_desc' field of
'struct bpf_program'. In the loading phase (this patch itself takes
effect in the opening phase), the collected instructions will be
replaced by map loading instructions.

Since we are going to close the ELF file and clear all data at the end
of the 'opening' phase, ELF information will no longer be valid in the
'loading' phase. We therefore have to locate the instructions before the
maps are loaded, instead of modifying the instructions directly.

'struct bpf_map_def' is introduced in this patch to let us know how many
maps are defined in the object.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 128 +
 tools/lib/bpf/libbpf.h |  10 
 2 files changed, 138 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8fd29a9..ded96cb 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -87,6 +88,12 @@ struct bpf_program {
char *section_name;
struct bpf_insn *insns;
size_t insns_cnt;
+
+   struct {
+   int insn_idx;
+   int map_idx;
+   } *reloc_desc;
+   int nr_reloc;
 };
 
 struct bpf_object {
@@ -131,6 +138,12 @@ static void bpf_clear_program(struct bpf_program *prog)
free(prog->insns);
prog->insns = NULL;
}
+   if (prog->reloc_desc) {
+   free(prog->reloc_desc);
+   prog->reloc_desc = NULL;
+   }
+
+   prog->nr_reloc = 0;
prog->insns_cnt = 0;
prog->idx = -1;
 }
@@ -521,6 +534,119 @@ out:
return err;
 }
 
+static struct bpf_program *
+bpf_find_prog_by_idx(struct bpf_object *obj, int idx)
+{
+   struct bpf_program *prog;
+   size_t i;
+
+   for (i = 0; i < obj->nr_programs; i++) {
+   prog = &obj->programs[i];
+   if (prog->idx == idx)
+   return prog;
+   }
+   return NULL;
+}
+
+static int
+bpf_program_collect_reloc(struct bpf_object *obj,
+ GElf_Shdr *shdr,
+ Elf_Data *data,
+ struct bpf_program *prog)
+{
+   int i, nrels;
+   size_t nr_maps = obj->maps_buf_sz / sizeof(struct bpf_map_def);
+
+   pr_debug("collecting relocating info for: '%s'\n",
+   prog->section_name);
+   nrels = shdr->sh_size / shdr->sh_entsize;
+   
+   prog->reloc_desc = malloc(sizeof(*prog->reloc_desc) * nrels);
+   if (!prog->reloc_desc) {
+   pr_warning("failed to alloc memory in relocation\n");
+   return -ENOMEM;
+   }
+   prog->nr_reloc = nrels;
+
+   for (i = 0; i < nrels; i++) {
+   GElf_Sym sym;
+   GElf_Rel rel;
+   unsigned int insn_idx;
+   struct bpf_insn *insns = prog->insns;
+   size_t map_idx;
+
+   if (!gelf_getrel(data, i, &rel)) {
+   pr_warning("relocation: failed to get "
+  "%d reloc\n", i);
+   return -EINVAL;
+   }
+
+   insn_idx = rel.r_offset / sizeof(struct bpf_insn);
+   pr_debug("relocation: insn_idx=%u\n", insn_idx);
+
+   if (!gelf_getsym(obj->elf.symbols,
+GELF_R_SYM(rel.r_info),
+&sym)) {
+   pr_warning("relocation: symbol %"PRIx64
+  " not found\n",
+  GELF_R_SYM(rel.r_info));
+   return -EINVAL;
+   }
+
+   if (insns[insn_idx].code != (BPF_LD | BPF_IMM | BPF_DW)) {
+   pr_warning("bpf: relocation: invalid relo for "
+  "insns[%d].code 0x%x\n",
+  insn_idx, insns[insn_idx].code);
+   return -EINVAL;
+   }
+
+   map_idx = sym.st_value / sizeof(struct bpf_map_def);
+   if (map_idx >= nr_maps) {
+   pr_warning("bpf relocation: map_idx %d larger than %d\n",
+  (int)map_idx, (int)nr_maps - 1);
+   return -EINVAL;
+   }
+
+   prog->reloc_desc[i].insn_idx = insn_idx;
+   prog->reloc_desc[i].map_idx = map_idx;
+   }
+   return 0;
+}
+
+static int bpf_obj_collect_reloc(struct bpf_object *obj)
+{
+   int i, err;
+
+   if (!obj_elf_valid(obj)) {
+   pr_warning("Internal error: elf object is closed\n");
+   return -EINVAL;
+   }
+
+   for (i = 0; i < obj->elf.nr_reloc; i++) {
+   GElf_Shdr *shdr = &obj->elf.reloc[i].shdr;
+   Elf_Data *data = obj->elf.reloc[i].data;
+

[RFC PATCH v2 19/37] tools lib bpf: add bpf.c/h for common bpf operations.

2015-05-15 Thread Wang Nan

This patch introduces bpf.c and bpf.h, which hold common functions
that issue the bpf syscall. The goal of these two files is to hide the
syscall completely from the user. Note that bpf.c and bpf.h only deal
with the kernel interface. Things like the structure of the 'maps'
section in the ELF object are outside the scope of bpf.[ch].

We first introduce bpf_create_map().

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/Build |  2 +-
 tools/lib/bpf/bpf.c | 53 +
 tools/lib/bpf/bpf.h | 17 +
 3 files changed, 71 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/bpf/bpf.c
 create mode 100644 tools/lib/bpf/bpf.h

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index a316484..d874975 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o
+libbpf-y := libbpf.o bpf.o
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
new file mode 100644
index 000..3dbe30d
--- /dev/null
+++ b/tools/lib/bpf/bpf.c
@@ -0,0 +1,53 @@
+/*
+ * common eBPF ELF operations.
+ *
+ * Copyright (C) 2015, Wang Nan 
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf.h"
+
+/* When building perf, unistd.h is overridden. Define __NR_bpf ourselves. */
+#if defined(__i386__)
+#ifndef __NR_bpf
+# define __NR_bpf 357
+#endif
+#endif
+
+#if defined(__x86_64__)
+#ifndef __NR_bpf
+# define __NR_bpf 321
+#endif
+#endif
+
+#if defined(__aarch64__)
+#ifndef __NR_bpf
+# define __NR_bpf 280
+#endif
+#endif
+
+static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
+{
+   return syscall(__NR_bpf, cmd, attr, size);
+}
+
+int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
+  int max_entries)
+{
+   union bpf_attr attr;
+   memset(&attr, '\0', sizeof(attr));
+   
+   attr.map_type = map_type;
+   attr.key_size = key_size;
+   attr.value_size = value_size;
+   attr.max_entries = max_entries;
+
+   return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
new file mode 100644
index 000..1e9a53b
--- /dev/null
+++ b/tools/lib/bpf/bpf.h
@@ -0,0 +1,17 @@
+/*
+ * common eBPF ELF operations.
+ *
+ * Copyright (C) 2015, Wang Nan 
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+#ifndef __LIBBPF_BPF_H
+#define __LIBBPF_BPF_H
+
+#include 
+
+int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
+  int max_entries);
+
+#endif
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC PATCH v2 06/37] tools lib bpf: allow set printing function.

2015-05-15 Thread Wang Nan

With libbpf_set_print(), users of libbpf can register their own
debug, info and warning printing functions. Libbpf will use those
functions to print messages. If none are provided, the default info and
warning printing functions are fprintf(stderr, ...); the default debug
printing function is NULL.
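
As a rough, self-contained sketch of this callback scheme (not the libbpf
code itself): all 'demo_' names are hypothetical, only two of the three
levels are modelled, and '##__VA_ARGS__' is a GNU extension accepted by
GCC and Clang.

```c
#include <stdarg.h>
#include <stdio.h>

typedef int (*demo_print_fn)(const char *format, ...);

/* Default printer: forwards everything to stderr. */
static int demo_base_pr(const char *format, ...)
{
	va_list args;
	int ret;

	va_start(args, format);
	ret = vfprintf(stderr, format, args);
	va_end(args);
	return ret;
}

static demo_print_fn demo_pr_warning = demo_base_pr;
static demo_print_fn demo_pr_debug;	/* NULL: debug output is off by default */

/* Call the printer only if one is registered. */
#define demo_pr(func, fmt, ...)					\
do {								\
	if (func)						\
		(func)("demo: " fmt, ##__VA_ARGS__);		\
} while (0)

void demo_set_print(demo_print_fn warn, demo_print_fn debug)
{
	demo_pr_warning = warn;
	demo_pr_debug = debug;
}

/* Emits one warning and one debug message through the current callbacks. */
void demo_emit(int value)
{
	demo_pr(demo_pr_warning, "warning %d\n", value);
	demo_pr(demo_pr_debug, "debug %d\n", value);
}

/* Example user callback: counts calls instead of printing. */
int demo_calls;

int demo_count_pr(const char *format, ...)
{
	(void)format;
	return ++demo_calls;
}
```

Passing NULL for a level silences it, which is how debug output stays off
until a caller opts in.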

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 43 +++
 tools/lib/bpf/libbpf.h |  4 
 2 files changed, 47 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index bebe99a..d7a7869 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8,8 +8,51 @@
  */
 
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
 
 #include "libbpf.h"
+
+#define __printf(a, b) __attribute__((format(printf, a, b)))
+
+__printf(1, 2)
+static int __base_pr(const char *format, ...)
+{
+   va_list args;
+   int err;
+
+   va_start(args, format);
+   err = vfprintf(stderr, format, args);
+   va_end(args);
+   return err;
+}
+
+static __printf(1, 2) int (*__pr_warning)(const char *format, ...) =
+   __base_pr;
+static __printf(1, 2) int (*__pr_info)(const char *format, ...) =
+   __base_pr;
+static __printf(1, 2) int (*__pr_debug)(const char *format, ...) =
+   NULL;
+
+#define __pr(func, fmt, ...)   \
+do {   \
+   if ((func)) \
+   (func)("libbpf: " fmt, ##__VA_ARGS__); \
+} while(0)
+
+#define pr_warning(fmt, ...)   __pr(__pr_warning, fmt, ##__VA_ARGS__)
+#define pr_info(fmt, ...)  __pr(__pr_info, fmt, ##__VA_ARGS__)
+#define pr_debug(fmt, ...) __pr(__pr_debug, fmt, ##__VA_ARGS__)
+
+void libbpf_set_print(int (*warn)(const char *format, ...),
+ int (*info)(const char *format, ...),
+ int (*debug)(const char *format, ...))
+{
+   __pr_warning = warn;
+   __pr_info = info;
+   __pr_debug = debug;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 10845a0..eb306c0 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -9,4 +9,8 @@
 #ifndef __BPF_LIBBPF_H
 #define __BPF_LIBBPF_H
 
+void libbpf_set_print(int (*warn)(const char *format, ...),
+ int (*info)(const char *format, ...),
+ int (*debug)(const char *format, ...));
+
 #endif
-- 
1.8.3.4


[RFC PATCH v2 05/37] tools lib bpf: introduce 'bpf' library to tools.

2015-05-15 Thread Wang Nan

This is the first patch of libbpf. The goal of libbpf is to create a
standard way of accessing eBPF object files. This patch creates the
Makefile and Build files for it, so that 'make' builds libbpf.a and
libbpf.so, and 'make install' puts them into the proper directories.
Most of the Makefile is borrowed from traceevent. Before building,
the Makefile checks for the existence of libelf and refuses to build if
it is not found. Instead of throwing an error immediately when libelf is
missing, the error is raised in a phony target "elfdep". This design
ensures that 'make clean' still works even if libelf is not found.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/.gitignore |   2 +
 tools/lib/bpf/Build  |   1 +
 tools/lib/bpf/Makefile   | 191 +++
 tools/lib/bpf/libbpf.c   |  15 
 tools/lib/bpf/libbpf.h   |  12 +++
 5 files changed, 221 insertions(+)
 create mode 100644 tools/lib/bpf/.gitignore
 create mode 100644 tools/lib/bpf/Build
 create mode 100644 tools/lib/bpf/Makefile
 create mode 100644 tools/lib/bpf/libbpf.c
 create mode 100644 tools/lib/bpf/libbpf.h

diff --git a/tools/lib/bpf/.gitignore b/tools/lib/bpf/.gitignore
new file mode 100644
index 000..812aeed
--- /dev/null
+++ b/tools/lib/bpf/.gitignore
@@ -0,0 +1,2 @@
+libbpf_version.h
+FEATURE-DUMP
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
new file mode 100644
index 000..a316484
--- /dev/null
+++ b/tools/lib/bpf/Build
@@ -0,0 +1 @@
+libbpf-y := libbpf.o
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
new file mode 100644
index 000..bd5769c
--- /dev/null
+++ b/tools/lib/bpf/Makefile
@@ -0,0 +1,191 @@
+BPF_VERSION = 0
+BPF_PATCHLEVEL = 0
+BPF_EXTRAVERSION = 1
+
+# Version of eBPF elf file
+FILE_VERSION = 1
+
+MAKEFLAGS += --no-print-directory
+
+
+# Makefiles suck: This macro sets a default value of $(2) for the
+# variable named by $(1), unless the variable has been set by
+# environment or command line. This is necessary for CC and AR
+# because make sets default values, so the simpler ?= approach
+# won't work as expected.
+define allow-override
+  $(if $(or $(findstring environment,$(origin $(1))),\
+$(findstring command line,$(origin $(1)))),,\
+$(eval $(1) = $(2)))
+endef
+
+# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
+$(call allow-override,CC,$(CROSS_COMPILE)gcc)
+$(call allow-override,AR,$(CROSS_COMPILE)ar)
+
+EXT = -std=gnu99
+INSTALL = install
+
+# Use DESTDIR for installing into a different root directory.
+# This is useful for building a package. The program will be
+# installed in this directory as if it was the root directory.
+# Then the build tool can move it later.
+DESTDIR ?=
+DESTDIR_SQ = '$(subst ','\'',$(DESTDIR))'
+
+LP64 := $(shell echo __LP64__ | ${CC} ${CFLAGS} -E -x c - | tail -n 1)
+ifeq ($(LP64), 1)
+  libdir_relative = lib64
+else
+  libdir_relative = lib
+endif
+
+prefix ?= /usr/local
+libdir = $(prefix)/$(libdir_relative)
+man_dir = $(prefix)/share/man
+man_dir_SQ = '$(subst ','\'',$(man_dir))'
+
+export man_dir man_dir_SQ INSTALL
+export DESTDIR DESTDIR_SQ
+
+include ../../scripts/Makefile.include
+
+# copy a bit from Linux kbuild
+
+ifeq ("$(origin V)", "command line")
+  VERBOSE = $(V)
+endif
+ifndef VERBOSE
+  VERBOSE = 0
+endif
+
+ifeq ($(srctree),)
+srctree := $(patsubst %/,%,$(dir $(shell pwd)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+#$(info Determined 'srctree' to be $(srctree))
+endif
+
+FEATURE_DISPLAY = libelf libelf-getphdrnum libelf-mmap
+FEATURE_TESTS = libelf
+include $(srctree)/tools/build/Makefile.feature
+
+ifeq ($(feature-libelf-mmap), 1)
+  override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
+endif
+
+ifeq ($(feature-libelf-getphdrnum), 1)
+  override CFLAGS += -DHAVE_ELF_GETPHDRNUM_SUPPORT
+endif
+
+export prefix libdir src obj
+
+# Shell quotes
+libdir_SQ = $(subst ','\'',$(libdir))
+libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
+plugin_dir_SQ = $(subst ','\'',$(plugin_dir))
+
+LIB_FILE = libbpf.a libbpf.so
+
+VERSION= $(BPF_VERSION)
+PATCHLEVEL = $(BPF_PATCHLEVEL)
+EXTRAVERSION   = $(BPF_EXTRAVERSION)
+
+OBJ= $@
+N  =
+
+LIBBPF_VERSION = $(BPF_VERSION).$(BPF_PATCHLEVEL).$(BPF_EXTRAVERSION)
+
+INCLUDES = -I. -I $(srctree)/tools/include -I$(srctree)/arch/$(ARCH)/include/uapi -I$(srctree)/include/uapi
+
+# Set compile option CFLAGS
+ifdef EXTRA_CFLAGS
+  CFLAGS := $(EXTRA_CFLAGS)
+else
+  CFLAGS := -g -Wall
+endif
+
+# Append required CFLAGS
+override CFLAGS += -fPIC
+override CFLAGS += $(INCLUDES)
+override CFLAGS += -D_GNU_SOURCE
+
+ifeq ($(VERBOSE),1)
+  Q =
+else
+  Q = @
+endif
+
+# Disable command line variables (CFLAGS) override from top
+# level Makefile (perf), otherwise build Makefile will get
+# the same command line setup.
+MAKEOVERRIDES=
+
+export srctree OUTPUT CC LD CFLAGS V
+build := -f $(srctree)/tools/build/Makefile.build dir=. obj
+
+BPF_IN:= $(OUTPUT)libbpf-in.o
+LIB_FILE := $(addprefix $(OUTPUT),

[RFC PATCH v2 12/37] tools lib bpf: collect map definitions.

2015-05-15 Thread Wang Nan

If maps are used by eBPF programs, the corresponding object file(s) should
contain a section named 'maps', which contains the map definitions. This
patch copies the data of the whole section. Parsing of the map data is
deferred until just before the maps are loaded.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b26f1ee..6ee5f3c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -82,6 +82,8 @@ struct bpf_object {
bool needs_swap;
char license[64];
u32 kern_version;
+   void *maps_buf;
+   size_t maps_buf_sz;
 
/*
 * Information when doing elf related work. Only valid if fd
@@ -244,6 +246,27 @@ static int bpf_obj_kver_init(struct bpf_object *obj,
return 0;
 }
 
+static int bpf_obj_maps_init(struct bpf_object *obj, void *data,
+size_t size)
+{
+   if (size == 0) {
+   pr_debug("%s doesn't need map definition\n",
+obj->path);
+   return 0;
+   }
+
+   obj->maps_buf = malloc(size);
+   if (!obj->maps_buf) {
+   pr_warning("malloc maps failed: %s\n", obj->path);
+   return -ENOMEM;
+   }
+
+   obj->maps_buf_sz = size;
+   memcpy(obj->maps_buf, data, size);
+   pr_debug("maps in %s: %ld bytes\n", obj->path, (long)size);
+   return 0;
+}
+
 static int bpf_obj_elf_collect(struct bpf_object *obj)
 {
Elf *elf = obj->elf.elf;
@@ -298,6 +321,9 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
else if (strcmp(name, "version") == 0)
err = bpf_obj_kver_init(obj, data->d_buf,
data->d_size);
+   else if (strcmp(name, "maps") == 0)
+   err = bpf_obj_maps_init(obj, data->d_buf,
+   data->d_size);
if (err)
goto out;
}
@@ -359,5 +385,7 @@ void bpf_close_object(struct bpf_object *obj)
 
if (obj->path)
free(obj->path);
+   if (obj->maps_buf)
+   free(obj->maps_buf);
free(obj);
 }
-- 
1.8.3.4


[RFC PATCH v2 13/37] tools lib bpf: collect config section in object.

2015-05-15 Thread Wang Nan

A 'config' section is allowed so that an eBPF object file can pass
information to the user. libbpf itself doesn't use the config string.

To make further processing easier, this patch converts every '\0' in the
config strings into '\n'.
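
The conversion can be sketched on its own as follows. This is a minimal
illustration under the assumption that the section is a run of
NUL-terminated strings; 'demo_config_to_string' is a hypothetical name,
not a libbpf function.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Copy a section buffer of 'size' bytes and turn it into one
 * NUL-terminated string by replacing every embedded '\0' with '\n'.
 * Returns a malloc'ed string the caller must free, or NULL on failure.
 */
char *demo_config_to_string(const void *data, size_t size)
{
	char *str = malloc(size + 1);
	size_t i;

	if (!str)
		return NULL;

	memcpy(str, data, size);
	for (i = 0; i < size; i++) {
		if (str[i] == '\0')
			str[i] = '\n';
	}
	str[size] = '\0';	/* the extra byte holds the final terminator */
	return str;
}
```

After the conversion each original string occupies one line, so the result
can be split with ordinary line-oriented parsing.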

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 48 
 1 file changed, 48 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6ee5f3c..43b22a5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -84,6 +84,7 @@ struct bpf_object {
u32 kern_version;
void *maps_buf;
size_t maps_buf_sz;
+   char *config_str;
 
/*
 * Information when doing elf related work. Only valid if fd
@@ -267,6 +268,48 @@ static int bpf_obj_maps_init(struct bpf_object *obj, void *data,
return 0;
 }
 
+static int bpf_obj_config_init(struct bpf_object *obj, void *data,
+  size_t size)
+{
+   char *config_str;
+   char *p, *pend;
+
+   if (size == 0) {
+   pr_debug("bpf: config section in %s empty\n",
+obj->path);
+   return 0;
+   }
+   if (obj->config_str) {
+   pr_warning("bpf: multiple config section in %s\n",
+  obj->path);
+   return -EEXIST;
+   }
+
+   config_str = malloc(size + 1);
+   if (!config_str) {
+   pr_warning("bpf: malloc config string failed\n");
+   return -ENOMEM;
+   }
+
+   memcpy(config_str, data, size);
+
+   /*
+* It is possible that config section contains multiple strings.
+* Make it a big string by converting all '\0' to '\n' and
+* append final '\0'.
+*/
+   pend = config_str + size;
+   for (p = config_str; p < pend; p++)
+   *p == '\0' ? *p = '\n' : 0 ;
+   *pend = '\0';
+
+   obj->config_str = config_str;
+   pr_debug("--- CONFIG STRING IN %s: ---\n%s\n",
+obj->path, config_str);
+   pr_debug("\n");
+   return 0;
+}
+
 static int bpf_obj_elf_collect(struct bpf_object *obj)
 {
Elf *elf = obj->elf.elf;
@@ -324,6 +367,9 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
else if (strcmp(name, "maps") == 0)
err = bpf_obj_maps_init(obj, data->d_buf,
data->d_size);
+   else if (strcmp(name, "config") == 0)
+   err = bpf_obj_config_init(obj, data->d_buf,
+ data->d_size);
if (err)
goto out;
}
@@ -387,5 +433,7 @@ void bpf_close_object(struct bpf_object *obj)
free(obj->path);
if (obj->maps_buf)
free(obj->maps_buf);
+   if (obj->config_str)
+   free(obj->config_str);
free(obj);
 }
-- 
1.8.3.4


[RFC PATCH v2 34/37] tools perf: add a bpf_wrapper global flag.

2015-05-15 Thread Wang Nan

The newly introduced flag is an indicator for 'perf bpf'. When commands
like 'cmd_record' are started through 'perf bpf', they should consider the
binding of bpf programs.

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-bpf.c | 3 +++
 tools/perf/perf.c| 7 +++
 tools/perf/perf.h| 1 +
 3 files changed, 11 insertions(+)

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index 8155f39..94978c7 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -201,6 +201,9 @@ int cmd_bpf(int argc, const char **argv,
 
cmdstr = argv[0];
 
+   /* This flag is for commands started by 'perf bpf'. */
+   bpf_wrapper = true;
+
for_each_cmd(cmd) {
if (strcmp(cmd->name, cmdstr))
continue;
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index eff1a55..2c41c43 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -28,6 +28,13 @@ int use_browser = -1;
 static int use_pager = -1;
 const char *input_name;
 
+/*
+ * Only for cmd_bpf, set this wrapper to true. This flag is to tell
+ * commands like 'record' that they are running inside a 'perf bpf'
+ * command, and let them consider binding of bpf programs.
+ */
+bool bpf_wrapper = false;
+
 struct cmd_struct {
const char *cmd;
int (*fn)(int, const char **, const char *);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index e14bb63..f3d233a 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -69,4 +69,5 @@ struct record_opts {
 struct option;
 extern const char * const *record_usage;
 extern struct option *record_options;
+extern bool bpf_wrapper;
 #endif
-- 
1.8.3.4


[RFC PATCH v2 28/37] tools perf: add 'perf bpf record' subcommand.

2015-05-15 Thread Wang Nan

'perf bpf record' will be implemented to load eBPF object file then
start recording on events defined in it. This patch only adds a
'--object' option for selecting object file. Other arguments are
directly passed to cmd_record.

Example:

 # perf bpf record --object obj1.o --object obj2.o -- -a command-to-record
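
The way the remaining arguments are handed to the wrapped command can be
sketched roughly as below: a new NULL-terminated argv is built with a
synthetic first element. The 'demo_' helpers are hypothetical and are not
part of perf.

```c
#include <stdlib.h>
#include <string.h>

/* Portable string duplicate, used here instead of the POSIX strdup(). */
static char *demo_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Frees an argv built by demo_wrap_argv(). */
void demo_free_argv(char **argv)
{
	int i;

	if (!argv)
		return;
	for (i = 0; argv[i]; i++)
		free(argv[i]);
	free(argv);
}

/*
 * Returns a NULL-terminated copy of argv with arg0 prepended,
 * or NULL on allocation failure.
 */
char **demo_wrap_argv(const char *arg0, int argc, const char **argv)
{
	char **out = calloc((size_t)argc + 2, sizeof(*out));
	int i;

	if (!out)
		return NULL;

	out[0] = demo_strdup(arg0);
	if (!out[0])
		goto fail;

	for (i = 0; i < argc; i++) {
		out[i + 1] = demo_strdup(argv[i]);
		if (!out[i + 1])
			goto fail;
	}
	return out;
fail:
	demo_free_argv(out);
	return NULL;
}
```

Because calloc() zero-fills the array, the cleanup path can stop at the
first NULL slot no matter where the failure happened.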

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-bpf.c | 111 ++-
 1 file changed, 109 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index a8858e2..b5c613f 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -48,14 +48,121 @@ static void print_usage(void)
exit(129);
 }
 
-static const char * const bpf_subcommands[] = { NULL };
+static const char * const bpf_record_usage[] = {
+   "perf bpf record [<options>] -- [options passed to record]",
+   NULL
+};
+
+struct param {
+   struct strlist *object_file_names;
+} param;
+
+static int add_bpf_object_file(const struct option *opt,
+  const char *str,
+  int unset __maybe_unused)
+{
+   struct strlist **filenames = (struct strlist **)opt->value;
+
+   if (!*filenames)
+   *filenames = strlist__new(true, NULL);
+
+   if (!*filenames) {
+   pr_err("alloc strlist failed\n");
+   return -ENOMEM;
+   }
+
+   strlist__add(*filenames, str);
+   return 0;
+}
+
+static int start_bpf_record(int argc, const char **argv)
+{
+   int i, new_argc, err, pos = 0;
+   const char **new_argv;
+
+   new_argc = argc + 1;
+   new_argv = malloc((new_argc + 1) * sizeof(const char *));
+   if (!new_argv) {
+   pr_err("malloc failed\n");
+   return -ENOMEM;
+   }
+   bzero(new_argv, sizeof(const char *) * (new_argc + 1));
+
+   new_argv[pos++] = strdup("bpf record --");
+
+   for (i = 0; i < argc; i++) {
+   new_argv[pos++] = strdup(argv[i]);
+   if (!new_argv[pos - 1]) {
+   pr_err("strdup failed\n");
+   err = -ENOMEM;
+   goto errout;
+   }
+   }
+
+   return cmd_record(new_argc, new_argv, NULL);
+
+errout:
+   if (new_argv) {
+   for (i = 0; i < pos; i++)
+   free((void *)new_argv[i]);
+   free(new_argv);
+   }
+   return err;
+}
+
+static int cmd_bpf_record(int argc, const char **argv,
+   const char *prefix __maybe_unused)
+{
+   /*
+* Options in perf-record may be mirrored here. This command
+* should add '-e' options to cmd_record.
+*/
+   static const struct option options[] = {
+   OPT_CALLBACK(0, "object", &param.object_file_names,
+"file", "eBPF object file name",
+add_bpf_object_file),
+   OPT_END()
+   };
+   struct str_node *str_node;
+
+   argc = parse_options(argc, argv, options,
+bpf_record_usage, PARSE_OPT_KEEP_DASHDASH);
+
+   if (!param.object_file_names) {
+   pr_err("At least one '--object' option is needed to "
+  "select an eBPF object file\n");
+   goto usage;
+   }
+
+   if (argc < 1 || strcmp(argv[0], "--")) {
+   pr_err("Should use '--' to pass options to perf "
+  "record\n");
+   goto usage;
+   }
+
+   /* skip "--" */
+   argc--;
+   argv++;
+
+   strlist__for_each(str_node, param.object_file_names)
+   pr_debug("loading %s\n", str_node->s);
+
+   return start_bpf_record(argc, argv);
+usage:
+   usage_with_options(bpf_record_usage, options);
+   return -1;
+}
+
+
+static const char * const bpf_subcommands[] = { "record", NULL };
 
 static struct bpf_cmd bpf_cmds[] = {
+   { "record", "load eBPF program into kernel then start record on events in it", cmd_bpf_record },
{ .name = NULL, },
 };
 
 int cmd_bpf(int argc, const char **argv,
-   const char *prefix __maybe_unused)
+   const char *prefix)
 {
struct bpf_cmd *cmd;
const char *cmdstr;
-- 
1.8.3.4


[RFC PATCH v2 14/37] tools lib bpf: collect symbol table in object files.

2015-05-15 Thread Wang Nan

This patch collects the symbol table section. This section is useful
when linking ELF maps.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 43b22a5..2068f0b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -94,6 +94,7 @@ struct bpf_object {
int fd;
Elf *elf;
GElf_Ehdr ehdr;
+   Elf_Data *symbols;
} elf;
 };
 #define obj_elf_valid(o)   ((o)->elf.fd >= 0)
@@ -142,6 +143,8 @@ static void bpf_obj_clear_elf(struct bpf_object *obj)
elf_end(obj->elf.elf);
obj->elf.elf = NULL;
}
+   obj->elf.symbols = NULL;
+
if (obj->elf.fd >= 0) {
close(obj->elf.fd);
obj->elf.fd = -1;
@@ -370,6 +373,14 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
else if (strcmp(name, "config") == 0)
err = bpf_obj_config_init(obj, data->d_buf,
  data->d_size);
+   else if (sh.sh_type == SHT_SYMTAB) {
+   if (obj->elf.symbols) {
+   pr_warning("bpf: multiple SYMTAB in %s\n",
+  obj->path);
+   err = -EEXIST;
+   } else
+   obj->elf.symbols = data;
+   }
if (err)
goto out;
}
-- 
1.8.3.4


[RFC PATCH v2 37/37] tools perf bpf: passes generated arguments to cmd_record.

2015-05-15 Thread Wang Nan

This patch uses the previously introduced bpf_load_gen_argv() to generate
arguments for cmd_record.

We should try to avoid triggering the atexit() unprobe handler because the
state of the process is unusual at that point. For example, stdout and
stderr are closed. Moreover, we cannot ensure that the atexit() handler
will be called in every situation.

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-bpf.c | 36 ++--
 1 file changed, 10 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index 94978c7..c818c9f 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -78,36 +78,20 @@ static int add_bpf_object_file(const struct option *opt,
 
 static int start_bpf_record(int argc, const char **argv)
 {
-   int i, new_argc, err, pos = 0;
+   int new_argc, err;
const char **new_argv;
 
-   new_argc = argc + 1;
-   new_argv = malloc((new_argc + 1) * sizeof(const char *));
-   if (!new_argv) {
-   pr_err("malloc failed\n");
-   return -ENOMEM;
-   }
-   bzero(new_argv, sizeof(const char *) * (new_argc + 1));
-
-   new_argv[pos++] = strdup("bpf record --");
-
-   for (i = 0; i < argc; i++) {
-   new_argv[pos++] = strdup(argv[i]);
-   if (!new_argv[pos - 1]) {
-   pr_err("strdup failed\n");
-   err = -ENOMEM;
-   goto errout;
-   }
+   if ((err = bpf_load_gen_argv(&new_argc, &new_argv,
+argc, argv,
+"bpf record --")))
+   {
+   pr_err("Failed to generate arguments for record\n");
+   return err;
}
 
-   return cmd_record(new_argc, new_argv, NULL);
-
-errout:
-   if (new_argv) {
-   for (i = 0; i < pos; i++)
-   free((void *)new_argv[i]);
-   free(new_argv);
-   }
+   /* new_argv won't be freed because cmd_record may change it. */
+   err = cmd_record(new_argc, new_argv, NULL);
+   bpf_unprobe();
return err;
 }
 
-- 
1.8.3.4


[RFC PATCH v2 16/37] tools lib bpf: collect relocation sections from object file.

2015-05-15 Thread Wang Nan

This patch collects relocation sections into 'struct bpf_object'.
Such sections are used for connecting maps to bpf programs.
The 'reloc' field in 'struct bpf_object' is introduced for storing
this information.

This patch simply stores the data in the 'reloc' field. A following
patch will parse it to find the exact instructions that need to be
relocated.

Note that the collected data will be invalid after ELF object file
is closed.
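
Growing the array one relocation section at a time can be sketched as
below. The 'demo_' types are hypothetical simplifications (a section index
and an opaque data pointer in place of the GElf_Shdr and Elf_Data pair).

```c
#include <stdlib.h>

struct demo_reloc_sec {
	int shndx;		/* stand-in for the section header */
	const void *data;	/* stand-in for the Elf_Data pointer */
};

struct demo_obj {
	struct demo_reloc_sec *reloc;
	int nr_reloc;
};

/*
 * Append one entry. The result of realloc() goes into a temporary so
 * that the existing array stays valid on failure. Returns 0 or -1.
 */
int demo_add_reloc(struct demo_obj *obj, int shndx, const void *data)
{
	struct demo_reloc_sec *tmp;

	tmp = realloc(obj->reloc, sizeof(*tmp) * (obj->nr_reloc + 1));
	if (!tmp)
		return -1;

	tmp[obj->nr_reloc].shndx = shndx;
	tmp[obj->nr_reloc].data = data;
	obj->reloc = tmp;
	obj->nr_reloc++;
	return 0;
}

/* Drop the array, as would be done when the ELF file is closed. */
void demo_clear_reloc(struct demo_obj *obj)
{
	free(obj->reloc);
	obj->reloc = NULL;
	obj->nr_reloc = 0;
}
```

Clearing the array together with the ELF state keeps the stored data
pointers from outliving the buffers they point into.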

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 18b221f..8fd29a9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -110,6 +110,11 @@ struct bpf_object {
Elf *elf;
GElf_Ehdr ehdr;
Elf_Data *symbols;
+   struct {
+   GElf_Shdr shdr;
+   Elf_Data *data;
+   } *reloc;
+   int nr_reloc;
} elf;
 };
 #define obj_elf_valid(o)   ((o)->elf.fd >= 0)
@@ -233,6 +238,12 @@ static void bpf_obj_clear_elf(struct bpf_object *obj)
}
obj->elf.symbols = NULL;
 
+   if (obj->elf.reloc) {
+   free(obj->elf.reloc);
+   obj->elf.reloc = NULL;
+   obj->elf.nr_reloc = 0;
+   }
+
if (obj->elf.fd >= 0) {
close(obj->elf.fd);
obj->elf.fd = -1;
@@ -484,6 +495,24 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
} else
pr_debug("found program %s\n",
 prog->section_name);
+   } else if (sh.sh_type == SHT_REL) {
+   void *reloc = obj->elf.reloc;
+   int nr_reloc = obj->elf.nr_reloc;
+
+   reloc = realloc(reloc,
+   sizeof(*obj->elf.reloc) * (++nr_reloc));
+   if (!reloc) {
+   pr_warning("realloc failed\n");
+   err = -ENOMEM;
+   } else {
+   int n = nr_reloc - 1;
+
+   obj->elf.reloc = reloc;
+   obj->elf.nr_reloc = nr_reloc;
+
+   obj->elf.reloc[n].shdr = sh;
+   obj->elf.reloc[n].data = data;
+   }
}
if (err)
goto out;
-- 
1.8.3.4


[RFC PATCH v2 36/37] tools perf: generate event argv.

2015-05-15 Thread Wang Nan

bpf_load_gen_argv() generates argc and argv which will be passed to
other commands like cmd_record(). The generated arguments use the
previously introduced '|' syntax to pass the file descriptors of bpf
programs.
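
Building one such event argument can be sketched as below, using the usual
two-pass snprintf() idiom (measure first, then format). 'demo_event_str'
is a hypothetical helper, and the group and event names in the usage note
are made up.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Builds "<group>:<event>|bpf_fd=<fd>|" in a freshly allocated buffer.
 * Returns NULL on failure; the caller frees the result.
 */
char *demo_event_str(const char *group, const char *event, int fd)
{
	/* First pass: a NULL buffer of size 0 only reports the length. */
	int len = snprintf(NULL, 0, "%s:%s|bpf_fd=%d|", group, event, fd);
	char *str;

	if (len < 0)
		return NULL;

	str = malloc((size_t)len + 1);
	if (!str)
		return NULL;

	/* Second pass: format into the exactly sized buffer. */
	snprintf(str, (size_t)len + 1, "%s:%s|bpf_fd=%d|", group, event, fd);
	return str;
}
```

For example, demo_event_str("mygroup", "myevent", 5) yields
"mygroup:myevent|bpf_fd=5|", which would follow a "-e" element in the
generated argv.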

Signed-off-by: Wang Nan 
---
 tools/perf/util/bpf-loader.c | 87 
 tools/perf/util/bpf-loader.h |  8 
 2 files changed, 95 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 7295a3b..a75b0b7 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -215,6 +215,93 @@ int bpf_probe(void)
return err < 0 ? err : 0;
 }
 
+int bpf_load_gen_argv(int *pargc, const char ***pargv,
+ int old_argc, const char **old_argv,
+ const char *arg0)
+{
+   const char **argv = NULL;
+   int i, argc, pos, err;
+
+   if (!pargc || !pargv)
+   return -EINVAL;
+   if (!old_argv && old_argc)
+   return -EINVAL;
+
+   /*
+*NIL
+* Omit the last NIL in argc.
+*/
+   argc = 1 + params.nr_progs * 2 + old_argc;
+   argv = malloc(sizeof(const char *) * (argc + 1));
+   if (!argv) {
+   pr_err("Not enough memory\n");
+   return -ENOMEM;
+   }
+   bzero(argv, sizeof(char *) * (argc + 1));
+
+   pos = 0;
+
+   if (arg0)
+   argv[pos++] = strdup(arg0);
+
+   for (i = 0; i < (int)params.nr_progs; i++) {
+   struct bpf_prog_handler *prog =
+   params.progs[i].prog;
+   struct perf_probe_event *pevent =
+   params.progs[i].pevent;
+   int cmd_size, fd;
+   char *event_str;
+
+   err = bpf_prog_get_fd(prog, &fd);
+   if (err) {
+   pr_err("Unable to get program fd\n");
+   goto errout;
+   }
+
+   cmd_size = snprintf(NULL, 0,
+   "%s:%s|bpf_fd=%d|", pevent->group,
+   pevent->event, fd);
+
+   event_str = malloc(cmd_size + 1);
+   if (!event_str) {
+   pr_err("Not enough memory\n");
+   goto errout;
+   }
+
+   snprintf(event_str, cmd_size + 1, "%s:%s|bpf_fd=%d|",
+pevent->group, pevent->event, fd);
+
+   argv[pos++] = strdup("-e");
+   if (!argv[pos - 1]) {
+   pr_err("Not enough memory\n");
+   goto errout;
+   }
+
+   argv[pos++] = event_str;
+
+   pr_debug("event: -e %s\n", event_str);
+   }
+
+   for (i = 0; i < old_argc; i++) {
+   argv[pos++] = strdup(old_argv[i]);
+   if (!argv[pos - 1]) {
+   pr_err("Not enough memory\n");
+   goto errout;
+   }
+   }
+
+   *pargc = pos;
+   *pargv = argv;
+   return 0;
+errout:
+   for (i = 0; (int)i < argc; i++) {
+   if (argv[i])
+   free((void *)argv[i]);
+   }
+   free(argv);
+   return err;
+}
+
 int bpf_load(void)
 {
size_t i;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 1ccebdf..eea45f4 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -13,4 +13,12 @@ int bpf_load(void);
 
 int bpf_unprobe(void);
 void bpf_clear(void);
+
+/*
+ * Generates argv like '-e my_event|bpf_fd=5|' for wrapping other
+ * commands.
+ */
+int bpf_load_gen_argv(int *pargc, const char ***pargv,
+ int old_argc, const char **old_argv,
+ const char *arg0);
 #endif
-- 
1.8.3.4


[RFC PATCH v2 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-15 Thread Wang Nan

This is the second version of 'perf bpf' patch series, based on
v4.1-rc3.

The goal of this series of patches is to integrate eBPF with perf.
After applying these patches, users can use the following command to
load an eBPF program compiled by LLVM into the kernel and then start
recording with filters on:

 # perf bpf record --object sample_bpf.o -- -a sleep 4

Compared with the previous version (which can be retrieved from lkml:
https://lkml.org/lkml/2015/4/30/264 ), the v2 series has the following
modifications:

 1. Put common eBPF and eBPF object operations into tools/lib/bpf
instead of perf itself. Other programs, like iproute2, can
utilize libbpf for their own use.

 2. Doesn't rely on the 'config' section. In the v2 patches, eBPF programs
describe their probing points in their section names. The 'config'
section is no longer mandatory for probing points. However, I
still leave room for the 'config' section in libbpf for further
expansion. See the following discussion.

 3. Kprobe points will be removed after exiting.

 4. Redesign the logic of the perf bpf command. Unlike v1, which
implemented 'perf bpf' as a standalone perf command, in this patch
series perf bpf acts as a command wrapper. It loads eBPF programs
into the kernel and then starts other perf commands based on them.
The first wrapped command is 'record'. In the example shown above,
'perf record' will start and capture filtered samples. Other
commands, like 'perf top', can be wrapped in a similar
way.
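
The wrapping step in item 4 can be sketched roughly as below. This is
only an illustration and not the code from the series:
build_wrapped_argv(), free_wrapped_argv() and 'struct wrapped_event'
are hypothetical names. The only thing taken from the patches is the
'-e group:event|bpf_fd=N|' argument format.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of one probed eBPF program. */
struct wrapped_event {
	const char *group;
	const char *event;
	int fd;
};

/* Release an argv produced by build_wrapped_argv(). */
static void free_wrapped_argv(char **argv)
{
	size_t i;

	if (!argv)
		return;
	for (i = 0; argv[i]; i++)
		free(argv[i]);
	free(argv);
}

/*
 * Build "-e group:event|bpf_fd=N|" pairs followed by a copy of the
 * user's own arguments. Returns a NULL-terminated array, or NULL on
 * allocation failure.
 */
static char **build_wrapped_argv(const struct wrapped_event *ev, size_t nr_ev,
				 int old_argc, const char **old_argv)
{
	size_t argc = nr_ev * 2 + (size_t)old_argc, pos = 0, i;
	char **argv = calloc(argc + 1, sizeof(*argv));

	if (!argv)
		return NULL;

	for (i = 0; i < nr_ev; i++) {
		/* First snprintf() only measures, the second one formats. */
		int len = snprintf(NULL, 0, "%s:%s|bpf_fd=%d|",
				   ev[i].group, ev[i].event, ev[i].fd);
		if (len < 0)
			goto fail;

		if (!(argv[pos++] = strdup("-e")))
			goto fail;
		if (!(argv[pos] = malloc((size_t)len + 1)))
			goto fail;
		snprintf(argv[pos++], (size_t)len + 1, "%s:%s|bpf_fd=%d|",
			 ev[i].group, ev[i].event, ev[i].fd);
	}

	for (i = 0; i < (size_t)old_argc; i++)
		if (!(argv[pos++] = strdup(old_argv[i])))
			goto fail;

	return argv;
fail:
	/* Every slot before the failing one is filled, so this frees all. */
	free_wrapped_argv(argv);
	return NULL;
}
```

Keeping the array NULL-terminated lets the error path free a partially
filled result without tracking how many slots were used.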

Because of the new logic, a design decision should be made: do we
actually need the 'perf bpf' command? Another choice is to modify
'perf report' by introducing a new option like '--ebpf-object' and
make it load BPF programs before doing other things. I prefer keeping
'perf bpf' to group all eBPF related stuff together under a uniform
entry. In addition, eBPF programs can act not only as filters but also
as data aggregators. It is possible to make something like 'perf bpf run'
to simply run a program, and dump the result after the user hits 'C-c'
or on timeout.
The 'config' section may be utilized in this case to let 'perf bpf'
know how to display results.

The following is a detailed description.

 Patch  1/37 -  4/37 are bugfixes. Some of them are already acked.

 Patch  5/37 - 25/37 creates tools/lib/bpf.

  Libbpf will be compiled into libbpf.a and libbpf.so. It can be
  divided into 2 parts:

1) User-kernel interface. The API is defined by tools/lib/bpf/bpf.h and
   encapsulates map and program loading operations. To improve
   performance, bpf_load_program() does not use a log buffer on the
   first try, and retries with the log buffer enabled on
   failure.

2) ELF operations. The structure of eBPF object file is defined
   here. API of this part can be found in tools/lib/bpf/libbpf.h.
   'struct bpf_map_def' is also put here.

   Libbpf's API hides internal structures. Callers access data of
   object files through handlers and accessors. 'struct bpf_object *'
   is the handler of a whole object file. 'struct bpf_prog_handler *'
   is the handler and iterator of programs. Some accessors are
   defined to enable callers to retrieve the section name and file
   descriptor of a program. Further accessors can be added.

   In the design of libbpf, I explicitly separate the full procedure
   into opening and loading phases. Data are collected during the
   'opening' phase. BPF syscalls are called in the 'loading' phase.
   The separation is designed for potential cross-object
   operations. It also gives the caller a chance
   to adjust bytecode and/or maps before the real loading.
   (An API for such operations is not provided in this version.)

   During loading, fields in 'struct bpf_map_def' are also swapped
   if the endianness does not match.
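
   The "no log buffer first, retry with log buffer" strategy mentioned
   for bpf_load_program() in part 1) can be sketched as below. This is
   a simplified illustration under assumptions: load_fn_t,
   load_with_retry() and stub_reject() are hypothetical names, and the
   callback stands in for the real bpf syscall wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical loader callback standing in for the real program load
 * call. It returns a file descriptor (>= 0) or a negative value, and
 * only writes diagnostics when log_buf is non-NULL.
 */
typedef int (*load_fn_t)(const void *insns, size_t insns_cnt,
			 char *log_buf, size_t log_buf_sz);

/*
 * Try the cheap path first (no log). Only when that fails, repeat the
 * load with a log buffer so the caller can see why it was rejected.
 */
static int load_with_retry(load_fn_t load, const void *insns,
			   size_t insns_cnt, char *log_buf, size_t log_buf_sz)
{
	int fd = load(insns, insns_cnt, NULL, 0);

	if (fd >= 0 || !log_buf || !log_buf_sz)
		return fd;

	log_buf[0] = '\0';
	return load(insns, insns_cnt, log_buf, log_buf_sz);
}

/* Demonstration stub: rejects everything and counts how often it ran. */
static int stub_calls;

static int stub_reject(const void *insns, size_t insns_cnt,
		       char *log_buf, size_t log_buf_sz)
{
	(void)insns;
	(void)insns_cnt;
	stub_calls++;
	if (log_buf && log_buf_sz)
		snprintf(log_buf, log_buf_sz, "rejected by stub");
	return -1;
}
```

   With the stub, a failing load is attempted twice when a buffer is
   supplied and once when it is not, so the common successful case
   never pays for the log.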
  
 Patch 26/37 - 37/37 are patches on perf, which introduce 'perf bpf'
 command and 'perf bpf record' subcommand.

   As previously discussed, 'perf bpf' is not a standalone command.
   The usage should be:

 perf bpf []  --objects  -- \
 

   The first two patches make 'perf bpf' available and make perf depend on
   libbpf. 28/37 creates 'perf bpf record' and directly passes
   everything after '--' to cmd_record(). The rest resides in
   tools/perf/util/bpf-loader.[ch], which is introduced in 29/37.
   The following patches do the collection -> probing -> loading work step
   by step. In those operations, 'perf bpf' collects all required
   objects before creating kprobe points, and loads programs into the kernel
   after probing finishes.

   A 'bpf_unload()' is used to remove kprobe points. I use an 'atexit'
   hook to ensure it is called before exiting. However, I find that
   atexit hooks do not always work well, for example when the program
   is canceled by SIGINT. Therefore we still need to call bpf_unload()
   after cmd_record().

   Patch 34 and 35 introduce a special syntax for event parsing:
   'group:na

[RFC PATCH v2 24/37] tools lib bpf: accessors of bpf_program.

2015-05-15 Thread Wang Nan

This patch introduces accessors for users of libbpf to retrieve the section
name and fd of an opened/loaded eBPF program. 'struct bpf_prog_handler'
is used for that purpose. Accessors for a program's section name and file
descriptor are provided. Set/get of private data is also implemented.
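
The handler-embedded-in-record idea can be illustrated with the small
standalone sketch below. The names (prog_handle, prog_record,
handle_to_record() and friends) are hypothetical and chosen for the
example, and the container_of here is a plain offsetof() variant
without the GNU typeof check used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable container_of without the GNU typeof type check. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Handle exposed to callers (illustrative name). */
struct prog_handle {
	void *priv;
};

/* Private record that embeds the handle. */
struct prog_record {
	const char *section_name;
	int fd;
	struct prog_handle handle;
};

/* Map a handle back to the record that contains it. */
static struct prog_record *handle_to_record(struct prog_handle *h)
{
	return h ? container_of(h, struct prog_record, handle) : NULL;
}

/* Accessors only ever see the handle, never the record layout. */
static const char *handle_title(struct prog_handle *h)
{
	struct prog_record *rec = handle_to_record(h);

	return rec ? rec->section_name : NULL;
}

static int handle_fd(struct prog_handle *h)
{
	struct prog_record *rec = handle_to_record(h);

	return rec ? rec->fd : -1;
}
```

Because callers hold only a pointer to the embedded member, the record
layout can change without touching the public interface.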

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 137 +
 tools/lib/bpf/libbpf.h |  25 +
 2 files changed, 162 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e8ef78e..d770adc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -79,6 +79,20 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
 # define LIBBPF_ELF_C_READ_MMAP ELF_C_READ
 #endif
 
+#ifndef container_of
+# define container_of(ptr, type, member) ({\
+   const typeof( ((type *)0)->member ) *__mptr = (ptr);\
+   (type *)( (char *)__mptr - offsetof(type,member) );})
+#endif
+
+/* Accessing of project */
+struct bpf_prog_handler {
+   struct bpf_object *obj;
+
+   void *priv;
+   bpf_prog_clear_priv_t clear_priv;
+};
+
 /* 
  * bpf_prog should be a better name but it has been used in
  * linux/filter.h.
@@ -97,6 +111,7 @@ struct bpf_program {
int nr_reloc;
 
int fd;
+   struct bpf_prog_handler handler;
 };
 
 struct bpf_object {
@@ -146,6 +161,12 @@ static void bpf_clear_program(struct bpf_program *prog)
if (!prog)
return;
 
+   if (prog->handler.clear_priv)
+   prog->handler.clear_priv(&prog->handler, prog->handler.priv);
+
+   prog->handler.priv = NULL;
+   prog->handler.clear_priv = NULL;
+
bpf_unload_program(prog);
 
if (prog->section_name) {
@@ -217,6 +238,7 @@ bpf_program_alloc(struct bpf_object *obj, void *data, 
size_t size,
   prog->insns_cnt * sizeof(struct bpf_insn));
prog->idx = idx;
prog->fd = -1;
+   prog->handler.obj = obj;
 
return prog;
 out:
@@ -941,3 +963,118 @@ void bpf_close_object(struct bpf_object *obj)
}
free(obj);
 }
+
+static inline struct bpf_program *
+handler_to_prog(struct bpf_prog_handler *handler)
+{
+   struct bpf_program *prog;
+   struct bpf_object *obj;
+   size_t idx;
+
+   if (!handler)
+   goto out_warn;
+
+   obj = handler->obj;
+   prog = container_of(handler, struct bpf_program, handler);
+   idx = prog - obj->programs;
+   if (idx >= obj->nr_programs)
+   goto out_warn;
+   return &obj->programs[idx];
+
+out_warn:
+   pr_warning("invalid handler %p\n", handler);
+   return NULL;
+}
+
+struct bpf_prog_handler *
+bpf_next_prog(struct bpf_object *obj, struct bpf_prog_handler *handler)
+{
+   struct bpf_program *prog;
+   size_t idx;
+
+   if (!obj->programs)
+   return NULL;
+   /* First handler */
+   if (handler == NULL)
+   return (&obj->programs[0].handler);
+
+   if (handler->obj != obj) {
+   pr_warning("error: program handler doesn't "
+  "match object\n");
+   return NULL;
+   }
+
+   prog = handler_to_prog(handler);
+   if (!prog)
+   return NULL;
+
+   idx = (prog - obj->programs) + 1;
+   if (idx >= obj->nr_programs)
+   return NULL;
+   return &obj->programs[idx].handler;
+}
+
+int bpf_prog_set_private(struct bpf_prog_handler *handler, void *priv,
+bpf_prog_clear_priv_t clear_priv)
+{
+   struct bpf_program *prog;
+
+   prog = handler_to_prog(handler);
+   if (!prog)
+   return -EINVAL;
+
+   if (prog->handler.priv && prog->handler.clear_priv)
+   prog->handler.clear_priv(&prog->handler, prog->handler.priv);
+
+   prog->handler.priv = priv;
+   prog->handler.clear_priv = clear_priv;
+   return 0;
+}
+
+int bpf_prog_get_private(struct bpf_prog_handler *handler, void **ppriv)
+{
+   struct bpf_program *prog;
+
+   prog = handler_to_prog(handler);
+   if (!prog || !ppriv)
+   return -EINVAL;
+
+   *ppriv = prog->handler.priv;
+   return 0;
+}
+
+int bpf_prog_get_title(struct bpf_prog_handler *handler,
+  const char **ptitle, bool dup)
+{
+   struct bpf_program *prog;
+   const char *title;
+
+   prog = handler_to_prog(handler);
+   if (!prog || !ptitle)
+   return -EINVAL;
+
+   title = prog->section_name;
+   if (dup) {
+   title = strdup(title);
+   if (!title) {
+   pr_warning("failed to strdup program title\n");
+   *ptitle = NULL;
+   return -ENOMEM;
+   }
+   }
+
+   *ptitle = title;
+   return 0;
+}
+
+int bpf_prog_get_fd(struct bpf_prog_handler *handler, int *pfd)
+{
+   struct bpf_program *prog;
+
+   prog = handler_to_prog(handler);
+

[RFC PATCH v2 30/37] tools perf: collect all bpf programs.

2015-05-15 Thread Wang Nan

This patch collects 'struct bpf_prog_handler *' after opening an object
file. Handlers are stored into an array of MAX_PROBES slots.

Signed-off-by: Wang Nan 
---
 tools/perf/util/bpf-loader.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 17cd2b6..67bfb62 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -8,6 +8,7 @@
 #include "perf.h"
 #include "debug.h"
 #include "bpf-loader.h"
+#include "probe-finder.h" // for MAX_PROBES
 
 #define DEFINE_PRINT_FN(name, level) \
 static int libbpf_##name(const char *fmt, ...) \
@@ -32,11 +33,17 @@ static bool libbpf_inited = false;
 struct {
struct bpf_object *objects[MAX_OBJECTS];
size_t nr_objects;
+
+   struct {
+   struct bpf_prog_handler *prog;
+   } progs[MAX_PROBES];
+   size_t nr_progs;
 } params;
 
 int bpf_prepare_load(const char *filename)
 {
struct bpf_object *obj;
+   struct bpf_prog_handler *prog;
 
if (!libbpf_inited)
libbpf_set_print(libbpf_warning,
@@ -57,6 +64,24 @@ int bpf_prepare_load(const char *filename)
}
 
params.objects[params.nr_objects++] = obj;
+
+   bpf_obj_for_each_prog(obj, prog) {
+   const char *title;
+
+   if (params.nr_progs + 1 > MAX_PROBES) {
+   pr_err("Too many programs. "
+  "Increase MAX_PROBES\n");
+   return -EMFILE;
+   }
+
+   params.progs[params.nr_progs++].prog = prog;
+
+   if (bpf_prog_get_title(prog, &title, false)) {
+   pr_err("Unable to get title of a program\n");
+   return -EINVAL;
+   }
+   pr_debug("bpf: add program '%s'\n", title);
+   }
return 0;
 }
 
-- 
1.8.3.4


[RFC PATCH v2 29/37] tools perf: add bpf-loader and open elf object files.

2015-05-15 Thread Wang Nan

bpf_prepare_load() is used to open each eBPF object file. The
returned handlers are stored into an array. A corresponding bpf_clear()
is introduced to free all resources.

For the purpose of logging, 3 printing functions are defined using
DEFINE_PRINT_FN, which requires a veprintf to process va_list args.

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-bpf.c | 12 +++-
 tools/perf/util/Build|  1 +
 tools/perf/util/bpf-loader.c | 69 
 tools/perf/util/bpf-loader.h | 13 +
 tools/perf/util/debug.c  |  5 
 tools/perf/util/debug.h  |  1 +
 6 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/bpf-loader.c
 create mode 100644 tools/perf/util/bpf-loader.h

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index b5c613f..e922921 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -12,6 +12,7 @@
 #include "perf.h"
 #include "debug.h"
 #include "parse-options.h"
+#include "bpf-loader.h"
 
 typedef int (*bpf_cmd_fn_t)(int argc, const char **argv, const char *prefix);
 
@@ -124,6 +125,7 @@ static int cmd_bpf_record(int argc, const char **argv,
OPT_END()
};
struct str_node *str_node;
+   int err;
 
argc = parse_options(argc, argv, options,
 bpf_record_usage, PARSE_OPT_KEEP_DASHDASH);
@@ -144,8 +146,16 @@ static int cmd_bpf_record(int argc, const char **argv,
argc--;
argv++;
 
-   strlist__for_each(str_node, param.object_file_names)
+   strlist__for_each(str_node, param.object_file_names) {
+   const char *fn = str_node->s;
+
pr_debug("loading %s\n", str_node->s);
+   if ((err = bpf_prepare_load(fn))) {
+   pr_err("bpf: failed to load object file %s\n",
+  fn);
+   return -1;
+   }
+   }
 
return start_bpf_record(argc, argv);
 usage:
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 797490a..609f6d6 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -75,6 +75,7 @@ libperf-$(CONFIG_X86) += tsc.o
 libperf-y += cloexec.o
 libperf-y += thread-stack.o
 
+libperf-$(CONFIG_LIBELF) += bpf-loader.o
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
 libperf-$(CONFIG_LIBELF) += probe-event.o
 
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
new file mode 100644
index 000..17cd2b6
--- /dev/null
+++ b/tools/perf/util/bpf-loader.c
@@ -0,0 +1,69 @@
+/*
+ * bpf-loader.c
+ *
+ * Load bpf files into kernel using libbpf; create kprobe events.
+ */
+
+#include 
+#include "perf.h"
+#include "debug.h"
+#include "bpf-loader.h"
+
+#define DEFINE_PRINT_FN(name, level) \
+static int libbpf_##name(const char *fmt, ...) \
+{  \
+   va_list args;   \
+   int ret;\
+   \
+   va_start(args, fmt);\
+   ret = veprintf(level, verbose, pr_fmt(fmt), args);\
+   va_end(args);   \
+   return ret; \
+}
+
+DEFINE_PRINT_FN(warning, 0)
+DEFINE_PRINT_FN(info, 0)
+DEFINE_PRINT_FN(debug, 1)
+
+static bool libbpf_inited = false;
+
+#define MAX_OBJECTS 128
+
+struct {
+   struct bpf_object *objects[MAX_OBJECTS];
+   size_t nr_objects;
+} params;
+
+int bpf_prepare_load(const char *filename)
+{
+   struct bpf_object *obj;
+
+   if (!libbpf_inited)
+   libbpf_set_print(libbpf_warning,
+libbpf_info,
+libbpf_debug);
+
+   pr_debug("bpf: loading %s\n", filename);
+   if (params.nr_objects + 1 > MAX_OBJECTS) {
+   pr_err("Too many objects to load. "
+   "Increase MAX_OBJECTS\n");
+   return -EMFILE;
+   }
+
+   obj = bpf_open_object(filename);
+   if (!obj) {
+   pr_err("bpf: failed to load %s\n", filename);
+   return -EINVAL;
+   }
+
+   params.objects[params.nr_objects++] = obj;
+   return 0;
+}
+
+void bpf_clear(void)
+{
+   size_t i;
+
+   for (i = 0; i < params.nr_objects; i++)
+   bpf_close_object(params.objects[i]);
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
new file mode 100644
index 000..74eebc3
--- /dev/null
+++ b/tools/perf/util/bpf-loader.h
@@ -0,0 +1,13 @@
+/*
+ * Copyright (C) 2015, Wang Nan 
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+#ifndef __BPF_LOADER_H
+#define __BPF_LOADER_H
+
+int bpf_prepare_load(const char *filename);
+
+void bpf_clear(void);
+#endif
diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index 2da5581..86d9c73 100644
--- a/tools/perf/util/debug.c
+++ b/t

[RFC PATCH v2 15/37] tools lib bpf: collect bpf programs from object files.

2015-05-15 Thread Wang Nan

This patch collects all programs in an object file into an array of
'struct bpf_program' for further processing. That structure
represents each eBPF program. 'bpf_prog' would be a better name, but
it is already used by linux/filter.h. Although that is a kernel space name,
I still prefer to call it 'bpf_program' to prevent possible confusion.
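
The grow-by-one collection used here can be sketched in isolation as
below. This is a simplified illustration, not the patch code:
prog_entry, prog_table, prog_table_add() and prog_table_clear() are
hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-ins for the per-program record and its table. */
struct prog_entry {
	char *name;
	int idx;
};

struct prog_table {
	struct prog_entry *entries;
	size_t nr;
};

/* Append one entry; on failure the table is left exactly as it was. */
static struct prog_entry *prog_table_add(struct prog_table *t,
					 const char *name, int idx)
{
	struct prog_entry *grown, *e;
	char *copy = strdup(name);

	if (!copy)
		return NULL;

	grown = realloc(t->entries, sizeof(*grown) * (t->nr + 1));
	if (!grown) {
		/* realloc() failed, so t->entries is still valid. */
		free(copy);
		return NULL;
	}

	t->entries = grown;
	e = &grown[t->nr];
	e->name = copy;
	e->idx = idx;
	t->nr++;	/* count the entry only once it is complete */
	return e;
}

/* Free every entry and reset the table to empty. */
static void prog_table_clear(struct prog_table *t)
{
	size_t i;

	for (i = 0; i < t->nr; i++)
		free(t->entries[i].name);
	free(t->entries);
	t->entries = NULL;
	t->nr = 0;
}
```

One consequence of this layout is that a pointer returned by an earlier
prog_table_add() may be invalidated by a later call, so pointers into
the array should only be handed out once collection is finished.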

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 111 +
 1 file changed, 111 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2068f0b..18b221f 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -77,6 +77,18 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
 # define LIBBPF_ELF_C_READ_MMAP ELF_C_READ
 #endif
 
+/* 
+ * bpf_prog should be a better name but it has been used in
+ * linux/filter.h.
+ */
+struct bpf_program {
+   /* Index in elf obj file, for relocation use. */
+   int idx;
+   char *section_name;
+   struct bpf_insn *insns;
+   size_t insns_cnt;
+};
+
 struct bpf_object {
char *path;
bool needs_swap;
@@ -86,6 +98,9 @@ struct bpf_object {
size_t maps_buf_sz;
char *config_str;
 
+   struct bpf_program *programs;
+   size_t nr_programs;
+
/*
 * Information when doing elf related work. Only valid if fd
 * is valid.
@@ -99,6 +114,79 @@ struct bpf_object {
 };
 #define obj_elf_valid(o)   ((o)->elf.fd >= 0)
 
+static void bpf_clear_program(struct bpf_program *prog)
+{
+   if (!prog)
+   return;
+   if (prog->section_name) {
+   free(prog->section_name);
+   prog->section_name = NULL;
+   }
+   if (prog->insns) {
+   free(prog->insns);
+   prog->insns = NULL;
+   }
+   prog->insns_cnt = 0;
+   prog->idx = -1;
+}
+
+static struct bpf_program *
+bpf_program_alloc(struct bpf_object *obj, void *data, size_t size,
+ char *name, int idx)
+{
+   struct bpf_program *prog, *progs;
+   int nr_progs;
+
+   if (size < sizeof(struct bpf_insn)) {
+   pr_warning("corrupted section '%s'\n", name);
+   return NULL;
+   }
+   
+   progs = obj->programs;
+   nr_progs = obj->nr_programs;
+
+   progs = realloc(progs, sizeof(*prog) * (nr_progs + 1));
+   if (!progs) {
+   /*
+* In this case the original obj->programs
+* is still valid, so don't need special treat for
+* bpf_close_object().
+*/
+   pr_warning("failed to alloc a new program '%s'\n",
+  name);
+   return NULL;
+   }
+
+   obj->programs = progs;
+
+   prog = &progs[nr_progs];
+   bzero(prog, sizeof(*prog));
+
+   obj->nr_programs = nr_progs + 1;
+
+   prog->section_name = strdup(name);
+   if (!prog->section_name) {
+   pr_warning("failed to alloc name for prog %s\n",
+  name);
+   goto out;
+   }
+
+   prog->insns = malloc(size);
+   if (!prog->insns) {
+   pr_warning("failed to alloc insns for %s\n", name);
+   goto out;
+   }
+   prog->insns_cnt = size / sizeof(struct bpf_insn);
+   memcpy(prog->insns, data,
+  prog->insns_cnt * sizeof(struct bpf_insn));
+   prog->idx = idx;
+
+   return prog;
+out:
+   bpf_clear_program(prog);
+   return NULL;
+}
+
 static struct bpf_object *__bpf_obj_alloc(const char *path)
 {
struct bpf_object *obj;
@@ -380,6 +468,22 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
err = -EEXIST;
} else
obj->elf.symbols = data;
+   } else if ((sh.sh_type == SHT_PROGBITS) &&
+  (sh.sh_flags & SHF_EXECINSTR) &&
+  (data->d_size > 0)) {
+   struct bpf_program *prog;
+
+   prog = bpf_program_alloc(obj, data->d_buf,
+data->d_size, name,
+idx);
+   if (!prog) {
+   pr_warning("failed to alloc "
+  "program %s (%s)", name,
+  obj->path);
+   err = -ENOMEM;
+   } else
+   pr_debug("found program %s\n",
+prog->section_name);
}
if (err)
goto out;
@@ -446,5 +550,12 @@ void bpf_close_object(struct bpf_object *obj)
free(obj->maps_buf);
if (obj->config_str)
free(obj->config_str);
+   if (obj->programs) {
+   size_t i;
+
+

[RFC PATCH v2 11/37] tools lib bpf: collect version and license from ELF.

2015-05-15 Thread Wang Nan

Expand bpf_obj_elf_collect() to collect license and kernel version
information from an eBPF object file. An eBPF object file should have a
section named 'license', which contains a string. It should also have a
section named 'version', which contains a u32 LINUX_VERSION_CODE.

bpf_obj_validate() is introduced to validate an object file after it is
loaded. Currently it only checks the existence of the 'version' section.
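
As an illustration of what the 'version' section carries, the sketch
below decodes a 4-byte version code with an optional byte swap. It is
a standalone example under assumptions: KVER(), swap32() and
parse_kver() are hypothetical helpers, and KVER() mirrors the classic
(a << 16) + (b << 8) + c layout of KERNEL_VERSION().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the classic KERNEL_VERSION(a, b, c) layout. */
#define KVER(a, b, c) \
	(((uint32_t)(a) << 16) | ((uint32_t)(b) << 8) | (uint32_t)(c))

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Read a version code from raw section data. Returns 0 and stores the
 * value in *out, or -1 if the section is too small.
 */
static int parse_kver(const void *data, size_t size, int needs_swap,
		      uint32_t *out)
{
	uint32_t v;

	if (size < sizeof(v))
		return -1;
	/* memcpy avoids unaligned access into the section buffer. */
	memcpy(&v, data, sizeof(v));
	if (needs_swap)
		v = swap32(v);
	*out = v;
	return 0;
}
```

For example, KVER(4, 1, 0) is 0x040100, and a section holding the
byte-swapped value decodes to the same number when needs_swap is set.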

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1af80c3..b26f1ee 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -15,12 +15,22 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
 #include "libbpf.h"
 
+#ifdef min
+# undef min
+#endif
+#define min(x, y) ({   \
+   typeof(x) _min1 = (x);  \
+   typeof(y) _min2 = (y);  \
+   (void) (&_min1 == &_min2);  \
+   _min1 < _min2 ? _min1 : _min2; })
+
 #define __printf(a, b) __attribute__((format(printf, a, b)))
 
 __printf(1, 2)
@@ -70,6 +80,8 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
 struct bpf_object {
char *path;
bool needs_swap;
+   char license[64];
+   u32 kern_version;
 
/*
 * Information when doing elf related work. Only valid if fd
@@ -206,6 +218,32 @@ bpf_obj_swap_init(struct bpf_object *obj)
}
 }
 
+static int bpf_obj_license_init(struct bpf_object *obj,
+   void *data, size_t size)
+{
+   memcpy(obj->license, data,
+  min(size, sizeof(obj->license) - 1));
+   pr_debug("license of %s is %s\n", obj->path, obj->license);
+   return 0;
+}
+
+static int bpf_obj_kver_init(struct bpf_object *obj,
+void *data, size_t size)
+{
+   u32 kver;
+   if (size < sizeof(kver)) {
+   pr_warning("invalid kver section in %s\n", obj->path);
+   return -EINVAL;
+   }
+   memcpy(&kver, data, sizeof(kver));
+   if (obj->needs_swap)
+   kver = bswap_32(kver);
+   obj->kern_version = kver;
+   pr_debug("kernel version of %s is %x\n", obj->path,
+obj->kern_version);
+   return 0;
+}
+
 static int bpf_obj_elf_collect(struct bpf_object *obj)
 {
Elf *elf = obj->elf.elf;
@@ -253,11 +291,30 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
 name, (unsigned long)data->d_size,
 (int)sh.sh_link, (unsigned long)sh.sh_flags,
 (int)sh.sh_type);
+
+   if (strcmp(name, "license") == 0)
+   err = bpf_obj_license_init(obj, data->d_buf,
+  data->d_size);
+   else if (strcmp(name, "version") == 0)
+   err = bpf_obj_kver_init(obj, data->d_buf,
+   data->d_size);
+   if (err)
+   goto out;
}
 out:
return err;
 }
 
+static int bpf_obj_validate(struct bpf_object *obj)
+{
+   if (obj->kern_version == 0) {
+   pr_warning("%s doesn't provide kernel version\n",
+  obj->path);
+   return -EINVAL;
+   }
+   return 0;
+}
+
 struct bpf_object *bpf_open_object(const char *path)
 {
struct bpf_object *obj;
@@ -283,6 +340,8 @@ struct bpf_object *bpf_open_object(const char *path)
goto out;
if (bpf_obj_elf_collect(obj))
goto out;
+   if (bpf_obj_validate(obj))
+   goto out;
 
return obj;
 
-- 
1.8.3.4


[RFC PATCH v2 09/37] tools lib bpf: check swap according to EHDR.

2015-05-15 Thread Wang Nan

Check endianness according to EHDR to support loading eBPF objects on
big endian machines. Code is taken from tools/perf/util/symbol-elf.c.
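
The check boils down to comparing the host byte order with the data
encoding recorded in the ELF header. A minimal standalone sketch,
without libelf, could look like this. The names obj_data_encoding,
host_is_little_endian() and needs_swap_for() are hypothetical; the enum
values mirror ELFDATA2LSB (1) and ELFDATA2MSB (2).

```c
#include <assert.h>
#include <stdbool.h>

/* Same numeric values as ELFDATA2LSB and ELFDATA2MSB. */
enum obj_data_encoding {
	OBJ_DATA_LSB = 1,
	OBJ_DATA_MSB = 2,
};

/* Inspect the first byte of a known integer to detect host order. */
static bool host_is_little_endian(void)
{
	const unsigned int probe = 1;

	return *(const unsigned char *)&probe == 1;
}

/*
 * Decide whether object data must be byte swapped on this host.
 * Returns 0 and sets *needs_swap, or -1 for an unknown encoding.
 */
static int needs_swap_for(int encoding, bool *needs_swap)
{
	switch (encoding) {
	case OBJ_DATA_LSB:
		*needs_swap = !host_is_little_endian();
		return 0;
	case OBJ_DATA_MSB:
		*needs_swap = host_is_little_endian();
		return 0;
	default:
		return -1;
	}
}
```

On any host exactly one of the two encodings requires swapping, which
is a convenient property to assert in a test.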

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 43c16bc..a4910a8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -69,6 +69,7 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
 
 struct bpf_object {
char *path;
+   bool needs_swap;
 
/*
 * Information when doing elf related work. Only valid if fd
@@ -109,6 +110,7 @@ static struct bpf_object *bpf_obj_alloc(const char *path)
if (!obj)
goto out;
 
+   obj->needs_swap = false;
obj->elf.fd = -1;
return obj;
 out:
@@ -179,6 +181,31 @@ errout:
return err;
 }
 
+static int
+bpf_obj_swap_init(struct bpf_object *obj)
+{
+   static unsigned int const endian = 1;
+
+   obj->needs_swap = false;
+
+   switch (obj->elf.ehdr.e_ident[EI_DATA]) {
+   case ELFDATA2LSB:
+   /* We are big endian, BPF obj is little endian. */
+   if (*(unsigned char const *)&endian != 1)
+   obj->needs_swap = true;
+   return 0;
+
+   case ELFDATA2MSB:
+   /* We are little endian, BPF obj is big endian. */
+   if (*(unsigned char const *)&endian != 0)
+   obj->needs_swap = true;
+   return 0;
+
+   default:
+   return -EINVAL;
+   }
+}
+
 struct bpf_object *bpf_open_object(const char *path)
 {
struct bpf_object *obj;
@@ -200,6 +227,8 @@ struct bpf_object *bpf_open_object(const char *path)
 
if (bpf_obj_elf_init(obj))
goto out;
+   if (bpf_obj_swap_init(obj))
+   goto out;
 
return obj;
 
-- 
1.8.3.4


[RFC PATCH v2 08/37] tools lib bpf: open eBPF object file and do basic validation.

2015-05-15 Thread Wang Nan

This patch adds a basic 'struct bpf_object' which will be used for eBPF
object file loading. eBPF object files are compiled by LLVM into ELF
format. In this patch, libelf is used to open those files, read the EHDR
and do basic validation according to e_type and e_machine.

All ELF related stuff is grouped together and resides in the elf field of
'struct bpf_object'. bpf_obj_clear_elf() is introduced to clear it.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 156 -
 1 file changed, 154 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f8decff..43c16bc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -12,9 +12,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "libbpf.h"
 
@@ -58,12 +61,161 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
__pr_debug = debug;
 }
 
-struct bpf_object *bpf_open_object(const char *path)
+#ifdef HAVE_LIBELF_MMAP_SUPPORT
+# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ_MMAP
+#else
+# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ
+#endif
+
+struct bpf_object {
+   char *path;
+
+   /*
+* Information when doing elf related work. Only valid if fd
+* is valid.
+*/
+   struct {
+   int fd;
+   Elf *elf;
+   GElf_Ehdr ehdr;
+   } elf;
+};
+#define obj_elf_valid(o)   ((o)->elf.fd >= 0)
+
+static struct bpf_object *__bpf_obj_alloc(const char *path)
+{
+   struct bpf_object *obj;
+
+   obj = calloc(1, sizeof(struct bpf_object));
+   if (!obj) {
+   pr_warning("alloc memory failed for %s\n", path);
+   return NULL;
+   }
+
+   obj->path = strdup(path);
+   if (!obj->path) {
+   pr_warning("failed to strdup '%s'\n", path);
+   free(obj);
+   return NULL;
+   }
+   return obj;
+}
+
+static struct bpf_object *bpf_obj_alloc(const char *path)
 {
+   struct bpf_object *obj;
+
+   obj = __bpf_obj_alloc(path);
+   if (!obj)
+   goto out;
+
+   obj->elf.fd = -1;
+   return obj;
+out:
+   bpf_close_object(obj);
return NULL;
 }
 
-void bpf_close_object(struct bpf_object *object)
+static void bpf_obj_clear_elf(struct bpf_object *obj)
+{
+   if (!obj_elf_valid(obj))
+   return;
+
+   if (obj->elf.elf) {
+   elf_end(obj->elf.elf);
+   obj->elf.elf = NULL;
+   }
+   if (obj->elf.fd >= 0) {
+   close(obj->elf.fd);
+   obj->elf.fd = -1;
+   }
+}
+
+static int bpf_obj_elf_init(struct bpf_object *obj)
 {
+   int err = 0;
+   GElf_Ehdr *ep;
+
+   if (obj_elf_valid(obj)) {
+   pr_warning("elf init: internal error\n");
+   return -EEXIST;
+   }
+   
+   obj->elf.fd = open(obj->path, O_RDONLY);
+   if (obj->elf.fd < 0) {
+   pr_warning("failed to open %s: %s\n", obj->path,
+   strerror(errno));
+   return -errno;
+   }
+
+   obj->elf.elf = elf_begin(obj->elf.fd,
+LIBBPF_ELF_C_READ_MMAP,
+NULL);
+   if (!obj->elf.elf) {
+   pr_warning("failed to open %s as ELF file\n",
+   obj->path);
+   err = -EINVAL;
+   goto errout;
+   }
+
+   if (!gelf_getehdr(obj->elf.elf, &obj->elf.ehdr)) {
+   pr_warning("failed to get EHDR from %s\n",
+   obj->path);
+   err = -EINVAL;
+   goto errout;
+   }
+   ep = &obj->elf.ehdr;
+
+   if ((ep->e_type != ET_REL) || (ep->e_machine != 0)) {
+   pr_warning("%s is not an eBPF object file\n",
+   obj->path);
+   err = -EINVAL;
+   goto errout;
+   }
+
return 0;
+errout:
+   bpf_obj_clear_elf(obj);
+   return err;
+}
+
+struct bpf_object *bpf_open_object(const char *path)
+{
+   struct bpf_object *obj;
+
+   /* param validation */
+   if (!path)
+   return NULL;
+
+   pr_debug("loading %s\n", path);
+
+   if (elf_version(EV_CURRENT) == EV_NONE) {
+   pr_warning("failed to init libelf for %s\n", path);
+   return NULL;
+   }
+
+   obj = bpf_obj_alloc(path);
+   if (!obj)
+   return NULL;
+
+   if (bpf_obj_elf_init(obj))
+   goto out;
+
+   return obj;
+
+out:
+   bpf_close_object(obj);
+   return NULL;
+}
+
+void bpf_close_object(struct bpf_object *obj)
+{
+   if (!obj)
+   return;
+
+   bpf_obj_clear_elf(obj);
+
+   if (obj->path)
+   free(obj->path);
+   free(obj);
 }
-- 
1.8.3.4


[RFC PATCH v2 10/37] tools lib bpf: iterater over elf sections to collect information.

2015-05-15 Thread Wang Nan

bpf_obj_elf_collect() is introduced to iterate over each ELF section
and collect information from eBPF object files. This function will
be further enhanced to collect license, kernel version, programs, configs
and map information.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 54 ++
 1 file changed, 54 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a4910a8..1af80c3 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -206,6 +206,58 @@ bpf_obj_swap_init(struct bpf_object *obj)
}
 }
 
+static int bpf_obj_elf_collect(struct bpf_object *obj)
+{
+   Elf *elf = obj->elf.elf;
+   GElf_Ehdr *ep = &obj->elf.ehdr;
+   Elf_Scn *scn = NULL;
+   int idx = 0, err = 0;
+
+   /* Elf is corrupted/truncated, avoid calling elf_strptr. */
+   if (!elf_rawdata(elf_getscn(elf, ep->e_shstrndx), NULL)) {
+   pr_warning("failed to get e_shstrndx from %s\n",
+  obj->path);
+   return -EINVAL;
+   }
+
+   while ((scn = elf_nextscn(elf, scn)) != NULL) {
+   char *name;
+   GElf_Shdr sh;
+   Elf_Data *data;
+
+   idx++;
+   if (gelf_getshdr(scn, &sh) != &sh) {
+   pr_warning("failed to get section header"
+  " from %s\n", obj->path);
+   err = -EINVAL;
+   goto out;
+   }
+
+   name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
+   if (!name) {
+   pr_warning("failed to get section name "
+  "from %s\n", obj->path);
+   err = -EINVAL;
+   goto out;
+   }
+
+   data = elf_getdata(scn, 0);
+   if (!data) {
+   pr_warning("failed to get section data "
+  "from %s(%s)\n", name, obj->path);
+   err = -EINVAL;
+   goto out;
+   }
+   pr_debug("section %s, size %ld, link %d, "
+"flags %lx, type=%d\n",
+name, (unsigned long)data->d_size,
+(int)sh.sh_link, (unsigned long)sh.sh_flags,
+(int)sh.sh_type);
+   }
+out:
+   return err;
+}
+
 struct bpf_object *bpf_open_object(const char *path)
 {
struct bpf_object *obj;
@@ -229,6 +281,8 @@ struct bpf_object *bpf_open_object(const char *path)
goto out;
if (bpf_obj_swap_init(obj))
goto out;
+   if (bpf_obj_elf_collect(obj))
+   goto out;
 
return obj;
 
-- 
1.8.3.4


[RFC PATCH v2 32/37] tools perf bpf: probe at kprobe points.

2015-05-15 Thread Wang Nan

In this patch, kprobe points are created using add_perf_probe_events.
Since all events are already grouped together in an array, calling
add_perf_probe_events() once creates all of them.

To restore the system on exit, bpf_unprobe() is also provided and
hooked to atexit(). Because all events are in the group
"perf_bpf_probe" (PERF_BPF_PROBE_GROUP), passing the string
'perf_bpf_probe:*' is a simple way to remove all of them. However, this
also introduces the constraint that only one instance of 'perf bpf' may
be active at a time.

Because of the atexit hook, bpf_unprobe() must not fail if it has
already been executed. A global flag 'is_probing' tracks the probing
state; bpf_unprobe() does nothing if the flag is unset.

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-bpf.c |  8 
 tools/perf/util/bpf-loader.c | 48 
 tools/perf/util/bpf-loader.h |  2 ++
 3 files changed, 58 insertions(+)

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index e922921..95e0c65 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -157,10 +157,18 @@ static int cmd_bpf_record(int argc, const char **argv,
}
}
 
+   if (bpf_probe()) {
+   pr_err("bpf: failed to probe\n");
+   goto errout;
+   }
+
return start_bpf_record(argc, argv);
 usage:
usage_with_options(bpf_record_usage, options);
return -1;
+errout:
+   bpf_clear();
+   return -1;
 }
 
 
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 3dc9b61..c820d1a 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -160,9 +160,57 @@ void bpf_clear(void)
 {
size_t i;
 
+   bpf_unprobe();
for (i = 0; i < params.nr_events; i++)
clear_perf_probe_event(&params.event_array[i]);
 
for (i = 0; i < params.nr_objects; i++)
bpf_close_object(params.objects[i]);
 }
+
+static bool is_probing = false;
+
+int bpf_unprobe(void)
+{
+   struct strlist *dellist;
+   int ret;
+
+   if (!is_probing)
+   return 0;
+
+   dellist = strlist__new(true, PERF_BPF_PROBE_GROUP ":*");
+   if (!dellist) {
+   pr_err("Failed to create dellist when unprobing\n");
+   return -ENOMEM;
+   }
+
+   ret = del_perf_probe_events(dellist);
+   strlist__delete(dellist);
+   if (ret < 0)
+   pr_err("  Error: failed to delete events: %s\n",
+   strerror(-ret));
+   else
+   is_probing = false;
+   return ret < 0 ? ret : 0;
+}
+
+static void bpf_unprobe_atexit(void)
+{
+   bpf_unprobe();
+}
+
+int bpf_probe(void)
+{
+   int err = add_perf_probe_events(params.event_array,
+   params.nr_events,
+   MAX_PROBES, 0);
+   /* add_perf_probe_events() returns a negative value on failure */
+   if (err < 0)
+   pr_err("bpf probe: failed to probe events\n");
+   else {
+   is_probing = true;
+   atexit(bpf_unprobe_atexit);
+   }
+
+   return err < 0 ? err : 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 74eebc3..30dea2e 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -8,6 +8,8 @@
 #define __BPF_LOADER_H
 
 int bpf_prepare_load(const char *filename);
+int bpf_probe(void);
+int bpf_unprobe(void);
 
 void bpf_clear(void);
 #endif
-- 
1.8.3.4
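
The idempotent cleanup described in the commit message above can be sketched in isolation. This is only a minimal illustration, assuming hypothetical demo_* names and a counter in place of the real add_perf_probe_events()/del_perf_probe_events() calls; it is not perf code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the perf probe machinery; not perf's API. */
static bool is_probing;
static int nr_cleanups;	/* counts how often real cleanup work ran */

/* Remove the probes once; any later call is a no-op. */
static int demo_unprobe(void)
{
	if (!is_probing)
		return 0;

	nr_cleanups++;		/* stands in for del_perf_probe_events() */
	is_probing = false;
	return 0;
}

/* atexit() handlers take no arguments and return nothing. */
static void demo_unprobe_atexit(void)
{
	demo_unprobe();
}

/* Create the probes and arrange for cleanup at process exit. */
static int demo_probe(void)
{
	is_probing = true;	/* stands in for add_perf_probe_events() */
	return atexit(demo_unprobe_atexit);
}
```

Calling demo_unprobe() explicitly and then letting the atexit handler run performs the cleanup work only once, which is the property the 'is_probing' flag is meant to provide.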


[RFC PATCH v2 03/37] tools build: Allow other override features to check.

2015-05-15 Thread Wang Nan

Replace the strong binding (=) of FEATURE_TESTS and FEATURE_DISPLAY with
a weak binding (?=). This lets other makefiles that include
tools/build/Makefile.feature restrict the check to a limited set of
features.

Signed-off-by: Wang Nan 
---
 tools/build/Makefile.feature | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 3a0b0ca..2975632 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -27,7 +27,7 @@ endef
 #   the rule that uses them - an example for that is the 'bionic'
 #   feature check. ]
 #
-FEATURE_TESTS =\
+FEATURE_TESTS ?=   \
backtrace   \
dwarf   \
fortify-source  \
@@ -53,7 +53,7 @@ FEATURE_TESTS =   \
zlib\
lzma
 
-FEATURE_DISPLAY =  \
+FEATURE_DISPLAY ?= \
dwarf   \
glibc   \
gtk2\
-- 
1.8.3.4


[RFC PATCH v2 07/37] tools lib bpf: defines basic interface.

2015-05-15 Thread Wang Nan

bpf_open_object() and bpf_close_object() are the open and close
functions for eBPF object files. 'struct bpf_object' is the handle for
one object file. Its internal structure is hidden from the user.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 11 +++
 tools/lib/bpf/libbpf.h |  8 
 2 files changed, 19 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d7a7869..f8decff 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -56,3 +57,13 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
__pr_info = info;
__pr_debug = debug;
 }
+
+struct bpf_object *bpf_open_object(const char *path)
+{
+   return NULL;
+}
+
+void bpf_close_object(struct bpf_object *object)
+{
+   return;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index eb306c0..e523ae9 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -9,8 +9,16 @@
 #ifndef __BPF_LIBBPF_H
 #define __BPF_LIBBPF_H
 
+#include 
+
 void libbpf_set_print(int (*warn)(const char *format, ...),
  int (*info)(const char *format, ...),
  int (*debug)(const char *format, ...));
 
+/* Internal layout is hidden from the user */
+struct bpf_object;
+
+struct bpf_object *bpf_open_object(const char *path);
+void bpf_close_object(struct bpf_object *object);
+
 #endif
-- 
1.8.3.4
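
The opaque-handle interface that this patch introduces can be sketched as follows. This is a minimal illustration under assumptions: the demo_* names, the 'path' member and the demo_object_path() accessor are hypothetical and do not reflect the real layout of 'struct bpf_object'.

```c
#include <stdlib.h>
#include <string.h>

/* "Header" part: callers only ever see an incomplete type. */
struct demo_object;
struct demo_object *demo_open_object(const char *path);
const char *demo_object_path(const struct demo_object *obj);
void demo_close_object(struct demo_object *obj);

/* "Implementation" part: the layout stays private to this file. */
struct demo_object {
	char *path;
};

struct demo_object *demo_open_object(const char *path)
{
	struct demo_object *obj;
	size_t len;

	if (!path)
		return NULL;

	obj = calloc(1, sizeof(*obj));
	if (!obj)
		return NULL;

	/* Keep a private copy so the caller's string may go away. */
	len = strlen(path) + 1;
	obj->path = malloc(len);
	if (!obj->path) {
		free(obj);
		return NULL;
	}
	memcpy(obj->path, path, len);
	return obj;
}

/* Accessor: the only way for callers to read the private member. */
const char *demo_object_path(const struct demo_object *obj)
{
	return obj ? obj->path : NULL;
}

void demo_close_object(struct demo_object *obj)
{
	if (!obj)
		return;
	free(obj->path);
	free(obj);
}
```

Because callers never see the structure definition, members can be added or reordered later without breaking users of the library.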


[RFC PATCH v2 25/37] tools lib bpf: accessors for struct bpf_object.

2015-05-15 Thread Wang Nan

This patch adds an accessor which allows the caller to get the number
of programs in an object file.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 9 +
 tools/lib/bpf/libbpf.h | 3 +++
 2 files changed, 12 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d770adc..89c725a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -964,6 +964,15 @@ void bpf_close_object(struct bpf_object *obj)
free(obj);
 }
 
+int bpf_obj_get_prog_cnt(struct bpf_object *obj, size_t *pcnt)
+{
+   if (!obj || !pcnt)
+   return -EINVAL;
+
+   *pcnt = obj->nr_programs;
+   return 0;
+}
+
 static inline struct bpf_program *
 handler_to_prog(struct bpf_prog_handler *handler)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b451d1a..31ff5d9 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -26,6 +26,9 @@ void bpf_close_object(struct bpf_object *object);
 int bpf_load_object(struct bpf_object *obj);
 int bpf_unload_object(struct bpf_object *obj);
 
+/* Accessors of bpf_object */
+int bpf_obj_get_prog_cnt(struct bpf_object *obj, size_t *pcnt);
+
 /* Accessors of bpf_program. */
 struct bpf_prog_handler;
 struct bpf_prog_handler *bpf_next_prog(struct bpf_object *obj,
-- 
1.8.3.4


[RFC PATCH v2 23/37] tools lib bpf: load bpf programs in object file into kernel.

2015-05-15 Thread Wang Nan

This patch uses the previously introduced bpf_load_program() to load
the programs in the ELF file into the kernel. The resulting file
descriptor is stored in the 'fd' field of 'struct bpf_program'.

During loading, it allocates a log buffer and frees it before returning.
Note that the buffer is not passed to bpf_load_program() if the first
load attempt is successful. A statically allocated log buffer is not
used, to avoid potential multi-threading problems.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 79 ++
 1 file changed, 79 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5e25ea7..e8ef78e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -95,6 +95,8 @@ struct bpf_program {
int map_idx;
} *reloc_desc;
int nr_reloc;
+
+   int fd;
 };
 
 struct bpf_object {
@@ -128,10 +130,24 @@ struct bpf_object {
 };
 #define obj_elf_valid(o)   ((o)->elf.fd >= 0)
 
+static void bpf_unload_program(struct bpf_program *prog)
+{
+   if (!prog)
+   return;
+
+   if (prog->fd >= 0) {
+   close(prog->fd);
+   prog->fd = -1;
+   }
+}
+
 static void bpf_clear_program(struct bpf_program *prog)
 {
if (!prog)
return;
+
+   bpf_unload_program(prog);
+
if (prog->section_name) {
free(prog->section_name);
prog->section_name = NULL;
@@ -200,6 +216,7 @@ bpf_program_alloc(struct bpf_object *obj, void *data, size_t size,
memcpy(prog->insns, data,
   prog->insns_cnt * sizeof(struct bpf_insn));
prog->idx = idx;
+   prog->fd = -1;
 
return prog;
 out:
@@ -756,6 +773,59 @@ static int bpf_obj_collect_reloc(struct bpf_object *obj)
return 0;
 }
 
+static int
+bpf_obj_load_prog(struct bpf_object *obj, struct bpf_program *prog)
+{
+   int fd, err;
+   char *log_buf;
+
+   log_buf = malloc(BPF_LOG_BUF_SIZE);
+   if (!log_buf)
+   pr_warning("Alloc log buffer for bpf loader error, "
+  "continue without log\n");
+
+   fd = bpf_load_program(BPF_PROG_TYPE_KPROBE, prog->insns,
+ prog->insns_cnt, obj->license,
+ obj->kern_version, log_buf,
+ BPF_LOG_BUF_SIZE);
+
+   if (fd >= 0) {
+   prog->fd = fd;
+   pr_debug("load bpf program '%s': fd = %d\n",
+prog->section_name, prog->fd);
+   err = 0;
+   goto out;
+   }
+
+   err = -EINVAL;
+   pr_warning("load bpf program '%s' failed: %s\n",
+  prog->section_name, strerror(errno));
+
+   if (log_buf)
+   pr_warning("bpf: load: failed to load program '%s':\n"
+  "-- BEGIN DUMP LOG ---\n%s\n-- END LOG --\n",
+   prog->section_name, log_buf);
+
+out:
+   if (log_buf)
+   free(log_buf);
+   return err;
+}
+
+static int
+bpf_obj_load_progs(struct bpf_object *obj)
+{
+   size_t i;
+   int err;
+
+   for (i = 0; i < obj->nr_programs; i++) {
+   err = bpf_obj_load_prog(obj, &obj->programs[i]);
+   if (err)
+   return err;
+   }
+   return 0;
+}
+
 static int bpf_obj_validate(struct bpf_object *obj)
 {
if (obj->kern_version == 0) {
@@ -819,6 +889,13 @@ int bpf_unload_object(struct bpf_object *obj)
free(obj->maps_fds);
}
 
+   if (obj->programs) {
+   size_t i;
+
+   for (i = 0; i < obj->nr_programs; i++)
+   bpf_unload_program(&obj->programs[i]);
+   }
+
return 0;
 }
 
@@ -831,6 +908,8 @@ int bpf_load_object(struct bpf_object *obj)
goto out;
if (bpf_obj_relocate(obj))
goto out;
+   if (bpf_obj_load_progs(obj))
+   goto out;
 
return 0;
 out:
-- 
1.8.3.4


[RFC PATCH v2 18/37] tools lib bpf: clean elf memory after loading.

2015-05-15 Thread Wang Nan

After all eBPF programs in an object file are loaded, the related ELF
information is no longer needed. Close the object file and free that
memory.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ded96cb..9ed8cca 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -687,8 +687,8 @@ struct bpf_object *bpf_open_object(const char *path)
if (bpf_obj_validate(obj))
goto out;
 
+   bpf_obj_clear_elf(obj);
return obj;
-
 out:
bpf_close_object(obj);
return NULL;
-- 
1.8.3.4


[RFC PATCH v2 04/37] tools include: add __aligned_u64 to types.h.

2015-05-15 Thread Wang Nan

Following patches will introduce linux/bpf.h to a new libbpf library,
which requires the definition of __aligned_u64. This patch adds it to
the common types.h for tools.

Signed-off-by: Wang Nan 
---
 tools/include/linux/types.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/include/linux/types.h b/tools/include/linux/types.h
index b5cf25e..10a2cdc 100644
--- a/tools/include/linux/types.h
+++ b/tools/include/linux/types.h
@@ -60,6 +60,11 @@ typedef __u32 __bitwise __be32;
 typedef __u64 __bitwise __le64;
 typedef __u64 __bitwise __be64;
 
+/* Taken from uapi/linux/types.h. Required by linux/bpf.h */
+#ifndef __aligned_u64
+# define __aligned_u64 __u64 __attribute__((aligned(8)))
+#endif
+
 struct list_head {
struct list_head *next, *prev;
 };
-- 
1.8.3.4
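
The effect of the alignment attribute can be illustrated with a small sketch. It assumes GCC or Clang; 'demo_aligned_u64' and 'struct demo_attr' are hypothetical names used here instead of the kernel's __aligned_u64 and its real users in linux/bpf.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for __aligned_u64; the name is illustrative only. */
typedef uint64_t demo_aligned_u64 __attribute__((aligned(8)));

/*
 * A 32-bit member followed by an 8-byte aligned 64-bit member. The
 * attribute forces 4 bytes of padding after prog_type even on ABIs
 * where a plain 64-bit integer is only 4-byte aligned.
 */
struct demo_attr {
	uint32_t prog_type;
	demo_aligned_u64 insns;	/* e.g. a user pointer carried as 64 bits */
};

static size_t demo_insns_offset(void)
{
	return offsetof(struct demo_attr, insns);
}

static size_t demo_attr_size(void)
{
	return sizeof(struct demo_attr);
}
```

With the attribute in place, the member offset is 8 and the structure size is 16 regardless of the target's default 64-bit alignment, so 32-bit and 64-bit builds agree on the layout.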


[RFC PATCH v2 21/37] tools lib bpf: relocation programs.

2015-05-15 Thread Wang Nan

If an eBPF program accesses a map, LLVM generates a load instruction
with a relocation. To make the map usable, the relocation must be
resolved by replacing the original instructions with map loading
instructions.

Based on the relocation descriptions collected during the 'opening'
phase, this patch rewrites those instructions into map loads that carry
the correct map fd.

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 48 
 1 file changed, 48 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6ff4cb6..5e25ea7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -676,6 +676,52 @@ bpf_obj_create_maps(struct bpf_object *obj)
return 0;
 }
 
+static int
+bpf_program_relocate(struct bpf_object *obj, struct bpf_program *prog)
+{
+   int i;
+
+   if (!prog || !prog->reloc_desc)
+   return 0;
+
+   for (i = 0; i < prog->nr_reloc; i++) {
+   int insn_idx, map_idx;
+   struct bpf_insn *insns = prog->insns;
+
+   insn_idx = prog->reloc_desc[i].insn_idx;
+   map_idx = prog->reloc_desc[i].map_idx;
+
+   if (insn_idx >= (int)prog->insns_cnt) {
+   pr_warning("relocation out of range: '%s'\n",
+  prog->section_name);
+   return -ERANGE;
+   }
+   insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+   insns[insn_idx].imm = obj->maps_fds[map_idx];
+   }
+
+   return 0;
+}
+
+static int
+bpf_obj_relocate(struct bpf_object *obj)
+{
+   struct bpf_program *prog;
+   size_t i;
+   int err;
+
+   for (i = 0; i < obj->nr_programs; i++) {
+   prog = &obj->programs[i];
+
+   if ((err = bpf_program_relocate(obj, prog))) {
+   pr_warning("failed to relocate '%s'\n",
+  prog->section_name);
+   return err;
+   }
+   }
+   return 0;
+}
+
 static int bpf_obj_collect_reloc(struct bpf_object *obj)
 {
int i, err;
@@ -783,6 +829,8 @@ int bpf_load_object(struct bpf_object *obj)
 
if (bpf_obj_create_maps(obj))
goto out;
+   if (bpf_obj_relocate(obj))
+   goto out;
 
return 0;
 out:
-- 
1.8.3.4


[RFC PATCH v2 27/37] tools perf: make perf depend on libbpf.

2015-05-15 Thread Wang Nan

By adding libbpf to perf's Makefile, this patch enables perf to build
libbpf as part of its own build if libelf is found and NO_LIBELF is not
set. The newly introduced code is similar to the libapi and
libtraceevent build rules in Makefile.perf.

Signed-off-by: Wang Nan 
---
 tools/perf/Makefile.perf | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index c43a205..c69821c 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -123,6 +123,7 @@ STRIP   = strip
 
 LIB_DIR  = $(srctree)/tools/lib/api/
 TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
+BPF_DIR = $(srctree)/tools/lib/bpf/
 
 # include config/Makefile by default and rule out
 # non-config cases
@@ -158,6 +159,7 @@ strip-libs = $(filter-out -l%,$(1))
 
 ifneq ($(OUTPUT),)
   TE_PATH=$(OUTPUT)
+  BPF_PATH=$(OUTPUT)
 ifneq ($(subdir),)
   LIB_PATH=$(OUTPUT)/../lib/api/
 else
@@ -166,6 +168,7 @@ endif
 else
   TE_PATH=$(TRACE_EVENT_DIR)
   LIB_PATH=$(LIB_DIR)
+  BPF_PATH=$(BPF_DIR)
 endif
 
 LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
@@ -174,6 +177,9 @@ export LIBTRACEEVENT
 LIBAPI = $(LIB_PATH)libapi.a
 export LIBAPI
 
+LIBBPF = $(BPF_PATH)/libbpf.a
+export LIBBPF
+
 # python extension build directories
 PYTHON_EXTBUILD := $(OUTPUT)python_ext_build/
 PYTHON_EXTBUILD_LIB := $(PYTHON_EXTBUILD)lib/
@@ -225,6 +231,11 @@ export PERL_PATH
 LIB_FILE=$(OUTPUT)libperf.a
 
 PERFLIBS = $(LIB_FILE) $(LIBAPI) $(LIBTRACEEVENT)
+ifndef NO_LIBELF
+  ifeq ($(feature-libelf), 1)
+PERFLIBS += $(LIBBPF)
+  endif
+endif
 
 # We choose to avoid "if .. else if .. else .. endif endif"
 # because maintaining the nesting to match is a pain.  If
@@ -387,6 +398,13 @@ $(LIBAPI)-clean:
$(call QUIET_CLEAN, libapi)
$(Q)$(MAKE) -C $(LIB_DIR) O=$(OUTPUT) clean >/dev/null
 
+$(LIBBPF): FORCE
+   $(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) $(OUTPUT)libbpf.a
+
+$(LIBBPF)-clean:
+   $(call QUIET_CLEAN, libbpf)
+   $(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) clean >/dev/null
+
 help:
@echo 'Perf make targets:'
@echo '  doc- make *all* documentation (see below)'
@@ -525,7 +543,7 @@ config-clean:
$(call QUIET_CLEAN, config)
$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
 
-clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
+clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean config-clean
$(call QUIET_CLEAN, core-objs)  $(RM) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)$(RM) .config-detected
-- 
1.8.3.4


[RFC PATCH v2 31/37] tools perf: config probe points of eBPF programs during prepartion.

2015-05-15 Thread Wang Nan

This patch parses the section name of each program and creates the
corresponding 'struct perf_probe_event' structure.

parse_perf_probe_command() does the main parsing work. The parsing
results are stored in a global array because add_perf_probe_events()
is not reentrant. A following patch will use add_perf_probe_events()
to insert kprobes; it accepts an array of 'struct perf_probe_event'
and does all the work in one call.

Define PERF_BPF_PROBE_GROUP as "perf_bpf_probe", which will be used
as the group name of all eBPF probing points.

Signed-off-by: Wang Nan 
---
 tools/perf/util/bpf-loader.c | 74 
 1 file changed, 74 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 67bfb62..3dc9b61 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -29,6 +29,7 @@ DEFINE_PRINT_FN(debug, 1)
 static bool libbpf_inited = false;
 
 #define MAX_OBJECTS 128
+#define PERF_BPF_PROBE_GROUP "perf_bpf_probe"
 
 struct {
struct bpf_object *objects[MAX_OBJECTS];
@@ -36,10 +37,78 @@ struct {
 
struct {
struct bpf_prog_handler *prog;
+   struct perf_probe_event *pevent;
} progs[MAX_PROBES];
size_t nr_progs;
+
+   struct perf_probe_event event_array[MAX_PROBES];
+   size_t nr_events;
 } params;
 
+static struct perf_probe_event *
+alloc_perf_probe_event(void)
+{
+   struct perf_probe_event *pev;
+   int n = params.nr_events;
+
+   if (n >= MAX_PROBES) {
+   pr_err("bpf: too many events, increase MAX_PROBES\n");
+   return NULL;
+   }
+
+   params.nr_events = n + 1;
+   pev = &params.event_array[n];
+   bzero(pev, sizeof(*pev));
+   return pev;
+}
+
+static int
+bpf_do_config(size_t prog_idx, const char *config_str)
+{
+   struct perf_probe_event *pev = alloc_perf_probe_event();
+   int err = 0;
+   
+   if (!pev)
+   return -ENOMEM;
+
+   if ((err = parse_perf_probe_command(config_str, pev)) < 0) {
+   pr_err("bpf config: %s is not a valid config string\n",
+   config_str);
+   /* parse failed, don't need clear pev. */
+   return -EINVAL;
+   }
+
+   if (pev->group && strcmp(pev->group, PERF_BPF_PROBE_GROUP)) {
+   pr_err("bpf config: '%s': group for event is set "
+  "and not '%s'.\n", config_str,
+  PERF_BPF_PROBE_GROUP);
+   err = -EINVAL;
+   goto errout;
+   } else if (!pev->group)
+   pev->group = strdup(PERF_BPF_PROBE_GROUP);
+
+   if (!pev->group) {
+   pr_err("bpf config: strdup failed\n");
+   err = -ENOMEM;
+   goto errout;
+   }
+
+   if (!pev->event) {
+   pr_err("bpf config: '%s': event name is missing\n",
+   config_str);
+   err = -EINVAL;
+   goto errout;
+   }
+
+   pr_debug("bpf config: config '%s' ok\n", config_str);
+   params.progs[prog_idx].pevent = pev;
+   return 0;
+errout:
+   if (pev)
+   clear_perf_probe_event(pev);
+   return err;
+}
+
 int bpf_prepare_load(const char *filename)
 {
struct bpf_object *obj;
@@ -81,6 +150,8 @@ int bpf_prepare_load(const char *filename)
return -EINVAL;
}
pr_debug("bpf: add program '%s'\n", title);
+
+   bpf_do_config(params.nr_progs - 1, title);
}
return 0;
 }
@@ -89,6 +160,9 @@ void bpf_clear(void)
 {
size_t i;
 
+   for (i = 0; i < params.nr_events; i++)
+   clear_perf_probe_event(&params.event_array[i]);
+
for (i = 0; i < params.nr_objects; i++)
bpf_close_object(params.objects[i]);
 }
-- 
1.8.3.4


[RFC PATCH v2 02/37] tools lib traceevent: install libtraceevent.a into libdir.

2015-05-15 Thread Wang Nan

Before this patch, 'make install' installs libraries into bindir:

  $ make install DESTDIR=./tree
   INSTALL  trace_plugins
   INSTALL  libtraceevent.a
   INSTALL  libtraceevent.so
  $ find ./tree
   ./tree/
   ./tree/usr
   ./tree/usr/local
   ./tree/usr/local/bin
   ./tree/usr/local/bin/libtraceevent.a
   ./tree/usr/local/bin/libtraceevent.so
   ...

/usr/local/lib (or lib64) would be a better place.

This patch replaces 'bin' with libdir. For __LP64__ builds, libraries
are installed to /usr/local/lib64; for other builds, to
/usr/local/lib.

After applying this patch:

  $ make install DESTDIR=./tree
   INSTALL  trace_plugins
   INSTALL  libtraceevent.a
   INSTALL  libtraceevent.so
  $ find ./tree
   ./tree
   ./tree/usr
   ./tree/usr/local
   ./tree/usr/local/lib64
   ./tree/usr/local/lib64/libtraceevent.a
   ./tree/usr/local/lib64/traceevent
   ./tree/usr/local/lib64/traceevent/plugins
   ./tree/usr/local/lib64/traceevent/plugins/plugin_mac80211.so
   ./tree/usr/local/lib64/traceevent/plugins/plugin_hrtimer.so
   ...
   ./tree/usr/local/lib64/libtraceevent.so

Signed-off-by: Wang Nan 
---
 tools/lib/traceevent/Makefile | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index d410da3..8464039 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -34,9 +34,15 @@ INSTALL = install
 DESTDIR ?=
 DESTDIR_SQ = '$(subst ','\'',$(DESTDIR))'
 
+LP64 := $(shell echo __LP64__ | ${CC} ${CFLAGS} -E -x c - | tail -n 1)
+ifeq ($(LP64), 1)
+  libdir_relative = lib64
+else
+  libdir_relative = lib
+endif
+
 prefix ?= /usr/local
-bindir_relative = bin
-bindir = $(prefix)/$(bindir_relative)
+libdir = $(prefix)/$(libdir_relative)
 man_dir = $(prefix)/share/man
 man_dir_SQ = '$(subst ','\'',$(man_dir))'
 
@@ -58,7 +64,7 @@ ifeq ($(prefix),$(HOME))
 override plugin_dir = $(HOME)/.traceevent/plugins
 set_plugin_dir := 0
 else
-override plugin_dir = $(prefix)/lib/traceevent/plugins
+override plugin_dir = $(libdir)/traceevent/plugins
 endif
 endif
 
@@ -85,11 +91,11 @@ srctree := $(patsubst %/,%,$(dir $(srctree)))
 #$(info Determined 'srctree' to be $(srctree))
 endif
 
-export prefix bindir src obj
+export prefix libdir src obj
 
 # Shell quotes
-bindir_SQ = $(subst ','\'',$(bindir))
-bindir_relative_SQ = $(subst ','\'',$(bindir_relative))
+libdir_SQ = $(subst ','\'',$(libdir))
+libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
 plugin_dir_SQ = $(subst ','\'',$(plugin_dir))
 
 LIB_FILE = libtraceevent.a libtraceevent.so
@@ -240,7 +246,7 @@ endef
 
 install_lib: all_cmd install_plugins
$(call QUIET_INSTALL, $(LIB_FILE)) \
-   $(call do_install,$(LIB_FILE),$(bindir_SQ))
+   $(call do_install,$(LIB_FILE),$(libdir_SQ))
 
 install_plugins: $(PLUGINS)
$(call QUIET_INSTALL, trace_plugins) \
-- 
1.8.3.4
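
The Makefile test above relies on the compiler predefining __LP64__ on LP64 targets: preprocessing the bare token then yields 1, and otherwise the token comes back unchanged. The same decision can be sketched in C; demo_libdir_relative() is a hypothetical helper name, not part of libtraceevent.

```c
/*
 * Return the library directory name that a build for this ABI would
 * use. GCC and Clang predefine __LP64__ when long and pointers are
 * 64 bits wide.
 */
static const char *demo_libdir_relative(void)
{
#ifdef __LP64__
	return "lib64";
#else
	return "lib";
#endif
}
```

Because the check runs through ${CC} with ${CFLAGS}, a cross compiler or a flag such as -m32 changes the result, which is presumably why the patch asks the compiler rather than inspecting the build host.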


Re: [PATCH v2 1/1] iio: ltr501: Add light channel support

2015-05-15 Thread Peter Meerwald


> Added support to calculate lux value from visible
> and IR spectrum adc count values. Also added IIO_LIGHT
> channel to enable user read the lux value directly
> from device using illuminance input ABI.

minor comment below
 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> ---
>  drivers/iio/light/ltr501.c | 57 
> ++
>  1 file changed, 57 insertions(+)
> 
> diff --git a/drivers/iio/light/ltr501.c b/drivers/iio/light/ltr501.c
> index ca4bf47..449b0fd 100644
> --- a/drivers/iio/light/ltr501.c
> +++ b/drivers/iio/light/ltr501.c
> @@ -66,6 +66,9 @@
>  
>  #define LTR501_REGMAP_NAME "ltr501_regmap"
>  
> +#define LTR501_LUX_CONV(vis_coeff, vis_data, ir_coeff, ir_data) \
> + ((vis_coeff * vis_data) - (ir_coeff * ir_data))
> +
>  static const int int_time_mapping[] = {10, 5, 20, 40};
>  
>  static const struct reg_field reg_field_it =
> @@ -298,6 +301,29 @@ static int ltr501_ps_read_samp_period(struct ltr501_data *data, int *val)
>   return IIO_VAL_INT;
>  }
>  
> +/* IR and visible spectrum coeff's are given in data sheet */
> +static unsigned long ltr501_calculate_lux(u16 vis_data, u16 ir_data)
> +{
> + unsigned long ratio, lux;
> +
> + if (vis_data == 0)
> + return 0;
> +
> + /* multiply numerator by 100 to avoid handling ratio < 1 */
> + ratio = DIV_ROUND_UP(ir_data * 100, ir_data + vis_data);
> +
> + if (ratio < 45)
> + lux = LTR501_LUX_CONV(1774, vis_data, -1105, ir_data);
> + else if (ratio >= 45 && ratio < 64)
> + lux = LTR501_LUX_CONV(3772, vis_data, 1336, ir_data);
> + else if (ratio >= 64 && ratio < 85)
> + lux = LTR501_LUX_CONV(1690, vis_data, 169, ir_data);
> + else
> + lux = 0;
> +
> + return lux / 1000;
> +}
> +
>  static int ltr501_drdy(struct ltr501_data *data, u8 drdy_mask)
>  {
>   int tries = 100;
> @@ -548,7 +574,20 @@ static const struct iio_event_spec ltr501_pxs_event_spec[] = {
>   .num_event_specs = _evsize,\
>  }
>  
> +#define LTR501_LIGHT_CHANNEL() { \
> + .type = IIO_LIGHT, \
> + .info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED), \
> + .scan_index = -1, \

I think there is no need for .scan_type if scan_index == -1

bad examples include: adc/ad7298.c and pressure/st_pressure_core.c


> + .scan_type = { \
> + .sign = 'u', \
> + .realbits = 16, \
> + .storagebits = 16, \
> + .endianness = IIO_CPU, \
> + }, \
> +}
> +
>  static const struct iio_chan_spec ltr501_channels[] = {
> + LTR501_LIGHT_CHANNEL(),
>   LTR501_INTENSITY_CHANNEL(0, LTR501_ALS_DATA0, IIO_MOD_LIGHT_BOTH, 0,
>ltr501_als_event_spec,
>ARRAY_SIZE(ltr501_als_event_spec)),
> @@ -576,6 +615,7 @@ static const struct iio_chan_spec ltr501_channels[] = {
>  };
>  
>  static const struct iio_chan_spec ltr301_channels[] = {
> + LTR501_LIGHT_CHANNEL(),
>   LTR501_INTENSITY_CHANNEL(0, LTR501_ALS_DATA0, IIO_MOD_LIGHT_BOTH, 0,
>ltr501_als_event_spec,
>ARRAY_SIZE(ltr501_als_event_spec)),
> @@ -596,6 +636,23 @@ static int ltr501_read_raw(struct iio_dev *indio_dev,
>   int ret, i;
>  
>   switch (mask) {
> + case IIO_CHAN_INFO_PROCESSED:
> + if (iio_buffer_enabled(indio_dev))
> + return -EBUSY;
> +
> + switch (chan->type) {
> + case IIO_LIGHT:
> + mutex_lock(&data->lock_als);
> + ret = ltr501_read_als(data, buf);
> + mutex_unlock(&data->lock_als);
> + if (ret < 0)
> + return ret;
> + *val = ltr501_calculate_lux(le16_to_cpu(buf[1]),
> + le16_to_cpu(buf[0]));
> + return IIO_VAL_INT;
> + default:
> + return -EINVAL;
> + }
>   case IIO_CHAN_INFO_RAW:
>   if (iio_buffer_enabled(indio_dev))
>   return -EBUSY;
> 

-- 

Peter Meerwald
+43-664-218 (mobile)

Re: [PATCH v2 1/2] tools lib traceevent: Export dynamic symbols used by traceevent plugins

2015-05-15 Thread He Kuang




On 2015/5/14 21:31, Jiri Olsa wrote:

On Thu, May 14, 2015 at 08:56:15PM +0800, He Kuang wrote:

SNIP


It seems new targets are needed. In the v2 patch,


hum, I dont get it.. why ?

dynamic-list-file gets rebuilt any time plugins are rebuilt..
why not keep just the 'plugins' dependency?


You can test your patch with the following steps:

$ touch ../lib/traceevent/plugin_function.c
$ make
   CC   plugin_function.o
   LD   plugin_function-in.o
   LINK plugin_function.so
   GEN  libtraceevent-dynamic-list

perf is not rebuilt. There should be a 'GEN perf', right?


hum, right.. so this is separated bug that was there even
without your change. I tried to kick my change to address
that and ended up with what you sent in v2 ;-)

I have another comment for your v2, which I'll send right away

thanks,
jirka



Ok, please review the new version.


Re: [RFC PATCH v2 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-15 Thread Ingo Molnar


Just a small stylistic side note:

> Wang Nan (37):
>   tools perf: set vmlinux_path__nr_entries to 0 in vmlinux_path__exit.
>   tools lib traceevent: install libtraceevent.a into libdir.
>   tools build: Allow other override features to check.
>   tools include: add __aligned_u64 to types.h.
>   tools lib bpf: introduce 'bpf' library to tools.
>   tools lib bpf: allow set printing function.
>   tools lib bpf: defines basic interface.
>   tools lib bpf: open eBPF object file and do basic validation.
>   tools lib bpf: check swap according to EHDR.
>   tools lib bpf: iterater over elf sections to collect information.
>   tools lib bpf: collect version and license from ELF.
>   tools lib bpf: collect map definitions.
>   tools lib bpf: collect config section in object.
>   tools lib bpf: collect symbol table in object files.
>   tools lib bpf: collect bpf programs from object files.
>   tools lib bpf: collect relocation sections from object file.
>   tools lib bpf: collect relocation instructions for each program.
>   tools lib bpf: clean elf memory after loading.
>   tools lib bpf: add bpf.c/h for common bpf operations.
>   tools lib bpf: create maps needed by object file.
>   tools lib bpf: relocation programs.
>   tools lib bpf: introduce bpf_load_program to bpf.c.
>   tools lib bpf: load bpf programs in object file into kernel.
>   tools lib bpf: accessors of bpf_program.
>   tools lib bpf: accessors for struct bpf_object.
>   tools perf: Add new 'perf bpf' command.
>   tools perf: make perf depend on libbpf.
>   tools perf: add 'perf bpf record' subcommand.
>   tools perf: add bpf-loader and open elf object files.
>   tools perf: collect all bpf programs.
>   tools perf: config probe points of eBPF programs during prepartion.
>   tools perf bpf: probe at kprobe points.
>   tools perf bpf: load eBPF object into kernel.
>   tools perf: add a bpf_wrapper global flag.
>   tools perf: add bpf_fd field to evsel and introduce new event syntax.
>   tools perf: generate event argv.
>   tools perf bpf: passes generated arguments to cmd_record.

The titles of the changes have numerous problems and inconsistencies:

 - use consistent capitalization, i.e.:

tools perf: Collect all bpf programs

 - don't use period at the end, i.e.:

tools perf: Generate event argv

 - use consistent present tense verbs, i.e.:

-  tools lib bpf: defines basic interface.
+  tools lib bpf: Define basic interface

 - Always use verbs! I.e. these are bad:

-   tools perf: config probe points of eBPF programs during prepartion.
-   tools lib bpf: relocation programs.

 - take a look at 'git log tools/perf' to see what the established 
   title style is. For example it's not 'tools perf' but 'perf tools', 
   etc.

etc.

There's not a single patch in this series that has a proper title. 
This makes the shortlog a difficult read and gives a bad first 
impression. Please fix.
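
Two of the points above (capitalized summary, no trailing period) can be checked mechanically. The following is only a minimal sketch: `title_is_well_formed()` is a hypothetical helper, not an existing kernel or perf script, and it treats everything after the first ": " as the summary.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any kernel tooling: returns true if
 * the summary (the text after the first ": ", or the whole title when
 * there is no prefix) starts with a capital letter and the title does
 * not end with a period.
 */
static bool title_is_well_formed(const char *title)
{
	size_t len = strlen(title);
	const char *sep;
	const char *summary;

	if (len == 0 || title[len - 1] == '.')
		return false;

	/* Skip the subsystem prefix, e.g. "perf tools: ". */
	sep = strstr(title, ": ");
	summary = sep ? sep + 2 : title;

	return isupper((unsigned char)summary[0]) != 0;
}
```

It does not check for a verb or for the 'perf tools' prefix order; those still need a look at 'git log tools/perf'.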

Thanks,

Ingo

Re: [FYI] tux3: Core changes

2015-05-15 Thread Mel Gorman

On Thu, May 14, 2015 at 05:06:39PM -0700, Daniel Phillips wrote:
> Hi Rik,
> 
> Added Mel, Andrea and Peterz to CC as interested parties. There are
> probably others, please just jump in.
> 
> On 05/14/2015 05:59 AM, Rik van Riel wrote:
> > On 05/14/2015 04:26 AM, Daniel Phillips wrote:
> >> Hi Rik,
> >>
> >> Our linux-tux3 tree currently carries this 652 line diff
> >> against core, to make Tux3 work. This is mainly by Hirofumi, except
> >> the fs-writeback.c hook, which is by me. The main part you may be
> >> interested in is rmap.c, which addresses the issues raised at the
> >> 2013 Linux Storage Filesystem and MM Summit in San Francisco.[1]
> >>
> >>LSFMM: Page forking
> >>http://lwn.net/Articles/548091/
> >>
> >> This is just a FYI. An upcoming Tux3 report will be a tour of the page
> >> forking design and implementation. For now, this is just to give a
> >> general sense of what we have done. We heard there are concerns about
> >> how ptrace will work. I really am not familiar with the issue, could
> >> you please explain what you were thinking of there?
> > 
> > The issue is that things like ptrace, AIO, infiniband
> > RDMA, and other direct memory access subsystems can take
> > a reference to page A, which Tux3 clones into a new page B
> > when the process writes it.
> > 
> > However, while the process now points at page B, ptrace,
> > AIO, infiniband, etc will still be pointing at page A.
> > 
> > This causes the process and the other subsystem to each
> > look at a different page, instead of at shared state,
> > causing ptrace to do nothing, AIO and RDMA data to be
> > invisible (or corrupted), etc...
> 
> Is this a bit like page migration?
> 

No, it's not.

-- 
Mel Gorman
SUSE Labs

linux-next: Tree for May 15

2015-05-15 Thread Stephen Rothwell

Hi all,

Changes since 20150514:

The rcu tree gained a conflict against the net-next tree.

The target-updates tree still had its build failure so I used the version
from next-20150511.

The akpm-current tree lost its build failure.

Non-merge commits (relative to Linus' tree): 4200
 3935 files changed, 186488 insertions(+), 93048 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm
defconfig.

Below is a summary of the state of the merge.

I am currently merging 216 trees (counting Linus' and 30 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (110bc76729d4 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging fixes/master (b94d525e58dc Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging kbuild-current/rc-fixes (c517d838eb7d Linux 4.0-rc1)
Merging arc-current/for-curr (e4140819dadc ARC: signal handling robustify)
Merging arm-current/fixes (3b8786ff7a1b ARM: 8352/1: perf: Fix the pmu node 
name in warning message)
Merging m68k-current/for-linus (b24f670b7f5b m68k/mac: Fix out-of-bounds array 
index in OSS IRQ source initialization)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached 
build errors)
Merging mips-fixes/mips-fixes (1795cd9b3a91 Linux 3.16-rc5)
Merging powerpc-merge-mpe/fixes (ffb2d78eca08 powerpc/mce: fix off by one 
errors in mce event handling)
Merging powerpc-merge/merge (c517d838eb7d Linux 4.0-rc1)
Merging sparc/master (acc455cffa75 sparc64: Setup sysfs to mark LDOM sockets, 
cores and threads correctly)
Merging net/master (91dd93f956b9 netlink: move nl_table in read_mostly section)
Merging ipsec/master (6d7258ca9370 esp6: Use high-order sequence number bits 
for IV generation)
Merging sound-current/for-linus (9b5a4e395c2f ALSA: hda - Add headset mic quirk 
for Dell Inspiron 5548)
Merging pci-current/for-linus (5ebe6afaf005 Linux 4.1-rc2)
Merging wireless-drivers/master (f67382186489 ath9k: fix per-packet tx power 
configuration)
Merging driver-core.current/driver-core-linus (030bbdbf4c83 Linux 4.1-rc3)
Merging tty.current/tty-linus (1a48632ffed6 pty: Fix input race when closing)
Merging usb.current/usb-linus (569192605f56 Merge tag 'usb-serial-4.1-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus)
Merging usb-gadget-fixes/fixes (c94e289f195e usb: gadget: remove incorrect 
__init/__exit annotations)
Merging usb-serial-fixes/usb-linus (82ee3aeb9295 USB: visor: Match I330 phone 
more precisely)
Merging staging.current/staging-linus (ec94efcdadab Merge tag 
'iio-fixes-for-4.1a-take2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus)
Merging char-misc.current/char-misc-linus (0f5b6ec67404 Merge tag 
'extcon-fixes-for-4.1-rc2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into 
char-misc-linus)
Merging input-current/for-linus (48853389f206 Merge branch 'next' into 
for-linus)
Merging crypto-current/master (ec59a65d694e crypto: arm64/sha2-ce - prevent asm 
code finalization in final() path)
Merging ide/master (d681f1166919 ide: remove deprecated use of pci api)
Merging devicetree-current/devicetree/merge (41d9489319f2 drivers/of: Add empty 
ranges quirk for PA-Semi)
Merging rr-fixes/fixes (f47689345931 lguest: update help text.)
Merging vfio-fixes/for-linus (db7d4d7f4021 vfio: Fix runaway interruptible 
timeout)
Merging kselftest-fixes/fixes (e9886ace222e selftests, x86: Rework x86 target 
architecture detection)
Merging drm-intel-fixes/for-linux-next-fixes (364aece01a2d drm/i915: Avoid GPU 
hang w

Re: [FYI] tux3: Core changes

2015-05-15 Thread Mel Gorman

On Thu, May 14, 2015 at 11:06:22PM -0400, Rik van Riel wrote:
> On 05/14/2015 08:06 PM, Daniel Phillips wrote:
> > Hi Rik,
> > 
> > Added Mel, Andrea and Peterz to CC as interested parties. There are
> > probably others, please just jump in.
> > 
> > On 05/14/2015 05:59 AM, Rik van Riel wrote:
> >> On 05/14/2015 04:26 AM, Daniel Phillips wrote:
> >>> Hi Rik,
> >>>
> >>> Our linux-tux3 tree currently carries this 652 line diff
> >>> against core, to make Tux3 work. This is mainly by Hirofumi, except
> >>> the fs-writeback.c hook, which is by me. The main part you may be
> >>> interested in is rmap.c, which addresses the issues raised at the
> >>> 2013 Linux Storage Filesystem and MM Summit in San Francisco.[1]
> >>>
> >>>LSFMM: Page forking
> >>>http://lwn.net/Articles/548091/
> >>>
> >>> This is just a FYI. An upcoming Tux3 report will be a tour of the page
> >>> forking design and implementation. For now, this is just to give a
> >>> general sense of what we have done. We heard there are concerns about
> >>> how ptrace will work. I really am not familiar with the issue, could
> >>> you please explain what you were thinking of there?
> >>
> >> The issue is that things like ptrace, AIO, infiniband
> >> RDMA, and other direct memory access subsystems can take
> >> a reference to page A, which Tux3 clones into a new page B
> >> when the process writes it.
> >>
> >> However, while the process now points at page B, ptrace,
> >> AIO, infiniband, etc will still be pointing at page A.
> >>
> >> This causes the process and the other subsystem to each
> >> look at a different page, instead of at shared state,
> >> causing ptrace to do nothing, AIO and RDMA data to be
> >> invisible (or corrupted), etc...
> > 
> > Is this a bit like page migration?
> 
> Yes. Page migration will fail if there is an "extra"
> reference to the page that is not accounted for by
> the migration code.
> 

When I said it's not like page migration, I was referring to the fact
that a COW on a pinned page for RDMA is a different problem to page
migration. The COW of a pinned page can lead to lost writes or
corruption depending on the ordering of events. Page migration fails
when there are unexpected problems to avoid this class of issue which is
fine for page migration but may be a critical failure in a filesystem
depending on exactly why the copy is required.
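
The lost-write case can be sketched in userspace. This is a toy model under stated assumptions, not kernel code: `struct toy_page`, `toy_cow()` and the "pin" are hypothetical stand-ins for a page, a copy-on-write fork of that page, and an RDMA-style reference taken before the copy.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Toy stand-in for a page; not a kernel structure. */
struct toy_page {
	char data[TOY_PAGE_SIZE];
};

/* Toy stand-in for a process mapping: points at the current page. */
struct toy_mapping {
	struct toy_page *page;
};

/*
 * Copy-on-write: the mapping is switched to a fresh copy, but anyone
 * who pinned the old page earlier keeps pointing at the old one.
 * Returns 0 on success, -1 if the copy could not be allocated.
 */
static int toy_cow(struct toy_mapping *map)
{
	struct toy_page *copy = malloc(sizeof(*copy));

	if (!copy)
		return -1;
	memcpy(copy, map->page, sizeof(*copy));
	map->page = copy;
	return 0;
}

/*
 * Returns 1 if a write made through a pin taken before the copy is
 * visible through the mapping after the copy, 0 if it was lost,
 * -1 on allocation failure.
 */
static int toy_pinned_write_visible(void)
{
	struct toy_page *orig = calloc(1, sizeof(*orig));
	struct toy_mapping map;
	struct toy_page *pin;
	int visible;

	if (!orig)
		return -1;
	map.page = orig;
	pin = map.page;			/* "device" pins page A */

	if (toy_cow(&map)) {		/* process now uses page B */
		free(orig);
		return -1;
	}
	pin->data[0] = 'X';		/* device writes to page A */

	visible = (map.page->data[0] == 'X');
	free(map.page);
	free(orig);
	return visible;
}
```

In this model the write through the old pin never reaches the page the mapping now uses, which is the lost-write ordering; a write that lands before the copy would be carried over instead.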

-- 
Mel Gorman
SUSE Labs

Re: [PATCH kernel v10 23/34] powerpc/iommu/powernv: Release replaced TCE

2015-05-15 Thread Thomas Huth

On Thu, 14 May 2015 13:53:57 +1000
Alexey Kardashevskiy  wrote:

> On 05/14/2015 01:00 AM, Thomas Huth wrote:
> > On Tue, 12 May 2015 01:39:12 +1000
> > Alexey Kardashevskiy  wrote:
...
> >> -/*
> >> - * hwaddr is a kernel virtual address here (0xc... bazillion),
> >> - * tce_build converts it to a physical address.
> >> - */
> >> -int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
> >> -  unsigned long hwaddr, enum dma_data_direction direction)
> >> -{
> >> -  int ret = -EBUSY;
> >> -  unsigned long oldtce;
> >> -  struct iommu_pool *pool = get_pool(tbl, entry);
> >> -
> >> -  spin_lock(&(pool->lock));
> >> -
> >> -  oldtce = tbl->it_ops->get(tbl, entry);
> >> -  /* Add new entry if it is not busy */
> >> -  if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
> >> -  ret = tbl->it_ops->set(tbl, entry, 1, hwaddr, direction, NULL);
> >> -
> >> -  spin_unlock(&(pool->lock));
> >> +  if (!ret && ((*direction == DMA_FROM_DEVICE) ||
> >> +  (*direction == DMA_BIDIRECTIONAL)))
> >
> > You could drop some of the parentheses:
> >
> > if (!ret && (*direction == DMA_FROM_DEVICE ||
> > *direction == DMA_BIDIRECTIONAL))
> 
> I really (really) like braces. Is there any kernel code design rule against 
> it?

I don't think so ... but for me it's rather the other way round: If I
see too many braces, I always wonder whether there is a reason for it in
the sense that I did not understand the statement right at the first
glance. Additionally, this is something that Pascal programmers like to
do, so IMHO this just looks ugly in C.

> >> @@ -405,19 +410,26 @@ static long tce_iommu_ioctl(void *iommu_data,
> >>  			return -EINVAL;
> >>
> >>  		/* iova is checked by the IOMMU API */
> >> -		tce = param.vaddr;
> >>  		if (param.flags & VFIO_DMA_MAP_FLAG_READ)
> >> -			tce |= TCE_PCI_READ;
> >> -		if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
> >> -			tce |= TCE_PCI_WRITE;
> >> +			if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
> >> +				direction = DMA_BIDIRECTIONAL;
> >> +			else
> >> +				direction = DMA_TO_DEVICE;
> >> +		else
> >> +			if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
> >> +				direction = DMA_FROM_DEVICE;
> >> +			else
> >> +				return -EINVAL;
> >
> > IMHO some curly braces for the outer if-statement would be really fine
> > here.
> 
> I believe checkpatch.pl won't like it. There is a check against single 
> lines having braces after "if" statements.

If you write your code like this (I was only talking about the outer
braces!):

	if (param.flags & VFIO_DMA_MAP_FLAG_READ) {
		if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
			direction = DMA_BIDIRECTIONAL;
		else
			direction = DMA_TO_DEVICE;
	} else {
		if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
			direction = DMA_FROM_DEVICE;
		else
			return -EINVAL;
	}

... then checkpatch should not complain, as far as I know - in this
case, the braces include three lines, don't they?
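
For anyone who wants to try the braced form outside the kernel, here is a self-contained sketch. The `TOY_*` flags and enum are hypothetical stand-ins that only mirror the VFIO_DMA_MAP_FLAG_* and DMA_* names from the patch; their values are made up for illustration.

```c
#include <errno.h>

/*
 * Stand-in values for illustration only; the real flags and enum
 * live in kernel headers and may have different values.
 */
#define TOY_MAP_FLAG_READ	(1u << 0)
#define TOY_MAP_FLAG_WRITE	(1u << 1)

enum toy_dma_direction {
	TOY_DMA_BIDIRECTIONAL,
	TOY_DMA_TO_DEVICE,
	TOY_DMA_FROM_DEVICE,
};

/*
 * Maps the READ/WRITE flag combination to a direction, with braces on
 * the outer if/else only. Returns 0 on success or -EINVAL if neither
 * flag is set; *direction is left untouched in the error case.
 */
static int toy_flags_to_direction(unsigned int flags,
				  enum toy_dma_direction *direction)
{
	if (flags & TOY_MAP_FLAG_READ) {
		if (flags & TOY_MAP_FLAG_WRITE)
			*direction = TOY_DMA_BIDIRECTIONAL;
		else
			*direction = TOY_DMA_TO_DEVICE;
	} else {
		if (flags & TOY_MAP_FLAG_WRITE)
			*direction = TOY_DMA_FROM_DEVICE;
		else
			return -EINVAL;
	}
	return 0;
}
```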

 Thomas

Re: [PATCH] spi: Force the registration of the spidev devices

2015-05-15 Thread Maxime Ripard

On Wed, May 13, 2015 at 03:33:31PM -0700, Greg Kroah-Hartman wrote:
> On Wed, May 13, 2015 at 09:26:40PM +0200, Maxime Ripard wrote:
> > On Wed, May 13, 2015 at 11:17:36AM -0700, Greg Kroah-Hartman wrote:
> > > On Wed, May 13, 2015 at 07:50:34PM +0200, Maxime Ripard wrote:
> > > > Hi Greg,
> > > > 
> > > > On Wed, May 13, 2015 at 08:37:40AM -0700, Greg Kroah-Hartman wrote:
> > > > > On Wed, May 13, 2015 at 12:26:04PM +0100, Mark Brown wrote:
> > > > > > On Tue, May 12, 2015 at 10:33:24PM +0200, Maxime Ripard wrote:
> > > > > > 
> > > > > > > While this is nicer than the DT solution because of its accurate 
> > > > > > > hardware
> > > > > > > representation, it's still not perfect because you might not have 
> > > > > > > access to the
> > > > > > > DT, or you might be driving a completely generic device (such as a
> > > > > > > microcontroller) that might be used for something else in a 
> > > > > > > different
> > > > > > > context/board.
> > > > > > 
> > > > > > Greg, you're copied on this because this seems to be a generic 
> > > > > > problem
> > > > > > that should perhaps be solved at a driver model level - having a 
> > > > > > way to
> > > > > > bind userspace access to devices that we don't otherwise have a 
> > > > > > driver
> > > > > > for.  The subsystem could specify the UIO driver to use when no 
> > > > > > other
> > > > > > driver is available.
> > > > > 
> > > > > That doesn't really work.  I've been talking to the ACPI people about
> > > > > this, and the problem is "don't otherwise have a driver for" is an
> > > > > impossible thing to prove, as you never know when a driver is going to
> > > > > be loaded from userspace.
> > > > > 
> > > > > You can easily bind drivers to devices today from userspace, why not
> > > > > just use the built-in functionality you have today if you "know" that
> > > > > there is no driver for this hardware.
> > > > 
> > > > What we're really after here is that we want to have an spidev
> > > > instance when we don't even have a device.
> > > 
> > > That's crazy, just create a device, things do not work without one.
> > 
> > Our use case is this one: we want to export spidev files so that "dev
> > boards" with a header that allows to plug virtually anything on it
> > (Raspberry Pi, Cubieboards, Xplained, and all the likes) without
> > having to change the kernel and / or device tree.
> 
> You want to do that on a bus that is not self-describing or dynamic?
> I too want a pony.  Please go kick the hardware engineer who designed
> such a mess, we solved this problem 20+ years ago with "real" busses.

Well, we do have such ponies on some buses that don't have any kind of
enumeration. i2cdev already allows just that. It would seem
logical to have similar behaviour for SPI.

> > That would mean that if we plug something to that port, no device will
> > be created because the DT itself won't have that device declared in
> > the first place.
> 
> Because you can't dynamically determine that something was plugged in,
> of course.

Well.. Yeah.

> 
> > This patch is actually doing this: creating a new device for all the
> > chipselects that are not in use that will be bound to the spidev
> > driver.
> 
> I have yet to see a patch...

You were Cc'd on that patch, which is the first message in this thread.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



[RFC PATCH v2 35/37] tools perf: add bpf_fd field to evsel and introduce new event syntax.

2015-05-15 Thread Wang Nan

This patch adds a bpf_fd field to 'struct evsel' and introduces new
syntax for bpf use. Following patches will generate the cmdline for
cmd_record and add '-e' options to enable tracing on events set in eBPF
object files. By using the newly introduced 'bpf_wrapper', this patch
ensures that the new '|bpf_fd=%d|' syntax is hidden from users; only
internal use is valid.

Signed-off-by: Wang Nan 
---
 tools/perf/util/evsel.c|  1 +
 tools/perf/util/evsel.h|  1 +
 tools/perf/util/parse-events.c | 19 +++
 tools/perf/util/parse-events.h |  3 +++
 tools/perf/util/parse-events.l |  8 +++-
 tools/perf/util/parse-events.y | 21 +
 6 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 33e3fd8..04d60a7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -205,6 +205,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
evsel->leader  = evsel;
evsel->unit= "";
evsel->scale   = 1.0;
+   evsel->bpf_fd  = -1;
INIT_LIST_HEAD(&evsel->node);
perf_evsel__object.init(evsel);
evsel->sample_size = __perf_evsel__sample_size(attr->sample_type);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index e486151..ff1f634 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -100,6 +100,7 @@ struct perf_evsel {
int sample_read;
struct perf_evsel   *leader;
char*group_name;
+   int bpf_fd;
 };
 
 union u64_swap {
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index be06553..5e49ddb 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -471,6 +471,25 @@ int parse_events_add_tracepoint(struct list_head *list, 
int *idx,
return add_tracepoint_event(list, idx, sys, event);
 }
 
+int parse_events_set_bpf_fd(struct list_head *list, int bpf_fd)
+{
+   struct perf_evsel *evsel;
+
+   if (!bpf_wrapper) {
+   fprintf(stderr,
+   "ERROR:\n"
+   "\tbpf fd should only be set by 'perf bpf', \n"
+   "\tuser should never use '|' syntax when \n"
+   "\tsetup events.\n"
+   "\n");
+   return -EINVAL;
+   }
+
+   __evlist__for_each(list, evsel)
+   evsel->bpf_fd = bpf_fd;
+   return 0;
+}
+
 static int
 parse_breakpoint_type(const char *type, struct perf_event_attr *attr)
 {
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 52a2dda..427a74a 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -102,6 +102,9 @@ int parse_events_add_tracepoint(struct list_head *list, int 
*idx,
 int parse_events_add_numeric(struct list_head *list, int *idx,
 u32 type, u64 config,
 struct list_head *head_config);
+
+int parse_events_set_bpf_fd(struct list_head *list, int bpf_fd);
+
 int parse_events_add_cache(struct list_head *list, int *idx,
   char *type, char *op_result1, char *op_result2);
 int parse_events_add_breakpoint(struct list_head *list, int *idx,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 8895cf3..0be944a 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -88,7 +88,7 @@ static int term(yyscan_t scanner, int type)
 %}
 
 %x mem
-%s config
+%s config bpf_config
 %x event
 
 group  [^,{}/]*[{][^}]*[}][^,{}/]*
@@ -156,6 +156,11 @@ branch_type{ return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE
 {name_minus}   { return str(yyscanner, PE_NAME); }
 }
 
+<bpf_config>{
+bpf_fd { return PE_BPFFD; }
+"|"{ BEGIN(INITIAL); return '|'; }
+}
+
 <mem>{
 {modifier_bp}  { return str(yyscanner, PE_MODIFIER_BP); }
 :  { return ':'; }
@@ -230,6 +235,7 @@ r{num_raw_hex}  { return raw(yyscanner); }
 {modifier_event}   { return str(yyscanner, PE_MODIFIER_EVENT); }
 {name} { return pmu_str_check(yyscanner); }
 "/"{ BEGIN(config); return '/'; }
+"|"{ BEGIN(bpf_config); return '|'; }
 -  { return '-'; }
 ,  { BEGIN(event); return ','; }
 :  { return ':'; }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 72def07..6c99322 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -48,6 +48,7 @@ static inc_group_count(struct list_head *list,
 %token PE_PREFIX_MEM PE_PREFIX_RAW PE_PREFIX_GROUP
 %token PE_ERROR
 %token PE_PMU_EVENT_PRE PE_PMU_EVENT_SUF PE_KERNEL_PMU_EVENT
+%token PE_BPFFD
 %type <num> PE_VALUE
 %type <num> PE_VALUE_SYM_HW
 %type <num> PE_VALUE_SYM_SW
@@ -62,12

[PATCH v3 2/2] tools lib traceevent: Ignore libtrace-dynamic-list file

2015-05-15 Thread He Kuang

The libtraceevent-dynamic-list file is a generated file used to export
the symbols needed by traceevent plugins, so git should ignore it.

Signed-off-by: He Kuang 
---
 tools/lib/traceevent/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/lib/traceevent/.gitignore b/tools/lib/traceevent/.gitignore
index 35f56be..3c60335 100644
--- a/tools/lib/traceevent/.gitignore
+++ b/tools/lib/traceevent/.gitignore
@@ -1 +1,2 @@
 TRACEEVENT-CFLAGS
+libtraceevent-dynamic-list
-- 
1.8.5.2


[RFC PATCH v2 20/37] tools lib bpf: create maps needed by object file.

2015-05-15 Thread Wang Nan

This patch creates maps based on the 'map' section in the object file
using bpf_create_map(), and stores the fds into an array in
'struct bpf_object'. Since the byte order of the object may differ
from the host, swap the map definition before processing.

This is the first patch in the 'loading' phase. Previous patches parse
the ELF object file and create the needed data structures, but don't
interact with the kernel. They belong to the 'opening' phase.
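
The swap step can be tried in isolation with the sketch below. It is an illustration only: `struct toy_map_def` is a hypothetical mirror of the four 32-bit fields this patch swaps, and `toy_bswap32()` stands in for bswap_32() so the example does not depend on glibc's <byteswap.h>.

```c
#include <stdint.h>

/*
 * Toy mirror of the four 32-bit fields swapped in this patch; the real
 * 'struct bpf_map_def' is the one declared in libbpf.h of this series.
 */
struct toy_map_def {
	uint32_t type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
};

/* Portable 32-bit byte swap, standing in for bswap_32(). */
static uint32_t toy_bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Returns a copy of 'def' with every field byte-swapped when the
 * object's byte order differs from the host (needs_swap != 0), or an
 * unchanged copy otherwise.
 */
static struct toy_map_def toy_map_def_to_host(struct toy_map_def def,
					      int needs_swap)
{
	if (needs_swap) {
		def.type	= toy_bswap32(def.type);
		def.key_size	= toy_bswap32(def.key_size);
		def.value_size	= toy_bswap32(def.value_size);
		def.max_entries	= toy_bswap32(def.max_entries);
	}
	return def;
}
```

Working on a copy, as the loop in bpf_obj_create_maps() does, leaves the raw section buffer untouched.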

Signed-off-by: Wang Nan 
---
 tools/lib/bpf/libbpf.c | 98 ++
 tools/lib/bpf/libbpf.h |  4 +++
 2 files changed, 102 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9ed8cca..6ff4cb6 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -22,6 +22,7 @@
 #include 
 
 #include "libbpf.h"
+#include "bpf.h"
 
 #ifdef min
 # undef min
@@ -107,6 +108,7 @@ struct bpf_object {
 
struct bpf_program *programs;
size_t nr_programs;
+   int *maps_fds;
 
/*
 * Information when doing elf related work. Only valid if fd
@@ -613,6 +615,67 @@ bpf_program_collect_reloc(struct bpf_object *obj,
return 0;
 }
 
+static int
+bpf_obj_create_maps(struct bpf_object *obj)
+{
+   unsigned int i;
+   size_t nr_maps;
+   int *pfd;
+
+   nr_maps = obj->maps_buf_sz / sizeof(struct bpf_map_def);
+   if (!obj->maps_buf || !nr_maps) {
+   pr_debug("don't need create maps for %s\n",
+obj->path);
+   return 0;
+   }
+
+   obj->maps_fds = malloc(sizeof(int) * nr_maps);
+   if (!obj->maps_fds) {
+   pr_warning("realloc perf_bpf_maps_fds failed\n");
+   return -ENOMEM;
+   }
+
+   /* fill all fd with -1 */
+   memset(obj->maps_fds, 0xff, sizeof(int) * nr_maps);
+   
+   pfd = obj->maps_fds;
+   for (i = 0; i < nr_maps; i++) {
+   struct bpf_map_def def;
+
+   def = *(struct bpf_map_def *)(obj->maps_buf +
+   i * sizeof(struct bpf_map_def));
+
+   if (obj->needs_swap) {
+   def.type= bswap_32(def.type);
+   def.key_size= bswap_32(def.key_size);
+   def.value_size  = bswap_32(def.value_size);
+   def.max_entries = bswap_32(def.max_entries);
+   }
+
+   *pfd = bpf_create_map(def.type,
+ def.key_size,
+ def.value_size,
+ def.max_entries);
+   if (*pfd < 0) {
+   size_t j;
+   int err = *pfd;
+
+   pr_warning("failed to create map: %s\n",
+  strerror(errno));
+   for (j = 0; j < i; j++) {
+   close(obj->maps_fds[j]);
+   obj->maps_fds[j] = -1;
+   }
+   free(obj->maps_fds);
+   obj->maps_fds = NULL;
+   return err;
+   }
+   pr_debug("create map: fd=%d\n", *pfd);
+   pfd ++;
+   }
+   return 0;
+}
+
 static int bpf_obj_collect_reloc(struct bpf_object *obj)
 {
int i, err;
@@ -694,12 +757,47 @@ out:
return NULL;
 }
 
+int bpf_unload_object(struct bpf_object *obj)
+{
+   if (!obj)
+   return -EINVAL;
+
+   if (obj->maps_fds) {
+   size_t i;
+   size_t nr_maps = obj->maps_buf_sz / sizeof(struct bpf_map_def);
+
+   for (i = 0; i < nr_maps; i++) {
+   if (obj->maps_fds[i] >= 0)
+   close(obj->maps_fds[i]);
+   }
+   free(obj->maps_fds);
+   }
+
+   return 0;
+}
+
+int bpf_load_object(struct bpf_object *obj)
+{
+   if (!obj)
+   return -EINVAL;
+
+   if (bpf_obj_create_maps(obj))
+   goto out;
+
+   return 0;
+out:
+   bpf_unload_object(obj);
+   pr_warning("failed to load object '%s'\n", obj->path);
+   return -EINVAL;
+}
+
 void bpf_close_object(struct bpf_object *obj)
 {
if (!obj)
return;
 
bpf_obj_clear_elf(obj);
+   bpf_unload_object(obj);
 
if (obj->path)
free(obj->path);
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 3505be7..c0b290d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -21,6 +21,10 @@ struct bpf_object;
 struct bpf_object *bpf_open_object(const char *path);
 void bpf_close_object(struct bpf_object *object);
 
+/* Load/unload object into/from kernel */
+int bpf_load_object(struct bpf_object *obj);
+int bpf_unload_object(struct bpf_object *obj);
+
 /*
  * packed attribute is unnecessary for 'bpf_map_def'.
  */
-- 
1.8.3.4


Re: [PATCH RFC] Documentation/infiniband: Add docs for rdma-helpers

2015-05-15 Thread Michael Wang



On 05/13/2015 05:11 PM, Doug Ledford wrote:
[snip]
>> +
>> +  For core layer, below helpers are used to check if a paticular capability
>> +  is supported by the port.
> 
> The following helpers are used to check the specific capabilities of a
> particular port before utilizing those capabilities.

Will be in next version :-)

> 
>> +
>> +rdma_cap_ib_mad - Infiniband Management Datagrams.
>> +rdma_cap_ib_smi - Infiniband Subnet Management Interface.
>> +rdma_cap_ib_cm  - Infiniband Communication Manager.
> InfiniBand Connection Management

I also used to think it was 'connection', but I found some docs that
expand it as 'communication'... anyway, 'connection' sounds
closer to what it does in the kernel :-)

>> +rdma_cap_iw_cm  - IWARP Communication Manager.
> iWARP Connection Management
>> +rdma_cap_ib_sa  - Infiniband Subnet Administration.
>> +rdma_cap_ib_mcast   - Infiniband Multicast.
> InfiniBand Multicast join/leave protocol
>> +rdma_cap_read_multi_sge - RDMA Read Multiple Scatter-Gather Entries.
> RDMA Read verb supports more than 1 sge in the work request

Will be in next version :-)

>> +rdma_cap_af_ib  - Native Infiniband Address.
>> +rdma_cap_eth_ah - Ethernet Address Handler.
> Queue Pair is InfiniBand transport, but uses Ethernet address instead of
> native InfiniBand address (aka, this is a RoCE QP, and that means
> ethertype 0x8915 + GRH for RoCEv1 and IP/UDP to well known UDP port for
> RoCEv2)

Shall we put this long description into USAGE? This section could just
list all the helpers with a brief description each, to give a quick
overview. What's your opinion?

>> +
>> +USAGE
>> +
>> +  if (rdma_cap_XX(device, i)) {
>> +/* The port i of device support XX */
>> +...
>> +  } else {
>> +/* The port i of device don't support XX */
>> +...
>> +  }
>> +
>> +  rdma_cap_ib_mad
>> +  ---
>> +Management Datagrams (MAD) is the prototype of management packet
>> +to be used by all the kinds of infiniband managers, use the helper
>> +to verify the port before utilize related features.
> Management Datagrams (MAD) are a required part of the InfiniBand
> specification and are supported on all InfiniBand devices.  A slightly
> extended version are also supported on OPA interfaces.
> 
> I would drop all instances of "use the helper to verify..." as that's
> redundant.  This whole doc is about using the helpers to verify things.

Agree, will be dropped in next version.

And all the comments below make sense, will be merged ;-)

Regards,
Michael Wang

> 
>> +
>> +  rdma_cap_ib_smi
>> +  ---
>> +Subnet Management Interface (SMI) will handle SMP packet from SM
>> +in an infiniband fabric, use the helper to verify the port before
>> +utilize related features.
>> +
>> +  rdma_cap_ib_cm
>> +  ---
>> +Communication Manager (CM) will handle the connections between
>^Connection Manager (CM) service, used to ease the process of
> connecting to a remote host.  The IB CM can be used to connect to remote
> hosts using either InfiniBand or RoCE connections.  iWARP has its own
> connection manager, see below.
>> +adaptors, currently there are two different implementation,
>> +IB or IWARP, use the helper to verify whether the port using
>> +IB-CM or not
>> +
>> +  rdma_cap_iw_cm
>> +  ---
>> +IWARP has it's own implemented CM which is different from infiniband,
>   iWARP connection manager.  Similar to the IB Connection Manager,
> but only used on iWARP devices.
>> +use the helper to check whether the port using IWARP-CM or not.
>> +
>> +  rdma_cap_ib_sa
>> +  ---
>> +Subnet Administration (SA) is the database built by SM in an
>> +infiniband fabric, use the helper to verify the port before
>> +utilize related features.
>> +
>> +  rdma_cap_ib_mcast
>> +  ---
>> +Multicast is the feature for one QP to send messages to multiple
>> +QP in an infiniband fabric, use the helper to verify the port before
>> +utilize related features.
> 
> InfiniBand (and OPA) use a different multicast mechanism than
> traditional IP multicast found on Ethernet devices.  If this capability
> is true, then traditional IPv4/IPv6 multicast is handled by the IPoIB
> layer and direct multicast joins and leaves are handled per the
> InfiniBand specifications.
> 
>> +
>> +  rdma_cap_read_multi_sge
>> +  ---
>> +RDMA read operation could support multiple scatter-gather entries,
>> +use the helper to verify wthether the port support this feature
>> +or not.
> 
> Certain devices (iWARP in particular) have restrictions on the number of
> scatter gather elements that can be present in an RDMA READ work
> request.  This is true if the device does not have that restriction.
> 
>> +  rdma_cap_af_ib
>> +  ---
>> +RDMA address format could be ethernet or infiniband,

[PATCH v3 1/2] tools lib traceevent: Export dynamic symbols used by traceevent plugins

2015-05-15 Thread He Kuang

Traceevent plugins need dynamic symbols exported from libtraceevent.a,
otherwise a dlopen error will occur during plugin loading.

This patch uses a dynamic list file to export, from the perf executable,
the dynamic symbols that the plugins use.

The problem is covered up if feature-libpython is enabled, because
PYTHON_EMBED_LDOPTS contains '-Xlinker --export-dynamic' which adds all
symbols to the dynamic symbol table. To reproduce the problem, build
with NO_LIBPYTHON=1.
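
For readers unfamiliar with the linker's --dynamic-list input, here is a
minimal sketch of the file layout that the Makefile rule in this patch
generates with nm and awk: an opening brace, one tab-indented "symbol;" line
per undefined plugin symbol, and a closing "};". The helper name
format_dynamic_list() is hypothetical and is not part of the patch; it only
illustrates the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): render a linker dynamic
 * list for the given symbol names into buf. Returns the number of bytes
 * written (excluding the terminating NUL), or -1 if buf is too small.
 */
static int format_dynamic_list(char *buf, size_t cap,
                               const char *const *syms, size_t n)
{
    size_t used;
    size_t i;
    int ret;

    assert(buf != NULL && cap > 0);

    /* Opening brace of the dynamic list. */
    ret = snprintf(buf, cap, "{\n");
    if (ret < 0 || (size_t)ret >= cap)
        return -1;
    used = (size_t)ret;

    /* One "<tab>symbol;" line per exported symbol. */
    for (i = 0; i < n; i++) {
        ret = snprintf(buf + used, cap - used, "\t%s;\n", syms[i]);
        if (ret < 0 || (size_t)ret >= cap - used)
            return -1;
        used += (size_t)ret;
    }

    /* Closing brace and terminating semicolon. */
    ret = snprintf(buf + used, cap - used, "};\n");
    if (ret < 0 || (size_t)ret >= cap - used)
        return -1;
    used += (size_t)ret;

    return (int)used;
}
```

Passing the two symbols from the warnings quoted in this message
(pevent_unregister_event_handler and warning) yields a four-line list that
can be handed to the linker through -Xlinker --dynamic-list=FILE.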

Before this patch:

  (Prepare plugins)
  $ ls /root/.traceevent/plugins/
  plugin_sched_switch.so
  plugin_function.so
  ...

  $ perf record -e 'ftrace:function' ls

  $ perf script
Warning: could not load plugin '/mnt/data/root/.traceevent/plugins/plugin_sched_switch.so'
/root/.traceevent/plugins/plugin_sched_switch.so: undefined symbol: pevent_unregister_event_handler

Warning: could not load plugin '/root/.traceevent/plugins/plugin_function.so'
/root/.traceevent/plugins/plugin_function.so: undefined symbol: warning
...
   :1049  1049 [000]  9666.754487: ftrace:function:  8118bc50 <-- 8118c5b3
   :1049  1049 [000]  9666.754487: ftrace:function:  818e2440 <-- 8118bc75
   :1049  1049 [000]  9666.754487: ftrace:function:  8106eee0 <-- 811212e2

After this patch:

  $ perf record -e 'ftrace:function' ls
  $ perf script
   :1049  1049 [000]  9666.754487: ftrace:function: __set_task_comm
   :1049  1049 [000]  9666.754487: ftrace:function:_raw_spin_lock
   :1049  1049 [000]  9666.754487: ftrace:function: task_tgid_nr_ns
   ...

Signed-off-by: He Kuang 
---
 tools/lib/traceevent/Makefile | 14 +-
 tools/perf/Makefile.perf  | 14 --
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index d410da3..2cbac13 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -23,6 +23,7 @@ endef
 # Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
 $(call allow-override,CC,$(CROSS_COMPILE)gcc)
 $(call allow-override,AR,$(CROSS_COMPILE)ar)
+$(call allow-override,NM,$(CROSS_COMPILE)nm)
 
 EXT = -std=gnu99
 INSTALL = install
@@ -151,8 +152,9 @@ PLUGINS_IN := $(PLUGINS:.so=-in.o)
 
 TE_IN:= $(OUTPUT)libtraceevent-in.o
 LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE))
+DYNAMIC_LIST_FILE := $(OUTPUT)libtraceevent-dynamic-list
 
-CMD_TARGETS = $(LIB_FILE) $(PLUGINS)
+CMD_TARGETS = $(LIB_FILE) $(PLUGINS) $(DYNAMIC_LIST_FILE)
 
 TARGETS = $(CMD_TARGETS)
 
@@ -169,6 +171,9 @@ $(OUTPUT)libtraceevent.so: $(TE_IN)
 $(OUTPUT)libtraceevent.a: $(TE_IN)
$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
 
+$(OUTPUT)libtraceevent-dynamic-list: $(PLUGINS)
+   $(QUIET_GEN)$(call do_generate_dynamic_list_file, $(PLUGINS), $@)
+
 plugins: $(PLUGINS)
 
 __plugin_obj = $(notdir $@)
@@ -238,6 +243,13 @@ define do_install_plugins
done
 endef
 
+define do_generate_dynamic_list_file
+   (echo '{';  \
+   $(NM) -u -D $1 | awk 'NF>1 {print "\t"$$2";"}' | sort -u;   \
+   echo '};';  \
+   ) > $2
+endef
+
 install_lib: all_cmd install_plugins
$(call QUIET_INSTALL, $(LIB_FILE)) \
$(call do_install,$(LIB_FILE),$(bindir_SQ))
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 03409cc..1e6e038 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -173,6 +173,9 @@ endif
 LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
 export LIBTRACEEVENT
 
+LIBTRACEEVENT_DYNAMIC_LIST = $(TE_PATH)libtraceevent-dynamic-list
+LDFLAGS += -Xlinker --dynamic-list=$(LIBTRACEEVENT_DYNAMIC_LIST)
+
 LIBAPI = $(LIB_PATH)libapi.a
 export LIBAPI
 
@@ -278,7 +281,7 @@ build := -f $(srctree)/tools/build/Makefile.build dir=. obj
 $(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
$(Q)$(MAKE) $(build)=perf
 
-$(OUTPUT)perf: $(PERFLIBS) $(PERF_IN)
+$(OUTPUT)perf: $(PERFLIBS) $(PERF_IN) $(LIBTRACEEVENT_DYNAMIC_LIST)
$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $(PERF_IN) $(LIBS) -o $@
 
 $(GTK_IN): FORCE
@@ -373,7 +376,13 @@ $(LIB_FILE): $(LIBPERF_IN)
 LIBTRACEEVENT_FLAGS += plugin_dir=$(plugindir_SQ)
 
 $(LIBTRACEEVENT): FORCE
-   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) $(OUTPUT)libtraceevent.a plugins
+   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) $(OUTPUT)libtraceevent.a
+
+libtraceevent_plugins: FORCE
+   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) plugins
+
+$(LIBTRACEEVENT_DYNAMIC_LIST): libtraceevent_plugins
+   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) $(OUTPUT)libtraceevent-dynamic-list
 
 $(LIBTRACEEVENT)-clean:
$(call QUIET_CLEAN, libtraceevent)
@@ -551,4 +560,5 @@ FORCE:
 .PHONY: all install cle

Re: [PATCH v2] leds: fix brightness changing when software blinking is active

2015-05-15 Thread Jacek Anaszewski


Hi Stas,

On 05/14/2015 05:24 PM, Stas Sergeev wrote:


The following sequence:
echo timer >/sys/class/leds//trigger
echo 1 >/sys/class/leds//brightness
should change the ON brightness for blinking.
The function led_set_brightness() was mistakenly initiating the
delayed blink stop procedure, which resulted in no blinking with
the timer trigger still active.

This patch fixes the problem by changing led_set_brightness()
to not initiate the delayed blink stop when brightness is not 0.
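
The intended behaviour can be modelled in plain userspace C. This is only a
simplified sketch of the logic in the diff below, not the kernel code: the
struct demo_led type, its fields, and the demo_* functions are hypothetical
stand-ins for struct led_classdev, led_set_brightness() and
led_timer_function().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LED_OFF 0

/* Hypothetical, reduced stand-in for struct led_classdev. */
struct demo_led {
    unsigned long blink_delay_on;
    unsigned long blink_delay_off;
    int blink_brightness;     /* brightness used for the ON phase */
    int delayed_set_value;    /* value waiting to be picked up */
    int current_brightness;   /* what the LED shows right now */
    int stop_work_scheduled;  /* models schedule_work() for blink stop */
};

/* Models the patched led_set_brightness(). */
static void demo_set_brightness(struct demo_led *led, int brightness)
{
    assert(led != NULL);

    if (led->blink_delay_on || led->blink_delay_off) {
        /* Soft-blink is active: remember the value for later. */
        led->delayed_set_value = brightness;
        /* Only a request for "off" stops the blinking. */
        if (brightness == DEMO_LED_OFF)
            led->stop_work_scheduled = 1;
        return;
    }

    led->current_brightness = brightness;
}

/* Models one expiry of the soft-blink timer. */
static void demo_timer_tick(struct demo_led *led)
{
    assert(led != NULL);

    if (!led->current_brightness) {
        /* Time to switch on: pick up a pending ON brightness first. */
        if (led->delayed_set_value) {
            led->blink_brightness = led->delayed_set_value;
            led->delayed_set_value = 0;
        }
        led->current_brightness = led->blink_brightness;
    } else {
        led->current_brightness = DEMO_LED_OFF;
    }
}
```

In this model, writing a non-zero brightness while the timer trigger is
active leaves the blinking untouched and only changes the level used on the
next ON phase, while writing 0 still schedules the blink stop.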

CC: Bryan Wu 
CC: Richard Purdie 
CC: Jacek Anaszewski 
CC: Kyungmin Park 
CC: linux-l...@vger.kernel.org
CC: linux-kernel@vger.kernel.org

Signed-off-by: Stas Sergeev 
---
  drivers/leds/led-class.c |5 +
  drivers/leds/led-core.c  |5 +++--
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
index 795ec99..65c2c80 100644
--- a/drivers/leds/led-class.c
+++ b/drivers/leds/led-class.c
@@ -121,6 +121,11 @@ static void led_timer_function(unsigned long data)
brightness = led_get_brightness(led_cdev);
if (!brightness) {
/* Time to switch the LED on. */
+   if (led_cdev->delayed_set_value) {
+   led_cdev->blink_brightness =
+   led_cdev->delayed_set_value;
+   led_cdev->delayed_set_value = 0;
+   }
brightness = led_cdev->blink_brightness;
delay = led_cdev->blink_delay_on;
} else {
diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c
index 356e851..a90dd26 100644
--- a/drivers/leds/led-core.c
+++ b/drivers/leds/led-core.c
@@ -119,10 +119,11 @@ void led_set_brightness(struct led_classdev *led_cdev,
  {
int ret = 0;

-   /* delay brightness setting if need to stop soft-blink timer */
+   /* delay brightness if soft-blink is active */
if (led_cdev->blink_delay_on || led_cdev->blink_delay_off) {
led_cdev->delayed_set_value = brightness;
-   schedule_work(&led_cdev->set_brightness_work);
+   if (brightness == LED_OFF)
+   schedule_work(&led_cdev->set_brightness_work);
return;
}



Acked-by: Jacek Anaszewski 

--
Best Regards,
Jacek Anaszewski

[PATCH net-next,1/1] hv_netvsc: change member name of struct netvsc_stats

2015-05-15 Thread Simon Xiao

Currently the struct netvsc_stats has a member s_sync
of type u64_stats_sync.
This definition breaks the kernel build because the macro
netdev_alloc_pcpu_stats requires this member to be named syncp
(see the netdev_alloc_pcpu_stats definition in ./include/linux/netdevice.h).

This patch changes netvsc_stats's member name from s_sync to syncp to fix
the build break.
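
To illustrate why the member name matters, here is a small userspace sketch,
assuming a generic macro that initializes the sync member by name. All demo_*
and DEMO_* identifiers are hypothetical; they only mimic the shape of the
kernel helpers and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct u64_stats_sync. */
struct demo_stats_sync {
    unsigned int seq;
};

struct demo_netvsc_stats {
    unsigned long long packets;
    unsigned long long bytes;
    /* Renaming this member to s_sync would break DEMO_INIT_STATS(). */
    struct demo_stats_sync syncp;
};

/*
 * Generic initializer in the spirit of an allocator macro: it is written
 * once for every stats structure and reaches into it by member name, so
 * the name "syncp" is part of its contract.
 */
#define DEMO_INIT_STATS(ptr) do { (ptr)->syncp.seq = 0; } while (0)

static struct demo_netvsc_stats *demo_alloc_stats(void)
{
    struct demo_netvsc_stats *s = calloc(1, sizeof(*s));

    if (s)
        DEMO_INIT_STATS(s);
    return s;
}

/* Writer side: bump the sequence before and after the update. */
static void demo_count_packet(struct demo_netvsc_stats *s,
                              unsigned long long len)
{
    assert(s != NULL);

    s->syncp.seq++;     /* begin update */
    s->packets++;
    s->bytes += len;
    s->syncp.seq++;     /* end update */
}
```

With the member named s_sync, the macro expansion would refer to a field that
does not exist and compilation would fail, which is the kind of build break
this patch addresses.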

Signed-off-by: Simon Xiao 
---
 drivers/net/hyperv/hyperv_net.h |  2 +-
 drivers/net/hyperv/netvsc_drv.c | 16 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 5a92b36..ddcc7f8 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -614,7 +614,7 @@ struct multi_send_data {
 struct netvsc_stats {
u64 packets;
u64 bytes;
-   struct u64_stats_sync s_sync;
+   struct u64_stats_sync syncp;
 };
 
 /* The context of the netvsc device  */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 0c858724..d9c88bc 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -580,10 +580,10 @@ do_send:
 
 drop:
if (ret == 0) {
-   u64_stats_update_begin(&tx_stats->s_sync);
+   u64_stats_update_begin(&tx_stats->syncp);
tx_stats->packets++;
tx_stats->bytes += skb_length;
-   u64_stats_update_end(&tx_stats->s_sync);
+   u64_stats_update_end(&tx_stats->syncp);
} else {
if (ret != -EAGAIN) {
dev_kfree_skb_any(skb);
@@ -692,10 +692,10 @@ int netvsc_recv_callback(struct hv_device *device_obj,
skb_record_rx_queue(skb, packet->channel->
offermsg.offer.sub_channel_index);
 
-   u64_stats_update_begin(&rx_stats->s_sync);
+   u64_stats_update_begin(&rx_stats->syncp);
rx_stats->packets++;
rx_stats->bytes += packet->total_data_buflen;
-   u64_stats_update_end(&rx_stats->s_sync);
+   u64_stats_update_end(&rx_stats->syncp);
 
/*
 * Pass the skb back up. Network stack will deallocate the skb when it
@@ -776,16 +776,16 @@ static struct rtnl_link_stats64 *netvsc_get_stats64(struct net_device *net,
unsigned int start;
 
do {
-   start = u64_stats_fetch_begin_irq(&tx_stats->s_sync);
+   start = u64_stats_fetch_begin_irq(&tx_stats->syncp);
tx_packets = tx_stats->packets;
tx_bytes = tx_stats->bytes;
-   } while (u64_stats_fetch_retry_irq(&tx_stats->s_sync, start));
+   } while (u64_stats_fetch_retry_irq(&tx_stats->syncp, start));
 
do {
-   start = u64_stats_fetch_begin_irq(&rx_stats->s_sync);
+   start = u64_stats_fetch_begin_irq(&rx_stats->syncp);
rx_packets = rx_stats->packets;
rx_bytes = rx_stats->bytes;
-   } while (u64_stats_fetch_retry_irq(&rx_stats->s_sync, start));
+   } while (u64_stats_fetch_retry_irq(&rx_stats->syncp, start));
 
t->tx_bytes += tx_bytes;
t->tx_packets   += tx_packets;
-- 
1.8.5.2


RE: [PATCH V2 3/4] watchdog: da9062: DA9062 watchdog driver

2015-05-15 Thread Opensource [Steve Twiss]

Hi Guenter,

Thank you for your comments again,
Here are my responses.

Regards,
Steve

On 15 May 2015 03:13, Guenter Roeck 
> Subject: Re: [PATCH V2 3/4] watchdog: da9062: DA9062 watchdog driver
> 

[...]

> > +static void da9062_apply_window_protection(struct da9062_watchdog *wdt)
> > +{
> > +   unsigned long delay = msecs_to_jiffies(DA9062_RESET_PROTECTION_MS);
> > +   unsigned long timeout = wdt->j_time_stamp + delay;
> > +   unsigned long now = jiffies;
> > +   unsigned int diff_ms;
> > +
> > +   /* if time-limit has not elapsed then wait for remainder */
> > +   if (time_before(now, timeout)) {
> > +   diff_ms = jiffies_to_msecs(timeout-now);
> > +   dev_dbg(wdt->hw->dev,
> > +   "Kicked too quickly. Delaying %u msecs\n", diff_ms);
> > +   msleep(diff_ms);
> > +   }
> > +
> > +   return;
> 
> Unnecessary return statement.
> 

Deleted.

> > +static unsigned int da9062_wdt_timeout_to_sel(unsigned int secs)
> > +{
> > +   unsigned int i;
> > +
> > +   for (i = DA9062_TWDSCALE_MIN; i <= DA9062_TWDSCALE_MAX; i++) {
> > +   if (wdt_timeout[i] >= secs)
> > +   return i;
> > +   }
> > +
> > +   return DA9062_TWDSCALE_MAX;
> > +}
> > +
> > +static int da9062_reset_watchdog_timer(struct da9062_watchdog *wdt)
> > +{
> > +   int ret;
> > +
> > +   da9062_apply_window_protection(wdt);
> > +
> > +   ret = regmap_update_bits(wdt->hw->regmap,
> > +  DA9062AA_CONTROL_F,
> > +  DA9062AA_WATCHDOG_MASK,
> > +  DA9062AA_WATCHDOG_MASK);
> > +
> > +   da9062_set_window_start(wdt);
> > +
> > +   return ret;
> > +}
> > +
> > +static int da9062_wdt_update_timeout_register(struct da9062_watchdog *wdt,
> > + unsigned int regval)
> > +{
> > +   struct da9062 *chip = wdt->hw;
> > +   int ret;
> > +
> > +   ret = da9062_reset_watchdog_timer(wdt);
> > +   if (ret) {
> > +   dev_err(chip->dev, "Failed to ping the watchdog (err = %d)\n",
> > +   ret);
> 
> I am kind of torn about all this noisiness on error. Personally I would tend to
> ask people to let user space handle it, and not be that noisy in the kernel.
> 
> Wim, any guidance ?

At the time I thought it would be a really good idea to keep a debug message in.
But this has been questioned several times, so I will remove it.

> > +   return ret;
> > +   }
> > +
> > +   return regmap_update_bits(chip->regmap,
> > + DA9062AA_CONTROL_D,
> > + DA9062AA_TWDSCALE_MASK,
> > + regval);
> 
> ... and it is inconsistent - no error message here.
> 

I have removed the earlier dev_err(), so this return without an error message
is now consistent with the rest of the function.
(no change needed)

 [...]

> > +static int da9062_wdt_stop(struct watchdog_device *wdd)
> > +{
> > +   struct da9062_watchdog *wdt = watchdog_get_drvdata(wdd);
> > +   int ret;
> > +
> > +   ret = da9062_reset_watchdog_timer(wdt);
> > +   if (ret) {
> > +   dev_err(wdt->hw->dev, "Failed to ping the watchdog (err = %d)\n",
> > +   ret);
> > +   return ret;
> > +   }
> > +
> > +   ret = regmap_update_bits(wdt->hw->regmap,
> > +DA9062AA_CONTROL_D,
> > +DA9062AA_TWDSCALE_MASK,
> > +DA9062_TWDSCALE_DISABLE);
> > +   if (ret)
> > +   dev_alert(wdt->hw->dev, "Watchdog failed to stop (err = %d)\n",
> > + ret);
> 
> .. and now we have an alert. Hmm..

.. I've replaced it with a dev_err()

> > +
> > +   return ret;
> > +}
> > +
> > +static int da9062_wdt_ping(struct watchdog_device *wdd)
> > +{
> > +   struct da9062_watchdog *wdt = watchdog_get_drvdata(wdd);
> > +   int ret;
> > +
> > +   dev_dbg(wdt->hw->dev, "watchdog ping\n");
> > +
> 
> Is this really valuable enough to keep in the code ?
> 

Removed also.

> > +   ret = da9062_reset_watchdog_timer(wdt);
> > +   if (ret)
> > +   dev_err(wdt->hw->dev, "Failed to ping the watchdog (err = %d)\n",
> > +   ret);
> > +
> > +   return ret;
> > +}
> > +

[...]

> > +
> > +/* E_WDG_WARN interrupt handler */
> > +static irqreturn_t da9062_wdt_wdg_warn_irq_handler(int irq, void *data)
> > +{
> > +   struct da9062_watchdog *wdt = data;
> > +
> > +   dev_notice(wdt->hw->dev, "Watchdog timeout warning trigger.\n");
> > +   return IRQ_HANDLED;
> > +}
> > +

[...]

> > +static int da9062_wdt_probe(struct platform_device *pdev)
> > +{
> > +   int ret;
> > +   struct da9062 *chip;
> > +   struct da9062_watchdog *wdt;
> > +   int irq;
> > +
> > +   chip = dev_get_drvdata(pdev->dev.parent);
> > +   if (!chip)
> > +   return -EINVAL;
> > +
> > +   wdt = devm_kzalloc(&pdev->dev, sizeof(*wdt), GFP_KERNEL);
> > +   if (!wdt)
> > +   return -ENOMEM;
> > +
> > +   wdt->hw = chip;
> > +
> > +   wd

[ 00/48] 2.6.32.66-longterm review

2015-05-15 Thread Willy Tarreau

This is the start of the longterm review cycle for the 2.6.32.66 release.
All patches will be posted as a response to this one. If anyone has any
issue with these being applied, please let me know. If anyone is a
maintainer of the proper subsystem, and wants to add a Signed-off-by: line
to the patch, please respond with it. If anyone thinks some important
patches are missing and should be added prior to the release, please
report them quickly with their respective mainline commit IDs.

Responses should be made by Thu May 21 10:05:29 CEST 2015.
Anything received after that time might be too late. If someone
wants a bit more time for a deeper review, please let me know.

NOTE: 2.6.32 is approaching end of support. There will probably be one
or maybe two other versions issued in the next 3-6 months, and that will
be all, at least for me. Adding to this the time it can take to validate
and deploy in some environments, it probably makes sense to start to
think about switching to another longterm branch. 3.2 and 3.4 are good
candidates for those seeking rock-solid versions. Longterm branches and
their projected EOLs are listed here :

 https://www.kernel.org/category/releases.html

The whole patch series can be found in one patch at :
 https://kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.66-rc1.gz

The shortlog and diffstat are appended below.

Thanks,
Willy

===

Al Viro (1):
  rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()

Alexey Khoroshilov (1):
  sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND)

Alexey Kodanev (1):
  net: sysctl_net_core: check SNDBUF and RCVBUF for min length

Andy Lutomirski (10):
  x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
  x86/tls: Validate TLS entries to protect espfix
  x86, tls, ldt: Stop checking lm in LDT_empty
  x86, tls: Interpret an all-zero struct user_desc as "no segment"
  x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  x86/tls: Disallow unusual TLS segments
  x86/tls: Don't validate lm in set_thread_area() after all
  x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
  x86_64, vdso: Fix the vdso address randomization algorithm
  x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization

Ani Sinha (1):
  net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland.

Arnd Bergmann (1):
  rds: avoid potential stack overflow

Ben Hutchings (1):
  splice: Apply generic position and size checks to each write

Benjamin Coddington (1):
  lockd: Try to reconnect if statd has moved

Borislav Petkov (1):
  x86, cpu, amd: Add workaround for family 16h, erratum 793

D.S. Ljungmark (1):
  ipv6: Don't reduce hop limit for an interface

Dan Carpenter (1):
  ipvs: uninitialized data with IP_VS_IPV6

Daniel Borkmann (2):
  net: sctp: fix memory leak in auth key management
  net: sctp: fix slab corruption from use after free on INIT collisions

Eli Cohen (1):
  IB/core: Avoid leakage from kernel to user space

Eric Dumazet (2):
  tcp: make connect() mem charging friendly
  tcp: avoid looping in tcp_send_fin()

Florian Westphal (2):
  netfilter: conntrack: disable generic tracking for known protocols
  ppp: deflate: never return len larger than output buffer

Hector Marco-Gisbert (1):
  ASLR: fix stack randomization on 64-bit systems

Ian Abbott (1):
  spi: spidev: fix possible arithmetic overflow for multi-transfer message

Ignacy Gawędzki (1):
  ematch: Fix auto-loading of ematch modules.

Jan Kara (3):
  isofs: Fix infinite looping over CE entries
  isofs: Fix unchecked printing of ER records
  scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND

Jann Horn (1):
  fs: take i_mutex during prepare_binprm for set[ug]id executables

Jiri Pirko (1):
  ipv4: fix nexthop attlen check in fib_nh_match

Kirill A. Shutemov (1):
  pagemap: do not leak physical addresses to non-privileged userspace

Mathias Krause (1):
  posix-timers: Fix stack info leak in timer_create()

Matthew Thode (1):
  net: reject creation of netdev names with colons

Michal Kubeček (1):
  udp: only allow UFO for packets from SOCK_DGRAM sockets

Robert Baldyga (1):
  serial: samsung: wait for transfer completion before clock disable

Sasha Levin (2):
  net: llc: use correct size for sysctl timeout entries
  net: rds: use correct size for max unacked packets and bytes

Sebastian Pöhn (1):
  ip_forward: Drop frames with attached skb->sk

Sergei Antonov (1):
  hfsplus: fix B-tree corruption after insertion at position 0

Shachar Raindel (1):
  IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic

Shai Fultheim (1):
  x86: Conditionally update time when ack-ing pending irqs

Steffen Klassert (1):
  ipv4: Don't use ufo handling on later transformed packets

bingtian..

[RFC PATCH v2 33/37] tools perf bpf: load eBPF object into kernel.

2015-05-15 Thread Wang Nan

This patch utilizes bpf_load_object() provided by libbpf to load all
objects into kernel.

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-bpf.c |  5 +
 tools/perf/util/bpf-loader.c | 16 
 tools/perf/util/bpf-loader.h |  3 ++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index 95e0c65..8155f39 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -162,6 +162,11 @@ static int cmd_bpf_record(int argc, const char **argv,
goto errout;
}
 
+   if (bpf_load()) {
+   pr_err("bpf: failed to load\n");
+   goto errout;
+   }
+
return start_bpf_record(argc, argv);
 usage:
usage_with_options(bpf_record_usage, options);
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index c820d1a..7295a3b 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -214,3 +214,19 @@ int bpf_probe(void)
 
return err < 0 ? err : 0;
 }
+
+int bpf_load(void)
+{
+   size_t i;
+   int err;
+
+   for (i = 0; i < params.nr_objects; i++) {
+   err = bpf_load_object(params.objects[i]);
+   if (err) {
+   pr_err("failed to load object\n");
+   return err;
+   }
+   }
+
+   return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 30dea2e..1ccebdf 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -9,7 +9,8 @@
 
 int bpf_prepare_load(const char *filename);
 int bpf_probe(void);
-int bpf_unprobe(void);
+int bpf_load(void);
 
+int bpf_unprobe(void);
 void bpf_clear(void);
 #endif
-- 
1.8.3.4


[ 46/48] posix-timers: Fix stack info leak in timer_create()

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Mathias Krause 

commit 6891c4509c792209c44ced55a60f13954cb50ef4 upstream.

If userland creates a timer without specifying a sigevent info, we'll
create one ourselves, using a stack-local variable. In particular, we
use the timer ID as sival_int. But as sigev_value is a union containing
a pointer and an int, that assignment will only partially initialize
sigev_value on systems where the size of a pointer is bigger than the
size of an int. On such systems we'll copy the uninitialized stack bytes
from the timer_create() call to userland when the timer actually fires
and we're going to deliver the signal.

Initialize sigev_value with 0 to plug the stack info leak.
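
The partial initialization can be observed from userspace with a small
sketch. The union demo_sigval type and demo_stale_bytes() are hypothetical
and only mirror the int/pointer layout described above; the 0xAA fill stands
in for whatever happened to be on the stack. The result for the unfixed path
depends on the platform's int and pointer sizes, so no particular count is
claimed for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the int/pointer union used for sigev_value. */
union demo_sigval {
    int sival_int;
    void *sival_ptr;
};

#define DEMO_STALE 0xAA

/*
 * Fill a union with a marker pattern (standing in for stale stack data),
 * optionally zero it as the fix does, then assign only the int member.
 * Returns how many bytes beyond the int still hold the marker.
 */
static size_t demo_stale_bytes(int zero_first)
{
    union demo_sigval v;
    unsigned char raw[sizeof(v)];
    size_t stale = 0;
    size_t i;

    assert(sizeof(v) >= sizeof(int));

    memset(&v, DEMO_STALE, sizeof(v));  /* simulated stack garbage */
    if (zero_first)
        memset(&v, 0, sizeof(v));       /* the fix: clear the whole union */
    v.sival_int = 7;                    /* only the int part is assigned */

    memcpy(raw, &v, sizeof(v));
    for (i = sizeof(int); i < sizeof(v); i++)
        if (raw[i] == DEMO_STALE)
            stale++;

    return stale;
}
```

With the memset in place, no marker bytes survive. Without it, any bytes of
the union that lie beyond the int may keep their old contents on platforms
where a pointer is wider than an int, and those are the bytes that would be
copied to userland.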

Found in the PaX patch, written by the PaX Team.

Fixes: 5a9fa7307285 ("posix-timers: kill ->it_sigev_signo and...")
Signed-off-by: Mathias Krause 
Cc: Oleg Nesterov 
Cc: Brad Spengler 
Cc: PaX Team 
Link: http://lkml.kernel.org/r/1412456799-32339-1-git-send-email-mini...@googlemail.com
Signed-off-by: Thomas Gleixner 
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings 
(cherry picked from commit 3cd3a349aa3519b88d29845c0bc36bcbae158e93)

Signed-off-by: Willy Tarreau 
---
 kernel/posix-timers.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 5e76d22..f2335e8 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -578,6 +578,7 @@ SYSCALL_DEFINE3(timer_create, const clockid_t, which_clock,
goto out;
}
} else {
+   memset(&event.sigev_value, 0, sizeof(event.sigev_value));
event.sigev_notify = SIGEV_SIGNAL;
event.sigev_signo = SIGALRM;
event.sigev_value.sival_int = new_timer->it_id;
-- 
1.7.12.2.21.g234cd45.dirty




[ 11/48] x86, cpu, amd: Add workaround for family 16h, erratum 793

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Borislav Petkov 

commit 3b56496865f9f7d9bcb2f93b44c63f274f08e3b6 upstream

This adds the workaround for erratum 793 as a precaution in case not
every BIOS implements it.  This addresses CVE-2013-6885.

Erratum text:

[Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
document 51810 Rev. 3.04 November 2013]

793 Specific Combination of Writes to Write Combined Memory Types and
Locked Instructions May Cause Core Hang

Description

Under a highly specific and detailed set of internal timing
conditions, a locked instruction may trigger a timing sequence whereby
the write to a write combined memory type is not flushed, causing the
locked instruction to stall indefinitely.

Potential Effect on System

Processor core hang.

Suggested Workaround

BIOS should set MSR
C001_1020[15] = 1b.

Fix Planned

No fix planned

[ hpa: updated description, fixed typo in MSR name ]
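
As a rough sketch of the workaround's logic, the bit manipulation can be
separated from the MSR access. The function demo_erratum_793_fixup() and the
DEMO_ constant are hypothetical names for illustration; the real code in the
diff below reads and writes the MSR through the kernel's safe accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 15 of the LS_CFG MSR (C001_1020), per the erratum text above. */
#define DEMO_LS_CFG_ERRATUM_793_BIT (UINT64_C(1) << 15)

/*
 * Hypothetical helper: given the current register value, set bit 15 if it
 * is not already set. Returns 1 when the value changed and a write back
 * to the register is needed, 0 when the BIOS already applied the fix.
 */
static int demo_erratum_793_fixup(uint64_t *val)
{
    assert(val != NULL);

    if (*val & DEMO_LS_CFG_ERRATUM_793_BIT)
        return 0;

    *val |= DEMO_LS_CFG_ERRATUM_793_BIT;
    return 1;
}
```

Checking the bit before writing avoids a redundant MSR write on systems where
the BIOS has already set it, which matches the read-test-write shape of the
patch.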

Signed-off-by: Borislav Petkov 
Link: http://lkml.kernel.org/r/20140114230711.gs29...@pd.tnic
Tested-by: Aravind Gopalakrishnan 
Signed-off-by: H. Peter Anvin 
[bwh: Backported to 3.2:
 - Adjust filename
 - Venkatesh Srinivas pointed out we should use {rd,wr}msrl_safe() to
   avoid crashing on KVM.  This was fixed upstream by commit 8f86a7373a1c
   ("x86, AMD: Convert to the new bit access MSR accessors") but that's too
   much trouble to backport.  Here we must use {rd,wr}msrl_amd_safe().]
Signed-off-by: Willy Tarreau 
---
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/kernel/cpu/amd.c| 10 ++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 883037b..6057b70 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -110,6 +110,7 @@
 #define MSR_AMD64_PATCH_LOADER 0xc0010020
 #define MSR_AMD64_OSVW_ID_LENGTH   0xc0010140
 #define MSR_AMD64_OSVW_STATUS  0xc0010141
+#define MSR_AMD64_LS_CFG   0xc0011020
 #define MSR_AMD64_DC_CFG   0xc0011022
 #define MSR_AMD64_IBSFETCHCTL  0xc0011030
 #define MSR_AMD64_IBSFETCHLINAD0xc0011031
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 6e082dc..ae8b02c 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -413,6 +413,16 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
}
 #endif
+
+   /* F16h erratum 793, CVE-2013-6885 */
+   if (c->x86 == 0x16 && c->x86_model <= 0xf) {
+   u64 val;
+
+   if (!rdmsrl_amd_safe(MSR_AMD64_LS_CFG, &val) &&
+   !(val & BIT(15)))
+   wrmsrl_amd_safe(MSR_AMD64_LS_CFG, val | BIT(15));
+   }
+
 }
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
-- 
1.7.12.2.21.g234cd45.dirty




[ 45/48] scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Jan Kara 

commit 84ce0f0e94ac97217398b3b69c21c7a62ebeed05 upstream.

When sg_scsi_ioctl() fails to prepare the request to submit in
blk_rq_map_kern(), we jump to a label where we just end up copying a
(luckily zeroed-out) kernel buffer to userspace instead of reporting an
error. Fix the problem by jumping to the right label.
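
The control flow after the fix can be sketched as follows. This is a
simplified userspace model, not the kernel function: demo_send_command(),
DEMO_DRIVER_ERROR and the copied_to_user flag are hypothetical names chosen
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver-byte error code, shifted into the top byte below. */
#define DEMO_DRIVER_ERROR 0x04

/*
 * Models the fixed flow: when mapping the buffer fails, skip both the
 * request execution and the copy back to the caller, and report the
 * error through the common cleanup label.
 */
static int demo_send_command(int map_fails, int *copied_to_user)
{
    int err;

    assert(copied_to_user != NULL);
    *copied_to_user = 0;

    if (map_fails) {
        err = DEMO_DRIVER_ERROR << 24;
        goto error;     /* before the fix this jumped to the copy path */
    }

    /* Request executed successfully: copy the result buffer back. */
    err = 0;
    *copied_to_user = 1;

error:
    /* Common cleanup (freeing the buffer and the request) would go here. */
    return err;
}
```

On a mapping failure the caller now sees a non-zero error and nothing is
copied out, instead of receiving a zeroed buffer together with a status
derived from a request that was never executed.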

CC: Jens Axboe 
CC: linux-s...@vger.kernel.org
Coverity-id: 1226871
Signed-off-by: Jan Kara 

Fixed up the, now unused, out label.

Signed-off-by: Jens Axboe 
Signed-off-by: Ben Hutchings 
(cherry picked from commit d73b032b63e8967462e1cf5763858ed89e97880f)

Signed-off-by: Willy Tarreau 
---
 block/scsi_ioctl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 123eb17..f5df2a8 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -503,7 +503,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
 
if (bytes && blk_rq_map_kern(q, rq, buffer, bytes, __GFP_WAIT)) {
err = DRIVER_ERROR << 24;
-   goto out;
+   goto error;
}
 
memset(sense, 0, sizeof(sense));
@@ -513,7 +513,6 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
 
blk_execute_rq(q, disk, rq, 0);
 
-out:
err = rq->errors & 0xff;/* only 8 bit SCSI status */
if (err) {
if (rq->sense_len && rq->sense) {
-- 
1.7.12.2.21.g234cd45.dirty




[ 23/48] net: rds: use correct size for max unacked packets and bytes

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Sasha Levin 

commit db27ebb111e9f69efece08e4cb6a34ff980f8896 upstream.

Max unacked packets/bytes is an int while sizeof(long) was used in the
sysctl table.

This means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.
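
A short sketch of the arithmetic, assuming an integer-vector handler that
processes maxlen / sizeof(int) values starting at the variable's address.
The demo_* helpers are hypothetical and only illustrate the size mismatch.

```c
#include <assert.h>
#include <stddef.h>

/* Number of ints an integer-vector handler would process for maxlen. */
static size_t demo_ints_exposed(size_t maxlen)
{
    assert(maxlen >= sizeof(int));
    return maxlen / sizeof(int);
}

/* Number of ints read past a single int variable for a given maxlen. */
static size_t demo_ints_past_variable(size_t maxlen)
{
    size_t n = demo_ints_exposed(maxlen);

    return n > 1 ? n - 1 : 0;
}
```

With .maxlen = sizeof(int), exactly one value is exposed. On systems where
unsigned long is wider than int, sizeof(unsigned long) covers more than one
int, so the handler would also read whatever follows the variable in memory.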

Signed-off-by: Sasha Levin 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings 
(cherry picked from commit 3760b67b3e419b9ac42a45417491360a14a35357)

Signed-off-by: Willy Tarreau 
---
 net/rds/sysctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rds/sysctl.c b/net/rds/sysctl.c
index 307dc5c..870e808 100644
--- a/net/rds/sysctl.c
+++ b/net/rds/sysctl.c
@@ -74,7 +74,7 @@ static ctl_table rds_sysctl_rds_table[] = {
.ctl_name   = CTL_UNNUMBERED,
.procname   = "max_unacked_packets",
.data   = &rds_sysctl_max_unacked_packets,
-   .maxlen = sizeof(unsigned long),
+   .maxlen = sizeof(int),
.mode   = 0644,
.proc_handler   = &proc_dointvec,
},
@@ -82,7 +82,7 @@ static ctl_table rds_sysctl_rds_table[] = {
.ctl_name   = CTL_UNNUMBERED,
.procname   = "max_unacked_bytes",
.data   = &rds_sysctl_max_unacked_bytes,
-   .maxlen = sizeof(unsigned long),
+   .maxlen = sizeof(int),
.mode   = 0644,
.proc_handler   = &proc_dointvec,
},
-- 
1.7.12.2.21.g234cd45.dirty




[ 31/48] udp: only allow UFO for packets from SOCK_DGRAM sockets

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Michal Kubeček

[ Upstream commit acf8dd0a9d0b9e4cdb597c2f74802f79c699e802 ]

If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).

Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive a change just to handle a corner case like this. Therefore
allowing UFO only for packets from SOCK_DGRAM sockets seems to be the best option.
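
The resulting condition can be written as a standalone predicate. This is a
simplified sketch of the ip6_append_data() test in the diff below, with the
socket and route fields flattened into plain parameters; demo_use_ufo() and
its parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Returns non-zero when the UFO path should be taken: the datagram is
 * over-MTU (or the skb already has fragments), the protocol is UDP, the
 * device offers UFO, and the socket is a datagram socket. A raw socket
 * sending UDP fails the last test and falls back to the regular path.
 */
static int demo_use_ufo(size_t length, size_t mtu, int skb_has_frags,
                        int protocol, int dev_has_ufo, int sk_type)
{
    assert(mtu > 0);

    return (length > mtu || skb_has_frags) &&
           protocol == IPPROTO_UDP &&
           dev_has_ufo &&
           sk_type == SOCK_DGRAM;
}
```

A 3000-byte UDP datagram on a 1500-byte MTU link qualifies when sent from a
SOCK_DGRAM socket, while the same payload sent through SOCK_RAW does not,
which avoids the unset csum_start/csum_offset case described above.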

Signed-off-by: Michal Kubecek 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings 
(cherry picked from commit 332640b2821f75381b1049a904d93d4fb846334f)
Signed-off-by: Willy Tarreau 
---
 net/ipv4/ip_output.c  | 4 ++--
 net/ipv6/ip6_output.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index bd5c4b3..00d4d00 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -877,8 +877,8 @@ int ip_append_data(struct sock *sk,
inet->cork.length += length;
if (((length > mtu) || (skb && skb_has_frags(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
-   (rt->u.dst.dev->features & NETIF_F_UFO)) {
-   (rt->u.dst.dev->features & NETIF_F_UFO) && !rt->u.dst.header_len) {
+   (rt->u.dst.dev->features & NETIF_F_UFO) && !rt->u.dst.header_len &&
+   (sk->sk_type == SOCK_DGRAM)) {
err = ip_ufo_append_data(sk, getfrag, from, length, hh_len,
 fragheaderlen, transhdrlen, mtu,
 flags);
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6dff3d7..1934328 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1259,7 +1259,8 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
if (((length > mtu) ||
 (skb && skb_has_frags(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
-   (rt->u.dst.dev->features & NETIF_F_UFO)) {
+   (rt->u.dst.dev->features & NETIF_F_UFO) &&
+   (sk->sk_type == SOCK_DGRAM)) {
err = ip6_ufo_append_data(sk, getfrag, from, length,
  hh_len, fragheaderlen,
  transhdrlen, mtu, flags, rt);
-- 
1.7.12.2.21.g234cd45.dirty




[ 33/48] net: sysctl_net_core: check SNDBUF and RCVBUF for min length

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Alexey Kodanev 

[ Upstream commit b1cb59cf2efe7971d3d72a7b963d09a512d994c9 ]

sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be
set to incorrect values. Given that 'struct sk_buff' allocates from
rcvbuf, an incorrectly set buffer length could result in memory
allocation failures. For example, set them as follows:

# sysctl net.core.rmem_default=64
  net.core.rmem_default = 64
# sysctl net.core.wmem_default=64
  net.core.wmem_default = 64
# ping localhost -s 1024 -i 0 > /dev/null

This could result in the following failure:

skbuff: skb_over_panic: text:81628db4 len:-32 put:-32
head:88003a1cc200 data:88003a1cc200 tail:0xffe0 end:0xc0 dev:
kernel BUG at net/core/skbuff.c:102!
invalid opcode:  [#1] SMP
...
task: 88003b7f5550 ti: 88003ae88000 task.ti: 88003ae88000
RIP: 0010:[]  [] skb_put+0xa1/0xb0
RSP: 0018:88003ae8bc68  EFLAGS: 00010296
RAX: 008d RBX: ffe0 RCX: 
RDX: 88003fdcf598 RSI: 88003fdcd9c8 RDI: 88003fdcd9c8
RBP: 88003ae8bc88 R08: 0001 R09: 
R10: 0001 R11: 02b2 R12: 
R13:  R14: 88003d3f7300 R15: 8812a900
FS:  7fa0e2b4a840() GS:88003fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 00d0f7e0 CR3: 3b8fb000 CR4: 06f0
Stack:
 88003a1cc200 ffe0 00c0 818cab1d
 88003ae8bd68 81628db4 88003ae8bd48 88003b7f5550
 880031a09408 88003b7f5550 8812aa48 8812ab00
Call Trace:
 [] unix_stream_sendmsg+0x2c4/0x470
 [] sock_write_iter+0x146/0x160
 [] new_sync_write+0x92/0xd0
 [] vfs_write+0xd6/0x180
 [] SyS_write+0x59/0xd0
 [] system_call_fastpath+0x12/0x17
Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
  00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b
  eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
RIP  [] skb_put+0xa1/0xb0
RSP 
Kernel panic - not syncing: Fatal exception

Moreover, the possible minimum is 1, so we can get another kernel panic:
...
BUG: unable to handle kernel paging request at 88013caee5c0
IP: [] __alloc_skb+0x12f/0x1f0
...
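
A minimal userspace sketch of the kind of lower-bound check the patch relies
on, assuming the handler rejects out-of-range writes with -EINVAL.
`sketch_store_minmax()` and the `SKETCH_MIN_*` floors are hypothetical; the
real SOCK_MIN_SNDBUF and SOCK_MIN_RCVBUF values come from the kernel headers
and are not reproduced here.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder floors for illustration only. */
#define SKETCH_MIN_SNDBUF 2048
#define SKETCH_MIN_RCVBUF 256

/*
 * Store val in *dst only if it lies within the optional [min, max] bounds.
 * A NULL bound means "no limit on that side".
 * Returns 0 on success, -EINVAL if the value is out of range.
 */
int sketch_store_minmax(int *dst, int val, const int *min, const int *max)
{
	if ((min && val < *min) || (max && val > *max))
		return -EINVAL;
	*dst = val;
	return 0;
}
```

With a floor in place, a write of 64 is refused and the previous buffer size
is left untouched, instead of being accepted as it was with a minimum of 1.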

Signed-off-by: Alexey Kodanev 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: delete now-unused 'one' variable]
Signed-off-by: Ben Hutchings 
(cherry picked from commit 2d6dfb109bfbf3abd5f762173b1d73fd321dbe37)
Signed-off-by: Willy Tarreau 
---
 net/core/sysctl_net_core.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index a600328..d0a07c2 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -17,7 +17,8 @@
 static int zero = 0;
 static int ushort_max = 65535;
 
-static int one = 1;
+static int min_sndbuf = SOCK_MIN_SNDBUF;
+static int min_rcvbuf = SOCK_MIN_RCVBUF;
 
 static struct ctl_table net_core_table[] = {
 #ifdef CONFIG_NET
@@ -29,7 +30,7 @@ static struct ctl_table net_core_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
.strategy   = sysctl_intvec,
-   .extra1 = &one,
+   .extra1 = &min_sndbuf,
},
{
.ctl_name   = NET_CORE_RMEM_MAX,
@@ -39,7 +40,7 @@ static struct ctl_table net_core_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
.strategy   = sysctl_intvec,
-   .extra1 = &one,
+   .extra1 = &min_rcvbuf,
},
{
.ctl_name   = NET_CORE_WMEM_DEFAULT,
@@ -49,7 +50,7 @@ static struct ctl_table net_core_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
.strategy   = sysctl_intvec,
-   .extra1 = &one,
+   .extra1 = &min_sndbuf,
},
{
.ctl_name   = NET_CORE_RMEM_DEFAULT,
@@ -59,7 +60,7 @@ static struct ctl_table net_core_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
.strategy   = sysctl_intvec,
-   .extra1 = &one,
+   .extra1 = &min_rcvbuf,
},
{
.ctl_name   = NET_CORE_DEV_WEIGHT,
-- 
1.7.12.2.21.g234cd45.dirty




[ 27/48] ppp: deflate: never return len larger than output buffer

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Florian Westphal 

[ Upstream commit e2a4800e75780ccf4e6c2487f82b688ba736eb18 ]

When we've run out of space in the output buffer to store more data, we
will call zlib_deflate with a NULL output buffer until we've consumed
remaining input.

When this happens, olen contains the size the output buffer would have
consumed iff we'd have had enough room.

This can later cause skb_over_panic when ppp_generic skb_put()s
the returned length.
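
A minimal sketch of the acceptance test, assuming the caller treats a return
of 0 as "send the packet uncompressed". `sketch_accept_compressed()` is a
hypothetical helper, not a function in ppp_deflate.c.

```c
/*
 * olen:  bytes the compressor reports it produced (or would have produced)
 * isize: size of the uncompressed input
 * osize: capacity of the output buffer
 * Returns olen when the result is both smaller than the input and fits the
 * output buffer, otherwise 0.
 */
int sketch_accept_compressed(int olen, int isize, int osize)
{
	if (olen < isize && olen <= osize)
		return olen;
	return 0;
}
```

The second condition is the one the patch adds: a result of 180 bytes for a
200 byte input is smaller than the input, yet must be refused when the
output buffer only holds 150 bytes.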

Reported-by: Iain Douglas 
Signed-off-by: Florian Westphal 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings 
(cherry picked from commit 8bcd64423836bad3638684677f6d740bc7c9297f)
Signed-off-by: Willy Tarreau 
---
 drivers/net/ppp_deflate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ppp_deflate.c b/drivers/net/ppp_deflate.c
index 034c1c6..09a4382 100644
--- a/drivers/net/ppp_deflate.c
+++ b/drivers/net/ppp_deflate.c
@@ -269,7 +269,7 @@ static int z_compress(void *arg, unsigned char *rptr, unsigned char *obuf,
/*
 * See if we managed to reduce the size of the packet.
 */
-   if (olen < isize) {
+   if (olen < isize && olen <= osize) {
state->stats.comp_bytes += olen;
state->stats.comp_packets++;
} else {
-- 
1.7.12.2.21.g234cd45.dirty




[ 47/48] hfsplus: fix B-tree corruption after insertion at position 0

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Sergei Antonov 

commit 98cf21c61a7f5419d82f847c4d77bf6e96a76f5f upstream.

Fix B-tree corruption when a new record is inserted at position 0 in the node
in hfs_brec_insert(). In this case a hfs_brec_update_parent() is called to
update the parent index node (if exists) and it is passed hfs_find_data with
a search_key containing a newly inserted key instead of the key to be updated.
This results in an inconsistent index node. The bug reproduces on my machine
after an extents overflow record for the catalog file (CNID=4) is inserted into
the extents overflow B-tree. Because of a low (reserved) value of CNID=4, it
has to become the first record in the first leaf node.
The resulting first leaf node is correct:

| key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... |

But the parent index key0 still contains the previous key CNID=123:
---
| key0.CNID=123 | ... |
---

A change in hfs_brec_insert() makes hfs_brec_update_parent() work correctly
by preventing it from getting fd->record=-1 value from __hfs_brec_find().

Along the way, I removed duplicate code with unification of the if condition.
The resulting code is equivalent to the original code because node is never 0.

Also hfs_brec_update_parent() will now return an error after getting a negative
fd->record value. However, the return value of hfs_brec_update_parent() is not
checked anywhere in the file and I'm leaving it unchanged by this patch.
brec.c lacks error checking after some other calls too, but this issue is of
less importance than the one being fixed by this patch.
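
A toy model of the invariant being restored, not the hfsplus code: the
parent's index key must equal the first key of the leaf it points to.
`struct sketch_leaf`, `struct sketch_index_entry` and `sketch_leaf_insert()`
are hypothetical and ignore records, node splits and on-disk layout.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_KEYS 8

/* Toy leaf node: sorted keys only, no record payloads. */
struct sketch_leaf {
	unsigned int keys[SKETCH_MAX_KEYS];
	size_t count;
};

/* Toy parent entry that indexes a leaf by that leaf's first key. */
struct sketch_index_entry {
	unsigned int first_key;
};

/*
 * Insert key into the sorted leaf. If it lands at position 0, refresh the
 * parent's copy of the first key so the index stays consistent.
 * Returns the insert position, or -1 if the leaf is full.
 */
int sketch_leaf_insert(struct sketch_leaf *leaf,
		       struct sketch_index_entry *parent, unsigned int key)
{
	size_t pos = 0;

	if (leaf->count >= SKETCH_MAX_KEYS)
		return -1;
	while (pos < leaf->count && leaf->keys[pos] < key)
		pos++;
	/* Shift the tail up by one slot to make room at pos. */
	memmove(&leaf->keys[pos + 1], &leaf->keys[pos],
		(leaf->count - pos) * sizeof(leaf->keys[0]));
	leaf->keys[pos] = key;
	leaf->count++;

	if (pos == 0)
		parent->first_key = leaf->keys[0];
	return (int)pos;
}
```

Starting from a leaf holding CNID 123 and 456 with a parent key of 123,
inserting CNID 4 moves the parent key to 4, which is the state the commit
message describes as correct.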

Cc: sta...@vger.kernel.org
Cc: Joe Perches 
Cc: Andrew Morton 
Cc: Vyacheslav Dubeyko 
Cc: Hin-Tak Leung 
Cc: Anton Altaparmakov 
Cc: Al Viro 
Cc: Christoph Hellwig 
Signed-off-by: Sergei Antonov 
Signed-off-by: Willy Tarreau 
---
 fs/hfsplus/brec.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/hfsplus/brec.c b/fs/hfsplus/brec.c
index c88e5d7..5bcf730 100644
--- a/fs/hfsplus/brec.c
+++ b/fs/hfsplus/brec.c
@@ -119,13 +119,16 @@ skip:
hfs_bnode_write(node, entry, data_off + key_len, entry_len);
hfs_bnode_dump(node);
 
-   if (new_node) {
-   /* update parent key if we inserted a key
-* at the start of the first node
-*/
-   if (!rec && new_node != node)
-   hfs_brec_update_parent(fd);
+   /*
+* update parent key if we inserted a key
+* at the start of the node and it is not the new node
+*/
+   if (!rec && new_node != node) {
+   hfs_bnode_read_key(node, fd->search_key, data_off + size);
+   hfs_brec_update_parent(fd);
+   }
 
+   if (new_node) {
hfs_bnode_put(fd->bnode);
if (!new_node->parent) {
hfs_btree_inc_height(tree);
@@ -154,9 +157,6 @@ skip:
goto again;
}
 
-   if (!rec)
-   hfs_brec_update_parent(fd);
-
return 0;
 }
 
@@ -341,6 +341,8 @@ again:
if (IS_ERR(parent))
return PTR_ERR(parent);
__hfs_brec_find(parent, fd);
+   if (fd->record < 0)
+   return -ENOENT;
hfs_bnode_dump(parent);
rec = fd->record;
 
-- 
1.7.12.2.21.g234cd45.dirty




[ 18/48] isofs: Fix unchecked printing of ER records

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Jan Kara 

commit 4e2024624e678f0ebb916e6192bd23c1f9fdf696 upstream

We didn't check length of rock ridge ER records before printing them.
Thus corrupted isofs image can cause us to access and print some memory
behind the buffer with obvious consequences.
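
A minimal sketch of the length check, assuming a simplified record layout.
`struct sketch_rr`, `struct sketch_er` and `sketch_er_id_fits()` are
hypothetical and only mirror the fields the check reads; the real
struct rock_ridge has more union members.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified ER payload: the id string starts at data[]. */
struct sketch_er {
	uint8_t len_id;   /* claimed length of the id string */
	uint8_t len_des;
	uint8_t len_src;
	uint8_t ext_ver;
	char data[1];
};

/* Simplified record header followed by the payload union. */
struct sketch_rr {
	char signature[2];
	uint8_t len;      /* total length of this record in bytes */
	uint8_t version;
	union {
		struct sketch_er ER;
	} u;
};

/* True when the claimed id string lies entirely inside the record. */
bool sketch_er_id_fits(const struct sketch_rr *rr)
{
	return rr->u.ER.len_id + offsetof(struct sketch_rr, u.ER.data) <= rr->len;
}
```

A record that claims one more id byte than its own length allows is
rejected before anything is printed.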

Reported-and-tested-by: Carl Henrik Lunde 
CC: sta...@vger.kernel.org
Signed-off-by: Jan Kara 
Signed-off-by: Willy Tarreau 
---
 fs/isofs/rock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/isofs/rock.c b/fs/isofs/rock.c
index 69c737d..2ec72ae 100644
--- a/fs/isofs/rock.c
+++ b/fs/isofs/rock.c
@@ -363,6 +363,9 @@ repeat:
rs.cont_size = isonum_733(rr->u.CE.size);
break;
case SIG('E', 'R'):
+   /* Invalid length of ER tag id? */
+   if (rr->u.ER.len_id + offsetof(struct rock_ridge, u.ER.data) > rr->len)
+   goto out;
ISOFS_SB(inode->i_sb)->s_rock = 1;
printk(KERN_DEBUG "ISO 9660 Extensions: ");
{
-- 
1.7.12.2.21.g234cd45.dirty




[ 39/48] spi: spidev: fix possible arithmetic overflow for multi-transfer message

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Ian Abbott 

commit f20fbaad7620af2df36a1f9d1c9ecf48ead5b747 upstream.

`spidev_message()` sums the lengths of the individual SPI transfers to
determine the overall SPI message length.  It restricts the total
length, returning an error if too long, but it does not check for
arithmetic overflow.  For example, if the SPI message consisted of two
transfers and the first has a length of 10 and the second has a length
of (__u32)(-1), the total length would be seen as 9, even though the
second transfer is actually very long.  If the second transfer specifies
a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could
overrun the spidev's pre-allocated tx buffer before it reaches an
invalid user memory address.  Fix it by checking that neither the total
nor the individual transfer lengths exceed the maximum allowed value.
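
A minimal userspace sketch of the wrap-around described above, assuming
32-bit unsigned lengths. `sketch_check_total_only()` models the check before
the patch and `sketch_check_total_and_each()` the check after it; both names
are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Pre-patch style: only the running total is compared, so it can wrap. */
int sketch_check_total_only(const uint32_t *lens, size_t n, uint32_t bufsiz)
{
	uint32_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		total += lens[i]; /* unsigned addition wraps modulo 2^32 */
		if (total > bufsiz)
			return -EMSGSIZE;
	}
	return 0;
}

/* Post-patch style: each transfer length is bounded as well as the total. */
int sketch_check_total_and_each(const uint32_t *lens, size_t n, uint32_t bufsiz)
{
	uint32_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		total += lens[i];
		if (total > bufsiz || lens[i] > bufsiz)
			return -EMSGSIZE;
	}
	return 0;
}
```

For the two transfers from the commit message, 10 and (__u32)(-1), the total
wraps to 9, so the first function accepts the message while the second
rejects it.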

Thanks to Dan Carpenter for reporting the potential integer overflow.

Signed-off-by: Ian Abbott 
Signed-off-by: Mark Brown 
[Ian Abbott: Note: original commit compares the lengths to INT_MAX
 instead of bufsiz due to changes in earlier commits.]
Signed-off-by: Ben Hutchings 
(cherry picked from commit 7499401e4a0b01ee43cff768de4ca630dcd0bc64)
Signed-off-by: Willy Tarreau 
---
 drivers/spi/spidev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 5d23983..4dd8e2a 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -241,7 +241,10 @@ static int spidev_message(struct spidev_data *spidev,
k_tmp->len = u_tmp->len;
 
total += k_tmp->len;
-   if (total > bufsiz) {
+   /* Check total length of transfers.  Also check each
+* transfer length to avoid arithmetic overflow.
+*/
+   if (total > bufsiz || k_tmp->len > bufsiz) {
status = -EMSGSIZE;
goto done;
}
-- 
1.7.12.2.21.g234cd45.dirty




[ 35/48] rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Al Viro 

[ Upstream commit 7d985ed1dca5c90535d67ce92ef6ca520302340a ]

[I would really like an ACK on that one from dhowells; it appears to be
quite straightforward, but...]

MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as a matter of
fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit
in there.  It gets passed via flags; in fact, another such check in the same
function is done correctly - as flags & MSG_PEEK.
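
A minimal sketch of the corrected test, assuming the caller's request flags
arrive in a separate `flags` argument. `sketch_stop_waiting()` is a
hypothetical helper that isolates the condition; it is not part of rxrpc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * True when the receive path should return what it has already copied
 * instead of sleeping for more data: either the caller is only peeking,
 * or no timeout remains.
 */
bool sketch_stop_waiting(size_t copied, int flags, long timeo)
{
	return copied != 0 && ((flags & MSG_PEEK) || timeo == 0);
}
```

Testing msg->msg_flags instead would make the MSG_PEEK half of the
condition dead code, since per the description above that bit is never set
there.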

It had been that way (effectively disabled) for 8 years, though, so the patch
needs beating up - that case had never been tested.  If it is correct, it's
-stable fodder.

Signed-off-by: Al Viro 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings 
(cherry picked from commit 10c82cd7d46e4c525b046c399fcd285ce138198e)

Signed-off-by: Willy Tarreau 
---
 net/rxrpc/ar-recvmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index d5630d9..b6076b2 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -86,7 +86,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
if (!skb) {
/* nothing remains on the queue */
if (copied &&
-   (msg->msg_flags & MSG_PEEK || timeo == 0))
+   (flags & MSG_PEEK || timeo == 0))
goto out;
 
/* wait for a message to turn up */
-- 
1.7.12.2.21.g234cd45.dirty




[ 34/48] rds: avoid potential stack overflow

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Arnd Bergmann 

[ Upstream commit f862e07cf95d5b62a5fc5e981dd7d0dbaf33a501 ]

The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
on the stack in order to pass a pair of addresses. This happens to just
fit within the 1024 byte stack size warning limit on x86, but just
exceeds that limit on ARM, which gives us this warning:

net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]

As the use of this large variable is basically bogus, we can rearrange
the code to not do that. Instead of passing an rds socket into
rds_iw_get_device, we now just pass the two addresses that we have
available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
to create two address structures on the stack there.

Signed-off-by: Arnd Bergmann 
Acked-by: Sowmini Varadhan 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings 
(cherry picked from commit 3fe2d645fe4ea7ff6cba9020685e46c1a1dff9c0)

Signed-off-by: Willy Tarreau 
---
 net/rds/iw_rdma.c | 40 ++--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index de4a1b1..6ed9cdd 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -86,7 +86,9 @@ static unsigned int rds_iw_unmap_fastreg_list(struct rds_iw_mr_pool *pool,
struct list_head *kill_list);
 static void rds_iw_destroy_fastreg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
 
-static int rds_iw_get_device(struct rds_sock *rs, struct rds_iw_device **rds_iwdev, struct rdma_cm_id **cm_id)
+static int rds_iw_get_device(struct sockaddr_in *src, struct sockaddr_in *dst,
+struct rds_iw_device **rds_iwdev,
+struct rdma_cm_id **cm_id)
 {
struct rds_iw_device *iwdev;
struct rds_iw_cm_id *i_cm_id;
@@ -110,15 +112,15 @@ static int rds_iw_get_device(struct rds_sock *rs, struct rds_iw_device **rds_iwd
src_addr->sin_port,
dst_addr->sin_addr.s_addr,
dst_addr->sin_port,
-   rs->rs_bound_addr,
-   rs->rs_bound_port,
-   rs->rs_conn_addr,
-   rs->rs_conn_port);
+   src->sin_addr.s_addr,
+   src->sin_port,
+   dst->sin_addr.s_addr,
+   dst->sin_port);
 #ifdef WORKING_TUPLE_DETECTION
-   if (src_addr->sin_addr.s_addr == rs->rs_bound_addr &&
-   src_addr->sin_port == rs->rs_bound_port &&
-   dst_addr->sin_addr.s_addr == rs->rs_conn_addr &&
-   dst_addr->sin_port == rs->rs_conn_port) {
+   if (src_addr->sin_addr.s_addr == src->sin_addr.s_addr &&
+   src_addr->sin_port == src->sin_port &&
+   dst_addr->sin_addr.s_addr == dst->sin_addr.s_addr &&
+   dst_addr->sin_port == dst->sin_port) {
 #else
/* FIXME - needs to compare the local and remote
 * ipaddr/port tuple, but the ipaddr is the only
@@ -126,7 +128,7 @@ static int rds_iw_get_device(struct rds_sock *rs, struct rds_iw_device **rds_iwd
 * zero'ed.  It doesn't appear to be properly populated
 * during connection setup...
 */
-   if (src_addr->sin_addr.s_addr == rs->rs_bound_addr) {
+   if (src_addr->sin_addr.s_addr == src->sin_addr.s_addr) {
 #endif
spin_unlock_irq(&iwdev->spinlock);
*rds_iwdev = iwdev;
@@ -177,19 +179,13 @@ int rds_iw_update_cm_id(struct rds_iw_device *rds_iwdev, struct rdma_cm_id *cm_i
 {
struct sockaddr_in *src_addr, *dst_addr;
struct rds_iw_device *rds_iwdev_old;
-   struct rds_sock rs;
struct rdma_cm_id *pcm_id;
int rc;
 
src_addr = (struct sockaddr_in *)&cm_id->route.addr.src_addr;
dst_addr = (struct sockaddr_in *)&cm_id->route.addr.dst_addr;
 
-   rs.rs_bound_addr = src_addr->sin_addr.s_addr;
-   rs.rs_bound_port = src_addr->sin_port;
-   rs.rs_conn_addr = dst_addr->sin_addr.s_addr;
-   rs.rs_conn_port = dst_addr->sin_port;
-
-   rc = rds_iw_get_device(&rs, &rds_iwdev_old, &pcm_id);
+   rc = rds_iw_get_device(src_addr, dst_addr, &rds_iwdev_old, &pcm_id);
if (rc)
rds_iw_remove_cm_id(rds_iwdev, cm_id);
 
@@ -609,9 +605,17 @@ void *rds_iw_get_mr(struct scatterlist *sg, unsigned long nents,
struct rds_iw_device *rds_iwdev;
struct rds_iw_mr *ibmr =

[ 25/48] fs: take i_mutex during prepare_binprm for set[ug]id executables

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Jann Horn 

commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream.

This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.
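
A userspace sketch of the idea, assuming a pthread mutex plays the role of
i_mutex and that the ownership change takes the same lock.
`struct sketch_inode`, `struct sketch_cred` and `sketch_fill_ids()` are
hypothetical; the point is only that mode, uid and gid are read as one
consistent snapshot before any of them is acted on.

```c
#include <pthread.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for the inode fields involved. */
struct sketch_inode {
	pthread_mutex_t lock; /* plays the role of i_mutex */
	mode_t mode;
	uid_t uid;
	gid_t gid;
};

/* Hypothetical stand-in for the credentials being prepared. */
struct sketch_cred {
	uid_t euid;
	gid_t egid;
};

void sketch_fill_ids(struct sketch_inode *inode, struct sketch_cred *cred)
{
	mode_t mode;
	uid_t uid;
	gid_t gid;

	/* Snapshot all three fields while a concurrent chown is excluded. */
	pthread_mutex_lock(&inode->lock);
	mode = inode->mode;
	uid = inode->uid;
	gid = inode->gid;
	pthread_mutex_unlock(&inode->lock);

	/* Decide from the snapshot only, never from the live fields. */
	if (mode & S_ISUID)
		cred->euid = uid;
	/* setgid without group execute is not treated as a setgid executable */
	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
		cred->egid = gid;
}
```

Reading the mode and the owner at separate times, without the lock, is what
allows the window in which a setuid bit is paired with the new root owner.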

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn 
Signed-off-by: Linus Torvalds 
[bwh: Backported to 3.2:
 - Drop the task_no_new_privs() and user namespace checks
 - Open-code file_inode()
 - s/READ_ONCE/ACCESS_ONCE/
 - Adjust context]
Signed-off-by: Ben Hutchings 
(cherry picked from commit 470e517be17dd6ef8670bec7bd7831ea0d3ad8a6)

Signed-off-by: Willy Tarreau 
---
 fs/exec.c | 65 +++
 1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index c32ae34..8dc1270 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1181,6 +1181,45 @@ int check_unsafe_exec(struct linux_binprm *bprm)
return res;
 }
 
+static void bprm_fill_uid(struct linux_binprm *bprm)
+{
+   struct inode *inode;
+   unsigned int mode;
+   uid_t uid;
+   gid_t gid;
+
+   /* clear any previous set[ug]id data from a previous binary */
+   bprm->cred->euid = current_euid();
+   bprm->cred->egid = current_egid();
+
+   if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+   return;
+
+   inode = bprm->file->f_path.dentry->d_inode;
+   mode = ACCESS_ONCE(inode->i_mode);
+   if (!(mode & (S_ISUID|S_ISGID)))
+   return;
+
+   /* Be careful if suid/sgid is set */
+   mutex_lock(&inode->i_mutex);
+
+   /* reload atomically mode/uid/gid now that lock held */
+   mode = inode->i_mode;
+   uid = inode->i_uid;
+   gid = inode->i_gid;
+   mutex_unlock(&inode->i_mutex);
+
+   if (mode & S_ISUID) {
+   bprm->per_clear |= PER_CLEAR_ON_SETID;
+   bprm->cred->euid = uid;
+   }
+
+   if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
+   bprm->per_clear |= PER_CLEAR_ON_SETID;
+   bprm->cred->egid = gid;
+   }
+}
+
 /* 
  * Fill the binprm structure from the inode. 
  * Check permissions, then read the first 128 (BINPRM_BUF_SIZE) bytes
@@ -1189,36 +1228,12 @@ int check_unsafe_exec(struct linux_binprm *bprm)
  */
 int prepare_binprm(struct linux_binprm *bprm)
 {
-   umode_t mode;
-   struct inode * inode = bprm->file->f_path.dentry->d_inode;
int retval;
 
-   mode = inode->i_mode;
if (bprm->file->f_op == NULL)
return -EACCES;
 
-   /* clear any previous set[ug]id data from a previous binary */
-   bprm->cred->euid = current_euid();
-   bprm->cred->egid = current_egid();
-
-   if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
-   /* Set-uid? */
-   if (mode & S_ISUID) {
-   bprm->per_clear |= PER_CLEAR_ON_SETID;
-   bprm->cred->euid = inode->i_uid;
-   }
-
-   /* Set-gid? */
-   /*
-* If setgid is set but no group execute bit then this
-* is a candidate for mandatory locking, not a setgid
-* executable.
-*/
-   if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
-   bprm->per_clear |= PER_CLEAR_ON_SETID;
-   bprm->cred->egid = inode->i_gid;
-   }
-   }
+   bprm_fill_uid(bprm);
 
/* fill in binprm security blob */
retval = security_bprm_set_creds(bprm);
-- 
1.7.12.2.21.g234cd45.dirty




[ 06/48] x86/tls: Disallow unusual TLS segments

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Andy Lutomirski 

commit 0e58af4e1d2166e9e33375a0f121e4867010d4f8 upstream.

Users have no business installing custom code segments into the
GDT, and segments that are not present but are otherwise valid
are a historical source of interesting attacks.

For completeness, block attempts to set the L bit.  (Prior to
this patch, the L bit would have been silently dropped.)

This is an ABI break.  I've checked glibc, musl, and Wine, and
none of them look like they'll have any trouble.

Note to stable maintainers: this is a hardening patch that fixes
no known bugs.  Given the possibility of ABI issues, this
probably shouldn't be backported quickly.
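
A minimal sketch of the added rejections, assuming a reduced descriptor.
`struct sketch_user_desc` is a hypothetical subset of the fields the checks
read, and `sketch_tls_desc_okay()` covers only the tests visible in this
patch plus the seg_32bit test shown as context; any earlier checks in the
real function are omitted.

```c
#include <stdbool.h>

/* Hypothetical subset of a user-supplied segment descriptor. */
struct sketch_user_desc {
	unsigned int seg_32bit : 1;
	unsigned int contents : 2;       /* 0 and 1 are data, 2 and 3 are code */
	unsigned int seg_not_present : 1;
	unsigned int lm : 1;             /* the L bit */
};

bool sketch_tls_desc_okay(const struct sketch_user_desc *info)
{
	if (!info->seg_32bit)
		return false;
	/* Only data segments are allowed. */
	if (info->contents > 1)
		return false;
	/* Not-present segments are refused outright. */
	if (info->seg_not_present)
		return false;
	/* The L bit is meaningless for data, so reject it too. */
	if (info->lm)
		return false;
	return true;
}
```

An ordinary 32-bit data segment passes; a code segment, a not-present
segment, or one with the L bit set does not.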

Signed-off-by: Andy Lutomirski 
Acked-by: H. Peter Anvin 
Cc: Konrad Rzeszutek Wilk 
Cc: Linus Torvalds 
Cc: secur...@kernel.org 
Cc: Willy Tarreau 
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings 
(cherry picked from commit fbc3c534ddffeebba6f943945ac71ec83cfa04b8)

Signed-off-by: Willy Tarreau 
---
 arch/x86/kernel/tls.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
index 8dda590..6146cc0 100644
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -61,6 +61,28 @@ static bool tls_desc_okay(const struct user_desc *info)
if (!info->seg_32bit)
return false;
 
+   /* Only allow data segments in the TLS array. */
+   if (info->contents > 1)
+   return false;
+
+   /*
+* Non-present segments with DPL 3 present an interesting attack
+* surface.  The kernel should handle such segments correctly,
+* but TLS is very difficult to protect in a sandbox, so prevent
+* such segments from being created.
+*
+* If userspace needs to remove a TLS entry, it can still delete
+* it outright.
+*/
+   if (info->seg_not_present)
+   return false;
+
+#ifdef CONFIG_X86_64
+   /* The L bit makes no sense for data. */
+   if (info->lm)
+   return false;
+#endif
+
return true;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty




[ 44/48] lockd: Try to reconnect if statd has moved

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Benjamin Coddington 

commit 173b3afceebe76fa2205b2c8808682d5b541fe3c upstream.

If rpc.statd is restarted, upcalls to monitor hosts can fail with
ECONNREFUSED.  In that case force a lookup of statd's new port and retry the
upcall.
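
A minimal userspace sketch of the retry-once pattern, assuming a fake
transport in place of the RPC client. `struct sketch_clnt`,
`sketch_call_sync()`, `sketch_force_rebind()` and `sketch_upcall()` are
hypothetical names and do not correspond to the sunrpc API.

```c
#include <errno.h>

/* Fake client: calls are refused until a rebind has happened. */
struct sketch_clnt {
	int bound_to_live_port;
	int calls;
	int rebinds;
};

/* Pretend RPC call: fails with -ECONNREFUSED while the port is stale. */
int sketch_call_sync(struct sketch_clnt *c)
{
	c->calls++;
	return c->bound_to_live_port ? 0 : -ECONNREFUSED;
}

/* Pretend port lookup: afterwards the client talks to the live port. */
void sketch_force_rebind(struct sketch_clnt *c)
{
	c->rebinds++;
	c->bound_to_live_port = 1;
}

/* Issue the upcall; on a refused connection, rebind and retry exactly once. */
int sketch_upcall(struct sketch_clnt *c)
{
	int status = sketch_call_sync(c);

	if (status == -ECONNREFUSED) {
		sketch_force_rebind(c);
		status = sketch_call_sync(c);
	}
	return status;
}
```

The retry is deliberately limited to one attempt, so a daemon that is
really gone still yields an error rather than a loop.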

Signed-off-by: Benjamin Coddington 
Signed-off-by: Trond Myklebust 
[bwh: Backported to 3.2: not using RPC_TASK_SOFTCONN]
Signed-off-by: Ben Hutchings 
(cherry picked from commit 3aabe891f32c209a2be7cd5581d2634020e801c1)

Signed-off-by: Willy Tarreau 
---
 fs/lockd/mon.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index f956651..48de6a5 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -109,6 +109,12 @@ static int nsm_mon_unmon(struct nsm_handle *nsm, u32 proc, struct nsm_res *res)
 
msg.rpc_proc = &clnt->cl_procinfo[proc];
status = rpc_call_sync(clnt, &msg, 0);
+   if (status == -ECONNREFUSED) {
+   dprintk("lockd: NSM upcall RPC failed, status=%d, forcing rebind\n",
+   status);
+   rpc_force_rebind(clnt);
+   status = rpc_call_sync(clnt, &msg, 0);
+   }
if (status < 0)
dprintk("lockd: NSM upcall RPC failed, status=%d\n",
status);
-- 
1.7.12.2.21.g234cd45.dirty




[ 13/48] x86: Conditionally update time when ack-ing pending irqs

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Shai Fultheim 

commit 42fa4250436304d4650fa271f37671f6cee24e08 upstream.

On virtual environments, apic_read could take a long time. As a
result, under certain conditions the ack pending loop may exit
without any queued irqs left, but after more than one second. A
warning will be printed needlessly in this case.

If the loop is about to exit regardless of max_loops, don't
update it.
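
A small replay model of the loop, assuming the time budget is expressed in
cycles. `sketch_ack_loop_warns()` and its parameters are hypothetical; the
function replays a recorded sequence of passes and reports whether the
final max_loops <= 0 warning would fire, with or without the conditional
update.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * queued[i]:  whether irqs were still queued on pass i
 * elapsed[i]: cycles elapsed since the loop started, measured on pass i
 * budget:     cycles the loop is allowed to run
 * update_only_if_queued: true models the patched loop, false the old one
 * Returns true if the warning after the loop would trigger.
 */
bool sketch_ack_loop_warns(const bool *queued, const long *elapsed, size_t n,
			   long budget, bool update_only_if_queued)
{
	long max_loops = budget;
	size_t i = 0;
	bool q;

	if (n == 0)
		return false;
	do {
		q = queued[i];
		/* The patched loop skips the update when it is about to exit. */
		if (q || !update_only_if_queued)
			max_loops = budget - elapsed[i];
		i++;
	} while (q && max_loops > 0 && i < n);

	return max_loops <= 0;
}
```

A single slow pass that leaves nothing queued trips the warning in the old
form and stays silent in the patched form, while a run that really exceeds
the budget with irqs still queued warns in both.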

Signed-off-by: Shai Fultheim 
[ rebased and reworded the commit message]
Signed-off-by: Ido Yariv 
Acked-by: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-...@wizery.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings 
(cherry picked from commit c9f1417be9acae3a9867f8bdab2b7924d76cf6ac)

Signed-off-by: Willy Tarreau 
---
 arch/x86/kernel/apic/apic.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 1d2d670..be4bf4c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1250,11 +1250,13 @@ void __cpuinit setup_local_APIC(void)
   acked);
break;
}
-   if (cpu_has_tsc) {
-   rdtscll(ntsc);
-   max_loops = (cpu_khz << 10) - (ntsc - tsc);
-   } else
-   max_loops--;
+   if (queued) {
+   if (cpu_has_tsc) {
+   rdtscll(ntsc);
+   max_loops = (cpu_khz << 10) - (ntsc - tsc);
+   } else
+   max_loops--;
+   }
} while (queued && max_loops > 0);
WARN_ON(max_loops <= 0);
 
-- 
1.7.12.2.21.g234cd45.dirty




[ 12/48] x86/asm/entry/64: Remove a bogus ret_from_fork optimization

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Andy Lutomirski 

commit 956421fbb74c3a6261903f3836c0740187cf038b upstream.

'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'.  This is
entirely the wrong check.  TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.

This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.

Signed-off-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.l...@amacapital.net
[ Backported from tip:x86/asm. ]
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings 
(cherry picked from commit 159891c0953a89a28f793fc52373b031262c44d2)

Signed-off-by: Willy Tarreau 
---
 arch/x86/kernel/entry_64.S | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index d9bcee0..303eaeb8 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -413,11 +413,14 @@ ENTRY(ret_from_fork)
testl $3, CS-ARGOFFSET(%rsp)# from kernel_thread?
je   int_ret_from_sys_call
 
-   testl $_TIF_IA32, TI_flags(%rcx)# 32-bit compat task needs IRET
-   jnz  int_ret_from_sys_call
-
-   RESTORE_TOP_OF_STACK %rdi, -ARGOFFSET
-   jmp ret_from_sys_call   # go to the SYSRET fastpath
+   /*
+* By the time we get here, we have no idea whether our pt_regs,
+* ti flags, and ti status came from the 64-bit SYSCALL fast path,
+* the slow path, or one of the ia32entry paths.
+* Use int_ret_from_sys_call to return, since it can safely handle
+* all of the above.
+*/
+   jmp  int_ret_from_sys_call
 
CFI_ENDPROC
 END(ret_from_fork)
-- 
1.7.12.2.21.g234cd45.dirty




[ 30/48] ipv4: Dont use ufo handling on later transformed packets

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Steffen Klassert 

We might call ip_ufo_append_data() for packets that will be IPsec
transformed later. This function should be used just for real
udp packets. So we check for rt->dst.header_len which is only
nonzero on IPsec handling and call ip_ufo_append_data() just
if rt->dst.header_len is zero.

Signed-off-by: Steffen Klassert 
Signed-off-by: David S. Miller 
(cherry picked from commit c146066ab80267c3305de5dda6a4083f06df9265)
Signed-off-by: Willy Tarreau 
---
 net/ipv4/ip_output.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index faa6623..bd5c4b3 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -878,6 +878,7 @@ int ip_append_data(struct sock *sk,
if (((length > mtu) || (skb && skb_has_frags(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
(rt->u.dst.dev->features & NETIF_F_UFO)) {
+   (rt->u.dst.dev->features & NETIF_F_UFO) && !rt->u.dst.header_len) {
err = ip_ufo_append_data(sk, getfrag, from, length, hh_len,
 fragheaderlen, transhdrlen, mtu,
 flags);
-- 
1.7.12.2.21.g234cd45.dirty
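
For readers following the commit message, the intended decision can be
sketched as a standalone predicate. This is only an illustration of the
rule described above (use UFO for plain UDP, skip it when IPsec will
transform the packet later), not the kernel code. The demo_* names, the
struct layout and the feature bit value are assumptions made up for the
sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_IPPROTO_UDP 17         /* IANA protocol number for UDP */
#define DEMO_NETIF_F_UFO (1u << 0)  /* hypothetical bit, not the kernel's value */

/* Hypothetical stand-in for the few route fields the check looks at. */
struct demo_route {
	unsigned int dev_features;  /* offload features of the output device */
	unsigned int header_len;    /* nonzero only when IPsec handles the packet */
};

/*
 * Returns true when the UFO append path should be taken: the data does
 * not fit in one MTU (or the skb already has frags), the socket is UDP,
 * the device offers UFO, and no IPsec transformation is pending.
 */
static bool demo_use_ufo(size_t length, size_t mtu, bool skb_has_frags,
			 int protocol, const struct demo_route *rt)
{
	return (length > mtu || skb_has_frags) &&
	       protocol == DEMO_IPPROTO_UDP &&
	       (rt->dev_features & DEMO_NETIF_F_UFO) &&
	       !rt->header_len;
}
```

With header_len set to any nonzero value the predicate is false, so such
packets fall through to the regular append path.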




Re: perf.data file format specification draft

2015-05-15 Thread Stephane Eranian

On Thu, May 14, 2015 at 5:25 AM, Andi Kleen  wrote:
> Hi,
>
> Since there are more and more consumers I started a description of the
> on-disk perf.data format. This does not replace the kernel perf event
> description or the manpage, but describes the parts that perf record
> adds.
>
> So far it still has some gaps and needs review. Eventually this should
> become part of the perf documentation.
>
> Steven, would be good if you could fill in some details on how trace
> data works.
> Adrian, would be good if you could fill in the missing bits for
> auxtrace/itrace.
> Everyone else, please review and add missing information.
>
> Thanks,
> -Andi
>
> ---
>
> perf.data format
>
> This document describes the on-disk perf.data format, generated by perf record
> or perf inject and consumed by the other perf tools.
>
> On a high level perf.data contains the events generated by the PMUs, plus metadata.
>
> All fields are in native-endian of the machine that generated the perf.data.
>
> When perf is writing to a pipe it uses a special version of the file
> format that does not rely on seeking to adjust data offsets.  This
> format is not described here. The pipe version can be converted to
> normal perf.data with perf inject.
>
> The file starts with a perf_header:
>
> struct perf_header {
> char magic[8];  /* PERFILE2 */
> uint64_t size;  /* size of the header */
> uint64_t attr_size; /* size of an attribute in attrs */
> struct perf_file_section attrs;
> struct perf_file_section data;
> struct perf_file_section event_types;
> uint64_t flags;
> uint64_t flags1[3];
> };
>
> The magic number identifies the perf file and the version. Current perf versions
> use PERFILE2. Old perf versions generated a version 1 format (PERFFILE). Version 1
> is not described here. The magic number also identifies the endianness. When the
> magic value is 64-bit byte-swapped compared to the expected value, the file is in
> non-native endian.
>
> A perf_file_section contains a pointer to another section of the perf file.
> The header contains three such pointers: for attributes, data and event types.
>
> struct perf_file_section {
> uint64_t offset;  /* offset from start of file */
> uint64_t size;  /* size of the section */
> };
>
> Flags section:
>
> The header is followed by different optional headers, described by the bits set
> in flags. Only headers for which the bit is set are included. Each header
> consists of a perf_file_section located after the initial header.
> The respective perf_file_section points to the data of the additional
> header and defines its size.
>
> Some headers consist of strings, which are defined like this:
>
> struct perf_header_string {
>uint32_t len;
>char string[len]; /* zero terminated */
> };
>
> Some headers consist of a sequence of strings, which start with a
>
> struct perf_header_string_list {
>  uint32_t nr;
>  struct perf_header_string strings[nr]; /* variable length records */
> };
>
> The bits are the flags bits in a 256 bit bitmap starting with
> flags. These define the valid bits:
>
> HEADER_RESERVED = 0, /* always cleared */
> HEADER_FIRST_FEATURE = 1,
> HEADER_TRACING_DATA = 1,
>
> Describe me.
>
> HEADER_BUILD_ID = 2,
>
> The header consists of a sequence of build_id_event. The size of each record
> is defined by header.size (see perf_event.h). Each event defines an ELF build id
> for an executable file name for a pid. An ELF build id is a unique identifier
> assigned by the linker to an executable.
>
> struct build_id_event {
> struct perf_event_header header;
> pid_t pid;
> uint8_t  build_id[24];
> char filename[header.size - offsetof(struct build_id_event, filename)];
> };
>
> HEADER_HOSTNAME = 3,
>
> A perf_header_string with the hostname where the data was collected
> (uname -n)
>
> HEADER_OSRELEASE = 4,
>
> A perf_header_string with the os release where the data was collected
> (uname -r)
>
> HEADER_VERSION = 5,
>
> A perf_header_string with the perf user tool version where the
> data was collected. This is the same as the version of the source tree
> the perf tool was built from.
>
> HEADER_ARCH = 6,
>
> A perf_header_string with the CPU architecture (uname -m)
>
> HEADER_NRCPUS = 7,
>
> A structure defining the number of CPUs.
>
> struct nr_cpus {
>uint32_t nr_cpus_online;
>uint32_t nr_cpus_available; /* CPUs not yet onlined */
> };
>
> HEADER_CPUDESC = 8,
>
> A perf_header_string with description of the CPU. On x86 this is the model name
> in /proc/cpuinfo
>
> HEADER_CPUID = 9,
>
> A perf_header_string with the exact CPU type. On x86 this is
> vendor,family,model,stepping. For example: GenuineIntel,6,69,1
>
> HEADER_TOTAL_MEM = 10,
>
>
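
As an illustration of the header description quoted above, here is a
minimal sketch of how a consumer might classify the magic number and
test a feature bit. It is not taken from the perf sources. It assumes
the writer stores the magic as one native 64-bit value (which is what
lets it double as an endian marker) and that feature bit n lives in
64-bit word n/64 at position n%64 of the flags/flags1 bitmap. The
names classify_magic, swap64 and has_feature are made up for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * ASCII "PERFILE2" packed into a 64-bit integer with 'P' in the least
 * significant byte (assumption: magic is written as a native u64).
 */
#define PERF_MAGIC2 0x32454c4946524550ULL

enum magic_kind { MAGIC_UNKNOWN, MAGIC_NATIVE, MAGIC_SWAPPED };

/* Reverse the byte order of a 64-bit value. */
static uint64_t swap64(uint64_t v)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/* Classify the first 8 bytes of a perf.data file. */
static enum magic_kind classify_magic(const unsigned char bytes[8])
{
	uint64_t magic;

	memcpy(&magic, bytes, sizeof(magic));	/* native-endian load */
	if (magic == PERF_MAGIC2)
		return MAGIC_NATIVE;
	if (magic == swap64(PERF_MAGIC2))
		return MAGIC_SWAPPED;	/* written on an opposite-endian machine */
	return MAGIC_UNKNOWN;
}

/*
 * Test one bit of the 256-bit feature bitmap formed by flags followed
 * by flags1[3], passed in as four 64-bit words already in native order.
 */
static bool has_feature(const uint64_t bitmap[4], unsigned int bit)
{
	if (bit >= 256)
		return false;
	return (bitmap[bit / 64] >> (bit % 64)) & 1;
}
```

A reader that gets MAGIC_SWAPPED would have to byte swap every
multi-byte field it reads afterwards, including the bitmap words,
before calling has_feature().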

[ 48/48] sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND)

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Alexey Khoroshilov 

A deadlock can be initiated by userspace via ioctl(SNDCTL_SEQ_OUTOFBAND)
on /dev/sequencer with TMR_ECHO midi event.

In this case the control flow is:
sound_ioctl()
-> case SND_DEV_SEQ:
   case SND_DEV_SEQ2:
     sequencer_ioctl()
     -> case SNDCTL_SEQ_OUTOFBAND:
          spin_lock_irqsave(&lock,flags);
          play_event();
          -> case EV_TIMING:
               seq_timing_event()
               -> case TMR_ECHO:
                    seq_copy_to_input()
                    -> spin_lock_irqsave(&lock,flags);

It seems that spin_lock_irqsave() around play_event() is not necessary,
because the only other call location in seq_startplay() makes the call
without acquiring spinlock.

So, the patch just removes spinlocks around play_event().
By the way, it removes unreachable code in seq_timing_event(),
since (seq_mode == SEQ_2) case is handled in the beginning.

Compile tested only.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
Signed-off-by: Takashi Iwai 
(cherry picked from commit bc26d4d06e337ade069f33d3f4377593b24e6e36)
Signed-off-by: Willy Tarreau 
---
 sound/oss/sequencer.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/sound/oss/sequencer.c b/sound/oss/sequencer.c
index 5cb171d..7d32997 100644
--- a/sound/oss/sequencer.c
+++ b/sound/oss/sequencer.c
@@ -677,13 +677,8 @@ static int seq_timing_event(unsigned char *event_rec)
break;
 
case TMR_ECHO:
-   if (seq_mode == SEQ_2)
-   seq_copy_to_input(event_rec, 8);
-   else
-   {
-   parm = (parm << 8 | SEQ_ECHO);
-   seq_copy_to_input((unsigned char *) &parm, 4);
-   }
+   parm = (parm << 8 | SEQ_ECHO);
+   seq_copy_to_input((unsigned char *) &parm, 4);
break;
 
default:;
@@ -1326,7 +1321,6 @@ int sequencer_ioctl(int dev, struct file *file, unsigned int cmd, void __user *a
int mode = translate_mode(file);
struct synth_info inf;
struct seq_event_rec event_rec;
-   unsigned long flags;
int __user *p = arg;
 
orig_dev = dev = dev >> 4;
@@ -1481,9 +1475,7 @@ int sequencer_ioctl(int dev, struct file *file, unsigned int cmd, void __user *a
case SNDCTL_SEQ_OUTOFBAND:
if (copy_from_user(&event_rec, arg, sizeof(event_rec)))
return -EFAULT;
-   spin_lock_irqsave(&lock,flags);
play_event(event_rec.arr);
-   spin_unlock_irqrestore(&lock,flags);
return 0;
 
case SNDCTL_MIDI_INFO:
-- 
1.7.12.2.21.g234cd45.dirty
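
The control flow in the commit message can be reproduced in userspace
with a toy model. The sketch below is not the driver code: it replaces
the spinlock with a C11 atomic_flag and uses a trylock that returns
false at the point where a real spin_lock_irqsave() would spin forever.
All toy_* and model_* names are invented for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the driver's spinlock: not recursive, no owner tracking. */
static atomic_flag toy_seq_lock = ATOMIC_FLAG_INIT;

/* Returns false where a real spinlock acquisition would never return. */
static bool toy_trylock(void)
{
	return !atomic_flag_test_and_set(&toy_seq_lock);
}

static void toy_unlock(void)
{
	atomic_flag_clear(&toy_seq_lock);
}

/* Models seq_copy_to_input(): it takes the lock internally. */
static bool model_copy_to_input(void)
{
	if (!toy_trylock())
		return false;	/* lock already held: self-deadlock */
	/* ... the event would be queued for the reader here ... */
	toy_unlock();
	return true;
}

/* Models play_event() handling a TMR_ECHO timing event. */
static bool model_play_echo_event(void)
{
	return model_copy_to_input();
}

/* Old ioctl path: the caller holds the lock around play_event(). */
static bool model_outofband_locked(void)
{
	bool ok;

	if (!toy_trylock())
		return false;
	ok = model_play_echo_event();	/* inner lock attempt fails */
	toy_unlock();
	return ok;
}

/* Patched ioctl path: play_event() is called without the lock. */
static bool model_outofband_unlocked(void)
{
	return model_play_echo_event();
}
```

In this model model_outofband_locked() reports failure because the
inner acquisition finds the lock already taken by the same caller,
while model_outofband_unlocked() succeeds.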




[ 43/48] pagemap: do not leak physical addresses to non-privileged userspace

2015-05-15 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: "Kirill A. Shutemov" 

commit ab676b7d6fbf4b294bf198fb27ade5b0e865c7ce upstream.

As pointed by recent post[1] on exploiting DRAM physical imperfection,
/proc/PID/pagemap exposes sensitive information which can be used to do
attacks.

This disallows anybody without CAP_SYS_ADMIN to read the pagemap.

[1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

[ Eventually we might want to do anything more finegrained, but for now
  this is the simple model.   - Linus ]

Signed-off-by: Kirill A. Shutemov 
Acked-by: Konstantin Khlebnikov 
Acked-by: Andy Lutomirski 
Cc: Pavel Emelyanov 
Cc: Andrew Morton 
Cc: Mark Seaborn 
Signed-off-by: Linus Torvalds 
[mancha security: Backported to 3.10]
Signed-off-by: mancha security 
Signed-off-by: Ben Hutchings 
(cherry picked from commit 1ffc3cd9a36b504c20ce98fe5eeb5463f389e1ac)

Signed-off-by: Willy Tarreau 
---
 fs/proc/task_mmu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3b7b82a..73db5a6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -773,9 +773,19 @@ out:
return ret;
 }
 
+static int pagemap_open(struct inode *inode, struct file *file)
+{
+   /* do not disclose physical addresses to unprivileged
+  userspace (closes a rowhammer attack vector) */
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+   return 0;
+}
+
 const struct file_operations proc_pagemap_operations = {
.llseek = mem_lseek, /* borrow this */
.read   = pagemap_read,
+   .open   = pagemap_open,
 };
 #endif /* CONFIG_PROC_PAGE_MONITOR */
 
-- 
1.7.12.2.21.g234cd45.dirty
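
To make concrete what the commit message means by pagemap exposing
sensitive information, the sketch below decodes one 64-bit pagemap
entry into a physical address. The bit layout (bit 63 = page present,
bits 0-54 = page frame number when present) follows the kernel's
pagemap documentation rather than anything in this patch, and the
helper name pagemap_entry_to_phys is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PM_PRESENT  (1ULL << 63)        /* page is present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */

/*
 * Translate a pagemap entry for virtual address vaddr into a physical
 * address. Returns false (and leaves *phys untouched) when the page is
 * not present, since bits 0-54 do not hold a PFN in that case.
 */
static bool pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr,
				  uint64_t page_size, uint64_t *phys)
{
	if (!(entry & PM_PRESENT))
		return false;
	*phys = (entry & PM_PFN_MASK) * page_size + (vaddr % page_size);
	return true;
}
```

With the patch applied, an unprivileged process never gets as far as
reading such an entry, because open() on the pagemap file fails with
EPERM.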



