[RFC][PATCH] kvm: fix a race when closing irq eventfd

2013-02-17 Thread Li Zefan
While trying to fix a race when closing cgroup eventfd, I took a look
at how kvm deals with this problem, and I found it doesn't.

I may be wrong, as I don't know the kvm code, so correct me if I'm wrong.

/*
 * Race-free decouple logic (ordering is critical)
 */
static void
irqfd_shutdown(struct work_struct *work)

I don't think it's race-free!

static int
irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
...
 * We cannot race against the irqfd going away since the
 * other side is required to acquire wqh->lock, which we hold
 */
if (irqfd_is_active(irqfd))
irqfd_deactivate(irqfd);
}

In kvm_irqfd_deassign() and kvm_irqfd_release() where irqfds are freed,
wqh->lock is not acquired!

So here is the race:

CPU0                                    CPU1
----                                    ----
kvm_irqfd_release()
  spin_lock(kvm->irqfds.lock);
  ...
  irqfd_deactivate(irqfd);
    list_del_init(&irqfd->list);
  spin_unlock(kvm->irqfds.lock);
  ...
                                        close(eventfd)
                                          irqfd_wakeup();
  irqfd_shutdown();
    remove_waitqueue(irqfd->wait);
    kfree(irqfd);
                                            spin_lock(kvm->irqfds.lock);
                                            if (!list_empty(&irqfd->list))
                                              irqfd_deactivate(irqfd);
                                                list_del_init(&irqfd->list);
                                            spin_unlock(kvm->irqfds.lock);

Look, we're accessing irqfd even though it has already been freed!

Here's a fix I have. Please enlighten me with a better fix.

Signed-off-by: Li Zefan 
---
 fs/eventfd.c            | 34 ++
 include/linux/eventfd.h | 17 +
 virt/kvm/eventfd.c      | 45 +++--
 3 files changed, 66 insertions(+), 30 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 35470d9..cda8a4c 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -22,21 +22,6 @@
 #include 
 #include 
 
-struct eventfd_ctx {
-   struct kref kref;
-   wait_queue_head_t wqh;
-   /*
-* Every time that a write(2) is performed on an eventfd, the
-* value of the __u64 being written is added to "count" and a
-* wakeup is performed on "wqh". A read(2) will return the "count"
-* value to userspace, and will reset "count" to zero. The kernel
-* side eventfd_signal() also, adds to the "count" counter and
-* issue a wakeup.
-*/
-   __u64 count;
-   unsigned int flags;
-};
-
 /**
  * eventfd_signal - Adds @n to the eventfd counter.
  * @ctx: [in] Pointer to the eventfd context.
@@ -153,20 +138,29 @@ static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
  * This is used to atomically remove a wait queue entry from the eventfd wait
  * queue head, and read/reset the counter value.
  */
-int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait,
+int eventfd_ctx_remove_wait_queue_locked(struct eventfd_ctx *ctx, wait_queue_t *wait,
 				  __u64 *cnt)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(&ctx->wqh.lock, flags);
 	eventfd_ctx_do_read(ctx, cnt);
 	__remove_wait_queue(&ctx->wqh, wait);
 	if (*cnt != 0 && waitqueue_active(&ctx->wqh))
 		wake_up_locked_poll(&ctx->wqh, POLLOUT);
-	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
 	return *cnt != 0 ? 0 : -EAGAIN;
 }
+EXPORT_SYMBOL_GPL(eventfd_ctx_remove_wait_queue_locked);
+
+int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait,
+ __u64 *cnt)
+{
+   unsigned long flags;
+   int ret;
+
+   spin_lock_irqsave(&ctx->wqh.lock, flags);
+   ret = eventfd_ctx_remove_wait_queue_locked(ctx, wait, cnt);
+   spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+   return ret;
+}
 EXPORT_SYMBOL_GPL(eventfd_ctx_remove_wait_queue);
 
 /**
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index 3c3ef19..61085ac 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -26,6 +26,21 @@
 #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
 #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
 
+struct eventfd_ctx {
+   struct kref kref;
+   wait_queue_head_t wqh;
+   /*
+* Every time that a write(2) is performed on an eventfd, the
+* value of the __u64 being written is added to "count"

Re: [RFC][PATCH] kvm: fix a race when closing irq eventfd

2013-02-17 Thread Li Zefan
On 2013/2/18 12:02, Alex Williamson wrote:
> On Mon, 2013-02-18 at 11:13 +0800, Li Zefan wrote:
>> While trying to fix a race when closing cgroup eventfd, I took a look
>> at how kvm deals with this problem, and I found it doesn't.
>>
>> I may be wrong, as I don't know the kvm code, so correct me if I'm wrong.
>>
>>  /*
>>   * Race-free decouple logic (ordering is critical)
>>   */
>>  static void
>>  irqfd_shutdown(struct work_struct *work)
>>
>> I don't think it's race-free!
>>
>>  static int
>>  irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>>  {
>>  ...
>>   * We cannot race against the irqfd going away since the
>>   * other side is required to acquire wqh->lock, which we hold
>>   */
>>  if (irqfd_is_active(irqfd))
>>  irqfd_deactivate(irqfd);
>>  }
>>
>> In kvm_irqfd_deassign() and kvm_irqfd_release() where irqfds are freed,
>> wqh->lock is not acquired!
>>
>> So here is the race:
>>
>> CPU0                                CPU1
>> ----                                ----
>> kvm_irqfd_release()
>>   spin_lock(kvm->irqfds.lock);
>>   ...
>>   irqfd_deactivate(irqfd);
>> list_del_init(&irqfd->list);
>>   spin_unlock(kvm->irqfd.lock);
>>   ...
>>  close(eventfd)
>>irqfd_wakeup();
> 
> irqfd_wakeup is assumed to be called with wqh->lock held
> 

I'm aware of this.

As I said, kvm_irqfd_deassign() and kvm_irqfd_release() are not acquiring
wqh->lock.

>> irqfd_shutdown();
> 
> eventfd_ctx_remove_wait_queue has to acquire wqh->lock to complete or
> else irqfd_shutdown never makes it to the kfree.  So in your scenario
> this cpu0 spins here until cpu1 completes.
> 
>>   remove_waitqueue(irqfd->wait);
>>   kfree(irqfd);
>>  spin_lock(kvm->irqfd.lock);
>>if (!list_empty(&irqfd->list))
> 
> We don't take this branch because we already did list_del_init above,
> which makes irqfd->list empty.
> 

It doesn't matter if the list is empty or not.

The point is, irqfd has been kfreed, so the if statement is simply not safe!

>>  irqfd_deactivate(irqfd);
>>list_del_init(&irqfd->list);
>>  spin_unlock(kvm->irqfd.lock);
>>
>> Look, we're accessing irqfd even though it has already been freed!
> 
> Unless the irqfd_wakeup path isn't acquiring wqh->lock, it looks
> race-free to me.  Thanks,
> 
> Alex
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] kvm: fix a race when closing irq eventfd

2013-02-17 Thread Li Zefan
On 2013/2/18 12:09, Li Zefan wrote:
> On 2013/2/18 12:02, Alex Williamson wrote:
>> On Mon, 2013-02-18 at 11:13 +0800, Li Zefan wrote:
>>> While trying to fix a race when closing cgroup eventfd, I took a look
>>> at how kvm deals with this problem, and I found it doesn't.
>>>
>>> I may be wrong, as I don't know the kvm code, so correct me if I'm wrong.
>>>
>>> /*
>>>  * Race-free decouple logic (ordering is critical)
>>>  */
>>> static void
>>> irqfd_shutdown(struct work_struct *work)
>>>
>>> I don't think it's race-free!
>>>
>>> static int
>>> irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>>> {
>>> ...
>>>  * We cannot race against the irqfd going away since the
>>>  * other side is required to acquire wqh->lock, which we hold
>>>  */
>>> if (irqfd_is_active(irqfd))
>>> irqfd_deactivate(irqfd);
>>> }
>>>
>>> In kvm_irqfd_deassign() and kvm_irqfd_release() where irqfds are freed,
>>> wqh->lock is not acquired!
>>>
>>> So here is the race:
>>>
>>> CPU0                                CPU1
>>> ----                                ----
>>> kvm_irqfd_release()
>>>   spin_lock(kvm->irqfds.lock);
>>>   ...
>>>   irqfd_deactivate(irqfd);
>>> list_del_init(&irqfd->list);
>>>   spin_unlock(kvm->irqfd.lock);
>>>   ...
>>> close(eventfd)
>>>   irqfd_wakeup();
>>
>> irqfd_wakeup is assumed to be called with wqh->lock held
>>
> 
> I'm aware of this.
> 
> As I said, kvm_irqfd_deassign() and kvm_irqfd_release() are not acquiring
> wqh->lock.
> 
>>> irqfd_shutdown();
>>
>> eventfd_ctx_remove_wait_queue has to acquire wqh->lock to complete or
>> else irqfd_shutdown never makes it to the kfree.  So in your scenario
>> this cpu0 spins here until cpu1 completes.
>>

Oh you're right, this is not obvious. Thanks for the explanation.

Now I'll go to see how to fix cgroup.

>>>   remove_waitqueue(irqfd->wait);
>>>   kfree(irqfd);
>>> spin_lock(kvm->irqfd.lock);
>>>   if (!list_empty(&irqfd->list))
>>
>> We don't take this branch because we already did list_del_init above,
>> which makes irqfd->list empty.
>>
> 
> It doesn't matter if the list is empty or not.
> 
> The point is, irqfd has been kfreed, so the if statement is simply not safe!
> 
>>> irqfd_deactivate(irqfd);
>>>   list_del_init(&irqfd->list);
>>> spin_unlock(kvm->irqfd.lock);
>>>
>>> Look, we're accessing irqfd even though it has already been freed!
>>
>> Unless the irqfd_wakeup path isn't acquiring wqh->lock, it looks
>> race-free to me.  Thanks,
>>
>> Alex
>>
>> .
>>
> 



Re: [PATCH -tip -v11 11/11] tracing: Add kprobes event profiling interface

2009-07-09 Thread Li Zefan
> +Event Profiling
> +---
> + You can check the total number of probe hits and probe miss-hits via
> +/sys/kernel/debug/tracing/kprobe_profile.
> + The fist column is event name, the second is the number of probe hits,

s/fist/first

> +the third is the number of probe miss-hits.
> +
> +
...
> +/* Probes profiling interfaces */
> +static int profile_seq_show(struct seq_file *m, void *v)
> +{
> + struct trace_probe *tp = v;
> +
> + if (tp == NULL)
> + return 0;
> +

tp will never be NULL, which is guaranteed by seq_file

> + seq_printf(m, "%s", tp->call.name);
> +
> + seq_printf(m, "\t%8lu %8lu\n", tp->nhits,
> +probe_is_return(tp) ? tp->rp.kp.nmissed : tp->kp.nmissed);
> +
> + return 0;
> +}


Re: [PATCH -tip -v11 08/11] tracing: add kprobe-based event tracer

2009-07-10 Thread Li Zefan
> +static __kprobes unsigned long fetch_memory(struct pt_regs *regs, void *addr)
> +{
> + unsigned long retval;

need a blank line after the local variable declarations.

> + if (probe_kernel_address(addr, retval))
> + return 0;
> + return retval;
> +}
> +
> +static __kprobes unsigned long fetch_argument(struct pt_regs *regs, void 
> *num)
> +{
> + return regs_get_argument_nth(regs, (unsigned int)((unsigned long)num));
> +}
> +
> +static __kprobes unsigned long fetch_retvalue(struct pt_regs *regs,
> +   void *dummy)
> +{
> + return regs_return_value(regs);
> +}
> +
> +static __kprobes unsigned long fetch_ip(struct pt_regs *regs, void *dummy)
> +{
> + return instruction_pointer(regs);
> +}
> +
> +/* Memory fetching by symbol */
> +struct symbol_cache {
> + char *symbol;
> + long offset;
> + unsigned long addr;
> +};
> +
> +static unsigned long update_symbol_cache(struct symbol_cache *sc)
> +{
> + sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
> + if (sc->addr)
> + sc->addr += sc->offset;
> + return sc->addr;
> +}
> +
> +static void free_symbol_cache(struct symbol_cache *sc)
> +{
> + kfree(sc->symbol);
> + kfree(sc);
> +}
> +
> +static struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
> +{
> + struct symbol_cache *sc;

ditto.

and in some other places

> + if (!sym || strlen(sym) == 0)
> + return NULL;
> + sc = kzalloc(sizeof(struct symbol_cache), GFP_KERNEL);
> + if (!sc)
> + return NULL;
> +
> + sc->symbol = kstrdup(sym, GFP_KERNEL);
> + if (!sc->symbol) {
> + kfree(sc);
> + return NULL;
> + }
> + sc->offset = offset;
> +
> + update_symbol_cache(sc);
> + return sc;
> +}

...

> +static int probes_seq_show(struct seq_file *m, void *v)
> +{
> + struct trace_probe *tp = v;
> + int i, ret;
> + char buf[MAX_ARGSTR_LEN + 1];
> +
> + if (tp == NULL)
> + return 0;
> +

redundant check. tp won't be NULL.

> + seq_printf(m, "%c", probe_is_return(tp) ? 'r' : 'p');
> + if (tp->call.name)
> + seq_printf(m, ":%s", tp->call.name);
> +
> + if (tp->symbol)
> + seq_printf(m, " %s%+ld", probe_symbol(tp), probe_offset(tp));
> + else
> + seq_printf(m, " 0x%p", probe_address(tp));
> +
> + for (i = 0; i < tp->nr_args; i++) {
> + ret = trace_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
> + if (ret) {
> + pr_warning("Argument%d is too long.\n", i);
> + break;
> + }
> + seq_printf(m, " %s", buf);
> + }
> + seq_printf(m, "\n");
> + return 0;
> +}

...

> +static ssize_t probes_write(struct file *file, const char __user *buffer,
> + size_t count, loff_t *ppos)
> +{
> + char *kbuf, *tmp;
> + int ret;
> + size_t done;
> + size_t size;
> +
> + if (!count || count < 0)
> + return 0;

count is unsigned, so it can't be < 0. Also I don't think you
need to treat (count == 0) specially.

> +
> + kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);

should be kmalloc(WRITE_BUFSIZE+1), or...

> + if (!kbuf)
> + return -ENOMEM;
> +
> + ret = done = 0;
> + do {
> + size = count - done;
> + if (size > WRITE_BUFSIZE)
> + size = WRITE_BUFSIZE;

if (size >= WRITE_BUFSIZE)
size = WRITE_BUFSIZE - 1;

> + if (copy_from_user(kbuf, buffer + done, size)) {
> + ret = -EFAULT;
> + goto out;
> + }
> + kbuf[size] = '\0';
> + tmp = strchr(kbuf, '\n');
> + if (!tmp) {
> + pr_warning("Line length is too long: "
> +"Should be less than %d.", WRITE_BUFSIZE);
> + ret = -EINVAL;
> + goto out;
> + }
> + *tmp = '\0';
> + size = tmp - kbuf + 1;
> + done += size;
> + /* Remove comments */
> + tmp = strchr(kbuf, '#');
> + if (tmp)
> + *tmp = '\0';
> +
> + ret = command_trace_probe(kbuf);
> + if (ret)
> + goto out;
> +
> + } while (done < count);
> + ret = done;
> +out:
> + kfree(kbuf);
> + return ret;
> +}

...

> +enum print_line_t
> +print_kretprobe_event(struct trace_iterator *iter, int flags)
> +{
> + struct kretprobe_trace_entry *field;
> + struct trace_seq *s = &iter->seq;
> + int i;
> +
> + trace_assign_type(field, iter->ent);
> +
> + if (!seq_print_ip_sym(s, field->ret_ip, flags | TRACE_ITER_SYM_OFFSET))
> + goto partial;
> +

can't we use %pF?

> + if (!trace_seq_puts(s, " <- "))
> + goto partial;
> +
> + if (!seq_print_ip_sym(s, field

Re: [PATCH 2/3] cgroups: Add an API to attach a task to current task's cgroup

2010-05-30 Thread Li Zefan
04:24, Tejun Heo wrote:
> From: Sridhar Samudrala 
> 
> Add a new kernel API to attach a task to current task's cgroup
> in all the active hierarchies.
> 
> Signed-off-by: Sridhar Samudrala 

Acked-by: Li Zefan 

btw: you lost the reviewed-by tag given by Paul Menage.



Re: [PATCH 3/3] vhost: apply cpumask and cgroup to vhost pollers

2010-05-30 Thread Li Zefan
Tejun Heo wrote:
> Apply the cpumask and cgroup of the initializing task to the created
> vhost poller.
> 
> Based on Sridhar Samudrala's patch.
> 
> Cc: Michael S. Tsirkin 
> Cc: Sridhar Samudrala 
> ---
>  drivers/vhost/vhost.c |   36 +++-
>  1 file changed, 31 insertions(+), 5 deletions(-)
> 
> Index: work/drivers/vhost/vhost.c
> ===
> --- work.orig/drivers/vhost/vhost.c
> +++ work/drivers/vhost/vhost.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
>  #include 
> @@ -176,12 +177,30 @@ repeat:
>  long vhost_dev_init(struct vhost_dev *dev,
>   struct vhost_virtqueue *vqs, int nvqs)
>  {
> - struct task_struct *poller;
> - int i;
> + struct task_struct *poller = NULL;
> + cpumask_var_t mask;
> + int i, ret = -ENOMEM;
> +
> + if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> + goto out;
> 

If we "goto out", we will end up calling kthread_stop(poller), but
kthread_stop() requires a non-NULL task_struct pointer.

>   poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
> - if (IS_ERR(poller))
> - return PTR_ERR(poller);
> + if (IS_ERR(poller)) {
> + ret = PTR_ERR(poller);
> + goto out;
> + }
> +
> + ret = sched_getaffinity(current->pid, mask);
> + if (ret)
> + goto out;
> +
> + ret = sched_setaffinity(poller->pid, mask);
> + if (ret)
> + goto out;
> +
> + ret = cgroup_attach_task_current_cg(poller);
> + if (ret)
> + goto out;
> 
>   dev->vqs = vqs;
>   dev->nvqs = nvqs;
> @@ -202,7 +221,14 @@ long vhost_dev_init(struct vhost_dev *de
>   vhost_poll_init(&dev->vqs[i].poll,
>   dev->vqs[i].handle_kick, POLLIN, dev);
>   }
> - return 0;
> +
> + wake_up_process(poller);/* avoid contributing to loadavg */
> + ret = 0;
> +out:
> + if (ret)
> + kthread_stop(poller);
> + free_cpumask_var(mask);
> + return ret;
>  }
> 
>  /* Caller should have device mutex */


Re: [PATCH UPDATED 3/3] vhost: apply cpumask and cgroup to vhost pollers

2010-05-31 Thread Li Zefan
>   poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
> - if (IS_ERR(poller))
> - return PTR_ERR(poller);
> + if (IS_ERR(poller)) {
> + ret = PTR_ERR(poller);
> + goto out;

here...

> + }
> +
> + ret = sched_getaffinity(current->pid, mask);
> + if (ret)
> + goto out;
> +
> + ret = sched_setaffinity(poller->pid, mask);
> + if (ret)
> + goto out;
> +
> + ret = cgroup_attach_task_current_cg(poller);
> + if (ret)
> + goto out;
> 
>   dev->vqs = vqs;
>   dev->nvqs = nvqs;
> @@ -202,7 +221,14 @@ long vhost_dev_init(struct vhost_dev *de
>   vhost_poll_init(&dev->vqs[i].poll,
>   dev->vqs[i].handle_kick, POLLIN, dev);
>   }
> - return 0;
> +
> + wake_up_process(poller);/* avoid contributing to loadavg */
> + ret = 0;
> +out:
> + if (ret && poller)

It's still buggy: after the goto on kthread_create() failure, poller is
an ERR_PTR rather than NULL, so this check still lets kthread_stop()
run on an invalid pointer.

> + kthread_stop(poller);
> + free_cpumask_var(mask);
> + return ret;
>  }
> 
>  /* Caller should have device mutex */
> 
> 


Re: [PATCH 3/3] perf events: Change perf parameter --pid to process-wide collection instead of thread-wide

2010-03-25 Thread Li Zefan
Zhang, Yanmin wrote:
> From: Zhang, Yanmin 
> 
> Parameter --pid (or -p) of perf currently means a thread-wide collection.
> For example, if a process whose id is  has 10 threads, 'perf top -p '
> just collects the main thread statistics. That's misleading. Users are
> used to attaching to a whole process when debugging a process by gdb. To follow
> normal usage style, the patch changes --pid to a process-wide collection and
> adds --tid (-t) to mean a thread-wide collection.
> 
> Usage example is:
> #perf top -p 
> #perf record -p  -f sleep 10
> #perf stat -p  -f sleep 10
> Above commands collect the statistics of all threads of process .
> 
> Signed-off-by: Zhang Yanmin 
> 

Seems this patch causes seg faults:

# ./perf sched record
Segmentation fault
# ./perf kmem record
Segmentation fault
# ./perf timechart record
Segmentation fault





Re: [PATCH 3/3] perf events: Change perf parameter --pid to process-wide collection instead of thread-wide

2010-03-25 Thread Li Zefan
>>> Parameter --pid (or -p) of perf currently means a thread-wide collection.
>>> For example, if a process whose id is  has 10 threads, 'perf top -p 
>>> '
>>> just collects the main thread statistics. That's misleading. Users are
>>> used to attach a whole process when debugging a process by gdb. To follow
>>> normal usage style, the patch change --pid to process-wide collection and
>>> add --tid (-t) to mean a thread-wide collection.
>>>
>>> Usage example is:
>>> #perf top -p 
>>> #perf record -p  -f sleep 10
>>> #perf stat -p  -f sleep 10
>>> Above commands collect the statistics of all threads of process .
>>>
>>> Signed-off-by: Zhang Yanmin 
>>>
>> Seems this patch causes seg faults:
>>
>> # ./perf sched record
>> Segmentation fault
>> # ./perf kmem record
>> Segmentation fault
>> # ./perf timechart record
>> Segmentation fault
> 
> Thanks for reporting it. Arnaldo, could you pick up below patch?
> Zefan, Could you try it?
> 

The fix works. Thanks!

> mmap_array[][][] is not reset to 0 after malloc. Below patch against
> tip/master of March 24th fixes it with a zalloc.
> 
> Reported-by: Li Zefan 
> Signed-off-by: Zhang Yanmin 
> 


Re: [PATCH 1/3] cgroup - removing superfluous rcu_read_lock_held check

2010-11-02 Thread Li Zefan
On 2010-11-02 03:15, Jiri Olsa wrote:
> hi,

This..

> the rcu_dereference_check is defined as
> 
>   #define rcu_dereference_check(p, c) \
>  __rcu_dereference_check((p), rcu_read_lock_held() || (c), __rcu)
> 
> so the caller does not need to specify rcu_read_lock_held() condition.
>
 
> wbr,
> jirka

and this should be excluded from the changelog.

> 
> 
> Signed-off-by: Jiri Olsa 

Reviewed-by: Li Zefan 

However a nitpick:

> ---
>  include/linux/cgroup.h |1 -
>  kernel/cgroup.c|6 ++
>  2 files changed, 2 insertions(+), 5 deletions(-)
...
> @@ -4544,7 +4542,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
>* it's unchanged until freed.
>*/
>   cssid = rcu_dereference_check(css->id,
> - rcu_read_lock_held() || atomic_read(&css->refcnt));
> + atomic_read(&css->refcnt));

Now the 2 lines can be made into one line and still fit into 80 chars.

>  
>   if (cssid)
>   return cssid->id;
> @@ -4557,7 +4555,7 @@ unsigned short css_depth(struct cgroup_subsys_state 
> *css)
>   struct css_id *cssid;
>  
>   cssid = rcu_dereference_check(css->id,
> - rcu_read_lock_held() || atomic_read(&css->refcnt));
> + atomic_read(&css->refcnt));

ditto

>  
>   if (cssid)
>   return cssid->depth;


Re: [PATCH] cgroups: fix API thinko

2010-08-17 Thread Li Zefan
(Just came back from vacation)

Michael S. Tsirkin wrote:
> The cgroup_attach_task_current_cg API that we have upstream is backwards: we
> really need an API to attach to the cgroups from another process A to
> the current one.
> 
> In our case (vhost), a privileged user wants to attach its task to the cgroups
> of a less privileged one; the API makes us run it in the other
> task's context, and this fails.
> 
> So let's make the API generic and just pass in 'from' and 'to' tasks.
> Add an inline wrapper for cgroup_attach_task_current_cg to avoid
> breaking bisect.
> 
> Signed-off-by: Michael S. Tsirkin 

Acked-by: Li Zefan 

I also don't like the name, but I'm not good at English or naming. ;)

> ---
> 
> Paul, Li, Sridhar, could you please review the following
> patch?
> 
> I only compile-tested it due to travel, but looks
> straight-forward to me.
> Alex Williamson volunteered to test and report the results.
> Sending out now for review as I might be offline for a bit.
> Will only try to merge when done, obviously.
> 
> If OK, I would like to merge this through -net tree,
> together with the patch fixing vhost-net.
> Let me know if that sounds ok.
> 

That's Ok.

...
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 43b2072..b38ec60 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -525,7 +525,11 @@ struct task_struct *cgroup_iter_next(struct cgroup *cgrp,
>  void cgroup_iter_end(struct cgroup *cgrp, struct cgroup_iter *it);
>  int cgroup_scan_tasks(struct cgroup_scanner *scan);
>  int cgroup_attach_task(struct cgroup *, struct task_struct *);
> -int cgroup_attach_task_current_cg(struct task_struct *);
> +int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);

a nitpick:

better add a blank line here.

> +static inline int cgroup_attach_task_current_cg(struct task_struct *tsk)
> +{
> + return cgroup_attach_task_all(current, tsk);
> +}


Re: ftrace/perf_event leak

2010-09-01 Thread Li Zefan
Avi Kivity wrote:
>  On 09/01/2010 12:04 PM, Peter Zijlstra wrote:
>>
>> Does something like the below cure that?
>>
> 
> Unfortunately not.
> 

Then try this:

The bug should be caused by commit 1c024eca51fdc965290acf342ae16a476c2189d0.

---
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 8a2b73f..9d1d1f2 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -161,6 +161,7 @@ void perf_trace_destroy(struct perf_event *p_event)
}
}
 out:
+   module_put(tp_event->mod);
mutex_unlock(&event_mutex);
 }
 


Re: ftrace/perf_event leak

2010-09-01 Thread Li Zefan
>> Subject: perf, trace: Fix module leak
>> From: Li Zefan 
>> Date: Wed Sep 01 12:58:43 CEST 2010
>>
>> Commit 1c024eca (perf, trace: Optimize tracepoints by using
>> per-tracepoint-per-cpu hlist to track events) caused a module refcount
>> leak.
>>
>> Tested-by: Avi Kivity 
>> Signed-off-by: Peter Zijlstra 
>> LKML-Reference: <4c7e1f12.8030...@cn.fujitsu.com>
>> ---
>>  kernel/trace/trace_event_perf.c |3 +++
>>  1 file changed, 3 insertions(+)
>>
>> Index: linux-2.6/kernel/trace/trace_event_perf.c
>> ===
>> --- linux-2.6.orig/kernel/trace/trace_event_perf.c
>> +++ linux-2.6/kernel/trace/trace_event_perf.c
>> @@ -91,6 +91,8 @@ int perf_trace_init(struct perf_event *p
>>  tp_event->class && tp_event->class->reg &&
>>  try_module_get(tp_event->mod)) {
>>  ret = perf_trace_event_init(tp_event, p_event);
>> +if (ret)
>> +module_put(tp_event->mod);
>>  break;
>>  }
>>  }
>> @@ -147,6 +149,7 @@ void perf_trace_destroy(struct perf_even
>>  }
>>  }
>>  out:
>> +module_put(tp_event->mod);
>>  mutex_unlock(&event_mutex);
>>  }
>>  
>>
> 
> Thanks for fixing this.
> 
> However, can we split this in two patches to ease the backport?
> 
> The lack of a module_put() after perf_trace_init() failure is there for a 
> while
> (the backport needs to start in 2.6.32).

The failure should be a rare case, I don't think this has to be backported?

> 
> But the lack of a module_put in the destroy path needs a .35 backport only.
> 
