Re: [PATCH] #2 UTRACE_STOP race condition nesting

2009-03-06 Thread Renzo Davoli
Dear Roland, dear utrace developers,

I have update also the second patch (which includes the first).
This patch fixes the utrace_stop race condition and 
implements a consistent model of tracing engine nesting.

renzo
On Sat, Feb 14, 2009 at 10:11:55AM +0100, Renzo Davoli wrote:
  
 This is an updated patch. It solves the race condition + it gives a quick (a 
 bit dirty)
 solution to issues 34.
   3- Nesting, is it really useful to run all the reports in a row and
   (eventually) stop and the end waiting for all the engines?
 The patch waits for each engine to resume before notifying the next 
 registered engine.
   4- report_syscall_entry engines evaluation order should be reversed
 REPORT macros have an extra reverse argument. The macros append this string 
 to the
 list_for_each_entry_safe function name. All the macro calls skip this 
 argument except
 the one in report_syscall_entry where it is set to _reverse.
 
 With this patch it is possible to run nested kmview machines and ptrace works 
 inside
 the virtual machines.
 
 This patch is a bit dirty because variables and sections of code needed to 
 count and test
 the stopped engines are useless here: a task can be kept stopped for at most 
 one engine at
 a time.
 
 This patch is a proof-of concept to show what I meant in my previous message.
 
 For what concerns 12 (not included in this patch):
   1- Virtual Machines may need to change the system call
 THis is just to simplify the implementation of arch. independent virtual 
 machine.
 I have kept the definition of missing functions in the kmview module code.
   2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
   report_syscall_entry?
 It is useless for kmview as the decision of aborting the system call is taken 
 while
 the process is stopped, I am currently setting the syscall number to -1 to 
 skip the syscall.
 
 For the sake of completeness there is another way to implement the partial 
 virtual machine
 stuff by introducing another quiescence state inside the report upcalls.
 I mean: when utrace calls a report function (say for example 
 report_syscall_entry), the function
 in the module puts the process in a stopped state (maybe its TASK_TRACED and 
 calls the schedule).
 From utrace's point of view the report function does not return until all 
 the changes in
 the task state have been completed and the decision 
 UTRACE_RESUME/UTRACE_SYSCALL_ABORT has been taken.
 In this way UTRACE_STOP is never used because the module has to implement 
 another feature
 similar to UTRACE_STOP on its own. So what is UTRACE_STOP for?
 
 ciao
   renzo

---
--- kernel/utrace.c.mcgrath 2009-03-05 15:09:57.0 +0100
+++ kernel/utrace.c 2009-03-06 11:49:15.0 +0100
@@ -369,6 +369,13 @@
return killed;
 }
 
+static void mark_engine_wants_stop(struct utrace_engine *engine);
+static void clear_engine_wants_stop(struct utrace_engine *engine);
+static bool engine_wants_stop(struct utrace_engine *engine);
+static void mark_engine_wants_resume(struct utrace_engine *engine);
+static void clear_engine_wants_resume(struct utrace_engine *engine);
+static bool engine_wants_resume(struct utrace_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current-utrace, which is not locked.
@@ -378,6 +385,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
bool killed;
+   struct utrace_engine *engine, *next;
 
/*
 * @utrace-stopped is the flag that says we are safely
@@ -399,7 +407,23 @@
return true;
}
 
-   utrace-stopped = 1;
+   /* final check: is really needed to stop? */
+   list_for_each_entry_safe(engine, next, utrace-attached, entry) {
+   if ((engine-ops != utrace_detached_ops)  
engine_wants_stop(engine)) {
+   if (engine_wants_resume(engine)) {
+   clear_engine_wants_stop(engine);
+   clear_engine_wants_resume(engine);
+   }
+   else
+   utrace-stopped = 1;
+   }
+   }
+   if (unlikely(!utrace-stopped)) {
+   spin_unlock_irq(task-sighand-siglock);
+   spin_unlock(utrace-lock);
+   return false;
+   }
+
__set_current_state(TASK_TRACED);
 
/*
@@ -625,6 +649,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP(1UL  _UTRACE_NEVENTS)
+#define ENGINE_RESUME  (1UL  (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_engine *engine)
 {
@@ -641,6 +666,21 @@
return (engine-flags  ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_engine *engine)
+{
+   engine-flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct 

[BUG] utrace_attach_task() never returns when called from the report_clone callback

2009-03-06 Thread Ananth N Mavinakayanahalli
Roland,

With the current utrace/master tree, I am seeing that utrace_attach_task()
never returns when invoked from the clone callback. The same module
works fine with prior utrace (rcu as well as with my embed version).

The testcase is simple:
a. attach an engine to attachstop-mt (from the gdb testsuite) _before_ it
   calls pthread_create.
b. Watch for CLONE_THREAD and try to attach a utrace engine to the new
   thread. The utrace_attach_task() call never returns.

If the utrace module is unloaded, the kernel barfs with the following
innocuous information:

BUG: unable to handle kernel paging request at fdff
IP: [a012009a] 0xa012009a
PGD 203067 PUD 204067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/pci:01/:01:01.1/irq
CPU 6 
Modules linked in: big list
[last unloaded: utrace_quiesce_threads]
Pid: 6203, comm: attachstop-mt Not tainted 2.6.29-rc7-ut #1 eserver
xSeries 366-[88632RA]-
RIP: 0010:[a012009a]  [a012009a] 0xa012009a
RSP: 0018:8801d34ebe10  EFLAGS: 00010246
RAX: fdff RBX: 8801f11a36c0 RCX: c100
RDX:  RSI: 8801dd0507f8 RDI: 88022daf4500
RBP: fff4 R08: 8801d34ea000 R09: 88022f2596a0
R10: 8800280b1600 R11: 0018 R12: 8801d34f1860
R13: 8802210dd300 R14: 8801dd07e2c0 R15: 003d0f00
FS:  7f58c8d286e0() GS:88022f18e5c0()
knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: fdff CR3: 0002029bd000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process attachstop-mt (pid: 6203, threadinfo 8801d34ea000, task
8801d3512440)
Stack:
 003d0f00 8801d34f1860 8802210dd300 8801d3512440
 8801d34ebe70 a012028d 8801dd050618 8801d35129e0
 8801d35129d8 80260480  8801d34f1860
Call Trace:
 [80260480] ? utrace_report_clone+0x95/0xfc
 [80239120] ? do_fork+0x20b/0x2f3
 [804a4035] ? do_page_fault+0x3c7/0x74e
 [8020c243] ? stub_clone+0x13/0x20
 [8020bedb] ? system_call_fastpath+0x16/0x1b
Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
RIP  [a012009a] 0xa012009a
 RSP 8801d34ebe10
CR2: fdff
---[ end trace 96bb7eb644ab73a4 ]---

I have verified that the earlier version of utrace works just fine.

In the earlier case, the engine would go directly on to the attached
list if the calling task was the creator of the new thread. This has
changed with the new implementation.

I haven't yet zeroed in on what exact change caused this problem.

Ananth



Re: [PATCH] Embed struct utrace in task_struct - V2

2009-03-06 Thread Frank Ch. Eigler
Hi -

On Tue, Mar 03, 2009 at 03:14:01PM -0800, Roland McGrath wrote:
   * When we on the team think the utrace patch is ready to post, we need to
 do a coordinated post of Frank's ftrace widget.  [...]
  
  Would you consider simply merging it into your git tree / patch suite?
 
 Sure.  The way to do that is for you to publish a git repository that I can
 pull from.  [...]

OK:


The following changes since commit 0ef2243aeae481f1c0f1edd23a8259bd20331b00:
  Roland McGrath (1):
Merge remote branch 'upstream/HEAD' of /home/roland/redhat/linux/2.6/ 
into utrace

are available in the git repository at:

  http://web.elastic.org/~fche/git/linux-2.6-utrace.git utrace-ftrace

Frank Ch. Eigler (1):
  utrace-based ftrace process engine, v2

 include/linux/processtrace.h |   41 +++
 kernel/trace/Kconfig |9 +
 kernel/trace/Makefile|1 +
 kernel/trace/trace.h |   30 ++-
 kernel/trace/trace_process.c |  591 ++
 5 files changed, 661 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/processtrace.h
 create mode 100644 kernel/trace/trace_process.c


- FChE



Re: [PATCH] Embed struct utrace in task_struct - V2

2009-03-06 Thread Roland McGrath
   http://web.elastic.org/~fche/git/linux-2.6-utrace.git utrace-ftrace
 
 Frank Ch. Eigler (1):
   utrace-based ftrace process engine, v2

Thanks, Frank.  Your branch is now in my repo and its patch generated in
2.6-current/.  I'll pull periodically, or let me know if my repo lags
behind yours in future.


Thanks,
Roland



Re: [BUG] utrace_attach_task() never returns when called from the report_clone callback

2009-03-06 Thread Ananth N Mavinakayanahalli
On Fri, Mar 06, 2009 at 12:52:34PM -0800, Roland McGrath wrote:
  With the current utrace/master tree, I am seeing that utrace_attach_task()
  never returns when invoked from the clone callback. The same module
  works fine with prior utrace (rcu as well as with my embed version).
 
 I changed the utrace_attach_delay() logic recently.  That is probably it.

Right, reverting dd30e86355e fixes the problem.

 Please post your test case.

Here it is -- does nothing much really :) I used this module in
conjunction with attachstop_mt with an engine attaching to it before the
pthread_create().

---
#include linux/module.h
#include linux/utrace.h
#include linux/err.h

MODULE_DESCRIPTION(Utrace tests);
MODULE_LICENSE(GPL);

static int target_pid;
module_param_named(pid, target_pid, int, 0);

/* Structure for all threads of a process having the same utrace ops */
struct proc_utrace {
struct task_struct *tgid_task;

/* list of task_utrace structs */
struct list_head list;
unsigned int num_threads;
};

struct task_utrace {
struct list_head list;
struct task_struct *task;
/* TODO: Get rid of this and use MATCHING_OPS on task? */
struct utrace_engine *engine;
};

static const struct utrace_engine_ops ut_ops;

static struct task_utrace *get_task_ut(struct task_struct *task,
struct proc_utrace *proc_ut)
{
struct task_utrace *task_ut, *temp;

list_for_each_entry_safe(task_ut, temp, proc_ut-list, list) {
if (task_ut-task == task)
return task_ut;
}
return NULL;
}

static int cleanup_proc_ut(struct proc_utrace *proc_ut)
{
int ret = 0;
struct task_utrace *task_ut, *temp;

printk(KERN_INFO Cleanup_proc_ut\n);
if (proc_ut == NULL)
return 0;

if (list_empty(proc_ut-list))
goto out;

/* walk proc_ut-list and free task_ut */
list_for_each_entry_safe(task_ut, temp, proc_ut-list, list) {
if (task_ut-engine) {
printk(KERN_INFO Calling detach for %d\n,
task_pid_nr(task_ut-task));
ret = utrace_control(task_ut-task,
task_ut-engine, UTRACE_DETACH);
if (ret)
printk(KERN_INFO utrace_detach returned %d\n,
ret);
printk(KERN_INFO Detached engine for %d\n,
task_pid_nr(task_ut-task));
}
list_del(task_ut-list);
kfree(task_ut);
}
out:
kfree(proc_ut);
return ret;
}

static int setup_task_ut(struct task_struct *t, struct proc_utrace *proc_ut)
{
struct task_utrace *task_ut;
int ret = 0;

if (!t || !proc_ut)
return -EINVAL;

printk(KERN_INFO setup_task_ut: attaching for task %d\n,
task_pid_nr(t));
task_ut = kzalloc(sizeof(*task_ut), GFP_KERNEL);
if (!task_ut)
return -ENOMEM;

INIT_LIST_HEAD(task_ut-list);
task_ut-task = t;
list_add_tail(task_ut-list, proc_ut-list);

/*
 * The utrace engine's *data will point to proc_ut.
 */
printk(KERN_INFO Before utrace_attach_task: %d\n, task_pid_nr(t));
task_ut-engine = utrace_attach_task(t, UTRACE_ATTACH_CREATE,
ut_ops, proc_ut);
printk(KERN_INFO After utrace_attach_task: %d, engine = %p\n,
task_pid_nr(t), task_ut-engine);
if (IS_ERR(task_ut-engine)) {
printk(KERN_ERR utrace_attach_task returned %d\n,
(int)PTR_ERR(task_ut-engine));
task_ut-engine = NULL;
ret = -ESRCH;
goto out;
}
printk(KERN_INFO utrace_attach_task: SUCCESS! - engine = %p\n,
task_ut-engine);
if (utrace_set_events(t, task_ut-engine,
UTRACE_EVENT(QUIESCE) | UTRACE_EVENT(CLONE) |
UTRACE_EVENT(EXIT))) {
ret = -ESRCH;
}
proc_ut-num_threads++;
out:
return ret;
}

static u32 ut_quiesce(enum utrace_resume_action action,
struct utrace_engine *engine,
struct task_struct *task, unsigned long event)
{
printk(KERN_INFO In quiesce callback: tid = %d\n, task_pid_nr(task));
return UTRACE_RESUME;
}

/* clone handler -- handle thread spawns and forks */
static u32 ut_clone(enum utrace_resume_action action,
struct utrace_engine *engine,
struct task_struct *parent, unsigned long clone_flags,
struct task_struct *child)
{
struct proc_utrace *proc_ut = (struct proc_utrace *)engine-data;

 

Re: [BUG] utrace_attach_task() never returns when called from the report_clone callback

2009-03-06 Thread Ananth N Mavinakayanahalli
On Fri, Mar 06, 2009 at 12:52:34PM -0800, Roland McGrath wrote:
  With the current utrace/master tree, I am seeing that utrace_attach_task()
  never returns when invoked from the clone callback. The same module
  works fine with prior utrace (rcu as well as with my embed version).
 
 I changed the utrace_attach_delay() logic recently.  That is probably it.
 Please post your test case.

The issue is that target-real_parent == current-real_parent and not
current on a CLONE_THREAD|CLONE_PARENT. So we keep looping in the
do-while.

Ananth



Re: instruction-analysis API(s)

2009-03-06 Thread Jim Keniston

Quoting Masami Hiramatsu mhira...@redhat.com:


Hi Jim and Sriker,

Here, I almost rewrote my patch.

Changelog:
- rewrite decoding logic based on Intel' manual.
- supoort insn_get_sib(),insn_get_displacement()
  and insn_get_immediate() too.
- support 3 bytes opcode and 64bit immediate.
- introduce some bitmaps.

Thank you,


Well, I didn't do much of a code review -- it looks like you addressed  
all my concerns -- but as I mentioned on IRC, I hacked together a test  
rig whereby you can disassemble a designated elf file (e.g., vmlinux,  
libc, libm) and then compare insn_get_length()'s results with  
objdump's results.  The comment in distill.awk shows how to use  
objdump, awk, and test_get_len together.


I also hacked up insn_x86.h and insn_x86.c to work in user space.   
Most of that is accomplished via insn_x86_user.h, but it certainly  
isn't necessary to do it that way.  In particular, __u8, __s8, __u16,  
etc. are versions of u8, s8, u16, etc. that can be used in both kernel  
and user code, so maybe we should switch to those.


I tested with vmlinux, libc, and libm on both an i686 system and an  
x86_64 system.  I found and fixed a few bugs.  Here are the ones that  
come to mind (all fixed):

- shrd/shld, which we discussed
- missing support for weird nops with modrm bytes (0f 1f ...).
- neglected to include the REX prefix in prefixes.nbytes
- missing static decl in an inline function in insn_x86.h

There are some other cases where insn_get_length() doesn't match up  
with the disassembly, but I don't consider them bugs:
- 0x9b is an instruction (fwait), but the disassembler treats it as a  
prefix.  For example 9b df ... can be disassembled as

fstsw ...   // wait, then store status word
or
fwait   // wait
fnstsw ...  // store status word without waiting
Perhaps it's relevant to investigate whether a single-step of 9b df  
... would execute just the fwait or the whole fstsw.  Anyway, this  
explains the failures of finit and fstsw that I mentioned to you.  I  
also saw this with fstcw and fclex.
- Illegal instruction sequences, such as an x86_64 instruction that  
starts with 0x40, or a misplaced 0x65 prefix.  Typically, we see these  
when disassembling data.  I just filtered out (via egrep) instructions  
whose disassembly starts with rex or includes (bad).


We could address the above by filtering them out in distill.awk or  
test_get_len.c.  I think we're clean otherwise.


There's a little more housecleaning to do -- e.g., adding Hitachi (?)  
copyright to IBM copyright, discarding insn_field_exists() and  
insn_extract_reg(), putting this all in git somewhere.  But not tonight.


Pull all the attached files into a directory and have a go -- e.g.,
$ make
$ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64]

Jim

# Usage: objdump -d a.out | awk -f distill.awk | ./test_get_len
# Distills the disassembly as follows:
# - Removes all lines except the disassembled instructions.
# - For instructions that exceed 1 line (7 bytes), crams all the hex bytes
# into a single line.

BEGIN {
prev_addr = 
prev_hex = 
prev_mnemonic = 
}

/^ *[0-9a-f]+:/ {
if (split($0, field, \t)  3) {
# This is a continuation of the same insn.
prev_hex = prev_hex field[2]
} else {
if (prev_addr != )
printf %s\t%s\t%s\n, prev_addr, prev_hex, 
prev_mnemonic
prev_addr = field[1]
prev_hex = field[2]
prev_mnemonic = field[3]
}
}

END {
if (prev_addr != )
printf %s\t%s\t%s\n, prev_addr, prev_hex, prev_mnemonic
}
/*
 * x86 instruction analysis
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2002, 2004, 2009
 */
#ifdef KERNEL
#include linux/module.h
#include linux/string.h
#else
#include string.h
#endif
// #include asm/insn.h
#include insn_x86.h

MODULE_LICENSE(GPL); // for test

/**
 * insn_init() - initialize struct insn
 * @insn:   struct insn to be initialized
 * @kaddr:  address (in kernel memory) of instruction (or copy thereof)
 * @x86_64: true for 64-bit kernel or 64-bit app
 */
void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64)
{
memset(insn, 0,