[PATCH] some optimizations for Virtual Machines

2007-08-13 Thread Renzo Davoli
Roland (and utrace-devel community),

I have just completed, together with Andrea Gasparini, a first 
implementation of a kernel module based on utrace as a fast support for our 
virtualization environment (view-os/umview). The name of the module is "kmview" 
kernel-mode-view-os, and the user level tool will have the same name.
We will (GPL) release both module and user level program as soon as possible.

utrace is a wonderful and well designed tool. However, IMHO, during the
implementation of kmview we have found some improvements that can be
made (and that we have already implemented) for a better support of
virtual machines.

Here are some comments, I hope you'll share our ideas and you'll insert 
our improvements soon in utrace's mainstream code.

1- Order of callbacks

You say: Engines are called in the order they attached.
This is meaningful for kernel generated events, but unfortunately it
does not provide meaningful semantics for engine nesting when
applied to report_syscall_entry.

When dealing with several tracing/virtual machine tools the 
report_syscall_entry callbacks must be evaluated in the reverse way.

As an example I tried to use strace on a view-os like virtual machine 
(some syscalls get virtualized).
strace works but, being the last engine, it shows the modified calls together
with the return values of the original calls.

Wrong order:
syscall enter: call -> VM (modification) -> strace -> kernel
syscall exit: call -> VM (restore) -> strace -> kernel

Right order:
syscall enter: call -> strace -> VM (modification) -> kernel
syscall exit: call -> VM (restore) -> strace -> kernel

Reversing the attached engine list traversal for syscall_entry solves the
problem.
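
A minimal userspace sketch of the ordering argument above (a toy model, not
kernel code and not the utrace API). All names and syscall numbers here are
hypothetical: a "VM" engine attached first rewrites one call, and a tracer
attached last records what it sees. Entry callbacks run either in attach
order or reversed; exit callbacks always run in attach order.

```c
#include <assert.h>
#include <stddef.h>

/* Toy syscall state: the requested number and the value returned. */
struct toy_syscall { long nr; long ret; };

/* Hypothetical engine with entry/exit callbacks (not the utrace API). */
struct toy_engine {
    void (*on_entry)(struct toy_syscall *sc);
    void (*on_exit)(struct toy_syscall *sc);
};

enum { TOY_OPEN = 5, TOY_GETPID = 20 };   /* illustrative numbers only */

long tracer_saw_nr, tracer_saw_ret;       /* what the tracer recorded */
static int vm_rewrote;

/* The VM replaces open with a harmless call, then fakes the result. */
static void vm_entry(struct toy_syscall *sc)
{
    vm_rewrote = (sc->nr == TOY_OPEN);
    if (vm_rewrote)
        sc->nr = TOY_GETPID;
}

static void vm_exit(struct toy_syscall *sc)
{
    if (vm_rewrote) {
        sc->nr = TOY_OPEN;   /* restore the original request */
        sc->ret = 42;        /* virtualized return value */
    }
}

static void tracer_entry(struct toy_syscall *sc) { tracer_saw_nr = sc->nr; }
static void tracer_exit(struct toy_syscall *sc)  { tracer_saw_ret = sc->ret; }

/* Engines in attach order: the VM attached first, the tracer last. */
static const struct toy_engine engines[] = {
    { vm_entry, vm_exit },
    { tracer_entry, tracer_exit },
};
#define N_ENGINES (sizeof engines / sizeof engines[0])

/* Stand-in for the kernel executing whatever call survived the chain. */
static void toy_kernel(struct toy_syscall *sc)
{
    sc->ret = (sc->nr == TOY_GETPID) ? 1000 : -1;
}

/* reverse_entry != 0 selects the proposed order for the entry callbacks. */
void run_syscall(struct toy_syscall *sc, int reverse_entry)
{
    size_t i;

    assert(sc != NULL);
    for (i = 0; i < N_ENGINES; i++) {
        size_t idx = reverse_entry ? N_ENGINES - 1 - i : i;
        engines[idx].on_entry(sc);
    }
    toy_kernel(sc);
    for (i = 0; i < N_ENGINES; i++)   /* exit always runs in attach order */
        engines[i].on_exit(sc);
}
```

With reverse_entry set, the tracer records the original call number and the
virtualized return value, a consistent pair. With attach order on entry it
records the rewritten number instead.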

2- Access to traced process vm.

Your interface provides the call utrace_access_process_vm: it allows tracer
processes to use /proc/*/mem.
Unfortunately write access is denied (as stated in fs/proc/base.c):
> #define mem_write NULL
> #ifndef mem_write
> /* This is a security hazard */

The /proc/*/mem way to access a process's vm would be useless anyway.
When I write virtual machine support for hundreds of processes I cannot
keep hundreds of open files. On the other hand I cannot open and close a file
for each memory access: we need fast access!

I propose a new call:
int utrace_access_process_vm(struct task_struct *tsk, unsigned long addr,
        char __user *ubuf, int len, int write, int string);
which gives I/O access to the memory of the process.
It has about the same interface as access_process_vm (mm/memory.c) with
the extra "string" option (meaningful only when write==0).
Sometimes a read buffer can be significantly larger than the actual field
used for a string. If string==1 the transfer terminates at '\0', avoiding
the memory error that could arise from unallocated memory after the string
(and giving a slight increase in performance).
Before giving access to the process vm, utrace_access_process_vm checks
the right to do so using utrace_allow_access_process_vm (it has the same
degree of protection as your access to /proc/*/mem).
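
The effect of the "string" option can be sketched in plain userspace C. This
is only a model under stated assumptions: toy_copy is a hypothetical name, it
copies between two ordinary buffers, and it does not touch another process's
memory. It shows the one property described above: with string set, the copy
never reads past the terminator, however large len is.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy up to len bytes from src to dst.
 * When string is nonzero, stop right after the first '\0' so that no byte
 * beyond the terminator is ever read from src.
 * Returns the number of bytes copied (terminator included).
 */
size_t toy_copy(char *dst, const char *src, size_t len, int string)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < len; i++) {
        dst[i] = src[i];
        if (string && src[i] == '\0')
            return i + 1;
    }
    return len;
}
```

A caller that asks for a pathname-sized read of a 12-byte string gets 12 bytes
back in string mode, while the non-string mode copies exactly len bytes.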

3- In the patch I have also implemented the support for PTRACE_MULTI 
and PTRACE_SYSVM.
These two extra features provide:
-- PTRACE_MULTI: multiple ptrace operations using one call, including
data transfer of chunks of memory and registers.
(It would speed up many commands: have a look at "strace strace ls"
to see how many bursts of ptrace calls could collapse!)
I designed this call for virtual machine support.
-- PTRACE_SYSVM: can be used instead of PTRACE_SYSCALL or SYSEMU.
At the end of the pre-syscall protocol it is possible to choose among three
different behaviors:
   i- call again after the syscall (maybe some parameters get modified by
the virtualization), like PTRACE_SYSCALL
   ii- skip the upcall after the syscall but do perform the syscall
(for a non virtualized call)
   iii- skip both the system call and the second upcall event
(for a completely virtualized call).
PTRACE_SYSVM almost halves the number of context switches for Virtual Machines.
(SYSEMU works just for total Virtual Machines, while SYSVM works also
for partial Virtual Machines.)
There is an extensive description of SYSVM in some messages I sent some
time ago on KDML. We already implemented these features on the vanilla kernel;
this version based on utrace is architecture independent.
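
The saving claimed for PTRACE_SYSVM can be illustrated by counting tracer
upcalls per system call. The sketch below is a toy accounting model only: the
enum and function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three choices i, ii and iii listed above. */
enum sysvm_choice {
    SYSVM_CALL_AND_REPORT_EXIT,   /* i:   run the syscall, upcall again after it */
    SYSVM_CALL_SKIP_EXIT,         /* ii:  run the syscall, no second upcall */
    SYSVM_SKIP_CALL_AND_EXIT      /* iii: neither the syscall nor the second upcall */
};

struct sysvm_cost { int upcalls; int syscall_runs; };

/* Cost of one traced system call under the given choice. */
struct sysvm_cost sysvm_cost_of(enum sysvm_choice c)
{
    struct sysvm_cost cost = { 1, 1 };   /* the entry upcall always happens */

    switch (c) {
    case SYSVM_CALL_AND_REPORT_EXIT:
        cost.upcalls = 2;
        break;
    case SYSVM_CALL_SKIP_EXIT:
        break;
    case SYSVM_SKIP_CALL_AND_EXIT:
        cost.syscall_runs = 0;
        break;
    }
    return cost;
}

/* Total tracer upcalls for a sequence of n choices. */
int total_upcalls(const enum sysvm_choice *trace, int n)
{
    int i, total = 0;

    assert(trace != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += sysvm_cost_of(trace[i]).upcalls;
    return total;
}
```

In this model a plain PTRACE_SYSCALL tracer always pays two upcalls per call,
so a trace where most calls take choice ii or iii approaches half that number.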

---
The complete patch is here:
http://www.cs.unibo.it/~renzo/utrace/
Unfortunately it is against 2.4.22. I have a very slow connection to the
Internet here, I'll try to update the patch to the latest kernel as
soon as I return home.

ciao
renzo
-- 

Renzo Davoli| Dept. of Computer Science
(NIC rd235, HAM IZ4DJE) | University of Bologna 
Tel. +39 051 2094501  

Re: [PATCH] some optimizations for Virtual Machines

2007-08-13 Thread Renzo Davoli
A bugfix for my ASCII ART ;-)

Wrong order (utrace behavior now):
syscall enter: process -> VM (modification) -> strace -> kernel
syscall exit: kernel -> VM (restore) -> strace -> process

Right order (proposed):
syscall enter: process -> strace -> VM (modification) -> kernel
syscall exit: kernel -> VM (restore) -> strace -> process

Sorry for this trailing errata message.

renzo



[PATCH] update: some optimizations for Virtual Machines

2007-08-14 Thread Renzo Davoli
Just a quick note to say that I have updated my patch.

More precisely I have refined the virtualized syscall nesting.

When there are several engines for a task and report callbacks can
change the status, the quiescent state must be managed for each engine.

syscall enter: process -> VM0 -> VM1 (modify) -> VM2 (second
modification) -> kernel
syscall exit: kernel -> VM2 (restore) -> VM1 (restore) -> VM0 -> process

For both syscall enter and exit, the modification of VM2 must take place
after VM1 has completed its job.
If VM1 requires the quiescent state to compute its modification of the state,
VM2's report_syscall_entry has to wait for VM1 to finish its job.
The same holds for restore, in the opposite order.

One more idea:
The entry.S code provides the feature to skip the system call (by setting
the syscall number to -1).
This feature must be provided to VM nesting.

The new patch provides the correct nesting: if VM1 (in the example)
sets the syscall number to -1, VM2's report_syscall_entry is skipped too.
In this case syscall exit skips VM2, restores the syscall number
and calls report_syscall_exit for VM1 and then VM0.
Maybe it is useless to call VM1 too (instead of starting from VM0, given
that the call has been skipped) but for now I do it this way for the sake of
symmetry.
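
The nesting policy just described can be written down as a small userspace
model. Everything here is hypothetical (struct and function names, the fixed
chain of three VMs); it only mirrors the rule in the text: entry reports stop
at the VM that sets the syscall number to -1, the kernel is skipped, and exit
reports start from that same VM and walk back to VM0.

```c
#include <assert.h>
#include <stddef.h>

#define N_VM 3
#define TOY_SKIP (-1L)

struct toy_vm {
    int skips_call;      /* nonzero: this VM sets the syscall number to -1 */
    int entry_reports;   /* how many entry reports this VM received */
    int exit_reports;    /* how many exit reports this VM received */
};

struct toy_result { int kernel_ran; long nr_at_exit; };

/* vm[0] is VM0 (first on entry, last on exit), vm[2] is VM2. */
struct toy_result run_nested(struct toy_vm vm[N_VM], long nr)
{
    struct toy_result res = { 0, nr };
    long cur = nr;
    size_t i, last = 0;

    assert(vm != NULL);
    for (i = 0; i < N_VM; i++) {
        vm[i].entry_reports++;
        last = i;
        if (vm[i].skips_call) {
            cur = TOY_SKIP;      /* deeper VMs never see this call */
            break;
        }
    }
    if (cur != TOY_SKIP)
        res.kernel_ran = 1;

    cur = nr;                    /* restore the number before the exit reports */
    for (i = last + 1; i-- > 0; )
        vm[i].exit_reports++;    /* last, last-1, ..., 0 */
    res.nr_at_exit = cur;
    return res;
}
```

When VM1 skips the call, VM2 receives neither report, while VM1 and VM0 each
receive one entry and one exit report.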

The new patch implements this policy.

Maybe the same idea (wait for quiescent state at each engine) must be 
applied to all the other report* that can change the status (e.g. 
report_signal?).

renzo



Re: [PATCH] some optimizations for Virtual Machines

2007-08-20 Thread Renzo Davoli
We have updated the patch to the latest kernel.
It is here:
http://www.cs.unibo.it/~renzo/utrace.

> 1- Order of callbacks
This patch is crucial. Without this change no virtual machines can be
nested if based on utrace.
> 
> 2- Access to traced process vm.
> 
This patch is very important: with this change VM hypervisors can access
their process memory efficiently. 
> 
> I propose a new call:
> int utrace_access_process_vm(struct task_struct *tsk, unsigned long addr, 
> char __user *ubuf, int len, int write, int string);
> which give I-O access to the memory of the process.
I have seen that access_process_vm has been exported to modules: it is a
change included in mainstream 2.6.23 from the first rc.
so I have exported access_process_vm_user too.
access_process_vm_user has two main differences from
access_process_vm:
- it copies a memory area directly from a process vm to the user space
  of current and vice versa (it uses an internal one page buffer, it does
  not need extra buffers or extra code loops).
- it supports the "string" flag for reading: no useless copies of data
  after the end of the string, no memory errors due to short strings read
  into large buffers.

utrace_access_process_vm can be kept or not: modules can call it instead
of directly accessing access_process_vm_user when they need to check
that the requesting process has the right to access the other process
vm.
> 
> 3- In the patch I have also implemented the support for PTRACE_MULTI 
> and PTRACE_SYSVM.
This patch is just useful. We'll use it to compare the performance
of umview and kmview. Several ptrace based applications could
benefit from these features (e.g. when they need to load chunks of
memory or chunks of registers, bursts of ptrace calls could be sent as a
single call, reducing the number of mode switches).

That's all for now.


renzo



Re: [PATCH] some optimizations for Virtual Machines

2007-08-21 Thread Renzo Davoli
> > > int utrace_access_process_vm(struct task_struct *tsk, unsigned long addr, 
> > > char __user *ubuf, int len, int write, int string);
> 
> "string" smells like a hack, someone will come up with his favourite
> structure and ask for flag for copying, say, single-linked lists. :(

There are no system calls asking for linked lists as parameters.
Instead there are many taking strings: all those taking a pathname.

Look at this:
char *s=strdup(x);
fd=open (s,O_RDONLY);

This is a quiet, safe chunk of user code.
What do you do to grab the value of s (from a virtual machine monitor)?
With ptrace you can do a loop of PTRACE_PEEKDATA, one for each word of memory,
and leave the loop when there is a NUL byte.
If you are designing a Virtual Machine this is just performance
suicide.
You could open /proc//mem; in this case either you keep
one descriptor open for each controlled process or you
open/read/close the proc file for each access.
Either a scalability or a performance nightmare.
If you need to go fast, something like:
access_process_vm(,address_of_s, PATH_MAX ...)
can fail because s could be in the unlucky position at the end of
an allocated area.
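
For reference, the word-by-word loop mentioned above looks roughly like this
on Linux. It is a sketch, not code from the patch: peek_string and the reader
callback type are hypothetical names, chosen so that the same loop can be
driven either by ptrace(PTRACE_PEEKDATA, ...) on a stopped tracee or by a
local stand-in reader when trying it out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Reads one word at addr; sets *failed to nonzero on error. */
typedef long (*peek_word_fn)(void *ctx, unsigned long addr, int *failed);

/* ptrace-backed reader; ctx points to the pid of a stopped tracee. */
long peek_word_ptrace(void *ctx, unsigned long addr, int *failed)
{
    pid_t pid = *(pid_t *)ctx;
    long word;

    errno = 0;   /* -1 is a valid word, so errors show up only in errno */
    word = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
    *failed = (word == -1 && errno != 0);
    return word;
}

/* Stand-in reader that reads the caller's own memory (ctx is unused). */
long peek_word_local(void *ctx, unsigned long addr, int *failed)
{
    long word;

    (void)ctx;
    memcpy(&word, (const void *)addr, sizeof word);
    *failed = 0;
    return word;
}

/*
 * Fetch a NUL-terminated string one word at a time into buf.
 * Returns the string length, or -1 on a read error or if no NUL is found
 * within max bytes. *words_read counts reader calls (one ptrace call each).
 */
long peek_string(peek_word_fn peek, void *ctx, unsigned long addr,
                 char *buf, size_t max, size_t *words_read)
{
    size_t off = 0;

    assert(peek != NULL && buf != NULL && words_read != NULL);
    *words_read = 0;
    while (off < max) {
        int failed = 0;
        long word = peek(ctx, addr + off, &failed);
        size_t n = max - off < sizeof word ? max - off : sizeof word;
        const char *nul;

        if (failed)
            return -1;
        (*words_read)++;
        memcpy(buf + off, &word, n);
        nul = memchr(buf + off, '\0', n);
        if (nul != NULL)
            return (long)(nul - buf);
        off += n;
    }
    return -1;
}
```

Each iteration is one system call when the ptrace reader is used, so an
n-byte pathname costs about n / sizeof(long) + 1 calls, which is the overhead
the message objects to.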

> > - it copies a memory area directly from a process vm to the user space
> >   of current and viceversa (it uses an internal one page buffer,
> 
> check for allocation failure
You mean:
actual>  mm = get_task_mm(tsk);
actual>  if (!mm)
actual>          return 0;
actual>
actual>  buf=kmalloc(PAGE_SIZE, GFP_KERNEL);
must be changed as:
updated>  mm = get_task_mm(tsk);
updated>  buf=kmalloc(PAGE_SIZE, GFP_KERNEL);
updated>  if (!mm || !buf)
updated>          return 0;

Okay, you're right, I'll update the patch.

> 
> > > 3- In the patch I have also implemented the support for PTRACE_MULTI 
> > > and PTRACE_SYSVM.
> 
> PTRACE_MULTI is horrible, it is asking for pain with compat version.

I do not understand. If you do not use PTRACE_MULTI, ptrace works as
usual.
Old ptrace users do not use PTRACE_MULTI, so you do not need to support
PTRACE_MULTI for backward compatibility with old versions.

> 
> Mode switches are fast. This is a reason why read(2) doesn't have
> batched version, and read is called waaay more often than ptrace.

readv does exist.
Mode switches are fast, but having fewer mode switches is even faster.
> 
> I also wonder if this was tested with list debugging on: iterating over
> RCU protected list backwards when prev pointers are poisoned shouldn't
> work.
I do not know if the reverse scan of the list can be done better
or maybe my implementation is buggy.

I say that we do need that reverse traversal for SYSCALL_ENTRY otherwise
it is not possible to implement nested services based on utrace.

Regarding the 3 points of my original message:
1- order of calls: the patch (or a different patch implementing the same
idea) is needed, otherwise the support for nested engines
becomes meaningless when dealing with system call virtualization.
2- access to process vm: some solution is needed for a fast access to a
utraced process vm.
3- I need this patch for my project, I feel that it could speed up some
other programs, but this is not so crucial as 1 and 2.

renzo



Is PTRACE_SINGLEBLOCK buggy?

2008-06-02 Thread Renzo Davoli
Hi Roland, hi everybody,

I have finished teaching my spring term so I am back working on utrace.

I am porting my stuff about virtualsquare kmview to the new kernel
versions.
I ran into something that seems to be a bug in PTRACE_SINGLEBLOCK.

The source code here enclosed says "OKAY" on a standard 2.6.25.4,
while it generates a kernel panic on a 2.6.25.4 +
http://people.redhat.com/roland/utrace/2.6-current/linux-2.6-utrace.patch.

Is this a bug? (I think so, no combination of syscall parms should
ever generate kernel panics ;)
Is this a known bug? (e.g. because PTRACE_SINGLEBLOCK is already a WIP
with utrace and you are already working on it...)

ciao
renzo

---
#define _GNU_SOURCE   /* for clone() */
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

static int child(void *arg)
{
  if(ptrace(PTRACE_TRACEME, 0, 0, 0) < 0){
    perror("ptrace traceme");
  }
  kill(getpid(), SIGSTOP);
  return 0;
}

int main()
{
  int pid, status, rv;
  static char stack[1024];

  if((pid = clone(child, &stack[1020], SIGCHLD, NULL)) < 0){
    perror("clone");
    return 0;
  }
  if((pid = waitpid(pid, &status, WUNTRACED)) < 0){
    perror("Waiting for stop");
    return 0;
  }
  ptrace(33, pid, 0, 0); /* PTRACE_SINGLEBLOCK */
  printf("OKAY\n");
  return 0;
}



Re: Is PTRACE_SINGLEBLOCK buggy?

2008-06-02 Thread Renzo Davoli
Jan Kratochvil has just sent me an E-mail saying that it seems to be 
a kvm bug (or a bug caused by kvm).

He is right: using qemu/kqemu instead of kvm it does not panic.

Anyway I am puzzled. Using kvm the PTRACE_SINGLEBLOCK should have the
same effect on 2.6.25.4 and 2.6.25.4+utrace.
2.6.25.4: ptrace_resume(kernel/ptrace.c)->user_enable_block_step
2.6.25.4+utrace: 
 ptrace_common(kernel/ptrace.c) sets UTRACE_ACTION_BLOCKSTEP 
 ->utrace_quiescent(kernel/utrace.c) tests UTRACE_ACTION_BLOCKSTEP 
 ->user_enable_block_step
I wonder where is the difference...

Anyway, let us wait for kvm people to fix it...

I want to thank Jan for his quick feedback.

renzo



3- utrace module nesting (again)

2008-06-03 Thread Renzo Davoli
(again because we already discussed this point)

As a matter of fact utrace is a very useful and powerful tool to
support virtualization (it is not just for debugging!).

When dealing with nested virtualization, i.e. nested utrace modules
registered to track one process, there is a problem.

Almost all the events managed by utrace refer to the notification of changes
by the kernel, thus it is clearly consistent that all the modules
get informed in the order they registered.
Each module can change the perception of the event, and the next utrace
module receives a modified event (or none when UTRACE_ACTION_HIDE).

This is the case for:
  _UTRACE_EVENT_QUIESCE,  /* Tracing requests stop.  */
  _UTRACE_EVENT_REAP,   /* Zombie reaped, no more tracing possible.  */
  _UTRACE_EVENT_CLONE,  /* Successful clone/fork/vfork just done.  */
  _UTRACE_EVENT_VFORK_DONE, /* vfork woke from waiting for child.  */
  _UTRACE_EVENT_EXEC, /* Successful execve just completed.  */
  _UTRACE_EVENT_EXIT, /* Thread exit in progress.  */
  _UTRACE_EVENT_DEATH,  /* Thread has died.  */
  _UTRACE_EVENT_SYSCALL_EXIT, /* Returning to user after system call.  */
  _UTRACE_EVENT_SIGNAL(*), /* Signal delivery will run a user handler.  */
  _UTRACE_EVENT_JCTL, /* Job control stop or continue completed.  */ 

This is the sole exception I see:
  _UTRACE_EVENT_SYSCALL_ENTRY, /* User entered kernel for system call.  */

When utrace manages a syscall request (it is an event generated by the process)
the notification must be sent following the reverse order, i.e. starting from
the last registered module towards the first one.

Each module's report_syscall_entry can change the parameters and
the call itself, or even short-circuit the call (in this latter case no further
module should manage the event).
If the system call (maybe a different system call) survives the chain,
it is submitted to the kernel and the exit event (kernel generated)
is managed using the standard sequence.

Using the same sequence for _UTRACE_EVENT_SYSCALL_ENTRY and
_UTRACE_EVENT_SYSCALL_EXIT is inconsistent: it simply forbids
virtualization nesting.
IMHO, the sequence for _UTRACE_EVENT_SYSCALL_ENTRY must be the one proposed
here and no other. Virtualization can be used for protection
(the sandbox effect), so providing a way to change the processing order
of calls could be used to create threats.

I have updated my previous patch (x86_32 only), you can see it from
the svn of viewos.

http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/kmview-kernel-module/kernel_patches/

renzo



Some ideas/proposals on utrace

2008-06-03 Thread Renzo Davoli
Dear Roland and dear utrace developers,

I am using utrace in View-OS. kmview is a partial virtual machine
engine based on utrace. kmview uses a kernel module and it is a flexible,
performant and transparent replacement for umview. (see
wiki.virtualsquare.org).

I am sending some messages about issues I found with utrace.
I am ready to submit code to implement the fix I propose, but I would
like to discuss with you and agree on goals and methods.

Three messages follow, with subjects:
1- TIF_SYSCALL_EMU is useless.
2- "skip syscall" management.
3- utrace module nesting (again)

renzo

-- 
========
Renzo Davoli| Dept. of Computer Science
(NIC rd235, HAM IZ4DJE) | University of Bologna 
Tel. +39 051 2094501| Mura Anteo Zamboni, 7
Fax. +39 051 2094510| I-40127 Bologna  ITALY
Key fingerprint = A019 17E2 5562 06F6 77BB  2E93 1A01 F646 30EA B487




2- "skip syscall" management.

2008-06-03 Thread Renzo Davoli
arch/x86/kernel/entry_32.S provides two ways to skip the call:
> syscall_trace_entry:
>   movl $-ENOSYS,PT_EAX(%esp)
>   movl %esp, %eax
>   xorl %edx,%edx
>   call do_syscall_trace
>   cmpl $0, %eax
*** this:
>   jne resume_userspace    # ret != 0 -> running under PTRACE_SYSEMU,
>                           # so must skip actual syscall
>   movl PT_ORIG_EAX(%esp), %eax
>   cmpl $(nr_syscalls), %eax
*** or this:
>   jnae syscall_call
>   jmp syscall_exit

Old ptrace used a non-zero return value from do_syscall_trace to skip the
call (skipping also the second do_syscall_trace on exit). Alternatively, if
orig_eax (the syscall number) is -1, the jnae is not taken because -1 is seen
as the largest unsigned number.
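
The second path relies on an unsigned comparison. A small C rendering of the
"cmpl $(nr_syscalls), %eax; jnae syscall_call" pair shows why -1 never reaches
syscall_call. The function name is hypothetical, and the table size used when
trying it out is arbitrary.

```c
#include <assert.h>

/*
 * Returns nonzero when the jnae (jump if below, unsigned) would be taken,
 * i.e. when the syscall number indexes a valid table entry.
 */
int toy_would_dispatch(long orig_eax, unsigned long nr_syscalls)
{
    assert(nr_syscalls > 0);
    /* -1 converts to the largest unsigned value, so it is never "below". */
    return (unsigned long)orig_eax < nr_syscalls;
}
```

With any table size, toy_would_dispatch(-1, size) is 0, so control falls
through to the jmp syscall_exit and the call is skipped.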

Now PTRACE_SYSEMU is implemented using this latter method in kernel/ptrace.c.

IMHO the former is better.

In all architectures the code uses the following layers:
1-assembly code layer (entry_*.S for x86)
2-arch/*/kernel/ptrace.c
3-kernel/utrace.c
4-utrace module
or
4-kernel/ptrace.c when backward ptrace compatibility is required

Syscall skipping is a useful feature that many utrace modules may require.
Thus my proposal is to use a return value through all the interfaces
to skip the call.
More precisely:
- interface 1-2 is already in place for x86_32: when do_syscall_trace
returns nonzero the syscall gets skipped. A similar mechanism should be
coded for the other architectures. I have already written the
fix for ppc, ppc64 and (untested) x86_64 (I needed this for my
PTRACE_SYSVM patch).
- interface 2-3: tracehook_report_syscall_entry should return an integer;
the call gets skipped when it is non-zero.
- interface 3-4: I propose to add an action flag to skip the call.
report_syscall_entry can have one extra ACTION_FLAG, say:
#define UTRACE_SYSCALL_SKIP 0x0100
It is possible to ask the lower level to abort the syscall, the
arch-dependent part of the kernel decides how to implement it
#define UTRACE_SYSCALL_ENOSYS 0x0200

My proposal has some pros:
- SYSEMU management becomes architecture-independent.
Statements like these can be eliminated:
unsigned long *scno = &regs->orig_ax; /* XXX */
unsigned long *retval = &regs->ax; /* XXX */
- The boundary between arch-independent and arch-dependent sections of the
kernel is more consistent.
- It can be ported to different architectures. kernel/ptrace.c becomes
independent of strange syscall and return value encodings.

(BTW: I continue to say that my PTRACE_SYSVM is more flexible than PTRACE_SYSEMU
and at least as performant.
With PTRACE_SYSEMU the next system call is always virtualized (skipped);
with PTRACE_SYSVM it is possible to process the system call parameters
and decide on the fly if the call has to be virtualized or not.
PTRACE_SYSEMU supports only global virtualization (like User-Mode Linux),
while PTRACE_SYSVM supports *also* partial virtualization (like my
umview/kmview).)

renzo



1- TIF_SYSCALL_EMU is useless.

2008-06-03 Thread Renzo Davoli
This flag was used by the old ptrace.  PTRACE_SYSEMU is now managed by 
kernel/ptrace.c.
In fact, TIF_SYSCALL_EMU is cleared in ptrace_disable(arch/x86/kernel/ptrace.c)
and tested but never set.

renzo



Re: Tracing Syscalls under Fedora 9

2008-06-06 Thread Renzo Davoli
On Fri, Jun 06, 2008 at 04:38:34PM +0200, Martin Süßkraut wrote:
> has the tracing of system calls changed in utrace between Fedora 8 and 9?
> 
> My module works fine under Fedora 8, but under Fedora 9 the callbacks
> report_syscall_entry and report_syscall_exit seam not to be invoked
> any more.
I had the same problem.

For some reason the only way to trace the syscall is to trace also 
UTRACE_EVENT(SIGNAL_TERM)
or CORE.

I added an empty report_signal function and now it works.

This behavior was caused by this statement in arch/x86/kernel/ptrace.c:
>  if (!tracehook_consider_fatal_signal(current, SIGTRAP, SIG_DFL))
>          goto out;

and in include/linux/tracehook.h:
> static inline int tracehook_consider_fatal_signal(struct task_struct *task,
>   int sig,
>   void __user *handler)
> {
>   return (tsk_utrace_flags(task) & (UTRACE_EVENT(SIGNAL_TERM) |
> UTRACE_EVENT(SIGNAL_CORE)));
> }

so if neither SIGNAL_TERM nor SIGNAL_CORE is traced, syscalls cannot be
traced.

Roland, is this a feature or a bug?

renzo



Re: Tracing Syscalls under Fedora 9

2008-06-09 Thread Renzo Davoli
> On Fri, Jun 06, 2008 at 04:38:34PM +0200, Martin Süßkraut wrote:
> > has the tracing of system calls changed in utrace between Fedora 8 and 9?
> For some reason the only way to trace the syscall is to trace also 
> UTRACE_EVENT(SIGNAL_TERM)
> or CORE.
> 
> I added an empty report_signal function and now it works.

Martin told me by an E-mail message that the change proposed above solved
his problem. This is for the people on the ML concerned with the same
trouble.

renzo



Utrace and process (partial) virtualization

2009-02-04 Thread Renzo Davoli
Dear Roland and dear utrace developers,

I am still having some problems regarding utrace, and more
specifically the utrace interface for (partial) virtual machines and
(again) the support for utrace engine nesting.

I am writing my point of view here for a general discussion.

This is the summary:
1- Virtual Machines may need to change the system call

2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
report_syscall_entry?

3- Nesting, is it really useful to run all the reports in a row and
(possibly) stop at the end waiting for all the engines?

4- report_syscall_entry engines evaluation order should be reversed


1- This is the simplest suggestion/request.
Sometimes virtual machine engines need to change the system call
(e.g. the process calls "creat", the kernel must run "open" instead).
I suggest adding some useful inline functions to arch/*/include/asm/syscall.h:
syscall_set_nr // to set the system call number
syscall_get_pc // to get/set the program counter
syscall_set_pc
syscall_get_sp // to get/set the stack pointer
syscall_set_sp
These inline calls would help to create architecture independent
virtual machine engines.
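
As an illustration of how an engine might use such helpers, here is a
userspace sketch of the creat-to-open rewrite. The register structure and the
toy_ prefixed helpers are hypothetical stand-ins for the suggested inline
functions; the syscall numbers are the i386 ones and are used only as an
example. The rewrite itself relies on creat(path, mode) being equivalent to
open(path, O_CREAT|O_WRONLY|O_TRUNC, mode).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

enum { TOY_NR_OPEN = 5, TOY_NR_CREAT = 8 };   /* i386 numbers, as an example */

/* Hypothetical, architecture-neutral view of the registers of a syscall. */
struct toy_regs { long nr; long args[6]; long pc; long sp; };

static inline long toy_syscall_get_nr(const struct toy_regs *r) { return r->nr; }
static inline void toy_syscall_set_nr(struct toy_regs *r, long nr) { r->nr = nr; }
static inline long toy_syscall_get_pc(const struct toy_regs *r) { return r->pc; }
static inline void toy_syscall_set_pc(struct toy_regs *r, long pc) { r->pc = pc; }
static inline long toy_syscall_get_sp(const struct toy_regs *r) { return r->sp; }
static inline void toy_syscall_set_sp(struct toy_regs *r, long sp) { r->sp = sp; }

/*
 * Rewrite creat(path, mode) into open(path, O_CREAT|O_WRONLY|O_TRUNC, mode).
 * Returns 1 if the call was rewritten, 0 if it was not a creat.
 */
int toy_rewrite_creat(struct toy_regs *r)
{
    assert(r != NULL);
    if (toy_syscall_get_nr(r) != TOY_NR_CREAT)
        return 0;
    r->args[2] = r->args[1];                     /* mode moves to 3rd argument */
    r->args[1] = O_CREAT | O_WRONLY | O_TRUNC;   /* flags implied by creat */
    toy_syscall_set_nr(r, TOY_NR_OPEN);
    return 1;
}
```

An engine written against accessors like these never names orig_ax or any
other architecture-specific register.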

Now the "hard" part:
2- Which is the scenario of virtual machines based on utrace?

In my mind there are two or three actors.
K- At the lowest layer there is the kernel providing utrace
M- There is a module which uses utrace and virtualize something.
   M can do all the virtualization at kernel level but maybe it uses also:
U- A userland Virtual Machine Monitor.

So we have K,M and U.

When a virtualized process does a syscall, K calls the report_syscall_entry
function of M.
If M is entirely at kernel level it can decide whether to abort the syscall
(setting UTRACE_SYSCALL_ABORT) or not, but there is no (clean) way to forward
the request to U and wait for U's decision about the syscall.
SYSEMU can be implemented with the current utrace interface as it aborts
*all* the syscalls.
View-OS cannot use it. In fact kmview is a userland VM which needs to
decide which system calls must be skipped and which executed.
This is not for View-OS only:
whoever tries to implement similar features will run into the same problem.

Maybe even VMMs entirely implemented in the kernel module need to delay
the decision about the action. I think UTRACE_STOP has exactly this
meaning: in Roland's ptrace implementation UTRACE_STOP is used in this way.
User-mode Linux running on ptrace does change the registers of the
process while the process is in STOP state.

I am currently trying to implement a new kmview module using UTRACE_STOP.
When I need to skip the syscall I change the syscall number (orig_ax on x86)
to -1 while the process is stopped.
Utrace believes that the syscall is *not* aborted, so it passes orig_ax
(return ret ?: regs->orig_ax; in arch/x86/kernel/ptrace.c)
to the "entry_{32,64}.S" layer, causing the syscall to be skipped.
This is a dirty workaround.

I think that the specific actions (for syscalls, signals) should be
accepted during a utrace_control(..., UTRACE_RESUME).
In this way:
** K calls report_syscall_entry
** M sends the request to U and returns UTRACE_STOP.
   (M can then process requests for many other processes and many userland VMM)
** U receives the request, decides syscall abort or execute
** U sends its reply to M
** M calls utrace_control UTRACE_RESUME setting the action flag needed (e.g.
   UTRACE_SYSCALL_ABORT).

The same scenario can apply to userland management of signals, the
VMM or debugger could need to delay the decision among UTRACE_SIGNAL* cases,
and it is hard to keep the monitor inside the report_signal
upcall waiting to return a value. It would need another implementation of some
kind of process stop/quiescence inside the module.

3- Following the KMU schema above, let us now depict a scenario where
there are multiple M engines and multiple U VMMs on the same process.

If I have correctly understood the code, the current implementation
runs all the report upcalls in a row. If some of the report upcalls return
UTRACE_STOP, utrace waits for all the stopped engines to send a UTRACE_RESUME.
(from utrace.c:
If another engine is keeping @target stopped, then it remains stopped until
all engines let it resume.)

All the M engines may try to change the status of the process concurrently,
as each engine thinks the process has been stopped for its own management.

Maybe we have two different ideas of the STOP state and of process
virtualization.
For me a process in STOP state is blocked for inspection. During the STOP
state a module M can change the process status.
With "virtualized process" I mean a process that "sees" an environment 
different from that provided by the hosting kernel.
A user-mode linux process is a virtualized process.
In my mind several engines working on a process implement several layers
of virtualization.
The first engine provides the process a modified virtual world.
If a second engine gets loaded on the same process

utrace@FOSDEM

2009-02-07 Thread Renzo Davoli
I am at FOSDEM in Brussels.

(I'll give a talk tomorrow 11:00, not directly related to utrace).

If there are other utrace developers around here, we can meet in person
for some brainstorming.

renzo



UTRACE_STOP race condition?

2009-02-11 Thread Renzo Davoli
Dear Roland and dear utrace developers,

please help me. Either I have not understood the meaning of UTRACE_STOP
or it is completely useless due to a race condition.

There are always two entities in a utrace interaction: the traced
process and the tracing module.

When a traced event occurs in the traced process, the corresponding
report function gets called in the module.

If the report function returns UTRACE_STOP the traced process stays in a
quiescent state and the module wakes it up by a 
utrace_control(...,UTRACE_RESUME) call *later*.

This *later* is the problem.

If the module wakes the traced process too quickly, utrace has not yet put
it into a "stopped" state, therefore UTRACE_RESUME gets lost.
As a consequence, the execution is blocked.

IMHO, given the current utrace code, there is no way to set up some kind
of synchronization in the module to prevent this error.

---

For the sake of simplicity let us assume one engine attached to the
traced process (the problem is the same for more engines).

The point is: when a report function returns UTRACE_STOP and later calls
utrace_control(...,UTRACE_RESUME), the traced process must not stop.

t=0: Before the report function calling loop utrace->stopped=0;
 (In start_report: BUG_ON(utrace->stopped);)
t=1: REPORT FUNCTION CALL(no lock!):
t=2: When the report function returns UTRACE_STOP
 In finish_callback:
t=3: spin_lock(&utrace->lock);
 mark_engine_wants_stop(engine);
 spin_unlock(&utrace->lock);
t=4: in utrace_stop(..):
   spin_lock(&utrace->lock);
   utrace->stopped=1;
   __set_current_state(TASK_TRACED);
   spin_unlock(&utrace->lock);
   schedule(); --> now the traced process is blocked.

The module has "decided" UTRACE_STOP at t=1, then the module can call
utrace_control(...,UTRACE_RESUME) at any t>1.
If the resume call takes place before t=4, the request is lost and
the race condition causes the traced process to stop anyway.
In fact, for 1<t<4 utrace_control finds utrace->stopped still 0
 ...
and therefore it does nothing.
        /*
         * Let the thread resume running.  If it's not stopped now,
         * there is nothing more we need to do.
         */
        if (resume)
                utrace_reset(target, utrace, NULL);
        else
                spin_unlock(&utrace->lock);

-
There are two solutions:

1- (slow & dirty): some sort of synchronization: no utrace_control (or
  utrace_set_events) should take place during the whole sequence
  from the report function call to utrace->stopped=1.

2- (the nice one): add another flag named ENGINE_RESUME (like ENGINE_STOP).
  That flag must be cleared before calling the report function:
  t=0.5: clear_engine_wants_resume(engine);

  utrace_control(...,UTRACE_RESUME) should set the flag:
        spin_lock(&utrace->lock);
        mark_engine_wants_resume(engine);
        spin_unlock(&utrace->lock);
 
  utrace_stop at t=4 (inside the lock) must check whether the traced process
  has already been resumed.
        spin_lock(&utrace->lock);
        spin_lock_irq(&task->sighand->siglock);
        /* final check: is it really necessary to stop? */
        list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
                if ((engine->ops != &utrace_detached_ops) &&
                    engine_wants_stop(engine)) {
                        if (engine_wants_resume(engine))
                                clear_engine_wants_stop(engine);
                        else
                                utrace->stopped = 1;
                }
        }
        if (unlikely(!utrace->stopped)) {
                spin_unlock_irq(&task->sighand->siglock);
                spin_unlock(&utrace->lock);
                return false;
        }

  In this way the race condition should be eliminated.
  (it was eliminated in my proof-of-concept utrace patched implementation)
  If utrace_stop discovers that a resume request is already pending
  the traced process is not blocked.
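
The lost wakeup and the effect of the extra flag can be replayed in a
deterministic, single-threaded userspace model. This is a sketch of the
argument only, with hypothetical names and one engine; it follows the timeline
described in this message and does not reproduce the real utrace locking.

```c
#include <assert.h>
#include <stddef.h>

struct toy_engine { int wants_stop; int wants_resume; };
struct toy_utrace { int stopped; int use_resume_flag; struct toy_engine engine; };

/* t=0.5: state just before the report callback runs. */
static void toy_before_report(struct toy_utrace *u)
{
    u->stopped = 0;
    u->engine.wants_stop = 0;
    u->engine.wants_resume = 0;
}

/* t=3: the callback returned UTRACE_STOP. */
static void toy_finish_callback(struct toy_utrace *u)
{
    u->engine.wants_stop = 1;
}

/* The module's utrace_control(..., UTRACE_RESUME). */
static void toy_control_resume(struct toy_utrace *u)
{
    if (u->use_resume_flag)
        u->engine.wants_resume = 1;   /* remember the request */
    if (u->stopped) {                 /* only a stopped task is woken up */
        u->engine.wants_stop = 0;
        u->stopped = 0;
    }
}

/* t=4: utrace_stop; with the flag, a pending resume cancels the stop. */
static void toy_utrace_stop(struct toy_utrace *u)
{
    if (!u->engine.wants_stop)
        return;
    if (u->use_resume_flag && u->engine.wants_resume)
        u->engine.wants_stop = 0;
    else
        u->stopped = 1;
}

/*
 * Replay one report. resume_early != 0 places the resume request between
 * t=3 and t=4. Returns 1 if the traced task is left blocked at the end.
 */
int toy_run(int use_resume_flag, int resume_early)
{
    struct toy_utrace u = { 0, 0, { 0, 0 } };

    u.use_resume_flag = use_resume_flag;
    toy_before_report(&u);
    toy_finish_callback(&u);
    if (resume_early)
        toy_control_resume(&u);
    toy_utrace_stop(&u);
    if (!resume_early)
        toy_control_resume(&u);
    assert(u.stopped == 0 || u.stopped == 1);
    return u.stopped;
}
```

Only the combination "no flag, early resume" leaves the task blocked; with
the flag, the early resume is remembered and the stop is cancelled.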

-
Ptrace on utrace works because there is a workaround: 
the notification to the ptracer is called from within the utrace_stop
function *after utrace->stopped has been set*.
Ptrace would suffer from the same race condition otherwise.

I am looking forward to hearing some comments on this. From what I see,
Kmview cannot be implemented on the current utrace implementation.

renzo



Re: UTRACE_STOP race condition?

2009-02-11 Thread Renzo Davoli
On Wed, Feb 11, 2009 at 09:45:15AM -0500, Frank Ch. Eigler wrote:
> This may not answer your question, but I believe it is not proper to
> to make this call at any time t>1, only once you receive the quiesce
> callback.

Maybe I am wrong, but the quiesce callback gets called *before* the other
report_* callbacks (say syscall_entry).

So when I capture UTRACE_QUIESCE, I get the report call before t=1.

Some communication from utrace to the module should happen *after*
utrace->stopped is set to 1
(something similar to the code Roland added for ptrace).



Even if it worked this way (i.e. return STOP and wait for report_quiesce;
I think the race condition would be there in any case), the interface
to the module would be horrible.

When the module receives a report callback, it returns UTRACE_STOP and
then it needs some data structure to wait for a report_quiesce
before it can restart the traced process.

With the idea of the patch included in my previous mail there is no need for
such complexity.

Thank you for taking part in this discussion

renzo



[PATCH] UTRACE_STOP race condition?

2009-02-13 Thread Renzo Davoli
Dear Roland, dear utrace developers,

I now have a complete patch that seems to be quite stable.
At least Kmview has passed the tests without getting stuck randomly
because of the race condition.

All the other comments about utrace & virtualization (see my message of Feb 04)
are still pending:
1- Virtual Machines may need to change the system call
2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
report_syscall_entry?
3- Nesting, is it really useful to run all the reports in a row and
(eventually) stop at the end waiting for all the engines?
4- report_syscall_entry engines evaluation order should be reversed

ciao
renzo

--- linux-2.6.29-rc4-utrace/kernel/utrace.c.mcgrath 2009-02-13 18:28:25.0 +0100
+++ linux-2.6.29-rc4-utrace/kernel/utrace.c 2009-02-13 19:14:18.0 +0100
@@ -491,6 +491,13 @@
 #define DEAD_FLAGS_MASK	(UTRACE_EVENT(REAP))
 #define LIVE_FLAGS_MASK	(~0UL)
 
+static void mark_engine_wants_stop(struct utrace_attached_engine *engine);
+static void clear_engine_wants_stop(struct utrace_attached_engine *engine);
+static bool engine_wants_stop(struct utrace_attached_engine *engine);
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine);
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine);
+static bool engine_wants_resume(struct utrace_attached_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -500,6 +507,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_attached_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -521,6 +529,23 @@
 		return true;
 	}
 
+	/* final check: it is really needed to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	utrace->stopped = 1;
 	__set_current_state(TASK_TRACED);
 
@@ -784,6 +809,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_attached_engine *engine)
 {
@@ -800,6 +826,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+   engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+   engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_attached_engine *engine)
+{
+   return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target:thread to affect
@@ -1050,6 +1091,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1282,6 +1327,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;



[PATCH] #2 UTRACE_STOP race condition & nesting

2009-02-14 Thread Renzo Davoli
Dear Roland, dear utrace developers,
 
This is an updated patch. It solves the race condition and it gives a quick
(a bit dirty) solution to issues 3&4.
3- Nesting, is it really useful to run all the reports in a row and
(eventually) stop at the end waiting for all the engines?
The patch waits for each engine to resume before notifying the next registered
engine.
4- report_syscall_entry engines evaluation order should be reversed
REPORT macros have an extra "reverse" argument. The macros append this string
to the list_for_each_entry_safe function name. All the macro calls leave this
argument empty except the one in report_syscall_entry, where it is set to
_reverse.
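
The "reverse" argument relies on preprocessor token pasting. Here is a small user-space sketch of the same trick; the macro and function names are hypothetical and iterate over plain indexes rather than over the kernel's engine list.

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical iteration macros whose names differ only by a suffix. */
#define for_each_index(i, n)          for ((i) = 0; (i) < (n); (i)++)
#define for_each_index_reverse(i, n)  for ((i) = (n); (i)-- > 0; )

/*
 * The first argument is pasted onto the iterator name: pass nothing for
 * forward order, or _reverse for reverse order.
 */
#define VISIT(reverse, i, n, out, len)			\
	do {						\
		for_each_index ## reverse(i, n)		\
			(out)[(len)++] = (int)(i);	\
	} while (0)

/* Records the visiting order of n items into out; returns the count. */
static size_t visit_forward(size_t n, int *out)
{
	size_t i, len = 0;

	VISIT(, i, n, out, len);
	return len;
}

static size_t visit_reverse(size_t n, int *out)
{
	size_t i, len = 0;

	VISIT(_reverse, i, n, out, len);
	return len;
}
```

An empty macro argument is valid since C99, which is why every caller that wants the default order can simply write nothing before the first comma.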

With this patch it is possible to run nested kmview machines, and ptrace works
inside the virtual machines.

This patch is "a bit dirty" because variables and sections of code needed to
count and test the stopped engines are useless here: a task can be kept stopped
by at most one engine at a time.

This patch is a proof of concept to show what I meant in my previous message.

As for 1&2 (not included in this patch):
1- Virtual Machines may need to change the system call
This is just to simplify the implementation of architecture-independent virtual
machines.
I have kept the definition of the missing functions in the kmview module code.
2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
report_syscall_entry?
It is useless for kmview, as the decision to abort the system call is taken
while the process is stopped. I am currently setting the syscall number to -1
to skip the syscall.

For the sake of completeness, there is another way to implement the partial
virtual machine stuff: introducing another "quiescence" state inside the report
upcalls.
I mean: when utrace calls a report function (say, for example,
report_syscall_entry), the function in the module puts the process in a stopped
state (maybe it sets TASK_TRACED and calls schedule()).
From utrace's point of view the report function does not return until all the
changes in the task state have been completed and the decision
UTRACE_RESUME/UTRACE_SYSCALL_ABORT has been taken.
In this way UTRACE_STOP is never used, because the module has to implement
another feature similar to UTRACE_STOP on its own. So what is UTRACE_STOP for?

ciao
renzo


--- linux-2.6.29-rc4-utrace/kernel/utrace.c.mcgrath 2009-02-13 18:28:25.0 +0100
+++ linux-2.6.29-rc4-utrace/kernel/utrace.c 2009-02-14 09:17:31.0 +0100
@@ -491,6 +491,13 @@
 #define DEAD_FLAGS_MASK	(UTRACE_EVENT(REAP))
 #define LIVE_FLAGS_MASK	(~0UL)
 
+static void mark_engine_wants_stop(struct utrace_attached_engine *engine);
+static void clear_engine_wants_stop(struct utrace_attached_engine *engine);
+static bool engine_wants_stop(struct utrace_attached_engine *engine);
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine);
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine);
+static bool engine_wants_resume(struct utrace_attached_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -500,6 +507,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_attached_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -521,6 +529,23 @@
 		return true;
 	}
 
+	/* final check: is really needed to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	utrace->stopped = 1;
 	__set_current_state(TASK_TRACED);
 
@@ -784,6 +809,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_attached_engine *engine)
 {
@@ -800,6 +826,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+   engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+   engine->flags &= ~ENGINE_RESUME;

Re: [PATCH] UTRACE_STOP race condition?

2009-03-06 Thread Renzo Davoli
Dear Roland, dear utrace developers,

I have updated my patch #1 (it solves the race condition on utrace_stop but 
not the nesting issue) for the latest version of utrace.

renzo

On Fri, Feb 13, 2009 at 09:29:25PM +0100, Renzo Davoli wrote:
> I have now a complete patch that seems to be quite stable.
> At least Kmview have passed through the tests without getting stuck randomly 
> for the race condition.
> 
---
--- kernel/utrace.c.mcgrath 2009-03-05 15:09:57.0 +0100
+++ kernel/utrace.c 2009-03-06 11:20:48.0 +0100
@@ -369,6 +369,13 @@
 	return killed;
 }
 
+static void mark_engine_wants_stop(struct utrace_engine *engine);
+static void clear_engine_wants_stop(struct utrace_engine *engine);
+static bool engine_wants_stop(struct utrace_engine *engine);
+static void mark_engine_wants_resume(struct utrace_engine *engine);
+static void clear_engine_wants_resume(struct utrace_engine *engine);
+static bool engine_wants_resume(struct utrace_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -378,6 +385,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -399,7 +407,23 @@
 		return true;
 	}
 
-	utrace->stopped = 1;
+	/* final check: it is really needed to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	__set_current_state(TASK_TRACED);
 
 	/*
@@ -625,6 +649,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_engine *engine)
 {
@@ -641,6 +666,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_engine *engine)
+{
+   engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_engine *engine)
+{
+   engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_engine *engine)
+{
+   return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target:thread to affect
@@ -891,6 +931,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1110,6 +1154,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;



Re: [PATCH] #2 UTRACE_STOP race condition & nesting

2009-03-06 Thread Renzo Davoli
Dear Roland, dear utrace developers,

I have also updated the second patch (which includes the first).
This patch fixes the utrace_stop race condition and 
implements a consistent model of tracing engine nesting.

renzo
On Sat, Feb 14, 2009 at 10:11:55AM +0100, Renzo Davoli wrote:
>  
> This is an updated patch. It solves the race condition + it gives a quick
> (a bit dirty) solution to issues 3&4.
>   3- Nesting, is it really useful to run all the reports in a row and
>   (eventually) stop and the end waiting for all the engines?
> The patch waits for each engine to resume before notifying the next
> registered engine.
>   4- report_syscall_entry engines evaluation order should be reversed
> REPORT macros have an extra "reverse" argument. The macros append this string
> to the list_for_each_entry_safe function name. All the macro calls skip this
> argument except the one in report_syscall_entry where it is set to _reverse.
> 
> With this patch it is possible to run nested kmview machines and ptrace works
> inside the virtual machines.
> 
> This patch is "a bit dirty" because variables and sections of code needed to
> count and test the stopped engines are useless here: a task can be kept
> stopped for at most one engine at a time.
> 
> This patch is a proof-of concept to show what I meant in my previous message.
> 
> For what concerns 1&2 (not included in this patch):
>   1- Virtual Machines may need to change the system call
> This is just to simplify the implementation of arch. independent virtual
> machine.
> I have kept the definition of missing functions in the kmview module code.
>   2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
>   report_syscall_entry?
> It is useless for kmview as the decision of aborting the system call is taken
> while the process is stopped, I am currently setting the syscall number to -1
> to skip the syscall.
> 
> For the sake of completeness there is another way to implement the partial
> virtual machine stuff by introducing another "quiescence" state inside the
> report upcalls.
> I mean: when utrace calls a report function (say for example
> report_syscall_entry), the function in the module puts the process in a
> stopped state (maybe its TASK_TRACED and calls the schedule).
> From utrace's point of view the report function does not return until all
> the changes in the task state have been completed and the decision
> UTRACE_RESUME/UTRACE_SYSCALL_ABORT has been taken.
> In this way UTRACE_STOP is never used because the module has to implement
> another feature similar to UTRACE_STOP on its own. So what is UTRACE_STOP for?
> 
> ciao
>   renzo

---
--- kernel/utrace.c.mcgrath 2009-03-05 15:09:57.0 +0100
+++ kernel/utrace.c 2009-03-06 11:49:15.0 +0100
@@ -369,6 +369,13 @@
 	return killed;
 }
 
+static void mark_engine_wants_stop(struct utrace_engine *engine);
+static void clear_engine_wants_stop(struct utrace_engine *engine);
+static bool engine_wants_stop(struct utrace_engine *engine);
+static void mark_engine_wants_resume(struct utrace_engine *engine);
+static void clear_engine_wants_resume(struct utrace_engine *engine);
+static bool engine_wants_resume(struct utrace_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -378,6 +385,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -399,7 +407,23 @@
 		return true;
 	}
 
-	utrace->stopped = 1;
+	/* final check: is really needed to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	__set_current_state(TASK_TRACED);
 
 	/*
@@ -625,6 +649,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP(1UL << _UTRACE_NEVENTS

[PATCH 1/2] UTRACE_STOP race condition (updated)

2009-03-12 Thread Renzo Davoli
Dear Roland, dear utrace developers,

I have updated my patch #1 (it solves the race condition on utrace_stop but
not the nesting issue) for the latest version of utrace.

I am trying to keep the patches updated by downloading, compiling and testing
the fixes every week or so...
Things would be easier if these patches could be merged into the mainstream ;-)

renzo

diff -Naur linux-2.6.29-rc7-git5-utrace/kernel/utrace.c linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c
--- linux-2.6.29-rc7-git5-utrace/kernel/utrace.c 2009-03-12 11:00:09.0 +0100
+++ linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c 2009-03-12 11:05:50.0 +0100
@@ -376,6 +376,13 @@
 	return killed;
 }
 
+static void mark_engine_wants_stop(struct utrace_engine *engine);
+static void clear_engine_wants_stop(struct utrace_engine *engine);
+static bool engine_wants_stop(struct utrace_engine *engine);
+static void mark_engine_wants_resume(struct utrace_engine *engine);
+static void clear_engine_wants_resume(struct utrace_engine *engine);
+static bool engine_wants_resume(struct utrace_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -385,6 +392,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -406,7 +414,23 @@
 		return true;
 	}
 
-	utrace->stopped = 1;
+	/* final check: it is really needed to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	__set_current_state(TASK_TRACED);
 
 	/*
@@ -632,6 +656,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_engine *engine)
 {
@@ -648,6 +673,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_engine *engine)
+{
+   engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_engine *engine)
+{
+   engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_engine *engine)
+{
+   return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target:thread to affect
@@ -906,6 +946,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1133,6 +1177,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;



[PATCH 2/2] UTRACE_STOP: nesting engine management (updated)

2009-03-12 Thread Renzo Davoli
Dear Roland, dear utrace developers,

I have also updated the second patch. Please note that now this patch
must be applied after the first one.
This patch implements a consistent nesting model for utrace machines.
(There is a full description in the messages I sent on Feb. 14 and Mar. 6)

renzo
---
diff -Naur linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c linux-2.6.29-rc7-git5-utrace-p2/kernel/utrace.c
--- linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c 2009-03-12 11:05:50.0 +0100
+++ linux-2.6.29-rc7-git5-utrace-p2/kernel/utrace.c 2009-03-12 13:37:27.0 +0100
@@ -1405,6 +1405,7 @@
 static bool finish_callback(struct utrace *utrace,
 			    struct utrace_report *report,
 			    struct utrace_engine *engine,
+			    struct task_struct *task,
 			    u32 ret)
 {
 	enum utrace_resume_action action = utrace_resume_action(ret);
@@ -1426,6 +1427,7 @@
 			spin_lock(&utrace->lock);
 			mark_engine_wants_stop(engine);
 			spin_unlock(&utrace->lock);
+			utrace_stop(task, utrace);
 		}
 	} else if (engine_wants_stop(engine)) {
 		spin_lock(&utrace->lock);
@@ -1492,7 +1494,7 @@
 	ops = engine->ops;
 
 	if (want & UTRACE_EVENT(QUIESCE)) {
-		if (finish_callback(utrace, report, engine,
+		if (finish_callback(utrace, report, engine, task,
 				    (*ops->report_quiesce)(report->action,
 							   engine, task,
 							   event)))
@@ -1526,24 +1528,24 @@
  * @callback is the name of the member in the ops vector, and remaining
  * args are the extras it takes after the standard three args.
  */
-#define REPORT(task, utrace, report, event, callback, ...)	\
+#define REPORT(reverse, task, utrace, report, event, callback, ...)	\
 	do {	\
 		start_report(utrace);	\
-		REPORT_CALLBACKS(task, utrace, report, event, callback,	\
+		REPORT_CALLBACKS(reverse, task, utrace, report, event, callback,	\
 				 (report)->action, engine, current,	\
 				 ## __VA_ARGS__);	\
 		finish_report(report, task, utrace);	\
 	} while (0)
-#define REPORT_CALLBACKS(task, utrace, report, event, callback, ...)	\
+#define REPORT_CALLBACKS(reverse, task, utrace, report, event, callback, ...)	\
 	do {	\
 		struct utrace_engine *engine;	\
 		const struct utrace_engine_ops *ops;	\
-		list_for_each_entry(engine, &utrace->attached, entry) {	\
+		list_for_each_entry ## reverse(engine, &utrace->attached, entry) {	\
 			ops = start_callback(utrace, report, engine, task,	\
 					     event);	\
 			if (!ops)	\
 				continue;	\
-			finish_callback(utrace, report, engine,	\
+			finish_callback(utrace, report, engine, task,	\
 					(*ops->callback)(__VA_ARGS__));	\
 		}	\
 	} while (0)
@@ -1558,7 +1560,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXEC),
+	REPORT(, task, utrace, &report, UTRACE_EVENT(EXEC),
 	       report_exec, fmt, bprm, regs);
 }
 
@@ -1573,7 +1575,7 @@
 	INIT_REPORT(report);
 
 	start_report(utrace);
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
+	REPORT_CALLBACKS(_reverse, task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
 			 report_syscall_entry, report.result | report.action,
 			 engine, current, regs);
 	finish_report(&report, task, utrace);
@@ -1615,7 +1617,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
+	REPORT(, task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
 	       report_syscall_exit, regs);
 }
 
@@ -1640,7 +1642,7 @@
start_re

Re: [PATCH 2/2] UTRACE_STOP: nesting engine management (updated)

2009-03-12 Thread Renzo Davoli
> Again, we need Roland's opinion, but could you explain why it would
> be better to use _reverse in utrace_report_syscall_entry() ?

I refer to this posting:
http://www.mail-archive.com/utrace-devel@redhat.com/msg00579.html

Item #4 explains why it is *needed* to reverse the order in
utrace_report_syscall_entry to have a consistent implementation of nested
virtualization.

> I don't think this is safe. If we do utrace_stop() here, the next engine
> can be detached before we return (UTRACE_DETACH assumes it is safe to
> unlink the engine when the target is stopped). This means we can't
> continue list_for_each_entry(engine, &utrace->attached, entry) after
> return from finish_callback().

Maybe this is not the best patch, maybe we can solve the problem in a
better way.
The point is explained in #3 in the same posting cited above.

When a report function of an engine returns UTRACE_STOP, it means (or may mean)
that the engine wants to change the state of the process before resuming it.
VM monitors often change the state; sometimes debugger users want to set
some variables too.

IMHO, utrace should stop the process *before* calling the report function of
the next engine; otherwise we need to set up another structure to synchronize
the engines (which may even be unknown to one another).
If there is a tracer/debugger among the engines, it is not even possible to know
which snapshot it gets: the one before or the one after the modification made
by the VM monitor?
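
The snapshot problem can be shown with a toy model. This is a sketch only: the two engines, the struct and the function are hypothetical, and the VM monitor's change is reduced to rewriting one argument while the task is stopped.

```c
#include <assert.h>
#include <stdbool.h>

struct task_state { long syscall_arg; };

/*
 * Runs two hypothetical engines on one event.  Engine A (a VM monitor) asks
 * for a stop and rewrites the argument to new_arg while the task is stopped;
 * engine B (a debugger) only records what it sees in its report callback.
 * Returns the value observed by engine B.
 */
static long observed_by_b(struct task_state *t, long new_arg,
			  bool stop_after_each_engine)
{
	long seen;

	/* Engine A's report callback runs and requests a stop. */
	if (stop_after_each_engine)
		t->syscall_arg = new_arg;	/* A's stop is served right away */

	/* Engine B's report callback takes its snapshot. */
	seen = t->syscall_arg;

	if (!stop_after_each_engine)
		t->syscall_arg = new_arg;	/* A's stop is served only now */

	return seen;
}
```

When all reports run in a row, B sees the state from before A's change; when the stop is served between the two reports, B sees the state A produced.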

With these patches it is possible to run nested virtual machines based
on utrace; it is also possible to strace (use ptrace) processes running
inside a VM.

renzo



Re: [PATCH 2/3] utrace core

2009-03-21 Thread Renzo Davoli
Tracing does not mean only debugging. Some tracing facilities can be used for
virtualization.
For example User-Mode Linux is based on ptrace.

I have a prototype of a kernel module for virtualization (kmview) based on
utrace.
Using kmview (module+VMM) it is possible for a user (not root) to mount a
filesystem just for a process (or a hierarchy of processes), or it is possible
for some processes to use different networking stacks or virtual devices. It is
something like user-mode containers.
kmview provides the same features as umview, which is based on ptrace, but much
faster.
(umview is in Debian lenny, squeeze and sid if you want to test it)

*Utrace is really what I wanted* to support kmview (apart from
some minor issues about the support of nested virtualization).
Other virtualizations now based on ptrace could move part of their
implementation to kernel level through utrace, and several speedups become
possible.
For example kmview is a partial virtual machine monitor: some system calls are
forwarded to the kernel, some others are virtualized.
When a user mounts a filesystem, all the system calls which use pathnames
inside the mountpoint subtree get virtualized, while the others are forwarded
to the kernel.
With utrace the kmview kernel module handles many system calls at kernel level.
I mean, if an "open" system call was sent to the kernel because the path is
outside the virtualized part of the file system, all the system calls on the
same file descriptor can be forwarded to the kernel without any request to the
VMM at user level.
This is just one example of speedup; several others are possible.
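
A minimal user-space sketch of that file descriptor shortcut follows. It is not kmview code: the structure, the table size and the function names are assumptions made for illustration, and the mountpoint is assumed to be written without a trailing slash.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FD 64	/* arbitrary table size for this sketch */

struct vm_view {
	const char *mountpoint;		/* virtualized subtree, e.g. "/mnt/virt" */
	bool fd_virtual[MAX_FD];	/* true if the fd belongs to the VMM */
};

/* True if path is the mountpoint itself or lies below it. */
static bool path_is_virtual(const struct vm_view *v, const char *path)
{
	size_t len = strlen(v->mountpoint);

	return strncmp(path, v->mountpoint, len) == 0 &&
	       (path[len] == '/' || path[len] == '\0');
}

/* Records, at open time, who owns the new descriptor. */
static void note_open(struct vm_view *v, int fd, const char *path)
{
	if (fd >= 0 && fd < MAX_FD)
		v->fd_virtual[fd] = path_is_virtual(v, path);
}

/*
 * Decision for a later fd-based call (read, write, ...): true means the
 * user-level VMM must be asked, false means the call goes to the kernel.
 */
static bool fd_needs_vmm(const struct vm_view *v, int fd)
{
	return fd >= 0 && fd < MAX_FD && v->fd_virtual[fd];
}
```

The point of the table is that the decision taken once at open time is reused for every later call on the same descriptor, so no round trip to user level is needed for files outside the virtualized subtree.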

Other virtualizations like User-Mode Linux or fakeroot-ng could use utrace to
speed up their virtualization, too.

As far as I have seen, systemtap is a wonderful tool for debugging, especially
for kernel debugging, but it has not been designed for virtualization.
Ptrace provides a standard set of features, and all the implementations of a
VMM must be in userland. Utrace provides the flexibility to split a VMM and
move part of it to a kernel module.

Utrace provides a unified interface to kernel modules for
tracing/virtualization.
kmview can be implemented either as a client of utrace or by spreading code
around the kernel, and like kmview other virtualizations based on ptrace may
need to move some of their logic to the kernel to speed up their execution.
These VMMs will use utrace-based modules instead of kernel patches.

renzo

On Sat, Mar 21, 2009 at 01:49:09AM -0700, Andrew Morton wrote:
> I'd be interested in seeing a bit of discussion regarding the overall value
> of utrace - it has been quite a while since it floated past.
> 
> I assume that redoing ptrace to be a client of utrace _will_ happen, and
> that this is merely a cleanup exercise with no new user-visible features?
> 
> The "prototype utrace-ftrace interface" seems to be more a cool toy rather
> than a serious new kernel feature (yes?)
> 
> If so, what are the new killer utrace clients which would justify all these
> changes?
> 
> Also, is it still the case that RH are shipping utrace?  If so, for what
> reasons and what benefits are users seeing from it?
> 
> And I recall that there were real problems wiring up the Feb 2007 version
> of utrace to the ARM architecture.  Have those issues been resolved?  Are
> any problems expected for any architectures?



Re: [PATCH 2/3] utrace core

2009-03-21 Thread Renzo Davoli
On Sat, Mar 21, 2009 at 03:34:57PM +0100, Ingo Molnar wrote:
> 
> * Renzo Davoli  wrote:
> 
> > Tracing does not mean only debug. Some tracing facilities can be 
> > used for virtualization. For example User-Mode Linux is based on 
> > ptrace.
> > 
> > I have a prototype of kernel module for virtualization (kmview) 
> > based on utrace. [...]
> 
> Hm, i cannot find the source code. Can it be downloaded from 
> somewhere?
Sure! kmview is not included in our Debian packages yet as it relies on 
(still) non mainstream features (utrace), but the code is available on 
our view-os svn repository.

Check out:
svn co https://view-os.svn.sourceforge.net/svnroot/view-os view-os 

More specifically to browse the code/specifications:
The kmview device protocol is here:
http://wiki.virtualsquare.org/index.php/KMview_module_interface_specifications
The kernel module itself is here:
http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/kmview-kernel-module/
The VMM userland application shares most of its code with
umview, the source code for both is here:
http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/xmview-os/xmview/

kmview kernel module (current version) needs the following patches:
utrace
http://www.mail-archive.com/utrace-devel@redhat.com/msg00654.html
http://www.mail-archive.com/utrace-devel@redhat.com/msg00655.html
I am trying to keep everything up to date, but everything is
evolving quite fast.

Everything has been released under GPLv2.

renzo



utrace-kmview contract

2009-03-23 Thread Renzo Davoli
Dear Roland,

You are right when you say that the interface specification is a contract
between utrace and the module writers.
My goal is to use utrace for my virtual machines, your goal is to
design utrace as a support for a wide range of applications.
I hope your "wide range of applications" will include kmview.

As I see it, utrace's support of multiple engines needs further
investigation.
I do not need my patches to enter the utrace code, provided there is another
fast/clean/easy-to-code way to reach the same results.
It is not for kmview alone: I think this is an example for a range
of virtualization applications based on utrace.
When utrace is used for debugging, "the faster, the better" invariant holds,
but when you are dealing with virtualization the rule changes to
"the slower, the useless!".
Debugging is a temporary state of an application, while virtualization must be
designed to be used as a standard environment.

Sometimes a picture is worth a thousand words.
http://www.cs.unibo.it/~renzo/4roland20090323.pdf
I have drawn some examples. This is actually a simplified view
just to show the problems.
The module unreal is a test module for kmview that virtualizes the /unreal
subtree as a "copy" of the file system ("/unreal/x/y/z" is the
file "/x/y/z").
I know that such a simple transformation could have been implemented directly
inside the report_syscall function, but kmview is a general support
for virtualization. unreal is just a simple test for it.
kmview is composed of a kernel module and the "agent" in user space.

In the first slide a user runs kmview and inside the vm he/she loads the
unreal module and runs a cat command. When cat tries to open
"/unreal/etc/passwd", unreal rewrites the path to /etc/passwd, and the kernel
runs the "open" system call with the modified arguments.
The report_syscall_entry routine must send the path to kmview in userland
and wait for the answer.
The numbers on the arrows show the sequence of actions.

The second slide shows a tracing/debugging tool used with virtualization.
This is an example of multiple engines working on the same process.
strace must read its data before the virtualization for report_syscall_entry.
On the contrary the return value shown by strace must be the one returned
by the kmview virtualization engine, thus the order for report_syscall_entry
is the reverse of that used by report_syscall_exit.
Note that if instead of "strace cat /unreal/etc/passwd" our user wrote
"strace -f -o /tmp/xxx kmview bash" as the first command, the order of the
engines would have been inverted. In that case strace should show the system
calls as they appear "outside the virtualization", as one may expect
from the command.
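
The ordering rule can be sketched as a small user-space model: engines are kept in attach order, syscall entry walks the list backwards and syscall exit walks it forwards. The function callback_order and its parameters are hypothetical names for this sketch; the real traversal lives inside utrace.

```c
#include <stddef.h>

/*
 * Fill `out` with the order in which engines get their callback.
 * `attached` lists the engines in attach order (index 0 attached first).
 * On syscall entry the list is walked in reverse, so the engine attached
 * last (e.g. strace started inside the vm) sees the call before kmview
 * modifies it; on syscall exit the list is walked in attach order.
 * (Hypothetical model, not utrace code.)
 */
size_t callback_order(const char *const attached[], size_t count,
                      int is_entry, const char *out[])
{
    for (size_t i = 0; i < count; i++)
        out[i] = is_entry ? attached[count - 1 - i] : attached[i];
    return count;
}
```

For the attach order {kmview, strace} this gives strace then kmview on entry, and kmview then strace on exit, which matches the second slide.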

The third slide shows a nested virtualization and the fourth a debug tool
running inside a nested virtualization.
In all these examples I'd use UTRACE_STOP.

Now let us discuss the details of the contract ;-)

I set up two different implementations of the kmview kernel module.
In the standard one (#undef KMVIEW_NEWSTOP) the report_syscall
function returns UTRACE_STOP, waiting for the answer from the kmview application.
The new one (#define KMVIEW_NEWSTOP) uses a semaphore to stop the execution
inside the report_syscall function, which always returns UTRACE_RESUME.


If you decide that the right implementation is the former
(#undef KMVIEW_NEWSTOP):
- please tell me how to implement the example on page 3 if the task, while
handling syscall_entry for kmview2, does not stop before kmview1 is called.
Okay, you say kmview1's module learns that another engine
wants to stop by reading its @action argument, but it needs the state as
modified by kmview2.
- I could set up some kind of synchronization among kmview machines, but the
solution would be extremely weak. What if kmview runs nested with another
virtualization/tracing application based on utrace, e.g. strace?
- You say "use UTRACE_REPORT" to wait until the other machines are done
fiddling with it.
The comment you wrote about UTRACE_REPORT says:
* This is like %UTRACE_RESUME, but also ensures that there will be
* a @report_quiesce or @report_signal callback made soon.  If
* @target had been stopped, then there will be a callback before it
* resumes running normally.  If another engine is keeping @target
* stopped, then there might be no callbacks until all engines let
* it resume.
But if kmview1 and kmview2 have both stopped in report_syscall, no callback will
be made until both have finished.
Alternatively, you may mean that kmview1 returns UTRACE_RESUME and, when
kmview1's report_quiesce gets called, it returns UTRACE_STOP. In this way
the management of the system call would have to move from
report_syscall_entry to report_quiesce, but just for kmview1.
Which one is the cleaner way to implement a service on utrace, in your opinion?
In my opinion the possibility to have the process blocked before
calling the next report function leads to s

Re: resuming after stop at syscall_entry

2009-04-25 Thread Renzo Davoli
> Enter Renzo Davoli.  

Here I am!

I have spent my time testing the latest version and trying to figure out
how to implement "nested Renzo's engines" with the support you propose.

Comments on the latest version of utrace:
-
1- syscall_entry report reversed.
Wonderful, thank you. Now kmview.ko runs on vanilla utrace provided
KMVIEW_NEWSTOP is defined.
KMVIEW_NEWSTOP stops the process inside the syscall report function,
so it is an undesirable workaround, not a solution.
Anyway this can be used as a proof-of-concept: the problem related to
the order of callbacks for syscall_entry is solved.
-
2- utrace_control(.., UTRACE_RESUME) can arrive too early, before
ENGINE_STOP is set (in engine->flags by mark_engine_wants_stop).

Let us name p the traced process and vm the tracer.
t=10: p reports a system call.
      During the report function, p communicates with vm.
      The report function returns UTRACE_STOP.
      utrace is unlocked during the report function.
t=20: p records its need to stop:
      (lock) engine->flags |= ENGINE_STOP; (unlock)

Later (at time t' > 10) vm calls utrace_control(p, engine, UTRACE_RESUME):
if t' < 20 the request gets lost!
In fact:
t=15: utrace_control gets the lock;
      resume = utrace->stopped IS ZERO!
      clear_engine_wants_stopped clears ENGINE_STOP, which has not been
      set yet.
t=20: ENGINE_STOP is set and the task is blocked.

There are two "clean" "non-baroque" approaches to solve this problem:
2A- interface approach: 
A long time ago utrace had a utrace_set_flags call to set the ENGINE_STOP flag
before p communicates with vm. In this way ENGINE_STOP will always
be cleared after it has been set.
2B- implementation approach:
Use two bits: ENGINE_STOP and ENGINE_RESUME.
Before t=10 ENGINE_STOP and ENGINE_RESUME are unset.
utrace_control(p, engine, UTRACE_RESUME) must set ENGINE_RESUME and clear
ENGINE_STOP.
At t=20 p can check whether there has been a fast resume request. In this case
ENGINE_STOP is not set.
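
Approach 2B can be modelled in user space as follows. This is only a sketch of the idea: the struct, the bit values and the helper begin_report (which stands for "before t=10 both bits are unset") are assumptions, and the locking done by utrace is left out.

```c
#include <stdbool.h>

/* Flag bits named after the text; the numeric values are assumptions. */
#define ENGINE_STOP   (1u << 0)
#define ENGINE_RESUME (1u << 1)

/* Minimal stand-in for the per-engine state (not the real struct). */
struct engine_model {
    unsigned flags;
};

/* Before t=10: a new report starts with both bits unset. */
void begin_report(struct engine_model *e)
{
    e->flags &= ~(ENGINE_STOP | ENGINE_RESUME);
}

/* Tracer side: what utrace_control(..., UTRACE_RESUME) does under 2B. */
void control_resume(struct engine_model *e)
{
    e->flags |= ENGINE_RESUME;   /* remembered even if it arrives early */
    e->flags &= ~ENGINE_STOP;
}

/*
 * Tracee side at t=20, after the report function returned UTRACE_STOP.
 * Returns true when the task really has to block.
 */
bool record_stop(struct engine_model *e)
{
    if (e->flags & ENGINE_RESUME)
        return false;            /* a fast resume was already requested */
    e->flags |= ENGINE_STOP;
    return true;
}
```

If control_resume runs at t=15, record_stop at t=20 returns false and the task keeps running; if it runs after t=20, it clears ENGINE_STOP as before.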

It is possible to create other workarounds, barriers, fake reports,
busy wait loops... If we want something effective, we must implement
solutions, not workarounds. If an engine says UTRACE_STOP and later
UTRACE_RESUME, the task must be resumed. The simpler, the better.

My patch in:
http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/kmview-kernel-module/kernel_patches/linux-2.6.29-patch1?revision=637&view=markup
implements 2B and works with the latest utrace implementation.
--
Comments on the proposal.

Roland, let me say frankly that the repeated report scan for system call
is just a step towards a solution, but I do not like it so much.

Problem #1: when each engine receives the same syscall_entry report several
times, each engine must discover if:
- a previous engine has already stopped this task
  ( utrace_resume_action(action) == UTRACE_STOP)
- this is a repeated scan and the current engine has already processed this
  report (there is a risk of processing it twice).
- this is a real new report

Maybe I can keep the address of the engine which stopped
the task somewhere (say in a task-private variable stopengine).
During the repeated scan:
- if stopengine is NULL, this is a fresh call.
- else (stopengine != NULL) the current engine has
already processed this report;
- if stopengine == this engine, then set stopengine to NULL.
A more portable approach follows (*) :
Each engine records whether it stopped the task.
During the repeated scan:
- if ! (action & UTRACE_SYSCALL_RESUMED) this is a fresh call;
- else the current engine has already processed this report;
- if this engine stopped the task, then clear
UTRACE_SYSCALL_RESUMED in the action returned.
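
The second ("more portable") protocol can be sketched as a user-space model. SYSCALL_RESUMED is a stand-in bit for UTRACE_SYSCALL_RESUMED with an arbitrary value, and struct engine_state and classify_report are hypothetical names used only here.

```c
#include <stdbool.h>

/* Stand-in for UTRACE_SYSCALL_RESUMED; the value is an assumption. */
#define SYSCALL_RESUMED 0x100u

enum scan_kind { SCAN_FRESH, SCAN_REPEATED };

/* Per-engine private record: did this engine stop the task? */
struct engine_state {
    bool stopped_task;
};

/*
 * Decide how an engine should treat a syscall_entry report.
 * `action` is the value passed along the engine chain; the engine that
 * stopped the task clears the resumed bit so that the engines called
 * after it see a fresh report.
 */
enum scan_kind classify_report(struct engine_state *st, unsigned *action)
{
    if (!(*action & SYSCALL_RESUMED))
        return SCAN_FRESH;

    if (st->stopped_task) {
        st->stopped_task = false;
        *action &= ~SYSCALL_RESUMED;
    }
    return SCAN_REPEATED;
}
```

Every engine in the chain has to run exactly this logic, which is the interoperability burden discussed next.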

This is not a nice solution: this "protocol" must be consistently applied
by all the modules using utrace otherwise they cannot interoperate.
If a report_syscall_entry does not behave in the same way it may receive
repeated reports or force other engines to skip some reports.

All the programmers of utrace modules would always have to agree on these
details: this is not a good interface for long-term interoperability.

Problem #2: syscall exit may need to modify the return value/errno.
The need for stop&go at each engine applies not only to syscall_entry.


I really do not understand why it is so unacceptable to have a UTRACE_STOP_NOW
tag to stop a process *before* reporting to the next engine.
The interface would be clean, and interoperability between tracing and
virtualizing would be guaranteed.

It is not a matter of performance. If your engine needs to see the
system call that is going to be done by the kernel, as you say:
    if (utrace_resume_action(action) == UTRACE_STOP)
        return UTRACE_REPORT;
it has to wait all the virtualize

Bug: report_reap is never called

2009-09-05 Thread Renzo Davoli
Hi Roland & utrace developers

I am back...

I am upgrading my kmview support and I have run into a clear bug.

in utrace_reap:

--
list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
        ops = engine->ops;
        engine->ops = NULL;
        engine->flags = 0;
        list_move(&engine->entry, &detached);

        /*
         * If it didn't need a callback, we don't need to drop
         * the lock.  Now nothing else refers to this engine.
         */
        if (!(engine->flags & UTRACE_EVENT(REAP)))
                continue;


The code following this 'if' is never executed (i.e. the reap callback is never
called).
In fact it is impossible for (engine->flags & UTRACE_EVENT(REAP)) to be
true, given that a few statements above engine->flags has been set to 0!

To fix the bug, either
clear all the events but reap:
    engine->flags &= UTRACE_EVENT(REAP);
or save the flags in a temporary variable before clearing them, as you do for
engine->ops.
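
Both fixes can be compared with the buggy pattern in a small user-space model. EVENT_REAP stands in for UTRACE_EVENT(REAP); the struct and the function names are invented for this sketch, and the callback is modelled as a simple counter.

```c
/* Stand-ins for UTRACE_EVENT(...) bits; the values are assumptions. */
#define EVENT_REAP (1u << 0)
#define EVENT_EXEC (1u << 1)

struct engine_model {
    unsigned flags;
    int reap_calls;      /* counts how often the reap callback "ran" */
};

/* Pattern as quoted: flags are zeroed first, so the test always fails. */
void reap_buggy(struct engine_model *e)
{
    e->flags = 0;
    if (!(e->flags & EVENT_REAP))
        return;
    e->reap_calls++;
}

/* First fix: clear every event bit except reap. */
void reap_keep_bit(struct engine_model *e)
{
    e->flags &= EVENT_REAP;
    if (!(e->flags & EVENT_REAP))
        return;
    e->reap_calls++;
}

/* Second fix: save the flags before clearing, as done for engine->ops. */
void reap_saved_flags(struct engine_model *e)
{
    unsigned flags = e->flags;

    e->flags = 0;
    if (!(flags & EVENT_REAP))
        return;
    e->reap_calls++;
}
```

For an engine that asked for the reap event, only the two fixed variants increment reap_calls.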

ciao
renzo



Re: linux-next: add utrace tree

2010-01-25 Thread Renzo Davoli
Let me add my two euro-cents to this discussion.

Mark Wielaard :
> Unfortunately ptrace does all that magic already (badly). People don't
> just use it for (s)tracing syscalls, but also for tracing signals, for
>  single step debugging and poking at memory, register state, for process
> jailing and virtualization (uml) through syscall emulation.
> So when they are talking about these fancy things that is because that
> is what ptrace gives them currently. And they hate it, because the
> ptrace interface is such a pain to work with. And all these things don't
> really work together. You cannot trace, emulate, debug, jail at the same
> time.
I support Mark's words. I don't use ptrace for debugging/tracing and I
have experienced severe limitations of the ptrace interface.
(I have tried to post some extensions for ptrace to overcome some
constraints; see my posts on ptrace_vm or ptrace_multi on LKML).

Oleg Nesterov, writing to Andrew Morton said:
> First of all, utrace makes other things possible.  gdbstub,
> nondestructive core dump, uprobes, kmview, hopefully more.  I didn't
> look at these projects closely, perhaps other people can tell more.  As
> for their merge status, until utrace itself is merged it is very hard to
> develop them out of tree.

In the list above there is also kmview, which is a creation of mine.
umview and kmview are partial virtual machines: processes running
in a [uk]mview machine can have their own view of the file system,
networking support, user-id, system-name, etc.
A [uk]mview machine virtualizes just what the user needs: the filesystem
or just a subtree/some subtrees, or networking, or one/some
virtual devices, etc. The "view" provided by a [uk]mview machine can be
a composition of real resources (provided by the Linux kernel) and
virtual resources.

Each system call request gets hijacked to a module of [uk]mview when
it refers to a virtual resource. The request is forwarded to the kernel
otherwise.

umview is based on ptrace, kmview uses a kernel module based on utrace.
(umview is included in debian lenny (to sid); tutorials and manuals are on
wiki.virtualsquare.org)

IMHO utrace is better than ptrace (or an optimized version of it):
1 - "Frank Ch. Eigler" wrote: 
> At least one reason is that ptrace is single-usage-only, so for
> example you cannot concurrently debug & strace the same program.
  - Exactly. utrace allows multiple tracing engines; this means that kmview
  machines can be nested (in a natural way, no extra code is needed for
  this feature). In the same way strace/gdb can run on virtualized processes,
  too.
2 - The kmview kernel module implements several optimizations
  to minimize the number of requests forwarded to the kmview process
  (the virtual machine monitor). kmview is just a module using the
  utrace interface; earlier attempts at an optimized umview required kernel patches.
  Like kmview, any other service requiring process tracing can include
  specific optimizations in its own kernel module.
  In other words, all these services could use the standardized utrace
  interface for their optimizations, instead of asking for messy patches
  that change code all around the kernel source.
3 - ptrace takes SIGSTOP/SIGCONT for its own management. Strace/gdb and
  umview cannot be transparent for programs using these signals.

Oleg Nesterov talking about Ptrace said:
> Of course they can't use other interfaces, we don't have them. And
> without the new abstraction layer we will never have, I think.
I agree.

The following list includes the execution times I got in a recent test
(make vde-2, see http://www.cs.unibo.it/~renzo/view-os-lk2009.pdf)
plain kernel 22.7s,
kmview (no modules) 23.9s (+5.5%),
full kmview (modules loaded, all syscall virtualized) 38.5s (+70%)
optimized umview 51.0s (+124%),
umview on vanilla kernel 75.7s (+233%).
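
The percentages are overheads relative to the plain kernel run. A minimal sketch of that computation follows; the function name is an assumption, and since the listed times are rounded, recomputed values can differ from the quoted ones by a fraction of a percent.

```c
/*
 * Overhead of a measured run relative to a baseline, in percent.
 * Both arguments are wall-clock times in seconds.
 * (Hypothetical helper, not part of kmview.)
 */
double overhead_percent(double baseline_s, double measured_s)
{
    return (measured_s - baseline_s) / baseline_s * 100.0;
}
```

For example, overhead_percent(22.7, 38.5) is the overhead of full kmview over the plain kernel.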

utrace can be used to speed up virtualization (at least in my case
it worked in this way).
Performance can be useful for debugging, but it is a main issue for
virtualization.
The kmview module provides optimizations to select the system call requests
depending on the syscall number, the pathnames or the file descriptors.
http://wiki.virtualsquare.org/index.php/KMview_module_interface_specifications
Trying to add all the optimizations needed by different projects to ptrace is a
never-ending nightmare: the LKML will continue to receive patch proposals
for ptrace... 
The solution is that everybody can code his/her optimized kernel/user 
interface for tracing in his/her kernel module, i.e. utrace.
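
To give an idea of what selecting requests by syscall number can look like, here is a user-space sketch of a bitmap filter. It is not the kmview interface (see the wiki page above for that): MAX_SYSCALLS, the struct and the function names are all assumptions.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SYSCALLS  512   /* assumed upper bound, not taken from kmview */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per syscall number: set means "forward to the monitor". */
struct syscall_filter {
    unsigned long bits[(MAX_SYSCALLS + BITS_PER_WORD - 1) / BITS_PER_WORD];
};

/* Start with no syscall selected. */
void filter_clear(struct syscall_filter *f)
{
    memset(f->bits, 0, sizeof f->bits);
}

/* Select one syscall number; out-of-range numbers are ignored. */
void filter_add(struct syscall_filter *f, unsigned nr)
{
    if (nr < MAX_SYSCALLS)
        f->bits[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* True when this syscall number must be sent to the user-space monitor. */
bool filter_wants(const struct syscall_filter *f, unsigned nr)
{
    if (nr >= MAX_SYSCALLS)
        return false;
    return ((f->bits[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL) != 0;
}
```

A check like filter_wants in the kernel module lets unselected system calls proceed without any round trip to the monitor, which is where the speedup comes from.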

renzo



Re: Tracing with utrace, some questions

2010-10-11 Thread Renzo Davoli
On Mon, Oct 11, 2010 at 10:19:40AM +0300, Ali Polatel wrote:
> > Renzo Davoli's umview/kmview is just such an animal.
> > See http://wiki.virtualsquare.org for details.
> Looks like really nice example for me! I'm reading it now :)
And I am here listening on this ML if you need further info on it.

ciao
renzo



Call for utrace survival

2011-06-09 Thread Renzo Davoli
My project kmview is based on utrace.
Utrace is a wonderful tool to support partial virtualization. I have found no
other tool that lets kernel modules trace user processes.
-systemtap, dtrace: are mainly for kernel debugging
-LTTng: creates traces for off-line debugging (ust needs the program to be
compiled for tracing; it does not work on existing binaries).

utrace can be a fast and smart replacement for the old/slow ptrace.

Now the kernel patches are getting obsolete...

This is a call for utrace users and developers to see if there are enough
(human) resources to continue the project.
I am considering forking the subset of the project needed by kmview, but
if there are enough other projects and developers interested in utrace's survival
we can work together.

renzo davoli
virtualsquare labs
University of Bologna



Utrace for 2.6.39.1 on View-OS/VirtualSquare

2011-06-14 Thread Renzo Davoli
I need utrace for kmview so I have updated the utrace support for 2.6.39.1.

The code is here:
http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/utrace/

It seems to work. I have tested kmview on it.

renzo




Re: [RFC v2 00/19] utrace for 3.0 kernel

2011-07-13 Thread Renzo Davoli
On Mon, Jul 11, 2011 at 06:19:33PM -0700, Josh Stone wrote:
> On 06/30/2011 05:20 PM, Oleg Nesterov wrote:
> > TODO:
> > 
> > - Testing.
> 
> I ran the whole systemtap testsuite with a kernel built from your git
> tree, and did not see any utrace-specific issues.  Thanks!
I have got the git tree, too.
I can confirm that my kmview also works on this version of utrace.
Thank you Oleg.
renzo