Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Zach Brown


I'm finally back from my travel and conference hiatus.. you guys have  
been busy! :)


On Feb 13, 2007, at 6:20 AM, Ingo Molnar wrote:

I'm pleased to announce the first release of the "Syslet" kernel  
feature

and kernel subsystem, which provides generic asynchronous system call
support:

   http://redhat.com/~mingo/syslet-patches/


In general, I really like the look of this.

I think I'm convinced that your strong preference to do this with  
full kernel threads (1:1 task_struct -> thread_info/stack  
relationship) is the right thing to do.  The fibrils fell on the side  
of risking bugs by sharing task_structs amongst stacks executing  
kernel paths.  This, correct me if I'm wrong, falls on the side of  
risking behavioural quirks stemming from task_struct references that  
we happen to have not enabled sharing of yet.


I have strong hopes that we won't actually *care* about the  
behavioural differences we get from having individual task structs  
(which share the important things!) between syscall handlers.  The  
*only* seemingly significant case I've managed to find, the IO  
scheduler priority and context fields, is easy enough to fix up.   
Jens and I have been talking about that.  It's been bugging him for  
other reasons.


So, thanks, nice work.  I'm going to focus on finding out if it's  
feasible for The Database to use this instead of the current iocb  
mechanics.  I'm optimistic.



Syslets are small, simple, lightweight programs (consisting of
system-calls, 'atoms')


I will admit, though, that I'm not at all convinced that we need  
this.  Adding a system call for addition (just addition?  how far do  
we go?!) sure feels like a warning sign to me that we're heading down  
a slippery slope.  I would rather we started with an obviously  
minimal syscall which just takes an array of calls and args and  
executes them unconditionally.


But its existence doesn't stop the use case I care about.  So it's  
hard to get *too* worked up about it.



Comments, suggestions, reports are welcome!


For what it's worth, it looks like 'x86-optimized-copy_uatom.patch'  
got some hunks that should have been in 'x86-optimized-sys_umem_add.patch'.


- z


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Davide Libenzi
On Wed, 14 Feb 2007, Jeremy Fitzhardinge wrote:

> Davide Libenzi wrote:
> >> Would this work?
> >> 
> >
> > Hopefully the API will simplify enough so that emulation will become 
> > easier.
> >   
> 
> The big question in my mind is how all this stuff interacts with
> signals.  Can a blocked syscall atom be interrupted by a signal?  If so,
> what thread does it get delivered to?  How does sigprocmask affect this
> (is it atomic with respect to blocking atoms)?

Signal context is another thing that we need to transfer to the 
return-to-userspace task, in case we switch. Async threads inherit that 
from the "main" task once they're created, but from there to the 
sys_async_exec syscall, userspace might have changed the signal context, 
and re-emerging with a different one is not an option ;)
We should set up the service-threads' signal context so that we can cancel them, 
but the implementation should be hidden from userspace (which will use 
sys_async_cancel - or whatever name - to do that).



- Davide




Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Jeremy Fitzhardinge
Davide Libenzi wrote:
>> Would this work?
>> 
>
> Hopefully the API will simplify enough so that emulation will become 
> easier.
>   

The big question in my mind is how all this stuff interacts with
signals.  Can a blocked syscall atom be interrupted by a signal?  If so,
what thread does it get delivered to?  How does sigprocmask affect this
(is it atomic with respect to blocking atoms)?

>> Also, an unrelated question: is there enough control structure in place
>> to allow multiple syslet streams to synchronize with each other with
>> futexes?
>> 
>
> I think the whole point of async execution of a syscall or a syslet is 
> that the syscall/syslet itself involves operations that are not interlocked 
> with other syscalls/syslets, so that the main scheduler thread can run in a 
> lockless/single-task fashion. There are no technical obstacles that 
> prevent you from doing it, but if you start adding locks (and hence having 
> long-living syslet-threads) at that point you'll end up with a fully 
> multithreaded solution.
>   

I was thinking you'd use the futexes more like barriers than locks. 
That way you could have several streams going asynchronously, but use
futexes to gang them together at appropriate times in the stream.  A
handwavy example would be to have separate async streams for audio and
video, but use futexes to stop them from drifting too far from each other.
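
(A rough sketch of that barrier idea, written out as the plain futex calls the
atoms would wrap - this is ordinary user-space code, not syslet atoms, and the
rendezvous logic is deliberately simplified:)

   #include <unistd.h>
   #include <sys/syscall.h>
   #include <linux/futex.h>

   static int barrier_word;                /* shared by both streams */

   /* one stream marks a chunk done and wakes the other stream */
   static void stream_mark_done(void)
   {
           __sync_fetch_and_add(&barrier_word, 1);
           syscall(SYS_futex, &barrier_word, FUTEX_WAKE, 1, NULL, NULL, 0);
   }

   /* the other stream waits until its peer has caught up to 'expected' */
   static void stream_wait_for(int expected)
   {
           int cur;

           while ((cur = __sync_fetch_and_add(&barrier_word, 0)) < expected)
                   syscall(SYS_futex, &barrier_word, FUTEX_WAIT,
                           cur, NULL, NULL, 0);
   }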

J


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Davide Libenzi
On Wed, 14 Feb 2007, Jeremy Fitzhardinge wrote:

> Are there any special semantics that result from running the syslet
> atoms in kernel mode?  If I wanted to, could I write a syslet emulation
> in userspace that's functionally identical to a kernel-based
> implementation?  (Obviously the performance characteristics will be
> different.)
> 
> I'm asking from the perspective of trying to work out the Valgrind
> binding for this if it goes into the kernel.  Valgrind needs to see all
> the input and output values of each system call the client makes,
> including those done within the syslet mechanism.  It seems to me that
> the easiest way to do this would be to intercept the syslet system
> calls, and just implement them in usermode, performing the same series
> of syscalls directly, and applying the Valgrind machinery to each one in
> turn.
> 
> Would this work?

Hopefully the API will simplify enough so that emulation will become 
easier.



> Also, an unrelated question: is there enough control structure in place
> to allow multiple syslet streams to synchronize with each other with
> futexes?

I think the whole point of async execution of a syscall or a syslet is 
that the syscall/syslet itself involves operations that are not interlocked 
with other syscalls/syslets, so that the main scheduler thread can run in a 
lockless/single-task fashion. There are no technical obstacles that 
prevent you from doing it, but if you start adding locks (and hence having 
long-living syslet-threads) at that point you'll end up with a fully 
multithreaded solution.



- Davide




Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
> Syslets consist of 'syslet atoms', where each atom represents a single 
> system-call. These atoms can be chained to each other: serially, in 
> branches or in loops. The return value of an executed atom is checked 
> against the condition flags. So an atom can specify 'exit on nonzero' or 
> 'loop until non-negative' kind of constructs.
>
> Syslet atoms fundamentally execute only system calls, thus to be able to 
> manipulate user-space variables from syslets i've added a simple special 
> system call: sys_umem_add(ptr, val). This can be used to increase or 
> decrease the user-space variable (and to get the result), or to simply 
> read out the variable (if 'val' is 0).
>   
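
(For readers following along, here is roughly what such an atom chain looks
like from user-space. The syslet_uatom layout below paraphrases the posted
syslet.h - arguments are passed as pointers to user-space variables - so the
field names and the __NR_umem_add macro should be treated as illustrative,
not as the final ABI:)

   struct syslet_uatom {
           unsigned long           flags;
           unsigned long           nr;          /* syscall number          */
           long                    *ret_ptr;    /* where to store the ret  */
           struct syslet_uatom     *next;       /* next atom in the chain  */
           unsigned long           *arg_ptr[6]; /* pointers to the args    */
           void                    *private;
   };

   static long read_ret, add_ret;
   static unsigned long fd, buf_addr, count;    /* args of the read()      */
   static unsigned long done_addr, one = 1;     /* args of sys_umem_add()  */
   static long done;                            /* user-space counter      */
   static struct syslet_uatom a_read, a_add;

   /* atom 1: read(fd, buf, count);  atom 2: done += 1 */
   static void build_chain(void)
   {
           a_read.nr         = __NR_read;
           a_read.ret_ptr    = &read_ret;
           a_read.next       = &a_add;
           a_read.arg_ptr[0] = &fd;
           a_read.arg_ptr[1] = &buf_addr;
           a_read.arg_ptr[2] = &count;

           done_addr        = (unsigned long)&done;
           a_add.nr         = __NR_umem_add;    /* number is illustrative  */
           a_add.ret_ptr    = &add_ret;
           a_add.next       = NULL;             /* end of the chain        */
           a_add.arg_ptr[0] = &done_addr;
           a_add.arg_ptr[1] = &one;
   }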

This looks very interesting.  A couple of questions:

Are there any special semantics that result from running the syslet
atoms in kernel mode?  If I wanted to, could I write a syslet emulation
in userspace that's functionally identical to a kernel-based
implementation?  (Obviously the performance characteristics will be
different.)

I'm asking from the perspective of trying to work out the Valgrind
binding for this if it goes into the kernel.  Valgrind needs to see all
the input and output values of each system call the client makes,
including those done within the syslet mechanism.  It seems to me that
the easiest way to do this would be to intercept the syslet system
calls, and just implement them in usermode, performing the same series
of syscalls directly, and applying the Valgrind machinery to each one in
turn.

Would this work?

Also, an unrelated question: is there enough control structure in place
to allow multiple syslet streams to synchronize with each other with
futexes?

Thanks,
J
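
(A sketch of the user-mode emulation being described: walk the atom chain and
issue each syscall directly, so Valgrind sees every call. It assumes the same
pointer-based syslet_uatom layout as the posted patches and ignores the
condition/jump flags for brevity - a real emulation would have to honour them:)

   #include <unistd.h>
   #include <sys/syscall.h>

   static long run_syslet_emulated(struct syslet_uatom *a)
   {
           long ret = 0;

           for (; a; a = a->next) {
                   /* each arg_ptr[] entry points at the actual argument */
                   ret = syscall(a->nr,
                                 a->arg_ptr[0] ? *a->arg_ptr[0] : 0,
                                 a->arg_ptr[1] ? *a->arg_ptr[1] : 0,
                                 a->arg_ptr[2] ? *a->arg_ptr[2] : 0,
                                 a->arg_ptr[3] ? *a->arg_ptr[3] : 0,
                                 a->arg_ptr[4] ? *a->arg_ptr[4] : 0,
                                 a->arg_ptr[5] ? *a->arg_ptr[5] : 0);
                   if (a->ret_ptr)
                           *a->ret_ptr = ret;
           }
           return ret;
   }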


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Davide Libenzi
On Wed, 14 Feb 2007, Ingo Molnar wrote:

> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > Let me clarify what I meant. There is only a limited number of threads 
> > which are supposed to execute blocking context, so when they are all 
> > used, the main one will block too - I asked about the possibility to reuse the 
> > same thread to execute a queue of requests attached to it; each request 
> > can block, but if the blocking issue is removed, it would be possible to 
> > return.
> 
> ah, ok, i understand your point. This is not quite possible: the 
> cachemisses are driven from schedule(), which can be arbitrarily deep 
> inside arbitrary system calls. It can be in a mutex_lock() deep inside a 
> driver. It can be due to an alloc_pages() call done by a kmalloc() call 
> done from within ext3, which was called from the loopback block driver, 
> which was called from XFS, which was called from a VFS syscall.
> 
> Even if it were possible to backtrack i'm quite sure we dont want to do 
> this, for three main reasons:

IMO it'd be quite simple. We detect the service-thread full condition 
*before* entering exec_atom, and we queue the atom in an async_head request 
list. Yes, there is the chance that from the test time in sys_async_exec 
to the time we'll end up entering exec_atom and down to schedule, one 
of the threads would become free, but IMO that's better than blocking 
sys_async_exec.



- Davide




Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Linus Torvalds


On Wed, 14 Feb 2007, Pavel Machek wrote:
> 
> Ouch, yet another interpreter in the kernel :-(. Can we reuse acpi or
> something?

Hah. You make the joke! I get it!

Mwahahahaa! 

Linus


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Pavel Machek
Hi!
  
> The boring details:
> 
> Syslets consist of 'syslet atoms', where each atom represents a single 
> system-call. These atoms can be chained to each other: serially, in 
> branches or in loops. The return value of an executed atom is checked 
> against the condition flags. So an atom can specify 'exit on nonzero' or 
> 'loop until non-negative' kind of constructs.

Ouch, yet another interpreter in the kernel :-(. Can we reuse acpi or
something?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Evgeniy Polyakov
On Wed, Feb 14, 2007 at 11:37:31AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> > Let me clarify what I meant. There is only a limited number of threads 
> > which are supposed to execute blocking context, so when they are all 
> > used, the main one will block too - I asked about the possibility to reuse the 
> > same thread to execute a queue of requests attached to it; each request 
> > can block, but if the blocking issue is removed, it would be possible to 
> > return.
> 
> ah, ok, i understand your point. This is not quite possible: the 
> cachemisses are driven from schedule(), which can be arbitrarily deep 
> inside arbitrary system calls. It can be in a mutex_lock() deep inside a 
> driver. It can be due to an alloc_pages() call done by a kmalloc() call 
> done from within ext3, which was called from the loopback block driver, 
> which was called from XFS, which was called from a VFS syscall.

That's only because schedule() is the main point where
'rescheduling'/requeueing (a task switch, in other words) happens - but if
it were possible to bypass schedule()'s decision and not reschedule
there, but rather 'on demand', would it be possible to reuse the same syslet?

Let me show an example:
consider aio_sendfile() on a big file - it is not possible to fully
pull it into the VFS cache, and spinning on a per-page basis (like right now)
is not an optimal solution either. For kevent AIO I created a new address_space
operation, aio_getpages(), which is essentially mpage_readpages() - it
populates several pages into the VFS in one BIO (if possible, otherwise in
the smallest possible number of chunks) and then, in the bio destruction
callback (actually in the bio_endio callback, but for this case it can be
considered the same), I reschedule the same request to some other (not
necessarily the one that started it) thread. The processed data is then sent,
the next chunk of the file is populated into the VFS using aio_getpages(),
and its BIO callback reschedules the same request again.

So it is possible, with essentially one thread (or a limited number of
them), to fill the whole IO pipe.

With the syslet approach this seems to be impossible, due to the fact that
a request is a whole sendfile(). Even if one uses proper readahead (fadvise)
advice, there is no possibility to split the sendfile and form it as a set
of essentially identical requests with different start/offset/whatever
parameters (well, for sendfile() exactly this is possible - just set up
several calls in one syslet from different offsets and with different
lengths and form a proper state machine out of them, but for example TCP 
recv() will not match that scenario).
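
(In atom terms that sendfile() splitting would look roughly like the chain
below - one sendfile() call per slice, each with its own offset variable. The
atom fields follow the pointer-based layout of the posted patches; includes,
error handling and the architecture-specific syscall number are glossed over:)

   #define NCHUNK  8
   #define CHUNK   (1024 * 1024)

   static struct syslet_uatom atoms[NCHUNK];
   static unsigned long out_fd, in_fd, chunk_len = CHUNK;
   static off_t offsets[NCHUNK];
   static unsigned long off_args[NCHUNK];       /* each holds &offsets[i]  */
   static long rets[NCHUNK];

   static void build_sendfile_chain(void)
   {
           int i;

           for (i = 0; i < NCHUNK; i++) {
                   offsets[i]  = (off_t)i * CHUNK;
                   off_args[i] = (unsigned long)&offsets[i];

                   /* atom i: sendfile(out_fd, in_fd, &offsets[i], CHUNK) */
                   atoms[i].nr         = __NR_sendfile;
                   atoms[i].ret_ptr    = &rets[i];
                   atoms[i].next       = (i + 1 < NCHUNK) ? &atoms[i + 1] : NULL;
                   atoms[i].arg_ptr[0] = &out_fd;
                   atoms[i].arg_ptr[1] = &in_fd;
                   atoms[i].arg_ptr[2] = &off_args[i];
                   atoms[i].arg_ptr[3] = &chunk_len;
           }
   }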

So my main question was about the possibility of reusing the syslet state
machine in kevent AIO instead of its own (although the kevent one currently 
lacks only one good feature of syslet threads - its set of threads is global, 
not per-task, which does not allow it to scale well with the number of 
different processes doing IO), so as not to duplicate the code if kevent is
ever able to get in.

-- 
Evgeniy Polyakov


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Alan
> > Ohh. OpenVMS lives forever ;) Me likeee ;)
> 
> hm, i dont know OpenVMS - but googled around a bit for 'VMS 
> asynchronous' and it gave me this:

VMS had SYS$QIO, which is asynchronous I/O queueing with completions of
sorts. You had to specifically remember if you wanted a synchronous
I/O.

Nothing afaik quite like a series of commands batched async, although VMS
has a call for everything else so it's possible there is one buried in the
back of volume 347 of the grey wall ;)

Looking at the completion side I'm not 100% sure we need async_wait given
the async batches can include futex operations...

Alan


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> Let me clarify what I meant. There is only a limited number of threads 
> which are supposed to execute blocking context, so when they are all 
> used, the main one will block too - I asked about the possibility to reuse the 
> same thread to execute a queue of requests attached to it; each request 
> can block, but if the blocking issue is removed, it would be possible to 
> return.

ah, ok, i understand your point. This is not quite possible: the 
cachemisses are driven from schedule(), which can be arbitrarily deep 
inside arbitrary system calls. It can be in a mutex_lock() deep inside a 
driver. It can be due to an alloc_pages() call done by a kmalloc() call 
done from within ext3, which was called from the loopback block driver, 
which was called from XFS, which was called from a VFS syscall.

Even if it were possible to backtrack i'm quite sure we dont want to do 
this, for three main reasons:

Firstly, backtracking and retrying always has a cost. We construct state 
on the way in - and we destruct on the way out. The kernel stack we have 
built up has a (nontrivial) construction cost and thus a construction 
value - we should preserve that if possible.

Secondly, and this is equally important: i wanted the number of async 
kernel threads to be the natural throttling mechanism. If user-space 
wants to use fewer threads and overcommit the request queue then it can 
be done in user-space: by over-queueing requests into a separate list, 
and taking from that list upon completion and submitting it. User-space 
has precise knowledge of overqueueing scenarios: if the event ring is 
full then all async kernel threads are busy.
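
(In user-space terms that overcommit policy is just a backlog list in front of
the submission path - roughly the sketch below, where submit_atom() and
async_threads_busy() stand in for the real sys_async_exec() call and the
'event ring is full' check:)

   struct request {
           struct request          *next;
           struct syslet_uatom     *atom;
   };

   static struct request *backlog;         /* requests held back in user-space */

   static void submit_or_queue(struct request *req)
   {
           if (async_threads_busy()) {     /* e.g. the event ring is full   */
                   req->next = backlog;    /* over-queue it in user-space   */
                   backlog = req;
                   return;
           }
           submit_atom(req->atom);         /* hand it to the kernel         */
   }

   static void on_completion(void)
   {
           struct request *req = backlog;  /* an async thread just freed up */

           if (req) {
                   backlog = req->next;
                   submit_atom(req->atom); /* push one backlogged request   */
           }
   }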

but note that there's a deeper reason as well for not wanting 
over-queueing: the main cost of a 'pending request' is the kernel stack 
of the blocked thread itself! So do we want to allow 'requests' to stay 
'pending' even if there are "no more threads available"? Nope: because 
letting them 'pend' would essentially (and implicitly) mean an increase 
of the thread pool.

In other words: with the syslet subsystem, a kernel thread /is/ the 
asynchronous request itself. So 'have more requests pending' means 'have 
more kernel threads'. And 'no kernel thread available' must thus mean 
'no queueing of this request'.

Thirdly, there is a performance advantage of this queueing property as 
well: by letting a cachemiss thread only do a single syslet all work is 
concentrated back to the 'head' task, and all queueing decisions are 
immediately known by user-space and can be acted upon.

So the work-queueing setup is not symmetric at all, there's a 
fundamental bias and tendency back towards the head task - this helps 
caching too. That's what Tux did too - it always tried to queue back to 
the 'head task' as soon as it could. Spreading out work dynamically and 
transparently is necessary and nice, but it's useless if the system has 
no automatic tendency to move back into single-threaded (fully cached) 
state if the workload becomes less parallel. Without this fundamental 
(and transparent) 'shrink parallelism' property syslets would only 
degrade into yet another threading construct.

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Evgeniy Polyakov
On Tue, Feb 13, 2007 at 11:18:10PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > [...] it still has a problem - a syscall blocks and the same thread is thus 
> > not allowed to continue execution and fill the pipe - so what if the 
> > system issues thousands of requests and there are only tens of working 
> > threads at most. [...]
> 
> the same thread is allowed to continue execution even if the system call 
> blocks: take a look at async_schedule(). The blocked system-call is 'put 
> aside' (in a sleeping thread), the kernel switches the user-space 
> context (registers) to a free kernel thread and switches to it - and 
> returns to user-space as if nothing happened - allowing the user-space 
> context to 'fill the pipe' as much as it can. Or did i misunderstand 
> your point?

Let me clarify what I meant.
There is only a limited number of threads which are supposed to execute
blocking context, so when they are all used, the main one will block too - I
asked about the possibility to reuse the same thread to execute a queue of
requests attached to it; each request can block, but if the blocking issue
is removed, it would be possible to return.

What I'm asking about is how the kevent IO state machine actually works
- each IO request is made not through the usual mpage and bio allocations,
but with special kevent ones, which do not wait until completion; instead,
in the destructor the request is either rescheduled (if a big file is
being transferred, it is split into parts for transmission) or committed
as ready (thus it becomes possible to read the completion through the kevent
queue or ring). So there are only several threads, each one doing a small
job on each request, but the same request can be rescheduled to a thread again
and again (from the bio destructor or ->end_io callback, for example).

So I asked if it is possible to extend this state machine to work not
only with blocking syscalls but also with non-blocking functions, with the
possibility to reschedule the same item again.

-- 
Evgeniy Polyakov


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-14 Thread Ingo Molnar

* Davide Libenzi  wrote:

> > There's another problem AFAICS:
> > 
> > - We woke up one of the cachemiss_loop threads in pick_new_async_thread
> > 
> > - The threads wakes up, mark itself as busy, and look at the ->work 
> >   pointer hoping to find something to work on
> > 
> > But we never set that pointer to a userspace atom AFAICS. Me blind? :)
> 
> I still don't see at->work ever set to a valid userspace atom 
> though...

yeah - i havent added 'submit syslet from within a syslet' support yet :-)

note that current normal syslet operation (both async and sync alike) 
does not need at->work at all. When we cachemiss then the new head task 
just wants to return a NULL pointer to user-space, to signal that work 
is continuing in the background. A ready 'cachemiss' thread is really 
not there to do cachemisses, it is a 'new head task in the waiting'. The 
name comes from Tux and i guess it's time for a rename :)

but i do plan a SYSLET_ASYNC_CONTINUE flag, roughly along the lines of the 
patch i've attached below: this would skip to the linearly next syslet and 
would let the original syslet execute in the background. I have not fully 
thought this through though (let alone tested it ;) - can you see any hole 
in this approach? This would in essence allow the following construct:

   syslet1 &
   syslet2 &
   syslet3 &
   syslet4 &

submitted in parallel, straight to cachemiss threads, from a syslet 
itself.

there's yet another work submission variant that makes sense to do, a 
true syslet vector submission: to do a loop over syslet atoms in 
sys_async_exec(). That would have the added advantage of enabling 
caching. If one vector component generates a cachemiss then the head 
would continue with the next vector. (this too needs at->work alike 
communication between ex-head and new-head)

maybe the latter would be the cleaner approach - SYSLET_ASYNC_CONTINUE 
has no effect in cachemiss context, so it only makes sense if the 
submitted syslet is a pure vector of parallel atoms. Alternatively, 
SYSLET_ASYNC_CONTINUE would have to be made work from cachemiss contexts 
too. (because that makes sense too, to start new async execution from 
another async context.)

another not yet clear area is when there's no cachemiss thread 
available. Right now SYSLET_ASYNC_CONTINUE will just fail - which makes 
it nondeterministic.

Ingo

---
 include/linux/async.h  |   13 +++--
 include/linux/sched.h  |3 +--
 include/linux/syslet.h |   20 +---
 kernel/async.c |   43 +--
 kernel/sched.c |2 +-
 5 files changed, 56 insertions(+), 27 deletions(-)

 # *DOCUMENTATION*
Index: linux/include/linux/async.h
===
--- linux.orig/include/linux/async.h
+++ linux/include/linux/async.h
@@ -1,15 +1,23 @@
 #ifndef _LINUX_ASYNC_H
 #define _LINUX_ASYNC_H
+
+#include 
+
 /*
  * The syslet subsystem - asynchronous syscall execution support.
  *
  * Generic kernel API definitions:
  */
 
+struct syslet_uatom;
+struct async_thread;
+struct async_head;
+
 #ifdef CONFIG_ASYNC_SUPPORT
 extern void async_init(struct task_struct *t);
 extern void async_exit(struct task_struct *t);
-extern void __async_schedule(struct task_struct *t);
+extern void
+__async_schedule(struct task_struct *t, struct syslet_uatom __user *next_uatom);
 #else /* !CONFIG_ASYNC_SUPPORT */
 static inline void async_init(struct task_struct *t)
 {
@@ -17,7 +25,8 @@ static inline void async_init(struct tas
 static inline void async_exit(struct task_struct *t)
 {
 }
-static inline void __async_schedule(struct task_struct *t)
+static inline void
+__async_schedule(struct task_struct *t, struct syslet_uatom __user *next_uatom)
 {
 }
 #endif /* !CONFIG_ASYNC_SUPPORT */
Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -83,13 +83,12 @@ struct sched_param {
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 struct exec_domain;
 struct futex_pi_state;
-struct async_thread;
-struct async_head;
 /*
  * List of flags we want to share for kernel threads,
  * if only because they are not used by them anyway.
Index: linux/include/linux/syslet.h
===
--- linux.orig/include/linux/syslet.h
+++ linux/include/linux/syslet.h
@@ -56,10 +56,16 @@ struct syslet_uatom {
 #define SYSLET_ASYNC   0x0001
 
 /*
+ * Queue this syslet asynchronously and continue executing the
+ * next linear atom:
+ */
+#define SYSLET_ASYNC_CONTINUE  0x0002
+
+/*
  * Never queue this syslet asynchronously - even if synchronous
  * execution causes a context-switching:
  */
-#define SYSLET_SYNC0x0002
+#define SYSLET_SYNC0x0004
 
 /*
  * Do not queue the syslet in the completion r

Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Davide Libenzi
On Tue, 13 Feb 2007, Davide Libenzi wrote:

[...]

> So the sys_async_exec task is going to block. Now, am I being really 
> tired, or the cachemiss fast return is simply not there?

The former 8)

pick_new_async_head()
        new_task->ah = ah;

cachemiss_loop()
        for (;;) {
                if (unlikely(t->ah || ...))
                        break;


> There's another problem AFAICS:
> 
> - We woke up one of the cachemiss_loop threads in pick_new_async_thread
> 
> - The threads wakes up, mark itself as busy, and look at the ->work 
>   pointer hoping to find something to work on
> 
> But we never set that pointer to a userspace atom AFAICS. Me blind? :)

I still don't see at->work ever set to a valid userspace atom though...



- Davide




Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Willy Tarreau
Hi Ingo !

On Tue, Feb 13, 2007 at 03:20:10PM +0100, Ingo Molnar wrote:
> I'm pleased to announce the first release of the "Syslet" kernel feature 
> and kernel subsystem, which provides generic asynchronous system call 
> support:
> 
>http://redhat.com/~mingo/syslet-patches/
> 
> Syslets are small, simple, lightweight programs (consisting of 
> system-calls, 'atoms') that the kernel can execute autonomously (and, 
> not the least, asynchronously), without having to exit back into 
> user-space. Syslets can be freely constructed and submitted by any 
> unprivileged user-space context - and they have access to all the 
> resources (and only those resources) that the original context has 
> access to.

I like this a lot. I've always felt frustrated by the wasted time in
setsockopt() calls after accept() or before connect(), or in multiple
calls to epoll_ctl(). It might also be useful as an efficient readv()
emulation using recv(), etc...
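
(As a concrete sketch of the accept()+setsockopt() case: with the
pointer-based atoms the second call can pick up the first call's return value
simply by pointing its fd argument at the first atom's result variable. This
assumes the syslet_uatom layout from the patches, an architecture with direct
socket syscalls, and that a NULL arg_ptr entry is treated as a zero argument:)

   #include <netinet/in.h>
   #include <netinet/tcp.h>

   static struct syslet_uatom a_accept, a_nodelay;
   static long newfd, sso_ret;                  /* newfd: accept()'s result */
   static unsigned long listen_fd;
   static unsigned long level = IPPROTO_TCP, optname = TCP_NODELAY;
   static int one = 1;
   static unsigned long optval_addr, optlen = sizeof(one);

   static void build_accept_chain(void)
   {
           /* atom 1: newfd = accept(listen_fd, NULL, NULL) */
           a_accept.nr         = __NR_accept;
           a_accept.ret_ptr    = &newfd;
           a_accept.next       = &a_nodelay;
           a_accept.arg_ptr[0] = &listen_fd;

           /* atom 2: setsockopt(newfd, IPPROTO_TCP, TCP_NODELAY, &one, 4) */
           optval_addr          = (unsigned long)&one;
           a_nodelay.nr         = __NR_setsockopt;
           a_nodelay.ret_ptr    = &sso_ret;
           a_nodelay.arg_ptr[0] = (unsigned long *)&newfd; /* accept()'s result */
           a_nodelay.arg_ptr[1] = &level;
           a_nodelay.arg_ptr[2] = &optname;
           a_nodelay.arg_ptr[3] = &optval_addr;
           a_nodelay.arg_ptr[4] = &optlen;
   }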

Nice work !
Willy



Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Davide Libenzi
On Tue, 13 Feb 2007, Ingo Molnar wrote:

> I'm pleased to announce the first release of the "Syslet" kernel feature 
> and kernel subsystem, which provides generic asynchronous system call 
> support:
> [...]

Ok, I had little time to review the code, and it has been a long 
working day, so bear with me if I missed something.
I don't see how sys_async_exec would not block, based on your patches. 
Let's try to follow:

- We enter sys_async_exec

- We may fill the pool, but that's nothing interesting ATM. A bunch of 
  threads will be created, and they'll end up sleeping inside the 
  cachemiss_loop

- We set the async_ready pointer and we fall inside exec_atom

- There we copy the atom (nothing interesting from a scheduling POV) and 
  we fall inside __exec_atom

- In __exec_atom we do the actual syscall call. Note that we're still the 
  task/thread that called sys_async_exec

- So we enter the syscall, and now we end up in schedule because we're 
  just unlucky

- We notice that the async_ready pointer is not NULL, and we call 
  __async_schedule

- Finally we're in pick_new_async_thread and we pick one of the ready 
  threads sleeping in cachemiss_loop

- We copy the pt_regs to the newly picked-up thread, we set its async head 
  pointer, we set the current task async_ready pointer to NULL, we 
  re-initialize the async_thread structure (the old async_ready), and we 
  put ourselves in the busy_list

- Then we roll back to the schedule that started everything, and being 
  still "prev" for the scheduler, we go to sleep

So the sys_async_exec task is going to block. Now, am I being really 
tired, or the cachemiss fast return is simply not there?
There's another problem AFAICS:

- We woke up one of the cachemiss_loop threads in pick_new_async_thread

- The threads wakes up, mark itself as busy, and look at the ->work 
  pointer hoping to find something to work on

But we never set that pointer to a userspace atom AFAICS. Me blind? :)




- Davide




Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Davide Libenzi
On Tue, 13 Feb 2007, Ingo Molnar wrote:

> * Davide Libenzi  wrote:
> 
> > > If this is going to be a generic AIO subsystem:
> > > 
> > > - Cancellation of pending request
> > 
> > What about the busy_async_threads list becoming a hash/rb_tree indexed 
> > by syslet_atom ptr. A cancel would lookup the thread and send a signal 
> > (of course, signal handling of the async threads should be set 
> > properly)?
> 
> well, each async syslet has a separate TID at the moment, so if we want 
> a submitted syslet to be cancellable then we could return the TID of the 
> syslet handler (instead of the NULL) in sys_async_exec(). Then 
> user-space could send a signal the old-fashioned way, via sys_tkill(), 
> if it so wishes.

That works too. I was thinking about identifying syslets with the 
userspace ptr, but the TID is fine too.



> the TID could also be used in a sys_async_wait_on() API. I.e. it would 
> be a natural, readily accessible 'cookie' for the pending work. TIDs can 
> be looked up lockless via RCU, so it's reasonably fast as well.
> 
> ( Note that there's already a way to 'signal' pending syslets: do_exit() 
>   in the user context will signal all async contexts (which results in 
>   -EINTR of currently executing syscalls, wherever possible) and will 
>   tear them down. But that's too crude for aio_cancel() i guess. )

Yup.



- Davide




Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Davide Libenzi
On Tue, 13 Feb 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi  wrote:
> 
> > > Open issues:
> 
> > If this is going to be a generic AIO subsystem:
> > 
> > - Cancellation of pending request
> 
> How about implementing aio_cancel() as a NOP. Can anyone prove that the 
> kernel didnt actually attempt to cancel that IO? [but unfortunately 
> failed at doing so, because the platters were being written already.]
> 
> really, what's the point behind aio_cancel()?

You need cancel. If you scheduled an async syscall, and the "session" 
linked with that chain is going away, you better have that canceled before 
cleaning up buffers to where the chain is going to read/write.
If you keep a hash or a tree indexed by atom-ptr, then it becomes a matter 
of a lookup and sending a signal.



- Davide




Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ulrich Drepper
Ingo Molnar wrote:
> really, what's the point behind aio_cancel()?

- sequence

 aio_write()
 aio_cancel()
 aio_write()

  with both writes going to the same place must behave predictably

- think beyond files.  Writes to sockets, ttys, they  can block and
cancel must abort them.  Even for files the same applies in some
situations, e.g., for network filesystems.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Olivier Galibert
On Tue, Feb 13, 2007 at 10:57:24PM +0100, Ingo Molnar wrote:
> 
> * Davide Libenzi  wrote:
> 
> > > Open issues:
> 
> > If this is going to be a generic AIO subsystem:
> > 
> > - Cancellation of pending request
> 
> How about implementing aio_cancel() as a NOP. Can anyone prove that the 
> kernel didnt actually attempt to cancel that IO? [but unfortunately 
> failed at doing so, because the platters were being written already.]
> 
> really, what's the point behind aio_cancel()?

Lemme give you a real-world scenario: Question Answering in a Dialog
System.  Your locked-in-memory index ranks documents in a corpus of several
million files depending on the chances they have of containing what
you're looking for.  You have a tenth of a second to read as many of
them as you can, and each seek is 5ms.  So you aio-read them,
requesting them in order of ranking up to 200 or so, and see what you
have at the 0.1s deadline.  If you're lucky, a combination of cache
(especially if you stat() the whole dir tree on a regular basis to
keep the metadata fresh in cache) and of good io reorganisation by the
scheduler will allow you to get a good number of them and do the
information extraction, scoring and clustering of answers, which is
pure CPU at that point.  You *have* to cancel the remaining i/o
because you do not want the disk saturated when the next request
comes, especially if it's 10ms later because the dialog manager found
out it needed a complementary request.

Incidentally, that's something I'm currently implementing for work,
making these aio discussions more interesting than usual :-)

  OG.


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Andi Kleen
> ok. The TID+signal approach i mentioned in the other reply should work. 

Not sure if a signal is good for this. It might conflict with existing
strange historical semantics.

> If it's frequent enough we could make this an explicit 
> sys_async_cancel(TID) API.

Ideally there should be a new function like signal_pending() that checks for
this. Then the network filesystems could check it in their blocking loops
and error out.

Then it would even work on non intr NFS mounts.
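
(Roughly what is being asked for, in kernel pseudo-code - the thread flag,
the helper and request_done() are all hypothetical, nothing like them exists
in the posted patches:)

   /* hypothetical: set on the async thread by a sys_async_cancel(TID) */
   #define TIF_ASYNC_CANCEL        19      /* bit number picked arbitrarily */

   static inline int async_cancel_pending(struct task_struct *t)
   {
           return test_tsk_thread_flag(t, TIF_ASYNC_CANCEL);
   }

   /* ...which a network fs blocking loop could then poll, NFS-intr style: */
   while (!request_done(req)) {
           if (signal_pending(current) || async_cancel_pending(current))
                   return -EINTR;
           schedule_timeout_interruptible(HZ / 10);
   }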

-Andi


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > ok, that should work fine already - exit in the user context gets
> 
> That would be a little heavy handed. I wouldn't expect my GUI program 
> to quit itself on cancel. And requiring it to create a new thread just 
> to exit on cancel would be also nasty.
> 
> And of course you cannot interrupt blocked IOs this way right now 
> (currently it only works with signals in some cases on NFS)

ok. The TID+signal approach i mentioned in the other reply should work. 
If it's frequent enough we could make this an explicit 
sys_async_cancel(TID) API.

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Dmitry Torokhov
Hi Ingo,

On Tuesday 13 February 2007 15:39, Ingo Molnar wrote:
> 
> * Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> 
> > > What are the semantics of async sys_async_wait and async sys_async ?
> > 
> > Ohh. OpenVMS lives forever ;) Me likeee ;)
> 
> hm, i dont know OpenVMS - but googled around a bit for 'VMS 
> asynchronous' and it gave me this:
> 
>   http://en.wikipedia.org/wiki/Asynchronous_system_trap
> 
> is AST what you mean? From a quick read AST seems to be a signal 
> mechanism a bit like Unix signals, extended to kernel-space as well - 
> while syslets are a different 'safe execution engine' kind of thing 
> centered around the execution of system calls.
> 

That is only one of the ways of notifying userspace of system call completion
on OpenVMS. Pretty much every syscall there exists in 2 flavors - async
and sync, for example $QIO and $QIOW or $ENQ/$ENQW (actually the -W flavor
is the async call + $SYNCH to wait for completion). Once a system service call
is completed the OS raises a so-called event flag and may also
deliver an AST to the process. The application may either wait for an
event flag/set of event flags (EFN) or rely on the AST to get notification.

-- 
Dmitry


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Andi Kleen
On Tue, Feb 13, 2007 at 11:26:26PM +0100, Ingo Molnar wrote:
> 
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > > really, what's the point behind aio_cancel()?
> > 
> > The main use case is when you open a file requester on a network file 
> > system where the server is down and you get tired of waiting and press 
> > "Cancel" - it should abort the hanging IO immediately.
> 
> ok, that should work fine already - exit in the user context gets 

That would be a little heavy handed. I wouldn't expect my GUI
program to quit itself on cancel. And requiring it to create a new
thread just to exit on cancel would be also nasty.

And of course you cannot interrupt blocked IOs this way right now
(currently it only works with signals in some cases on NFS)

-Andi


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > really, what's the point behind aio_cancel()?
> 
> The main use case is when you open a file requester on a network file 
> system where the server is down and you get tired of waiting and press 
> "Cancel" - it should abort the hanging IO immediately.

ok, that should work fine already - exit in the user context gets 
propagated to all async syslet contexts immediately. So if the syscalls 
that the syslet uses are reasonably interruptible, it will work out 
fine.

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Andi Kleen
Ingo Molnar <[EMAIL PROTECTED]> writes:
> 
> really, what's the point behind aio_cancel()?

The main use case is when you open a file requester on a network
file system where the server is down and you get tired of waiting
and press "Cancel" - it should abort the hanging IO immediately.

At least I would appreciate such a feature sometimes.

e.g. the readdir loop could be a syslet (are they powerful
enough to allocate memory for an arbitrarily sized directory? Probably not) 
and then the cancel button could async_cancel() it.

-Andi



Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> > I have not received first mail with announcement yet, so I will place 
> > my thoughts here if you do not mind.
> 
> > An issue with sys_async_wait(): is it possible that events_left will 
> > be set up too late so that all events are already ready and thus 
> sys_async_wait() can wait forever (or until next $sys_async_wait are 
> ready)?

yeah. I have fixed this up and have uploaded a newer queue to:

 http://redhat.com/~mingo/syslet-patches/

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> [...] it still has a problem - a syscall blocks and the same thread is thus 
> not allowed to continue execution and fill the pipe - so what if the 
> system issues thousands of requests and there are only tens of working 
> threads at most. [...]

the same thread is allowed to continue execution even if the system call 
blocks: take a look at async_schedule(). The blocked system-call is 'put 
aside' (in a sleeping thread), the kernel switches the user-space 
context (registers) to a free kernel thread and switches to it - and 
returns to user-space as if nothing happened - allowing the user-space 
context to 'fill the pipe' as much as it can. Or did i misunderstand 
your point?

basically there's SYSLET_ASYNC for 'always async' and SYSLET_SYNC for 
'always sync' - but the default syslet behavior is: 'try sync and switch 
transparently to async on demand'. The testcode i sent very much uses 
this. (and this mechanism is in essence Zach's fibril-switching thing, 
but done via kernel threads.)
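
(On the user side that default mode looks roughly like the sketch below,
written as plain calls for brevity. Going by the discussion elsewhere in the
thread, a NULL return from the exec call means the syslet was parked in an
async thread and will complete via the ring; the synchronous-return
convention and the two helpers are simplified stand-ins:)

   struct syslet_uatom *done;

   done = sys_async_exec(&first_atom);     /* default flags: try sync first */
   if (done) {
           /* fast path: nothing blocked, results are already in place */
           consume_results(&first_atom);
   } else {
           /* slow path: a cachemiss thread took over the blocked syscall;
            * pick the completion up later via the ring / sys_async_wait() */
           wait_for_ring_completion();
   }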

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Davide Libenzi  wrote:

> > If this is going to be a generic AIO subsystem:
> > 
> > - Cancellation of pending request
> 
> What about the busy_async_threads list becoming a hash/rb_tree indexed 
> by syslet_atom ptr. A cancel would lookup the thread and send a signal 
> (of course, signal handling of the async threads should be set 
> properly)?

well, each async syslet has a separate TID at the moment, so if we want 
a submitted syslet to be cancellable then we could return the TID of the 
syslet handler (instead of the NULL) in sys_async_exec(). Then 
user-space could send a signal the old-fashioned way, via sys_tkill(), 
if it so wishes.

the TID could also be used in a sys_async_wait_on() API. I.e. it would 
be a natural, readily accessible 'cookie' for the pending work. TIDs can 
be looked up lockless via RCU, so it's reasonably fast as well.
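
(If sys_async_exec() did return the handler TID as suggested, the
old-fashioned cancellation path from user-space would be just a tkill -
signal choice and error handling elided:)

   long tid;

   tid = sys_async_exec(&atom);             /* would return the handler's TID */
   /* ... later, if the pending work is no longer wanted: */
   syscall(SYS_tkill, tid, SIGINT);         /* -EINTR the blocked syscall     */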

( Note that there's already a way to 'signal' pending syslets: do_exit() 
  in the user context will signal all async contexts (which results in 
  -EINTR of currently executing syscalls, wherever possible) and will 
  tear them down. But that's too crude for aio_cancel() i guess. )

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Davide Libenzi  wrote:

> > Open issues:

> If this is going to be a generic AIO subsystem:
> 
> - Cancellation of pending request

How about implementing aio_cancel() as a NOP. Can anyone prove that the 
kernel didnt actually attempt to cancel that IO? [but unfortunately 
failed at doing so, because the platters were being written already.]

really, what's the point behind aio_cancel()?

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Davide Libenzi
On Tue, 13 Feb 2007, Davide Libenzi wrote:

> If this is going to be a generic AIO subsystem:
> 
> - Cancellation of pending request

What about the busy_async_threads list becoming a hash/rb_tree indexed by 
syslet_atom ptr. A cancel would lookup the thread and send a signal (of 
course, signal handling of the async threads should be set properly)?



- Davide




Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Dmitry Torokhov <[EMAIL PROTECTED]> wrote:

> > What are the semantics of async sys_async_wait and async sys_async ?
> 
> Ohh. OpenVMS lives forever ;) Me likeee ;)

hm, i dont know OpenVMS - but googled around a bit for 'VMS 
asynchronous' and it gave me this:

  http://en.wikipedia.org/wiki/Asynchronous_system_trap

is AST what you mean? From a quick read AST seems to be a signal 
mechanism a bit like Unix signals, extended to kernel-space as well - 
while syslets are a different 'safe execution engine' kind of thing 
centered around the execution of system calls.

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Benjamin LaHaise <[EMAIL PROTECTED]> wrote:

> [...] interaction with set_fs()...

hm, this one should already work in the current version, because 
addr_limit is in thread_info and hence stays with the async context. Or 
can you see any hole in it?

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Davide Libenzi
On Tue, 13 Feb 2007, Linus Torvalds wrote:

>   if (in_async_context())
>   return -EINVAL;
> 
> or similar. We need that "async_context()" function anyway for the other 
> cases where we can't do other things concurrently, like changing the UID.

Yes, that's definitely better. Let's have the policy about whether a 
syscall is or is not async-enabled inside the syscall itself. That simplifies 
things a lot.



- Davide




Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Davide Libenzi
On Tue, 13 Feb 2007, Ingo Molnar wrote:

> As it might be obvious to some of you, the syslet subsystem takes many 
> ideas and experience from my Tux in-kernel webserver :) The syslet code 
> originates from a heavy rewrite of the Tux-atom and the Tux-cachemiss 
> infrastructure.
> 
> Open issues:
> 
>  - the 'TID' of the 'head' thread currently varies depending on which 
>thread is running the user-space context.
> 
>  - signal support is not fully thought through - probably the head 
>should be getting all of them - the cachemiss threads are not really 
>interested in executing signal handlers.
> 
>  - sys_fork() and sys_async_exec() should be filtered out from the 
>syscalls that are allowed - first one only makes sense with ptregs, 
>second one is a nice kernel recursion thing :) I didnt want to 
>duplicate the sys_call_table though - maybe others have a better 
>idea.

If this is going to be a generic AIO subsystem:

- Cancellation of pending request



- Davide




Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Evgeniy Polyakov
> I have not received first mail with announcement yet, so I will place 
> my thoughts here if you do not mind.

An issue with sys_async_wait():
is it possible that events_left will be set up too late so that all
events are already ready and thus sys_async_wait() can wait forever
(or until next $sys_async_wait are ready)?

-- 
Evgeniy Polyakov


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Evgeniy Polyakov
On Tue, Feb 13, 2007 at 05:56:42PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Benjamin LaHaise <[EMAIL PROTECTED]> wrote:
> 
> > > > Open issues:
> > > 
> > > Let me add some more
> > 
> > Also: FPU state (especially important with the FPU and SSE memory copy 
> > variants), segment register bases on x86-64, interaction with 
> > set_fs()...
> 
> agreed - i'll fix this. But i can see no big conceptual issue here - 
> these resources are all attached to the user context, and that doesnt 
> change upon an 'async context-switch'. So it's "only" a matter of 
> properly separating the user execution context from the kernel execution 
> context. The hardest bit was getting the ptregs details right - the 
> FPU/SSE state is pretty much async already (in the hardware too) and 
> isnt even touched by any of these codepaths.

Good work, Ingo.

I have not received first mail with announcement yet, so I will place 
my thoughts here if you do not mind.

The first one is per-thread data like the TID. What about TLS-related kernel
data (is the non-exec stack property stored in the TLS block or in the kernel)?
Should it be copied with the regs too (or better, introduce a new clone flag
which would force that info to be copied)?

Btw, is SSE?/MMX?/call-it-yourself state really saved on a context switch?
As far as I can see no syscalls (and the kernel at all) use those registers.

Another one is a more global AIO question - while this approach IMHO
outperforms the micro-thread design (Zach and Linus created really good
starting points, but they too have a fundamental limiting factor), it
still has a problem - a syscall blocks and the same thread is thus not
allowed to continue execution and fill the pipe - so what if the system
issues thousands of requests and there are only tens of working threads
at most. What Tux did, as far as I recall (and some other similar 
state machines do :), was to break out of the blocking syscall and return
to the next execution entity (the next syslet or atom). Is it possible to
extend exactly this state machine and interface to allow that (so that
some other state machine implementations would not have to continue their life :)?

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> Ie, we could just add to "do_fork()" (which is where all of the 
> vfork/clone/fork cases end up) a simple case like
> 
>   err = wait_async_context();
>   if (err)
>   return err;
> 
> or
> 
>   if (in_async_context())
>   return -EINVAL;

ok, this is a much nicer solution. I've scrapped the 
sys_async_sys_call_table[] thing.

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Benjamin LaHaise <[EMAIL PROTECTED]> wrote:

> > > Open issues:
> > 
> > Let me add some more
> 
> Also: FPU state (especially important with the FPU and SSE memory copy 
> variants), segment register bases on x86-64, interaction with 
> set_fs()...

agreed - i'll fix this. But i can see no big conceptual issue here - 
these resources are all attached to the user context, and that doesnt 
change upon an 'async context-switch'. So it's "only" a matter of 
properly separating the user execution context from the kernel execution 
context. The hardest bit was getting the ptregs details right - the 
FPU/SSE state is pretty much async already (in the hardware too) and 
isnt even touched by any of these codepaths.

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > sys_exec and other security boundaries must be synchronous 
> > only and not allow async "spill over" (consider setuid async binary 
> > patching)
> 
> He probably would need some generalization of Andrea's seccomp work. 
> Perhaps using bitmaps? For paranoia I would suggest to white list, not 
> black list calls.

what i've implemented in my tree is sys_async_call_table[] which is a 
copy of sys_call_table[] with certain entries modified (by architecture 
level code, not by kernel/async.c) to sys_ni_syscall(). It's up to the 
architecture to decide which syscalls are allowed.

but i could use a bitmap too - whatever linear construct. [ I'm not sure 
there's much connection to seccomp - seccomp uses a NULL terminated 
whitelist - while syslets would use most of the entries (and would not 
want to have the overhead of checking a blacklist). ]
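
(Purely to illustrate the bitmap variant - a minimal sketch, with made-up
names and a made-up 512-entry bound; the real table would be populated by
architecture code, not hard-coded like this:)

	#include <stdint.h>

	#define NR_ASYNC_SYSCALLS	512	/* assumed upper bound */

	static uint32_t async_allowed[NR_ASYNC_SYSCALLS / 32];

	/* architecture code would call this for every whitelisted syscall: */
	static inline void async_syscall_allow(unsigned int nr)
	{
		if (nr < NR_ASYNC_SYSCALLS)
			async_allowed[nr / 32] |= 1u << (nr % 32);
	}

	/* the atom execution path would then check: */
	static inline int async_syscall_allowed(unsigned int nr)
	{
		if (nr >= NR_ASYNC_SYSCALLS)
			return 0;
		return (async_allowed[nr / 32] >> (nr % 32)) & 1;
	}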

Ingo


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Ingo Molnar

* Alan <[EMAIL PROTECTED]> wrote:

> > A syslet is executed opportunistically: i.e. the syslet subsystem 
> > assumes that the syslet will not block, and it will switch to a 
> > cachemiss kernel thread from the scheduler. This means that even a
> 
> How is scheduler fairness maintained ? and what is done for resource 
> accounting here ?

the async threads are treated as if the user had created user-space
threads - and they are accounted (and scheduled) accordingly.

> > that the kernel fills and user-space clears. Waiting is done via the 
> > sys_async_wait() system call. Completion can be supressed on a 
> > per-atom
> 
> They should be selectable as well iff possible.

basically arbitrary notification interfaces are supported. For example,
if you add a sys_kill() call as the last syslet atom then this will
notify any waiter in sigwait().

or if you want to select(), just do it on the fds that you are
interested in, and the write that the syslet does will trigger select()
completion.

but the fastest one will be to use syslets natively: just check the
notification ring pointer in user-space, and only call into
sys_async_wait() if the ring is empty.

I just noticed a small bug here: sys_async_wait() should also take the 
ring index userspace checked as a second parameter, and fix up the 
number of events it waits for with the delta between the ring index the 
kernel maintains and the ring index user-space has. The patch below 
fixes this bug.
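
(To make that fast path concrete, here is a minimal user-space sketch
against the fixed two-parameter sys_async_wait() from the patch further
below. The ring layout, the completion_ring/handle_completion names and
the syscall number are assumptions for illustration only, not the real
ABI:)

	#include <unistd.h>
	#include <sys/syscall.h>

	#define __NR_async_wait	327		/* hypothetical syscall number */
	#define RING_SIZE	1024

	extern volatile long completion_ring[RING_SIZE];	/* kernel fills */
	extern void handle_completion(long cookie);

	static void completion_loop(void)
	{
		unsigned long idx = 0;

		for (;;) {
			/* consume everything completed so far; user-space
			   clears the entries it has seen: */
			while (completion_ring[idx % RING_SIZE]) {
				handle_completion(completion_ring[idx % RING_SIZE]);
				completion_ring[idx % RING_SIZE] = 0;
				idx++;
			}
			/* ring looks empty: wait for at least one more event,
			   passing the index we checked so that completions
			   racing with the check above get accounted for: */
			syscall(__NR_async_wait, 1UL, idx);
		}
	}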

> > Open issues:
> 
> Let me add some more
> 
>   sys_setuid/gid/etc need to be synchronous only and not occur 
> while other async syscalls are running in parallel to meet current 
> kernel assumptions.

these should probably be taken out of the 'async syscall table', along 
with fork and the async syscalls themselves.

>   sys_exec and other security boundaries must be synchronous 
> only and not allow async "spill over" (consider setuid async binary 
> patching)

i've tested sys_exec() and it seems to work, but i might have missed 
some corner-cases. (And what you raise is not academic, it might even 
make sense to do it, in the vfork() way.)

> >  - sys_fork() and sys_async_exec() should be filtered out from the 
>    syscalls that are allowed - first one only makes sense with ptregs,
> 
> clone and vfork. async_vfork is a real mindbender actually.

yeah. Also, create_module() perhaps. I'm starting to lean towards an 
async_syscall_table[]. At which point we could reduce the max syslet 
parameter count to 4, and do those few 5 and 6 parameter syscalls (of 
which only splice() and futex() truly matter i suspect) via wrappers. 
This would fit a syslet atom into 32 bytes on x86. Hm?
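
(Just to make the size argument explicit - a back-of-the-envelope layout;
the exact field set is a guess, not the real structure:)

	#include <stdint.h>

	struct syslet_atom32 {
		uint32_t	flags;		/* stop/skip conditions etc. (assumed) */
		uint32_t	nr;		/* syscall number */
		uint32_t	ret_ptr;	/* user pointer for the return value (assumed) */
		uint32_t	next;		/* user pointer to the next atom (assumed) */
		uint32_t	args[4];	/* at most four arguments */
	};					/* 4 * 4 + 16 == 32 bytes on 32-bit x86 */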

>    second one is a nice kernel recursion thing :) I didnt want to
>    duplicate the sys_call_table though - maybe others have a better
>    idea.
> 
> What are the semantics of async sys_async_wait and async sys_async ?

agreed, that should be forbidden too.

Ingo

-->
---
 kernel/async.c |   12 +++++++++---
 kernel/async.h |    2 +-
 2 files changed, 10 insertions(+), 4 deletions(-)

Index: linux/kernel/async.c
===================================================================
--- linux.orig/kernel/async.c
+++ linux/kernel/async.c
@@ -721,7 +721,8 @@ static void refill_cachemiss_pool(struct
  * to finish or for all async processing to finish (whichever
  * comes first).
  */
-asmlinkage long sys_async_wait(unsigned long min_wait_events)
+asmlinkage long
+sys_async_wait(unsigned long min_wait_events, unsigned long user_curr_ring_idx)
 {
 	struct async_head *ah = current->ah;
 
@@ -730,12 +731,17 @@ asmlinkage long sys_async_wait(unsigned 
 
 	if (min_wait_events) {
 		spin_lock(&ah->lock);
-		ah->events_left = min_wait_events;
+		/*
+		 * Account any completions that happened since user-space
+		 * checked the ring:
+		 */
+		ah->events_left = min_wait_events -
+				  (ah->curr_ring_idx - user_curr_ring_idx);
 		spin_unlock(&ah->lock);
 	}
 
 	return wait_event_interruptible(ah->wait,
-		list_empty(&ah->busy_async_threads) || !ah->events_left);
+		list_empty(&ah->busy_async_threads) || ah->events_left <= 0);
 }
 
 /**
Index: linux/kernel/async.h
===================================================================
--- linux.orig/kernel/async.h
+++ linux/kernel/async.h
@@ -26,7 +26,7 @@ struct async_head {
 	struct list_head		ready_async_threads;
 	struct list_head		busy_async_threads;
 
-	unsigned long			events_left;
+	long				events_left;
 	wait_queue_head_t		wait;
 
 	struct async_head_user __user	*uah;

Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Linus Torvalds


On Tue, 13 Feb 2007, Andi Kleen wrote:

> > sys_exec and other security boundaries must be synchronous only
> > and not allow async "spill over" (consider setuid async binary patching)
> 
> He probably would need some generalization of Andrea's seccomp work.
> Perhaps using bitmaps? For paranoia I would suggest to white list, not black 
> list
> calls.

It's actually more likely a lot more efficient to let the system call 
itself do the sanity checking. That allows the common system calls (that 
*don't* need to even check) to just not do anything at all, instead of 
having some complex logic in the common system call execution trying to 
figure out for each system call whether it is ok or not.

Ie, we could just add to "do_fork()" (which is where all of the 
vfork/clone/fork cases end up) a simple case like

	err = wait_async_context();
	if (err)
		return err;

or

	if (in_async_context())
		return -EINVAL;

or similar. We need that "async_context()" function anyway for the other 
cases where we can't do other things concurrently, like changing the UID.

I would suggest that "wait_async_context()" would do (see the sketch
below):

 - if we are *in* an async context, return an error. We cannot wait for
   ourselves!
 - if we are the "real thread", wait for all async contexts to go away
   (and since we are the real thread, no new ones will be created, so this
   is not going to be an infinite wait)
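
Something like this, perhaps - a rough sketch only, where the
in-async-context test is a hypothetical per-task marker and the
async_head fields are borrowed from the patches in this thread rather
than being a final interface:

	static inline int in_async_context(void)
	{
		/* hypothetical per-task marker set for async worker threads: */
		return current->async_worker != NULL;
	}

	static int wait_async_context(void)
	{
		struct async_head *ah = current->ah;

		if (in_async_context())
			return -EINVAL;		/* cannot wait for ourselves */
		if (!ah)
			return 0;		/* no async contexts ever created */
		/*
		 * We are the "real" thread, so no new async contexts can
		 * appear behind our back; wait (interruptibly) for the
		 * existing ones to finish:
		 */
		return wait_event_interruptible(ah->wait,
				list_empty(&ah->busy_async_threads));
	}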

The new thing would be that wait_async_context() would possibly return 
-ERESTARTSYS (signal while an async context was executing), so any system 
call that does this would possibly return EINTR. Which "fork()" hasn't 
historically done. But if you have async events active, some operations 
likely cannot be done (setuid() and execve() come to mind), so you really
do need something like this.

And obviously it would only affect any program that actually would _use_ 
any of the suggested new interfaces, so it's not like a new error return 
would break anything old.

Linus


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread bert hubert
On Tue, Feb 13, 2007 at 09:58:48AM -0500, Benjamin LaHaise wrote:

> not present is mandatory).  I have looked into exactly this approach, and 
> it's only cheaper if the code is incomplete.  Linux's native threads are 
> pretty damned good.

Cheaper in time or in memory? Iow, would you be able to queue up as many
threads as syslets?

Bert

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Dmitry Torokhov

On 2/13/07, Alan <[EMAIL PROTECTED]> wrote:

> > A syslet is executed opportunistically: i.e. the syslet subsystem
> > assumes that the syslet will not block, and it will switch to a
> > cachemiss kernel thread from the scheduler. This means that even a
>
> How is scheduler fairness maintained ? and what is done for resource
> accounting here ?
>
> > that the kernel fills and user-space clears. Waiting is done via the
> > sys_async_wait() system call. Completion can be supressed on a per-atom
>
> They should be selectable as well iff possible.
>
> > Open issues:
>
> Let me add some more
>
>    sys_setuid/gid/etc need to be synchronous only and not occur
> while other async syscalls are running in parallel to meet current kernel
> assumptions.
>
>    sys_exec and other security boundaries must be synchronous only
> and not allow async "spill over" (consider setuid async binary patching)
>
> >  - sys_fork() and sys_async_exec() should be filtered out from the
> >    syscalls that are allowed - first one only makes sense with ptregs,
>
> clone and vfork. async_vfork is a real mindbender actually.
>
> >    second one is a nice kernel recursion thing :) I didnt want to
> >    duplicate the sys_call_table though - maybe others have a better
> >    idea.
>
> What are the semantics of async sys_async_wait and async sys_async ?



Ohh. OpenVMS lives forever ;) Me likeee ;)

--
Dmitry


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Andi Kleen
Alan <[EMAIL PROTECTED]> writes:

Funny, it sounds like batch() on steroids :) Ok, with an async context it
becomes somewhat more interesting.
 
>   sys_setuid/gid/etc need to be synchronous only and not occur
> while other async syscalls are running in parallel to meet current kernel
> assumptions.
> 
>   sys_exec and other security boundaries must be synchronous only
> and not allow async "spill over" (consider setuid async binary patching)

He probably would need some generalization of Andrea's seccomp work.
Perhaps using bitmaps? For paranoia I would suggest whitelisting, not
blacklisting, calls.

-Andi


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Arjan van de Ven
On Tue, 2007-02-13 at 09:58 -0500, Benjamin LaHaise wrote:
> On Tue, Feb 13, 2007 at 03:00:19PM +, Alan wrote:
> > > Open issues:
> > 
> > Let me add some more
> 
> Also: FPU state (especially important with the FPU and SSE memory copy 
> variants)

are these preserved over explicit system calls? 
-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Benjamin LaHaise
On Tue, Feb 13, 2007 at 03:00:19PM +, Alan wrote:
> > Open issues:
> 
> Let me add some more

Also: FPU state (especially important with the FPU and SSE memory copy 
variants), segment register bases on x86-64, interaction with set_fs()...  
There is no easy way of getting around the full thread context switch and 
its associated overhead (mucking around in CR0 is one of the more expensive 
bits of the context switch code path, and at the very least, setting the FPU 
not present is mandatory).  I have looked into exactly this approach, and 
it's only cheaper if the code is incomplete.  Linux's native threads are 
pretty damned good.
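
(For readers not familiar with the lazy-FPU dance being referred to, a
very rough sketch - all helper names are made up, this is not the actual
arch code:)

	struct fpu_task {			/* stand-in for the real task struct */
		int used_fpu;
		/* saved FPU/SSE register image would live here */
	};

	extern void save_fpu_state(struct fpu_task *t);	   /* fxsave - hypothetical */
	extern void restore_fpu_state(struct fpu_task *t); /* fxrstor - hypothetical */
	extern void set_cr0_ts(void);	/* the expensive CR0 write */
	extern void clear_cr0_ts(void);

	/* on every context switch: */
	static void fpu_switch_out(struct fpu_task *prev)
	{
		if (prev->used_fpu)
			save_fpu_state(prev);
		set_cr0_ts();	/* mark FPU "not present"; the next FPU insn traps */
	}

	/* in the device-not-available (#NM) trap handler: */
	static void fpu_switch_in(struct fpu_task *next)
	{
		clear_cr0_ts();
		restore_fpu_state(next);
		next->used_fpu = 1;
	}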

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.


Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

2007-02-13 Thread Alan
> A syslet is executed opportunistically: i.e. the syslet subsystem 
> assumes that the syslet will not block, and it will switch to a 
> cachemiss kernel thread from the scheduler. This means that even a 

How is scheduler fairness maintained ? and what is done for resource
accounting here ?

> that the kernel fills and user-space clears. Waiting is done via the 
> sys_async_wait() system call. Completion can be supressed on a per-atom 

They should be selectable as well iff possible.

> Open issues:

Let me add some more

sys_setuid/gid/etc need to be synchronous only and not occur
while other async syscalls are running in parallel to meet current kernel
assumptions.

sys_exec and other security boundaries must be synchronous only
and not allow async "spill over" (consider setuid async binary patching)

>  - sys_fork() and sys_async_exec() should be filtered out from the 
>    syscalls that are allowed - first one only makes sense with ptregs,

clone and vfork. async_vfork is a real mindbender actually.

>    second one is a nice kernel recursion thing :) I didnt want to
>    duplicate the sys_call_table though - maybe others have a better
>    idea.

What are the semantics of async sys_async_wait and async sys_async ?
