Re: p_vmspace in syscall

2007-07-04 Thread Robert Watson


On Mon, 2 Jul 2007, Nicolas Cormier wrote:

I am trying to map some data allocated in kernel to a user process (via a 
syscall). I need the proc's vmspace, but the value of p_vmspace of the input 
proc argument is NULL ... How can I get a valid vmspace ?


When operating in a system call, the 'td' argument to the system call function 
is the current thread pointer.  You can follow td->td_proc to get to the 
current process (and therefore, its address space).  In general, I prefer 
mapping user pages into kernel instead of kernel pages into user space, as it 
reduces the chances of leakage of kernel data to user space, and there are 
some useful primitives for making this easier.  For example, take a look at 
the sf_buf infrastructure used for things like socket zero-copy send, which 
manages a temporary kernel mapping for a page.
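
A minimal sketch of that pointer chain, written so it compiles in user space. 
The struct definitions are simplified stand-ins (an assumption for 
illustration; the real ones in <sys/proc.h> and <vm/vm_map.h> have many more 
fields), and thread_user_map() is a hypothetical helper name, not a kernel 
function:

```c
#include <stddef.h>

/*
 * Simplified stand-ins for the kernel types. ASSUMPTION: only the fields
 * needed to show the pointer chain are modelled here.
 */
struct vm_map  { int dummy; };
struct vmspace { struct vm_map vm_map; };      /* the map is embedded */
struct proc    { struct vmspace *p_vmspace; };
struct thread  { struct proc *td_proc; };

/* Hypothetical helper: thread -> process -> address space -> map. */
struct vm_map *
thread_user_map(struct thread *td)
{
	if (td == NULL || td->td_proc == NULL ||
	    td->td_proc->p_vmspace == NULL)
		return (NULL);
	/* vm_map is a member of struct vmspace, so take its address. */
	return (&td->td_proc->p_vmspace->vm_map);
}
```

In a real system call handler the same expression would start from the 'td' 
argument (or from curthread).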


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: p_vmspace in syscall

2007-07-04 Thread Nicolas Cormier

On 7/4/07, Robert Watson [EMAIL PROTECTED] wrote:


On Mon, 2 Jul 2007, Nicolas Cormier wrote:

 I am trying to map some data allocated in kernel to a user process (via a
 syscall). I need the proc's vmspace, but the value of p_vmspace of the input
 proc argument is NULL ... How can I get a valid vmspace ?

When operating in a system call, the 'td' argument to the system call function
is the current thread pointer.  You can follow td->td_proc to get to the
current process (and therefore, its address space).  In general, I prefer
mapping user pages into kernel instead of kernel pages into user space, as it
reduces the chances of leakage of kernel data to user space, and there are
some useful primitives for making this easier.  For example, take a look at
the sf_buf infrastructure used for things like socket zero-copy send, which
manages a temporary kernel mapping for a page.



Yes, Roman told me in private that I was wrong about the first argument; I
thought that it was a proc*...

For my module I am trying to create a simple interface for a network allocator.
User code should look like this:

unsigned id;
void* data = netmalloc(host, size, &id);
memcpy(data, toto, sizeof(toto));
netdetach(data);

and later in another process:
void* data = netattach(host, id);
...
netfree(data);

The netmalloc syscall does something like this:
- query the distant host to allocate size bytes
- receive an id from the distant host
- malloc size bytes in the kernel
- map the buffer into the user process (*)

netdetach syscall:
- send the data to the distant host

netattach syscall:
- get the data from the host
- malloc size bytes in the kernel
- map the buffer into the user process (*)

* I have already looked at the function vm_pgmoveco
(http://fxr.watson.org/fxr/source/kern/kern_subr.c?v=RELENG62#L78)

I used vm_pgmoveco as follows:

vm_map_t mapa = &proc->p_vmspace->vm_map;
size = round_page(size);
void* data = malloc(size, M_NETMALLOC, M_WAITOK);
vm_offset_t addr = vm_map_min(mapa);
vm_map_find(mapa, NULL, 0, &addr, size, TRUE, VM_PROT_ALL,
    VM_PROT_ALL, MAP_NOFAULT);
vm_pgmoveco(mapa, (vm_offset_t)data, addr);


With this I get a panic in vm_page_insert, and I am not sure I
understand the reason for it. Can't I have multiple virtual
pages mapped to the same physical page?

Thanks!
--
Nicolas Cormier


Re: p_vmspace in syscall

2007-07-04 Thread Robert Watson


On Wed, 4 Jul 2007, Nicolas Cormier wrote:


On 7/4/07, Robert Watson [EMAIL PROTECTED] wrote:


On Mon, 2 Jul 2007, Nicolas Cormier wrote:

I am trying to map some data allocated in kernel to a user process (via a 
syscall). I need the proc's vmspace, but the value of p_vmspace of the 
input proc argument is NULL ... How can I get a valid vmspace ?


When operating in a system call, the 'td' argument to the system call 
function is the current thread pointer.  You can follow td->td_proc to get 
to the current process (and therefore, its address space).  In general, I 
prefer mapping user pages into kernel instead of kernel pages into user 
space, as it reduces the chances of leakage of kernel data to user space, 
and there are some useful primitives for making this easier.  For example, 
take a look at the sf_buf infrastructure used for things like socket 
zero-copy send, which manages a temporary kernel mapping for a page.


Yes, Roman told me in private that I was wrong about the first argument; I 
thought that it was a proc*...


For my module I try to create a simple interface of a network allocator: 
User code should look like this:


unsigned id;
void* data = netmalloc(host, size, &id);
memcpy(data, toto, sizeof(toto));
netdetach(data);

and later in another process:
void* data = netattach(host, id);
...
netfree(data);

netmalloc syscall does something like that:
- query distant host to allocate size
- receive an id from distant host
- malloc in kernel size
- map the buffer to user process (*)

netdetach syscall:
- send data to distant host

netattach syscall:
- get data from host
- malloc in kernel size
- map the buffer to user process (*)

* I have already looked at the function vm_pgmoveco
(http://fxr.watson.org/fxr/source/kern/kern_subr.c?v=RELENG62#L78)

I used vm_pgmoveco as follows:

vm_map_t mapa = &proc->p_vmspace->vm_map;
size = round_page(size);
void* data = malloc(size, M_NETMALLOC, M_WAITOK);
vm_offset_t addr = vm_map_min(mapa);
vm_map_find(mapa, NULL, 0, &addr, size, TRUE, VM_PROT_ALL,
    VM_PROT_ALL, MAP_NOFAULT);
vm_pgmoveco(mapa, (vm_offset_t)data, addr);


With this I get a panic in vm_page_insert, and I am not sure I understand 
the reason for it. Can't I have multiple virtual pages mapped to the same 
physical page?


I think part of what you're running into here is a conceptual issue.  The 
pages allocated by malloc(9) belong to the kernel memory allocator, and are 
generally managed by the slab allocator.  While in principle you can map them 
into user space, you're going to have to set up a lot of book-keeping to 
properly free them again later, etc.  There are really two approaches you 
could be looking at:


(1) The user app allocates memory pages, perhaps using mmap() to map anonymous
memory or a file.  You then borrow those pages to use in-kernel, mapping
as required.

(2) Your kernel code allocates pages directly from the VM system, possibly
anonymous swap-backed pages from the page allocator, and maps them into
the kernel as required.

In either case, you'll need to think about address space limits, especially if 
the buffer is large -- the kernel address space on 32-bit systems is limited 
in size, since it shares the address space with a user application.  On 64-bit 
systems, this is not an issue.  You'll also need to make sure that the pages 
are both paged in and pinned in memory.  So before we talk about the details 
of the calls, we should think about how you plan to use the memory.
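
As a rough user-space illustration of approach (1), the sketch below maps 
anonymous memory with mmap(), touches it so the pages are faulted in, and 
tries to wire it with mlock().  alloc_user_pages() is a hypothetical helper 
name, and mlock() is only the user-visible analogue of pinning; the kernel 
side would still have to look up and hold the pages itself.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANON on glibc in strict -std modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate npages of zero-filled anonymous memory.
 * On success returns the page-aligned buffer, stores its length in *lenp,
 * and sets *wired to 1 if mlock() succeeded (it can fail under
 * RLIMIT_MEMLOCK, in which case the pages remain pageable).
 */
void *
alloc_user_pages(size_t npages, size_t *lenp, int *wired)
{
	long ps = sysconf(_SC_PAGESIZE);
	size_t len;
	void *buf;

	if (ps <= 0 || npages == 0)
		return (NULL);
	len = npages * (size_t)ps;
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (buf == MAP_FAILED)
		return (NULL);
	memset(buf, 0, len);                 /* touch every page */
	*wired = (mlock(buf, len) == 0);     /* best-effort pinning */
	*lenp = len;
	return (buf);
}
```

The process would then pass the buffer address and length to the system 
call, and release the memory with munmap() when done.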


How much memory are we talking about -- enough to potentially run into kernel 
address space problems on 32-bit systems?  How long will the mappings persist 
-- do you map them into kernel for a brief period to fill them, and then leave 
them mapped into user space, or is this going to be a persistent shared 
mapping over a very long period of time?  Is the memory going to be pageable? 
How will it interact with things like mprotect(), msync(), etc?  What should 
happen if the pages are released by the process using munmap() or by mapping 
over the region with mmap()?  What should happen in a child process if a 
process forks after netattach() and the parent calls netdetach()?  What 
happens if the process calls send() using a source address in the memory 
region, and zero-copy sockets are enabled, which would normally lead the page 
to be borrowed from the user process?


The underlying point here is that there is a model by which VM is managed -- 
pages, pagers, memory objects, mappings, address spaces, etc.  We can't just 
talk about pages being shared or mapped, we need to think about what is to be 
accomplished, and how to map that into the abstractions that already exist. 
Memory comes in different flavours, and generally speaking, you don't want to 
use pages that come from malloc(9) for sharing with userspace, so we need to 
think about what kind of memory you do need.


Robert N M Watson
Computer Laboratory
University of Cambridge

Re: p_vmspace in syscall

2007-07-04 Thread Nicolas Cormier

On 7/4/07, Robert Watson [EMAIL PROTECTED] wrote:

How much memory are we talking about -- enough to potentially run into kernel
address space problems on 32-bit systems?  How long will the mappings persist
-- do you map them into kernel for a brief period to fill them, and then leave
them mapped into user space, or is this going to be a persistent shared
mapping over a very long period of time?  Is the memory going to be pageable?
How will it interact with things like mprotect(), msync(), etc?  What should
happen if the pages are released by the process using munmap() or by mapping
over the region with mmap()?  What should happen in a child process if a
process forks after netattach() and the parent calls netdetach()?  What
happens if the process calls send() using a source address in the memory
region, and zero-copy sockets are enabled, which would normally lead the page
to be borrowed from the user process?


Currently I'm just trying to play with kernel/modules/vm ... I'm a
newbie in kernel development and I just want to make a little
prototype of an in-kernel network allocator.
To start I only need to map a page (1024 bytes) from kernel to user process.
This memory will never be used by the kernel between the call of
net(malloc/attach) and the call of net(detach/free). So user and
kernel will never use this page at the same time.
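
For reference, mappings are made in whole pages, and the page size is 
normally larger than 1024 bytes (4096 on i386 and amd64), so a 1024-byte 
buffer still occupies one full page once mapped. A small user-space sketch 
of the rounding that round_page() performs in the kernel; 
round_up_to_page() and mapped_size() are hypothetical names used only here:

```c
#include <stddef.h>
#include <unistd.h>

/*
 * User-space analogue of the kernel's round_page(): round len up to a
 * multiple of pagesize, which must be a power of two.
 */
size_t
round_up_to_page(size_t len, size_t pagesize)
{
	return ((len + pagesize - 1) & ~(pagesize - 1));
}

/* Bytes that would actually be mapped for a request of len bytes. */
size_t
mapped_size(size_t len)
{
	long ps = sysconf(_SC_PAGESIZE);

	return (ps > 0 ? round_up_to_page(len, (size_t)ps) : 0);
}
```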


The underlying point here is that there is a model by which VM is managed --
pages, pagers, memory objects, mappings, address spaces, etc.  We can't just
talk about pages being shared or mapped, we need to think about what is to be
accomplished, and how to map that into the abstractions that already exist.
Memory comes in different flavours, and generally speaking, you don't want to
use pages that come from malloc(9) for sharing with userspace, so we need to
think about what kind of memory you do need.


Thank you for your answer. Right now, I just want to do it as easily
as possible, I don't know if this kind of project could interest other
persons ? It is ok for me to work more on it later on, if there is any
further interest in doing it.
--
Nicolas Cormier


Re: p_vmspace in syscall

2007-07-04 Thread Robert Watson


On Wed, 4 Jul 2007, Nicolas Cormier wrote:

Currently I'm just trying to play with kernel/modules/vm ... I'm a newbie in 
kernel development and I just want to make a little prototype of an 
in-kernel network allocator. To start I only need to map a page (1024 bytes) 
from kernel to user process. This memory will never be used by the kernel 
between the call of net(malloc/attach) and the call of net(detach/free). So 
user and kernel will never use this page at the same time.


The underlying point here is that there is a model by which VM is managed 
-- pages, pagers, memory objects, mappings, address spaces, etc.  We can't 
just talk about pages being shared or mapped, we need to think about what 
is to be accomplished, and how to map that into the abstractions that 
already exist. Memory comes in different flavours, and generally speaking, 
you don't want to use pages that come from malloc(9) for sharing with 
userspace, so we need to think about what kind of memory you do need.


Thank you for your answer. Right now, I just want to do it as easily as 
possible, I don't know if this kind of project could interest other persons 
? It is ok for me to work more on it later on, if there is any further 
interest in doing it.


What do you mean by a network allocator?  How do you plan to use these pages?

If you haven't already, you should look at the zero-copy socket code in 
uipc_cow.c.  The main criticism of this approach has been that it uses 
copy-on-write, leading to potential IPIs for VM shootdowns, etc.  An 
alternative, more along the lines of IO-Lite, would be to allow user space to 
explicitly abandon the page on send, then map a new page to replace it.  In 
which case you might consider a variation on the send system call that accepts 
only page-aligned arguments and has the effect of unmapping the pages that are 
sent.  In neither case, on the transmit side, does this require a 
modification to the kernel memory allocator.
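
To make the calling convention of such a send variant concrete, here is a 
user-space emulation. It is only a sketch under stated assumptions: 
send_abandon() is a hypothetical name, and because it uses write() the data 
is still copied, so it shows the page-aligned interface and the state of the 
buffer after the call, not the zero-copy path itself.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANON on glibc in strict -std modes */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical emulation of a "send and abandon" call. Only page-aligned
 * buffers and whole-page lengths are accepted. After the data is written,
 * the caller's pages are replaced by fresh zero-filled anonymous pages at
 * the same address. Returns 0 on success, -1 with errno set on failure.
 */
int
send_abandon(int fd, void *buf, size_t len)
{
	long ps = sysconf(_SC_PAGESIZE);
	const char *p = buf;
	size_t left = len;

	if (ps <= 0 || len == 0 || (uintptr_t)buf % (uintptr_t)ps != 0 ||
	    len % (size_t)ps != 0) {
		errno = EINVAL;
		return (-1);
	}
	while (left > 0) {
		ssize_t n = write(fd, p, left);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		p += n;
		left -= (size_t)n;
	}
	/* Drop the old pages and map new ones over the same range. */
	if (mmap(buf, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0) == MAP_FAILED)
		return (-1);
	return (0);
}
```

An in-kernel version would hand the pages themselves to the network stack 
rather than copying them.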


The receive side has always been more tricky to deal with...

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: p_vmspace in syscall

2007-07-04 Thread Nicolas Cormier

On 7/4/07, Robert Watson [EMAIL PROTECTED] wrote:

What do you mean by a network allocator?  How do you plan to use these pages?


First, I just want to access a local copy of a distant buffer.
After that, the goal is to share memory between hosts (no concurrent access).


If you haven't already, you should look at the zero-copy socket code in
uipc_cow.c.  The main criticism of this approach has been that it uses
copy-on-write, leading to potential IPIs for VM shootdowns, etc.  An
alternative, more along the lines of IO-Lite, would be to allow user space to
explicitly abandon the page on send, then map a new page to replace it.  In
which case you might consider a variation on the send system call that accepts
only page-aligned arguments and has the effect of unmapping the pages that are
sent.  In neither case, on the transmit side, does this require a
modification to the kernel memory allocator.

The receive side has always been more tricky to deal with...



Ok I will take a look at uipc_cow.c,
Thank you
--
Nicolas Cormier


Re: p_vmspace in syscall

2007-07-04 Thread Nicolas Cormier

On 7/4/07, Steve Watt [EMAIL PROTECTED] wrote:

In [EMAIL PROTECTED],
  Nicolas Cormier [EMAIL PROTECTED] wrote:
On 7/4/07, Robert Watson [EMAIL PROTECTED] wrote:

 When operating in a system call, the 'td' argument to the system call function
 is the current thread pointer.  You can follow td->td_proc to get to the
 current process (and therefore, its address space).  In general, I prefer
 mapping user pages into kernel instead of kernel pages into user space, as it
 reduces the chances of leakage of kernel data to user space, and there are
 some useful primitives for making this easier.  For example, take a look at
 the sf_buf infrastructure used for things like socket zero-copy send, which
 manages a temporary kernel mapping for a page.


netmalloc syscall does something like that:
- query distant host to allocate size
- receive an id from distant host
- malloc in kernel size
- map the buffer to user process (*)

netdetach syscall:
- send data to distant host

netattach syscall:
- get data from host
- malloc in kernel size
- map the buffer to user process (*)

What this really sounds like is network shared memory or remote DMA.
I would architect this to remove as much of the management code as
possible from the kernel (i.e. query the distant host, get ID, etc.)
into a userland daemon.  Depending on the exact semantics you want,
you'll probably need to write a new kind of pager.

Basically, at the netmalloc call, you would simply pass the request
back to the userland daemon, which would format it in whatever way is
needed to cross the net, send the request off, receive the ID, and
give association information back to the kernel (number of pages,
protections, whatever).  Then the call would map the new pages into
the userland process just like it was a shared memory segment.

At detach time, the message would again go to the userland daemon,
which would map the pages locally and probably use a zero-copy send
to ship the data to the remote host.

There are some fun potential interactions in there in code I haven't
looked at in a long time.  I'll resist the urge to dive in and hack
something together, since VM systems have a way of being tricky in
unexpected places.


Thank you for this post ! Your design should be a good start.
--
Nicolas Cormier


Re: p_vmspace in syscall

2007-07-04 Thread Steve Watt
In [EMAIL PROTECTED],
  Nicolas Cormier [EMAIL PROTECTED] wrote:
On 7/4/07, Robert Watson [EMAIL PROTECTED] wrote:

 When operating in a system call, the 'td' argument to the system call function
 is the current thread pointer.  You can follow td->td_proc to get to the
 current process (and therefore, its address space).  In general, I prefer
 mapping user pages into kernel instead of kernel pages into user space, as it
 reduces the chances of leakage of kernel data to user space, and there are
 some useful primitives for making this easier.  For example, take a look at
 the sf_buf infrastructure used for things like socket zero-copy send, which
 manages a temporary kernel mapping for a page.


netmalloc syscall does something like that:
- query distant host to allocate size
- receive an id from distant host
- malloc in kernel size
- map the buffer to user process (*)

netdetach syscall:
- send data to distant host

netattach syscall:
- get data from host
- malloc in kernel size
- map the buffer to user process (*)

What this really sounds like is network shared memory or remote DMA.
I would architect this to remove as much of the management code as
possible from the kernel (i.e. query the distant host, get ID, etc.)
into a userland daemon.  Depending on the exact semantics you want,
you'll probably need to write a new kind of pager.

Basically, at the netmalloc call, you would simply pass the request
back to the userland daemon, which would format it in whatever way is
needed to cross the net, send the request off, receive the ID, and
give association information back to the kernel (number of pages,
protections, whatever).  Then the call would map the new pages into
the userland process just like it was a shared memory segment.

At detach time, the message would again go to the userland daemon,
which would map the pages locally and probably use a zero-copy send
to ship the data to the remote host.

There are some fun potential interactions in there in code I haven't
looked at in a long time.  I'll resist the urge to dive in and hack
something together, since VM systems have a way of being tricky in
unexpected places.
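
As one possible shape for the daemon's side of this, here is a sketch of a
request being formatted to cross the net. Everything in it is an assumption
made for illustration: the thread defines no wire protocol, and the names
netmem_request, netmem_encode(), netmem_decode() and the operation codes
are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical operation codes; not taken from any existing protocol. */
enum netmem_op { NETMEM_ALLOC = 1, NETMEM_DETACH = 2, NETMEM_ATTACH = 3 };

/* Hypothetical request the daemon would send to the remote host. */
struct netmem_request {
	uint32_t op;    /* one of enum netmem_op */
	uint32_t id;    /* buffer ID (0 when asking for a new allocation) */
	uint32_t size;  /* buffer size in bytes */
};

#define NETMEM_WIRE_SIZE 12   /* three 32-bit fields, no padding */

/* Serialize the request in network byte order. */
void
netmem_encode(const struct netmem_request *req,
    unsigned char out[NETMEM_WIRE_SIZE])
{
	uint32_t f[3] = { htonl(req->op), htonl(req->id), htonl(req->size) };

	memcpy(out, f, sizeof(f));
}

/* Parse a request received from the network. */
void
netmem_decode(const unsigned char in[NETMEM_WIRE_SIZE],
    struct netmem_request *req)
{
	uint32_t f[3];

	memcpy(f, in, sizeof(f));
	req->op = ntohl(f[0]);
	req->id = ntohl(f[1]);
	req->size = ntohl(f[2]);
}
```

Keeping this formatting in the daemon means the kernel only ever sees the
resulting association information (ID, number of pages, protections).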

-- 
Steve Watt KD6GGD  PP-ASEL-IA  ICBM: 121W 56' 57.5 / 37N 20' 15.3
 Internet: steve @ Watt.COM  Whois: SW32-ARIN
   Free time?  There's no such thing.  It just comes in varying prices...


Re: p_vmspace in syscall

2007-07-03 Thread Nicolas Cormier

On 7/2/07, Nicolas Cormier [EMAIL PROTECTED] wrote:

Hi,
I am trying to map some data allocated in kernel to a user process
(via a syscall).
I need the proc's vmspace, but the value of p_vmspace of the input
proc argument is NULL ...
How can I get a valid vmspace ?

Thanks !


Ok, the syscall function is passed a proc* as its argument. I don't know where
this proc* comes from, but it works with:
struct thread *td = curthread;
p = td->td_proc;
--
Nicolas Cormier


Re: p_vmspace in syscall

2007-07-03 Thread Roman Divacky
 Ok, syscall function passed a proc* as arguments, I don't know where

this does not make any sense... userland processes have no way to
determine where a proc is stored...

what exactly are you trying to achieve?