Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-14 Thread Chris Metcalf

On 3/14/2017 12:12 PM, Till Smejkal wrote:

> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
> > > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > > This sounds rather complicated.  Getting TLB flushing right seems
> > > > tricky.  Why not just map the same thing into multiple mms?
> > > This is exactly what happens at the end. The memory region that is
> > > described by the VAS segment will be mapped in the ASes that use the
> > > segment.
> > So why is this kernel feature better than just doing MAP_SHARED
> > manually in userspace?
> One advantage of VAS segments is that they can be globally queried by user
> programs, which means that VAS segments can be shared by applications that do
> not necessarily have to be related. If I am not mistaken, MAP_SHARED of pure
> in-memory data will only work if the tasks that share the memory region are
> related (i.e. have a common parent that initialized the shared mapping).
> Otherwise, the shared mapping has to be backed by a file.


True, but why is this bad?  The shared mapping will be memory resident
regardless, even if backed by a file (unless swapped out under heavy
memory pressure, but arguably that's a feature anyway).  More importantly,
having a file name is a simple and consistent way of identifying such
shared memory segments.

With a little work, you can also arrange to map such files into memory
at a fixed address in all participating processes, thus making internal
pointers work correctly.
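
As a concrete illustration (a minimal sketch only; the segment name and the
fixed address below are made-up examples, not anything from the patchset), two
unrelated processes can agree on a shared-memory file name and a load address
and each map the file there, so pointers stored inside the region stay valid in
every mapper:

/*
 * Minimal sketch: share pure in-memory data between two unrelated
 * processes via a named shared-memory file mapped at an agreed-upon
 * address.  Build with: cc demo.c -lrt
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEG_NAME "/demo_segment"                /* made-up name */
#define SEG_ADDR ((void *)0x700000000000UL)     /* agreed load address */
#define SEG_SIZE (1UL << 20)

int main(void)
{
    int fd = shm_open(SEG_NAME, O_CREAT | O_RDWR, 0600);

    if (fd < 0 || ftruncate(fd, SEG_SIZE) < 0) {
        perror("shm_open/ftruncate");
        return 1;
    }

    /*
     * MAP_FIXED pins the mapping at the agreed address so that internal
     * pointers stored in the segment are valid in every participating
     * process (note that MAP_FIXED silently replaces anything already
     * mapped in that range).
     */
    void *p = mmap(SEG_ADDR, SEG_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    printf("segment mapped at %p\n", p);
    return 0;
}

Both sides just open the same name; no common parent is required.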


> VAS segments on the other side allow sharing of pure in-memory data by
> arbitrary, not necessarily related, tasks without the need of a file. This
> becomes especially interesting if one combines VAS segments with non-volatile
> memory since one can keep data structures in the NVM and still be able to
> share them between multiple tasks.


I am not fully up to speed on NV/pmem stuff, but isn't that exactly what
the DAX mode is supposed to allow you to do?  If so, isn't sharing a
mapped file on a DAX filesystem on top of pmem equivalent to what
you're proposing?

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com




Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-14 Thread Till Smejkal
On Tue, 14 Mar 2017, Chris Metcalf wrote:
> On 3/14/2017 12:12 PM, Till Smejkal wrote:
> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
> > > > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > > > This sounds rather complicated.  Getting TLB flushing right seems
> > > > > tricky.  Why not just map the same thing into multiple mms?
> > > > This is exactly what happens at the end. The memory region that is
> > > > described by the VAS segment will be mapped in the ASes that use the
> > > > segment.
> > > So why is this kernel feature better than just doing MAP_SHARED
> > > manually in userspace?
> > One advantage of VAS segments is that they can be globally queried by user
> > programs, which means that VAS segments can be shared by applications that
> > do not necessarily have to be related. If I am not mistaken, MAP_SHARED of
> > pure in-memory data will only work if the tasks that share the memory
> > region are related (i.e. have a common parent that initialized the shared
> > mapping). Otherwise, the shared mapping has to be backed by a file.
> 
> True, but why is this bad?  The shared mapping will be memory resident
> regardless, even if backed by a file (unless swapped out under heavy
> memory pressure, but arguably that's a feature anyway).  More importantly,
> having a file name is a simple and consistent way of identifying such
> shared memory segments.
> 
> With a little work, you can also arrange to map such files into memory
> at a fixed address in all participating processes, thus making internal
> pointers work correctly.

I don't want to say that the interface provided by MAP_SHARED is bad. I am only
arguing that VAS segments and the interface that they provide have, in my
opinion, an advantage over the existing ones. However, Matthew Wilcox also
suggested in an earlier mail that VAS segments could be exported to user space
via a special purpose filesystem. This would enable users of VAS segments to
also just use some special files to set up the shared memory regions. But since
the VAS segment itself already knows where it has to be mapped in the virtual
address space of the process, establishing the shared memory region would be
very easy for the user.

> > VAS segments on the other side allow sharing of pure in-memory data by
> > arbitrary, not necessarily related, tasks without the need of a file. This
> > becomes especially interesting if one combines VAS segments with
> > non-volatile memory since one can keep data structures in the NVM and still
> > be able to share them between multiple tasks.
> 
> I am not fully up to speed on NV/pmem stuff, but isn't that exactly what
> the DAX mode is supposed to allow you to do?  If so, isn't sharing a
> mapped file on a DAX filesystem on top of pmem equivalent to what
> you're proposing?

If I read the documentation of DAX filesystems correctly, it is indeed possible
to use them to create files that live purely in NVM. I wasn't fully aware of
this feature. Thanks for the pointer.
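
For what it's worth, a minimal sketch of that DAX-based sharing (the mount
point and file name are assumptions for illustration): a file on a filesystem
mounted with '-o dax' on top of a pmem device is mapped MAP_SHARED, and since
DAX bypasses the page cache, every mapper accesses the NVM pages directly.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Sketch only: share a pmem-resident data structure between unrelated
 * processes through a file on a DAX filesystem, e.g. one mounted with
 *     mount -o dax /dev/pmem0 /mnt/pmem
 * The path below is a made-up example.
 */
int main(void)
{
    int fd = open("/mnt/pmem/shared_struct", O_CREAT | O_RDWR, 0600);

    if (fd < 0 || ftruncate(fd, 1 << 20) < 0) {
        perror("open/ftruncate");
        return 1;
    }

    /* With DAX there is no page-cache copy: loads and stores through this
     * mapping go straight to the persistent memory pages, so all mappers
     * see the same in-NVM data. */
    char *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    strcpy(p, "hello from pmem");
    munmap(p, 1 << 20);
    close(fd);
    return 0;
}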

However, the main contribution of this patchset is actually the idea of first
class virtual address spaces and that they can be used to allow processes to
have multiple different views on the system's main memory. For us, VAS segments
were another logical step in the same direction (from first class virtual
address spaces to first class address space segments). However, if there is
already functionality in the Linux kernel to achieve the exact same behavior,
there is no real need to add VAS segments. I will continue thinking about them
and either find a different situation where the currently available interface
is not sufficient/too complicated or drop VAS segments from future versions of
the patch set.

Till



Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-14 Thread Till Smejkal
On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> >> This sounds rather complicated.  Getting TLB flushing right seems
> >> tricky.  Why not just map the same thing into multiple mms?
> >
> > This is exactly what happens at the end. The memory region that is
> > described by the VAS segment will be mapped in the ASes that use the
> > segment.
> 
> So why is this kernel feature better than just doing MAP_SHARED
> manually in userspace?

One advantage of VAS segments is that they can be globally queried by user
programs, which means that VAS segments can be shared by applications that do
not necessarily have to be related. If I am not mistaken, MAP_SHARED of pure
in-memory data will only work if the tasks that share the memory region are
related (i.e. have a common parent that initialized the shared mapping).
Otherwise, the shared mapping has to be backed by a file. VAS segments on the
other side allow sharing of pure in-memory data by arbitrary, not necessarily
related, tasks without the need of a file. This becomes especially interesting
if one combines VAS segments with non-volatile memory since one can keep data
structures in the NVM and still be able to share them between multiple tasks.

> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> >> and not make it look magically different depending on which process
> >> maps it?  If you need a trampoline (which you do, of course), just
> >> write a trampoline in regular user code and map it manually.
> >
> > Did I understand you correctly that you are proposing that the switching
> > thread should make sure by itself that its code, stack, … memory regions
> > are properly set up in the new AS before/after switching into it? I think
> > this would make using first class virtual address spaces much more
> > difficult for user applications, to the extent that I am not even sure if
> > they can be used at all. At the moment, switching into a VAS is a very
> > simple operation for an application because the kernel will just simply do
> > the right thing.
> 
> Yes.  I think that having the same mm_struct look different from
> different tasks is problematic.  Getting it right in the arch code is
> going to be nasty.  The heuristics of what to share are also tough --
> why would text + data + stack or whatever you're doing be adequate?
> What if you're in a thread?  What if two tasks have their stacks in
> the same place?

The different ASes that a task can now have when it uses first class virtual
address spaces are not realized in the kernel by using only one mm_struct per
task that just looks different, but by using multiple mm_structs, one for each
AS that the task can execute in. When a task attaches a first class virtual
address space to itself to be able to use another AS, the kernel adds a
temporary mm_struct to this task that contains the mappings of the first class
virtual address space and the ones shared with the task's original AS. If a
thread now wants to switch into this attached first class virtual address
space, the kernel only changes the 'mm' and 'active_mm' pointers in the
task_struct of the thread to the temporary mm_struct and performs the
corresponding mm_switch operation. The original mm_struct of the thread will
not be changed.

Accordingly, I do not magically make mm_structs look different depending on the
task that uses them, but create temporary mm_structs that only contain mappings
to the same memory regions.
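
To make the mechanism concrete, here is a rough, hypothetical sketch of that
switch path (heavily simplified; apart from 'mm', 'active_mm' and switch_mm(),
the names are made up for illustration, and locking, reference counting and
architecture details are omitted):

#include <linux/sched.h>
#include <linux/sched/task.h>   /* task_lock()/task_unlock() */
#include <linux/mm_types.h>
#include <asm/mmu_context.h>    /* switch_mm() */

/*
 * Hypothetical, simplified sketch of the per-thread VAS switch described
 * above.  Only the calling thread's 'mm' and 'active_mm' pointers are
 * repointed to the temporary mm_struct that was created when the VAS was
 * attached; the thread's original mm_struct stays untouched.
 */
static void vas_switch_sketch(struct mm_struct *vas_mm)
{
    struct task_struct *tsk = current;
    struct mm_struct *old_mm = tsk->mm;

    task_lock(tsk);
    tsk->mm = vas_mm;
    tsk->active_mm = vas_mm;
    task_unlock(tsk);

    /* Program the MMU with the new address space; the architecture code
     * takes care of the necessary TLB handling. */
    switch_mm(old_mm, vas_mm, tsk);
}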

I agree that finding a good heuristic of what to share is difficult. At the
moment, all memory regions that are available in the task's original AS will
also be available when a thread switches into an attached first class virtual
address space (i.e. are shared). That means that VAS can mainly be used to
extend the AS of a task in the current state of the implementation. The reason
why I implemented the sharing in this way is that I didn't want to break shared
libraries. If I only shared code+heap+stack, shared libraries would not work
anymore after switching into a VAS.

> I could imagine something like a sigaltstack() mode that lets you set
> a signal up to also switch mm could be useful.

This is a very interesting idea. I will keep it in mind for future use cases of
multiple virtual address spaces per task.
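
For readers not familiar with it, standard sigaltstack() usage looks roughly
like this (ordinary, existing API; the hypothetical "also switch the mm" mode
Andy mentions does not exist today and is not shown):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Runs on the alternate stack because the handler is installed with
 * SA_ONSTACK below. */
static void handler(int sig)
{
    (void)sig;
}

int main(void)
{
    stack_t ss = {
        .ss_sp = malloc(SIGSTKSZ),
        .ss_size = SIGSTKSZ,
        .ss_flags = 0,
    };

    if (!ss.ss_sp || sigaltstack(&ss, NULL) < 0) {
        perror("sigaltstack");
        return 1;
    }

    struct sigaction sa = {
        .sa_handler = handler,
        .sa_flags = SA_ONSTACK,   /* deliver this signal on the alternate stack */
    };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    raise(SIGUSR1);   /* handler runs on ss.ss_sp instead of the normal stack */
    puts("back from the handler");
    return 0;
}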

Thanks
Till


Re: [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init'

2017-03-14 Thread Till Smejkal
On Tue, 14 Mar 2017, David Laight wrote:
> From: Linuxppc-dev Till Smejkal
> > Sent: 13 March 2017 22:14
> > The only way until now to create a new memory map was via the exported
> > function 'mm_alloc'. Unfortunately, this function not only allocates a new
> > memory map, but also completely initializes it. However, with the
> > introduction of first class virtual address spaces, some initialization
> > steps done in 'mm_alloc' are not applicable to the memory maps needed for
> > this feature and hence would lead to errors in the kernel code.
> > 
> > Instead of introducing a new function that can allocate and initialize
> > memory maps for first class virtual address spaces and potentially
> > duplicate some code, I decided to split the mm_alloc function as well as
> > the 'mm_init' function that it uses.
> > 
> > Now there are four functions exported instead of only one. The new
> > 'mm_alloc' function only allocates a new mm_struct and zeros it out. If one
> > wants to have the old behavior of mm_alloc, one can use the newly introduced
> > function 'mm_alloc_and_setup', which not only allocates a new mm_struct but
> > also fully initializes it.
> ...
> 
> That looks like bugs waiting to happen.
> You need unchanged code to fail to compile.

Thank you for this hint. I can give the new mm_alloc function a different name
so that code that uses the *old* mm_alloc function will fail to compile. I just
reused the old name when I wrote the code, because mm_alloc was only used in
very few locations in the kernel (2 times in the whole kernel source), which
made identifying and changing them very easy. I also don't think that there
will be many users of mm_alloc in the kernel in the future because it deals
with a relatively low level data structure. But if it is better to use a
different name for the new function, I am very happy to change this.
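
To illustrate the renaming idea (all of the names below except
'mm_alloc_and_setup' are hypothetical, and the signatures are only guesses,
not what the patches actually export):

/*
 * Hypothetical sketch of the split with a renamed allocation helper so
 * that old, unconverted callers of mm_alloc() fail to compile.
 */

/* Allocate a zeroed mm_struct only; the caller has to initialize it. */
struct mm_struct *mm_alloc_raw(void);

/* Initialize an already allocated mm_struct (split out of the old mm_init). */
struct mm_struct *mm_setup(struct mm_struct *mm, struct task_struct *p,
                           struct user_namespace *user_ns);

/* Old mm_alloc() behavior: allocate and fully initialize in one call. */
static inline struct mm_struct *mm_alloc_and_setup(void)
{
    struct mm_struct *mm = mm_alloc_raw();

    if (!mm)
        return NULL;
    return mm_setup(mm, current, current_user_ns());
}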

Till



Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-14 Thread Till Smejkal
On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> On Mon, Mar 13, 2017 at 3:14 PM, Till Smejkal wrote:
> > This patchset extends the kernel memory management subsystem with a new
> > type of address spaces (called VAS) which can be created and destroyed
> > independently of processes by a user in the system. During its lifetime
> > such a VAS can be attached to processes by the user which allows a process
> > to have multiple address spaces and thereby multiple, potentially
> > different, views on the system's main memory. During its execution the
> > threads belonging to the process are able to switch freely between the
> > different attached VAS and the process' original AS enabling them to
> > utilize the different available views on the memory.
> 
> Sounds like the old SKAS feature for UML.

I haven't heard of this feature before, but after briefly looking at the
description on the UML website, it actually has some similarities with what I
am proposing. But as far as I can see this was not merged into the mainline
kernel, was it? In addition, I think that first class virtual address spaces go
even one step further by allowing ASes to live independently of processes.

> > In addition to the concept of first class virtual address spaces, this
> > patchset introduces yet another feature called VAS segments. VAS segments
> > are memory regions which have a fixed size and position in the virtual
> > address space and can be shared between multiple first class virtual
> > address spaces. Such shareable memory regions are especially useful for
> > in-memory pointer-based data structures or other pure in-memory data.
> 
> This sounds rather complicated.  Getting TLB flushing right seems
> tricky.  Why not just map the same thing into multiple mms?

This is exactly what happens at the end. The memory region that is described by
the VAS segment will be mapped in the ASes that use the segment.

> >
> >         |   VAS   | processes |
> > --------+---------+-----------+
> > switch  |  468ns  |   1944ns  |
> 
> The solution here is IMO to fix the scheduler.

IMHO it will be very difficult for the scheduler code to reach the same
switching time as the pure VAS switch because switching between VAS does not
involve saving any registers or FPU state and does not require selecting the
next runnable task. A VAS switch is basically a system call that just changes
the AS of the current thread, which makes it a very lightweight operation.
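
As a rough sketch of the programming model (the wrapper names, arguments and
stub bodies below are placeholders I made up so the example is self-contained;
the real system calls are defined by the patchset):

#include <stdio.h>

/*
 * Placeholder stubs so this sketch compiles on an unpatched system; a real
 * program would invoke the system calls introduced by the patchset instead.
 * Names and signatures are illustrative only.
 */
static int vas_create(const char *name, int mode) { (void)name; (void)mode; return 1; }
static int vas_attach(int pid, int vid, int flags) { (void)pid; (void)vid; (void)flags; return 0; }
static int vas_switch(int vid) { (void)vid; return 0; }

int main(void)
{
    int vid = vas_create("scratch", 0600);      /* create a first class VAS */

    if (vid < 0 || vas_attach(0, vid, 0) < 0)   /* attach it to this process */
        return 1;

    /*
     * The switch itself is conceptually a single system call that changes
     * the AS of the calling thread; no registers or FPU state are saved
     * and no scheduling decision is made.
     */
    if (vas_switch(vid) < 0)
        return 1;

    puts("now executing in the attached VAS (a no-op in this sketch)");
    return 0;
}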

> Also, FWIW, I have patches (that need a little work) that will make
> switch_mm() wy faster on x86.

These patches will also improve the speed of the VAS switch operation. We are 
also
using the switch_mm function in the background to perform the actual hardware 
switch
between the two ASes. The main reason why the VAS switch is faster than the task
switch is that it just has to do fewer things.

> > At the current state of the development, first class virtual address spaces
> > have one limitation, that we haven't been able to solve so far. The feature
> > allows, that different threads of the same process can execute in different
> > AS at the same time. This is possible, because the VAS-switch operation
> > only changes the active mm_struct for the task_struct of the calling
> > thread. However, when a thread switches into a first class virtual address
> > space, some parts of its original AS are duplicated into the new one to
> > allow the thread to continue its execution at its current state.
> 
> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> and not make it look magically different depending on which process
> maps it?  If you need a trampoline (which you do, of course), just
> write a trampoline in regular user code and map it manually.

Did I understand you correctly that you are proposing that the switching thread
should make sure by itself that its code, stack, … memory regions are properly
set up in the new AS before/after switching into it? I think this would make
using first class virtual address spaces much more difficult for user
applications, to the extent that I am not even sure if they can be used at all.
At the moment, switching into a VAS is a very simple operation for an
application because the kernel will just simply do the right thing.

Till


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-14 Thread Andy Lutomirski
On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> This sounds rather complicated.  Getting TLB flushing right seems
>> tricky.  Why not just map the same thing into multiple mms?
>
> This is exactly what happens at the end. The memory region that is described
> by the VAS segment will be mapped in the ASes that use the segment.

So why is this kernel feature better than just doing MAP_SHARED
manually in userspace?


>> Ick.  Please don't do this.  Can we please keep an mm as just an mm
>> and not make it look magically different depending on which process
>> maps it?  If you need a trampoline (which you do, of course), just
>> write a trampoline in regular user code and map it manually.
>
> Did I understand you correctly that you are proposing that the switching
> thread should make sure by itself that its code, stack, … memory regions are
> properly set up in the new AS before/after switching into it? I think this
> would make using first class virtual address spaces much more difficult for
> user applications, to the extent that I am not even sure if they can be used
> at all. At the moment, switching into a VAS is a very simple operation for an
> application because the kernel will just simply do the right thing.

Yes.  I think that having the same mm_struct look different from
different tasks is problematic.  Getting it right in the arch code is
going to be nasty.  The heuristics of what to share are also tough --
why would text + data + stack or whatever you're doing be adequate?
What if you're in a thread?  What if two tasks have their stacks in
the same place?

I could imagine something like a sigaltstack() mode that lets you set
a signal up to also switch mm could be useful.


RE: [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init'

2017-03-14 Thread David Laight
From: Linuxppc-dev Till Smejkal
> Sent: 13 March 2017 22:14
> The only way until now to create a new memory map was via the exported
> function 'mm_alloc'. Unfortunately, this function not only allocates a new
> memory map, but also completely initializes it. However, with the
> introduction of first class virtual address spaces, some initialization
> steps done in 'mm_alloc' are not applicable to the memory maps needed for
> this feature and hence would lead to errors in the kernel code.
> 
> Instead of introducing a new function that can allocate and initialize
> memory maps for first class virtual address spaces and potentially
> duplicate some code, I decided to split the mm_alloc function as well as
> the 'mm_init' function that it uses.
> 
> Now there are four functions exported instead of only one. The new
> 'mm_alloc' function only allocates a new mm_struct and zeros it out. If one
> wants to have the old behavior of mm_alloc, one can use the newly introduced
> function 'mm_alloc_and_setup', which not only allocates a new mm_struct but
> also fully initializes it.
...

That looks like bugs waiting to happen.
You need unchanged code to fail to compile.

David


