Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Till Smejkal
On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> On Thu, 16 Mar 2017, Till Smejkal wrote:
> > On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> > > Why do we need yet another mechanism to represent something which looks
> > > like a file instead of simply using existing mechanisms and extend them?
> > 
> > You are right. I also recognized during the discussion with Andy, Chris,
> > Matthew, Luck, Rich and the others that there are already other
> > techniques in the Linux kernel that can achieve the same functionality
> > when combined. As I also said to the others, I will drop the VAS segments
> > for future versions. The first class virtual address space feature was
> > the more interesting part of the patchset anyway.
> 
> While you are at it, could you please drop this 'first class' marketing as
> well? It has zero technical value, really.

Yes of course. I am sorry for the trouble that I caused already.

Thanks
Till


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Thomas Gleixner
On Thu, 16 Mar 2017, Till Smejkal wrote:
> On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> > Why do we need yet another mechanism to represent something which looks
> > like a file instead of simply using existing mechanisms and extend them?
> 
> You are right. I also recognized during the discussion with Andy, Chris,
> Matthew, Luck, Rich and the others that there are already other
> techniques in the Linux kernel that can achieve the same functionality
> when combined. As I also said to the others, I will drop the VAS segments
> for future versions. The first class virtual address space feature was
> the more interesting part of the patchset anyway.

While you are at it, could you please drop this 'first class' marketing as
well? It has zero technical value, really.

Thanks,

tglx


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Till Smejkal
On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> Why do we need yet another mechanism to represent something which looks
> like a file instead of simply using existing mechanisms and extend them?

You are right. I also recognized during the discussion with Andy, Chris,
Matthew, Luck, Rich and the others that there are already other techniques
in the Linux kernel that can achieve the same functionality when combined.
As I also said to the others, I will drop the VAS segments for future
versions. The first class virtual address space feature was the more
interesting part of the patchset anyway.

Thanks
Till


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Till Smejkal
On Wed, 15 Mar 2017, Luck, Tony wrote:
> On Wed, Mar 15, 2017 at 03:02:34PM -0700, Till Smejkal wrote:
> > I don't agree here. VAS segments are basically in-memory files that are
> > handled by the kernel directly without using a file system. Hence, if an
> > application uses a VAS segment to store data the same rules apply as if
> > it uses a file. Everything that it saves in the VAS segment might be
> > accessible by other applications. An application using VAS segments
> > should be aware of this fact. In addition, the resources that are
> > represented by a VAS segment are not leaked. As I said, VAS segments are
> > much like files. Hence, if you don't want to use them any more, delete
> > them. But as with files, the kernel will not delete them for you
> > (although something like this can be added).
> 
> So how do they differ from shmget(2), shmat(2), shmdt(2), shmctl(2)?
> 
> Apart from VAS having better names, instead of silly "key_t key" ones.

Unfortunately, I have to admit that the VAS segments don't differ from shm* a
lot. The implementation is different, but the functionality that you can
achieve with it is very similar. I am sorry. We should have looked more
closely at the whole functionality that is provided by the shmem subsystem
before working on VAS segments.

However, VAS segments are not the key part of this patch set. The more
interesting functionality in our opinion is the introduction of first class
virtual address spaces and what they can be used for. VAS segments were just
another logical step for us (from first class virtual address spaces to first
class virtual address space segments), but since their functionality can be
achieved with various other already existing features of the Linux kernel, I
will probably drop them in future versions of the patchset.

Thanks
Till


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Thomas Gleixner
On Wed, 15 Mar 2017, Till Smejkal wrote:
> On Wed, 15 Mar 2017, Andy Lutomirski wrote:

> > > VAS segments on the other hand would provide a functionality to
> > > achieve the same without the need of any mounted filesystem. However,
> > > I agree that this is just a small advantage compared to what can
> > > already be achieved with the existing functionality provided by the
> > > Linux kernel.
> > 
> > I see this "small advantage" as "resource leak and security problem".
> 
> I don't agree here. VAS segments are basically in-memory files that are
> handled by the kernel directly without using a file system. Hence, if an

Why do we need yet another mechanism to represent something which looks
like a file instead of simply using existing mechanisms and extend them?

Thanks,

tglx


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Andy Lutomirski
On Tue, Mar 14, 2017 at 9:12 AM, Till Smejkal wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
>> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> >> This sounds rather complicated.  Getting TLB flushing right seems
>> >> tricky.  Why not just map the same thing into multiple mms?
>> >
>> > This is exactly what happens at the end. The memory region that is
>> > described by the VAS segment will be mapped in the ASes that use the
>> > segment.
>>
>> So why is this kernel feature better than just doing MAP_SHARED
>> manually in userspace?
>
> One advantage of VAS segments is that they can be globally queried by user
> programs, which means that VAS segments can be shared by applications that
> do not necessarily have to be related. If I am not mistaken, MAP_SHARED of
> pure in-memory data will only work if the tasks that share the memory
> region are related (i.e. have a common parent that initialized the shared
> mapping). Otherwise, the shared mapping has to be backed by a file.

What's wrong with memfd_create()?
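For reference, the memfd_create() route being suggested amounts to roughly
the following minimal sketch (the raw syscall is used since a glibc wrapper
may not be available; name, size and the abbreviated error handling are
arbitrary choices of this example). The resulting fd can then be handed to
an unrelated process, e.g. by passing it over a UNIX-domain socket or via
/proc/<pid>/fd:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
	size_t size = 4096;

	/* Anonymous, file-less memory object backed only by the kernel. */
	int fd = syscall(SYS_memfd_create, "vas-example", 0);
	if (fd < 0 || ftruncate(fd, size) < 0) {
		perror("memfd_create/ftruncate");
		return 1;
	}

	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	strcpy(p, "shared without any filesystem object");
	printf("pass fd %d to another process to share this mapping\n", fd);
	return 0;
}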

> VAS segments on the other hand allow sharing of pure in-memory data by
> arbitrary, not necessarily related, tasks without the need of a file. This
> becomes especially interesting if one combines VAS segments with
> non-volatile memory, since one can keep data structures in the NVM and
> still be able to share them between multiple tasks.

What's wrong with regular mmap?
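And the "regular mmap" alternative for the NVM case is simply a shared file
mapping. A sketch, assuming /mnt/pmem is a DAX-capable (or tmpfs) mount and
ignoring most error handling; path and size are arbitrary examples:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	size_t size = 1 << 20;

	int fd = open("/mnt/pmem/shared.dat", O_RDWR | O_CREAT, 0600);
	if (fd < 0 || ftruncate(fd, size) < 0) {
		perror("open/ftruncate");
		return 1;
	}

	/* MAP_SHARED: unrelated processes mapping the same file see the
	 * same bytes; on a DAX mount the mapping goes straight to the NVM. */
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	((char *)p)[0] = 42;	/* visible to every other mapper */
	return 0;
}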

>
>> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
>> >> and not make it look magically different depending on which process
>> >> maps it?  If you need a trampoline (which you do, of course), just
>> >> write a trampoline in regular user code and map it manually.
>> >
>> > Did I understand you correctly that you are proposing that the switching
>> > thread should make sure by itself that its code, stack, … memory regions
>> > are properly set up in the new AS before/after switching into it? I
>> > think this would make using first class virtual address spaces much more
>> > difficult for user applications, to the extent that I am not even sure
>> > if they can be used at all. At the moment, switching into a VAS is a
>> > very simple operation for an application because the kernel will simply
>> > do the right thing.
>>
>> Yes.  I think that having the same mm_struct look different from
>> different tasks is problematic.  Getting it right in the arch code is
>> going to be nasty.  The heuristics of what to share are also tough --
>> why would text + data + stack or whatever you're doing be adequate?
>> What if you're in a thread?  What if two tasks have their stacks in
>> the same place?
>
> The different ASes that a task can now have when it uses first class
> virtual address spaces are not realized in the kernel by using only one
> mm_struct per task that just looks different, but by using multiple
> mm_structs - one for each AS that the task can execute in. When a task
> attaches a first class virtual address space to itself to be able to use
> another AS, the kernel adds a temporary mm_struct to this task that
> contains the mappings of the first class virtual address space and the
> ones shared with the task's original AS. If a thread now wants to switch
> into this attached first class virtual address space, the kernel only
> changes the 'mm' and 'active_mm' pointers in the task_struct of the thread
> to the temporary mm_struct and performs the corresponding mm_switch
> operation. The original mm_struct of the thread will not be changed.
>
> Accordingly, I do not magically make mm_structs look different depending
> on the task that uses them, but create temporary mm_structs that only
> contain mappings to the same memory regions.

This sounds complicated and fragile.  What happens if a heuristically
shared region coincides with a region in the "first class address
space" being selected?

I think the right solution is "you're a user program playing virtual
address games -- make sure you do it right".

--Andy
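For readers following the mechanism under discussion, a heavily simplified,
illustrative-only pseudo-C sketch of the mm/active_mm swap Till describes
above. This is not code from the patch set; header includes, reference
counting, lazy-TLB handling and per-architecture details are all omitted:

/*
 * Illustrative pseudo-code only -- NOT from the patch set.
 */
static void sketch_switch_to_vas(struct task_struct *tsk,
				 struct mm_struct *temp_mm)
{
	struct mm_struct *old_mm = tsk->mm;

	task_lock(tsk);
	/* Point the thread at the temporary mm that combines the VAS
	 * mappings with the regions shared from the original AS. */
	tsk->mm = temp_mm;
	tsk->active_mm = temp_mm;
	task_unlock(tsk);

	/* Activate the new page tables on this CPU. */
	switch_mm(old_mm, temp_mm, tsk);

	/* The original mm_struct is left untouched so the thread can
	 * switch back later. */
}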


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Luck, Tony
On Wed, Mar 15, 2017 at 03:02:34PM -0700, Till Smejkal wrote:
> I don't agree here. VAS segments are basically in-memory files that are
> handled by the kernel directly without using a file system. Hence, if an
> application uses a VAS segment to store data the same rules apply as if it
> uses a file. Everything that it saves in the VAS segment might be
> accessible by other applications. An application using VAS segments should
> be aware of this fact. In addition, the resources that are represented by
> a VAS segment are not leaked. As I said, VAS segments are much like files.
> Hence, if you don't want to use them any more, delete them. But as with
> files, the kernel will not delete them for you (although something like
> this can be added).

So how do they differ from shmget(2), shmat(2), shmdt(2), shmctl(2)?

Apart from VAS having better names, instead of silly "key_t key" ones.

-Tony
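For comparison, the System V interface Tony lists boils down to something
like the following sketch; the key, size and permissions are arbitrary
example values and error handling is abbreviated:

#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	key_t key = 0x5641;	/* the "silly key_t key" */

	int id = shmget(key, 4096, IPC_CREAT | 0600);
	if (id < 0) {
		perror("shmget");
		return 1;
	}

	char *p = shmat(id, NULL, 0);
	if (p == (void *)-1) {
		perror("shmat");
		return 1;
	}

	strcpy(p, "persists until shmctl(id, IPC_RMID, NULL) or reboot");
	shmdt(p);
	/* Segment intentionally left in place, like an undeleted VAS segment. */
	return 0;
}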


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Till Smejkal
On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> > One advantage of VAS segments is that they can be globally queried by
> > user programs, which means that VAS segments can be shared by
> > applications that do not necessarily have to be related. If I am not
> > mistaken, MAP_SHARED of pure in-memory data will only work if the tasks
> > that share the memory region are related (i.e. have a common parent that
> > initialized the shared mapping). Otherwise, the shared mapping has to be
> > backed by a file.
> 
> What's wrong with memfd_create()?
> 
> > VAS segments on the other hand allow sharing of pure in-memory data by
> > arbitrary, not necessarily related, tasks without the need of a file.
> > This becomes especially interesting if one combines VAS segments with
> > non-volatile memory, since one can keep data structures in the NVM and
> > still be able to share them between multiple tasks.
> 
> What's wrong with regular mmap?

I never wanted to say that there is something wrong with regular mmap. We just
figured that with VAS segments you could remove the need to mmap your shared
data and instead keep everything purely in memory.

Unfortunately, I am not at full speed with memfds. Is my understanding correct
that if the last user of such a file descriptor closes it, the corresponding
memory is freed? Accordingly, memfd cannot be used to keep data in memory
while no program is currently using it, can it? To be able to do this, you
again need some representation of the data in a file. Yes, you can use a
tmpfs to keep the file content in memory as well, or some DAX filesystem to
keep the file content in NVM, but this always requires that such filesystems
are mounted in the system that the application is currently running on. VAS
segments on the other hand would provide a functionality to achieve the same
without the need of any mounted filesystem. However, I agree that this is
just a small advantage compared to what can already be achieved with the
existing functionality provided by the Linux kernel. I probably need to
revisit the whole idea of first class virtual address space segments before
continuing with this patchset. Thank you very much for the great feedback.
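As for sharing such a memfd between applications that are not related (the
concern quoted above), the usual answer is fd passing over a UNIX-domain
socket. A sketch of the sending side only; socket setup is omitted and the
helper name is made up for this example:

#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Hand an already-open fd (e.g. a memfd) to whatever unrelated process
 * sits at the other end of the connected UNIX-domain socket 'sock'. */
int send_fd(int sock, int fd)
{
	struct msghdr msg = { 0 };
	struct iovec iov;
	char dummy = '*';
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr *cmsg;

	iov.iov_base = &dummy;		/* at least one byte must be sent */
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cbuf;
	msg.msg_controllen = sizeof(cbuf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;	/* kernel duplicates the fd for the peer */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}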

> >> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> >> >> and not make it look magically different depending on which process
> >> >> maps it?  If you need a trampoline (which you do, of course), just
> >> >> write a trampoline in regular user code and map it manually.
> >> >
> >> > Did I understand you correctly that you are proposing that the
> >> > switching thread should make sure by itself that its code, stack, …
> >> > memory regions are properly set up in the new AS before/after
> >> > switching into it? I think this would make using first class virtual
> >> > address spaces much more difficult for user applications, to the
> >> > extent that I am not even sure if they can be used at all. At the
> >> > moment, switching into a VAS is a very simple operation for an
> >> > application because the kernel will simply do the right thing.
> >>
> >> Yes.  I think that having the same mm_struct look different from
> >> different tasks is problematic.  Getting it right in the arch code is
> >> going to be nasty.  The heuristics of what to share are also tough --
> >> why would text + data + stack or whatever you're doing be adequate?
> >> What if you're in a thread?  What if two tasks have their stacks in
> >> the same place?
> >
> > The different ASes that a task can now have when it uses first class
> > virtual address spaces are not realized in the kernel by using only one
> > mm_struct per task that just looks different, but by using multiple
> > mm_structs - one for each AS that the task can execute in. When a task
> > attaches a first class virtual address space to itself to be able to use
> > another AS, the kernel adds a temporary mm_struct to this task that
> > contains the mappings of the first class virtual address space and the
> > ones shared with the task's original AS. If a thread now wants to switch
> > into this attached first class virtual address space, the kernel only
> > changes the 'mm' and 'active_mm' pointers in the task_struct of the
> > thread to the temporary mm_struct and performs the corresponding
> > mm_switch operation. The original mm_struct of the thread will not be
> > changed.
> >
> > Accordingly, I do not magically make mm_structs look different depending
> > on the task that uses them, but create temporary mm_structs that only
> > contain mappings to the same memory regions.
> 
> This sounds complicated and fragile.  What happens if a heuristically
> shared region coincides with a region in the "first class address
> space" being selected?

If such a conflict happens, the task cannot use the first class address space
and the corresponding system call will return an error. However, with the
currently available virtual address space size that programs can use, such
conflicts are probably rare.

Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Till Smejkal
On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> On Wed, Mar 15, 2017 at 12:44 PM, Till Smejkal wrote:
> > On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> >> > One advantage of VAS segments is that they can be globally queried by
> >> > user programs, which means that VAS segments can be shared by
> >> > applications that do not necessarily have to be related. If I am not
> >> > mistaken, MAP_SHARED of pure in-memory data will only work if the
> >> > tasks that share the memory region are related (i.e. have a common
> >> > parent that initialized the shared mapping). Otherwise, the shared
> >> > mapping has to be backed by a file.
> >>
> >> What's wrong with memfd_create()?
> >>
> >> > VAS segments on the other hand allow sharing of pure in-memory data
> >> > by arbitrary, not necessarily related, tasks without the need of a
> >> > file. This becomes especially interesting if one combines VAS
> >> > segments with non-volatile memory, since one can keep data structures
> >> > in the NVM and still be able to share them between multiple tasks.
> >>
> >> What's wrong with regular mmap?
> >
> > I never wanted to say that there is something wrong with regular mmap.
> > We just figured that with VAS segments you could remove the need to mmap
> > your shared data and instead keep everything purely in memory.
> 
> memfd does that.

Yes, that's right. Thanks for giving me the pointer to this. I should have
researched more carefully before starting to work on VAS segments.

> > VAS segments on the other hand would provide a functionality to achieve
> > the same without the need of any mounted filesystem. However, I agree
> > that this is just a small advantage compared to what can already be
> > achieved with the existing functionality provided by the Linux kernel.
> 
> I see this "small advantage" as "resource leak and security problem".

I don't agree here. VAS segments are basically in-memory files that are
handled by the kernel directly without using a file system. Hence, if an
application uses a VAS segment to store data the same rules apply as if it
uses a file. Everything that it saves in the VAS segment might be accessible
by other applications. An application using VAS segments should be aware of
this fact. In addition, the resources that are represented by a VAS segment
are not leaked. As I said, VAS segments are much like files. Hence, if you
don't want to use them any more, delete them. But as with files, the kernel
will not delete them for you (although something like this can be added).

> >> This sounds complicated and fragile.  What happens if a heuristically
> >> shared region coincides with a region in the "first class address
> >> space" being selected?
> >
> > If such a conflict happens, the task cannot use the first class address
> > space and the corresponding system call will return an error. However,
> > with the currently available virtual address space size that programs
> > can use, such conflicts are probably rare.
> 
> A bug that hits 1% of the time is often worse than one that hits 100%
> of the time because debugging it is miserable.

I don't agree that this is a bug at all. If there is a conflict in the memory
layout of the ASes, the application simply cannot use this first class
virtual address space. Every application that wants to use first class
virtual address spaces should check for error return values and handle them.

This situation is similar to mapping a file at some special address in memory
because the file contains pointer-based data structures and the application
wants to use them, but the kernel cannot map the file at this particular
position in the application's AS because there is already a different
conflicting mapping. If an application wants to do such things, it should
also handle all the errors that can occur.

Till
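The analogy above can be made concrete: request the mapping at the preferred
address as a hint (not MAP_FIXED, which would silently replace an existing
mapping) and treat "not placed there" as the error to handle. A sketch,
assuming the file already exists and is large enough; the path and address
are arbitrary examples:

#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
	void *want = (void *)0x600000000000UL;	/* addresses baked into the data */

	int fd = open("/tmp/pointer-heavy.dat", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Without MAP_FIXED the address is only a hint; the kernel may
	 * place the mapping elsewhere if the range is already in use. */
	void *got = mmap(want, 1 << 20, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (got == MAP_FAILED || got != want) {
		/* Conflicting mapping: handle it, just as a failing
		 * VAS-attach would have to be handled. */
		fprintf(stderr, "could not map file at %p\n", want);
		return 1;
	}
	return 0;
}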


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Andy Lutomirski
On Wed, Mar 15, 2017 at 12:44 PM, Till Smejkal wrote:
> On Wed, 15 Mar 2017, Andy Lutomirski wrote:
>> > One advantage of VAS segments is that they can be globally queried by
>> > user programs, which means that VAS segments can be shared by
>> > applications that do not necessarily have to be related. If I am not
>> > mistaken, MAP_SHARED of pure in-memory data will only work if the tasks
>> > that share the memory region are related (i.e. have a common parent
>> > that initialized the shared mapping). Otherwise, the shared mapping has
>> > to be backed by a file.
>>
>> What's wrong with memfd_create()?
>>
>> > VAS segments on the other hand allow sharing of pure in-memory data by
>> > arbitrary, not necessarily related, tasks without the need of a file.
>> > This becomes especially interesting if one combines VAS segments with
>> > non-volatile memory, since one can keep data structures in the NVM and
>> > still be able to share them between multiple tasks.
>>
>> What's wrong with regular mmap?
>
> I never wanted to say that there is something wrong with regular mmap. We
> just figured that with VAS segments you could remove the need to mmap your
> shared data and instead keep everything purely in memory.

memfd does that.

>
> Unfortunately, I am not at full speed with memfds. Is my understanding
> correct that if the last user of such a file descriptor closes it, the
> corresponding memory is freed? Accordingly, memfd cannot be used to keep
> data in memory while no program is currently using it, can it?

No, stop right here.  If you want to have a bunch of memory that
outlives the program that allocates it, use a filesystem (tmpfs,
hugetlbfs, ext4, whatever).  Don't create new persistent kernel
things.
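A sketch of that suggestion using POSIX shared memory, which is just a file
on the tmpfs mounted at /dev/shm: the object survives the creating process
until someone calls shm_unlink() or the machine reboots. Name and size are
arbitrary examples; older glibc needs linking with -lrt:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	/* Creates /dev/shm/vas-example, a plain tmpfs file. */
	int fd = shm_open("/vas-example", O_CREAT | O_RDWR, 0600);
	if (fd < 0 || ftruncate(fd, 4096) < 0) {
		perror("shm_open/ftruncate");
		return 1;
	}

	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Exiting does not free the data; a later, unrelated process can
	 * shm_open("/vas-example", O_RDWR, 0) and see the same contents. */
	return 0;
}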

> VAS segments on the other hand would provide a functionality to achieve
> the same without the need of any mounted filesystem. However, I agree that
> this is just a small advantage compared to what can already be achieved
> with the existing functionality provided by the Linux kernel.

I see this "small advantage" as "resource leak and security problem".

>> This sounds complicated and fragile.  What happens if a heuristically
>> shared region coincides with a region in the "first class address
>> space" being selected?
>
> If such a conflict happens, the task cannot use the first class address
> space and the corresponding system call will return an error. However,
> with the currently available virtual address space size that programs can
> use, such conflicts are probably rare.

A bug that hits 1% of the time is often worse than one that hits 100%
of the time because debugging it is miserable.

--Andy
