[uClinux-dev] [PATCH] NOMMU: Stub out vm_get_page_prot() if there's no MMU

2010-08-26 Thread David Howells
Stub out vm_get_page_prot() if there's no MMU.

This was added by commit:

commit 804af2cf6e7af31d2e664b54e6579b531dbd
Author: Hugh Dickins h...@veritas.com
Date:   Wed Jul 26 21:39:49 2006 +0100
Subject: [AGPGART] remove private page protection map

and is used in commit:

commit c07fbfd17e614a76b194f371c5331e21e6cffb54
Author: Daniel De Graaf dgde...@tycho.nsa.gov
Date:   Tue Aug 10 18:02:45 2010 -0700
Subject: fbmem: VM_IO set, but not propagated

in the fbmem video driver, but the function doesn't exist on NOMMU, resulting
in an undefined symbol at link time.

Signed-off-by: David Howells dhowe...@redhat.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---

 include/linux/mm.h |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 831c693..e6b1210 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1363,7 +1363,15 @@ static inline unsigned long vma_pages(struct vm_area_struct *vma)
 	return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 }
 
+#ifdef CONFIG_MMU
 pgprot_t vm_get_page_prot(unsigned long vm_flags);
+#else
+static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+	return __pgprot(0);
+}
+#endif
+
 struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn, unsigned long size, pgprot_t);

___
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org
To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev


Re: [uClinux-dev] [PATCH 1/3] MPU support

2010-08-26 Thread Steve Longerbeam

On 08/24/2010 03:43 PM, Mike Frysinger wrote:
> > Apparently the ARM MPU's are not nearly as capable as the blackfin MPU.
> > The ARM MPU deals with whole regions, and typically only up to 8 memory
> > regions can be controlled by the MPU at any one time, each region having
> > one protection setting (r/w/x for kernel mode, r/w/x for user mode). Not
> > nearly as fine grained as per-page.
>
> i dont quite understand what you mean by whole region.  if you define a
> region as 4KiB, dont you get the granularity expected ?  could you describe
> the flexibility/restrictions of this a little more (i'm not an ARM core guy) ?
>
> the Blackfin MPU has separate insn/data TLBs, and each TLB has 16 entries
> (PTEs i believe is the common naming).  each PTE has supervisor rwx and
> usermode rwx permissions.  further, each PTE has a size field which may be
> 1KiB, 4KiB, 1MiB, or 4MiB.

ok, sounds like the blackfin MPU has all the features of a true MMU but
without the v-->p address translation.

The ARM MPU, using MMU language, has an 8-entry TLB (some ARM MPUs have
separate insn/data TLBs, others don't). But here's the kicker: the
entire address space can only be described by 8 PTEs (aka MPU regions),
total! So there is actually no need for a page table in main memory at
all, since the TLB already has enough entries to cover the entire
address space.

> i guess we cheat a little and we lock a PTE for the kernel itself so that
> it'll always be covered so we can process PTE misses without triggering a miss
> (nested exceptions).  i'm not entirely familiar with the exact gory details of
> other arches, so i cant say how unique we are in this regard.

The ARM MPU can do something similar. MPU regions can overlap, and a 
simple priority scheme is used to decide which region's permissions 
apply to a memory access that overlaps (higher numbered regions have 
higher priority). So on ARM we can lock a PTE/region, by defining 
region 0 to cover the entire address space, and give kernel read/write 
access, user no access. And region 0 is never overwritten or disabled. 
So if an access is made to an address not described by any other region, 
region 0 permissions are applied to the access (and a protection fault 
is generated if the access was made in user mode).


Note that, with region 0 locked, that only leaves 7 PTEs/regions that 
can be swapped in and out for user processes. So with the ARM MPU, we 
can't create a region for every mmap(), we would run out of available 
entries. So we have to use a trade-off, only create an MPU region for 
XIP file mappings (text). All other mappings (non-XIP file mappings and 
anonymous mappings) allocate from a common user memory pool (which is 
another patch I plan to submit).


So another locked region is used (region 1) that covers this user memory 
pool. User mode has read/write access of course, as well as kernel. And 
so we actually now only have 6 regions that user processes can play with.


What this trade-off means is that we have process-to-kernel protection,
but not process-to-process protection.

> > So ARM could use something higher-level than protect_page(), something
> > like protect_region(start, end, flags), or just all of protect_vma()
> > could be moved to include/asm/mmu_context.h. That way ARM can operate on
> > the whole region, while blackfin would add protection for every page in
> > the VMA as it is doing now.
>
> i think you could use the existing framework, and perhaps optionally extend
> it.  maybe if i knew a little more about your regions, i could suggest
> something else.
>
> > I'll work on another patch that better merges my original ARM MPU work
> > into the blackfin work, and resubmit.
>
> great, thanks
>
> > Btw, I probably should be working in whatever git tree people are
> > submitting patches against, rather than the 20100628 release. Which git
> > tree should I submit against?
>
> that's hard to say.  if current mainline (2.6.36-rc2) has everything you need
> to boot a working system, then that is probably the place to base your work.
> i understand though that the arm/nommu work is taking a while to get into
> mainline, so that might not be feasible.  in which case, you should find the
> very latest uclinux tree and use that.
>
> i know people like to base their work off a release, but in order to get
> merged, the focus has to be on the latest development tree.

ok. Greg says that the core non-MMU stuff is in mainline now, so I'll 
work from mainline.


Steve


Re: [uClinux-dev] [PATCH 1/3] MPU support

2010-08-26 Thread Mike Frysinger
On Thursday, August 26, 2010 14:19:41 Steve Longerbeam wrote:
> The ARM MPU can do something similar. MPU regions can overlap, and a
> simple priority scheme is used to decide which region's permissions
> apply to a memory access that overlaps (higher numbered regions have
> higher priority). So on ARM we can lock a PTE/region, by defining
> region 0 to cover the entire address space, and give kernel read/write
> access, user no access. And region 0 is never overwritten or disabled.
> So if an access is made to an address not described by any other region,
> region 0 permissions are applied to the access (and a protection fault
> is generated if the access was made in user mode).
>
> Note that, with region 0 locked, that only leaves 7 PTEs/regions that
> can be swapped in and out for user processes. So with the ARM MPU, we
> can't create a region for every mmap(), we would run out of available
> entries. So we have to use a trade-off, only create an MPU region for
> XIP file mappings (text). All other mappings (non-XIP file mappings and
> anonymous mappings) allocate from a common user memory pool (which is
> another patch I plan to submit).

i dont understand why running out of entries is a problem.  we run out of 
entries too as you cant cover 512MiB of SDRAM with 16 entries.  we simply take 
an exception when this occurs and in the exception handler, we use a basic 
round-robin replacement scheme to install a valid PTE (assuming of course the 
user has a valid mapping for the excepting address).  then we return to the 
user process and it continues on.

why wont this scheme work for you too ?
-mike



Re: [uClinux-dev] [PATCH 1/3] MPU support

2010-08-26 Thread Steve Longerbeam

On 08/26/2010 12:04 PM, Mike Frysinger wrote:

> i dont understand why running out of entries is a problem.  we run out of
> entries too as you cant cover 512MiB of SDRAM with 16 entries.  we simply take
> an exception when this occurs and in the exception handler, we use a basic
> round-robin replacement scheme to install a valid PTE (assuming of course the
> user has a valid mapping for the excepting address).  then we return to the
> user process and it continues on.
>
> why wont this scheme work for you too ?

no, you're right, that would work. Of course, it would use more memory
for the page tables, and take a performance hit (with my implementation
there are no faults while a process is running). But it is more in line
with how MMU kernels work, and it adds process-to-process protection too.


Steve



Re: [uClinux-dev] [PATCH 1/3] MPU support

2010-08-26 Thread Jamie Lokier
Mike Frysinger wrote:
> as it stands, this breaks all non-arm NOMMU ports.  the patch will need to be
> broken up into arm-specific and arm-independent parts.
>
> the common code changes will need justification as to why they exist at all.
> we're doing MPU on Blackfin/nommu today without any of these.  we support
> pretty much all the same features of a MMU system short of virtual memory --
> 4k pages, RWX granularity, process to process protection, process to kernel
> protection (include kernel modules), kernel XIP, and userspace XIP.
>
> further, why did you go with CONFIG_CPU_CP15_MPU ?  there is already a
> CONFIG_MPU option that is used in common nommu code.

While we're here, I'd better mention that I have a mostly ARM-compatible
CPU here, with an MPU that isn't like the ARM ones - but it does use CP15 :-)

-- Jamie


Re: [uClinux-dev] [PATCH 1/3] MPU support

2010-08-26 Thread Mike Frysinger
On Thursday, August 26, 2010 18:45:08 Steve Longerbeam wrote:
> no, you're right, that would work. Of course, it would have a bigger
> memory usage for the page tables, and a performance hit (with my
> implementation when a process is running there are no faults). But it is
> more inline with how MMU kernels work, and it adds process-to-process
> protection too.

we used a bitmap to save on memory and execution.  each bit representing a 4k 
chunk.  this is the page_rwx_mask and similar stuff that appears in the 
Blackfin asm/mmu*.h headers.

have you done performance measurements to see the overhead with the MPU turned
on in your scheme compared to off ?  doing something like a ffmpeg decode to
another file.  if the performance trade offs of your current scheme
(per-mapping) is significant compared to the classic per-page, then it is worth
while to extend the MPU Kconfig option so people can select per-page or
per-mapping schema.

btw, i dont think it was mentioned earlier, but these ranges you're working 
with ... do they have alignment requirements ?  the thing about Blackfin PTEs 
is that they must be aligned according to the size they represent.  so if it 
is a 4KiB mapping, it must be aligned to 4KiB.  if it's 1MiB, it must be 
aligned to 1MiB.  it'd be nice if that alignment restriction wasnt there as we 
could then do a flexible range mapping similar to what you have.
-mike



Re: [uClinux-dev] [PATCH 1/3] MPU support

2010-08-26 Thread Steve Longerbeam

On 08/26/2010 06:07 PM, Mike Frysinger wrote:

> we used a bitmap to save on memory and execution.  each bit representing a 4k
> chunk.  this is the page_rwx_mask and similar stuff that appears in the
> Blackfin asm/mmu*.h headers.

ok, I'll take a closer look.


> have you done performance measurements to see the overhead with the MPU turned
> on in your scheme compared to off ?  doing something like a ffmpeg decode to
> another file.  if the performance trade offs of your current scheme
> (per-mapping) is significant compared to the classic per-page, then it is worth
> while to extend the MPU Kconfig option so people can select per-page or
> per-mapping schema.

yes, if performance degrades a lot for per-page compared to my current
scheme, it would be worthwhile to offer both options. OTOH, other
people may have different requirements (better protection being more
important than memory footprint or performance, or vice-versa). So it
might make sense to offer both options anyway.



> btw, i dont think it was mentioned earlier, but these ranges you're working
> with ... do they have alignment requirements ?

yes, but it varies by ARM core. For instance, on the SC100, the MPU
regions must be 64-byte aligned, but the ARM940T has the same alignment
requirement as blackfin (alignment = size).



> the thing about Blackfin PTEs
> is that they must be aligned according to the size they represent.  so if it
> is a 4KiB mapping, it must be aligned to 4KiB.  if it's 1MiB, it must be
> aligned to 1MiB.  it'd be nice if that alignment restriction wasnt there as we
> could then do a flexible range mapping similar to what you have.
> -mike



Re: [uClinux-dev] [PATCH 1/3] MPU support

2010-08-26 Thread Mike Frysinger
On Thursday, August 26, 2010 21:40:13 Steve Longerbeam wrote:
> On 08/26/2010 06:07 PM, Mike Frysinger wrote:
> > have you done performance measurements to see the overhead with the MPU
> > turned on in your scheme compared to off ?  doing something like a
> > ffmpeg decode to another file.  if the performance trade offs of your
> > current scheme (per-mapping) is significant compared to the classic
> > per-page, then it is worth while to extend the MPU Kconfig option so
> > people can select per-page or per-mapping schema.
>
> yes, if performance degrades a lot for per-page compared to my current
> scheme, that would be worthwhile to offer both options. OTOH, other
> people may have different requirements (better protection being more
> important than memory footprint or performance, or vice-versa). So it
> might make sense to offer both options anyway.

iirc, in the tests we did, doing a cpu intensive task didnt suffer all that 
much.  but doing a memory intensive task (like ffmpeg decoding), we saw a ~10x 
slowdown :(.  so we default it to off but keep it around for debugging since 
the performance is certainly good enough for that.

however, we also didnt profile the whole stack, so there might be some places 
we can squeeze a bit more performance out.
-mike

