Re: Question about adding flags to mmap system call / NVIDIA amd64 driver implementation

2009-04-30 Thread John Baldwin
On Tuesday 28 April 2009 7:58:57 pm Julian Elischer wrote:
 Robert Noland wrote:
  On Tue, 2009-04-28 at 16:48 -0500, Kevin Day wrote:
  On Apr 28, 2009, at 3:19 PM, Julian Bangert wrote:
 
  Hello,
 
  I am currently trying to work a bit on the remaining missing  
  feature that NVIDIA requires ( 
http://wiki.freebsd.org/NvidiaFeatureRequests 
or a back post in this ML) -  the improved mmap system call.
 
 
 you might check with jhb (John Baldwin) as I think (from his
 p4 work) that he may be doing something in this area in p4.

After some promptings from Robert and his needs for Xorg recently I did start 
hacking on this again.  However, I haven't tested it yet.  What I have done 
so far is in //depot/user/jhb/pat/... and it does the following:

1) Adds a vm_cache_mode_t.  Each arch defines the valid values for this (I've 
only done the MD portions of this work for amd64 so far).  Every arch must at 
least define a value for VM_CACHE_DEFAULT.

2) Stores a cache mode in each vm_map_entry struct.  This cache mode is then 
passed down to a few pmap functions: pmap_object_init_pt(), 
pmap_enter_object(), and pmap_enter_quick().  Several vm_map routines such as 
vm_map_insert() and vm_map_find() now take a cache mode to use when adding a 
new mapping.

3) Each VM object stores a cache mode as well (defaults to VM_CACHE_DEFAULT).  
When a VM_CACHE_DEFAULT mapping is made of an object, the cache mode of the 
object is used.

4) A new VM object type: OBJT_SG.  This object type has its own pager that is 
sort of like the device pager.  However, instead of invoking d_mmap() to 
determine the physaddr for a given page, it consults a pre-created 
scatter/gather list (an ADT from my branch for working on unmapped buffer 
I/O) to determine the backing physical address for a given virtual address.

5) A new callback for device mmap: d_mmap_single().  One of the features of 
this is that it can return a vm_object_t to be used to satisfy the mmap() 
request instead of using the device's device pager VM object.

6) A new mcache() system call similar to mprotect(), except that it changes 
the cache mode of an address range rather than the protection.  This may not 
be all that useful really.
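
As a rough illustration of item 4, the lookup an OBJT_SG-style pager has to do can be modelled in plain userland C. This is only a sketch under assumptions: `sk_sg_seg`, `sk_sglist` and `sk_sg_lookup()` are hypothetical names and not the ADT from the branch, and the pager is assumed to be handed an offset within the object (derived from the faulting virtual address) rather than the virtual address itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment of a scatter/gather list. */
struct sk_sg_seg {
	uint64_t paddr;	/* physical start address of the segment */
	uint64_t len;	/* segment length in bytes */
};

/* Hypothetical stand-in for the pre-created scatter/gather list. */
struct sk_sglist {
	const struct sk_sg_seg *segs;
	size_t nsegs;
};

/*
 * Translate a byte offset within the object into the backing physical
 * address by walking the segments in order.  Returns 0 on success and
 * -1 if the offset lies beyond the end of the list.
 */
int
sk_sg_lookup(const struct sk_sglist *sg, uint64_t offset, uint64_t *paddr)
{
	assert(sg != NULL && paddr != NULL);
	for (size_t i = 0; i < sg->nsegs; i++) {
		if (offset < sg->segs[i].len) {
			*paddr = sg->segs[i].paddr + offset;
			return (0);
		}
		/* Not in this segment: skip past it. */
		offset -= sg->segs[i].len;
	}
	return (-1);
}
```

Unlike a d_mmap()-based lookup, nothing here calls back into the driver at fault time; the list is built once up front and then only read.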

Given all this, a driver could do the following to map a thing as WC in both 
userland and the kernel:

1) When it learns about a thing it creates an SG list to describe it.  If 
the thing consists of userland pages, it has to wire the pages first.  The 
driver can use vm_allocate_pager() to create an OBJT_SG VM object.  It sets 
the object's cache mode to VM_CACHE_WC (if the arch supports that).

2) When userland wants to map the thing it does a device mmap() with a 
proper length and a file offset that is a cookie for the thing.  The device 
driver's d_mmap_single() recognizes the magic file offset and returns 
the thing's VM object.  Since the mapping info is now part of a normal 
object mapping, it will go away via munmap(), etc.  The driver no longer has 
to do weird gymnastics to invalidate mappings from its device pager 
as transient mappings are no longer stored in the device pager.

3) When the driver wants to map the thing into the kernel, it can use 
vm_map_find() to insert the thing's VM object into the kernel map.
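
The userland half of that flow (step 2, plus the rule that a default-mode mapping inherits the object's cache mode) might look roughly like the following userland model. Every name here (`sk_object`, `sk_thing`, `sk_mmap_single()`, `sk_effective_mode()`) is hypothetical, and the real d_mmap_single() signature is not shown in this thread, so treat the argument list as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache modes; the real values are defined per architecture. */
enum sk_cache_mode { SK_CACHE_DEFAULT = 0, SK_CACHE_WC };

/* Hypothetical stand-in for a VM object that carries a cache mode. */
struct sk_object {
	enum sk_cache_mode cache_mode;
	uint64_t size;
};

/* One "thing" the driver exports, identified by a magic file offset. */
struct sk_thing {
	uint64_t cookie;
	struct sk_object obj;
};

/*
 * Model of a d_mmap_single()-style callback: if the file offset is a
 * known cookie and the length fits, hand back the thing's object.
 * Returning NULL stands for "fall back to the regular device pager".
 */
struct sk_object *
sk_mmap_single(struct sk_thing *things, size_t n, uint64_t offset,
    uint64_t len)
{
	assert(things != NULL);
	for (size_t i = 0; i < n; i++) {
		if (things[i].cookie == offset && len <= things[i].obj.size)
			return (&things[i].obj);
	}
	return (NULL);
}

/* A default-mode mapping inherits the cache mode of the object. */
enum sk_cache_mode
sk_effective_mode(enum sk_cache_mode requested, const struct sk_object *obj)
{
	assert(obj != NULL);
	return (requested == SK_CACHE_DEFAULT ? obj->cache_mode : requested);
}
```

Because the mapping ends up as an ordinary object mapping, tearing it down needs no driver involvement in this model; the driver only answers the question "which object belongs to this offset?".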

And I think that is all there is to it.  I need to test this somehow to make 
sure though, and make sure this meets the needs of Robert and Nvidia.

-- 
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Question about adding flags to mmap system call / NVIDIA amd64 driver implementation

2009-04-30 Thread Robert Noland
On Thu, 2009-04-30 at 17:36 -0400, John Baldwin wrote:
 On Tuesday 28 April 2009 7:58:57 pm Julian Elischer wrote:
  Robert Noland wrote:
   On Tue, 2009-04-28 at 16:48 -0500, Kevin Day wrote:
   On Apr 28, 2009, at 3:19 PM, Julian Bangert wrote:
  
   Hello,
  
   I am currently trying to work a bit on the remaining missing  
   feature that NVIDIA requires ( 
 http://wiki.freebsd.org/NvidiaFeatureRequests 
 or a back post in this ML) -  the improved mmap system call.
  
  
  you might check with jhb (John Baldwin) as I think (from his
  p4 work) that he may be doing something in this area in p4.
 
 After some promptings from Robert and his needs for Xorg recently I did start 
 hacking on this again.  However, I haven't tested it yet.  What I have done 
 so far is in //depot/user/jhb/pat/... and it does the following:
 
 1) Adds a vm_cache_mode_t.  Each arch defines the valid values for this (I've 
 only done the MD portions of this work for amd64 so far).  Every arch must at 
 least define a value for VM_CACHE_DEFAULT.
 
 2) Stores a cache mode in each vm_map_entry struct.  This cache mode is then 
 passed down to a few pmap functions: pmap_object_init_pt(), 
 pmap_enter_object(), and pmap_enter_quick().  Several vm_map routines such as 
 vm_map_insert() and vm_map_find() now take a cache mode to use when adding a 
 new mapping.
 
 3) Each VM object stores a cache mode as well (defaults to VM_CACHE_DEFAULT).
 When a VM_CACHE_DEFAULT mapping is made of an object, the cache mode of the 
 object is used.
 
 4) A new VM object type: OBJT_SG.  This object type has its own pager that is 
 sort of like the device pager.  However, instead of invoking d_mmap() to 
 determine the physaddr for a given page, it consults a pre-created 
 scatter/gather list (an ADT from my branch for working on unmapped buffer 
 I/O) to determine the backing physical address for a given virtual address.
 
 5) A new callback for device mmap: d_mmap_single().  One of the features of 
 this is that it can return a vm_object_t to be used to satisfy the mmap() 
 request instead of using the device's device pager VM object.
 
 6) A new mcache() system call similar to mprotect(), except that it changes 
 the cache mode of an address range rather than the protection.  This may not 
 be all that useful really.
 
 Given all this, a driver could do the following to map a thing as WC in both
 userland and the kernel:
 
 1) When it learns about a thing it creates an SG list to describe it.  If 
 the thing consists of userland pages, it has to wire the pages first.  The 
 driver can use vm_allocate_pager() to create an OBJT_SG VM object.  It sets 
 the object's cache mode to VM_CACHE_WC (if the arch supports that).
 
 2) When userland wants to map the thing it does a device mmap() with a 
 proper length and a file offset that is a cookie for the thing.  The device 
 driver's d_mmap_single() recognizes the magic file offset and returns 
 the thing's VM object.  Since the mapping info is now part of a normal 
 object mapping, it will go away via munmap(), etc.  The driver no longer has 
 to do weird gymnastics to invalidate mappings from its device pager 
 as transient mappings are no longer stored in the device pager.
 
 3) When the driver wants to map the thing into the kernel, it can use 
 vm_map_find() to insert the thing's VM object into the kernel map.
 
 And I think that is all there is to it.  I need to test this somehow to make 
 sure though, and make sure this meets the needs of Robert and Nvidia.

I think this sounds pretty good...  I need to get my perforce foo up to
speed so I can try it out...

robert.

-- 
Robert Noland rnol...@freebsd.org
FreeBSD




Re: Question about adding flags to mmap system call / NVIDIA amd64 driver implementation

2009-04-28 Thread Marius Nünnerich
On Tue, Apr 28, 2009 at 22:19, Julian Bangert julid...@online.de wrote:
 Hello,

 I am currently trying to work a bit on the remaining missing feature that
 NVIDIA requires ( http://wiki.freebsd.org/NvidiaFeatureRequests  or a back
 post in this ML) -  the improved mmap system call.
  For now, I am trying to extend the current system call and implementation
 to add cache control (the type of memory caching used). This feature is
 inherently very architecture specific, but it can lead to enormous
 performance improvements for memory-mapped devices (useful for drivers, etc). I
 would do this on the userland side by adding 3 flags to the mmap system call
 (MEM_CACHE__ATTR1 to MEM_CACHE__ATTR3) which form a single octal digit
 corresponding to the various caching options (like Uncacheable, Write
 Combining, etc.) with the same numbers as the PAT_* macros from
 i386/include/specialreg.h, except that the value 0 (PAT_UNCACHEABLE) is
 replaced with value 2 (undefined), whereas value 0 (all 3 flags cleared)
 is assigned the meaning "feature not used, use default cache control".
 For each cache behaviour there would of course also be a macro expanding to
 the right combination of these flags for enhanced usability.

Hmm, I don't like that. What about using something like PAT_WC
directly for the userland? Afaik a userland app that uses stuff like
this is md anyway.

  The mmap system call would, if any of these flags are set, decode them and
 get a corresponding PAT_* value, perform the mapping and then call into the
 pmap module to modify the cache attributes for every page.
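
The encoding proposed in the quoted text could be sketched as below. This is a userland model, not a patch: the `SK_PAT_*` numbers are the x86 memory-type encodings as I recall them from the PAT_* macros (treat them as assumptions and check specialreg.h), and the position of the three flag bits in the mmap flags word (`SK_CACHE_SHIFT`) is entirely hypothetical since the proposal does not name one.

```c
#include <assert.h>

/* Memory-type numbers, assumed to mirror the PAT_* macros. */
#define SK_PAT_UNCACHEABLE	0x00
#define SK_PAT_WRITE_COMBINING	0x01
#define SK_PAT_WRITE_THROUGH	0x04
#define SK_PAT_WRITE_PROTECTED	0x05
#define SK_PAT_WRITE_BACK	0x06
#define SK_PAT_UNCACHED		0x07

/* Hypothetical position of the three cache flags in the flags word. */
#define SK_CACHE_SHIFT	20
#define SK_CACHE_MASK	(0x7 << SK_CACHE_SHIFT)

/*
 * Encode a PAT value as flag bits.  Uncacheable (0) is stored as 2 so
 * that an all-clear digit can mean "use the default cache control".
 */
int
sk_cache_encode(int pat)
{
	assert(pat >= 0 && pat <= 7 && pat != 2 && pat != 3);
	int digit = (pat == SK_PAT_UNCACHEABLE) ? 2 : pat;
	return (digit << SK_CACHE_SHIFT);
}

/*
 * Decode flag bits.  Returns -1 when the feature is not used, -2 for
 * the digit that has no meaning assigned, else the PAT value.
 */
int
sk_cache_decode(int flags)
{
	int digit = (flags & SK_CACHE_MASK) >> SK_CACHE_SHIFT;

	if (digit == 0)
		return (-1);
	if (digit == 2)
		return (SK_PAT_UNCACHEABLE);
	if (digit == 3)
		return (-2);
	return (digit);
}
```

The remapping of 0 to 2 is what lets existing callers, which pass none of the new flags, keep their current behaviour.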

  My first question is if there is a more elegant way of solving that - the 3
 flags would be architecture specific ( they could be used for other things
 on other architectures though if need be ) and I do not know the policy on
 architecture specific syscall flags, therefore I appreciate any input.

 The second question goes to all those great VM/pmap gurus out there: As far
 as I understand, at the moment pmap_change_attr() can only change the cache
 flags for kernel pages. Is there a particular reason why this function might
 not be adapted/extended to userspace mappings? If not, I would either add a
 new function to iterate over all pages and set cache flags for a particular
 region or add a new member (possibly just add the 3 flags again ? ) to the
 md part of vm_page_t. Or one could just keep track and return errors as soon
 as someone tries to map a memory region ( cache-customized mapping is
 usually done to device memory ) already mapped with  different cache
 behaviour.

Do you know how other OSes handle this stuff? Maybe there is some
inspiration there for a clean interface. I'm not sure I remember
correctly, but I have it in mind that we must take care
that no two virtual pages have different PAT settings for the same
physical page. Maybe I read something like this in AMD's
documentation of PAT. Sorry I don't remember exactly, but perhaps
someone else can explain it better.
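
One way to enforce that constraint, along the lines of the "keep track and return errors" option in the quoted text, is to remember the cache mode of each mapped physical page and refuse conflicting mappings. The sketch below is a toy userland model with hypothetical names (`sk_tracker`, `sk_map_page()`, `sk_unmap_page()`) and a fixed-size page array; where such state would really live (for instance the MD part of vm_page_t) is left open.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NPAGES	16	/* size of the toy physical page array */
#define SK_MODE_NONE	(-1)	/* page currently has no mappings */

struct sk_phys_page {
	int mode;	/* cache mode shared by all current mappings */
	unsigned refs;	/* number of current mappings */
};

struct sk_tracker {
	struct sk_phys_page pages[SK_NPAGES];
};

void
sk_tracker_init(struct sk_tracker *t)
{
	assert(t != NULL);
	for (size_t i = 0; i < SK_NPAGES; i++) {
		t->pages[i].mode = SK_MODE_NONE;
		t->pages[i].refs = 0;
	}
}

/*
 * Record a new mapping of physical page pfn with the given cache mode.
 * Returns 0 on success, or -1 if the page is already mapped with a
 * different mode.
 */
int
sk_map_page(struct sk_tracker *t, size_t pfn, int mode)
{
	assert(t != NULL && pfn < SK_NPAGES && mode >= 0);
	struct sk_phys_page *p = &t->pages[pfn];

	if (p->refs > 0 && p->mode != mode)
		return (-1);
	p->mode = mode;
	p->refs++;
	return (0);
}

/* Drop one mapping; the page forgets its mode once the last one goes. */
void
sk_unmap_page(struct sk_tracker *t, size_t pfn)
{
	assert(t != NULL && pfn < SK_NPAGES && t->pages[pfn].refs > 0);
	if (--t->pages[pfn].refs == 0)
		t->pages[pfn].mode = SK_MODE_NONE;
}
```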


Re: Question about adding flags to mmap system call / NVIDIA amd64 driver implementation

2009-04-28 Thread Kevin Day


On Apr 28, 2009, at 3:19 PM, Julian Bangert wrote:


Hello,

I am currently trying to work a bit on the remaining missing  
feature that NVIDIA requires ( http://wiki.freebsd.org/NvidiaFeatureRequests 
  or a back post in this ML) -  the improved mmap system call.
For now, I am trying to extend the current system call and  
implementation to add cache control ( the type of memory caching  
used) . This feature inherently is very architecture specific- but  
it can lead to enormous performance improvements for memmapped  
devices ( useful for drivers, etc). I would do this at the user site  
by adding 3 flags to the mmap system call (MEM_CACHE__ATTR1 to  
MEM_CACHE__ATTR3 ) which are a single octal digit corresponding to  
the various caching options ( like Uncacheable,Write Combining,  
etc... ) with the same numbers as the PAT_* macros from i386/include/ 
specialreg.h except that the value 0 ( PAT_UNCACHEABLE ) is replaced  
with value 2 ( undefined), whereas value 0 ( all 3 flags cleared) is  
assigned the meaning feature not used, use default cache control.
For each cache behaviour there would of course also be a macro
expanding to the right combination of these flags for enhanced
usability.


The mmap system call would, if any of these flags are set, decode  
them and get a corresponding PAT_* value, perform the mapping and  
then call into the pmap module to modify the cache attributes for  
every page.


Have you looked at mem(4) yet?

 Several architectures allow attributes to be associated with ranges of
 physical memory.  These attributes can be manipulated via ioctl() calls
 performed on /dev/mem.  Declarations and data types are to be found in
 sys/memrange.h.

 The specific attributes, and number of programmable ranges may vary
 between architectures.  The full set of supported attributes is:

 MDF_UNCACHEABLE
         The region is not cached.

 MDF_WRITECOMBINE
         Writes to the region may be combined or performed out of order.

 MDF_WRITETHROUGH
         Writes to the region are committed synchronously.

 MDF_WRITEBACK
         Writes to the region are committed asynchronously.

 MDF_WRITEPROTECT
         The region cannot be written to.

This requires knowledge of the physical addresses, but I believe  
that's probably already necessary for what it sounds like you're  
trying to accomplish.


Back in the FreeBSD-3.0 days, I was writing a custom driver for an AGP  
graphics controller, and setting the MTRR flags for the exposed buffer  
was a definite improvement (200-1200% faster in most cases).


-- Kevin



Re: Question about adding flags to mmap system call / NVIDIA amd64 driver implementation

2009-04-28 Thread Robert Noland
On Tue, 2009-04-28 at 16:48 -0500, Kevin Day wrote:
 On Apr 28, 2009, at 3:19 PM, Julian Bangert wrote:
 
  Hello,
 
  I am currently trying to work a bit on the remaining missing  
  feature that NVIDIA requires ( 
  http://wiki.freebsd.org/NvidiaFeatureRequests 
or a back post in this ML) -  the improved mmap system call.
  For now, I am trying to extend the current system call and  
  implementation to add cache control ( the type of memory caching  
  used) . This feature inherently is very architecture specific- but  
  it can lead to enormous performance improvements for memmapped  
  devices ( useful for drivers, etc). I would do this at the user site  
  by adding 3 flags to the mmap system call (MEM_CACHE__ATTR1 to  
  MEM_CACHE__ATTR3 ) which are a single octal digit corresponding to  
  the various caching options ( like Uncacheable,Write Combining,  
  etc... ) with the same numbers as the PAT_* macros from i386/include/ 
  specialreg.h except that the value 0 ( PAT_UNCACHEABLE ) is replaced  
  with value 2 ( undefined), whereas value 0 ( all 3 flags cleared) is  
  assigned the meaning feature not used, use default cache control.
  For each cache behaviour there would of course also be a macro
  expanding to the right combination of these flags for enhanced
  usability.
 
  The mmap system call would, if any of these flags are set, decode  
  them and get a corresponding PAT_* value, perform the mapping and  
  then call into the pmap module to modify the cache attributes for  
  every page.
 
 Have you looked at mem(4) yet?
 
    Several architectures allow attributes to be associated with ranges of
    physical memory.  These attributes can be manipulated via ioctl() calls
    performed on /dev/mem.  Declarations and data types are to be found in
    sys/memrange.h.
 
    The specific attributes, and number of programmable ranges may vary
    between architectures.  The full set of supported attributes is:
 
    MDF_UNCACHEABLE
            The region is not cached.
 
    MDF_WRITECOMBINE
            Writes to the region may be combined or performed out of order.
 
    MDF_WRITETHROUGH
            Writes to the region are committed synchronously.
 
    MDF_WRITEBACK
            Writes to the region are committed asynchronously.
 
    MDF_WRITEPROTECT
            The region cannot be written to.
 
 This requires knowledge of the physical addresses, but I believe  
 that's probably already necessary for what it sounds like you're  
 trying to accomplish.
 
 Back in the FreeBSD-3.0 days, I was writing a custom driver for an AGP  
 graphics controller, and setting the MTRR flags for the exposed buffer  
 was a definite improvement (200-1200% faster in most cases).

This is MTRR, which is what we currently do, when we can.  The issue is
that the BIOS often maps ranges in a way that prevents us from
using MTRR.  When it works, MTRR is generally ideal for things like AGP
and framebuffers, since they have a specific physical range
that you want to work with.

With PCI(E) cards it isn't as cut and dry... In the ATI and Nouveau
cases, we map scatter gather pages into the GART, which generally are
allocated using contigmalloc behind the scenes, so it is also possible
for it to work in that case. Moving forward, we may actually be mapping
random pages into and out of the GART (GEM / TTM).  In those cases we
really don't have a large contiguous range that we could set MTRR on.
Intel CPUs are limited to 8 MTRR registers for the entire system also,
so that can become an issue quickly if you are trying to manipulate
several areas of memory.  With PAT we can manipulate the caching
properties on a page level.  PAT also allows for some overlap conditions
that MTRR won't, such as mapping a page write-combining on top of an
UNCACHEABLE MTRR.
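
To put a number on the register pressure: a variable-range MTRR describes a power-of-two sized range whose base is aligned to its size, so an arbitrary range has to be split into such pieces. The sketch below greedily counts the pieces; it is a userland model with hypothetical names (`sk_mtrr_count()`, `sk_mtrr_total()`), it ignores tricks such as overlapping ranges of different types, and the limit of 8 is simply the figure given above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE	4096ULL
#define SK_MTRR_LIMIT	8	/* register count quoted in this message */

/* Largest power of two that is <= x (x must be non-zero). */
static uint64_t
sk_floor_pow2(uint64_t x)
{
	uint64_t p = 1;

	while (p <= x / 2)
		p *= 2;
	return (p);
}

/*
 * Count the power-of-two sized, size-aligned chunks needed to cover
 * [base, base + len) exactly, taking the largest legal chunk each time.
 */
unsigned
sk_mtrr_count(uint64_t base, uint64_t len)
{
	assert(base % SK_PAGE_SIZE == 0 && len % SK_PAGE_SIZE == 0);
	unsigned n = 0;

	while (len > 0) {
		uint64_t chunk = sk_floor_pow2(len);

		if (base != 0) {
			/* Lowest set bit of base = its natural alignment. */
			uint64_t align = base & (~base + 1);
			if (align < chunk)
				chunk = align;
		}
		base += chunk;
		len -= chunk;
		n++;
	}
	return (n);
}

/* Total registers needed for a set of disjoint ranges. */
unsigned
sk_mtrr_total(const uint64_t *bases, const uint64_t *lens, size_t n)
{
	unsigned total = 0;

	for (size_t i = 0; i < n; i++)
		total += sk_mtrr_count(bases[i], lens[i]);
	return (total);
}
```

A single aligned aperture costs one register in this model, while a dozen scattered pages already exceed `SK_MTRR_LIMIT`, which is the case where per-page PAT attributes are the only workable option.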

jhb@ has started some work on this, since I've been badgering him about
this recently as well.

robert.

 -- Kevin
 
-- 
Robert Noland rnol...@freebsd.org
FreeBSD




Re: Question about adding flags to mmap system call / NVIDIA amd64 driver implementation

2009-04-28 Thread Julian Elischer

Robert Noland wrote:

On Tue, 2009-04-28 at 16:48 -0500, Kevin Day wrote:

On Apr 28, 2009, at 3:19 PM, Julian Bangert wrote:


Hello,

I am currently trying to work a bit on the remaining missing  
feature that NVIDIA requires ( http://wiki.freebsd.org/NvidiaFeatureRequests 
  or a back post in this ML) -  the improved mmap system call.



you might check with jhb (John Baldwin) as I think (from his
p4 work) that he may be doing something in this area in p4.


For now, I am trying to extend the current system call and  
implementation to add cache control ( the type of memory caching  
used) . This feature inherently is very architecture specific- but  
it can lead to enormous performance improvements for memmapped  
devices ( useful for drivers, etc). I would do this at the user site  
by adding 3 flags to the mmap system call (MEM_CACHE__ATTR1 to  
MEM_CACHE__ATTR3 ) which are a single octal digit corresponding to  
the various caching options ( like Uncacheable,Write Combining,  
etc... ) with the same numbers as the PAT_* macros from i386/include/ 
specialreg.h except that the value 0 ( PAT_UNCACHEABLE ) is replaced  
with value 2 ( undefined), whereas value 0 ( all 3 flags cleared) is  
assigned the meaning feature not used, use default cache control.
For each cache behaviour there would of course also be a macro
expanding to the right combination of these flags for enhanced
usability.


The mmap system call would, if any of these flags are set, decode  
them and get a corresponding PAT_* value, perform the mapping and  
then call into the pmap module to modify the cache attributes for  
every page.

Have you looked at mem(4) yet?

  Several architectures allow attributes to be associated with ranges of
  physical memory.  These attributes can be manipulated via ioctl() calls
  performed on /dev/mem.  Declarations and data types are to be found in
  sys/memrange.h.

  The specific attributes, and number of programmable ranges may vary
  between architectures.  The full set of supported attributes is:

  MDF_UNCACHEABLE
          The region is not cached.

  MDF_WRITECOMBINE
          Writes to the region may be combined or performed out of order.

  MDF_WRITETHROUGH
          Writes to the region are committed synchronously.

  MDF_WRITEBACK
          Writes to the region are committed asynchronously.

  MDF_WRITEPROTECT
          The region cannot be written to.

This requires knowledge of the physical addresses, but I believe  
that's probably already necessary for what it sounds like you're  
trying to accomplish.


Back in the FreeBSD-3.0 days, I was writing a custom driver for an AGP  
graphics controller, and setting the MTRR flags for the exposed buffer  
was a definite improvement (200-1200% faster in most cases).


This is MTRR, which is what we currently do, when we can.  The issue is
that the BIOS often maps ranges in a way that prevents us from
using MTRR.  When it works, MTRR is generally ideal for things like AGP
and framebuffers, since they have a specific physical range
that you want to work with.

With PCI(E) cards it isn't as cut and dry... In the ATI and Nouveau
cases, we map scatter gather pages into the GART, which generally are
allocated using contigmalloc behind the scenes, so it is also possible
for it to work in that case. Moving forward, we may actually be mapping
random pages into and out of the GART (GEM / TTM).  In those cases we
really don't have a large contiguous range that we could set MTRR on.
Intel CPUs are limited to 8 MTRR registers for the entire system also,
so that can become an issue quickly if you are trying to manipulate
several areas of memory.  With PAT we can manipulate the caching
properties on a page level.  PAT also allows for some overlap conditions
that MTRR won't, such as mapping a page write-combining on top of an
UNCACHEABLE MTRR.

jhb@ has started some work on this, since I've been badgering him about
this recently as well.

robert.


-- Kevin
