Re: Aperture mapping under GEM
Managing a fake linear address space just to match some existing arbitrary API requirements is insane. Creating the right interface for my UMA environment is my goal. I'm not sure precisely what that API should be, but at least this one is obviously wrong. Isn't that also what you are trying to do with GEM though... match GPU objects to the file interface. Now the thing is if you don't consider GTT mapping to be the same as normal mapping, you need an Intel-specific GTT map call, however that means a do_mmap you don't intend on ever changing to a real mmap call. Now you need to justify that to the vfs people. I do wonder if you are better having an alternate open method that flags the mmap different, but that doesn't make much sense to me either. However creating new MAP_GTT means breaking the generic interface. Dave. I want to handle thousands of discrete objects and be able to map them independently into my process, and bind them independently to the GTT. Only a few will ever be mapped to my process and while all of them will be bound to the GTT at times, only a subset will fit at any particular time. - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: Aperture mapping under GEM
Keith Packard wrote: and that's why TTM needs to manage a fake linear address space for the drm fd. Managing a fake linear address space just to match some existing arbitrary API requirements is insane. Creating the right interface for my UMA environment is my goal. I'm not sure precisely what that API should be, but at least this one is obviously wrong. I'm not sure I agree. What we're discussing is really a per-buffer-object address space versus a per-device address space. With the current GEM implementation, the address space is per buffer object, and if this were done correctly you'd duplicate the shmemfs filesystem to make a drmfs filesystem where you have complete control over creation and mmap-ing and do not need to create special cases to work around the shmemfs implementation. It's not impossible to overload the mmap / fault methods of the shmemfs filesystem, but what you're suggesting isn't really what I'd refer to as the cleanest and most natural interface. Since you were asking for comments, I'd strongly recommend avoiding trying to manipulate ptes from the driver. The other approach is to use one address space per device. An address space is obviously needed to be able to do unmap_mapping_range, read, write, seek etc. It's not an arbitrary API requirement. It's the Linux file operations API requirement. Since the address space is per device, it needs to be managed. I see nothing wrong with that, except that you don't get a filesystem entry per buffer, and you need to be aware of what the limitations are: the address space may become fragmented, and resizing becomes complicated. Given this, it's possible to choose what fits the driver best. A lightweight driver that needs to manipulate ptes to account for caching and placement would probably use the latter method, which is what TTM currently does. You've chosen the first and are faced with either 1) Hack ptes from the driver. 2) Try to overload the shmemfs mmap / fault methods.
3) Implement a new drmfs filesystem. /Thomas
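A rough sketch of the per-device alternative Thomas describes, written as a user-space C model rather than kernel code: every object gets a window in one fake linear offset space, allocated first-fit in pages. The names fake_offset_space, fake_alloc and fake_free are hypothetical, and this is not TTM's actual allocator; the model mainly illustrates the fragmentation limitation he mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one fake linear offset space per device, in pages. */
#define FAKE_MAX_OBJECTS 64

struct fake_range {
    uint64_t start;   /* first page of the object's window */
    uint64_t npages;  /* window length in pages */
    int      used;    /* slot in use */
};

struct fake_offset_space {
    struct fake_range ranges[FAKE_MAX_OBJECTS];
    uint64_t total_pages;  /* size of the whole fake space */
};

/* First-fit: returns the lowest start page where npages fit, or UINT64_MAX. */
static uint64_t fake_alloc(struct fake_offset_space *s, uint64_t npages)
{
    uint64_t candidate = 0;
    int moved;

    if (npages == 0)
        return UINT64_MAX;
    do {
        moved = 0;
        for (size_t i = 0; i < FAKE_MAX_OBJECTS; i++) {
            const struct fake_range *r = &s->ranges[i];
            /* Skip past any existing window that overlaps the candidate. */
            if (r->used && candidate < r->start + r->npages &&
                r->start < candidate + npages) {
                candidate = r->start + r->npages;
                moved = 1;
            }
        }
    } while (moved);

    if (candidate + npages > s->total_pages)
        return UINT64_MAX;  /* no hole is large enough */
    for (size_t i = 0; i < FAKE_MAX_OBJECTS; i++) {
        if (!s->ranges[i].used) {
            s->ranges[i] = (struct fake_range){ candidate, npages, 1 };
            return candidate;
        }
    }
    return UINT64_MAX;  /* out of bookkeeping slots */
}

/* Release the window starting at `start`; returns 0 on success, -1 if unknown. */
static int fake_free(struct fake_offset_space *s, uint64_t start)
{
    for (size_t i = 0; i < FAKE_MAX_OBJECTS; i++) {
        if (s->ranges[i].used && s->ranges[i].start == start) {
            s->ranges[i].used = 0;
            return 0;
        }
    }
    return -1;
}
```

For example, with a 100-page space holding windows at pages 0-9, 10-14, 20-29 and 30-49, 55 pages are free, yet a 52-page request fails because the largest hole is only 50 pages.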
Re: Aperture mapping under GEM
On Sun, 2008-08-03 at 10:53 +0200, Thomas Hellström wrote: With the current GEM implementation, the address space is per buffer object, and if this were done correctly you'd duplicate the shmemfs filesystem to make a drmfs filesystem where you have complete control over creation and mmap-ing and do not need to create special cases to work around the shmemfs implementation. I am not working around the shmem implementation at all; I'm using regular shmem objects just as they are. The only thing I'm working around at this point is the artificial kernel limit of 1024 fds. With more fds, I could simply allocate shmem objects and pass them into my environment. Once the objects are allocated, I use regular kernel APIs to map those pages to my device. I'd strongly recommend avoiding trying to manipulate ptes from the driver. I don't touch the shmem PTEs at all. The mapping I'm adding is entirely separate from shmem and involves mapping portions of the GTT aperture which just happen to contain pointers to shmem-allocated pages. The other approach is to use one address space per device. This would require constructing an entirely artificial linear space for my objects. You then have to track this per-device linear address for each object and pass that into the mmap call. And, what does it mean when you ask to mmap a range spanning multiple objects? 1) Hack ptes from the driver. Nope, not doing this; the GTT-based mapping would allocate separate PTEs using the existing standard device mapping APIs. 2) Try to overload the shmemfs mmap / fault methods. I don't need to do this either; shmem handles its pages just fine. 3) Implement a new drmfs filesystem. I would prefer to use the existing shmem mechanisms instead of precisely duplicating them. 
Re: Aperture mapping under GEM
On Sun, 2008-08-03 at 08:07 +0100, Dave Airlie wrote: Isn't that also what you are trying to do with GEM though... match GPU objects to the file interface. Yes, with a 1-1 mapping between GPU objects and file objects. You can use the normal read/write/mmap API on them. The reason we aren't using fds now is just that the kernel cannot handle this many fds per process. Now the thing is if you don't consider GTT mapping to be the same as normal mapping, you need an Intel-specific GTT map call, I want to map these pages in two different ways. The first is through a normal WB mapping, which provides the expected memory semantics (cached reads and writes). The second is to map them through the GTT, which offers two important benefits: 1) WC mapping, which avoids the need to clflush when passing data from the application to the GPU. 2) Linearized access to tiled surfaces. This uses the tile swizzling HW in the GPU to construct a synthetic linear view of the tiled surface, which is currently required when doing SW rendering from inside the X server. however that means a do_mmap you don't intend on ever changing to a real mmap call. Now you need to justify that to the vfs people. Nope, I can use a 'normal' mmap call and have two different address ranges within my object, one which maps the pages directly and one which maps them through the GTT. No flags needed here. I do wonder if you are better having an alternate open method that flags the mmap different, but that doesn't make much sense to me either. However creating new MAP_GTT means breaking the generic interface. I want to allow the mapping type to be selected on a per-use basis, not be an attribute of the file handle. I don't generally know up-front what kind of mapping will be needed. I could have a magic 'dup' ioctl that gave me a new FD that would do the new mapping type, and use that.
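To make point 2) concrete, here is a sketch of the address arithmetic a linear view of an X-tiled surface has to perform, which the GTT's detiling hardware does on the CPU's behalf. It assumes the usual Intel X-tile geometry (tiles 512 bytes wide and 8 rows tall, 4 KiB each, laid out row-major across the surface) and a stride that is a multiple of 512. Bit-6 swizzling is deliberately ignored, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed X-tile geometry: 512 bytes wide, 8 rows tall, 4 KiB per tile. */
#define XTILE_WIDTH  512u
#define XTILE_HEIGHT 8u
#define XTILE_SIZE   (XTILE_WIDTH * XTILE_HEIGHT)

/* Offset inside an X-tiled surface of the byte at column x_bytes, row y.
 * Assumes stride is a multiple of XTILE_WIDTH; bit-6 swizzling is ignored. */
static uint64_t xtiled_offset(uint32_t x_bytes, uint32_t y, uint32_t stride)
{
    uint32_t tiles_per_row = stride / XTILE_WIDTH;
    uint32_t tile_x = x_bytes / XTILE_WIDTH;
    uint32_t tile_y = y / XTILE_HEIGHT;
    uint64_t tile_index = (uint64_t)tile_y * tiles_per_row + tile_x;

    return tile_index * XTILE_SIZE                 /* start of the tile */
         + (uint64_t)(y % XTILE_HEIGHT) * XTILE_WIDTH  /* row within tile */
         + x_bytes % XTILE_WIDTH;                  /* byte within row */
}

/* What a linear (GTT-style) view does: turn a linear offset into the tiled one. */
static uint64_t linear_to_xtiled(uint64_t linear, uint32_t stride)
{
    return xtiled_offset((uint32_t)(linear % stride),
                         (uint32_t)(linear / stride), stride);
}
```

Without the GTT view, software rendering would have to apply a mapping like this to every access itself.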
Re: Aperture mapping under GEM
Keith Packard wrote: The other approach is to use one address space per device. This would require constructing an entirely artificial linear space for my objects. You then have to track this per-device linear address for each object and pass that into the mmap call. And, what does it mean when you ask to mmap a range spanning multiple objects? That's clearly an illegal operation and would return an error. 1) Hack ptes from the driver. Nope, not doing this; the GTT-based mapping would allocate separate PTEs using the existing standard device mapping APIs. But what happens when you unbind an object from the GTT while you map that data through the GTT? In your original email you stated that you'd walk through the VMAs and modify the PTEs. If you want to avoid that, you need to run unmap_mapping_range() on an address space. What address space would that be? /Thomas
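Thomas's answer to the spanning question can be sketched as a lookup in the per-device offset space that only succeeds when the requested page range lies entirely inside one object's window. This is a user-space illustration with hypothetical names (obj_window, lookup_mmap_object), not code from TTM or GEM.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device table: each object owns [start, start + npages). */
struct obj_window {
    uint64_t start;   /* first page of the object's fake window */
    uint64_t npages;  /* window length in pages */
    int      id;      /* object identifier (positive) */
};

/* Return the id of the object whose window fully contains
 * [pgoff, pgoff + npages), or -EINVAL if the request starts in a hole,
 * is empty, or runs past the end of the object it starts in. */
static int lookup_mmap_object(const struct obj_window *w, size_t n,
                              uint64_t pgoff, uint64_t npages)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t end = w[i].start + w[i].npages;
        if (pgoff >= w[i].start && pgoff < end) {
            if (npages == 0 || npages > end - pgoff)
                return -EINVAL;  /* would span into another object */
            return w[i].id;
        }
    }
    return -EINVAL;  /* offset belongs to no object */
}
```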
Re: Aperture mapping under GEM
Yes, with a 1-1 mapping between GPU objects and file objects. You can use the normal read/write/mmap API on them. The reason we aren't using fds now is just that the kernel cannot handle this many fds per process. Well it can now, we just need to fix the X server :) I want to map these pages in two different ways, the first way is through normal WB mapping which provides the expected memory semantics (cached reads and writes). The second is to map them through the GTT which offers two important benefits: 1) WC mapping which avoids the need to clflush when passing data from application to GPU. 2) Linearized access to tiled surfaces. This uses the tile swizzling HW in the GPU to construct a synthetic linear view of the tiled surface which is currently required when doing SW rendering from inside the X server. however that means a do_mmap you don't intend on ever changing to a real mmap call. Now you need to justify that to the vfs people. Nope, I can use a 'normal' mmap call and have two different address ranges within my object, one which maps the pages directly and one which maps them through the GTT. No flags needed here. Well bit-31 is now a flag, just under an assumed name with a fake passport. The question is whether this matters at all, or whether the Intel driver can just do it that way and have Intel-specific hooks into the shmem mmap/fault code. For radeon this interface would suck, an object can be VRAM, main RAM, GTT, tiled, endian swapped, etc. but if I don't care about that, if Intel were to use mmap2 then in theory you could use an even higher bit than bit 31.
Dave
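Dave's point about mmap2 comes down to arithmetic on the offset field. A sketch, assuming a 32-bit offset field whose top bit is borrowed as the mapping flag: with byte offsets (as with the bit-31 window) 31 bits remain, so objects are limited to 2 GiB, while mmap2(2) passes the offset in 4096-byte units, so the same 31 bits would cover 8 TiB. The helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: largest object size addressable when the top bit of a
 * field_bits-wide offset field is used as a flag and each offset unit is
 * `unit` bytes (1 for byte offsets, 4096 for mmap2-style page offsets). */
static uint64_t max_object_bytes(unsigned field_bits, uint64_t unit)
{
    /* field_bits - 1 bits remain for the offset itself. */
    return (UINT64_C(1) << (field_bits - 1)) * unit;
}
```

So max_object_bytes(32, 1) is 2147483648 (2 GiB) and max_object_bytes(32, 4096) is 8796093022208 (8 TiB).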
Re: Aperture mapping under GEM
On Mon, 2008-08-04 at 05:13 +0100, Dave Airlie wrote: Well it can now, we just need to fix the X server :) Yeah, I just discovered that today. Weird that the kernel was fixed between the last time I looked and now though; NR_OPEN had been 1024 for many years prior. However, it's not just fixing the X server; we'd have to fix every GL application as well to not assume their fds were always in a narrow range. Anyone care to wager how many 3D apps still use select? We could get higher fds just by using dup2 and managing fds up in user space. Making sure we didn't step on valid fds would be a pain. Plus, we're still stuck with increasing the max fd for each DRI application. I'm sure a patch which had DRM increase this from inside the kernel with no protections would be welcome by the kernel community. Well bit-31 is now a flag, just under an assumed name with a fake passport. No argument; if there were a flag parameter to mmap, we'd just use it. Given that we're using ioctls instead of raw syscalls, it seems like we could just use a flag were it not for the lack of any additional parameter to the underlying mmap fop. Lacking this, we're stuck using a kludge (either fake linearized allocs from the drm fd, or bit 31 on the gem object), or creating a separate per-object fd (and underlying file/dentry/inode) for this other mapping. Of these, the kludge plan seems more efficient, and I do prefer the per-object kludge to the drm-fd kludge, but I'm not that tied to either; the underlying code would all be the same, except for how to identify which gem object the user was talking about. The question is whether this matters at all, or whether the Intel driver can just do it that way and have Intel-specific hooks into the shmem mmap/fault code. I don't think so; I can wrap the mmap fop easily enough and substitute my own vma initialization.
To invalidate the mapping after pulling the object from the GTT, it looks like zap_page_range will work, then my fault handler would get called on access to bind back to the GTT and re-validate the map. Or so it seems to me; I haven't tried it yet, and I won't have time to do that for a couple of weeks. For radeon this interface would suck, an object can be VRAM, main RAM, GTT, tiled, endian swapped, etc. We pass tile information into the kernel for our objects now; we assume that the GTT map user wants a linear view of the object suitable for plain old fb drawing. The only semantic distinction between the regular mmap and the GTT mmap is this linearization of tiled objects; the WC mapping doesn't affect how things work, only how fast each read/write operation is, and whether the kernel will be doing additional CPU cache flushing. but if I don't care about that, if Intel were to use mmap2 then in theory you could use an even higher bit than bit 31. Yeah, someday we'll need to deal with single objects larger than 2GB.
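Keith's dup2 idea for keeping GEM fds out of the range that select() users care about could look roughly like the sketch below. It uses fcntl(F_DUPFD), which returns the lowest free descriptor greater than or equal to a floor, instead of dup2 with a hand-picked target, so it cannot clobber an fd that is already open. The helper names are hypothetical, and the sketch does nothing about the per-process limit: the floor must still be below RLIMIT_NOFILE, which is the "increasing the max fd" problem Keith mentions.

```c
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Move fd to the lowest free descriptor >= floor and close the original.
 * Returns the new fd, or -1 (e.g. EINVAL when floor exceeds RLIMIT_NOFILE). */
static int move_fd_above(int fd, int floor)
{
    int moved = fcntl(fd, F_DUPFD, floor);
    if (moved < 0)
        return -1;
    close(fd);
    return moved;
}

/* select() can only watch descriptors below FD_SETSIZE. */
static int usable_with_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}
```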
Re: Aperture mapping under GEM
Zhao, Chunfeng wrote: Hi Keith, Do we have a time line to merge DRM modesetting_GEM branch to upstream main line branch? Thanks! Chunfeng -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Keith Packard Sent: Thursday, July 31, 2008 9:18 PM To: dri-devel Cc: [EMAIL PROTECTED] Subject: Aperture mapping under GEM Ok, we clearly need to deal with mapping subsets of the graphics aperture, both for discrete graphics cards and for 2D on tiled surfaces. Plus, there are reasons for using WC object mappings, which are easily done through the aperture. I haven't spent a huge amount of time thinking about this, but I figured I'd prod people into discussion to try and sort things out. First off, here's what I think I want. We expose mmap ioctls on the gem objects, and I'd like to use the same basic mechanism; when (if?) gem objects become real files, we would want to continue using the same interface. I suggest creating two mmap windows for main memory objects: 0x00000000-0x7fffffff: map the backing pages directly; 0x80000000-0xffffffff: map the object through the aperture. I don't quite know what to do with discrete card memory; suggestions here are welcome from people who've thought about this more than I. Using these two per-object windows means there isn't any need to manage a synthetic linear address space for some global object (like the DRM fd). Next, we need to hook the mmap path in the driver so that our code can get a chance to play. I attached something that might work. Once we've got an mmap request, here's what I think we want to do: 1. Detect an aperture mapping request (bit 31). 2. Map the object to the aperture (speculating that the app will actually use it). 3. Initialize the vma to point at the aperture physical address range. If the object remains mapped to the GTT, there's nothing else to do until the unmap request comes along, at which point we tear down the vma.
If the object gets unmapped from the GTT, we need to go find every VMA mapping it and fix up their PTEs to be unreadable and unwritable. I'm hoping this won't kill performance, but I'm fairly sure this will require an IPI to get the TLBs flushed on every core. Right? At least there won't be a cache flush as well. Now, if the application touches any one of those pages, we should map the whole object back to the GTT and rewrite the PTEs again. We could do this a page at a time, but I don't see any real benefit, as we have to allocate the aperture space anyway, and it shouldn't be much more expensive to fix up a lot of PTEs than to fix up just one. I think that's the whole story here; am I missing any big pieces? Keith, The description would be a little easier to follow if you didn't use the term "map" both for mmap-ing and AGP binding. Anyway, the above would probably work, but for Intel UMA only, as other driver writers would have to deal with switching caching policy and VRAM copies as well, and either not use shmem objects or short-circuit their mapping / fault methods. The Linux mm people are very strongly against having a driver manipulating ptes directly. For this reason, one could use unmap_mapping_range() to invalidate all user ptes pointing to a particular range in the address space of an object, and that's why TTM needs to manage a fake linear address space for the drm fd. /Thomas
Re: Aperture mapping under GEM
On Sat, 2008-08-02 at 17:01 +0200, Thomas Hellström wrote: The description would be a little easier to follow if you didn't use the term map both for mmap-ing and AGP binding. Yeah, using unique terms for each map is a good idea. Anyway, the above would probably work but for Intel UMA only, as other driver writers would have to deal with switching caching policy and VRAM copies as well, and either not use shmem objects or short-circuit their mapping / fault methods. This is for the Intel driver, which is UMA only. The Linux mm people are very strongly against having a driver manipulating ptes directly. I'm always interested in coming up with the cleanest and most natural interface, independent of arbitrary objections. and that's why TTM needs to manage a fake linear address space for the drm fd. Managing a fake linear address space just to match some existing arbitrary API requirements is insane. Creating the right interface for my UMA environment is my goal. I'm not sure precisely what that API should be, but at least this one is obviously wrong. I want to handle thousands of discrete objects and be able to map them independently into my process, and bind them independently to the GTT. Only a few will ever be mapped to my process and while all of them will be bound to the GTT at times, only a subset will fit at any particular time.
RE: Aperture mapping under GEM
Hi Keith, Do we have a time line to merge DRM modesetting_GEM branch to upstream main line branch? Thanks! Chunfeng
Re: Aperture mapping under GEM
On Fri, 2008-08-01 at 18:48 +0200, Jakob Bornecrantz wrote: The basic fault here is that you have added a driver specific flag to a generic ioctl/syscall. Which the last time I checked we didn't want. For example on PCIE Radeon there is no GTT to map, so bit 31 makes no sense there. The GEM MMAP ioctl is driver-specific, not generic for precisely this reason.
RE: Aperture mapping under GEM
On Fri, 2008-08-01 at 10:45 -0700, Zhao, Chunfeng wrote: Hi Keith, Do we have a time line to merge DRM modesetting_GEM branch to upstream main line branch? Eric has posted the GEM patches to lkml for review; there are external kernel changes which are necessary for GEM to work; I think that blocks having GEM appear in the DRM master branch. Jesse is working on rebasing KMS to GEM, but he's not yet comfortable moving that to master. In any case, if you look at Jesse's proposed 2.5 release plans (visible through http://planet.freedesktop.org), you'll see that we expect all of this to be available for our Q3 release which will occur at the end of September. For that to happen, everything will be merged to the suitable master upstream branches.
Re: Aperture mapping under GEM
On Fri, Aug 1, 2008 at 8:13 PM, Keith Packard wrote: On Fri, 2008-08-01 at 18:48 +0200, Jakob Bornecrantz wrote: The basic fault here is that you have added a driver specific flag to a generic ioctl/syscall. Which the last time I checked we didn't want. For example on PCIE Radeon there is no GTT to map, so bit 31 makes no sense there. The GEM MMAP ioctl is driver-specific, not generic for precisely this reason. If you want a non-generic ioctl for that function, go ahead, but IMHO it should then be some sort of flag field on the request. Fiddling with bits on the address feels a bit icky at best. But the last time I checked, the only reason you could even hope to get an mmap ioctl into mainline was under the provision that you later moved it to the mmap syscall, which is, however, generic. Cheers Jakob.
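Jakob's alternative, a mapping-type flag in the request instead of an address bit, might look like the hypothetical layout below. None of these names exist in DRM; the sketch only shows that a driver without a GTT (his PCIE Radeon example) can reject the flag cleanly, and that the whole offset field stays available for the object size.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits for a GEM mmap request. */
#define GEM_MMAP_FLAG_GTT   (1u << 0)          /* map through the aperture */
#define GEM_MMAP_FLAGS_ALL  (GEM_MMAP_FLAG_GTT)

/* Hypothetical request layout: mapping type is explicit, not encoded in offset. */
struct gem_mmap_req {
    uint32_t handle;  /* GEM object handle */
    uint32_t flags;   /* GEM_MMAP_FLAG_* bits */
    uint64_t offset;  /* byte offset within the object; all 64 bits usable */
    uint64_t size;    /* length of the mapping in bytes */
};

/* Validate a request against the flags this driver supports.
 * Unknown bits are -EINVAL; known but unsupported bits are -ENODEV. */
static int gem_mmap_req_check(const struct gem_mmap_req *req, uint32_t supported)
{
    if (req->flags & ~GEM_MMAP_FLAGS_ALL)
        return -EINVAL;
    if (req->flags & ~supported)
        return -ENODEV;  /* e.g. no GTT on this hardware */
    if (req->size == 0)
        return -EINVAL;
    return 0;
}
```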
Re: Aperture mapping under GEM
On Fri, Aug 1, 2008 at 2:13 PM, Keith Packard wrote: On Fri, 2008-08-01 at 18:48 +0200, Jakob Bornecrantz wrote: The basic fault here is that you have added a driver specific flag to a generic ioctl/syscall. Which the last time I checked we didn't want. For example on PCIE Radeon there is no GTT to map, so bit 31 makes no sense there. The GEM MMAP ioctl is driver-specific, not generic for precisely this reason. I think Jakob has a point though. From your first post in this thread: We expose mmap ioctls on the gem objects, and I'd like to use the same basic mechanism; when (if?) gem objects become real files, we would want to continue using the same interface. I suggest creating two mmap windows for main memory objects: Are you saying that you're not planning to make the mmap ioctl a real mmap syscall when/if that's feasible or that it's okay to add intel-gem specific bits to the mmap arguments? I recall Thomas asking for a flags argument to the GEM create ioctl... cheers, Kristian
Re: Aperture mapping under GEM
On Fri, 2008-08-01 at 20:33 +0200, Jakob Bornecrantz wrote: If you want a non-generic ioctl for that function, go ahead, but IMHO it should then be some sort of flag field on the request. Fiddling with bits on the address feels a bit icky at best. Yeah, it is a bit icky. The thing is that with a file object, you've got one linear address space, so you can't really mmap the same address space in two different ways. One alternative here is to create another file object for the same pages and use different mmap semantics there, but I'd prefer to avoid that as it will be fairly expensive in kernel memory.
Re: Aperture mapping under GEM
On Fri, 2008-08-01 at 14:34 -0400, Kristian Høgsberg wrote: Are you saying that you're not planning to make the mmap ioctl a real mmap syscall when/if that's feasible or that it's okay to add intel-gem specific bits to the mmap arguments? I recall Thomas asking for a flags argument to the GEM create ioctl... Note that there aren't Intel-specific bits here; the Intel back-end just has two separate address-space ranges which expose different mappings. So, it could still be managed through the regular mmap API. However, it does seem a bit kludgy, and it might be better to have separate flags. However, it also seems odd to create two different mappings to the same address that have different cache behaviour; technically, this isn't valid for Intel PTEs, but as one mapping goes through the GTT, the underlying physical address seen by the CPU differs. Also, I don't know enough about the Linux mmap implementation to say whether it will do 'odd' things with vmas which map the same address range in a file. Using separate address ranges means I can see the difference down in my mmap driver entry point, which seems like a feature. Alternate suggestions are welcome, especially if they point to a potential underlying implementation.
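The per-object window split Keith describes, where the mmap entry point tells the two mappings apart by offset, reduces to decoding bit 31 of the offset. A user-space sketch with hypothetical names, assuming the 2 GiB windows from the original proposal and rejecting requests that run past the end of the object:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical: bit 31 of the per-object offset selects the aperture window. */
#define GEM_MAP_GTT_BASE (UINT64_C(1) << 31)

enum gem_map_window { GEM_MAP_DIRECT, GEM_MAP_GTT };

/* Decode a byte offset/length within one object of obj_size bytes.
 * On success stores the window and the offset inside the object. */
static int gem_decode_mmap(uint64_t offset, uint64_t length, uint64_t obj_size,
                           enum gem_map_window *window, uint64_t *obj_offset)
{
    if (offset >= 2 * GEM_MAP_GTT_BASE)
        return -EINVAL;  /* beyond both windows */
    *window = (offset & GEM_MAP_GTT_BASE) ? GEM_MAP_GTT : GEM_MAP_DIRECT;
    *obj_offset = offset & (GEM_MAP_GTT_BASE - 1);
    if (length == 0 || *obj_offset > obj_size || length > obj_size - *obj_offset)
        return -EINVAL;  /* empty or runs past the end of the object */
    return 0;
}
```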
Aperture mapping under GEM
Ok, we clearly need to deal with mapping subsets of the graphics aperture, both for discrete graphics cards and for 2D on tiled surfaces. Plus, there are reasons for using WC object mappings, which are easily done through the aperture. I haven't spent a huge amount of time thinking about this, but I figured I'd prod people into discussion to try and sort things out. First off, here's what I think I want. We expose mmap ioctls on the gem objects, and I'd like to use the same basic mechanism; when (if?) gem objects become real files, we would want to continue using the same interface. I suggest creating two mmap windows for main memory objects: 0x00000000-0x7fffffff: map the backing pages directly; 0x80000000-0xffffffff: map the object through the aperture. I don't quite know what to do with discrete card memory; suggestions here are welcome from people who've thought about this more than I. Using these two per-object windows means there isn't any need to manage a synthetic linear address space for some global object (like the DRM fd). Next, we need to hook the mmap path in the driver so that our code can get a chance to play. I attached something that might work. Once we've got an mmap request, here's what I think we want to do: 1. Detect an aperture mapping request (bit 31). 2. Map the object to the aperture (speculating that the app will actually use it). 3. Initialize the vma to point at the aperture physical address range. If the object remains mapped to the GTT, there's nothing else to do until the unmap request comes along, at which point we tear down the vma. If the object gets unmapped from the GTT, we need to go find every VMA mapping it and fix up their PTEs to be unreadable and unwritable. I'm hoping this won't kill performance, but I'm fairly sure this will require an IPI to get the TLBs flushed on every core. Right? At least there won't be a cache flush as well. Now, if the application touches any one of those pages, we should map the whole object back to the GTT and rewrite the PTEs again.
We could do this a page at a time, but I don't see any real benefit, as we have to allocate the aperture space anyway, and it shouldn't be much more expensive to fix up a lot of PTEs than to fix up just one.

I think that's the whole story here; am I missing any big pieces?

-- 
[EMAIL PROTECTED]

commit 0eb8c53640406c08b5a304d09bf08079b53eef84
Author: Keith Packard <[EMAIL PROTECTED]>
Date:   Tue Jul 29 20:19:28 2008 -0700

    Start adding gtt mapping ioctls

diff --git a/linux-core/i915_gem.c b/linux-core/i915_gem.c
index 236203a..f187361 100644
--- a/linux-core/i915_gem.c
+++ b/linux-core/i915_gem.c
@@ -85,6 +85,23 @@ i915_gem_init_ioctl(struct drm_device *dev, void *data,
 }
 
+static struct file_operations i915_gem_file_operations;
+
+#define I915_GEM_MAP_GTT_BASE (1 << 31)
+
+static int i915_gem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct drm_device *dev = file->private_data;
+	drm_i915_private_t *dev_priv = dev->dev_private;
+
+	DRM_INFO("mmap %08lx\n", vma->vm_start);
+	if (vma->vm_start & I915_GEM_MAP_GTT_BASE)
+		return -ENODEV;
+	else
+		return dev_priv->shmem_mmap (file, vma);
+}
+
+
 /**
  * Creates a new mm object and returns a handle to it.
  */
@@ -103,6 +120,16 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 	if (obj == NULL)
 		return -ENOMEM;
 
+	obj->filp->private_data = dev;
+	spin_lock(&dev->object_name_lock);
+	if (i915_gem_file_operations.mmap == NULL) {
+		dev->shmem_mmap = obj->filp->f_path.dentry->d_inode->i_fop->mmap;
+		i915_gem_file_operations = *obj->filp->f_path.dentry->d_inode->i_fop;
+		i915_gem_file_operations.mmap = i915_gem_mmap;
+	}
+	obj->filp->f_path.dentry->d_inode->i_fop = &i915_gem_file_operations;
+	spin_unlock(&dev->object_name_lock);
+
 	ret = drm_gem_handle_create(file_priv, obj, handle);
 	mutex_lock(&dev->struct_mutex);
 	drm_gem_object_handle_unreference(obj);
diff --git a/shared-core/i915_drv.h b/shared-core/i915_drv.h
index a9a431c..a577292 100644
--- a/shared-core/i915_drv.h
+++ b/shared-core/i915_drv.h
@@ -321,6 +321,9 @@ typedef struct drm_i915_private {
 		uint32_t bit_6_swizzle_x;
 		/** Bit 6 swizzling required for Y tiling */
 		uint32_t bit_6_swizzle_y;
+
+		/** shmem_mmap isn't public, but we discover it by magic */
+		int (*shmem_mmap) (struct file *file, struct vm_area_struct *vma);
 	} mm;
 } drm_i915_private_t;
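The patch hooks mmap by copying the shmem inode's file_operations once, overriding only .mmap, and delegating non-aperture requests to the saved original method. The same pattern can be shown in isolation as a hypothetical user-space sketch. `struct model_fops`, `fake_shmem_mmap`, `wrapper_mmap` and `install_wrapper` are invented stand-ins for the kernel types and functions. The sketch omits the spinlock that the patch takes around the one-time copy, because it is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct file_operations, with one method only. */
struct model_fops {
	int (*mmap)(void *file, uint32_t offset);
};

#define MAP_GTT_BIT (UINT32_C(1) << 31)

static struct model_fops wrapped_fops;                  /* patched copy, built once */
static int (*saved_mmap)(void *file, uint32_t offset);  /* original method */

/* Stand-in for the shmem mmap method: always reports success. */
static int fake_shmem_mmap(void *file, uint32_t offset)
{
	(void)file;
	(void)offset;
	return 0;
}

/* Aperture requests are not implemented yet, so they fail with -ENODEV;
 * everything else goes to the saved original method. */
static int wrapper_mmap(void *file, uint32_t offset)
{
	if (offset & MAP_GTT_BIT)
		return -ENODEV;
	return saved_mmap(file, offset);
}

/* Copy the original table the first time only, override one method, and
 * return the shared patched copy. The original table is left untouched. */
static const struct model_fops *install_wrapper(const struct model_fops *orig)
{
	if (wrapped_fops.mmap == NULL) {
		saved_mmap = orig->mmap;
		wrapped_fops = *orig;
		wrapped_fops.mmap = wrapper_mmap;
	}
	return &wrapped_fops;
}
```

Because the copy is made only once, every object ends up sharing a single patched table. The single saved method pointer therefore assumes that all objects start with the same original mmap, which is the same assumption the patch makes about shmem.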