Re: drm-ttm-cleanup-branch
Dave Airlie wrote: On 5/9/07, Thomas Hellström [EMAIL PROTECTED] wrote: Dave Airlie wrote: I'll try it out as soon as there is time. I've just tested glxgears and a few mesa tests on it and it seems to be working fine We should probably think about pulling this over into the DRM sooner rather than later, there are also some changes to the DDX i830_driver.c compat code to deal with... Yup. I've attached a patch (against the cleanup branch) with things I think may be needed. 1) 64-bit reordering. 64-bit scalars, structs and unions should probably be 64-bit aligned in parent structs. I had to insert padding in two cases, but this probably needs to be double-checked. 2) A magic member in the init ioctl. Checking this allows for verbose and friendly failure of code that uses the old interface. 3) Init major / minor versioning of the memory manager interface in case we need changes in the future. 4) expand_pads are 64-bit following Jesse Barnes recommendations. 5) The info_req carries a fence class for validation for a particular command submission mechanism. 6) The info_rep arg carries tile_strides and tile_info. The argument tile_strides is ((actual_tile_stride) 16) | (desired_tile_stride)) Any reason you don't just separate actual_tile_stride and desired_tile_stride to 2xunsigned int? not sure why merging them really gives us anything... The argument tile_info is driver-specific. (Could be tile width, x-major, y-major etc.) Finally, should we perhaps allow for 64-bit buffer object flags / mask at this point? Possibly, the rest all seems like good ideas, I know we hit the 64-bit alignment on nouveau so good to get it fixed early I haven't done any user-space or kernel coding for this yet. Just want to know what you think. Well I'll be offline for a few weeks so I'll get the odd chance to read mail but no development... If kernel-space relocations are on the agenda, I should flag up one issue that we are currently pretending doesn't really exist, namely that not all relocations apply to the main dma/batch buffer. More precisely, state structures stored for use and reuse outside of the command stream can contain pointers to things like draw surfaces and textures. Having pointers in those structs limits their ability to be reused as intended, but there's not much to do about that. The important thing to note is that they *do* need to fixed up with relocation information at the same time as the batchbuffer. Or in other words, that while the batch buffer may be special, it is not unique - there are a (small) number of buffers that will require the same treatment. Keith - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: drm-ttm-cleanup-branch
Dave Airlie wrote: I'll try it out as soon as there is time. I've just tested glxgears and a few mesa tests on it and it seems to be working fine We should probably think about pulling this over into the DRM sooner rather than later, there are also some changes to the DDX i830_driver.c compat code to deal with... Yup. I've attached a patch (against the cleanup branch) with things I think may be needed. 1) 64-bit reordering. 64-bit scalars, structs and unions should probably be 64-bit aligned in parent structs. I had to insert padding in two cases, but this probably needs to be double-checked. 2) A magic member in the init ioctl. Checking this allows for verbose and friendly failure of code that uses the old interface. 3) Init major / minor versioning of the memory manager interface in case we need changes in the future. 4) expand_pads are 64-bit following Jesse Barnes recommendations. 5) The info_req carries a fence class for validation for a particular command submission mechanism. 6) The info_rep arg carries tile_strides and tile_info. The argument tile_strides is ((actual_tile_stride) 16) | (desired_tile_stride)) The argument tile_info is driver-specific. (Could be tile width, x-major, y-major etc.) Finally, should we perhaps allow for 64-bit buffer object flags / mask at this point? I haven't done any user-space or kernel coding for this yet. Just want to know what you think. /Thomas - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: drm-ttm-cleanup-branch
...And the patch. Thomas Hellström wrote: Dave Airlie wrote: I'll try it out as soon as there is time. I've just tested glxgears and a few mesa tests on it and it seems to be working fine We should probably think about pulling this over into the DRM sooner rather than later, there are also some changes to the DDX i830_driver.c compat code to deal with... Yup. I've attached a patch (against the cleanup branch) with things I think may be needed. 1) 64-bit reordering. 64-bit scalars, structs and unions should probably be 64-bit aligned in parent structs. I had to insert padding in two cases, but this probably needs to be double-checked. 2) A magic member in the init ioctl. Checking this allows for verbose and friendly failure of code that uses the old interface. 3) Init major / minor versioning of the memory manager interface in case we need changes in the future. 4) expand_pads are 64-bit following Jesse Barnes recommendations. 5) The info_req carries a fence class for validation for a particular command submission mechanism. 6) The info_rep arg carries tile_strides and tile_info. The argument tile_strides is ((actual_tile_stride) 16) | (desired_tile_stride)) The argument tile_info is driver-specific. (Could be tile width, x-major, y-major etc.) Finally, should we perhaps allow for 64-bit buffer object flags / mask at this point? I haven't done any user-space or kernel coding for this yet. Just want to know what you think. /Thomas From 2b7d5bff5a6aeca08ccf1931828eeca74935bad5 Mon Sep 17 00:00:00 2001 From: Thomas Hellstrom thomas-at-tungstengraphics-dot-com Date: Wed, 9 May 2007 09:56:06 +0200 Subject: [PATCH] New members and 64-bit reordering. Signed-off-by: Thomas Hellstrom thomas-at-tungstengraphics-dot-com --- shared-core/drm.h | 46 +- 1 files changed, 29 insertions(+), 17 deletions(-) diff --git a/shared-core/drm.h b/shared-core/drm.h index ae308be..5cb2545 100644 --- a/shared-core/drm.h +++ b/shared-core/drm.h @@ -671,12 +671,13 @@ #define DRM_FENCE_MASK_DRIVER #define DRM_FENCE_TYPE_EXE 0x0001 typedef struct drm_fence_arg { - unsigned handle; - int class; - unsigned type; - unsigned flags; - unsigned signaled; - unsigned expand_pad[4]; /*Future expansion */ + unsigned int handle; + unsigned int class; + unsigned int type; + unsigned int flags; + unsigned int signaled; + unsigned int pad_64; + drm_u64_t expand_pad[3]; /*Future expansion */ } drm_fence_arg_t; /* Buffer permissions, referring to how the GPU uses the buffers. @@ -774,6 +775,10 @@ #define DRM_BO_HINT_DONT_FENCE 0x00 #define DRM_BO_HINT_WAIT_LAZY 0x0008 #define DRM_BO_HINT_ALLOW_UNFENCED_MAP 0x0010 +#define DRM_BO_INIT_MAGIC 0xfe769812 +#define DRM_BO_INIT_MAJOR 0 +#define DRM_BO_INIT_MINOR 1 + typedef enum { drm_bo_type_dc, @@ -786,25 +791,26 @@ struct drm_bo_info_req { unsigned int handle; unsigned int mask; unsigned int hint; + unsigned int fence_class; }; struct drm_bo_create_req { unsigned int mask; unsigned int hint; - unsigned page_alignment; - drm_u64_t size; + unsigned int page_alignment; drm_bo_type_t type; + drm_u64_t size; drm_u64_t buffer_start; }; struct drm_bo_op_req { - struct drm_bo_info_req bo_req; - unsigned int arg_handle; enum { drm_bo_validate, drm_bo_fence, drm_bo_ref_fence, } op; + unsigned int arg_handle; + struct drm_bo_info_req bo_req; }; /* @@ -816,20 +822,22 @@ #define DRM_BO_REP_BUSY 0x0001 struct drm_bo_info_rep { unsigned int handle; unsigned int flags; - drm_u64_t size; - drm_u64_t offset; - drm_u64_t arg_handle; unsigned int mask; - drm_u64_t buffer_start; unsigned int fence_flags; unsigned int rep_flags; unsigned int page_alignment; - unsigned int expand_pad[4]; /*Future expansion */ + unsigned int tile_strides; + unsigned int tile_info; + drm_u64_t size; + drm_u64_t offset; + drm_u64_t arg_handle; + drm_u64_t buffer_start; + drm_u64_t expand_pad[4]; /*Future expansion */ }; struct drm_bo_arg_rep { - int ret; struct drm_bo_info_rep bo_info; + int ret; }; struct drm_bo_create_arg { @@ -859,6 +867,7 @@ struct drm_bo_map_wait_idle_arg { struct drm_bo_op_arg { int handled; + unsigned int pad_64; drm_u64_t next; union { struct drm_bo_op_req req; @@ -882,9 +891,12 @@ typedef struct drm_mm_type_arg { } drm_mm_type_arg_t; typedef struct drm_mm_init_arg { + unsigned int magic; + unsigned int major; + unsigned int minor; + unsigned int mem_type; drm_u64_t p_offset;
Re: drm-ttm-cleanup-branch
On 5/9/07, Thomas Hellström [EMAIL PROTECTED] wrote: Dave Airlie wrote: I'll try it out as soon as there is time. I've just tested glxgears and a few mesa tests on it and it seems to be working fine We should probably think about pulling this over into the DRM sooner rather than later, there are also some changes to the DDX i830_driver.c compat code to deal with... Yup. I've attached a patch (against the cleanup branch) with things I think may be needed. 1) 64-bit reordering. 64-bit scalars, structs and unions should probably be 64-bit aligned in parent structs. I had to insert padding in two cases, but this probably needs to be double-checked. 2) A magic member in the init ioctl. Checking this allows for verbose and friendly failure of code that uses the old interface. 3) Init major / minor versioning of the memory manager interface in case we need changes in the future. 4) expand_pads are 64-bit following Jesse Barnes recommendations. 5) The info_req carries a fence class for validation for a particular command submission mechanism. 6) The info_rep arg carries tile_strides and tile_info. The argument tile_strides is ((actual_tile_stride) 16) | (desired_tile_stride)) Any reason you don't just separate actual_tile_stride and desired_tile_stride to 2xunsigned int? not sure why merging them really gives us anything... The argument tile_info is driver-specific. (Could be tile width, x-major, y-major etc.) Finally, should we perhaps allow for 64-bit buffer object flags / mask at this point? Possibly, the rest all seems like good ideas, I know we hit the 64-bit alignment on nouveau so good to get it fixed early I haven't done any user-space or kernel coding for this yet. Just want to know what you think. Well I'll be offline for a few weeks so I'll get the odd chance to read mail but no development... Dave. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: drm-ttm-cleanup-branch
On 5/7/07, Thomas Hellström [EMAIL PROTECTED] wrote: Dave Airlie wrote: I've started a cleanup branch, I've just checked in the mm_ioctl split out into separate parts, I'll try and get fence and buffer done as well.. This will break compatiblity but to be honest, anyone who has deployed this beyond embedded system work (i.e. TG and me), deserves what they get for integrating unreleased code :-) Of course quite how to fix up the buffer object ioctl chain scheme I'm not sure, I'm not 100% sure this is really a win in most situations?? have we any numbers? Dave. Dave, It's good that you've found time to do this. The IOCTL chaining is presently only used to submit a list of buffers for validation. Any other use can go away, and if we can find a better way of submitting a buffer list (be it a linked list or array) for a validatebufferlist we can and should probably use that. Okay I've cleaned up this stuff as best I can, it builds I haven't tested it as my crash box is hooked up doing some other testing at the moment... but I kept the list for the fence/validate operations... I've also removed the typedefs from most of the code and I don't see a major impact on readability so I may proceed to do this in more places.. If there is a consensus to skip the validatebufferlist functionality altogether we can skip also that, but we still need a way to submit a buffer list with the device-specific submit ioctl so It is my gut feeling that we should have a validatebufferlist reference implementation. I'm also not sure about what the Nouveau guys think of removing this functionality? I like the linked list for the following two reasons: 1. When you start a command buffer sequence with BEGIN_BATCH(_dwords, _relocs, _flags), you know the number of relocs and the space needed in the batch buffer, but you don't know for sure the number of buffers on the validate list. 2. It's been tested, debugged and it works ok. Another important consideration is what buffer attribues should we allow the validate call to change?. Currently it can only change flags, with a mask and give a hint as to how a certain operation should be performed. I'd like to see the following attribute as well: * The fence class, That is, what command submission mechanism are we validating for?. If the fence class does change compared to the current fence (if any), we need to expire the current fence before validating. This is a need for for hardware with multiple command submission mechanism. Then there's tiling. I think the kernel needs to know about tiling to efficiently set up tiling (AKA fence) registers in the limited GPU virtual address space (if possible on a per buffer basis), and then it needs to know also about the desired tile stride and a device specific tile-type. User space gets back the actual set tile-type and the underlying GPU virtual space tile-stride (which may exceed the desired tile stride due to hardware limitations). I think setting the tile attributes can be done using a separate IOCTL, but the actual GPU virtual space tile stride needs to be reported back with the validate IOCTL. so a typical command sequence for a tiled buffer would be: drmBOSetTiled(bo, 3 /* Tile stride */, DRM_INTEL_FLAG_TILED_XMAJOR); BEGIN_BATCH(5 /*dwords*/, 2 /*relocs*/, myFlags); OUT_BATCH(...) OUT_BATCH(...) OUT_RELOC_OFFSET(buffer); OUT_RELOC_STRIDE(buffer); OUT_BATCH(FIRE_CMD); ADVANCE_BATCH(); To summarize: * Tile parameters can be set using a separate ioctl (tile stride and tile type) * An extra validate TILED flag can make DRM set up the buffer for tiling. * Validate needs to return the GPU virtual space tile-stride, and perhaps the actual tile type. * We can use pte tricks to always have the _desired_ tile stride for CPU mappings. (This means we might have different strides in GPU- and CPU space). * Only the pages actually _used_ by the buffer are inserted into GPU virtual space. This may not save GPU virtual space, but it saves a lot of memory for buffers with odd strides. (For example 1025). This interface sounds okay for tiling, I'd need to play around with it to be sure it would work on radeon, which I probably won't get to before I go on holidays... but I was looking at doing this for my own project as it might save me 2-3MBs of memory I'm needlessly mapping into the GTT at the moment to fill in the tile gaps.. Dave. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net
Re: drm-ttm-cleanup-branch
Dave Airlie wrote: On 5/7/07, Thomas Hellström [EMAIL PROTECTED] wrote: Dave Airlie wrote: I've started a cleanup branch, I've just checked in the mm_ioctl split out into separate parts, I'll try and get fence and buffer done as well.. This will break compatiblity but to be honest, anyone who has deployed this beyond embedded system work (i.e. TG and me), deserves what they get for integrating unreleased code :-) Of course quite how to fix up the buffer object ioctl chain scheme I'm not sure, I'm not 100% sure this is really a win in most situations?? have we any numbers? Dave. Dave, It's good that you've found time to do this. The IOCTL chaining is presently only used to submit a list of buffers for validation. Any other use can go away, and if we can find a better way of submitting a buffer list (be it a linked list or array) for a validatebufferlist we can and should probably use that. Okay I've cleaned up this stuff as best I can, it builds I haven't tested it as my crash box is hooked up doing some other testing at the moment... but I kept the list for the fence/validate operations... I've also removed the typedefs from most of the code and I don't see a major impact on readability so I may proceed to do this in more places.. Cool! I'll try it out as soon as there is time. BTW what rules should we use for scalar types in drm? I assume unsigned, int, unsigned long etc are OK when no specific size is needed in kernel code? u32 and similar can be used in linux-only kernel code. In shared code we use uint32_t and similar? __u32 is used in linux-only user-space interface code. What about shared interface code? unsigned? uint32_t? This interface sounds okay for tiling, I'd need to play around with it to be sure it would work on radeon, which I probably won't get to before I go on holidays... but I was looking at doing this for my own project as it might save me 2-3MBs of memory I'm needlessly mapping into the GTT at the moment to fill in the tile gaps.. Yes, I have an agpgart patch around somewhere that allows tiled insertions and removals, but it still needs some form of TTM support. I'll be needing something similar towards the summer as well, mainly to be able to tile private back-buffers with a limited number of tiling (fence) registers. With the buffer object vm we can reuse tiling registers, Not from GPU-validated buffers but at least from mapped buffers. We can kill the user-space mappings and steal the tiling register. As soon as a nopfn() is triggered we give it (or another one) back, and remap the page(s). Let's keep in touch about this. Dave. /Thomas - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: drm-ttm-cleanup-branch
I'll try it out as soon as there is time. I've just tested glxgears and a few mesa tests on it and it seems to be working fine We should probably think about pulling this over into the DRM sooner rather than later, there are also some changes to the DDX i830_driver.c compat code to deal with... BTW what rules should we use for scalar types in drm? I assume unsigned, int, unsigned long etc are OK when no specific size is needed in kernel code? u32 and similar can be used in linux-only kernel code. In shared code we use uint32_t and similar? __u32 is used in linux-only user-space interface code. What about shared interface code? unsigned? uint32_t? I'd rather not use unsigned or unsigned long ever, unsigned int is well defined IMHO, unsigned i feel isn't informative enough and unsigned long or void * is just a bad idea for 32/64bit so to be avoided at all costs... So I guess we use unsigned int and uint32_t in any many places as we can I think we need to add an include to drm.h for uint32_t to work in it.. the kernel people might give out a bit, but I'm quite happy to standover using uint32_t and friends.. Dave. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: drm-ttm-cleanup-branch
Dave Airlie wrote: I've started a cleanup branch, I've just checked in the mm_ioctl split out into separate parts, I'll try and get fence and buffer done as well.. This will break compatiblity but to be honest, anyone who has deployed this beyond embedded system work (i.e. TG and me), deserves what they get for integrating unreleased code :-) Of course quite how to fix up the buffer object ioctl chain scheme I'm not sure, I'm not 100% sure this is really a win in most situations?? have we any numbers? Dave. Dave, It's good that you've found time to do this. The IOCTL chaining is presently only used to submit a list of buffers for validation. Any other use can go away, and if we can find a better way of submitting a buffer list (be it a linked list or array) for a validatebufferlist we can and should probably use that. If there is a consensus to skip the validatebufferlist functionality altogether we can skip also that, but we still need a way to submit a buffer list with the device-specific submit ioctl so It is my gut feeling that we should have a validatebufferlist reference implementation. I'm also not sure about what the Nouveau guys think of removing this functionality? I like the linked list for the following two reasons: 1. When you start a command buffer sequence with BEGIN_BATCH(_dwords, _relocs, _flags), you know the number of relocs and the space needed in the batch buffer, but you don't know for sure the number of buffers on the validate list. 2. It's been tested, debugged and it works ok. Another important consideration is what buffer attribues should we allow the validate call to change?. Currently it can only change flags, with a mask and give a hint as to how a certain operation should be performed. I'd like to see the following attribute as well: * The fence class, That is, what command submission mechanism are we validating for?. If the fence class does change compared to the current fence (if any), we need to expire the current fence before validating. This is a need for for hardware with multiple command submission mechanism. Then there's tiling. I think the kernel needs to know about tiling to efficiently set up tiling (AKA fence) registers in the limited GPU virtual address space (if possible on a per buffer basis), and then it needs to know also about the desired tile stride and a device specific tile-type. User space gets back the actual set tile-type and the underlying GPU virtual space tile-stride (which may exceed the desired tile stride due to hardware limitations). I think setting the tile attributes can be done using a separate IOCTL, but the actual GPU virtual space tile stride needs to be reported back with the validate IOCTL. so a typical command sequence for a tiled buffer would be: drmBOSetTiled(bo, 3 /* Tile stride */, DRM_INTEL_FLAG_TILED_XMAJOR); BEGIN_BATCH(5 /*dwords*/, 2 /*relocs*/, myFlags); OUT_BATCH(...) OUT_BATCH(...) OUT_RELOC_OFFSET(buffer); OUT_RELOC_STRIDE(buffer); OUT_BATCH(FIRE_CMD); ADVANCE_BATCH(); To summarize: * Tile parameters can be set using a separate ioctl (tile stride and tile type) * An extra validate TILED flag can make DRM set up the buffer for tiling. * Validate needs to return the GPU virtual space tile-stride, and perhaps the actual tile type. * We can use pte tricks to always have the _desired_ tile stride for CPU mappings. (This means we might have different strides in GPU- and CPU space). * Only the pages actually _used_ by the buffer are inserted into GPU virtual space. This may not save GPU virtual space, but it saves a lot of memory for buffers with odd strides. (For example 1025). /Thomas - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
drm-ttm-cleanup-branch
I've started a cleanup branch, I've just checked in the mm_ioctl split out into separate parts, I'll try and get fence and buffer done as well.. This will break compatiblity but to be honest, anyone who has deployed this beyond embedded system work (i.e. TG and me), deserves what they get for integrating unreleased code :-) Dave. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: drm-ttm-cleanup-branch
I've started a cleanup branch, I've just checked in the mm_ioctl split out into separate parts, I'll try and get fence and buffer done as well.. This will break compatiblity but to be honest, anyone who has deployed this beyond embedded system work (i.e. TG and me), deserves what they get for integrating unreleased code :-) Of course quite how to fix up the buffer object ioctl chain scheme I'm not sure, I'm not 100% sure this is really a win in most situations?? have we any numbers? Dave. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel