Re: TTM eviction ghost object
I want to dig deeper into this asynchronous memory manager mechanism. I now have several questions:

1. According to Thomas's suggestion, each memory manager will have a fence object associated with it, delivered from the driver's *move* function. What is the relationship between the memory manager's fence object and each BO's fence object?

2. What is the difference between the HW-engine-for-move and the GPU? If we use the GPU to do the move, can we treat the two behaviours the same? That is, can the BO's synchronization be achieved through the memory manager's fence object?

3. Currently the BO's synchronization is done by ttm_bo_wait, which is used in evict_bo, swapout_bo, cpu_access and bo_cleanup. cpu_access and bo_cleanup need ttm_bo_wait anyway, while evict_bo and swapout_bo (assuming the GPU supports this feature) need not call ttm_bo_wait, since they use the same engine, i.e. the GPU. Am I right?

Any comments will be appreciated.
Re: TTM eviction ghost object
On Tue, Jan 12, 2010 at 01:03:02AM +0800, Donnie Fang wrote:
> 1. According to Thomas's suggestion, each memory manager will have a
> fence object associated with it, delivered from the driver's *move*
> function. What is the relationship between the memory manager's fence
> object and each BO's fence object?

Thomas's idea was (AFAICT) that the fence associated with the manager should always be the latest one, i.e.:

    fence *fence_latest(fence *fencea, fence *fenceb)

and doing:

    fence = driver->move(bo, ...)  /* fence is the same as the BO's fence object */
    lock
    manager->fence = fence_latest(fence, manager->fence)
    unlock

> 2. What is the difference between the HW-engine-for-move and the GPU?
> If we use the GPU to do the move, can we treat the two behaviours the
> same? That is, can the BO's synchronization be achieved through the
> memory manager's fence object?

If the GPU only has a 3D engine to do everything (bo moves, 2D rendering, 3D), then the GPU won't care about the manager's fence. If the GPU has separate hw for moving memory, then you need a fence between each GPU engine. This fence will be hw specific, and likely only the fence id (private to the fence implementation) will be used. For instance:

- The DMA engine moves buffer A from system memory to VRAM and emits fence id 0xCAFEDEAD.
- The 3D engine wants to use buffer A from VRAM; before sending commands which use the buffer, you queue a sync command waiting on fence id 0xCAFEDEAD.

This assumes the hw has such a sync mechanism. If it doesn't, then you need to wait in CPU land until the DMA engine is done before queuing 3D commands. Most hw doesn't have such a limitation AFAIK (radeon, nvidia, intel). Note that sync between different engines can be tricky from the driver code's point of view.

> 3. Currently the BO's synchronization is done by ttm_bo_wait, which is
> used in evict_bo, swapout_bo, cpu_access and bo_cleanup. cpu_access and
> bo_cleanup need ttm_bo_wait anyway, while evict_bo and swapout_bo
> (assuming the GPU supports this feature) need not call ttm_bo_wait,
> since they use the same engine, i.e. the GPU. Am I right?

evict_bo doesn't need ttm_bo_wait, but all the others do. swapout_bo is going to swap pages out to the hard disk, so before writing pages to the disk we must wait for the GPU to have written those pages' content into system memory.

Cheers,
Jerome
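Spelled out as code, the manager-fence bookkeeping above might look like the following minimal sketch. The `ttm_fence` struct, its `sequence` field, and `manager_track_move()` are hypothetical names for illustration, not actual TTM structures:

    #include <linux/spinlock.h>
    #include <linux/types.h>

    /* Hypothetical fence carrying a monotonically increasing fence id. */
    struct ttm_fence {
            u32 sequence;
    };

    struct ttm_mem_manager {
            spinlock_t lock;
            struct ttm_fence *fence;   /* last move-out fence for this manager */
    };

    /* Return whichever fence signals later; NULL counts as already signaled. */
    static struct ttm_fence *fence_latest(struct ttm_fence *a, struct ttm_fence *b)
    {
            if (!a)
                    return b;
            if (!b)
                    return a;
            /* signed difference handles sequence wrap-around */
            return (s32)(a->sequence - b->sequence) >= 0 ? a : b;
    }

    /* Called with the fence returned by driver->move() for a scheduled move. */
    static void manager_track_move(struct ttm_mem_manager *man,
                                   struct ttm_fence *fence)
    {
            spin_lock(&man->lock);
            man->fence = fence_latest(fence, man->fence);
            spin_unlock(&man->lock);
    }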
Re: TTM eviction ghost object
Jerome Glisse wrote:
> Thomas's idea was (AFAICT) that the fence associated with the manager
> should always be the latest one, i.e.:
>
>     fence *fence_latest(fence *fencea, fence *fenceb)
>
> and doing:
>
>     fence = driver->move(bo, ...)  /* fence is the same as the BO's fence object */
>     lock
>     manager->fence = fence_latest(fence, manager->fence)
>     unlock

This is true to some extent. The fence that sits on the manager should be the last fence associated with a move-out operation from that manager. This can be extended to allow some granularity: for example, instead of having one fence per manager, one could have one fence per free region in the manager. But with each level of optimization we introduce more complexity, to the point where no one will be able to understand the code.

/Thomas
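As a rough illustration of the granularity trade-off (hypothetical structures, not TTM code): with one fence per manager, any new allocation may end up waiting on an unrelated move-out, while per-region fences narrow the wait to the move that freed the region actually handed out, at the cost of extra tracking:

    #include <linux/list.h>

    struct ttm_fence;   /* hypothetical, as in the earlier sketch */

    /* One fence per manager: any allocation may wait on an unrelated move-out. */
    struct coarse_manager {
            struct ttm_fence *fence;
    };

    /*
     * One fence per free region: an allocation waits only on the move-out
     * that freed the region it actually received, but every free region
     * now has to carry and update its own fence pointer.
     */
    struct free_region {
            struct list_head link;
            unsigned long start, size;
            struct ttm_fence *fence;   /* move-out that freed this region, or NULL */
    };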
Re: TTM eviction ghost object
Hi Thomas,
I have several doubts; please check them below.

2009/12/15 Thomas Hellstrom tho...@shipmail.org:
> My plan when I get time is to implement fully asynchronous memory
> management. That means that the managers are in sync with the CPU and
> not the GPU, and all buffer moves are pipelined, provided that the
> driver supports it. This also means that I will hang a sync object on
> each memory type manager, so that if we need to switch hw engine and
> sync the manager with the GPU, we can wait on that sync object. This
> will mean that when you evict a buffer object, its space will
> immediately show up as free, although it really isn't free yet, but it
> *will* be free when the gpu executes a move to that memory region,
> since the eviction will be scheduled before the move to that memory.

Does the space show up as free immediately even when the fence object of this bo hasn't been signaled? The ttm core currently hands the old bo's space to a ghost bo and lets the ghost track it, freeing the space only when the bo's fence is signaled. How would these modifications fit in? Would you please give more hints?
Re: TTM eviction ghost object
Donnie Fang wrote:
>> This will mean that when you evict a buffer object, its space will
>> immediately show up as free, although it really isn't free yet, but it
>> *will* be free when the gpu executes a move to that memory region,
>> since the eviction will be scheduled before the move to that memory.
>
> Does the space show up as free immediately even when the fence object
> of this bo hasn't been signaled? The ttm core currently hands the old
> bo's space to a ghost bo and lets the ghost track it, freeing the space
> only when the bo's fence is signaled. How would these modifications fit
> in? Would you please give more hints?

Yes, the space will show up as free immediately. However, if you want to *use* the space immediately, you will have to wait on the sync object that will be attached to the manager.

Let's say you copy _from_ vram using a hw dma engine, and copy _to_ vram using the CPU:

1) Schedule a copy from a vram region.
2) Put a corresponding sync object on the manager.
3) Free the vram region.
4) Region gets allocated.
5) We want to memcpy to the vram region. We need to wait for the manager fence.
6) Memcpy.

The second example is when you use the hw dma engine for both copies:

1) Schedule a copy from a vram region.
2) Put a corresponding sync object on the manager.
3) Free the vram region.
4) Region gets allocated.
5) Schedule a copy to the vram region. There is no need to wait on the manager's sync object, since the single hw dma engine executes the copies in order.
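To make the first scenario concrete, a minimal sketch of the CPU-side wait, assuming a hypothetical ttm_fence_wait() helper and the per-manager fence from the earlier sketch; none of these names are actual TTM entry points, and fence refcounting is omitted for brevity:

    #include <linux/spinlock.h>
    #include <linux/io.h>

    struct ttm_fence;                               /* hypothetical */
    int ttm_fence_wait(struct ttm_fence *fence);    /* hypothetical: block until signaled */

    struct ttm_mem_manager {
            spinlock_t lock;
            struct ttm_fence *fence;   /* last scheduled move-out, may be NULL */
    };

    /*
     * Steps 5 and 6 of the CPU scenario: the freshly allocated region may
     * still be the source of an in-flight move-out, so sync with the GPU
     * through the manager's fence before the CPU touches the region.
     */
    static int cpu_write_to_vram(struct ttm_mem_manager *man,
                                 void __iomem *dst, const void *src, size_t n)
    {
            struct ttm_fence *fence;
            int ret;

            spin_lock(&man->lock);
            fence = man->fence;
            spin_unlock(&man->lock);

            if (fence) {
                    ret = ttm_fence_wait(fence);   /* step 5: wait for the manager fence */
                    if (ret)
                            return ret;
            }

            memcpy_toio(dst, src, n);              /* step 6: memcpy */
            return 0;
    }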
Re: TTM eviction ghost object
Hi Thomas,
I summarize your meaning as below:

a. When the CPU is involved, it must wait for the sync object before really reusing the device address space.
b. When the CPU is not involved, but there are two independent HW engines touching the space, the second one must wait for the sync object.
c. Fully pipelined bo moves are possible when only one HW engine touches the space.

Am I right?

About *b*, let's say:

1) Schedule a copy of the bo from VRAM using the HW DMA engine.
2) Put a corresponding sync object on the manager.
3) Free the vram region.
4) Region gets allocated.
5) The GPU 2D engine renders to this region.

Since the GPU 2D engine and the HW DMA engine are totally independent of each other, the sync object still needs to be signaled in this situation.
Re: TTM eviction ghost object
On Wed, Dec 16, 2009 at 12:12:13AM +0800, Donnie Fang wrote:
> a. When the CPU is involved, it must wait for the sync object before
> really reusing the device address space.
> b. When the CPU is not involved, but there are two independent HW
> engines touching the space, the second one must wait for the sync
> object.
> c. Fully pipelined bo moves are possible when only one HW engine
> touches the space.
>
> Am I right?
>
> About *b*, let's say:
> 1) Schedule a copy of the bo from VRAM using the HW DMA engine.
> 2) Put a corresponding sync object on the manager.
> 3) Free the vram region.
> 4) Region gets allocated.
> 5) The GPU 2D engine renders to this region.
> Since the GPU 2D engine and the HW DMA engine are totally independent
> of each other, the sync object still needs to be signaled in this
> situation.

Some hw has a way to synchronize between different parts of the GPU, so for instance you can tell the 2D pipeline to wait on the hw dma engine before doing any work. If the hw doesn't have such synchronization capabilities, I believe it's better to only use one pipeline of the hw (so forget about the hw dma engine and do bo moves using the 2D or 3D engine); otherwise you will have to put the CPU in the loop, and that would mean stalling the GPU (which will more than likely end up in suboptimal use of the GPU).

Cheers,
Jerome
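A sketch of the hw-assisted path Jerome describes, with an invented ring-command interface; emit_copy, emit_fence, emit_wait and emit_2d_blit stand in for whatever the real hardware command stream offers:

    #include <stddef.h>
    #include <stdint.h>

    /* Invented per-engine command interfaces; real hw rings differ. */
    struct engine_ring;
    void emit_copy(struct engine_ring *ring, uint64_t dst, uint64_t src, size_t n);
    void emit_fence(struct engine_ring *ring, uint32_t fence_id); /* signal on completion */
    void emit_wait(struct engine_ring *ring, uint32_t fence_id);  /* stall until signaled */
    void emit_2d_blit(struct engine_ring *ring, uint64_t dst);

    /*
     * The dma engine moves a buffer into VRAM and signals a fence id; the
     * 2D engine waits on that fence id before touching the region, so the
     * CPU never has to stall.
     */
    void pipelined_move_then_render(struct engine_ring *dma,
                                    struct engine_ring *r2d,
                                    uint64_t vram_dst, uint64_t sys_src, size_t n)
    {
            const uint32_t fence_id = 0xCAFEDEAD;   /* fence id from the example above */

            emit_copy(dma, vram_dst, sys_src, n);
            emit_fence(dma, fence_id);

            emit_wait(r2d, fence_id);    /* hw-level sync between the two engines */
            emit_2d_blit(r2d, vram_dst);
    }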
TTM eviction ghost object
Hi Thomas,

Dave found the root of a strange oops we encountered. I spent some time today trying to hack ttm around, but in the end my solution is wrong.

When we move an object, a ttm ghost object is created. If the GPU takes time to evict the bo, the ghost object ends up on the destroy list but stays on the lru list (unless I have misunderstood the code the whole day). Now if the ghost is in GTT (a similar issue can happen in other configurations; the bottom line is evicting a ghost object), it can get evicted, and that's when the trouble starts. The driver callback gets called with the ghost object, but the ghost object wasn't created by the driver, so the driver will more than likely end up oopsing when trying to access its private bo structure (the ttm_bo structure is embedded in the radeon_bo structure, and any driver relying on accessing its driver structure will hit this issue).

I see two solutions:
- Don't put the ghost on the lru list, and just let it stay on the destroy list, but that doesn't end up that well.
- Add a flag so we know whether we can call driver callbacks on an object or not.

I will send a patch for the first solution, but I haven't yet found an easy way to exercise this path. My concern is that in ttm_bo_mem_force_space we might fail because we don't wait for the ghost object to actually die and free its space (an issue only if no_wait=false).

I also wonder if leaving a ghost bo on the lru might not lead to an infinite eviction loop. The case I am thinking of: VRAM is full and there is only one object we can evict. We evict it and create a ghost object holding the vram space; the eviction takes long enough that we put the ghost on the lru. ttm_bo_mem_force_space evicts the ghost object, and we loop on this.

Anyway, what are your thoughts on this?

Cheers,
Jerome
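For illustration, the unsafe pattern looks roughly like this; the struct definitions are simplified stand-ins for the real TTM and radeon headers, and the callback is hypothetical:

    #include <linux/kernel.h>   /* container_of */

    /* Simplified stand-ins for the real TTM and radeon structures. */
    struct ttm_buffer_object {
            void (*destroy)(struct ttm_buffer_object *bo);
            /* ... */
    };

    struct radeon_bo {
            struct ttm_buffer_object tbo;   /* ttm bo embedded in the driver bo */
            void *driver_private;
            /* ... */
    };

    /*
     * UNSAFE: a ghost bo is allocated by TTM itself, not by the driver, so
     * on a ghost this container_of() points into memory that is not a
     * radeon_bo, and dereferencing the result will likely oops.
     */
    static void driver_evict_callback(struct ttm_buffer_object *bo)
    {
            struct radeon_bo *rbo = container_of(bo, struct radeon_bo, tbo);

            /* rbo->driver_private is only valid if the driver created this bo */
            (void)rbo;
    }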
Re: TTM eviction ghost object
Jerome Glisse wrote:
> The driver callback gets called with the ghost object, but the ghost
> object wasn't created by the driver, so the driver will more than
> likely end up oopsing when trying to access its private bo structure.
>
> I see two solutions:
> - Don't put the ghost on the lru list.
> - Add a flag so we know whether we can call driver callbacks on an
>   object or not.

Jerome,

In general, since the driver bo is *derived* from the ttm bo, and the callbacks take the base type, ttm bo, as their argument, the driver needs to check the object type before typecasting. We do a similar check in the vmwgfx driver by checking the bo destroy function, to see whether it's the driver-specific destroy. So this first problem should be viewed as a driver bug, as I see it. Note that if you need driver-private per-bo information to be added to a bo in order for move() to work, you should carefully review whether it's *really* needed, and in that case we must set up a callback to add that information at bo creation. In general, though, the driver-specific move function should be able to handle the base object type.

> I will send a patch for the first solution, but I haven't yet found an
> easy way to exercise this path. My concern is that in
> ttm_bo_mem_force_space we might fail because we don't wait for the
> ghost object to actually die and free its space (an issue only if
> no_wait=false). I also wonder if leaving a ghost bo on the lru might
> not lead to an infinite eviction loop.

This situation is actually handled by the evict bool. When @evict == true, no ghost object is created and the eviction is synchronous, so rather than being incorrect, we're being suboptimal. I admit this isn't the most optimal solution.

My plan, when I get time, is to implement fully asynchronous memory management. That means that the managers are in sync with the CPU and not the GPU, and all buffer moves are pipelined, provided that the driver supports it. This also means that I will hang a sync object on each memory type manager, so that if we need to switch hw engine and sync the manager with the GPU, we can wait on that sync object. This will mean that when you evict a buffer object, its space will immediately show up as free, although it really isn't free yet, but it *will* be free when the gpu executes a move to that memory region, since the eviction will be scheduled before the move to that memory.

Thanks,
Thomas
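A sketch of that destroy-function check, following the vmwgfx pattern Thomas mentions; it reuses the simplified stand-in ttm_buffer_object/radeon_bo structs from the earlier sketch, and the function names are illustrative:

    #include <linux/types.h>    /* bool */

    /* The driver's own destroy callback, installed at bo creation. */
    static void radeon_bo_destroy(struct ttm_buffer_object *bo);

    /*
     * A bo is ours only if it carries our destroy callback; TTM-created
     * ghost objects carry TTM's own destroy function instead.
     */
    static bool bo_is_radeon_bo(struct ttm_buffer_object *bo)
    {
            return bo->destroy == &radeon_bo_destroy;
    }

    static void driver_evict_callback_safe(struct ttm_buffer_object *bo)
    {
            struct radeon_bo *rbo;

            if (!bo_is_radeon_bo(bo))
                    return;   /* ghost or other TTM-internal bo: no driver-private data */

            rbo = container_of(bo, struct radeon_bo, tbo);
            /* ... safe to use rbo->driver_private here ... */
            (void)rbo;
    }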