On 11.12.2014 14:13, Andy Lutomirski wrote:
> On Wed, Dec 10, 2014 at 8:24 PM, Michel Dänzer <michel at daenzer.net> wrote:
>> On 11.12.2014 05:28, Andy Lutomirski wrote:
>>> On Wed, Dec 10, 2014 at 1:44 AM, Michel Dänzer <michel at daenzer.net>
>>> wrote:
>>>> On 10.12.2014 06:39, Andy Lutomirski wrote:
>>>>> On Tue, Dec 9, 2014 at 8:06 AM, Andy Lutomirski <luto at amacapital.net>
>>>>> wrote:
>>>>>> On Tue, Dec 9, 2014 at 1:18 AM, Michel Dänzer <michel at daenzer.net>
>>>>>> wrote:
>>>>>>> On 09.12.2014 09:24, Andy Lutomirski wrote:
>>>>>>>>
>>>>>>>> The relevant line from latencytop seems to be:
>>>>>>>>
>>>>>>>> 154 20441402 489139 radeon_fence_default_wait [radeon]
>>>>>>>> fence_wait_timeout ttm_bo_wait [ttm] ttm_bo_move_accel_cleanup [ttm]
>>>>>>>> radeon_move_blit.isra.12 [radeon] radeon_bo_move [radeon]
>>>>>>>> ttm_bo_handle_move_mem [ttm] ttm_bo_evict [ttm] ttm_mem_evict_first
>>>>>>>> [ttm] ttm_bo_mem_space [ttm] ttm_bo_validate [ttm]
>>>>>>>> radeon_bo_fault_reserve_notify [radeon]
>>>>>>>
>>>>>>> Which process is this?
>>>>>>
>>>>>> Xorg
>>>>>>
>>>>>>>
>>>>>>> Looks like CPU access to a BO in VRAM, but the BO is located outside of
>>>>>>> the CPU visible area of VRAM, so it has to be moved into the CPU visible
>>>>>>> area first.
>>
>> [...]
>>
>>>>> But I'm still waiting for the day that buggy userspace *can't* cause
>>>>> kernel graphics stalls.
>>>>
>>>> Actually, this looks more like buggy userspace stalling itself. :)
>>>
>>> I thought the stall was the kernel evicting things from vram. Why
>>> does it need to wait for userspace for that? Is it that userspace is
>>> actively using whatever's being evicted?
>>
>> As I explained above, the stall happens because userspace does CPU
>> access to a BO which resides in the CPU-inaccessible part of VRAM. The
>> kernel has to move the BO into the CPU accessible part of VRAM before it
>> can let userspace proceed.
>
> Sure, but why does that take nearly 500ms? Even if the object in
> question is the entire framebuffer, that still seems extraordinarily
> slow.

It has to wait for any previously queued GPU operations and the eviction
of other buffers.

Also, TTM buffer moves are currently synchronous, i.e. TTM waits for a
buffer to become idle before starting its move, which means we don't get
maximum throughput for a series of buffer moves.


-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
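
To illustrate the point about synchronous moves in the reply above, here is a toy timing model. It is only a sketch and not TTM or radeon code: the function names, the per-move `wait_latency` (standing in for the CPU blocking on a fence and then issuing the next move) and the copy times are all hypothetical, chosen just to show why a series of moves that each wait for idle cannot keep the GPU busy.

```python
# Toy model only: nothing here corresponds to real TTM/radeon functions.
# copy_times   - hypothetical GPU time per buffer move (arbitrary units)
# wait_latency - hypothetical CPU cost per move (fence wait wakeup + submit)


def synchronous_total(copy_times, wait_latency):
    """The CPU blocks on each move before issuing the next one."""
    now = 0.0
    for copy in copy_times:
        now += wait_latency  # CPU wakes up and submits; GPU sits idle
        now += copy          # GPU copies while the CPU blocks on the fence
    return now


def pipelined_total(copy_times, wait_latency):
    """The CPU queues every move without blocking; the GPU runs them in order."""
    submitted = 0.0  # time at which the current move has been queued
    gpu_free = 0.0   # time at which the GPU finishes its previous move
    for copy in copy_times:
        submitted += wait_latency
        start = max(submitted, gpu_free)  # GPU starts once queued and free
        gpu_free = start + copy
    return gpu_free


moves = [4.0, 4.0, 4.0]
print(synchronous_total(moves, 1.0))  # 15.0
print(pipelined_total(moves, 1.0))    # 13.0
```

In this model the synchronous variant pays the latency once per move, while the pipelined variant pays it only before the first move as long as each copy takes longer than the latency. The real stall also includes waiting for previously queued GPU work and evictions, which this sketch leaves out.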