Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads
On Tuesday 04 March 2014, 02:08:58, Marek Olšák wrote:
> Could you please do this without changing u_upload_mgr? You can still use u_upload_alloc to allocate buffer memory in the driver, and the map buffer read/write flags are not important with persistent coherent buffer mappings anyway.

Since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 we allocate CPU-to-GPU streaming buffers (i.e. those with PIPE_USAGE_STREAM) in VRAM. We should therefore set buffer.usage to PIPE_USAGE_STAGING in u_upload_alloc_buffer when we use u_upload_mgr for downloads; otherwise we won't get any performance improvement.

Would it now be OK to change u_upload_mgr, or do you have a better proposal?

Ole

> Marek
>
> On Mon, Mar 3, 2014 at 9:29 PM, Niels Ole Salscheider niels_...@salscheider-online.de wrote:
> > Using the DMA engine for buffer downloads vastly improves performance, because CPU reads from VRAM are slow due to the high latency of the PCIe bus.
> >
> > The first patch allows u_upload_mgr to be used for downloads, too. The second patch then uses u_upload_mgr in the radeon driver for downloads.
> >
> > I considered renaming u_upload_mgr to u_transfer_mgr, since it might be confusing that an upload manager can be used for downloads. But then again, we also have transfers, so u_transfer_mgr might be confusing as well. Thus, I decided not to rename it for now.
> > Without these patches, the buffer_bandwidth benchmark from uCLbench gives me:
> >
> > ./buffer_bandwidth --size=2000 --iterations=100
> > # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
> > 1/1 direct 2000 Bytes 759.29 MB/s(HD) 17.13 MB/s(DD) 14.61 MB/s(DH)
> >
> > With these patches, the read performance is much better:
> >
> > ./buffer_bandwidth --size=2000 --iterations=100
> > # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
> > 1/1 direct 2000 Bytes 759.90 MB/s(HD) 613.49 MB/s(DD) 1841.07 MB/s(DH)
> >
> > Judging by these numbers, it might even make sense to use the DMA engine for larger buffer downloads...
> >
> > Niels Ole Salscheider (2):
> >   util/u_upload_mgr: Allow to also use it for downloads
> >   radeon: Use transfer manager for buffer downloads
> >
> >  src/gallium/auxiliary/hud/hud_context.c         |  3 +-
> >  src/gallium/auxiliary/util/u_blitter.c          |  3 +-
> >  src/gallium/auxiliary/util/u_upload_mgr.c       | 49 +++-
> >  src/gallium/auxiliary/util/u_upload_mgr.h       | 13 -
> >  src/gallium/auxiliary/util/u_vbuf.c             |  3 +-
> >  src/gallium/auxiliary/vl/vl_compositor.c        |  3 +-
> >  src/gallium/drivers/ilo/ilo_context.c           |  3 +-
> >  src/gallium/drivers/r300/r300_context.c         |  3 +-
> >  src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
> >  src/gallium/drivers/radeon/r600_pipe_common.c   | 14 -
> >  src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
> >  src/mesa/state_tracker/st_context.c             |  9 ++-
> >  12 files changed, 136 insertions(+), 46 deletions(-)
> >
> > --
> > 1.9.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads
You can try to do the allocation of the staging buffer with pipe_buffer_create instead of u_upload_mgr. You can also use u_suballocator, which is like a stripped-down version of u_upload_mgr. You would need another instance of u_upload_mgr anyway, because we'd like to continue using PIPE_USAGE_STREAM for uploads.

Marek

On Sat, Aug 9, 2014 at 2:35 PM, Niels Ole Salscheider niels_...@salscheider-online.de wrote:
> On Tuesday 04 March 2014, 02:08:58, Marek Olšák wrote:
> > Could you please do this without changing u_upload_mgr? You can still use u_upload_alloc to allocate buffer memory in the driver, and the map buffer read/write flags are not important with persistent coherent buffer mappings anyway.
>
> Since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 we allocate CPU-to-GPU streaming buffers (i.e. those with PIPE_USAGE_STREAM) in VRAM. We should therefore set buffer.usage to PIPE_USAGE_STAGING in u_upload_alloc_buffer when we use u_upload_mgr for downloads; otherwise we won't get any performance improvement.
>
> Would it now be OK to change u_upload_mgr, or do you have a better proposal?
>
> Ole
>
> > Marek
> >
> > On Mon, Mar 3, 2014 at 9:29 PM, Niels Ole Salscheider niels_...@salscheider-online.de wrote:
> > > Using the DMA engine for buffer downloads vastly improves performance, because CPU reads from VRAM are slow due to the high latency of the PCIe bus.
> > >
> > > The first patch allows u_upload_mgr to be used for downloads, too. The second patch then uses u_upload_mgr in the radeon driver for downloads.
> > >
> > > I considered renaming u_upload_mgr to u_transfer_mgr, since it might be confusing that an upload manager can be used for downloads. But then again, we also have transfers, so u_transfer_mgr might be confusing as well. Thus, I decided not to rename it for now.
Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads
> Could you please do this without changing u_upload_mgr? You can still use u_upload_alloc to allocate buffer memory in the driver, and the map buffer read/write flags are not important with persistent coherent buffer mappings anyway.

I have sent an updated patch to the list.

Ole
[Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads
Using the DMA engine for buffer downloads vastly improves performance, because CPU reads from VRAM are slow due to the high latency of the PCIe bus.

The first patch allows u_upload_mgr to be used for downloads, too. The second patch then uses u_upload_mgr in the radeon driver for downloads.

I considered renaming u_upload_mgr to u_transfer_mgr, since it might be confusing that an upload manager can be used for downloads. But then again, we also have transfers, so u_transfer_mgr might be confusing as well. Thus, I decided not to rename it for now.

Without these patches, the buffer_bandwidth benchmark from uCLbench gives me:

./buffer_bandwidth --size=2000 --iterations=100
# device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
1/1 direct 2000 Bytes 759.29 MB/s(HD) 17.13 MB/s(DD) 14.61 MB/s(DH)

With these patches, the read performance is much better:

./buffer_bandwidth --size=2000 --iterations=100
# device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
1/1 direct 2000 Bytes 759.90 MB/s(HD) 613.49 MB/s(DD) 1841.07 MB/s(DH)

Judging by these numbers, it might even make sense to use the DMA engine for larger buffer downloads...
Niels Ole Salscheider (2):
  util/u_upload_mgr: Allow to also use it for downloads
  radeon: Use transfer manager for buffer downloads

 src/gallium/auxiliary/hud/hud_context.c         |  3 +-
 src/gallium/auxiliary/util/u_blitter.c          |  3 +-
 src/gallium/auxiliary/util/u_upload_mgr.c       | 49 +++-
 src/gallium/auxiliary/util/u_upload_mgr.h       | 13 -
 src/gallium/auxiliary/util/u_vbuf.c             |  3 +-
 src/gallium/auxiliary/vl/vl_compositor.c        |  3 +-
 src/gallium/drivers/ilo/ilo_context.c           |  3 +-
 src/gallium/drivers/r300/r300_context.c         |  3 +-
 src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
 src/gallium/drivers/radeon/r600_pipe_common.c   | 14 -
 src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
 src/mesa/state_tracker/st_context.c             |  9 ++-
 12 files changed, 136 insertions(+), 46 deletions(-)

--
1.9.0
Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads
Could you please do this without changing u_upload_mgr? You can still use u_upload_alloc to allocate buffer memory in the driver, and the map buffer read/write flags are not important with persistent coherent buffer mappings anyway.

Marek

On Mon, Mar 3, 2014 at 9:29 PM, Niels Ole Salscheider niels_...@salscheider-online.de wrote:
> Using the DMA engine for buffer downloads vastly improves performance, because CPU reads from VRAM are slow due to the high latency of the PCIe bus.
>
> The first patch allows u_upload_mgr to be used for downloads, too. The second patch then uses u_upload_mgr in the radeon driver for downloads.
>
> I considered renaming u_upload_mgr to u_transfer_mgr, since it might be confusing that an upload manager can be used for downloads. But then again, we also have transfers, so u_transfer_mgr might be confusing as well. Thus, I decided not to rename it for now.
>
> Without these patches, the buffer_bandwidth benchmark from uCLbench gives me:
>
> ./buffer_bandwidth --size=2000 --iterations=100
> # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
> 1/1 direct 2000 Bytes 759.29 MB/s(HD) 17.13 MB/s(DD) 14.61 MB/s(DH)
>
> With these patches, the read performance is much better:
>
> ./buffer_bandwidth --size=2000 --iterations=100
> # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
> 1/1 direct 2000 Bytes 759.90 MB/s(HD) 613.49 MB/s(DD) 1841.07 MB/s(DH)
>
> Judging by these numbers, it might even make sense to use the DMA engine for larger buffer downloads...