[Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
Hi there,

I did some tests with CUDA 5.0 and 5.5 today and changed the nvcc optimization flags for cycles_kernel_cuda.

I found out the following:

- "--opencc-options" is deprecated for sm_20 and up and should be removed from the compiler options.
- Passing "-O3" and "--use_fast_math" as nvcc options brings a massive speedup on my system (more below).
- We shouldn't complain about new CUDA toolkits being slow; we should find a solution, as we can't use old software forever.

To the speedups:

Example 1:
system: i7-3820 @ 3.60GHz, GeForce GTX 660

Blender (cycles_cuda_kernel) compiled with standard settings:
Mike_pan file took 02:06.60 to render

Blender (cycles_cuda_kernel) compiled with -O3 --use_fast_math:
Mike_pan file took 01:39.93 to render

There is no visible difference in the render results:

Image1: http://www.pasteall.org/pic/52757
Image2: http://www.pasteall.org/pic/52758

I bet there's more potential in there.

/Jürgen
___
Bf-committers mailing list
Bf-committers@blender.org
http://lists.blender.org/mailman/listinfo/bf-committers
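To put the two Mike_pan timings into numbers, here is a minimal C++ sketch. It assumes both timings are written as minutes:seconds.hundredths; the function names are made up for this illustration and are not part of Blender or Cycles.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parse a render time written as "MM:SS.cc" (for example "02:06.60") into seconds.
// Returns a negative value if the text does not match that layout.
// Assumption: the timings in the mail use this minutes:seconds.hundredths layout.
double parse_render_time(const std::string& text) {
    int minutes = 0;
    double seconds = 0.0;
    if (std::sscanf(text.c_str(), "%d:%lf", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

// Speedup factor: how many times faster the optimized build renders.
double speedup(double baseline_s, double optimized_s) {
    assert(baseline_s > 0.0 && optimized_s > 0.0);
    return baseline_s / optimized_s;
}

// Fraction of the baseline render time that was saved (0.25 means 25% less time).
double time_saved(double baseline_s, double optimized_s) {
    assert(baseline_s > 0.0);
    return (baseline_s - optimized_s) / baseline_s;
}
```

Calling speedup(parse_render_time("02:06.60"), parse_render_time("01:39.93")) compares 126.60 s with 99.93 s, which works out to roughly a 1.27x speedup, or about 21% less render time, for this one scene on this one card.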
[Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
Hi all,

> As long as MinGW OpenMP isn't fixed

It is fixed in http://tdm-gcc.tdragon.net/download (see the OpenMP download), and a patch for the gomp library that can be applied to any other MinGW version can be found here:
http://netcologne.dl.sourceforge.net/project/tdm-gcc/Sources/TDM%20Sources/gcc-4.7.1-tdmsrc-1.zip

Not sure how useful it is, but still, there is a patch which can be applied.

Regards
Sergey
Re: [Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
Hi Jurgen,

How do these times compare between CUDA 5.0 and 5.5? (Is this a speedup over 5.5 but a slowdown relative to 5.0, or is it an overall speedup?)

--
Dalai
Re: [Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
Hi Dalai,

I tested 5.5 on a different system. I don't have access to that machine right now, so I'll deliver the complete benchmark results tomorrow. I plan to compare as many different configurations as possible, with 32 and 64 bit and different CUDA versions. This will take some time, but I think it's worth it. ;)

/Jürgen
Re: [Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
Thanks for testing. I've also been doing some experimenting with compile flags and other things here. So far it seems I can make my 650M render a few percent faster compared to CUDA 4.2, but the 460 GT is still considerably slower with the BMW scene (2m30s with 5.5 compared to 2m01s with 4.2), and the 580 GTX had a similar difference. It seems you are testing with a 6xx card, so that makes sense.

Patch attached for those who want to test this with 5.0/5.5.

diff --git a/intern/cycles/device/device_cuda.cpp b/intern/cycles/device/device_cuda.cpp
index f32c6dd..27978b9 100644
--- a/intern/cycles/device/device_cuda.cpp
+++ b/intern/cycles/device/device_cuda.cpp
@@ -46,6 +46,7 @@ public:
     map tex_interp_map;
     int cuDevId;
     bool first_error;
+    vector cuStreams;
 
     struct PixelMem {
         GLuint cuPBO;
@@ -205,6 +206,12 @@ public:
         if(cuda_error_(result, "cuCtxCreate"))
             return;
 
+        const int num_streams = 8;
+        cuStreams.resize(num_streams);
+
+        for(int i = 0; i < num_streams; i++)
+            cuStreamCreate(&cuStreams[i], 0);
+
         cuda_pop_context();
     }
 
@@ -212,6 +219,9 @@ public:
     {
         task_pool.stop();
 
+        for(int i = 0; i < cuStreams.size(); i++)
+            cuStreamDestroy(cuStreams[i]);
+
         cuda_push_context();
         cuda_assert(cuCtxDetach(cuContext))
     }
@@ -514,7 +524,7 @@ public:
         }
     }
 
-    void path_trace(RenderTile& rtile, int sample)
+    void path_trace(RenderTile& rtile, int sample, CUstream stream)
     {
         if(have_error())
             return;
@@ -575,9 +585,9 @@ public:
         cuda_assert(cuFuncSetCacheConfig(cuPathTrace, CU_FUNC_CACHE_PREFER_L1))
         cuda_assert(cuFuncSetBlockShape(cuPathTrace, xthreads, ythreads, 1))
-        cuda_assert(cuLaunchGrid(cuPathTrace, xblocks, yblocks))
+        cuda_assert(cuLaunchGridAsync(cuPathTrace, xblocks, yblocks, stream))
 
-        cuda_assert(cuCtxSynchronize())
+        //cuda_assert(cuCtxSynchronize())
 
         cuda_pop_context();
     }
@@ -882,12 +892,35 @@ public:
     void thread_run(DeviceTask *task)
     {
         if(task->type == DeviceTask::PATH_TRACE) {
-            RenderTile tile;
+            vector concurrent_tiles(cuStreams.size());
+            vector have_tile(cuStreams.size());
 
             /* keep rendering tiles until done */
-            while(task->acquire_tile(this, tile)) {
-                int start_sample = tile.start_sample;
-                int end_sample = tile.start_sample + tile.num_samples;
+            while(1) {
+                int start_sample = -1;
+                int end_sample = -1;
+
+                for(int i = 0; i < concurrent_tiles.size(); i++) {
+                    RenderTile& tile = concurrent_tiles[i];
+
+                    if(task->acquire_tile(this, tile)) {
+                        have_tile[i] = true;
+
+                        if(start_sample == -1) {
+                            start_sample = tile.start_sample;
+                            end_sample = tile.start_sample + tile.num_samples;
+                        }
+
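The patch is cut off in the archive before the point where the launched tiles are waited for, so the rest of the loop is not known. The following C++ sketch only illustrates the general pattern the visible part suggests: fetch up to one tile per stream, queue the work on each stream, then synchronize before fetching the next batch. FakeStream and TileSource are hypothetical stand-ins invented for this sketch; they use no CUDA calls, and FakeStream runs its work only at synchronize(), so it models the ordering of the calls, not real overlap on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: work is queued on launch and
// executed only when the stream is synchronized.
struct FakeStream {
    std::vector<std::function<void()>> pending;
    void launch_async(std::function<void()> work) { pending.push_back(std::move(work)); }
    void synchronize() {
        for (auto& work : pending) work();
        pending.clear();
    }
};

// Hypothetical tile source standing in for task->acquire_tile() in the patch.
struct TileSource {
    std::queue<int> tiles;
    bool acquire(int& tile) {
        if (tiles.empty()) return false;
        tile = tiles.front();
        tiles.pop();
        return true;
    }
};

// Render every tile, keeping at most streams.size() tiles in flight per round.
// Returns the number of rounds (batches) that were needed.
int render_in_batches(TileSource& source, std::vector<FakeStream>& streams,
                      std::vector<int>& rendered) {
    assert(!streams.empty());
    int rounds = 0;
    while (true) {
        std::size_t launched = 0;
        for (std::size_t i = 0; i < streams.size(); i++) {
            int tile = 0;
            if (!source.acquire(tile)) break;
            // Queue the work on stream i; nothing runs yet.
            streams[i].launch_async([tile, &rendered] { rendered.push_back(tile); });
            launched++;
        }
        if (launched == 0) break;
        // Wait for the whole batch before fetching the next one.
        for (std::size_t i = 0; i < launched; i++) streams[i].synchronize();
        rounds++;
    }
    return rounds;
}
```

With ten tiles and eight streams (the num_streams value in the patch), this loop needs two rounds: one batch of eight tiles and one of two.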
Re: [Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
Hi Brecht,

you're welcome ;)

I think I'll have to include these compiler flag optimizations in the VS2012 builds, otherwise these builds will be significantly slower than builds made with other compilers :(

Cycles has two problems when it comes to Windows/VS2012 builds:
1. It is much slower on the CPU.
2. It is slower with CUDA if we don't use the optimization flags.

As long as MinGW OpenMP isn't fixed, we have to stick to VS2008/CUDA 4.2 or find a decent solution for VS2012/CUDA 5.5. As Blender seems to be used mostly by Windows users, this isn't optimal.

/Jürgen
Re: [Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
Hi Jürgen,

On 03.06.2013 at 20:46, Jürgen Herrmann wrote:
> - "--opencc-options" is deprecated for sm_20 and up and should be
> removed from compiler options

This option should not be harmful; we just kept it in for the sm_1x architecture. I am not sure whether sm_1x still builds at all, though. We dropped official support for it with Blender 2.67, so the option can probably be removed.

> - Stating "-O3" and "--use_fast_math" as nvcc options brings massive
> speedup on my system (more below)

We cannot use fast math everywhere. I remember that Brecht changed the code to use fast math functions only selectively, to avoid some precision problems.
http://projects.blender.org/scm/viewvc.php?view=rev&root=bf-blender&revision=47133
But that might no longer be valid with Toolkit 5.x; it needs further tests.

> - We shouldn't complain about new cuda toolsets that are slow, we
> should find a solution as we can't use old software forever…

True, but it's still sad that an upgrade causes all those problems. :)

Thanks for testing!

Thomas

--
Thomas Dinges
Blender Developer, Artist and Musician
www.dingto.org
Re: [Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
Hi Brecht,

this looks very promising. On my GeForce 540M (Windows x64, driver 320.20) I get these render times with the BMW scene (100 samples, 128x128 tiles):

Vanilla Trunk (Toolkit 4.2): 2.29 minutes
Vanilla Trunk (Toolkit 5.5 RC): 3.54 minutes
Trunk + patch (Toolkit 5.5 RC): 3.00 minutes

So, that is definitely much better. :)

Best regards,
Thomas

--
Thomas Dinges
Blender Developer, Artist and Musician
www.dingto.org
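One way to read these three figures is to ask how much of the 4.2 to 5.5 regression the patch wins back. The C++ sketch below is only an illustration with made-up function names. The mail does not say whether "2.29 minutes" means decimal minutes or 2 min 29 s, so the functions take plain numbers in any consistent unit and leave that interpretation to the caller.

```cpp
#include <cassert>

// Relative slowdown of the new toolkit against the old one (0.5 means 50% slower).
double slowdown(double old_time, double new_time) {
    assert(old_time > 0.0);
    return (new_time - old_time) / old_time;
}

// Share of the regression that a patch wins back:
// 0.0 means no improvement, 1.0 means the old render time is fully restored.
double regression_recovered(double old_time, double new_time, double patched_time) {
    assert(new_time > old_time);
    return (new_time - patched_time) / (new_time - old_time);
}
```

Read as decimal minutes (2.29, 3.54, 3.00), the patch recovers about 43% of the regression. Read as minutes and seconds (149 s, 234 s, 180 s), it recovers about 64%. Under either reading the patched 5.5 build is still slower than the 4.2 build on this card.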
Re: [Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
hi

> it is fixed in http://tdm-gcc.tdragon.net/download ( see openmp download )
> and patch to any other MinGW version for gomp library can be found here
>
> http://netcologne.dl.sourceforge.net/project/tdm-gcc/Sources/TDM%20Sources/gcc-4.7.1-tdmsrc-1.zip

Can Antony maybe check this and enable OpenMP correctly? If it solves the mingw64 problem, this could be the answer, guys!

Regards
Yousef Harfoush
ba...@msn.com
Re: [Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
hi again

I had time, so I did some tests with a simple smoke scene:

- gcc 4.7.1 mingw64 with OpenMP NOT compiled: took 1:46 s
- gcc 4.7.1 mingw64 with OpenMP COMPILED: crashes
- gcc 4.7.1 mingw64 with the fixed DLLs from the site and OpenMP COMPILED: took 0:25 s

It seems the problem has been fixed :)

I hope someone does a full Blender regression test.

Regards
Yousef Harfoush
ba...@msn.com
Re: [Bf-committers] massive cuda speed improvements with Cuda 5.0/5.5
Sounds promising! If that really works, we could just drop the whole MSVC stuff and build with MinGW ;)