Hi Lauri,

Thanks for your persistence. Seeing that this is reproducible on several
boards with up-to-date BIOS is really helpful. It gives me some confidence
that this is more than a weird vendor- or board-specific corner case, and
that we should be able to reproduce it. Yong is going to start looking into
this problem.

Regards,
Felix

On 3/14/2019 12:41 PM, Lauri Ehrenpreis wrote:

Yes, it affects this a bit, but it doesn't get the speed up to the "normal"
level. I got the best results with "profile_peak"; the memcpy speed on the
CPU is then 1/3 of what it is without OpenCL initialization:

echo "profile_peak" > /sys/class/drm/card0/device/power_dpm_force_performance_level
./cl_slow_test 1 5
got 1 platforms 1 devices
speed 3710.360352 avg 3710.360352 mbytes/s
speed 3713.660400 avg 3712.010254 mbytes/s
speed 3797.630859 avg 3740.550537 mbytes/s
speed 3708.004883 avg 3732.414062 mbytes/s
speed 3796.403076 avg 3745.211914 mbytes/s

Without calling clCreateContext:

./cl_slow_test 0 5
speed 7299.201660 avg 7299.201660 mbytes/s
speed 9298.841797 avg 8299.021484 mbytes/s
speed 9360.181641 avg 8652.742188 mbytes/s
speed 9004.759766 avg 8740.746094 mbytes/s
speed 9414.607422 avg 8875.518555 mbytes/s

--
Lauri

On Thu, Mar 14, 2019 at 5:46 PM Ernst Sjöstrand <ern...@gmail.com> wrote:

Does
echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
or setting cpu scaling governor to performance affect it at all?

Regards
//Ernst

On Thu, 14 Mar 2019 at 14:31, Lauri Ehrenpreis <lauri...@gmail.com> wrote:
>
> I tried also with those 2 boards now:
> https://www.asrock.com/MB/AMD/Fatal1ty%20B450%20Gaming-ITXac/index.asp
> https://www.msi.com/Motherboard/B450I-GAMING-PLUS-AC
>
> Both are using latest BIOS, ubuntu 18.10, kernel
> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.2/
>
> There are some differences in dmesg (asrock has some amdgpu assert in dmesg)
> but otherwise results are exactly the same.
> In desktop env cl_slow_test works fast, over ssh terminal it doesn't. If I
> move mouse then it starts working fast in terminal as well.
>
> So one can't use OpenCL without monitor and desktop env running and this
> happens with 2 different chipsets (b350 & b450), latest bios from 3 different
> vendors, latest kernel and latest rocm.
> This doesn't look like edge case with
> unusual setup to me..
>
> Attached dmesg, dmidecode, and clinfo from both boards.
>
> --
> Lauri
>
> On Wed, Mar 13, 2019 at 10:15 PM Lauri Ehrenpreis
> <lauri...@gmail.com> wrote:
>>
>> For reproduction only the tiny cl_slow_test.cpp is needed which is attached
>> to first e-mail.
>>
>> System information is following:
>> CPU: Ryzen5 2400G
>> Main board: Gigabyte AMD B450 AORUS mini itx:
>> https://www.gigabyte.com/Motherboard/B450-I-AORUS-PRO-WIFI-rev-10#kf
>> BIOS: F5 8.47 MB 2019/01/25 (latest)
>> Kernel: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/ (amd64)
>> OS: Ubuntu 18.04 LTS
>> rocm-opencl-dev installation:
>> wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
>> echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
>> sudo apt install rocm-opencl-dev
>>
>> Also exactly the same issue happens with this board:
>> https://www.gigabyte.com/Motherboard/GA-AB350-Gaming-3-rev-1x#kf
>>
>> I have MSI and Asrock mini itx boards ready as well, So far didn't get
>> amdgpu & opencl working there but I'll try again tomorrow..
>>
>> --
>> Lauri
>>
>>
>> On Wed, Mar 13, 2019 at 8:51 PM Kuehling, Felix
>> <felix.kuehl...@amd.com> wrote:
>>>
>>> Hi Lauri,
>>>
>>> I still think the SMU is doing something funny, but rocm-smi isn't
>>> showing enough information to really see what's going on.
>>>
>>> On APUs the SMU firmware is embedded in the system BIOS. Unlike discrete
>>> GPUs, the SMU firmware is not loaded by the driver. You could try
>>> updating your system BIOS to the latest version available from your main
>>> board vendor and see if that makes a difference. It may include a newer
>>> version of the SMU firmware, potentially with a fix.
>>>
>>> If that doesn't help, we'd have to reproduce the problem in house to see
>>> what's happening, which may require the same main board and BIOS version
>>> you're using. We can ask our SMU firmware team if they've ever
>>> encountered your type of problem. But I don't want to give you too much
>>> hope. It's a tricky problem involving HW, firmware and multiple driver
>>> components in a fairly unusual configuration.
>>>
>>> Regards,
>>> Felix
>>>
>>> On 2019-03-13 7:28 a.m., Lauri Ehrenpreis wrote:
>>> > What I observe is that moving the mouse made the memory speed go up
>>> > and also it made mclk=1200Mhz in rocm-smi output.
>>> > However if I force mclk to 1200Mhz myself then memory speed is still
>>> > slow.
>>> >
>>> > So rocm-smi output when memory speed went fast due to mouse movement:
>>> > rocm-smi
>>> > ======================== ROCm System Management Interface ========================
>>> > ================================================================================================
>>> > GPU  Temp   AvgPwr  SCLK    MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
>>> > GPU[0] : WARNING: Empty SysFS value: pclk
>>> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
>>> > 0    44.0c  N/A     400Mhz  1200Mhz  N/A   0%   manual  N/A     0%       0%       N/A
>>> > ================================================================================================
>>> > ======================== End of ROCm SMI Log ========================
>>> >
>>> > And rocm-smi output when I forced memclk=1200MHz myself:
>>> > rocm-smi --setmclk 2
>>> > rocm-smi
>>> > ======================== ROCm System Management Interface ========================
>>> > ================================================================================================
>>> > GPU  Temp   AvgPwr  SCLK    MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
>>> > GPU[0] : WARNING: Empty SysFS value: pclk
>>> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
>>> > 0    39.0c  N/A     400Mhz  1200Mhz  N/A   0%   manual  N/A     0%       0%       N/A
>>> > ================================================================================================
>>> > ======================== End of ROCm SMI Log ========================
>>> >
>>> > So only difference is that temperature shows 44c when memory speed was
>>> > fast and 39c when it was slow. But mclk was 1200MHz and sclk was
>>> > 400MHz in both cases.
>>> > Can it be that rocm-smi just has a bug in reporting and mclk was not
>>> > actually 1200MHz when I forced it with rocm-smi --setmclk 2 ?
>>> > That would explain the different behaviour..
>>> >
>>> > If so then is there a programmatic way how to really guarantee the
>>> > high speed mclk? Basically I want do something similar in my program
>>> > what happens if I move the mouse in desktop env and this way guarantee
>>> > the normal memory speed each time the program starts.
>>> >
>>> > --
>>> > Lauri
>>> >
>>> >
>>> > On Tue, Mar 12, 2019 at 11:36 PM Deucher, Alexander
>>> > <alexander.deuc...@amd.com> wrote:
>>> >
>>> > Forcing the sclk and mclk high may impact the CPU frequency since
>>> > they share TDP.
>>> >
>>> > Alex
>>> >
>>> > ------------------------------------------------------------------------
>>> > *From:* amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of
>>> > Lauri Ehrenpreis <lauri...@gmail.com>
>>> > *Sent:* Tuesday, March 12, 2019 5:31 PM
>>> > *To:* Kuehling, Felix
>>> > *Cc:* Tom St Denis; amd-gfx@lists.freedesktop.org
>>> > *Subject:* Re: Slow memory access when using OpenCL without X11
>>> >
>>> > However it's not only related to mclk and sclk. I tried this:
>>> > rocm-smi --setsclk 2
>>> > rocm-smi --setmclk 3
>>> > rocm-smi
>>> > ======================== ROCm System Management Interface ========================
>>> > ================================================================================================
>>> > GPU  Temp   AvgPwr  SCLK     MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
>>> > GPU[0] : WARNING: Empty SysFS value: pclk
>>> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
>>> > 0    34.0c  N/A     1240Mhz  1333Mhz  N/A   0%   manual  N/A     0%       0%       N/A
>>> > ================================================================================================
>>> > ======================== End of ROCm SMI Log ========================
>>> >
>>> > ./cl_slow_test 1
>>> > got 1 platforms 1 devices
>>> > speed 3919.777100 avg 3919.777100 mbytes/s
>>> > speed 3809.373291 avg 3864.575195 mbytes/s
>>> > speed 585.796814 avg 2771.649170 mbytes/s
>>> > speed 188.721848 avg 2125.917236 mbytes/s
>>> > speed 188.916367 avg 1738.517090 mbytes/s
>>> >
>>> > So despite forcing max sclk and mclk the memory speed is still slow..
>>> >
>>> > --
>>> > Lauri
>>> >
>>> >
>>> > On Tue, Mar 12, 2019 at 11:21 PM Lauri Ehrenpreis
>>> > <lauri...@gmail.com> wrote:
>>> >
>>> > In the case when memory is slow, rocm-smi outputs this:
>>> > ======================== ROCm System Management Interface ========================
>>> > ================================================================================================
>>> > GPU  Temp   AvgPwr  SCLK    MCLK    PCLK  Fan  Perf  PwrCap  SCLK OD  MCLK OD  GPU%
>>> > GPU[0] : WARNING: Empty SysFS value: pclk
>>> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
>>> > 0    30.0c  N/A     400Mhz  933Mhz  N/A   0%   auto  N/A     0%       0%       N/A
>>> > ================================================================================================
>>> > ======================== End of ROCm SMI Log ========================
>>> >
>>> > The normal memory speed case gives the following:
>>> > ======================== ROCm System Management Interface ========================
>>> > ================================================================================================
>>> > GPU  Temp   AvgPwr  SCLK    MCLK     PCLK  Fan  Perf  PwrCap  SCLK OD  MCLK OD  GPU%
>>> > GPU[0] : WARNING: Empty SysFS value: pclk
>>> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
>>> > 0    35.0c  N/A     400Mhz  1200Mhz  N/A   0%   auto  N/A     0%       0%       N/A
>>> > ================================================================================================
>>> > ======================== End of ROCm SMI Log ========================
>>> >
>>> > So there is a difference in MCLK - can this cause such a huge
>>> > slowdown?
>>> >
>>> > --
>>> > Lauri
>>> >
>>> > On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix
>>> > <felix.kuehl...@amd.com> wrote:
>>> >
>>> > [adding the list back]
>>> >
>>> > I'd suspect a problem related to memory clock. This is an APU where
>>> > system memory is shared with the CPU, so if the SMU changes memory
>>> > clocks that would affect CPU memory access performance. If the problem
>>> > only occurs when OpenCL is running, then the compute power profile
>>> > could have an effect here.
>>> >
>>> > Lauri, can you monitor the clocks during your tests using rocm-smi?
>>> >
>>> > Regards,
>>> > Felix
>>> >
>>> > On 2019-03-11 1:15 p.m., Tom St Denis wrote:
>>> > > Hi Lauri,
>>> > >
>>> > > I don't have ROCm installed locally (not on that team at AMD) but I
>>> > > can rope in some of the KFD folk and see what they say :-).
>>> > >
>>> > > (in the mean time I should look into installing the ROCm stack on my
>>> > > Ubuntu disk for experimentation...).
>>> > >
>>> > > Only other thing that comes to mind is some sort of stutter due to
>>> > > power/clock gating (or gfx off/etc). But that typically affects the
>>> > > display/gpu side not the CPU side.
>>> > >
>>> > > Felix: Any known issues with Raven and ROCm interacting over memory
>>> > > bus performance?
>>> > >
>>> > > Tom
>>> > >
>>> > > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis
>>> > > <lauri...@gmail.com> wrote:
>>> > >
>>> > > Hi!
>>> > >
>>> > > The 100x memory slowdown is hard to believe indeed. I attached the
>>> > > test program with my first e-mail which depends only on
>>> > > rocm-opencl-dev package. Would you mind compiling it and checking
>>> > > if it slows down memory for you as well?
>>> > >
>>> > > steps:
>>> > > 1) g++ cl_slow_test.cpp -o cl_slow_test -I /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/ -lOpenCL
>>> > > 2) logout from desktop env and disconnect hdmi/displayport etc
>>> > > 3) log in over ssh
>>> > > 4) run the program ./cl_slow_test 1
>>> > >
>>> > > For me it reproduced even without step 2 as well but less
>>> > > reliably. Moving mouse for example could make the memory speed
>>> > > fast again.
>>> > >
>>> > > --
>>> > > Lauri
>>> > >
>>> > >
>>> > >
>>> > > On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis
>>> > > <tstdeni...@gmail.com> wrote:
>>> > >
>>> > > Hi Lauri,
>>> > >
>>> > > There's really no connection between the two other than they
>>> > > run in the same package. I too run a 2400G (as my
>>> > > workstation) and I got the same ~6.6GB/sec transfer rate but
>>> > > without a CL app running ... The only logical reason is your
>>> > > CL app is bottlenecking the APU's memory bus but you claim
>>> > > "simply opening a context is enough" so something else is
>>> > > going on.
>>> > >
>>> > > Your last reply though says "with it running in the
>>> > > background" so it's entirely possible the CPU isn't busy but
>>> > > the package memory controller (shared between both the CPU and
>>> > > GPU) is busy. For instance running xonotic in a 1080p window
>>> > > on my 4K display reduced the memory test to 5.8GB/sec and
>>> > > that's hardly a heavy memory bound GPU app.
>>> > >
>>> > > The only other possible connection is the GPU is generating so
>>> > > much heat that it's throttling the package which is also
>>> > > unlikely if you have a proper HSF attached (I use the ones
>>> > > that came in the retail boxes).
>>> > >
>>> > > Cheers,
>>> > > Tom
>>> > >
>>> >
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
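
The reproducer, cl_slow_test.cpp, was attached to the first e-mail and is not
included in this thread. As a rough sketch of the kind of measurement behind
the "speed ... avg ... mbytes/s" lines, assuming a plain timed memcpy loop,
something like the following could be used. The function name
memcpy_mbytes_per_sec, the buffer handling, and the MB = 1e6 bytes convention
are assumptions made for illustration, not taken from the original program.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not from cl_slow_test.cpp): copy `bytes` bytes
// `repeats` times on the CPU and return the throughput in MB/s,
// counting 1 MB as 1e6 bytes.
double memcpy_mbytes_per_sec(std::size_t bytes, int repeats) {
    if (bytes == 0 || repeats <= 0) {
        return 0.0;  // nothing to measure
    }
    std::vector<char> src(bytes, 1);
    std::vector<char> dst(bytes, 0);

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        std::memcpy(dst.data(), src.data(), bytes);
        // Read one byte back through a volatile so the compiler
        // cannot drop the copy as dead code.
        volatile char sink = dst[bytes - 1];
        (void)sink;
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    double total_bytes = static_cast<double>(bytes) * repeats;
    return total_bytes / elapsed.count() / 1e6;
}
```

Calling this in a loop, once before and once after creating an OpenCL context
(clCreateContext, as mentioned above), and printing each result together with
the running mean would give output comparable in shape to the logs in this
thread. The absolute numbers depend on buffer size and cache effects, so only
the before/after ratio is meaningful.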
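
On the question of whether rocm-smi might be misreporting mclk: one way to
take rocm-smi out of the loop is to read the amdgpu sysfs clock table
directly. The sketch below assumes the file
/sys/class/drm/card0/device/pp_dpm_mclk lists one level per line in the form
"N: <clock>Mhz" with a trailing "*" on the active level; that format, and the
helper names active_dpm_level and read_file, are assumptions for illustration.
Reading sysfs only shows what the driver reports, so it cannot by itself prove
what the SMU is really doing.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: given pp_dpm_* style text, return the clock string
// of the line marked with '*' (for example "1200Mhz"), or "" if none is marked.
std::string active_dpm_level(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.find('*') == std::string::npos) {
            continue;  // not the active level
        }
        std::istringstream fields(line);
        std::string index, clock;
        fields >> index >> clock;  // e.g. "2:" then "1200Mhz"
        return clock;
    }
    return "";
}

// Hypothetical helper: read a whole file into a string; returns "" if the
// file cannot be opened.
std::string read_file(const std::string& path) {
    std::ifstream file(path);
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}
```

A test program could call
active_dpm_level(read_file("/sys/class/drm/card0/device/pp_dpm_mclk"))
before and after each memcpy measurement and log the result next to the
measured speed, to see whether the reported level and the observed bandwidth
actually move together.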