Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x
Hello,
I am only now reading your reply because the message was in the spam folder, due to wrong settings on the darktable domain: https://github.com/darktable-org/darktable/issues/15906

> ROCm is simply not reliable on consumer cards and never has been. ROCm 6.0 officially only supports the MI100/MI200/MI300 data center GPUs, the Radeon Pro W7900/W6800/V620/VII and the Radeon RX 7900/VII on at most kernel 6.2 (Ubuntu 22.04.3). Everything else is unsupported.

Indeed, I have another Radeon card, an RX480, and it is a second-class citizen since it is not supported by ROCm. It does not even have HDCP support, so I am forced to watch streamed video at a very low resolution.

> When the drm/amdgpu errors started appearing, I would usually have at most 20 minutes until the whole machine would lock itself up.

Until recently my Radeon 680M suffered from this one-year-old bug report: https://gitlab.freedesktop.org/drm/amd/-/issues/2220

> For me the situation became so unbearable last year, I ended up plugging in an Intel Arc A770 GPU besides my RX 6600 XT. The A770 only does OpenCL (not just for darktable), the RX 6600 XT only graphics (ROCm is not installed). I would maybe be able to get my hand on an MI100, but they're passively cooled and don't fit into a standard ATX case.

Have you ever tried unpacking the proprietary AMD OpenCL drivers somewhere and exposing them to darktable with a command like

    OPENCL_VENDOR_PATH=/home/user/amd_opencl/etc/OpenCL/vendors/ LD_LIBRARY_PATH=/home/user/amd_opencl/opt/amdgpu-pro/lib64/ darktable

It works for my old RX480.

> P.S. Not just using a different/recent kernel can break a working setup, but also changing the firmware files (usually installed in /lib/firmware/amdgpu/). An update of the firmware package might therefore also break a previously working kernel.

Yes, you're 100% right.

___
darktable developer mailing list
to unsubscribe send a mail to darktable-dev+unsubscr...@lists.darktable.org
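A minimal sketch of the approach Germano describes, wrapped in a function so that a missing directory is reported before darktable starts. The function name `run_with_amd_opencl` is hypothetical, and the directory layout (`etc/OpenCL/vendors` and `opt/amdgpu-pro/lib64` under the unpack root) is assumed from the command in the message; adjust it to wherever the driver packages were actually unpacked.

```shell
#!/bin/sh
# Hypothetical helper, not part of darktable or the AMD packages.
# $1 is the directory the proprietary driver was unpacked into (assumed
# layout, taken from the command quoted in the message); the remaining
# arguments are the command to run.
run_with_amd_opencl() {
    root=$1
    shift
    vendors="$root/etc/OpenCL/vendors"
    libs="$root/opt/amdgpu-pro/lib64"
    if [ ! -d "$vendors" ] || [ ! -d "$libs" ]; then
        echo "expected $vendors and $libs to exist" >&2
        return 1
    fi
    # The variables are set only for this one command, so the rest of the
    # session keeps using the system-wide OpenCL configuration.
    OPENCL_VENDOR_PATH="$vendors/" \
    LD_LIBRARY_PATH="$libs${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
    "$@"
}
```

It would be called as `run_with_amd_opencl /home/user/amd_opencl darktable`. Whether `OPENCL_VENDOR_PATH` is honoured depends on the installed OpenCL ICD loader.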
Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x
On 1/8/24 14:21, Germano Massullo wrote:
> On 21/12/23 21:01, Šarūnas wrote:
>> Debian unstable, kernel 6.6 from experimental,
>
> Which kernel version are you using? "6.6" is too broad a definition.

I don't have that setup anymore, but the kernel must have been

    $ uname -a
    Linux molly 6.6.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.9-1 (2024-01-01) x86_64 GNU/Linux

except it was 6.6.8.

-- 
Šarūnas Burdulis
math.dartmouth.edu/~sarunas · https://useplaintext.email
Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x
On 21/12/23 21:01, Šarūnas wrote:
> Debian unstable, kernel 6.6 from experimental,

Which kernel version are you using? "6.6" is too broad a definition.
Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x
Hi,

that sounds like one of the endless AMD GPU stability issues other people and myself (RX 570, RX 6600 XT, ...) have been having for years. ROCm is simply not reliable on consumer cards and never has been. ROCm 6.0 officially only supports the MI100/MI200/MI300 data center GPUs, the Radeon Pro W7900/W6800/V620/VII and the Radeon RX 7900/VII on at most kernel 6.2 (Ubuntu 22.04.3). Everything else is unsupported.

From what I have gathered over the years, I would say running both graphics and compute at the same time on the same AMD GPU without messing up the internal memory management/GPU state only seems to work reliably under *very* strict constraints. The data center chips don't support any graphics at all, so no issues there (we run MI100/MI210/MI250s at work and I've never seen anything like this there). The Radeon Pro W7900/W6800/V620/VII GPUs are usually only available in very few, certified configurations, so there are probably fewer issues to keep track of. But the RDNA2/RDNA3-based consumer GPUs are being used in an endless variety of system builds and configurations, and I assume AMD developers simply don't invest a lot of time in making these more stable since ROCm doesn't support them anyway.

When the drm/amdgpu errors started appearing, I would usually have at most 20 minutes until the whole machine would lock itself up. So every time I noticed OpenCL issues in darktable, I had to reboot, breaking my workflow and concentration. If I had a lot of work before me, I would simply turn OpenCL off completely and trade speed for stability.

For me the situation became so unbearable last year that I ended up plugging in an Intel Arc A770 GPU alongside my RX 6600 XT. The A770 only does OpenCL (not just for darktable), the RX 6600 XT only graphics (ROCm is not installed). I might be able to get my hands on an MI100, but they're passively cooled and don't fit into a standard ATX case.

(The sorry state of OpenCL might also be a reason to put more emphasis on darktable's CPU codepaths and other optimisations again.)

cheers,
Simon

P.S. It is not just a different/recent kernel that can break a working setup, but also a change to the firmware files (usually installed in /lib/firmware/amdgpu/). An update of the firmware package might therefore also break a previously working kernel.

On 21.12.23 21:01, Šarūnas wrote:
> Debian unstable, kernel 6.6 from experimental, ROCm 6.0, Radeon RX 7600, darktable 4.4.2.
>
> OpenCL works 2 out of 3 times between reboots. ~/.cache and ~/.config/darktable are purged between reboots. When it works, all is fine, no error messages. When it doesn't (stuck at/before CL kernel compile), attaching to the darktable process with `strace` shows endless futex calls. dmesg shows repeating amdgpu drm and iommu errors (they keep repeating after darktable is stopped).
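Following Simon's P.S., one way to notice that a firmware update has changed the files under /lib/firmware/amdgpu/ is to record a single checksum over the directory while the setup works and compare it again when OpenCL starts failing. This is only a sketch: the function name `firmware_fingerprint` is hypothetical, and it assumes `find`, `sort`, `sha256sum` and `cut` are available, as they are on a typical Debian or Ubuntu system.

```shell
#!/bin/sh
# Hypothetical helper: print one SHA-256 value summarising every regular
# file below a firmware directory (default: the path named in the P.S.).
firmware_fingerprint() {
    dir=${1:-/lib/firmware/amdgpu}
    (
        cd "$dir" &&
        # Hash each file, sort by file name so the order is stable,
        # then hash the resulting list to get a single value.
        find . -type f -exec sha256sum {} + |
            LC_ALL=C sort -k 2 |
            sha256sum |
            cut -d ' ' -f 1
    )
}
```

Saving the output next to `uname -r` after a known-good boot gives a record of the kernel and firmware combination that worked; a different value later only shows that some file changed, not which one caused the breakage.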
Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x
Debian unstable, kernel 6.6 from experimental, ROCm 6.0, Radeon RX 7600, darktable 4.4.2.

OpenCL works 2 out of 3 times between reboots. ~/.cache and ~/.config/darktable are purged between reboots. When it works, all is fine, no error messages. When it doesn't (stuck at/before CL kernel compile), attaching to the darktable process with `strace` shows endless futex calls. dmesg shows repeating amdgpu drm and iommu errors (they keep repeating after darktable is stopped).

-- 
Šarūnas Burdulis
math.dartmouth.edu/~sarunas · https://useplaintext.email
Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x
Hello,
make sure you are running a kernel >= 6.6 and that OpenCL is enabled in darktable. Then start darktable and check whether there are errors in the dmesg output.
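The check described above can be sketched as two small shell helpers. Both names (`kernel_at_least`, `filter_gpu_errors`) are hypothetical, the version comparison assumes a `sort` that supports `-V` (GNU coreutils does), and the search pattern is only a guess based on the "amdgpu drm and iommu errors" reported in this thread.

```shell
#!/bin/sh
# Hypothetical helpers, not part of darktable.

# Succeed if the kernel release in $2 (default: the running kernel) is at
# least the version given in $1. Assumes sort -V is available.
kernel_at_least() {
    required=$1
    current=${2:-$(uname -r)}
    # Drop a distribution suffix such as "-amd64" or "-39-generic".
    current=${current%%-*}
    lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

# Read kernel log lines on stdin and keep those mentioning amdgpu or iommu.
# Always succeeds, even when nothing matches.
filter_gpu_errors() {
    grep -i -E 'amdgpu|iommu' || true
}
```

Typical use would be `kernel_at_least 6.6 && echo "kernel is 6.6 or newer"`, then starting darktable and running `dmesg | filter_gpu_errors` (reading the kernel log may require root on some systems). Starting darktable as `darktable -d opencl` additionally prints its OpenCL debug output to the terminal.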
[darktable-dev] OpenCL failure on AMD kernel 6.6,x
Hi Germano,
I have an RX 6700 XT, but I'm on Kubuntu stable with the 6.2.0-39-generic kernel. I don't want to mess with my current configuration, but if you can point me to something that explains how to test with that kernel, I'll be happy to help. I'm thinking about booting a distro from an external hard disk, but I don't know which distro ships that kernel. I am a little up to date :))

Fabio
Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x
On 12/20/23 17:35, Germano Massullo wrote:
> Everyone with an AMD videocard using OpenCL through ROCm on a Linux system with 6.6.x kernel, can you please check if after starting darktable you get this [1] kind of error messages in dmesg output?

Yes, I've been having that error with 6.6.x and a Radeon 6900XT. It also prevents the darktable process from stopping; I can't even kill -9 it. Rolling back to a previous kernel series prevented it from appearing. I'm leaving a comment on the gitlab thread.

Thanks!

Leander

-- 
Leander Hutton
lean...@one-button.org
www.one-button.org
[darktable-dev] OpenCL failure on AMD kernel 6.6,x
Everyone with an AMD video card using OpenCL through ROCm on a Linux system with a 6.6.x kernel: can you please check whether, after starting darktable, you get this [1] kind of error message in the dmesg output?

There is a bug in the AMDGPU driver, and I would like everyone experiencing it to leave a comment so that the AMD developers can raise the priority of the bug report.

Thank you

[1]: https://gitlab.freedesktop.org/drm/amd/-/issues/3037#note_2213169