Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x

2024-01-08 Thread Germano Massullo
Hello, I am only now reading your reply because the message ended up in my
spam folder, due to wrong settings on the darktable domain:
https://github.com/darktable-org/darktable/issues/15906


> ROCm is simply not reliable on consumer cards and never has been. ROCm
> 6.0 officially only supports the MI100/MI200/MI300 data center GPUs,
> the Radeon Pro W7900/W6800/V620/VII and the Radeon RX 7900/VII on at
> most kernel 6.2 (Ubuntu 22.04.3). Everything else is unsupported.


Indeed, I have another Radeon card, an RX 480, which is a second-class
citizen since it is not supported by ROCm. It does not even have HDCP
support, so I am forced to watch streaming video at a very low
resolution.


> When the drm/amdgpu errors started appearing, I would usually have at
> most 20 minutes until the whole machine would lock itself up.


Until recently, my Radeon 680M suffered from this year-old bug report:
https://gitlab.freedesktop.org/drm/amd/-/issues/2220

> Last year the situation became so unbearable for me that I ended up
> plugging in an Intel Arc A770 GPU alongside my RX 6600 XT. The A770
> does only OpenCL (not just for darktable), the RX 6600 XT only graphics
> (ROCm is not installed). I might be able to get my hands on an MI100,
> but they're passively cooled and don't fit into a standard ATX case.


Have you ever tried unpacking the proprietary AMD OpenCL drivers somewhere
and exposing them to darktable with a command like the following? It
works for my old RX 480:

OPENCL_VENDOR_PATH=/home/user/amd_opencl/etc/OpenCL/vendors/ \
LD_LIBRARY_PATH=/home/user/amd_opencl/opt/amdgpu-pro/lib64/ darktable
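The same idea can be wrapped in a small reusable shell function. This is
only a sketch under assumptions: the function name `with_amd_opencl` and
the `AMD_OCL_ROOT` override are hypothetical, and the default directory
simply mirrors the paths in the command above. Unlike the one-liner, it
prepends to any existing `LD_LIBRARY_PATH` instead of replacing it, and it
can wrap any program, so `clinfo -l` can be used to see whether the AMD
platform shows up before starting darktable.

```shell
#!/bin/sh
# Hypothetical helper: run a command with an unpacked AMD OpenCL runtime.
# AMD_OCL_ROOT is an assumed override; the default matches the paths above.
with_amd_opencl() {
    amd_root=${AMD_OCL_ROOT:-$HOME/amd_opencl}
    # Same vendors directory as in the command above, and the unpacked
    # libraries prepended to any library search path already set.
    OPENCL_VENDOR_PATH="$amd_root/etc/OpenCL/vendors/" \
    LD_LIBRARY_PATH="$amd_root/opt/amdgpu-pro/lib64/${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        "$@"
}

# Usage (commented out so the file can be sourced safely):
# with_amd_opencl clinfo -l            # is the AMD platform listed?
# with_amd_opencl darktable -d opencl  # darktable with OpenCL debug output
```

The environment variables only apply to the wrapped command, so the rest
of the session keeps using the system OpenCL setup.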

> P.S. Not only can a different/newer kernel break a working setup, so
> can changed firmware files (usually installed in /lib/firmware/amdgpu/).
> An update of the firmware package might therefore also break a
> previously working kernel.


Yes, you're 100% right.
___
darktable developer mailing list
to unsubscribe send a mail to darktable-dev+unsubscr...@lists.darktable.org



Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x

2024-01-08 Thread Šarūnas

On 1/8/24 14:21, Germano Massullo wrote:

> On 21/12/23 21:01, Šarūnas wrote:
>
>> Debian unstable,
>> kernel 6.6 from experimental,
>
> Which kernel version are you using? 6.6 is too broad a definition.


I don't have that setup anymore, but the kernel must have been

$ uname -a
Linux molly 6.6.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.9-1 (2024-01-01) x86_64 GNU/Linux


except it was 6.6.8.

--
Šarūnas Burdulis
math.dartmouth.edu/~sarunas

· https://useplaintext.email ·




OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x

2024-01-08 Thread Germano Massullo

On 21/12/23 21:01, Šarūnas wrote:

> Debian unstable,
> kernel 6.6 from experimental,

Which kernel version are you using? 6.6 is too broad a definition.




Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x

2024-01-01 Thread Sturm Flut

Hi,

that sounds like one of the endless AMD GPU stability issues that other
people and I (RX 570, RX 6600 XT, ...) have been having for years.


ROCm is simply not reliable on consumer cards and never has been. ROCm 
6.0 officially only supports the MI100/MI200/MI300 data center GPUs, the 
Radeon Pro W7900/W6800/V620/VII and the Radeon RX 7900/VII on at most 
kernel 6.2 (Ubuntu 22.04.3). Everything else is unsupported.


From what I have gathered over the years, I would say running both 
graphics and compute at the same time on the same AMD GPU without 
messing up the internal memory management/GPU state only seems to work 
reliably under *very* strict constraints.


The data center chips don't support any graphics at all, so no issues
there (we run MI100/MI210/MI250s at work and I've never seen anything
like this there). The Radeon Pro W7900/W6800/V620/VII GPUs are usually
only available in very few, certified configurations, so there are
probably fewer issues to keep track of. But the RDNA2/RDNA3-based
consumer GPUs are used in an endless variety of system builds and
configurations, and I assume AMD developers simply don't invest much
time in making them more stable, since ROCm doesn't support them anyway.


When the drm/amdgpu errors started appearing, I would usually have at 
most 20 minutes until the whole machine would lock itself up. So every 
time I noticed OpenCL issues in darktable, I had to reboot, breaking my 
workflow and concentration. If I had a lot of work before me, I would 
simply turn OpenCL off completely and trade speed for stability.


Last year the situation became so unbearable for me that I ended up
plugging in an Intel Arc A770 GPU alongside my RX 6600 XT. The A770
does only OpenCL (not just for darktable), the RX 6600 XT only graphics
(ROCm is not installed). I might be able to get my hands on an MI100,
but they're passively cooled and don't fit into a standard ATX case.


(The sorry state of OpenCL might also be a reason to put more emphasis 
on darktable's CPU codepaths and other optimisations again.)


cheers,
Simon


P.S. Not only can a different/newer kernel break a working setup, so
can changed firmware files (usually installed in /lib/firmware/amdgpu/).
An update of the firmware package might therefore also break a
previously working kernel.




On 21.12.23 21:01, Šarūnas wrote:

> Debian unstable,
> kernel 6.6 from experimental,
> ROCm 6.0,
> Radeon RX 7600,
> darktable 4.4.2.
>
> OpenCL works 2 out of 3 times between reboots.
> ~/.cache and ~/.config/darktable are purged between reboots.
>
> When it works, all is fine, no error messages.
>
> When it doesn't (it gets stuck at or before OpenCL kernel compilation),
> attaching `strace` to the darktable process shows endless futex calls.
> dmesg shows repeating amdgpu DRM and IOMMU errors (they keep repeating
> after darktable is stopped).





OpenPGP_0xCE9228264D6BD39A_and_old_rev.asc
Description: OpenPGP public key




Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x

2023-12-21 Thread Šarūnas

Debian unstable,
kernel 6.6 from experimental,
ROCm 6.0,
Radeon RX 7600,
darktable 4.4.2.

OpenCL works 2 out of 3 times between reboots.
~/.cache and ~/.config/darktable are purged between reboots.

When it works, all is fine, no error messages.

When it doesn't (it gets stuck at or before OpenCL kernel compilation),
attaching `strace` to the darktable process shows endless futex calls.
dmesg shows repeating amdgpu DRM and IOMMU errors (they keep repeating
after darktable is stopped).



--
Šarūnas Burdulis
math.dartmouth.edu/~sarunas

· https://useplaintext.email ·






Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x

2023-12-21 Thread Germano Massullo
Hello, make sure you are running kernel >= 6.6 and that OpenCL is
enabled in darktable. Then start darktable and check whether dmesg
shows any errors.
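The kernel and dmesg checks can be scripted roughly as follows. This is
a sketch under stated assumptions: the function names are hypothetical,
`sort -V` requires GNU coreutils (or a compatible `sort`), and the
keyword filter is only a heuristic for spotting amdgpu/IOMMU error lines,
not an exact match for the messages in the bug report. Whether OpenCL is
enabled still has to be checked in darktable's preferences; starting it
as `darktable -d opencl` prints OpenCL debug output to the terminal.

```shell
#!/bin/sh
# kernel_at_least REQUIRED [RELEASE]
# Succeeds if RELEASE (default: output of `uname -r`) is >= REQUIRED.
# The distribution suffix is dropped first, e.g. "6.6.9-amd64" -> "6.6.9".
kernel_at_least() {
    required=$1
    release=${2:-$(uname -r)}
    current=${release%%-*}
    # sort -V orders version strings; the smaller one comes out first.
    [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)" = "$required" ]
}

# amdgpu_kernel_errors: read kernel log lines on stdin and print those that
# mention amdgpu, IOMMU or AMD-Vi together with an error-like keyword.
# Hypothetical heuristic; adjust the patterns to the messages you actually see.
amdgpu_kernel_errors() {
    awk '{ l = tolower($0) }
         l ~ /amdgpu|iommu|amd-vi/ && l ~ /error|fail|fault|timeout|timed out/'
}

# Usage (commented out so the file can be sourced safely):
# kernel_at_least 6.6 || echo "kernel is older than 6.6"
# darktable -d opencl &
# sudo dmesg --follow | amdgpu_kernel_errors   # watch while darktable runs
```

Reading the kernel log often needs root (for example when
`kernel.dmesg_restrict` is set), hence the `sudo` in the usage lines.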




[darktable-dev] OpenCL failure on AMD kernel 6.6,x

2023-12-21 Thread Fabio Sirna

Hi Germano,

I have an RX 6700 XT, but I'm on Kubuntu stable with the 6.2.0-39-generic
kernel.

I don't want to mess with my current configuration, but if you can point
me to something that explains how to test with that kernel, I'll be
happy to help. I'm thinking about booting a distro from an external hard
disk, but I don't know which distro ships that kernel. I'm not very up
to date :))


Fabio




OpenPGP_0x87F3511AE682DF25.asc
Description: OpenPGP public key




Re: [darktable-dev] OpenCL failure on AMD kernel 6.6,x

2023-12-20 Thread Leander Hutton

On 12/20/23 17:35, Germano Massullo wrote:

> To everyone with an AMD video card using OpenCL through ROCm on a Linux
> system with a 6.6.x kernel: could you please check whether, after starting
> darktable, you get error messages like these [1] in the dmesg output?


Yes, I've been getting that error with 6.6.x and a Radeon RX 6900 XT. It
also prevents the darktable process from stopping; I can't even kill -9
it. Rolling back to a previous kernel series made it stop appearing. I'm
leaving a comment on the GitLab thread.

Thanks!

Leander

--
---
Leander Hutton
lean...@one-button.org
www.one-button.org




[darktable-dev] OpenCL failure on AMD kernel 6.6,x

2023-12-20 Thread Germano Massullo
To everyone with an AMD video card using OpenCL through ROCm on a Linux
system with a 6.6.x kernel: could you please check whether, after starting
darktable, you get error messages like these [1] in the dmesg output?
There is a bug in the AMDGPU driver, and I would like everyone
experiencing it to leave a comment so that the AMD developers can raise
the priority of the bug report.

Thank you

[1]: https://gitlab.freedesktop.org/drm/amd/-/issues/3037#note_2213169