Re: Change queue/pipe split between amdkfd and amdgpu
On 02/16/2017 03:00 PM, Bridgman, John wrote: > Any objections to authorizing Oded to post the kfdtest binary he is using to > some public place (if not there already) so others (like Andres) can test > changes which touch on amdkfd? > > We should check it for embarrassing symbols, but otherwise it should be OK. Someone was up late for a deadline? lol > > That said, since we are getting perilously close to actually sending dGPU > support changes upstream, we will need (IMO) to maintain a sanitized source > repo for kfdtest as well... sharing the binary just gets us started. > Hi John, Yes, this is the sort of thing I've been referring to for some time now. We definitely need some kind of centralized mechanism to test/validate kfd stuff, so if you can get this out, that would be great! A binary would be a start; I am sure we can make do, and it's certainly better than nothing. However, source, much as happened with UMR, would of course be ideal. I suggest it would perhaps be good to arrange some kind of IRC meeting regarding kfd, since there seems to be a bit of fragmented effort here. I have my own ioctl()s locally for pinning, for my own project, which I am not sure are suitable to simply upstream since AMD has its own take; so what should we do? I have heard so much about dGPU support for a couple of years now but have only seen bits thrown over the wall. Can we get a more serious incremental approach happening ASAP? I created #amdkfd on freenode some time ago, where a couple of interested academics and users hang out. Kind Regards, Edward. > Thanks, > John > >> -----Original Message----- >> From: Oded Gabbay [mailto:oded.gab...@gmail.com] >> Sent: Friday, February 10, 2017 12:57 PM >> To: Andres Rodriguez >> Cc: Kuehling, Felix; Bridgman, John; amd-gfx@lists.freedesktop.org; >> Deucher, Alexander; Jay Cornwall >> Subject: Re: Change queue/pipe split between amdkfd and amdgpu >> >> I don't have a repo, nor do I have the source code.
>> It is a tool that we developed inside AMD (when I was working there), and >> after I left AMD I got permission to use the binary for regression testing. >> >> Oded >> >> On Fri, Feb 10, 2017 at 6:33 PM, Andres Rodriguez >> wrote: >>> Hey Oded, >>> >>> Where can I find a repo with kfdtest? >>> >>> I tried looking here but couldn't find it: >>> >>> https://cgit.freedesktop.org/~gabbayo/ >>> >>> -Andres >>> >>> >>> >>> On 2017-02-10 05:35 AM, Oded Gabbay wrote: >>>> >>>> So the warning in dmesg is gone of course, but the test (that I >>>> mentioned in previous email) still fails, and this time it caused the >>>> kernel to crash. In addition, now other tests fail as well, e.g. >>>> KFDEventTest.SignalEvent >>>> >>>> I honestly suggest to take some time to debug this patch-set on an >>>> actual Kaveri machine and then re-send the patches. >>>> >>>> Thanks, >>>> Oded >>>> >>>> log of crash from KFDQMTest.CreateMultipleCpQueues: >>>> >>>> [ 160.900137] kfd: qcm fence wait loop timeout expired [ >>>> 160.900143] kfd: the cp might be in an unrecoverable state due to an >>>> unsuccessful queues preemption [ 160.916765] show_signal_msg: 36 >>>> callbacks suppressed [ 160.916771] kfdtest[2498]: segfault at >>>> 17f8a ip 7f8ae932ee5d sp 7ffc52219cd0 error 4 in >>>> libhsakmt-1.so.0.0.1[7f8ae932b000+8000] >>>> [ 163.152229] kfd: qcm fence wait loop timeout expired [ >>>> 163.152250] BUG: unable to handle kernel NULL pointer dereference at >>>> 005a [ 163.152299] IP: >>>> kfd_get_process_device_data+0x6/0x30 [amdkfd] [ 163.152323] PGD >>>> 2333aa067 [ 163.152323] PUD 230f64067 [ 163.152335] PMD 0 >>>> >>>> [ 163.152364] Oops: [#1] SMP >>>> [ 163.152379] Modules linked in: joydev edac_mce_amd edac_core >>>> input_leds kvm_amd snd_hda_codec_realtek kvm irqbypass >>>> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel >> snd_hda_codec >>>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core >>>> snd_hwdep pcbc snd_pcm aesni_intel snd_seq_midi snd_seq_midi_event 
>>>> snd_rawmidi snd_seq aes_x86_64 crypto_simd snd_seq_device >> glue_helper >>>> cryptd snd_timer snd fam15h_power k10temp soundcore i2c_piix4 shpchp >>>> tpm_infineon mac_hid parport_pc ppdev nfsd auth_rpcgss nfs_acl lockd >>>> lp grace sunrpc parport a
RE: Change queue/pipe split between amdkfd and amdgpu
Any objections to authorizing Oded to post the kfdtest binary he is using to some public place (if not there already) so others (like Andres) can test changes which touch on amdkfd? We should check it for embarrassing symbols, but otherwise it should be OK. That said, since we are getting perilously close to actually sending dGPU support changes upstream, we will need (IMO) to maintain a sanitized source repo for kfdtest as well... sharing the binary just gets us started. Thanks, John >-----Original Message----- >From: Oded Gabbay [mailto:oded.gab...@gmail.com] >Sent: Friday, February 10, 2017 12:57 PM >To: Andres Rodriguez >Cc: Kuehling, Felix; Bridgman, John; amd-gfx@lists.freedesktop.org; >Deucher, Alexander; Jay Cornwall >Subject: Re: Change queue/pipe split between amdkfd and amdgpu > >I don't have a repo, nor do I have the source code. >It is a tool that we developed inside AMD (when I was working there), and >after I left AMD I got permission to use the binary for regression testing. > >Oded > >On Fri, Feb 10, 2017 at 6:33 PM, Andres Rodriguez >wrote: >> Hey Oded, >> >> Where can I find a repo with kfdtest? >> >> I tried looking here but couldn't find it: >> >> https://cgit.freedesktop.org/~gabbayo/ >> >> -Andres >> >> >> >> On 2017-02-10 05:35 AM, Oded Gabbay wrote: >>> >>> So the warning in dmesg is gone of course, but the test (that I >>> mentioned in previous email) still fails, and this time it caused the >>> kernel to crash. In addition, now other tests fail as well, e.g. >>> KFDEventTest.SignalEvent >>> >>> I honestly suggest to take some time to debug this patch-set on an >>> actual Kaveri machine and then re-send the patches.
>>> >>> Thanks, >>> Oded >>> >>> log of crash from KFDQMTest.CreateMultipleCpQueues: >>> >>> [ 160.900137] kfd: qcm fence wait loop timeout expired [ >>> 160.900143] kfd: the cp might be in an unrecoverable state due to an >>> unsuccessful queues preemption [ 160.916765] show_signal_msg: 36 >>> callbacks suppressed [ 160.916771] kfdtest[2498]: segfault at >>> 17f8a ip 7f8ae932ee5d sp 7ffc52219cd0 error 4 in >>> libhsakmt-1.so.0.0.1[7f8ae932b000+8000] >>> [ 163.152229] kfd: qcm fence wait loop timeout expired [ >>> 163.152250] BUG: unable to handle kernel NULL pointer dereference at >>> 005a [ 163.152299] IP: >>> kfd_get_process_device_data+0x6/0x30 [amdkfd] [ 163.152323] PGD >>> 2333aa067 [ 163.152323] PUD 230f64067 [ 163.152335] PMD 0 >>> >>> [ 163.152364] Oops: [#1] SMP >>> [ 163.152379] Modules linked in: joydev edac_mce_amd edac_core >>> input_leds kvm_amd snd_hda_codec_realtek kvm irqbypass >>> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel >snd_hda_codec >>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core >>> snd_hwdep pcbc snd_pcm aesni_intel snd_seq_midi snd_seq_midi_event >>> snd_rawmidi snd_seq aes_x86_64 crypto_simd snd_seq_device >glue_helper >>> cryptd snd_timer snd fam15h_power k10temp soundcore i2c_piix4 shpchp >>> tpm_infineon mac_hid parport_pc ppdev nfsd auth_rpcgss nfs_acl lockd >>> lp grace sunrpc parport autofs4 hid_logitech_hidpp hid_logitech_dj >>> hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon >>> i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect >>> sysimgblt libahci fb_sys_fops drm r8169 mii fjes video [ 163.152668] >>> CPU: 3 PID: 2498 Comm: kfdtest Not tainted 4.10.0-rc5+ #3 [ >>> 163.152695] Hardware name: Gigabyte Technology Co., Ltd. 
To be filled >>> by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014 [ 163.152735] task: >>> 995e73d16580 task.stack: b41144458000 [ 163.152764] RIP: >>> 0010:kfd_get_process_device_data+0x6/0x30 [amdkfd] [ 163.152790] >>> RSP: 0018:b4114445bab0 EFLAGS: 00010246 [ 163.152812] RAX: >>> ffea RBX: 995e75909c00 RCX: >>> >>> [ 163.152841] RDX: RSI: ffea RDI: >>> 995e75909600 >>> [ 163.152869] RBP: b4114445bae0 R08: 000252a5 R09: >>> 0414 >>> [ 163.152898] R10: R11: b412d38d R12: >>> ffc2 >>> [ 163.152926] R13: R14: 995e75909ca8 R15: >>> 995e75909c00 >>> [ 163.152956] FS: 7f8ae975e740() GS:995e7ed8() >>> knlGS: >>> [ 163.152988] CS: 0010 DS: ES: CR0:
Re: Change queue/pipe split between amdkfd and amdgpu
I don't have a repo, nor do I have the source code. It is a tool that we developed inside AMD (when I was working there), and after I left AMD I got permission to use the binary for regression testing. Oded On Fri, Feb 10, 2017 at 6:33 PM, Andres Rodriguez wrote: > Hey Oded, > > Where can I find a repo with kfdtest? > > I tried looking here but couldn't find it: > > https://cgit.freedesktop.org/~gabbayo/ > > -Andres > > > > On 2017-02-10 05:35 AM, Oded Gabbay wrote: >> >> So the warning in dmesg is gone of course, but the test (that I >> mentioned in previous email) still fails, and this time it caused the >> kernel to crash. In addition, now other tests fail as well, e.g. >> KFDEventTest.SignalEvent >> >> I honestly suggest to take some time to debug this patch-set on an >> actual Kaveri machine and then re-send the patches. >> >> Thanks, >> Oded >> >> log of crash from KFDQMTest.CreateMultipleCpQueues: >> >> [ 160.900137] kfd: qcm fence wait loop timeout expired >> [ 160.900143] kfd: the cp might be in an unrecoverable state due to >> an unsuccessful queues preemption >> [ 160.916765] show_signal_msg: 36 callbacks suppressed >> [ 160.916771] kfdtest[2498]: segfault at 17f8a ip >> 7f8ae932ee5d sp 7ffc52219cd0 error 4 in >> libhsakmt-1.so.0.0.1[7f8ae932b000+8000] >> [ 163.152229] kfd: qcm fence wait loop timeout expired >> [ 163.152250] BUG: unable to handle kernel NULL pointer dereference >> at 005a >> [ 163.152299] IP: kfd_get_process_device_data+0x6/0x30 [amdkfd] >> [ 163.152323] PGD 2333aa067 >> [ 163.152323] PUD 230f64067 >> [ 163.152335] PMD 0 >> >> [ 163.152364] Oops: [#1] SMP >> [ 163.152379] Modules linked in: joydev edac_mce_amd edac_core >> input_leds kvm_amd snd_hda_codec_realtek kvm irqbypass >> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec >> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core >> snd_hwdep pcbc snd_pcm aesni_intel snd_seq_midi snd_seq_midi_event >> snd_rawmidi snd_seq aes_x86_64 crypto_simd 
snd_seq_device glue_helper >> cryptd snd_timer snd fam15h_power k10temp soundcore i2c_piix4 shpchp >> tpm_infineon mac_hid parport_pc ppdev nfsd auth_rpcgss nfs_acl lockd >> lp grace sunrpc parport autofs4 hid_logitech_hidpp hid_logitech_dj >> hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon >> i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt >> libahci fb_sys_fops drm r8169 mii fjes video >> [ 163.152668] CPU: 3 PID: 2498 Comm: kfdtest Not tainted 4.10.0-rc5+ #3 >> [ 163.152695] Hardware name: Gigabyte Technology Co., Ltd. To be >> filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014 >> [ 163.152735] task: 995e73d16580 task.stack: b41144458000 >> [ 163.152764] RIP: 0010:kfd_get_process_device_data+0x6/0x30 [amdkfd] >> [ 163.152790] RSP: 0018:b4114445bab0 EFLAGS: 00010246 >> [ 163.152812] RAX: ffea RBX: 995e75909c00 RCX: >> >> [ 163.152841] RDX: RSI: ffea RDI: >> 995e75909600 >> [ 163.152869] RBP: b4114445bae0 R08: 000252a5 R09: >> 0414 >> [ 163.152898] R10: R11: b412d38d R12: >> ffc2 >> [ 163.152926] R13: R14: 995e75909ca8 R15: >> 995e75909c00 >> [ 163.152956] FS: 7f8ae975e740() GS:995e7ed8() >> knlGS: >> [ 163.152988] CS: 0010 DS: ES: CR0: 80050033 >> [ 163.153012] CR2: 005a CR3: 0002216ab000 CR4: >> 000406e0 >> [ 163.153041] Call Trace: >> [ 163.153059] ? destroy_queues_cpsch+0x166/0x190 [amdkfd] >> [ 163.153086] execute_queues_cpsch+0x2e/0xc0 [amdkfd] >> [ 163.153113] destroy_queue_cpsch+0xbd/0x140 [amdkfd] >> [ 163.153139] pqm_destroy_queue+0x111/0x1d0 [amdkfd] >> [ 163.153164] pqm_uninit+0x3f/0xb0 [amdkfd] >> [ 163.153186] kfd_unbind_process_from_device+0x51/0xd0 [amdkfd] >> [ 163.153214] iommu_pasid_shutdown_callback+0x20/0x30 [amdkfd] >> [ 163.153239] mn_release+0x37/0x70 [amd_iommu_v2] >> [ 163.153261] __mmu_notifier_release+0x44/0xc0 >> [ 163.153281] exit_mmap+0x15a/0x170 >> [ 163.153297] ? __wake_up+0x44/0x50 >> [ 163.153314] ? 
exit_robust_list+0x5c/0x110 >> [ 163.15] mmput+0x57/0x140 >> [ 163.153347] do_exit+0x26b/0xb30 >> [ 163.153362] do_group_exit+0x43/0xb0 >> [ 163.153379] get_signal+0x293/0x620 >> [ 163.153396] do_signal+0x37/0x760 >> [ 163.153411] ? print_vma_addr+0x82/0x100 >> [ 163.153429] ? vprintk_default+0x29/0x50 >> [ 163.153447] ? bad_area+0x46/0x50 >> [ 163.153463] ? __do_page_fault+0x3c7/0x4e0 >> [ 163.153481] exit_to_usermode_loop+0x76/0xb0 >> [ 163.153500] prepare_exit_to_usermode+0x2f/0x40 >> [ 163.153521] retint_user+0x8/0x10 >> [ 163.153536] RIP: 0033:0x7f8ae932ee5d >> [ 163.153551] RSP: 002b:7ffc52219cd0 EFLAGS: 00010202 >> [ 163.153573] RAX: 0003 RBX: 00017f8a RCX: >> 7ffc52219d00 >> [ 163.153602] RDX: 7f
Re: Change queue/pipe split between amdkfd and amdgpu
Hey Oded, Where can I find a repo with kfdtest? I tried looking here but couldn't find it: https://cgit.freedesktop.org/~gabbayo/ -Andres On 2017-02-10 05:35 AM, Oded Gabbay wrote: So the warning in dmesg is gone of course, but the test (that I mentioned in previous email) still fails, and this time it caused the kernel to crash. In addition, now other tests fail as well, e.g. KFDEventTest.SignalEvent I honestly suggest to take some time to debug this patch-set on an actual Kaveri machine and then re-send the patches. Thanks, Oded log of crash from KFDQMTest.CreateMultipleCpQueues: [ 160.900137] kfd: qcm fence wait loop timeout expired [ 160.900143] kfd: the cp might be in an unrecoverable state due to an unsuccessful queues preemption [ 160.916765] show_signal_msg: 36 callbacks suppressed [ 160.916771] kfdtest[2498]: segfault at 17f8a ip 7f8ae932ee5d sp 7ffc52219cd0 error 4 in libhsakmt-1.so.0.0.1[7f8ae932b000+8000] [ 163.152229] kfd: qcm fence wait loop timeout expired [ 163.152250] BUG: unable to handle kernel NULL pointer dereference at 005a [ 163.152299] IP: kfd_get_process_device_data+0x6/0x30 [amdkfd] [ 163.152323] PGD 2333aa067 [ 163.152323] PUD 230f64067 [ 163.152335] PMD 0 [ 163.152364] Oops: [#1] SMP [ 163.152379] Modules linked in: joydev edac_mce_amd edac_core input_leds kvm_amd snd_hda_codec_realtek kvm irqbypass snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core snd_hwdep pcbc snd_pcm aesni_intel snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq aes_x86_64 crypto_simd snd_seq_device glue_helper cryptd snd_timer snd fam15h_power k10temp soundcore i2c_piix4 shpchp tpm_infineon mac_hid parport_pc ppdev nfsd auth_rpcgss nfs_acl lockd lp grace sunrpc parport autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt libahci fb_sys_fops drm r8169 mii fjes 
video [ 163.152668] CPU: 3 PID: 2498 Comm: kfdtest Not tainted 4.10.0-rc5+ #3 [ 163.152695] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014 [ 163.152735] task: 995e73d16580 task.stack: b41144458000 [ 163.152764] RIP: 0010:kfd_get_process_device_data+0x6/0x30 [amdkfd] [ 163.152790] RSP: 0018:b4114445bab0 EFLAGS: 00010246 [ 163.152812] RAX: ffea RBX: 995e75909c00 RCX: [ 163.152841] RDX: RSI: ffea RDI: 995e75909600 [ 163.152869] RBP: b4114445bae0 R08: 000252a5 R09: 0414 [ 163.152898] R10: R11: b412d38d R12: ffc2 [ 163.152926] R13: R14: 995e75909ca8 R15: 995e75909c00 [ 163.152956] FS: 7f8ae975e740() GS:995e7ed8() knlGS: [ 163.152988] CS: 0010 DS: ES: CR0: 80050033 [ 163.153012] CR2: 005a CR3: 0002216ab000 CR4: 000406e0 [ 163.153041] Call Trace: [ 163.153059] ? destroy_queues_cpsch+0x166/0x190 [amdkfd] [ 163.153086] execute_queues_cpsch+0x2e/0xc0 [amdkfd] [ 163.153113] destroy_queue_cpsch+0xbd/0x140 [amdkfd] [ 163.153139] pqm_destroy_queue+0x111/0x1d0 [amdkfd] [ 163.153164] pqm_uninit+0x3f/0xb0 [amdkfd] [ 163.153186] kfd_unbind_process_from_device+0x51/0xd0 [amdkfd] [ 163.153214] iommu_pasid_shutdown_callback+0x20/0x30 [amdkfd] [ 163.153239] mn_release+0x37/0x70 [amd_iommu_v2] [ 163.153261] __mmu_notifier_release+0x44/0xc0 [ 163.153281] exit_mmap+0x15a/0x170 [ 163.153297] ? __wake_up+0x44/0x50 [ 163.153314] ? exit_robust_list+0x5c/0x110 [ 163.15] mmput+0x57/0x140 [ 163.153347] do_exit+0x26b/0xb30 [ 163.153362] do_group_exit+0x43/0xb0 [ 163.153379] get_signal+0x293/0x620 [ 163.153396] do_signal+0x37/0x760 [ 163.153411] ? print_vma_addr+0x82/0x100 [ 163.153429] ? vprintk_default+0x29/0x50 [ 163.153447] ? bad_area+0x46/0x50 [ 163.153463] ? 
__do_page_fault+0x3c7/0x4e0 [ 163.153481] exit_to_usermode_loop+0x76/0xb0 [ 163.153500] prepare_exit_to_usermode+0x2f/0x40 [ 163.153521] retint_user+0x8/0x10 [ 163.153536] RIP: 0033:0x7f8ae932ee5d [ 163.153551] RSP: 002b:7ffc52219cd0 EFLAGS: 00010202 [ 163.153573] RAX: 0003 RBX: 00017f8a RCX: 7ffc52219d00 [ 163.153602] RDX: 7f8ae9534220 RSI: 7f8ae8b5eb28 RDI: 00017f8a [ 163.153630] RBP: 7ffc52219d20 R08: 01cc1890 R09: [ 163.153659] R10: 0027 R11: 7f8ae932ee10 R12: 01cc52a0 [ 163.153687] R13: 7ffc5221a200 R14: 0021 R15: [ 163.153716] Code: e0 04 00 00 48 3b 91 f0 03 00 00 74 01 c3 55 48 89 e5 e8 2e f9 ff ff 5d c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 <48> 8b 46 70 48 83 c6 70 48 89 e5 48 39 f0 74 16 48 3b 78 10 75 [ 163.153818] RIP: kfd_get_process_device_data
Re: Change queue/pipe split between amdkfd and amdgpu
So the warning in dmesg is gone of course, but the test (that I mentioned in previous email) still fails, and this time it caused the kernel to crash. In addition, now other tests fail as well, e.g. KFDEventTest.SignalEvent I honestly suggest to take some time to debug this patch-set on an actual Kaveri machine and then re-send the patches. Thanks, Oded log of crash from KFDQMTest.CreateMultipleCpQueues: [ 160.900137] kfd: qcm fence wait loop timeout expired [ 160.900143] kfd: the cp might be in an unrecoverable state due to an unsuccessful queues preemption [ 160.916765] show_signal_msg: 36 callbacks suppressed [ 160.916771] kfdtest[2498]: segfault at 17f8a ip 7f8ae932ee5d sp 7ffc52219cd0 error 4 in libhsakmt-1.so.0.0.1[7f8ae932b000+8000] [ 163.152229] kfd: qcm fence wait loop timeout expired [ 163.152250] BUG: unable to handle kernel NULL pointer dereference at 005a [ 163.152299] IP: kfd_get_process_device_data+0x6/0x30 [amdkfd] [ 163.152323] PGD 2333aa067 [ 163.152323] PUD 230f64067 [ 163.152335] PMD 0 [ 163.152364] Oops: [#1] SMP [ 163.152379] Modules linked in: joydev edac_mce_amd edac_core input_leds kvm_amd snd_hda_codec_realtek kvm irqbypass snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core snd_hwdep pcbc snd_pcm aesni_intel snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq aes_x86_64 crypto_simd snd_seq_device glue_helper cryptd snd_timer snd fam15h_power k10temp soundcore i2c_piix4 shpchp tpm_infineon mac_hid parport_pc ppdev nfsd auth_rpcgss nfs_acl lockd lp grace sunrpc parport autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt libahci fb_sys_fops drm r8169 mii fjes video [ 163.152668] CPU: 3 PID: 2498 Comm: kfdtest Not tainted 4.10.0-rc5+ #3 [ 163.152695] Hardware name: Gigabyte Technology Co., Ltd. 
To be filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014 [ 163.152735] task: 995e73d16580 task.stack: b41144458000 [ 163.152764] RIP: 0010:kfd_get_process_device_data+0x6/0x30 [amdkfd] [ 163.152790] RSP: 0018:b4114445bab0 EFLAGS: 00010246 [ 163.152812] RAX: ffea RBX: 995e75909c00 RCX: [ 163.152841] RDX: RSI: ffea RDI: 995e75909600 [ 163.152869] RBP: b4114445bae0 R08: 000252a5 R09: 0414 [ 163.152898] R10: R11: b412d38d R12: ffc2 [ 163.152926] R13: R14: 995e75909ca8 R15: 995e75909c00 [ 163.152956] FS: 7f8ae975e740() GS:995e7ed8() knlGS: [ 163.152988] CS: 0010 DS: ES: CR0: 80050033 [ 163.153012] CR2: 005a CR3: 0002216ab000 CR4: 000406e0 [ 163.153041] Call Trace: [ 163.153059] ? destroy_queues_cpsch+0x166/0x190 [amdkfd] [ 163.153086] execute_queues_cpsch+0x2e/0xc0 [amdkfd] [ 163.153113] destroy_queue_cpsch+0xbd/0x140 [amdkfd] [ 163.153139] pqm_destroy_queue+0x111/0x1d0 [amdkfd] [ 163.153164] pqm_uninit+0x3f/0xb0 [amdkfd] [ 163.153186] kfd_unbind_process_from_device+0x51/0xd0 [amdkfd] [ 163.153214] iommu_pasid_shutdown_callback+0x20/0x30 [amdkfd] [ 163.153239] mn_release+0x37/0x70 [amd_iommu_v2] [ 163.153261] __mmu_notifier_release+0x44/0xc0 [ 163.153281] exit_mmap+0x15a/0x170 [ 163.153297] ? __wake_up+0x44/0x50 [ 163.153314] ? exit_robust_list+0x5c/0x110 [ 163.15] mmput+0x57/0x140 [ 163.153347] do_exit+0x26b/0xb30 [ 163.153362] do_group_exit+0x43/0xb0 [ 163.153379] get_signal+0x293/0x620 [ 163.153396] do_signal+0x37/0x760 [ 163.153411] ? print_vma_addr+0x82/0x100 [ 163.153429] ? vprintk_default+0x29/0x50 [ 163.153447] ? bad_area+0x46/0x50 [ 163.153463] ? 
__do_page_fault+0x3c7/0x4e0 [ 163.153481] exit_to_usermode_loop+0x76/0xb0 [ 163.153500] prepare_exit_to_usermode+0x2f/0x40 [ 163.153521] retint_user+0x8/0x10 [ 163.153536] RIP: 0033:0x7f8ae932ee5d [ 163.153551] RSP: 002b:7ffc52219cd0 EFLAGS: 00010202 [ 163.153573] RAX: 0003 RBX: 00017f8a RCX: 7ffc52219d00 [ 163.153602] RDX: 7f8ae9534220 RSI: 7f8ae8b5eb28 RDI: 00017f8a [ 163.153630] RBP: 7ffc52219d20 R08: 01cc1890 R09: [ 163.153659] R10: 0027 R11: 7f8ae932ee10 R12: 01cc52a0 [ 163.153687] R13: 7ffc5221a200 R14: 0021 R15: [ 163.153716] Code: e0 04 00 00 48 3b 91 f0 03 00 00 74 01 c3 55 48 89 e5 e8 2e f9 ff ff 5d c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 <48> 8b 46 70 48 83 c6 70 48 89 e5 48 39 f0 74 16 48 3b 78 10 75 [ 163.153818] RIP: kfd_get_process_device_data+0x6/0x30 [amdkfd] RSP: b4114445bab0 [ 163.153848] CR2: 005a [ 163.160389] ---[ end trace f6a8177c7119c1f5 ]--- [ 163.160390] Fixing recursive fault but reboot is needed!
Re: Change queue/pipe split between amdkfd and amdgpu
Hey Oded, Sorry to be a nuisance, but if you still have everything set up, could you give this fix a quick go?

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 5321d18..9f70ee0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -667,7 +667,7 @@ static int set_sched_resources(struct device_queue_manager *dqm)
 		/* This situation may be hit in the future if a new HW
 		 * generation exposes more than 64 queues. If so, the
 		 * definition of res.queue_mask needs updating */
-		if (WARN_ON(i > sizeof(res.queue_mask))) {
+		if (WARN_ON(i > (sizeof(res.queue_mask)*8))) {
 			pr_err("Invalid queue enabled by amdgpu: %d\n", i);
 			break;
 		}

John/Felix, Any chance I could borrow a Carrizo/Kaveri for a few days? Or maybe you could help me run some final tests on this patch series? - Andres On 2017-02-09 03:11 PM, Oded Gabbay wrote: Andres, I tried your patches on Kaveri with airlied's drm-next branch. I used radeon+amdkfd The following test failed: KFDQMTest.CreateMultipleCpQueues However, I can't debug it because I don't have the sources of kfdtest. In dmesg, I saw the following warning during boot: WARNING: CPU: 0 PID: 150 at drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:670 start_cpsch+0xc5/0x220 [amdkfd] [4.393796] Modules linked in: hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon(+) i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt libahci fb_sys_fops drm r8169 mii fjes video [4.393811] CPU: 0 PID: 150 Comm: systemd-udevd Not tainted 4.10.0-rc5+ #1 [4.393811] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014 [4.393812] Call Trace: [4.393818] dump_stack+0x63/0x90 [4.393822] __warn+0xcb/0xf0 [4.393823] warn_slowpath_null+0x1d/0x20 [4.393830] start_cpsch+0xc5/0x220 [amdkfd] [4.393836] ? 
initialize_cpsch+0xa0/0xb0 [amdkfd] [4.393841] kgd2kfd_device_init+0x375/0x490 [amdkfd] [4.393883] radeon_kfd_device_init+0xaf/0xd0 [radeon] [4.393911] radeon_driver_load_kms+0x11e/0x1f0 [radeon] [4.393933] drm_dev_register+0x14a/0x200 [drm] [4.393946] drm_get_pci_dev+0x9d/0x160 [drm] [4.393974] radeon_pci_probe+0xb8/0xe0 [radeon] [4.393976] local_pci_probe+0x45/0xa0 [4.393978] pci_device_probe+0x103/0x150 [4.393981] driver_probe_device+0x2bf/0x460 [4.393982] __driver_attach+0xdf/0xf0 [4.393984] ? driver_probe_device+0x460/0x460 [4.393985] bus_for_each_dev+0x6c/0xc0 [4.393987] driver_attach+0x1e/0x20 [4.393988] bus_add_driver+0x1fd/0x270 [4.393989] ? 0xc05c8000 [4.393991] driver_register+0x60/0xe0 [4.393992] ? 0xc05c8000 [4.393993] __pci_register_driver+0x4c/0x50 [4.394007] drm_pci_init+0xeb/0x100 [drm] [4.394008] ? 0xc05c8000 [4.394031] radeon_init+0x98/0xb6 [radeon] [4.394034] do_one_initcall+0x53/0x1a0 [4.394037] ? __vunmap+0x81/0xd0 [4.394039] ? kmem_cache_alloc_trace+0x152/0x1c0 [4.394041] ? vfree+0x2e/0x70 [4.394044] do_init_module+0x5f/0x1ff [4.394046] load_module+0x24cc/0x29f0 [4.394047] ? __symbol_put+0x60/0x60 [4.394050] ? security_kernel_post_read_file+0x6b/0x80 [4.394052] SYSC_finit_module+0xdf/0x110 [4.394054] SyS_finit_module+0xe/0x10 [4.394056] entry_SYSCALL_64_fastpath+0x1e/0xad [4.394058] RIP: 0033:0x7f9cda77c8e9 [4.394059] RSP: 002b:7ffe195d3378 EFLAGS: 0246 ORIG_RAX: 0139 [4.394060] RAX: ffda RBX: 7f9cdb8dda7e RCX: 7f9cda77c8e9 [4.394061] RDX: RSI: 7f9cdac7ce2a RDI: 0013 [4.394062] RBP: 7ffe195d2450 R08: R09: [4.394063] R10: 0013 R11: 0246 R12: 7ffe195d245a [4.394063] R13: 7ffe195d1378 R14: 563f70cc93b0 R15: 563f70cba4d0 [4.394091] ---[ end trace 9c5af17304d998bb ]--- [4.394092] Invalid queue enabled by amdgpu: 9 I suggest you get a Kaveri/Carrizo machine to debug these issues. Until that, I don't think we should merge this patch-set. Oded On Wed, Feb 8, 2017 at 9:47 PM, Andres Rodriguez wrote: Thank you Oded. 
- Andres On 2017-02-08 02:32 PM, Oded Gabbay wrote: On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez wrote: Hey Felix, Thanks for the pointer to the ROCm mqd commit. I like that the workarounds are easy to spot. I'll add that to a new patch series I'm working on for some bug-fixes for perf being lower on pipes other than pipe 0. I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the HW will be able to give it a go. I put in a few small hacks to g
Re: Change queue/pipe split between amdkfd and amdgpu
Thanks Oded for the test results. I'll work on a fix. Regards, Andres On Thu, Feb 9, 2017 at 3:11 PM, Oded Gabbay wrote: > Andres, > > I tried your patches on Kaveri with airlied's drm-next branch. > I used radeon+amdkfd > > The following test failed: KFDQMTest.CreateMultipleCpQueues > However, I can't debug it because I don't have the sources of kfdtest. > > In dmesg, I saw the following warning during boot: > WARNING: CPU: 0 PID: 150 at > drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:670 > start_cpsch+0xc5/0x220 [amdkfd] > [4.393796] Modules linked in: hid_logitech_hidpp hid_logitech_dj > hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon(+) > i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt > libahci fb_sys_fops drm r8169 mii fjes video > [4.393811] CPU: 0 PID: 150 Comm: systemd-udevd Not tainted 4.10.0-rc5+ > #1 > [4.393811] Hardware name: Gigabyte Technology Co., Ltd. To be > filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014 > [4.393812] Call Trace: > [4.393818] dump_stack+0x63/0x90 > [4.393822] __warn+0xcb/0xf0 > [4.393823] warn_slowpath_null+0x1d/0x20 > [4.393830] start_cpsch+0xc5/0x220 [amdkfd] > [4.393836] ? initialize_cpsch+0xa0/0xb0 [amdkfd] > [4.393841] kgd2kfd_device_init+0x375/0x490 [amdkfd] > [4.393883] radeon_kfd_device_init+0xaf/0xd0 [radeon] > [4.393911] radeon_driver_load_kms+0x11e/0x1f0 [radeon] > [4.393933] drm_dev_register+0x14a/0x200 [drm] > [4.393946] drm_get_pci_dev+0x9d/0x160 [drm] > [4.393974] radeon_pci_probe+0xb8/0xe0 [radeon] > [4.393976] local_pci_probe+0x45/0xa0 > [4.393978] pci_device_probe+0x103/0x150 > [4.393981] driver_probe_device+0x2bf/0x460 > [4.393982] __driver_attach+0xdf/0xf0 > [4.393984] ? driver_probe_device+0x460/0x460 > [4.393985] bus_for_each_dev+0x6c/0xc0 > [4.393987] driver_attach+0x1e/0x20 > [4.393988] bus_add_driver+0x1fd/0x270 > [4.393989] ? 0xc05c8000 > [4.393991] driver_register+0x60/0xe0 > [4.393992] ? 
0xc05c8000 > [4.393993] __pci_register_driver+0x4c/0x50 > [4.394007] drm_pci_init+0xeb/0x100 [drm] > [4.394008] ? 0xc05c8000 > [4.394031] radeon_init+0x98/0xb6 [radeon] > [4.394034] do_one_initcall+0x53/0x1a0 > [4.394037] ? __vunmap+0x81/0xd0 > [4.394039] ? kmem_cache_alloc_trace+0x152/0x1c0 > [4.394041] ? vfree+0x2e/0x70 > [4.394044] do_init_module+0x5f/0x1ff > [4.394046] load_module+0x24cc/0x29f0 > [4.394047] ? __symbol_put+0x60/0x60 > [4.394050] ? security_kernel_post_read_file+0x6b/0x80 > [4.394052] SYSC_finit_module+0xdf/0x110 > [4.394054] SyS_finit_module+0xe/0x10 > [4.394056] entry_SYSCALL_64_fastpath+0x1e/0xad > [4.394058] RIP: 0033:0x7f9cda77c8e9 > [4.394059] RSP: 002b:7ffe195d3378 EFLAGS: 0246 ORIG_RAX: > 0139 > [4.394060] RAX: ffda RBX: 7f9cdb8dda7e RCX: > 7f9cda77c8e9 > [4.394061] RDX: RSI: 7f9cdac7ce2a RDI: > 0013 > [4.394062] RBP: 7ffe195d2450 R08: R09: > > [4.394063] R10: 0013 R11: 0246 R12: > 7ffe195d245a > [4.394063] R13: 7ffe195d1378 R14: 563f70cc93b0 R15: > 563f70cba4d0 > [4.394091] ---[ end trace 9c5af17304d998bb ]--- > [4.394092] Invalid queue enabled by amdgpu: 9 > > I suggest you get a Kaveri/Carrizo machine to debug these issues. > > Until that, I don't think we should merge this patch-set. > > Oded > > On Wed, Feb 8, 2017 at 9:47 PM, Andres Rodriguez > wrote: > > Thank you Oded. > > > > - Andres > > > > > > On 2017-02-08 02:32 PM, Oded Gabbay wrote: > >> > >> On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez > >> wrote: > >>> > >>> Hey Felix, > >>> > >>> Thanks for the pointer to the ROCm mqd commit. I like that the > >>> workarounds > >>> are easy to spot. I'll add that to a new patch series I'm working on > for > >>> some bug-fixes for perf being lower on pipes other than pipe 0. > >>> > >>> I haven't tested this yet on kaveri/carrizo. I'm hoping someone with > the > >>> HW > >>> will be able to give it a go. I put in a few small hacks to get KFD to > >>> boot > >>> but do nothing on polaris10. 
> >>> > >>> Regards, > >>> Andres > >>> > >>> > >>> On 2017-02-06 03:20 PM, Felix Kuehling wrote: > > Hi Andres, > > Thank you for tackling this task. It's more involved than I expected, > mostly because I didn't have much awareness of the MQD management in > amdgpu. > > I made one comment in a separate message about the unified MQD commit > function, if you want to bring that more in line with our latest ROCm > release on github. > > Also, were you able to test the upstream KFD with your changes on a > Kaveri or Carrizo? > > Regards, > Felix > > > On 17-02-03 11:51 PM, Andres Rodriguez
Re: Change queue/pipe split between amdkfd and amdgpu
Andres,

I tried your patches on Kaveri with airlied's drm-next branch.
I used radeon+amdkfd.

The following test failed: KFDQMTest.CreateMultipleCpQueues
However, I can't debug it because I don't have the sources of kfdtest.

In dmesg, I saw the following warning during boot:

WARNING: CPU: 0 PID: 150 at drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:670 start_cpsch+0xc5/0x220 [amdkfd]
[4.393796] Modules linked in: hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon(+) i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt libahci fb_sys_fops drm r8169 mii fjes video
[4.393811] CPU: 0 PID: 150 Comm: systemd-udevd Not tainted 4.10.0-rc5+ #1
[4.393811] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
[4.393812] Call Trace:
[4.393818] dump_stack+0x63/0x90
[4.393822] __warn+0xcb/0xf0
[4.393823] warn_slowpath_null+0x1d/0x20
[4.393830] start_cpsch+0xc5/0x220 [amdkfd]
[4.393836] ? initialize_cpsch+0xa0/0xb0 [amdkfd]
[4.393841] kgd2kfd_device_init+0x375/0x490 [amdkfd]
[4.393883] radeon_kfd_device_init+0xaf/0xd0 [radeon]
[4.393911] radeon_driver_load_kms+0x11e/0x1f0 [radeon]
[4.393933] drm_dev_register+0x14a/0x200 [drm]
[4.393946] drm_get_pci_dev+0x9d/0x160 [drm]
[4.393974] radeon_pci_probe+0xb8/0xe0 [radeon]
[4.393976] local_pci_probe+0x45/0xa0
[4.393978] pci_device_probe+0x103/0x150
[4.393981] driver_probe_device+0x2bf/0x460
[4.393982] __driver_attach+0xdf/0xf0
[4.393984] ? driver_probe_device+0x460/0x460
[4.393985] bus_for_each_dev+0x6c/0xc0
[4.393987] driver_attach+0x1e/0x20
[4.393988] bus_add_driver+0x1fd/0x270
[4.393989] ? 0xc05c8000
[4.393991] driver_register+0x60/0xe0
[4.393992] ? 0xc05c8000
[4.393993] __pci_register_driver+0x4c/0x50
[4.394007] drm_pci_init+0xeb/0x100 [drm]
[4.394008] ? 0xc05c8000
[4.394031] radeon_init+0x98/0xb6 [radeon]
[4.394034] do_one_initcall+0x53/0x1a0
[4.394037] ? __vunmap+0x81/0xd0
[4.394039] ? kmem_cache_alloc_trace+0x152/0x1c0
[4.394041] ? vfree+0x2e/0x70
[4.394044] do_init_module+0x5f/0x1ff
[4.394046] load_module+0x24cc/0x29f0
[4.394047] ? __symbol_put+0x60/0x60
[4.394050] ? security_kernel_post_read_file+0x6b/0x80
[4.394052] SYSC_finit_module+0xdf/0x110
[4.394054] SyS_finit_module+0xe/0x10
[4.394056] entry_SYSCALL_64_fastpath+0x1e/0xad
[4.394058] RIP: 0033:0x7f9cda77c8e9
[4.394059] RSP: 002b:7ffe195d3378 EFLAGS: 0246 ORIG_RAX: 0139
[4.394060] RAX: ffda RBX: 7f9cdb8dda7e RCX: 7f9cda77c8e9
[4.394061] RDX: RSI: 7f9cdac7ce2a RDI: 0013
[4.394062] RBP: 7ffe195d2450 R08: R09:
[4.394063] R10: 0013 R11: 0246 R12: 7ffe195d245a
[4.394063] R13: 7ffe195d1378 R14: 563f70cc93b0 R15: 563f70cba4d0
[4.394091] ---[ end trace 9c5af17304d998bb ]---
[4.394092] Invalid queue enabled by amdgpu: 9

I suggest you get a Kaveri/Carrizo machine to debug these issues.

Until then, I don't think we should merge this patch-set.

Oded

On Wed, Feb 8, 2017 at 9:47 PM, Andres Rodriguez wrote:
> Thank you Oded.
>
> - Andres
>
>
> On 2017-02-08 02:32 PM, Oded Gabbay wrote:
>>
>> On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez wrote:
>>> Hey Felix,
>>>
>>> Thanks for the pointer to the ROCm mqd commit. I like that the workarounds
>>> are easy to spot. I'll add that to a new patch series I'm working on for
>>> some bug-fixes for perf being lower on pipes other than pipe 0.
>>>
>>> I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the HW
>>> will be able to give it a go. I put in a few small hacks to get KFD to boot
>>> but do nothing on polaris10.
>>>
>>> Regards,
>>> Andres
>>>
>>>
>>> On 2017-02-06 03:20 PM, Felix Kuehling wrote:
>>>> Hi Andres,
>>>>
>>>> Thank you for tackling this task. It's more involved than I expected,
>>>> mostly because I didn't have much awareness of the MQD management in
>>>> amdgpu.
>>>>
>>>> I made one comment in a separate message about the unified MQD commit
>>>> function, if you want to bring that more in line with our latest ROCm
>>>> release on github.
>>>>
>>>> Also, were you able to test the upstream KFD with your changes on a
>>>> Kaveri or Carrizo?
>>>>
>>>> Regards,
>>>> Felix
>>>>
>>>>
>>>> On 17-02-03 11:51 PM, Andres Rodriguez wrote:
>>>>> The current queue/pipe split policy is for amdgpu to take the first pipe of
>>>>> MEC0 and leave the rest for amdkfd to use. This policy is taken as an
>>>>> assumption in a few areas of the implementation.
>>>>>
>>>>> This patch series aims to allow for flexible/tunable queue/pipe split policies
>>>>> between kgd and kfd. It also updates the queue/pipe split policy to one that
>>>>> allows better compute app concurrency for both drivers.
>>>>>
>>>>> In the process some duplicate code and hardcoded constants were removed.
>>>>>
>>>>> Any suggestions or feedback on improvements welcome.
Re: Change queue/pipe split between amdkfd and amdgpu
Thank you Oded.

- Andres


On 2017-02-08 02:32 PM, Oded Gabbay wrote:
> On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez wrote:
>> Hey Felix,
>>
>> Thanks for the pointer to the ROCm mqd commit. I like that the workarounds
>> are easy to spot. I'll add that to a new patch series I'm working on for
>> some bug-fixes for perf being lower on pipes other than pipe 0.
>>
>> I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the HW
>> will be able to give it a go. I put in a few small hacks to get KFD to boot
>> but do nothing on polaris10.
>>
>> Regards,
>> Andres
>>
>>
>> On 2017-02-06 03:20 PM, Felix Kuehling wrote:
>>> Hi Andres,
>>>
>>> Thank you for tackling this task. It's more involved than I expected,
>>> mostly because I didn't have much awareness of the MQD management in
>>> amdgpu.
>>>
>>> I made one comment in a separate message about the unified MQD commit
>>> function, if you want to bring that more in line with our latest ROCm
>>> release on github.
>>>
>>> Also, were you able to test the upstream KFD with your changes on a
>>> Kaveri or Carrizo?
>>>
>>> Regards,
>>> Felix
>>>
>>>
>>> On 17-02-03 11:51 PM, Andres Rodriguez wrote:
>>>> The current queue/pipe split policy is for amdgpu to take the first pipe of
>>>> MEC0 and leave the rest for amdkfd to use. This policy is taken as an
>>>> assumption in a few areas of the implementation.
>>>>
>>>> This patch series aims to allow for flexible/tunable queue/pipe split policies
>>>> between kgd and kfd. It also updates the queue/pipe split policy to one that
>>>> allows better compute app concurrency for both drivers.
>>>>
>>>> In the process some duplicate code and hardcoded constants were removed.
>>>>
>>>> Any suggestions or feedback on improvements welcome.
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
> Hi Andres,
>
> I will try to find some time to test it on my Kaveri machine.
>
> Oded
Re: Change queue/pipe split between amdkfd and amdgpu
On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez wrote:
> Hey Felix,
>
> Thanks for the pointer to the ROCm mqd commit. I like that the workarounds
> are easy to spot. I'll add that to a new patch series I'm working on for
> some bug-fixes for perf being lower on pipes other than pipe 0.
>
> I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the HW
> will be able to give it a go. I put in a few small hacks to get KFD to boot
> but do nothing on polaris10.
>
> Regards,
> Andres
>
>
> On 2017-02-06 03:20 PM, Felix Kuehling wrote:
>> Hi Andres,
>>
>> Thank you for tackling this task. It's more involved than I expected,
>> mostly because I didn't have much awareness of the MQD management in
>> amdgpu.
>>
>> I made one comment in a separate message about the unified MQD commit
>> function, if you want to bring that more in line with our latest ROCm
>> release on github.
>>
>> Also, were you able to test the upstream KFD with your changes on a
>> Kaveri or Carrizo?
>>
>> Regards,
>> Felix
>>
>>
>> On 17-02-03 11:51 PM, Andres Rodriguez wrote:
>>> The current queue/pipe split policy is for amdgpu to take the first pipe of
>>> MEC0 and leave the rest for amdkfd to use. This policy is taken as an
>>> assumption in a few areas of the implementation.
>>>
>>> This patch series aims to allow for flexible/tunable queue/pipe split policies
>>> between kgd and kfd. It also updates the queue/pipe split policy to one that
>>> allows better compute app concurrency for both drivers.
>>>
>>> In the process some duplicate code and hardcoded constants were removed.
>>>
>>> Any suggestions or feedback on improvements welcome.

Hi Andres,

I will try to find some time to test it on my Kaveri machine.

Oded
Re: Change queue/pipe split between amdkfd and amdgpu
Hey Felix,

Thanks for the pointer to the ROCm mqd commit. I like that the workarounds
are easy to spot. I'll add that to a new patch series I'm working on for
some bug-fixes for perf being lower on pipes other than pipe 0.

I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the HW
will be able to give it a go. I put in a few small hacks to get KFD to boot
but do nothing on polaris10.

Regards,
Andres


On 2017-02-06 03:20 PM, Felix Kuehling wrote:
> Hi Andres,
>
> Thank you for tackling this task. It's more involved than I expected,
> mostly because I didn't have much awareness of the MQD management in
> amdgpu.
>
> I made one comment in a separate message about the unified MQD commit
> function, if you want to bring that more in line with our latest ROCm
> release on github.
>
> Also, were you able to test the upstream KFD with your changes on a
> Kaveri or Carrizo?
>
> Regards,
>   Felix
>
>
> On 17-02-03 11:51 PM, Andres Rodriguez wrote:
>> The current queue/pipe split policy is for amdgpu to take the first pipe of
>> MEC0 and leave the rest for amdkfd to use. This policy is taken as an
>> assumption in a few areas of the implementation.
>>
>> This patch series aims to allow for flexible/tunable queue/pipe split policies
>> between kgd and kfd. It also updates the queue/pipe split policy to one that
>> allows better compute app concurrency for both drivers.
>>
>> In the process some duplicate code and hardcoded constants were removed.
>>
>> Any suggestions or feedback on improvements welcome.
Re: Change queue/pipe split between amdkfd and amdgpu
Hi Andres,

Thank you for tackling this task. It's more involved than I expected,
mostly because I didn't have much awareness of the MQD management in
amdgpu.

I made one comment in a separate message about the unified MQD commit
function, if you want to bring that more in line with our latest ROCm
release on github.

Also, were you able to test the upstream KFD with your changes on a
Kaveri or Carrizo?

Regards,
  Felix


On 17-02-03 11:51 PM, Andres Rodriguez wrote:
> The current queue/pipe split policy is for amdgpu to take the first pipe of
> MEC0 and leave the rest for amdkfd to use. This policy is taken as an
> assumption in a few areas of the implementation.
>
> This patch series aims to allow for flexible/tunable queue/pipe split policies
> between kgd and kfd. It also updates the queue/pipe split policy to one that
> allows better compute app concurrency for both drivers.
>
> In the process some duplicate code and hardcoded constants were removed.
>
> Any suggestions or feedback on improvements welcome.
>
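The split described in the cover letter can be pictured as a bitmap over flattened (MEC, pipe, queue) indices, where every queue slot is owned by exactly one of amdgpu (kgd) or amdkfd. What follows is a minimal userspace sketch under stated assumptions, not code from the patch series: the 2 x 4 x 8 layout is hypothetical, as are the names `kgd_queue_mask`, `kfd_queue_mask`, `policy_first_pipe` and `policy_spread`. `policy_first_pipe` encodes the current policy as the cover letter states it; `policy_spread` is only one illustrative alternative that spreads amdgpu's queues over several pipes, and is not necessarily what the series implements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration only: 2 MECs x 4 pipes x 8 queues.
 * Real per-ASIC values differ and are not taken from the patch series. */
#define NUM_MEC            2
#define NUM_PIPE_PER_MEC   4
#define NUM_QUEUE_PER_PIPE 8
#define NUM_QUEUES (NUM_MEC * NUM_PIPE_PER_MEC * NUM_QUEUE_PER_PIPE)

_Static_assert(NUM_QUEUES <= 64, "queue bitmap must fit in a uint64_t");

/* Predicate: nonzero if amdgpu (kgd) keeps the queue at (mec, pipe, queue). */
typedef int (*kgd_policy_fn)(int mec, int pipe, int queue);

/* Decompose a flat queue index into (mec, pipe, queue). */
static void queue_coords(int i, int *mec, int *pipe, int *queue)
{
	assert(i >= 0 && i < NUM_QUEUES);
	*queue = i % NUM_QUEUE_PER_PIPE;
	*pipe = (i / NUM_QUEUE_PER_PIPE) % NUM_PIPE_PER_MEC;
	*mec = (i / NUM_QUEUE_PER_PIPE) / NUM_PIPE_PER_MEC;
}

/* Policy from the cover letter: amdgpu owns every queue of pipe 0 on MEC0. */
static int policy_first_pipe(int mec, int pipe, int queue)
{
	(void)queue;
	return mec == 0 && pipe == 0;
}

/* Hypothetical alternative: amdgpu owns the first two queues of every
 * pipe on MEC0, so its work is spread across pipes instead of one. */
static int policy_spread(int mec, int pipe, int queue)
{
	(void)pipe;
	return mec == 0 && queue < 2;
}

/* Bitmap with one bit set for every queue slot in the assumed layout. */
static uint64_t all_queues_mask(void)
{
	uint64_t mask = 0;
	for (int i = 0; i < NUM_QUEUES; i++)
		mask |= UINT64_C(1) << i;
	return mask;
}

/* Bitmap of the queues amdgpu keeps under the given policy. */
static uint64_t kgd_queue_mask(kgd_policy_fn owns)
{
	uint64_t mask = 0;
	for (int i = 0; i < NUM_QUEUES; i++) {
		int mec, pipe, queue;
		queue_coords(i, &mec, &pipe, &queue);
		if (owns(mec, pipe, queue))
			mask |= UINT64_C(1) << i;
	}
	return mask;
}

/* amdkfd gets every slot amdgpu did not keep, so the sets are disjoint. */
static uint64_t kfd_queue_mask(uint64_t kgd_mask)
{
	return all_queues_mask() & ~kgd_mask;
}
```

With this assumed layout, `kgd_queue_mask(policy_first_pipe)` is `0xff` (queues 0 to 7, i.e. all of pipe 0 on MEC0), while `kgd_queue_mask(policy_spread)` is `0x03030303` (queues 0, 1, 8, 9, 16, 17, 24 and 25, two per pipe). Expressing ownership as a single shared bitmap is what makes the split tunable: either driver can derive its own queue set from the same mask instead of hardcoding "pipe 0 belongs to amdgpu".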