Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-15 Thread Edward O'Callaghan


On 02/16/2017 03:00 PM, Bridgman, John wrote:
> Any objections to authorizing Oded to post the kfdtest binary he is using to 
> some public place (if not there already) so others (like Andres) can test 
> changes which touch on amdkfd ? 
> 
> We should check it for embarrassing symbols but otherwise it should be OK.

Someone was up late for a deadline? lol

> 
> That said, since we are getting perilously close to actually sending dGPU 
> support changes upstream we will need (IMO) to maintain a sanitized source 
> repo for kfdtest as well... sharing the binary just gets us started.
> 

Hi John,

Yes, this is the sort of thing I've been referring to for some time now.
We definitely need some kind of centralized mechanism to test/validate
kfd stuff, so if you can get this out that would be great! A binary would
be a start; I am sure we can make do, and it's certainly better than
nothing. However, source, much like what happened with UMR, would of
course be ideal.

I suggest that it would be good if we could arrange some kind of IRC
meeting regarding kfd, since there seems to be a bit of fragmented
effort here. I have my own ioctl()s locally for pinning for my own
project, which I am not sure are suitable to just upstream since AMD has
its own take, so what should we do? I have heard so much about dGPU
support for a couple of years now but have only seen bits thrown over
the wall. Can we get a more serious incremental approach happening ASAP?
I created #amdkfd on freenode some time ago, where a couple of
interested academics and users hang out.

Kind Regards,
Edward.

> Thanks,
> John
> 
>> -----Original Message-----
>> From: Oded Gabbay [mailto:oded.gab...@gmail.com]
>> Sent: Friday, February 10, 2017 12:57 PM
>> To: Andres Rodriguez
>> Cc: Kuehling, Felix; Bridgman, John; amd-gfx@lists.freedesktop.org;
>> Deucher, Alexander; Jay Cornwall
>> Subject: Re: Change queue/pipe split between amdkfd and amdgpu
>>
>> I don't have a repo, nor do I have the source code.
>> It is a tool that we developed inside AMD (when I was working there), and
>> after I left AMD I got permission to use the binary for regression testing.
>>
>> Oded
>>
>> On Fri, Feb 10, 2017 at 6:33 PM, Andres Rodriguez 
>> wrote:
>>> Hey Oded,
>>>
>>> Where can I find a repo with kfdtest?
>>>
>>> I tried looking here but couldn't find it:
>>>
>>> https://cgit.freedesktop.org/~gabbayo/
>>>
>>> -Andres
>>>
>>>
>>>
>>> On 2017-02-10 05:35 AM, Oded Gabbay wrote:
>>>>
>>>> So the warning in dmesg is gone of course, but the test (that I
>>>> mentioned in previous email) still fails, and this time it caused the
>>>> kernel to crash. In addition, now other tests fail as well, e.g.
>>>> KFDEventTest.SignalEvent
>>>>
>>>> I honestly suggest taking some time to debug this patch-set on an
>>>> actual Kaveri machine and then re-sending the patches.
>>>>
>>>> Thanks,
>>>> Oded

RE: Change queue/pipe split between amdkfd and amdgpu

2017-02-15 Thread Bridgman, John
Any objections to authorizing Oded to post the kfdtest binary he is using to 
some public place (if not there already) so others (like Andres) can test 
changes which touch on amdkfd ? 

We should check it for embarrassing symbols but otherwise it should be OK.

That said, since we are getting perilously close to actually sending dGPU 
support changes upstream we will need (IMO) to maintain a sanitized source repo 
for kfdtest as well... sharing the binary just gets us started.

Thanks,
John

>-----Original Message-----
>From: Oded Gabbay [mailto:oded.gab...@gmail.com]
>Sent: Friday, February 10, 2017 12:57 PM
>To: Andres Rodriguez
>Cc: Kuehling, Felix; Bridgman, John; amd-gfx@lists.freedesktop.org;
>Deucher, Alexander; Jay Cornwall
>Subject: Re: Change queue/pipe split between amdkfd and amdgpu
>
>I don't have a repo, nor do I have the source code.
>It is a tool that we developed inside AMD (when I was working there), and
>after I left AMD I got permission to use the binary for regression testing.
>
>Oded
>
>On Fri, Feb 10, 2017 at 6:33 PM, Andres Rodriguez 
>wrote:
>> Hey Oded,
>>
>> Where can I find a repo with kfdtest?
>>
>> I tried looking here but couldn't find it:
>>
>> https://cgit.freedesktop.org/~gabbayo/
>>
>> -Andres
>>
>>
>>
>> On 2017-02-10 05:35 AM, Oded Gabbay wrote:
>>>
>>> So the warning in dmesg is gone of course, but the test (that I
>>> mentioned in previous email) still fails, and this time it caused the
>>> kernel to crash. In addition, now other tests fail as well, e.g.
>>> KFDEventTest.SignalEvent
>>>
>>> I honestly suggest taking some time to debug this patch-set on an
>>> actual Kaveri machine and then re-sending the patches.
>>>
>>> Thanks,
>>> Oded

Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-10 Thread Oded Gabbay
I don't have a repo, nor do I have the source code.
It is a tool that we developed inside AMD (when I was working there),
and after I left AMD I got permission to use the binary for
regression testing.

Oded

On Fri, Feb 10, 2017 at 6:33 PM, Andres Rodriguez  wrote:
> Hey Oded,
>
> Where can I find a repo with kfdtest?
>
> I tried looking here but couldn't find it:
>
> https://cgit.freedesktop.org/~gabbayo/
>
> -Andres
>
>
>
> On 2017-02-10 05:35 AM, Oded Gabbay wrote:
>>
>> So the warning in dmesg is gone of course, but the test (that I
>> mentioned in previous email) still fails, and this time it caused the
>> kernel to crash. In addition, now other tests fail as well, e.g.
>> KFDEventTest.SignalEvent
>>
>> I honestly suggest taking some time to debug this patch-set on an
>> actual Kaveri machine and then re-sending the patches.
>>
>> Thanks,
>> Oded
>>
>> log of crash from KFDQMTest.CreateMultipleCpQueues:
>>
>> [  160.900137] kfd: qcm fence wait loop timeout expired
>> [  160.900143] kfd: the cp might be in an unrecoverable state due to
>> an unsuccessful queues preemption
>> [  160.916765] show_signal_msg: 36 callbacks suppressed
>> [  160.916771] kfdtest[2498]: segfault at 17f8a ip
>> 7f8ae932ee5d sp 7ffc52219cd0 error 4 in
>> libhsakmt-1.so.0.0.1[7f8ae932b000+8000]
>> [  163.152229] kfd: qcm fence wait loop timeout expired
>> [  163.152250] BUG: unable to handle kernel NULL pointer dereference
>> at 005a
>> [  163.152299] IP: kfd_get_process_device_data+0x6/0x30 [amdkfd]
>> [  163.152323] PGD 2333aa067
>> [  163.152323] PUD 230f64067
>> [  163.152335] PMD 0
>>
>> [  163.152364] Oops:  [#1] SMP
>> [  163.152379] Modules linked in: joydev edac_mce_amd edac_core
>> input_leds kvm_amd snd_hda_codec_realtek kvm irqbypass
>> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec
>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core
>> snd_hwdep pcbc snd_pcm aesni_intel snd_seq_midi snd_seq_midi_event
>> snd_rawmidi snd_seq aes_x86_64 crypto_simd snd_seq_device glue_helper
>> cryptd snd_timer snd fam15h_power k10temp soundcore i2c_piix4 shpchp
>> tpm_infineon mac_hid parport_pc ppdev nfsd auth_rpcgss nfs_acl lockd
>> lp grace sunrpc parport autofs4 hid_logitech_hidpp hid_logitech_dj
>> hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon
>> i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt
>> libahci fb_sys_fops drm r8169 mii fjes video
>> [  163.152668] CPU: 3 PID: 2498 Comm: kfdtest Not tainted 4.10.0-rc5+ #3
>> [  163.152695] Hardware name: Gigabyte Technology Co., Ltd. To be
>> filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
>> [  163.152735] task: 995e73d16580 task.stack: b41144458000
>> [  163.152764] RIP: 0010:kfd_get_process_device_data+0x6/0x30 [amdkfd]
>> [  163.152790] RSP: 0018:b4114445bab0 EFLAGS: 00010246
>> [  163.152812] RAX: ffea RBX: 995e75909c00 RCX:
>> 
>> [  163.152841] RDX:  RSI: ffea RDI:
>> 995e75909600
>> [  163.152869] RBP: b4114445bae0 R08: 000252a5 R09:
>> 0414
>> [  163.152898] R10:  R11: b412d38d R12:
>> ffc2
>> [  163.152926] R13:  R14: 995e75909ca8 R15:
>> 995e75909c00
>> [  163.152956] FS:  7f8ae975e740() GS:995e7ed8()
>> knlGS:
>> [  163.152988] CS:  0010 DS:  ES:  CR0: 80050033
>> [  163.153012] CR2: 005a CR3: 0002216ab000 CR4:
>> 000406e0
>> [  163.153041] Call Trace:
>> [  163.153059]  ? destroy_queues_cpsch+0x166/0x190 [amdkfd]
>> [  163.153086]  execute_queues_cpsch+0x2e/0xc0 [amdkfd]
>> [  163.153113]  destroy_queue_cpsch+0xbd/0x140 [amdkfd]
>> [  163.153139]  pqm_destroy_queue+0x111/0x1d0 [amdkfd]
>> [  163.153164]  pqm_uninit+0x3f/0xb0 [amdkfd]
>> [  163.153186]  kfd_unbind_process_from_device+0x51/0xd0 [amdkfd]
>> [  163.153214]  iommu_pasid_shutdown_callback+0x20/0x30 [amdkfd]
>> [  163.153239]  mn_release+0x37/0x70 [amd_iommu_v2]
>> [  163.153261]  __mmu_notifier_release+0x44/0xc0
>> [  163.153281]  exit_mmap+0x15a/0x170
>> [  163.153297]  ? __wake_up+0x44/0x50
>> [  163.153314]  ? exit_robust_list+0x5c/0x110
>> [  163.15]  mmput+0x57/0x140
>> [  163.153347]  do_exit+0x26b/0xb30
>> [  163.153362]  do_group_exit+0x43/0xb0
>> [  163.153379]  get_signal+0x293/0x620
>> [  163.153396]  do_signal+0x37/0x760
>> [  163.153411]  ? print_vma_addr+0x82/0x100
>> [  163.153429]  ? vprintk_default+0x29/0x50
>> [  163.153447]  ? bad_area+0x46/0x50
>> [  163.153463]  ? __do_page_fault+0x3c7/0x4e0
>> [  163.153481]  exit_to_usermode_loop+0x76/0xb0
>> [  163.153500]  prepare_exit_to_usermode+0x2f/0x40
>> [  163.153521]  retint_user+0x8/0x10
>> [  163.153536] RIP: 0033:0x7f8ae932ee5d
>> [  163.153551] RSP: 002b:7ffc52219cd0 EFLAGS: 00010202
>> [  163.153573] RAX: 0003 RBX: 00017f8a RCX:
>> 7ffc52219d00
>> [  163.153602] RDX: 7f

Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-10 Thread Andres Rodriguez

Hey Oded,

Where can I find a repo with kfdtest?

I tried looking here but couldn't find it:

https://cgit.freedesktop.org/~gabbayo/

-Andres


On 2017-02-10 05:35 AM, Oded Gabbay wrote:

So the warning in dmesg is gone of course, but the test (that I
mentioned in previous email) still fails, and this time it caused the
kernel to crash. In addition, now other tests fail as well, e.g.
KFDEventTest.SignalEvent

I honestly suggest taking some time to debug this patch-set on an
actual Kaveri machine and then re-sending the patches.

Thanks,
Oded

log of crash from KFDQMTest.CreateMultipleCpQueues:

[  160.900137] kfd: qcm fence wait loop timeout expired
[  160.900143] kfd: the cp might be in an unrecoverable state due to
an unsuccessful queues preemption
[  160.916765] show_signal_msg: 36 callbacks suppressed
[  160.916771] kfdtest[2498]: segfault at 17f8a ip
7f8ae932ee5d sp 7ffc52219cd0 error 4 in
libhsakmt-1.so.0.0.1[7f8ae932b000+8000]
[  163.152229] kfd: qcm fence wait loop timeout expired
[  163.152250] BUG: unable to handle kernel NULL pointer dereference
at 005a
[  163.152299] IP: kfd_get_process_device_data+0x6/0x30 [amdkfd]
[  163.152323] PGD 2333aa067
[  163.152323] PUD 230f64067
[  163.152335] PMD 0

[  163.152364] Oops:  [#1] SMP
[  163.152379] Modules linked in: joydev edac_mce_amd edac_core
input_leds kvm_amd snd_hda_codec_realtek kvm irqbypass
snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core
snd_hwdep pcbc snd_pcm aesni_intel snd_seq_midi snd_seq_midi_event
snd_rawmidi snd_seq aes_x86_64 crypto_simd snd_seq_device glue_helper
cryptd snd_timer snd fam15h_power k10temp soundcore i2c_piix4 shpchp
tpm_infineon mac_hid parport_pc ppdev nfsd auth_rpcgss nfs_acl lockd
lp grace sunrpc parport autofs4 hid_logitech_hidpp hid_logitech_dj
hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon
i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt
libahci fb_sys_fops drm r8169 mii fjes video
[  163.152668] CPU: 3 PID: 2498 Comm: kfdtest Not tainted 4.10.0-rc5+ #3
[  163.152695] Hardware name: Gigabyte Technology Co., Ltd. To be
filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
[  163.152735] task: 995e73d16580 task.stack: b41144458000
[  163.152764] RIP: 0010:kfd_get_process_device_data+0x6/0x30 [amdkfd]
[  163.152790] RSP: 0018:b4114445bab0 EFLAGS: 00010246
[  163.152812] RAX: ffea RBX: 995e75909c00 RCX: 
[  163.152841] RDX:  RSI: ffea RDI: 995e75909600
[  163.152869] RBP: b4114445bae0 R08: 000252a5 R09: 0414
[  163.152898] R10:  R11: b412d38d R12: ffc2
[  163.152926] R13:  R14: 995e75909ca8 R15: 995e75909c00
[  163.152956] FS:  7f8ae975e740() GS:995e7ed8()
knlGS:
[  163.152988] CS:  0010 DS:  ES:  CR0: 80050033
[  163.153012] CR2: 005a CR3: 0002216ab000 CR4: 000406e0
[  163.153041] Call Trace:
[  163.153059]  ? destroy_queues_cpsch+0x166/0x190 [amdkfd]
[  163.153086]  execute_queues_cpsch+0x2e/0xc0 [amdkfd]
[  163.153113]  destroy_queue_cpsch+0xbd/0x140 [amdkfd]
[  163.153139]  pqm_destroy_queue+0x111/0x1d0 [amdkfd]
[  163.153164]  pqm_uninit+0x3f/0xb0 [amdkfd]
[  163.153186]  kfd_unbind_process_from_device+0x51/0xd0 [amdkfd]
[  163.153214]  iommu_pasid_shutdown_callback+0x20/0x30 [amdkfd]
[  163.153239]  mn_release+0x37/0x70 [amd_iommu_v2]
[  163.153261]  __mmu_notifier_release+0x44/0xc0
[  163.153281]  exit_mmap+0x15a/0x170
[  163.153297]  ? __wake_up+0x44/0x50
[  163.153314]  ? exit_robust_list+0x5c/0x110
[  163.15]  mmput+0x57/0x140
[  163.153347]  do_exit+0x26b/0xb30
[  163.153362]  do_group_exit+0x43/0xb0
[  163.153379]  get_signal+0x293/0x620
[  163.153396]  do_signal+0x37/0x760
[  163.153411]  ? print_vma_addr+0x82/0x100
[  163.153429]  ? vprintk_default+0x29/0x50
[  163.153447]  ? bad_area+0x46/0x50
[  163.153463]  ? __do_page_fault+0x3c7/0x4e0
[  163.153481]  exit_to_usermode_loop+0x76/0xb0
[  163.153500]  prepare_exit_to_usermode+0x2f/0x40
[  163.153521]  retint_user+0x8/0x10
[  163.153536] RIP: 0033:0x7f8ae932ee5d
[  163.153551] RSP: 002b:7ffc52219cd0 EFLAGS: 00010202
[  163.153573] RAX: 0003 RBX: 00017f8a RCX: 7ffc52219d00
[  163.153602] RDX: 7f8ae9534220 RSI: 7f8ae8b5eb28 RDI: 00017f8a
[  163.153630] RBP: 7ffc52219d20 R08: 01cc1890 R09: 
[  163.153659] R10: 0027 R11: 7f8ae932ee10 R12: 01cc52a0
[  163.153687] R13: 7ffc5221a200 R14: 0021 R15: 
[  163.153716] Code: e0 04 00 00 48 3b 91 f0 03 00 00 74 01 c3 55 48
89 e5 e8 2e f9 ff ff 5d c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 55 <48> 8b 46 70 48 83 c6 70 48 89 e5 48 39 f0 74 16 48 3b 78
10 75
[  163.153818] RIP: kfd_get_process_device_data

Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-10 Thread Oded Gabbay
So the warning in dmesg is gone of course, but the test (that I
mentioned in previous email) still fails, and this time it caused the
kernel to crash. In addition, now other tests fail as well, e.g.
KFDEventTest.SignalEvent

I honestly suggest taking some time to debug this patch-set on an
actual Kaveri machine and then re-sending the patches.

Thanks,
Oded

log of crash from KFDQMTest.CreateMultipleCpQueues:

[  160.900137] kfd: qcm fence wait loop timeout expired
[  160.900143] kfd: the cp might be in an unrecoverable state due to
an unsuccessful queues preemption
[  160.916765] show_signal_msg: 36 callbacks suppressed
[  160.916771] kfdtest[2498]: segfault at 17f8a ip
7f8ae932ee5d sp 7ffc52219cd0 error 4 in
libhsakmt-1.so.0.0.1[7f8ae932b000+8000]
[  163.152229] kfd: qcm fence wait loop timeout expired
[  163.152250] BUG: unable to handle kernel NULL pointer dereference
at 005a
[  163.152299] IP: kfd_get_process_device_data+0x6/0x30 [amdkfd]
[  163.152323] PGD 2333aa067
[  163.152323] PUD 230f64067
[  163.152335] PMD 0

[  163.152364] Oops:  [#1] SMP
[  163.152379] Modules linked in: joydev edac_mce_amd edac_core
input_leds kvm_amd snd_hda_codec_realtek kvm irqbypass
snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core
snd_hwdep pcbc snd_pcm aesni_intel snd_seq_midi snd_seq_midi_event
snd_rawmidi snd_seq aes_x86_64 crypto_simd snd_seq_device glue_helper
cryptd snd_timer snd fam15h_power k10temp soundcore i2c_piix4 shpchp
tpm_infineon mac_hid parport_pc ppdev nfsd auth_rpcgss nfs_acl lockd
lp grace sunrpc parport autofs4 hid_logitech_hidpp hid_logitech_dj
hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon
i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt
libahci fb_sys_fops drm r8169 mii fjes video
[  163.152668] CPU: 3 PID: 2498 Comm: kfdtest Not tainted 4.10.0-rc5+ #3
[  163.152695] Hardware name: Gigabyte Technology Co., Ltd. To be
filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
[  163.152735] task: 995e73d16580 task.stack: b41144458000
[  163.152764] RIP: 0010:kfd_get_process_device_data+0x6/0x30 [amdkfd]
[  163.152790] RSP: 0018:b4114445bab0 EFLAGS: 00010246
[  163.152812] RAX: ffea RBX: 995e75909c00 RCX: 
[  163.152841] RDX:  RSI: ffea RDI: 995e75909600
[  163.152869] RBP: b4114445bae0 R08: 000252a5 R09: 0414
[  163.152898] R10:  R11: b412d38d R12: ffc2
[  163.152926] R13:  R14: 995e75909ca8 R15: 995e75909c00
[  163.152956] FS:  7f8ae975e740() GS:995e7ed8()
knlGS:
[  163.152988] CS:  0010 DS:  ES:  CR0: 80050033
[  163.153012] CR2: 005a CR3: 0002216ab000 CR4: 000406e0
[  163.153041] Call Trace:
[  163.153059]  ? destroy_queues_cpsch+0x166/0x190 [amdkfd]
[  163.153086]  execute_queues_cpsch+0x2e/0xc0 [amdkfd]
[  163.153113]  destroy_queue_cpsch+0xbd/0x140 [amdkfd]
[  163.153139]  pqm_destroy_queue+0x111/0x1d0 [amdkfd]
[  163.153164]  pqm_uninit+0x3f/0xb0 [amdkfd]
[  163.153186]  kfd_unbind_process_from_device+0x51/0xd0 [amdkfd]
[  163.153214]  iommu_pasid_shutdown_callback+0x20/0x30 [amdkfd]
[  163.153239]  mn_release+0x37/0x70 [amd_iommu_v2]
[  163.153261]  __mmu_notifier_release+0x44/0xc0
[  163.153281]  exit_mmap+0x15a/0x170
[  163.153297]  ? __wake_up+0x44/0x50
[  163.153314]  ? exit_robust_list+0x5c/0x110
[  163.15]  mmput+0x57/0x140
[  163.153347]  do_exit+0x26b/0xb30
[  163.153362]  do_group_exit+0x43/0xb0
[  163.153379]  get_signal+0x293/0x620
[  163.153396]  do_signal+0x37/0x760
[  163.153411]  ? print_vma_addr+0x82/0x100
[  163.153429]  ? vprintk_default+0x29/0x50
[  163.153447]  ? bad_area+0x46/0x50
[  163.153463]  ? __do_page_fault+0x3c7/0x4e0
[  163.153481]  exit_to_usermode_loop+0x76/0xb0
[  163.153500]  prepare_exit_to_usermode+0x2f/0x40
[  163.153521]  retint_user+0x8/0x10
[  163.153536] RIP: 0033:0x7f8ae932ee5d
[  163.153551] RSP: 002b:7ffc52219cd0 EFLAGS: 00010202
[  163.153573] RAX: 0003 RBX: 00017f8a RCX: 7ffc52219d00
[  163.153602] RDX: 7f8ae9534220 RSI: 7f8ae8b5eb28 RDI: 00017f8a
[  163.153630] RBP: 7ffc52219d20 R08: 01cc1890 R09: 
[  163.153659] R10: 0027 R11: 7f8ae932ee10 R12: 01cc52a0
[  163.153687] R13: 7ffc5221a200 R14: 0021 R15: 
[  163.153716] Code: e0 04 00 00 48 3b 91 f0 03 00 00 74 01 c3 55 48
89 e5 e8 2e f9 ff ff 5d c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 55 <48> 8b 46 70 48 83 c6 70 48 89 e5 48 39 f0 74 16 48 3b 78
10 75
[  163.153818] RIP: kfd_get_process_device_data+0x6/0x30 [amdkfd] RSP:
b4114445bab0
[  163.153848] CR2: 005a
[  163.160389] ---[ end trace f6a8177c7119c1f5 ]---
[  163.160390] Fixing recursive fault but reboot is needed!
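
One possible reading of the oops above, offered as a sketch rather than a diagnosis: the faulting bytes `<48> 8b 46 70` in the Code: line decode as a 64-bit load from RSI + 0x70, and RSI ends in `ffea`. Assuming the archive dropped the leading digits and RSI really held -22 (that is, -EINVAL sign-extended to 64 bits), the load address wraps around to 0x5a, which matches CR2. That would be consistent with an error code being used as a pointer, although the log alone cannot confirm it. The helper name below is hypothetical and exists only to show the arithmetic.

```c
#include <assert.h>   /* static_assert */
#include <errno.h>    /* EINVAL */
#include <stdint.h>

/* Assumption: this sketch relies on the Linux value of EINVAL. */
static_assert(EINVAL == 22, "sketch assumes EINVAL == 22");

/* Assumption: displacement read from the faulting instruction bytes
 * "48 8b 46 70", interpreted as a load from RSI + 0x70. */
#define FIELD_OFFSET 0x70u

/* Hypothetical helper: the address touched by a load at base + FIELD_OFFSET,
 * computed with 64-bit unsigned wraparound like the CPU would. */
static uint64_t faulting_address(int64_t base)
{
	return (uint64_t)base + FIELD_OFFSET;
}
```

Under those assumptions, `faulting_address(-EINVAL)` evaluates to 0x5a, the same value the log reports in CR2.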

Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-09 Thread Andres Rodriguez

Hey Oded,

Sorry to be a nuisance, but if you still have everything set up, could you
give this fix a quick go?


diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 5321d18..9f70ee0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -667,7 +667,7 @@ static int set_sched_resources(struct device_queue_manager *dqm)
 		/* This situation may be hit in the future if a new HW
 		 * generation exposes more than 64 queues. If so, the
 		 * definition of res.queue_mask needs updating */
-		if (WARN_ON(i > sizeof(res.queue_mask))) {
+		if (WARN_ON(i > (sizeof(res.queue_mask)*8))) {
 			pr_err("Invalid queue enabled by amdgpu: %d\n", i);
 			break;
 		}
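
For readers following the fix above: `sizeof` yields a size in bytes, so for a 64-bit mask the original bound was 8 rather than 64, which would explain why queue 9 tripped the warning in the quoted boot log (`Invalid queue enabled by amdgpu: 9`). A minimal sketch in C, assuming `queue_mask` is a 64-bit integer; the struct and helper names are hypothetical stand-ins, not the amdkfd definitions:

```c
#include <assert.h>   /* static_assert */
#include <limits.h>   /* CHAR_BIT */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scheduling resources; only the width of
 * queue_mask matters for this illustration. */
struct sched_resources_sketch {
	uint64_t queue_mask;
};

/* The mask holds 64 bits even though sizeof() reports 8 (bytes). */
static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "queue_mask is 64 bits wide");

/* Check as written before the patch: compares a bit index to a byte count. */
static bool rejected_by_byte_check(unsigned int i)
{
	struct sched_resources_sketch res = { 0 };

	return i > sizeof(res.queue_mask);
}

/* Check as written in the patch: compares a bit index to a bit count. */
static bool rejected_by_bit_check(unsigned int i)
{
	struct sched_resources_sketch res = { 0 };

	return i > (sizeof(res.queue_mask) * 8);
}
```

With these helpers, index 9 is rejected by the byte-based check but accepted by the bit-based one. Note that with `>` an index of exactly 64 still passes the bit-based check; whether that case can occur depends on the caller, so the comparison operator may deserve a second look.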

John/Felix,

Any chance I could borrow a carrizo/kaveri for a few days? Or maybe you 
could help me run some final tests on this patch series?


- Andres


On 2017-02-09 03:11 PM, Oded Gabbay wrote:

  Andres,

I tried your patches on Kaveri with airlied's drm-next branch.
I used radeon+amdkfd

The following test failed: KFDQMTest.CreateMultipleCpQueues
However, I can't debug it because I don't have the sources of kfdtest.

In dmesg, I saw the following warning during boot:
WARNING: CPU: 0 PID: 150 at
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:670
start_cpsch+0xc5/0x220 [amdkfd]
[4.393796] Modules linked in: hid_logitech_hidpp hid_logitech_dj
hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon(+)
i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt
libahci fb_sys_fops drm r8169 mii fjes video
[4.393811] CPU: 0 PID: 150 Comm: systemd-udevd Not tainted 4.10.0-rc5+ #1
[4.393811] Hardware name: Gigabyte Technology Co., Ltd. To be
filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
[4.393812] Call Trace:
[4.393818]  dump_stack+0x63/0x90
[4.393822]  __warn+0xcb/0xf0
[4.393823]  warn_slowpath_null+0x1d/0x20
[4.393830]  start_cpsch+0xc5/0x220 [amdkfd]
[4.393836]  ? initialize_cpsch+0xa0/0xb0 [amdkfd]
[4.393841]  kgd2kfd_device_init+0x375/0x490 [amdkfd]
[4.393883]  radeon_kfd_device_init+0xaf/0xd0 [radeon]
[4.393911]  radeon_driver_load_kms+0x11e/0x1f0 [radeon]
[4.393933]  drm_dev_register+0x14a/0x200 [drm]
[4.393946]  drm_get_pci_dev+0x9d/0x160 [drm]
[4.393974]  radeon_pci_probe+0xb8/0xe0 [radeon]
[4.393976]  local_pci_probe+0x45/0xa0
[4.393978]  pci_device_probe+0x103/0x150
[4.393981]  driver_probe_device+0x2bf/0x460
[4.393982]  __driver_attach+0xdf/0xf0
[4.393984]  ? driver_probe_device+0x460/0x460
[4.393985]  bus_for_each_dev+0x6c/0xc0
[4.393987]  driver_attach+0x1e/0x20
[4.393988]  bus_add_driver+0x1fd/0x270
[4.393989]  ? 0xc05c8000
[4.393991]  driver_register+0x60/0xe0
[4.393992]  ? 0xc05c8000
[4.393993]  __pci_register_driver+0x4c/0x50
[4.394007]  drm_pci_init+0xeb/0x100 [drm]
[4.394008]  ? 0xc05c8000
[4.394031]  radeon_init+0x98/0xb6 [radeon]
[4.394034]  do_one_initcall+0x53/0x1a0
[4.394037]  ? __vunmap+0x81/0xd0
[4.394039]  ? kmem_cache_alloc_trace+0x152/0x1c0
[4.394041]  ? vfree+0x2e/0x70
[4.394044]  do_init_module+0x5f/0x1ff
[4.394046]  load_module+0x24cc/0x29f0
[4.394047]  ? __symbol_put+0x60/0x60
[4.394050]  ? security_kernel_post_read_file+0x6b/0x80
[4.394052]  SYSC_finit_module+0xdf/0x110
[4.394054]  SyS_finit_module+0xe/0x10
[4.394056]  entry_SYSCALL_64_fastpath+0x1e/0xad
[4.394058] RIP: 0033:0x7f9cda77c8e9
[4.394059] RSP: 002b:7ffe195d3378 EFLAGS: 0246 ORIG_RAX:
0139
[4.394060] RAX: ffda RBX: 7f9cdb8dda7e RCX: 7f9cda77c8e9
[4.394061] RDX:  RSI: 7f9cdac7ce2a RDI: 0013
[4.394062] RBP: 7ffe195d2450 R08:  R09: 
[4.394063] R10: 0013 R11: 0246 R12: 7ffe195d245a
[4.394063] R13: 7ffe195d1378 R14: 563f70cc93b0 R15: 563f70cba4d0
[4.394091] ---[ end trace 9c5af17304d998bb ]---
[4.394092] Invalid queue enabled by amdgpu: 9

I suggest you get a Kaveri/Carrizo machine to debug these issues.

Until then, I don't think we should merge this patch-set.

Oded

On Wed, Feb 8, 2017 at 9:47 PM, Andres Rodriguez  wrote:

Thank you Oded.

- Andres


On 2017-02-08 02:32 PM, Oded Gabbay wrote:

On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez 
wrote:

Hey Felix,

Thanks for the pointer to the ROCm mqd commit. I like that the
workarounds are easy to spot. I'll add that to a new patch series I'm
working on for some bug-fixes for perf being lower on pipes other than
pipe 0.

I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the
HW will be able to give it a go. I put in a few small hacks to g

Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-09 Thread Andres Rodriguez
Thanks Oded for the test results.

I'll work on a fix.

Regards,
Andres

On Thu, Feb 9, 2017 at 3:11 PM, Oded Gabbay  wrote:

>  Andres,
>
> I tried your patches on Kaveri with airlied's drm-next branch.
> I used radeon+amdkfd
>
> The following test failed: KFDQMTest.CreateMultipleCpQueues
> However, I can't debug it because I don't have the sources of kfdtest.
>
> In dmesg, I saw the following warning during boot:
> WARNING: CPU: 0 PID: 150 at
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:670
> start_cpsch+0xc5/0x220 [amdkfd]
> [4.393796] Modules linked in: hid_logitech_hidpp hid_logitech_dj
> hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon(+)
> i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt
> libahci fb_sys_fops drm r8169 mii fjes video
> [4.393811] CPU: 0 PID: 150 Comm: systemd-udevd Not tainted 4.10.0-rc5+
> #1
> [4.393811] Hardware name: Gigabyte Technology Co., Ltd. To be
> filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
> [4.393812] Call Trace:
> [4.393818]  dump_stack+0x63/0x90
> [4.393822]  __warn+0xcb/0xf0
> [4.393823]  warn_slowpath_null+0x1d/0x20
> [4.393830]  start_cpsch+0xc5/0x220 [amdkfd]
> [4.393836]  ? initialize_cpsch+0xa0/0xb0 [amdkfd]
> [4.393841]  kgd2kfd_device_init+0x375/0x490 [amdkfd]
> [4.393883]  radeon_kfd_device_init+0xaf/0xd0 [radeon]
> [4.393911]  radeon_driver_load_kms+0x11e/0x1f0 [radeon]
> [4.393933]  drm_dev_register+0x14a/0x200 [drm]
> [4.393946]  drm_get_pci_dev+0x9d/0x160 [drm]
> [4.393974]  radeon_pci_probe+0xb8/0xe0 [radeon]
> [4.393976]  local_pci_probe+0x45/0xa0
> [4.393978]  pci_device_probe+0x103/0x150
> [4.393981]  driver_probe_device+0x2bf/0x460
> [4.393982]  __driver_attach+0xdf/0xf0
> [4.393984]  ? driver_probe_device+0x460/0x460
> [4.393985]  bus_for_each_dev+0x6c/0xc0
> [4.393987]  driver_attach+0x1e/0x20
> [4.393988]  bus_add_driver+0x1fd/0x270
> [4.393989]  ? 0xc05c8000
> [4.393991]  driver_register+0x60/0xe0
> [4.393992]  ? 0xc05c8000
> [4.393993]  __pci_register_driver+0x4c/0x50
> [4.394007]  drm_pci_init+0xeb/0x100 [drm]
> [4.394008]  ? 0xc05c8000
> [4.394031]  radeon_init+0x98/0xb6 [radeon]
> [4.394034]  do_one_initcall+0x53/0x1a0
> [4.394037]  ? __vunmap+0x81/0xd0
> [4.394039]  ? kmem_cache_alloc_trace+0x152/0x1c0
> [4.394041]  ? vfree+0x2e/0x70
> [4.394044]  do_init_module+0x5f/0x1ff
> [4.394046]  load_module+0x24cc/0x29f0
> [4.394047]  ? __symbol_put+0x60/0x60
> [4.394050]  ? security_kernel_post_read_file+0x6b/0x80
> [4.394052]  SYSC_finit_module+0xdf/0x110
> [4.394054]  SyS_finit_module+0xe/0x10
> [4.394056]  entry_SYSCALL_64_fastpath+0x1e/0xad
> [4.394058] RIP: 0033:0x7f9cda77c8e9
> [4.394059] RSP: 002b:7ffe195d3378 EFLAGS: 0246 ORIG_RAX:
> 0139
> [4.394060] RAX: ffda RBX: 7f9cdb8dda7e RCX:
> 7f9cda77c8e9
> [4.394061] RDX:  RSI: 7f9cdac7ce2a RDI:
> 0013
> [4.394062] RBP: 7ffe195d2450 R08:  R09:
> 
> [4.394063] R10: 0013 R11: 0246 R12:
> 7ffe195d245a
> [4.394063] R13: 7ffe195d1378 R14: 563f70cc93b0 R15:
> 563f70cba4d0
> [4.394091] ---[ end trace 9c5af17304d998bb ]---
> [4.394092] Invalid queue enabled by amdgpu: 9
>
> I suggest you get a Kaveri/Carrizo machine to debug these issues.
>
> Until then, I don't think we should merge this patch-set.
>
> Oded
>
> On Wed, Feb 8, 2017 at 9:47 PM, Andres Rodriguez 
> wrote:
> > Thank you Oded.
> >
> > - Andres
> >
> >
> > On 2017-02-08 02:32 PM, Oded Gabbay wrote:
> >>
> >> On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez 
> >> wrote:
> >>>
> >>> Hey Felix,
> >>>
> > >>> Thanks for the pointer to the ROCm mqd commit. I like that the
> > >>> workarounds are easy to spot. I'll add that to a new patch series I'm
> > >>> working on for some bug-fixes for perf being lower on pipes other than
> > >>> pipe 0.
> > >>>
> > >>> I haven't tested this yet on kaveri/carrizo. I'm hoping someone with
> > >>> the HW will be able to give it a go. I put in a few small hacks to get
> > >>> KFD to boot but do nothing on polaris10.
> >>>
> >>> Regards,
> >>> Andres
> >>>
> >>>
> >>> On 2017-02-06 03:20 PM, Felix Kuehling wrote:
> > >>>>
> > >>>> Hi Andres,
> > >>>>
> > >>>> Thank you for tackling this task. It's more involved than I expected,
> > >>>> mostly because I didn't have much awareness of the MQD management in
> > >>>> amdgpu.
> > >>>>
> > >>>> I made one comment in a separate message about the unified MQD commit
> > >>>> function, if you want to bring that more in line with our latest ROCm
> > >>>> release on github.
> > >>>>
> > >>>> Also, were you able to test the upstream KFD with your changes on a
> > >>>> Kaveri or Carrizo?
> > >>>>
> > >>>> Regards,
> > >>>>   Felix
> > >>>>
> > >>>>
> > >>>> On 17-02-03 11:51 PM, Andres Rodriguez 

Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-09 Thread Oded Gabbay
 Andres,

I tried your patches on Kaveri with airlied's drm-next branch.
I used radeon+amdkfd.

The following test failed: KFDQMTest.CreateMultipleCpQueues
However, I can't debug it because I don't have the sources of kfdtest.

In dmesg, I saw the following warning during boot:
WARNING: CPU: 0 PID: 150 at
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:670
start_cpsch+0xc5/0x220 [amdkfd]
[4.393796] Modules linked in: hid_logitech_hidpp hid_logitech_dj
hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon(+)
i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt
libahci fb_sys_fops drm r8169 mii fjes video
[4.393811] CPU: 0 PID: 150 Comm: systemd-udevd Not tainted 4.10.0-rc5+ #1
[4.393811] Hardware name: Gigabyte Technology Co., Ltd. To be
filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
[4.393812] Call Trace:
[4.393818]  dump_stack+0x63/0x90
[4.393822]  __warn+0xcb/0xf0
[4.393823]  warn_slowpath_null+0x1d/0x20
[4.393830]  start_cpsch+0xc5/0x220 [amdkfd]
[4.393836]  ? initialize_cpsch+0xa0/0xb0 [amdkfd]
[4.393841]  kgd2kfd_device_init+0x375/0x490 [amdkfd]
[4.393883]  radeon_kfd_device_init+0xaf/0xd0 [radeon]
[4.393911]  radeon_driver_load_kms+0x11e/0x1f0 [radeon]
[4.393933]  drm_dev_register+0x14a/0x200 [drm]
[4.393946]  drm_get_pci_dev+0x9d/0x160 [drm]
[4.393974]  radeon_pci_probe+0xb8/0xe0 [radeon]
[4.393976]  local_pci_probe+0x45/0xa0
[4.393978]  pci_device_probe+0x103/0x150
[4.393981]  driver_probe_device+0x2bf/0x460
[4.393982]  __driver_attach+0xdf/0xf0
[4.393984]  ? driver_probe_device+0x460/0x460
[4.393985]  bus_for_each_dev+0x6c/0xc0
[4.393987]  driver_attach+0x1e/0x20
[4.393988]  bus_add_driver+0x1fd/0x270
[4.393989]  ? 0xc05c8000
[4.393991]  driver_register+0x60/0xe0
[4.393992]  ? 0xc05c8000
[4.393993]  __pci_register_driver+0x4c/0x50
[4.394007]  drm_pci_init+0xeb/0x100 [drm]
[4.394008]  ? 0xc05c8000
[4.394031]  radeon_init+0x98/0xb6 [radeon]
[4.394034]  do_one_initcall+0x53/0x1a0
[4.394037]  ? __vunmap+0x81/0xd0
[4.394039]  ? kmem_cache_alloc_trace+0x152/0x1c0
[4.394041]  ? vfree+0x2e/0x70
[4.394044]  do_init_module+0x5f/0x1ff
[4.394046]  load_module+0x24cc/0x29f0
[4.394047]  ? __symbol_put+0x60/0x60
[4.394050]  ? security_kernel_post_read_file+0x6b/0x80
[4.394052]  SYSC_finit_module+0xdf/0x110
[4.394054]  SyS_finit_module+0xe/0x10
[4.394056]  entry_SYSCALL_64_fastpath+0x1e/0xad
[4.394058] RIP: 0033:0x7f9cda77c8e9
[4.394059] RSP: 002b:7ffe195d3378 EFLAGS: 0246 ORIG_RAX:
0139
[4.394060] RAX: ffda RBX: 7f9cdb8dda7e RCX: 7f9cda77c8e9
[4.394061] RDX:  RSI: 7f9cdac7ce2a RDI: 0013
[4.394062] RBP: 7ffe195d2450 R08:  R09: 
[4.394063] R10: 0013 R11: 0246 R12: 7ffe195d245a
[4.394063] R13: 7ffe195d1378 R14: 563f70cc93b0 R15: 563f70cba4d0
[4.394091] ---[ end trace 9c5af17304d998bb ]---
[4.394092] Invalid queue enabled by amdgpu: 9

I suggest you get a Kaveri/Carrizo machine to debug these issues.

Until then, I don't think we should merge this patch-set.

Oded

On Wed, Feb 8, 2017 at 9:47 PM, Andres Rodriguez  wrote:
> Thank you Oded.
>
> - Andres
>
>
> On 2017-02-08 02:32 PM, Oded Gabbay wrote:
>>
>> On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez 
>> wrote:
>>>
>>> Hey Felix,
>>>
>>> Thanks for the pointer to the ROCm mqd commit. I like that the
>>> workarounds
>>> are easy to spot. I'll add that to a new patch series I'm working on for
>>> some bug-fixes for perf being lower on pipes other than pipe 0.
>>>
>>> I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the
>>> HW
>>> will be able to give it a go. I put in a few small hacks to get KFD to
>>> boot
>>> but do nothing on polaris10.
>>>
>>> Regards,
>>> Andres
>>>
>>>
>>> On 2017-02-06 03:20 PM, Felix Kuehling wrote:

>>>> Hi Andres,
>>>>
>>>> Thank you for tackling this task. It's more involved than I expected,
>>>> mostly because I didn't have much awareness of the MQD management in
>>>> amdgpu.
>>>>
>>>> I made one comment in a separate message about the unified MQD commit
>>>> function, if you want to bring that more in line with our latest ROCm
>>>> release on github.
>>>>
>>>> Also, were you able to test the upstream KFD with your changes on a
>>>> Kaveri or Carrizo?
>>>>
>>>> Regards,
>>>>   Felix
>>>>
>>>>
>>>> On 17-02-03 11:51 PM, Andres Rodriguez wrote:
>>>>>
>>>>> The current queue/pipe split policy is for amdgpu to take the first
>>>>> pipe
>>>>> of
>>>>> MEC0 and leave the rest for amdkfd to use. This policy is taken as an
>>>>> assumption in a few areas of the implementation.
>>>>>
>>>>> This patch series aims to allow for flexible/tunable queue/pipe split
>>>>> policies
>>>>> between kgd and kfd. It also updates the
Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-08 Thread Andres Rodriguez

Thank you Oded.

- Andres

On 2017-02-08 02:32 PM, Oded Gabbay wrote:

On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez  wrote:

Hey Felix,

Thanks for the pointer to the ROCm mqd commit. I like that the workarounds
are easy to spot. I'll add that to a new patch series I'm working on for
some bug-fixes for perf being lower on pipes other than pipe 0.

I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the HW
will be able to give it a go. I put in a few small hacks to get KFD to boot
but do nothing on polaris10.

Regards,
Andres


On 2017-02-06 03:20 PM, Felix Kuehling wrote:

Hi Andres,

Thank you for tackling this task. It's more involved than I expected,
mostly because I didn't have much awareness of the MQD management in
amdgpu.

I made one comment in a separate message about the unified MQD commit
function, if you want to bring that more in line with our latest ROCm
release on github.

Also, were you able to test the upstream KFD with your changes on a
Kaveri or Carrizo?

Regards,
   Felix


On 17-02-03 11:51 PM, Andres Rodriguez wrote:

The current queue/pipe split policy is for amdgpu to take the first pipe
of
MEC0 and leave the rest for amdkfd to use. This policy is taken as an
assumption in a few areas of the implementation.

This patch series aims to allow for flexible/tunable queue/pipe split
policies
between kgd and kfd. It also updates the queue/pipe split policy to one
that
allows better compute app concurrency for both drivers.

In the process some duplicate code and hardcoded constants were removed.

Any suggestions or feedback on improvements welcome.


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Hi Andres,
I will try to find some time to test it on my Kaveri machine.

Oded




Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-08 Thread Oded Gabbay
On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez  wrote:
> Hey Felix,
>
> Thanks for the pointer to the ROCm mqd commit. I like that the workarounds
> are easy to spot. I'll add that to a new patch series I'm working on for
> some bug-fixes for perf being lower on pipes other than pipe 0.
>
> I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the HW
> will be able to give it a go. I put in a few small hacks to get KFD to boot
> but do nothing on polaris10.
>
> Regards,
> Andres
>
>
> On 2017-02-06 03:20 PM, Felix Kuehling wrote:
>>
>> Hi Andres,
>>
>> Thank you for tackling this task. It's more involved than I expected,
>> mostly because I didn't have much awareness of the MQD management in
>> amdgpu.
>>
>> I made one comment in a separate message about the unified MQD commit
>> function, if you want to bring that more in line with our latest ROCm
>> release on github.
>>
>> Also, were you able to test the upstream KFD with your changes on a
>> Kaveri or Carrizo?
>>
>> Regards,
>>   Felix
>>
>>
>> On 17-02-03 11:51 PM, Andres Rodriguez wrote:
>>>
>>> The current queue/pipe split policy is for amdgpu to take the first pipe
>>> of
>>> MEC0 and leave the rest for amdkfd to use. This policy is taken as an
>>> assumption in a few areas of the implementation.
>>>
>>> This patch series aims to allow for flexible/tunable queue/pipe split
>>> policies
>>> between kgd and kfd. It also updates the queue/pipe split policy to one
>>> that
>>> allows better compute app concurrency for both drivers.
>>>
>>> In the process some duplicate code and hardcoded constants were removed.
>>>
>>> Any suggestions or feedback on improvements welcome.
>>>
>>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Hi Andres,
I will try to find some time to test it on my Kaveri machine.

Oded


Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-08 Thread Andres Rodriguez

Hey Felix,

Thanks for the pointer to the ROCm mqd commit. I like that the 
workarounds are easy to spot. I'll add that to a new patch series I'm 
working on for some bug-fixes for perf being lower on pipes other than 
pipe 0.


I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the 
HW will be able to give it a go. I put in a few small hacks to get KFD 
to boot but do nothing on polaris10.


Regards,
Andres

On 2017-02-06 03:20 PM, Felix Kuehling wrote:

Hi Andres,

Thank you for tackling this task. It's more involved than I expected,
mostly because I didn't have much awareness of the MQD management in amdgpu.

I made one comment in a separate message about the unified MQD commit
function, if you want to bring that more in line with our latest ROCm
release on github.

Also, were you able to test the upstream KFD with your changes on a
Kaveri or Carrizo?

Regards,
  Felix


On 17-02-03 11:51 PM, Andres Rodriguez wrote:

The current queue/pipe split policy is for amdgpu to take the first pipe of
MEC0 and leave the rest for amdkfd to use. This policy is taken as an
assumption in a few areas of the implementation.

This patch series aims to allow for flexible/tunable queue/pipe split policies
between kgd and kfd. It also updates the queue/pipe split policy to one that
allows better compute app concurrency for both drivers.

In the process some duplicate code and hardcoded constants were removed.

Any suggestions or feedback on improvements welcome.
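
For readers following the thread, here is a minimal sketch of how a
queue/pipe split policy could be expressed as an ownership bitmap. It is
not taken from the patch series: the topology constants (2 MECs, 4 pipes
per MEC, 8 queues per pipe) and every function name below are assumptions
made purely for illustration. policy_first_pipe() models the current
policy described above; policy_spread() is one hypothetical alternative
that gives amdgpu a few queues on every pipe of MEC0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical topology, chosen for illustration only. */
#define NUM_MEC          2
#define PIPES_PER_MEC    4
#define QUEUES_PER_PIPE  8
#define TOTAL_QUEUES     (NUM_MEC * PIPES_PER_MEC * QUEUES_PER_PIPE)

/* The sketch stores one ownership bit per queue in a single uint64_t. */
_Static_assert(TOTAL_QUEUES == 64, "sketch assumes exactly 64 queues");

/* Flatten (mec, pipe, queue) into a bit index in the ownership mask. */
static inline unsigned queue_to_bit(unsigned mec, unsigned pipe, unsigned queue)
{
	assert(mec < NUM_MEC && pipe < PIPES_PER_MEC && queue < QUEUES_PER_PIPE);
	return (mec * PIPES_PER_MEC + pipe) * QUEUES_PER_PIPE + queue;
}

/* Current policy as described: amdgpu owns every queue of MEC0 pipe 0. */
static uint64_t policy_first_pipe(void)
{
	uint64_t amdgpu_mask = 0;

	for (unsigned q = 0; q < QUEUES_PER_PIPE; q++)
		amdgpu_mask |= UINT64_C(1) << queue_to_bit(0, 0, q);
	return amdgpu_mask;
}

/*
 * Hypothetical alternative: amdgpu owns the first n queues of each pipe
 * on MEC0, so both drivers have queues on every pipe.
 */
static uint64_t policy_spread(unsigned n)
{
	uint64_t amdgpu_mask = 0;

	assert(n <= QUEUES_PER_PIPE);
	for (unsigned p = 0; p < PIPES_PER_MEC; p++)
		for (unsigned q = 0; q < n; q++)
			amdgpu_mask |= UINT64_C(1) << queue_to_bit(0, p, q);
	return amdgpu_mask;
}

/* Whatever amdgpu does not own is left for amdkfd. */
static uint64_t kfd_mask(uint64_t amdgpu_mask)
{
	return ~amdgpu_mask;
}
```

With these assumed constants, policy_first_pipe() yields 0xFF and
policy_spread(2) yields 0x03030303; in both cases kfd_mask() hands the
remaining queues to amdkfd, so the two masks never overlap.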






Re: Change queue/pipe split between amdkfd and amdgpu

2017-02-06 Thread Felix Kuehling
Hi Andres,

Thank you for tackling this task. It's more involved than I expected,
mostly because I didn't have much awareness of the MQD management in amdgpu.

I made one comment in a separate message about the unified MQD commit
function, if you want to bring that more in line with our latest ROCm
release on github.

Also, were you able to test the upstream KFD with your changes on a
Kaveri or Carrizo?

Regards,
  Felix


On 17-02-03 11:51 PM, Andres Rodriguez wrote:
> The current queue/pipe split policy is for amdgpu to take the first pipe of
> MEC0 and leave the rest for amdkfd to use. This policy is taken as an
> assumption in a few areas of the implementation.
>
> This patch series aims to allow for flexible/tunable queue/pipe split policies
> between kgd and kfd. It also updates the queue/pipe split policy to one that 
> allows better compute app concurrency for both drivers.
>
> In the process some duplicate code and hardcoded constants were removed.
>
> Any suggestions or feedback on improvements welcome.
>
