Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-14 Thread David Coe

Hi all!

On 04/05/2021 07:52, Suravee Suthikulpanit wrote:

On certain AMD platforms, when the IOMMU performance counter source
(csource) field is zero, power-gating for the counter is enabled, which
prevents write access and returns zero for read access.

This can cause invalid perf results, especially when event multiplexing
is needed (i.e. more events than available counters), since
the current logic keeps track of the previously read counter value
and subsequently re-programs the counter to continue counting the event.
With power-gating enabled, we cannot guarantee successful re-programming
of the counter.

Work around this issue by:

1. Modifying the ordering of setting/reading counters and enabling/
disabling csources to only access the counter when the csource
is set to non-zero.

2. Since AMD IOMMU PMU does not support interrupt mode, the logic
can be simplified to always start counting with value zero,
and accumulate the counter value when stopping without the need
to keep track and reprogram the counter with the previously read
counter value.

This has been tested on systems with and without power-gating.
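The two steps above can be illustrated with a small user-space C model. This is a sketch, not the kernel driver: `struct pmc_model`, `pmc_write()`, `pmc_read()`, `pmc_tick()`, `event_start()`, `event_stop()` and `event_stop_gated_first()` are hypothetical names, and the 48-bit width plus "gated writes are dropped, gated reads return 0" behaviour are assumptions taken from the description above rather than from AMD documentation. It shows why the counter should only be touched while csource is non-zero, and how starting from zero and accumulating on stop avoids re-programming a saved value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one IOMMU counter; not the kernel's structure. */
#define PMC_MASK ((UINT64_C(1) << 48) - 1)	/* assumed 48-bit counter */

struct pmc_model {
	uint8_t  csource;	/* 0 => counter is power-gated (assumed) */
	uint64_t hw_count;	/* value held by the modelled hardware */
};

/* Writes are silently dropped while the counter is gated. */
static void pmc_write(struct pmc_model *p, uint64_t v)
{
	if (p->csource != 0)
		p->hw_count = v & PMC_MASK;
}

/* Reads return zero while the counter is gated. */
static uint64_t pmc_read(const struct pmc_model *p)
{
	return p->csource != 0 ? p->hw_count : 0;
}

/* Simulated IOMMU activity: the counter only advances while enabled. */
static void pmc_tick(struct pmc_model *p, uint64_t events)
{
	if (p->csource != 0)
		p->hw_count = (p->hw_count + events) & PMC_MASK;
}

/* Start: enable the source first, then zero the counter. */
static void event_start(struct pmc_model *p, uint8_t src)
{
	assert(src != 0);
	p->csource = src;
	pmc_write(p, 0);
}

/* Stop: read while still enabled, accumulate, then clear the source. */
static void event_stop(struct pmc_model *p, uint64_t *total)
{
	*total += pmc_read(p);
	p->csource = 0;
}

/* Wrong order, for contrast: gating first makes the read return 0. */
static void event_stop_gated_first(struct pmc_model *p, uint64_t *total)
{
	p->csource = 0;
	*total += pmc_read(p);
}
```

Two start/tick/stop cycles of 100 and 50 events accumulate to 150 in `total`, whereas `event_stop_gated_first()` adds nothing: the silent zero the patch description warns about.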


I've just noticed kernel-5.13-rc1 includes your full iommu enchilada. A
quick test with Ubuntu's mainline ppa debs (and a home-spun perf) gives
what seem very satisfactory results on a Ryzen 2400G. Bravo!


 Performance counter stats for 'system wide':

     0   amd_iommu_0/cmd_processed/          (33.32%)
     0   amd_iommu_0/cmd_processed_inv/      (33.34%)
     0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/   (33.38%)
   615   amd_iommu_0/int_dte_hit/            (33.44%)
     5   amd_iommu_0/int_dte_mis/            (33.44%)
 1,347   amd_iommu_0/mem_dte_hit/            (33.46%)
19,127   amd_iommu_0/mem_dte_mis/            (33.44%)
    71   amd_iommu_0/mem_iommu_tlb_pde_hit/  (33.43%)
   754   amd_iommu_0/mem_iommu_tlb_pde_mis/  (33.41%)
 1,777   amd_iommu_0/mem_iommu_tlb_pte_hit/  (33.36%)
20,163   amd_iommu_0/mem_iommu_tlb_pte_mis/  (33.32%)
     0   amd_iommu_0/mem_pass_excl/          (33.25%)
     0   amd_iommu_0/mem_pass_pretrans/      (33.28%)
27,283   amd_iommu_0/mem_pass_untrans/       (33.27%)
     0   amd_iommu_0/mem_target_abort/       (33.29%)
   645   amd_iommu_0/mem_trans_total/        (33.32%)
     0   amd_iommu_0/page_tbl_read_gst/      (33.28%)
   183   amd_iommu_0/page_tbl_read_nst/      (33.30%)
    45   amd_iommu_0/page_tbl_read_tot/      (33.30%)
     0   amd_iommu_0/smi_blk/                (33.32%)
     0   amd_iommu_0/smi_recv/               (33.28%)
     0   amd_iommu_0/tlb_inv/                (33.27%)
     0   amd_iommu_0/vapic_int_guest/        (33.28%)
   613   amd_iommu_0/vapic_int_non_guest/    (33.26%)

   9.998673791 seconds time elapsed

Running Windows 10 etc. under QEMU/KVM produces nothing untoward.
Again, congratulations and many thanks.


--
David
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-05 Thread David Coe

Hi, once more!

On 04/05/2021 07:52, Suravee Suthikulpanit wrote:


Just as a final sanity check, I've loaded the same patched kernel
5.11.0-16 onto an old AMD FX-8350. So far, all seems in order: it
loads IOMMUv1 and runs Ubuntu 21.04 without incident!


Much appreciate all your efforts, Suravee, Alex et al. Best regards.

--
David


Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-04 Thread David Coe

Hi again!

On 04/05/2021 07:52, Suravee Suthikulpanit wrote:


Results for a Ryzen 4700U running Ubuntu 21.04, kernel 5.11.0-16, patched
as above.


All amd_iommu events:

 Performance counter stats for 'system wide':

    18   amd_iommu_0/cmd_processed/          (33.29%)
     9   amd_iommu_0/cmd_processed_inv/      (33.33%)
     0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/   (33.36%)
   308   amd_iommu_0/int_dte_hit/            (33.40%)
     5   amd_iommu_0/int_dte_mis/            (33.45%)
   346   amd_iommu_0/mem_dte_hit/            (33.46%)
 8,954   amd_iommu_0/mem_dte_mis/            (33.48%)
     0   amd_iommu_0/mem_iommu_tlb_pde_hit/  (33.46%)
   771   amd_iommu_0/mem_iommu_tlb_pde_mis/  (33.44%)
    14   amd_iommu_0/mem_iommu_tlb_pte_hit/  (33.40%)
   836   amd_iommu_0/mem_iommu_tlb_pte_mis/  (33.36%)
     0   amd_iommu_0/mem_pass_excl/          (33.32%)
     0   amd_iommu_0/mem_pass_pretrans/      (33.28%)
 1,601   amd_iommu_0/mem_pass_untrans/       (33.27%)
     0   amd_iommu_0/mem_target_abort/       (33.27%)
 1,130   amd_iommu_0/mem_trans_total/        (33.27%)
     0   amd_iommu_0/page_tbl_read_gst/      (33.27%)
   312   amd_iommu_0/page_tbl_read_nst/      (33.28%)
   279   amd_iommu_0/page_tbl_read_tot/      (33.27%)
     0   amd_iommu_0/smi_blk/                (33.29%)
     0   amd_iommu_0/smi_recv/               (33.27%)
     0   amd_iommu_0/tlb_inv/                (33.26%)
     0   amd_iommu_0/vapic_int_guest/        (33.25%)
   366   amd_iommu_0/vapic_int_non_guest/    (33.27%)

  10.001941666 seconds time elapsed


Groups of 8 amd_iommu events:

 Performance counter stats for 'system wide':

    14   amd_iommu_0/cmd_processed/
     7   amd_iommu_0/cmd_processed_inv/
     0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/
   502   amd_iommu_0/int_dte_hit/
     6   amd_iommu_0/int_dte_mis/
   532   amd_iommu_0/mem_dte_hit/
13,622   amd_iommu_0/mem_dte_mis/
   159   amd_iommu_0/mem_iommu_tlb_pde_hit/

  10.002170562 seconds time elapsed


 Performance counter stats for 'system wide':

   762   amd_iommu_0/mem_iommu_tlb_pde_mis/
    20   amd_iommu_0/mem_iommu_tlb_pte_hit/
   698   amd_iommu_0/mem_iommu_tlb_pte_mis/
     0   amd_iommu_0/mem_pass_excl/
     0   amd_iommu_0/mem_pass_pretrans/
    15   amd_iommu_0/mem_pass_untrans/
     0   amd_iommu_0/mem_target_abort/
   718   amd_iommu_0/mem_trans_total/

  10.001683428 seconds time elapsed


 Performance counter stats for 'system wide':

     0   amd_iommu_0/page_tbl_read_gst/
    33   amd_iommu_0/page_tbl_read_nst/
    33   amd_iommu_0/page_tbl_read_tot/
     0   amd_iommu_0/smi_blk/
     0   amd_iommu_0/smi_recv/
     0   amd_iommu_0/tlb_inv/
     0   amd_iommu_0/vapic_int_guest/
11,638   amd_iommu_0/vapic_int_non_guest/

  10.002205748 seconds time elapsed


Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-18 Thread David Coe

Hi Suravee!

Results for the Ryzen 2400G on Ubuntu 20.10, kernel 5.8.0-50, with patch 2/2
alone. Events are batched 3 x 8 to avoid counter-multiplexing (?) artefacts.


On 15/04/2021 10:28, Suthikulpanit, Suravee wrote:

David,

For the Ryzen 2400G, could you please try with:
- 1 event at a time
- Not more than 8 events (On your system, it has 2 banks x 4 counters/bank.
I am trying to see if this issue might be related to the counters 
multiplexing).
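The (33.xx%) column in the multiplexed results follows from the counter budget described here: with 24 amd_iommu events competing for 2 x 4 = 8 counters, each event is on a counter for roughly 8/24 of the run, and perf stat reports an estimate scaled by time_enabled/time_running. A minimal idealized sketch (`mux_fraction()` and `scale_count()` are hypothetical helpers, and perf's real rotation is time-sliced rather than perfectly fair):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Idealized fraction of the measurement window during which each event
 * occupies a hardware counter, if n_events share n_counters fairly.
 */
static double mux_fraction(unsigned n_events, unsigned n_counters)
{
	assert(n_events > 0 && n_counters > 0);
	if (n_events <= n_counters)
		return 1.0;	/* every event has its own counter */
	return (double)n_counters / n_events;
}

/* perf stat's estimate for a multiplexed event: raw * enabled / running. */
static uint64_t scale_count(uint64_t raw, uint64_t enabled_ns,
			    uint64_t running_ns)
{
	assert(running_ns > 0 && running_ns <= enabled_ns);
	return (uint64_t)((double)raw * (double)enabled_ns / (double)running_ns);
}
```

With eight or fewer events the fraction is 1.0 and no scaling is applied, which is why the batches of 8 show no percentage column.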



$ sudo dmesg | grep IOMMU
[sudo] password for info:
[0.543768] pci :00:00.2: AMD-Vi: IOMMU performance counters supported
[0.547696] pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
[0.549196] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[0.811538] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel


$ declare -a EventList=("amd_iommu_0/cmd_processed/, 
amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, 
amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, 
amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, 
amd_iommu_0/mem_iommu_tlb_pde_hit/" "amd_iommu_0/mem_iommu_tlb_pde_mis/, 
amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, 
amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, 
amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, 
amd_iommu_0/mem_trans_total/" "amd_iommu_0/page_tbl_read_gst/, 
amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, 
amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, 
amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/")
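The EventList array hard-codes three comma-separated batches of eight events so that each perf stat run fits within the 8 available counters. The same batches can be generated rather than typed; a sketch with a hypothetical C helper (`build_batch()` is illustrative and not part of perf) that writes the k-th batch into a buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the k-th batch of at most per_batch events, comma-separated, into
 * buf (suitable for `perf stat -e`). Returns the number of events written,
 * or 0 if k is past the end of the list or buf is too small.
 */
static size_t build_batch(const char *const *events, size_t n,
			  size_t per_batch, size_t k, char *buf, size_t buflen)
{
	size_t placed = 0;

	assert(per_batch > 0 && buflen > 0);
	buf[0] = '\0';
	for (size_t i = k * per_batch; i < n && placed < per_batch; i++) {
		size_t used = strlen(buf);
		int w = snprintf(buf + used, buflen - used, "%s%s",
				 placed ? "," : "", events[i]);

		if (w < 0 || (size_t)w >= buflen - used)
			return 0;	/* truncated: caller needs a bigger buffer */
		placed++;
	}
	return placed;
}
```

A caller would loop k = 0, 1, 2, ... with per_batch = 8 until `build_batch()` returns 0, passing each buffer to a separate perf stat run, as the shell loop does with the hand-written batches.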



$ for event in "${EventList[@]}"; do sudo perf stat -e "$event" sleep 10; done


 Performance counter stats for 'system wide':

    18   amd_iommu_0/cmd_processed/
     9   amd_iommu_0/cmd_processed_inv/
     0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/
   399   amd_iommu_0/int_dte_hit/
    19   amd_iommu_0/int_dte_mis/
 1,177   amd_iommu_0/mem_dte_hit/
 5,521   amd_iommu_0/mem_dte_mis/
    70   amd_iommu_0/mem_iommu_tlb_pde_hit/

  10.001490092 seconds time elapsed


 Performance counter stats for 'system wide':

   394   amd_iommu_0/mem_iommu_tlb_pde_mis/
   602   amd_iommu_0/mem_iommu_tlb_pte_hit/
 6,612   amd_iommu_0/mem_iommu_tlb_pte_mis/
     0   amd_iommu_0/mem_pass_excl/
     0   amd_iommu_0/mem_pass_pretrans/
 6,590   amd_iommu_0/mem_pass_untrans/
     0   amd_iommu_0/mem_target_abort/
   616   amd_iommu_0/mem_trans_total/

  10.001237585 seconds time elapsed


 Performance counter stats for 'system wide':

     0   amd_iommu_0/page_tbl_read_gst/
    78   amd_iommu_0/page_tbl_read_nst/
    78   amd_iommu_0/page_tbl_read_tot/
     0   amd_iommu_0/smi_blk/
     0   amd_iommu_0/smi_recv/
     0   amd_iommu_0/tlb_inv/
     0   amd_iommu_0/vapic_int_guest/
   637   amd_iommu_0/vapic_int_non_guest/

  10.001186031 seconds time elapsed

Best regards,

--
David


Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-15 Thread David Coe

Hi Suravee!

On 15/04/2021 10:28, Suthikulpanit, Suravee wrote:

David,

For the Ryzen 2400G, could you please try with:
- 1 event at a time
- Not more than 8 events (On your system, it has 2 banks x 4 counters/bank.
I am trying to see if this issue might be related to the counters 
multiplexing).


Thanks,


Herewith similar one-at-a-time perf stat results for the Ryzen 4700U. As
before, requesting more than 8 events makes perf show the third (%) column
(i.e. the events are multiplexed) and (sometimes) 'silly' numbers in the
first column.


--
David
$ sudo ./iommu_list.sh

 Performance counter stats for 'system wide':

20  amd_iommu_0/cmd_processed/

  10.003010666 seconds time elapsed


 Performance counter stats for 'system wide':

 5   amd_iommu_0/cmd_processed_inv/

  10.002349464 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/

  10.002386129 seconds time elapsed


 Performance counter stats for 'system wide':

   325   amd_iommu_0/int_dte_hit/

  10.002346630 seconds time elapsed


 Performance counter stats for 'system wide':

 2   amd_iommu_0/int_dte_mis/

  10.002365656 seconds time elapsed


 Performance counter stats for 'system wide':

   356   amd_iommu_0/mem_dte_hit/

  10.002426866 seconds time elapsed


 Performance counter stats for 'system wide':

 3,955   amd_iommu_0/mem_dte_mis/

  10.002729058 seconds time elapsed


 Performance counter stats for 'system wide':

 4   amd_iommu_0/mem_iommu_tlb_pde_hit/

  10.002422610 seconds time elapsed


 Performance counter stats for 'system wide':

 1,326   amd_iommu_0/mem_iommu_tlb_pde_mis/

  10.002397045 seconds time elapsed


 Performance counter stats for 'system wide':

 1,009   amd_iommu_0/mem_iommu_tlb_pte_hit/

  10.002445347 seconds time elapsed


 Performance counter stats for 'system wide':

 1,072   amd_iommu_0/mem_iommu_tlb_pte_mis/

  10.002414734 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_pass_excl/

  10.002435482 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_pass_pretrans/

  10.002409956 seconds time elapsed


 Performance counter stats for 'system wide':

 3,405   amd_iommu_0/mem_pass_untrans/

  10.002563812 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_target_abort/

  10.002473657 seconds time elapsed


 Performance counter stats for 'system wide':

 1,311   amd_iommu_0/mem_trans_total/

  10.002471787 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/page_tbl_read_gst/

  10.002216033 seconds time elapsed


 Performance counter stats for 'system wide':

   276,609   amd_iommu_0/page_tbl_read_nst/

  10.002029261 seconds time elapsed


 Performance counter stats for 'system wide':

   126,161   amd_iommu_0/page_tbl_read_tot/

  10.003569029 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/smi_blk/

  10.001871818 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/smi_recv/

  10.002212891 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/tlb_inv/

  10.002396606 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/vapic_int_guest/

  10.002435308 seconds time elapsed


 Performance counter stats for 'system wide':

   340   amd_iommu_0/vapic_int_non_guest/

  10.002405868 seconds time elapsed


$ sudo perf stat -e 'amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, 
amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, 
amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_i

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-15 Thread David Coe

I think you've put your finger on it, Suravee!

On 15/04/2021 10:28, Suthikulpanit, Suravee wrote:

David,

For the Ryzen 2400G, could you please try with:
- 1 event at a time
- Not more than 8 events (On your system, it has 2 banks x 4 counters/bank.
I am trying to see if this issue might be related to the counters 
multiplexing).


Thanks,


Attached are the results you requested for the 2400G along with a tiny 
shell-script.


One event at a time and various batches of fewer than 8 events produce
unexceptionable data. One final batch of 10 events and (hoopla) up go
the counter stats.


Will you be doing something in mitigation or does this just go with the 
patch? Is there anything further you need from me? I'll run the script 
on the 4700U but I don't expect surprises :-).


All most appreciated,

--
David


iommu_list.sh
$ sudo ./iommu_list.sh

 Performance counter stats for 'system wide':

12  amd_iommu_0/cmd_processed/

  10.001266851 seconds time elapsed


 Performance counter stats for 'system wide':

11   amd_iommu_0/cmd_processed_inv/

  10.001259049 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/

  10.000791810 seconds time elapsed


 Performance counter stats for 'system wide':

   350   amd_iommu_0/int_dte_hit/

  10.000848437 seconds time elapsed


 Performance counter stats for 'system wide':

16   amd_iommu_0/int_dte_mis/

  10.001271989 seconds time elapsed


 Performance counter stats for 'system wide':

   348   amd_iommu_0/mem_dte_hit/

  10.000808074 seconds time elapsed


 Performance counter stats for 'system wide':

   211,925   amd_iommu_0/mem_dte_mis/

  10.000915362 seconds time elapsed


 Performance counter stats for 'system wide':

30   amd_iommu_0/mem_iommu_tlb_pde_hit/

  10.001520597 seconds time elapsed


 Performance counter stats for 'system wide':

   450   amd_iommu_0/mem_iommu_tlb_pde_mis/

  10.000877493 seconds time elapsed


 Performance counter stats for 'system wide':

10,953   amd_iommu_0/mem_iommu_tlb_pte_hit/

  10.000831802 seconds time elapsed


 Performance counter stats for 'system wide':

13,235   amd_iommu_0/mem_iommu_tlb_pte_mis/

  10.001292003 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_pass_excl/

  10.000836000 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_pass_pretrans/

  10.000799887 seconds time elapsed


 Performance counter stats for 'system wide':

12,283   amd_iommu_0/mem_pass_untrans/

  10.000815339 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_target_abort/

  10.001205168 seconds time elapsed


 Performance counter stats for 'system wide':

 1,333   amd_iommu_0/mem_trans_total/

  10.000915359 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/page_tbl_read_gst/

  10.001248235 seconds time elapsed


 Performance counter stats for 'system wide':

65   amd_iommu_0/page_tbl_read_nst/

  10.001266411 seconds time elapsed


 Performance counter stats for 'system wide':

78   amd_iommu_0/page_tbl_read_tot/

  10.001272406 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/smi_blk/

  10.001282912 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/smi_recv/

  10.001223193 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/tlb_inv/

  10.001234853 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/vapic_int

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-14 Thread David Coe

Hi again!

For completeness, I'm attaching results for the revert+update patch 
running the Ubuntu 21.04β kernel 5.11.0-14 on a Ryzen 4700U laptop.


The enormous amd_iommu running stats don't always appear, as they nearly
always do on the 2400G desktop, but they do turn up (depending on
what the machine's been doing).


I'd be very interested in your thoughts on their relevance!

Best regards,

--
David
$ sudo dmesg | grep IOMMU
[0.498593] pci :00:00.2: AMD-Vi: IOMMU performance counters supported
[0.500507] pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
[0.502011] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[1.113195] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel 


$ sudo perf list | grep iommu
  amd_iommu_0/cmd_processed/ [Kernel PMU event]
  amd_iommu_0/cmd_processed_inv/ [Kernel PMU event]
  amd_iommu_0/ign_rd_wr_mmio_1ff8h/  [Kernel PMU event]
  amd_iommu_0/int_dte_hit/   [Kernel PMU event]
  amd_iommu_0/int_dte_mis/   [Kernel PMU event]
  amd_iommu_0/mem_dte_hit/   [Kernel PMU event]
  amd_iommu_0/mem_dte_mis/   [Kernel PMU event]
  amd_iommu_0/mem_iommu_tlb_pde_hit/ [Kernel PMU event]
  amd_iommu_0/mem_iommu_tlb_pde_mis/ [Kernel PMU event]
  amd_iommu_0/mem_iommu_tlb_pte_hit/ [Kernel PMU event]
  amd_iommu_0/mem_iommu_tlb_pte_mis/ [Kernel PMU event]
  amd_iommu_0/mem_pass_excl/ [Kernel PMU event]
  amd_iommu_0/mem_pass_pretrans/ [Kernel PMU event]
  amd_iommu_0/mem_pass_untrans/  [Kernel PMU event]
  amd_iommu_0/mem_target_abort/  [Kernel PMU event]
  amd_iommu_0/mem_trans_total/   [Kernel PMU event]
  amd_iommu_0/page_tbl_read_gst/ [Kernel PMU event]
  amd_iommu_0/page_tbl_read_nst/ [Kernel PMU event]
  amd_iommu_0/page_tbl_read_tot/ [Kernel PMU event]
  amd_iommu_0/smi_blk/   [Kernel PMU event]
  amd_iommu_0/smi_recv/  [Kernel PMU event]
  amd_iommu_0/tlb_inv/   [Kernel PMU event]
  amd_iommu_0/vapic_int_guest/   [Kernel PMU event]
  amd_iommu_0/vapic_int_non_guest/   [Kernel PMU event]
  intel_iommu:bounce_map_sg  [Tracepoint event]
  intel_iommu:bounce_map_single  [Tracepoint event]
  intel_iommu:bounce_unmap_single [Tracepoint event]
  intel_iommu:map_sg [Tracepoint event]
  intel_iommu:map_single [Tracepoint event]
  intel_iommu:unmap_sg   [Tracepoint event]
  intel_iommu:unmap_single   [Tracepoint event]
  iommu:add_device_to_group  [Tracepoint event]
  iommu:attach_device_to_domain  [Tracepoint event]
  iommu:detach_device_from_domain [Tracepoint event]
  iommu:io_page_fault [Tracepoint event]
  iommu:map  [Tracepoint event]
  iommu:remove_device_from_group [Tracepoint event]
  iommu:unmap [Tracepoint event]

$ sudo perf stat -e 'amd_iommu_0/cmd_processed/, 
amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, 
amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, 
amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, 
amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, 
amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, 
amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, 
amd_iommu_0/mem_target_abort/, amd_iommu_0/mem_trans_total/, 
amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, 
amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, 
amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, 
amd_iommu_0/vapic_int_non_guest/' sleep 10

Performance counter stats for 'system wide':

    30   amd_iommu_0/cmd_processed/          (33.31%)
    17   amd_iommu_0/cmd_processed_inv/      (33.34%)
     0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/   (33.36%)
   374   amd_iommu_0/int_dte_hit/            (33.39%)
    29   amd_iommu_0/int_dte_mis/            (33.44%)
   394   amd_iommu_0/mem_dte_hit/            (33.46%)
 9,117   amd_iommu_0/mem_dte_mis/            (33.45%)
     5   amd_iommu_0/mem_iommu_tlb_pde_hit/  (33.46%)
   819   amd_iommu_0/mem_iommu_tlb_pde_mis/  (33.42%)
 2   

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-14 Thread David Coe

Hi Suravee!

I've re-run your revert+update patch on Ubuntu's latest kernel 5.11.0-14 
partly to check my mailer's 'mangling' hadn't also reached the code!


There are 3 sets of results in the attachment, all for the Ryzen 2400G. 
The as-distributed kernel already incorporates your IOMMU RFCv3 patch.


A. As-distributed kernel (cold boot)
   >5 retries, so no IOMMU read/write capability, no amd_iommu events.

B. As-distributed kernel (warm boot)
   <5 retries, amd_iommu running stats show large numbers as before.

C. Revert+Update kernel
   amd_iommu events listed and also show large hit/miss numbers.
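Results A and B suggest the pre-initialization write/read-back check is sensitive to timing: on a cold boot it exceeds 5 retries and the counters are declared unusable, while on a warm boot it succeeds within 5. A hypothetical model of such a bounded retry follows; the thread does not show the RFCv3 code, so `struct gated_pmc`, `write_readback_ok()`, `pmc_probe()` and the "usable after N accesses" behaviour are illustrative assumptions only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical counter that drops writes and reads back 0 until it has been
 * accessed `ready_after` times, standing in for an unknown power-up delay.
 */
struct gated_pmc {
	unsigned pokes;
	unsigned ready_after;
	uint64_t value;
};

/* One write/read-back attempt with a test pattern. */
static bool write_readback_ok(struct gated_pmc *c, uint64_t pattern)
{
	/* A zero pattern would "pass" even on a counter that reads back 0. */
	assert(pattern != 0);
	bool powered = c->pokes++ >= c->ready_after;

	if (powered)
		c->value = pattern;
	return (powered ? c->value : 0) == pattern;
}

/* One initial attempt plus up to max_retries retries. */
static bool pmc_probe(struct gated_pmc *c, unsigned max_retries)
{
	for (unsigned i = 0; i <= max_retries; i++) {
		if (write_readback_ok(c, 0x1234))
			return true;
	}
	return false;	/* the PMU would then not be registered */
}
```

A counter that becomes usable on the 4th access passes `pmc_probe(&c, 5)`; one needing more than 6 accesses fails it, which resembles the cold-boot "no amd_iommu events" result only in outline.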

In due course, I'll load the new (revert+update) kernel on the 4700U but
won't overload your mail-box unless something unusual turns up.


Best regards,

--
David
A. As Supplied - Cold Boot

$ sudo dmesg | grep IOMMU
[0.710610] pci :00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
[0.714365] pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
[0.984616] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel 

$ sudo perf list | grep iommu
  intel_iommu:bounce_map_sg  [Tracepoint event]
  intel_iommu:bounce_map_single  [Tracepoint event]
  intel_iommu:bounce_unmap_single [Tracepoint event]
  intel_iommu:map_sg [Tracepoint event]
  intel_iommu:map_single [Tracepoint event]
  intel_iommu:unmap_sg   [Tracepoint event]
  intel_iommu:unmap_single   [Tracepoint event]
  iommu:add_device_to_group  [Tracepoint event]
  iommu:attach_device_to_domain  [Tracepoint event]
  iommu:detach_device_from_domain [Tracepoint event]
  iommu:io_page_fault [Tracepoint event]
  iommu:map  [Tracepoint event]
  iommu:remove_device_from_group [Tracepoint event]
  iommu:unmap [Tracepoint event]

No amd_iommu events listed.


B. As Supplied - Warm Boot

$ sudo dmesg | grep IOMMU
[0.515523] pci :00:00.2: AMD-Vi: IOMMU performance counters supported
[0.519236] pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
[0.520549] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[0.795781] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel 

$ sudo perf stat -e 'amd_iommu_0/cmd_processed/, 
amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, 
amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, 
amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, 
amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, 
amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, 
amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, 
amd_iommu_0/mem_target_abort/, amd_iommu_0/mem_trans_total/, 
amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, 
amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, 
amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, 
amd_iommu_0/vapic_int_non_guest/' sleep 10

Performance counter stats for 'system wide':

                  0   amd_iommu_0/cmd_processed/          (33.39%)
                  0   amd_iommu_0/cmd_processed_inv/      (33.06%)
                  0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/   (33.58%)
842,245,888,947,849   amd_iommu_0/int_dte_hit/            (33.42%)
848,982,159,636,118   amd_iommu_0/int_dte_mis/            (33.15%)
835,698,854,752,581   amd_iommu_0/mem_dte_hit/            (33.68%)
839,060,819,932,270   amd_iommu_0/mem_dte_mis/            (33.55%)
                  0   amd_iommu_0/mem_iommu_tlb_pde_hit/  (33.44%)
837,231,240,047,576   amd_iommu_0/mem_iommu_tlb_pde_mis/  (33.62%)
842,688,371,629,123   amd_iommu_0/mem_iommu_tlb_pte_hit/  (33.40%)
851,647,568,857,291   amd_iommu_0/mem_iommu_tlb_pte_mis/  (33.05%)
                  0   amd_iommu_0/mem_pass_excl/          (33.30%)
                  0   amd_iommu_0/mem_pass_pretrans/      (33.36%)
852,801,037,224,491   amd_iommu_0/mem_pass_untrans/       (33.01%)
                  0   amd_iommu_0/mem_target_abort/       (33.50%)
             46,371   amd_iommu_0/mem_trans_total/        (33.28%)
                  0   amd_iommu_0/page_tbl_read_gst/      (33.00%)
              1,663   amd_iommu_0/page_tbl_read_nst/      (33.55%)
                 17   amd_iommu_0/page_tbl_read_tot/      (33.38%)
                  0   amd_iommu_0/smi_blk/                (33.28%)
                  0   amd_iommu_0/smi_recv/               (33.49%)
                  0   amd_iommu_0/tlb_inv/                (33.32%)
                  0   amd_iommu_0/vapic_int_guest/        (32.96%)
318   

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-13 Thread David Coe

Hi Suravee!

Just in case (!), I've run your revert+update patch on kernel 5.11.0-13,
Ubuntu 21.04β, running on an AMD FX-8350 (pre-Zen and pre-IOMMUv2). As with
the AMD Ryzen 2400G and 4700U, I'm finding no obvious issues.



$ sudo dmesg | grep IOMMU
[0.948890] pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
[4.393773] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel 
[4.393776] AMD-Vi: AMD IOMMUv2 functionality not available on this system



$ systool -m kvm_amd -v
Module = "kvm_amd"

  Attributes:
coresize= "114688"
initsize= "0"
initstate   = "live"
refcnt  = "0"
srcversion  = "4371BA17A41823101F90761"
taint   = ""
uevent  = 

  Parameters:
avic= "0"
dump_invalid_vmcb   = "N"
nested  = "1"
npt = "1"
nrips   = "1"
pause_filter_count_grow= "2"
pause_filter_count_max= "65535"
pause_filter_count_shrink= "0"
pause_filter_count  = "3000"
pause_filter_thresh = "128"
sev_es  = "0"
sev = "0"
vgif= "0"
vls = "0"

  Sections:


$ compgen -G "/sys/kernel/iommu_groups/*/devices/*"
/sys/kernel/iommu_groups/9/devices/:00:14.2
/sys/kernel/iommu_groups/0/devices/:00:00.0
/sys/kernel/iommu_groups/10/devices/:00:14.3
/sys/kernel/iommu_groups/2/devices/:00:04.0
/sys/kernel/iommu_groups/12/devices/:00:14.5
/sys/kernel/iommu_groups/4/devices/:00:0d.0
/sys/kernel/iommu_groups/14/devices/:00:16.0
/sys/kernel/iommu_groups/14/devices/:00:16.2
/sys/kernel/iommu_groups/6/devices/:00:12.0
/sys/kernel/iommu_groups/6/devices/:00:12.2
/sys/kernel/iommu_groups/16/devices/:02:00.0
/sys/kernel/iommu_groups/8/devices/:00:14.0
/sys/kernel/iommu_groups/1/devices/:00:02.0
/sys/kernel/iommu_groups/11/devices/:00:14.4
/sys/kernel/iommu_groups/3/devices/:00:0b.0
/sys/kernel/iommu_groups/13/devices/:00:15.3
/sys/kernel/iommu_groups/13/devices/:00:15.0
/sys/kernel/iommu_groups/13/devices/:06:00.0
/sys/kernel/iommu_groups/13/devices/:00:15.2
/sys/kernel/iommu_groups/13/devices/:07:00.0
/sys/kernel/iommu_groups/13/devices/:08:00.0
/sys/kernel/iommu_groups/13/devices/:00:15.1
/sys/kernel/iommu_groups/13/devices/:09:00.0
/sys/kernel/iommu_groups/5/devices/:00:11.0
/sys/kernel/iommu_groups/15/devices/:01:00.1
/sys/kernel/iommu_groups/15/devices/:01:00.0
/sys/kernel/iommu_groups/7/devices/:00:13.0
/sys/kernel/iommu_groups/7/devices/:00:13.2
/sys/kernel/iommu_groups/17/devices/:04:00.0


$ sudo kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used


$ perf list | grep iommu
No amd_iommu events

Best regards and many thanks.

--
David Coe


Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-10 Thread David Coe
   (33.28%)
 0   amd_iommu_0/smi_recv/ (33.26%)
 0   amd_iommu_0/tlb_inv/  (33.23%)
 0   amd_iommu_0/vapic_int_guest/  (33.24%)
   366   amd_iommu_0/vapic_int_non_guest/  (33.27%)

The immediately obvious difference is the enormous count seen for
mem_dte_mis on the older Ryzen 2400G. I'll do some RTFM, but does anyone
have comments or insight?


841,689,151,202,939   amd_iommu_0/mem_dte_mis/  (33.44%)
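One possible reading of this value, offered as an assumption rather than a diagnosis from the thread: if each update is computed as the difference between the current and previous readings of a 48-bit counter (mod 2^48), a counter that reads back 0 because it is power-gated yields a "negative" delta that wraps to almost 2^48, and perf stat then scales it by about 100/33.44. That gives roughly 8.42 x 10^14, the same magnitude as the count above. The 48-bit width and the helper names `counter_delta()` and `scaled_estimate()` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define COUNTER_BITS 48	/* assumed counter width */
#define COUNTER_MASK ((UINT64_C(1) << COUNTER_BITS) - 1)

/* Difference between two counter readings, modulo 2^COUNTER_BITS. */
static uint64_t counter_delta(uint64_t prev, uint64_t now)
{
	return (now - prev) & COUNTER_MASK;
}

/* perf stat's scaled estimate given the printed "% running" column. */
static double scaled_estimate(uint64_t raw, double pct_running)
{
	assert(pct_running > 0.0 && pct_running <= 100.0);
	return (double)raw * 100.0 / pct_running;
}
```

For example, a previous reading of 1,000 followed by a gated read of 0 gives `counter_delta(1000, 0)` = 2^48 - 1000, and scaling by 100/33.44 gives about 8.417 x 10^14; the small gap to the reported count is within the rounding of the printed percentage.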

Otherwise, all seems to be running smoothly (especially for a distribution
still in β). Bravo and many thanks all!


--
David Coe

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-10 Thread David Coe
         0   amd_iommu_0/mem_pass_pretrans/     (33.28%)
    16,504   amd_iommu_0/mem_pass_untrans/      (33.28%)
         0   amd_iommu_0/mem_target_abort/      (33.28%)
     2,842   amd_iommu_0/mem_trans_total/       (33.28%)
         0   amd_iommu_0/page_tbl_read_gst/     (33.28%)
       111   amd_iommu_0/page_tbl_read_nst/     (33.29%)
       111   amd_iommu_0/page_tbl_read_tot/     (33.28%)
         0   amd_iommu_0/smi_blk/               (33.28%)
         0   amd_iommu_0/smi_recv/              (33.29%)
         0   amd_iommu_0/tlb_inv/               (33.28%)
         0   amd_iommu_0/vapic_int_guest/       (33.28%)
       345   amd_iommu_0/vapic_int_non_guest/   (33.29%)

   10.000799128 seconds time elapsed
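
The percentages in parentheses come from perf's event multiplexing. More amd_iommu events are requested than there are hardware counters, so each event occupies a counter for only part of the run, here about a third of the 10 seconds. perf stat then extrapolates the printed count from the event's enabled and running times. This is a minimal sketch of that arithmetic; running_percent() and scaled_count() are hypothetical helper names.

```c
#include <stdint.h>

/* Share of the enabled time the event actually spent on a hardware
 * counter; perf stat prints this in parentheses, e.g. "(33.28%)". */
static double running_percent(uint64_t time_enabled, uint64_t time_running)
{
    if (time_enabled == 0)
        return 0.0;
    return 100.0 * (double)time_running / (double)time_enabled;
}

/* Extrapolated count: raw * time_enabled / time_running. */
static uint64_t scaled_count(uint64_t raw, uint64_t time_enabled,
                             uint64_t time_running)
{
    if (time_running == 0)
        return 0;       /* never scheduled; perf stat shows <not counted> */
    if (time_running >= time_enabled)
        return raw;     /* counted the whole time: nothing to scale */
    return (uint64_t)((double)raw * (double)time_enabled /
                      (double)time_running);
}
```

For example, suppose an event collected a raw count of 37 while running for one third of the enabled time. It would be reported as 111 at about 33.33%.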

Results for Ryzen 7 4700U to follow.

--
David Coe

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread David Coe

On 09/04/2021 09:58, Suravee Suthikulpanit wrote:

In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. As a result, reading a counter always returned zero.

At the time, the suggested workaround was to add test logic, run before
initializing the PMC feature, that checks whether the counters can be
programmed and read back with the same value. This worked fine until more
recent desktop/mobile platforms started enabling power gating for the PMC,
which prevents access to the counters. As a result, PMC support is
disabled unnecessarily.
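
To make the problem concrete, here is a minimal C model of the removed pre-initialization test. It runs against a counter that ignores writes and reads back zero when power-gated, which is the behaviour described in the power-gating patch for csource == 0. struct fake_counter and its helpers are hypothetical stand-ins for iommu_pc_get_set_reg(). Only the sequence of saving the value, writing 0xabcd, reading it back and restoring is taken from the code this patch removes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one counter register. */
struct fake_counter {
    bool power_gated;   /* gated: writes dropped, reads return 0 */
    uint64_t value;
};

static void fake_write(struct fake_counter *c, uint64_t v)
{
    if (!c->power_gated)
        c->value = v;
}

static uint64_t fake_read(const struct fake_counter *c)
{
    return c->power_gated ? 0 : c->value;
}

/* The old probe reduced to its logic: save, write a pattern, read it
 * back, restore. Returns true if the counter "looks usable". */
static bool pc_writable(struct fake_counter *c)
{
    uint64_t save = fake_read(c);
    uint64_t val = 0xabcd, val2;

    fake_write(c, val);
    val2 = fake_read(c);
    fake_write(c, save);        /* restore the original value */
    return val == val2;
}
```

A power-gated counter fails in exactly the same way as the 2013 hardware bug, because its reads always return zero. This is why the probe cannot tell a working but gated counter from a broken one.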

Unfortunately, there is no documentation of which hardware generation
fixed the original PMC HW bug, although it was fixed soon after the PMC
was first introduced. Based on this, we assume that the buggy platforms
are unlikely to still be in use, and that it should be relatively safe
to remove this legacy logic.


Thanks for explaining the 'context', Suravee.


Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd/init.c | 24 +---
  1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 648cdfd03074..247cdda5d683 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table)
 	return 0;
 }
 
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-				u8 fxn, u64 *value, bool is_write);
-
 static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
+	u64 val;
 	struct pci_dev *pdev = iommu->dev;
-	u64 val = 0xabcd, val2 = 0, save_reg = 0;
 
 	if (!iommu_feature(iommu, FEATURE_PC))
 		return;
 
 	amd_iommu_pc_present = true;
 
-	/* save the value to restore, if writable */
-	if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
-		goto pc_false;
-
-	/* Check if the performance counters can be written to */
-	if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
-	    (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
-	    (val != val2))
-		goto pc_false;
-
-	/* restore */
-	if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
-		goto pc_false;
-
 	pci_info(pdev, "IOMMU performance counters supported\n");
 
 	val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
@@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 	iommu->max_counters = (u8) ((val >> 7) & 0xf);
 
 	return;
-
-pc_false:
-	pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
-	amd_iommu_pc_present = false;
-	return;
 }
 
 static ssize_t amd_iommu_show_cap(struct device *dev,




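With the probe gone, the only hardware access left in init_iommu_perf_ctr() is reading MMIO_CNTR_CONF_OFFSET. The counter count is then taken from bits [10:7], as in the max_counters line the patch keeps. The following is a stand-alone version of that bit extraction. max_counters_from_conf() is a hypothetical name, and only the shift and mask come from the diff.

```c
#include <stdint.h>

/* Mirrors the kept line:
 *   iommu->max_counters = (u8) ((val >> 7) & 0xf);
 * i.e. bits [10:7] of the value read from MMIO_CNTR_CONF_OFFSET. */
static uint8_t max_counters_from_conf(uint32_t conf)
{
    return (uint8_t)((conf >> 7) & 0xf);
}
```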
I'll test your revert and updated IOMMU patch on my Ryzen 2400G and 4700U,
most likely over the weekend. Very interesting!


Please be aware that your original IOMMU patch has already made it into the
imminent Ubuntu 21.04 (Hirsute) release. I've taken the liberty of
adding Alex Hung (lead kernel developer at Ubuntu) to the circulation list.
