** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Oracular)
   Importance: High
       Status: New

** Changed in: ubuntu-power-systems
       Status: New => Triaged

** Changed in: linux (Ubuntu Noble)
       Status: New => Triaged

** Changed in: linux (Ubuntu Oracular)
       Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2070253

Title:
  KVM on PowerVM: L2 Guest-Aggressively entering CEDE results in low
  performance. Possible tuning opportunity.

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Noble:
  Triaged
Status in linux source package in Oracular:
  Fix Committed

Bug description:
  KVM on PowerVM: L2 Guest-Aggressively entering CEDE results in low
  performance. Possible tuning opportunity.

  ---uname output---
  Linux rhel86edb1 #1 SMP Sun Jan 21 11:45:44 EST 2024 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Steps to Reproduce---
  Example: run READ only Test using EDB-PGBENCH and DT7 workloads on
   1. L1-Host 
   2. L2-Guest CEDE ON
   3. L2-Guest CEDE OFF

  A significant performance drop is observed with L2-Guest CEDE on vs
  L2-Guest CEDE off.

  Note: The host and guest configurations used for the performance
  experiments are listed below.

  Location of EDB-PGBENCH: 
  #wget http://ci-http-results.aus.stglabs.ibm.com/perfTest/scripts/Bug_Scripts/pgbench_install.sh
  #chmod 777 pgbench_install.sh
  #./pgbench_install.sh -->> it will install EDB (pgbench) and run EDB on the target LPAR.

  Location of DT7 workload:

  #wget http://ci-http-results.aus.stglabs.ibm.com/perfTest/scripts/Bug_Scripts/DT7-Install.sh
  #chmod 777 DT7-Install.sh
  #./DT7-Install.sh -->> It will install DT7.

  Sample commands: once installation has succeeded, run the commands
  below on the target LPAR.

  EDB-PGBENCH Commands :

  # su - enterprisedb
  # vi t1.tc -->> copy below lines to t1.tc file . 

  ##########t1.tc##########
  runname=select
  SCALE=100
  runtime=300
  thread="40"
  smtlist="8"
  mode=select
  recreateinstance=yes
  recreateduringrun=yes
  warmup=no
  perf_stat=yes
  PGSQL=/usr/local/pgsql/bin
  #PGSQL=/usr/edb/as14/bin
  #PGPORT=5432
  cores=5
  ##########t1.tc##########

  #cp t1.tc tc/
  #./auto-run-test.sh

  DT7 Commands :

  After installing DT7, run the command below:
  #cd /root
  #./DayTrader7_Run.sh -u 20 -l 900 -i 2  

  ######################################################################
  Machine Type: Power 10  LPAR (RHEL9.3)
  gcc           : 11.4.1
  Memory        : 300GB
  Test type     : pgbench-edb, DT7
  ######################################################################
  KVM Host lscpu output : 

  # lscpu
  Architecture:            ppc64le
    Byte Order:            Little Endian
  CPU(s):                  96
    On-line CPU(s) list:   0-39
    Off-line CPU(s) list:  40-95
  Model name:              POWER10 (architected), altivec supported
    Model:                 2.0 (pvr 0080 0200)
    Thread(s) per core:    8
    Core(s) per socket:    5
    Socket(s):             1
    Physical sockets:      1
    Physical chips:        4
    Physical cores/chip:   12
  Virtualization features:
    Hypervisor vendor:     pHyp
    Virtualization type:   para
  Caches (sum of all):
    L1d:                   320 KiB (10 instances)
    L1i:                   480 KiB (10 instances)
    L2:                    10 MiB (10 instances)
    L3:                    40 MiB (10 instances)
  NUMA:
    NUMA node(s):          1
    NUMA node2 CPU(s):     0-39
  Vulnerabilities:
    Gather data sampling:  Not affected
    Itlb multihit:         Not affected
    L1tf:                  Not affected
    Mds:                   Not affected
    Meltdown:              Not affected
    Mmio stale data:       Not affected
    Retbleed:              Not affected
    Spec rstack overflow:  Not affected
    Spec store bypass:     Not affected
    Spectre v1:            Vulnerable, ori31 speculation barrier enabled
    Spectre v2:            Vulnerable
    Srbds:                 Not affected
    Tsx async abort:       Not affected

  
  ##############################################

  KVM on PowerVM setup:

  KVM (Kernel-based Virtual Machine) is a virtualization module for Linux
  that allows the kernel to function as a hypervisor.

  We used a P10 2S4U system for this experiment.

  Workloads: DT7 and PGBENCH in detail:

  DT7 is an open source benchmark application emulating an online stock
  trading system.
  DT7 consists of 3 components:
  1) JMeter
  2) WAS (WebSphere Application Server)
  3) DB2

  The DayTrader benchmark application is deployed on WAS and uses DB2 as
  its backing database. JMeter generates the requests and interacts with
  WAS, which acts as the middleware.

  PGBENCH : 
  pgbench is a simple program for running benchmark tests on PostgreSQL. It
  runs the same sequence of SQL commands over and over, possibly in multiple
  concurrent database sessions, and then calculates the average transaction
  rate (transactions per second).

  Config of KVM Host and L2-Guest:

  KVM Host Config : 
  # uname -a
  Linux  #1 SMP Sun Jan 21 11:45:44 EST 2024 ppc64le ppc64le ppc64le GNU/Linux
  # numactl -H
  available: 1 nodes (1)
  node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
  node 1 size: 292860 MB
  node 1 free: 290979 MB
  node distances:
  node   1
    1:  10
  # cat /proc/cmdline
  BOOT_IMAGE=(ieee1275//pci@800000020000021/pci1014\\,683@0/namespace@1,msdos2)/vmlinuz-6.7.0-nested.1.1a946fcde971.up.ibm.el9.ppc64le root=/dev/mapper/rhel_rhel86edb-root ro crashkernel=2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G rd.lvm.lv=rhel_rhel86edb/root rd.lvm.lv=rhel_rhel86edb/swap biosdevname=0 mitigations=off doorbell=off
  # ppc64_cpu --dscr
  DSCR is 23
  # cpupower idle-info
  CPUidle driver: pseries_idle
  CPUidle governor: menu
  analyzing CPU 0:

  Number of idle states: 2
  Available idle states: snooze CEDE
  snooze:
  Flags/Description: snooze
  Latency: 0
  Usage: 2656
  Duration: 297483
  CEDE:
  Flags/Description: CEDE
  Latency: 12
  Usage: 159981
  Duration: 95235883853
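
  The counters printed by cpupower idle-info above can also be read
  directly from sysfs, which may make it easier to compare a CEDE-on run
  with a CEDE-off run. The following is a minimal Python sketch, assuming
  the standard Linux cpuidle sysfs layout
  (/sys/devices/system/cpu/cpuN/cpuidle/stateM/ with name, latency, usage
  and time files, where time is in microseconds). The function names are
  illustrative and are not part of the original report.

```python
from pathlib import Path


def read_idle_states(cpu_dir):
    """Read per-state cpuidle counters found under <cpu_dir>/cpuidle/stateN.

    cpu_dir is assumed to look like /sys/devices/system/cpu/cpu0.
    Returns one dict per idle state, ordered by state directory name.
    """
    states = []
    for state_dir in sorted(Path(cpu_dir, "cpuidle").glob("state[0-9]*")):
        def field(name):
            # Each sysfs attribute is a small text file holding one value.
            return (state_dir / name).read_text().strip()

        states.append({
            "name": field("name"),
            "latency_us": int(field("latency")),
            "usage": int(field("usage")),    # number of times the state was entered
            "time_us": int(field("time")),   # total time spent in the state
        })
    return states


def mean_residency_us(state):
    """Average time spent in the state per entry, in microseconds."""
    if state["usage"] == 0:
        return 0.0
    return state["time_us"] / state["usage"]
```

  For example, read_idle_states("/sys/devices/system/cpu/cpu0") inside the
  L2 guest, sampled before and after a benchmark run, would show how often
  CEDE was entered during the run and for how long on average.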

  # qemu-system-ppc64 --version
  QEMU emulator version 7.1.0
  Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

  #Libvirt version : libvirt-8.7.0


  L2 GUEST CONFIG :

  CPU's : UN-pinned

  # cat /proc/cmdline
  BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.7.0-nested.1.1a946fcde971.up.ibm.el9.ppc64le root=/dev/mapper/rhel-root ro crashkernel=2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap mitigations=off doorbell=off
  # ppc64_cpu --dscr
  DSCR is 23
  # numactl -H
  available: 1 nodes (0)
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
  node 0 size: 106739 MB
  node 0 free: 105211 MB
  node distances:
  node   0
    0:  10

  We ran the DT7 and PGBENCH read-only tests on the L2-Guest with CEDE on
  vs off. We saw a degradation with CEDE on compared with CEDE off.

  The DT7 and EDB-PGBENCH results are given below.

  L2-GUEST 5Cores with CEDE on:

  1) EDB-PGBENCH Data : 
  + /usr/local/pgsql/bin/pgbench -n -S -T 120 -c 40 -j 40 pgbench
  pgbench (14.5)
  transaction type: <builtin: select only>
  scaling factor: 100
  query mode: simple
  number of clients: 40
  number of threads: 40
  duration: 120 s
  number of transactions actually processed: 21811958
  latency average = 0.220 ms
  initial connection time = 16.004 ms
  tps = 181761.468180 (without initial connection time)

  
  2) DT7 Data: 
  DayTrader7 Report

   Run Group ID=0
   Run ID=40
   Run Description=Test Run
   Host=127.0.0.1                 Users=40           Run_time=900

   Total Instances                 2
   Total Throughputs               2340.6

  L2-GUEST 5Cores with CEDE Off:

  1) EDB-PGBENCH  Data : 
  + /usr/local/pgsql/bin/pgbench -n -S -T 120 -c 40 -j 40 pgbench
  pgbench (14.5)
  transaction type: <builtin: select only>
  scaling factor: 100
  query mode: simple
  number of clients: 40
  number of threads: 40
  duration: 120 s
  number of transactions actually processed: 37804765
  latency average = 0.127 ms
  initial connection time = 5.910 ms
  tps = 315015.313022 (without initial connection time)

  2) DT7 Results: 
  
==================================================================================
   DayTrader7 Report

   Run Group ID=0
   Run ID=41
   Run Description=Test Run
   Host=127.0.0.1                 Users=40           Run_time=900

   Total Instances                 2

   Total Throughputs               3569.6
  
===================================================================================

  EDB-PGBENCH Performance Summary:

  CEDE ON  EDB-PGBENCH  Data : 181761.46818 tps 
  CEDE OFF EDB-PGBENCH  Data : 315015.31302 tps  

  Percentage drop: (315015.31302 - 181761.46818) * 100 / 315015.31302 = 42.3%
  With CEDE turned ON, the guest under-performed by about 42% vs CEDE turned OFF.
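
  As a sanity check on the two pgbench runs, the reported average latency
  appears consistent with the client count divided by the throughput. The
  Python sketch below recomputes it from the figures in this report (40
  clients and the two tps values); the helper name is illustrative, and
  the relation is an assumption about how the numbers fit together rather
  than a statement of how pgbench computes them internally.

```python
def latency_ms(clients, tps):
    """Average per-transaction latency implied by client count and throughput."""
    return clients / tps * 1000.0


# tps values copied from the pgbench output in this report.
runs = {
    "CEDE on": 181761.468180,
    "CEDE off": 315015.313022,
}

for label, tps in runs.items():
    print(f"{label}: {latency_ms(40, tps):.3f} ms")
```

  The values printed match the "latency average" lines in the pgbench
  output (0.220 ms and 0.127 ms), so the throughput and latency figures
  describe the same slowdown.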

  DT7 Performance Summary:

  CEDE ON  DT7  Data : 2340.6 tps 
  CEDE OFF DT7  Data : 3569.6 tps  

  Percentage drop: (3569.6 - 2340.6) * 100 / 3569.6 = 34.4%
  With CEDE turned ON, the guest under-performed by about 34% vs CEDE turned OFF.
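
  The two percentage drops can be reproduced from the reported throughput
  figures. A small Python sketch, using the CEDE-off result as the
  baseline (the function name is illustrative only):

```python
def percent_drop(baseline, observed):
    """Relative drop of observed vs baseline, as a positive percentage."""
    return (baseline - observed) * 100.0 / baseline


# (CEDE off, CEDE on) throughput pairs copied from the summaries in this report.
results = {
    "EDB-PGBENCH": (315015.31302, 181761.46818),
    "DT7": (3569.6, 2340.6),
}

for name, (cede_off, cede_on) in results.items():
    print(f"{name}: {percent_drop(cede_off, cede_on):.1f}% drop with CEDE on")
```

  Writing the baseline first keeps the result positive, which matches the
  way the drop is described in the text.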

  From the above data we observe that performance drops when L2-Guest
  CEDE is ON compared with when it is OFF. It is well understood that the
  solution cannot be offered with shared CEDE disabled. However, it would
  be ideal to reduce the aggressiveness of CEDE'ing so that performance
  scales to an acceptable level.

  
  .........................................................................

  
  The patch for this fix has been merged into the upstream kernel via
  commit

  7be6ce7043b4cf293c8826a48fd9f56931cef2cf ("KVM: PPC: Book3S HV
  nestedv2: Cancel pending DEC exception")

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/2070253/+subscriptions

