Baunsgaard commented on PR #1875:
URL: https://github.com/apache/systemds/pull/1875#issuecomment-1667835644

   Tested on the following script:
   
   ```R
   M = rand(rows=1000000, cols = 1000, min = -10, max = 10, seed=10)
   ma = 0
   for (i in 1:100) {
       ma = max(M)
   }
   print(ma)
   ```
   
   The effect of the change is most pronounced on matrices that fit in L3 
cache, and I saw bigger improvements in the new performance suite than on code 
run from SystemDS. If the sizes exceed the L3 cache, there is basically no 
difference from before.
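
To make the L3 remark concrete, here is a rough sketch of the dense footprint of each benchmarked shape. It assumes dense double-precision storage (8 bytes per cell), and `L3_BYTES` is a hypothetical cache size, not the one of the machine used for these measurements.

```python
# Rough dense footprint of each benchmarked matrix: rows * cols * 8 bytes (FP64).
# L3_BYTES is a hypothetical cache size; replace it with the value for your CPU.
L3_BYTES = 32 * 1024**2

# Shapes taken from the benchmark listings in this comment.
SHAPES = [(1_000, 1_000), (10_000, 1_000), (1_000_000, 1_000), (2_000_000, 1_000)]


def dense_bytes(rows, cols, bytes_per_cell=8):
    """Bytes needed to hold a dense rows x cols matrix."""
    return rows * cols * bytes_per_cell


def fits_in_l3(rows, cols, l3_bytes=L3_BYTES):
    """True if the dense matrix is no larger than the assumed L3 size."""
    return dense_bytes(rows, cols) <= l3_bytes


for rows, cols in SHAPES:
    size_mb = dense_bytes(rows, cols) / 1e6
    print(f"{rows:>9} x {cols}: {size_mb:>9.1f} MB, fits in L3: {fits_in_l3(rows, cols)}")
```

Under these assumptions only the 1000 x 1000 case (8 MB) fits in cache; the larger shapes have to stream from main memory.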
   
   Before:
   ```txt
   # rows=1000, cols = 1000
    1  uamax         41.299  100000
      409,431,847,233      cycles                    #    1.387 GHz             
         (30.77%)
      523,227,969,934      instructions              #    1.28  insn per cycle  
         (38.50%)
    1  uamax         47.279  100000
      413,383,947,826      cycles                    #    1.241 GHz             
         (30.92%)
      524,182,015,584      instructions              #    1.27  insn per cycle  
         (38.63%)
   
   # rows=10000, cols = 1000
    1  uamax         25.585  10000
    1  uamax         28.182  10000
    1  uamax         26.694  10000
    1  uamax         24.848  10000
    1  uamax         24.011  10000
    1  uamax         23.784  10000
   
   
           359,247.70 msec task-clock                #   13.240 CPUs utilized   
       
              841,317      context-switches          #    2.342 K/sec           
       
               28,284      cpu-migrations            #   78.731 /sec            
       
              206,232      page-faults               #  574.066 /sec            
       
    1,108,599,198,429      cycles                    #    3.086 GHz             
         (30.77%)
      502,205,870,095      instructions              #    0.45  insn per cycle  
         (38.47%)
   
           392,097.52 msec task-clock                #   13.138 CPUs utilized   
       
              778,068      context-switches          #    1.984 K/sec           
       
               20,535      cpu-migrations            #   52.372 /sec            
       
              205,014      page-faults               #  522.865 /sec            
       
      943,647,244,159      cycles                    #    2.407 GHz             
         (30.81%)
      500,275,305,729      instructions              #    0.53  insn per cycle  
         (38.53%)
   
           374,457.71 msec task-clock                #   13.278 CPUs utilized   
       
              782,769      context-switches          #    2.090 K/sec           
       
               10,443      cpu-migrations            #   27.888 /sec            
       
              210,791      page-faults               #  562.923 /sec            
       
      884,493,417,646      cycles                    #    2.362 GHz             
         (30.80%)
      501,389,926,269      instructions              #    0.57  insn per cycle  
         (38.49%)
   
   
           352,095.39 msec task-clock                #   13.366 CPUs utilized   
       
              833,655      context-switches          #    2.368 K/sec           
       
               16,691      cpu-migrations            #   47.405 /sec            
       
              205,433      page-faults               #  583.458 /sec            
       
    1,100,798,737,086      cycles                    #    3.126 GHz             
         (30.78%)
      501,248,342,415      instructions              #    0.46  insn per cycle  
         (38.51%)
   
   
           341,750.56 msec task-clock                #   13.266 CPUs utilized   
       
              812,897      context-switches          #    2.379 K/sec           
       
               10,289      cpu-migrations            #   30.107 /sec            
       
              207,193      page-faults               #  606.270 /sec            
       
    1,094,121,118,291      cycles                    #    3.202 GHz             
         (30.77%)
      552,026,071,398      instructions              #    0.50  insn per cycle  
         (38.47%)
   
           338,876.78 msec task-clock                #   13.324 CPUs utilized   
       
              845,958      context-switches          #    2.496 K/sec           
       
               11,373      cpu-migrations            #   33.561 /sec            
       
              205,277      page-faults               #  605.757 /sec            
       
    1,106,712,339,290      cycles                    #    3.266 GHz             
         (30.77%)
      501,743,153,827      instructions              #    0.45  insn per cycle  
         (38.47%)
   
   # rows=1000000, cols = 1000
    1  uamax         27.449    100
   # rows=2000000, cols = 1000
    1  uamax         54.052    100
    1  uamax         56.211    100
    1  uamax         50.356    100
   
           873,547.67 msec task-clock                #   12.800 CPUs utilized   
       
               69,553      context-switches          #   79.621 /sec            
       
                4,185      cpu-migrations            #    4.791 /sec            
       
            4,144,419      page-faults               #    4.744 K/sec           
       
    2,002,404,626,118      cycles                    #    2.292 GHz             
         (30.79%)
    1,107,027,281,412      instructions              #    0.55  insn per cycle  
         (38.47%)
   
   
           783,457.81 msec task-clock                #   12.735 CPUs utilized   
       
               65,242      context-switches          #   83.274 /sec            
       
                4,064      cpu-migrations            #    5.187 /sec            
       
            4,138,801      page-faults               #    5.283 K/sec           
       
    2,617,122,863,360      cycles                    #    3.340 GHz             
         (30.78%)
    1,106,787,253,056      instructions              #    0.42  insn per cycle  
         (38.48%)
   ```
   
   After:
   ```txt
   # rows=1000, cols = 1000
    1  uamax         43.653  100000
      400,435,008,944      cycles                    #    1.294 GHz             
         (30.87%)
      523,272,433,220      instructions              #    1.31  insn per cycle  
         (38.56%)
    1  uamax         45.917  100000
      392,868,426,026      cycles                    #    1.247 GHz             
         (30.89%)
      522,011,469,228      instructions              #    1.33  insn per cycle  
         (38.51%)
    1  uamax         45.784  100000
      396,903,167,347      cycles                    #    1.254 GHz             
         (30.62%)
      523,673,717,928      instructions              #    1.32  insn per cycle  
         (38.29%)
   
   # rows=10000, cols = 1000
    1  uamax         26.183  10000
    1  uamax         32.466  10000
    1  uamax         24.829  10000
    1  uamax         24.217  10000
    1  uamax         25.171  10000
   
   
           367,292.87 msec task-clock                #   13.210 CPUs utilized   
       
              797,898      context-switches          #    2.172 K/sec           
       
               26,614      cpu-migrations            #   72.460 /sec            
       
              206,226      page-faults               #  561.476 /sec            
       
    1,065,258,998,105      cycles                    #    2.900 GHz             
         (30.75%)
      500,348,412,981      instructions              #    0.47  insn per cycle  
         (38.46%)
   
   
           428,966.27 msec task-clock                #   12.498 CPUs utilized   
       
              797,160      context-switches          #    1.858 K/sec           
       
               45,284      cpu-migrations            #  105.565 /sec            
       
              217,370      page-faults               #  506.730 /sec            
       
      939,041,760,279      cycles                    #    2.189 GHz             
         (30.86%)
      553,393,227,140      instructions              #    0.59  insn per cycle  
         (38.56%)
   
   
           348,975.63 msec task-clock                #   13.216 CPUs utilized   
       
              831,593      context-switches          #    2.383 K/sec           
       
               25,198      cpu-migrations            #   72.206 /sec            
       
              207,501      page-faults               #  594.600 /sec            
       
    1,154,976,623,442      cycles                    #    3.310 GHz             
         (30.79%)
      501,929,998,662      instructions              #    0.43  insn per cycle  
         (38.48%)
   
   
           344,097.75 msec task-clock                #   13.295 CPUs utilized   
       
              830,495      context-switches          #    2.414 K/sec           
       
               19,296      cpu-migrations            #   56.077 /sec            
       
              207,392      page-faults               #  602.712 /sec            
       
    1,131,406,998,006      cycles                    #    3.288 GHz             
         (30.78%)
      500,336,733,495      instructions              #    0.44  insn per cycle  
         (38.48%)
   
   
           355,303.44 msec task-clock                #   13.089 CPUs utilized   
       
              895,437      context-switches          #    2.520 K/sec           
       
               22,520      cpu-migrations            #   63.382 /sec            
       
              216,614      page-faults               #  609.659 /sec            
       
    1,097,246,939,598      cycles                    #    3.088 GHz             
         (30.74%)
      503,138,981,191      instructions              #    0.46  insn per cycle  
         (38.44%)
   
   # rows=1000000, cols = 1000
    1  uamax         25.558    100
   # rows=2000000, cols = 1000
    1  uamax         57.889    100
    1  uamax         50.071    100
    1  uamax         60.730    100
    1  uamax         54.011    100
   
           837,371.27 msec task-clock                #   12.806 CPUs utilized   
       
               69,876      context-switches          #   83.447 /sec            
       
                4,316      cpu-migrations            #    5.154 /sec            
       
            4,139,035      page-faults               #    4.943 K/sec           
       
    2,447,771,277,086      cycles                    #    2.923 GHz             
         (30.78%)
    1,103,928,104,585      instructions              #    0.45  insn per cycle  
         (38.49%)
   ```
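
One way to compare the two listings is to average the `uamax` timings per matrix size. The sketch below copies the timings from the listings and assumes the second column is the accumulated time in seconds, as in SystemDS' heavy-hitter statistics. The helper `relative_change` is a name introduced here for illustration.

```python
from statistics import mean

# uamax timings copied from the "Before" and "After" listings, keyed by (rows, cols).
BEFORE = {
    (1_000, 1_000): [41.299, 47.279],
    (10_000, 1_000): [25.585, 28.182, 26.694, 24.848, 24.011, 23.784],
    (1_000_000, 1_000): [27.449],
    (2_000_000, 1_000): [54.052, 56.211, 50.356],
}
AFTER = {
    (1_000, 1_000): [43.653, 45.917, 45.784],
    (10_000, 1_000): [26.183, 32.466, 24.829, 24.217, 25.171],
    (1_000_000, 1_000): [25.558],
    (2_000_000, 1_000): [57.889, 50.071, 60.730, 54.011],
}


def relative_change(before, after):
    """Return (mean_after - mean_before) / mean_before; negative means faster."""
    b, a = mean(before), mean(after)
    return (a - b) / b


for shape in BEFORE:
    change = relative_change(BEFORE[shape], AFTER[shape])
    print(f"{shape[0]:>9} x {shape[1]}: "
          f"before {mean(BEFORE[shape]):7.3f}  after {mean(AFTER[shape]):7.3f}  "
          f"change {change:+.1%}")
```

With only one to six runs per size, the spread between individual runs is of the same order as the before/after difference, so the averages are a rough indication at best.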
   

