Baunsgaard commented on PR #1875:
URL: https://github.com/apache/systemds/pull/1875#issuecomment-1667835644
I tested this on the following script:
```R
M = rand(rows=1000000, cols = 1000, min = -10, max = 10, seed=10)
ma = 0
for (i in 1:100) {
  ma = max(M)
}
print(ma)
```
The effect of the change is most pronounced on matrices that fit in L3
cache, and I saw bigger improvements in the new performance suite than on code
run from SystemDS. If the sizes exceed the L3 cache, there is basically no
difference from before.
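As a rough way to see which of the benchmarked shapes could fit in L3 at all, here is a minimal sketch that computes the dense footprint of each matrix. It assumes dense double-precision storage (8 bytes per cell) and a hypothetical 32 MiB L3 cache; neither the storage format nor the cache size of the test machine is stated in this comment, so adjust `ASSUMED_L3_BYTES` for your hardware.

```python
# Dense FP64 footprint of the matrix shapes benchmarked in this comment.
BYTES_PER_CELL = 8               # assumption: dense double-precision cells
ASSUMED_L3_BYTES = 32 * 1024**2  # hypothetical L3 size, not stated in the comment

# (rows, cols) pairs taken from the benchmark headers below.
SHAPES = [
    (1_000, 1_000),
    (10_000, 1_000),
    (1_000_000, 1_000),
    (2_000_000, 1_000),
]


def dense_bytes(rows, cols):
    """Bytes needed to store a rows x cols matrix of FP64 values densely."""
    return rows * cols * BYTES_PER_CELL


def fits_in_l3(rows, cols, l3_bytes=ASSUMED_L3_BYTES):
    """True if the dense matrix is no larger than the assumed L3 size."""
    return dense_bytes(rows, cols) <= l3_bytes


for rows, cols in SHAPES:
    size_mib = dense_bytes(rows, cols) / 1024**2
    print(f"{rows} x {cols}: {size_mib:.1f} MiB, "
          f"fits in assumed L3: {fits_in_l3(rows, cols)}")
```

Under these assumptions only the 1000 x 1000 case (about 8 MB) is small enough to be cache-resident; the larger shapes are streamed from main memory.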
Before:
```txt
# rows=1000, cols = 1000
1 uamax 41.299 100000
409,431,847,233 cycles # 1.387 GHz
(30.77%)
523,227,969,934 instructions # 1.28 insn per cycle
(38.50%)
1 uamax 47.279 100000
413,383,947,826 cycles # 1.241 GHz
(30.92%)
524,182,015,584 instructions # 1.27 insn per cycle
(38.63%)
# rows=10000, cols = 1000
1 uamax 25.585 10000
1 uamax 28.182 10000
1 uamax 26.694 10000
1 uamax 24.848 10000
1 uamax 24.011 10000
1 uamax 23.784 10000
359,247.70 msec task-clock # 13.240 CPUs utilized
841,317 context-switches # 2.342 K/sec
28,284 cpu-migrations # 78.731 /sec
206,232 page-faults # 574.066 /sec
1,108,599,198,429 cycles # 3.086 GHz
(30.77%)
502,205,870,095 instructions # 0.45 insn per cycle
(38.47%)
392,097.52 msec task-clock # 13.138 CPUs utilized
778,068 context-switches # 1.984 K/sec
20,535 cpu-migrations # 52.372 /sec
205,014 page-faults # 522.865 /sec
943,647,244,159 cycles # 2.407 GHz
(30.81%)
500,275,305,729 instructions # 0.53 insn per cycle
(38.53%)
374,457.71 msec task-clock # 13.278 CPUs utilized
782,769 context-switches # 2.090 K/sec
10,443 cpu-migrations # 27.888 /sec
210,791 page-faults # 562.923 /sec
884,493,417,646 cycles # 2.362 GHz
(30.80%)
501,389,926,269 instructions # 0.57 insn per cycle
(38.49%)
352,095.39 msec task-clock # 13.366 CPUs utilized
833,655 context-switches # 2.368 K/sec
16,691 cpu-migrations # 47.405 /sec
205,433 page-faults # 583.458 /sec
1,100,798,737,086 cycles # 3.126 GHz
(30.78%)
501,248,342,415 instructions # 0.46 insn per cycle
(38.51%)
341,750.56 msec task-clock # 13.266 CPUs utilized
812,897 context-switches # 2.379 K/sec
10,289 cpu-migrations # 30.107 /sec
207,193 page-faults # 606.270 /sec
1,094,121,118,291 cycles # 3.202 GHz
(30.77%)
552,026,071,398 instructions # 0.50 insn per cycle
(38.47%)
338,876.78 msec task-clock # 13.324 CPUs utilized
845,958 context-switches # 2.496 K/sec
11,373 cpu-migrations # 33.561 /sec
205,277 page-faults # 605.757 /sec
1,106,712,339,290 cycles # 3.266 GHz
(30.77%)
501,743,153,827 instructions # 0.45 insn per cycle
(38.47%)
# rows=1000000, cols = 1000
1 uamax 27.449 100
# rows=2000000, cols = 1000
1 uamax 54.052 100
1 uamax 56.211 100
1 uamax 50.356 100
873,547.67 msec task-clock # 12.800 CPUs utilized
69,553 context-switches # 79.621 /sec
4,185 cpu-migrations # 4.791 /sec
4,144,419 page-faults # 4.744 K/sec
2,002,404,626,118 cycles # 2.292 GHz
(30.79%)
1,107,027,281,412 instructions # 0.55 insn per cycle
(38.47%)
783,457.81 msec task-clock # 12.735 CPUs utilized
65,242 context-switches # 83.274 /sec
4,064 cpu-migrations # 5.187 /sec
4,138,801 page-faults # 5.283 K/sec
2,617,122,863,360 cycles # 3.340 GHz
(30.78%)
1,106,787,253,056 instructions # 0.42 insn per cycle
(38.48%)
```
After:
```txt
# rows=1000, cols = 1000
1 uamax 43.653 100000
400,435,008,944 cycles # 1.294 GHz
(30.87%)
523,272,433,220 instructions # 1.31 insn per cycle
(38.56%)
1 uamax 45.917 100000
392,868,426,026 cycles # 1.247 GHz
(30.89%)
522,011,469,228 instructions # 1.33 insn per cycle
(38.51%)
1 uamax 45.784 100000
396,903,167,347 cycles # 1.254 GHz
(30.62%)
523,673,717,928 instructions # 1.32 insn per cycle
(38.29%)
# rows=10000, cols = 1000
1 uamax 26.183 10000
1 uamax 32.466 10000
1 uamax 24.829 10000
1 uamax 24.217 10000
1 uamax 25.171 10000
367,292.87 msec task-clock # 13.210 CPUs utilized
797,898 context-switches # 2.172 K/sec
26,614 cpu-migrations # 72.460 /sec
206,226 page-faults # 561.476 /sec
1,065,258,998,105 cycles # 2.900 GHz
(30.75%)
500,348,412,981 instructions # 0.47 insn per cycle
(38.46%)
428,966.27 msec task-clock # 12.498 CPUs utilized
797,160 context-switches # 1.858 K/sec
45,284 cpu-migrations # 105.565 /sec
217,370 page-faults # 506.730 /sec
939,041,760,279 cycles # 2.189 GHz
(30.86%)
553,393,227,140 instructions # 0.59 insn per cycle
(38.56%)
348,975.63 msec task-clock # 13.216 CPUs utilized
831,593 context-switches # 2.383 K/sec
25,198 cpu-migrations # 72.206 /sec
207,501 page-faults # 594.600 /sec
1,154,976,623,442 cycles # 3.310 GHz
(30.79%)
501,929,998,662 instructions # 0.43 insn per cycle
(38.48%)
344,097.75 msec task-clock # 13.295 CPUs utilized
830,495 context-switches # 2.414 K/sec
19,296 cpu-migrations # 56.077 /sec
207,392 page-faults # 602.712 /sec
1,131,406,998,006 cycles # 3.288 GHz
(30.78%)
500,336,733,495 instructions # 0.44 insn per cycle
(38.48%)
355,303.44 msec task-clock # 13.089 CPUs utilized
895,437 context-switches # 2.520 K/sec
22,520 cpu-migrations # 63.382 /sec
216,614 page-faults # 609.659 /sec
1,097,246,939,598 cycles # 3.088 GHz
(30.74%)
503,138,981,191 instructions # 0.46 insn per cycle
(38.44%)
# rows=1000000, cols = 1000
1 uamax 25.558 100
# rows=2000000, cols = 1000
1 uamax 57.889 100
1 uamax 50.071 100
1 uamax 60.730 100
1 uamax 54.011 100
837,371.27 msec task-clock # 12.806 CPUs utilized
69,876 context-switches # 83.447 /sec
4,316 cpu-migrations # 5.154 /sec
4,139,035 page-faults # 4.943 K/sec
2,447,771,277,086 cycles # 2.923 GHz
(30.78%)
1,103,928,104,585 instructions # 0.45 insn per cycle
(38.49%)
```
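To compare the two listings without eyeballing individual runs, the `uamax` times can be averaged per matrix shape. The sketch below copies the times from the "Before" and "After" blocks and assumes that the third column of each `uamax` line is the accumulated time in seconds and the fourth the invocation count; that reading of the statistics output is my assumption, and the helper name `relative_change` is only illustrative.

```python
from statistics import mean

# uamax times (assumed to be seconds), copied from the listings above,
# keyed by (rows, cols).
BEFORE = {
    (1_000, 1_000): [41.299, 47.279],
    (10_000, 1_000): [25.585, 28.182, 26.694, 24.848, 24.011, 23.784],
    (1_000_000, 1_000): [27.449],
    (2_000_000, 1_000): [54.052, 56.211, 50.356],
}
AFTER = {
    (1_000, 1_000): [43.653, 45.917, 45.784],
    (10_000, 1_000): [26.183, 32.466, 24.829, 24.217, 25.171],
    (1_000_000, 1_000): [25.558],
    (2_000_000, 1_000): [57.889, 50.071, 60.730, 54.011],
}


def relative_change(before, after):
    """Relative change of the mean time; negative means 'after' is faster."""
    b, a = mean(before), mean(after)
    return (a - b) / b


for shape in BEFORE:
    b, a = mean(BEFORE[shape]), mean(AFTER[shape])
    change = relative_change(BEFORE[shape], AFTER[shape])
    print(f"{shape[0]} x {shape[1]}: before {b:.3f}s, after {a:.3f}s, "
          f"change {change:+.1%}")
```

With so few runs per shape and visible run-to-run variation, the means are only a coarse summary and should not be read as a precise speedup figure.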