In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. With float instead of
double, I get "nan" rather than 0.000000.
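
To avoid reading through all of the printf output, the same loop could check itself. What follows is only a minimal sketch, assuming that the corruption always shows up as a NaN, as a shrinking sum, or as a zero square root of a positive index. The names Anomaly and check_accumulation are hypothetical and are not part of the original test program:

```cpp
#include <cmath>

// Hypothetical helper type (not from the original program): records the
// first point at which the accumulation looks corrupted.
struct Anomaly {
    bool found;     // true if something suspicious was seen
    int i, j;       // loop indices at that point (-1 if nothing was seen)
    double result;  // value at the anomaly, or the final sum if none
};

// Runs one iteration of the dummy loop with element type T (float or double)
// and stops at the first value that cannot occur in a correct execution.
template <typename T>
Anomaly check_accumulation(int dim) {
    T result = 0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            T sq_i = std::sqrt(static_cast<T>(i));
            T sq_j = std::sqrt(static_cast<T>(j));
            T next = result + sq_i * sq_j;
            // Every term is non-negative, so the sum must never shrink, no
            // value may be NaN, and sqrt of a positive index must be positive.
            bool bad = std::isnan(next) || next < result ||
                       (i > 0 && sq_i <= 0) || (j > 0 && sq_j <= 0);
            if (bad) {
                return {true, i, j, static_cast<double>(next)};
            }
            result = next;
        }
    }
    return {false, -1, -1, static_cast<double>(result)};
}
```

Calling check_accumulation<double>(10) and check_accumulation<float>(10) from main and printing only the returned indices should give one line per run, which is easier to compare between SE and FS than the full log.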
Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:
> Dear Jason, all,
>
> I am trying to track down the accuracy problem in RISCV-FS, and I
> observe that (at least in my dummy example) it arises because the
> variables (double) are set to zero at random points in simulated time
> (for this reason I get different results across executions of the same
> code). Specifically, for the following dummy code:
>
>
> #include <cmath>
> #include <stdio.h>
>
> int main(){
>
>     int dim = 10;
>
>     float result;
>
>     for (int iter = 0; iter < 2; iter++){
>         result = 0;
>         for (int i = 0; i < dim; i++){
>             for (int j = 0; j < dim; j++){
>                 float sq_i = sqrt(i);
>                 float sq_j = sqrt(j);
>                 result += sq_i * sq_j;
>                 printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
>                        iter, i , j, sq_i, sq_j, sq_i * sq_j, result);
>             }
>         }
>         printf("Final Result: %lf\n", result);
>     }
> }
>
>
> The correct Final Result in both iterations is 372.721656. However,
> I get the following results in FS:
>
> ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
> 1.000000): 1.000000
> ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
> 1.414214): 2.414214
> ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
> 1.732051): 4.146264
> ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
> 1.414214): 1.414214
> ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
> 2.000000): 3.414214
> ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
> 2.449490): 5.863703
> ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
> 2.828427): 8.692130
> ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
> 3.162278): 11.854408
> ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
> 3.464102): 15.318510
> ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
> 3.741657): 19.060167
> ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
> 4.000000): 23.060167
> ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
> 4.242641): 27.302808
> ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
> 0.000000): 27.302808
> ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
> 1.732051): 29.034859
> ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
> 2.449490): 31.484348
> ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
> 3.000000): 34.484348
> ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
> 3.464102): 37.948450
> ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
> 3.872983): 41.821433
> ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
> 4.242641): 46.064074
> ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
> 4.582576): 50.646650
> ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
> 4.898979): 55.545629
> ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
> 5.196152): 60.741782
> ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
> 0.000000): 60.741782
> ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
> 2.000000): 62.741782
> ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
> 2.828427): 65.570209
> ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
> 3.464102): 69.034310
> ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
> 4.000000): 73.034310
> ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
> 4.472136): 77.506446
> ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
> 4.898979): 82.405426
> ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
> 5.291503): 87.696928
> ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
> 5.656854): 93.353783
> ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
> 6.000000): 99.353783
> ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
> 0.000000): 99.353783
> ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
> 2.236068): 101.589851
> ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
> 3.162278): 104.752128
> ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
> 3.872983): 108.625112
> ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
> 4.472136): 113.097248
> ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
> 5.000000): 118.097248
> ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
> 5.477226): 123.574473
> ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
> 5.916080): 129.490553
> ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
> 6.324555): 135.815108
> ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
> 6.708204): 142.523312
> ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
> 0.000000): 142.523312
> ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
> 2.449490): 144.972802
> ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
> 3.464102): 148.436904
> ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
> 4.242641): 152.679544
> ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
> 4.898979): 157.578524
> ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
> 5.477226): 163.055749
> ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
> 6.000000): 169.055749
> ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
> 6.480741): 175.536490
> ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
> 6.928203): 182.464693
> ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
> 7.348469): 189.813162
> ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
> 0.000000): 189.813162
> ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
> 2.645751): 192.458914
> ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
> 3.741657): 196.200571
> ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
> 4.582576): 200.783147
> ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
> 5.291503): 206.074649
> ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
> 5.916080): 211.990729
> ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
> 6.480741): 218.471470
> ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
> 7.000000): 225.471470
> ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
> 7.483315): 232.954785
> ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
> 7.937254): 240.892039
> ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
> 0.000000): 240.892039
> ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
> 2.828427): 243.720466
> ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
> 4.000000): 247.720466
> ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
> 4.898979): 252.619445
> ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
> 5.656854): 258.276300
> ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
> 6.324555): 264.600855
> ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
> 6.928203): 271.529058
> ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
> 7.483315): 279.012373
> ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
> 8.000000): 287.012373
> ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
> 8.485281): 295.497654
> ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
> 0.000000): 295.497654
> ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
> 3.000000): 298.497654
> ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
> 4.242641): 302.740295
> ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
> 5.196152): 307.936447
> ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
> 6.000000): 313.936447
> ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
> 6.708204): 320.644651
> ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
> 7.348469): 327.993120
> ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
> 7.937254): 335.930374
> ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
> 8.485281): 344.415656
> ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
> 9.000000): 353.415656
> Final Result: 353.415656
> ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
> 1.000000): 1.000000
> ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
> 1.414214): 2.414214
> ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
> 1.732051): 4.146264
> ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j:
> 2.000000): 6.146264
> ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j:
> 2.236068): 8.382332
> ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j:
> 2.449490): 10.831822
> ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j:
> 2.645751): 13.477573
> ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j:
> 2.828427): 16.306001
> ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j:
> 3.000000): 19.306001
> ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
> 0.000000): 19.306001
> ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
> 1.414214): 20.720214
> ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
> 2.000000): 22.720214
> ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
> 2.449490): 25.169704
> ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
> 2.828427): 27.998131
> ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
> 3.162278): 31.160409
> ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
> 3.464102): 34.624510
> ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
> 3.741657): 38.366168
> ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
> 4.000000): 42.366168
> ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
> 4.242641): 46.608808
> ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
> 0.000000): 46.608808
> ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
> 1.732051): 48.340859
> ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
> 2.449490): 50.790349
> ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
> 3.000000): 53.790349
> ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
> 3.464102): 57.254450
> ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
> 3.872983): 61.127434
> ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
> 4.242641): 65.370075
> ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
> 4.582576): 69.952650
> ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
> 4.898979): 74.851630
> ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
> 5.196152): 80.047782
> ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
> 0.000000): 80.047782
> ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
> 2.000000): 82.047782
> ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
> 2.828427): 84.876209
> ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
> 3.464102): 88.340311
> ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
> 4.000000): 92.340311
> ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
> 4.472136): 96.812447
> ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
> 4.898979): 101.711426
> ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
> 5.291503): 107.002929
> ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
> 5.656854): 112.659783
> ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
> 6.000000): 118.659783
> ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
> 0.000000): 118.659783
> ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
> 2.236068): 120.895851
> ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
> 3.162278): 124.058129
> ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
> 3.872983): 127.931112
> ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
> 4.472136): 132.403248
> ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
> 5.000000): 137.403248
> ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
> 5.477226): 142.880474
> ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
> 5.916080): 148.796553
> ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
> 6.324555): 155.121109
> ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
> 6.708204): 161.829313
> ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
> 0.000000): 161.829313
> ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
> 2.449490): 164.278802
> ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
> 3.464102): 167.742904
> ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
> 4.242641): 171.985545
> ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
> 4.898979): 176.884524
> ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
> 5.477226): 182.361750
> ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
> 6.000000): 188.361750
> ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
> 6.480741): 194.842491
> ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
> 6.928203): 201.770694
> ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
> 7.348469): 209.119163
> ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
> 0.000000): 209.119163
> ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
> 2.645751): 211.764914
> ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
> 3.741657): 215.506572
> ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
> 4.582576): 220.089147
> ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
> 5.291503): 225.380650
> ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
> 5.916080): 231.296730
> ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
> 6.480741): 237.777470
> ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
> 7.000000): 244.777470
> ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
> 7.483315): 252.260785
> ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
> 7.937254): 260.198039
> ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
> 0.000000): 260.198039
> ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
> 2.828427): 263.026466
> ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
> 4.000000): 267.026466
> ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
> 4.898979): 271.925446
> ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
> 5.656854): 277.582300
> ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
> 6.324555): 283.906855
> ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
> 6.928203): 290.835059
> ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
> 7.483315): 298.318373
> ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
> 8.000000): 306.318373
> ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
> 8.485281): 314.803655
> ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
> 0.000000): 314.803655
> ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
> 3.000000): 317.803655
> ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
> 4.242641): 322.046295
> ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
> 5.196152): 327.242448
> ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
> 6.000000): 333.242448
> ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
> 6.708204): 339.950652
> ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
> 7.348469): 347.299121
> ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
> 7.937254): 355.236375
> ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
> 8.485281): 363.721656
> ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
> 9.000000): 372.721656
> Final Result: 372.721656
>
>
>
> As the following iterations show, both sqrt(1) and the accumulated
> result are set to zero for some reason.
>
> ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> 0.000000): 0.000000
>
> Please help me resolve this accuracy issue! I think it would be very
> useful for the gem5 community.
>
> Note that I found the simulated tick at which the application starts
> in FS (using m5 dumpstats) and passed it to --debug-start, but the
> trace file that is generated is 10x larger than in SE mode for the
> same application. How can I compare them?
>
> Thank you in advance!
> Best regards,
> Nikos
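
On the question of comparing the two traces: one possible approach is to ignore the tick column and look for the first line where the remaining text differs. The sketch below assumes that every trace line begins with the tick followed by a colon. The names strip_tick and first_divergence are hypothetical. It also assumes that both traces have already been reduced to the application's own instructions, since the FS trace additionally contains kernel activity; that filtering step is not shown here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumption: each trace line starts with "<tick>: ". Drop that field so
// that two runs which differ only in timing can still be compared.
std::string strip_tick(const std::string& line) {
    std::size_t colon = line.find(':');
    if (colon == std::string::npos) return line;  // no tick field found
    std::size_t start = line.find_first_not_of(' ', colon + 1);
    return start == std::string::npos ? std::string() : line.substr(start);
}

// Returns the index of the first line at which the two traces differ once
// the ticks are removed. Returns -1 if both traces have the same length and
// agree on every line; if one trace is a prefix of the other, returns the
// length of the shorter one.
long first_divergence(const std::vector<std::string>& a,
                      const std::vector<std::string>& b) {
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t k = 0; k < n; k++) {
        if (strip_tick(a[k]) != strip_tick(b[k])) {
            return static_cast<long>(k);
        }
    }
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

The lines could be read from the two trace files with std::getline. Since the difference is expected to come up quickly, only the beginning of each trace should be needed.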
>
> Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:
>
>> Dear Jason,
>>
>> I am trying to use --debug-start, but in FS mode it is very
>> difficult to find the tick at which the application starts!
>>
>> However, I wrote the following very simple C++ program:
>>
>> #include <cmath>
>> #include <stdio.h>
>>
>> int main(){
>>
>>     int dim = 4096;
>>
>>     double result;
>>
>>     for (int iter = 0; iter < 2; iter++){
>>         result = 0;
>>         for (int i = 0; i < dim; i++){
>>             for (int j = 0; j < dim; j++){
>>                 result += sqrt(i) * sqrt(j);
>>             }
>>         }
>>         printf("Result: %lf\n", result); //Result: 30530733453.127449
>>     }
>> }
>>
>> I cross-compile it using:
>> riscv64-linux-gnu-g++ -static -O3 -o test_riscv test_riscv.cpp
>>
>>
>> On X86 (without cross-compilation, of course), QEMU-RISCV, and
>> GEM5-SE the result is the same (30530733453.127449), but in GEM5-FS
>> the result is different! In addition, the result also differs
>> between the 2 iterations.
>>
>> Please reproduce the error if you want, in order to verify my result.
>> How can the issue be resolved?
>>
>> Thank you in advance!
>>
>> Best regards,
>> Nikos
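
A side note on the expected value: since the double sum factorizes as sum_i sum_j sqrt(i)*sqrt(j) = (sum_i sqrt(i))^2, a reference result can be computed with only dim square roots. The following is a minimal sketch of that check. The function names expected_result and looped_result are hypothetical, and the two values agree only up to floating-point rounding, not bit for bit:

```cpp
#include <cmath>

// Reference value via the identity
//   sum_i sum_j sqrt(i) * sqrt(j) = (sum_i sqrt(i))^2,
// which needs dim square roots instead of dim * dim products.
double expected_result(int dim) {
    double s = 0.0;
    for (int i = 0; i < dim; i++) {
        s += std::sqrt(static_cast<double>(i));
    }
    return s * s;
}

// The double loop from the test program, kept for comparison.
double looped_result(int dim) {
    double result = 0.0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            result += std::sqrt(static_cast<double>(i)) *
                      std::sqrt(static_cast<double>(j));
        }
    }
    return result;
}
```

Comparing the two with a relative tolerance inside the simulated program would give a pass/fail answer per iteration, so a small dim may already be enough to show the problem in FS mode.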
>>
>>
>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
>>
>>> Hi Nikos,
>>>
>>> You can use --debug-start to start the debugging after some number of
>>> ticks. Also, I would expect that the difference should come up quickly,
>>> so no need to run the program to the end.
>>>
>>> For the FS mode one, you will want to just start the trace as the
>>> application starts. This could be a bit of a pain.
>>>
>>> I'm not really sure what fundamentally could be different. FS and SE
>>> mode use the exact same code for executing instructions, so I don't
>>> think that's the problem. Have you tried running for smaller inputs or
>>> just one iteration?
>>>
>>> Jason
>>>
>>>
>>>
>>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
>>> ntampourat...@ece.auth.gr> wrote:
>>>
>>>> Dear Bobby,
>>>>
>>>> I am trying to add --debug-flags=Exec (building gem5 as gem5.opt,
>>>> not the gem5.fast which I had), but the debug traces exceed 20GB
>>>> (and the run has not finished yet) for less than 1 simulated second.
>>>> How can I reduce the size of the debug output (or set something more
>>>> specific)?
>>>>
>>>> In contrast, I built the HPCG benchmark with the DHPCG_DEBUG flag. If
>>>> you want, you can compare these two output files
>>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
>>>> see, something goes wrong with the accuracy of calculations in FS mode
>>>> (the benchmark uses double precision). You can find the files here:
>>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/
>>>>
>>>> Best regards,
>>>> Nikos
>>>>
>>>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
>>>>
>>>>> That's quite odd that it works in SE mode but not FS mode!
>>>>>
>>>>> I would suggest running with --debug-flags=Exec for both and then
>>>>> perform a diff to see how they differ.
>>>>>
>>>>> Cheers,
>>>>> Jason
>>>>>
>>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
>>>>> ntampourat...@ece.auth.gr> wrote:
>>>>>
>>>>>> Dear Bobby,
>>>>>>
>>>>>> In QEMU I get the same (correct) results that I get in SE mode
>>>>>> simulation. I get invalid results in FS simulation (in both
>>>>>> riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
>>>>>> hardware at this moment, however, if you want you may execute my
>>>>>> xhpcg binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
>>>>>> following configuration:
>>>>>>
>>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1
>>>>>>
>>>>>> Please let me know if you have any updates!
>>>>>>
>>>>>> Best regards,
>>>>>> Nikos
>>>>>>
>>>>>>
>>>>>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
>>>>>>
>>>>>>> Hi Nikos,
>>>>>>>
>>>>>>> I notice you said the following in your original email:
>>>>>>>
>>>>>>>> In addition, I used the RISCV Ubuntu image
>>>>>>>> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
>>>>>>>> I installed the gcc compiler, compile it (through qemu) and I get
>>>>>>>> wrong results too.
>>>>>>>
>>>>>>>
>>>>>>> Is this saying you get the wrong results in QEMU? If so, the bug is
>>>>>>> in GCC or the HPCG workload, not in gem5. If not, I would test in
>>>>>>> QEMU to make sure the binary works there. Another way you could test
>>>>>>> to see if the problem is your binary or gem5 would be to run it on
>>>>>>> real hardware. We have access to some RISC-V hardware here at UC
>>>>>>> Davis, if you don't have access to it.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jason
>>>>>>>
>>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
>>>>>>> ntampourat...@ece.auth.gr> wrote:
>>>>>>>
>>>>>>>> Dear Bobby,
>>>>>>>>
>>>>>>>> 1) I use the original riscv-fs.py which is provided in the latest
>>>>>>>> gem5 release.
>>>>>>>> I run gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
>>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to download
>>>>>>>> the riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
>>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop
>>>>>>>> riscv-disk-img /mnt), put the xhpcg executable in it, and make the
>>>>>>>> following changes in riscv-fs.py to boot the riscv-disk-img with
>>>>>>>> the executable:
>>>>>>>>
>>>>>>>> image = CustomDiskImageResource(
>>>>>>>>     local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
>>>>>>>> )
>>>>>>>>
>>>>>>>> # Set the Full System workload.
>>>>>>>> board.set_kernel_disk_workload(
>>>>>>>>     kernel=Resource("riscv-bootloader-vmlinux-5.10"),
>>>>>>>>     disk_image=image,
>>>>>>>> )
>>>>>>>>
>>>>>>>> Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
>>>>>>>> I change the last line to "return ["console=ttyS0",
>>>>>>>> "root={root_value}", "rw"]" in order to allow write permissions
>>>>>>>> in the image.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2) After some iterations, the HPCG benchmark checks whether the
>>>>>>>> results are valid. In FS mode it reports invalid results. As far
>>>>>>>> as I can see from the results, one problem (at least) is that it
>>>>>>>> produces different results in each HPCG execution (with the same
>>>>>>>> configuration).
>>>>>>>>
>>>>>>>> Here is the HPCG output and riscv-fs.py
>>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce
>>>>>>>> the results in the video if you use the xhpcg executable
>>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)
>>>>>>>>
>>>>>>>> Please help me solve it!
>>>>>>>>
>>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode
>>>>>>>> too.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Nikos
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
>>>>>>>>
>>>>>>>> > I'm going to need a bit more information to help:
>>>>>>>> >
>>>>>>>> > 1. In what way have you modified
>>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach the
>>>>>>>> > script here?
>>>>>>>> > 2. What error are you getting or in what way are the results
>>>>>>>> > invalid?
>>>>>>>> >
>>>>>>>> > -
>>>>>>>> > Dr. Bobby R. Bruce
>>>>>>>> > Room 3050,
>>>>>>>> > Kemper Hall, UC Davis
>>>>>>>> > Davis,
>>>>>>>> > CA, 95616
>>>>>>>> >
>>>>>>>> > web: https://www.bobbybruce.net
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
>>>>>>>> > ntampourat...@ece.auth.gr> wrote:
>>>>>>>> >
>>>>>>>> >>
>>>>>>>> >> Dear gem5 community,
>>>>>>>> >>
>>>>>>>> >> I have successfully cross-compiled the HPCG benchmark for RISCV
>>>>>>>> >> (serial version, without MPI and OpenMP). While it works properly
>>>>>>>> >> in gem5 SE mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
>>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16
>>>>>>>> >> --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results
>>>>>>>> >> in FS simulation using "./build/RISCV/gem5.fast -d
>>>>>>>> >> ./HPCG_FS_results ./configs/example/gem5_library/riscv-fs.py" (I
>>>>>>>> >> mount the riscv image and put the executable in it).
>>>>>>>> >>
>>>>>>>> >> Can you help me please?
>>>>>>>> >>
>>>>>>>> >> In addition, I used the RISCV Ubuntu image
>>>>>>>> >> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
>>>>>>>> >> installed the gcc compiler, compiled it (through qemu), and I
>>>>>>>> >> get wrong results too.
>>>>>>>> >>
>>>>>>>> >> Here is the Makefile which I use, the hpcg executable for RISCV
>>>>>>>> >> (xhpcg), and a video that shows the results
>>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).
>>>>>>>> >>
>>>>>>>> >> P.S. I use the latest gem5 version.
>>>>>>>> >>
>>>>>>>> >> Thank you in advance! :)
>>>>>>>> >>
>>>>>>>> >> Best regards,
>>>>>>>> >> Nikos
>>>>>>>> >> _______________________________________________
>>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org
>>>>>>>> >> To unsubscribe send an email to gem5-users-le...@gem5.org