Dear Bobby,

Thanks a lot for the effort! I looked into this in detail, and the problem appears only with float and double variables (with int, everything works properly in FS mode). Specifically, float variables are set to "nan", while double variables are set to 0.000000, at seemingly random times (probably caused by some instruction of the simulated OS?). You could use a simple C/C++ example to get some traces before moving on to HPCG.
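
A minimal sketch of such a simple example, assuming the same nested sqrt loop as the dummy code quoted further down. The helper name count_suspicious and the 1e-3 tolerance are hypothetical choices, not something taken from gem5 or HPCG. Instead of printing every iteration, it reports only the iterations where a value looks corrupted (a square root that no longer squares back to its index, or an accumulator that decreases or becomes NaN), which should help narrow down the tick range worth tracing:

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper: runs the sqrt(i) * sqrt(j) accumulation for type T
// and returns how many iterations looked corrupted. Intended for small dim.
template <typename T>
int count_suspicious(int dim) {
    int bad = 0;
    T result = 0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            T sq_i = std::sqrt(static_cast<T>(i));
            T sq_j = std::sqrt(static_cast<T>(j));
            T prev = result;
            result += sq_i * sq_j;
            // Every comparison below is false when a NaN is involved,
            // so a NaN anywhere marks the iteration as suspicious.
            bool ok = std::fabs(sq_i * sq_i - static_cast<T>(i)) < T(1e-3)
                   && std::fabs(sq_j * sq_j - static_cast<T>(j)) < T(1e-3)
                   && result >= prev;  // all terms are non-negative
            if (!ok) {
                bad++;
                std::printf("suspicious at i=%d j=%d: sq_i=%f sq_j=%f result=%f\n",
                            i, j, static_cast<double>(sq_i),
                            static_cast<double>(sq_j),
                            static_cast<double>(result));
            }
        }
    }
    return bad;
}
```

Calling count_suspicious<float>(dim) and count_suspicious<double>(dim) from main, with dim read from argv so that the compiler cannot fold the loop at compile time, should report nothing on a correct platform.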

Thank you in advance!!
Best regards,
Nikos


Quoting Bobby Bruce <bbr...@ucdavis.edu>:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't get
as far as you did. I wanted to find a quick way to recreate the following:
https://gem5-review.googlesource.com/c/public/gem5/+/64211.  Please feel
free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this tomorrow
with some traces and debug flags and see if I can narrow down the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net


On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampourat...@ece.auth.gr> wrote:

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.
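
One reading that would be consistent with both symptoms, offered only as an assumption and not as something established in this thread: on RV64 with the F and D extensions, a single-precision value is stored NaN-boxed in a 64-bit FP register (upper 32 bits all ones), and a register that is not a valid box reads as the canonical NaN when used as a float. An FP register that was zeroed would therefore read as 0.000000 as a double and as "nan" as a float. The host-side model below only illustrates that rule; the function names are hypothetical and it does not touch gem5:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Interpret a 64-bit FP register image as a double.
inline double reg_as_double(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Interpret the same register image as a float under the NaN-boxing rule:
// if the upper 32 bits are not all ones, the value reads as canonical NaN.
inline float reg_as_float(std::uint64_t bits) {
    std::uint32_t low = static_cast<std::uint32_t>(bits);
    if ((bits >> 32) != 0xFFFFFFFFu) {
        low = 0x7FC00000u;  // canonical single-precision quiet NaN
    }
    float f;
    std::memcpy(&f, &low, sizeof f);
    return f;
}

// Build the register image that holds a properly NaN-boxed float.
inline std::uint64_t box_float(float f) {
    std::uint32_t low;
    std::memcpy(&low, &f, sizeof low);
    return 0xFFFFFFFF00000000ull | low;
}
```

With this model, reg_as_double(0) is 0.0 while reg_as_float(0) is NaN, and reg_as_float(box_float(x)) returns x unchanged.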

Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:

> Dear Jason, all,
>
> I am trying to track down the accuracy problem with RISCV-FS, and I
> observe that (at least in my dummy example) it occurs because the
> double variables are set to zero at random points in simulated time
> (which is why I get different results across executions of the same
> code). Specifically, for the following dummy code:
>
>
> #include <cmath>
> #include <stdio.h>
>
> int main(){
>
>     int dim = 10;
>
>     float result;
>
>     for (int iter = 0; iter < 2; iter++){
>         result = 0;
>         for (int i = 0; i < dim; i++){
>             for (int j = 0; j < dim; j++){
>                 float sq_i = sqrt(i);
>                 float sq_j = sqrt(j);
>                 result += sq_i * sq_j;
>                 printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
>                        iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
>             }
>         }
>         printf("Final Result: %lf\n", result);
>     }
> }
>
>
> The correct Final Result in both iterations is 372.721656. However,
> I get the following results in FS:
>
> ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
> 1.000000): 1.000000
> ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
> 1.414214): 2.414214
> ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
> 1.732051): 4.146264
> ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
> 1.414214): 1.414214
> ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
> 2.000000): 3.414214
> ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
> 2.449490): 5.863703
> ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
> 2.828427): 8.692130
> ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
> 3.162278): 11.854408
> ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
> 3.464102): 15.318510
> ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
> 3.741657): 19.060167
> ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
> 4.000000): 23.060167
> ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
> 4.242641): 27.302808
> ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
> 0.000000): 27.302808
> ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
> 1.732051): 29.034859
> ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
> 2.449490): 31.484348
> ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
> 3.000000): 34.484348
> ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
> 3.464102): 37.948450
> ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
> 3.872983): 41.821433
> ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
> 4.242641): 46.064074
> ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
> 4.582576): 50.646650
> ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
> 4.898979): 55.545629
> ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
> 5.196152): 60.741782
> ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
> 0.000000): 60.741782
> ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
> 2.000000): 62.741782
> ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
> 2.828427): 65.570209
> ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
> 3.464102): 69.034310
> ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
> 4.000000): 73.034310
> ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
> 4.472136): 77.506446
> ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
> 4.898979): 82.405426
> ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
> 5.291503): 87.696928
> ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
> 5.656854): 93.353783
> ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
> 6.000000): 99.353783
> ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
> 0.000000): 99.353783
> ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
> 2.236068): 101.589851
> ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
> 3.162278): 104.752128
> ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
> 3.872983): 108.625112
> ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
> 4.472136): 113.097248
> ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
> 5.000000): 118.097248
> ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
> 5.477226): 123.574473
> ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
> 5.916080): 129.490553
> ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
> 6.324555): 135.815108
> ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
> 6.708204): 142.523312
> ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
> 0.000000): 142.523312
> ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
> 2.449490): 144.972802
> ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
> 3.464102): 148.436904
> ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
> 4.242641): 152.679544
> ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
> 4.898979): 157.578524
> ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
> 5.477226): 163.055749
> ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
> 6.000000): 169.055749
> ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
> 6.480741): 175.536490
> ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
> 6.928203): 182.464693
> ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
> 7.348469): 189.813162
> ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
> 0.000000): 189.813162
> ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
> 2.645751): 192.458914
> ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
> 3.741657): 196.200571
> ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
> 4.582576): 200.783147
> ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
> 5.291503): 206.074649
> ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
> 5.916080): 211.990729
> ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
> 6.480741): 218.471470
> ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
> 7.000000): 225.471470
> ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
> 7.483315): 232.954785
> ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
> 7.937254): 240.892039
> ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
> 0.000000): 240.892039
> ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
> 2.828427): 243.720466
> ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
> 4.000000): 247.720466
> ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
> 4.898979): 252.619445
> ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
> 5.656854): 258.276300
> ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
> 6.324555): 264.600855
> ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
> 6.928203): 271.529058
> ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
> 7.483315): 279.012373
> ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
> 8.000000): 287.012373
> ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
> 8.485281): 295.497654
> ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
> 0.000000): 295.497654
> ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
> 3.000000): 298.497654
> ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
> 4.242641): 302.740295
> ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
> 5.196152): 307.936447
> ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
> 6.000000): 313.936447
> ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
> 6.708204): 320.644651
> ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
> 7.348469): 327.993120
> ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
> 7.937254): 335.930374
> ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
> 8.485281): 344.415656
> ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
> 9.000000): 353.415656
> Final Result: 353.415656
> ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
> 0.000000): 0.000000
> ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
> 1.000000): 1.000000
> ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
> 1.414214): 2.414214
> ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
> 1.732051): 4.146264
> ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j:
> 2.000000): 6.146264
> ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j:
> 2.236068): 8.382332
> ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j:
> 2.449490): 10.831822
> ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j:
> 2.645751): 13.477573
> ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j:
> 2.828427): 16.306001
> ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j:
> 3.000000): 19.306001
> ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
> 0.000000): 19.306001
> ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
> 1.414214): 20.720214
> ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
> 2.000000): 22.720214
> ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
> 2.449490): 25.169704
> ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
> 2.828427): 27.998131
> ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
> 3.162278): 31.160409
> ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
> 3.464102): 34.624510
> ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
> 3.741657): 38.366168
> ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
> 4.000000): 42.366168
> ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
> 4.242641): 46.608808
> ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
> 0.000000): 46.608808
> ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
> 1.732051): 48.340859
> ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
> 2.449490): 50.790349
> ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
> 3.000000): 53.790349
> ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
> 3.464102): 57.254450
> ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
> 3.872983): 61.127434
> ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
> 4.242641): 65.370075
> ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
> 4.582576): 69.952650
> ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
> 4.898979): 74.851630
> ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
> 5.196152): 80.047782
> ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
> 0.000000): 80.047782
> ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
> 2.000000): 82.047782
> ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
> 2.828427): 84.876209
> ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
> 3.464102): 88.340311
> ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
> 4.000000): 92.340311
> ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
> 4.472136): 96.812447
> ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
> 4.898979): 101.711426
> ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
> 5.291503): 107.002929
> ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
> 5.656854): 112.659783
> ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
> 6.000000): 118.659783
> ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
> 0.000000): 118.659783
> ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
> 2.236068): 120.895851
> ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
> 3.162278): 124.058129
> ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
> 3.872983): 127.931112
> ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
> 4.472136): 132.403248
> ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
> 5.000000): 137.403248
> ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
> 5.477226): 142.880474
> ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
> 5.916080): 148.796553
> ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
> 6.324555): 155.121109
> ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
> 6.708204): 161.829313
> ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
> 0.000000): 161.829313
> ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
> 2.449490): 164.278802
> ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
> 3.464102): 167.742904
> ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
> 4.242641): 171.985545
> ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
> 4.898979): 176.884524
> ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
> 5.477226): 182.361750
> ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
> 6.000000): 188.361750
> ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
> 6.480741): 194.842491
> ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
> 6.928203): 201.770694
> ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
> 7.348469): 209.119163
> ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
> 0.000000): 209.119163
> ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
> 2.645751): 211.764914
> ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
> 3.741657): 215.506572
> ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
> 4.582576): 220.089147
> ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
> 5.291503): 225.380650
> ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
> 5.916080): 231.296730
> ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
> 6.480741): 237.777470
> ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
> 7.000000): 244.777470
> ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
> 7.483315): 252.260785
> ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
> 7.937254): 260.198039
> ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
> 0.000000): 260.198039
> ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
> 2.828427): 263.026466
> ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
> 4.000000): 267.026466
> ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
> 4.898979): 271.925446
> ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
> 5.656854): 277.582300
> ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
> 6.324555): 283.906855
> ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
> 6.928203): 290.835059
> ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
> 7.483315): 298.318373
> ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
> 8.000000): 306.318373
> ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
> 8.485281): 314.803655
> ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
> 0.000000): 314.803655
> ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
> 3.000000): 317.803655
> ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
> 4.242641): 322.046295
> ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
> 5.196152): 327.242448
> ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
> 6.000000): 333.242448
> ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
> 6.708204): 339.950652
> ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
> 7.348469): 347.299121
> ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
> 7.937254): 355.236375
> ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
> 8.485281): 363.721656
> ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
> 9.000000): 372.721656
> Final Result: 372.721656
>
>
>
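
The reference value quoted above can be cross-checked without the nested loop, because the double sum factorizes: sum_i sum_j sqrt(i)*sqrt(j) = (sum_k sqrt(k))^2. A minimal sketch follows (the name reference_result is hypothetical). Since the summation order differs from the nested loop, the two should agree only up to rounding in the last digits:

```cpp
#include <cmath>

// Hypothetical helper: computes (sum over k in [0, dim) of sqrt(k))^2,
// which equals the nested-loop sum of sqrt(i) * sqrt(j) up to rounding.
inline double reference_result(int dim) {
    double s = 0.0;
    for (int k = 0; k < dim; k++) {
        s += std::sqrt(static_cast<double>(k));
    }
    return s * s;
}
```

For dim = 10 this agrees with the 372.721656 stated above to the printed precision.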
> As we can see, in the following iterations sqrt(i) for i = 1 as well
> as the result are set to zero for some reason.
>
> ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> 0.000000): 0.000000
>
> Please help me resolve the accuracy issue! I think it will
> be very useful for the gem5 community.
>
> Note that I found the simulated tick at which the application
> starts in FS (using m5 dumpstats) and passed it to --debug-start,
> but the generated trace file is 10x larger than in SE mode for
> the same application. How can I compare them?
>
> Thank you in advance!
> Best regards,
> Nikos
>
> Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:
>
>> Dear Jason,
>>
>> I am trying to use --debug-start, but in FS mode it is very
>> difficult to find the tick at which the application starts!
>>
>> However, I wrote the following very simple C++ program:
>>
>> #include <cmath>
>> #include <stdio.h>
>>
>> int main(){
>>
>>    int dim = 4096;
>>
>>    double result;
>>
>>    for (int iter = 0; iter < 2; iter++){
>>        result = 0;
>>        for (int i = 0; i < dim; i++){
>>            for (int j = 0; j < dim; j++){
>>                result += sqrt(i) * sqrt(j);
>>            }
>>        }
>>        printf("Result: %lf\n", result); //Result: 30530733453.127449
>>    }
>> }
>>
>> I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
>> test_riscv test_riscv.cpp
>>
>>
>> While on X86 (without cross-compilation, of course), QEMU-RISCV, and
>> GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the
>> result is different! In addition, the result also differs
>> between the 2 iterations.
>>
>> Please reproduce the error, if you like, in order to verify my result.
>> How can the issue be resolved?
>>
>> Thank you in advance!
>>
>> Best regards,
>> Nikos
>>
>>
>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
>>
>>> Hi Nikos,
>>>
>>> You can use --debug-start to start the debugging after some number of
>>> ticks. Also, I would expect that the difference should come up quickly, so
>>> no need to run the program to the end.
>>>
>>> For the FS mode one, you will want to just start the trace as the
>>> application starts. This could be a bit of a pain.
>>>
>>> I'm not really sure what fundamentally could be different. FS and SE mode
>>> use the exact same code for executing instructions, so I don't think that's
>>> the problem. Have you tried running for smaller inputs or just one
>>> iteration?
>>>
>>> Jason
>>>
>>>
>>>
>>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
>>> ntampourat...@ece.auth.gr> wrote:
>>>
>>>> Dear Bobby,
>>>>
>>>> I am trying to add --debug-flags=Exec (building gem5 as gem5.opt
>>>> rather than the gem5.fast I had), but the debug traces exceed 20GB
>>>> (and the run has not finished yet) for less than 1 simulated second. How
>>>> can I reduce the size of the debug trace (or set something more specific)?
>>>>
>>>> In contrast, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you
>>>> want, you can compare these two output files
>>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
>>>> see, something goes wrong with the accuracy of calculations in FS mode
>>>> (the benchmark uses double precision). You can find the files here:
>>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/
>>>>
>>>> Best regards,
>>>> Nikos
>>>>
>>>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
>>>>
>>>>> That's quite odd that it works in SE mode but not FS mode!
>>>>>
>>>>> I would suggest running with --debug-flags=Exec for both and then perform a
>>>>> diff to see how they differ.
>>>>>
>>>>> Cheers,
>>>>> Jason
>>>>>
>>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
>>>>> ntampourat...@ece.auth.gr> wrote:
>>>>>
>>>>>> Dear Bobby,
>>>>>>
>>>>>> In QEMU I get the same (correct) results that I get in SE mode
>>>>>> simulation. I get invalid results in FS simulation (in both
>>>>>> riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
>>>>>> hardware at this moment, however, if you want you may execute my xhpcg
>>>>>> binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
>>>>>> following configuration:
>>>>>>
>>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1
>>>>>>
>>>>>> Please let me know if you have any updates!
>>>>>>
>>>>>> Best regards,
>>>>>> Nikos
>>>>>>
>>>>>>
>>>>>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
>>>>>>
>>>>>>> Hi Nikos,
>>>>>>>
>>>>>>> I notice you said the following in your original email:
>>>>>>>
>>>>>>>> In addition, I used the RISCV Ubuntu image
>>>>>>>> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
>>>>>>>> I installed the gcc compiler, compile it (through qemu) and I get
>>>>>>>> wrong results too.
>>>>>>>
>>>>>>>
>>>>>>> Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
>>>>>>> or the HPCG workload, not in gem5. If not, I would test in QEMU to make
>>>>>>> sure the binary works there. Another way you could test to see if the
>>>>>>> problem is your binary or gem5 would be to run it on real hardware. We have
>>>>>>> access to some RISC-V hardware here at UC Davis, if you don't have access
>>>>>>> to it.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jason
>>>>>>>
>>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
>>>>>>> ntampourat...@ece.auth.gr> wrote:
>>>>>>>
>>>>>>>> Dear Bobby,
>>>>>>>>
>>>>>>>> 1) I use the original riscv-fs.py which is provided in the latest gem5
>>>>>>>> release.
>>>>>>>> I run the gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
>>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to download the
>>>>>>>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
>>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop
>>>>>>>> riscv-disk-img /mnt), put the xhpcg executable and I do the following
>>>>>>>> changes in riscv-fs.py to boot the riscv-disk-img with executable:
>>>>>>>>
>>>>>>>> image = CustomDiskImageResource(
>>>>>>>>      local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
>>>>>>>> )
>>>>>>>>
>>>>>>>> # Set the Full System workload.
>>>>>>>> board.set_kernel_disk_workload(
>>>>>>>>     kernel=Resource("riscv-bootloader-vmlinux-5.10"),
>>>>>>>>     disk_image=image,
>>>>>>>> )
>>>>>>>>
>>>>>>>> Finally, in the gem5/src/python/gem5/components/boards/riscv_board.py
>>>>>>>> I change the last line to "return ["console=ttyS0",
>>>>>>>> "root={root_value}", "rw"]" in order to allow the write permissions in
>>>>>>>> the image.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2) The HPCG benchmark after some iterations calculates if the results
>>>>>>>> are valid or not valid. In the case of FS it gives invalid results. As
>>>>>>>> I see from the results, one (at least) problem is that it produces
>>>>>>>> different results in each HPCG execution (with the same configuration).
>>>>>>>>
>>>>>>>> Here is the HPCG output and riscv-fs.py
>>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
>>>>>>>> results in the video if you use the xhpcg executable
>>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)
>>>>>>>>
>>>>>>>> Please help me in order to solve it!
>>>>>>>>
>>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode too.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Nikos
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
>>>>>>>>
>>>>>>>> > I'm going to need a bit more information to help:
>>>>>>>> >
>>>>>>>> > 1. In what way have you modified
>>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach the script here?
>>>>>>>> > 2. What error are you getting or in what way are the results invalid?
>>>>>>>> >
>>>>>>>> > -
>>>>>>>> > Dr. Bobby R. Bruce
>>>>>>>> > Room 3050,
>>>>>>>> > Kemper Hall, UC Davis
>>>>>>>> > Davis,
>>>>>>>> > CA, 95616
>>>>>>>> >
>>>>>>>> > web: https://www.bobbybruce.net
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
>>>>>>>> > ntampourat...@ece.auth.gr> wrote:
>>>>>>>> >
>>>>>>>> >>
>>>>>>>> >> Dear gem5 community,
>>>>>>>> >>
>>>>>>>> >> I have successfully cross-compiled the HPCG benchmark for RISCV (Serial
>>>>>>>> >> version, without MPI and OpenMP). While it works properly in gem5 SE
>>>>>>>> >> mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
>>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
>>>>>>>> >> --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
>>>>>>>> >> simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
>>>>>>>> >> ./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
>>>>>>>> >> and put the executable in it).
>>>>>>>> >>
>>>>>>>> >> Can you help me please?
>>>>>>>> >>
>>>>>>>> >> In addition, I used the RISCV Ubuntu image
>>>>>>>> >> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
>>>>>>>> >> I installed the gcc compiler, compile it (through qemu) and I get
>>>>>>>> >> wrong results too.
>>>>>>>> >>
>>>>>>>> >> Here is the Makefile which I use, the hpcg executable for RISCV
>>>>>>>> >> (xhpcg), and a video that shows the results
>>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).
>>>>>>>> >>
>>>>>>>> >> P.S. I use the latest gem5 version.
>>>>>>>> >>
>>>>>>>> >> Thank you in advance! :)
>>>>>>>> >>
>>>>>>>> >> Best regards,
>>>>>>>> >> Nikos
>>>>>>>> >> _______________________________________________
>>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org
>>>>>>>> >> To unsubscribe send an email to gem5-users-le...@gem5.org