[gem5-users] Re: HPCG on RISCV

2022-10-07 Thread Hoa Nguyen
Hi,

It's quite odd that both sq_i and result were zeroed out at the same
time. Does the problem appear in FS mode for other ISAs, e.g. x86 FS mode? Can
you show the objdump of the loop as well?

Regards,
Hoa Nguyen

On Thu, Oct 6, 2022, 04:06 Νικόλαος Ταμπουρατζής 
wrote:

> Dear Jason, all,
>
> I am trying to find the accuracy problem with RISCV-FS and I observe
> that the problem is created (at least in my dummy example) because the
> variables (double) are set to zero in random simulated time (for this
> reason I get different results among executions of the same code).
> Specifically for the following dummy code:
>
>
> #include <stdio.h>
> #include <math.h>
>
> int main(){
>
>  int dim = 10;
>
>  float result;
>
>  for (int iter = 0; iter < 2; iter++){
>  result = 0;
>  for (int i = 0; i < dim; i++){
>  for (int j = 0; j < dim; j++){
>  float sq_i = sqrt(i);
>  float sq_j = sqrt(j);
>  result += sq_i * sq_j;
>  printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f
> | i*j: %f): %f\n", iter, i , j, sq_i, sq_j, sq_i * sq_j, result);
>  }
>  }
>  printf("Final Result: %lf\n", result);
>  }
> }
>
>
> The correct Final Result in both iterations is 372.721656. However, I
> get the following results in FS:
>
> ITER: 0 | i: 0 | j: 0 Result(i: 0.00 | j: 0.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 1 Result(i: 0.00 | j: 1.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 2 Result(i: 0.00 | j: 1.414214 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 3 Result(i: 0.00 | j: 1.732051 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 4 Result(i: 0.00 | j: 2.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 5 Result(i: 0.00 | j: 2.236068 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 6 Result(i: 0.00 | j: 2.449490 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 7 Result(i: 0.00 | j: 2.645751 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 8 Result(i: 0.00 | j: 2.828427 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 9 Result(i: 0.00 | j: 3.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 0 Result(i: 1.00 | j: 0.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 1 Result(i: 1.00 | j: 1.00 | i*j:
> 1.00): 1.00
> ITER: 0 | i: 1 | j: 2 Result(i: 1.00 | j: 1.414214 | i*j:
> 1.414214): 2.414214
> ITER: 0 | i: 1 | j: 3 Result(i: 1.00 | j: 1.732051 | i*j:
> 1.732051): 4.146264
> ITER: 0 | i: 1 | j: 4 Result(i: 0.00 | j: 2.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 5 Result(i: 0.00 | j: 2.236068 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 6 Result(i: 0.00 | j: 2.449490 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 7 Result(i: 0.00 | j: 2.645751 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 8 Result(i: 0.00 | j: 2.828427 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 9 Result(i: 0.00 | j: 3.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.00 | i*j:
> 1.414214): 1.414214
> ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
> 2.00): 3.414214
> ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
> 2.449490): 5.863703
> ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.00 | i*j:
> 2.828427): 8.692130
> ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
> 3.162278): 11.854408
> ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
> 3.464102): 15.318510
> ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
> 3.741657): 19.060167
> ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
> 4.00): 23.060167
> ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.00 | i*j:
> 4.242641): 27.302808
> ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.00 | i*j:
> 0.00): 27.302808
> ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.00 | i*j:
> 1.732051): 29.034859
> ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
> 2.449490): 31.484348
> ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
> 3.00): 34.484348
> ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.00 | i*j:
> 3.464102): 37.948450
> ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
> 3.872983): 41.821433
> ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
> 4.242641): 46.064074
> ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
> 4.582576): 50.646650
> ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
> 4.898979): 55.545629
> ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.00 | i*j:
> 5.196152): 60.741782
> ITER: 0 | i: 4 | j: 0 Result(i: 2.00 | j: 0.00 | i*j:
> 0.00): 60.741782
> ITER: 0 | i: 4 | j: 1 Result(i: 2.00 | j: 1.00 | i*j:
> 2.00): 62.741782
> ITER: 0 | i: 4 | j: 2 Resu

[gem5-users] Re: HPCG on RISCV

2022-10-07 Thread Νικόλαος Ταμπουρατζής

Dear Jason & Bobby,

Unfortunately, I have tried my simple example without the sqrt  
function and the problem remains. Specifically, I have the following  
simple code:



#include <stdio.h>
#include <math.h>

int main(){

int dim = 1024;

double result;

for (int iter = 0; iter < 2; iter++){
result = 0;
for (int i = 0; i < dim; i++){
for (int j = 0; j < dim; j++){
result += i * j;
}
}
printf("Final Result: %lf\n", result);
}
}


In the above code, the correct result is 274341298176.00 (from  
RISCV-SE mode and x86), while in FS mode I sometimes get the correct  
result and other times a different number.


Best regards,
Nikos


Quoting Jason Lowe-Power :


I have an idea...

Have you put a breakpoint in the implementation of the fsqrt_d function? I
would like to know whether we are using the same rounding mode when running
in SE mode and in FS mode. My hypothesis is that in FS mode the rounding
mode is set differently.

Cheers,
Jason

On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
ntampourat...@ece.auth.gr> wrote:


Dear Bobby,

Thanks a lot for the effort! I looked in detail and I observe that the
problem appears only with float and double variables (in the case
of int it works properly in FS mode). Specifically, in the case
of float the variables are set to "nan", while in the case of double
the variables are set to 0.00 (at random times - probably from some
instruction of the simulated OS?). You may use a simple C/C++ example
in order to get some traces before going to HPCG...

Thank you in advance!!
Best regards,
Nikos


Quoting Bobby Bruce :

> Hey Niko,
>
> Thanks for this analysis. I jumped a little into this today but didn't
get
> as far as you did. I wanted to find a quick way to recreate the
following:
> https://gem5-review.googlesource.com/c/public/gem5/+/64211.  Please feel
> free to use this, if it helps any.
>
> It's very strange to me that this bug hasn't manifested itself before but
> it's undeniably there. I'll try to spend more time looking at this
tomorrow
> with some traces and debug flags and see if I can narrow down the
problem.
>
> --
> Dr. Bobby R. Bruce
> Room 3050,
> Kemper Hall, UC Davis
> Davis,
> CA, 95616
>
> web: https://www.bobbybruce.net
>
>
> On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
> ntampourat...@ece.auth.gr> wrote:
>
>> In my previous results, I had used double (not float) for the
>> following variables: result, sq_i and sq_j. In the case of float
>> instead of double I get "nan" and not 0.00.
>>
>> Quoting Νικόλαος Ταμπουρατζής :
>>
>> > Dear Jason, all,
>> >
>> > I am trying to find the accuracy problem with RISCV-FS and I observe
>> > that the problem is created (at least in my dummy example) because
>> > the variables (double) are set to zero in random simulated time (for
>> > this reason I get different results among executions of the same
>> > code). Specifically for the following dummy code:
>> >
>> >
>> > #include <stdio.h>
>> > #include <math.h>
>> >
>> > int main(){
>> >
>> > int dim = 10;
>> >
>> > float result;
>> >
>> > for (int iter = 0; iter < 2; iter++){
>> > result = 0;
>> > for (int i = 0; i < dim; i++){
>> > for (int j = 0; j < dim; j++){
>> > float sq_i = sqrt(i);
>> > float sq_j = sqrt(j);
>> > result += sq_i * sq_j;
>> > printf("ITER: %d | i: %d | j: %d Result(i: %f | j:
>> > %f | i*j: %f): %f\n", iter, i , j, sq_i, sq_j, sq_i * sq_j, result);
>> > }
>> > }
>> > printf("Final Result: %lf\n", result);
>> > }
>> > }
>> >
>> >
>> > The correct Final Result in both iterations is 372.721656. However,
>> > I get the following results in FS:
>> >
>> > ITER: 0 | i: 0 | j: 0 Result(i: 0.00 | j: 0.00 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 0 | j: 1 Result(i: 0.00 | j: 1.00 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 0 | j: 2 Result(i: 0.00 | j: 1.414214 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 0 | j: 3 Result(i: 0.00 | j: 1.732051 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 0 | j: 4 Result(i: 0.00 | j: 2.00 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 0 | j: 5 Result(i: 0.00 | j: 2.236068 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 0 | j: 6 Result(i: 0.00 | j: 2.449490 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 0 | j: 7 Result(i: 0.00 | j: 2.645751 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 0 | j: 8 Result(i: 0.00 | j: 2.828427 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 0 | j: 9 Result(i: 0.00 | j: 3.00 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 1 | j: 0 Result(i: 1.00 | j: 0.00 | i*j:
>> > 0.00): 0.00
>> > ITER: 0 | i: 1 | j: 1 Result(i: 1.00 | j: 1.00 | i*j:
>> > 1.00): 1.00
>> > ITER: 0 | i: 1 | j: 2 Result(i: 1.00 | j: 1.414214 | i*j:
>> > 1.414214): 2.414214
>> > 

[gem5-users] Re: Load dependency in gem5

2022-10-07 Thread Eliot Moss

On 10/7/2022 1:30 PM, Aritra Bagchi wrote:

Hi Eliot,

Thanks for the response. The unrolled loop, despite having the same dependency across "j", can send 
multiple loads simultaneously. So the limitation might not be due to that dependency across "j" of 
different iterations. But in the non-unrolled loop, the control dependency is there, which goes away 
when the loop is unrolled. But even without any speculation, gem5 could have scheduled loads as 
follows:


first schedule load A[ k ]
then, compare i with N
then schedule load A [ k + 1 ] if i < N

But what is happening is A[ k + 1 ] is scheduled only after load A[ k ] is completed. Is that 
completion necessary? It seems it isn't. The memory system is underutilised.


Thanks and regards,
Aritra


I understand your thinking, but it would be helpful to include
the assembly code listings for the two cases for us to apply
more careful reasoning as to what may be happening.

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Load dependency in gem5

2022-10-07 Thread Aritra Bagchi
Hi Eliot,

Thanks for the response. The unrolled loop, despite having the same
dependency across "j", can send multiple loads simultaneously. So the
limitation might not be due to that dependency across "j" of different
iterations. But in the non-unrolled loop, the control dependency is there,
which goes away when the loop is unrolled. But even without any
speculation, gem5 could have scheduled loads as follows:

first schedule load A[ k ]
then, compare i with N
then schedule load A [ k + 1 ] if i < N

But what is happening is A[ k + 1 ] is scheduled only after load A[ k ] is
completed. Is that completion necessary? It seems it isn't. The memory
system is underutilised.

Thanks and regards,
Aritra


On Fri, Oct 7, 2022 at 10:48 PM Eliot Moss  wrote:

> On 10/7/2022 1:13 PM, Eliot Moss wrote:
> > On 10/7/2022 1:03 PM, Aritra Bagchi wrote:
> >> Hi all,
> >>
> >> Any suggestions on this are most helpful.
> >>
> >> Thanks and regards,
> >> Aritra
> >
> > My guess is that it is because the non-unrolled loop
> > has a test of i against 1000 before each access to A[i].
> > That test guards the load, so must be completed before
> > the load can proceed.  It could also be because of the
> > way j is used - the next update cannot proceed until
> > the last one finishes.  It might be helpful to look
> > at the actual instructions involved, but the control
> > dependency could be an issue.
> >
> > The unrolled loop avoids both of these possible
> > dependencies.
> >
> > I further observe that if we are talking about an
> > Intel processor, those processors handle loads in the
> > order the program presents them.  Not sure if that
> > has any impact here.  Also unsure whether CPU
> > speculative execution plays a role (which would actually
> > improve matters).
> >
> > Best - Eliot Moss
> >
> >> On Thu, Oct 6, 2022 at 6:01 PM Aritra Bagchi wrote:
> >>
> >> Hi all,
> >>
> >> *for (i = 0; i < 1000; i++) {*
> >> *  j = j + A[ i ]*
> >> *}*
> >>
> >> Suppose such a loop program is executed on gem5 (single-core
> execution, with O3 CPU model). In
> >> that case, the memory hierarchy gets to see only one access at a
> time, e.g. only after A[ k ] is
> >> completed, A [ k + 1 ] access is sent to the memory hierarchy.
> Whereas, if the loop is unrolled
> >> (on i), multiple memory accesses are seen simultaneously. Why is
> that so? The memory loads could
> >> be serviced independently (even without unrolling the loop), so why
> is gem5 taking such a
> >> conservative approach?
> >>
> >> Any form of help/suggestion is highly appreciated.
> >>
> >> Thanks and regards,
> >> Aritra Bagchi
> >> Research Scholar,
> >> Department of Computer Science and Engineering,
> >> Indian Institute of Technology Delhi
> >>
> >>
> >>


[gem5-users] Re: Load dependency in gem5

2022-10-07 Thread Eliot Moss

On 10/7/2022 1:13 PM, Eliot Moss wrote:

On 10/7/2022 1:03 PM, Aritra Bagchi wrote:

Hi all,

Any suggestions on this are most helpful.

Thanks and regards,
Aritra


My guess is that it is because the non-unrolled loop
has a test of i against 1000 before each access to A[i].
That test guards the load, so must be completed before
the load can proceed.  It could also be because of the
way j is used - the next update cannot proceed until
the last one finishes.  It might be helpful to look
at the actual instructions involved, but the control
dependency could be an issue.

The unrolled loop avoids both of these possible
dependencies.

I further observe that if we are talking about an
Intel processor, those processors handle loads in the
order the program presents them.  Not sure if that
has any impact here.  Also unsure whether CPU
speculative execution plays a role (which would actually
improve matters).

Best - Eliot Moss

On Thu, Oct 6, 2022 at 6:01 PM Aritra Bagchi wrote:


    Hi all,

    *for (i = 0; i < 1000; i++) {*
    *      j = j + A[ i ]*
    *}*

    Suppose such a loop program is executed on gem5 (single-core execution, 
with O3 CPU model). In
    that case, the memory hierarchy gets to see only one access at a time, e.g. 
only after A[ k ] is
    completed, A [ k + 1 ] access is sent to the memory hierarchy. Whereas, if 
the loop is unrolled
    (on i), multiple memory accesses are seen simultaneously. Why is that so? 
The memory loads could
    be serviced independently (even without unrolling the loop), so why is gem5 
taking such a
    conservative approach?

    Any form of help/suggestion is highly appreciated.

    Thanks and regards,
    Aritra Bagchi
    Research Scholar,
    Department of Computer Science and Engineering,
    Indian Institute of Technology Delhi





[gem5-users] Re: Load dependency in gem5

2022-10-07 Thread Eliot Moss

On 10/7/2022 1:03 PM, Aritra Bagchi wrote:

Hi all,

Any suggestions on this are most helpful.

Thanks and regards,
Aritra


My guess is that it is because the non-unrolled loop
has a test of i against 1000 before each access to A[i].
That test guards the load, so must be completed before
the load can proceed.  It could also be because of the
way j is used - the next update cannot proceed until
the last one finishes.  It might be helpful to look
at the actual instructions involved, but the control
dependency could be an issue.

The unrolled loop avoids both of these possible
dependencies.

I further observe that if we are talking about an
Intel processor, those processors handle loads in the
order the program presents them.  Not sure if that
has any impact here.  Also unsure whether CPU
speculative execution plays a role (which would actually
improve matters).

Best - Eliot Moss

On Thu, Oct 6, 2022 at 6:01 PM Aritra Bagchi wrote:


Hi all,

*for (i = 0; i < 1000; i++) {*
*      j = j + A[ i ]*
*}*

Suppose such a loop program is executed on gem5 (single-core execution, 
with O3 CPU model). In
that case, the memory hierarchy gets to see only one access at a time, e.g. 
only after A[ k ] is
completed, A [ k + 1 ] access is sent to the memory hierarchy. Whereas, if 
the loop is unrolled
(on i), multiple memory accesses are seen simultaneously. Why is that so? 
The memory loads could
be serviced independently (even without unrolling the loop), so why is gem5 
taking such a
conservative approach?

Any form of help/suggestion is highly appreciated.

Thanks and regards,
Aritra Bagchi
Research Scholar,
Department of Computer Science and Engineering,
Indian Institute of Technology Delhi





[gem5-users] Re: Load dependency in gem5

2022-10-07 Thread Aritra Bagchi
Hi all,

Any suggestions on this are most helpful.

Thanks and regards,
Aritra


On Thu, Oct 6, 2022 at 6:01 PM Aritra Bagchi 
wrote:

> Hi all,
>
> *for (i = 0; i < 1000; i++) {*
> *  j = j + A[ i ]*
> *}*
>
> Suppose such a loop program is executed on gem5 (single-core execution,
> with O3 CPU model). In that case, the memory hierarchy gets to see only one
> access at a time, e.g. only after A[ k ] is completed, A [ k + 1 ] access
> is sent to the memory hierarchy. Whereas, if the loop is unrolled (on i),
> multiple memory accesses are seen simultaneously. Why is that so? The
> memory loads could be serviced independently (even without unrolling the
> loop), so why is gem5 taking such a conservative approach?
>
> Any form of help/suggestion is highly appreciated.
>
> Thanks and regards,
> Aritra Bagchi
> Research Scholar,
> Department of Computer Science and Engineering,
> Indian Institute of Technology Delhi
>
>
>


[gem5-users] Re: HPCG on RISCV

2022-10-07 Thread Jason Lowe-Power
I have an idea...

Have you put a breakpoint in the implementation of the fsqrt_d function? I
would like to know whether we are using the same rounding mode when running
in SE mode and in FS mode. My hypothesis is that in FS mode the rounding
mode is set differently.

Cheers,
Jason

On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
ntampourat...@ece.auth.gr> wrote:

> Dear Bobby,
>
> Thanks a lot for the effort! I looked in detail and I observe that the
> problem appears only with float and double variables (in the case
> of int it works properly in FS mode). Specifically, in the case
> of float the variables are set to "nan", while in the case of double
> the variables are set to 0.00 (at random times - probably from some
> instruction of the simulated OS?). You may use a simple C/C++ example
> in order to get some traces before going to HPCG...
>
> Thank you in advance!!
> Best regards,
> Nikos
>
>
> Quoting Bobby Bruce :
>
> > Hey Niko,
> >
> > Thanks for this analysis. I jumped a little into this today but didn't
> get
> > as far as you did. I wanted to find a quick way to recreate the
> following:
> > https://gem5-review.googlesource.com/c/public/gem5/+/64211.  Please feel
> > free to use this, if it helps any.
> >
> > It's very strange to me that this bug hasn't manifested itself before but
> > it's undeniably there. I'll try to spend more time looking at this
> tomorrow
> > with some traces and debug flags and see if I can narrow down the
> problem.
> >
> > --
> > Dr. Bobby R. Bruce
> > Room 3050,
> > Kemper Hall, UC Davis
> > Davis,
> > CA, 95616
> >
> > web: https://www.bobbybruce.net
> >
> >
> > On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
> > ntampourat...@ece.auth.gr> wrote:
> >
> >> In my previous results, I had used double (not float) for the
> >> following variables: result, sq_i and sq_j. In the case of float
> >> instead of double I get "nan" and not 0.00.
> >>
> >> Quoting Νικόλαος Ταμπουρατζής :
> >>
> >> > Dear Jason, all,
> >> >
> >> > I am trying to find the accuracy problem with RISCV-FS and I observe
> >> > that the problem is created (at least in my dummy example) because
> >> > the variables (double) are set to zero in random simulated time (for
> >> > this reason I get different results among executions of the same
> >> > code). Specifically for the following dummy code:
> >> >
> >> >
> > >> > #include <stdio.h>
> > >> > #include <math.h>
> >> >
> >> > int main(){
> >> >
> >> > int dim = 10;
> >> >
> >> > float result;
> >> >
> >> > for (int iter = 0; iter < 2; iter++){
> >> > result = 0;
> >> > for (int i = 0; i < dim; i++){
> >> > for (int j = 0; j < dim; j++){
> >> > float sq_i = sqrt(i);
> >> > float sq_j = sqrt(j);
> >> > result += sq_i * sq_j;
> >> > printf("ITER: %d | i: %d | j: %d Result(i: %f | j:
> >> > %f | i*j: %f): %f\n", iter, i , j, sq_i, sq_j, sq_i * sq_j, result);
> >> > }
> >> > }
> >> > printf("Final Result: %lf\n", result);
> >> > }
> >> > }
> >> >
> >> >
> >> > The correct Final Result in both iterations is 372.721656. However,
> >> > I get the following results in FS:
> >> >
> >> > ITER: 0 | i: 0 | j: 0 Result(i: 0.00 | j: 0.00 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 0 | j: 1 Result(i: 0.00 | j: 1.00 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 0 | j: 2 Result(i: 0.00 | j: 1.414214 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 0 | j: 3 Result(i: 0.00 | j: 1.732051 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 0 | j: 4 Result(i: 0.00 | j: 2.00 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 0 | j: 5 Result(i: 0.00 | j: 2.236068 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 0 | j: 6 Result(i: 0.00 | j: 2.449490 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 0 | j: 7 Result(i: 0.00 | j: 2.645751 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 0 | j: 8 Result(i: 0.00 | j: 2.828427 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 0 | j: 9 Result(i: 0.00 | j: 3.00 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 1 | j: 0 Result(i: 1.00 | j: 0.00 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 1 | j: 1 Result(i: 1.00 | j: 1.00 | i*j:
> >> > 1.00): 1.00
> >> > ITER: 0 | i: 1 | j: 2 Result(i: 1.00 | j: 1.414214 | i*j:
> >> > 1.414214): 2.414214
> >> > ITER: 0 | i: 1 | j: 3 Result(i: 1.00 | j: 1.732051 | i*j:
> >> > 1.732051): 4.146264
> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.00 | j: 2.00 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.00 | j: 2.236068 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.00 | j: 2.449490 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.00 | j: 2.645751 | i*j:
> >> > 0.00): 0.00
> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.0

[gem5-users] Re: HPCG on RISCV

2022-10-07 Thread Νικόλαος Ταμπουρατζής

Dear Bobby,

Thanks a lot for the effort! I looked in detail and I observe that the  
problem appears only with float and double variables (in the case  
of int it works properly in FS mode). Specifically, in the case  
of float the variables are set to "nan", while in the case of double  
the variables are set to 0.00 (at random times - probably from some  
instruction of the simulated OS?). You may use a simple C/C++ example  
in order to get some traces before going to HPCG...


Thank you in advance!!
Best regards,
Nikos


Quoting Bobby Bruce :


Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't get
as far as you did. I wanted to find a quick way to recreate the following:
https://gem5-review.googlesource.com/c/public/gem5/+/64211.  Please feel
free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this tomorrow
with some traces and debug flags and see if I can narrow down the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net


On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampourat...@ece.auth.gr> wrote:


In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.00.

Quoting Νικόλαος Ταμπουρατζής :

> Dear Jason, all,
>
> I am trying to find the accuracy problem with RISCV-FS and I observe
> that the problem is created (at least in my dummy example) because
> the variables (double) are set to zero in random simulated time (for
> this reason I get different results among executions of the same
> code). Specifically for the following dummy code:
>
>
> #include <stdio.h>
> #include <math.h>
>
> int main(){
>
> int dim = 10;
>
> float result;
>
> for (int iter = 0; iter < 2; iter++){
> result = 0;
> for (int i = 0; i < dim; i++){
> for (int j = 0; j < dim; j++){
> float sq_i = sqrt(i);
> float sq_j = sqrt(j);
> result += sq_i * sq_j;
> printf("ITER: %d | i: %d | j: %d Result(i: %f | j:
> %f | i*j: %f): %f\n", iter, i , j, sq_i, sq_j, sq_i * sq_j, result);
> }
> }
> printf("Final Result: %lf\n", result);
> }
> }
>
>
> The correct Final Result in both iterations is 372.721656. However,
> I get the following results in FS:
>
> ITER: 0 | i: 0 | j: 0 Result(i: 0.00 | j: 0.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 1 Result(i: 0.00 | j: 1.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 2 Result(i: 0.00 | j: 1.414214 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 3 Result(i: 0.00 | j: 1.732051 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 4 Result(i: 0.00 | j: 2.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 5 Result(i: 0.00 | j: 2.236068 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 6 Result(i: 0.00 | j: 2.449490 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 7 Result(i: 0.00 | j: 2.645751 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 8 Result(i: 0.00 | j: 2.828427 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 0 | j: 9 Result(i: 0.00 | j: 3.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 0 Result(i: 1.00 | j: 0.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 1 Result(i: 1.00 | j: 1.00 | i*j:
> 1.00): 1.00
> ITER: 0 | i: 1 | j: 2 Result(i: 1.00 | j: 1.414214 | i*j:
> 1.414214): 2.414214
> ITER: 0 | i: 1 | j: 3 Result(i: 1.00 | j: 1.732051 | i*j:
> 1.732051): 4.146264
> ITER: 0 | i: 1 | j: 4 Result(i: 0.00 | j: 2.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 5 Result(i: 0.00 | j: 2.236068 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 6 Result(i: 0.00 | j: 2.449490 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 7 Result(i: 0.00 | j: 2.645751 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 8 Result(i: 0.00 | j: 2.828427 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 1 | j: 9 Result(i: 0.00 | j: 3.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.00 | i*j:
> 0.00): 0.00
> ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.00 | i*j:
> 1.414214): 1.414214
> ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
> 2.00): 3.414214
> ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
> 2.449490): 5.863703
> ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.00 | i*j:
> 2.828427): 8.692130
> ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
> 3.162278): 11.854408
> ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
> 3.464102): 15.318510
> ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
> 3.741657): 19.060167
> ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: