Re: [m5-dev] Running Ruby w/32 Cores

2011-04-07 Thread Nilay Vaish

On Thu, 7 Apr 2011, Gabriel Michael Black wrote:

When you say this is portable, what do you mean? Portable between compilers? 
We usually use gcc, but we have at least partial support for other compilers. 
I think this is necessary on some platforms.


Gabe



I would still root for using popcount() builtin available with GCC.


--
Nilay


Between different versions of gcc. Do we actually test whether the code 
compiles using other compilers?


--
Nilay
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Running Ruby w/32 Cores

2011-04-07 Thread Nilay Vaish
The problem is that LONG_BITS is 31, i.e. std::numeric_limits<long>::digits
returns 31 rather than the 32 the writer expected.
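
As a small sketch of why the value comes out one short (assuming a platform where long has no padding bits, and a C++11 compiler for constexpr and static_assert): std::numeric_limits<T>::digits counts value bits only, so a signed type reports one fewer than its storage width. The constant names below are illustrative and do not come from the Ruby source.

```cpp
#include <climits>
#include <limits>

// Value bits of a signed long: the sign bit is not included.
constexpr int kSignedDigits = std::numeric_limits<long>::digits;
// Value bits of an unsigned long: every storage bit is a value bit.
constexpr int kUnsignedDigits = std::numeric_limits<unsigned long>::digits;
// Storage width of long in bits.
constexpr int kStorageBits = static_cast<int>(sizeof(long) * CHAR_BIT);

// Holds on mainstream targets; stated here as an assumption, not a guarantee.
static_assert(kSignedDigits == kStorageBits - 1, "signed digits exclude the sign bit");
static_assert(kUnsignedDigits == kStorageBits, "unsigned digits cover the whole word");
```

With a 32-bit long this gives 31 and 32 respectively, which matches the LONG_BITS value reported above.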


--
Nilay


From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On Behalf Of 
Korey Sewell

Sent: Tuesday, April 05, 2011 7:14 AM
To: Beckmann, Brad
Subject: Re: [m5-dev] Running Ruby w/32 Cores

Hi again Brad, I looked this over again and although my 32-bit patch
fixes things, now that I look at it again, I'm not convinced that I
actually fixed the cause of the bug rather than just the symptom.


Do you happen to know what the problems with the 32-bit Set counts are?

Sorry for prolonging the issue; I thought I had put this to bed, but
maybe not. Finally, it may not matter that this works on 32-bit machines,
but it'd be nice if it did. (Let me know if I should move this convo to
the m5-dev list.)


I end up checking the last bit in the count function manually (the code
is as follows):

int
Set::count() const
{
    int counter = 0;
    long mask;

    for (int i = 0; i < m_nArrayLen; i++) {
        mask = (long)0x01;

        for (int j = 0; j < LONG_BITS; j++) {
            // FIXME - significant performance loss when array
            // population << LONG_BITS
            if ((m_p_nArray[i] & mask) != 0) {
                counter++;
            }
            mask = mask << 1;
        }

#ifndef _LP64
        long msb_mask = 0x8000;
        if ((m_p_nArray[i] & msb_mask) != 0) {
            counter++;
        }
#endif
    }

    return counter;
}



Re: [m5-dev] Running Ruby w/32 Cores

2011-04-07 Thread Gabriel Michael Black

Quoting Nilay Vaish <ni...@cs.wisc.edu>:


Between different versions of gcc. Do we actually test whether the  
code compiles using other compilers?


--
Nilay



I don't know if we actively test it, but it worked at one time. Ali  
did some work on that, I think to get it to build with sun's compiler  
back when he was doing the SPARC full system support. It would be a  
good idea not to bake in any dependence on gcc.


Gabe



Re: [m5-dev] Running Ruby w/32 Cores

2011-04-07 Thread Nilay Vaish

On Thu, 7 Apr 2011, Gabriel Michael Black wrote:



I don't know if we actively test it, but it worked at one time. Ali did some 
work on that, I think to get it to build with sun's compiler back when he was 
doing the SPARC full system support. It would be a good idea not to bake in 
any dependence on gcc.


Gabe



I agree with you. If we can avoid dependence on a compiler, we certainly 
should.


--
Nilay


Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Beckmann, Brad
Hi Korey,

Yes, let's move this conversation back to m5-dev, since I think others may be 
interested and could help.

I don't know what the problem is exactly, but at some point in time (probably
back in the early GEMS days) I seem to remember the Set code included an
assertion check about the 31st bit in 32-bit mode.  Therefore, I think we knew
about this problem and made sure it never happened.  I believe that is why we
used to have a restriction that Ruby could only support 16 processors.  I'm
really fuzzy on the details...maybe someone else can elaborate.

In the end, I just want to make sure we add something in the code that makes 
sure we don't encounter this problem again.  This is one of those bugs that can 
take a while to track down, if you don't catch it right when it happens with an 
assertion.
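
A minimal sketch of the kind of guard described above, assuming (as reported elsewhere in this thread) that LONG_BITS is taken from std::numeric_limits<long>::digits. The names kUsableBits and index_is_countable are hypothetical and are not part of the Set class; the idea is only that an element index beyond what count() inspects should trip an assertion when it is added, not show up later as a wrong ack count.

```cpp
#include <limits>

// Hypothetical stand-in for LONG_BITS: bits per word that count() inspects.
const int kUsableBits = std::numeric_limits<long>::digits;

// Returns true when element `index` falls inside the bits that a
// count() loop over `num_words` words of kUsableBits each would visit.
bool index_is_countable(int index, int num_words) {
    return index >= 0 && index < num_words * kUsableBits;
}
```

A caller could then write something like assert(index_is_countable(index, m_nArrayLen)) at the point where an element is added, so the failure is reported where it happens.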

Brad



On Tue, Apr 5, 2011 at 1:30 AM, Korey Sewell <ksew...@umich.edu> wrote:
Brad, it looks like you were right on the money here. I found the spot where
it was returning the wrong value via a SLICC function to count sharers for
everyone except the owner.

I realized that the machine that I use for testing is just a 32-bit machine, 
and like you warned there look to be issues with the Set type there. I ran the 
Fft-32 cores on a 64-bit machine and it seems to work correctly. I'll be 
running on the full splash/parsec suites soon and that should stress Ruby a 
good bit :).

I have a patch that checks whether _LP64 is defined and, if not, checks that
last bit when doing the set count function.

Thanks for being helpful in debugging. It was a relatively easy bug, but as 
always going through code and becoming more proficient at getting around while 
trying to solve a bug is really helpful.

On Fri, Apr 1, 2011 at 7:28 PM, Beckmann, Brad <brad.beckm...@amd.com> wrote:
Ok for the first trace, the critical line is the following:

348523   0L2Cache L1_GETX  ILOSXIFLXO  [0x16180, line 0x16180] 
[NetDest (4) 0  - 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1  - 0 0  - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  - 
]30

L2Cache identifies that 31 caches have a shared copy and that L1 cache 9 (L1-9) 
is the owner.
When L1Cache 0 (L1-0) issues a GETX, the L2Cache issues 30 Inv probes, forwards 
the GETX to L1-9, and sends an ack to L1-0 itself.
However, the L2 cache tells L1-0 to expect only 30 acks instead of 31.  It 
could be something wrong with the NetDest::count() function, or the 
Set::count() function?  I slightly modified my previous patch to isolate on 
what value the NetDest::count() function is returning.  If it is returning 30, 
instead of 31, then it must be a problem with NetDest.  You are compiling gem5 
as a 64-bit binary, right?

The second problem is essentially the same issue.  L2Cache 31 (L2-31) is the 
owner of the block, but I suspect NetDest is not counting bit 31 and thus it is 
returning a count of 0...causing the error.
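
The arithmetic behind both symptoms can be sketched as follows. This is an illustration only: it assumes bit i of a 32-bit word stands for cache i, and it models a count loop that inspects a configurable number of low bits. The function name is hypothetical.

```cpp
#include <cstdint>

// Counts set bits in `word`, looking only at the low `bits_checked` positions.
// Passing 31 models a loop bounded by a 31-bit LONG_BITS; 32 models a full scan.
int count_low_bits(std::uint32_t word, int bits_checked) {
    int counter = 0;
    for (int j = 0; j < bits_checked; j++) {
        if ((word >> j) & 1u) {
            counter++;
        }
    }
    return counter;
}
```

With every bit set except bit 9 (0xFFFFFDFF, the first trace), a 31-bit scan yields 30 while a full scan yields 31. With only bit 31 set (0x80000000, the second case), the 31-bit scan yields 0 while a full scan yields 1.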

Overall, concentrate on that NetDest::count function, or more importantly the 
Set::count() function.  Once you find out the problem, please let me know.

Thanks,

Brad


From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On Behalf Of
Korey Sewell
Sent: Friday, April 01, 2011 12:00 PM
To: Beckmann, Brad

Subject: Re: [m5-dev] Running Ruby w/32 Cores

Brad,
attached are the protocol traces grep'd for the offending addresses. I'm going 
to spend the weekend digging through Ruby code so hopefully I'm pretty close to 
generating the fixes myself

Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Ali Saidi
Jumping in somewhat randomly here, uint64_t even on a 32bit machine is 
reasonably fast. It's not going to be as fast, but it will be correct. 
My vote would be to just switch all that Set code that uses long to 
explicitly use uint64_t and if it's slower on a 32bit machine so be it. 
At least it's correct.
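
A rough sketch of what a fixed-width version might look like, assuming the words are held in a std::vector<std::uint64_t> (the real Set uses its own array; the container and function name here are illustrative only). Because the word type is unsigned and exactly 64 bits on every platform, the loop bound no longer depends on the width of long.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts set bits across all words, inspecting all 64 bits of each one.
int count_bits(const std::vector<std::uint64_t>& words) {
    int counter = 0;
    for (std::size_t i = 0; i < words.size(); i++) {
        for (int j = 0; j < 64; j++) {
            if ((words[i] >> j) & 1u) {
                counter++;
            }
        }
    }
    return counter;
}
```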


Ali




Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Ali Saidi
stl::bitset does these types of optimizations underneath and it's portable.


Ali

On Wed, 6 Apr 2011 15:57:37 -0500 (CDT), Nilay Vaish <ni...@cs.wisc.edu>
wrote:
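
For illustration, a minimal sketch using the standard library's std::bitset (what the thread calls stl::bitset). The 32-entry size and the function name are assumptions made for the example. count() covers every position, including the highest one.

```cpp
#include <bitset>
#include <cstddef>

// Marks all 32 entries as sharers, clears the owner, and returns how many remain.
std::size_t count_all_but(std::size_t owner) {
    std::bitset<32> sharers;
    sharers.set();          // set every bit
    sharers.reset(owner);   // the owner is not counted as a sharer
    return sharers.count();
}
```

Clearing entry 9 or entry 31 both leave 31 sharers, with no special handling of the top bit.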

I would prefer we make use of GCC builtin __builtin_popcount() for
counting the number of 1's in an int or related data type.

Nilay

On Wed, 6 Apr 2011, Ali Saidi wrote:


And actually, couldn't you use an stl bitset for this?

Thanks,
Ali



Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Korey Sewell
A few comments:
(1) Using uint64_t seems like a quick, interim solution. But I still
haven't grasped why we have the 31st bit problem but not the 63rd bit
problem as well.

(2) Adding the stl::bitset seems like a good idea (do the Flags in
M5 use that?) but it won't be a straightforward switch because the Set
class supports arbitrary size sets. Implementing it would take a
little bit of effort, but not too much.

(3) I didn't say this earlier, but it does look like this code could
use some optimization. From the gprof I ran on 2-8 cores, this
Set::count() function is the 2nd or 3rd highest consumer of time for
the Ruby Fft runs (although still a very small overall % of system
time). Simple optimizations like only looping over the set
size in the count() function should be helpful, instead of always
looping over the complete length of the long datatype:
    for (int j = 0; j < LONG_BITS; j++) {
        if ((m_p_nArray[i] & mask) != 0) {
            counter++;
        }
        mask = mask << 1;
    }

Beyond that, generating a mask, shifting and comparing each bit
doesn't seem necessary given we can potentially use a bitset or a
constant-time struct to loop over and check set inclusion.
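
One well-known way to avoid visiting every bit position is to clear the lowest set bit on each iteration, so the loop runs once per member instead of once per bit. This is a sketch of that general technique, not code from Ruby; the function name is made up for the example.

```cpp
// Counts set bits by repeatedly clearing the lowest set bit.
// The loop body runs once per set bit, regardless of the word width.
int count_set_bits(unsigned long word) {
    int counter = 0;
    while (word != 0) {
        word &= word - 1;  // clears the lowest set bit
        counter++;
    }
    return counter;
}
```

Using an unsigned word also sidesteps the sign-bit question, since every bit of the word takes part in the loop.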

On Wed, Apr 6, 2011 at 5:12 PM, Nilay Vaish <ni...@cs.wisc.edu> wrote:
 I believe even popcount is portable. I am not opposed to using bitset, just
 that it would probably require a lot more changes.

 --
 Nilay



-- 
- Korey

Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Nilay Vaish

On Wed, 6 Apr 2011, Korey Sewell wrote:


A few comments:
(1) Using uint64_t seems like a quick, interim solution. But I still
haven't grasped why we have the 31st bit problem, but we don't have
the 63rd bit problem as well?


I think if you use unsigned long in place of long, the code would work on
32-bit machines. I am uncertain why the current code works on a 64-bit
machine. I think long means 32-bit, irrespective of memory address length.
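
A hedged sketch of the unsigned long variant, with the loop bound taken from std::numeric_limits<unsigned long>::digits so that the top bit is included. The names are illustrative. For what it's worth, on LP64 targets (such as 64-bit Linux) long is 64 bits wide, so a signed bound there would be 63; with 32 cores only bits 0 to 31 are ever set, which would be consistent with the bug not showing up on 64-bit builds.

```cpp
#include <limits>

// Every bit of an unsigned long is a value bit, so this is the full word width.
const int kWordBits = std::numeric_limits<unsigned long>::digits;

// Same mask-and-shift loop as the thread's count(), on an unsigned word.
int count_word(unsigned long word) {
    int counter = 0;
    unsigned long mask = 1UL;
    for (int j = 0; j < kWordBits; j++) {
        if ((word & mask) != 0) {
            counter++;
        }
        mask <<= 1;  // well defined for unsigned; the final shift yields 0
    }
    return counter;
}
```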




(2) Adding the stl::bitset seems like a good idea (do the Flags in
M5 use that?) but it won't be a straightforward switch because the Set
class supports arbitrary size sets. Implementing it would take a
little bit of effort, but not too much.

(3) I didn't say this earlier, but it does look like this code could
use some optimization. From the gprof I ran on 2-8 cores, this
Set::count() function is the 2nd or 3rd highest consumer of time for
the Ruby Fft runs (although still a very small overall % of system
time). Simple optimizations like only looping over the set
size in the count() function should be helpful, instead of always
looping over the complete length of the long datatype:
    for (int j = 0; j < LONG_BITS; j++) {
        if ((m_p_nArray[i] & mask) != 0) {
            counter++;
        }
        mask = mask << 1;
    }

Beyond that, generating a mask, shifting and comparing each bit
doesn't seem necessary given we can potentially use a bitset or a
constant-time struct to loop over and check set inclusion.


I would still root for using popcount() builtin available with GCC.
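
A sketch of how the GCC builtin could be used without hard-wiring a compiler dependency, assuming a plain bit-by-bit loop is acceptable as the fallback. __builtin_popcountl is the GCC builtin for unsigned long; the wrapper name popcount_word is hypothetical.

```cpp
// Returns the number of set bits in `word`.
int popcount_word(unsigned long word) {
#if defined(__GNUC__)
    // GCC (and compilers that define __GNUC__) provide this builtin.
    return __builtin_popcountl(word);
#else
    // Portable fallback: test the low bit, then shift right until empty.
    int counter = 0;
    while (word != 0) {
        counter += static_cast<int>(word & 1UL);
        word >>= 1;
    }
    return counter;
#endif
}
```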


--
Nilay


Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Ali Saidi

On Apr 6, 2011, at 6:17 PM, Korey Sewell wrote:

 A few comments:
 (1) Using uint64_t seems like a quick, interim solution. But I still
 haven't grasped why we have the 31st bit problem, but we don't have
 the 63rd bit problem as well?
 
 (2) Adding the stl::bitset seems like a good idea (do the Flags in
 M5 use that?) but it won't be a straightforward switch because the Set
 class supports arbitrary size sets. Implementing it would take a
 little bit of effort, but not too much.

The functional units, instruction flags and packet flags use it. Trace flags
don't.

bitset supports arbitrarily sized sets too; you just have to declare the max
size up front, as a compile-time template parameter (although there is a
performance benefit to being less than the machine word length, it all still
works if you're not).  Additionally, bitset seems to support most if not all of
the operations (intersection, union, count, zero, etc) that Set does, although
they have different names.
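
To make the mapping concrete, here is a small sketch of those operations on std::bitset. The 64-entry maximum and the names NodeSet, intersection and set_union are assumptions made for the example, not anything from Ruby.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical maximum set size; std::bitset needs it at compile time.
const std::size_t kMaxNodes = 64;
typedef std::bitset<kMaxNodes> NodeSet;

// Intersection is bitwise AND.
NodeSet intersection(const NodeSet& a, const NodeSet& b) { return a & b; }

// Union is bitwise OR.
NodeSet set_union(const NodeSet& a, const NodeSet& b) { return a | b; }
```

Counting members is count(), clearing the set is reset(), and none() reports whether the set is empty.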
 
 (3) I didn't say this earlier, but it does look like this code could
 use some optimization. From the gprof I ran on 2-8 cores, this
 Set::count() function is the 2nd or 3rd highest consumer of time for
 the Ruby Fft runs (although still a very small overall % of system
 time). Simple optimizations like only looping over the set
 size in the count() function should be helpful, instead of always
 looping over the complete length of the long datatype:
     for (int j = 0; j < LONG_BITS; j++) {
         if ((m_p_nArray[i] & mask) != 0) {
             counter++;
         }
         mask = mask << 1;
     }
 
 Beyond that, generating a mask, shifting and comparing each bit
 doesn't seem necessary given we can potentially use a bitset or a
 constant-time struct to loop over and check set inclusion.

You can also keep a running count of the set bits that is updated whenever
something changes, which makes count() constant time. However, I don't think
there is any reason to try and optimize a bespoke implementation of a bitset.
The STL is going to be faster and will improve for free over time while this
implementation won't. For example, bitset also uses count leading zeros where
available to speed up finding the first set bit.
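
The running-count idea might look roughly like this. The class name, the 64-entry capacity and the method names are all hypothetical; the sketch only shows that the count is adjusted when membership actually changes, so reading it costs nothing.

```cpp
#include <bitset>
#include <cstddef>

// A set that maintains its member count incrementally.
class CountedSet {
  public:
    CountedSet() : m_count(0) {}

    // Adds element i; the count changes only if i was not already present.
    void add(std::size_t i) {
        if (!m_bits.test(i)) {
            m_bits.set(i);
            ++m_count;
        }
    }

    // Removes element i; the count changes only if i was present.
    void remove(std::size_t i) {
        if (m_bits.test(i)) {
            m_bits.reset(i);
            --m_count;
        }
    }

    // Constant time: no scan over the bits.
    int count() const { return m_count; }

  private:
    std::bitset<64> m_bits;
    int m_count;
};
```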


Ali


Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Gabriel Michael Black
When you say this is portable, what do you mean? Portable between  
compilers? We usually use gcc, but we have at least partial support  
for other compilers. I think this is necessary on some platforms.


Gabe



I would still root for using popcount() builtin available with GCC.


--
Nilay

Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Korey Sewell
Hi Ali,
My only problem with stl::bitset here is that the Set type from Ruby
seems to have the option to be resizable (through the overloaded
assignment operator). That's what I meant by arbitrary length.

In practice, I'm not sure if they ever assign sets of different
lengths to each other (causing resizing), but if they do, then that
would suggest that using the stl::bitset isn't a straightforward thing
(definitely doable though, just not plug and play).

If the resizing is just an unused feature of Ruby, then I would
suggest we switch to bitset.
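
To illustrate the resizing concern: the size of a std::bitset is fixed by its template parameter, so assignment between different sizes does not compile. One standard container whose assignment does take on the source's length is std::vector<bool>. The sketch below only demonstrates that behaviour and is not a recommendation; the function name is made up.

```cpp
#include <algorithm>
#include <vector>

// Counts the members of a resizable bit vector.
int count_members(const std::vector<bool>& s) {
    return static_cast<int>(std::count(s.begin(), s.end(), true));
}
```

Assigning a 40-entry vector to a 16-entry one leaves the target with 40 entries, which is the behaviour a fixed-size bitset cannot reproduce directly.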

-- 
- Korey


Re: [m5-dev] Running Ruby w/32 Cores

2011-03-31 Thread Korey Sewell
Hi Lisa,
I actually had sent the attachments to Brad since m5dev bounced the
attachments. I think the limit is 512kB or something like that.

But definitely, thanks for the heads up!

On Wed, Mar 30, 2011 at 7:45 PM, Lisa Hsu <h...@eecs.umich.edu> wrote:
 I think you forgot the attachments  :P.
 Sometimes, if ProtocolTrace isn't enough for me to find a problem, I turn on
 RubySlicc and RubyGenerated as well.  RubySlicc is the DPRINTFs within the
 actual protocol *.sm files, and RubyGenerated are inside of the generated
 code that you will only see in the build directory.

 Lisa
 On Tue, Mar 29, 2011 at 10:15 AM, Korey Sewell <ksew...@umich.edu> wrote:

 Thanks for the response Brad.

 The 1st trace has 1 L2 and the 2nd has 1 L2 (I had a typo in the
 original email).

 For each trace, I attach the stdout/stderr (*.out) and then the
 protocol trace (*.prottrace).

 Also, in the 1st trace, the offending address is clear and I isolate
 that in the protocol trace file provided. However, in the 2nd trace,
 it's unclear (currently) which access caused it to fail so I took the
 whole protocol trace file and gzip'd it.

 My current lack of expertise in SLICC limits me a bit, but I'd like to
 be more helpful in debugging so if there is anything that I can look
 into (or run) on my end to expedite the process, please advise. In the
 interim, I'll try to locate the exact address that's breaking trace 2
 and then hopefully repost that.

 Thanks!

 -Korey

 On Tue, Mar 29, 2011 at 12:02 PM, Beckmann, Brad <brad.beckm...@amd.com>
 wrote:
  Hi Korey,
 
  I believe both of these issues should be easy to solve once we have a
  protocol trace leading up to the error.  If you could create such a trace
  and send it to the list, that would be great.  Just zero in on the
  offending address.
 
  Thanks,
 
  Brad
 
 
  -Original Message-
  From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
  On Behalf Of Korey Sewell
  Sent: Tuesday, March 29, 2011 8:11 AM
  To: M5 Developer List
  Subject: [m5-dev] Running Ruby w/32 Cores
 
  Hi All,
  I'm still having a bit of trouble running Ruby with 32+ cores. I am
  experimenting w/configs varying the l2-caches. The runs seems to
  generate
  various errors in the SLICC.
 
  Has anybody seen these or have any insight to how to start solving
  these
  type of issues (posted below)?
  =
  The command line and errors are as follows:
  (1) 32 Cores and 32 L2s
  build/ALPHA_FS_MOESI_CMP_directory/m5.opt
  configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-
  l2caches=32 ...
  info: Entering event queue @ 0.  Starting simulation...
  Runtime Error at MOESI_CMP_directory-dir.sm:155, Ruby Time: 38279:
  assert failure, PID: 5990
  press return to continue.
 
  Program aborted at cycle 19139500
  Aborted
 
  (2) 32 Cores and 1 L2
  build/ALPHA_FS_MOESI_CMP_directory/m5.opt
  configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-
  l2caches=32 ...
  fatal: Invalid transition
  system.l1_cntrl0 time: 349075 addr: [0x16180, line 0x16180] event: Ack
  state:
  MM  @ cycle 174537500
  [doTransitionWorker:build/ALPHA_FS_MOESI_CMP_directory/mem/protoc
  ol/L1Cache_Transitions.cc,
  line 477]
  Memory Usage: 2316756 KBytes
  For more information see: http://www.m5sim.org/fatal/23f196b2
  
 
  Please let me know if you do...Thanks!
 
  --
  - Korey
  ___
  m5-dev mailing list
  m5-dev@m5sim.org
  http://m5sim.org/mailman/listinfo/m5-dev
 
 
  ___
  m5-dev mailing list
  m5-dev@m5sim.org
  http://m5sim.org/mailman/listinfo/m5-dev
 



 --
 - Korey

 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev






-- 
- Korey


Re: [m5-dev] Running Ruby w/32 Cores

2011-03-31 Thread Korey Sewell
Is there an attached patch I should be running, or did it get bounced
by m5-dev? If so, can you send it directly to me rather than through
m5-dev?

On Wed, Mar 30, 2011 at 8:26 PM, Beckmann, Brad brad.beckm...@amd.com wrote:
 Hi Korey,

 For the first trace, it looks like the L2 cache is either miscounting the 
 number of valid L1 copies, or there is an error with the ack arithmetic.  We 
 are going to need a bit more information to figure out where the exact 
 problem is.  Could you apply the attached patch and reply with the new 
 protocol trace?  Thanks.

 For the second trace, you should be able to get the offending address by 
 simply attaching GDB to the aborted process.  Without knowing which address 
 to zero in on, it is like looking for the proverbial needle in a haystack.

 Thanks,

 Brad



-- 
- Korey


Re: [m5-dev] Running Ruby w/32 Cores

2011-03-30 Thread Lisa Hsu
I think you forgot the attachments  :P.

Sometimes, if ProtocolTrace isn't enough for me to find a problem, I turn on
RubySlicc and RubyGenerated as well.  RubySlicc covers the DPRINTFs within the
actual protocol *.sm files, and RubyGenerated covers the ones inside the
generated code, which you will only see in the build directory.

Lisa

On Tue, Mar 29, 2011 at 10:15 AM, Korey Sewell ksew...@umich.edu wrote:

 Thanks for the response Brad.

 The 1st trace has 1 L2 and the 2nd has 1 L2 (i had a typo in the
 original email).

 For each trace, I attach the stdout/stderr (*.out) and then the
 protocol trace (*.prottrace).

 Also, in the 1st trace, the offending address is clear and I isolate
 that in the protocol trace file provided. However, in the 2nd trace,
 it's unclear (currently) which access caused it to fail so I took the
 whole protocol trace file and gzip'd it.

 My current lack of expertise in SLICC limits me a bit, but I'd like to
 be more helpful in debugging so if there is anything that I can look
 into (or run) on my end to expedite the process, please advise. In the
 interim, I'll try to locate the exact address that's breaking trace 2
 and then hopefully repost that.

 Thanks!

 -Korey



Re: [m5-dev] Running Ruby w/32 Cores

2011-03-30 Thread Beckmann, Brad
Hi Korey,

For the first trace, it looks like the L2 cache is either miscounting the 
number of valid L1 copies, or there is an error with the ack arithmetic.  We 
are going to need a bit more information to figure out where the exact problem 
is.  Could you apply the attached patch and reply with the new protocol trace?  
Thanks.

For the second trace, you should be able to get the offending address by simply 
attaching GDB to the aborted process.  Without knowing which address to zero in 
on, it is like looking for the proverbial needle in a haystack.

Thanks,

Brad



 -----Original Message-----
 From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
 On Behalf Of Korey Sewell
 Sent: Tuesday, March 29, 2011 10:15 AM
 To: M5 Developer List
 Subject: Re: [m5-dev] Running Ruby w/32 Cores
 
 Thanks for the response Brad.
 
 The 1st trace has 1 L2 and the 2nd has 1 L2 (i had a typo in the original
 email).
 
 For each trace, I attach the stdout/stderr (*.out) and then the protocol trace
 (*.prottrace).
 
 Also, in the 1st trace, the offending address is clear and I isolate that in
 the protocol trace file provided. However, in the 2nd trace, it's unclear
 (currently) which access caused it to fail so I took the whole protocol trace
 file and gzip'd it.
 
 My current lack of expertise in SLICC limits me a bit, but I'd like to be more
 helpful in debugging so if there is anything that I can look into (or run) on
 my end to expedite the process, please advise. In the interim, I'll try to
 locate the exact address that's breaking trace 2 and then hopefully repost that.
 
 Thanks!
 
 -Korey
 


[m5-dev] Running Ruby w/32 Cores

2011-03-29 Thread Korey Sewell
Hi All,
I'm still having a bit of trouble running Ruby with 32+ cores. I am
experimenting w/configs varying the l2-caches. The runs seem to
generate various errors in the SLICC.

Has anybody seen these or have any insight into how to start solving
these types of issues (posted below)?
=
The command line and errors are as follows:
(1) 32 Cores and 32 L2s
build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py
-b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
info: Entering event queue @ 0.  Starting simulation...
Runtime Error at MOESI_CMP_directory-dir.sm:155, Ruby Time: 38279:
assert failure, PID: 5990
press return to continue.

Program aborted at cycle 19139500
Aborted

(2) 32 Cores and 1 L2
build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py
-b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
fatal: Invalid transition
system.l1_cntrl0 time: 349075 addr: [0x16180, line 0x16180] event: Ack state: MM
 @ cycle 174537500
[doTransitionWorker:build/ALPHA_FS_MOESI_CMP_directory/mem/protocol/L1Cache_Transitions.cc,
line 477]
Memory Usage: 2316756 KBytes
For more information see: http://www.m5sim.org/fatal/23f196b2


Please let me know if you do...Thanks!

-- 
- Korey


Re: [m5-dev] Running Ruby w/32 Cores

2011-03-29 Thread Beckmann, Brad
Hi Korey,

I believe both of these issues should be easy to solve once we have a protocol 
trace leading up to the error.  If you could create such a trace and send it to 
the list, that would be great.  Just zero in on the offending address.

Thanks,

Brad


 -----Original Message-----
 From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
 On Behalf Of Korey Sewell
 Sent: Tuesday, March 29, 2011 8:11 AM
 To: M5 Developer List
 Subject: [m5-dev] Running Ruby w/32 Cores
 
 Hi All,
 I'm still having a bit of trouble running Ruby with 32+ cores. I am
 experimenting w/configs varying the l2-caches. The runs seems to generate
 various errors in the SLICC.
 
 Has anybody seen these or have any insight to how to start solving these
 type of issues (posted below)?
 =
 The command line and errors are as follows:
 (1) 32 Cores and 32 L2s
 build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py
 -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
 ...
 info: Entering event queue @ 0.  Starting simulation...
 Runtime Error at MOESI_CMP_directory-dir.sm:155, Ruby Time: 38279:
 assert failure, PID: 5990
 press return to continue.
 
 Program aborted at cycle 19139500
 Aborted
 
 (2) 32 Cores and 1 L2
 build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py
 -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
 ...
 fatal: Invalid transition
 system.l1_cntrl0 time: 349075 addr: [0x16180, line 0x16180] event: Ack state: MM
  @ cycle 174537500
 [doTransitionWorker:build/ALPHA_FS_MOESI_CMP_directory/mem/protocol/L1Cache_Transitions.cc,
 line 477]
 Memory Usage: 2316756 KBytes
 For more information see: http://www.m5sim.org/fatal/23f196b2
 
 
 Please let me know if you do...Thanks!
 
 --
 - Korey


Re: [m5-dev] Running Ruby w/32 Cores

2011-03-29 Thread Korey Sewell
Thanks for the response Brad.

The 1st trace has 1 L2 and the 2nd has 1 L2 (I had a typo in the
original email).

For each trace, I attach the stdout/stderr (*.out) and then the
protocol trace (*.prottrace).

Also, in the 1st trace, the offending address is clear and I isolate
that in the protocol trace file provided. However, in the 2nd trace,
it's unclear (currently) which access caused it to fail so I took the
whole protocol trace file and gzip'd it.

My current lack of expertise in SLICC limits me a bit, but I'd like to
be more helpful in debugging so if there is anything that I can look
into (or run) on my end to expedite the process, please advise. In the
interim, I'll try to locate the exact address that's breaking trace 2
and then hopefully repost that.

Thanks!

-Korey

On Tue, Mar 29, 2011 at 12:02 PM, Beckmann, Brad brad.beckm...@amd.com wrote:
 Hi Korey,

 I believe both of these issues should be easy to solve once we have a 
 protocol trace leading up to the error.  If you could create such a trace and 
 send it to the list, that would be great.  Just zero in on the offending 
 address.

 Thanks,

 Brad


-- 
- Korey