[m5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick

2011-04-06 Thread Cron Daemon
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/inorder-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-atomic-mp passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-timing-mp passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby passed.
* build/ALPHA_SE/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/ALPHA_FS/tests/fast/quick/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/inorder-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/o3-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing-ruby passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/o3-timing passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing-ruby passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/40.m5threads-test-atomic/sparc/linux/simple-atomic-mp passed.

Re: [m5-dev] Cron m5test@zizzer /z/m5/regression/do-regression --scratch all

2011-04-06 Thread Gabe Black
It looks like it was this change which was directly after the one I
pointed out before.

changeset:   8134:b01a51ff05fa
user:        Ali Saidi <ali.sa...@arm.com>
date:        Thu Mar 17 19:20:19 2011 -0500
summary:     Mem: Fix issue with dirty block being lost when entire block transferred to non-cache.

Could you take a look, Ali? The description doesn't necessarily sound
like something you'd expect to change the stats (it sounds like a corner
case), but I'm assuming you'll know.

Gabe

On 04/03/11 19:51, Gabe Black wrote:
 Does anyone have any ideas about when X86_SE parser stopped working? The
 last time it passed for sure was the end of February, but on March 16th
 Ali updated the stats and so it was presumably working then too. I'm
 running at that changeset right now to confirm that. There weren't any
 X86 specific changes recently, but there were a few O3 ones which might
 have changed the stats. The output is below, and you can see the biggest
 change percentage wise was icache writebacks. Most of the changes are
 related to memory somehow. After the stats is info about a change that
 may have caused the problem.

 = Statistics differences =
 Maximum error magnitude: +133.33%

                                                       Reference    New Value    Abs Diff  Pct Chg
 Key statistics:

   host_inst_rate                                         189714       148502      -41212  -21.72%
   host_mem_usage                                         264736       268256        3520   +1.33%
   sim_insts                                          1527476062   1528988756     1512694   +0.10%
   sim_ticks                                        610952992000 612245337000  1292345000   +0.21%
   system.cpu.commit.COM:count                        1527476062   1528988756     1512694   +0.10%

 Differences > 0%:

   system.cpu.icache.writebacks                                3            7           4 +133.33%
   system.cpu.rename.RENAME:serializeStallCycles           19936        16025       -3911  -19.62%
   system.cpu.l2cache.occ_%::0                          0.213694     0.236362    0.022668  +10.61%
   system.cpu.l2cache.occ_blocks::0                  7002.339473  7745.103692  742.764219  +10.61%
   system.cpu.rename.RENAME:tempSerializingInsts            2561         2314        -247   -9.64%
   system.cpu.icache.ReadReq_mshr_hits                      1570         1427        -143   -9.11%
   system.cpu.icache.demand_mshr_hits                       1570         1427        -143   -9.11%
   system.cpu.icache.overall_mshr_hits                      1570         1427        -143   -9.11%
   system.cpu.rename.RENAME:serializingInsts                2550         2345        -205   -8.04%
   system.cpu.l2cache.ReadReq_misses                      316709       339091       22382   +7.07%
   system.cpu.l2cache.ReadReq_mshr_misses                 316709       339091       22382   +7.07%
   system.cpu.l2cache.ReadReq_mshr_miss_latency       9818903000  10512799000   693896000   +7.07%
   system.cpu.l2cache.ReadReq_miss_latency           10822415500  11584355000   761939500   +7.04%
   system.cpu.dcache.ReadReq_mshr_miss_latency       14062264500  14863694500   801430000   +5.70%
   system.cpu.l2cache.ReadReq_miss_rate                 0.182786     0.192941    0.010155   +5.56%
   system.cpu.l2cache.ReadReq_mshr_miss_rate            0.182786     0.192941    0.010155   +5.56%
   system.cpu.idleCycles                                24586339     25677793     1091454   +4.44%
   system.cpu.dcache.ReadReq_avg_mshr_miss_latency   8150.695480  8493.787248  343.091768   +4.21%
   system.cpu.l2cache.replacements                        553099       575827       22728   +4.11%
   system.cpu.l2cache.demand_mshr_miss_latency       17475146000  18186565000   711419000   +4.07%
 [... showing top 20 errors only, additional errors omitted ...]
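
The Pct Chg column in the comparison above appears to be the change relative to the reference value; for example, icache.writebacks going from 3 to 7 gives (7 - 3) / 3 = +133.33%, which is the reported maximum error magnitude. A minimal sketch of that formula, inferred from the rows rather than taken from the regression scripts (the function name is hypothetical):

```cpp
// Hypothetical helper: percent change of a statistic relative to its
// reference value, as the "Pct Chg" column appears to be computed.
double pctChange(double reference, double newValue)
{
    // (new - reference) / reference, expressed in percent.
    return (newValue - reference) / reference * 100.0;
}
```

For instance, pctChange(189714, 148502) is about -21.72, matching the host_inst_rate row.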

 * build/X86_SE/tests/fast/long/20.parser/x86/linux/o3-timing FAILED!

 changeset 9f704aa10eb4 in /z/repo/m5
 details: http://repo.m5sim.org/m5?cmd=changeset;node=9f704aa10eb4
 description:
   O3: Fix unaligned stores when cache blocked

   Without this change a store can be issued to the cache multiple times.
   If this case occurs when the L1 cache is out of MSHRs (and thus blocked),
   the processor will never make forward progress, because each cycle it
   will send a single request using the recently freed MSHR without
   completing the multipart store. This will continue forever.

 diffstat:

  src/cpu/o3/lsq_unit_impl.hh |  4 +++-
  1 files changed, 3 insertions(+), 1 deletions(-)

 diffs (14 lines):

 diff -r 2af262e73961 -r 9f704aa10eb4 src/cpu/o3/lsq_unit_impl.hh
 --- a/src/cpu/o3/lsq_unit_impl.hh Thu Mar 17 00:43:54 2011 -0400
 +++ b/src/cpu/o3/lsq_unit_impl.hh Thu Mar 17 19:20:19 2011 -0500
 @@ -1103,7 +1103,9 @@
 dynamic_cast<LSQSenderState *>(retryPkt->senderState);
 
 // Don't finish the store unless this is the last packet.
 -if (!TheISA::HasUnalignedMemAcc || !state->pktToSend) {
 +if (!TheISA::HasUnalignedMemAcc || !state->pktToSend ||
 +state->pendingPacket == retryPkt) {
 +state->pktToSend = false;
  storePostSend(retryPkt);
  }
  retryPkt = NULL;
 

Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Beckmann, Brad
Hi Korey,

Yes, let's move this conversation back to m5-dev, since I think others may be 
interested and could help.

I don't know what the problem is exactly, but at some point in time (probably 
back in the early GEMS days) I seem to remember that the Set code included an 
assertion check about the 31st bit in 32-bit mode. Therefore, I think we knew 
about this problem and made sure that it never happened. I believe that is why 
we used to have a restriction that Ruby could only support 16 processors. I'm 
really fuzzy on the details; maybe someone else can elaborate.

In the end, I just want to make sure we add something to the code that 
guarantees we don't encounter this problem again. This is one of those bugs 
that can take a while to track down if you don't catch it with an assertion 
right when it happens.

Brad



From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On Behalf Of Korey Sewell
Sent: Tuesday, April 05, 2011 7:14 AM
To: Beckmann, Brad
Subject: Re: [m5-dev] Running Ruby w/32 Cores

Hi again Brad,
I looked this over again, and although my 32-bit patch fixes things, I'm not 
convinced that I actually fixed the cause of the bug rather than just the 
symptom.

Do you happen to know what the problems are with the 32-bit Set counts?

Sorry for prolonging the issue; I thought I had put this to bed, but maybe 
not. Finally, it may not matter whether this works on 32-bit machines, but it'd 
be nice if it did. (Let me know if I should move this convo to the m5-dev list.)

I ended up checking the last bit in the count function manually (the code is 
as follows):
int
Set::count() const
{
    int counter = 0;
    long mask;

    for (int i = 0; i < m_nArrayLen; i++) {
        mask = (long)0x01;

        for (int j = 0; j < LONG_BITS; j++) {
            // FIXME - significant performance loss when array
            // population << LONG_BITS
            if ((m_p_nArray[i] & mask) != 0) {
                counter++;
            }
            mask = mask << 1;
        }

#ifndef _LP64
        long msb_mask = 0x8000;
        if ((m_p_nArray[i] & msb_mask) != 0) {
            counter++;
        }
#endif
    }

    return counter;
}
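
One detail worth checking in the code above: 0x8000 selects bit 15, whereas the most significant bit of a 32-bit long is 0x80000000 (1UL << 31). The sketch below is a hypothetical free function, not Ruby's Set API, showing a bit-by-bit count that needs no separate MSB check: the mask is unsigned, so shifting it all the way up to the top bit is well defined, and the loop bound comes from sizeof(unsigned long) rather than a hard-coded constant.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper (not Ruby's Set API): counts the set bits in an
// array of unsigned long words by testing every bit position.
int countSetBits(const unsigned long* words, std::size_t numWords)
{
    // Bits per word: 32 on ILP32 hosts, 64 on LP64 hosts.
    const int wordBits = static_cast<int>(sizeof(unsigned long) * CHAR_BIT);
    int counter = 0;
    for (std::size_t i = 0; i < numWords; i++) {
        unsigned long mask = 1UL;
        for (int j = 0; j < wordBits; j++) {
            if ((words[i] & mask) != 0)
                counter++;
            // Unsigned shift: well defined even when the 1 reaches the MSB.
            mask <<= 1;
        }
    }
    return counter;
}
```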
On Tue, Apr 5, 2011 at 1:30 AM, Korey Sewell <ksew...@umich.edu> wrote:
Brad, it looks like you were right on the money here. I found the spot where 
it was returning the wrong value: a SLICC function that counts the sharers for 
everyone except the owner.

I realized that the machine I use for testing is just a 32-bit machine, and, 
as you warned, there appear to be issues with the Set type there. I ran Fft 
with 32 cores on a 64-bit machine and it seems to work correctly. I'll be 
running the full SPLASH/PARSEC suites soon, and that should stress Ruby a good 
bit :).

I have a patch that checks whether _LP64 is defined and, if not, checks that 
last bit in the Set count function.

Thanks for being helpful in debugging. It was a relatively easy bug, but, as 
always, going through the code and becoming more proficient at getting around 
in it while trying to solve a bug is really helpful.

On Fri, Apr 1, 2011 at 7:28 PM, Beckmann, Brad <brad.beckm...@amd.com> wrote:
OK, for the first trace, the critical line is the following:

348523   0L2Cache L1_GETX  ILOSXIFLXO  [0x16180, line 0x16180] 
[NetDest (4) 0  - 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1  - 0 0  - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  - 
]30

L2Cache identifies that 31 caches have a shared copy and that L1 cache 9 (L1-9) 
is the owner.
When L1Cache 0 (L1-0) issues a GETX, the L2Cache issues 30 Inv probes, forwards 
the GETX to L1-9, and sends an ack to L1-0 itself.
However, the L2 cache tells L1-0 to expect only 30 acks instead of 31. Could 
something be wrong with the NetDest::count() function, or the Set::count() 
function? I slightly modified my previous patch to isolate what value the 
NetDest::count() function is returning. If it is returning 30 instead of 31, 
then it must be a problem with NetDest. You are compiling gem5 as a 64-bit 
binary, right?

The second problem is essentially the same issue.  L2Cache 31 (L2-31) is the 
owner of the block, but I suspect NetDest is not counting bit 31 and thus it is 
returning a count of 0...causing the error.

Overall, concentrate on that NetDest::count function, or more importantly the 
Set::count() function.  Once you find out the problem, please let me know.

Thanks,

Brad


From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On Behalf Of Korey Sewell
Sent: Friday, April 01, 2011 12:00 PM
To: Beckmann, Brad
Subject: Re: [m5-dev] Running Ruby w/32 Cores

Brad,
attached are the protocol traces grep'd for the offending addresses. I'm going 
to spend the weekend digging through Ruby code so hopefully I'm pretty close to 
generating the fixes myself.


Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Ali Saidi
Jumping in somewhat randomly here: uint64_t, even on a 32-bit machine, is 
reasonably fast. It's not going to be as fast, but it will be correct. 
My vote would be to just switch all the Set code that uses long to 
explicitly use uint64_t, and if it's slower on a 32-bit machine, so be it. 
At least it's correct.


Ali
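
A minimal sketch of Ali's suggestion, assuming a hypothetical NodeSet class that stands in for Ruby's Set (it is not the actual Set code): storing the bits in explicit std::uint64_t words gives the same word size, and therefore the same counting behavior, on 32-bit and 64-bit hosts.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal set of node IDs (not Ruby's Set class). Storage is
// explicitly 64-bit words, so behavior does not depend on sizeof(long).
class NodeSet
{
  public:
    explicit NodeSet(std::size_t size) : m_words((size + 63) / 64, 0) {}

    void add(std::size_t index)
    {
        m_words[index / 64] |= std::uint64_t{1} << (index % 64);
    }

    bool isElement(std::size_t index) const
    {
        return ((m_words[index / 64] >> (index % 64)) & 1) != 0;
    }

    int count() const
    {
        int counter = 0;
        for (std::uint64_t word : m_words) {
            // Test all 64 bits, including bit 63, with an unsigned shift.
            for (int j = 0; j < 64; j++) {
                if ((word >> j) & 1)
                    counter++;
            }
        }
        return counter;
    }

  private:
    std::vector<std::uint64_t> m_words;
};
```

On a 32-bit host each 64-bit operation may compile to more than one instruction, which is the slowdown Ali is willing to accept in exchange for correctness.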



On Wed, 6 Apr 2011 15:24:24 -0500, Beckmann, Brad <brad.beckm...@amd.com> wrote:

Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Ali Saidi
std::bitset does these types of optimizations underneath, and it's 
portable.


Ali
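
A sketch of the std::bitset route, applied to the kind of query discussed earlier in the thread (counting sharers other than the owner). std::bitset::count() returns the number of set bits; whether a particular standard library implements it with a hardware popcount is implementation-specific. One design constraint: the bitset's size is a template parameter fixed at compile time, so the hypothetical kMaxNodes below would have to cover the largest configuration.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical compile-time bound on the number of nodes; Ruby's Set,
// by contrast, is sized at run time.
constexpr std::size_t kMaxNodes = 64;
using NodeBits = std::bitset<kMaxNodes>;

// Count the sharers other than the owner.
int countSharersExceptOwner(const NodeBits& sharers, std::size_t owner)
{
    NodeBits others = sharers;
    others.reset(owner);  // throws std::out_of_range if owner >= kMaxNodes
    return static_cast<int>(others.count());
}
```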

On Wed, 6 Apr 2011 15:57:37 -0500 (CDT), Nilay Vaish <ni...@cs.wisc.edu> wrote:

I would prefer that we use the GCC builtin __builtin_popcount() for
counting the number of 1s in an int or related data type.

Nilay
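
A sketch of Nilay's suggestion. Note that __builtin_popcount() takes an unsigned int, so passing it a 64-bit word would silently drop the upper 32 bits; GCC and Clang also provide __builtin_popcountl (unsigned long) and __builtin_popcountll (unsigned long long), and the latter is used here. The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: counts set bits across an array of 64-bit words
// using the GCC/Clang builtin for unsigned long long.
int countSetBitsBuiltin(const std::uint64_t* words, std::size_t numWords)
{
    int counter = 0;
    for (std::size_t i = 0; i < numWords; i++)
        counter += __builtin_popcountll(words[i]);
    return counter;
}
```

These builtins are compiler extensions; C++20 later added std::popcount in the <bit> header as a portable alternative.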

On Wed, 6 Apr 2011, Ali Saidi wrote:


And actually, couldn't you use an stl bitset for this?

Thanks,
Ali

On Wed, 06 Apr 2011 15:34:01 -0500, Ali Saidi <sa...@umich.edu> wrote:



On Wed, 6 Apr 2011 15:24:24 -0500, Beckmann, Brad <brad.beckm...@amd.com> wrote:


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] changeset in m5: ruby: fixes to support more types of RubyRequests

2011-04-06 Thread Brad Beckmann
changeset 02cb69e5cfeb in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=02cb69e5cfeb
description:
ruby: fixes to support more types of RubyRequests

diffstat:

 src/mem/ruby/system/Sequencer.cc |  9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diffs (40 lines):

diff -r 54a65799e4c1 -r 02cb69e5cfeb src/mem/ruby/system/Sequencer.cc
--- a/src/mem/ruby/system/Sequencer.cc  Mon Apr 04 11:42:32 2011 -0500
+++ b/src/mem/ruby/system/Sequencer.cc  Wed Apr 06 14:41:41 2011 -0700
@@ -229,6 +229,7 @@
 Address line_addr(request->ruby_request.m_PhysicalAddress);
 line_addr.makeLineAddress();
 if ((request->ruby_request.m_Type == RubyRequestType_ST) ||
+(request->ruby_request.m_Type == RubyRequestType_ATOMIC) ||
 (request->ruby_request.m_Type == RubyRequestType_RMW_Read) ||
 (request->ruby_request.m_Type == RubyRequestType_RMW_Write) ||
 (request->ruby_request.m_Type == RubyRequestType_Load_Linked) ||
@@ -381,6 +382,7 @@
 markRemoved();
 
 assert((request->ruby_request.m_Type == RubyRequestType_ST) ||
+   (request->ruby_request.m_Type == RubyRequestType_ATOMIC) ||
(request->ruby_request.m_Type == RubyRequestType_RMW_Read) ||
(request->ruby_request.m_Type == RubyRequestType_RMW_Write) ||
(request->ruby_request.m_Type == RubyRequestType_Load_Linked) ||
@@ -648,6 +650,7 @@
   //
   case RubyRequestType_Load_Linked:
   case RubyRequestType_Store_Conditional:
+  case RubyRequestType_ATOMIC:
 ctype = RubyRequestType_ATOMIC;
 break;
   default:
@@ -671,8 +674,10 @@
 
 Address line_addr(request.m_PhysicalAddress);
 line_addr.makeLineAddress();
-int proc_id = request.pkt->req->hasContextId() ?
-request.pkt->req->contextId() : -1;
+int proc_id = -1;
+if (request.pkt != NULL && request.pkt->req->hasContextId()) {
+proc_id = request.pkt->req->contextId();
+}
 RubyRequest *msg = new RubyRequest(request.m_PhysicalAddress.getAddress(),
request.data, request.m_Size,
request.m_ProgramCounter.getAddress(),
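
The last hunk replaces an unconditional dereference of request.pkt with a NULL check, which suggests that some RubyRequests can reach this point without a packet. A minimal sketch of that pattern, using hypothetical stand-ins for gem5's Request and Packet that keep only the members the hunk touches:

```cpp
// Hypothetical stand-ins for gem5's Request and Packet, reduced to the
// members used by the proc_id logic.
struct Request
{
    bool m_hasContextId;
    int m_contextId;
    bool hasContextId() const { return m_hasContextId; }
    int contextId() const { return m_contextId; }
};

struct Packet
{
    Request* req;
};

// Default to -1 and only dereference pkt when it is non-NULL; && short-
// circuits, so pkt->req is never read through a null pkt.
int procIdFor(const Packet* pkt)
{
    int proc_id = -1;
    if (pkt != nullptr && pkt->req->hasContextId()) {
        proc_id = pkt->req->contextId();
    }
    return proc_id;
}
```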


[m5-dev] changeset in web-graphics: Ruby: Added figures for overview, dat...

2011-04-06 Thread Rathijit Sen
changeset 9e9db0c974e3 in /z/repo/web-graphics
details: web-graphics?cmd=changeset;node=9e9db0c974e3
description:
Ruby: Added figures for overview, data structures and timing for the 
Memory Controller.
The data structure and timing diagrams were adapted/taken from a 
presentation created by Andy Phelps in 2008.

diffstat:

 ruby/figures/mc_addr_command_timing.jpg  |0 
 ruby/figures/mc_addr_command_timing_back_to_back.jpg |0 
 ruby/figures/mc_data_struct.jpg  |0 
 ruby/figures/mc_overview.jpg |0 
 ruby/sources/mc_addr_command_timing.ppt  |0 
 ruby/sources/mc_addr_command_timing_back_to_back.ppt |0 
 ruby/sources/mc_data_struct.ppt  |0 
 ruby/sources/mc_overview.doc |0 
 8 files changed, 0 insertions(+), 0 deletions(-)

diffs (16 lines):

diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/figures/mc_addr_command_timing.jpg
Binary file ruby/figures/mc_addr_command_timing.jpg has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 
ruby/figures/mc_addr_command_timing_back_to_back.jpg
Binary file ruby/figures/mc_addr_command_timing_back_to_back.jpg has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/figures/mc_data_struct.jpg
Binary file ruby/figures/mc_data_struct.jpg has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/figures/mc_overview.jpg
Binary file ruby/figures/mc_overview.jpg has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/sources/mc_addr_command_timing.ppt
Binary file ruby/sources/mc_addr_command_timing.ppt has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 
ruby/sources/mc_addr_command_timing_back_to_back.ppt
Binary file ruby/sources/mc_addr_command_timing_back_to_back.ppt has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/sources/mc_data_struct.ppt
Binary file ruby/sources/mc_data_struct.ppt has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/sources/mc_overview.doc
Binary file ruby/sources/mc_overview.doc has changed


[m5-dev] changeset in web-graphics: Ruby: Added FSM diagrams for MI_examp...

2011-04-06 Thread Rathijit Sen
changeset 3a5726a3e1da in /z/repo/web-graphics
details: web-graphics?cmd=changeset;node=3a5726a3e1da
description:
Ruby: Added FSM diagrams for MI_example cache coherence protocol.

diffstat:

 ruby/figures/MI_example_cache_FSM.jpg |0 
 ruby/figures/MI_example_dir_FSM.jpg   |0 
 ruby/sources/MI_example_cache_FSM.ppt |0 
 ruby/sources/MI_example_dir_FSM.ppt   |0 
 4 files changed, 0 insertions(+), 0 deletions(-)

diffs (8 lines):

diff -r 9e9db0c974e3 -r 3a5726a3e1da ruby/figures/MI_example_cache_FSM.jpg
Binary file ruby/figures/MI_example_cache_FSM.jpg has changed
diff -r 9e9db0c974e3 -r 3a5726a3e1da ruby/figures/MI_example_dir_FSM.jpg
Binary file ruby/figures/MI_example_dir_FSM.jpg has changed
diff -r 9e9db0c974e3 -r 3a5726a3e1da ruby/sources/MI_example_cache_FSM.ppt
Binary file ruby/sources/MI_example_cache_FSM.ppt has changed
diff -r 9e9db0c974e3 -r 3a5726a3e1da ruby/sources/MI_example_dir_FSM.ppt
Binary file ruby/sources/MI_example_dir_FSM.ppt has changed


[m5-dev] changeset in web-graphics: Ruby: Added FSM diagrams for MOESI_CM...

2011-04-06 Thread Rathijit Sen
changeset 01b8bcdb3a1c in /z/repo/web-graphics
details: web-graphics?cmd=changeset;node=01b8bcdb3a1c
description:
Ruby: Added FSM diagrams for MOESI_CMP_directory cache coherence 
protocol.

diffstat:

 ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg|0 
 ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg  |0 
 ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg |0 
 ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg |0 
 ruby/figures/MOESI_CMP_directory_dir_FSM.jpg|0 
 ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt|0 
 ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt  |0 
 ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt |0 
 ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt |0 
 ruby/sources/MOESI_CMP_directory_dir_FSM.ppt|0 
 10 files changed, 0 insertions(+), 0 deletions(-)

diffs (20 lines):

diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg
Binary file ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg
Binary file ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg
Binary file ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg
Binary file ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/figures/MOESI_CMP_directory_dir_FSM.jpg
Binary file ruby/figures/MOESI_CMP_directory_dir_FSM.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt
Binary file ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt
Binary file ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt
Binary file ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt
Binary file ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/sources/MOESI_CMP_directory_dir_FSM.ppt
Binary file ruby/sources/MOESI_CMP_directory_dir_FSM.ppt has changed


[m5-dev] changeset in web-graphics: Ruby: Added high-level figure for SLICC.

2011-04-06 Thread Rathijit Sen
changeset 7555e9135731 in /z/repo/web-graphics
details: web-graphics?cmd=changeset;node=7555e9135731
description:
Ruby: Added high-level figure for SLICC.
This was taken from the GEMS tutorial in ISCA 2005.

diffstat:

 ruby/figures/slicc_overview.jpg |0 
 ruby/sources/slicc_overview.ppt |0 
 2 files changed, 0 insertions(+), 0 deletions(-)

diffs (4 lines):

diff -r 01b8bcdb3a1c -r 7555e9135731 ruby/figures/slicc_overview.jpg
Binary file ruby/figures/slicc_overview.jpg has changed
diff -r 01b8bcdb3a1c -r 7555e9135731 ruby/sources/slicc_overview.ppt
Binary file ruby/sources/slicc_overview.ppt has changed


Re: [m5-dev] changeset in web-graphics: Ruby: Added FSM diagrams for MOESI_CM...

2011-04-06 Thread Rathijit Sen
Sorry for exceeding 65 characters in the first comment line on some 
recent check-ins. I didn't notice this earlier.


-Rathijit


On 04/06/2011 05:49 PM, Rathijit Sen wrote:

changeset 01b8bcdb3a1c in /z/repo/web-graphics
details: web-graphics?cmd=changeset;node=01b8bcdb3a1c
description:
Ruby: Added FSM diagrams for MOESI_CMP_directory cache coherence 
protocol.

diffstat:

  ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg        | 0
  ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg  | 0
  ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg | 0
  ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg | 0
  ruby/figures/MOESI_CMP_directory_dir_FSM.jpg            | 0
  ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt        | 0
  ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt  | 0
  ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt | 0
  ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt | 0
  ruby/sources/MOESI_CMP_directory_dir_FSM.ppt            | 0
  10 files changed, 0 insertions(+), 0 deletions(-)

diffs (20 lines):

diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg
Binary file ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg
Binary file ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg
Binary file ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg
Binary file ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/figures/MOESI_CMP_directory_dir_FSM.jpg
Binary file ruby/figures/MOESI_CMP_directory_dir_FSM.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt
Binary file ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt
Binary file ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt
Binary file ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt
Binary file ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c 
ruby/sources/MOESI_CMP_directory_dir_FSM.ppt
Binary file ruby/sources/MOESI_CMP_directory_dir_FSM.ppt has changed


Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Korey Sewell
A few comments:
(1) Using uint64_t seems like a quick, interim solution. But I still
haven't grasped why we have the 31st bit problem, but we don't have
the 63rd bit problem as well?

(2) Adding the stl::bitset seems like a good idea (does the Flags in
M5 use that?) but it won't be a straightforward switch because the Set
class supports arbitrary size sets. If it was implemented it would
take a little bit of effort but not too much.

(3) I didn't say this earlier, but it does look like this code could
use some optimization. From the gprof I ran on 2-8 cores, this
Set::count() function is the 2nd or 3rd highest producer of time for
the Ruby Fft runs (although still a very small overall % in system
time). Looks like simple optimizations like only looping for the set
size in the count() function should be helpful, instead of always
looping for the complete length of long datatype:
 for (int j = 0; j < LONG_BITS; j++) {
     if ((m_p_nArray[i] & mask) != 0) {
         counter++;
     }
     mask = mask << 1;
 }

That as well as generating a mask, shifting and comparing each bit
doesn't seem necessary given we can potentially use a bitset or a
constant-time struct to loop over and check set inclusion.
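Korey's point about not walking every bit position of every word can be illustrated with a common alternative loop. This is a hypothetical sketch, not the existing Ruby code. It assumes the set's storage is a `std::vector<uint64_t>`, standing in for Ruby's `m_p_nArray`. It uses the clear-lowest-set-bit trick, so the inner loop runs once per member present rather than once per bit of the word.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using std::uint64_t;

// Hypothetical stand-in for Ruby's Set storage: one uint64_t per 64 members.
// Each iteration of the inner loop clears the lowest set bit, so the work
// done is proportional to the number of members, not to the word width.
int countMembers(const std::vector<uint64_t>& words)
{
    int counter = 0;
    for (uint64_t w : words) {
        while (w != 0) {
            w &= w - 1;  // clear the lowest set bit
            ++counter;
        }
    }
    return counter;
}
```

Because the words are unsigned and fixed-width, bits 31 and 63 are handled the same way as every other bit.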

On Wed, Apr 6, 2011 at 5:12 PM, Nilay Vaish ni...@cs.wisc.edu wrote:
 I believe even popcount is portable. I am not opposed to using bitset; it's just
 that it would probably require a lot more changes.

 --
 Nilay

 On Wed, 6 Apr 2011, Ali Saidi wrote:

 stl::bitset does these type of optimizations underneath and it's portable.

 Ali

 On Wed, 6 Apr 2011 15:57:37 -0500 (CDT), Nilay Vaish ni...@cs.wisc.edu
 wrote:

 I would prefer we make use of GCC builtin __builtin_popcount() for
 counting the number of 1's in an int or related data type.

 Nilay

 On Wed, 6 Apr 2011, Ali Saidi wrote:

 And actually, couldn't you use an stl bitset for this?

 Thanks,
 Ali

 On Wed, 06 Apr 2011 15:34:01 -0500, Ali Saidi sa...@umich.edu wrote:

 Jumping in somewhat randomly here, uint64_t even on a 32bit machine
 is reasonably fast. It's not going to be as fast, but it will be
 correct. My vote would be to just switch all that Set code that uses
 long to explicitly use uint64_t and if it's slower on a 32bit machine
 so be it. At least it's correct.
 Ali


 On Wed, 6 Apr 2011 15:24:24 -0500, Beckmann, Brad
 brad.beckm...@amd.com wrote:

 Hi Korey,
 Yes, let's move this conversation back to m5-dev, since I think
 others may be interested and could help.
 I don't know what the problem is exactly, but at some point of time
 (probably back in the early GEMS days) I seem to remember the Set code
 included an assertion check about the 31st bit in 32-bit mode.
 Therefore, I think we knew about this problem and made sure that never
 happened.  I believe that is why we used to have a restriction that
 Ruby could only support 16 processors.  I'm really fuzzy on the
 details...maybe someone else can elaborate.
 In the end, I just want to make sure we add something in the code
 that makes sure we don't encounter this problem again.  This is one of
 those bugs that can take a while to track down, if you don't catch it
 right when it happens with an assertion.
 Brad


 From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On
 Behalf Of Korey Sewell
 Sent: Tuesday, April 05, 2011 7:14 AM
 To: Beckmann, Brad
 Subject: Re: [m5-dev] Running Ruby w/32 Cores
 Hi again Brad,
 I looked this over again and, although my 32-bit patch fixes things,
 I'm not convinced that I actually fixed the cause of the bug rather
 than just the symptom.
 Do you happen to know what the problems with the 32-bit Set
 counts are?
 Sorry for prolonging the issue; I thought I had put this to bed,
 but maybe not. Finally, it may not matter that this works on 32-bit
 machines but it'd be nice if it did. (Let me know if I should move
 this convo to the m5-dev list)
 I end up checking the last bit in the count function manually (the
 code as follows):
 int
 Set::count() const
 {
    int counter = 0;
    long mask;

    for (int i = 0; i < m_nArrayLen; i++) {
        mask = (long)0x01;

        for (int j = 0; j < LONG_BITS; j++) {
            // FIXME - significant performance loss when array
            // population  LONG_BITS
            if ((m_p_nArray[i] & mask) != 0) {
                counter++;
            }
            mask = mask << 1;
        }
 #ifndef _LP64
        long msb_mask = 0x8000;
        if ((m_p_nArray[i] & msb_mask) != 0) {
            counter++;
        }
 #endif
    }

    return counter;
 }



-- 
- Korey

Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Nilay Vaish

On Wed, 6 Apr 2011, Korey Sewell wrote:


A few comments:
(1) Using uint64_t seems like a quick, interim solution. But I still
haven't grasped why we have the 31st bit problem, but we don't have
the 63rd bit problem as well?


I think if you use unsigned long, in place of long, the code would work on 
32-bit machines. I am uncertain why the current code works on 64-bit
machines. I think long means 32-bit, irrespective of memory address length.
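Nilay's unsigned suggestion, and Ali's `uint64_t` proposal earlier in the thread, both come down to changing the type of the word and mask in the inner loop. Below is a minimal sketch that assumes the words are widened to `uint64_t`. `WORD_BITS` and `countWord()` are hypothetical names, not Ruby identifiers.

One plausible reason the signed version misbehaves is undefined behavior. Left-shifting a signed `long` mask after it has reached the sign bit is undefined in C++ before C++20, so the compiler may mishandle the loop. The thread did not confirm this as the actual cause.

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// Hypothetical replacement for Ruby's LONG_BITS: the word width is now the
// same on 32- and 64-bit hosts because the storage type is fixed-width.
const int WORD_BITS = 64;

// Same bit-by-bit loop as Set::count(), but with an unsigned mask, so every
// shift is well defined, including the one that moves past the top bit.
int countWord(uint64_t word)
{
    int counter = 0;
    uint64_t mask = 1;
    for (int j = 0; j < WORD_BITS; j++) {
        if ((word & mask) != 0)
            counter++;
        mask <<= 1;  // after the last iteration this wraps to 0, which is harmless
    }
    return counter;
}
```

With this change, the separate `#ifndef _LP64` most-significant-bit check in the patch above should no longer be needed.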




(2) Adding the stl::bitset seems like a good idea (does the Flags in
M5 use that?) but it won't be a straightforward switch because the Set
class supports arbitrary size sets. If it was implemented it would
take a little bit of effort but not too much.

(3) I didn't say this earlier, but it does look like this code could
use some optimization. From the gprof I ran on 2-8 cores, this
Set::count() function is the 2nd or 3rd highest producer of time for
the Ruby Fft runs (although still a very small overall % in system
time). Looks like simple optimizations like only looping for the set
size in the count() function should be helpful, instead of always
looping for the complete length of long datatype:
for (int j = 0; j < LONG_BITS; j++) {
    if ((m_p_nArray[i] & mask) != 0) {
        counter++;
    }
    mask = mask << 1;
}

That as well as generating a mask, shifting and comparing each bit
doesn't seem necessary given we can potentially use a bitset or a
constant-time struct to loop over and check set inclusion.


I would still root for using popcount() builtin available with GCC.
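For comparison with the bit-by-bit loop, here is a sketch of the popcount approach Nilay favors. It assumes GCC, or Clang, which provides the same builtin. It also assumes a hypothetical `std::vector<uint64_t>` word array in place of Ruby's `m_p_nArray`. `__builtin_popcountll` returns the number of 1 bits in an `unsigned long long`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using std::uint64_t;

// Count set members with one builtin call per 64-bit word (GCC/Clang only).
int countWithPopcount(const std::vector<uint64_t>& words)
{
    int counter = 0;
    for (uint64_t w : words)
        counter += __builtin_popcountll(w);
    return counter;
}
```

On targets with a population-count instruction, the compiler can lower the builtin to that instruction. Otherwise it falls back to a library routine.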


--
Nilay


[m5-dev] changeset in web-graphics: Ruby: Adding figure for common networ...

2011-04-06 Thread Rathijit Sen
changeset 1f62f3ea6275 in /z/repo/web-graphics
details: web-graphics?cmd=changeset;node=1f62f3ea6275
description:
Ruby: Adding figure for common network topologies.
Individual components of the figure were taken from
the GEMS tutorial in ISCA 2005.

diffstat:

 ruby/figures/Topology_overview.jpg | 0
 ruby/sources/Topology_overview.doc | 0
 2 files changed, 0 insertions(+), 0 deletions(-)

diffs (4 lines):

diff -r 7555e9135731 -r 1f62f3ea6275 ruby/figures/Topology_overview.jpg
Binary file ruby/figures/Topology_overview.jpg has changed
diff -r 7555e9135731 -r 1f62f3ea6275 ruby/sources/Topology_overview.doc
Binary file ruby/sources/Topology_overview.doc has changed


Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Ali Saidi

On Apr 6, 2011, at 6:17 PM, Korey Sewell wrote:

 A few comments:
 (1) Using uint64_t seems like a quick, interim solution. But I still
 haven't grasped why we have the 31st bit problem, but we don't have
 the 63rd bit problem as well?
 
 (2) Adding the stl::bitset seems like a good idea (does the Flags in
 M5 use that?) but it won't be a straightforward switch because the Set
 class supports arbitrary size sets. If it was implemented it would
 take a little bit of effort but not too much.

The functional units, instruction flags and packet flags use it. Trace flags
don't.

bitset supports arbitrarily sized sets too; you just have to declare the max
size up front, as a template parameter (there is a performance benefit to being
less than the machine word length, but it all still works if you're not).
Additionally, bitsets seem to support most if not all of the operations
(intersection, union, count, zero, etc.) that Set does, although they have
different names.
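Here is a small sketch of how the operations Ali lists map onto `std::bitset`:

- `set()` adds a member and `test()` checks membership.
- `&` gives intersection and `|` gives union.
- `count()` returns the number of members.
- `reset()` clears the set and `none()` checks that it is empty.

The capacity of 64 and the typedef name `NodeBits` are assumptions for illustration. The capacity is a compile-time template argument.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical capacity; std::bitset needs it at compile time.
typedef std::bitset<64> NodeBits;

void demoBitsetOps()
{
    NodeBits a, b;
    a.set(3);                     // insert members
    a.set(31);
    a.set(63);
    b.set(31);

    NodeBits both = a & b;        // intersection
    NodeBits either = a | b;      // union
    assert(both.count() == 1);    // count
    assert(either.count() == 3);
    assert(a.test(63));           // membership test, including the top bit

    a.reset();                    // clear every member ("zero")
    assert(a.none());             // emptiness check
}
```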
 
 (3) I didn't say this earlier, but it does look like this code could
 use some optimization. From the gprof I ran on 2-8 cores, this
 Set::count() function is the 2nd or 3rd highest producer of time for
 the Ruby Fft runs (although still a very small overall % in system
 time). Looks like simple optimizations like only looping for the set
 size in the count() function should be helpful, instead of always
 looping for the complete length of long datatype:
 for (int j = 0; j < LONG_BITS; j++) {
     if ((m_p_nArray[i] & mask) != 0) {
         counter++;
     }
     mask = mask << 1;
 }
 
 That as well as generating a mask, shifting and comparing each bit
 doesn't seem necessary given we can potentially use a bitset or a
 constant-time struct to loop over and check set inclusion.
You can also make count() constant time by keeping a running count of the set
bits and updating it whenever the set changes. However, I don't think there is
any reason to try to optimize a bespoke implementation of a bitset. The STL
version is going to be faster and will improve for free over time, while this
implementation won't. For example, bitset also uses count leading zeros where
available to speed up finding the first set bit.
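The running-count idea could look roughly like the sketch below. `CountedSet` is a hypothetical class, not Ruby's `Set`. It stores members in `uint64_t` words and keeps `m_count` in sync in every mutator. It also asserts on out-of-range indices, in the spirit of Brad's request to catch this kind of bug when it happens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using std::uint64_t;

// Hypothetical set whose count() is O(1) because every mutator updates m_count.
class CountedSet
{
  public:
    explicit CountedSet(std::size_t size)
        : m_words((size + 63) / 64, uint64_t(0)), m_size(size), m_count(0) {}

    void add(std::size_t index)
    {
        assert(index < m_size);              // catch out-of-range members early
        uint64_t bit = uint64_t(1) << (index % 64);
        uint64_t &w = m_words[index / 64];
        if ((w & bit) == 0) {                // only count newly inserted members
            w |= bit;
            ++m_count;
        }
    }

    void remove(std::size_t index)
    {
        assert(index < m_size);
        uint64_t bit = uint64_t(1) << (index % 64);
        uint64_t &w = m_words[index / 64];
        if ((w & bit) != 0) {                // only uncount members that were present
            w &= ~bit;
            --m_count;
        }
    }

    void clear()
    {
        for (uint64_t &w : m_words)
            w = 0;
        m_count = 0;
    }

    std::size_t count() const { return m_count; }

  private:
    std::vector<uint64_t> m_words;
    std::size_t m_size;
    std::size_t m_count;
};
```

The trade-off is that every mutator must be updated consistently. A full rewrite such as an OR or AND of two whole sets would still need a recount.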


Ali



Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Gabriel Michael Black
When you say this is portable, what do you mean? Portable between  
compilers? We usually use gcc, but we have at least partial support  
for other compilers. I think this is necessary on some platforms.
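On the portability question: `__builtin_popcountll` is a GCC extension that Clang also supports. Code that must also build with other compilers typically guards the builtin behind a preprocessor check.

The sketch below shows one way to do that. The function names are hypothetical, and the fallback is the standard SWAR bit-count reduction. The C++ standard later added `std::popcount` in `<bit>` (C++20), and `std::bitset::count()` is portable in any standard C++.

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// Portable fallback: sum bits in 2-, 4-, then 8-bit fields, then add up the
// eight byte counts with a multiply and take the top byte.
inline int swarPopcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
}

// Use the builtin when the compiler advertises GNU extensions (GCC, Clang),
// otherwise fall back to the portable version.
inline int portablePopcount64(uint64_t x)
{
#if defined(__GNUC__)
    return __builtin_popcountll(x);
#else
    return swarPopcount64(x);
#endif
}
```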


Gabe



I would still root for using popcount() builtin available with GCC.


--
Nilay


Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Korey Sewell
Hi Ali,
My only problem with stl::bitset here is that the Set type from Ruby
seems to have the option to be resizable (through the overloaded
assignment operator). That's what I meant by arbitrary length.

In practice, I'm not sure if they ever assign sets of different
lengths to each other (causing resizing), but if they do, then that
would suggest that using the stl::bitset isn't a straightforward thing
(definitely do-able though, just not plug/play).

If the resizing is just an unused feature of Ruby, then I would
suggest we switch to bitset.
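One way to reconcile the resizing concern with `std::bitset` is to fix a compile-time capacity and track the logical size separately. In this sketch, `kMaxSetSize` (256) and the `BoundedSet` class are hypothetical. The compiler-generated copy assignment copies both the bits and the logical size. That mimics a Set being resized on assignment, as long as every set fits under the cap.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical compile-time upper bound; std::bitset needs its capacity
// as a template argument.
const std::size_t kMaxSetSize = 256;

// Fixed bitset storage with a run-time logical size that may change,
// for example when assigned from a set of a different length.
class BoundedSet
{
  public:
    explicit BoundedSet(std::size_t size = 0) : m_size(size)
    {
        assert(size <= kMaxSetSize);
    }

    void setSize(std::size_t size)
    {
        assert(size <= kMaxSetSize);
        // Drop members that fall outside the new logical size.
        for (std::size_t i = size; i < m_size; ++i)
            m_bits.reset(i);
        m_size = size;
    }

    void add(std::size_t index)
    {
        assert(index < m_size);
        m_bits.set(index);
    }

    bool isElement(std::size_t index) const
    {
        return index < m_size && m_bits.test(index);
    }

    std::size_t size() const { return m_size; }
    std::size_t count() const { return m_bits.count(); }

  private:
    std::bitset<kMaxSetSize> m_bits;
    std::size_t m_size;
};
```

If sets can exceed any fixed cap, `std::vector<bool>` or `boost::dynamic_bitset` offer run-time sizing instead, at the cost of a different interface.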

-- 
- Korey