[m5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/inorder-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-atomic-mp passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-timing-mp passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby passed.
* build/ALPHA_SE/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/ALPHA_FS/tests/fast/quick/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/inorder-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/o3-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing-ruby passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/o3-timing passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing-ruby passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/40.m5threads-test-atomic/sparc/linux/simple-atomic-mp passed.
Re: [m5-dev] Cron m5test@zizzer /z/m5/regression/do-regression --scratch all
It looks like it was this change, which came directly after the one I pointed out before.

changeset:   8134:b01a51ff05fa
user:        Ali Saidi ali.sa...@arm.com
date:        Thu Mar 17 19:20:19 2011 -0500
summary:     Mem: Fix issue with dirty block being lost when entire block transferred to non-cache.

Could you take a look, Ali? The description doesn't necessarily sound like something you'd expect to change the stats (it sounds like a corner case), but I'm assuming you'll know.

Gabe

On 04/03/11 19:51, Gabe Black wrote:

Does anyone have any ideas about when X86_SE parser stopped working? The last time it passed for sure was the end of February, but on March 16th Ali updated the stats, so it was presumably working then too. I'm running at that changeset right now to confirm that. There weren't any X86-specific changes recently, but there were a few O3 ones which might have changed the stats.

The output is below, and you can see that the biggest change percentage-wise was in icache writebacks. Most of the changes are related to memory somehow. After the stats is info about a change that may have caused the problem.
= Statistics differences =

Maximum error magnitude: +133.33%

                                                    Reference      New Value     Abs Diff   Pct Chg
Key statistics:

host_inst_rate                                         189714         148502       -41212   -21.72%
host_mem_usage                                         264736         268256         3520    +1.33%
sim_insts                                          1527476062     1528988756      1512694    +0.10%
sim_ticks                                        610952992000   612245337000   1292345000    +0.21%
system.cpu.commit.COM:count                        1527476062     1528988756      1512694    +0.10%

Differences > 0%:

system.cpu.icache.writebacks                                3              7            4  +133.33%
system.cpu.rename.RENAME:serializeStallCycles           19936          16025        -3911   -19.62%
system.cpu.l2cache.occ_%::0                          0.213694       0.236362     0.022668   +10.61%
system.cpu.l2cache.occ_blocks::0                  7002.339473    7745.103692   742.764219   +10.61%
system.cpu.rename.RENAME:tempSerializingInsts            2561           2314         -247    -9.64%
system.cpu.icache.ReadReq_mshr_hits                      1570           1427         -143    -9.11%
system.cpu.icache.demand_mshr_hits                       1570           1427         -143    -9.11%
system.cpu.icache.overall_mshr_hits                      1570           1427         -143    -9.11%
system.cpu.rename.RENAME:serializingInsts                2550           2345         -205    -8.04%
system.cpu.l2cache.ReadReq_misses                      316709         339091        22382    +7.07%
system.cpu.l2cache.ReadReq_mshr_misses                 316709         339091        22382    +7.07%
system.cpu.l2cache.ReadReq_mshr_miss_latency       9818903000    10512799000    693896000    +7.07%
system.cpu.l2cache.ReadReq_miss_latency           10822415500    11584355000    761939500    +7.04%
system.cpu.dcache.ReadReq_mshr_miss_latency       14062264500    14863694500    801430000    +5.70%
system.cpu.l2cache.ReadReq_miss_rate                 0.182786       0.192941     0.010155    +5.56%
system.cpu.l2cache.ReadReq_mshr_miss_rate            0.182786       0.192941     0.010155    +5.56%
system.cpu.idleCycles                                24586339       25677793      1091454    +4.44%
system.cpu.dcache.ReadReq_avg_mshr_miss_latency   8150.695480    8493.787248   343.091768    +4.21%
system.cpu.l2cache.replacements                        553099         575827        22728    +4.11%
system.cpu.l2cache.demand_mshr_miss_latency       17475146000    18186565000    711419000    +4.07%

[... showing top 20 errors only, additional errors omitted ...]

* build/X86_SE/tests/fast/long/20.parser/x86/linux/o3-timing FAILED!
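As a cross-check on the "Pct Chg" column above, the following minimal sketch recomputes a percentage change from a reference and a new value. It assumes the column is simply (new - reference) / reference; the function names are made up for this illustration and are not taken from the regression scripts.

```cpp
#include <cassert>
#include <cmath>

// Percentage change of a new statistic relative to its reference value.
// Assumes ref != 0; a caller would need to handle zero references itself.
double pctChange(double ref, double value)
{
    return (value - ref) / ref * 100.0;
}

// True when two values agree to within tol. Used here to compare against
// the two-decimal figures printed in the report.
bool closeTo(double a, double b, double tol = 0.005)
{
    return std::fabs(a - b) <= tol;
}
```

Under that assumption, icache.writebacks going from 3 to 7 gives (7 - 3) / 3, which rounds to the +133.33% reported as the maximum error magnitude.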
changeset 9f704aa10eb4 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=9f704aa10eb4
description:
O3: Fix unaligned stores when cache blocked

Without this change a store can be issued to the cache multiple times. If this case occurs when the l1 cache is out of mshrs (and thus blocked), the processor will never make forward progress, because each cycle it will send a single request using the recently freed mshr without completing the multipart store. This will continue forever.

diffstat:

 src/cpu/o3/lsq_unit_impl.hh | 4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diffs (14 lines):

diff -r 2af262e73961 -r 9f704aa10eb4 src/cpu/o3/lsq_unit_impl.hh
--- a/src/cpu/o3/lsq_unit_impl.hh Thu Mar 17 00:43:54 2011 -0400
+++ b/src/cpu/o3/lsq_unit_impl.hh Thu Mar 17 19:20:19 2011 -0500
@@ -1103,7 +1103,9 @@
             dynamic_cast<LSQSenderState *>(retryPkt->senderState);

         // Don't finish the store unless this is the last packet.
-        if (!TheISA::HasUnalignedMemAcc || !state->pktToSend) {
+        if (!TheISA::HasUnalignedMemAcc || !state->pktToSend ||
+                state->pendingPacket == retryPkt) {
+            state->pktToSend = false;
             storePostSend(retryPkt);
         }
         retryPkt = NULL;
Re: [m5-dev] Running Ruby w/32 Cores
Hi Korey,

Yes, let's move this conversation back to m5-dev, since I think others may be interested and could help.

I don't know what the problem is exactly, but at some point (probably back in the early GEMS days) I seem to remember the Set code included an assertion check about the 31st bit in 32-bit mode. Therefore, I think we knew about this problem and made sure that never happened. I believe that is why we used to have a restriction that Ruby could only support 16 processors. I'm really fuzzy on the details...maybe someone else can elaborate.

In the end, I just want to make sure we add something in the code that ensures we don't encounter this problem again. This is one of those bugs that can take a while to track down if you don't catch it with an assertion right when it happens.

Brad

From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On Behalf Of Korey Sewell
Sent: Tuesday, April 05, 2011 7:14 AM
To: Beckmann, Brad
Subject: Re: [m5-dev] Running Ruby w/32 Cores

Hi again Brad,

I looked this over again and, although my 32-bit patch fixes things, I'm not convinced that I actually fixed the cause of the bug rather than just its symptom. Do you happen to know what the problems are with the 32-bit Set counts?

Sorry for prolonging the issue; I thought I had put this to bed, but maybe not. Finally, it may not matter that this works on 32-bit machines, but it'd be nice if it did.
(Let me know if I should move this convo to the m5-dev list.)

I end up checking the last bit in the count function manually (the code is as follows):

    int
    Set::count() const
    {
        int counter = 0;
        long mask;
        for (int i = 0; i < m_nArrayLen; i++) {
            mask = (long)0x01;
            for (int j = 0; j < LONG_BITS; j++) {
                // FIXME - significant performance loss when array
                // population << LONG_BITS
                if ((m_p_nArray[i] & mask) != 0) {
                    counter++;
                }
                mask = mask << 1;
            }
    #ifndef _LP64
            long msb_mask = 0x8000;
            if ((m_p_nArray[i] & msb_mask) != 0) {
                counter++;
            }
    #endif
        }
        return counter;
    }

On Tue, Apr 5, 2011 at 1:30 AM, Korey Sewell ksew...@umich.edu wrote:

Brad, it looks like you were right on the money here. I found the spot where it was returning the wrong value, via a SLICC function that counts sharers for everyone except the owner.

I realized that the machine I use for testing is just a 32-bit machine and, like you warned, there look to be issues with the Set type there. I ran the Fft-32 cores on a 64-bit machine and it seems to work correctly. I'll be running the full splash/parsec suites soon, and that should stress Ruby a good bit :).

I have a patch that checks whether _LP64 is defined and, if not, checks that last bit when doing the set count function.

Thanks for being helpful in debugging. It was a relatively easy bug, but as always, going through code and becoming more proficient at getting around while trying to solve a bug is really helpful.

On Fri, Apr 1, 2011 at 7:28 PM, Beckmann, Brad brad.beckm...@amd.com wrote:

Ok, for the first trace, the critical line is the following:

348523 0L2Cache L1_GETX ILOSXIFLXO [0x16180, line 0x16180] [NetDest (4) 0 - 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 - 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - ]30

L2Cache identifies that 31 caches have a shared copy and that L1 cache 9 (L1-9) is the owner.
When L1Cache 0 (L1-0) issues a GETX, the L2Cache issues 30 Inv probes, forwards the GETX to L1-9, and sends an ack to L1-0 itself. However, the L2 cache tells L1-0 to expect only 30 acks instead of 31. Could it be something wrong with the NetDest::count() function, or the Set::count() function? I slightly modified my previous patch to isolate what value the NetDest::count() function is returning. If it is returning 30 instead of 31, then it must be a problem with NetDest. You are compiling gem5 as a 64-bit binary, right?

The second problem is essentially the same issue. L2Cache 31 (L2-31) is the owner of the block, but I suspect NetDest is not counting bit 31 and thus is returning a count of 0...causing the error.

Overall, concentrate on that NetDest::count() function or, more importantly, the Set::count() function. Once you find out the problem, please let me know.

Thanks,
Brad

From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On Behalf Of Korey Sewell
Sent: Friday, April 01, 2011 12:00 PM
To: Beckmann, Brad
Subject: Re: [m5-dev] Running Ruby w/32 Cores

Brad, attached are the protocol traces grep'd for the offending addresses. I'm going to spend the weekend digging through Ruby code, so hopefully I'm pretty close to generating the fixes myself.
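The thread does not establish why the count comes up one short when element 31 is present. Purely as an illustration, here is one hypothetical way such an off-by-one can arise in C++: bounding a bit-walking loop by std::numeric_limits<T>::digits of a signed type, which excludes the sign bit. The functions below are invented for this sketch and are not the Ruby Set or NetDest code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical illustration only, not the Ruby implementation.
// Walks a mask across the word, but bounds the loop by the number of value
// bits in a signed 32-bit type. numeric_limits<int32_t>::digits is 31
// (the sign bit is excluded), so bit 31 is never examined.
int countSignedDigits(uint32_t word)
{
    int counter = 0;
    uint32_t mask = 1;
    for (int j = 0; j < std::numeric_limits<int32_t>::digits; j++) {
        if ((word & mask) != 0)
            counter++;
        mask <<= 1;
    }
    return counter;
}

// Same walk, bounded by the value bits of the unsigned type (32), so every
// bit position is examined.
int countAllBits(uint32_t word)
{
    int counter = 0;
    uint32_t mask = 1;
    for (int j = 0; j < std::numeric_limits<uint32_t>::digits; j++) {
        if ((word & mask) != 0)
            counter++;
        mask <<= 1;
    }
    return counter;
}
```

With a 32-entry sharer vector in which only entry 9 is clear (0xFFFFFDFF), the first function reports 30 and the second 31, which mirrors the 30-versus-31 ack mismatch in the trace. A word with only bit 31 set yields 0 from the first function, mirroring the second problem described above. Whether this is what actually happens in the Set code is an assumption, not something the messages confirm.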
Re: [m5-dev] Running Ruby w/32 Cores
Jumping in somewhat randomly here, uint64_t even on a 32bit machine is reasonably fast. It's not going to be as fast, but it will be correct. My vote would be to just switch all that Set code that uses long to explicitly use uint64_t and if it's slower on a 32bit machine so be it. At least it's correct.

Ali

On Wed, 6 Apr 2011 15:24:24 -0500, Beckmann, Brad brad.beckm...@amd.com wrote:

Hi Korey, Yes, let's move this conversation back to m5-dev, since I think others may be interested and could help. [...]
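The suggestion to store the set in explicit uint64_t words can be sketched as follows. This is a minimal sketch under stated assumptions, not the Ruby Set class: the std::vector storage and the names addElement, countElements, and BITS_PER_WORD are all hypothetical. The point is only that the bits-per-word constant no longer depends on the width of long on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: fixed-width 64-bit words, so the layout is identical on
// 32-bit and 64-bit hosts. Names are illustrative, not from the Ruby sources.
static const int BITS_PER_WORD = 64;

// Set bit `index` (assumed non-negative), growing the word array if needed.
void addElement(std::vector<uint64_t> &words, int index)
{
    std::size_t word = index / BITS_PER_WORD;
    if (word >= words.size())
        words.resize(word + 1, 0);
    words[word] |= uint64_t(1) << (index % BITS_PER_WORD);
}

// Count set bits across all words by clearing the lowest set bit each pass.
int countElements(const std::vector<uint64_t> &words)
{
    int counter = 0;
    for (std::size_t i = 0; i < words.size(); i++) {
        uint64_t w = words[i];
        while (w != 0) {
            w &= w - 1;   // drops the lowest set bit
            counter++;
        }
    }
    return counter;
}
```

Because every shift is done on an unsigned 64-bit value, elements 31 and 63 are treated like any other bit, at the cost of slower 64-bit arithmetic on 32-bit hosts, as noted in the message above.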
Re: [m5-dev] Running Ruby w/32 Cores
std::bitset does these types of optimizations underneath and it's portable.

Ali

On Wed, 6 Apr 2011 15:57:37 -0500 (CDT), Nilay Vaish ni...@cs.wisc.edu wrote:

I would prefer we make use of GCC builtin __builtin_popcount() for counting the number of 1's in an int or related data type.

Nilay

On Wed, 6 Apr 2011, Ali Saidi wrote:

And actually, couldn't you use an stl bitset for this?

Thanks,
Ali

On Wed, 06 Apr 2011 15:34:01 -0500, Ali Saidi sa...@umich.edu wrote:

Jumping in somewhat randomly here, uint64_t even on a 32bit machine is reasonably fast. [...]

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
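For comparison, the two alternatives raised in this exchange can be sketched side by side for a single 64-bit word. The wrapper names are invented for this example. Note that __builtin_popcount() itself takes an unsigned int; the variant for unsigned long long is __builtin_popcountll(), and both are compiler builtins provided by GCC (and also by Clang) rather than part of the C++ standard, whereas std::bitset is standard.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Population count via the GCC/Clang builtin. The "ll" variant takes an
// unsigned long long, which is at least 64 bits wide.
int countBuiltin(uint64_t word)
{
    return __builtin_popcountll(word);
}

// Population count via the standard library; available with any
// conforming C++ compiler.
int countBitset(uint64_t word)
{
    return static_cast<int>(std::bitset<64>(word).count());
}
```

Both functions return 31 for the sharer pattern 0xFFFFFDFF from the earlier trace. How either would be wired into the existing Set class, which supports arbitrary sizes, is left open here.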
[m5-dev] changeset in m5: ruby: fixes to support more types of RubyRequests
changeset 02cb69e5cfeb in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=02cb69e5cfeb
description:
ruby: fixes to support more types of RubyRequests

diffstat:

 src/mem/ruby/system/Sequencer.cc | 9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diffs (40 lines):

diff -r 54a65799e4c1 -r 02cb69e5cfeb src/mem/ruby/system/Sequencer.cc
--- a/src/mem/ruby/system/Sequencer.cc Mon Apr 04 11:42:32 2011 -0500
+++ b/src/mem/ruby/system/Sequencer.cc Wed Apr 06 14:41:41 2011 -0700
@@ -229,6 +229,7 @@
     Address line_addr(request->ruby_request.m_PhysicalAddress);
     line_addr.makeLineAddress();
     if ((request->ruby_request.m_Type == RubyRequestType_ST) ||
+        (request->ruby_request.m_Type == RubyRequestType_ATOMIC) ||
         (request->ruby_request.m_Type == RubyRequestType_RMW_Read) ||
         (request->ruby_request.m_Type == RubyRequestType_RMW_Write) ||
         (request->ruby_request.m_Type == RubyRequestType_Load_Linked) ||
@@ -381,6 +382,7 @@
     markRemoved();

     assert((request->ruby_request.m_Type == RubyRequestType_ST) ||
+           (request->ruby_request.m_Type == RubyRequestType_ATOMIC) ||
            (request->ruby_request.m_Type == RubyRequestType_RMW_Read) ||
            (request->ruby_request.m_Type == RubyRequestType_RMW_Write) ||
            (request->ruby_request.m_Type == RubyRequestType_Load_Linked) ||
@@ -648,6 +650,7 @@
         //
       case RubyRequestType_Load_Linked:
       case RubyRequestType_Store_Conditional:
+      case RubyRequestType_ATOMIC:
         ctype = RubyRequestType_ATOMIC;
         break;
       default:
@@ -671,8 +674,10 @@
     Address line_addr(request.m_PhysicalAddress);
     line_addr.makeLineAddress();

-    int proc_id = request.pkt->req->hasContextId() ?
-        request.pkt->req->contextId() : -1;
+    int proc_id = -1;
+    if (request.pkt != NULL && request.pkt->req->hasContextId()) {
+        proc_id = request.pkt->req->contextId();
+    }
     RubyRequest *msg = new RubyRequest(request.m_PhysicalAddress.getAddress(),
                                        request.data, request.m_Size,
                                        request.m_ProgramCounter.getAddress(),
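The last hunk of this changeset replaces an unconditional dereference of request.pkt with a guarded one. The pattern can be shown in isolation as below. The Request and Packet structs here are hypothetical stand-ins written for this sketch, not the simulator's real classes, and procIdFor is an invented name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the simulator's request and packet types.
struct Request {
    bool hasContext;
    int context;
    bool hasContextId() const { return hasContext; }
    int contextId() const { return context; }
};

struct Packet {
    Request *req;
};

// Returns the context id when a packet carrying one is attached, otherwise
// -1. The null check is evaluated first, so pkt->req is dereferenced only
// when pkt is non-null.
int procIdFor(const Packet *pkt)
{
    int proc_id = -1;
    if (pkt != NULL && pkt->req->hasContextId()) {
        proc_id = pkt->req->contextId();
    }
    return proc_id;
}
```

The short-circuit evaluation of && is what makes a request without a packet fall back to -1 instead of dereferencing a null pointer.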
[m5-dev] changeset in web-graphics: Ruby: Added figures for overview, dat...
changeset 9e9db0c974e3 in /z/repo/web-graphics
details: web-graphics?cmd=changeset;node=9e9db0c974e3
description:
Ruby: Added figures for overview, data structures and timing for the Memory Controller. The data structure and timing diagrams were adapted/taken from a presentation created by Andy Phelps in 2008.

diffstat:

 ruby/figures/mc_addr_command_timing.jpg              | 0
 ruby/figures/mc_addr_command_timing_back_to_back.jpg | 0
 ruby/figures/mc_data_struct.jpg                      | 0
 ruby/figures/mc_overview.jpg                         | 0
 ruby/sources/mc_addr_command_timing.ppt              | 0
 ruby/sources/mc_addr_command_timing_back_to_back.ppt | 0
 ruby/sources/mc_data_struct.ppt                      | 0
 ruby/sources/mc_overview.doc                         | 0
 8 files changed, 0 insertions(+), 0 deletions(-)

diffs (16 lines):

diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/figures/mc_addr_command_timing.jpg
Binary file ruby/figures/mc_addr_command_timing.jpg has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/figures/mc_addr_command_timing_back_to_back.jpg
Binary file ruby/figures/mc_addr_command_timing_back_to_back.jpg has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/figures/mc_data_struct.jpg
Binary file ruby/figures/mc_data_struct.jpg has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/figures/mc_overview.jpg
Binary file ruby/figures/mc_overview.jpg has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/sources/mc_addr_command_timing.ppt
Binary file ruby/sources/mc_addr_command_timing.ppt has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/sources/mc_addr_command_timing_back_to_back.ppt
Binary file ruby/sources/mc_addr_command_timing_back_to_back.ppt has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/sources/mc_data_struct.ppt
Binary file ruby/sources/mc_data_struct.ppt has changed
diff -r 9125ffaa8bfc -r 9e9db0c974e3 ruby/sources/mc_overview.doc
Binary file ruby/sources/mc_overview.doc has changed
[m5-dev] changeset in web-graphics: Ruby: Added FSM diagrams for MI_examp...
changeset 3a5726a3e1da in /z/repo/web-graphics
details: web-graphics?cmd=changeset;node=3a5726a3e1da
description:
Ruby: Added FSM diagrams for MI_example cache coherence protocol.

diffstat:

 ruby/figures/MI_example_cache_FSM.jpg | 0
 ruby/figures/MI_example_dir_FSM.jpg   | 0
 ruby/sources/MI_example_cache_FSM.ppt | 0
 ruby/sources/MI_example_dir_FSM.ppt   | 0
 4 files changed, 0 insertions(+), 0 deletions(-)

diffs (8 lines):

diff -r 9e9db0c974e3 -r 3a5726a3e1da ruby/figures/MI_example_cache_FSM.jpg
Binary file ruby/figures/MI_example_cache_FSM.jpg has changed
diff -r 9e9db0c974e3 -r 3a5726a3e1da ruby/figures/MI_example_dir_FSM.jpg
Binary file ruby/figures/MI_example_dir_FSM.jpg has changed
diff -r 9e9db0c974e3 -r 3a5726a3e1da ruby/sources/MI_example_cache_FSM.ppt
Binary file ruby/sources/MI_example_cache_FSM.ppt has changed
diff -r 9e9db0c974e3 -r 3a5726a3e1da ruby/sources/MI_example_dir_FSM.ppt
Binary file ruby/sources/MI_example_dir_FSM.ppt has changed
[m5-dev] changeset in web-graphics: Ruby: Added FSM diagrams for MOESI_CM...
changeset 01b8bcdb3a1c in /z/repo/web-graphics
details: web-graphics?cmd=changeset;node=01b8bcdb3a1c
description:
Ruby: Added FSM diagrams for MOESI_CMP_directory cache coherence protocol.

diffstat:

 ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg        | 0
 ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg  | 0
 ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg | 0
 ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg | 0
 ruby/figures/MOESI_CMP_directory_dir_FSM.jpg            | 0
 ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt        | 0
 ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt  | 0
 ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt | 0
 ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt | 0
 ruby/sources/MOESI_CMP_directory_dir_FSM.ppt            | 0
 10 files changed, 0 insertions(+), 0 deletions(-)

diffs (20 lines):

diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg
Binary file ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg
Binary file ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg
Binary file ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg
Binary file ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/figures/MOESI_CMP_directory_dir_FSM.jpg
Binary file ruby/figures/MOESI_CMP_directory_dir_FSM.jpg has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt
Binary file ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt
Binary file ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt
Binary file ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt
Binary file ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt has changed
diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/sources/MOESI_CMP_directory_dir_FSM.ppt
Binary file ruby/sources/MOESI_CMP_directory_dir_FSM.ppt has changed
[m5-dev] changeset in web-graphics: Ruby: Added high-level figure for SLICC.
changeset 7555e9135731 in /z/repo/web-graphics
details: web-graphics?cmd=changeset;node=7555e9135731
description:
Ruby: Added high-level figure for SLICC. This was taken from the GEMS tutorial in ISCA 2005.

diffstat:

 ruby/figures/slicc_overview.jpg | 0
 ruby/sources/slicc_overview.ppt | 0
 2 files changed, 0 insertions(+), 0 deletions(-)

diffs (4 lines):

diff -r 01b8bcdb3a1c -r 7555e9135731 ruby/figures/slicc_overview.jpg
Binary file ruby/figures/slicc_overview.jpg has changed
diff -r 01b8bcdb3a1c -r 7555e9135731 ruby/sources/slicc_overview.ppt
Binary file ruby/sources/slicc_overview.ppt has changed
Re: [m5-dev] changeset in web-graphics: Ruby: Added FSM diagrams for MOESI_CM...
Sorry for exceeding 65 characters in the first comment line on some recent check-ins. I didn't notice this early. -Rathijit On 04/06/2011 05:49 PM, Rathijit Sen wrote: changeset 01b8bcdb3a1c in /z/repo/web-graphics details: web-graphics?cmd=changeset;node=01b8bcdb3a1c description: Ruby: Added FSM diagrams for MOESI_CMP_directory cache coherence protocol. diffstat: ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg|0 ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg |0 ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg |0 ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg |0 ruby/figures/MOESI_CMP_directory_dir_FSM.jpg|0 ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt|0 ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt |0 ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt |0 ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt |0 ruby/sources/MOESI_CMP_directory_dir_FSM.ppt|0 10 files changed, 0 insertions(+), 0 deletions(-) diffs (20 lines): diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg Binary file ruby/figures/MOESI_CMP_directory_L1cache_FSM.jpg has changed diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg Binary file ruby/figures/MOESI_CMP_directory_L1cache_optim_FSM.jpg has changed diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg Binary file ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_1.jpg has changed diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg Binary file ruby/figures/MOESI_CMP_directory_L2cache_FSM_part_2.jpg has changed diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/figures/MOESI_CMP_directory_dir_FSM.jpg Binary file ruby/figures/MOESI_CMP_directory_dir_FSM.jpg has changed diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt Binary file ruby/sources/MOESI_CMP_directory_L1cache_FSM.ppt has changed diff -r 3a5726a3e1da -r 
01b8bcdb3a1c ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt Binary file ruby/sources/MOESI_CMP_directory_L1cache_optim_FSM.ppt has changed diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt Binary file ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_1.ppt has changed diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt Binary file ruby/sources/MOESI_CMP_directory_L2cache_FSM_part_2.ppt has changed diff -r 3a5726a3e1da -r 01b8bcdb3a1c ruby/sources/MOESI_CMP_directory_dir_FSM.ppt Binary file ruby/sources/MOESI_CMP_directory_dir_FSM.ppt has changed ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Running Ruby w/32 Cores
A few comments:

(1) Using uint64_t seems like a quick, interim solution. But I still haven't grasped why we have the 31st-bit problem but not the 63rd-bit problem as well.

(2) Adding the stl::bitset seems like a good idea (does the Flags in M5 use that?) but it won't be a straightforward switch because the Set class supports arbitrary size sets. If it were implemented it would take a little bit of effort, but not too much.

(3) I didn't say this earlier, but it does look like this code could use some optimization. From the gprof I ran on 2-8 cores, this Set::count() function is the 2nd or 3rd highest consumer of time for the Ruby Fft runs (although still a very small overall % in system time). Simple optimizations, like only looping for the set size in the count() function instead of always looping for the complete length of the long datatype, should be helpful:

    for (int j = 0; j < LONG_BITS; j++) {
        if ((m_p_nArray[i] & mask) != 0) {
            counter++;
        }
        mask = mask << 1;
    }

Beyond that, generating a mask, shifting, and comparing each bit doesn't seem necessary, given that we can potentially use a bitset or a constant-time struct to loop over and check set inclusion.

On Wed, Apr 6, 2011 at 5:12 PM, Nilay Vaish ni...@cs.wisc.edu wrote:
I believe even popcount is portable. I am not opposed to using bitset, just that it would probably require a lot more changes. -- Nilay

On Wed, 6 Apr 2011, Ali Saidi wrote:
stl::bitset does these types of optimizations underneath and it's portable. Ali

On Wed, 6 Apr 2011 15:57:37 -0500 (CDT), Nilay Vaish ni...@cs.wisc.edu wrote:
I would prefer we make use of the GCC builtin __builtin_popcount() for counting the number of 1's in an int or related data type. Nilay

On Wed, 6 Apr 2011, Ali Saidi wrote:
And actually, couldn't you use an stl bitset for this? Thanks, Ali

On Wed, 06 Apr 2011 15:34:01 -0500, Ali Saidi sa...@umich.edu wrote:
Jumping in somewhat randomly here, uint64_t even on a 32bit machine is reasonably fast. It's not going to be as fast, but it will be correct. My vote would be to just switch all that Set code that uses long to explicitly use uint64_t and if it's slower on a 32bit machine so be it. At least it's correct. Ali

On Wed, 6 Apr 2011 15:24:24 -0500, Beckmann, Brad brad.beckm...@amd.com wrote:
Hi Korey, Yes, let's move this conversation back to m5-dev, since I think others may be interested and could help. I don't know what the problem is exactly, but at some point in time (probably back in the early GEMS days) I seem to remember the Set code included an assertion check about the 31st bit in 32-bit mode. Therefore, I think we knew about this problem and made sure that never happened. I believe that is why we used to have a restriction that Ruby could only support 16 processors. I'm really fuzzy on the details...maybe someone else can elaborate. In the end, I just want to make sure we add something in the code that makes sure we don't encounter this problem again. This is one of those bugs that can take a while to track down if you don't catch it with an assertion right when it happens. Brad

From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On Behalf Of Korey Sewell
Sent: Tuesday, April 05, 2011 7:14 AM
To: Beckmann, Brad
Subject: Re: [m5-dev] Running Ruby w/32 Cores

Hi again Brad, I looked this over again and although my 32-bit patch fixes things, I'm not convinced that I actually fixed the cause of the bug rather than just the symptom. Do you happen to know what the problems with the 32-bit Set counts are? Sorry for prolonging the issue; I thought I had put this to bed, but maybe not. Finally, it may not matter that this works on 32-bit machines, but it'd be nice if it did. (Let me know if I should move this convo to the m5-dev list.) I end up checking the last bit in the count function manually (the code is as follows):

    int
    Set::count() const
    {
        int counter = 0;
        long mask;
        for (int i = 0; i < m_nArrayLen; i++) {
            mask = (long)0x01;
            for (int j = 0; j < LONG_BITS; j++) {
                // FIXME - significant performance loss when array
                // population << LONG_BITS
                if ((m_p_nArray[i] & mask) != 0) {
                    counter++;
                }
                mask = mask << 1;
            }
    #ifndef _LP64
            long msb_mask = 0x8000;
            if ((m_p_nArray[i] & msb_mask) != 0) {
                counter++;
            }
    #endif
        }
        return counter;
    }

___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev

--
- Korey
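The FIXME in the count() code above notes that the inner loop always visits every bit position, even when few bits are set. One possible way around that, sketched here under assumptions and not taken from the Ruby source, is to clear the lowest set bit on each iteration so the loop runs once per set bit. The function name countSetBits and its parameters are hypothetical stand-ins for Set::count(), m_p_nArray and m_nArrayLen, and the words are assumed to be stored as 64-bit unsigned values, as proposed in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Set::count(): counts the set bits in an
// array of 64-bit unsigned words.
int countSetBits(const std::uint64_t *words, std::size_t num_words)
{
    assert(words != nullptr || num_words == 0);
    int counter = 0;
    for (std::size_t i = 0; i < num_words; i++) {
        std::uint64_t w = words[i];
        // w & (w - 1) clears the lowest set bit, so this loop runs
        // once per set bit instead of once per bit position.
        while (w != 0) {
            w &= w - 1;
            counter++;
        }
    }
    return counter;
}
```

Because the words are unsigned, the top bit needs no special case, so a separate msb_mask check would not be required in this version.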
Re: [m5-dev] Running Ruby w/32 Cores
On Wed, 6 Apr 2011, Korey Sewell wrote:

A few comments: (1) Using uint64_t seems like a quick, interim solution. But I still haven't grasped why we have the 31st-bit problem but not the 63rd-bit problem as well.

I think if you use unsigned long, in place of long, the code would work on 32-bit machines. I am uncertain why the current code works on 64-bit machines. I think long means 32-bit, irrespective of memory address length.

(2) Adding the stl::bitset seems like a good idea (does the Flags in M5 use that?) but it won't be a straightforward switch because the Set class supports arbitrary size sets. If it were implemented it would take a little bit of effort, but not too much.

(3) I didn't say this earlier, but it does look like this code could use some optimization. From the gprof I ran on 2-8 cores, this Set::count() function is the 2nd or 3rd highest consumer of time for the Ruby Fft runs (although still a very small overall % in system time). Simple optimizations, like only looping for the set size in the count() function instead of always looping for the complete length of the long datatype, should be helpful:

    for (int j = 0; j < LONG_BITS; j++) {
        if ((m_p_nArray[i] & mask) != 0) {
            counter++;
        }
        mask = mask << 1;
    }

Beyond that, generating a mask, shifting, and comparing each bit doesn't seem necessary, given that we can potentially use a bitset or a constant-time struct to loop over and check set inclusion.

I would still root for using the popcount() builtin available with GCC. -- Nilay

___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
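A note on the question of how wide long is: its width depends on the platform's data model. It is typically 64 bits on LP64 systems such as 64-bit Linux and 32 bits on ILP32 systems. The sketch below illustrates the unsigned long suggestion under that assumption. It derives the bit count from the type instead of hard-coding it and uses an unsigned mask, so shifting through the top bit is well defined. The names kLongBits and countBitsUnsigned are hypothetical and do not appear in the Ruby code.

```cpp
#include <cassert>
#include <limits>

// Number of value bits in unsigned long on the build platform
// (typically 32 under ILP32 and 64 under LP64).
const int kLongBits = std::numeric_limits<unsigned long>::digits;

// Hypothetical helper: counts set bits in one word with an unsigned
// mask, visiting every position including the most significant bit.
int countBitsUnsigned(unsigned long word)
{
    int counter = 0;
    unsigned long mask = 1UL;
    for (int j = 0; j < kLongBits; j++) {
        if ((word & mask) != 0) {
            counter++;
        }
        // Unsigned shifts are well defined; on the last iteration the
        // single 1 bit is shifted out and mask becomes 0.
        mask <<= 1;
    }
    assert(mask == 0); // every bit position was visited
    return counter;
}
```

With a signed long mask, the last shift would move the 1 into the sign bit, which may be related to the top-bit trouble discussed in this thread. The unsigned version avoids that case entirely.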
[m5-dev] changeset in web-graphics: Ruby: Adding figure for common networ...
changeset 1f62f3ea6275 in /z/repo/web-graphics details: web-graphics?cmd=changeset;node=1f62f3ea6275 description: Ruby: Adding figure for common network topologies. Individual components of the figure were taken from the GEMS tutorial in ISCA 2005. diffstat: ruby/figures/Topology_overview.jpg |0 ruby/sources/Topology_overview.doc |0 2 files changed, 0 insertions(+), 0 deletions(-) diffs (4 lines): diff -r 7555e9135731 -r 1f62f3ea6275 ruby/figures/Topology_overview.jpg Binary file ruby/figures/Topology_overview.jpg has changed diff -r 7555e9135731 -r 1f62f3ea6275 ruby/sources/Topology_overview.doc Binary file ruby/sources/Topology_overview.doc has changed ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Running Ruby w/32 Cores
On Apr 6, 2011, at 6:17 PM, Korey Sewell wrote:

A few comments: (1) Using uint64_t seems like a quick, interim solution. But I still haven't grasped why we have the 31st-bit problem but not the 63rd-bit problem as well. (2) Adding the stl::bitset seems like a good idea (does the Flags in M5 use that?) but it won't be a straightforward switch because the Set class supports arbitrary size sets. If it were implemented it would take a little bit of effort, but not too much.

The functional units, instruction flags and packet flags use it. Trace flags don't. bitset supports arbitrarily sized sets too; you just have to declare the max size at construction (although there is a performance benefit to being less than the machine word length, it all still works if you're not). Additionally, bitset seems to support most if not all of the operations (intersection, union, count, zero, etc.) that Set does, although they have different names.

(3) I didn't say this earlier, but it does look like this code could use some optimization. From the gprof I ran on 2-8 cores, this Set::count() function is the 2nd or 3rd highest consumer of time for the Ruby Fft runs (although still a very small overall % in system time). Simple optimizations, like only looping for the set size in the count() function instead of always looping for the complete length of the long datatype, should be helpful:

    for (int j = 0; j < LONG_BITS; j++) {
        if ((m_p_nArray[i] & mask) != 0) {
            counter++;
        }
        mask = mask << 1;
    }

Beyond that, generating a mask, shifting, and comparing each bit doesn't seem necessary, given that we can potentially use a bitset or a constant-time struct to loop over and check set inclusion.

You can also do it with a constant-time count of the number of set bits that is updated whenever something is changed. However, I don't think there is any reason to try to optimize a bespoke implementation of a bitset.
The STL is going to be faster and will improve for free over time while this implementation won't. For example, bitset also uses count leading zeros where available to speed up finding the first set bit. Ali
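To make the bitset suggestion concrete, here is a minimal sketch of how the Set-style operations mentioned above (intersection, union, count, emptiness check) might map onto std::bitset. The capacity kMaxNodes, the NodeSet typedef and the helper names are assumptions chosen for illustration and are not part of Ruby or M5.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Assumed upper bound on set size; std::bitset needs it at compile time.
const std::size_t kMaxNodes = 64;
typedef std::bitset<kMaxNodes> NodeSet; // hypothetical replacement for Set

// Adds one element; the assert guards against out-of-range indices.
void addNode(NodeSet &s, std::size_t n)
{
    assert(n < kMaxNodes);
    s.set(n);
}

// Number of elements present in both sets (intersection, then count).
std::size_t sharedCount(const NodeSet &a, const NodeSet &b)
{
    return (a & b).count();
}

// Union of two sets.
NodeSet unionOf(const NodeSet &a, const NodeSet &b)
{
    return a | b;
}

// True when no element is present.
bool isEmpty(const NodeSet &s)
{
    return s.none();
}
```

One design consequence worth noting is that the capacity is a template parameter, so every NodeSet has the same fixed size regardless of how many elements it holds.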
Re: [m5-dev] Running Ruby w/32 Cores
When you say this is portable, what do you mean? Portable between compilers? We usually use gcc, but we have at least partial support for other compilers. I think this is necessary on some platforms. Gabe

I would still root for using popcount() builtin available with GCC. -- Nilay

___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
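On the compiler portability question, one common pattern is to use the builtin where the compiler provides it and fall back to plain C++ elsewhere. The sketch below assumes that GCC-compatible compilers define __GNUC__ and provide __builtin_popcountll. The wrapper name popCount64 is hypothetical, and which other compilers the project needs to support is not stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: number of set bits in a 64-bit word.
int popCount64(std::uint64_t w)
{
#if defined(__GNUC__)
    // GCC and GCC-compatible compilers provide this builtin.
    return __builtin_popcountll(w);
#else
    // Portable fallback: clear the lowest set bit until none remain.
    int n = 0;
    while (w != 0) {
        w &= w - 1;
        n++;
    }
    return n;
#endif
}
```

Keeping the builtin behind a single wrapper means the compiler-specific part stays in one place if another compiler has to be supported later.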
Re: [m5-dev] Running Ruby w/32 Cores
Hi Ali, My only problem with stl::bitset here is that the Set type from Ruby seems to have the option to be resizable (through the overloaded assignment operator). That's what I meant by arbitrary length. In practice, I'm not sure if they ever assign sets of different lengths to each other (causing resizing), but if they do, then that would suggest that using the stl::bitset isn't a straightforward thing (definitely doable though, just not plug and play). If the resizing is just an unused feature of Ruby, then I would suggest we switch to bitset. -- - Korey ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
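If resizing on assignment does turn out to be needed, one alternative to a fixed-size bitset is a set backed by a vector of 64-bit words, where ordinary copy assignment carries the size along. The class below is a hypothetical sketch of that idea. The name DynSet and its interface are invented for illustration and are not the Ruby Set API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical resizable bit set backed by 64-bit unsigned words.
class DynSet
{
  public:
    explicit DynSet(std::size_t size)
        : m_size(size), m_words((size + 63) / 64, 0)
    {}

    std::size_t size() const { return m_size; }

    void add(std::size_t index)
    {
        assert(index < m_size);
        m_words[index / 64] |= std::uint64_t(1) << (index % 64);
    }

    bool isElement(std::size_t index) const
    {
        assert(index < m_size);
        return ((m_words[index / 64] >> (index % 64)) & 1) != 0;
    }

    int count() const
    {
        int counter = 0;
        for (std::size_t i = 0; i < m_words.size(); i++) {
            // Clear the lowest set bit until the word is empty.
            for (std::uint64_t w = m_words[i]; w != 0; w &= w - 1) {
                counter++;
            }
        }
        return counter;
    }

    // The implicitly generated copy assignment copies both m_size and
    // m_words, so assigning a set of a different length resizes the target.

  private:
    std::size_t m_size;
    std::vector<std::uint64_t> m_words;
};
```

For example, assigning a 100-element DynSet to a 16-element one leaves the target with size 100 and the source's contents.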