Re: [m5-dev] Running Ruby w/32 Cores
On Thu, 7 Apr 2011, Gabriel Michael Black wrote: When you say this is portable, what do you mean? Portable between compilers? We usually use gcc, but we have at least partial support for other compilers. I think this is necessary on some platforms. Gabe I would still root for using popcount() builtin available with GCC. -- Nilay Between different versions of gcc. Do we actually test whether the code compiles using other compilers? -- Nilay ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Running Ruby w/32 Cores
The problem is that LONG_BITS is 31, i.e. std::numeric_limits<long>::digits returns 31 and not 32, which is what the writer expected.

-- Nilay

From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On Behalf Of Korey Sewell
Sent: Tuesday, April 05, 2011 7:14 AM
To: Beckmann, Brad
Subject: Re: [m5-dev] Running Ruby w/32 Cores

Hi again Brad,

I looked this over again, and although my 32-bit patch fixes things, I'm not convinced that I actually fixed the cause of the bug rather than just the symptom. Do you happen to know what the problems with the 32-bit Set counts are? Sorry for prolonging the issue; I thought I had put this to bed, but maybe not. Finally, it may not matter that this works on 32-bit machines, but it'd be nice if it did. (Let me know if I should move this convo to the m5-dev list.)

I end up checking the last bit in the count function manually (the code is as follows):

    int
    Set::count() const
    {
        int counter = 0;
        long mask;

        for (int i = 0; i < m_nArrayLen; i++) {
            mask = (long)0x01;

            for (int j = 0; j < LONG_BITS; j++) {
                // FIXME - significant performance loss when array
                // population << LONG_BITS
                if ((m_p_nArray[i] & mask) != 0) {
                    counter++;
                }
                mask = mask << 1;
            }

    #ifndef _LP64
            // The loop above stops one bit short of the top of the word,
            // so test the most significant bit (bit 31) by hand.
            long msb_mask = 0x80000000;
            if ((m_p_nArray[i] & msb_mask) != 0) {
                counter++;
            }
    #endif
        }

        return counter;
    }
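To make the 31-versus-32 point concrete, here is a minimal sketch; it is not the actual Ruby code. It assumes std::int32_t as a stand-in for a 32-bit long so that it behaves the same on any host, and count_bits_signed_digits is a hypothetical helper that only mirrors the loop bound used in Set::count(). Because std::numeric_limits<T>::digits excludes the sign bit of a signed type, the loop never tests bit 31:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper mirroring the loop bound in Set::count(): it tests
// std::numeric_limits<Word>::digits bit positions, starting at bit 0.
template <typename Word>
int count_bits_signed_digits(Word value)
{
    // Work on the unsigned bit pattern so every shift is well defined.
    typedef typename std::make_unsigned<Word>::type UWord;
    UWord bits = static_cast<UWord>(value);
    UWord mask = 1;
    int counter = 0;

    for (int j = 0; j < std::numeric_limits<Word>::digits; j++) {
        if ((bits & mask) != 0)
            counter++;
        mask = static_cast<UWord>(mask << 1);
    }
    return counter;
}
```

With all 32 bits set, the signed instantiation reports 31 and the unsigned one reports 32, which matches the off-by-one seen with 32 cores.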
Re: [m5-dev] Running Ruby w/32 Cores
Quoting Nilay Vaish ni...@cs.wisc.edu: Between different versions of gcc. Do we actually test whether the code compiles using other compilers? -- Nilay I don't know if we actively test it, but it worked at one time. Ali did some work on that, I think to get it to build with sun's compiler back when he was doing the SPARC full system support. It would be a good idea not to bake in any dependence on gcc. Gabe
Re: [m5-dev] Running Ruby w/32 Cores
On Thu, 7 Apr 2011, Gabriel Michael Black wrote: I don't know if we actively test it, but it worked at one time. Ali did some work on that, I think to get it to build with sun's compiler back when he was doing the SPARC full system support. It would be a good idea not to bake in any dependence on gcc. Gabe I agree with you. If we can avoid dependence on a compiler, we certainly should. -- Nilay
Re: [m5-dev] Running Ruby w/32 Cores
Hi Korey,

Yes, let's move this conversation back to m5-dev, since I think others may be interested and could help.

I don't know what the problem is exactly, but at some point in time (probably back in the early GEMS days) I seem to remember the Set code included an assertion check about the 31st bit in 32-bit mode. Therefore, I think we knew about this problem and made sure that never happened. I believe that is why we used to have a restriction that Ruby could only support 16 processors. I'm really fuzzy on the details...maybe someone else can elaborate.

In the end, I just want to make sure we add something in the code that makes sure we don't encounter this problem again. This is one of those bugs that can take a while to track down if you don't catch it with an assertion right when it happens.

Brad

On Tue, Apr 5, 2011 at 1:30 AM, Korey Sewell ksew...@umich.edu wrote:

Brad, it looks like you were right on the money here. I found the spot where it was returning the wrong value via a SLICC function to count sharers for everyone except the owner. I realized that the machine that I use for testing is just a 32-bit machine, and like you warned, there look to be issues with the Set type there. I ran the Fft-32 cores on a 64-bit machine and it seems to work correctly. I'll be running the full splash/parsec suites soon and that should stress Ruby a good bit :).

I have a patch that checks whether _LP64 is defined and, if it is not, checks that last bit when doing the set count function.

Thanks for being helpful in debugging. It was a relatively easy bug, but as always, going through code and becoming more proficient at getting around while trying to solve a bug is really helpful.

On Fri, Apr 1, 2011 at 7:28 PM, Beckmann, Brad brad.beckm...@amd.com wrote:

Ok for the first trace, the critical line is the following:

348523 0L2Cache L1_GETX ILOSXIFLXO [0x16180, line 0x16180] [NetDest (4) 0 - 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 - 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - ]30

L2Cache identifies that 31 caches have a shared copy and that L1 cache 9 (L1-9) is the owner.
When L1Cache 0 (L1-0) issues a GETX, the L2Cache issues 30 Inv probes, forwards the GETX to L1-9, and sends an ack to L1-0 itself. However, the L2 cache tells L1-0 to expect only 30 acks instead of 31. It could be something wrong with the NetDest::count() function, or the Set::count() function? I slightly modified my previous patch to isolate on what value the NetDest::count() function is returning. If it is returning 30, instead of 31, then it must be a problem with NetDest. You are compiling gem5 as a 64-bit binary, right?

The second problem is essentially the same issue. L2Cache 31 (L2-31) is the owner of the block, but I suspect NetDest is not counting bit 31 and thus it is returning a count of 0...causing the error.

Overall, concentrate on that NetDest::count function, or more importantly the Set::count() function. Once you find out the problem, please let me know.

Thanks,
Brad

From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On Behalf Of Korey Sewell
Sent: Friday, April 01, 2011 12:00 PM
To: Beckmann, Brad
Subject: Re: [m5-dev] Running Ruby w/32 Cores

Brad, attached are the protocol traces grep'd for the offending addresses. I'm going to spend the weekend digging through Ruby code so hopefully I'm pretty close to generating the fixes myself
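Brad's request for a check that catches this class of bug early could be met at compile time. The following is only a sketch under stated assumptions: it needs a C++11 compiler, and loop_covers_every_bit is a hypothetical name that does not exist in the Ruby sources. It compares the number of bit positions a digits-bounded loop visits against the real width of the storage word:

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Hypothetical guard: true only when a loop bounded by
// std::numeric_limits<Word>::digits visits every bit of a Word.
template <typename Word>
constexpr bool loop_covers_every_bit()
{
    return std::numeric_limits<Word>::digits ==
           static_cast<int>(sizeof(Word) * CHAR_BIT);
}

// A set built on an unsigned word passes; swapping in a signed word
// type here would stop the build instead of miscounting at run time.
static_assert(loop_covers_every_bit<unsigned long>(),
              "count() would skip the top bit of each word");
```

The design choice here is to fail the build rather than a simulation run, since a skipped bit only shows up once enough cores are configured.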
Re: [m5-dev] Running Ruby w/32 Cores
Jumping in somewhat randomly here, uint64_t even on a 32bit machine is reasonably fast. It's not going to be as fast, but it will be correct. My vote would be to just switch all that Set code that uses long to explicitly use uint64_t and if it's slower on a 32bit machine so be it. At least it's correct. Ali
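Ali's suggestion of storing the set in explicit uint64_t words might look roughly like this. It is a sketch, not a patch: WordSet and its members are hypothetical names, and the real Set class has many more operations. Using a fixed-width unsigned word means the layout no longer depends on the width or signedness of long on the host:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Ruby's Set, storing 64 elements per word.
class WordSet
{
  public:
    explicit WordSet(int size)
        : m_words(static_cast<std::size_t>((size + 63) / 64), 0)
    {}

    void add(int index)
    {
        m_words[index / 64] |= std::uint64_t(1) << (index % 64);
    }

    int count() const
    {
        int counter = 0;
        for (std::size_t i = 0; i < m_words.size(); i++) {
            // Clear the lowest set bit until none remain, so the loop
            // runs once per element rather than once per bit position.
            for (std::uint64_t w = m_words[i]; w != 0; w &= w - 1)
                counter++;
        }
        return counter;
    }

  private:
    std::vector<std::uint64_t> m_words;
};
```

The inner loop also avoids walking all 64 positions of a sparsely populated word, which is the cost the FIXME comment in the original count() complains about.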
Re: [m5-dev] Running Ruby w/32 Cores
std::bitset does these types of optimizations underneath and it's portable.

Ali

On Wed, 6 Apr 2011 15:57:37 -0500 (CDT), Nilay Vaish ni...@cs.wisc.edu wrote:

I would prefer we make use of the GCC builtin __builtin_popcount() for counting the number of 1's in an int or related data type.

Nilay

On Wed, 6 Apr 2011, Ali Saidi wrote:

And actually, couldn't you use an STL bitset for this?

Thanks,
Ali
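The two proposals in this exchange are not mutually exclusive. As a hedged sketch (popcount64 is a hypothetical helper, not an existing gem5 function), the GCC builtin can be used where it is available, with std::bitset as the fallback on other compilers:

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper: count the 1 bits in a 64-bit word.
inline int popcount64(std::uint64_t word)
{
#if defined(__GNUC__)
    // Builtin provided by GCC and by compilers that emulate it.
    return __builtin_popcountll(word);
#else
    // Standard-library fallback. The unsigned long long constructor
    // of std::bitset requires C++11.
    return static_cast<int>(std::bitset<64>(word).count());
#endif
}
```

Keeping the builtin behind one small function confines the compiler dependence to a single place, which speaks to the concern about not baking in a dependence on gcc.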
Re: [m5-dev] Running Ruby w/32 Cores
A few comments:

(1) Using uint64_t seems like a quick, interim solution. But I still haven't grasped why we have the 31st bit problem but not the 63rd bit problem as well.

(2) Adding std::bitset seems like a good idea (do the Flags in M5 use that?), but it won't be a straightforward switch because the Set class supports arbitrary-size sets. Implementing it would take a little bit of effort, but not too much.

(3) I didn't say this earlier, but it does look like this code could use some optimization. From the gprof I ran on 2-8 cores, this Set::count() function is the 2nd or 3rd highest consumer of time for the Ruby Fft runs (although still a very small overall % of system time). Simple optimizations, like looping only over the set size in the count() function, should be helpful, instead of always looping over the complete length of the long datatype:

    for (int j = 0; j < LONG_BITS; j++) {
        if ((m_p_nArray[i] & mask) != 0) {
            counter++;
        }
        mask = mask << 1;
    }

Generating a mask, then shifting and comparing each bit, also doesn't seem necessary given that we can potentially use a bitset or a constant-time struct to loop over and check set inclusion.

On Wed, Apr 6, 2011 at 5:12 PM, Nilay Vaish ni...@cs.wisc.edu wrote:

I believe even popcount is portable. I am not opposed to using bitset, just that it would probably require a lot more changes.

-- Nilay

-- - Korey
Re: [m5-dev] Running Ruby w/32 Cores
On Wed, 6 Apr 2011, Korey Sewell wrote:

A few comments:

(1) Using uint64_t seems like a quick, interim solution. But I still haven't grasped why we have the 31st bit problem but not the 63rd bit problem as well.

I think if you use unsigned long in place of long, the code would work on 32-bit machines. I am uncertain why the current code works on a 64-bit machine. I think long means 32-bit, irrespective of memory address length.

(2) Adding std::bitset seems like a good idea (do the Flags in M5 use that?), but it won't be a straightforward switch because the Set class supports arbitrary-size sets. Implementing it would take a little bit of effort, but not too much.

(3) I didn't say this earlier, but it does look like this code could use some optimization. From the gprof I ran on 2-8 cores, this Set::count() function is the 2nd or 3rd highest consumer of time for the Ruby Fft runs (although still a very small overall % of system time). Simple optimizations, like looping only over the set size in the count() function, should be helpful, instead of always looping over the complete length of the long datatype:

    for (int j = 0; j < LONG_BITS; j++) {
        if ((m_p_nArray[i] & mask) != 0) {
            counter++;
        }
        mask = mask << 1;
    }

Generating a mask, then shifting and comparing each bit, also doesn't seem necessary given that we can potentially use a bitset or a constant-time struct to loop over and check set inclusion.

I would still root for using the popcount() builtin available with GCC.

-- Nilay
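On the width of long: the C++ standard only guarantees that long has at least 32 bits. On LP64 platforms (64-bit Linux and most Unix systems) long is 64 bits wide, while on ILP32 platforms and on 64-bit Windows it is 32 bits. That would explain why the 32-core runs pass on a 64-bit host: the skipped position moves from bit 31 up to bit 63. The sketch below uses fixed-width types as stand-ins for long and a hypothetical helper, highest_bit_tested, to show which bit a digits-bounded loop reaches last; it assumes C++11:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: index of the highest bit that a loop bounded by
// std::numeric_limits<Word>::digits actually tests.
template <typename Word>
constexpr int highest_bit_tested()
{
    return std::numeric_limits<Word>::digits - 1;
}

// Signed words stop one short of the top bit; unsigned words reach it.
static_assert(highest_bit_tested<std::int32_t>() == 30, "32-bit signed");
static_assert(highest_bit_tested<std::int64_t>() == 62, "64-bit signed");
static_assert(highest_bit_tested<std::uint32_t>() == 31, "32-bit unsigned");
static_assert(highest_bit_tested<std::uint64_t>() == 63, "64-bit unsigned");
```

So switching to unsigned long, as suggested above, removes the gap at either width, whereas staying with a signed 64-bit word would presumably bring the same symptom back once a 64th element is stored in one word.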
Re: [m5-dev] Running Ruby w/32 Cores
On Apr 6, 2011, at 6:17 PM, Korey Sewell wrote:

A few comments:

(1) Using uint64_t seems like a quick, interim solution. But I still haven't grasped why we have the 31st bit problem but not the 63rd bit problem as well.

(2) Adding std::bitset seems like a good idea (do the Flags in M5 use that?), but it won't be a straightforward switch because the Set class supports arbitrary-size sets. Implementing it would take a little bit of effort, but not too much.

The functional units, instruction flags and packet flags use it. Trace flags don't. bitset supports arbitrarily sized sets too, you just have to declare the max size at construction (although there is a performance benefit to being less than the machine word length, it all still works if you're not). Additionally, bitset seems to support most if not all of the operations (intersection, union, count, zero, etc.) that Set does, although they have different names.

(3) I didn't say this earlier, but it does look like this code could use some optimization. From the gprof I ran on 2-8 cores, this Set::count() function is the 2nd or 3rd highest consumer of time for the Ruby Fft runs (although still a very small overall % of system time). Simple optimizations, like looping only over the set size in the count() function, should be helpful, instead of always looping over the complete length of the long datatype:

    for (int j = 0; j < LONG_BITS; j++) {
        if ((m_p_nArray[i] & mask) != 0) {
            counter++;
        }
        mask = mask << 1;
    }

Generating a mask, then shifting and comparing each bit, also doesn't seem necessary given that we can potentially use a bitset or a constant-time struct to loop over and check set inclusion.

You can also do it with a constant-time count of the number of bits that are set, updated whenever something is changed. However, I don't think there is any reason to try to optimize a bespoke implementation of a bitset. The STL is going to be faster and will improve for free over time while this implementation won't. For example, bitset also uses count leading zeros where available to speed up finding the first set bit.

Ali
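The constant-time count Ali mentions can be sketched as a running counter that is adjusted on every change. CountedSet is a hypothetical class for illustration only, and the capacity of 64 is an arbitrary assumption:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical wrapper that keeps a running element count.
class CountedSet
{
  public:
    CountedSet() : m_count(0) {}

    void add(std::size_t index)
    {
        if (!m_bits.test(index)) {   // count each element only once
            m_bits.set(index);
            m_count++;
        }
    }

    void remove(std::size_t index)
    {
        if (m_bits.test(index)) {
            m_bits.reset(index);
            m_count--;
        }
    }

    // Constant time: no loop over the bits.
    int count() const { return m_count; }

  private:
    std::bitset<64> m_bits;  // 64 is an arbitrary maximum for this sketch
    int m_count;
};
```

The trade-off is that bulk operations such as union or intersection would have to recompute the counter, which is one reason relying on the library's own count() may be the simpler route.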
Re: [m5-dev] Running Ruby w/32 Cores
When you say this is portable, what do you mean? Portable between compilers? We usually use gcc, but we have at least partial support for other compilers. I think this is necessary on some platforms. Gabe I would still root for using popcount() builtin available with GCC. -- Nilay
Re: [m5-dev] Running Ruby w/32 Cores
Hi Ali,

My only problem with std::bitset here is that the Set type from Ruby seems to have the option to be resizable (through the overloaded assignment operator). That's what I meant by arbitrary length. In practice, I'm not sure whether sets of different lengths are ever assigned to each other (causing resizing), but if they are, then switching to std::bitset isn't a straightforward thing (definitely doable though, just not plug and play). If the resizing is just an unused feature of Ruby, then I would suggest we switch to bitset.

-- - Korey
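One way to reconcile resizing with std::bitset, whose size is a template parameter fixed at compile time, is to pair a fixed-capacity bitset with a logical size tracked at run time. This is a sketch under assumptions: BoundedSet and kMaxSetSize are hypothetical, and the bound of 256 is not taken from Ruby:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

const std::size_t kMaxSetSize = 256;  // assumed upper bound, not from Ruby

// Hypothetical Set replacement: fixed capacity, run-time logical size.
class BoundedSet
{
  public:
    explicit BoundedSet(std::size_t size) : m_size(size)
    {
        assert(size <= kMaxSetSize);
    }

    void add(std::size_t index)
    {
        assert(index < m_size);
        m_bits.set(index);
    }

    std::size_t size() const { return m_size; }
    std::size_t count() const { return m_bits.count(); }

    // The implicit copy assignment copies both members, so assigning a
    // set of a different logical size "resizes" the target.

  private:
    std::bitset<kMaxSetSize> m_bits;
    std::size_t m_size;
};
```

Whether a compile-time bound is acceptable depends on how large Ruby configurations are expected to grow; a container sized at run time would avoid the bound at the cost of leaving std::bitset behind.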
Re: [m5-dev] Running Ruby w/32 Cores
Hi Lisa, I actually had sent the attachments to Brad since m5dev bounced the attachments. I think the limit is 512kB or something like that. But definitely, thanks for the heads up! On Wed, Mar 30, 2011 at 7:45 PM, Lisa Hsu h...@eecs.umich.edu wrote: I think you forgot the attachments :P. Sometimes, if ProtocolTrace isn't enough for me to find a problem, I turn on RubySlicc and RubyGenerated as well. RubySlicc is the DPRINTFs within the actual protocol *.sm files, and RubyGenerated are inside of the generated code that you will only see in the build directory. Lisa On Tue, Mar 29, 2011 at 10:15 AM, Korey Sewell ksew...@umich.edu wrote: Thanks for the response Brad. The 1st trace has 1 L2 and the 2nd has 1 L2 (i had a typo in the original email). For each trace, I attach the stdout/stderr (*.out) and then the protocol trace (*.prottrace). Also, in the 1st trace, the offending address is clear and I isolate that in the protocol trace file provided. However, in the 2nd trace, it's unclear (currently) which access caused it to fail so I took the whole protocol trace file and gzip'd it. My current lack of expertise in SLICC limits me a bit, but I'd like to be more helpful in debugging so if there is anything that I can look into (or run) on my end to expedite the process, please advise. In the interim, I'll try to locate the exact address that's breaking trace 2 and then hopefully repost that. Thanks! -Korey On Tue, Mar 29, 2011 at 12:02 PM, Beckmann, Brad brad.beckm...@amd.com wrote: Hi Korey, I believe both of these issues should be easy to solve once we have a protocol trace leading up to the error. If you could create such a trace and send it to the list, that would be great. Just zero in on the offending address. 
Thanks,
Brad

-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell
Sent: Tuesday, March 29, 2011 8:11 AM
To: M5 Developer List
Subject: [m5-dev] Running Ruby w/32 Cores

Hi All,

I'm still having a bit of trouble running Ruby with 32+ cores. I am experimenting w/configs varying the l2-caches. The runs seem to generate various errors in the SLICC. Has anybody seen these, or does anybody have any insight into how to start solving these types of issues (posted below)?

The command lines and errors are as follows:

(1) 32 Cores and 32 L2s

build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
info: Entering event queue @ 0. Starting simulation...
Runtime Error at MOESI_CMP_directory-dir.sm:155, Ruby Time: 38279: assert failure, PID: 5990
press return to continue.
Program aborted at cycle 19139500
Aborted

(2) 32 Cores and 1 L2

build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
fatal: Invalid transition
system.l1_cntrl0 time: 349075 addr: [0x16180, line 0x16180] event: Ack state: MM
 @ cycle 174537500
[doTransitionWorker:build/ALPHA_FS_MOESI_CMP_directory/mem/protocol/L1Cache_Transitions.cc, line 477]
Memory Usage: 2316756 KBytes
For more information see: http://www.m5sim.org/fatal/23f196b2

Please let me know if you do...Thanks!

-- - Korey
Re: [m5-dev] Running Ruby w/32 Cores
Is there an attached patch I should be running or did it get bounced by m5-dev? If so, can you send it directly to me rather than through m5-dev?

On Wed, Mar 30, 2011 at 8:26 PM, Beckmann, Brad brad.beckm...@amd.com wrote:
> Hi Korey,
>
> For the first trace, it looks like the L2 cache is either miscounting the number of valid L1 copies, or there is an error with the ack arithmetic. We are going to need a bit more information to figure out where the exact problem is. Could you apply the attached patch and reply with the new protocol trace? Thanks.
>
> For the second trace, you should be able to get the offending address by simply attaching GDB to the aborted process. Without knowing which address to zero in on, it is the proverbial finding a needle in a haystack.
>
> Thanks,
> Brad

--
- Korey
Re: [m5-dev] Running Ruby w/32 Cores
I think you forgot the attachments :P.

Sometimes, if ProtocolTrace isn't enough for me to find a problem, I turn on RubySlicc and RubyGenerated as well. RubySlicc is the DPRINTFs within the actual protocol *.sm files, and RubyGenerated are inside of the generated code that you will only see in the build directory.

Lisa

On Tue, Mar 29, 2011 at 10:15 AM, Korey Sewell ksew...@umich.edu wrote:
> Thanks for the response Brad.
>
> The 1st trace has 1 L2 and the 2nd has 1 L2 (I had a typo in the original email).
>
> For each trace, I attach the stdout/stderr (*.out) and then the protocol trace (*.prottrace).
>
> Also, in the 1st trace, the offending address is clear and I isolate that in the protocol trace file provided. However, in the 2nd trace, it's unclear (currently) which access caused it to fail so I took the whole protocol trace file and gzip'd it.
>
> My current lack of expertise in SLICC limits me a bit, but I'd like to be more helpful in debugging so if there is anything that I can look into (or run) on my end to expedite the process, please advise.
>
> In the interim, I'll try to locate the exact address that's breaking trace 2 and then hopefully repost that.
>
> Thanks!
> -Korey
Re: [m5-dev] Running Ruby w/32 Cores
Hi Korey,

For the first trace, it looks like the L2 cache is either miscounting the number of valid L1 copies, or there is an error with the ack arithmetic. We are going to need a bit more information to figure out where the exact problem is. Could you apply the attached patch and reply with the new protocol trace? Thanks.

For the second trace, you should be able to get the offending address by simply attaching GDB to the aborted process. Without knowing which address to zero in on, it is the proverbial finding a needle in a haystack.

Thanks,
Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell
Sent: Tuesday, March 29, 2011 10:15 AM
To: M5 Developer List
Subject: Re: [m5-dev] Running Ruby w/32 Cores

Thanks for the response Brad.

The 1st trace has 1 L2 and the 2nd has 1 L2 (I had a typo in the original email).

For each trace, I attach the stdout/stderr (*.out) and then the protocol trace (*.prottrace).

Also, in the 1st trace, the offending address is clear and I isolate that in the protocol trace file provided. However, in the 2nd trace, it's unclear (currently) which access caused it to fail so I took the whole protocol trace file and gzip'd it.

My current lack of expertise in SLICC limits me a bit, but I'd like to be more helpful in debugging so if there is anything that I can look into (or run) on my end to expedite the process, please advise.

In the interim, I'll try to locate the exact address that's breaking trace 2 and then hopefully repost that.

Thanks!
-Korey
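For reference, a minimal sketch of how a sharer miscount can skew the ack arithmetic. It assumes the cause reported elsewhere in this thread: Set::count() scanning LONG_BITS positions, where std::numeric_limits<long>::digits yields 31 rather than 32 on a 32-bit build. This is not Ruby's actual Set code. The names countLowBits, pendingAcks, undercountDemo, kSignedBits and kAllBits are hypothetical, and int32_t only stands in for a 32-bit long.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// digits excludes the sign bit for signed types: 31 for a 32-bit signed type.
constexpr int kSignedBits = std::numeric_limits<std::int32_t>::digits;   // 31
// For the unsigned counterpart every bit counts: 32.
constexpr int kAllBits = std::numeric_limits<std::uint32_t>::digits;     // 32

// Count the set bits among the lowest `bits` positions of one bit-vector word.
inline int countLowBits(std::uint32_t word, int bits)
{
    int counter = 0;
    for (int j = 0; j < bits; j++) {
        if ((word >> j) & 1u)
            counter++;
    }
    return counter;
}

// Acks still expected when `sharers` L1 copies exist and `received` acks arrived.
inline int pendingAcks(int sharers, int received)
{
    return sharers - received;
}

// Hypothetical scenario: all 32 L1s share a line, so all 32 bits are set.
inline void undercountDemo()
{
    const std::uint32_t allSharers = 0xFFFFFFFFu;

    // Scanning only 31 positions misses the sharer recorded in the top bit.
    assert(countLowBits(allSharers, kSignedBits) == 31);
    // Scanning every bit of the word finds all 32 sharers.
    assert(countLowBits(allSharers, kAllBits) == 32);

    // With the undercount, the 32nd ack drives the pending count negative.
    assert(pendingAcks(countLowBits(allSharers, kSignedBits), 32) == -1);
    assert(pendingAcks(countLowBits(allSharers, kAllBits), 32) == 0);
}
```

Looping over the digits of the unsigned type, or using a GCC builtin such as __builtin_popcountl as suggested in this thread, avoids dropping the top bit.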
[m5-dev] Running Ruby w/32 Cores
Hi All,

I'm still having a bit of trouble running Ruby with 32+ cores. I am experimenting w/configs varying the l2-caches. The runs seem to generate various errors in the SLICC. Has anybody seen these or have any insight into how to start solving these types of issues (posted below)?

The command line and errors are as follows:

(1) 32 Cores and 32 L2s

build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
info: Entering event queue @ 0. Starting simulation...
Runtime Error at MOESI_CMP_directory-dir.sm:155, Ruby Time: 38279: assert failure, PID: 5990
press return to continue.
Program aborted at cycle 19139500
Aborted

(2) 32 Cores and 1 L2

build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
fatal: Invalid transition system.l1_cntrl0 time: 349075 addr: [0x16180, line 0x16180] event: Ack state: MM
 @ cycle 174537500
[doTransitionWorker:build/ALPHA_FS_MOESI_CMP_directory/mem/protocol/L1Cache_Transitions.cc, line 477]
Memory Usage: 2316756 KBytes
For more information see: http://www.m5sim.org/fatal/23f196b2

Please let me know if you do... Thanks!

--
- Korey
Re: [m5-dev] Running Ruby w/32 Cores
Hi Korey,

I believe both of these issues should be easy to solve once we have a protocol trace leading up to the error. If you could create such a trace and send it to the list, that would be great. Just zero in on the offending address.

Thanks,
Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell
Sent: Tuesday, March 29, 2011 8:11 AM
To: M5 Developer List
Subject: [m5-dev] Running Ruby w/32 Cores

Hi All,

I'm still having a bit of trouble running Ruby with 32+ cores. I am experimenting w/configs varying the l2-caches. The runs seem to generate various errors in the SLICC. Has anybody seen these or have any insight into how to start solving these types of issues (posted below)?

The command line and errors are as follows:

(1) 32 Cores and 32 L2s

build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
info: Entering event queue @ 0. Starting simulation...
Runtime Error at MOESI_CMP_directory-dir.sm:155, Ruby Time: 38279: assert failure, PID: 5990
press return to continue.
Program aborted at cycle 19139500
Aborted

(2) 32 Cores and 1 L2

build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
fatal: Invalid transition system.l1_cntrl0 time: 349075 addr: [0x16180, line 0x16180] event: Ack state: MM
 @ cycle 174537500
[doTransitionWorker:build/ALPHA_FS_MOESI_CMP_directory/mem/protocol/L1Cache_Transitions.cc, line 477]
Memory Usage: 2316756 KBytes
For more information see: http://www.m5sim.org/fatal/23f196b2

Please let me know if you do... Thanks!

--
- Korey
Re: [m5-dev] Running Ruby w/32 Cores
Thanks for the response Brad.

The 1st trace has 1 L2 and the 2nd has 1 L2 (I had a typo in the original email).

For each trace, I attach the stdout/stderr (*.out) and then the protocol trace (*.prottrace).

Also, in the 1st trace, the offending address is clear and I isolate that in the protocol trace file provided. However, in the 2nd trace, it's unclear (currently) which access caused it to fail so I took the whole protocol trace file and gzip'd it.

My current lack of expertise in SLICC limits me a bit, but I'd like to be more helpful in debugging so if there is anything that I can look into (or run) on my end to expedite the process, please advise.

In the interim, I'll try to locate the exact address that's breaking trace 2 and then hopefully repost that.

Thanks!
-Korey

On Tue, Mar 29, 2011 at 12:02 PM, Beckmann, Brad brad.beckm...@amd.com wrote:
> Hi Korey,
>
> I believe both of these issues should be easy to solve once we have a protocol trace leading up to the error. If you could create such a trace and send it to the list, that would be great. Just zero in on the offending address.
>
> Thanks,
> Brad

--
- Korey