Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Steve Reinhardt
(I wrote most of this email before I saw your second message, so I'm
not sure whether it still applies or not, but I figured I'd send it
anyway.)

Hi Gabe,

I certainly agree with your goals, but I will quibble with some of the
details...

On Thu, Jul 22, 2010 at 2:17 PM, Gabriel Michael Black
 wrote:
> To me there's actually a spectrum of ISA dependence, incompletely described
> below:
>
> 1. If ISA == suchandsuch do this, otherwise do that.
> 2. If ISA has characteristic such and such, do this, otherwise do that.
> 3. Here, ISA, you take care of this.
> 4. ISA data parameterizing non-ISA stuff.
>
> Number 1 is the worst since it's hard to maintain, can be cumbersome to
> specify, and it isn't always clear -why- ISA suchandsuch needs a particular
> behavior.

I could go into a lot more detail about why I think #1 is the
worst---I think you're being too kind here---but since we agree it's
bad I won't bother.

> I'd say number 2 is second worst, and that's what this is an example of.
> It's better since it's obvious why the code is separated out and there can
> be sharing between CPU models, etc., but at the same time it increases the
> CPU's awareness of the ISA on it and partially breaks down the barriers of
> abstraction. It also sets up special case code paths where, for instance,
> only x86 on the timing CPU would possibly exhibit a certain bug. If someone
> changes things for ARM and everything seems to work, they could be subtley
> breaking x86 which they aren't familiar with and weren't thinking about when
> they made their change.

I don't really agree with this analysis... I think you can also look
at #2 as a special case of #4 where the ISA parameter is a bool rather
than (for example) an int.  I agree there's a qualitative difference
in terms of how dramatically control flow is affected, but I think
it's a perspective that makes it psychologically more appealing.  And
the fact that it sets up special code paths is orthogonal: if there's
a bug in how unaligned accesses are handled, that's only going to show
up in x86 regardless of whether that code path is explicitly blocked
by a HasUnalignedAccess flag or just implicitly never gets exercised
because sreqLow is always NULL for non-x86 ISAs.

Overall I don't disagree with your ranking; in fact we've already
discussed moving the unaligned access handling for both the TLB and
cache into the ISA-specific TLB and a shim, respectively, which would
be an example of moving this code from case 2 to case 3.

However I view it as more of a gulf, where #1 is really bad, and 2-4
are reasonable things to do, and while in general 4>3>2 there are
things that are just easier to do via #2 and there's no need to
unnaturally avoid that.

> I think it's bad news to have a big list of yes or no checkboxes in each ISA
> directory which turn on and off behaviors. This is especially true when it's
> ambiguous whether to say yes or no, if the behavior changes based on
> circumstances, etc., and if/when the checkbox is interpretted subtley (or
> not so subtley) differently by the consumer.

I'm not advocating for a huge list of ambiguous checkboxes... I'm just
advocating for #2 in preference to #1, and also saying that #2 is
really not so bad that we need to bend over backward to avoid it when
it seems natural.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Gabriel Michael Black
Actually, maybe this falls into a subcategory of 2. which is why it  
seems more acceptable.


2.a If the ISA says it's safe to use this optimization which doesn't  
affect behavior and which could be turned off and only result in lower  
simulator performance, skip this code/check/whatever.


This still has some of the drawbacks I mentioned for the  
list-of-checkboxes model, but at least if it doesn't fit there's  
always a failsafe option. These would be optional hints, like telling  
the compiler a function never returns.


This may have been what you were talking about in the first place, in  
which case great, we're on the same page. I'd been debating whether to  
send out my little list anyway since I think it's productive to spend  
a few cycles thinking about this stuff.


Gabe

Quoting Gabriel Michael Black :

To me there's actually a spectrum of ISA dependence, incompletely  
described below:


1. If ISA == suchandsuch do this, otherwise do that.
2. If ISA has characteristic such and such, do this, otherwise do that.
3. Here, ISA, you take care of this.
4. ISA data parameterizing non-ISA stuff.

Number 1 is the worst since it's hard to maintain, can be cumbersome  
to specify, and it isn't always clear -why- ISA suchandsuch needs a  
particular behavior.


I'd say number 2 is second worst, and that's what this is an example  
of. It's better since it's obvious why the code is separated out and  
there can be sharing between CPU models, etc., but at the same time  
it increases the CPU's awareness of the ISA on it and partially  
breaks down the barriers of abstraction. It also sets up special  
case code paths where, for instance, only x86 on the timing CPU  
would possibly exhibit a certain bug. If someone changes things for  
ARM and everything seems to work, they could be subtley breaking x86  
which they aren't familiar with and weren't thinking about when they  
made their change.


3. Three is better in some ways and worse in others. It's clear  
what's happening, there's a lot of flexibility, and the CPU isn't  
actually aware of -how- something is being done, just that, say,  
this would be a good time to check for interrupts, whatever that  
means. It's not as great because you have more complex interactions  
between ISAs and CPUs, and you have to do a bit more work in the  
ISA. It can also be hard to make some of this functionality work  
sensibly in order, out of order, single cycle, multi cycle, timing  
mode, atomic mode, etc. etc.


4. This one is the best when you can get away with it. This is where  
you, say, make your integer register file 32 registers because the  
ISA says that's how many it needs. Basically the only draw back to  
this is that behavior changes a little based on each ISA, but if you  
can get away with it this is the safest.




I think having a multi ISA simulator that will be modified by its  
end users, especially one with as large a cross product of options  
as ours, needs to try to be safe and simple before being as  
absolutely fast as it can be. You know what they say about premature  
optimization. Ideally we should design things so the big,  
unnecessary pieces of code just aren't part of the equation because  
they're in the ISA defined objects, control just doesn't go that way  
when it wouldn't make sense, etc. And if, for instance, a pointer  
should always be 0, the code should still behave correctly. The code  
should do its job correctly no matter -why- it's being asked to do it.


I think it's bad news to have a big list of yes or no checkboxes in  
each ISA directory which turn on and off behaviors. This is  
especially true when it's ambiguous whether to say yes or no, if the  
behavior changes based on circumstances, etc., and if/when the  
checkbox is interpretted subtley (or not so subtley) differently by  
the consumer.


In this particular limitted case it seems relatively ok although not  
necessarily advisable, but it's a slippery slope I don't consider us  
have a completely clean history with.


Gabe

Quoting Steve Reinhardt :


In generalI think this is the kind of ISA hook we should be using...
in the sense that checking TheISA::HasUnalignedMemAcc is much better
than "(TheISA == x86 || TheISA == Power)".  I think it's useful not
only to avoid the overhead of a dynamic check for an ISA that doesn't
need it, but also to clarify that this is code that never gets
executed on those ISAs.  Maybe for a little one-liner like this it's
not a big deal either way, but for bigger hunks of code I think that
clarification is potentially useful.

Steve

On Thu, Jul 22, 2010 at 1:10 PM, Gabriel Michael Black
 wrote:

Yes it does, and that sounds reasonable to me. I'd still like to see us use
ISA hooks as minimally as possible, but this seems ok.

Gabe

Quoting Timothy M Jones :


Oh, ok, I see where you're going with that.  However, the main idea of
having TheISA::HasUnalignedMemAcc was that it is a constant  
specific to each

ISA.  Therefor

Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Gabriel Michael Black
To me there's actually a spectrum of ISA dependence, incompletely  
described below:


1. If ISA == suchandsuch do this, otherwise do that.
2. If ISA has characteristic such and such, do this, otherwise do that.
3. Here, ISA, you take care of this.
4. ISA data parameterizing non-ISA stuff.

Number 1 is the worst since it's hard to maintain, can be cumbersome  
to specify, and it isn't always clear -why- ISA suchandsuch needs a  
particular behavior.


I'd say number 2 is second worst, and that's what this is an example  
of. It's better since it's obvious why the code is separated out and  
there can be sharing between CPU models, etc., but at the same time it  
increases the CPU's awareness of the ISA on it and partially breaks  
down the barriers of abstraction. It also sets up special case code  
paths where, for instance, only x86 on the timing CPU would possibly  
exhibit a certain bug. If someone changes things for ARM and  
everything seems to work, they could be subtley breaking x86 which  
they aren't familiar with and weren't thinking about when they made  
their change.


3. Three is better in some ways and worse in others. It's clear what's  
happening, there's a lot of flexibility, and the CPU isn't actually  
aware of -how- something is being done, just that, say, this would be  
a good time to check for interrupts, whatever that means. It's not as  
great because you have more complex interactions between ISAs and  
CPUs, and you have to do a bit more work in the ISA. It can also be  
hard to make some of this functionality work sensibly in order, out of  
order, single cycle, multi cycle, timing mode, atomic mode, etc. etc.


4. This one is the best when you can get away with it. This is where  
you, say, make your integer register file 32 registers because the ISA  
says that's how many it needs. Basically the only draw back to this is  
that behavior changes a little based on each ISA, but if you can get  
away with it this is the safest.




I think having a multi ISA simulator that will be modified by its end  
users, especially one with as large a cross product of options as  
ours, needs to try to be safe and simple before being as absolutely  
fast as it can be. You know what they say about premature  
optimization. Ideally we should design things so the big, unnecessary  
pieces of code just aren't part of the equation because they're in the  
ISA defined objects, control just doesn't go that way when it wouldn't  
make sense, etc. And if, for instance, a pointer should always be 0,  
the code should still behave correctly. The code should do its job  
correctly no matter -why- it's being asked to do it.


I think it's bad news to have a big list of yes or no checkboxes in  
each ISA directory which turn on and off behaviors. This is especially  
true when it's ambiguous whether to say yes or no, if the behavior  
changes based on circumstances, etc., and if/when the checkbox is  
interpretted subtley (or not so subtley) differently by the consumer.


In this particular limitted case it seems relatively ok although not  
necessarily advisable, but it's a slippery slope I don't consider us  
have a completely clean history with.


Gabe

Quoting Steve Reinhardt :


In generalI think this is the kind of ISA hook we should be using...
in the sense that checking TheISA::HasUnalignedMemAcc is much better
than "(TheISA == x86 || TheISA == Power)".  I think it's useful not
only to avoid the overhead of a dynamic check for an ISA that doesn't
need it, but also to clarify that this is code that never gets
executed on those ISAs.  Maybe for a little one-liner like this it's
not a big deal either way, but for bigger hunks of code I think that
clarification is potentially useful.

Steve

On Thu, Jul 22, 2010 at 1:10 PM, Gabriel Michael Black
 wrote:

Yes it does, and that sounds reasonable to me. I'd still like to see us use
ISA hooks as minimally as possible, but this seems ok.

Gabe

Quoting Timothy M Jones :


Oh, ok, I see where you're going with that.  However, the main idea of
having TheISA::HasUnalignedMemAcc was that it is a constant  
specific to each

ISA.  Therefore, the compiler should really recognise this and optimise it
away wherever it's used.  In this case, for ISAs that don't have unaligned
memory accesses the whole 'if' block should disappear.  For ISAs that do
have them then the condition should be reduced to just checking sreqLow.
 Therefore it's better for the first set of ISAs for this to be kept in.
 Does that make sense?

Tim

On Thu, 22 Jul 2010 14:30:56 -0400, Gabe Black 
wrote:


I think you missed my point. If the check of TheISA::HasUnalignedMemAcc
is redundant, we shouldn't be checking it at all. It's a free, though
very small, performance bump, but more significantly it removes a direct
dependence on the ISA.

Gabe

Timothy M. Jones wrote:


changeset 3bd51d6ac9ef in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef
description:
       O

Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Steve Reinhardt
In generalI think this is the kind of ISA hook we should be using...
in the sense that checking TheISA::HasUnalignedMemAcc is much better
than "(TheISA == x86 || TheISA == Power)".  I think it's useful not
only to avoid the overhead of a dynamic check for an ISA that doesn't
need it, but also to clarify that this is code that never gets
executed on those ISAs.  Maybe for a little one-liner like this it's
not a big deal either way, but for bigger hunks of code I think that
clarification is potentially useful.

Steve

On Thu, Jul 22, 2010 at 1:10 PM, Gabriel Michael Black
 wrote:
> Yes it does, and that sounds reasonable to me. I'd still like to see us use
> ISA hooks as minimally as possible, but this seems ok.
>
> Gabe
>
> Quoting Timothy M Jones :
>
>> Oh, ok, I see where you're going with that.  However, the main idea of
>> having TheISA::HasUnalignedMemAcc was that it is a constant specific to each
>> ISA.  Therefore, the compiler should really recognise this and optimise it
>> away wherever it's used.  In this case, for ISAs that don't have unaligned
>> memory accesses the whole 'if' block should disappear.  For ISAs that do
>> have them then the condition should be reduced to just checking sreqLow.
>>  Therefore it's better for the first set of ISAs for this to be kept in.
>>  Does that make sense?
>>
>> Tim
>>
>> On Thu, 22 Jul 2010 14:30:56 -0400, Gabe Black 
>> wrote:
>>
>>> I think you missed my point. If the check of TheISA::HasUnalignedMemAcc
>>> is redundant, we shouldn't be checking it at all. It's a free, though
>>> very small, performance bump, but more significantly it removes a direct
>>> dependence on the ISA.
>>>
>>> Gabe
>>>
>>> Timothy M. Jones wrote:

 changeset 3bd51d6ac9ef in /z/repo/m5
 details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef
 description:
        O3CPU: Fix a bug where stores in the cpu where never marked as
 split.

 diffstat:

 src/cpu/o3/lsq_unit.hh |  6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

 diffs (16 lines):

 diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh
 --- a/src/cpu/o3/lsq_unit.hh    Thu Jul 22 18:47:52 2010 +0100
 +++ b/src/cpu/o3/lsq_unit.hh    Thu Jul 22 18:52:02 2010 +0100
 @@ -822,6 +822,12 @@
    storeQueue[store_idx].sreqLow = sreqLow;
    storeQueue[store_idx].sreqHigh = sreqHigh;
    storeQueue[store_idx].size = sizeof(T);
 +
 +    // Split stores can only occur in ISAs with unaligned memory
 accesses.  If
 +    // a store request has been split, sreqLow and sreqHigh will be
 non-null.
 +    if (TheISA::HasUnalignedMemAcc && sreqLow) {
 +        storeQueue[store_idx].isSplit = true;
 +    }
    assert(sizeof(T) <= sizeof(storeQueue[store_idx].data));

    T gData = htog(data);
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev

>>>
>>> ___
>>> m5-dev mailing list
>>> m5-dev@m5sim.org
>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>
>>
>>
>> --
>> Timothy M. Jones
>> http://homepages.inf.ed.ac.uk/tjones1
>>
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> ___
>> m5-dev mailing list
>> m5-dev@m5sim.org
>> http://m5sim.org/mailman/listinfo/m5-dev
>>
>
>
> ___
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: BranchPred: Take the branch predictor out of O3CPU and make it a stand-alone

2010-07-22 Thread Timothy M Jones
On Thu, 22 Jul 2010 16:35:10 -0400, Steve Reinhardt   
wrote:


On Thu, Jul 22, 2010 at 1:12 PM, Timothy M Jones   
wrote:


So, after all this, which version do you want me to implement?  TID or  
ASID?

 I'll have a go at either.


I think you should go with the TID since that's the normal approach.
If Korey needs to use the ASID instead for his particular application
then he can add that as an option or just keep that change in his own
local patch queue... I can understand why he wants it but I'm not
convinced it's a change that's of general interest.


Ok, cool, thanks Steve.  I'll do that.

Cheers
Tim

--
Timothy M. Jones
http://homepages.inf.ed.ac.uk/tjones1

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: BranchPred: Take the branch predictor out of O3CPU and make it a stand-alone

2010-07-22 Thread Steve Reinhardt
On Thu, Jul 22, 2010 at 1:12 PM, Timothy M Jones  wrote:
>
> So, after all this, which version do you want me to implement?  TID or ASID?
>  I'll have a go at either.

I think you should go with the TID since that's the normal approach.
If Korey needs to use the ASID instead for his particular application
then he can add that as an option or just keep that change in his own
local patch queue... I can understand why he wants it but I'm not
convinced it's a change that's of general interest.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: BranchPred: Take the branch predictor out of O3CPU and make it a stand-alone

2010-07-22 Thread Timothy M Jones

Hi Folks,

On Sun, 18 Jul 2010 10:46:38 -0400, Korey Sewell  wrote:


That seems odd... I assume you're suggesting that Tim switch the
InOrderCPU to pass the PC instead, right?


Yep, I'm suggesting that this should be updated.

Tim, you'll want to edit the code fetch_seq_unit.cc, execution_unit.cc,  
and
branch_predictor.cc to make sure that the PC gets passed throughout  
instead

of the nextPC.


Ok, thanks for that Korey.




> Also, the BTB accesses use the asid instead of the tid (to accomodate
> multithreading) so please dont remove that.

I'm confused... are you trying to let threads that run in the same
address space all share the same BTB entries?


Yes!






That might actually be

a good idea, but I don't know that any real CPUs work that way.  (Not
that I have intimate knowledge, I'd just be surprised... the ASID
generally has a lot more bits than the TID, esp. in x86 where the ASID
is really just the page table base pointer.)


Yes, as you might've guessed, I ran into some situations where the BTB is
becoming useless because of the amount of threads thrashing the entries  
when

really the entries could be shared within each other.

So do you suggest that we just use the "tid" instead of the "asid" for  
BTB

accesses to try to be more realistic for a general purpose processor
simulation?

It would seem that there would be something for this in modern CPUs to  
help

aid "the plight" of multithreaded programmersAlthough, If you assume
that the majority of programs are multiprogrammed I guess this "feature"
would only be useful for more application specific CPUs.

I'm not sure if there could be some type of BTB remapping for when  
hardware
threads are of the same asid or maybe some type of allocation of BTB  
entries
per thread, but it would be interesting to see what (if anything) people  
do

for this.

So, after all this, which version do you want me to implement?  TID or  
ASID?  I'll have a go at either.


Cheers
Tim

--
Timothy M. Jones
http://homepages.inf.ed.ac.uk/tjones1

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Gabriel Michael Black
Yes it does, and that sounds reasonable to me. I'd still like to see  
us use ISA hooks as minimally as possible, but this seems ok.


Gabe

Quoting Timothy M Jones :

Oh, ok, I see where you're going with that.  However, the main idea  
of having TheISA::HasUnalignedMemAcc was that it is a constant  
specific to each ISA.  Therefore, the compiler should really  
recognise this and optimise it away wherever it's used.  In this  
case, for ISAs that don't have unaligned memory accesses the whole  
'if' block should disappear.  For ISAs that do have them then the  
condition should be reduced to just checking sreqLow.  Therefore  
it's better for the first set of ISAs for this to be kept in.  Does  
that make sense?


Tim

On Thu, 22 Jul 2010 14:30:56 -0400, Gabe Black  wrote:


I think you missed my point. If the check of TheISA::HasUnalignedMemAcc
is redundant, we shouldn't be checking it at all. It's a free, though
very small, performance bump, but more significantly it removes a direct
dependence on the ISA.

Gabe

Timothy M. Jones wrote:

changeset 3bd51d6ac9ef in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef
description:
O3CPU: Fix a bug where stores in the cpu where never marked as split.

diffstat:

src/cpu/o3/lsq_unit.hh |  6 ++
1 files changed, 6 insertions(+), 0 deletions(-)

diffs (16 lines):

diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh
--- a/src/cpu/o3/lsq_unit.hhThu Jul 22 18:47:52 2010 +0100
+++ b/src/cpu/o3/lsq_unit.hhThu Jul 22 18:52:02 2010 +0100
@@ -822,6 +822,12 @@
storeQueue[store_idx].sreqLow = sreqLow;
storeQueue[store_idx].sreqHigh = sreqHigh;
storeQueue[store_idx].size = sizeof(T);
+
+// Split stores can only occur in ISAs with unaligned memory  
accesses.  If
+// a store request has been split, sreqLow and sreqHigh will  
be non-null.

+if (TheISA::HasUnalignedMemAcc && sreqLow) {
+storeQueue[store_idx].isSplit = true;
+}
assert(sizeof(T) <= sizeof(storeQueue[store_idx].data));

T gData = htog(data);
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev




--
Timothy M. Jones
http://homepages.inf.ed.ac.uk/tjones1

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev




___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Timothy M Jones
Oh, ok, I see where you're going with that.  However, the main idea of  
having TheISA::HasUnalignedMemAcc was that it is a constant specific to  
each ISA.  Therefore, the compiler should really recognise this and  
optimise it away wherever it's used.  In this case, for ISAs that don't  
have unaligned memory accesses the whole 'if' block should disappear.  For  
ISAs that do have them then the condition should be reduced to just  
checking sreqLow.  Therefore it's better for the first set of ISAs for  
this to be kept in.  Does that make sense?


Tim

On Thu, 22 Jul 2010 14:30:56 -0400, Gabe Black   
wrote:



I think you missed my point. If the check of TheISA::HasUnalignedMemAcc
is redundant, we shouldn't be checking it at all. It's a free, though
very small, performance bump, but more significantly it removes a direct
dependence on the ISA.

Gabe

Timothy M. Jones wrote:

changeset 3bd51d6ac9ef in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef
description:
O3CPU: Fix a bug where stores in the cpu where never marked as split.

diffstat:

 src/cpu/o3/lsq_unit.hh |  6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diffs (16 lines):

diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh
--- a/src/cpu/o3/lsq_unit.hhThu Jul 22 18:47:52 2010 +0100
+++ b/src/cpu/o3/lsq_unit.hhThu Jul 22 18:52:02 2010 +0100
@@ -822,6 +822,12 @@
 storeQueue[store_idx].sreqLow = sreqLow;
 storeQueue[store_idx].sreqHigh = sreqHigh;
 storeQueue[store_idx].size = sizeof(T);
+
+// Split stores can only occur in ISAs with unaligned memory  
accesses.  If
+// a store request has been split, sreqLow and sreqHigh will be  
non-null.

+if (TheISA::HasUnalignedMemAcc && sreqLow) {
+storeQueue[store_idx].isSplit = true;
+}
 assert(sizeof(T) <= sizeof(storeQueue[store_idx].data));

 T gData = htog(data);
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev




--
Timothy M. Jones
http://homepages.inf.ed.ac.uk/tjones1

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] changeset in m5: Power: The condition register should be set or ...

2010-07-22 Thread Timothy M Jones

Ah, sorry, I'll be sure to take this into account in the future.

Tim

On Thu, 22 Jul 2010 14:29:44 -0400, Gabe Black   
wrote:



In mercurial, the first line of a commit description is treated
specially and should be a one line summary of the change. You're first
line here and in a few other changes I noticed wraps to the second line
which is a no-no. In the future, please keep it to one line even if you
have to leave out details, and then be complete with the other lines.

Gabe

Timothy M. Jones wrote:

changeset ffac9df60637 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=ffac9df60637
description:
	Power: The condition register should be set or cleared upon a system  
call

return to indicate success or failure.

diffstat:

 src/arch/power/miscregs.hh |  7 ++-
 src/arch/power/process.cc  |  7 +++
 2 files changed, 13 insertions(+), 1 deletions(-)

diffs (33 lines):

diff -r bd104adbf04d -r ffac9df60637 src/arch/power/miscregs.hh
--- a/src/arch/power/miscregs.hhThu Jul 22 18:54:37 2010 +0100
+++ b/src/arch/power/miscregs.hhThu Jul 22 18:54:37 2010 +0100
@@ -44,7 +44,12 @@
 };

 BitUnion32(Cr)
-Bitfield<31,28> cr0;
+SubBitUnion(cr0, 31, 28)
+Bitfield<31> lt;
+Bitfield<30> gt;
+Bitfield<29> eq;
+Bitfield<28> so;
+EndSubBitUnion(cr0)
 Bitfield<27,24> cr1;
 EndBitUnion(Cr)

diff -r bd104adbf04d -r ffac9df60637 src/arch/power/process.cc
--- a/src/arch/power/process.cc Thu Jul 22 18:54:37 2010 +0100
+++ b/src/arch/power/process.cc Thu Jul 22 18:54:37 2010 +0100
@@ -284,5 +284,12 @@
 PowerLiveProcess::setSyscallReturn(ThreadContext *tc,
 SyscallReturn return_value)
 {
+Cr cr = tc->readIntReg(INTREG_CR);
+if (return_value.successful()) {
+cr.cr0.so = 0;
+} else {
+cr.cr0.so = 1;
+}
+tc->setIntReg(INTREG_CR, cr);
 tc->setIntReg(ReturnValueReg, return_value.value());
 }
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev




--
Timothy M. Jones
http://homepages.inf.ed.ac.uk/tjones1

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Gabe Black
I think you missed my point. If the check of TheISA::HasUnalignedMemAcc
is redundant, we shouldn't be checking it at all. It's a free, though
very small, performance bump, but more significantly it removes a direct
dependence on the ISA.

Gabe

Timothy M. Jones wrote:
> changeset 3bd51d6ac9ef in /z/repo/m5
> details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef
> description:
>   O3CPU: Fix a bug where stores in the cpu where never marked as split.
>
> diffstat:
>
>  src/cpu/o3/lsq_unit.hh |  6 ++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diffs (16 lines):
>
> diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh
> --- a/src/cpu/o3/lsq_unit.hh  Thu Jul 22 18:47:52 2010 +0100
> +++ b/src/cpu/o3/lsq_unit.hh  Thu Jul 22 18:52:02 2010 +0100
> @@ -822,6 +822,12 @@
>  storeQueue[store_idx].sreqLow = sreqLow;
>  storeQueue[store_idx].sreqHigh = sreqHigh;
>  storeQueue[store_idx].size = sizeof(T);
> +
> +// Split stores can only occur in ISAs with unaligned memory accesses.  
> If
> +// a store request has been split, sreqLow and sreqHigh will be non-null.
> +if (TheISA::HasUnalignedMemAcc && sreqLow) {
> +storeQueue[store_idx].isSplit = true;
> +}
>  assert(sizeof(T) <= sizeof(storeQueue[store_idx].data));
>  
>  T gData = htog(data);
> ___
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>   

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] changeset in m5: Power: The condition register should be set or ...

2010-07-22 Thread Gabe Black
In mercurial, the first line of a commit description is treated
specially and should be a one line summary of the change. You're first
line here and in a few other changes I noticed wraps to the second line
which is a no-no. In the future, please keep it to one line even if you
have to leave out details, and then be complete with the other lines.

Gabe

Timothy M. Jones wrote:
> changeset ffac9df60637 in /z/repo/m5
> details: http://repo.m5sim.org/m5?cmd=changeset;node=ffac9df60637
> description:
>   Power: The condition register should be set or cleared upon a system 
> call
>   return to indicate success or failure.
>
> diffstat:
>
>  src/arch/power/miscregs.hh |  7 ++-
>  src/arch/power/process.cc  |  7 +++
>  2 files changed, 13 insertions(+), 1 deletions(-)
>
> diffs (33 lines):
>
> diff -r bd104adbf04d -r ffac9df60637 src/arch/power/miscregs.hh
> --- a/src/arch/power/miscregs.hh  Thu Jul 22 18:54:37 2010 +0100
> +++ b/src/arch/power/miscregs.hh  Thu Jul 22 18:54:37 2010 +0100
> @@ -44,7 +44,12 @@
>  };
>  
>  BitUnion32(Cr)
> -Bitfield<31,28> cr0;
> +SubBitUnion(cr0, 31, 28)
> +Bitfield<31> lt;
> +Bitfield<30> gt;
> +Bitfield<29> eq;
> +Bitfield<28> so;
> +EndSubBitUnion(cr0)
>  Bitfield<27,24> cr1;
>  EndBitUnion(Cr)
>  
> diff -r bd104adbf04d -r ffac9df60637 src/arch/power/process.cc
> --- a/src/arch/power/process.cc   Thu Jul 22 18:54:37 2010 +0100
> +++ b/src/arch/power/process.cc   Thu Jul 22 18:54:37 2010 +0100
> @@ -284,5 +284,12 @@
>  PowerLiveProcess::setSyscallReturn(ThreadContext *tc,
>  SyscallReturn return_value)
>  {
> +Cr cr = tc->readIntReg(INTREG_CR);
> +if (return_value.successful()) {
> +cr.cr0.so = 0;
> +} else {
> +cr.cr0.so = 1;
> +}
> +tc->setIntReg(INTREG_CR, cr);
>  tc->setIntReg(ReturnValueReg, return_value.value());
>  }
> ___
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>   

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] changeset in m5: Power: The condition register should be set or ...

2010-07-22 Thread Timothy M. Jones
changeset ffac9df60637 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=ffac9df60637
description:
Power: The condition register should be set or cleared upon a system 
call
return to indicate success or failure.

diffstat:

 src/arch/power/miscregs.hh |  7 ++-
 src/arch/power/process.cc  |  7 +++
 2 files changed, 13 insertions(+), 1 deletions(-)

diffs (33 lines):

diff -r bd104adbf04d -r ffac9df60637 src/arch/power/miscregs.hh
--- a/src/arch/power/miscregs.hhThu Jul 22 18:54:37 2010 +0100
+++ b/src/arch/power/miscregs.hhThu Jul 22 18:54:37 2010 +0100
@@ -44,7 +44,12 @@
 };
 
 BitUnion32(Cr)
-Bitfield<31,28> cr0;
+SubBitUnion(cr0, 31, 28)
+Bitfield<31> lt;
+Bitfield<30> gt;
+Bitfield<29> eq;
+Bitfield<28> so;
+EndSubBitUnion(cr0)
 Bitfield<27,24> cr1;
 EndBitUnion(Cr)
 
diff -r bd104adbf04d -r ffac9df60637 src/arch/power/process.cc
--- a/src/arch/power/process.cc Thu Jul 22 18:54:37 2010 +0100
+++ b/src/arch/power/process.cc Thu Jul 22 18:54:37 2010 +0100
@@ -284,5 +284,12 @@
 PowerLiveProcess::setSyscallReturn(ThreadContext *tc,
 SyscallReturn return_value)
 {
+Cr cr = tc->readIntReg(INTREG_CR);
+if (return_value.successful()) {
+cr.cr0.so = 0;
+} else {
+cr.cr0.so = 1;
+}
+tc->setIntReg(INTREG_CR, cr);
 tc->setIntReg(ReturnValueReg, return_value.value());
 }
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] changeset in m5: LSQ Unit: After deleting part of a split reques...

2010-07-22 Thread Timothy M. Jones
changeset bd104adbf04d in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=bd104adbf04d
description:
LSQ Unit: After deleting part of a split request, set it to NULL so 
that it
isn't accidentally deleted again later (causing a segmentation fault).

diffstat:

 src/cpu/o3/lsq_unit.hh |  3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diffs (20 lines):

diff -r fb7fc9aca918 -r bd104adbf04d src/cpu/o3/lsq_unit.hh
--- a/src/cpu/o3/lsq_unit.hhThu Jul 22 18:54:37 2010 +0100
+++ b/src/cpu/o3/lsq_unit.hhThu Jul 22 18:54:37 2010 +0100
@@ -744,6 +744,8 @@
 delete fst_data_pkt;
 delete snd_data_pkt->req;
 delete snd_data_pkt;
+sreqLow = NULL;
+sreqHigh = NULL;
 }
 
 req = NULL;
@@ -769,6 +771,7 @@
 state->complete();
 
 req = NULL;
+sreqHigh = NULL;
 
 lsq->setRetryTid(lsqID);
 }
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] changeset in m5: Port: Only indicate that a SimpleTimingPort is ...

2010-07-22 Thread Timothy M. Jones
changeset fb7fc9aca918 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=fb7fc9aca918
description:
Port: Only indicate that a SimpleTimingPort is drained if its send 
event is
not scheduled, as well as the transmit list being empty.

diffstat:

 src/mem/cache/cache_impl.hh |  2 +-
 src/mem/tport.cc|  4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diffs (33 lines):

diff -r 3bd51d6ac9ef -r fb7fc9aca918 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Thu Jul 22 18:52:02 2010 +0100
+++ b/src/mem/cache/cache_impl.hh   Thu Jul 22 18:54:37 2010 +0100
@@ -1569,7 +1569,7 @@
 schedule(sendEvent, std::max(nextReady, curTick + 1));
 } else {
 // no more to send right now: if we're draining, we may be done
-if (drainEvent) {
+if (drainEvent && !sendEvent->scheduled()) {
 drainEvent->process();
 drainEvent = NULL;
 }
diff -r 3bd51d6ac9ef -r fb7fc9aca918 src/mem/tport.cc
--- a/src/mem/tport.cc  Thu Jul 22 18:52:02 2010 +0100
+++ b/src/mem/tport.cc  Thu Jul 22 18:54:37 2010 +0100
@@ -155,7 +155,7 @@
 schedule(sendEvent, time <= curTick ? curTick+1 : time);
 }
 
-if (transmitList.empty() && drainEvent) {
+if (transmitList.empty() && drainEvent && !sendEvent->scheduled()) {
 drainEvent->process();
 drainEvent = NULL;
 }
@@ -195,7 +195,7 @@
 unsigned int
 SimpleTimingPort::drain(Event *de)
 {
-if (transmitList.size() == 0)
+if (transmitList.size() == 0 && !sendEvent->scheduled())
 return 0;
 drainEvent = de;
 return 1;
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Timothy M. Jones
changeset 3bd51d6ac9ef in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef
description:
O3CPU: Fix a bug where stores in the cpu where never marked as split.

diffstat:

 src/cpu/o3/lsq_unit.hh |  6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diffs (16 lines):

diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh
--- a/src/cpu/o3/lsq_unit.hhThu Jul 22 18:47:52 2010 +0100
+++ b/src/cpu/o3/lsq_unit.hhThu Jul 22 18:52:02 2010 +0100
@@ -822,6 +822,12 @@
 storeQueue[store_idx].sreqLow = sreqLow;
 storeQueue[store_idx].sreqHigh = sreqHigh;
 storeQueue[store_idx].size = sizeof(T);
+
+// Split stores can only occur in ISAs with unaligned memory accesses.  If
+// a store request has been split, sreqLow and sreqHigh will be non-null.
+if (TheISA::HasUnalignedMemAcc && sreqLow) {
+storeQueue[store_idx].isSplit = true;
+}
 assert(sizeof(T) <= sizeof(storeQueue[store_idx].data));
 
 T gData = htog(data);
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] changeset in m5: Syscall: Don't close the simulator's standard f...

2010-07-22 Thread Timothy M. Jones
changeset 02b471d9d400 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=02b471d9d400
description:
Syscall: Don't close the simulator's standard file descriptors.

diffstat:

 src/sim/syscall_emul.cc |  5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diffs (15 lines):

diff -r b1ac6773e83d -r 02b471d9d400 src/sim/syscall_emul.cc
--- a/src/sim/syscall_emul.cc   Thu Jul 22 18:47:43 2010 +0100
+++ b/src/sim/syscall_emul.cc   Thu Jul 22 18:47:52 2010 +0100
@@ -186,7 +186,10 @@
 {
 int index = 0;
 int target_fd = p->getSyscallArg(tc, index);
-int status = close(p->sim_fd(target_fd));
+int sim_fd = p->sim_fd(target_fd);
+int status = 0;
+if (sim_fd > 2)
+status = close(sim_fd);
 if (status >= 0)
 p->free_fd(target_fd);
 return status;
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] changeset in m5: O3CPU: O3's tick event gets squashed when it is...

2010-07-22 Thread Timothy M. Jones
changeset b1ac6773e83d in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=b1ac6773e83d
description:
O3CPU: O3's tick event gets squashed when it is switched out.  When 
repeatedly
switching between O3 and another CPU, O3's tick event might still be 
scheduled
in the event queue (as squashed).  Therefore, check for a squashed tick 
event
as well as a non-scheduled event when taking over from another CPU and 
deal
with it accordingly.

diffstat:

 src/cpu/o3/cpu.cc |  4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diffs (21 lines):

diff -r e76cc0ca16cc -r b1ac6773e83d src/cpu/o3/cpu.cc
--- a/src/cpu/o3/cpu.cc Thu Jul 22 18:47:03 2010 +0100
+++ b/src/cpu/o3/cpu.cc Thu Jul 22 18:47:43 2010 +0100
@@ -1143,7 +1143,7 @@
 iew.takeOverFrom();
 commit.takeOverFrom();
 
-assert(!tickEvent.scheduled());
+assert(!tickEvent.scheduled() || tickEvent.squashed());
 
 // @todo: Figure out how to properly select the tid to put onto
 // the active threads list.
@@ -1168,7 +1168,7 @@
 ThreadContext *tc = threadContexts[i];
 if (tc->status() == ThreadContext::Active && _status != Running) {
 _status = Running;
-schedule(tickEvent, nextCycle());
+reschedule(tickEvent, nextCycle(), true);
 }
 }
 if (!tickEvent.scheduled())
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] changeset in m5: Power: Provide a utility function to copy regis...

2010-07-22 Thread Timothy M. Jones
changeset e76cc0ca16cc in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=e76cc0ca16cc
description:
Power: Provide a utility function to copy registers from one thread 
context
to another in the Power ISA.

diffstat:

 src/arch/power/SConscript |   1 +
 src/arch/power/utility.cc |  58 +++
 src/arch/power/utility.hh |   8 +
 3 files changed, 61 insertions(+), 6 deletions(-)

diffs (95 lines):

diff -r 7772a8bf76ee -r e76cc0ca16cc src/arch/power/SConscript
--- a/src/arch/power/SConscript Wed Jul 21 18:54:53 2010 -0700
+++ b/src/arch/power/SConscript Thu Jul 22 18:47:03 2010 +0100
@@ -42,6 +42,7 @@
 Source('insts/static_inst.cc')
 Source('pagetable.cc')
 Source('tlb.cc')
+Source('utility.cc')
 
 SimObject('PowerTLB.py')
 TraceFlag('Power')
diff -r 7772a8bf76ee -r e76cc0ca16cc src/arch/power/utility.cc
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/src/arch/power/utility.cc Thu Jul 22 18:47:03 2010 +0100
@@ -0,0 +1,58 @@
+/*
+ * Copyright (c) 2003-2005 The Regents of The University of Michigan
+ * Copyright (c) 2007-2008 The Florida State University
+ * Copyright (c) 2009 The University of Edinburgh
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met: redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer;
+ * redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution;
+ * neither the name of the copyright holders nor the names of its
+ * contributors may be used to endorse or promote products derived from
+ * this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Authors: Korey Sewell
+ *  Stephen Hines
+ *  Timothy M. Jones
+ */
+
+#include "arch/power/utility.hh"
+
+namespace PowerISA {
+
+void
+copyRegs(ThreadContext *src, ThreadContext *dest)
+{
+// First loop through the integer registers.
+for (int i = 0; i < NumIntRegs; ++i)
+dest->setIntReg(i, src->readIntReg(i));
+
+// Then loop through the floating point registers.
+for (int i = 0; i < NumFloatRegs; ++i)
+dest->setFloatRegBits(i, src->readFloatRegBits(i));
+
+// Copy misc. registers
+copyMiscRegs(src, dest);
+
+// Lastly copy PC/NPC
+dest->setPC(src->readPC());
+dest->setNextPC(src->readNextPC());
+}
+
+} // PowerISA namespace
diff -r 7772a8bf76ee -r e76cc0ca16cc src/arch/power/utility.hh
--- a/src/arch/power/utility.hh Wed Jul 21 18:54:53 2010 -0700
+++ b/src/arch/power/utility.hh Thu Jul 22 18:47:03 2010 +0100
@@ -101,16 +101,12 @@
 return NoFault;
 }
 
-static inline void
-copyRegs(ThreadContext *src, ThreadContext *dest)
-{
-panic("Copy Regs Not Implemented Yet\n");
-}
+void
+copyRegs(ThreadContext *src, ThreadContext *dest);
 
 static inline void
 copyMiscRegs(ThreadContext *src, ThreadContext *dest)
 {
-panic("Copy Misc. Regs Not Implemented Yet\n");
 }
 
 } // PowerISA namespace
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] times syscall fix / ThreadContext suspension-reactivation

2010-07-22 Thread Steve Reinhardt
On Thu, Jul 22, 2010 at 10:03 AM, Korey Sewell  wrote:
> The problem is that there are some conventions that we would need to point
> out and follow (i.e. Do threads start out Halted, Idle, or ???)

Yea, I know that's the key to the problems we had before, is that
different people had different mental notions of what these states
mean, which led to inconsistencies across the CPU models in terms of
how they were/are handled.  I agree that, ideally, the first step is
to hash out some definitions so we can agree on what they should mean
before we go changing the code again.  Though if Ioannis wants to hack
in some workarounds just to get his situation working he may want to
do that rather than waiting on us to fix everything :-).  But we would
value his input in making sure that the long-term solution meets his
needs as well as everyone else's.

> and then
> also sort out (at least for me) what is the true role and relationship
> between ThreadContext and ThreadState. I always was confused by those two
> and thought they should be merged into the same object

Off the top of my head, the former is an interface while the latter is
the canonical (but not necessarily only) implementation of that
interface.  But if it turns out that there's no foreseeable need for
an alternative implementation then perhaps they should be merged.  Or
if my characterization is wrong please enlighten me.

> I know at one point,
> Steve was frustrated enough to just get things working and commit a change
> for this stuff (maybe 1yr ago),

Yup, I remember that too.

> Lastly, there are some issues of how the architectural state gets
> transferred in/out in SE mode since there is no explicit OS handler to do
> this for you (e.g. assign a thread to a SMT hardware context in O3, before
> there are enough physical registers ... or flushing the pipeline and
> deallocating registers after a thread is suspended or halted)

My general recollection is that a lot of the complexity came from
trying to implement fancy SE-mode thread scheduling capabilities that
never really worked and were never used.  That's probably an
overgeneralization of the situation, but to the extent that it's true,
my reaction is that we should just scale back to the basics and get
rid of any code that doesn't work.  For example, just support some
fixed number of hardware thread contexts without any dynamic adding or
removing of contexts.  (You will want to enable/disable or
suspend/resume the ones that are there--are those different operations
or the same thing?--but you never expand or reclaim the architectural
state; I think that's one place where things got complicated and
broken.)

People who want serious thread scheduling should probably be using FS
mode anyway, and people who really think they have a good reason to
play with thread scheduling in SE mode probably want something very
specific and so if the basic features aren't adequate then it's up to
them to add in the features they need.  Again, that's just my
recollection/opinion, so speak up if you disagree.

> If people want, I can try to summarize what changes I think need to be made
> and then others with more knowledge on how that relates to things working
> for FS_MODE can chime in as well.

That would be awesome.  I suggest starting a wiki page that tries to
document how it ought to be, so that when we're done tweaking it we'll
automatically have some documentation in place for future reference.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] times syscall fix / ThreadContext suspension-reactivation

2010-07-22 Thread Korey Sewell
On Thu, Jul 22, 2010 at 12:53 PM, nathan binkert  wrote:

> > This looks like a legit bug. In getting SMT to work as threads are
> > spawned/respawned/suspended/etc. there may definitely be some bugs there
> in
> > code assuming the CPU only has 1 thread context.
> >
> > do you want to give fixing that a go and then send out a patch to the
> m5-dev
> > list (or better yet reviewboard)?
>
> To be honest, unless it's changed in the last couple of years, the
> whole wakeup context suspend resume everything is pretty messy.  I
> remember trying to make sense of it years ago.  It would be extremely
> helpful if someone could clean this all up or at least come up with a
> proposal for a direction we'd like to take this code.
>
I actually redid this mechanism for the InOrderCPU (in terms of coordinating
when events are called) and I wanted to re-do how the O3 was done to match
that modeling.

The problem is that there are some conventions that we would need to point
out and follow (i.e. Do threads start out Halted, Idle, or ???) and then
also sort out (at least for me) what is the true role and relationship
between ThreadContext and ThreadState. I always was confused by those two
and thought they should be merged into the same object. I know at one point,
Steve was frustrated enough to just get things working and commit a change
for this stuff (maybe 1yr ago),

Lastly, there are some issues of how the architectural state gets
transferred in/out in SE mode since there is no explicit OS handler to do
this for you (e.g. assign a thread to a SMT hardware context in O3, before
there are enough physical registers ... or flushing the pipeline and
deallocating registers after a thread is suspended or halted)

If people want, I can try to summarize what changes I think need to be made
and then others with more knowledge on how that relates to things working
for FS_MODE can chime in as well.
-- 
- Korey
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] times syscall fix / ThreadContext suspension-reactivation

2010-07-22 Thread nathan binkert
> This looks like a legit bug. In getting SMT to work as threads are
> spawned/respawned/suspended/etc. there may definitely be some bugs there in
> code assuming the CPU only has 1 thread context.
>
> do you want to give fixing that a go and then send out a patch to the m5-dev
> list (or better yet reviewboard)?

To be honest, unless it's changed in the last couple of years, the
whole wakeup context suspend resume everything is pretty messy.  I
remember trying to make sense of it years ago.  It would be extremely
helpful if someone could clean this all up or at least come up with a
proposal for a direction we'd like to take this code.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] changeset in m5: stats: unify the two stats distribution type be...

2010-07-22 Thread Ali Saidi

This fixed my problem... Thanks Nate!

Ali


On Wed, 21 Jul 2010 21:54:56 -0400, Nathan Binkert
begin_of_the_skype_highlighting end_of_the_skype_highlighting

wrote:
> changeset 7772a8bf76ee in /z/repo/m5
> details: http://repo.m5sim.org/m5?cmd=changeset;node=7772a8bf76ee
> description:
>   stats: unify the two stats distribution type better
> 
> diffstat:
> 
>  src/base/statistics.hh  |  34 ++
>  src/base/stats/info.hh  |  31 ---
>  src/base/stats/mysql.cc |  14 --
>  src/base/stats/text.cc  |  36 
>  4 files changed, 58 insertions(+), 57 deletions(-)
> 
> diffs (257 lines):
> 
> diff -r ad631c296c9b -r 7772a8bf76ee src/base/statistics.hh
> --- a/src/base/statistics.hh  Wed Jul 21 15:53:53 2010 -0700
> +++ b/src/base/statistics.hh  Wed Jul 21 18:54:53 2010 -0700
> @@ -160,6 +160,11 @@
>  Vector2dInfoProxy(Stat &stat) : InfoProxy(stat)
{}
>  };
>  
> +struct StorageParams
> +{
> +virtual ~StorageParams();
> +};
> +
>  class InfoAccess
>  {
>protected:
> @@ -1269,6 +1274,12 @@
>  // Non formula statistics
>  //
>  //
> +/** The parameters for a distribution stat. */
> +struct DistParams : public StorageParams
> +{
> +const DistType type;
> +DistParams(DistType t) : type(t) {}
> +};
>  
>  /**
>   * Templatized storage and interface for a distrbution stat.
> @@ -1279,6 +1290,15 @@
>  /** The parameters for a distribution stat. */
>  struct Params : public DistParams
>  {
> +/** The minimum value to track. */
> +Counter min;
> +/** The maximum value to track. */
> +Counter max;
> +/** The number of entries in each bucket. */
> +Counter bucket_size;
> +/** The number of buckets. Equal to (max-min)/bucket_size. */
> +size_type buckets;
> +
>  Params() : DistParams(Dist) {}
>  };
>  
> @@ -1368,6 +1388,12 @@
>  {
>  const Params *params = safe_cast  *>(info->storageParams);
>  
> +assert(params->type == Dist);
> +data.type = params->type;
> +data.min = params->min;
> +data.max = params->max;
> +data.bucket_size = params->bucket_size;
> +
>  data.min_val = (min_val == CounterLimits::max()) ? 0 : min_val;
>  data.max_val = (max_val == CounterLimits::min()) ? 0 : max_val;
>  data.underflow = underflow;
> @@ -1468,6 +1494,10 @@
>  void
>  prepare(Info *info, DistData &data)
>  {
> +const Params *params = safe_cast *>(info->storageParams);
> +
> +assert(params->type == Deviation);
> +data.type = params->type;
>  data.sum = sum;
>  data.squares = squares;
>  data.samples = samples;
> @@ -1540,6 +1570,10 @@
>  void
>  prepare(Info *info, DistData &data)
>  {
> +const Params *params = safe_cast *>(info->storageParams);
> +
> +assert(params->type == Deviation);
> +data.type = params->type;
>  data.sum = sum;
>  data.squares = squares;
>  data.samples = curTick;
> diff -r ad631c296c9b -r 7772a8bf76ee src/base/stats/info.hh
> --- a/src/base/stats/info.hh  Wed Jul 21 15:53:53 2010 -0700
> +++ b/src/base/stats/info.hh  Wed Jul 21 18:54:53 2010 -0700
> @@ -61,11 +61,7 @@
>  /** Mask of flags that can't be set directly */
>  const FlagsType __reserved =init | display;
>  
> -struct StorageParams
> -{
> -virtual ~StorageParams();
> -};
> -
> +struct StorageParams;
>  struct Visit;
>  
>  class Info
> @@ -168,8 +164,15 @@
>  virtual Result total() const = 0;
>  };
>  
> +enum DistType { Deviation, Dist };
> +
>  struct DistData
>  {
> +DistType type;
> +Counter min;
> +Counter max;
> +Counter bucket_size;
> +
>  Counter min_val;
>  Counter max_val;
>  Counter underflow;
> @@ -180,24 +183,6 @@
>  Counter samples;
>  };
>  
> -enum DistType { Deviation, Dist };
> -
> -struct DistParams : public StorageParams
> -{
> -const DistType type;
> -
> -/** The minimum value to track. */
> -Counter min;
> -/** The maximum value to track. */
> -Counter max;
> -/** The number of entries in each bucket. */
> -Counter bucket_size;
> -/** The number of buckets. Equal to (max-min)/bucket_size. */
> -size_type buckets;
> -
> -explicit DistParams(DistType t) : type(t) {}
> -};
> -
>  class DistInfo : public Info
>  {
>public:
> diff -r ad631c296c9b -r 7772a8bf76ee src/base/stats/mysql.cc
> --- a/src/base/stats/mysql.cc Wed Jul 21 15:53:53 2010 -0700
> +++ b/src/base/stats/mysql.cc Wed Jul 21 18:54:53 2010 -0700
> @@ -481,9 +481,10 @@
>  if (!configure(info, "DIST"))
>  return;
>  
> -const DistParams *params =
> -safe_cast(info.storageParams);
> -if (params->type == Dist) {
> +const DistStor::Params *params =
> +dynamic_cast(info.stora

Re: [m5-dev] times syscall fix / ThreadContext suspension-reactivation

2010-07-22 Thread Korey Sewell
> Secondly, I have implemented the nanosleep() system call. I believe that
> the appropriate way to do this would be suspending (tc->suspend()) the
> ThreadContext and re-activating (tc->activate(delay)) after the sleep
> duration (please correct me if I am mistaken). This however led to a
> peculiar behaviour with O3 in SMT mode:
> 1. in FullO3CPU::suspendContext (cpu.cc) the CPU _status is set to
> Idle, regardless whether there are other running contexts in the CPU. Thus a
> thread sleeping leads to all the other SMT threads sleeping.
>

This looks like a legit bug. In getting SMT to work as threads are
spawned/respawned/suspended/etc. there may definitely be some bugs there in
code assuming the CPU only has 1 thread context.

do you want to give fixing that a go and then send out a patch to the m5-dev
list (or better yet reviewboard)?


-- 
- Korey
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] O3CPU + translateTiming

2010-07-22 Thread Korey Sewell
Eventually, the InOrderCPU's "DynInst" object will be merged with the
base_dyn_inst.hh.

If memory serves correct, in the original implementation I did not want to
deal with the templated-definition and derivation of the base_dyn_inst.hh
(at least upfront) and so DynInst specific to InOrderCPU was used.

On Thu, Jul 22, 2010 at 9:35 AM, Steve Reinhardt  wrote:

> I think you stated it pretty accurately, Gabe.  It looks like the
> base_dyn_inst.hh file is only used in o3, ozone, and checker, and of
> those only o3 is really being used right now.
>
> Steve
>
> On Thu, Jul 22, 2010 at 12:56 AM, Gabe Black 
> wrote:
> > The dynamic instruction object is really just the dynamic information
> > associated with an instruction, as apposed to the static instruction
> > object that gets reused. Strictly speaking there's no guarantee that an
> > arbitrary dynamic instruction will always use timing mode, but all the
> > CPU models we have that are complicated enough to need dynamic
> > instructions use timing mode exclusively as far as I know. O3's dynamic
> > instruction object for sure uses only timing mode, so yes, in that case,
> > the data parameter is just so the function signatures are the same in
> > the read case. initiateAcc is how memory instructions access memory in
> > timing mode, so again, in this case and in most practical cases a
> > dynamic instruction object's read function would be called from
> > initiateAcc and not execute. If you're only worried about O3 (ie. it's
> > in the o3 directory) you don't need to keep track of data. If it's the
> > base dynamic instruction object it's a little less clear since
> > -practically- speaking it will probably only be used with timing mode,
> > but there isn't any reason I can think of that has to be true in all
> > cases. I think the base dynamic instruction object may actually not be
> > very far separated from O3 so it may already assume timing mode in other
> > places.
> >
> > Please someone speak up if I'm wildly mischaracterizing how this is
> > supposed to work. I've worked a lot with O3's innards, but I think all
> > the design work was done before I was associated with M5.
> >
> > Gabe
> >
> > Min Kyu Jeong wrote:
> >> so base_dyn_inst is always used timing memory - I assumed so but just
> >> wanted to confirm this to make sure that read function,
> >>
> >> BaseDynInst::read(Addr addr, T &data, unsigned flags)
> >>
> >> 'data' argument isn't really doing anything but being a placeholder
> >> for func sig matching in xc interface. -- is this correct?
> >>
> >> BaseDynInst::read() will be called only in initiateAcc(), not in
> >> execute() function. When the initiateAcc()/completeAcc() pair is used,
> >> pkt->get() is used in completeAcc() to read the data.
> >>
> >> I am moving some code from read() to finishTranslation() and it looks
> >> like 'data' variable doesn't need to be passed.
> >>
> >> Thanks,
> >>
> >> Min
> >>
> >> On Thu, Jul 15, 2010 at 10:01 AM, Steve Reinhardt  >> > wrote:
> >>
> >> On Wed, Jul 14, 2010 at 8:35 AM, Min Kyu Jeong  >> > wrote:
> >> >  b) memory is atomic (is it a possible combination? dyn_inst +
> >> atomic?) -
> >> > x86 doesn't seem to have code for this case -
> >> Walker::recvAtomic() does
> >> > nothing.
> >>
> >> I don't believe that O3+atomic mode works.  Practically speaking it
> >> doesn't make any sense.
> >>
> >> Steve
> >> ___
> >> m5-dev mailing list
> >> m5-dev@m5sim.org 
> >> http://m5sim.org/mailman/listinfo/m5-dev
> >>
> >>
> >> 
> >>
> >> ___
> >> m5-dev mailing list
> >> m5-dev@m5sim.org
> >> http://m5sim.org/mailman/listinfo/m5-dev
> >>
> >
> > ___
> > m5-dev mailing list
> > m5-dev@m5sim.org
> > http://m5sim.org/mailman/listinfo/m5-dev
> >
> ___
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>



-- 
- Korey
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] diff reviewing

2010-07-22 Thread nathan binkert
I just wanted to point out to everyone on the list that we welcome
everyone's comments on reviewing patches (not just the core
developers).  So, if you see a patch that you are interested in for
whatever reason, please review it.  The more eyeballs on the code, the
better.  If there are features that you'd like to see, you can help
them get into M5 faster if you help the discussion and the reviews.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: O3CPU: Fix a bug where stores in the cpu where never marked as split.

2010-07-22 Thread Timothy Jones


> On 2010-07-22 00:26:07, Gabe Black wrote:
> > src/cpu/o3/lsq_unit.hh, line 825
> > 
> >
> > Is it possible for sreqLow to be non-null and 
> > TheISA::HasUnalignedMemAcc -not- to be true? In that instance, wouldn't 
> > this still be a split access? Or does the code not shown in this diff make 
> > that not work?
> > 
> > If this is just to make it more obvious what sort of condition your 
> > checking for a comment would be better, or if it's a sanity check sort of 
> > thing an assert.

This case should never happen.  In BaseDynInst::read and 
BaseDynInst::write, sreqLow and sreqHigh get set to NULL.  They are only 
set non-null when a request is split in two, which can only happen if 
TheISA::HasUnalignedMemAcc is true (and the access crosses a cache line 
boundary).  I'll add a comment into the code to clarify what's happening.


- Timothy


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/54/#review105
---


On 2010-07-09 18:20:19, Timothy Jones wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/54/
> ---
> 
> (Updated 2010-07-09 18:20:19)
> 
> 
> Review request for Default.
> 
> 
> Summary
> ---
> 
> O3CPU: Fix a bug where stores in the cpu where never marked as split.
> 
> 
> Diffs
> -
> 
>   src/cpu/o3/lsq_unit.hh 249f174e6f37 
> 
> Diff: http://reviews.m5sim.org/r/54/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Timothy
> 
>

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Simulating caches having variable read and write latencies.

2010-07-22 Thread Steve Reinhardt
I don't know of any fundamental reason this shouldn't work, but I also
doubt that anyone has tried it before, so it's not terribly surprising
that you're running into bugs.  In general you have the right idea on
how to go about it.

Steve

On Sun, Jul 18, 2010 at 12:48 PM, Vidyabhushan Mohan
 wrote:
> Hi,
> I am simulating a 3-level cache hierarchy (private L1 and L2, shared L3)
> which has variable read and write latencies. I modified the files in m5's
> src/mem/cache directory and wherever latency was getting added to the timer
> tick, I check if the request is a write request. The code  looks like this:
>
> time = curTick + latency;
>
> is modified as
>
> if (pkt->isWrite())
>    time = curTick + writeHitlatency;
> else
>    time = curTick + latency;
>
> I'm trying to measure the sensitivity of write latencies to the performance
> of PARSEC workloads. Since M5 supports PARSEC in full-system mode, I'm
> running m5 in FS mode (using the binaries and scripts provided by UT-Austin)
> and use checkpointing for faster simulation. However, most PARSEC workloads
> segfault for input sizes larger than simdev (they run fine for test and
> simdev) after simulating for close to an hour. I will share the stack trace
> of the simulation, but before that I wanted to get a few questions cleared.
>
> Btw, I use the latest m5 code repository.
>
> My questions are as follows:
>
> *) Is it possible to simulate a cache with variable read and write latencies
> or are there any intricacies in the m5 codebase that stops us from
> simulating such a cache? If it is possible, is the code snippet above the
> correct way of simulating such a cache or are there any special cases that
> should be handled correctly for the code to work?
>
> *) I'm currently using 12 clock cycle write latency for L1-I and L1-D cache
> (the read latency is 1 and 3 clock cycles for L1-I and L1-D cache) and 30
> cycle write latency for an L2 cache (read is 20 clock cycles). Does these
> values matter or would m5 take care of simulating these high latency writes
> correctly?
>
> *) Has anyone already tried simulating caches which have variable read and
> write latencies? If so, have you encountered any problems or issues that
> should be addressed?
>
> Any pointers on solving this will be highly appreciated.
> Thanks a lot.
>
> --
> Bhushan
>
> ___
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>
>
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] times syscall fix / ThreadContext suspension-reactivation

2010-07-22 Thread Steve Reinhardt
For your second part (the thread suspend/activate) I agree that at
first glance it looks like a bug.  Korey has been the most recent
person to work with O3 SMT thread handling, and I see his name
attached to some of the lines in question, so I'll let him respond...

Steve

On Wed, Jul 21, 2010 at 2:46 PM, Ioannis Ilkos
 wrote:
> Hello,
>
> I have been playing with time in m5 syscall emulation mode and had a couple 
> of issues:
>
> First of all I believe there is a bug in the times() syscall implementation 
> (sim/syscall_emul.hh). The current clocks value passed to the userland is:
>        int64_t clocks = curTick * OS::M5_SC_CLK_TCK / SimClock::Int::s;
>
> This evaluates to a wrong expression. Now, according to the man page all the 
> tms buffer values are supposed to be in clock ticks. Thus a more appropriate 
> expression for it would be:
>        int64_t clocks = tc->getCpuPtr()->tickToCycles(curTick);
>
> Secondly, I have implemented the nanosleep() system call. I believe that the 
> appropriate way to do this would be suspending (tc->suspend()) the 
> ThreadContext and re-activating (tc->activate(delay)) after the sleep 
> duration (please correct me if I am mistaken). This however led to a peculiar 
> behaviour with O3 in SMT mode:
> 1. in FullO3CPU::suspendContext (cpu.cc) the CPU _status is set to 
> Idle, regardless whether there are other running contexts in the CPU. Thus a 
> thread sleeping leads to all the other SMT threads sleeping.
>
> 2. in FullO3CPU::activateContext (cpu.cc) we schedule a tick event 
> after the delay parameter passed, effectively overwriting CPU's tickEvent and 
> moving all threads of the processor tick(delay) ticks forward. Is this the 
> intended behaviour or is it a bug? Clearly in my case it is not helpful since 
> the point of getting a thread to sleep relative to the other threads' time.
>
> Thank you,
> Ioannis
>
> ___
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] O3CPU + translateTiming

2010-07-22 Thread Steve Reinhardt
I think you stated it pretty accurately, Gabe.  It looks like the
base_dyn_inst.hh file is only used in o3, ozone, and checker, and of
those only o3 is really being used right now.

Steve

On Thu, Jul 22, 2010 at 12:56 AM, Gabe Black  wrote:
> The dynamic instruction object is really just the dynamic information
> associated with an instruction, as apposed to the static instruction
> object that gets reused. Strictly speaking there's no guarantee that an
> arbitrary dynamic instruction will always use timing mode, but all the
> CPU models we have that are complicated enough to need dynamic
> instructions use timing mode exclusively as far as I know. O3's dynamic
> instruction object for sure uses only timing mode, so yes, in that case,
> the data parameter is just so the function signatures are the same in
> the read case. initiateAcc is how memory instructions access memory in
> timing mode, so again, in this case and in most practical cases a
> dynamic instruction object's read function would be called from
> initiateAcc and not execute. If you're only worried about O3 (ie. it's
> in the o3 directory) you don't need to keep track of data. If it's the
> base dynamic instruction object it's a little less clear since
> -practically- speaking it will probably only be used with timing mode,
> but there isn't any reason I can think of that has to be true in all
> cases. I think the base dynamic instruction object may actually not be
> very far separated from O3 so it may already assume timing mode in other
> places.
>
> Please someone speak up if I'm wildly mischaracterizing how this is
> supposed to work. I've worked a lot with O3's innards, but I think all
> the design work was done before I was associated with M5.
>
> Gabe
>
> Min Kyu Jeong wrote:
>> so base_dyn_inst is always used timing memory - I assumed so but just
>> wanted to confirm this to make sure that read function,
>>
>> BaseDynInst::read(Addr addr, T &data, unsigned flags)
>>
>> 'data' argument isn't really doing anything but being a placeholder
>> for func sig matching in xc interface. -- is this correct?
>>
>> BaseDynInst::read() will be called only in initiateAcc(), not in
>> execute() function. When the initiateAcc()/completeAcc() pair is used,
>> pkt->get() is used in completeAcc() to read the data.
>>
>> I am moving some code from read() to finishTranslation() and it looks
>> like 'data' variable doesn't need to be passed.
>>
>> Thanks,
>>
>> Min
>>
>> On Thu, Jul 15, 2010 at 10:01 AM, Steve Reinhardt > > wrote:
>>
>>     On Wed, Jul 14, 2010 at 8:35 AM, Min Kyu Jeong >     > wrote:
>>     >  b) memory is atomic (is it a possible combination? dyn_inst +
>>     atomic?) -
>>     > x86 doesn't seem to have code for this case -
>>     Walker::recvAtomic() does
>>     > nothing.
>>
>>     I don't believe that O3+atomic mode works.  Practically speaking it
>>     doesn't make any sense.
>>
>>     Steve
>>     ___
>>     m5-dev mailing list
>>     m5-dev@m5sim.org 
>>     http://m5sim.org/mailman/listinfo/m5-dev
>>
>>
>> 
>>
>> ___
>> m5-dev mailing list
>> m5-dev@m5sim.org
>> http://m5sim.org/mailman/listinfo/m5-dev
>>
>
> ___
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: util/m5/m5.c: in readfile(), added memset to touch all pages - ensure they are in the page table

2010-07-22 Thread Gabe Black

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/64/#review106
---



util/m5/Makefile.x86


Why is this necessary? Is this so it runs under SE mode? In that case I 
think we should make it run like before as the default since 99% of the time 
this will run in FS, and provide a way to inject -static for the 1% of the time 
it runs in SE.

Compiling it as static all the time wouldn't be the end of the world, but 
it seems like we'd be making universal changes for a very uncommon case.



util/m5/m5ops.h


It looks like Ali commandeered that value on line 61. It might have been 
better to use 0x5A for that, but it also might not be safe to change it now 
since there may be binaries out there that use it (probably not too many). It 
would be a little strange, but you could actually use 0x5A for reserved1_func. 
I don't know what restrictions there are in the various ISAs for function 
numbers, but in x86 it's a 16 bit value.


- Gabe


On 2010-07-21 14:57:19, Joel Hestness wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/64/
> ---
> 
> (Updated 2010-07-21 14:57:19)
> 
> 
> Review request for Default.
> 
> 
> Summary
> ---
> 
> util/m5/m5.c: in readfile(), added memset to touch all pages - ensure they 
> are in the page table
> util/m5/m5ops/h: add reserved1_func back into list of magic ops
> util/m5/Makefile.x86: should build the binary statically
> 
> 
> Diffs
> -
> 
>   util/m5/Makefile.x86 a75564db03c3 
>   util/m5/m5.c a75564db03c3 
>   util/m5/m5ops.h a75564db03c3 
> 
> Diff: http://reviews.m5sim.org/r/64/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Joel
> 
>

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Booting Linux, X86_FS Timing CPU

2010-07-22 Thread Gabe Black
Ok, I've had a chance to look at this. The reason completeDataAccess is
called twice is that the first instance is for the previous instruction.
It's data access comes back, allowing it to finish, and because it
finished it fires off the next instruction. I'm assuming this is a
subsequent microop of the same macroop since no fetch actually happens.
Because there's no trip to memory and we don't wait for a call back, the
next instruction is started in place without collapsing the existing
stack frames. This can (and has) actually lead to problems where the
subsequent instruction may change things unexpectedly out from under the
previous instruction or earlier parts of the same instruction, and it
also means the stack could theoretically grow without bound and, in the
worst case, run out of space. This doesn't have anything to do with the
assert, but it does explain completeDataAccess showing up twice.

The assert is, as you said, from NO_ACCESS skipping the call out to the
memory system and going right to the code that finishes off execution of
that instruction, surprising that code by never having left the Running
state. Under any other circumstance, though, the CPU shouldn't be in the
Running state, and if we just added that to the assert we wouldn't catch
those bugs. What I think would be a better fix is to move the assert
(but not the assignment to _status) up above the code that aggregates
the components of a split packet  and add
pkt->req->getFlags().isSet(Request::NO_ACCESS) or something similar to
the or. This isn't perfect because it asserts every time the function is
called and not just once all the fragments (should be only two) are
gathered, but it's safer and the overhead should be minimal.

I think the reason this doesn't cause a problem for any other ISA is
that, according to grep, no other ISA uses the NO_ACCESS flag.

Gabe

Gabe Black wrote:
> Hi. I've done a little work enabling X86_FS with the timing CPU, but
> there was a problem where a pointer to a packet was corrupted somehow
> (if I remember correctly) and it died not very far in. As far as I know
> that problem was never fixed, and it's quite likely you'll run into
> silly issues I didn't since I couldn't get all that far. I haven't
> looked at the Timing CPU's code again yet, but I have to imagine in
> other ISAs other reads/writes have NO_ACCESS set and don't get held up
> by the TLB. There may be some mechanism in place to handle this case
> already that isn't being triggered or is being circumvented somehow.
> I'll look at it when I get a chance and get back to you. What
> kernel/configuration script/command line are you using?
>
> Gabe
>
> Joel Hestness wrote:
>   
>> Hi,
>>   I am currently experimenting with the timing CPU in X86_FS, and I
>> have encountered an assertion failure while booting Linux (using Linux
>> boot as a test):
>> m5.debug: build/X86_FS/cpu/simple/timing.cc:900: void
>> TimingSimpleCPU::completeDataAccess(Packet*): Assertion `_status ==
>> DcacheWaitResponse || _status == DTBWaitResponse' failed.
>>   I have attached a stack trace (note that completeDataAccess is
>> called twice in the trace).  The current macro-instruction is a POP_M,
>> and the current uop is the Cda.
>>   In timing mode since the Cda doesn't access memory
>> (the Request::NO_ACCESS flag is set by Cda), it doesn't wait on a
>> memory access or TLB, so the status of the CPU before the assertion is
>> _status = Running.  I've tried adding "|| _status == Running" to the
>> conditional in the assertion, and the simulation gets past that point,
>> but crashes later.  I'm not sure if this is a sound fix, or if there
>> is a better way to handle this.
>>   While browsing the code, I noticed that further up in the call
>> stack, TimingSimpleCPU::write is called, and when executing this same
>> test using the atomic CPU, AtomicSimpleCPU::write is called.  In the
>> AtomicSimpleCPU::write code, there is a special case test for
>> when the Request::NO_ACCESS flag is set.  I wonder if the same should
>> occur in TimingSimpleCPU::write?
>>   Thanks,
>>   Joel
>>
>> -- 
>>   Joel Hestness
>>   PhD Student, Computer Architecture
>>   Dept. of Computer Science, University of Texas - Austin
>>   http://www.cs.utexas.edu/~hestness
>> 
>> 
>>
>> ___
>> m5-dev mailing list
>> m5-dev@m5sim.org
>> http://m5sim.org/mailman/listinfo/m5-dev
>> 
>
> ___
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>   

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] times syscall fix / ThreadContext suspension-reactivation

2010-07-22 Thread Gabe Black
Ioannis Ilkos wrote:
> Hello,
>
> I have been playing with time in m5 syscall emulation mode and had a couple 
> of issues:
>
> First of all I believe there is a bug in the times() syscall implementation 
> (sim/syscall_emul.hh). The current clocks value passed to the userland is:
>   int64_t clocks = curTick * OS::M5_SC_CLK_TCK / SimClock::Int::s;
>
> This evaluates to a wrong expression. Now, according to the man page all the 
> tms buffer values are supposed to be in clock ticks. Thus a more appropriate 
> expression for it would be:
>   int64_t clocks = tc->getCpuPtr()->tickToCycles(curTick);
>   

The man pages do misleadingly say the values are in clock ticks. They
don't mean CPU cycles, though, they mean (based on what the internet
just told me) the periodic timer interrupts delivered to the kernel
which are typically 100Hz, 1000Hz, etc. The value used by M5 for
M5_SC_CLK_TCK is 100 which is reasonable, and using this other
interpretation the original formula looks correct to me.

>  
> Secondly, I have implemented the nanosleep() system call. I believe that the 
> appropriate way to do this would be suspending (tc->suspend()) the 
> ThreadContext and re-activating (tc->activate(delay)) after the sleep 
> duration (please correct me if I am mistaken). This however led to a peculiar 
> behaviour with O3 in SMT mode:
> 1. in FullO3CPU::suspendContext (cpu.cc) the CPU _status is set to 
> Idle, regardless whether there are other running contexts in the CPU. Thus a 
> thread sleeping leads to all the other SMT threads sleeping.
>
> 2. in FullO3CPU::activateContext (cpu.cc) we schedule a tick event 
> after the delay parameter passed, effectively overwriting CPU's tickEvent and 
> moving all threads of the processor tick(delay) ticks forward. Is this the 
> intended behaviour or is it a bug? Clearly in my case it is not helpful since 
> the point of getting a thread to sleep relative to the other threads' time.
>   

For this part I have no idea.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] O3CPU + translateTiming

2010-07-22 Thread Gabe Black
The dynamic instruction object is really just the dynamic information
associated with an instruction, as apposed to the static instruction
object that gets reused. Strictly speaking there's no guarantee that an
arbitrary dynamic instruction will always use timing mode, but all the
CPU models we have that are complicated enough to need dynamic
instructions use timing mode exclusively as far as I know. O3's dynamic
instruction object for sure uses only timing mode, so yes, in that case,
the data parameter is just so the function signatures are the same in
the read case. initiateAcc is how memory instructions access memory in
timing mode, so again, in this case and in most practical cases a
dynamic instruction object's read function would be called from
initiateAcc and not execute. If you're only worried about O3 (ie. it's
in the o3 directory) you don't need to keep track of data. If it's the
base dynamic instruction object it's a little less clear since
-practically- speaking it will probably only be used with timing mode,
but there isn't any reason I can think of that has to be true in all
cases. I think the base dynamic instruction object may actually not be
very far separated from O3 so it may already assume timing mode in other
places.

Please someone speak up if I'm wildly mischaracterizing how this is
supposed to work. I've worked a lot with O3's innards, but I think all
the design work was done before I was associated with M5.

Gabe

Min Kyu Jeong wrote:
> so base_dyn_inst is always used timing memory - I assumed so but just
> wanted to confirm this to make sure that read function,
>
> BaseDynInst::read(Addr addr, T &data, unsigned flags)
>
> 'data' argument isn't really doing anything but being a placeholder
> for func sig matching in xc interface. -- is this correct?
>
> BaseDynInst::read() will be called only in initiateAcc(), not in
> execute() function. When the initiateAcc()/completeAcc() pair is used,
> pkt->get() is used in completeAcc() to read the data.
>
> I am moving some code from read() to finishTranslation() and it looks
> like 'data' variable doesn't need to be passed.
>
> Thanks, 
>
> Min
>
> On Thu, Jul 15, 2010 at 10:01 AM, Steve Reinhardt  > wrote:
>
> On Wed, Jul 14, 2010 at 8:35 AM, Min Kyu Jeong  > wrote:
> >  b) memory is atomic (is it a possible combination? dyn_inst +
> atomic?) -
> > x86 doesn't seem to have code for this case -
> Walker::recvAtomic() does
> > nothing.
>
> I don't believe that O3+atomic mode works.  Practically speaking it
> doesn't make any sense.
>
> Steve
> ___
> m5-dev mailing list
> m5-dev@m5sim.org 
> http://m5sim.org/mailman/listinfo/m5-dev
>
>
> 
>
> ___
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>   

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: O3CPU: Fix a bug where stores in the cpu where never marked as split.

2010-07-22 Thread Gabe Black

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/54/#review105
---



src/cpu/o3/lsq_unit.hh


Is it possible for sreqLow to be non-null and TheISA::HasUnalignedMemAcc 
-not- to be true? In that instance, wouldn't this still be a split access? Or 
does the code not shown in this diff make that not work?

If this is just to make it more obvious what sort of condition your 
checking for a comment would be better, or if it's a sanity check sort of thing 
an assert.


- Gabe


On 2010-07-09 18:20:19, Timothy Jones wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/54/
> ---
> 
> (Updated 2010-07-09 18:20:19)
> 
> 
> Review request for Default.
> 
> 
> Summary
> ---
> 
> O3CPU: Fix a bug where stores in the cpu where never marked as split.
> 
> 
> Diffs
> -
> 
>   src/cpu/o3/lsq_unit.hh 249f174e6f37 
> 
> Diff: http://reviews.m5sim.org/r/54/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Timothy
> 
>

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: Syscall: Don't close the simulator's standard file descriptors.

2010-07-22 Thread Gabe Black

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/52/#review104
---



src/sim/syscall_emul.cc


We might want a more robust way of detecting whether a file descriptor is 
safe to close. This will catch instances where stdout, stderr and stdin are 
forwarded to the simulated program, but could there be others? I can't think of 
any off hand, but we should at least think about it carefully.

This shouldn't prevent you from checking this in, though, since some 
allegedly imperfect protection is better than no protection.


- Gabe


On 2010-07-16 17:57:30, Timothy Jones wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/52/
> ---
> 
> (Updated 2010-07-16 17:57:30)
> 
> 
> Review request for Default.
> 
> 
> Summary
> ---
> 
> Syscall: Don't close the simulator's standard file descriptors.
> 
> 
> Diffs
> -
> 
>   src/sim/syscall_emul.cc 249f174e6f37 
> 
> Diff: http://reviews.m5sim.org/r/52/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Timothy
> 
>

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] Notification from M5 Bugs

2010-07-22 Thread Flyspray
THIS IS AN AUTOMATED MESSAGE, DO NOT REPLY.

A new Flyspray task has been opened.  Details are below.

User who did this: - Gabe Black (gblack)

Attached to Project - M5 Bugs
Summary - Create a "generic" ISA directory.
Task Type - Minor Enhancement
Category - ISA Support
Status - New
Assigned To - Gabe Black
Operating System - All
Severity - Low
Priority - Normal
Reported Version - 2.0beta5
Due in Version - 
Due Date - Undecided
Details - A non-trivial amount of code is exactly the same between
multiple ISAs or is the same but with minor modifications. A directory
in arch called "generic" should be created for these common
implementations which will be put in an namespace called GenericISA.
The individual ISAs can wrap these versions of functions or pieces of
functions, or import them directly into their own namespace with
"using".

More information can be found at the following URL:
http://www.m5sim.org/flyspray/task/333

You are receiving this message because you have requested it from the
Flyspray bugtracking system.  You can be removed from future
notifications by visiting the URL shown above.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: Power: Provide a utility function to copy registers from one thread context

2010-07-22 Thread Gabe Black

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/35/#review103
---



src/arch/power/utility.cc


Do we really need all these copyrights here? We can imagine it was copied 
from the Alpha implementation instead and just leave the UM one there. I'll 
file a bug in flyspray against myself to create a generic ISA directory for 
this sort of thing.


- Gabe


On 2010-07-09 17:53:55, Timothy Jones wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/35/
> ---
> 
> (Updated 2010-07-09 17:53:55)
> 
> 
> Review request for Default.
> 
> 
> Summary
> ---
> 
> Power: Provide a utility function to copy registers from one thread context
> to another in the Power ISA.
> 
> 
> Diffs
> -
> 
>   src/arch/power/SConscript 249f174e6f37 
>   src/arch/power/utility.hh 249f174e6f37 
>   src/arch/power/utility.cc PRE-CREATION 
> 
> Diff: http://reviews.m5sim.org/r/35/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Timothy
> 
>

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev