Re: [m5-dev] Review Request: x86: Timing support for pagetable walker

2011-01-22 Thread Gabe Black
You're headed in the right direction but aren't quite there yet. If you
split out the functional part of startWalk into its own function, you
can change the signature and pass the address and size by reference.
Then you don't need a request or translation object at all. You could
even put that code right into startFunctional since that's the only
place it's called from.

Also I think there may be some memory leaks here from packets not being
deleted when they should, and if that's true it's actually my fault from
the original code. It would be nice to verify that and if necessary
clean it up and not duplicate the brokenness.

Gabe


Gabe Black wrote:
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/396/
>
>
> src/arch/x86/pagetable_walker.cc
> 
> (Diff revision 3)
>
>   
> Walker::WalkerState::startWalk()
>
>   
>   232 
> walker->port.sendAtomic(read);
>
> Possible memory leak? I think the sender is responsible for cleaning up the 
> packet for atomic accesses, and I don't see where that's being done. I may be 
> wrong about atomic mode, or have missed where this is cleaned up.
>
> src/arch/x86/pagetable_walker.cc
> 
> (Diff revision 3)
>
>   
> Walker::WalkerState::startWalk()
>
>   
>   239 
> walker->port.sendAtomic(write);
>
> Possible memory leak?
>
> src/arch/x86/pagetable_walker.cc
> 
> (Diff revision 3)
>
>   
> Walker::WalkerState::startWalk()
>
>   
>   244 
> } else {
>
> Instead of putting the majority of this function in an if/else, you could 
> split it into two different functions. Then you could make the functional 
> part take different parameters and not have to use the req or translation 
> object to ferry information back.
>
> src/arch/x86/pagetable_walker.cc
> 
> (Diff revision 3)
>
>   
> Walker::WalkerState::startWalk()
>
>   
>   246 
> walker->port.sendFunctional(read);
>
> Possible memory leak? Same rational as atomic mode.
>
> src/arch/x86/pagetable_walker.cc
> 
> (Diff revision 3)
>
>   
> Walker::WalkerState::startWalk()
>
>   
>   250 
> fault = stepWalk(write);
>
> Possible memory leak? If write is set to something we can't just lose it, we 
> need to clean it up. That doesn't apply if write is statically allocated, but 
> I don't see where that's happening.
>
> - Gabe
>
>
> On January 20th, 2011, 4:57 p.m., Brad Beckmann wrote:
>
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt,
> and Nathan Binkert.
> By Brad Beckmann.
>
> /Updated 2011-01-20 16:57:10/
>
>
>   Description
>
> x86: Timing support for pagetable walker
>
> Move page table walker state to its own object type, and make the
> walker instantiate state for each outstanding walk. By storing the
> states in a queue, the walker is able to handle multiple outstanding
> timing requests. Note that functional walks use separate state
> elements.
>
>
>   Diffs
>
> * src/arch/x86/tlb.hh (9f9e10967912)
> * src/arch/x86/tlb.cc (9f9e10967912)
> * src/arch/x86/pagetable_walker.cc (9f9e10967912)
> * src/arch/x86/pagetable_walker.hh (9f9e10967912)
>
> View Diff 
>
> 
>
> ___
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>   

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: x86: Timing support for pagetable walker

2011-01-22 Thread Gabe Black

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/396/#review796
---



src/arch/x86/pagetable_walker.cc


Possible memory leak? I think the sender is responsible for cleaning up the 
packet for atomic accesses, and I don't see where that's being done. I may be 
wrong about atomic mode, or have missed where this is cleaned up.



src/arch/x86/pagetable_walker.cc


Possible memory leak?



src/arch/x86/pagetable_walker.cc


Instead of putting the majority of this function in an if/else, you could 
split it into two different functions. Then you could make the functional part 
take different parameters and not have to use the req or translation object to 
ferry information back.



src/arch/x86/pagetable_walker.cc


Possible memory leak? Same rational as atomic mode.



src/arch/x86/pagetable_walker.cc


Possible memory leak? If write is set to something we can't just lose it, 
we need to clean it up. That doesn't apply if write is statically allocated, 
but I don't see where that's happening.


- Gabe


On 2011-01-20 16:57:10, Brad Beckmann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/396/
> ---
> 
> (Updated 2011-01-20 16:57:10)
> 
> 
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and 
> Nathan Binkert.
> 
> 
> Summary
> ---
> 
> x86: Timing support for pagetable walker
> 
> Move page table walker state to its own object type, and make the
> walker instantiate state for each outstanding walk. By storing the
> states in a queue, the walker is able to handle multiple outstanding
> timing requests. Note that functional walks use separate state
> elements.
> 
> 
> Diffs
> -
> 
>   src/arch/x86/tlb.hh 9f9e10967912 
>   src/arch/x86/tlb.cc 9f9e10967912 
>   src/arch/x86/pagetable_walker.cc 9f9e10967912 
>   src/arch/x86/pagetable_walker.hh 9f9e10967912 
> 
> Diff: http://reviews.m5sim.org/r/396/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Brad
> 
>

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: x86: Timing support for pagetable walker

2011-01-20 Thread Brad Beckmann

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/396/
---

(Updated 2011-01-20 16:57:10.835752)


Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan 
Binkert.


Summary
---

x86: Timing support for pagetable walker

Move page table walker state to its own object type, and make the
walker instantiate state for each outstanding walk. By storing the
states in a queue, the walker is able to handle multiple outstanding
timing requests. Note that functional walks use separate state
elements.


Diffs (updated)
-

  src/arch/x86/tlb.hh 9f9e10967912 
  src/arch/x86/tlb.cc 9f9e10967912 
  src/arch/x86/pagetable_walker.cc 9f9e10967912 
  src/arch/x86/pagetable_walker.hh 9f9e10967912 

Diff: http://reviews.m5sim.org/r/396/diff


Testing
---


Thanks,

Brad

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: x86: Timing support for pagetable walker

2011-01-11 Thread Brad Beckmann

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/396/
---

(Updated 2011-01-11 08:14:04.743657)


Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan 
Binkert.


Summary
---

x86: Timing support for pagetable walker

Move page table walker state to its own object type, and make the
walker instantiate state for each outstanding walk. By storing the
states in a queue, the walker is able to handle multiple outstanding
timing requests. Note that functional walks use separate state
elements.


Diffs (updated)
-

  src/arch/x86/pagetable_walker.hh 9f9e10967912 
  src/arch/x86/pagetable_walker.cc 9f9e10967912 
  src/arch/x86/tlb.hh 9f9e10967912 
  src/arch/x86/tlb.cc 9f9e10967912 

Diff: http://reviews.m5sim.org/r/396/diff


Testing
---


Thanks,

Brad

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: x86: Timing support for pagetable walker

2011-01-09 Thread Gabe Black


> On 2011-01-07 05:51:30, Gabe Black wrote:
> > The code seems ok, but why do we need to have multiple outstanding page 
> > walks in timing mode again?
> 
> Gabe Black wrote:
> Actually, I wrote the above before I'd read it carefully. My question 
> still stands, but there are some areas that need to be fixed up. Also, since 
> translation is very much on the critical path, make sure you measure how much 
> this change affects performance. I expect with the addition indirection at 
> least there will be some slow down, and we should know what that is before we 
> commit anything.
> 
> Joel Hestness wrote:
> In timing mode x86, if a memory address translation misses in the TLB AND 
> happens to be an unaligned access (one that straddles a page boundary), the 
> TLB promptly fires both of the requests to the page table walker.  The old 
> implementation of the walker doesn't support multiple outstanding requests, 
> so it immediately crashes simulation with a state assertion failure (I asked 
> a few questions about this in June and July, back when I made the changes to 
> the walker).  The implementation in this patch can queue the requests and 
> service them sequentially.  It should be a simple future extension to service 
> them concurrently.
> 
> I modeled this implementation after the ARM implementation in 
> arch/arm/table_walker.*.
> 
> Concerning the slowdown, the frequency of unaligned accesses that miss in 
> the TLB is extremely rare (<10 in seconds of simulated system time).  Since 
> timing mode doesn't work without this fix, there isn't a way to compare 
> performance against a baseline.

Ah, ok. I remembered there was a discussion about something along these lines 
before, but I'd forgotten all the details. I initially was thinking this would 
affect every translation, but that's not true since this only changes page 
table walks which will (hopefully) be a lot less frequent. While that means a 
slow down would be less of an issue, since there are some new for loops and 
non-trivial data structures it would still be nice to know for sure whether it 
makes a difference. You could run without the patch until the simulation 
crashes and then run again with the patch up to that tick. It's not earth 
shatteringly critical to measure it, but it would be good for peace of mind.


- Gabe


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/396/#review649
---


On 2011-01-06 16:12:34, Brad Beckmann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/396/
> ---
> 
> (Updated 2011-01-06 16:12:34)
> 
> 
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and 
> Nathan Binkert.
> 
> 
> Summary
> ---
> 
> x86: Timing support for pagetable walker
> 
> Move page table walker state to its own object type, and make the
> walker instantiate state for each outstanding walk. By storing the
> states in a queue, the walker is able to handle multiple outstanding
> timing requests. Note that functional walks use separate state
> elements.
> 
> 
> Diffs
> -
> 
>   src/arch/x86/pagetable_walker.hh 9f9e10967912 
>   src/arch/x86/pagetable_walker.cc 9f9e10967912 
>   src/arch/x86/tlb.hh 9f9e10967912 
>   src/arch/x86/tlb.cc 9f9e10967912 
> 
> Diff: http://reviews.m5sim.org/r/396/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Brad
> 
>

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: x86: Timing support for pagetable walker

2011-01-09 Thread Gabe Black


> On 2011-01-07 05:51:30, Gabe Black wrote:
> > src/arch/x86/pagetable_walker.cc, line 77
> > 
> >
> > These should use FastAlloc if at all possible since they're on a 
> > critical path and the heap is slow.
> 
> Joel Hestness wrote:
> Is this as simple as having WalkerState inherit from FastAlloc?

Yes I think so. I've never used FastAlloc myself, but it looks like it 
basically allocates a big pool of memory in certain sized chunks, and then 
rather than allocating them on the heap it just grabs a free one and fills it 
in. That implies your class has to be a fixed size or less (512 bytes according 
to the header). If it's too big then "new" falls back to the heap, so it'd 
actually be a little slower than going to the heap directly. Packets use 
FastAlloc if you want an example.


- Gabe


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/396/#review649
---


On 2011-01-06 16:12:34, Brad Beckmann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/396/
> ---
> 
> (Updated 2011-01-06 16:12:34)
> 
> 
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and 
> Nathan Binkert.
> 
> 
> Summary
> ---
> 
> x86: Timing support for pagetable walker
> 
> Move page table walker state to its own object type, and make the
> walker instantiate state for each outstanding walk. By storing the
> states in a queue, the walker is able to handle multiple outstanding
> timing requests. Note that functional walks use separate state
> elements.
> 
> 
> Diffs
> -
> 
>   src/arch/x86/pagetable_walker.hh 9f9e10967912 
>   src/arch/x86/pagetable_walker.cc 9f9e10967912 
>   src/arch/x86/tlb.hh 9f9e10967912 
>   src/arch/x86/tlb.cc 9f9e10967912 
> 
> Diff: http://reviews.m5sim.org/r/396/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Brad
> 
>

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: x86: Timing support for pagetable walker

2011-01-07 Thread Joel Hestness


> On 2011-01-07 05:51:30, Gabe Black wrote:
> > The code seems ok, but why do we need to have multiple outstanding page 
> > walks in timing mode again?
> 
> Gabe Black wrote:
> Actually, I wrote the above before I'd read it carefully. My question 
> still stands, but there are some areas that need to be fixed up. Also, since 
> translation is very much on the critical path, make sure you measure how much 
> this change affects performance. I expect with the addition indirection at 
> least there will be some slow down, and we should know what that is before we 
> commit anything.

In timing mode x86, if a memory address translation misses in the TLB AND 
happens to be an unaligned access (one that straddles a page boundary), the TLB 
promptly fires both of the requests to the page table walker.  The old 
implementation of the walker doesn't support multiple outstanding requests, so 
it immediately crashes simulation with a state assertion failure (I asked a few 
questions about this in June and July, back when I made the changes to the 
walker).  The implementation in this patch can queue the requests and service 
them sequentially.  It should be a simple future extension to service them 
concurrently.

I modeled this implementation after the ARM implementation in 
arch/arm/table_walker.*.

Concerning the slowdown, the frequency of unaligned accesses that miss in the 
TLB is extremely rare (<10 in seconds of simulated system time).  Since timing 
mode doesn't work without this fix, there isn't a way to compare performance 
against a baseline.


> On 2011-01-07 05:51:30, Gabe Black wrote:
> > src/arch/x86/pagetable_walker.hh, line 187
> > 
> >
> > Why call this reqType instead of leaving it as mode? requests have 
> > types which are orthogonal to this, and it's called mode everywhere else.

Good point.  I'm not sure why I named it that.  "mode" would be better.


> On 2011-01-07 05:51:30, Gabe Black wrote:
> > src/arch/x86/pagetable_walker.cc, line 77
> > 
> >
> > These should use FastAlloc if at all possible since they're on a 
> > critical path and the heap is slow.

Is this as simple as having WalkerState inherit from FastAlloc?


> On 2011-01-07 05:51:30, Gabe Black wrote:
> > src/arch/x86/pagetable_walker.cc, line 89
> > 
> >
> > Memory leak.

Well, that's embarrassing :P


> On 2011-01-07 05:51:30, Gabe Black wrote:
> > src/arch/x86/pagetable_walker.cc, line 179
> > 
> >
> > Is letting translations pass each other realistic? I worry we're making 
> > our walker artificially powerful. These loops will also slow things down 
> > potentially.

This is an abstract implementation just to get the walker to work.  It can be 
easily molded to order the requests appropriately.

On the topic of slowdown, having more than one request in the queue is 
extremely rare, so any slowdown should be trivial.


> On 2011-01-07 05:51:30, Gabe Black wrote:
> > src/arch/x86/pagetable_walker.cc, line 541
> > 
> >
> > Declare this where it's used.

I think I had other plans for this variable, but it doesn't look like I 
followed through.  Moving it to line 566 should be a simple fix.


> On 2011-01-07 05:51:30, Gabe Black wrote:
> > src/arch/x86/pagetable_walker.cc, line 574
> > 
> >
> > Why is this pulled out into its own switch statement? That will slow 
> > down the code and makes things more complicated.

As I recall, this was part of my intermediate solution to the unaligned access 
problem.  These lines can be moved back to the previous locations.


- Joel


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/396/#review649
---


On 2011-01-06 16:12:34, Brad Beckmann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/396/
> ---
> 
> (Updated 2011-01-06 16:12:34)
> 
> 
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and 
> Nathan Binkert.
> 
> 
> Summary
> ---
> 
> x86: Timing support for pagetable walker
> 
> Move page table walker state to its own object type, and make the
> walker instantiate state for each outstanding walk. By storing the
> states in a queue, the walker is able to handle multiple outstanding
> timing requests. Note that functional walks use separate state
> elements.
> 
> 
> Diffs
> -
> 
>   src/arch/x86/pagetable_walker.hh 9f9e10967912 
> 

Re: [m5-dev] Review Request: x86: Timing support for pagetable walker

2011-01-07 Thread Gabe Black


> On 2011-01-07 05:51:30, Gabe Black wrote:
> > The code seems ok, but why do we need to have multiple outstanding page 
> > walks in timing mode again?

Actually, I wrote the above before I'd read it carefully. My question still 
stands, but there are some areas that need to be fixed up. Also, since 
translation is very much on the critical path, make sure you measure how much 
this change affects performance. I expect with the addition indirection at 
least there will be some slow down, and we should know what that is before we 
commit anything.


- Gabe


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/396/#review649
---


On 2011-01-06 16:12:34, Brad Beckmann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/396/
> ---
> 
> (Updated 2011-01-06 16:12:34)
> 
> 
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and 
> Nathan Binkert.
> 
> 
> Summary
> ---
> 
> x86: Timing support for pagetable walker
> 
> Move page table walker state to its own object type, and make the
> walker instantiate state for each outstanding walk. By storing the
> states in a queue, the walker is able to handle multiple outstanding
> timing requests. Note that functional walks use separate state
> elements.
> 
> 
> Diffs
> -
> 
>   src/arch/x86/pagetable_walker.hh 9f9e10967912 
>   src/arch/x86/pagetable_walker.cc 9f9e10967912 
>   src/arch/x86/tlb.hh 9f9e10967912 
>   src/arch/x86/tlb.cc 9f9e10967912 
> 
> Diff: http://reviews.m5sim.org/r/396/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Brad
> 
>

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: x86: Timing support for pagetable walker

2011-01-07 Thread Gabe Black

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/396/#review649
---


The code seems ok, but why do we need to have multiple outstanding page walks 
in timing mode again?


src/arch/x86/pagetable_walker.hh


Why call this reqType instead of leaving it as mode? requests have types 
which are orthogonal to this, and it's called mode everywhere else.



src/arch/x86/pagetable_walker.cc


These should use FastAlloc if at all possible since they're on a critical 
path and the heap is slow.



src/arch/x86/pagetable_walker.cc


Memory leak.



src/arch/x86/pagetable_walker.cc


Is letting translations pass each other realistic? I worry we're making our 
walker artificially powerful. These loops will also slow things down 
potentially.



src/arch/x86/pagetable_walker.cc


Declare this where it's used.



src/arch/x86/pagetable_walker.cc


Why is this pulled out into its own switch statement? That will slow down 
the code and makes things more complicated.


- Gabe


On 2011-01-06 16:12:34, Brad Beckmann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/396/
> ---
> 
> (Updated 2011-01-06 16:12:34)
> 
> 
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and 
> Nathan Binkert.
> 
> 
> Summary
> ---
> 
> x86: Timing support for pagetable walker
> 
> Move page table walker state to its own object type, and make the
> walker instantiate state for each outstanding walk. By storing the
> states in a queue, the walker is able to handle multiple outstanding
> timing requests. Note that functional walks use separate state
> elements.
> 
> 
> Diffs
> -
> 
>   src/arch/x86/pagetable_walker.hh 9f9e10967912 
>   src/arch/x86/pagetable_walker.cc 9f9e10967912 
>   src/arch/x86/tlb.hh 9f9e10967912 
>   src/arch/x86/tlb.cc 9f9e10967912 
> 
> Diff: http://reviews.m5sim.org/r/396/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Brad
> 
>

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] Review Request: x86: Timing support for pagetable walker

2011-01-06 Thread Brad Beckmann

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/396/
---

Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan 
Binkert.


Summary
---

x86: Timing support for pagetable walker

Move page table walker state to its own object type, and make the
walker instantiate state for each outstanding walk. By storing the
states in a queue, the walker is able to handle multiple outstanding
timing requests. Note that functional walks use separate state
elements.


Diffs
-

  src/arch/x86/pagetable_walker.hh 9f9e10967912 
  src/arch/x86/pagetable_walker.cc 9f9e10967912 
  src/arch/x86/tlb.hh 9f9e10967912 
  src/arch/x86/tlb.cc 9f9e10967912 

Diff: http://reviews.m5sim.org/r/396/diff


Testing
---


Thanks,

Brad

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev