Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-22 Thread Beckmann, Brad
Korey, if your deadlock is with running the MOESI_CMP_directory protocol, I'm 
not surprised.  DMA support is pretty much broken in that protocol.  I have 
that fixed and I also fixed the underlying DMA problem.  I'll be pushing the 
fixes momentarily.

Korey and Malek, please pull these changes and confirm they fix your problem.

Brad


 -Original Message-
 From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
 On Behalf Of Korey Sewell
 Sent: Friday, March 18, 2011 9:12 AM
 To: M5 Developer List
 Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
 
 message below
 
 Why did it work before the block size patch?
  - When the ChunkGenerator sees the block size is 0, it doesn't split
  up the request into multiple packets and sends the whole dma request
  at once.  That is fine because the DMASequencer splits the request
  into multiple requests and only responds to the dma port when the entire
  request is complete.
 
 With regards to the old changeset that boots with the block size = 0, I was
 not able to boot a large scale CMP system (more than 16 cores) due to the
 deadlock threshold being triggered.
 
 I'm assuming that Brad has a read on how to fix that problem so I'll probably
 start working on what is causing that deadlock so hopefully we can kind of
 pipeline the bug fixes.
 
 --
 - Korey
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-22 Thread Nilay
On Sat, March 19, 2011 6:01 pm, Beckmann, Brad wrote:
 Korey, if your deadlock is with running the MOESI_CMP_directory
 protocol, I'm not surprised.  DMA support is pretty much broken in that
 protocol.  I have that fixed and I also fixed the underlying DMA problem.
 I'll be pushing the fixes momentarily.

 Korey and Malek, please pull these changes and confirm they fix your
 problem.

 Brad



Brad, how come the mails you sent on Saturday are being received now?


--
Nilay



Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-22 Thread Beckmann, Brad
Never mind those.  I had several incoming and outgoing emails from the weekend 
that finally got through our system.

Brad

  

 -Original Message-
 From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
 Behalf Of Nilay
 Sent: Tuesday, March 22, 2011 8:07 PM
 To: M5 Developer List
 Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
 
 On Sat, March 19, 2011 6:01 pm, Beckmann, Brad wrote:
  Korey, if your deadlock is with running the MOESI_CMP_directory
  protocol, I'm not surprised.  DMA support is pretty much broken in that
  protocol.  I have that fixed and I also fixed the underlying DMA problem.
  I'll be pushing the fixes momentarily.
 
  Korey and Malek, please pull these changes and confirm they fix your
  problem.
 
  Brad
 
 
 
 Brad, how come the mails you sent on Saturday are being received now?
 
 
 --
 Nilay
 




Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-18 Thread Korey Sewell
message below

Why did it work before the block size patch?
 - When the ChunkGenerator sees the block size is 0, it doesn't split up the
 request into multiple packets and sends the whole dma request at once.  That
 is fine because the DMASequencer splits the request into multiple requests
 and only responds to the dma port when the entire request is complete.

With regards to the old changeset that boots with the block size = 0, I was
not able to boot a large scale CMP system (more than 16 cores) due to the
deadlock threshold being triggered.

I'm assuming that Brad has a read on how to fix that problem so I'll
probably start working on what is causing that deadlock so hopefully we can
kind of pipeline the bug fixes.

-- 
- Korey


Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-18 Thread Malek Musleh
Hi Korey,

I don't seem to have encountered that deadlock threshold when booting
the old changeset. I tried both 16- and 20-core configurations just now
and they seem to work. Although, they do take a really really long
time compared to ~1-4 cores.

I have also tried previously booting 64 cores, some time ago, and that
also worked, but that took several hours.

In general though, that threshold is just a fixed number, and as the
CMP machine gets bigger, the 5 seems to be way too low, and would
have to be multiplied by a factor of 2-3?

I tried using the default crossbar topology, maybe you encounter the
deadlock threshold using Mesh?

Malek

On Fri, Mar 18, 2011 at 12:12 PM, Korey Sewell ksew...@umich.edu wrote:
 message below

 Why did it work before the block size patch?
 - When the ChunkGenerator sees the block size is 0, it doesn't split up the
 request into multiple packets and sends the whole dma request at once.  That
 is fine because the DMASequencer splits the request into multiple requests
 and only responds to the dma port when the entire request is complete.

 With regards to the old changeset that boots with the block size = 0, I was
 not able to boot a large scale CMP system (more than 16 cores) due to the
 deadlock threshold being triggered.

 I'm assuming that Brad has a read on how to fix that problem so I'll
 probably start working on what is causing that deadlock so hopefully we can
 kind of pipeline the bug fixes.

 --
 - Korey



Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-17 Thread Malek Musleh
Brad/Korey,

An update on what I have seen.

I did notice that in the failing case, the DMASequencer thinks that the
request is complete (length of request == 64) when in fact the length
should be 8192. The 8192 reflects the byte sector size, but what is
interesting is that a DPRINTF(IdeDisk) in ide_disk.cc right before it
fails indicates that the request length is 8192. So there is something
wrong with the transfer in the RubyPorts.
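
To make the symptom concrete, here is a minimal C++ sketch of the kind of completion bookkeeping involved. DmaTransfer, ackBytes and acksNeeded are hypothetical names made up for illustration; this is not the actual DMASequencer code. It only shows that if the recorded length is 64 instead of 8192, the very first 64-byte acknowledgement already looks like the end of the transfer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for one in-flight DMA transfer.
// Illustrative only; not the real DMASequencer data structure.
struct DmaTransfer {
    std::uint64_t totalLen;   // bytes requested by the device, e.g. 8192
    std::uint64_t bytesDone;  // bytes acknowledged by the memory system so far

    explicit DmaTransfer(std::uint64_t len) : totalLen(len), bytesDone(0) {}

    // Record one acknowledged piece of n bytes.
    // Returns true once the whole transfer has been acknowledged.
    bool ackBytes(std::uint64_t n) {
        assert(bytesDone + n <= totalLen);
        bytesDone += n;
        return bytesDone == totalLen;
    }
};

// Number of block-sized acknowledgements needed before completion
// (rounds up when len is not a multiple of blockSize).
inline std::uint64_t acksNeeded(std::uint64_t len, std::uint64_t blockSize) {
    assert(blockSize > 0);
    return (len + blockSize - 1) / blockSize;
}
```

With a length of 8192 and a 64-byte block, 128 acknowledgements are needed; a transfer recorded with length 64 reports completion after one.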

I have a feeling it might be also linked with the timing simpleCpu
changes about handling split requests, although Alpha does not support
split requests, that is independent of the DMA transfers.

Also, comparing Ruby Traces (with and without failing changeset) the
first PRD BaseAddr is consistent between them, but not consistent
between Ruby/M5. So the fact that the PRD BaseAddr is 'wrong' in the
one case does not prevent it from booting the Kernel.

Not really sure if that helps anymore.

Malek

On Tue, Mar 15, 2011 at 6:50 PM, Korey Sewell ksew...@umich.edu wrote:
 Sorry for the confusion, I definitely garbled up some terminology.

 I meant that the M5 ran with the atomic model to compare with the timing
 Ruby model.

 M5-atomic maybe runs in 10-15 mins and then Ruby 20-30 mins.

 I am able to get the problem point in the Ruby simulation (bad DMA access)
 in about 20 mins.

 I am able to get to that same problem point in the M5-atomic mode in about 10
 mins so as to see what to compare against and what values are being
 set/unset incorrectly.





Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-17 Thread Korey Sewell
Hi Malek,
Can you send your most recent trace showing what you described (if it isn't
too big)? I haven't observed the different request size errors, but I think I
have observed the different PRD addresses on the first access (in the most
recent changeset). I'll double check.

I was planning to post sometime soon what was the latest on my debugging
efforts but a quick summary is that the PRD address gets set from a
BMI.DTP register that eventually gets propagated through. I haven't been
able to verify if that is loaded from the kernel or some configuration
parameter quite yet.


 I have a feeling it might also be linked with the timing simpleCpu
 changes about handling split requests, although Alpha does not support
 split requests, that is independent of the DMA transfers.

Are you sure it's a split request problem and not an uncacheable address
thing? Or maybe it's some combo of both?



 Also, comparing Ruby Traces (with and without failing changeset) the
 first PRD BaseAddr is consistent between them, but not consistent
 between Ruby/M5. So the fact that the PRD BaseAddr is 'wrong' in the
 one case does not prevent it from booting the Kernel.

That's an interesting observation. It would be nice to figure out why that
address may or may not matter though.








-- 
- Korey


Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-17 Thread Beckmann, Brad
Hi Malek/Korey,

The good news is that I've been able to dedicate a significant amount of time 
to this over the past day or so and I've got a good handle on what is going on 
here.  

Why did it work before the block size patch?
- When the ChunkGenerator sees the block size is 0, it doesn't split up the 
request into multiple packets and sends the whole dma request at once.  That is 
fine because the DMASequencer splits the request into multiple requests and 
only responds to the dma port when the entire request is complete.
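
As a rough illustration of the splitting behavior described above, here is a small C++ sketch. Chunk and splitIntoChunks are made-up names, not the M5 ChunkGenerator interface, and the treatment of a chunk size of 0 as "no chunking" is taken only from the description in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One piece of a larger transfer. Hypothetical type for illustration.
struct Chunk {
    std::uint64_t addr;
    std::uint64_t size;
};

// Split [start, start + total) at chunkSize-aligned boundaries.
// A chunkSize of 0 is treated as "do not split": the whole request
// goes out as a single piece.
std::vector<Chunk> splitIntoChunks(std::uint64_t start, std::uint64_t total,
                                   std::uint64_t chunkSize) {
    assert(start + total >= start);  // guard against address wrap-around
    std::vector<Chunk> out;
    if (total == 0)
        return out;
    if (chunkSize == 0) {
        out.push_back({start, total});
        return out;
    }
    const std::uint64_t end = start + total;
    std::uint64_t addr = start;
    while (addr < end) {
        // Next aligned boundary strictly above addr.
        const std::uint64_t boundary = (addr / chunkSize + 1) * chunkSize;
        const std::uint64_t next = boundary < end ? boundary : end;
        out.push_back({addr, next - addr});
        addr = next;
    }
    return out;
}
```

For an 8192-byte request at an aligned address, a chunk size of 0 yields one piece, while a chunk size of 64 yields 128 pieces of 64 bytes each.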

What is the current problem?
- When the ChunkGenerator sees the block size of 64, the dma port splits the 
request into 64-byte packets, effectively doing the same thing the dma 
sequencer does.  That in itself shouldn't break things.  The DMA sequencer 
nacks all but the first 64-byte request of the dma transfer because it is 
designed to only handle one M5 packet at a time.  Eventually the first 64-byte 
packet completes and the RubyPort tells the dma port to retry the second 
packet.  The dma port does, but for some reason DMASequencer still nacks that 
second request.  I'm not quite sure why that is, but I'm sure I'll figure it 
out soon.  Once I do, I'll push a fix along with all the other fixes I've come 
across along this multi-day adventure.
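
For reference, the intended nack/retry handshake can be sketched as follows. This is a hypothetical model only: OnePacketSequencer and its members are invented for illustration and do not claim to match the real DMASequencer or RubyPort code, nor to identify the actual bug. It shows the expected ordering: the busy state is cleared when a packet completes, so the retried packet is accepted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sequencer that handles exactly one packet at a time.
class OnePacketSequencer {
  public:
    // Returns true if the packet is accepted, false if it is nacked.
    // A nacked sender is expected to wait for a retry signal.
    bool sendTiming(std::uint64_t addr) {
        if (busy_) {
            needRetry_ = true;
            return false;
        }
        busy_ = true;
        current_ = addr;
        return true;
    }

    // Called when the in-flight packet finishes.
    // Returns true if a previously nacked sender should now retry.
    bool complete() {
        assert(busy_);
        busy_ = false;  // must be cleared before the retry is signalled
        const bool retry = needRetry_;
        needRetry_ = false;
        return retry;
    }

    bool busy() const { return busy_; }
    std::uint64_t current() const { return current_; }

  private:
    bool busy_ = false;
    bool needRetry_ = false;
    std::uint64_t current_ = 0;
};
```

In this model the second 64-byte packet is nacked while the first is in flight, and is accepted once complete() has run. A retry that is still nacked would suggest the busy state had not been cleared by the time the retry arrived, which is one possibility worth checking rather than a diagnosis.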

Brad

 -Original Message-
 From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
 On Behalf Of Korey Sewell
 Sent: Thursday, March 17, 2011 3:10 PM
 To: Malek Musleh
 Cc: M5 Developer List
 Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
 
 Hi Malek,
 Can you send your most recent trace showing what you described (if it isn't
 too big)? I haven't observed the different request size errors, but I think I
 have observed the different PRD addresses on the first access (in the most
 recent changeset). I'll double check.
 
 I was planning to post sometime soon what was the latest on my debugging
 efforts but a quick summary is that the PRD address gets set from a
 BMI.DTP register that eventually gets propagated through. I haven't been
 able to verify if that is loaded from the kernel or some configuration
 parameter quite yet.
 
 
 I have a feeling it might be also linked with the timing simpleCpu
  changes about handling split requests, although Alpha does not support
  split requests, that is independent of the DMA transfers.
 
 Are you sure it's a split request problem and not an uncacheable address
 thing? Or maybe it's some combo of both?
 
 
 
  Also, comparing Ruby Traces (with and without failing changeset) the
  first PRD BaseAddr is consistent between them, but not consistent
  between Ruby/M5. So the fact that the PRD BaseAddr is 'wrong' in the
  one case does not prevent it from booting the Kernel.
 
 That's an interesting observation. It would be nice to figure out why that
 address may or may not matter though.
 
 
 
 

Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-15 Thread Malek Musleh
Hi Brad,

How is it that you are able to run the memtester in FS Mode?
I see the ruby_mem_tester.py in /configs/example/ but it seems that it
is only configured for SE Mode as far as Ruby is concerned?

Also, how would the default block size be '0' without that problem
changeset? If it was 0, doesn't that mean it's not passing the data
from the DMA transfer? It would have to be at least 1?

Malek

On Mon, Mar 14, 2011 at 5:32 PM, Beckmann, Brad brad.beckm...@amd.com wrote:
 Hi Malek,

 Just to reiterate, I don't think my patches will fix the underlying problem.
 Instead, my patches just fix various corner cases in the protocols.  I
 suspect these corner cases are never actually reached in real execution.

 The fact that your dma traces point out that the Ruby and Classic 
 configurations use different base addresses makes me think this might be a 
 problem with configuration and device registration.  We should investigate 
 further.

 Brad


 -Original Message-
 From: Malek Musleh [mailto:malek.mus...@gmail.com]
 Sent: Monday, March 14, 2011 9:11 AM
 To: M5 Developer List
 Cc: Beckmann, Brad
 Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?

 Hi Korey/Brad,

 I commented out the following lines:

 In RubyPort.hh

  unsigned deviceBlockSize() const;

 In RubyPort.cc

 unsigned
 RubyPort::M5Port::deviceBlockSize() const
 {
     return (unsigned) RubySystem::getBlockSizeBytes();
 }

 I also did a diff trace between M5 and Ruby using the IdeDisk traceflag as
 indicated earlier on.

 In the Ruby Trace, it stalls at this

 2398589225000: system.disk0: Write to disk at offset: 0x1 data 0
 239858940: system.disk0: Write to disk at offset: 0x2 data 0x10
 2398589575000: system.disk0: Write to disk at offset: 0x3 data 0
 2398589742000: system.disk0: Write to disk at offset: 0x4 data 0
 2398589909000: system.disk0: Write to disk at offset: 0x5 data 0
 2398590088000: system.disk0: Write to disk at offset: 0x6 data 0xe0
 2398596763500: system.disk0: Write to disk at offset: 0x7 data 0xc8
 2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000) byteCount:8192 (16) eot:0x8000 sector:0
 2398597916500: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116

 Waiting for the Interrupt to be Posted.

 However, a comparison between the M5 and Ruby traces suggest that they
 differ on the following line:

 RubyTrace:

 239858940: system.disk0: Write to disk at offset: 0x2 data 0x10
 2398589575000: system.disk0: Write to disk at offset: 0x3 data 0
 2398589742000: system.disk0: Write to disk at offset: 0x4 data 0
 2398589909000: system.disk0: Write to disk at offset: 0x5 data 0
 2398590088000: system.disk0: Write to disk at offset: 0x6 data 0xe0
 2398596763500: system.disk0: Write to disk at offset: 0x7 data 0xc8
 2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000) byteCount:8192 (16) eot:0x8000 sector:0
 2398597916500: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116


 M5 Trace:

 2237623634000: system.disk0: Write to disk at offset: 0x7 data 0xc8
 2237624206501: system.disk0: PRD: baseAddr:0x87392000 (0x7392000) byteCount:8192 (16) eot:0x8000 sector:0
 2237624206501: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116

 Note that the PRD baseAddr it tries to access is different, which I would
 think should be the same, right? There is no reason why it should be
 different? The 0 or 1 block size and the sequential retries are forcing the
 DMA timer to time out the request, and it thus fails in the inconsistent
 DMA state.

 I have attached both sets of traces in case they shed any more light on the
 cause of the problem.

 In any case, it might not matter too much now since Brad was able to
 reproduce the problem and has a patch for it, but may be of use for future
 M5 changes.

 Malek

 On Mon, Mar 14, 2011 at 11:54 AM, Beckmann, Brad
 brad.beckm...@amd.com wrote:
  Thanks Malek.  Very interesting.
 
  Yes, this 5 line changeset seems rather benign, but actually has huge
  ramifications.  With this change, the RubyPort passes the correct block
  size to the cpu/device models.  Without it, I believe the block size
  defaults to 0 or 1...I can't remember which.  While that seems rather
  inconsequential, I noticed when I made this change that the memtester
  behaved quite differently.  In particular, it keeps issuing requests
  until sendTiming returns false, instead of just one request/cpu at a
  time.  Therefore another patch in this series added the retry mechanism
  to the RubyPort.  I'm still not sure exactly what the problem is with
  ruby+dma, but I suspect that the dma devices are behaving differently
  now that the RubyPort passes the correct block size.
 
  I was able to spend a few hours on this over the weekend.  I am now able
  to reproduce the error and I have a few protocol bug fixes queued
  up.  However, I don't think those fixes actually solved the main issue.
  I don't think I'll be able to get to it today

Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-15 Thread Korey Sewell

 Also, how would the default block size be '0' without that problem
 changeset?

The M5Port is derived from SimpleTimingPort (mem/tport.hh), which is derived
from Port (mem/port.hh), which has a virtual deviceBlockSize function that
always returns 0.
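
A minimal C++ sketch of that inheritance pattern, for anyone following along. BasePort, RubyLikePort and peerBlockSize are hypothetical stand-ins, not the real Port, SimpleTimingPort or RubyPort::M5Port classes; only the idea (a virtual default of 0 that a derived port may override) comes from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for the base port: the default block size is 0
// unless a derived class overrides the virtual function.
struct BasePort {
    virtual ~BasePort() = default;
    virtual unsigned deviceBlockSize() const { return 0; }
};

// Hypothetical stand-in for a port that reports the memory system's
// block size, as the changeset under discussion makes RubyPort do.
struct RubyLikePort : BasePort {
    explicit RubyLikePort(unsigned blockBytes) : blockBytes_(blockBytes) {
        assert(blockBytes > 0);
    }
    unsigned deviceBlockSize() const override { return blockBytes_; }

  private:
    unsigned blockBytes_;
};

// What a device would see when it asks its peer for the block size.
inline unsigned peerBlockSize(const BasePort &peer) {
    return peer.deviceBlockSize();
}
```

Without the override a device sees 0; with it, the device sees whatever the derived port reports (64 in the configuration discussed here).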



 If it was 0, doesn't that mean it's not passing the data
 from the DMA transfer? It would have to be at least 1?

I'm not sure about this, but I hope to find out soon. More than likely some
default value is getting set if it sees '0' or something invalid (but that's
just a guess).


-- 
- Korey


Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-15 Thread Beckmann, Brad
 How is it that you are able to run the memtester in FS Mode?
 I see the ruby_mem_tester.py in /configs/example/ but it seems that it is
 only configured for SE Mode as far as Ruby is concerned?

I don't run it in FS mode.  Since the DMA bug manifests only after hours of 
execution, I wanted to first verify that the DMA protocol support was solid 
using the mem tester.  Somewhat surprisingly, I found several bugs in 
MOESI_CMP_directory's support of DMA.  It turns out that the initial DMA 
support in that protocol wasn't very well thought out.  Now I fixed those bugs, 
but since the DMA problem also arises with the MOESI_hammer protocol, I'm 
confident that my patches don't fix the real problem.

Brad



Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-15 Thread Korey Sewell
Hi Brad/Malek,
I've been able to regenerate this error in about 20 mins now (instead of
hours) by running things in atomic mode. Not sure if that helps or not...

On Tue, Mar 15, 2011 at 3:03 PM, Beckmann, Brad brad.beckm...@amd.com wrote:

  How is that you are able to run the memtester in FS Mode?
  I see the ruby_mem_tester.py in /configs/example/ but it seems that it is
  only configured for SE Mode as far as Ruby is concerned?

 I don't run it in FS mode.  Since the DMA bug manifests only after hours of
 execution, I wanted to first verify that the DMA protocol support was solid
 using the mem tester.  Somewhat surprisingly, I found several bugs in
 MOESI_CMP_directory's support of DMA.  It turns out that the initial DMA
 support in that protocol wasn't very well thought out.  Now I fixed those
 bugs, but since the DMA problem also arises with the MOESI_hammer protocol,
 I'm confident that my patches don't fix the real problem.

 Brad





-- 
- Korey


Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-15 Thread Beckmann, Brad
I'm confused.

Korey, I thought this DMA problem only existed with Ruby?  If so, how were you 
able to reproduce it using atomic mode?  Ruby does not work with the atomic cpu 
model.

Please clarify, thanks!

Brad

 -Original Message-
 From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
 On Behalf Of Korey Sewell
 Sent: Tuesday, March 15, 2011 12:09 PM
 To: M5 Developer List
 Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
 
 Hi Brad/Malek,
 I've been able to regenerate this error  in about 20mins now (instead of
 hours) by running things in atomic mode. Not sure if that helps or not...
 


Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-15 Thread Korey Sewell
Sorry for the confusion, I definitely garbled up some terminology.

I meant that I ran M5 with the atomic model to compare against the timing
Ruby model.

M5-atomic maybe runs in 10-15 mins and then Ruby 20-30 mins.

I am able to get to the problem point in the Ruby simulation (bad DMA access)
in about 20 mins.

I am able to get to that same problem point in the M5-atomic mode in about 10
mins so as to see what to compare against and what values are being
set/unset incorrectly.



On Tue, Mar 15, 2011 at 6:22 PM, Beckmann, Brad brad.beckm...@amd.com wrote:

 I'm confused.

 Korey, I thought this DMA problem only existed with Ruby?  If so, how were
 you able to reproduce it using atomic mode?  Ruby does not work with the
 atomic cpu model.

 Please clarify, thanks!

 Brad





-- 
- Korey


Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-14 Thread Malek Musleh
Hi Brad,

I found the problem that was causing this error. Specifically, it is
this changeset:

changeset:   7909:eee578ed2130
user:        Joel Hestness hestn...@cs.utexas.edu
date:        Sun Feb 06 22:14:18 2011 -0800
summary:     Ruby: Fix to return cache block size to CPU for split data transfers

Link: http://reviews.m5sim.org/r/393/diff/#index_header

Previously, I mentioned it was a couple of changesets prior to this
one, but the changes between them are related, so it wasn't as obvious
what was happening.

In fact, this corresponds to the assert() for the block size you had
put in to deal with x86 unaligned accesses, but then later removed
because of LL/SC in Alpha.

It's not clear to me why this is causing a problem, or rather why this
doesn't return the default 64 byte block size from the ruby system,
but commenting out those lines of code allowed it to work.

Maybe Korey could confirm?

Malek



Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-14 Thread Korey Sewell
Which lines are you commenting out to get it to work? It's a bit
unclear in the diff you point to (maybe because you said it's a full
set of changes, not just one).

(btw: The work I've been doing is comparing the old m5 memory trace
to the gem5 memory trace to try to chase down the bug. I wouldn't be
surprised if we are converging to the same bug though.)


Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-14 Thread Beckmann, Brad
Thanks Malek.  Very interesting.

Yes, this 5-line changeset seems rather benign, but it actually has huge
ramifications.  With this change, the RubyPort passes the correct block size to
the cpu/device models.  Without it, I believe the block size defaults to 0 or
1; I can't remember which.  While that seems rather inconsequential, I noticed
when I made this change that the memtester behaved quite differently.  In
particular, it keeps issuing requests until sendTiming returns false, instead
of just one request per CPU at a time.  Therefore another patch in this series
added the retry mechanism to the RubyPort.  I'm still not sure exactly what the
problem is with Ruby+DMA, but I suspect that the DMA devices are behaving
differently now that the RubyPort passes the correct block size.
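
To make the block-size point concrete, here is a rough sketch of how a
device-side splitter might behave. This is not the actual M5/Ruby code: the
names `Chunk` and `splitRequest` are made up for illustration, and it simply
assumes that a transfer is cut at block boundaries and that a reported block
size of 0 means "leave the request whole".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not the simulator's real chunking code.
struct Chunk {
    std::uint64_t addr;  // start address of this piece
    std::uint64_t size;  // length of this piece in bytes
};

// Split [addr, addr + size) into pieces that never cross a block boundary.
// Assumption: blockSize == 0 means "unknown", so the request is not split.
std::vector<Chunk> splitRequest(std::uint64_t addr, std::uint64_t size,
                                std::uint64_t blockSize)
{
    std::vector<Chunk> chunks;
    if (size == 0)
        return chunks;
    if (blockSize == 0) {
        chunks.push_back({addr, size});
        return chunks;
    }
    // The masking below assumes a power-of-two block size.
    assert((blockSize & (blockSize - 1)) == 0);

    std::uint64_t cur = addr;
    std::uint64_t remaining = size;
    while (remaining > 0) {
        // Bytes left before the next block boundary.
        const std::uint64_t toBoundary = blockSize - (cur & (blockSize - 1));
        const std::uint64_t len = remaining < toBoundary ? remaining : toBoundary;
        chunks.push_back({cur, len});
        cur += len;
        remaining -= len;
    }
    return chunks;
}
```

Under these assumptions, an aligned 8192-byte transfer becomes 128 requests
with a 64-byte block size but stays a single request with a block size of 0,
which is the kind of behavioral difference a device model could be sensitive
to.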

I was able to spend a few hours on this over the weekend.  I am now able to 
reproduce the error and I have a few protocol bug fixes queued up.  However, I 
don't think those fixes actually solved the main issue.  I don't think I'll be 
able to get to it today, but I'll try to find some time tomorrow to investigate 
further.  

Brad



Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-14 Thread Beckmann, Brad
Hi Malek,

Just to reiterate, I don't think my patches will fix the underlying problem.
Instead, my patches just fix various corner cases in the protocols.  I suspect
these corner cases are never actually reached in real execution.

The fact that your DMA traces show the Ruby and Classic configurations using
different base addresses makes me think this might be a problem with
configuration and device registration.  We should investigate further.
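
As a starting point for that investigation, the PRD base addresses can be
pulled out of the IdeDisk trace lines mechanically. The helper below is only a
sketch: `prdBaseAddr` is a made-up name, and it assumes the trace lines carry a
`baseAddr:0x...` field in the format seen in this thread.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: extract the hex value following "baseAddr:" from one
// trace line. Returns false if the field is absent or cannot be parsed.
bool prdBaseAddr(const std::string &line, std::uint64_t &out)
{
    const std::string key = "baseAddr:";
    const std::string::size_type pos = line.find(key);
    if (pos == std::string::npos)
        return false;
    try {
        // Base 16 accepts an optional "0x" prefix and stops at the first
        // character that is not a hex digit (here, the following space).
        out = std::stoull(line.substr(pos + key.size()), nullptr, 16);
    } catch (const std::exception &) {
        return false;
    }
    return true;
}
```

Running this over the PRD lines of both traces and comparing the values would
show quickly whether the base addresses differ on every transfer or only on
the one that stalls.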

Brad


 -----Original Message-----
 From: Malek Musleh [mailto:malek.mus...@gmail.com]
 Sent: Monday, March 14, 2011 9:11 AM
 To: M5 Developer List
 Cc: Beckmann, Brad
 Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
 
 Hi Korey/Brad,
 
 I commented out the following lines:
 
 In RubyPort.hh:
 
     unsigned deviceBlockSize() const;
 
 In RubyPort.cc:
 
     unsigned
     RubyPort::M5Port::deviceBlockSize() const
     {
         return (unsigned) RubySystem::getBlockSizeBytes();
     }
 
 I also did a diff between the M5 and Ruby traces using the IdeDisk trace flag,
 as indicated earlier.
 
 In the Ruby trace, it stalls here:
 
 2398589225000: system.disk0: Write to disk at offset: 0x1 data 0
 239858940: system.disk0: Write to disk at offset: 0x2 data 0x10
 2398589575000: system.disk0: Write to disk at offset: 0x3 data 0
 2398589742000: system.disk0: Write to disk at offset: 0x4 data 0
 2398589909000: system.disk0: Write to disk at offset: 0x5 data 0
 2398590088000: system.disk0: Write to disk at offset: 0x6 data 0xe0
 2398596763500: system.disk0: Write to disk at offset: 0x7 data 0xc8
 2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000)
 byteCount:8192 (16) eot:0x8000 sector:0
 2398597916500: system.disk0: doDmaWrite, diskDelay: 100
 totalDiskDelay: 116
 
 At this point it is waiting for the interrupt to be posted.
 
 However, a comparison between the M5 and Ruby traces suggests that they
 differ on the following line:
 
 RubyTrace:
 
 239858940: system.disk0: Write to disk at offset: 0x2 data 0x10
 2398589575000: system.disk0: Write to disk at offset: 0x3 data 0
 2398589742000: system.disk0: Write to disk at offset: 0x4 data 0
 2398589909000: system.disk0: Write to disk at offset: 0x5 data 0
 2398590088000: system.disk0: Write to disk at offset: 0x6 data 0xe0
 2398596763500: system.disk0: Write to disk at offset: 0x7 data 0xc8
 2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000)
 byteCount:8192 (16) eot:0x8000 sector:0
 2398597916500: system.disk0: doDmaWrite, diskDelay: 100
 totalDiskDelay: 116
 
 
 M5 Trace:
 
 2237623634000: system.disk0: Write to disk at offset: 0x7 data 0xc8
 2237624206501: system.disk0: PRD: baseAddr:0x87392000 (0x7392000)
 byteCount:8192 (16) eot:0x8000 sector:0
 2237624206501: system.disk0: doDmaWrite, diskDelay: 100
 totalDiskDelay: 116
 
 Note that the PRD baseAddr it tries to access is different, which I would
 think should be the same, right? Is there any reason why it should be
 different? The 0 or 1 block size and the sequential retries are forcing the
 DMA timer to time out the request, which then fails in the inconsistent DMA
 state.
 
 I have attached both sets of traces in case they shed any more light on the
 cause of the problem.
 
 In any case, it might not matter too much now since Brad was able to
 reproduce the problem and has a patch for it, but may be of use for future
 M5 changes.
 
 Malek
 
[m5-dev] Ruby FS - DMA Controller problem?

2011-03-09 Thread Korey Sewell
Hi all,
I'm trying to run Ruby in FS mode for the FFT benchmark.

However, I've been unable to fully boot the kernel; the run fails with a panic
in the IDE disk controller:
panic: Inconsistent DMA transfer state: dmaState = 2 devState = 1
@ cycle 62640732569001
[doDmaTransfer:build/ALPHA_FS_MOESI_CMP_directory/dev/ide_disk.cc, line 323]

Has anybody run into a similar error or does anyone have any suggestions for
debugging the problem? I can run the same code using the M5 memory system
and FFT finishes properly, so it's definitely a Ruby-specific thing. To track
this down, I could diff instruction traces (M5 vs. Ruby) or maybe even diff
trace output from the IdeDisk trace flags, but those routes seem a bit
heavy-handed considering the amount of trace output generated.
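
One way to make the trace diff less heavy-handed is to ignore the tick prefix
and report only the first line where the two traces disagree. The following is
a minimal sketch of that idea; `stripTick` and `firstDivergence` are made-up
names, and it assumes each trace line starts with a numeric tick followed by
": ". Event ordering can legitimately differ between memory systems, so the
first mismatch is a hint, not proof of a bug.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Drop a leading "<digits>: " tick prefix; other lines are returned unchanged.
std::string stripTick(const std::string &line)
{
    const std::string::size_type pos = line.find(": ");
    if (pos == std::string::npos || pos == 0)
        return line;
    for (std::string::size_type i = 0; i < pos; ++i) {
        if (line[i] < '0' || line[i] > '9')
            return line;
    }
    return line.substr(pos + 2);
}

// Index of the first line that differs once ticks are ignored.
// Returns -1 when both traces match line for line; if one trace is a strict
// prefix of the other, returns the length of the shorter trace.
long firstDivergence(const std::vector<std::string> &a,
                     const std::vector<std::string> &b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (stripTick(a[i]) != stripTick(b[i]))
            return static_cast<long>(i);
    }
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

Loading each trace file into a vector of lines and calling `firstDivergence`
narrows the comparison to a single line to inspect by hand.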

The command line this was run with is:
build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b
fft_64t_base -n 1

The output in system.terminal is:
hda: M5 IDE Disk, ATA DISK drive
hdb: M5 IDE Disk, ATA DISK drive
hda: UDMA/33 mode selected
hdb: UDMA/33 mode selected
hdc: M5 IDE Disk, ATA DISK drive
hdc: UDMA/33 mode selected
ide0 at 0x8410-0x8417,0x8422 on irq 31
ide1 at 0x8418-0x841f,0x8426 on irq 31
ide_generic: please use probe_mask=0x3f module parameter for probing all
legacy ISA IDE ports
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 2866752 sectors (1467 MB), CHS=2844/16/63
 hda:4hda: dma_timer_expiry: dma status == 0x65
hda: DMA interrupt recovery
hda: lost interrupt
 unknown partition table
hdb: max request size: 128KiB
hdb: 1008000 sectors (516 MB), CHS=1000/16/63
 hdb:4hdb: dma_timer_expiry: dma status == 0x65
hdb: DMA interrupt recovery
hdb: lost interrupt

Thanks again; any help or thoughts would be much appreciated.

-- 
- Korey


Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-09 Thread Malek Musleh
Hi Korey,

I ran into a similar problem with a different benchmark/boot up
attempt. There is another thread on m5-dev with 'Ruby FS failing with
recent changesets' as the subject. I was able to track down the
changeset which it was coming from, but I did not look further into
the changeset as to why it was causing it.

Brad said he would take a look at it, but I am not sure if he was able
to reproduce the problem.

Malek



Re: [m5-dev] Ruby FS - DMA Controller problem?

2011-03-09 Thread Beckmann, Brad
I still have not been able to reproduce the problem, but I haven't tried in a 
few weeks.  So does this happen when booting up the system, independent of what 
benchmark you are running?  If so, could you send me your command line?  I'm 
sure the disk image and kernel binaries between us are different, so I don't 
necessarily think I'll be able to reproduce your problem, but at least I'll be 
able to isolate it.

Brad


