Re: [m5-dev] Ruby FS - DMA Controller problem?
Korey, if your deadlock occurs while running the MOESI_CMP_directory protocol, I'm not surprised. DMA support is pretty much broken in that protocol. I have fixed that, and I have also fixed the underlying DMA problem. I'll be pushing the fixes momentarily. Korey and Malek, please pull these changes and confirm they fix your problem.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell
Sent: Friday, March 18, 2011 9:12 AM
To: M5 Developer List
Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?

> With regards to the old changeset that boots with the block size = 0, I was not able to boot a large scale CMP system (more than 16 cores) due to the deadlock threshold being triggered.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Ruby FS - DMA Controller problem?
On Sat, March 19, 2011 6:01 pm, Beckmann, Brad wrote:
> Korey, if your deadlock occurs while running the MOESI_CMP_directory protocol, I'm not surprised. DMA support is pretty much broken in that protocol. I have fixed that, and I have also fixed the underlying DMA problem. I'll be pushing the fixes momentarily. Korey and Malek, please pull these changes and confirm they fix your problem.
>
> Brad

Brad, how come the mails you sent on Saturday are only being received now?

--
Nilay
Re: [m5-dev] Ruby FS - DMA Controller problem?
Never mind those. I had several incoming and outgoing emails from the weekend that finally got through our system.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay
Sent: Tuesday, March 22, 2011 8:07 PM
To: M5 Developer List
Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?

> Brad, how come the mails you sent on Saturday are being received now?
Re: [m5-dev] Ruby FS - DMA Controller problem?
Message below.

> Why did it work before the block size patch?
> - When the ChunkGenerator sees the block size is 0, it doesn't split up the request into multiple packets and sends the whole dma request at once. That is fine because the DMASequencer splits the request into multiple requests and only responds to the dma port when the entire request is complete.

With regard to the old changeset that boots with the block size = 0, I was not able to boot a large-scale CMP system (more than 16 cores) because the deadlock threshold was being triggered. I'm assuming that Brad has a read on how to fix that problem, so I'll probably start working on what is causing that deadlock, so that hopefully we can pipeline the bug fixes.

--
- Korey
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Korey,

I don't seem to have encountered that deadlock threshold when booting the old changeset. I tried both 16- and 20-core configurations just now and they seem to work, although they take a really long time compared to ~1-4 cores. Some time ago I also tried booting 64 cores, and that worked too, but it took several hours.

In general, though, that threshold is just a fixed number, and as the CMP machine gets bigger, the 5 seems to be way too low and would have to be multiplied by a factor of 2-3?

I used the default crossbar topology; maybe you encounter the deadlock threshold using Mesh?

Malek

On Fri, Mar 18, 2011 at 12:12 PM, Korey Sewell ksew...@umich.edu wrote:
> With regards to the old changeset that boots with the block size = 0, I was not able to boot a large scale CMP system (more than 16 cores) due to the deadlock threshold being triggered.
Re: [m5-dev] Ruby FS - DMA Controller problem?
Brad/Korey,

An update on what I have seen.

I did notice that in the failing case, the DMASequencer thinks that the request is completed (length of request == 64) when in fact the length should be 8192. The 8192 reflects the sector byte count, but what is interesting is that a DPRINTF(IdeDisk) in ide_disk.cc right before it fails indicates that the request length is 8192. So there is something wrong with the transfer in the RubyPorts.

I have a feeling it might also be linked with the TimingSimpleCPU changes for handling split requests, although Alpha does not support split requests and that is independent of the DMA transfers.

Also, comparing Ruby traces (with and without the failing changeset), the first PRD BaseAddr is consistent between them, but not consistent between Ruby and M5. So the fact that the PRD BaseAddr is 'wrong' in the one case does not prevent it from booting the kernel.

Not really sure if that helps any more.

Malek

On Tue, Mar 15, 2011 at 6:50 PM, Korey Sewell ksew...@umich.edu wrote:
> Sorry for the confusion, I definitely garbled up some terminology. I meant that the M5 ran with the atomic model to compare with the timing Ruby model. M5-atomic maybe runs in 10-15 mins and then Ruby 20-30 mins.
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Malek,

Can you send your most recent trace showing what you described (if it isn't too big)? I haven't observed the different request size errors, but I think I have observed the different PRD addresses on the first access (in the most recent changeset). I'll double check.

I was planning to post the latest on my debugging efforts sometime soon, but a quick summary is that the PRD address gets set from a BMI.DTP register that eventually gets propagated through. I haven't yet been able to verify whether that is loaded from the kernel or from some configuration parameter.

> I have a feeling it might be also linked with the timing simpleCpu changes about handling split requests, although Alpha does not support split requests, that is independent of the DMA transfers.

Are you sure it's a split request problem and not an uncacheable address thing? Or maybe it's some combo of both?

> Also, comparing Ruby Traces (with and without failing changeset) the first PRD BaseAddr is consistent between them, but not consistent between Ruby/M5. So the fact that the PRD BaseAddr is 'wrong' in the one case does not prevent it from booting the Kernel.

That's an interesting observation. It would be nice to figure out why that address may or may not matter, though.

> Not really sure if that helps anymore.
>
> Malek

--
- Korey
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Malek/Korey,

The good news is that I've been able to dedicate a significant amount of time to this over the past day or so, and I've got a good handle on what is going on here.

Why did it work before the block size patch?
- When the ChunkGenerator sees the block size is 0, it doesn't split up the request into multiple packets and sends the whole dma request at once. That is fine because the DMASequencer splits the request into multiple requests and only responds to the dma port when the entire request is complete.

What is the current problem?
- When the ChunkGenerator sees the block size of 64, the dma port splits the request into 64-byte packets, effectively doing the same thing the dma sequencer does. That in itself shouldn't break things. The DMA sequencer nacks all but the first 64-byte request of the dma transfer because it is designed to handle only one M5 packet at a time. Eventually the first 64-byte packet completes and the RubyPort tells the dma port to retry the second packet. The dma port does, but for some reason the DMASequencer still nacks that second request.

I'm not quite sure why that is, but I'm sure I'll figure it out soon. Once I do, I'll push a fix along with all the other fixes I've come across along this multi-day adventure.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell
Sent: Thursday, March 17, 2011 3:10 PM
To: Malek Musleh
Cc: M5 Developer List
Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?

> Hi Malek, Can you send your most recent trace showing what you described (if it isnt too big)? I havent observed the different request size errors, but I think I have observed the different PRD addresses on the first access (in the most recent changeset). I'll double check.
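The two cases Brad describes in this message (block size 0 versus block size 64) can be modeled in a few lines. What follows is only an illustrative Python sketch, not the M5 C++ code: `split_into_chunks`, `OnePacketSequencer`, and `transfer` are hypothetical names standing in for the ChunkGenerator, the DMASequencer, and the dma port's send/retry loop, and the 8192-byte transfer size is borrowed from the IdeDisk traces quoted elsewhere in this thread. The sketch models the intended handshake, in which a retry after completion is accepted; it does not reproduce the bug, whose cause is still unknown at this point in the thread.

```python
def split_into_chunks(addr, length, block_size):
    """Hypothetical stand-in for the chunking behaviour described above.

    A block size of 0 means "do not split": the whole request is one chunk.
    Otherwise each chunk ends at the next block-size-aligned boundary.
    """
    if block_size == 0:
        return [(addr, length)]
    chunks = []
    end = addr + length
    while addr < end:
        # Next block boundary strictly above addr.
        boundary = (addr // block_size + 1) * block_size
        size = min(boundary, end) - addr
        chunks.append((addr, size))
        addr += size
    return chunks


class OnePacketSequencer:
    """Hypothetical model of a sequencer that handles one packet at a time."""

    def __init__(self):
        self.busy = False

    def send(self, chunk):
        # Returns False (a nack) while an earlier packet is still outstanding.
        if self.busy:
            return False
        self.busy = True
        return True

    def complete(self):
        # The outstanding packet has finished; the port may now retry.
        self.busy = False


def transfer(chunks, seq):
    """Send chunks in order; return (packets delivered, nacks seen)."""
    nacks = 0
    for i, chunk in enumerate(chunks):
        if not seq.send(chunk):
            # Cannot happen in this model: every retry follows a completion.
            raise RuntimeError("retry nacked after completion")
        # An eager attempt to send the next packet is nacked while busy.
        if i + 1 < len(chunks) and not seq.send(chunks[i + 1]):
            nacks += 1
        seq.complete()
    return len(chunks), nacks
```

Under these assumptions, an 8192-byte transfer with a 64-byte block size becomes 128 packets, each of which is nacked once before its retry succeeds (127 nacks in total), whereas a block size of 0 yields a single 8192-byte packet and no nacks.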
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Brad,

How is it that you are able to run the memtester in FS mode? I see ruby_mem_tester.py in /configs/example/, but it seems that it is only configured for SE mode as far as Ruby is concerned?

Also, how would the default block size be '0' without that problem changeset? If it were 0, doesn't that mean it's not passing the data from the DMA transfer? It would have to be at least 1?

Malek

On Mon, Mar 14, 2011 at 5:32 PM, Beckmann, Brad brad.beckm...@amd.com wrote:
> Hi Malek,
>
> Just to reiterate, I don't think my patches will fix the underlying problem. Instead, my patches just fix various corner cases in the protocols. I suspect these corner cases are never actually reached in real execution.
>
> The fact that your dma traces point out that the Ruby and Classic configurations use different base addresses makes me think this might be a problem with configuration and device registration. We should investigate further.
>
> Brad
>
> -----Original Message-----
> From: Malek Musleh [mailto:malek.mus...@gmail.com]
> Sent: Monday, March 14, 2011 9:11 AM
> To: M5 Developer List
> Cc: Beckmann, Brad
> Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
>
> Hi Korey/Brad,
>
> I commented out the following lines:
>
> In RubyPort.hh:
>
>     unsigned deviceBlockSize() const;
>
> In RubyPort.cc:
>
>     unsigned
>     RubyPort::M5Port::deviceBlockSize() const
>     {
>         return (unsigned) RubySystem::getBlockSizeBytes();
>     }
>
> I also did a diff trace between M5 and Ruby using the IdeDisk traceflag, as indicated earlier on.
>
> In the Ruby trace, it stalls at this point:
>
> 2398589225000: system.disk0: Write to disk at offset: 0x1 data 0
> 239858940: system.disk0: Write to disk at offset: 0x2 data 0x10
> 2398589575000: system.disk0: Write to disk at offset: 0x3 data 0
> 2398589742000: system.disk0: Write to disk at offset: 0x4 data 0
> 2398589909000: system.disk0: Write to disk at offset: 0x5 data 0
> 2398590088000: system.disk0: Write to disk at offset: 0x6 data 0xe0
> 2398596763500: system.disk0: Write to disk at offset: 0x7 data 0xc8
> 2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000) byteCount:8192 (16) eot:0x8000 sector:0
> 2398597916500: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116
>
> Waiting for the Interrupt to be Posted.
>
> However, a comparison between the M5 and Ruby traces suggests that they differ on the following line:
>
> RubyTrace:
>
> 239858940: system.disk0: Write to disk at offset: 0x2 data 0x10
> 2398589575000: system.disk0: Write to disk at offset: 0x3 data 0
> 2398589742000: system.disk0: Write to disk at offset: 0x4 data 0
> 2398589909000: system.disk0: Write to disk at offset: 0x5 data 0
> 2398590088000: system.disk0: Write to disk at offset: 0x6 data 0xe0
> 2398596763500: system.disk0: Write to disk at offset: 0x7 data 0xc8
> 2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000) byteCount:8192 (16) eot:0x8000 sector:0
> 2398597916500: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116
>
> M5 Trace:
>
> 2237623634000: system.disk0: Write to disk at offset: 0x7 data 0xc8
> 2237624206501: system.disk0: PRD: baseAddr:0x87392000 (0x7392000) byteCount:8192 (16) eot:0x8000 sector:0
> 2237624206501: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116
>
> Note that the PRD baseAddr it tries to access is different; I would think it should be the same, right? There is no reason why it should be different?
>
> The 0 or 1 block size and the sequential retries are forcing the DMA timer to time out the request, which then fails in the inconsistent dma state.
>
> I have attached both sets of traces in case they shed any more light on the cause of the problem. In any case, it might not matter too much now since Brad was able to reproduce the problem and has a patch for it, but they may be of use for future M5 changes.
>
> Malek
>
> On Mon, Mar 14, 2011 at 11:54 AM, Beckmann, Brad brad.beckm...@amd.com wrote:
> > Thanks Malek. Very interesting.
> >
> > Yes, this 5 line changeset seems rather benign, but actually has huge ramifications. With this change, the RubyPort passes the correct block size to the cpu/device models. Without it, I believe the block size defaults to 0 or 1...I can't remember which. While that seems rather inconsequential, I noticed when I made this change that the memtester behaved quite differently. In particular, it keeps issuing requests until sendTiming returns false, instead of just one request/cpu at a time. Therefore another patch in this series added the retry mechanism to the RubyPort.
> >
> > I'm still not sure exactly what the problem is with ruby+dma, but I suspect that the dma devices are behaving differently now that the RubyPort passes the correct block size.
> >
> > I was able to spend a few hours on this over the weekend. I am now able to reproduce the error and I have a few protocol bug fixes queued up. However, I don't think those fixes actually solved the main issue. I don't think I'll be able to get to it today
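Diffing full IdeDisk traces by hand is tedious, so the PRD comparison Malek describes can be narrowed to just the PRD records. The following Python sketch is an assumption-laden illustration, not a tool from the M5 tree: the regular expression is written against the PRD line format quoted in this message, and `PRD_RE`, `prd_entries`, and `first_mismatch` are hypothetical names. The two sample lines are copied from the Ruby and M5 traces above.

```python
import re

# Pattern for the "PRD:" records as they appear in the traces quoted above.
PRD_RE = re.compile(
    r"(?P<tick>\d+): (?P<dev>\S+): PRD: baseAddr:(?P<base>0x[0-9a-fA-F]+) "
    r"\((?P<phys>0x[0-9a-fA-F]+)\) byteCount:(?P<bytes>\d+) \((?P<sectors>\d+)\)"
)


def prd_entries(trace_lines):
    """Return (baseAddr, byteCount) for every PRD record in a trace."""
    entries = []
    for line in trace_lines:
        match = PRD_RE.search(line)
        if match:
            entries.append((int(match.group("base"), 16),
                            int(match.group("bytes"))))
    return entries


def first_mismatch(trace_a, trace_b):
    """Return (index, entry_a, entry_b) for the first differing PRD, or None."""
    pairs = zip(prd_entries(trace_a), prd_entries(trace_b))
    for index, (a, b) in enumerate(pairs):
        if a != b:
            return index, a, b
    return None


# Sample records copied from the Ruby and M5 traces in this message.
ruby = ["2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000) "
        "byteCount:8192 (16) eot:0x8000 sector:0"]
m5 = ["2237624206501: system.disk0: PRD: baseAddr:0x87392000 (0x7392000) "
      "byteCount:8192 (16) eot:0x8000 sector:0"]

print(first_mismatch(ruby, m5))
```

Ticks are deliberately ignored in the comparison, since the two memory systems are not expected to reach the same point at the same simulated time; only the base address and byte count are compared.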
Re: [m5-dev] Ruby FS - DMA Controller problem?
> Also, how would the default block size be '0' without that problem changeset?

The M5Port is derived from SimpleTimingPort (mem/tport.hh), which is derived from Port (mem/port.hh), which has a virtual deviceBlockSize function that always returns 0.

> If it was 0, doesn't that mean it's not passing the data from the DMA transfer? It would have to be at least 1?

I'm not sure about this, but I hope to find out soon. More than likely some default value gets set if it sees '0' or something invalid (but that's just a guess).

--
- Korey
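The inheritance chain Korey describes explains why the block size was 0 before the changeset: nothing between Port and M5Port overrode the base-class default. The Python sketch below only mirrors that shape; the class and method names are hypothetical stand-ins for the C++ classes, and the 64-byte value is the default Ruby block size mentioned elsewhere in the thread.

```python
class Port:
    """Stand-in for the base Port class (mem/port.hh)."""

    def device_block_size(self):
        # Base-class default, per the description above: always 0.
        return 0


class SimpleTimingPort(Port):
    """Stand-in for SimpleTimingPort (mem/tport.hh); inherits the default."""


class M5PortWithoutOverride(SimpleTimingPort):
    """The RubyPort's M5Port with the override commented out."""


class M5PortWithOverride(SimpleTimingPort):
    """The M5Port with the changeset applied: reports Ruby's block size."""

    def __init__(self, ruby_block_size_bytes=64):
        # 64 bytes is assumed here as the Ruby default block size.
        self.ruby_block_size_bytes = ruby_block_size_bytes

    def device_block_size(self):
        return self.ruby_block_size_bytes
```

With the override removed, a caller asking the port for its block size sees 0, which is the value that makes the ChunkGenerator send the whole DMA request as one packet.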
Re: [m5-dev] Ruby FS - DMA Controller problem?
> How is that you are able to run the memtester in FS Mode? I see the ruby_mem_tester.py in /configs/example/ but it seems that it is only configured for SE Mode as far as Ruby is concerned?

I don't run it in FS mode. Since the DMA bug manifests only after hours of execution, I wanted to first verify that the DMA protocol support was solid using the mem tester. Somewhat surprisingly, I found several bugs in MOESI_CMP_directory's support of DMA. It turns out that the initial DMA support in that protocol wasn't very well thought out. I have now fixed those bugs, but since the DMA problem also arises with the MOESI_hammer protocol, I'm confident that my patches don't fix the real problem.

Brad
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Brad/Malek,

I've been able to regenerate this error in about 20 mins now (instead of hours) by running things in atomic mode. Not sure if that helps or not...

On Tue, Mar 15, 2011 at 3:03 PM, Beckmann, Brad brad.beckm...@amd.com wrote:
> I don't run it in FS mode. Since the DMA bug manifests only after hours of execution, I wanted to first verify that the DMA protocol support was solid using the mem tester.

--
- Korey
Re: [m5-dev] Ruby FS - DMA Controller problem?
I'm confused. Korey, I thought this DMA problem only existed with Ruby? If so, how were you able to reproduce it using atomic mode? Ruby does not work with the atomic cpu model. Please clarify, thanks!

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell
Sent: Tuesday, March 15, 2011 12:09 PM
To: M5 Developer List
Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?

> Hi Brad/Malek, I've been able to regenerate this error in about 20mins now (instead of hours) by running things in atomic mode. Not sure if that helps or not...
Re: [m5-dev] Ruby FS - DMA Controller problem?
Sorry for the confusion, I definitely garbled up some terminology. I meant that M5 was run with the atomic model to compare against the timing Ruby model. M5-atomic runs in maybe 10-15 mins and Ruby in 20-30 mins.

I am able to get to the problem point in the Ruby simulation (the bad DMA access) in about 20 mins. I am able to get to that same point in M5-atomic mode in about 10 mins, so I can see what to compare against and which values are being set/unset incorrectly.

On Tue, Mar 15, 2011 at 6:22 PM, Beckmann, Brad brad.beckm...@amd.com wrote:
> I'm confused. Korey, I thought this DMA problem only existed with Ruby? If so, how were you able to reproduce it using atomic mode? Ruby does not work with the atomic cpu model. Please clarify, thanks!
>
> Brad

--
- Korey
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Brad,

I found the problem that was causing this error. Specifically, it is this changeset:

changeset: 7909:eee578ed2130
user: Joel Hestness hestn...@cs.utexas.edu
date: Sun Feb 06 22:14:18 2011 -0800
summary: Ruby: Fix to return cache block size to CPU for split data transfers

Link: http://reviews.m5sim.org/r/393/diff/#index_header

Previously, I mentioned it was a couple of changesets prior to this one, but the changes between them are related, so it wasn't as obvious what was happening. In fact, this corresponds to the assert() for the block size you had put in to deal with x86 unaligned accesses, but then later removed because of LL/SC in Alpha.

It's not clear to me why this is causing a problem, or rather why this doesn't return the default 64-byte block size from the Ruby system, but commenting out those lines of code allowed it to work. Maybe Korey could confirm?

Malek

On Wed, Mar 9, 2011 at 8:24 PM, Beckmann, Brad brad.beckm...@amd.com wrote:

I still have not been able to reproduce the problem, but I haven't tried in a few weeks. So does this happen when booting up the system, independent of what benchmark you are running? If so, could you send me your command line? I'm sure the disk image and kernel binaries between us are different, so I don't necessarily think I'll be able to reproduce your problem, but at least I'll be able to isolate it.

Brad
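To make the effect of "commenting out those lines" concrete, here is a minimal C++ sketch of a block-size query that falls back to a base-class default when the override is removed. The names BasePort, RubyLikePort, kRubyBlockSizeBytes and peerBlockSize are hypothetical stand-ins rather than the actual M5 classes, and the base default of 0 is an assumption (elsewhere in this thread the default is recalled only as "0 or 1").

```cpp
// Hypothetical stand-in for a generic port. It reports no block size
// unless a subclass overrides the query (assumed default: 0).
struct BasePort {
    virtual ~BasePort() = default;
    virtual unsigned deviceBlockSize() const { return 0; }
};

// Assumed Ruby block size; the thread mentions a 64-byte default.
constexpr unsigned kRubyBlockSizeBytes = 64;

// Hypothetical Ruby-side port that reports the real block size.
// Deleting this override makes callers see BasePort's value again.
struct RubyLikePort : BasePort {
    unsigned deviceBlockSize() const override { return kRubyBlockSizeBytes; }
};

// What a CPU or device model sees when it asks its peer port.
inline unsigned peerBlockSize(const BasePort &peer) {
    return peer.deviceBlockSize();
}
```

Under these assumptions, removing the override does not break the query itself; it only changes the answer the device receives, which is why the behavior change can show up far away from the edited lines.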
Re: [m5-dev] Ruby FS - DMA Controller problem?
Which lines are you commenting out to get it to work? It's a bit unclear in the diff you point to (maybe because you said it's a full set of changes, not just one).

(btw: The work I've been doing is comparing the old m5 memory trace to the gem5 memory trace to try to chase down the bug. I wouldn't be surprised if we are converging on the same bug though.)

On Mon, Mar 14, 2011 at 3:51 AM, Malek Musleh malek.mus...@gmail.com wrote:

Hi Brad,

I found the problem that was causing this error. Specifically, it is this changeset:

changeset: 7909:eee578ed2130
user: Joel Hestness hestn...@cs.utexas.edu
date: Sun Feb 06 22:14:18 2011 -0800
summary: Ruby: Fix to return cache block size to CPU for split data transfers

Link: http://reviews.m5sim.org/r/393/diff/#index_header

Previously, I mentioned it was a couple of changesets prior to this one, but the changes between them are related, so it wasn't as obvious what was happening. In fact, this corresponds to the assert() for the block size you had put in to deal with x86 unaligned accesses, but then later removed because of LL/SC in Alpha.

It's not clear to me why this is causing a problem, or rather why this doesn't return the default 64-byte block size from the Ruby system, but commenting out those lines of code allowed it to work. Maybe Korey could confirm?

Malek
Re: [m5-dev] Ruby FS - DMA Controller problem?
Thanks Malek. Very interesting. Yes, this five-line changeset seems rather benign, but actually has huge ramifications. With this change, the RubyPort passes the correct block size to the cpu/device models. Without it, I believe the block size defaults to 0 or 1... I can't remember which.

While that seems rather inconsequential, I noticed when I made this change that the memtester behaved quite differently. In particular, it keeps issuing requests until sendTiming returns false, instead of just one request/cpu at a time. Therefore another patch in this series added the retry mechanism to the RubyPort.

I'm still not sure exactly what the problem is with ruby+dma, but I suspect that the dma devices are behaving differently now that the RubyPort passes the correct block size.

I was able to spend a few hours on this over the weekend. I am now able to reproduce the error and I have a few protocol bug fixes queued up. However, I don't think those fixes actually solved the main issue. I don't think I'll be able to get to it today, but I'll try to find some time tomorrow to investigate further.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell
Sent: Monday, March 14, 2011 2:10 AM
To: M5 Developer List
Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?

Which lines are you commenting out to get it to work? It's a bit unclear in the diff you point to (maybe because you said it's a full set of changes, not just one).

(btw: The work I've been doing is comparing the old m5 memory trace to the gem5 memory trace to try to chase down the bug. I wouldn't be surprised if we are converging on the same bug though.)
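The point about the block size changing device behavior can be illustrated with a small C++ sketch of how a DMA request might be cut at block boundaries. This is not the M5 chunking code: Chunk and splitIntoChunks are hypothetical names, and treating a block size of 0 as "do not split" is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One piece of a larger DMA request.
struct Chunk {
    std::uint64_t addr;
    std::uint64_t size;
};

// Split [addr, addr + size) at block boundaries. A block size of 0 is
// treated as "send the whole request at once" (assumption of this sketch).
inline std::vector<Chunk> splitIntoChunks(std::uint64_t addr,
                                          std::uint64_t size,
                                          std::uint64_t blockSize) {
    std::vector<Chunk> chunks;
    if (size == 0) return chunks;
    if (blockSize == 0) {
        chunks.push_back({addr, size});
        return chunks;
    }
    const std::uint64_t end = addr + size;
    while (addr < end) {
        // Distance to the next block boundary, capped by what is left.
        const std::uint64_t toBoundary = blockSize - (addr % blockSize);
        const std::uint64_t len = std::min(toBoundary, end - addr);
        chunks.push_back({addr, len});
        addr += len;
    }
    return chunks;
}
```

For an aligned 8192-byte request, a block size of 64 yields 128 chunks, a block size of 0 yields a single chunk, and a block size of 1 would yield 8192 one-byte chunks. A device that issues one packet per chunk therefore generates very different traffic depending on the value the port reports.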
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Malek,

Just to reiterate, I don't think my patches will fix the underlying problem. Instead, my patches just fix various corner cases in the protocols. I suspect these corner cases are never actually reached in real execution.

The fact that your dma traces point out that the Ruby and Classic configurations use different base addresses makes me think this might be a problem with configuration and device registration. We should investigate further.

Brad

-----Original Message-----
From: Malek Musleh [mailto:malek.mus...@gmail.com]
Sent: Monday, March 14, 2011 9:11 AM
To: M5 Developer List
Cc: Beckmann, Brad
Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?

Hi Korey/Brad,

I commented out the following lines:

In RubyPort.hh:

    unsigned deviceBlockSize() const;

In RubyPort.cc:

    unsigned
    RubyPort::M5Port::deviceBlockSize() const
    {
        return (unsigned) RubySystem::getBlockSizeBytes();
    }

I also did a diff of the traces between M5 and Ruby using the IdeDisk trace flag, as indicated earlier on. In the Ruby trace, it stalls here:

2398589225000: system.disk0: Write to disk at offset: 0x1 data 0
239858940: system.disk0: Write to disk at offset: 0x2 data 0x10
2398589575000: system.disk0: Write to disk at offset: 0x3 data 0
2398589742000: system.disk0: Write to disk at offset: 0x4 data 0
2398589909000: system.disk0: Write to disk at offset: 0x5 data 0
2398590088000: system.disk0: Write to disk at offset: 0x6 data 0xe0
2398596763500: system.disk0: Write to disk at offset: 0x7 data 0xc8
2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000) byteCount:8192 (16) eot:0x8000 sector:0
2398597916500: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116

Waiting for the Interrupt to be Posted.

However, a comparison between the M5 and Ruby traces suggests that they differ on the following line:

Ruby trace:

239858940: system.disk0: Write to disk at offset: 0x2 data 0x10
2398589575000: system.disk0: Write to disk at offset: 0x3 data 0
2398589742000: system.disk0: Write to disk at offset: 0x4 data 0
2398589909000: system.disk0: Write to disk at offset: 0x5 data 0
2398590088000: system.disk0: Write to disk at offset: 0x6 data 0xe0
2398596763500: system.disk0: Write to disk at offset: 0x7 data 0xc8
2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000) byteCount:8192 (16) eot:0x8000 sector:0
2398597916500: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116

M5 trace:

2237623634000: system.disk0: Write to disk at offset: 0x7 data 0xc8
2237624206501: system.disk0: PRD: baseAddr:0x87392000 (0x7392000) byteCount:8192 (16) eot:0x8000 sector:0
2237624206501: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116

Note that the PRD baseAddr it tries to access is different (0x87298000 under Ruby versus 0x87392000 under M5), which I would think should be the same, right? There is no reason why it should be different?

The 0 or 1 block size and the sequential retries are forcing the DMA timer to time out the request, which then fails in the inconsistent DMA state.

I have attached both sets of traces in case they shed any more light on the cause of the problem. In any case, it might not matter too much now since Brad was able to reproduce the problem and has a patch for it, but they may be of use for future M5 changes.

Malek
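The "keep issuing until sendTiming returns false, then wait for a retry" behavior discussed in this exchange can be sketched in C++ as follows. This is only a toy model: BoundedPort, completeOne and issueUntilRefused are hypothetical names, requests are plain integers, and the sendTiming method here merely borrows the name used in the thread and does not reflect the real M5 signature.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical receiver that can hold only a few requests at once.
class BoundedPort {
  public:
    explicit BoundedPort(std::size_t capacity) : capacity_(capacity) {}

    // Returns false when full; the sender must then wait for a retry.
    bool sendTiming(int request) {
        if (queue_.size() >= capacity_) {
            needsRetry_ = true;
            return false;
        }
        queue_.push_back(request);
        return true;
    }

    // Completes the oldest request. Returns true if a previously
    // refused sender should now be told to retry.
    bool completeOne() {
        if (!queue_.empty()) queue_.pop_front();
        const bool retry = needsRetry_;
        needsRetry_ = false;
        return retry;
    }

    std::size_t outstanding() const { return queue_.size(); }

  private:
    std::size_t capacity_;
    std::deque<int> queue_;
    bool needsRetry_ = false;
};

// Sender that keeps issuing until it is refused.
// Returns how many pending requests were accepted in this burst.
inline std::size_t issueUntilRefused(BoundedPort &port,
                                     std::deque<int> &pending) {
    std::size_t sent = 0;
    while (!pending.empty() && port.sendTiming(pending.front())) {
        pending.pop_front();
        ++sent;
    }
    return sent;
}
```

In this model a sender with four pending requests and a port of capacity two gets two accepted, is refused on the third, and resumes only after completeOne() signals a retry. Without such a retry signal the refused sender would never make progress, which is one plausible way for a device timer to expire.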
[m5-dev] Ruby FS - DMA Controller problem?
Hi all,

I'm trying to run Ruby in FS mode for the FFT benchmark. However, I've been unable to fully boot the kernel; it errors out with a panic in the IDE disk controller:

panic: Inconsistent DMA transfer state: dmaState = 2 devState = 1 @ cycle 62640732569001 [doDmaTransfer:build/ALPHA_FS_MOESI_CMP_directory/dev/ide_disk.cc, line 323]

Has anybody run into a similar error, or does anyone have any suggestions for debugging the problem? I can run the same code using the M5 memory system and FFT finishes properly, so it's definitely a Ruby-specific thing.

To track this down, it seems I could diff instruction traces (M5 vs. Ruby) or maybe even diff trace output from the IdeDisk trace flags, but those routes seem a bit heavy-handed considering the amount of trace output generated.

The command line this was run with is:

build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b fft_64t_base -n 1

The output in system.terminal is:

hda: M5 IDE Disk, ATA DISK drive
hdb: M5 IDE Disk, ATA DISK drive
hda: UDMA/33 mode selected
hdb: UDMA/33 mode selected
hdc: M5 IDE Disk, ATA DISK drive
hdc: UDMA/33 mode selected
ide0 at 0x8410-0x8417,0x8422 on irq 31
ide1 at 0x8418-0x841f,0x8426 on irq 31
ide_generic: please use probe_mask=0x3f module parameter for probing all legacy ISA IDE ports
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 2866752 sectors (1467 MB), CHS=2844/16/63
hda:4hda: dma_timer_expiry: dma status == 0x65
hda: DMA interrupt recovery
hda: lost interrupt
unknown partition table
hdb: max request size: 128KiB
hdb: 1008000 sectors (516 MB), CHS=1000/16/63
hdb:4hdb: dma_timer_expiry: dma status == 0x65
hdb: DMA interrupt recovery
hdb: lost interrupt

Thanks again, any help or thoughts would be much appreciated.

--
- Korey
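One way to keep a trace comparison from being heavy-handed is to ignore the tick prefix and report only the first line where two traces disagree. The C++ sketch below assumes each trace line has the form "<tick>: <message>", as in the IdeDisk lines quoted in this thread; stripTick and firstDivergence are hypothetical helper names, not part of M5.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Drop a leading "<digits>: " prefix so lines compare by content only.
inline std::string stripTick(const std::string &line) {
    std::size_t i = 0;
    while (i < line.size() && line[i] >= '0' && line[i] <= '9') ++i;
    if (i > 0 && i + 1 < line.size() && line[i] == ':' && line[i + 1] == ' ')
        return line.substr(i + 2);
    return line;
}

// Index of the first line whose content differs. If one trace is a
// strict prefix of the other, returns the shorter length. Returns -1
// when both traces match line for line.
inline long firstDivergence(const std::vector<std::string> &a,
                            const std::vector<std::string> &b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (stripTick(a[i]) != stripTick(b[i]))
            return static_cast<long>(i);
    }
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

Because the two simulations run at different ticks, comparing raw lines would flag every line as different; stripping the prefix first leaves only differences in what the device actually did. Both traces would need to be aligned to a common starting event before calling firstDivergence.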
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Korey,

I ran into a similar problem with a different benchmark/boot-up attempt. There is another thread on m5-dev with 'Ruby FS failing with recent changesets' as the subject. I was able to track down the changeset it was coming from, but I did not look further into why that changeset was causing it. Brad said he would take a look at it, but I am not sure if he was able to reproduce the problem.

Malek

On Wed, Mar 9, 2011 at 7:08 PM, Korey Sewell ksew...@umich.edu wrote:

Hi all,

I'm trying to run Ruby in FS mode for the FFT benchmark. However, I've been unable to fully boot the kernel; it errors out with a panic in the IDE disk controller:

panic: Inconsistent DMA transfer state: dmaState = 2 devState = 1 @ cycle 62640732569001 [doDmaTransfer:build/ALPHA_FS_MOESI_CMP_directory/dev/ide_disk.cc, line 323]

Has anybody run into a similar error, or does anyone have any suggestions for debugging the problem? I can run the same code using the M5 memory system and FFT finishes properly, so it's definitely a Ruby-specific thing.

To track this down, it seems I could diff instruction traces (M5 vs. Ruby) or maybe even diff trace output from the IdeDisk trace flags, but those routes seem a bit heavy-handed considering the amount of trace output generated.

The command line this was run with is:

build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b fft_64t_base -n 1

The output in system.terminal is:

hda: M5 IDE Disk, ATA DISK drive
hdb: M5 IDE Disk, ATA DISK drive
hda: UDMA/33 mode selected
hdb: UDMA/33 mode selected
hdc: M5 IDE Disk, ATA DISK drive
hdc: UDMA/33 mode selected
ide0 at 0x8410-0x8417,0x8422 on irq 31
ide1 at 0x8418-0x841f,0x8426 on irq 31
ide_generic: please use probe_mask=0x3f module parameter for probing all legacy ISA IDE ports
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 2866752 sectors (1467 MB), CHS=2844/16/63
hda:4hda: dma_timer_expiry: dma status == 0x65
hda: DMA interrupt recovery
hda: lost interrupt
unknown partition table
hdb: max request size: 128KiB
hdb: 1008000 sectors (516 MB), CHS=1000/16/63
hdb:4hdb: dma_timer_expiry: dma status == 0x65
hdb: DMA interrupt recovery
hdb: lost interrupt

Thanks again, any help or thoughts would be much appreciated.

--
- Korey
Re: [m5-dev] Ruby FS - DMA Controller problem?
I still have not been able to reproduce the problem, but I haven't tried in a few weeks. So does this happen when booting up the system, independent of what benchmark you are running? If so, could you send me your command line? I'm sure the disk image and kernel binaries between us are different, so I don't necessarily think I'll be able to reproduce your problem, but at least I'll be able to isolate it.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Malek Musleh
Sent: Wednesday, March 09, 2011 4:41 PM
To: M5 Developer List
Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?

Hi Korey,

I ran into a similar problem with a different benchmark/boot up attempt. There is another thread on m5-dev with 'Ruby FS failing with recent changesets' as the subject. I was able to track down the changeset which it was coming from, but I did not look further into the changeset as to why it was causing it. Brad said he would take a look at it, but I am not sure if he was able to reproduce the problem.

Malek