Re: [marss86-devel] Assertion failure with modified version of marss

jims Sun, 13 May 2012 00:36:16 -0700

A quick followup to my prior message:

* I ran this again with the SSD model off and the assertion failure didn't
happen. This rules out a problem caused simply by marss swapping heavily (I
was a bit concerned that, since we are the only people doing disk swapping
with marss right now, we would run into a problem there).


* After looking at the SSD debug logs, it seems that there weren't any DMA
transactions outstanding to the memory system when the assertion failure
happened. This means that this was not an address conflict from software
using the DMA buffer.

* I talked with Paul Rosenfeld and he said that the assertion failure
seems to be on the path to the memory controller. I looked at the code
further, and it looks like memoryController::handle_interconnect_cb is
returning false because pendingRequests_ did not have any free entries.
What I think is happening is that, because of contention for DRAMSim (the
SSD DMA and the normal marss last-level cache both use it), marss builds up
a long list of pending requests and the request queue eventually overflows.
Does this explain what I'm seeing? If so, can I fix this simply by making
MEM_REQ_NUM larger (right now it is 64)? Or will that break something
else?
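
To make the overflow hypothesis concrete, here is a minimal standalone
sketch of a fixed-capacity pending-request queue fed faster than it drains.
It is not the marss code: PendingQueue, try_enqueue, complete_one and
first_rejection_cycle are hypothetical names, and only the MEM_REQ_NUM value
of 64 comes from the description above. The rejected enqueue stands in for
handle_interconnect_cb returning false, which is what the assertion in the
bus code would then trip on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Capacity quoted in the message above; everything else here is illustrative.
constexpr std::size_t MEM_REQ_NUM = 64;

// Hypothetical stand-in for the controller's pending-request table.
struct PendingQueue {
    std::deque<std::uint64_t> entries;  // addresses still waiting on DRAM

    // Returns false when no entry is free (assumed to mirror the callback
    // returning false in the real controller).
    bool try_enqueue(std::uint64_t addr) {
        if (entries.size() >= MEM_REQ_NUM) return false;
        entries.push_back(addr);
        return true;
    }

    // Retires the oldest request, as if DRAM had completed it.
    void complete_one() {
        if (!entries.empty()) entries.pop_front();
    }
};

// Each cycle, retire `completions_per_cycle` requests, then try to add
// `arrivals_per_cycle` new ones. Returns the first cycle in which an
// enqueue is rejected, or -1 if none is rejected within `max_cycles`.
long first_rejection_cycle(int arrivals_per_cycle,
                           int completions_per_cycle,
                           long max_cycles) {
    PendingQueue q;
    std::uint64_t next_addr = 0;
    for (long cycle = 0; cycle < max_cycles; ++cycle) {
        for (int i = 0; i < completions_per_cycle; ++i) q.complete_one();
        for (int i = 0; i < arrivals_per_cycle; ++i) {
            if (!q.try_enqueue(next_addr)) return cycle;
            next_addr += 64;  // next cache-line-sized address
        }
    }
    return -1;
}
```

In this toy model, a queue whose arrivals match its completions never
rejects a request, while one that receives two requests per completion
fills up after a few dozen cycles. A larger MEM_REQ_NUM would only postpone
that point as long as arrivals keep outpacing completions, so whether
raising it is a real fix in marss likely depends on how long the contention
bursts last and on whether the sender can stall and retry instead of
asserting.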

Thanks,

Jim Stevens



> Hello,
>
> We've been getting an assertion failure in my modified version of marss
> for SSD simulation...
>
>  Completed    4583629000 cycles,     787459313 commits:     73877 Hz,
>   187432 insns/sec: rip ffffffff8108adc8 ffffffff8108e548
> ffffffff81013091 ffffffff81013091qemu-system-x86_64:
> ptlsim/build/cache/splitPhaseBus.cpp:371: bool
> Memory::SplitPhaseBus::BusInterconnect::broadcast_completed_cb(void*):
> Assertion `ret' failed.
>
>
> This is running parsec's fluidanimate simlarge with 64 MB of RAM.
>
> My SSD simulation code has to check the address stream coming back from
> DRAMSim and if the address is part of a DMA, then it sends the callback to
> the SSD and not back to the marss pipeline. My concern about this approach
> was that if the pipeline and the SSD both request the same address at the
> same time, that bad things could happen (e.g. assertion failures).
>
> Do you think this is a likely explanation of the above assertion failure?
>
> I was hoping that this wouldn't be an issue because the OS should know not
> to touch the DMA buffer until after it receives the IRQ (note: the IRQ
> isn't sent until after all DMA and disk interactions have completed). If
> this is indeed the problem, I think I have a fix for it (I'll have to
> check both the pending DMA requests AND the pending marss requests), but
> I'm worried this might break something else. And I'm confused as to why
> the software would ever touch the DMA buffer, so it makes me think I have
> a bug elsewhere (probably on the qemu side of things, which is where I
> extract the scatter gather lists for the DMA buffers).
>
> I'm currently running two more tests (with no SSD model and the SSD model
> with DMA off) to see if this error happens again. But I think this is the
> DMA code, given that the assertion failure happened in the ptlsim/cache
> subdirectory.
>
>
> Thanks,
>
> Jim Stevens
> University of Maryland, College Park
>
>
> _______________________________________________
> http://www.marss86.org
> Marss86-Devel mailing list
> [email protected]
> https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
>


