Thanks. I'll try it with the bug fix. The following is not really a question, just a follow-up in case you are interested. Unfortunately, it appears that the software is touching the DMA buffer.

---------------

I tried running the experiment again with a larger request queue size and with the SSD throttling the DMA accesses (so it only allows 64 outstanding at any one time), and this time I got an assertion failure from the SSD code:

    PCI_SSD_System.cpp:220: void PCISSD::PCI_SSD_System::CompleteDMATransaction(bool, uint64_t): Assertion `isWrite == !old_t.isWrite' failed.

(Note: the ! is there because the original disk transaction type is flipped for a DMA, e.g. a disk write causes a DMA read.) The access causing this is:

    2817645044: Completed DRAMSim2 DMA transaction for (1, 44125504)

The previous transactions were:

    2817645026: Completed DRAMSim2 DMA transaction for (0, 32815360)
    2817645032: Completed DRAMSim2 DMA transaction for (0, 32815488)
    2817645038: Completed DRAMSim2 DMA transaction for (0, 32815552)

(Note: 0 is read and 1 is write.) So it looks like while the DMA was performing a bunch of reads to DRAM, the software tried to do a write. This was caught by the DMA's address checker, which sent it to the DMA code, where it failed the read/write type assertion.

I'm going to add code to check the read/write type before sending the callback to the SSD, but I'm very confused as to why the software would be touching the DMA buffer (and, like I said before, it makes me think something could be wrong on the qemu side, where I grab the scatter-gather lists for the DMA).
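For concreteness, the check I have in mind looks roughly like the sketch below. This is illustrative only: the map, the DmaEntry type, and the callback names are stand-ins, not the actual PCI_SSD_System plumbing.

    #include <stdint.h>
    #include <map>

    // Illustrative stand-ins for the real bookkeeping.
    struct DmaEntry { bool isWrite; };  // DMA direction (the disk type, flipped)

    extern std::map<uint64_t, DmaEntry> pendingDma;  // in-flight DMA addresses
    extern void pipelineCallback(bool isWrite, uint64_t addr);       // back to marss
    extern void completeDmaTransaction(bool isWrite, uint64_t addr); // to the SSD

    // Called for every completed DRAMSim2 access; isWrite is the type of
    // the DRAM access that just finished.
    void dramsimCompletion(bool isWrite, uint64_t addr)
    {
        std::map<uint64_t, DmaEntry>::iterator it = pendingDma.find(addr);

        // Not a DMA address: hand it back to the pipeline as before.
        if (it == pendingDma.end()) {
            pipelineCallback(isWrite, addr);
            return;
        }

        // New check: the address is in the DMA buffer, but the access type
        // doesn't match the outstanding DMA. That means the software touched
        // the buffer, so route the callback to the pipeline instead of
        // tripping the `isWrite == !old_t.isWrite' assertion in the SSD code.
        if (isWrite != it->second.isWrite) {
            pipelineCallback(isWrite, addr);
            return;
        }

        completeDmaTransaction(isWrite, addr);
        pendingDma.erase(it);
    }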
Also, not sure if it matters, but the transaction that started this particular DMA was very large:

    2817629132: Sector addTransaction() arrived (isWrite: 1, addr: 3666849792, num_sectors: 504)
    DMA type is 1 (1 => SSD Write/DMA Read and 0 => SSD Read/DMA Write)

I'm going to try emailing the qemu guys again to understand what is going on.

---------------

Thanks,

Jim Stevens

> This bug is also related to the split-bus interconnect, because it
> assumes that the controller will keep an entry available for the
> interconnect. I recently pushed a fix to split-bus that makes sure the
> controller has an entry available before sending the request. Its commit
> id is 6d610a8.
>
> Just pull the change from either the master or the features branch. You
> can definitely increase the memory request queue size; it won't break
> anything else.
>
> - Avadh
>
> On Sun, May 13, 2012 at 12:35 AM, <[email protected]> wrote:
>
>> A quick followup to my prior message:
>>
>> * I ran this again with the SSD model off and the assertion failure
>> didn't happen. This rules out an issue simply with having marss swap a
>> lot (I was a bit concerned that, since we are the only people doing
>> disk swapping with marss right now, we would run into a problem with
>> that).
>>
>> * After looking at the SSD debug logs, it seems that there weren't any
>> DMA transactions outstanding to the memory system when the assertion
>> failure happened. This means this was not an address conflict from
>> software using the DMA buffer.
>>
>> * I talked with Paul Rosenfeld and he said that the assertion failure
>> seems to be on the path to the memory controller. I looked at the code
>> further, and it looks like memoryController::handle_interconnect_cb is
>> returning false because pendingRequests_ has no open entries. I think
>> that, because of contention for DRAMSim (the SSD DMA and the normal
>> marss last-level cache both use it), marss builds up a long list of
>> pending requests until the request queue eventually overflows. Does
>> this explain what I'm seeing? If so, can I fix this simply by making
>> MEM_REQ_NUM larger (right now it is 64)? Or will that break something
>> else?
>>
>> Thanks,
>>
>> Jim Stevens
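For reference, the guard Avadh describes above comes down to something like the sketch below. The Controller and Bus classes and their members are illustrative stand-ins under an assumed hard cap of MEM_REQ_NUM pending entries, not the actual marss split-bus code.

    #include <cassert>
    #include <cstddef>
    #include <deque>

    static const std::size_t MEM_REQ_NUM = 64;  // the queue-size knob in question

    struct Request { };

    // Stand-in for the controller side: pendingRequests_ with a hard cap.
    class Controller {
    public:
        bool can_accept() const { return pending_.size() < MEM_REQ_NUM; }
        bool handle_interconnect_cb(Request* r) {
            if (!can_accept())
                return false;          // this is what was overflowing
            pending_.push_back(r);
            return true;
        }
    private:
        std::deque<Request*> pending_;
    };

    // Stand-in for the split bus: check for a free entry before sending,
    // so the `assert(ret)' on the broadcast path can no longer fire.
    class Bus {
    public:
        explicit Bus(Controller* c) : ctrl_(c) { }
        void broadcast(Request* r) {
            if (!ctrl_->can_accept()) {
                retry_.push_back(r);   // hold the request and retry later
                return;
            }
            bool ret = ctrl_->handle_interconnect_cb(r);
            assert(ret);               // guaranteed by the guard above
        }
    private:
        Controller* ctrl_;
        std::deque<Request*> retry_;
    };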
>> > Hello,
>> >
>> > We've been getting an assertion failure in my modified version of
>> > marss for SSD simulation...
>> >
>> >     Completed 4583629000 cycles, 787459313 commits: 73877 Hz,
>> >     187432 insns/sec: rip ffffffff8108adc8 ffffffff8108e548
>> >     ffffffff81013091 ffffffff81013091
>> >     qemu-system-x86_64: ptlsim/build/cache/splitPhaseBus.cpp:371: bool
>> >     Memory::SplitPhaseBus::BusInterconnect::broadcast_completed_cb(void*):
>> >     Assertion `ret' failed.
>> >
>> > This is running parsec's fluidanimate simlarge with 64 MB of RAM.
>> >
>> > My SSD simulation code has to check the address stream coming back
>> > from DRAMSim, and if an address is part of a DMA, it sends the
>> > callback to the SSD and not back to the marss pipeline. My concern
>> > about this approach was that if the pipeline and the SSD both request
>> > the same address at the same time, bad things could happen (e.g.
>> > assertion failures).
>> >
>> > Do you think this is a likely explanation of the above assertion
>> > failure?
>> >
>> > I was hoping this wouldn't be an issue, because the OS should know
>> > not to touch the DMA buffer until after it receives the IRQ (note:
>> > the IRQ isn't sent until after all DMA and disk interactions have
>> > completed). If this is indeed the problem, I think I have a fix for
>> > it (I'll have to check both the pending DMA requests AND the pending
>> > marss requests), but I'm worried this might break something else. And
>> > I'm confused as to why the software would ever touch the DMA buffer,
>> > so it makes me think I have a bug elsewhere (probably on the qemu
>> > side of things, which is where I extract the scatter-gather lists for
>> > the DMA buffers).
>> >
>> > I'm currently running two more tests (with no SSD model, and with the
>> > SSD model with DMA off) to see if this error happens again. But I
>> > suspect it is the DMA code, given that the assertion failure happened
>> > in the ptlsim/cache subdirectory.
>> >
>> > Thanks,
>> >
>> > Jim Stevens
>> > University of Maryland, College Park
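The address-stream check this original message describes amounts to roughly the sketch below. The segment list, its types, and the callback names are illustrative assumptions; the real scatter-gather information comes from the qemu side when the DMA is set up.

    #include <stdint.h>
    #include <cstddef>
    #include <vector>

    // One entry per scatter-gather segment of an in-flight DMA, captured
    // on the qemu side when the DMA is set up (illustrative types).
    struct SgSegment { uint64_t base; uint64_t len; };

    extern std::vector<SgSegment> activeDmaSegments;
    extern void ssdCallback(bool isWrite, uint64_t addr);       // to the SSD model
    extern void pipelineCallback(bool isWrite, uint64_t addr);  // back to marss

    static bool addrInDma(uint64_t addr)
    {
        for (std::size_t i = 0; i < activeDmaSegments.size(); ++i) {
            const SgSegment& s = activeDmaSegments[i];
            if (addr >= s.base && addr < s.base + s.len)
                return true;
        }
        return false;
    }

    // Demultiplex the DRAMSim address stream: DMA traffic goes to the SSD
    // model, everything else back to the pipeline. The hazard is the
    // pipeline and the SSD both having a request in flight for the same
    // address, which is why the proposed fix checks both pending sets
    // rather than deciding on the address alone.
    void routeDramsimCompletion(bool isWrite, uint64_t addr)
    {
        if (addrInDma(addr))
            ssdCallback(isWrite, addr);
        else
            pipelineCallback(isWrite, addr);
    }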
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel