Thanks. I'll try it with the bug fix.

The following is not really a question, just a follow-up in case you are
interested. Unfortunately, it appears that the software is touching the
DMA buffer.

---------------

I tried running the experiment again with a larger request queue size and
with the SSD throttling its DMA accesses (so only 64 are outstanding at
any one time), and this time I got an assertion failure from the SSD code:

PCI_SSD_System.cpp:220: void
PCISSD::PCI_SSD_System::CompleteDMATransaction(bool, uint64_t): Assertion
`isWrite == !old_t.isWrite' failed.

Note: the ! is because the original disk transaction type is flipped for a
DMA (e.g. a disk write causes a DMA read).
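
As an aside, the throttling I mentioned above is nothing fancy; roughly it
looks like the sketch below (the names are made up for illustration, this
is the idea rather than the actual PCI_SSD code):

#include <deque>
#include <cstdint>

// Sketch only: cap the number of DMA accesses outstanding to DRAMSim2 at 64.
// DMAAccess, MAX_OUTSTANDING_DMA, issue_to_dramsim, etc. are made-up names.
struct DMAAccess { bool isWrite; uint64_t addr; };

static const unsigned MAX_OUTSTANDING_DMA = 64;
static unsigned outstanding_dma = 0;
static std::deque<DMAAccess> dma_queue;   // DMA accesses waiting to be issued

void issue_to_dramsim(const DMAAccess &a)
{
    // The real code hands the access to DRAMSim2 here, e.g.
    // mem->addTransaction(a.isWrite, a.addr);
    (void)a;
}

void issue_pending_dma()
{
    // Issue queued DMA accesses until we hit the outstanding limit.
    while (!dma_queue.empty() && outstanding_dma < MAX_OUTSTANDING_DMA) {
        issue_to_dramsim(dma_queue.front());
        dma_queue.pop_front();
        ++outstanding_dma;
    }
}

void on_dma_complete()
{
    // A DRAMSim2 completion for a DMA address frees up a slot.
    --outstanding_dma;
    issue_pending_dma();
}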

The access causing this is:

2817645044: Completed DRAMSim2 DMA transaction for (1, 44125504)

The previous transactions were:
2817645026: Completed DRAMSim2 DMA transaction for (0, 32815360)
2817645032: Completed DRAMSim2 DMA transaction for (0, 32815488)
2817645038: Completed DRAMSim2 DMA transaction for (0, 32815552)


(Note: 0 is a read and 1 is a write.) So it looks like while the DMA was
performing a series of reads from DRAM, the software tried to do a write
to the buffer. The DMA's address checker caught this and forwarded the
result to the DMA code, where it failed the read/write type assertion. I'm
going to add code that checks the read/write type before sending a
completion to the SSD, but I'm still very confused as to why the software
would be touching the DMA buffer at all (and, like I said before, it makes
me think something could be wrong on the qemu side where I grab the
scatter gather lists for the DMA).
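
Roughly, the check I have in mind looks like the sketch below (again the
names are made up and the forwarding calls are left as comments; this is
just the idea, not the actual code):

#include <map>
#include <cstdint>

// Sketch only: route DRAMSim2 completions either to the SSD (when they match
// a pending DMA, including the read/write type) or back to the pipeline.
// pending_dma, DMAEntry, and the commented-out calls are made-up names.
struct DMAEntry { bool diskIsWrite; };   // type of the originating disk transaction

static std::map<uint64_t, DMAEntry> pending_dma;   // outstanding DMA addresses

void dramsim_complete_cb(bool isWrite, uint64_t addr)
{
    std::map<uint64_t, DMAEntry>::iterator it = pending_dma.find(addr);
    if (it != pending_dma.end() && isWrite == !it->second.diskIsWrite) {
        // Address AND type match the pending DMA entry (remember the type is
        // flipped: a disk write causes a DMA read), so this completion really
        // belongs to the SSD.
        // ssd->CompleteDMATransaction(isWrite, addr);
        pending_dma.erase(it);
    } else {
        // No DMA pending at this address, or the type doesn't match (e.g. the
        // software wrote into the DMA buffer), so send the completion back to
        // the marss pipeline instead of tripping the assertion in the SSD code.
        // send_back_to_pipeline(isWrite, addr);
    }
}

That should at least keep a mismatched completion from reaching
CompleteDMATransaction(), even though it doesn't explain why the software is
writing into the buffer in the first place.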

Also, I'm not sure if it matters, but the transaction that started this
particular DMA was very large:

2817629132: Sector addTransaction() arrived (isWrite: 1, addr: 3666849792,
num_sectors: 504)
DMA type is 1 (1 => SSD Write/DMA Read and 0 => SSD Read/DMA Write)

I'm going to try emailing the qemu guys again to understand what is going on.

---------------

Thanks,

Jim Stevens



> This bug is also related to the split-bus interconnect because it assumes
> that the controller will keep an entry available for the interconnect. I
> recently pushed a fix to split-bus that makes sure the controller has an
> entry available before sending the request. Its commit id is 6d610a8.
>
> Just pull the change from either the master or the features branch.
> You can definitely increase the memory request queue size; it won't
> break anything else.
>
> - Avadh
>
> On Sun, May 13, 2012 at 12:35 AM, <[email protected]> wrote:
>
>> A quick followup to my prior message:
>>
>> * I ran this again with the SSD model off, and the assertion failure
>> didn't happen. This rules out an issue simply with having marss swap a
>> lot (I was a bit concerned that, since we are the only people doing
>> disk swapping with marss right now, we would run into a problem with
>> that).
>>
>> * After looking at the SSD debug logs, it seems that there weren't any
>> DMA transactions outstanding to the memory system when the assertion
>> failure happened. This means it was not an address conflict from
>> software using the DMA buffer.
>>
>> * I talked with Paul Rosenfeld and he said that the assertion failure
>> seems to be on the path to the memory controller. I looked at the code
>> further, and it looks like memoryController::handle_interconnect_cb is
>> returning false because pendingRequests_ did not have any open entries.
>> What I think is happening is that, because of contention for DRAMSim
>> (the SSD DMA and the normal marss last-level cache both use it), marss
>> builds up a long list of pending requests and the request queue
>> eventually overflows. Does this explain what I'm seeing? If so, can I
>> fix this simply by making MEM_REQ_NUM larger (right now it is 64)? Or
>> will that break something else?
>>
>> Thanks,
>>
>> Jim Stevens
>>
>>
>>
>> > Hello,
>> >
>> > We've been getting an assertion failure in my modified version of
>> > marss for SSD simulation...
>> >
>> >  Completed    4583629000 cycles,     787459313 commits:     73877 Hz,
>> >   187432 insns/sec: rip ffffffff8108adc8 ffffffff8108e548
>> > ffffffff81013091 ffffffff81013091
>> > qemu-system-x86_64: ptlsim/build/cache/splitPhaseBus.cpp:371: bool
>> > Memory::SplitPhaseBus::BusInterconnect::broadcast_completed_cb(void*):
>> > Assertion `ret' failed.
>> >
>> >
>> > This is running parsec's fluidanimate simlarge with 64 MB of RAM.
>> >
>> > My SSD simulation code has to check the address stream coming back
>> > from DRAMSim and, if the address is part of a DMA, send the callback
>> > to the SSD and not back to the marss pipeline. My concern about this
>> > approach was that if the pipeline and the SSD both request the same
>> > address at the same time, bad things could happen (e.g. assertion
>> > failures).
>> >
>> > Do you think this is a likely explanation of the above assertion
>> > failure?
>> >
>> > I was hoping that this wouldn't be an issue because the OS should know
>> > not to touch the DMA buffer until after it receives the IRQ (note: the
>> > IRQ isn't sent until after all DMA and disk interactions have
>> > completed). If this is indeed the problem, I think I have a fix for it
>> > (I'll have to check both the pending DMA requests AND the pending marss
>> > requests), but I'm worried this might break something else. And I'm
>> > confused as to why the software would ever touch the DMA buffer, so it
>> > makes me think I have a bug elsewhere (probably on the qemu side of
>> > things, which is where I extract the scatter gather lists for the DMA
>> > buffers).
>> >
>> > I'm currently running two more tests (one with no SSD model and one
>> > with the SSD model but DMA off) to see if this error happens again.
>> > But I think this is related to the DMA code, given that the assertion
>> > failure happened in the ptlsim/cache subdirectory.
>> >
>> >
>> > Thanks,
>> >
>> > Jim Stevens
>> > University of Maryland, College Park
>> >
>> >
>>
>>
>>
>



_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
