Hi Brad,

I tested several changesets and have narrowed down where the failures begin.

The last changeset that works (since 7842) is 7905.

At 7906 this is the error:

command line: ./build/ALPHA_FS_MOESI_CMP_directory/m5.opt ./configs/example/ruby_fs.py -n 4 --topology Crossbar
Global frequency set at 1000000000000 ticks per second
info: kernel located at: /home/musleh/M5/m5_system_2.0b3/binaries/vmlinux
Listening for system connection on port 3456
      0: system.tsunami.io.rtc: Real-time clock set to Thu Jan  1 00:00:00 2009
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
0: system.remote_gdb.listener: listening for remote gdb #1 on port 7001
0: system.remote_gdb.listener: listening for remote gdb #2 on port 7002
0: system.remote_gdb.listener: listening for remote gdb #3 on port 7003
**** REAL SIMULATION ****
info: Entering event queue @ 0.  Starting simulation...
info: Launching CPU 1 @ 835461000
info: Launching CPU 2 @ 846156000
info: Launching CPU 3 @ 856768000
warn: Prefetch instrutions is Alpha do not do anything
For more information see: http://www.m5sim.org/warn/3e0eccba
1349195500: system.terminal: attach terminal 0
warn: Prefetch instrutions is Alpha do not do anything
For more information see: http://www.m5sim.org/warn/3e0eccba
m5.opt: build/ALPHA_FS_MOESI_CMP_directory/mem/ruby/system/RubyPort.cc:230:
virtual bool RubyPort::M5Port::recvTiming(Packet*): Assertion
`Address(ruby_request.paddr).getOffset() + ruby_request.len <=
RubySystem::getBlockSizeBytes()' failed.
Program aborted at cycle 2406378289516
Aborted


The same error occurs for 7907 - 7908.
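
For reference, the RubyPort assertion above appears to check that a request does not extend past the end of its cache block. Below is a minimal Python sketch of that condition. It assumes getOffset() returns the byte offset of the address within its block and uses a 64-byte block size. Both are assumptions for illustration, and fits_in_block is a hypothetical name, not part of M5:

```python
def fits_in_block(paddr: int, length: int, block_size: int = 64) -> bool:
    """Mirror of the asserted condition: offset + len <= block size.

    block_size=64 is an assumed value for illustration only.
    """
    offset = paddr % block_size  # assumed meaning of Address.getOffset()
    return offset + length <= block_size


# An 8-byte access at offset 56 ends exactly at the block boundary.
print(fits_in_block(0x1038, 8))
# A 16-byte access at the same offset would spill into the next block.
print(fits_in_block(0x1038, 16))
```

If this reading is right, the request that fails at 7906 would be one whose offset plus length exceeds the block size, for example an access that straddles two blocks.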

The dma_expiry error first shows up at changeset 7909:

7909:

hda: M5 IDE Disk, ATA DISK drive
hdb: M5 IDE Disk, ATA DISK drive
hda: UDMA/33 mode selected
hdb: UDMA/33 mode selected
ide0 at 0x8410-0x8417,0x8422 on irq 31
ide1 at 0x8418-0x841f,0x8426 on irq 31
ide_generic: please use "probe_mask=0x3f" module parameter for probing
all legacy ISA IDE ports
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 101808 sectors (52 MB), CHS=101/16/63
 hda:<4>hda: dma_timer_expiry: dma status == 0x65
hda: DMA interrupt recovery
hda: lost interrupt
 unknown partition table
hdb: max request size: 128KiB
hdb: 4177920 sectors (2139 MB), CHS=4144/16/63

I also tested changeset 7920, and that's where I notice the handleResponse() error:

7920:

M5 compiled Feb 10 2011 14:49:49
M5 revision 39c86a8306d2+ 7920+ default
M5 started Feb 10 2011 14:53:38
M5 executing on sherpa05
command line: ./build/ALPHA_FS_MOESI_CMP_directory/m5.opt ./configs/example/ruby_fs.py -n 4 --topology Crossbar
Global frequency set at 1000000000000 ticks per second
info: kernel located at: /home/musleh/M5/m5_system_2.0b3/binaries/vmlinux
Listening for system connection on port 3456
      0: system.tsunami.io.rtc: Real-time clock set to Thu Jan  1 00:00:00 2009
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
0: system.remote_gdb.listener: listening for remote gdb #1 on port 7001
0: system.remote_gdb.listener: listening for remote gdb #2 on port 7002
0: system.remote_gdb.listener: listening for remote gdb #3 on port 7003
**** REAL SIMULATION ****
info: Entering event queue @ 0.  Starting simulation...
info: Launching CPU 1 @ 835461000
info: Launching CPU 2 @ 846156000
info: Launching CPU 3 @ 856768000
warn: Prefetch instrutions is Alpha do not do anything
For more information see: http://www.m5sim.org/warn/3e0eccba
1128875500: system.terminal: attach terminal 0
warn: Prefetch instrutions is Alpha do not do anything
For more information see: http://www.m5sim.org/warn/3e0eccba
m5.opt: build/ALPHA_FS_MOESI_CMP_directory/mem/packet.hh:590: void
Packet::makeResponse(): Assertion `needsResponse()' failed.
Program aborted at cycle 36235566500
Aborted
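
This matches the duplicate request-to-response conversion Brad described earlier in the thread (quoted below): converting a packet that has already become a response would trip the needsResponse() check. Here is a toy Python model of that failure mode. ToyPacket and its methods are hypothetical and only imitate the shape of the assertion, not the actual M5 Packet code:

```python
class ToyPacket:
    """Toy stand-in for a memory packet; not the real M5 Packet class."""

    def __init__(self) -> None:
        self.is_request = True

    def needs_response(self) -> bool:
        # In this toy model, only a request still needs a response.
        return self.is_request

    def make_response(self) -> None:
        # Same shape as the failing assertion: needs_response() must hold.
        if not self.needs_response():
            raise AssertionError("needsResponse() failed")
        self.is_request = False


pkt = ToyPacket()
pkt.make_response()      # first conversion succeeds
try:
    pkt.make_response()  # duplicate conversion trips the check
    second_call_failed = False
except AssertionError:
    second_call_failed = True
print(second_call_failed)
```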

Note that I have not tested changesets 7911-7918.
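
The untested range could be narrowed with a binary search over changeset numbers, which is the idea behind Mercurial's `hg bisect` command. Below is a minimal Python sketch of that search. The names first_bad and is_bad are hypothetical, and the callback stands in for building and booting a given changeset. Since more than one failure shows up in this range, the callback would need to check for one specific symptom:

```python
from typing import Callable


def first_bad(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Return the first failing revision, given good < bad.

    Assumes every revision before the first failing one passes and every
    revision from it onward fails (a single transition point).
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


# Hypothetical stand-in for "build and boot this changeset, check for failure";
# 7906 is the first failing changeset reported above.
print(first_bad(7842, 7939, lambda rev: rev >= 7906))
```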

I have tested the MOESI_CMP_directory protocol on all of these with
m5.opt. I have tested MESI_CMP_directory on some of them and
got the same messages.

This is my command line:

./build/ALPHA_FS_MOESI_CMP_directory/m5.opt ./configs/example/ruby_fs.py -n 4 --topology Crossbar

The error appears about 15 minutes into booting the kernel. Note that
it takes a while to reach the point where the I/O schedulers are registered:

io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)

In all cases where the dma_expiry error occurs (which excludes
changesets 7906-7908), the last thing that appears is this:

ide0 at 0x8410-0x8417,0x8422 on irq 31
ide1 at 0x8418-0x841f,0x8426 on irq 31
ide_generic: please use "probe_mask=0x3f" module parameter for probing
all legacy ISA IDE ports
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 101808 sectors (52 MB), CHS=101/16/63
 hda:<4>hda: dma_timer_expiry: dma status == 0x65
hda: DMA interrupt recovery
hda: lost interrupt
 unknown partition table
hdb: max request size: 128KiB
hdb: 4177920 sectors (2139 MB), CHS=4144/16/63

Is it possible to generate a trace for Ruby in M5 the way it is for
Ruby in GEMS, along the lines of this page?

http://www.cs.wisc.edu/gems/doc/gems-wiki/moin.cgi/How_do_I_understand_a_Protocol

Let me know if you need any more information.

Malek

On Thu, Feb 10, 2011 at 4:43 PM, Beckmann, Brad <[email protected]> wrote:
> Hi Malek,
>
> Hmm...I have never seen that type of error before.  As you mentioned, I don't 
> think any of my recent patches changed how DMA is executed for ALPHA_FS.
>
> How long does it take for you to encounter the error?  It would be great if 
> you could tell me how I can reproduce the error.  I would like to look at 
> this in more detail and get a protocol trace of what is going on.
>
> Thanks,
>
> Brad
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Malek Musleh
>> Sent: Thursday, February 10, 2011 5:05 AM
>> To: M5 Developer List
>> Subject: Re: [m5-dev] Ruby FS Fails with recent Changesets
>>
>> Hi Brad,
>>
>> I tested your latest changeset, and it seems that it 'solves' the
>> handleResponse error I was getting when running 3 or more cores, but the
>> dma_expiry error is still there.
>>
>> As a result, the error is now consistent, no matter how many cores I try
>> to run with:
>>
>> For more information see: http://www.m5sim.org/warn/3e0eccba
>> panic: Inconsistent DMA transfer state: dmaState = 2 devState = 1  @ cycle
>> 62411238889001
>> [doDmaTransfer:build/ALPHA_FS_MOESI_CMP_directory/dev/ide_disk.cc,
>> line 323] Memory Usage: 382600 KBytes
>>
>> ------------------------- M5 Terminal -------------------
>> hda: max request size: 128KiB
>> hda: 101808 sectors (52 MB), CHS=101/16/63
>>  hda:<4>hda: dma_timer_expiry: dma status == 0x65
>> hda: DMA interrupt recovery
>> hda: lost interrupt
>>  unknown partition table
>> hdb: max request size: 128KiB
>> hdb: 4177920 sectors (2139 MB), CHS=4144/16/63
>>  hdb:<4>hdb: dma_timer_expiry: dma status == 0x65
>> hdb: DMA interrupt recovery
>> hdb: lost interrupt
>>
>> The panic error seems to suggest an inconsistent DMA state, so I tried
>> reverting to an older changeset (before DMA changes were pushed out)
>> such as 7936, and even 7930 but no such luck.
>>
>> The changeset that I know works from last week or so is changeset 7842.
>> The changeset summaries between 7842 and 7930 seem to indicate
>> a lot of changes 'unrelated' to the DMA, such as O3, InOrderCPU, and x86
>> changes. That being said, I did not do a diff on those intermediate
>> changesets to check whether a related file was slightly modified in the
>> process.
>>
>> I might be able to spend some more time trying changesets until I narrow down
>> which one it's coming from, but maybe the new panic message might give
>> you some indication on how to fix it?
>>
>> (I think the panic message appeared now and not before because I let the
>> simulation terminate itself when running overnight, as opposed to killing
>> it once I saw the dma_expiry message on the M5 Terminal.)
>>
>> Malek
>>
>> On Wed, Feb 9, 2011 at 7:00 PM, Beckmann, Brad
>> <[email protected]> wrote:
>> > Hi Malek,
>> >
>> > Yes, thanks for letting us know.  I'm pretty sure I know what the problem
>> > is.  Previously, if an SC operation failed, the RubyPort would convert the
>> > request packet to a response packet, bypass writing the functional view of
>> > memory, and pass it back up to the CPU.  In my most recent patches I
>> > generalized the mechanism that converts request packets to response
>> > packets and avoids writing functional memory.  However, I forgot to remove
>> > the duplicate request to response conversion for failed SC
>> > requests.  Therefore, I bet you are encountering that assertion error on that
>> > duplicate call.  It should be a simple one line change that fixes your
>> > problem.  I'll push it momentarily and it would be great if you could confirm
>> > that my change does indeed fix your problem.
>> >
>> > Brad
>> >
>> >
>> >
>> >> -----Original Message-----
>> >> From: [email protected] [mailto:m5-dev-[email protected]] On
>> >> Behalf Of Gabe Black
>> >> Sent: Wednesday, February 09, 2011 3:54 PM
>> >> To: M5 Developer List
>> >> Subject: Re: [m5-dev] Ruby FS Fails with recent Changesets
>> >>
>> >> Thanks for letting us know. If it wouldn't be too much trouble, could
>> >> you please try some other changesets near the one that isn't working
>> >> and try to determine which one specifically broke things? A bunch of
>> >> changes went in recently so it would be helpful to narrow things
>> >> down. I'm not very involved with Ruby right now personally, but I
>> >> assume that would be useful information for the people that are.
>> >>
>> >> Gabe
>> >>
>> >> On 02/09/11 14:51, Malek Musleh wrote:
>> >> > Hello,
>> >> >
>> >> > I first started using the Ruby Model in M5  about a week or so ago,
>> >> > and was able to boot in FS mode (up to 64 cores once applying the
>> >> > BigTsunami patches).
>> >> >
>> >> > In order to keep up with the changes in the Ruby code, I have
>> >> > started fetching recent updates from the devrepo.
>> >> >
>> >> > However, in fetching the updates to the recent changesets (from the
>> >> > last 2 days) Ruby FS does not boot. I tried both MESI_CMP_directory
>> >> > and MOESI_CMP_directory.
>> >> >
>> >> > If running 2 cores or less I get this at the terminal screen after
>> >> > letting it run for some time:
>> >> >
>> >> > hda: M5 IDE Disk, ATA DISK drive
>> >> > hdb: M5 IDE Disk, ATA DISK drive
>> >> > hda: UDMA/33 mode selected
>> >> > hdb: UDMA/33 mode selected
>> >> > ide0 at 0x8410-0x8417,0x8422 on irq 31
>> >> > ide1 at 0x8418-0x841f,0x8426 on irq 31
>> >> > ide_generic: please use "probe_mask=0x3f" module parameter for
>> >> > probing all legacy ISA IDE ports
>> >> > ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
>> >> > ide3 at 0x170-0x177,0x376 on irq 15
>> >> > hda: max request size: 128KiB
>> >> > hda: 101808 sectors (52 MB), CHS=101/16/63
>> >> >  hda:<4>hda: dma_timer_expiry: dma status == 0x65
>> >> > <------------------------------------------------------- problem
>> >> >
>> >> >
>> >> > When running 3 or more cores, I get the following assertion failure:
>> >> >
>> >> >
>> >> > info: kernel located at:
>> >> > /home/musleh/M5/m5_system_2.0b3/binaries/vmlinux
>> >> > Listening for system connection on port 3456
>> >> >       0: system.tsunami.io.rtc: Real-time clock set to Thu Jan  1
>> >> > 00:00:00 2009
>> >> > 0: system.remote_gdb.listener: listening for remote gdb #0 on port
>> >> > 7000
>> >> > 0: system.remote_gdb.listener: listening for remote gdb #1 on port
>> >> > 7001
>> >> > 0: system.remote_gdb.listener: listening for remote gdb #2 on port
>> >> > 7002
>> >> > 0: system.remote_gdb.listener: listening for remote gdb #3 on port
>> >> > 7003
>> >> > **** REAL SIMULATION ****
>> >> > info: Entering event queue @ 0.  Starting simulation...
>> >> > info: Launching CPU 1 @ 834794000
>> >> > info: Launching CPU 2 @ 845489000
>> >> > info: Launching CPU 3 @ 856101000
>> >> > m5.opt: build/ALPHA_FS_MESI_CMP_directory/mem/packet.hh:590: void
>> >> > Packet::makeResponse(): Assertion `needsResponse()' failed.
>> >> > Program aborted at cycle 977160000
>> >> > Aborted
>> >> >
>> >> > The top of the tree is this last changeset:
>> >> >
>> >> > changeset:   7939:215c8be67063
>> >> > tag:         tip
>> >> > user:        Brad Beckmann <[email protected]>
>> >> > date:        Tue Feb 08 18:07:54 2011 -0800
>> >> > summary:     regess: protocol regression tester updates
>> >> >
>> >> > I am not sure whether those whom it concerns are aware of it, or
>> >> > whether an updated changeset is already in the works for this, but
>> >> > I figured I would bring it to your attention.
>> >> >
>> >> > Malek
>> >> > _______________________________________________
>> >> > m5-dev mailing list
>> >> > [email protected]
>> >> > http://m5sim.org/mailman/listinfo/m5-dev
