OK, I think I found the bug. I used "continue" and Ctrl+C several
times to see whether it was stuck in a particular function. The
backtrace shows:
#0 0x00000000004dfee7 in __gnu_cxx::hashtable<std::pair<unsigned long
const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
long>, std::_Select1st<std::pair<unsigned long const,
X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
std::allocator<X86ISA::TlbEntry> >::_M_bkt_num_key (this=0x28b8970,
__key=@0x21d9a7c8, __n=50331653) at
/usr/include/c++/4.4/backward/hashtable.h:590
#1 0x00000000004dfff9 in __gnu_cxx::hashtable<std::pair<unsigned long
const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
long>, std::_Select1st<std::pair<unsigned long const,
X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
std::allocator<X86ISA::TlbEntry> >::_M_bkt_num (this=0x28b8970,
__obj=..., __n=50331653) at
/usr/include/c++/4.4/backward/hashtable.h:594
#2 0x00000000004df9c8 in __gnu_cxx::hashtable<std::pair<unsigned long
const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
long>, std::_Select1st<std::pair<unsigned long const,
X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
std::allocator<X86ISA::TlbEntry> >::resize (this=0x28b8970,
__num_elements_hint=25165844) at
/usr/include/c++/4.4/backward/hashtable.h:1001
#3 0x00000000004df100 in __gnu_cxx::hashtable<std::pair<unsigned long
const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
long>, std::_Select1st<std::pair<unsigned long const,
X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
std::allocator<X86ISA::TlbEntry> >::find_or_insert (this=0x28b8970,
__obj=...) at /usr/include/c++/4.4/backward/hashtable.h:789
#4 0x00000000004deaca in __gnu_cxx::hash_map<unsigned long,
X86ISA::TlbEntry, __gnu_cxx::hash<unsigned long>,
std::equal_to<unsigned long>, std::allocator<X86ISA::TlbEntry>
>::operator[] (this=0x28b8970,
__key=@0x7fffffffba80) at /usr/include/c++/4.4/ext/hash_map:217
#5 0x00000000004daa68 in PageTable::map (this=0x28b8970,
vaddr=47015569313792, paddr=103079288832,
size=5548434767986339840, clobber=false) at
build/X86/mem/page_table.cc:82
#6 0x000000000074b9c8 in Process::allocateMem (this=0x30be640,
vaddr=46912496128000,
size=5548434871059525632, clobber=false) at
build/X86/sim/process.cc:332
#7 0x00000000007aba21 in mmapFunc<X86Linux64> (desc=0x2052fb8, num=9,
p=0x30be640, tc=0x3331210)
at build/X86/sim/syscall_emul.hh:1069
#8 0x000000000073ca11 in SyscallDesc::doSyscall (this=0x2052fb8,
callnum=9, process=0x30be640,
tc=0x3331210) at build/X86/sim/syscall_emul.cc:69
#9 0x00000000007516a0 in LiveProcess::syscall (this=0x30be640,
callnum=9, tc=0x3331210)
at build/X86/sim/process.cc:590
#10 0x0000000000c10ce3 in SimpleThread::syscall (this=0x33305d0, callnum=9)
at build/X86/cpu/simple_thread.hh:384
As you can see, the problem is in the mmapFunc<X86Linux64> syscall,
which allocates memory through Process::allocateMem.
That is my understanding.
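To illustrate why this call would eat memory: frames #5 and #6 show a size argument of about 5.5e18 bytes, far beyond the roughly 4 GB tonto uses on a real machine. If the page table stores one entry per page, which the operator[] call from PageTable::map in frame #4 suggests, such a request keeps inserting entries (and resizing the hash table) until memory runs out. Below is a minimal C++ sketch of that behaviour, not the gem5 source. ToyPageTable, kPageBytes (assumed to be 4 KiB), pagesNeeded and the maxPages guard are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical page size; 4 KiB is an assumption made for this sketch.
constexpr std::uint64_t kPageBytes = 4096;

// Number of pages needed to cover `size` bytes, rounded up.
constexpr std::uint64_t pagesNeeded(std::uint64_t size)
{
    return size / kPageBytes + (size % kPageBytes != 0 ? 1 : 0);
}

// Toy stand-in for a page table that stores one entry per mapped page.
class ToyPageTable
{
  public:
    // Maps [vaddr, vaddr + size) onto [paddr, paddr + size) page by page.
    // Returns false, without mapping anything, if more than maxPages
    // entries would be required.
    bool map(std::uint64_t vaddr, std::uint64_t paddr, std::uint64_t size,
             std::uint64_t maxPages)
    {
        assert(vaddr % kPageBytes == 0 && paddr % kPageBytes == 0);
        const std::uint64_t pages = pagesNeeded(size);
        if (pages > maxPages)
            return false;  // refuse instead of growing the table unboundedly
        for (std::uint64_t i = 0; i < pages; ++i)
            table_[vaddr + i * kPageBytes] = paddr + i * kPageBytes;
        return true;
    }

    // Number of page entries currently stored.
    std::size_t entries() const { return table_.size(); }

  private:
    std::unordered_map<std::uint64_t, std::uint64_t> table_;
};
```

With a guard like maxPages, a request the size of the one in the backtrace (size=5548434767986339840) would be rejected up front instead of looping. Whether gem5 should handle it this way, or whether the real fix is in how the mmap arguments are read, is a separate question.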
On 4/27/12, Mahmood Naderan <[email protected]> wrote:
Is this useful?
339051500: system.cpu + A0 T0 : 0x83d48d.4 : CALL_NEAR_I : wrip ,
t7, t1 : IntAlu :
339052000: system.cpu.icache: ReadReq (ifetch) 452f90 hit
339052000: system.cpu + A0 T0 : 0x852f90 : mov r10, rcx
339052000: system.cpu + A0 T0 : 0x852f90.0 : MOV_R_R : mov r10,
r10, rcx : IntAlu : D=0x0000000000000022
339052500: system.cpu.icache: ReadReq (ifetch) 452f90 hit
339052500: system.cpu + A0 T0 : 0x852f93 : mov eax, 0x9
339052500: system.cpu + A0 T0 : 0x852f93.0 : MOV_R_I : limm eax,
0x9 : IntAlu : D=0x0000000000000009
339053000: system.cpu.icache: ReadReq (ifetch) 452f98 hit
^C
Program received signal SIGINT, Interrupt.
0x00000000004e0f90 in
std::__fill_n_a<__gnu_cxx::_Hashtable_node<std::pair<unsigned long
const, X86ISA::TlbEntry> >**, unsigned long,
__gnu_cxx::_Hashtable_node<std::pair<unsigned long const,
X86ISA::TlbEntry> >*> (__first=0x7fff70017000, __n=4065295,
__value=@0x7fffffffb8d0)
at /usr/include/c++/4.4/bits/stl_algobase.h:758
758 *__first = __tmp;
(gdb) ^CQuit
(gdb)
On 4/27/12, Steve Reinhardt <[email protected]> wrote:
Perhaps you could fire off the run under gdb, and use the --debug-break
flag to drop into gdb at the tick where it seems to stop running. If the
simulation stops and memory blows up, it's almost like you're stuck in some
subtle infinite loop with a memory allocation in it. (You might have to
continue just a little past there and hit ctrl-c before it dies to catch it
in the middle of this loop.)
On Fri, Apr 27, 2012 at 11:29 AM, Mahmood Naderan
<[email protected]> wrote:
I searched for something similar (stopping the simulation when it
reaches a specific memory usage, so that it is not killed) but didn't
find anything. Do you know of a way?
I also attached gdb. It doesn't show anything useful because the
process ends up getting killed.
On 4/27/12, Gabe Black <[email protected]> wrote:
Valgrind should tell you where the leaked memory was allocated. You may
have to give it a command line option for that, or stop it before it
gets itself killed.
Gabe
On 04/27/12 11:10, Steve Reinhardt wrote:
Can you attach gdb when it does this, see where it's at, and maybe
step through the code a bit to see what it's doing?
On Fri, Apr 27, 2012 at 10:54 AM, Mahmood Naderan
<[email protected]> wrote:
That was a guess. As I said, I turned on debugging to see when it
starts eating memory. As you can see, the last message it prints is:
339069000: system.cpu + A0 T0 : 0x852f93.0 : MOV_R_I : limm eax,
0x9 : IntAlu : D=0x0000000000000009
339069500: system.cpu.icache: set be: moving blk 452f80 to MRU
339069500: system.cpu.icache: ReadReq (ifetch) 452f98 hit
After that no more messages are printed, and top shows the memory
usage climbing steadily until it consumes all memory.
On 4/27/12, Nilay Vaish <[email protected]> wrote:
> How do you know the instruction at which the memory starts leaking? What
> should we conclude from the instruction trace in your mail? I am unable to
> arrive at any conclusion from the valgrind report that you had attached.
> Apart from the info on uninitialized values, I did not find any useful
> output produced by valgrind.
>
> --
> Nilay
>
> On Fri, 27 Apr 2012, Mahmood Naderan wrote:
>
>> tonto with the test input uses about 4 GB and runs for about 2 seconds
>> on a real machine.
>>
>> I also used the test input with gem5. However again after tick
>> 300000000, all the 30GB memory is used and then gem5 is killed. The
>> same behaviour with ref input...
>>
>> I ran the following command:
>> valgrind --tool=memcheck --leak-check=full --track-origins=yes
>> --suppressions=../util/valgrind-suppressions ../build/X86/m5.debug
>> --debug-flags=Cache,ExecAll,Bus,CacheRepl,Context
>> --trace-start=339050000 ../configs/example/se.py -c
>> tonto_base.amd64-m64-gcc44-nn --cpu-type=detailed -F 5000000 --maxtick
>> 10000000 --caches --l2cache --prog-interval=100000
>>
>>
>> I also attach the report again. At the instruction where the memory
>> leak begins, you can see:
>> ...
>> 339066000: system.cpu + A0 T0 : 0x83d48d : call 0x15afe
>> 339066000: system.cpu + A0 T0 : 0x83d48d.0 : CALL_NEAR_I : limm
>> t1, 0x15afe : IntAlu : D=0x0000000000015afe
>> 339066500: system.cpu + A0 T0 : 0x83d48d.1 : CALL_NEAR_I : rdip
>> t7, %ctrl153, : IntAlu : D=0x000000000083d492
>> 339067000: system.cpu.dcache: set 9a: moving blk 5aa680 to MRU
>> 339067000: system.cpu.dcache: WriteReq 5aa6b8 hit
>> 339067000: system.cpu + A0 T0 : 0x83d48d.2 : CALL_NEAR_I : st t7,
>> SS:[rsp + 0xfffffffffffffff8] : MemWrite : D=0x000000000083d492
>> A=0x7fffffffe6b8
>> 339067500: system.cpu + A0 T0 : 0x83d48d.3 : CALL_NEAR_I : subi
>> rsp, rsp, 0x8 : IntAlu : D=0x00007fffffffe6b8
>> 339068000: system.cpu + A0 T0 : 0x83d48d.4 : CALL_NEAR_I : wrip ,
>> t7, t1 : IntAlu :
>> 339068500: system.cpu.icache: set be: moving blk 452f80 to MRU
>> 339068500: system.cpu.icache: ReadReq (ifetch) 452f90 hit
>> 339068500: system.cpu + A0 T0 : 0x852f90 : mov r10, rcx
>> 339068500: system.cpu + A0 T0 : 0x852f90.0 : MOV_R_R : mov r10,
>> r10, rcx : IntAlu : D=0x0000000000000022
>> 339069000: system.cpu.icache: set be: moving blk 452f80 to MRU
>> 339069000: system.cpu.icache: ReadReq (ifetch) 452f90 hit
>> 339069000: system.cpu + A0 T0 : 0x852f93 : mov eax, 0x9
>> 339069000: system.cpu + A0 T0 : 0x852f93.0 : MOV_R_I : limm eax,
>> 0x9 : IntAlu : D=0x0000000000000009
>> 339069500: system.cpu.icache: set be: moving blk 452f80 to MRU
>> 339069500: system.cpu.icache: ReadReq (ifetch) 452f98 hit
>>
>>
>> What is your opinion then?
>> Regards,
>>
>> On 4/27/12, Steve Reinhardt <[email protected]> wrote:
>>> Also, if you do run valgrind, use the util/valgrind-suppressions file to
>>> suppress spurious reports. Read the valgrind docs to see how this
>>> works.
>>>
>>> Steve
>>>
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>
--
// Naderan *Mahmood;