Re: [osv-dev] Re: OOM query

2020-03-24 Thread Waldek Kozaczuk
I remember it is an Erlang app. Is there any comparable open source
app/example I could run to re-create your issue?

On Tue, Mar 24, 2020 at 17:49 Waldek Kozaczuk wrote:

> Also, besides the main app, what other OSv modules do you have in the
> image? Httpserver-api, cloud-init, ?
>
> On Tue, Mar 24, 2020 at 17:44 Waldek Kozaczuk 
> wrote:
>
>>
>>
>> On Tue, Mar 24, 2020 at 17:34 Rick Payne  wrote:
>>
>>>
>>> I backed out the original patch, applied the other two. Do I need the
>>> first one still?
>>
>>
>> Nope. You should be fine.
>>
>>> We're not on master, but we're pretty close. Last
>>> synced on March 3rd, commit ref 92eb26f3a645
>>>
>>> scripts/build check runs fine:
>>>
>>> OK (131 tests run, 274.753 s)
>>
>> Well there may be a bug in my patches. It looks my patches make your
>> situation even worse. Did you try to connect with gdb to get better stack
>> trace? Ideally ‘osv info threads’
>>
>>>
>>>
>>> Rick
>>>
>>> On Tue, 2020-03-24 at 17:15 -0400, Waldek Kozaczuk wrote:
>>> > Is it with the exact same code as on master with the latest 2 patches
>>> > I sent applied ? Does ‘scripts/build check’ pass for you?
>>> >
>>> > On Tue, Mar 24, 2020 at 16:56 Rick Payne 
>>> > wrote:
>>> > > I tried the patches, but its crashing almost instantly...
>>> > >
>>> > > page fault outside application, addr: 0x56c0
>>> > > [registers]
>>> > > RIP: 0x403edd23 
>>> > > RFL:
>>> > > 0x00010206  CS:  0x0008  SS:
>>> > > 0x0010
>>> > > RAX: 0x56c0  RBX: 0x200056c00040  RCX:
>>> > > 0x004c  RDX: 0x0008
>>> > > RSI: 0x004c  RDI: 0x200056c00040  RBP:
>>> > > 0x200041501740  R8:  0x
>>> > > R9:  0x5e7a7333  R10: 0x  R11:
>>> > > 0x  R12: 0x004c
>>> > > R13: 0x5e7a7333  R14: 0x  R15:
>>> > > 0x  RSP: 0x2000415016f8
>>> > > Aborted
>>> > >
>>> > > [backtrace]
>>> > > 0x40343779 
>>> > > 0x4034534d >> > > exception_frame*)+397>
>>> > > 0x403a667b 
>>> > > 0x403a54c6 
>>> > > 0x1174c2b0 
>>> > > 0x 
>>> > >
>>> > > (gdb) osv heap
>>> > > 0x8e9ad000 0x22a53000
>>> > > 0x8e9a2000 0x4000
>>> > >
>>> > > Rick
>>> > >
>>> > >
>>> > >
>>> > > On Mon, 2020-03-23 at 22:06 -0700, Waldek Kozaczuk wrote:
>>> > > > I have sent a more complete patch that should also address
>>> > > > fragmentation issue with requests >= 4K and < 2MB.
>>> > > >
>>> > > > On Monday, March 23, 2020 at 6:12:51 PM UTC-4, Waldek Kozaczuk
>>> > > wrote:
>>> > > > > I have just sent a new patch to the mailing list. I am hoping
>>> > > it
>>> > > > > will address the OOM crash if my theory of heavy memory
>>> > > > > fragmentation is right. It would be nice if Nadav could review
>>> > > it.
>>> > > > >
>>> > > > > Regardless if you have another crash in production and are able
>>> > > to
>>> > > > > connect with gdb, could you run 'osv heap' - it should show
>>> > > > > free_page_ranges. If memory is heavily fragmented we should see
>>> > > a
>>> > > > > long list.
>>> > > > >
>>> > > > > It would be nice to recreate that load in dev env and capture
>>> > > the
>>> > > > > memory trace data (BWT you do not need to enable backtrace to
>>> > > have
>>> > > > > enough useful information). It would help us better understand
>>> > > how
>>> > > > > memory is allocated by the app. I saw you send me one trace but
>>> > > it
>>> > > > > does not seem to be revealing anything interesting.
>>> > > > >
>>> > > > > Waldek
>>> > > > >
>>> > > > > On Monday, March 23, 2020 at 1:19:18 AM UTC-4, rickp wrote:
>>> > > > > > On Sun, 2020-03-22 at 22:08 -0700, Waldek Kozaczuk wrote:
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Monday, March 23, 2020 at 12:36:52 AM UTC-4, rickp
>>> > > wrote:
>>> > > > > > > > Looks to me like its trying to allocate 40MB but the
>>> > > > > > available
>>> > > > > > > > memory
>>> > > > > > > > is 10GB, surely? 10933128KB is 10,933MB
>>> > > > > > > >
>>> > > > > > >
>>> > > > > > > I misread the number - forgot about 1K.
>>> > > > > > >
>>> > > > > > > Any chance you could run the app outside of production with
>>> > > > > > memory
>>> > > > > > > tracing enabled -
>>> > > > > > >
>>> > > > > >
>>> > >
>>> https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#tracing-memory-allocations
>>> > > > > >
>>> > > > > > >  (without --trace-backtrace) for while? And then we can
>>> > > have a
>>> > > > > > better
>>> > > > > > > sense of what kind of allocations it makes. The output of
>>> > > > > > trace
>>> > > > > > > memory-analyzer would be really helpful.
>>> > > > > >
>>> > > > > > I can certainly run that local with locally generated
>>> > > workloads,
>>> > > > > > which
>>> > > > > > should be close enough - but we've never managed to trigger
>>> > > the
>>> > > > > > oom
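>>> > > > > > condition that way (other than by really constraining the
>>> > > > > > memory artificially). It should be close enough though -
>>> > > > > > let me see what I can do.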

Re: [osv-dev] Re: OOM query

2020-03-24 Thread Waldek Kozaczuk
Also, besides the main app, what other OSv modules do you have in the
image? httpserver-api, cloud-init, anything else?

On Tue, Mar 24, 2020 at 17:44 Waldek Kozaczuk wrote:

>
>
> On Tue, Mar 24, 2020 at 17:34 Rick Payne  wrote:
>
>>
>> I backed out the original patch, applied the other two. Do I need the
>> first one still?
>
>
> Nope. You should be fine.
>
>> We're not on master, but we're pretty close. Last
>> synced on March 3rd, commit ref 92eb26f3a645
>>
>> scripts/build check runs fine:
>>
>> OK (131 tests run, 274.753 s)
>
> Well there may be a bug in my patches. It looks my patches make your
> situation even worse. Did you try to connect with gdb to get better stack
> trace? Ideally ‘osv info threads’
>
>>
>>
>> Rick
>>
>> On Tue, 2020-03-24 at 17:15 -0400, Waldek Kozaczuk wrote:
>> > Is it with the exact same code as on master with the latest 2 patches
>> > I sent applied ? Does ‘scripts/build check’ pass for you?
>> >
>> > On Tue, Mar 24, 2020 at 16:56 Rick Payne 
>> > wrote:
>> > > I tried the patches, but its crashing almost instantly...
>> > >
>> > > page fault outside application, addr: 0x56c0
>> > > [registers]
>> > > RIP: 0x403edd23 
>> > > RFL:
>> > > 0x00010206  CS:  0x0008  SS:
>> > > 0x0010
>> > > RAX: 0x56c0  RBX: 0x200056c00040  RCX:
>> > > 0x004c  RDX: 0x0008
>> > > RSI: 0x004c  RDI: 0x200056c00040  RBP:
>> > > 0x200041501740  R8:  0x
>> > > R9:  0x5e7a7333  R10: 0x  R11:
>> > > 0x  R12: 0x004c
>> > > R13: 0x5e7a7333  R14: 0x  R15:
>> > > 0x  RSP: 0x2000415016f8
>> > > Aborted
>> > >
>> > > [backtrace]
>> > > 0x40343779 
>> > > 0x4034534d > > > exception_frame*)+397>
>> > > 0x403a667b 
>> > > 0x403a54c6 
>> > > 0x1174c2b0 
>> > > 0x 
>> > >
>> > > (gdb) osv heap
>> > > 0x8e9ad000 0x22a53000
>> > > 0x8e9a2000 0x4000
>> > >
>> > > Rick
>> > >
>> > >
>> > >
>> > > On Mon, 2020-03-23 at 22:06 -0700, Waldek Kozaczuk wrote:
>> > > > I have sent a more complete patch that should also address
>> > > > fragmentation issue with requests >= 4K and < 2MB.
>> > > >
>> > > > On Monday, March 23, 2020 at 6:12:51 PM UTC-4, Waldek Kozaczuk
>> > > wrote:
>> > > > > I have just sent a new patch to the mailing list. I am hoping
>> > > it
>> > > > > will address the OOM crash if my theory of heavy memory
>> > > > > fragmentation is right. It would be nice if Nadav could review
>> > > it.
>> > > > >
>> > > > > Regardless if you have another crash in production and are able
>> > > to
>> > > > > connect with gdb, could you run 'osv heap' - it should show
>> > > > > free_page_ranges. If memory is heavily fragmented we should see
>> > > a
>> > > > > long list.
>> > > > >
>> > > > > It would be nice to recreate that load in dev env and capture
>> > > the
>> > > > > memory trace data (BWT you do not need to enable backtrace to
>> > > have
>> > > > > enough useful information). It would help us better understand
>> > > how
>> > > > > memory is allocated by the app. I saw you send me one trace but
>> > > it
>> > > > > does not seem to be revealing anything interesting.
>> > > > >
>> > > > > Waldek
>> > > > >
>> > > > > On Monday, March 23, 2020 at 1:19:18 AM UTC-4, rickp wrote:
>> > > > > > On Sun, 2020-03-22 at 22:08 -0700, Waldek Kozaczuk wrote:
>> > > > > > >
>> > > > > > >
>> > > > > > > On Monday, March 23, 2020 at 12:36:52 AM UTC-4, rickp
>> > > wrote:
>> > > > > > > > Looks to me like its trying to allocate 40MB but the
>> > > > > > available
>> > > > > > > > memory
>> > > > > > > > is 10GB, surely? 10933128KB is 10,933MB
>> > > > > > > >
>> > > > > > >
>> > > > > > > I misread the number - forgot about 1K.
>> > > > > > >
>> > > > > > > Any chance you could run the app outside of production with
>> > > > > > memory
>> > > > > > > tracing enabled -
>> > > > > > >
>> > > > > >
>> > >
>> https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#tracing-memory-allocations
>> > > > > >
>> > > > > > >  (without --trace-backtrace) for while? And then we can
>> > > have a
>> > > > > > better
>> > > > > > > sense of what kind of allocations it makes. The output of
>> > > > > > trace
>> > > > > > > memory-analyzer would be really helpful.
>> > > > > >
>> > > > > > I can certainly run that local with locally generated
>> > > workloads,
>> > > > > > which
>> > > > > > should be close enough - but we've never managed to trigger
>> > > the
>> > > > > > oom
>> > > > > > condition that way (other than by really constraining the
>> > > memory
>> > > > > > artificially). It should be close enough though - let me see
>> > > what
>> > > > > > I can
>> > > > > > do.
>> > > > > >
>> > > > > > Rick
>> > > > > >
>> > > > > >

Re: [osv-dev] Re: OOM query

2020-03-24 Thread Waldek Kozaczuk
On Tue, Mar 24, 2020 at 17:34 Rick Payne wrote:

>
> I backed out the original patch, applied the other two. Do I need the
> first one still?


Nope. You should be fine.

> We're not on master, but we're pretty close. Last
> synced on March 3rd, commit ref 92eb26f3a645
>
> scripts/build check runs fine:
>
> OK (131 tests run, 274.753 s)

Well, there may be a bug in my patches. It looks like they make your
situation even worse. Did you try to connect with gdb to get a better
stack trace? Ideally, run ‘osv info threads’.

>
>
> Rick
>
> On Tue, 2020-03-24 at 17:15 -0400, Waldek Kozaczuk wrote:
> > Is it with the exact same code as on master with the latest 2 patches
> > I sent applied ? Does ‘scripts/build check’ pass for you?
> >
> > On Tue, Mar 24, 2020 at 16:56 Rick Payne 
> > wrote:
> > > I tried the patches, but its crashing almost instantly...
> > >
> > > page fault outside application, addr: 0x56c0
> > > [registers]
> > > RIP: 0x403edd23 
> > > RFL:
> > > 0x00010206  CS:  0x0008  SS:
> > > 0x0010
> > > RAX: 0x56c0  RBX: 0x200056c00040  RCX:
> > > 0x004c  RDX: 0x0008
> > > RSI: 0x004c  RDI: 0x200056c00040  RBP:
> > > 0x200041501740  R8:  0x
> > > R9:  0x5e7a7333  R10: 0x  R11:
> > > 0x  R12: 0x004c
> > > R13: 0x5e7a7333  R14: 0x  R15:
> > > 0x  RSP: 0x2000415016f8
> > > Aborted
> > >
> > > [backtrace]
> > > 0x40343779 
> > > 0x4034534d  > > exception_frame*)+397>
> > > 0x403a667b 
> > > 0x403a54c6 
> > > 0x1174c2b0 
> > > 0x 
> > >
> > > (gdb) osv heap
> > > 0x8e9ad000 0x22a53000
> > > 0x8e9a2000 0x4000
> > >
> > > Rick
> > >
> > >
> > >
> > > On Mon, 2020-03-23 at 22:06 -0700, Waldek Kozaczuk wrote:
> > > > I have sent a more complete patch that should also address
> > > > fragmentation issue with requests >= 4K and < 2MB.
> > > >
> > > > On Monday, March 23, 2020 at 6:12:51 PM UTC-4, Waldek Kozaczuk
> > > wrote:
> > > > > I have just sent a new patch to the mailing list. I am hoping
> > > it
> > > > > will address the OOM crash if my theory of heavy memory
> > > > > fragmentation is right. It would be nice if Nadav could review
> > > it.
> > > > >
> > > > > Regardless if you have another crash in production and are able
> > > to
> > > > > connect with gdb, could you run 'osv heap' - it should show
> > > > > free_page_ranges. If memory is heavily fragmented we should see
> > > a
> > > > > long list.
> > > > >
> > > > > It would be nice to recreate that load in dev env and capture
> > > the
> > > > > memory trace data (BWT you do not need to enable backtrace to
> > > have
> > > > > enough useful information). It would help us better understand
> > > how
> > > > > memory is allocated by the app. I saw you send me one trace but
> > > it
> > > > > does not seem to be revealing anything interesting.
> > > > >
> > > > > Waldek
> > > > >
> > > > > On Monday, March 23, 2020 at 1:19:18 AM UTC-4, rickp wrote:
> > > > > > On Sun, 2020-03-22 at 22:08 -0700, Waldek Kozaczuk wrote:
> > > > > > >
> > > > > > >
> > > > > > > On Monday, March 23, 2020 at 12:36:52 AM UTC-4, rickp
> > > wrote:
> > > > > > > > Looks to me like its trying to allocate 40MB but the
> > > > > > available
> > > > > > > > memory
> > > > > > > > is 10GB, surely? 10933128KB is 10,933MB
> > > > > > > >
> > > > > > >
> > > > > > > I misread the number - forgot about 1K.
> > > > > > >
> > > > > > > Any chance you could run the app outside of production with
> > > > > > memory
> > > > > > > tracing enabled -
> > > > > > >
> > > > > >
> > >
> https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#tracing-memory-allocations
> > > > > >
> > > > > > >  (without --trace-backtrace) for while? And then we can
> > > have a
> > > > > > better
> > > > > > > sense of what kind of allocations it makes. The output of
> > > > > > trace
> > > > > > > memory-analyzer would be really helpful.
> > > > > >
> > > > > > I can certainly run that local with locally generated
> > > workloads,
> > > > > > which
> > > > > > should be close enough - but we've never managed to trigger
> > > the
> > > > > > oom
> > > > > > condition that way (other than by really constraining the
> > > memory
> > > > > > artificially). It should be close enough though - let me see
> > > what
> > > > > > I can
> > > > > > do.
> > > > > >
> > > > > > Rick
> > > > > >
> > > > > >
> > > >
> > > > --
> > > > You received this message because you are subscribed to the
> > > Google
> > > > Groups "OSv Development" group.
> > > > To unsubscribe from this group and stop receiving emails from it,
> > > > send an email to osv-dev+unsubscr...@googlegroups.com.
> > > > To view this discussion on the web visit
> > > >
> > >
> https://groups.google.com/d/msgid/osv-dev/e88bb103-ca6f-4f12-b944-

Re: [osv-dev] Re: OOM query

2020-03-24 Thread Rick Payne


I backed out the original patch, applied the other two. Do I need the
first one still? We're not on master, but we're pretty close. Last
synced on March 3rd, commit ref 92eb26f3a645

scripts/build check runs fine:

OK (131 tests run, 274.753 s)

Rick

On Tue, 2020-03-24 at 17:15 -0400, Waldek Kozaczuk wrote:
> Is it with the exact same code as on master with the latest 2 patches
> I sent applied ? Does ‘scripts/build check’ pass for you?
> 
> On Tue, Mar 24, 2020 at 16:56 Rick Payne 
> wrote:
> > I tried the patches, but its crashing almost instantly...
> > 
> > page fault outside application, addr: 0x56c0
> > [registers]
> > RIP: 0x403edd23 
> > RFL:
> > 0x00010206  CS:  0x0008  SS: 
> > 0x0010
> > RAX: 0x56c0  RBX: 0x200056c00040  RCX:
> > 0x004c  RDX: 0x0008
> > RSI: 0x004c  RDI: 0x200056c00040  RBP:
> > 0x200041501740  R8:  0x
> > R9:  0x5e7a7333  R10: 0x  R11:
> > 0x  R12: 0x004c
> > R13: 0x5e7a7333  R14: 0x  R15:
> > 0x  RSP: 0x2000415016f8
> > Aborted
> > 
> > [backtrace]
> > 0x40343779 
> > 0x4034534d  > exception_frame*)+397>
> > 0x403a667b 
> > 0x403a54c6 
> > 0x1174c2b0 
> > 0x 
> > 
> > (gdb) osv heap
> > 0x8e9ad000 0x22a53000
> > 0x8e9a2000 0x4000
> > 
> > Rick
> > 
> > 
> > 
> > On Mon, 2020-03-23 at 22:06 -0700, Waldek Kozaczuk wrote:
> > > I have sent a more complete patch that should also address
> > > fragmentation issue with requests >= 4K and < 2MB.
> > > 
> > > On Monday, March 23, 2020 at 6:12:51 PM UTC-4, Waldek Kozaczuk
> > wrote:
> > > > I have just sent a new patch to the mailing list. I am hoping
> > it
> > > > will address the OOM crash if my theory of heavy memory
> > > > fragmentation is right. It would be nice if Nadav could review
> > it.
> > > > 
> > > > Regardless if you have another crash in production and are able
> > to
> > > > connect with gdb, could you run 'osv heap' - it should show
> > > > free_page_ranges. If memory is heavily fragmented we should see
> > a
> > > > long list.
> > > > 
> > > > It would be nice to recreate that load in dev env and capture
> > the
> > > > memory trace data (BWT you do not need to enable backtrace to
> > have
> > > > enough useful information). It would help us better understand
> > how
> > > > memory is allocated by the app. I saw you send me one trace but
> > it
> > > > does not seem to be revealing anything interesting.
> > > > 
> > > > Waldek 
> > > > 
> > > > On Monday, March 23, 2020 at 1:19:18 AM UTC-4, rickp wrote:
> > > > > On Sun, 2020-03-22 at 22:08 -0700, Waldek Kozaczuk wrote: 
> > > > > > 
> > > > > > 
> > > > > > On Monday, March 23, 2020 at 12:36:52 AM UTC-4, rickp
> > wrote: 
> > > > > > > Looks to me like its trying to allocate 40MB but the
> > > > > available 
> > > > > > > memory 
> > > > > > > is 10GB, surely? 10933128KB is 10,933MB 
> > > > > > > 
> > > > > > 
> > > > > > I misread the number - forgot about 1K. 
> > > > > > 
> > > > > > Any chance you could run the app outside of production with
> > > > > memory 
> > > > > > tracing enabled - 
> > > > > > 
> > > > > 
> > https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#tracing-memory-allocations
> > > > >  
> > > > > >  (without --trace-backtrace) for while? And then we can
> > have a
> > > > > better 
> > > > > > sense of what kind of allocations it makes. The output of
> > > > > trace 
> > > > > > memory-analyzer would be really helpful. 
> > > > > 
> > > > > I can certainly run that local with locally generated
> > workloads,
> > > > > which 
> > > > > should be close enough - but we've never managed to trigger
> > the
> > > > > oom 
> > > > > condition that way (other than by really constraining the
> > memory 
> > > > > artificially). It should be close enough though - let me see
> > what
> > > > > I can 
> > > > > do. 
> > > > > 
> > > > > Rick 
> > > > > 
> > > > > 

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/75323fbb15af066032b00c4d0bcfce0f10b2c9af.camel%40rossfell.co.uk.


Re: [osv-dev] Re: OOM query

2020-03-24 Thread Waldek Kozaczuk
Is that with exactly the same code as master, plus the latest two patches
I sent? Does ‘scripts/build check’ pass for you?

On Tue, Mar 24, 2020 at 16:56 Rick Payne wrote:

>
> I tried the patches, but its crashing almost instantly...
>
> page fault outside application, addr: 0x56c0
> [registers]
> RIP: 0x403edd23 
> RFL:
> 0x00010206  CS:  0x0008  SS:  0x0010
> RAX: 0x56c0  RBX: 0x200056c00040  RCX:
> 0x004c  RDX: 0x0008
> RSI: 0x004c  RDI: 0x200056c00040  RBP:
> 0x200041501740  R8:  0x
> R9:  0x5e7a7333  R10: 0x  R11:
> 0x  R12: 0x004c
> R13: 0x5e7a7333  R14: 0x  R15:
> 0x  RSP: 0x2000415016f8
> Aborted
>
> [backtrace]
> 0x40343779 
> 0x4034534d 
> 0x403a667b 
> 0x403a54c6 
> 0x1174c2b0 
> 0x 
>
> (gdb) osv heap
> 0x8e9ad000 0x22a53000
> 0x8e9a2000 0x4000
>
> Rick
>
>
>
> On Mon, 2020-03-23 at 22:06 -0700, Waldek Kozaczuk wrote:
> > I have sent a more complete patch that should also address
> > fragmentation issue with requests >= 4K and < 2MB.
> >
> > On Monday, March 23, 2020 at 6:12:51 PM UTC-4, Waldek Kozaczuk wrote:
> > > I have just sent a new patch to the mailing list. I am hoping it
> > > will address the OOM crash if my theory of heavy memory
> > > fragmentation is right. It would be nice if Nadav could review it.
> > >
> > > Regardless if you have another crash in production and are able to
> > > connect with gdb, could you run 'osv heap' - it should show
> > > free_page_ranges. If memory is heavily fragmented we should see a
> > > long list.
> > >
> > > It would be nice to recreate that load in dev env and capture the
> > > memory trace data (BWT you do not need to enable backtrace to have
> > > enough useful information). It would help us better understand how
> > > memory is allocated by the app. I saw you send me one trace but it
> > > does not seem to be revealing anything interesting.
> > >
> > > Waldek
> > >
> > > On Monday, March 23, 2020 at 1:19:18 AM UTC-4, rickp wrote:
> > > > On Sun, 2020-03-22 at 22:08 -0700, Waldek Kozaczuk wrote:
> > > > >
> > > > >
> > > > > On Monday, March 23, 2020 at 12:36:52 AM UTC-4, rickp wrote:
> > > > > > Looks to me like its trying to allocate 40MB but the
> > > > available
> > > > > > memory
> > > > > > is 10GB, surely? 10933128KB is 10,933MB
> > > > > >
> > > > >
> > > > > I misread the number - forgot about 1K.
> > > > >
> > > > > Any chance you could run the app outside of production with
> > > > memory
> > > > > tracing enabled -
> > > > >
> > > >
> https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#tracing-memory-allocations
> > > >
> > > > >  (without --trace-backtrace) for while? And then we can have a
> > > > better
> > > > > sense of what kind of allocations it makes. The output of
> > > > trace
> > > > > memory-analyzer would be really helpful.
> > > >
> > > > I can certainly run that local with locally generated workloads,
> > > > which
> > > > should be close enough - but we've never managed to trigger the
> > > > oom
> > > > condition that way (other than by really constraining the memory
> > > > artificially). It should be close enough though - let me see what
> > > > I can
> > > > do.
> > > >
> > > > Rick
> > > >
> > > >


Re: [osv-dev] Re: OOM query

2020-03-24 Thread Rick Payne


I tried the patches, but it's crashing almost instantly...

page fault outside application, addr: 0x56c0
[registers]
RIP: 0x403edd23
RFL: 0x00010206  CS:  0x0008  SS:  0x0010
RAX: 0x56c0  RBX: 0x200056c00040  RCX: 0x004c  RDX: 0x0008
RSI: 0x004c  RDI: 0x200056c00040  RBP: 0x200041501740  R8:  0x
R9:  0x5e7a7333  R10: 0x  R11: 0x  R12: 0x004c
R13: 0x5e7a7333  R14: 0x  R15: 0x  RSP: 0x2000415016f8
Aborted

[backtrace]
0x40343779 
0x4034534d 
0x403a667b 
0x403a54c6 
0x1174c2b0 
0x 

(gdb) osv heap
0x8e9ad000 0x22a53000
0x8e9a2000 0x4000

Rick
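
For anyone reading the 'osv heap' output above, here is a minimal Python
sketch that counts the free page ranges and totals their sizes. It assumes
each line is "<range start> <size in bytes>", which is my reading of the
output and not something stated in this thread. The function names
(parse_heap, summarize, can_satisfy) are made up for illustration, and the
two values are used exactly as they appear above.

```python
import re

# 'osv heap' output copied from the message above. Treating the first
# column as the range start and the second as its size in bytes is an
# assumption; the values are taken as-is from the thread.
HEAP_OUTPUT = """\
0x8e9ad000 0x22a53000
0x8e9a2000 0x4000
"""

LINE_RE = re.compile(r"\s*(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s*")


def parse_heap(text):
    """Return a list of (start, size) tuples, one per free page range."""
    ranges = []
    for line in text.splitlines():
        match = LINE_RE.fullmatch(line)
        if match:  # skip prompts, blank lines and anything else
            ranges.append((int(match.group(1), 16), int(match.group(2), 16)))
    return ranges


def summarize(ranges):
    """Count the ranges and report total and largest size in bytes."""
    sizes = [size for _, size in ranges]
    return {
        "count": len(sizes),
        "total": sum(sizes),
        "largest": max(sizes, default=0),
    }


def can_satisfy(ranges, request_bytes):
    """True if at least one single range is big enough for the request.

    This is a simplification: it ignores alignment and whatever else the
    real allocator needs, so treat it as a rough fragmentation check only.
    """
    return any(size >= request_bytes for _, size in ranges)


if __name__ == "__main__":
    ranges = parse_heap(HEAP_OUTPUT)
    info = summarize(ranges)
    print(f"{info['count']} free ranges, "
          f"{info['total'] / 2**20:.1f} MiB free, "
          f"largest {info['largest'] / 2**20:.1f} MiB")
    print("40 MiB contiguous request fits:", can_satisfy(ranges, 40 * 2**20))
```

With only two ranges listed, this output does not look like the "long
list" that the quoted request below says heavy fragmentation would
produce, though that is just a reading of the numbers, not a diagnosis.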



On Mon, 2020-03-23 at 22:06 -0700, Waldek Kozaczuk wrote:
> I have sent a more complete patch that should also address
> fragmentation issue with requests >= 4K and < 2MB.
> 
> On Monday, March 23, 2020 at 6:12:51 PM UTC-4, Waldek Kozaczuk wrote:
> > I have just sent a new patch to the mailing list. I am hoping it
> > will address the OOM crash if my theory of heavy memory
> > fragmentation is right. It would be nice if Nadav could review it.
> > 
> > Regardless, if you have another crash in production and are able to
> > connect with gdb, could you run 'osv heap'? It should show
> > free_page_ranges. If memory is heavily fragmented we should see a
> > long list.
> > 
> > It would be nice to recreate that load in a dev env and capture the
> > memory trace data (BTW you do not need to enable backtrace to have
> > enough useful information). It would help us better understand how
> > memory is allocated by the app. I saw you sent me one trace but it
> > does not seem to reveal anything interesting.
> > 
> > Waldek 
> > 
> > On Monday, March 23, 2020 at 1:19:18 AM UTC-4, rickp wrote:
> > > On Sun, 2020-03-22 at 22:08 -0700, Waldek Kozaczuk wrote:
> > > > On Monday, March 23, 2020 at 12:36:52 AM UTC-4, rickp wrote:
> > > > > Looks to me like it's trying to allocate 40MB but the available
> > > > > memory is 10GB, surely? 10933128KB is 10,933MB
> > > >
> > > > I misread the number - forgot about 1K.
> > > >
> > > > Any chance you could run the app outside of production with memory
> > > > tracing enabled -
> > > > https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#tracing-memory-allocations
> > > > (without --trace-backtrace) for a while? Then we can have a better
> > > > sense of what kind of allocations it makes. The output of trace
> > > > memory-analyzer would be really helpful.
> > >
> > > I can certainly run that locally with locally generated workloads,
> > > which should be close enough - but we've never managed to trigger the
> > > oom condition that way (other than by really constraining the memory
> > > artificially). It should be close enough though - let me see what
> > > I can do.
> > >
> > > Rick