On Tue, Oct 29, 2013 at 1:33 AM, Jonathan S. Shapiro <[email protected]> wrote:

> On Sun, Oct 27, 2013 at 10:20 PM, Ben Kloosterman <[email protected]> wrote:
>
>> I thought Bartok had some mechanism for working out the worst cost of a
>> call tree and hence reduce the checks..
>>
>
> Oh. Yeah. I'd forgotten about that.
>
> Remember that Bartok is a whole-program compiler, so it can do
> interprocedural analysis of this kind. Which is great when you have it in
> hand, but whole-program isn't really a viable compilation method at scale.
>

Whole-program with a modern type system, or whole-program with loose team
control? The Linux kernel is whole-program, and very few apps will reach
that scale. Scale can also be achieved via SOA/COM/IPC message-pumped
services, breaking big apps into more reusable components. It is different
from what some programmers are used to, and you still need some libs, just
not shared libs. But have shared libs / DLLs gone past their use-by date?
.NET libs, for example, are not shared.


>
>> Curiously it was not the check cost on Rust that was the nail in the
>> coffin ( It was significant ) but the large amount of allocations  (which
>> caused cache , TLB probs  etc  ) .
>>
>
> This should not have been the case. In reasonably structured programs the
> stack just isn't that big. They may have picked the wrong segment size, and
> it sounds like they didn't do a very good job recycling segments.
>

Hard to say how big the stack is. They use the stack a lot more: tasks and
regions can mostly be stack-allocated, things like closures live on the
stack, etc. I don't know their segment-reuse code (I hope they are not
requesting new pages; I suspect segments just come and go from a separate
area of the heap). Expanding to the desired size quickly was, I think, an
issue: growth was exponential, but 2x each time was not fast enough. So
allocating a 10K-word array would require, say, 5 expansions, as it hit
out-of-stack 5 times.


> What was it they were allocating?
>

Just some general apps / micro-benchmarks. They do have a default stack
that's very small, so they can run lots of tasks without a huge memory
overhead.


>
>
>> With a single segment  you can also use large pages but not if you need a
>> Read only page  though you may need a seperate path to handle small stacks
>> anyway . ( Default in CLR is 1 meg)
>>
>
> 1MB doesn't let you use large pages either. For calibration, it's not
> uncommon to run pthreads threads on 8K or 16K stacks.
>

Yep, you do need to allow smaller pages, but optimizing for the default is
important.

>
> But even at 1MB, the large page issue doesn't really enter in to it. Which
> is why I'm okay with using a guard page.
>
1M / 2M is not much difference for most apps. However, there will be little
access deeper than a few pages, so less TLB benefit.

Note that with a region system / region-analysis stack allocation there
will be changes:
- Bigger stacks, smaller heaps
- Frequent access deep into the stack
- You could have 50-100% of the app end up on the stack, which has serious
implications for the stack design.

And:
- A much bigger stack to scan.

So 2M may be a better default with a region system, though (if it weren't
for the guard page)...

Should regions push objects onto the stack when they can, or always push
them into a heap-allocated region?

Ben
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
