Note that the paper mentioned above allows objects with no header (not
even a vtable pointer) by using Typed Virtual Addressing (TVA): basically,
objects of a given type live in a particular virtual memory area, and the
area indicates the type. This can apply to some or all of the objects. It's
actually more viable than it first appears (to me, anyway). I'm not so keen
on pointer compression for non-TVA types.
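
To make the idea concrete, here is a minimal sketch of how a type could be
recovered from an address alone. It only models the address arithmetic; the
region size, base address and type table are my own assumptions for
illustration and are not taken from the paper.

```python
# Hypothetical layout: each type owns one fixed-size virtual address region.
REGION_SHIFT = 32                  # assumed: 4 GiB of address space per type
HEAP_BASE = 0x1000_0000_0000       # assumed start of the TVA-managed range

TYPE_NAMES = ["Point", "Node", "String"]   # illustrative type table


def region_base(type_id):
    """First address of the region reserved for this type."""
    return HEAP_BASE + (type_id << REGION_SHIFT)


def type_id_of(addr):
    """Recover the type from the address alone; no header word is read."""
    return (addr - HEAP_BASE) >> REGION_SHIFT


def type_name_of(addr):
    return TYPE_NAMES[type_id_of(addr)]


node_addr = region_base(1) + 0x40   # an object 64 bytes into the Node region
print(type_name_of(node_addr))      # prints "Node"
```

The point is that a type test or virtual dispatch becomes a shift on the
pointer value instead of a load from the object.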

The performance cost of a header in the paper agrees with what was said in
early papers (e.g. 3% per word + the header overhead), and my guesstimate
was 7-10%. The cost with TVA is 5-10%.

Lastly, TVA can work nicely with a nursery that allocates in blocks. The
problem of fragmentation in the main heap seems significant (despite doing
some), and with RC schemes few objects relocate, hence the header cost is
low (given a reasonable nursery size), so to me it synergizes with URC.
(And it's possible they even use this technique: they say there is no
header in the nursery. I thought they just meant the GC header, but this
could mean no vtable pointer either. Then again, the paper was later, and
they would have mentioned it.)
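
As a rough sketch of what I mean by a block-allocating nursery without
headers: each block holds objects of one type, a side table records the
type per block, and a header is only paid for when an object survives and
is promoted. The block size, header size and all names below are
assumptions for illustration; this is not how the URC paper describes its
nursery.

```python
BLOCK_SIZE = 4096     # assumed nursery block size in bytes
HEADER_SIZE = 8       # assumed header added only when an object is promoted


class Nursery:
    def __init__(self, num_blocks):
        self.block_type = [None] * num_blocks   # side table: block -> type
        self.bump = [0] * num_blocks            # next free offset per block
        self.current = {}                       # type -> block being filled
        self.next_free_block = 0

    def alloc(self, type_name, size):
        """Bump-allocate a headerless object (size must fit in one block)."""
        block = self.current.get(type_name)
        if block is None or self.bump[block] + size > BLOCK_SIZE:
            if self.next_free_block == len(self.block_type):
                raise MemoryError("nursery full: time for a minor collection")
            block = self.next_free_block
            self.next_free_block += 1
            self.block_type[block] = type_name
            self.current[type_name] = block
        addr = block * BLOCK_SIZE + self.bump[block]
        self.bump[block] += size
        return addr

    def type_of(self, addr):
        """The block index, taken from the address, gives the type."""
        return self.block_type[addr // BLOCK_SIZE]


def promoted_size(size):
    """Only survivors copied to the main heap pay for a header."""
    return size + HEADER_SIZE


nursery = Nursery(num_blocks=4)
p1 = nursery.alloc("Point", 16)
n1 = nursery.alloc("Node", 24)
p2 = nursery.alloc("Point", 16)
print(nursery.type_of(p2), nursery.type_of(n1))   # prints "Point Node"
```

If most objects die in the nursery, most objects never carry a header at
all, which is why the nursery size matters for the overall header cost.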

What also interests me here is the "managed cost". Java and C# (on
Windows) are not far behind native, yet they pay heavy costs, i.e. header,
bounds checking, card table, GC, and no embedded value types. If you add up
those costs, it means the base performance is higher than that of the
native languages, probably due to the JIT and polymorphic inline caches
(which alone were measured at 14%). I note that some people using LLVM have
observed that the execution engine (which compiles an encoded IR) runs
faster in a lot of cases.
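
For anyone not familiar with polymorphic inline caches, here is a toy model
of one call site. A real JIT patches machine code at the call site; this
sketch only shows the caching logic, and the class and parameter names are
made up for illustration.

```python
class CallSite:
    """One call site that remembers the receiver types it has seen."""

    def __init__(self, method_name, max_entries=4):
        self.method_name = method_name
        self.max_entries = max_entries
        self.cache = []          # list of (type, function) pairs
        self.slow_lookups = 0    # how often the full lookup was needed

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        # Fast path: a short linear scan over previously seen types.
        for cached_type, fn in self.cache:
            if cached_type is receiver_type:
                return fn(receiver, *args)
        # Slow path: full method lookup, then remember the result.
        self.slow_lookups += 1
        fn = getattr(receiver_type, self.method_name)
        if len(self.cache) < self.max_entries:
            self.cache.append((receiver_type, fn))
        return fn(receiver, *args)


class Circle:
    def sides(self):
        return 0


class Square:
    def sides(self):
        return 4


site = CallSite("sides")
results = [site.call(s) for s in (Circle(), Square(), Circle(), Square())]
print(results, site.slow_lookups)   # prints "[0, 4, 0, 4] 2"
```

A statically compiled native program has no such per-site profile unless
it uses profile-guided optimization, which is one plausible reason the
managed runtimes can claw back some of their overheads.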

Anyway, I have been inspired and think I will build something to
investigate the higher perf costs (and learn more). BitC may benefit from
the experiment if I get somewhere.

Use the Mono compiler & parser. (No generics, probably C# 1.)
M1 Change the compiler to emit LLVM (update code from LLVM# to generate
LLVM IR instead of CIL). 1 month (2-3 months later to mature it).
M2 Add ref counting for allocations (just a malloc wrapper at this point)
and get 20-30 test compiles working. 1/2 - 1 week + 2 weeks for tests.
M3 Remove unsafe from fixed blocks by having a safe way to address them
(slices or gen array code), and add a compile failure on auto-boxing if the
type is marked nobox. This is very tricky in C#. 2 weeks + slices?
M4 Upgrade the GC: no header on nursery objects, a block-allocated nursery
(thread-local allocator), with a global pause and parallel mark/sweep.
Small global pauses obviously mean a small nursery. 2 weeks.
M5 Add better allocation to the main nursery, improved ref counting.
Extensive benchmarks. 1-2 months.
M6 Add slices and strings standard lib. 2 weeks.
M7 Add VecImp. 2 months.
M8 Add C interop (which users can use to host as WinRT objects). 1 week.
M9 Deploy LLVM execution engine libs as default, native as option. No
bounds checking on single-threaded release code.
M10 Remove cycles. 1 month.
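
To show why M2 and M10 are separate steps, here is a toy model of the
naive ref counting wrapper from M2. It is only a sketch: handles are plain
integers and all names are hypothetical, whereas the real thing would wrap
malloc/free in the generated code.

```python
class RcHeap:
    """Toy heap: each allocation carries a reference count."""

    def __init__(self):
        self.count = {}       # handle -> reference count
        self.fields = {}      # handle -> handles this object points to
        self.next_handle = 1

    def alloc(self):
        handle = self.next_handle
        self.next_handle += 1
        self.count[handle] = 1     # the creator holds the first reference
        self.fields[handle] = []
        return handle

    def retain(self, handle):
        self.count[handle] += 1

    def release(self, handle):
        self.count[handle] -= 1
        if self.count[handle] == 0:
            # Free the object, then drop the references it was holding.
            children = self.fields.pop(handle)
            del self.count[handle]
            for child in children:
                self.release(child)

    def store(self, owner, target):
        """Model `owner.field = target`: the field holds a counted reference."""
        self.retain(target)
        self.fields[owner].append(target)

    def live(self):
        return len(self.count)


heap = RcHeap()
a, b = heap.alloc(), heap.alloc()
heap.store(a, b)
heap.release(b)          # b stays alive because a still points to it
heap.release(a)          # frees a, which in turn frees b
acyclic_live = heap.live()

x, y = heap.alloc(), heap.alloc()
heap.store(x, y)
heap.store(y, x)
heap.release(x)
heap.release(y)          # x and y still reference each other
cyclic_live = heap.live()
print(acyclic_live, cyclic_live)   # prints "0 2"
```

The two leaked objects at the end are exactly what M10 has to deal with,
whether by a backup trace or some form of cycle detection.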

And I think I will then have something useful for implementing
high-performance, low-pause algorithms, which may attract some C#
Mono/Windows/OS people.

Ben
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
