Note that the paper mentioned above allows for no header at all (including the vtable pointer) by using Typed Virtual Addressing (TVA): objects of a given type live in a particular virtual memory area, and that area indicates the type. This can apply to some or all of the objects. It's actually more viable than it first appears (to me anyway). I'm not so keen on pointer compression for non-TVA types.
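
To make the TVA idea concrete, here is a minimal sketch in C. It is not the paper's implementation, just an illustration under my own assumptions: a static arena stands in for reserved virtual address ranges, each type gets one fixed 64 KiB region, and the names tva_alloc, tva_type_of, Point and Node are made up for the example. The point is that the objects carry no header or vtable pointer, and the type is recovered from the address alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type ids: one address region per type. */
enum { TYPE_POINT = 0, TYPE_NODE = 1, TYPE_COUNT = 2 };

#define REGION_BITS 16u                          /* 64 KiB per region (assumed) */
#define REGION_SIZE ((size_t)1 << REGION_BITS)

/* A static arena stands in for reserved virtual address space. */
static _Alignas(16) unsigned char arena[TYPE_COUNT * REGION_SIZE];
static size_t bump[TYPE_COUNT];                  /* bytes used in each region */

/* Objects have no header and no vtable pointer. */
typedef struct { int x, y; } Point;
typedef struct Node { struct Node *next; int value; } Node;

/* Bump-allocate `size` bytes inside the region that belongs to `type_id`. */
static void *tva_alloc(int type_id, size_t size) {
    if (type_id < 0 || type_id >= TYPE_COUNT) return NULL;
    if (size > REGION_SIZE) return NULL;
    size = (size + 15u) & ~(size_t)15u;          /* keep 16-byte alignment */
    if (size > REGION_SIZE - bump[type_id]) return NULL;  /* region is full */
    void *p = arena + (size_t)type_id * REGION_SIZE + bump[type_id];
    bump[type_id] += size;
    return p;
}

/* Recover the type id from the address alone; -1 if outside the arena. */
static int tva_type_of(const void *p) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t base = (uintptr_t)arena;
    if (a < base || a >= base + sizeof arena) return -1;
    return (int)((a - base) >> REGION_BITS);
}
```

Calling tva_type_of on a pointer returned by tva_alloc(TYPE_NODE, sizeof(Node)) gives back TYPE_NODE without touching the object. A real implementation would presumably reserve the regions with the OS virtual memory API and size them per type rather than use one fixed size.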
The performance cost of a header in the paper agrees with what was said in the earlier papers (e.g. 3% per word + the header overhead), and my guesstimate was 7-10%. The cost with TVA is 5-10%. Lastly, TVA can work nicely with a nursery that allocates blocks. The fragmentation problems for the main heap seem significant (despite doing some), and with RC schemes few objects relocate, hence the header cost is low (given a reasonable nursery size), so to me it synergizes with URC. (It's even possible they use this technique: they say no header in the nursery. I thought they just meant the GC header, but this could mean no vtable pointer either. Then again, the paper was later and they would have mentioned it.)

What also interests me here is the "managed cost". Java and C# (on Windows) are not far behind native, yet they pay heavy costs, i.e. header, bounds checking, card table, GC, and no embedded value types. If you add up those costs, it means their base performance is higher than that of the native languages, probably due to the JIT and polymorphic inline caches (which alone were measured at 14%). I note that some people using LLVM have reported that the execution engine (which compiles an encoded IR) runs faster in a lot of cases.

Anyway, I have been inspired and think I will build something to investigate the higher perf costs (and learn more). BitC may benefit from the experiment if I get somewhere.

Use the Mono compiler & parser. (No generics, probably C# 1.)

M1 Change the compiler to emit LLVM (update code from LLVM# to generate LLVM IR instead of CIL). 1 month (2-3 months later to mature it).
M2 Add ref counting for allocations (just a malloc wrap at this point) and get 20-30 test compiles working. 1/2 - 1 week + 2 weeks for tests.
M3 Remove unsafe from fixed blocks by having a safe way to address them (slices or gen array code), and add a compile failure on auto boxing if the type is marked nobox. This is very tricky in C#. 2 weeks + slices?
M4 Upgrade the GC to add no header on nursery objects, a block-allocated nursery (thread-local allocator), with a global pause and parallel mark/sweep. Small global pauses obviously mean a small nursery. 2 weeks.
M5 Add better allocation to main Nursery. Extensive benchmarks. 2 weeks.
M6 Add slices and strings standard lib. 2 weeks.
M7 Add VecImp. 2 months.
M8 Add C interop (which users can use to host as WinRT objects). 1 week.
M9 Deploy the LLVM execution engine libs as default, native as an option. No bounds checking on single-threaded release code.
M10 Remove cycles. 1 month.

I think I will then have something useful for implementing high-performance, low-pause algorithms, and it may attract some C# Mono/Windows/OS people. And I can investigate the big "managed" costs.

Ben
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
