Hi, Alexei, all,

Just another idea for startup optimizations popped out of our talk with Egor Pasko. :)

As you probably know, there are many places in the VM and JIT that use locking for safety reasons. Most of this locking is driven by mutexes, that is, by kernel calls. That is a good option under contention, because such locking needs arbitration from the kernel side (e.g. "who will take the mutex next?"). But what if the lock is not contended? Even then we make a kernel call just to try to acquire the mutex.

Linux implemented the "fast user-space mutex", or "futex" [1], long ago. Basically it is a simple memory region that can be incremented/decremented atomically in user space. In case of contention the futex, of course, resorts to the kernel side for arbitration. That means we could save precious time by using futexes instead of mutexes: in the uncontended case we definitely save the kernel call time.

AFAIK, the current implementation of the porting layer has no support for futexes, even on the Linux side. So we might try to implement them for the Windows part and use the Linux-provided futexes for the Linux part. Then, after hyfutex_lock/unlock are implemented, we might try to migrate performance-significant locks to futexes one by one. Profilers should give good directions on where to start; maybe there are other places too.

What do you think?

Thanks,
Aleksey,
ESSD, Intel

[1] http://en.wikipedia.org/wiki/Futex
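
P.S. To make the idea a bit more concrete, here is a minimal sketch of what hyfutex_lock/unlock could look like on the Linux side. It is only an illustration under assumptions: the hyfutex_t type, hyfutex_init and the futex_call helper are hypothetical names (nothing like them exists in the porting layer today), the three-state scheme is the well-known one for futex-based locks rather than anything we have agreed on, and it assumes atomic_int has the same layout as the 32-bit int the futex syscall expects, which holds for GCC on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type. State values:
 *   0 = unlocked, 1 = locked, 2 = locked and there may be waiters. */
typedef struct { atomic_int state; } hyfutex_t;

/* Thin wrapper over the raw futex syscall (glibc exports no futex()). */
static long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void hyfutex_init(hyfutex_t *f)
{
    atomic_init(&f->state, 0);
}

static void hyfutex_lock(hyfutex_t *f)
{
    int c = 0;

    /* Fast path: uncontended 0 -> 1 transition, no kernel call at all. */
    if (atomic_compare_exchange_strong(&f->state, &c, 1))
        return;

    /* Slow path: mark the lock as contended and sleep in the kernel
     * until the state is seen as 0 by our own exchange. */
    if (c != 2)
        c = atomic_exchange(&f->state, 2);
    while (c != 0) {
        /* Sleeps only if the state is still 2 when the kernel checks it. */
        futex_call(&f->state, FUTEX_WAIT, 2);
        c = atomic_exchange(&f->state, 2);
    }
}

static void hyfutex_unlock(hyfutex_t *f)
{
    int prev = atomic_exchange(&f->state, 0);

    assert(prev != 0);  /* unlocking a lock that is not held is a bug */

    /* Only pay for a kernel call when somebody may be waiting. */
    if (prev == 2)
        futex_call(&f->state, FUTEX_WAKE, 1);
}
```

In the uncontended case, lock is one compare-and-swap and unlock is one atomic exchange, both entirely in user space. The kernel is entered only when a second thread actually has to wait or be woken. A caller would do hyfutex_init(&f) once, then bracket the critical section with hyfutex_lock(&f) and hyfutex_unlock(&f).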
