>> I have never cared that much about memory usage; 150% or 200% of the
>> minimum heap I'm fine with, and reduced pauses are a nice-to-have
>> (which regions and better stack-based algorithms will deliver anyway).
>
> One of the things that I find curious is that there are no measurements
> in the literature for malloc/free under a constrained total heap size,
> and also that nobody talks about the overhead of the arena management
> structures. In most mallocs there is an 8 to 16 byte overhead per
> allocated object (on 32-bit systems); Java object headers generally run
> 8 bytes. Where I'm going here is that the required heap sizes may be
> closer than people think, simply because the corresponding studies do
> not appear to have been done.
Agreed, and there is more, e.g. the cost of SLUB etc. to make the OS side
efficient for malloc. (I don't see recent papers changing the OS memory
manager either; SLUB is tuned for malloc, but I think raw buddy-style
allocators work better for GCs, which make fewer but larger power-of-two
allocations.) The programmer is also often forgotten: if he wants to write a
memory-constrained algorithm after seeing or predicting that memory is an
issue, he can, and that algorithm certainly won't churn through huge numbers
of new objects, which changes the whole profile of when memory is an issue.
Embedded value types help a lot here.

>> But I have changed my position a bit:
>> 1) 10-20% just does not matter if you can express algorithms in SIMD
>> (see my other ad nauseam posts), and I'd much rather have basic RC as
>> pointed out, plus SIMD expression as part of the language, than a full
>> RC-Immix system. You get easier-to-write SIMD in the language, it can
>> be called by native programs, and you have a game changer; you don't
>> need much else.
>
> I think we all know by now that you are very excited about SIMD. I
> understand that Mono isn't doing a good job, but it's not hard to do.

No safe implementations so far; I'd like to see a paper :-) I'm not talking
about simple intrinsics. I'm talking about expressing things in such a way
that the compiler generates SIMD. Obviously copying VecImp should be OK (at
least you save a lot of design work), but:
- Can you do better at expressing problems in a way that is easier to
  vectorize (e.g. something like an expanded form, or collection operations
  like C# LINQ, may help)?
- Will it work well for a safe language?
The VecImp paper mentioned an unsafe C-like language; maybe they see "safe"
as meaning languages with few value types, and hence SIMD is more difficult,
but the emphasis on unsafe worries me even though I can see no reason for it.
- One of the biggest pains of SIMD is getting data into SIMD registers;
  doing more of the normal work in SIMD registers and loading them
  efficiently is an "art" at present. (Obviously the more you already have
  in SIMD registers, the cheaper it is to load them; loading them from
  normal registers sequentially is not efficient.)

I'm not a language guy (at best learning), but it doesn't seem that easy to
design this, as we really don't know how to express SIMD algorithms well,
or, more importantly, algorithms in a form that can be used for SIMD. If
it's easy we should get it out there ;-) People are desperate for a better
way of writing portable, safe SIMD instead of intrinsics/inline asm. You
don't need much else: functions, some data types, memory safety, and an
interface with C (both ways).

>> 2) The cost of concurrency is shocking me, and I don't really see the
>> reasons behind it. In languages with more pure functions it makes sense,
>> but with Java or C# it makes little sense. You have mutable data, and
>> 90% of the code will fail without a lock; every class in MSDN is marked
>> as to whether it is thread safe.
>
> What numbers are you looking at that are shocking you? In languages with
> pure functions there is essentially *no* cost to concurrency, so I don't
> understand what you are saying there. In Java and C# the problem boils
> down to maintaining memory consistency at the hardware level, which is
> inherently expensive.

- E.g. for non-concurrent code, simple RC: without the lock and atomic op,
  RC basically becomes a conditional plus a store into the header (or
  pointer), both of which will be written anyway at creation and often
  afterwards.
I have seen simple single-threaded smart pointers cost 10% vs. malloc (not
vs. a GC), but not with smart_ptr (20%), and on multiple cores or CPUs the
concurrent smart pointers degrade rapidly.
- Due to concurrency safety, bounds-check elimination is difficult; that's
  a big cost.
- ARM executes instructions out of order, and then we may throw in memory
  barriers/fences for single-threaded code that MAY be concurrent.

You can scrape back some of these costs with much more complex algorithms,
which incur further development costs, so there is a runtime complexity /
development cost trade-off. Pure functional languages (or largely pure ones)
don't have mutable data, so you don't need to manually ensure the data is
accessed in a safe way. So it makes a lot of sense there (in a backwards
way, because their guarantees are cheap), but when you have lots of mutable
data the guarantee loses IMHO most of its value, because you need to make
the code concurrency-correct anyway. Obviously you need these guarantees for
concurrent code and to write simpler locks etc., but I'd like a
runtime/compiler guarantee that if I'm running single-threaded I do not pay
the penalty :-) LLVM does nothing here, so it's all up to the compiler
creating the IR; it should not be that hard:
if (singleThreaded) emit ... else ...

>> So I'm thinking it's better to have structures in the standard library
>> which help guarantee concurrency, not the runtime, since you need locks
>> and a careful design anyway. Or at least tell the JIT which to emit.
>> Maybe for loops have for and foreach as well as concurrent foreach and
>> concurrent for.
>
> Implementation of concurrent algorithms has to be done in the language
> (possibly in the library). *Safety* of concurrency has to be done in the
> compiler/runtime.

I don't think the distinction is that clear...
Safety of concurrency is done by the compiler/runtime if you guarantee
safety, but if you have only a partial guarantee then there could be
interaction with language features (and possibly attributes or helpers in a
library). I suppose the real gripe is not some safety in the compiler/runtime
but a blanket guarantee of concurrency safety. On x86 at least, the guarantee
is worthless unless you are using correct algorithms and designs, and it
doesn't help most people making thread-safe objects, as they normally use
safe functions or just slap a lock around things. So who are you helping? A
small percentage of already highly skilled people who write locks or
non-locking concurrent code. (On ARM it's more useful.) I'm not saying you
can't have a whole library/assembly marked safe because it is multi-threaded,
but why should my single-threaded or LMAX/Actor-pattern code be burdened with
concurrency safety when my competitors in JavaScript, C, and C++ are not?
Especially when the development cost is low.

This sort of design, "Lightweight Concurrency Primitives for GHC", is of
interest as well and would allow adding, say, transactional memory, though
it's more for implementing the VM:
http://research.microsoft.com/en-us/um/people/simonpj/papers/lw-conc/lw-conc.pdf

Ben
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
