On Wed, Oct 16, 2013 at 2:28 AM, Jonathan S. Shapiro <[email protected]> wrote:
> On Tue, Oct 15, 2013 at 11:03 AM, Bennie Kloosteman <[email protected]> wrote:
>
>> On Tue, Oct 15, 2013 at 10:53 PM, Jonathan S. Shapiro
>> <[email protected]> wrote:
>>
>>> A value type can live on the heap, and when it does, it will be
>>> "wrapped" by a conventional object header. If you like, you can imagine
>>> that for every value type V there is a corresponding reference type V_ref,
>>> and the two are assignment compatible by dispensation.
>>
>> Yep, but that would be poor design in a ref counted language; you would
>> often put many in a single object, try to use regions and stack more, etc.
>
> Umm. I didn't mean to suggest that *interior* value type instances would
> get headers added. Only the outermost. And in a reference counting design,
> that seems to be the only option in designs where boxed objects get a
> header at all.

I'm not too concerned about boxed objects (only that we can explicitly deny
or warn on boxing). What I'm saying is that we will get far more interior
value type objects than, say, Java, which only has them for base types, or
even C#. We will get (for high performance, though not for maintainability)
larger objects composed of many of these interior values. (Which, I know,
makes headers less of an issue :-) )

BTW, do we have a better name than "value types"? The name doesn't mean they
are always copied by value; they are really subobjects without a header.

> But you do (implicitly) raise the problem of whether interior references
> can be supported. That's a big can of worms, but it seems to be orthogonal
> to the GC vs. RC choice. That's something we should take up separately, I
> think.
>
>> Agree immutable is important, and also note string implementation as an
>> array reference vs an embedded char[]. Ref counting has huge implications on
>> design; it's going to be hard getting a standard lib that works well with
>> ref counting and a GC.
>
> I'm not sure why ref counting should have any impact on string
> implementation.
> The question of whether the string payload immediately
> follows the header or is stored separately has more to do with relocation
> concerns than with GC/RC in my mind. Above a certain size, you really don't
> want the string payload stored contiguously.
>
> Can you explain what you see as the relationship between GC/RC and string
> design?

With RC, every time you have a string reference, the algorithm needs to put
the internal char array object reference on the stack to work on it (note I
used the term "array reference"), and that introduces a count. With an
embedded array you are really just looking at string[0]. Now, if the internal
array is a separately allocated value type then it's not an issue, as it's
really a pointer, not a reference (which leads into whether internal
references can be supported, or whether they are really pointers or reference
offsets). The key point, though, is that it becomes much more expensive for
an object to store another object, hence RC has an impact on design. Which
IMHO will make it perform better as the libs mature.

>>>> When I wrote that I was thinking a reference could be an 11-12 byte
>>>> structure with the pointer (or possibly masked high bits in a 64 bit
>>>> pointer, or a 32 bit pointer as an option on a 64 bit machine) and some
>>>> flags (freeze / release immutable) and a counter, so you don't have to
>>>> have a header.
>>>
>>> Adding size to the reference is far worse than having an object header.
>>> Unless you can play mapping games to implement lazy masking, masking is
>>> also expensive. On some machines you can use the VM system for masking,
>>> because some virtual caches aren't physically anti-aliased.
>>
>> I'm not convinced of that... ref counting in C++ does exactly that, and
>> the basic C++ implementation is much faster than Java using bits in the
>> header (let alone a new header field).
>>
>
> Measurements please, because this is *incredibly* counter-intuitive, and
> seems contrary to every hard measurement I know about concerning L1 cache
> misses. Unless the implementation has changed a lot since I last looked,
> the performance of C++ reference counting pointers is truly awful, and the
> size penalty of using them is pretty significant.

smart_ptr implementations do more and are slow. I don't have measurements,
but about two years ago I whacked up a simple interlocked count, added it to
the pointer, and the overhead was a bit over 10% (it could have been 15%). I
can't find a paper, only a micro-benchmark which shows 13% compared to malloc:

http://www.codeproject.com/Articles/1648/The-fastest-smart-pointer-in-the-west

(this is just an interlock). A trivial Java implementation that adds a field
to the header, before modern techniques, was measured at something like 30%
compared to MMTk. In the C++ world 13% is awful, but they don't have any of
the fancy techniques that were used in Java to pull it down from 30% to 10%,
and to 3% or so for rc-immix! So an apples-to-apples comparison with a header
might be 15% to 30%. Again, this is complicated by the fact that in Java
everything except a few basic types is an object, and the numbers may be
better with value types.

I'm also saying we should measure it. We do know the cost of headers is also
very significant. And in my experience, assumptions are often wrong on modern
hardware: trading memory for CPU is nearly always good and improves overall
cache performance (which is itself an assumption :-) ).

>> I think the cost of the object header is greater than we think...
>
> Maybe so. But a significant number of objects do have in-degree higher
> than one, and for those objects it is much better to use a one-word header
> than an additional word per pointer. There are also interop issues when fat
> pointers are used.
>

It's not as clear as that. If 70-80% of objects hold no references (e.g.
strings hold no references, nor do objects holding just ints, matrices,
points, strings themselves, etc.), you would need an average degree of 5 for
the two approaches to cost the same, and that is not the case. Now, if 50%
are interior values or value objects on stacks and in regions, the figure
changes to needing an average degree of 2.5. On the other hand, if 10% are
interior values you are looking at 4.5, and IMHO the pointer + field may be
better. We don't know this figure, but it makes a significant difference.

For your own system, yes, you can add one 64-bit word, or maybe even 32 bits.
But the header is determined by the runtime: on the 64-bit CLR that's 16
bytes, and Jikes is also 16 bytes. These runtimes carry this burden when
compared to C and C++, which is likely a significant factor in the
performance gap to native. So a CLR BitC will carry this cost in any
benchmark against C. And we now have some figures on the header cost. Part of
this does not relate directly to BitC; it's just my shock at the measured
cost they wear for a 12-byte (64-bit JVM) to 16-byte (CLR/Jikes) header. If
you extrapolate 3% per 32 bits, on 64-bit Jikes that's 12% plus header
management costs. Is this most of the managed cost: not the GC itself, but
the GC needing a header? Or is it poor design of the runtimes?

> Finally, I'm not at all convinced that the per-object header can be
> eliminated when fat pointers are used. You only have to need a single bit
> for object forwarding to require a full word of object header.

Maybe... there are other ways, and it's a bonus for URC that it doesn't need
it.

>> I'm pretty sure the fastest will be a 32 bit pointer on a 64 bit machine...
>
> I assume you mean to be filtering out the high 32 bits? Then you might be
> surprised. The reduction in D-cache utilization from this is pretty serious.

Not filtering, just an indexed 32-bit load instead of a 64-bit load, at +0 or
+4 (depending on whether you want the reference or the counts and flags).
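To make the "+0 or +4" idea concrete, here is a minimal sketch of one possible
layout. Everything in it is an assumption for illustration: the thread does not
fix the field widths, the flag values, or the names (make_ref, load_addr,
load_meta, incref, FLAG_FROZEN are all hypothetical). It only shows the two
independent 32-bit loads; how a count stored next to the pointer stays
consistent across copies of a reference is left open here, as it is in the
discussion.

```python
import struct

# Hypothetical 8-byte reference (layout assumed for illustration only):
#   offset +0: 32-bit address, so the program is limited to 4 GB
#   offset +4: 32-bit word, low 2 bits = flags, upper 30 bits = count
FLAG_BITS = 2
FLAG_FROZEN = 0x1            # hypothetical "freeze / immutable" flag
COUNT_ONE = 1 << FLAG_BITS   # adding this value bumps the count by one
MAX_U32 = 0xFFFFFFFF


def make_ref(addr, flags=0):
    """Build a reference whose count starts at 1."""
    if not 0 <= addr <= MAX_U32:
        raise ValueError("address needs more than 32 bits (large mode)")
    if not 0 <= flags < COUNT_ONE:
        raise ValueError("flags do not fit in the flag bits")
    return bytearray(struct.pack("<II", addr, COUNT_ONE | flags))


def load_addr(ref):
    """32-bit load at +0: the address only."""
    return struct.unpack_from("<I", ref, 0)[0]


def load_meta(ref):
    """32-bit load at +4: the count and flags only."""
    return struct.unpack_from("<I", ref, 4)[0]


def incref(ref):
    """Bump the count by rewriting only the word at +4."""
    meta = load_meta(ref) + COUNT_ONE
    if meta > MAX_U32:
        raise OverflowError("reference count overflow")
    struct.pack_into("<I", ref, 4, meta)


def count(ref):
    return load_meta(ref) >> FLAG_BITS


def flags(ref):
    return load_meta(ref) & (COUNT_ONE - 1)
```

With this layout, make_ref(0x1000, FLAG_FROZEN) followed by incref() changes
only the word at +4; the address at +0 is never read or masked.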
The program is obviously limited to 4 GB (for large memory, run in large
mode, e.g. add a field or mask). I don't see this affecting the D-cache (it
should improve it, as there is no header and the pointer is the same size).
For extra fields, which is what you meant, I think it's hard to make a
judgement on D-cache utilization. Maybe you know better, but to me, having to
load the header to update the count means it's similar. I think it will be
proportional to the change in overall memory usage. You can say an object
with lots of references will have worse D-cache behaviour, but a string, for
example, would be better, as there is no reference and no header.

Obviously the pointer wouldn't run on any common/existing VM, but it has a
low cost, would be more competitive with C++, and would probably cover 80% of
implementations soon (assuming most phones follow the iPhone into 64 bit).
You then have 32-bit, 64-bit and 64-bit large mode (which has an extra
field). If the header cost is 7-10%, you would get most of that back on 64
bit (though I think it will be much less, due to embedded value types and
regions).

Interop is worth considering. Two apps on the same runtime are not an issue,
but native to runtime is not trivial. If the increment is done in the
pointer, then a C header can do the same thing, with a mask or extra fields
(and it will work with URC, it just bypasses the nursery), but everything
needs to be a reference, e.g. *bitc_ref or maybe even *bitc_ref<T>. If you're
going with rc-immix, then you need two functions for the C app to call from
the header instead of increment/decrement. You can also pass raw BitC
pointers unmodified, but then you need to trust the client, which seems a
dangerous/bad option. I don't see a huge difference, but it needs more
thought than the one minute I just gave it.

>> So what's the cost of 2 32 bit fields + the header overhead itself, 7-10%
>> ?
>>
>
> Fair question, except that I don't see any use case for more than one word
> of object header at this point (unless you count the vtable pointer as part
> of the header, which I do not).

Two 32-bit fields + header cost is one word on 64 bit. It's a memory
allocation/copy cost, so it's probably a fraction less than 7-10%, but not
half. Maybe you could use a 32-bit header on a 64-bit machine; if it's just
the type and the count bits that's OK, but alignment, fragmentation and the
space wasted aligning objects may offset this. The runtime builders aren't
stupid (I hope), and they would have tested more tightly packed headers
(which they do use in some environments).

> What do you imagine lives in that 3-word header?

All you need is type and some flags/counts.

Ben
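
The break-even degrees quoted earlier in this reply (5, 2.5 and 4.5) appear
to follow from a simple word count, sketched below. The model is a
reconstruction, not something spelled out in the thread: it assumes a header
costs one word per object that needs one, a fat pointer costs one extra word
per stored reference, and 20% of objects hold references (the "80% hold none"
end of the 70-80% range). The function name break_even_degree is
hypothetical.

```python
def break_even_degree(pct_holding_refs, pct_headerless):
    """Average references per reference-holding object at which one extra
    word per reference costs the same as one header word per object.

    pct_holding_refs: percentage of objects that store any references.
    pct_headerless:   percentage of objects that need no header at all
                      (interior values, stack or region values).
    """
    if not 0 < pct_holding_refs <= 100:
        raise ValueError("pct_holding_refs must be in (0, 100]")
    if not 0 <= pct_headerless <= 100:
        raise ValueError("pct_headerless must be in [0, 100]")
    # Per 100 objects: header words spent under the header design.
    header_words = 100 - pct_headerless
    # Per 100 objects: extra words under fat pointers = holders * degree.
    # Setting the two equal and solving for degree gives:
    return header_words / pct_holding_refs


if __name__ == "__main__":
    for pct_headerless in (0, 50, 10):
        print(pct_headerless, break_even_degree(20, pct_headerless))
```

Under these assumptions the three cases come out at 5, 2.5 and 4.5. Below the
break-even degree the extra word per reference is the cheaper option in
memory terms; above it the one-word header wins. Taking the 70% end of the
range instead (30% of objects holding references) lowers every figure, so the
result is quite sensitive to numbers nobody has measured yet.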
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
