Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Hi Arne, Arne Babenhauserheide writes: > I don’t understand the implications of this, but I realized that I would > love to have info about the sizes of different datastructures in Guile. Sure. > I recently started measuring how much memory is required for a record > compared to a list, so I could give people concrete guidelines which > datastructures to use for which algorithm, and for that such low-level > details would be great to have! (or just to know where to read them up — > or which keywords to look for) For core data structures implemented in C, the place to look is in the corresponding *.h and *.c files in libguile, for example struct.[ch], strings.[ch], numbers.[ch], etc. However, pairs receive special handling, and only part of their implementation is in pairs.[ch]. I will summarize the costs of some important structures below: In the following, a "word" is the size of a pointer (4 bytes on a 32-bit system, or 8 bytes on a 64-bit system). I will _not_ include the SCM word itself in the accounting of the size of the object represented by that SCM. That's because for heap-allocated objects, the SCM is simply a pointer to the object, and there can be many pointers to the same object. The cost of SCM words will instead be included in the cost of the aggregate objects that contain them. In this method of accounting, immediate objects cost nothing, because they are entirely stored within the SCM itself. Pairs cost 2 words. Lists cost 2 words per element. The empty list is immediate, and therefore costs nothing. In current 'master', records cost 1 word per field, plus another word, rounded up to the nearest multiple of 2 words. In 2.0 and 2.2, they cost 1 word per field, plus another _2_ words, rounded up Vectors cost 1 word per element, plus another word, rounded up to the nearest multiple of 2 words. Bytevectors cost 4 words plus the data itself, rounded up to the nearest multiple of 2 words. As a special case, empty bytevectors cost nothing. Non-immediate inexact real numbers cost 16 bytes. Non-immediate complex numbers cost 24 bytes on 32-bit systems, or 32 bytes on 64-bit systems. In the common case, strings currently cost 6 words plus the character data itself, rounded up to the nearest multiple of 2 words. The character data costs either 1 byte per character (if all code points are less than 256, i.e. the string is latin-1), or 4 bytes per character. (However, I may propose a new string representation for 3.0). As a special case, the empty string costs nothing. Finally, I should mention that the ability to share substructure in lists and pairs make them potentially more memory efficient than vectors, depending on the use case. For example, if you need a structure with 4 fields, and a common operation in your application will involve copying the structure except for one field which is changed, then you might be able save memory by using lists, even though a 4-element list requires more memory than a 4-element vector or struct. If you put the commonly-changed field in the CAR, then you can update that field by allocating only a single pair (2 words) and sharing the rest of the structure with the old version. This is not possible with vectors or structs, where you must allocate a fresh new object of 5 or 6 words to perform the same operation. Regards, Mark
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Hi, Mark H Weaver skribis: > Ludovic Courtès writes: >> Though an immediate, like a fixnum or an iflo, is still something >> different from a tagged heap object like a pair, right? So I would >> expect SCM_THOB_P to be a different test, not a drop-in replacement for >> SCM_NIMP, is that correct? > > That's right. It's not possible to create a drop-in replacement for > SCM_NIMP, because it is being used to answer two different questions > which used to be effectively equivalent, but no longer are: > > (1) Is X a pointer to a heap object with a heap tag in the first word? > (2) Is X a reference to a heap object? > > Test (1) needs to be done before checking the heap tag, to implement > type predicates for heap objects. Test (2) is needed in relatively few > places, e.g. to decide whether to register disappearing links when > adding an entry to a weak hash table. > > Actually, in my current branch I've removed the SCM_IMP and SCM_NIMP > macros outright, because it seems to me they are likely to be misused. > > SCM_THOB_P implements test (1) and SCM_HEAP_OBJECT_P implements test (2). I see. > There's no masking involved. Rather, it is subtracted from the pointer, > which allows the tag to be fused with the field offset. For example, on > x86-64, whereas CAR and CDR were previously: > > 1c0: 48 8b 07mov(%rdi),%rax ;old car > and: > 1d0: 48 8b 47 08 mov0x8(%rdi),%rax ;old cdr > > Now they become: > > 1e0: 48 8b 47 fa mov-0x6(%rdi),%rax ;new car > and: > 1f0: 48 8b 47 02 mov0x2(%rdi),%rax ;new cdr Looks reasonable. :-) > Fortunately, BDW-GC provides GC_REGISTER_DISPLACEMENT, which allows us > to register a small offset K, such that BDW-GC should recognize pointers > that point K bytes into a heap block. We've been using this in both 2.0 > and 2.2 from scm_init_struct (), and in 'master' it's done in > scm_storage_prehistory (). > > This new approach entails registering one additional displacement. Cool; if it’s just one displacement, that’s OK. > What do you think? It all looks perfectly reasonable to me! Thanks for explaining, Ludo’.
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Ludovic Courtès writes: > Though an immediate, like a fixnum or an iflo, is still something > different from a tagged heap object like a pair, right? I should clarify that in this new approach, a pair is *not* a tagged heap object. Tagged heap objects are those which have a tag in their first word. In this new approach, pairs are not tagged in this sense, and it is no longer possible to distinguish a pair from a non-pair by looking at their raw heap blocks. Tagged heap objects (thobs) have 000 in the low 3 bits of the SCM. Pairs have 0110 in the low 4 bits of the SCM. Regards, Mark
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Hi Ludovic, Ludovic Courtès writes: > Though an immediate, like a fixnum or an iflo, is still something > different from a tagged heap object like a pair, right? So I would > expect SCM_THOB_P to be a different test, not a drop-in replacement for > SCM_NIMP, is that correct? That's right. It's not possible to create a drop-in replacement for SCM_NIMP, because it is being used to answer two different questions which used to be effectively equivalent, but no longer are: (1) Is X a pointer to a heap object with a heap tag in the first word? (2) Is X a reference to a heap object? Test (1) needs to be done before checking the heap tag, to implement type predicates for heap objects. Test (2) is needed in relatively few places, e.g. to decide whether to register disappearing links when adding an entry to a weak hash table. Actually, in my current branch I've removed the SCM_IMP and SCM_NIMP macros outright, because it seems to me they are likely to be misused. SCM_THOB_P implements test (1) and SCM_HEAP_OBJECT_P implements test (2). >> (2) Our existing VM instructions almost invariably specify offsets with >> a granularity of whole words. To support tagged pair pointers with >> good performance, I think we need a few new instructions that >> specify byte offsets, to avoid the expensive extra step of removing >> the tag before accessing the CAR or CDR of a pair. > > So instead of a pointer dereference, SCM_CAR becomes mask + dereference, > right? There's no masking involved. Rather, it is subtracted from the pointer, which allows the tag to be fused with the field offset. For example, on x86-64, whereas CAR and CDR were previously: 1c0: 48 8b 07mov(%rdi),%rax ;old car and: 1d0: 48 8b 47 08 mov0x8(%rdi),%rax ;old cdr Now they become: 1e0: 48 8b 47 fa mov-0x6(%rdi),%rax ;new car and: 1f0: 48 8b 47 02 mov0x2(%rdi),%rax ;new cdr In the VM, I've added four new instructions: make-tagged-non-immediate dst:12 tag:12 offset:32 tagged-allocate-words/immediate dst:8 count:8 tag:8 tagged-scm-ref/immediate dst:8 obj:8 byte-offset:8 tagged-scm-set!/immediate obj:8 byte-offset:8 val:8 The last two instructions above are like 'scm-ref/immediate' and 'scm-set!/immediate' except that they accept byte offsets instead of word offsets. CAR and CDR become: (tagged-scm-ref/immediate DST SRC -6);CAR and: (tagged-scm-ref/immediate DST SRC 2) ;CDR (although at present the -6 prints as 250 in the disassembler). > I think we disable GC “interior pointer” scanning. I agree. > With this scheme, an SCM for a pair would actually point in the middle > of a pair; could this be an issue for GC? It is already the case that Guile has tagged pointers in the first word of every struct. The first word of a struct contains a pointer to the vtable, with scm_tc3_struct added. Fortunately, BDW-GC provides GC_REGISTER_DISPLACEMENT, which allows us to register a small offset K, such that BDW-GC should recognize pointers that point K bytes into a heap block. We've been using this in both 2.0 and 2.2 from scm_init_struct (), and in 'master' it's done in scm_storage_prehistory (). This new approach entails registering one additional displacement. In 2.0 and 2.2, we register displacements 0, 16, and 17. The last two are for struct vtables, which point 2 words into a heap block, even before adding scm_tc3_struct (which is 1). In current master, we register 0 and 1 (scm_tc3_struct). With this new approach, we would register 0, 1, and 6 (scm_pair_tag). What do you think? Thanks, Mark
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Hello, Mark H Weaver skribis: > Ludovic Courtès writes: [...] >> IIUC, your plan is to have a different tagging on 32-bit platforms, >> without fixflos, right? I’m curious to see how much complexity would >> entail from that. > > Yes, although I'm avoiding the term "fixflos" because IEEE doubles are > also fixed width, and thus the term "fixflos" wouldn't adequately > distinguish them from IEEE doubles. Right! > Anyway, I agree that it's inconvenient to have different tags on > different targets, and I've been working to minimize the differences. > > At present, I'm currently implementing an alternative strategy where > pairs are tagged in their pointers instead of in their CARs, which > enables us to separate the heap tags and immediate tags into two > independent spaces. At first this sounds rather radical :-), but maybe it’s preferable this way. > In this new approach, the heap tags are left unchanged, and the only > tags that vary with target word size are the fixints, fixrats, iflos, > and pair pointers. All other tags will be uniform across targets, > including the non-number immediates. Here's the new version: > > ;; /* with iflos: xxx: iflo (000 < xxx < 110) > ;;(64-bit) 0111: fixrat > ;; : fixnum > ;; 0110: pair > ;; 000: tagged heap object (thob) > ;; 1110: other immediate > ;; > ;; without iflos: 1: fixnum > ;;(32-bit) 010: fixrat > ;; 100: pair > ;; 000: tagged heap object (thob) > ;; 1110: other immediate > > This new approach brings its own complications, mainly two: > > (1) It breaks the long-standing assumptions in Guile that all > non-immediates have a tag in their first word and that pointers are > always untagged. In my preliminary patch, I introduce a new concept > called a "tagged heap object" or "thob", and most existing checks > for SCM_NIMP or !SCM_IMP must be changed to use SCM_THOB_P. Though an immediate, like a fixnum or an iflo, is still something different from a tagged heap object like a pair, right? So I would expect SCM_THOB_P to be a different test, not a drop-in replacement for SCM_NIMP, is that correct? > (2) Our existing VM instructions almost invariably specify offsets with > a granularity of whole words. To support tagged pair pointers with > good performance, I think we need a few new instructions that > specify byte offsets, to avoid the expensive extra step of removing > the tag before accessing the CAR or CDR of a pair. So instead of a pointer dereference, SCM_CAR becomes mask + dereference, right? I think we disable GC “interior pointer” scanning. With this scheme, an SCM for a pair would actually point in the middle of a pair; could this be an issue for GC? Thank you! Ludo’.