On Sat, 2004-08-14 at 13:19, Prof Brian Ripley wrote:
> On Sat, 14 Aug 2004, Marc Schwartz wrote:
>
> > > object.size("a")
> > [1] 44
> >
> > > object.size(letters)
> > [1] 340
> >
> > In the second case, as Tony has noted, the size of letters (a character
> > vector) is not 26 * 44.
>
> Of course not.  Both are character vectors, so have the overhead of any R
> object plus an allocation for pointers to the elements plus an amount for
> each element of the vector (see the end).
>
> These calculations differ on 32-bit and 64-bit machines.  For a 32-bit
> machine storage is in units of either 28 bytes (Ncells) or 8 bytes
> (Vcells) so single-letter characters are wasteful, viz
>
> > object.size("aaaaaaa")
> [1] 44
>
> That is 1 Ncell and 2 Vcells, 1 for the string (7 bytes plus terminator)
> and 1 for the pointer.
>
> Whereas
>
> > object.size(letters)
> [1] 340
>
> has 1 Ncell and 39 Vcells, 26 for the strings and 13 for the pointers
> (which fit two to a Vcell).
>
> Note that repeated character strings may share storage, so for example
>
> > object.size(rep("a", 26))
> [1] 340
>
> is wrong (140, I think).  And that makes comparisons with factors depend
> on exactly how they were created, for a character vector there probably is
> a lot of sharing.
>
> I have a feeling that these calculations are off for character vectors, as
> each element is a CHARSXP and so may have an Ncell not accounted for by
> object.size.  (`May' because of potential sharing.)  Would anyone who is
> sure like to confirm or deny this?
>
> It ought to be possible to improve the estimates for character vectors a
> bit as we can detect sharing amongst the elements.
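The arithmetic quoted above can be reproduced with a few lines of R. This is only a sketch of the accounting as described in the message, not of how object.size() is implemented. The 28-byte Ncell and 8-byte Vcell figures come from the message; the 4-byte pointer size is inferred from "fit two to a Vcell"; the function name estimate_chr_size and its share argument are hypothetical. It will not agree with object.size() on a 64-bit build.

```r
# Unit sizes for a 32-bit machine, as stated in the quoted message.
ncell_bytes   <- 28
vcell_bytes   <- 8
# Assumption: 4-byte pointers, since two fit in one Vcell.
pointer_bytes <- 4

# Hypothetical helper: estimated size in bytes of a character vector
# under the accounting described above. With share = TRUE, identical
# strings are counted once, to model shared storage.
estimate_chr_size <- function(x, share = FALSE) {
  strings <- if (share) unique(x) else x
  # Each string needs its bytes plus a terminator, rounded up to whole Vcells.
  string_vcells <- sum(ceiling((nchar(strings, type = "bytes") + 1) / vcell_bytes))
  # One pointer per element, packed two to a Vcell.
  pointer_vcells <- ceiling(length(x) * pointer_bytes / vcell_bytes)
  ncell_bytes + vcell_bytes * (string_vcells + pointer_vcells)
}

estimate_chr_size("aaaaaaa")                   # 28 + 8 * (1 + 1)   = 44
estimate_chr_size(letters)                     # 28 + 8 * (26 + 13) = 340
estimate_chr_size(rep("a", 26), share = TRUE)  # 28 + 8 * (1 + 13)  = 140
```

The last call gives the 140 suggested in the message for the fully shared case. The possible extra Ncell per CHARSXP raised there is deliberately left out, since the message leaves that question open.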
Prof. Ripley,

Thanks for the clarifications.  I'll need to spend some time reading
through R-exts.pdf and Rinternals.h.

Regards,

Marc