On Sat, 14 Aug 2004, Marc Schwartz wrote:

> > object.size("a")
> [1] 44
>
> > object.size(letters)
> [1] 340
>
> In the second case, as Tony has noted, the size of letters (a character
> vector) is not 26 * 44.

Of course not. Both are character vectors, so each has the overhead of any R
object, plus an allocation for pointers to the elements, plus an amount for
each element of the vector (see the end).

These calculations differ on 32-bit and 64-bit machines. On a 32-bit
machine storage is in units of either 28 bytes (Ncells) or 8 bytes (Vcells),
so single-letter strings are wasteful, viz

> object.size("aaaaaaa")
[1] 44

That is 1 Ncell and 2 Vcells: 1 for the string (7 bytes plus terminator) and
1 for the pointer. Whereas

> object.size(letters)
[1] 340

has 1 Ncell and 39 Vcells: 26 for the strings and 13 for the pointers (which
fit two to a Vcell).

Note that repeated character strings may share storage, so for example

> object.size(rep("a", 26))
[1] 340

is wrong (140, I think). That makes comparisons with factors depend on
exactly how they were created: for a character vector there probably is a
lot of sharing.

I have a feeling that these calculations are off for character vectors, as
each element is a CHARSXP and so may have an Ncell not accounted for by
object.size. (`May' because of potential sharing.) Would anyone who is sure
like to confirm or deny this?

It ought to be possible to improve the estimates for character vectors a
bit, as we can detect sharing amongst the elements.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
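
The 32-bit arithmetic in the message can be sketched as a small R function.
This is only a model of the accounting described there, assuming 28-byte
Ncells, 8-byte Vcells and 4-byte pointers. est_size and its share argument
are hypothetical names, not part of R, and the function does not call
object.size, whose results differ on 64-bit builds and in later versions of R.

```r
# Hypothetical helper modelling the 32-bit accounting described in the message.
# Assumptions: 1 Ncell = 28 bytes, 1 Vcell = 8 bytes, pointers are 4 bytes,
# and each string needs its bytes plus a terminator, rounded up to whole Vcells.
est_size <- function(x, share = FALSE) {
  ncell <- 28
  vcell <- 8
  # Pointers to the elements: 4 bytes each, so two fit in one Vcell.
  ptr_vcells <- ceiling(length(x) * 4 / vcell)
  # With share = TRUE, identical strings are counted once (storage sharing).
  strs <- if (share) unique(x) else x
  str_vcells <- sum(ceiling((nchar(strs, type = "bytes") + 1) / vcell))
  ncell + vcell * (ptr_vcells + str_vcells)
}

est_size("aaaaaaa")                   # 44:  1 Ncell + 2 Vcells
est_size(letters)                     # 340: 1 Ncell + 39 Vcells
est_size(rep("a", 26))                # 340: every copy counted separately
est_size(rep("a", 26), share = TRUE)  # 140: 1 Ncell + 13 Vcells + 1 shared string
```

The share = TRUE case reproduces the 140 suggested for rep("a", 26). The model
leaves out any per-element CHARSXP Ncell, which is the open question raised in
the message.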