[R] character type and memory usage

Mike Miller Fri, 16 Jan 2015 22:24:07 -0800

First, a very easy question: What is the difference between using what="character" and what=character() in scan()? What is the reason for the character() syntax?
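According to the description of the what argument in ?scan, it is the type of what that determines the type of data read, not its value or length. So "character" (a length-one character vector) and character() (a zero-length one) should behave identically. A minimal check on a small inline input (the marker names below are made up for illustration):

```r
# scan() uses only the *type* of `what`, so both calls should read the
# same whitespace-separated tokens as a character vector.
txt <- "rs123 rs456 chr1_789"
a <- scan(text = txt, what = "character", quiet = TRUE)
b <- scan(text = txt, what = character(), quiet = TRUE)
identical(a, b)
```

The character() form is simply the conventional way of writing "an empty vector of this type"; what = "" or what = "character" work the same way.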

I am working with some character vectors that are up to about 27.5 million elements long. The elements are always unique. Specifically, these are names of genetic markers. This is how much memory those names take up:

snps <- scan("SNPs.txt", what=character())

Read 27446736 items

object.size(snps)

1756363648 bytes

object.size(snps)/length(snps)

63.9917128215173 bytes

As you can see, that's about 1.76 GB of memory for the vector, at an average of 64 bytes per element. The longest string is only 14 bytes, though, and the file itself takes up 313 MB.

Using 64 bytes per element instead of 14 bytes per element is costing me a total of 1,372,336,800 bytes (50 extra bytes times 27,446,736 elements). In a different example, where the longest string is 4 characters, the elements each use 8 bytes. So it looks like I'm stuck with either 8 bytes or 64 bytes per element. Is that true? Is there no way to change that?
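One plausible reading of the 8-versus-64-bytes pattern (an assumption, since the second data set isn't shown): ?object.size notes that sharing amongst elements of a character vector is taken into account, so each distinct string is counted once, while every element also costs one pointer (8 bytes on 64-bit R). A vector of repeated short strings then averages close to 8 bytes per element, whereas a vector of unique strings pays for a separate string object per element. A small sketch comparing the two cases; the vector length and marker names are hypothetical:

```r
# Compare average bytes per element for unique vs. repeated strings.
# Assumes 64-bit R, where each element of a character vector is an
# 8-byte pointer to a (possibly shared) string object.
n <- 100000L
uniq <- paste0("rs", seq_len(n))   # n distinct strings
dup  <- rep("rs1", n)              # one string repeated n times
per_elem <- function(x) as.numeric(object.size(x)) / length(x)
per_elem(uniq)   # roughly pointer + one string object per element
per_elem(dup)    # close to the pointer size alone
```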


By the way...

It turns out that 99.72% of those character strings are of the form paste0("rs", Int), where Int is an integer of no more than 9 digits. So if I use only those markers, drop the "rs", and load them as integers, I see a huge improvement:

snps <- scan("SNPs_rs.txt", what=integer())

Read 27369706 items

object.size(snps)

109478864 bytes

object.size(snps)/length(snps)

4.00000146146985 bytes

That saves 93.8% of the memory by dropping 0.28% of the markers and encoding the rest as integers instead of strings. I might end up keeping all markers by encoding the other marker names as negative integers.
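The negative-integer idea might be sketched as follows. Everything here is hypothetical: the function names, the example markers, and the choice of giving non-rs markers the codes -1, -2, ... through a small lookup table. It assumes rs IDs have at most 9 digits, which keeps them below .Machine$integer.max (2147483647), and it sends IDs with leading zeros to the lookup table so that decoding reproduces the original names exactly:

```r
# Hypothetical encoding: "rs<digits>" markers keep their numeric ID as a
# non-negative integer; every other name gets a negative code that indexes
# a lookup table of those names.
rs_pattern <- "^rs(0|[1-9][0-9]{0,8})$"   # at most 9 digits, no leading zeros

encode_markers <- function(markers) {
  is_rs <- grepl(rs_pattern, markers)
  codes <- integer(length(markers))
  codes[is_rs] <- as.integer(substring(markers[is_rs], 3L))  # drop "rs"
  other <- unique(markers[!is_rs])                           # lookup table
  codes[!is_rs] <- -match(markers[!is_rs], other)            # -1, -2, ...
  list(codes = codes, other = other)
}

decode_markers <- function(enc) {
  out <- character(length(enc$codes))
  pos <- enc$codes >= 0L
  out[pos] <- paste0("rs", enc$codes[pos])
  out[!pos] <- enc$other[-enc$codes[!pos]]
  out
}

markers <- c("rs123", "chr1_789", "rs42", "kgp555")
enc <- encode_markers(markers)
enc$codes
identical(decode_markers(enc), markers)
```

The lookup table only has to hold the roughly 0.28% of names that are not plain rs IDs, so most of the saving from integer storage should carry over.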


Mike

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
