Dear list:

I was sent a data file that is very large (92890 x 1620) and *very* 
sparse. Instead of leaving cells with missing data blank, each such cell 
contains a dot (.).

The data are binary in almost all columns, with only a few columns 
containing whole numbers; I believe the binary columns need 2 bytes per 
cell and the others 4. So, by my calculation (assuming 4 bytes for every 
cell as an upper bound), I should need around 92890 * 1620 * 4 bytes, or 
about 574 MB, to read in these data, and about twice that for analyses. 
My computer has 3 GB of RAM.
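
For reference, the back-of-the-envelope arithmetic done in R itself 
(the 4 bytes per cell is my upper-bound assumption from above):

rows <- 92890
cols <- 1620
rows * cols * 4 / 1024^2   # roughly 574 MB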

But I am unable to read the file in, even though I have allocated what 
should be sufficient memory to R.
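
In case it helps, my call looks roughly like the following; the file 
name and the widths vector are placeholders, not my real layout:

widths <- rep(2, 1620)   # placeholder widths; the real layout differs
dat <- read.fwf("bigfile.dat", widths = widths, header = FALSE)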

My first question is: do the dots in the empty cells consume additional 
memory? I am assuming the answer is yes, and that I should remove them 
before reading the file in. Because the data are in a fixed-width format, 
I can open the file in a text editor and replace every dot with nothing, 
then retry the read. Maybe this will work?
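
One quick check I thought of is to read just a handful of rows and see 
what classes the dot-filled columns come through as (same placeholder 
file and widths as above):

small <- read.fwf("bigfile.dat", widths = widths, header = FALSE, n = 50)
table(sapply(small, class))   # character/factor columns would explain extra memory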

I created a smaller data file (~14000 x 1620) in SAS and tried to import 
this subset (it still had the dots), but R still would not let me read it 
in.
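
The same arithmetic for the subset, which makes its failure even more 
puzzling to me:

14000 * 1620 * 4 / 1024^2   # roughly 86 MB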

I could use a little guidance, as I think I have allocated sufficient 
memory to read in this data file, assuming my calculations are right.

Does anyone have any thoughts on a strategy?

Harold



