Hi R users:

I have the British Household Panel Survey (BHPS) in .tab format. I want to
feed it through the Amelia package (which will be an ‘interesting’ job in
itself).

But first I need to convert the various types of missing value (from about
-9 to -1) to a more generic ‘NA’ code.
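
A minimal sketch of the recoding step for a single table. It assumes the missing-value codes are exactly the integers -9 to -1 (the range above is only approximate) and that they occur only in numeric columns. The helper `recode_missing()` and the toy data are hypothetical.

```r
# Hypothetical helper: replace BHPS-style missing-value codes with NA.
# Assumes the codes are the integers -9..-1; adjust `codes` if needed.
recode_missing <- function(df, codes = -9:-1) {
  is_num <- vapply(df, is.numeric, logical(1))
  # Only numeric columns are touched; text/factor columns are left alone
  df[is_num] <- lapply(df[is_num], function(x) {
    x[x %in% codes] <- NA
    x
  })
  df
}

# Toy example (made-up values, not BHPS data)
toy <- data.frame(age = c(34, -9, 51), region = c("a", "b", "c"),
                  income = c(-8, 1200, -1))
recode_missing(toy)
```

Matching the listed codes with `%in%`, rather than testing `< 0`, leaves any genuinely negative values untouched. Working column by column also avoids comparing text columns against 0.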

I’ve written the following function to do this:

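BHPS.converter <- function(from = "D:/Data/BHPS/UKDA-5151-tab/tab/",
                           to = "D:/BHPS/NA/", ext = "tab") {
  # Escape the dot so only names ending in ".tab" match
  pattern <- paste("\\.", ext, "$", sep = "")
  from.files <- dir(from, pattern = pattern)
  existing.to.files <- dir(to, pattern = pattern)

  # Skip files already converted by an earlier, interrupted run.
  # (x[-match(...)] selects nothing when no files have been written yet.)
  still.to.do <- from.files[!(from.files %in% existing.to.files)]

  # Looping over the names avoids the 1:0 trap when nothing is left to do
  for (f in still.to.do) {
    temp.table <- read.delim(paste(from, f, sep = ""))
    print(paste("read:", f))
    flush.console()   # make progress messages appear immediately

    # Recode negative missing-value codes to NA, numeric columns only
    num <- vapply(temp.table, is.numeric, logical(1))
    temp.table[num] <- lapply(temp.table[num],
                              function(x) replace(x, which(x < 0), NA))

    # Tab-separated, without row names, so read.delim() can read it back
    write.table(temp.table, file = paste(to, f, sep = ""),
                sep = "\t", row.names = FALSE)
    print(paste("written:", f))
    flush.console()

    # Free the current table before reading the next one
    rm(temp.table)
    gc()
  }
  invisible(still.to.do)
}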

It checks for existing files in the ‘to’ directory (where the converted
files, with negative codes recoded to NA, are written) because when I tried
this conversion previously it got about halfway through and then crashed.
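
A short illustration, using made-up file names, of why the `x[-match(...)]` idiom is fragile for this resume check. When nothing has been converted yet, `match()` returns `integer(0)`, and indexing with `-integer(0)` selects nothing rather than everything. Keeping the names that are not `%in%` the output directory behaves correctly in both cases.

```r
# Made-up file names standing in for the BHPS .tab files
from.files <- c("a.tab", "b.tab", "c.tab")

# First run: nothing has been converted yet
existing <- character(0)
idx <- seq_along(from.files)[-match(existing, from.files)]
length(idx)    # 0, so every file is wrongly treated as already done
1:length(idx)  # c(1, 0): a loop over 1:obs.to.do would still run twice

# Safer: keep the names that are not yet in the output directory
todo_first <- from.files[!(from.files %in% existing)]
todo_first     # "a.tab" "b.tab" "c.tab"

# Resumed run after b.tab was written
existing <- "b.tab"
todo_resumed <- from.files[!(from.files %in% existing)]
todo_resumed   # "a.tab" "c.tab"
```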

The problem is that it crashes *this time* too, before it has even printed
the message saying it has read the first file.

The file it gets stuck on is about 75 MB in size.

I am using a dual-core 3.2 GHz Pentium D processor with 2 GB of memory (plus
2 GB of virtual memory) and, unfortunately, Windows XP.
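
One commonly suggested way to cut the memory `read.delim()` needs for a large file is to supply `colClasses` and `nrows` up front, as described in `?read.table`. The helper below is a hypothetical sketch of that idea. It guesses the classes from a sample of rows, so a column that looks like integers in the sample but has decimals further down would make the final read fail. In that case, raise `probe_rows` or set the classes by hand.

```r
# Hypothetical helper, not part of the original function: read a large
# tab-delimited file using less memory by giving read.delim() the column
# classes and row count up front (see ?read.table).
read_tab_lean <- function(path, probe_rows = 1000) {
  # Guess column classes from the first few rows only
  probe <- read.delim(path, nrows = probe_rows)
  classes <- vapply(probe, function(x) class(x)[1], character(1))
  # All-NA columns in the sample come back as "logical"; let R guess those
  classes[classes == "logical"] <- NA
  # Count the lines (minus the header) so nrows can be supplied as well
  n <- length(count.fields(path, sep = "\t", quote = "\"",
                           comment.char = "")) - 1
  read.delim(path, colClasses = unname(classes), nrows = n)
}
```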

Questions:

1) Any general tips on how to increase the amount of memory available to
process the file?

2) Can you see a more efficient way of doing what I’m doing?

3) What’s the best way of coding for multiple forms of NA? The BHPS code
‘-8’ (meaning ‘inapplicable’, i.e. the question was not routed to this
respondent) should really be distinguished from other forms of nonresponse.
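
For question 3, one option (a sketch, not the only approach) is to keep two pieces per variable: the value with every missing code set to NA, and a factor recording why it is missing. Only the meaning of -8 comes from the post. Lumping all other codes into "other nonresponse" is a placeholder assumption, and `split_missing()` is a hypothetical helper.

```r
# Hypothetical helper: split one BHPS-style variable into the value
# (NA for any missing code) and a factor recording *why* it is missing.
# Only -8 = "inapplicable" comes from the post; the other codes are
# lumped together as "other nonresponse" as a placeholder.
split_missing <- function(x, codes = -9:-1) {
  is_code <- x %in% codes
  reason <- rep("observed", length(x))
  reason[is_code] <- "other nonresponse"
  reason[is_code & x == -8] <- "inapplicable"
  value <- x
  value[is_code] <- NA
  list(value = value,
       reason = factor(reason, levels = c("observed", "inapplicable",
                                          "other nonresponse")))
}

# Made-up example values
s <- split_missing(c(1200, -8, 950, -9, -1))
s$value          # 1200 NA 950 NA NA
table(s$reason)
```

The reason factor could then be used, for example, to set the inapplicable cases aside rather than impute them. How best to pass that information to Amelia is not covered here.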

Thanks,

Jon

P.S. Apologies if this is slightly too vague or long-winded.

Jon Minton

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.