I think that you have to be a little more explicit with a description of your data. I am not clear as to what this means:
> There are lots of variables between each exposure and the values are nominal
> with upto 6 values..

Can you provide a more complete description? How many exposure columns are
there in your data? How many unique IDs? Depending on the answers, you can
probably read in a portion of your 5GB database at a time, summarize it, and
then aggregate the summaries at the end, since I would expect the length of
the aggregated data to be just the number of unique IDs.

On Tue, Aug 5, 2008 at 11:54 AM, Michael Pearmain <[EMAIL PROTECTED]> wrote:
> Thanks for the help guys,
>
> i think i needed to be a bit more explicit however (sorry)
>
> There are lots of variables between each exposure and the values are nominal
> with upto 6 values..
> And to add to the problem the datasets i deal with range from anything upto
> 5G.
>
> My guess is that the melt function would be inefficient in this situation.
>
> I was looking at the agrep function to count the number Exposures in the
> names() , i wasn't sure of how to count if there was a value in each one but
> the y[complete.cases(y),] looks like a nice function.
>
> Is this a good path to follow?
>
> On Tue, Aug 5, 2008 at 3:09 PM, jim holtman <[EMAIL PROTECTED]> wrote:
>>
>> I am not sure where the "Max" comes from, but this might be a start for
>> you:
>>
>> > x <- read.table(textConnection("ID Exposure_1 Exposure_2 Exposure_3
>> + 1 y y y
>> + 2 y y -
>> + 3 y - -"), header=TRUE, na.strings='-')
>> > closeAllConnections()
>> > require(reshape)
>> > y <- melt(x, id.var='ID')
>> > # get rid of NAs
>> > y <- y[complete.cases(y),]
>> > y
>>   ID   variable value
>> 1  1 Exposure_1     y
>> 2  2 Exposure_1     y
>> 3  3 Exposure_1     y
>> 4  1 Exposure_2     y
>> 5  2 Exposure_2     y
>> 7  1 Exposure_3     y
>> > cbind(Unique=tapply(y$ID, y$ID, length))
>>   Unique
>> 1      3
>> 2      2
>> 3      1
>> >
>>
>> On Tue, Aug 5, 2008 at 9:21 AM, Michael Pearmain <[EMAIL PROTECTED]>
>> wrote:
>> > Hi All,
>> >
>> > i have a dataset that i want to dynamically inspect for the number of
>> > variables that start with "Exposure_" and then for these count the
>> > entries across each case i.e
>> >
>> > ID  Exposure_1  Exposure_2  Exposure_3
>> > 1   y           y           y
>> > 2   y           y           -
>> > 3   y           -           -
>> >
>> > So the corresponding new variables that would be created are
>> >
>> > ID  Max_Exposure  Unique_Exposure
>> > 1   3             3
>> > 2   3             2
>> > 3   3             1
>> >
>> > I know this may seem fairly basic but it will give me the starting point
>> > to develop more advanced things with loop and nat lang
>> >
>> > Thanks in advance
>> >
>> > Mike
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>
> --
> Michael Pearmain
> Senior Statistical Analyst
>
> 1st Floor, 180 Great Portland St.
> London W1W 5QZ
> t +44 (0) 2032191684
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
>
> Doubleclick is a part of the Google group of companies
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
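
For anyone finding this thread later, here is a rough sketch of the two ideas discussed above: counting the "Exposure_" columns and the non-missing entries per row without melt(), and doing the same in chunks so that a large file never has to be held in memory at once. Several things here are assumptions rather than anything confirmed in the thread: that Max_Exposure means the number of "Exposure_" columns, that "-" marks a missing value, that the file is whitespace-delimited with a header line, and that per-ID counts can simply be summed when an ID appears in more than one row. The function name count_exposures_chunked and its chunk_rows argument are made up for illustration.

```r
# Toy data from the thread; "-" is treated as missing (assumption).
x <- read.table(text = "ID Exposure_1 Exposure_2 Exposure_3
1 y y y
2 y y -
3 y - -", header = TRUE, na.strings = "-", stringsAsFactors = FALSE)

# Exact prefix match on the column names (grep rather than the fuzzy agrep).
exp_cols <- grep("^Exposure_", names(x), value = TRUE)

# Row-wise count of non-missing exposures; no reshaping needed.
res <- data.frame(
  ID              = x$ID,
  Max_Exposure    = length(exp_cols),  # assumed meaning of "Max"
  Unique_Exposure = unname(rowSums(!is.na(x[exp_cols])))
)

# Hypothetical helper: read the file a few rows at a time, summarize each
# chunk, and aggregate the small summaries at the end.
count_exposures_chunked <- function(path, chunk_rows = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Header line: whitespace-separated column names.
  col_names <- strsplit(readLines(con, n = 1), "[[:space:]]+")[[1]]
  exp_cols <- grep("^Exposure_", col_names, value = TRUE)
  pieces <- list()
  repeat {
    chunk <- tryCatch(
      read.table(con, nrows = chunk_rows, col.names = col_names,
                 na.strings = "-", stringsAsFactors = FALSE),
      error = function(e) NULL  # read.table errors once no lines are left
    )
    if (is.null(chunk) || nrow(chunk) == 0L) break
    pieces[[length(pieces) + 1L]] <- data.frame(
      ID = chunk$ID,
      Unique_Exposure = unname(rowSums(!is.na(chunk[exp_cols])))
    )
    if (nrow(chunk) < chunk_rows) break  # short chunk means end of file
  }
  out <- do.call(rbind, pieces)
  # Sum per ID in case an ID spans several rows or chunks (assumption).
  aggregate(Unique_Exposure ~ ID, data = out, FUN = sum)
}

# Usage on a small temporary file standing in for the large one.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID Exposure_1 Exposure_2 Exposure_3",
             "1 y y y", "2 y y -", "3 y - -"), tmp)
chunked <- count_exposures_chunked(tmp, chunk_rows = 2L)
```

The chunk size of 2 is only there to exercise the loop on the toy file; for a multi-gigabyte file something much larger would make more sense, and the right value depends on how many columns the data has.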