You were a bit vague about the format of your data. I'm assuming all columns are numeric and each entry is 0, 1, or NA (missing value). I wrote a small function to generate random data in that format for testing:
makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1) {
    # pMissing is the proportion of missing values
    m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE), nrow, ncol)
    m[runif(nrow * ncol) < pMissing] <- NA
    data.frame(m)
}

E.g.,

> set.seed(168)
> d <- makeData(15,3)
> d
   X1 X2 X3
1   1  1  1
2   0  0 NA
3   0  1  0
4   0  0 NA
5   0  1  1
6   0  0 NA
7   1  0  0
8   0  1  1
9   0  0  1
10  1  1 NA
11  0  0  1
12  0  0  0
13 NA NA NA
14  0  0  0
15  1  0  0

I think the following function does what you want. The algorithm is pretty similar to what you showed.

columnOfFirstOne <- function(data) {
    # col will be the return value, one entry per row of data.
    # Fill it with NAs: an NA in the output means there were no 1s in that row.
    col <- rep(as.integer(NA), nrow(data))
    for (j in seq_len(ncol(data))) { # loop over columns
        # For each entry in 'col' that has not been set yet,
        # if the corresponding entry in the j'th column of data is 1
        # (and not missing), set it to the column number.
        col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
    }
    col # return this from the function
}

With the above data we get

> columnOfFirstOne(d)
[1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1

It seems quick enough for a dataset of your size:

> dd <- makeData(nrow=1500, ncol=140)
> system.time(columnOfFirstOne(dd)) # time in seconds
   user  system elapsed
   0.08    0.00    0.08

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of David Herzberg
> Sent: Friday, October 22, 2010 8:34 AM
> To: r-help@r-project.org
> Subject: [R] Conditional looping over a set of variables in R
>
> Here's the problem I'm trying to solve in R: I have a data
> frame that consists of about 1500 cases (rows) of data from
> kids who took a test of listening comprehension. The columns
> are their scores (1 = correct, 0 = incorrect, . = missing)
> on 140 test items.
> The items are numbered sequentially and
> are ordered by increasing difficulty as you go from left to
> right across the columns. I want R to go through the data and
> find the first correct response for each case. Because of
> basal and ceiling rules, many cases have missing data on many
> items before the first correct response appears.
>
> For each case, I want R to evaluate the item responses
> sequentially starting with item 1. If the score is 0 or
> missing, proceed to the next item and evaluate it. If the
> score is 1, stop the operation for that case, record the item
> number of that first correct response in a new variable,
> proceed to the next case, and restart the operation.
>
> In SPSS, this operation would be carried out with LOOP,
> VECTOR, and DO IF, as follows (assuming the data set is
> already loaded):
>
> * DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST
> CORRECT RESPONSE, SET IT EQUAL TO 0.
> numeric LCfirst1.
> comp LCfirst1 = 0.
>
> * DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
> vector x=LC1a_score to LC140a_score.
>
> * SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS
> LCfirst1 = 0. "#i" IS AN INDEX VARIABLE THAT INCREASES BY 1
> EACH TIME THE LOOP RUNS.
> loop #i=1 to 140 if (LCfirst1 = 0).
>
> * SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR
> EACH ELEMENT OF THE VECTOR. THUS, WHEN #i = 1, THE
> EXPRESSION EVALUATES THE FIRST ELEMENT OF THE VECTOR (THAT
> IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP RUNS
> AND #i INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED.
> THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH
> THE VECTOR UNTIL A '1' IS ENCOUNTERED.
> + do if x(#i) = 1.
>
> * WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT
> STATEMENT, WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
> + comp x(#i) = 99.
>
> * AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH
> RECODES THE VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE,
> THUS CAPTURING THE ITEM NUMBER OF THE FIRST CORRECT RESPONSE
> FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO CAUSES
> THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM
> MOVES TO THE NEXT CASE AND RESTARTS THE LOOP.
> + comp LCfirst1 = #i.
> + end if.
> end loop.
> exe.
>
> After several hours of trying to translate this procedure to
> R, I'm stumped. I played around with creating a list to hold
> the item response variables (analogous to 'vector' in SPSS),
> but when I tried to use the list in an R procedure, I kept
> getting a warning along the lines of 'the list contains > 1
> element, only the first element will be used'. So perhaps a
> list is not the appropriate class to 'hold' these variables?
>
> Will some nested arrangement of 'for', 'while', and/or
> 'lapply' allow me to recreate the operation described
> above? How do I set up the indexing operation analogous to
> 'loop #i' in SPSS?
>
> Any help is appreciated, and I'm happy to provide more
> information if needed.
>
> David S. Herzberg, Ph.D.
> Vice President, Research and Development
> Western Psychological Services
> 12031 Wilshire Blvd.
> Los Angeles, CA 90025-1251
> Phone: (310)478-2061 x144
> FAX: (310)478-7838
> email: dav...@wpspublish.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
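For anyone coming from SPSS, here is a sketch of a more literal translation of the LOOP / DO IF syntax quoted above: an outer loop over cases and an inner loop over items that stops with break at the first score of 1. It assumes, as the question says, that missing scores are coded as '.' in the raw data, so they are turned into NA with read.table's na.strings argument. The names scores, firstCorrectLoop and firstCorrectApply are hypothetical, made up for this illustration. Unlike the SPSS code, which leaves LCfirst1 at 0, both functions return NA for a case with no correct response, matching columnOfFirstOne.

```r
# Hypothetical example data: three items, '.' marks a missing score,
# as described in the question. na.strings = "." reads '.' as NA.
scores <- read.table(text = "
1 1 1
0 0 .
0 1 0
. . .
", na.strings = ".", col.names = c("LC1", "LC2", "LC3"))

# Literal translation of the SPSS LOOP / DO IF: for each case, walk the
# items left to right and stop at the first score equal to 1.
firstCorrectLoop <- function(data) {
    x <- as.matrix(data)                    # plays the role of 'vector x'
    LCfirst1 <- rep(NA_integer_, nrow(x))   # NA = no correct response
    for (case in seq_len(nrow(x))) {        # one pass per case (row)
        for (i in seq_len(ncol(x))) {       # like 'loop #i = 1 to 140'
            # '&&' only evaluates x[case, i] == 1 when the score is not
            # missing, and indexing a single cell keeps the if() condition
            # a single TRUE/FALSE.
            if (!is.na(x[case, i]) && x[case, i] == 1) {
                LCfirst1[case] <- i
                break                       # stop looping for this case
            }
        }
    }
    LCfirst1
}

# A vectorized alternative: which() skips FALSE and NA entries, so its
# first element is the first item scored 1 (NA if there is none).
firstCorrectApply <- function(data) {
    apply(as.matrix(data) == 1, 1, function(r) which(r)[1])
}

firstCorrectLoop(scores)    # 1 NA 2 NA
firstCorrectApply(scores)   # same values
```

The double loop maps onto the SPSS code line by line, but it evaluates if() once per cell, so on 1500 x 140 data it will likely be slower than columnOfFirstOne, which loops only over the 140 columns.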