Re: [R] Conditional looping over a set of variables in R

Peter Ehlers Sun, 24 Oct 2010 08:19:42 -0700

This won't be as quick as Bill's elegant solution, but it's a one-liner:


 apply(d, 1, function(x) match(1, x))

See ?match.
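
To illustrate why this works, here is a minimal sketch on a small
hand-made data frame (the name 'toy' and its values are made up for this
example, not taken from David's data). match(1, x) returns the position
of the first element of x equal to 1, passes over NA entries, and
returns NA when the row contains no 1 at all:

```r
# Hypothetical toy data: 0/1 scores, with NA for missing responses
toy <- data.frame(X1 = c(1, 0, NA, 0),
                  X2 = c(1, NA, 1, 0),
                  X3 = c(0, 1, 1, NA))

# For each row, match(1, x) is the index of the first 1, or NA if none
first_one <- unname(apply(toy, 1, function(x) match(1, x)))
print(first_one)
```

Note that apply() converts the data frame to a matrix first, so this
assumes all columns are numeric.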

  -Peter Ehlers

On 2010-10-22 10:36, David Herzberg wrote:

Bill, thanks so much for this. I'll get a chance to test it later today, and 
will post the outcome.


David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: dav...@wpspublish.com



-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Friday, October 22, 2010 9:52 AM
To: David Herzberg; r-help@r-project.org
Subject: RE: [R] Conditional looping over a set of variables in R

You were a bit vague about the format of your data.
I'm assuming all columns were numeric and the entries are one of 0, 1, and NA 
(missing value).  I made a little function to generate random data of that 
format for testing purposes:

makeData <- function(nrow = 1500, ncol = 140, pMissing = 0.1) {
     # pMissing is the proportion of missing values
     m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE),
         nrow, ncol)
     m[runif(nrow * ncol) < pMissing] <- NA
     data.frame(m)
}

E.g.,

   >  set.seed(168)
   >  d<- makeData(15,3)
   >  d
       X1 X2 X3
    1   1  1  1
    2   0  0 NA
    3   0  1  0
    4   0  0 NA
    5   0  1  1
    6   0  0 NA
    7   1  0  0
    8   0  1  1
    9   0  0  1
   10   1  1 NA
   11   0  0  1
   12   0  0  0
   13  NA NA NA
   14   0  0  0
   15   1  0  0

I think the following function does what you want.
The algorithm is pretty similar to what you showed.

   columnOfFirstOne <- function(data) {
       # col will be the return value, one entry per row of data.
       # Fill it with NA's: NA in output will mean there were no 1's in row
       col <- rep(as.integer(NA), nrow(data))
       for (j in seq_len(ncol(data))) { # loop over columns
           # For each entry in 'col', if it has not been set yet
           # and this entry in the j'th column of data is 1 (and not
           # missing) then set it to the column number.
           col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
       }
       col # return this from function
   }

With the above data we get
   >  columnOfFirstOne(d)
    [1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1

It seems quick enough for a dataset of your size
   >  dd<- makeData(nrow=1500, ncol=140)
   >  system.time(columnOfFirstOne(dd)) # time in seconds
      user  system elapsed
      0.08    0.00    0.08
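
As a sanity check, the column-wise loop can be compared against a
row-wise apply()/match() formulation of the same task. The sketch below
is self-contained, so it restates the function in compact form and
builds its own random test data; the names 'dd_small', 'by_loop', and
'by_match' and the data dimensions are assumptions made for this
example only:

```r
# Compact restatement of the column-wise loop, so this block runs on its own
columnOfFirstOne <- function(data) {
    col <- rep(NA_integer_, nrow(data))
    for (j in seq_len(ncol(data))) {
        hit <- is.na(col) & !is.na(data[, j]) & data[, j] == 1
        col[hit] <- j
    }
    col
}

# Hypothetical test data: 0/1 entries with roughly 10% set to NA
set.seed(1)
m <- matrix(sample(c(1, 0), 200 * 20, replace = TRUE), 200, 20)
m[runif(200 * 20) < 0.1] <- NA
dd_small <- data.frame(m)

by_loop  <- columnOfFirstOne(dd_small)
by_match <- unname(apply(dd_small, 1, function(x) match(1, x)))

# TRUE if the two approaches agree on every row
agree <- identical(by_loop, by_match)
print(agree)
```

The loop runs once per column and does vectorized work over all rows,
whereas apply() calls a function once per row, which is presumably why
the loop tends to be faster when there are many more rows than columns.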

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of David Herzberg
Sent: Friday, October 22, 2010 8:34 AM
To: r-help@r-project.org
Subject: [R] Conditional looping over a set of variables in R

Here's the problem I'm trying to solve in R: I have a data frame that
consists of about 1500 cases (rows) of data from kids who took a test
of listening comprehension. The columns are their scores (1 = correct,
0 = incorrect,  . = missing) on 140 test items. The items are numbered
sequentially and are ordered by increasing difficulty as you go from
left to right across the columns. I want R to go through the data and
find the first correct response for each case. Because of basal and
ceiling rules, many cases have missing data on many items before the
first correct response appears.
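
If the data arrive as text with '.' marking a missing response, one
possible way to load them so that R sees NA is the na.strings argument
of read.csv(). This is only a sketch: the inline text below is made-up
sample data, and the three column names merely follow the
LC1a_score ... LC140a_score naming pattern.

```r
# Hypothetical raw file contents: '.' marks a missing response
raw <- "LC1a_score,LC2a_score,LC3a_score
0,.,1
.,.,.
1,0,0"

# na.strings = "." turns '.' into NA, so the columns stay numeric
scores <- read.csv(text = raw, na.strings = ".")
print(scores)
```

Without na.strings, any column containing '.' would be read as
character rather than numeric, and comparisons such as == 1 would not
behave as intended.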

For each case, I want R to evaluate the item responses sequentially
starting with item 1. If the score is 0 or missing, proceed to the
next item and evaluate it. If the score is 1, stop the operation for
that case, record the item number of that first correct response in a
new variable, proceed to the next case, and restart the operation.
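
The procedure described above can be written almost word for word in R
with two nested 'for' loops and 'break'. This is a literal,
unoptimized sketch on made-up toy data (the data frame 'scores' and its
values are assumptions for illustration); it records NA rather than 0
when a case has no correct response:

```r
# Hypothetical toy data standing in for the 140 item columns
scores <- data.frame(LC1a_score = c(0, NA, 1, 0),
                     LC2a_score = c(NA, NA, 0, 0),
                     LC3a_score = c(1, NA, 0, 0))

LCfirst1 <- rep(NA_integer_, nrow(scores))   # NA = no correct response found
for (i in seq_len(nrow(scores))) {           # one pass per case (row)
    for (j in seq_len(ncol(scores))) {       # items in order of difficulty
        if (isTRUE(scores[i, j] == 1)) {     # isTRUE() treats NA as "not 1"
            LCfirst1[i] <- j                 # record the item number
            break                            # stop scanning this case
        }
    }
}
print(LCfirst1)
```

The isTRUE() wrapper matters because comparing NA with 1 yields NA, and
a bare 'if (NA)' stops with an error.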

In SPSS, this operation would be carried out with LOOP, VECTOR, and DO
IF, as follows (assuming the data set is already loaded):

* DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT
RESPONSE, SET IT EQUAL TO 0.
numeric LCfirst1.
comp LCfirst1 = 0.

* DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
vector x=LC1a_score to LC140a_score.

* SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS
LCfirst1 = 0. "#i" IS AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME
THE LOOP RUNS.
loop #i=1 to 140 if (LCfirst1 = 0).

* SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH
ELEMENT OF THE VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES
THE FIRST ELEMENT OF THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM
RESPONSES). AS THE LOOP RUNS AND #i INCREASES, SUBSEQUENT VECTOR
ELEMENTS ARE EVALUATED.
THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE
VECTOR UNTIL A '1' IS ENCOUNTERED.
+ do if x(#i) = 1.

* WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT,
WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
+ comp x(#i) = 99.

* AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE
VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM
NUMBER OF THE FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE
OF LCfirst1 ALSO CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND
THE PROGRAM MOVES TO THE NEXT CASE AND RESTARTS THE LOOP.
+ comp LCfirst1 = #i.
+ end if.
end loop.
exe.

After several hours of trying to translate this procedure to R, I'm
stumped. I played around with creating a list to hold the item
response variables (analogous to 'vector' in SPSS), but when I tried
to use the list in an R procedure, I kept getting a warning along the
lines of 'the list contains > 1 element, only the first element will
be used'. So perhaps a list is not the appropriate class to 'hold'
these variables?

It seems that some nested arrangement of 'for', 'while', and/or 'lapply'
should allow me to recreate the operation described above. How do I set
up the indexing operation analogous to 'loop #i' in SPSS?

Any help is appreciated, and I'm happy to provide more information if
needed.

David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: dav...@wpspublish.com


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
