I often use code like Curt's encapsulated in the following isFirstInRun function:
isFirstInRun <- function(x,...) { lengthX <- length(x) if (lengthX == 0) return(logical(0)) retVal <- c(TRUE, x[-1]!=x[-lengthX]) for(arg in list(...)) { stopifnot(lengthX == length(arg)) retVal <- retVal | c(TRUE, arg[-1]!=arg[-lengthX]) } if (any(missing<-is.na(retVal))) # match rle: NA!=NA retVal[missing] <- TRUE retVal } E.g., > d <- data.frame(log=trunc(log(1:10)), sqrt=trunc(sqrt(1:10))) > within(d, first <- isFirstInRun(log)) log sqrt first 1 0 1 TRUE 2 0 1 FALSE 3 1 1 TRUE 4 1 2 FALSE 5 1 2 FALSE 6 1 2 FALSE 7 1 2 FALSE 8 2 2 TRUE 9 2 3 FALSE 10 2 3 FALSE > # Or look for change in any number of vectors: > within(d, first <- isFirstInRun(log, sqrt)) # TRUE if either changes log sqrt first 1 0 1 TRUE 2 0 1 FALSE 3 1 1 TRUE 4 1 2 TRUE 5 1 2 FALSE 6 1 2 FALSE 7 1 2 FALSE 8 2 2 TRUE 9 2 3 TRUE 10 2 3 FALSE To do isLastInRun put the TRUE after the x[-1]!=x[-length(x)] isLastInRun <- function(x,...) { lengthX <- length(x) if (lengthX == 0) return(logical(0)) retVal <- c(x[-1]!=x[-lengthX], TRUE) for(arg in list(...)) { stopifnot(lengthX == length(arg)) retVal <- retVal | c(arg[-1]!=arg[-lengthX], TRUE) } if (any(missing<-is.na(retVal))) # match rle: NA!=NA retVal[missing] <- TRUE retVal } Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of > seeliger.c...@epamail.epa.gov > Sent: Tuesday, November 23, 2010 10:38 AM > To: r-help@r-project.org > Cc: Joel > Subject: Re: [R] the first. from SAS in R > > > > Is there any similar function in R to the first. in SAS? > > ?duplicated > > > > a$d <- ifelse( duplicated( a$a ), 0 , 1 ) > > > > a$d.2 <- as.numeric( !duplicated( a$a ) ) > > Actually, duplicated does not duplicate SAS' first. operator, > though it > may suffice for the OP's needs. > > To illustrate, let's start with a dataframe of 3 key columns > and some data > in x: > tt <- data.frame(k1 = rep(1:3, each=10), k2 = rep(1:5, > each=2, times=3), > k3=rep(1:2, times=15), x = 1:30) > > # Try to mimic what the following SAS datastep would do, > # assuming 'tt' is already sorted: > # data foo; > # set tt; > # by k1, k2; > # put first.k1=, first.k2=; > # run; > > # SAS' first. operations would result in these values: > tt$sas.first.k1 <- rep(c(1, rep(0,9)), 3) > tt$sas.first.k2 <- rep(1:0, 15) > > # R duplicated() returns these values. You can see they > # are the same for k1, but dissimilar after row 10 for k2. > tt$duplicated.k1 <- 0+!duplicated(tt$k1) > tt$duplicated.k2 <- 0+!duplicated(tt$k2) > > # I've found I need to lag a column to mimic SAS' first. > # operator, thusly, though perhaps someone else knows > # differently. Note this does not work on unordered > # dataframes! > lag.k1 <- c(NA, tt$k1[1:(nrow(tt) - 1)]) > tt$r.first.k1 <- ifelse(is.na(lag.k1), 1, tt$k1 != lag.k1) > > lag.k2 <- c(NA, tt$k2[1:(nrow(tt) - 1)]) > tt$r.first.k2 <- ifelse(is.na(lag.k2), 1, tt$k2 != lag.k2) > > Mimicking SAS' last. operation can be done in a similar manner, by > anti-laging the column of interest and changing the > comparisons somewhat. > > Enjoy the days, > cur > -- > Curt Seeliger, Data Ranger > Raytheon Information Services - Contractor to ORD > seeliger.c...@epa.gov > 541/754-4638 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.