Hi!

# toy data
toyData <- data.frame(x = 1:4, y = 5:8, xy = 9:12, z = 13:16)
vars <- c("x", "z")

# "pattern" is an argument of grep
args(grep)

# "pattern" must only consist of a single element
# otherwise only the first element is used
grep(pattern = vars, x = names(toyData))

# one way to do this - a loop
# create a vector to collect the output of each call
toyColIndexList <- vector(length = length(vars), mode = "list")

# grep each element in turn
for (i in seq_along(vars)) {
    toyColIndexList[[i]] <- grep(pattern = vars[i], x = names(toyData))
}

# combine all of the answers
toyColIndex <- unlist(toyColIndexList)

# remove duplicated columns if present
toyColIndex <- toyColIndex[!duplicated(toyColIndex)]

# select the elements we want
toyData[, toyColIndex]

# alternatively we could use regular expressions
grep(pattern = ("x|z"), x = names(toyData))

# hope this helps

Best wishes

Chris

Chris Campbell
Mango Solutions
Data Analysis that Delivers
http://www.mango-solutions.com
+44 (0) 1249 705 450

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lib Gray
Sent: 20 July 2012 01:17
To: Rui Barradas
Cc: r-help
Subject: Re: [R] Subsetting problem data, 2

I'm still getting the message (if this is what you were suggesting I try). The data set I'm using has many more columns other than these variables; could that be a problem? I didn't think it would affect it.

> pattern <- "L[1-8][12]"
> nms<-names(data)[grep(vars,names(data))]
Warning message:
In grep(vars, names(data)) :
  argument 'pattern' has length > 1 and only the first element will be used
>

On Thu, Jul 19, 2012 at 6:55 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:

> Hello,
>
> Sorry, forgot about that. It's trickier to write code without a
> dataset to test it.
>
> Try
>
> pattern <- "L[1-8][12]"
>
> and after the grep print nms to see if it's right.
>
> Rui Barradas
>
> On 20-07-2012 00:33, Lib Gray wrote:
>
>> I'm getting this error message:
>>
>> nms<-names(data)[grep(vars,names(data))]
>> Warning message:
>> In grep(vars, names(data)) :
>>   argument 'pattern' has length > 1 and only the first element will
>>   be used
>>
>> Is there a way around this?
>>
>> On Thu, Jul 19, 2012 at 6:17 PM, Rui Barradas <ruipbarra...@sapo.pt>
>> wrote:
>>
>>> Hello,
>>>
>>> I guess so, and I can save you some typing.
>>>
>>> vars <- sort(apply(expand.grid("L", 1:8, 1:2), 1, paste,
>>>     collapse=""))
>>>
>>> Then use it and see the result.
>>>
>>> Rui Barradas
>>>
>>> On 20-07-2012 00:00, Lib Gray wrote:
>>>
>>>> The variables are actually L11, L12, L21, L22, ... , L81, L82.
>>>> Would just creating a vector c(L11,... ,L82) be fine? (I'm about
>>>> to try it, but I wanted to check to see if that was going to be a
>>>> big issue).
>>>>
>>>> On Thu, Jul 19, 2012 at 3:27 PM, Rui Barradas
>>>> <ruipbarra...@sapo.pt> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Try the following. The data is your example of Patient A through
>>>>> E, but from the output of dput().
>>>>>
>>>>> dat <- structure(list(Patient = structure(c(1L, 1L, 1L, 1L, 1L,
>>>>> 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L), .Label =
>>>>> c("A", "B", "C", "D", "E"), class = "factor"), Cycle = c(1L, 2L,
>>>>> 3L, 4L, 5L, 1L, 2L, 1L, 3L, 4L, 5L, 1L, 2L, 4L, 5L, 1L, 2L, 3L),
>>>>> V1 = c(0.4, 0.3, 0.3, 0.4, 0.5, 0.4, 0.4, 0.9, 0.3, NA, 0.4,
>>>>> 0.2, 0.5, 0.6, 0.5, 0.1, 0.5, 0.4), V2 = c(0.1, 0.2, NA,
>>>>> NA, 0.2, NA, NA, 0.9, 0.5, NA, NA, 0.5, 0.7, 0.4, 0.5, NA,
>>>>> 0.3, 0.3), V3 = c(0.5, 0.5, 0.6, 0.4, 0.5, NA, NA, 0.9, 0.6,
>>>>> NA, NA, NA, NA, NA, NA, NA, NA, NA), V4 = c(1.5, 1.6, 1.7,
>>>>> 1.8, 1.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>>>>> NA), V5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>>>>> NA, NA, NA, NA, NA, NA)), .Names = c("Patient", "Cycle",
>>>>> "V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names =
>>>>> c(NA, -18L))
>>>>>
>>>>> dat
>>>>>
>>>>> nms <- names(dat)[grep("^V[1-9]$", names(dat))]
>>>>> dd <- split(dat, dat$Patient)
>>>>> fun <- function(x) any(is.na(x)) && any(!is.na(x))
>>>>> ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[, nms], fun)))
>>>>>
>>>>> dd[ix]
>>>>> do.call(rbind, dd[ix])
>>>>>
>>>>> I'm assuming that the variable names are as posted, V followed by
>>>>> one single digit 1-9. To keep the Patients with complete cases
>>>>> just negate the index 'ix', it's a logical index.
>>>>> Note also that dput() is the best way of posting a data example.
>>>>>
>>>>> Hope this helps,
>>>>>
>>>>> Rui Barradas
>>>>>
>>>>> On 19-07-2012 15:15, Lib Gray wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I didn't give enough information when I sent a query before, so
>>>>>> I'm trying again with a more detailed explanation:
>>>>>>
>>>>>> In this data set, each patient has a different number of measured
>>>>>> variables (they represent tumors, so some people had 2 tumors,
>>>>>> some had 5, etc).
>>>>>> The problem I have is that often in later cycles for a patient,
>>>>>> tumors that were originally measured are now missing (or a "new"
>>>>>> tumor showed up). We assume there are many different reasons for
>>>>>> why a tumor would be measured in one cycle and not another, and so
>>>>>> I want to subset OUT the "problem" patients to better study these
>>>>>> patterns.
>>>>>>
>>>>>> An example:
>>>>>>
>>>>>> Patient Cycle  V1   V2   V3   V4   V5
>>>>>> A       1      0.4  0.1  0.5  1.5  NA
>>>>>> A       2      0.3  0.2  0.5  1.6  NA
>>>>>> A       3      0.3  NA   0.6  1.7  NA
>>>>>> A       4      0.4  NA   0.4  1.8  NA
>>>>>> A       5      0.5  0.2  0.5  1.5  NA
>>>>>>
>>>>>> I want to keep patient A; they have 4 measured tumors, but tumor
>>>>>> 2 is missing data for cycles 3 and 4
>>>>>>
>>>>>> B       1      0.4  NA   NA   NA   NA
>>>>>> B       2      0.4  NA   NA   NA   NA
>>>>>>
>>>>>> I do not want to keep patient B; they have 1 tumor that is
>>>>>> measured consistently in both cycles
>>>>>>
>>>>>> C       1      0.9  0.9  0.9  NA   NA
>>>>>> C       3      0.3  0.5  0.6  NA   NA
>>>>>> C       4      NA   NA   NA   NA   NA
>>>>>> C       5      0.4  NA   NA   NA   NA
>>>>>>
>>>>>> I do want to keep patient C; all their data is missing for cycle
>>>>>> 4, and cycle 5 only measured one tumor
>>>>>>
>>>>>> D       1      0.2  0.5  NA   NA   NA
>>>>>> D       2      0.5  0.7  NA   NA   NA
>>>>>> D       4      0.6  0.4  NA   NA   NA
>>>>>> D       5      0.5  0.5  NA   NA   NA
>>>>>>
>>>>>> I do not want patient D; their two tumors were measured each
>>>>>> cycle
>>>>>>
>>>>>> E       1      0.1  NA   NA   NA   NA
>>>>>> E       2      0.5  0.3  NA   NA   NA
>>>>>> E       3      0.4  0.3  NA   NA   NA
>>>>>>
>>>>>> I DO want patient E; they only had one tumor register in Cycle 1,
>>>>>> but cycles 2 and 3 had two tumors.
>>>>>>
>>>>>> Thanks for any help!
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.