[R] Subsetting by number of observations in a factor

2007-08-09 Thread Ron Crump

Hi, I generally do my data preparation externally to R, so this is a bit unfamiliar to me, but a colleague has asked me how to do certain data manipulations within R. Anyway, basically I can get his large file into a dataframe. One of the columns is a management group code (mg). There may be va

Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread jim holtman

Does this do what you want? It creates a new dataframe with those 'mg' that have at least a certain number of observations.

> set.seed(2)
> # create some test data
> x <- data.frame(mg=sample(LETTERS[1:4], 20, TRUE), data=1:20)
> # split the data into subsets based on 'mg'
> x.split <- split(x, x$
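The message above is cut off in this listing, so the following is only a minimal sketch of how a split-based filter could look, not a reconstruction of the original post. The toy data, the threshold `min_n`, and the names `x_split`, `x_keep` and `x_new` are assumptions made for illustration.

```r
# Deterministic toy data (hypothetical values, not the poster's data)
x <- data.frame(
  mg   = c("A", "A", "A", "B", "C", "C", "C", "C", "D", "D"),
  data = 1:10
)
min_n <- 3  # assumed threshold: keep groups with at least 3 rows

# Split into one data frame per 'mg' value
x_split <- split(x, x$mg)

# Keep only the groups that have enough rows
x_keep <- Filter(function(g) nrow(g) >= min_n, x_split)

# Stack the surviving groups back into a single data frame
x_new <- do.call(rbind, x_keep)
rownames(x_new) <- NULL  # discard the "A.1"-style row names rbind creates
```

Here groups A (3 rows) and C (4 rows) survive, while B and D are dropped. Splitting and re-binding copies the data, which may matter on a large file.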

Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread Ron Crump

Jim,

> Does this do what you want? It creates a new dataframe with those
> 'mg' that have at least a certain number of observations.

Looks good. I also have an alternative solution which appears to work, so I'll see which is quicker on the big data set in question.

My solution:

mgsize <- as.dat

Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread jim holtman

Here is an even faster way:

> # faster way
> x.mg.size <- table(x$mg)  # count occurrences
> x.mg.5 <- names(x.mg.size)[x.mg.size > 5]  # select greater than 5
> x.new1 <- subset(x, x$mg %in% x.mg.5)  # use in the subset
> x.new1
  mg data
1  A    1
4  A    4
5  D    5
6  D    6
7  A    7
8
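The table-and-%in% idea above can be wrapped in a small reusable function. This is a hedged sketch: the helper name `keep_large_groups`, its arguments, and the toy data are hypothetical and do not come from the thread. It uses the same strict "greater than" comparison as the post and also drops factor levels that end up empty.

```r
# Hypothetical helper: keep rows whose group has more than `min_n` observations
keep_large_groups <- function(df, group_col, min_n) {
  counts <- table(df[[group_col]])           # rows per group
  big <- names(counts)[counts > min_n]       # groups above the threshold
  out <- df[df[[group_col]] %in% big, , drop = FALSE]
  droplevels(out)                            # remove now-empty factor levels
}

# Deterministic toy data (hypothetical values)
x <- data.frame(
  mg   = factor(c("A", "A", "A", "B", "C", "C", "C", "C", "D", "D")),
  data = 1:10
)

# Keep management groups with more than 2 observations
x_big <- keep_large_groups(x, "mg", 2)
```

With this data only groups A and C remain, and the original row names are preserved. Because it indexes the data frame once instead of splitting it, this approach is likely to scale better on a large file, though that is worth timing on the real data.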