Hi,
I generally do my data preparation externally to R, so
this is a bit unfamiliar to me, but a colleague has asked
me how to do certain data manipulations within R.
Anyway, basically I can get his large file into a dataframe.
One of the columns is a management group code (mg). There may be
va
Does this do what you want? It creates a new dataframe with those
'mg' that have at least a certain number of observations.
> set.seed(2)
> # create some test data
> x <- data.frame(mg=sample(LETTERS[1:4], 20, TRUE), data=1:20)
> # split the data into subsets based on 'mg'
> x.split <- split(x, x$
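For anyone following along, here is a minimal sketch of one way the split-based idea could be carried through. It is not necessarily the code from the message above: the threshold name `min.obs`, its value, and the `sapply`/`rbind` steps are assumptions made for illustration.

```r
# Sketch only: min.obs and the filtering steps below are assumed, not from the post
set.seed(2)
x <- data.frame(mg = sample(LETTERS[1:4], 20, TRUE), data = 1:20)

min.obs <- 5                                  # assumed minimum group size

# split the data into one data frame per 'mg'
x.split <- split(x, x$mg)

# flag the groups that have at least min.obs rows
keep <- sapply(x.split, nrow) >= min.obs

# bind the surviving groups back into a single data frame
x.new <- do.call(rbind, x.split[keep])
```

Note that the result comes back grouped by 'mg' rather than in the original row order, and the row names are prefixed with the group label.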
Jim,
> Does this do what you want? It creates a new dataframe with those
> 'mg' that have at least a certain number of observations.
Looks good. I also have an alternative solution which appears to work,
so I'll see which is quicker on the big data set in question.
My solution:
mgsize <- as.dat
Here is an even faster way:
> # faster way
> x.mg.size <- table(x$mg) # count the occurrences of each 'mg'
> x.mg.5 <- names(x.mg.size)[x.mg.size > 5] # keep the codes with more than 5 rows
> x.new1 <- subset(x, mg %in% x.mg.5) # use those codes in the subset
> x.new1
  mg data
1  A    1
4  A    4
5  D    5
6  D    6
7  A    7
8
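A related variant, in case it is useful for the timing comparison: `ave()` can attach the group size to every row, so the filter becomes a single logical index. This is only a sketch under assumed names (`min.obs`, `grp.size`, `x.new2` are not from the thread), and it should select the same rows as the `table()` approach above.

```r
# Sketch only: min.obs, grp.size and x.new2 are assumed names
set.seed(2)
x <- data.frame(mg = sample(LETTERS[1:4], 20, TRUE), data = 1:20)

min.obs <- 5                                  # assumed threshold, as in '> 5' above

# for each row, the number of rows sharing its 'mg'
grp.size <- ave(seq_len(nrow(x)), x$mg, FUN = length)

# keep rows whose group has more than min.obs observations
x.new2 <- x[grp.size > min.obs, ]
```

This keeps the original row order and row names. Whether it is quicker than `table()` plus `%in%` on the big data set would need to be timed there.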