Could you help me? Thank you Cecília
Em Mon, 2 Aug 2010 18:42:27 -0400 jim holtman <jholt...@gmail.com> escreveu:
This is just following up with the example data you sent. This will create a list 'result' that will have the subset of data between the10% & 90%-tiles of the data:#My reproducible example: firm<-sort(rep(1:1000,10),decreasing=F) year<-rep(1998:2007,1000) industry<-rep(c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10),rep(6,10),rep(7,10),rep(8,10),rep(9,10),+ rep(10,10)),1000)X1<-rnorm(10000) data<-data.frame(firm, industry,year,X1) # split the data by industry/yeard.s <- split(data, list(data$industry, data$year), drop=TRUE)result <- lapply(d.s, function(.id){+ # get 10/90% values + .limit <- quantile(.id$X1, prob=c(.1, .9)) + subset(.id, X1 >= .limit[1] & X1 <= .limit[2]) + })str(result)List of 100 $ 1.1998 :'data.frame': 800 obs. of 4 variables:..$ firm : int [1:800] 1 21 31 41 51 61 71 81 91 111 .....$ industry: num [1:800] 1 1 1 1 1 1 1 1 1 1 .....$ year : int [1:800] 1998 1998 1998 1998 1998 1998 1998 19981998 1998 .....$ X1 : num [1:800] 0.659 -0.105 -0.617 0.342 -1.077 ...$ 2.1998 :'data.frame': 800 obs. of 4 variables:..$ firm : int [1:800] 2 32 42 52 62 72 102 112 132 162 .....$ industry: num [1:800] 2 2 2 2 2 2 2 2 2 2 .....$ year : int [1:800] 1998 1998 1998 1998 1998 1998 1998 19981998 1998 .....$ X1 : num [1:800] -1.1044 -0.0666 -0.9184 0.3469 -0.2348 ...You can see that the 'name' of the list element is the industry.yearcombination; this can also be seen in the data.On Mon, Aug 2, 2010 at 6:20 PM, Cecilia Carmo <cecilia.ca...@ua.pt> wrote:Thank you for your help but I don't understand how can I have a dataframe with the columns: firm, year, industry, X1 and X2. Could you help me(again)? Cecília Carmo Em Sat, 31 Jul 2010 22:10:38 -0400 jim holtman <jholt...@gmail.com> escreveu:This will split the data by industry & year and then return the valuesthat include the 80%-tile (>=10% & <= 90%) # split the data by industry/yeard.s <- split(data, list(data$industry, data$year), drop=TRUE)result <- lapply(d.s, function(.id){ # get 10/90% values .limit <- quantile(.id$X1, prob=c(.1, .9)) subset(.id, X1 >= .limit[1] & X1 <= .limit[2]) })This returns a list of 100 elements for each combination.On Sat, Jul 31, 2010 at 9:39 PM, Cecilia Carmo <cecilia.ca...@ua.pt>wrote:Hi everyone!#I need a loop or a function that creates a X2 variable that is X1withoutthe extreme values (or X1 winsorized) by industry and year.#My reproducible example: firm<-sort(rep(1:1000,10),decreasing=F) year<-rep(1998:2007,1000) industry<-rep(c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10),rep(6,10),rep(7,10),rep(8,10),rep(9,10), rep(10,10)),1000) X1<-rnorm(10000) data<-data.frame(firm, industry,year,X1) dataThe way I’m doing this is very hard. I split my sample by industry andyear,for each industry and year I calculate the 10% and 90% quantiles, then Icreate a X2 variable like this: industry1<-subset(data,data$industry==1) ind1year1999<-subset(industry1,industry1$year==1999) q1<-quantile(ind1year1999$X1,probs=0.1,na.rm=TRUE) q99<-quantile(ind1year1999$X1,probs=0.90,na.rm=TRUE) ind1year1999winsorized<-transform(ind1year1999,X2=ifelse(X1<q1,q1,ifelse(X1>q99,q99,X1))) ind1year2000<-subset(industry1,industry1$year==2000) q1<-quantile(ind1year2000$X1,probs=0.1,na.rm=TRUE) q99<-quantile(ind1year2000$X1,probs=0.90,na.rm=TRUE) ind1year2000winsorized<-transform(ind1year2000,X2=ifelse(X1<q1,q1,ifelse(X1>q99,q99,X1)))I repeat this for all years and industries, and then I merge/bind allagainto have a new dataframe with all the columns of the dataframe «data» plusX2. Could anyone help me doing this in a easier way? Thanks Cecília Carmo Universidade de Aveiro - Portugal ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.