[R] Create factor variable by groups

Mateus Rabello Mon, 04 Jul 2011 21:59:09 -0700

Hi, suppose that I have the following data.frame:

      cnae4 cnpj 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Y 
      24996 10020470 1 1 2 12 16 21 17 51 43 19 183 
      24996 10020470 69 91 79 92 91 77 90 96 98 108 891 
      36145 10020470 0 0 0 0 2 83 112 97 91 144 529 
      44444 10023333 5 20 60 0 0 0 0 5 20 1000 1110



I would like to create a new variable X that indicates which line, within each 
value of the cnpj variable, has the highest value of Y. For instance, within 
cnpj = 10020470, the second line has the largest value of Y (891). For cnpj = 
10023333 the choice is trivial, since it has only one line (1110). My new 
data.frame would then become:

      cnae4 cnpj 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Y X 
      24996 10020470 1 1 2 12 16 21 17 51 43 19 183 FALSE 
      24996 10020470 69 91 79 92 91 77 90 96 98 108 891 TRUE 
      36145 10020470 0 0 0 0 2 83 112 97 91 144 529 FALSE 
      44444 10023333 5 20 60 0 0 0 0 5 20 1000 1110 TRUE 


Notice that for every value of the variable cnpj, only one line will have X = 
TRUE. 
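One possible way to build X is sketched below. It is only an illustration under 
some assumptions: the data.frame is called df (a name not given above), the year 
columns are left out for brevity, and ties in Y are broken by taking the first 
maximum, which is what keeps a single TRUE per cnpj.

```r
# Example data from the post; the year columns 1995-2004 are omitted for brevity.
# The name `df` is an assumption made for this sketch.
df <- data.frame(
  cnae4 = c(24996, 24996, 36145, 44444),
  cnpj  = c(10020470, 10020470, 10020470, 10023333),
  Y     = c(183, 891, 529, 1110)
)

# Within each cnpj group, mark the position of the first maximum of Y.
# ave() returns a numeric vector (0/1) here, so convert it back to logical.
first_max <- ave(df$Y, df$cnpj,
                 FUN = function(y) seq_along(y) == which.max(y))
df$X <- as.logical(first_max)

print(df)
```

Because which.max() returns only the first position of the maximum, a group with 
tied Y values still gets exactly one TRUE.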

Then, I would like to create a variable Z that is the sum of variable Y, also 
by variable cnpj. Thus, if cnpj = 10020470, Z = 183 + 891 + 529 = 1603, and for 
cnpj = 10023333, Z = 1110. These sums can easily be done with tapply or 
aggregate, but those would collapse the lines with equal cnpj into one, and I 
don't want that. I would like to achieve a data.frame like the following:

      cnae4 cnpj 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Y X Z 
      24996 10020470 1 1 2 12 16 21 17 51 43 19 183 FALSE 1603 
      24996 10020470 69 91 79 92 91 77 90 96 98 108 891 TRUE 1603 
      36145 10020470 0 0 0 0 2 83 112 97 91 144 529 FALSE 1603 
      44444 10023333 5 20 60 0 0 0 0 5 20 1000 1110 TRUE 1110 
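A group sum that keeps every line can be sketched with ave(), which returns a 
vector of the same length as its input instead of one value per group. As 
before, the name df and the omission of the year columns are assumptions made 
only for this illustration.

```r
# Example data from the post; the year columns 1995-2004 are omitted for brevity.
# The name `df` is an assumption made for this sketch.
df <- data.frame(
  cnae4 = c(24996, 24996, 36145, 44444),
  cnpj  = c(10020470, 10020470, 10020470, 10023333),
  Y     = c(183, 891, 529, 1110)
)

# Sum Y within each cnpj and repeat that sum on every line of the group,
# so the number of lines in df does not change.
df$Z <- ave(df$Y, df$cnpj, FUN = sum)

print(df)
```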


In the end I will eliminate all lines with X = FALSE. 
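The final step could look like the sketch below, which assumes X and Z have 
already been added to a data.frame named df (an assumed name) and that X is a 
logical column. The values are typed in directly from the tables above, with 
the year columns omitted.

```r
# Data as it would look after X and Z have been added (year columns omitted).
# The name `df` is an assumption made for this sketch.
df <- data.frame(
  cnae4 = c(24996, 24996, 36145, 44444),
  cnpj  = c(10020470, 10020470, 10020470, 10023333),
  Y     = c(183, 891, 529, 1110),
  X     = c(FALSE, TRUE, FALSE, TRUE),
  Z     = c(1603, 1603, 1603, 1110)
)

# Keep only the lines where X is TRUE: one line per cnpj.
result <- df[df$X, ]

print(result)
```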


Thank you and sorry for the long question.

Mateus Rabello

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
