[R] Robust vce for heckman estimators

2011-07-11 Thread Mateus Rabello
When using the function heckit() from the package ‘sampleSelection’, is there 
any way to run t-tests on the coefficients using a robust covariance matrix 
estimator? By “robust” I mean something like what I would do if I had an ‘lm’ 
object called “reg”:

 coeftest(reg, vcov = vcovHC(reg))
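For reference, coeftest() comes from the lmtest package and vcovHC() from the sandwich package, where vcovHC() defaults to the HC3 variant. Below is a minimal base-R sketch of the simpler HC0 ("White") sandwich for an lm fit, to show what such a robust matrix is. hc0_vcov is a hypothetical helper name, and this formula is for OLS only, not for the two-step heckit estimator.

```r
# Minimal sketch (OLS only): HC0 robust covariance
#   V = (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}
# hc0_vcov is a hypothetical helper, not a function from any package.
hc0_vcov <- function(fit) {
  X <- model.matrix(fit)        # n x k design matrix
  e <- residuals(fit)           # OLS residuals
  bread <- solve(crossprod(X))  # (X'X)^{-1}
  meat <- crossprod(X * e)      # row i of X scaled by e_i, then cross-product
  bread %*% meat %*% bread
}

reg <- lm(mpg ~ wt + hp, data = mtcars)  # example model on built-in data
V <- hc0_vcov(reg)
robust_se <- sqrt(diag(V))
robust_se
```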

I’m asking because in Stata one can fit the model with heckman and use the 
vce(robust) option, and clustered standard errors work the same way with vce(cluster clustvar).

More generally, is there any way to supply a different covariance matrix when 
running t-tests (or general linear hypothesis tests) on heckit (selection) models?
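In general, coefficient t-tests and linear-hypothesis (Wald) tests need only the coefficient vector b and a covariance matrix V, so any valid V can be plugged in. The sketch below assumes you already have such a V with matching dimensions. Obtaining a robust or clustered V that is valid for the heckit two-step estimator is the hard part and is not shown. coef_table and wald_test are hypothetical helper names, demonstrated on an lm fit from the built-in mtcars data; the Wald test is the asymptotic chi-square version.

```r
# Sketch: t-tests and a Wald test of H0: R b = r using ANY supplied
# covariance matrix V (classical, robust, clustered, ...).
# coef_table and wald_test are hypothetical helpers.
coef_table <- function(b, V, df = Inf) {
  se <- sqrt(diag(V))
  tval <- b / se
  # t distribution when residual df are given, standard normal otherwise
  p <- if (is.finite(df)) 2 * pt(-abs(tval), df) else 2 * pnorm(-abs(tval))
  cbind(Estimate = b, `Std. Error` = se, `t value` = tval, `Pr(>|t|)` = p)
}

wald_test <- function(b, V, R, r = rep(0, nrow(R))) {
  R <- rbind(R)                      # allow a single restriction as a vector
  d <- R %*% b - r                   # discrepancy from the hypothesis
  W <- drop(t(d) %*% solve(R %*% V %*% t(R), d))
  q <- nrow(R)                       # number of restrictions
  c(chisq = W, df = q, p.value = pchisq(W, q, lower.tail = FALSE))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(fit)
V <- vcov(fit)  # replace with a robust matrix of the same dimensions
coef_table(b, V, df.residual(fit))
# H0: the coefficients on wt and hp are both zero
wald_test(b, V, R = rbind(c(0, 1, 0), c(0, 0, 1)))
```

With the classical vcov(fit), the table reproduces summary(fit)$coefficients, which is a quick sanity check. The lmtest and car packages expose the same idea as coeftest(x, vcov. = V) and linearHypothesis(model, hypothesis, vcov. = V); whether they accept a heckit object directly depends on the methods available for its class, which is not verified here.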

Thanks,

Mateus Rabello

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Create factor variable by groups

2011-07-04 Thread Mateus Rabello
Hi, suppose that I have the following data.frame:

  cnae4 cnpj 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Y 
  24996 10020470 1 1 2 12 16 21 17 51 43 19 183 
  24996 10020470 69 91 79 92 91 77 90 96 98 108 891 
  36145 10020470 0 0 0 0 2 83 112 97 91 144 529 
  4 1002 5 20 60 0 0 0 0 5 20 1000 1110 


I would like to create a new variable X that indicates which row, within each 
value of cnpj, has the highest Y. For instance, within cnpj = 10020470, the 
second row has the largest Y (891). For cnpj = 1002 it is trivial, since there 
is only one row (1110). My new data.frame would then become:

  cnae4 cnpj 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Y X 
  24996 10020470 1 1 2 12 16 21 17 51 43 19 183 FALSE 
  24996 10020470 69 91 79 92 91 77 90 96 98 108 891 TRUE 
  36145 10020470 0 0 0 0 2 83 112 97 91 144 529 FALSE 
  4 1002 5 20 60 0 0 0 0 5 20 1000 1110 TRUE 


Note that for every value of cnpj, exactly one row will have X = 
TRUE. 
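A minimal base-R sketch for X, assuming only the cnae4, cnpj and Y columns (the yearly columns are left out for brevity). ave() applies a function within each cnpj group and returns a vector aligned with the original rows, so no rows are lost. flag_max is a hypothetical helper name.

```r
# Only the columns the question needs; the 1995-2004 columns are omitted.
df <- data.frame(
  cnae4 = c(24996, 24996, 36145, 4),
  cnpj  = c(10020470, 10020470, 10020470, 1002),
  Y     = c(183, 891, 529, 1110)
)

# Hypothetical helper: TRUE at the position of the maximum of y.
# which.max() returns the first maximum, so ties still give one TRUE.
flag_max <- function(y) seq_along(y) == which.max(y)

# ave() stores the group results back into a copy of Y, so they come back
# as 0/1 numbers; as.logical() turns them into FALSE/TRUE.
df$X <- as.logical(ave(df$Y, df$cnpj, FUN = flag_max))
df
```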

Next, I would like to create a variable Z holding the sum of Y, also by 
cnpj. Thus, for cnpj = 10020470, Z = 183 + 891 + 529 = 1603, and for cnpj = 
1002, Z = 1110. These sums are easy to compute with tapply or aggregate, but 
those collapse the data to one row per cnpj, and I want to keep every row. I 
would like to end up with a data.frame like the following:

  cnae4 cnpj 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Y X Z 
  24996 10020470 1 1 2 12 16 21 17 51 43 19 183 FALSE 1603 
  24996 10020470 69 91 79 92 91 77 90 96 98 108 891 TRUE 1603 
  36145 10020470 0 0 0 0 2 83 112 97 91 144 529 FALSE 1603 
  4 1002 5 20 60 0 0 0 0 5 20 1000 1110 TRUE 1110 


In the end I will eliminate all lines with X = FALSE. 
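For Z, ave() with FUN = sum repeats each group total on every row, so nothing is collapsed. Since the final goal is to keep only the row with the largest Y per cnpj, the sketch below also shows one alternative to building X first: sort by cnpj and decreasing Y, then keep the first row of each cnpj with duplicated(). It assumes only the cnae4, cnpj and Y columns, for brevity.

```r
df <- data.frame(
  cnae4 = c(24996, 24996, 36145, 4),
  cnpj  = c(10020470, 10020470, 10020470, 1002),
  Y     = c(183, 891, 529, 1110)
)

# Z: total of Y within each cnpj, repeated on every row of that cnpj
df$Z <- ave(df$Y, df$cnpj, FUN = sum)

# Sort by cnpj, then by Y in decreasing order, and keep only the first
# row seen for each cnpj (i.e. the row with the largest Y).
df <- df[order(df$cnpj, -df$Y), ]
result <- df[!duplicated(df$cnpj), ]
result
```

The rows of result come out ordered by cnpj (1002 first). With ties in Y, order() keeps the original row order among them, so the earlier row is kept.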


Thank you and sorry for the long question.

Mateus Rabello


[R] Conditional Correlation

2011-06-17 Thread Mateus Rabello
Hi,
How can I accomplish the following in R? Example:

I have the following data.frame:

data <- data.frame(x = c(1,2,3,4,5,6,5,3,7,1,0,4,8),
                   y = c(1,2,1,2,2,2,1,1,1,2,2,2,2),
                   z = c(5,8,4,3,4,1,6,3,3,6,3,5,7))

Treating data$y as a factor, I would like the Spearman correlation between 
data$x and data$z within each level of data$y. 
Specifically, I want two correlations: one between x and z where y == 1, 
and the same correlation where y == 2.
Something like:

cor(data$x[data$y==1], data$z[data$y==1], method = "spearman") and 
cor(data$x[data$y==2], data$z[data$y==2], method = "spearman"),

but without having to write out every value of data$y and call cor() once 
per value.
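split() can do the indexing for you: it breaks the data into one data.frame per value of y, and sapply() then calls cor() once per piece, returning a vector named by the values of y. A minimal sketch with the data from the question:

```r
data <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 5, 3, 7, 1, 0, 4, 8),
  y = c(1, 2, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2),
  z = c(5, 8, 4, 3, 4, 1, 6, 3, 3, 6, 3, 5, 7)
)

# One data.frame of (x, z) per value of y; Spearman correlation within each.
by_group <- sapply(split(data[c("x", "z")], data$y),
                   function(d) cor(d$x, d$z, method = "spearman"))
by_group
```

The same call works unchanged if y has more values or is stored as a factor.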

I hope I made myself clear.
Thanks
Mateus

