Hi. I am using the Matching package for propensity score matching. For each treated unit, I want to find all control units whose propensity scores lie within a certain distance from the treated unit. The sample code is as follows:
> library(Matching)
> x <- rnorm(100000)
> y <- rnorm(100000)
> z <- rbinom(100000,1,0.002)
> logit.reg <- glm(z~x+y,family=binomial(link='logit'))
> match <- Match(Y=NULL,Tr=z,X=logit.reg$fitted,version='fast',ties=TRUE,M=1,distance.tolerance=1e-5)

According to the function definition (http://sekhon.berkeley.edu/matching/Match.html):

"distance.tolerance: This is a scalar which is used to determine if distances between two observations are different from zero. Values less than distance.tolerance are deemed to be equal to zero. This option can be used to perform a type of optimal matching"

Thus, for each treated unit I should get all control units whose difference in propensity scores from the treated unit is less than 1e-5. However, the actual difference between the treated unit's and the control units' propensity is distributed as follows:

> summary(abs(logit.reg$fitted[match$index.treated]-logit.reg$fitted[match$index.control]))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
7.453e-13 2.959e-07 5.849e-07 5.842e-07 8.741e-07 1.167e-06

The maximum difference is only 1.167e-6 instead of the 1e-5 I expected. Similarly, when I set higher tolerances I get:

> match <- Match(Y=NULL,Tr=z,X=logit.reg$fitted,version='fast',ties=TRUE,M=1,distance.tolerance=2e-5)
> summary(abs(logit.reg$fitted[match$index.treated]-logit.reg$fitted[match$index.control]))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
7.453e-13 4.133e-07 8.208e-07 8.230e-07 1.232e-06 1.652e-06

> match <- Match(Y=NULL,Tr=z,X=logit.reg$fitted,version='fast',ties=TRUE,M=1,distance.tolerance=3e-5)
> summary(abs(logit.reg$fitted[match$index.treated]-logit.reg$fitted[match$index.control]))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
7.453e-13 5.051e-07 1.006e-06 1.008e-06 1.514e-06 2.022e-06

> match <- Match(Y=NULL,Tr=z,X=logit.reg$fitted,version='fast',ties=TRUE,M=1,distance.tolerance=4e-5)
> summary(abs(logit.reg$fitted[match$index.treated]-logit.reg$fitted[match$index.control]))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
7.453e-13 5.818e-07 1.162e-06 1.166e-06 1.750e-06 2.365e-06

So, although there are more control units available with distances greater than 1.167e-6, for some reason the function doesn't select those and instead clips it at this value even when the tolerance is set at 1e-5. Similar issues occur at higher tolerances.

I really hope someone can help me resolve this. Thanks a lot!
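To make the target concrete, here is a minimal base-R sketch of the result I am after. It does not use the Matching package at all: `controls_within` and `eps` are hypothetical names introduced only for this illustration, and the distance is assumed to be the absolute difference in raw fitted propensity scores (no rescaling or weighting).

```r
# Hypothetical helper, not part of the Matching package.
# For each treated unit, return the indices of all control units whose
# propensity score is within `eps` of the treated unit's score.
# Assumption: distance = absolute difference on the raw score scale.
controls_within <- function(ps, tr, eps) {
  treated  <- which(tr == 1)
  controls <- which(tr == 0)
  out <- lapply(treated, function(i) {
    controls[abs(ps[controls] - ps[i]) <= eps]
  })
  names(out) <- treated  # list names are the treated units' row indices
  out
}

# Same simulated setup as in the post, with a seed for reproducibility.
set.seed(1)
x <- rnorm(100000)
y <- rnorm(100000)
z <- rbinom(100000, 1, 0.002)
logit.reg <- glm(z ~ x + y, family = binomial(link = "logit"))
ps <- fitted(logit.reg)

within <- controls_within(ps, z, eps = 1e-5)

# Number of controls found per treated unit.
summary(lengths(within))
```

With this definition, the largest treated-control gap can approach `eps` itself whenever controls exist near that boundary, which is the behaviour I expected from `distance.tolerance=1e-5`.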