On Wednesday 01 July 2009, roger koenker wrote: > It's not clear to me whether you are looking for an exploratory tool > or something more like formal inference. For the former, it seems > that estimating a few weighted quantiles would be quite useful. at > least > it is rather Tukeyesque. While I'm appealing to authorities, I can't > resist recalling that Galton's "invention of correlation" paper: Co- > relations > and their measurement, Proceedings of the Royal Society, 1888-89, > used medians and interquartile ranges. >
Thanks Roger. We are interested in a formal inference type of comparison. I have found that weighted density plots to be a helpful exploratory graphic-- but ultimately we would like to (as objectively as possible) determine if distributions are different. Unfortunately the distributions are very poorly behaved. Once such example can be seen here: http://169.237.35.250/~dylan/temp/ACTIVITY-method_comparision.pdf I was imagine using something like the number of 'significant' differences in n-quantiles would be used to determine which treatment group was most different than the control group. Cheers, Dylan > url: www.econ.uiuc.edu/~roger Roger Koenker > email rkoen...@uiuc.edu Department of Economics > vox: 217-333-4558 University of Illinois > fax: 217-244-6678 Urbana, IL 61801 > > On Jul 1, 2009, at 2:48 PM, Dylan Beaudette wrote: > > Thanks Roger. Your comments were very helpful. Unfortunately, each of > > the 'groups' in this example are derived from the same set of data, > > two of > > which were subsets-- so it is not that unlikely that the weighted > > medians > > were the same in some cases. > > > > This all leads back to an operation attempting to answer the question: > > > > Of the 2 subsetting methods, which one produces a distribution most > > like the > > complete data set? Since the distributions are not normal, and there > > are > > area-weights involved others on the list suggested quantile- > > regression. For a > > more complete picture of how 'different' the distributions are, I > > have tried > > looking at the differences between weighted quantiles: (0.1, 0.25, > > 0.5, 0.75, > > 0.9) as a more complete 'description' of each distribution. > > > > I imagine that there may be a better way to perform this comparison... > > > > Cheers, > > Dylan > > > > On Tuesday 30 June 2009, roger koenker wrote: > >> Admittedly this seemed quite peculiar.... but if you look at the > >> entrails > >> of the following code you will see that with the weights the first > >> and > >> second levels of your x$method variable have the same (weighted) > >> median > >> so the contrast that you are estimating SHOULD be zero. Perhaps > >> there is something fishy about the data construction that would have > >> allowed us to anticipate this? Regarding the "fn" option, and the > >> non-uniqueness warning, this is covered in the (admittedly obscure) > >> faq on quantile regression available at: > >> > >> http://www.econ.uiuc.edu/~roger/research/rq/FAQ > >> > >> # example: > >> library(quantreg) > >> > >> # load data > >> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv')) > >> > >> # with weights > >> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5), > >> se='ker') > >> > >> #Reproduction with more convenient notation: > >> > >> X0 <- model.matrix(~method, data = x) > >> y <- x$sand > >> w <- x$area_fraction > >> f0 <- summary(rq(y ~ X0 - 1, weights = w),se = "ker") > >> > >> #Second reproduction with orthogonal design: > >> > >> X1 <- model.matrix(~method - 1, data = x) > >> f1 <- summary(rq(y ~ X1 - 1, weights = w),se = "ker") > >> > >> #Comparing f0 and f1 we see that they are consistent!! How can that > >> be?? > >> #Since the columns of X1 are orthogonal estimation of the 3 > >> parameters > >> are separable > >> #so we can check to see whether the univariate weighted medians are > >> reproducible. > >> > >> s1 <- X1[,1] == 1 > >> s2 <- X1[,2] == 1 > >> g1 <- rq(y[s1] ~ X1[s1,1] - 1, weights = w[s1]) > >> g2 <- rq(y[s2] ~ X1[s2,2] - 1, weights = w[s2]) > >> > >> #Now looking at the g1 and g2 objects we see that they are equal and > >> agree with f1. > >> > >> > >> url: www.econ.uiuc.edu/~roger Roger Koenker > >> email rkoen...@uiuc.edu Department of Economics > >> vox: 217-333-4558 University of Illinois > >> fax: 217-244-6678 Urbana, IL 61801 > >> > >> On Jun 30, 2009, at 3:54 PM, Dylan Beaudette wrote: > >>> Hi, > >>> > >>> I am trying to use quantile regression to perform weighted- > >>> comparisons of the > >>> median across groups. This works most of the time, however I am > >>> seeing some > >>> odd output in summary(rq()): > >>> > >>> Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights = > >>> area_fraction) > >>> Coefficients: > >>> Value Std. Error t value Pr(>|t|) > >>> (Intercept) 45.44262 3.64706 12.46007 0.00000 > >>> methodmukey-HRU 0.00000 4.67115 0.00000 1.00000 > >>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > >>> > >>> When I do not include the weights, I get something a little closer > >>> to a > >>> weighted comparison of means, along with an error message: > >>> > >>> Call: rq(formula = sand ~ method, tau = 0.5, data = x) > >>> Coefficients: > >>> Value Std. Error t value Pr(>|t|) > >>> (Intercept) 44.91579 2.46341 18.23318 0.00000 > >>> methodmukey-HRU 9.57601 9.29348 1.03040 0.30380 > >>> Warning message: > >>> In rq.fit.br(x, y, tau = tau, ...) : Solution may be nonunique > >>> > >>> > >>> I have noticed that the error message goes away when specifying > >>> method='fn' to > >>> rq(). An example is below. Could this have something to do with > >>> replication > >>> in the data? > >>> > >>> > >>> # example: > >>> library(quantreg) > >>> > >>> # load data > >>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv')) > >>> > >>> # with weights > >>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5), > >>> se='ker') > >>> > >>> # without weights > >>> # note error message > >>> summary(rq(sand ~ method, data=x, tau=0.5), se='ker') > >>> > >>> # without weights, no error message > >>> summary(rq(sand ~ method, data=x, tau=0.5, method='fn'), se='ker') -- Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.