On Wednesday 01 July 2009, roger koenker wrote: > url: www.econ.uiuc.edu/~roger Roger Koenker > email rkoen...@uiuc.edu Department of Economics > vox: 217-333-4558 University of Illinois > fax: 217-244-6678 Urbana, IL 61801 > > On Jul 1, 2009, at 4:52 PM, Dylan Beaudette wrote: > > On Wednesday 01 July 2009, roger koenker wrote: > >> It's not clear to me whether you are looking for an exploratory tool > >> or something more like formal inference. For the former, it seems > >> that estimating a few weighted quantiles would be quite useful. at > >> least > >> it is rather Tukeyesque. While I'm appealing to authorities, I can't > >> resist recalling that Galton's "invention of correlation" paper: Co- > >> relations > >> and their measurement, Proceedings of the Royal Society, 1888-89, > >> used medians and interquartile ranges. > > > > Thanks Roger. > > > > We are interested in a formal inference type of comparison. I have > > found that > > weighted density plots to be a helpful exploratory graphic-- but > > ultimately > > we would like to (as objectively as possible) determine if > > distributions are > > different. > > > > Unfortunately the distributions are very poorly behaved. Once such > > example can > > be seen here: > > http://169.237.35.250/~dylan/temp/ACTIVITY-method_comparision.pdf > > > > I was imagine using something like the number of 'significant' > > differences in > > n-quantiles would be used to determine which treatment group was most > > different than the control group. > > sure, you could do that but to be formal and account for multiple > testing you > are going to have some headaches. The much maligned two sample KS > tests might be a worthwhile alternative to consider.
Right... forgot about that. OK. I had considered the KS test, but have not yet been able to identify how one could include weights... There appear to be methods listed on the internet, but I haven't seen one implemented in R. Still looking... Dylan > > > Cheers, > > Dylan > > > >> url: www.econ.uiuc.edu/~roger Roger Koenker > >> email rkoen...@uiuc.edu Department of Economics > >> vox: 217-333-4558 University of Illinois > >> fax: 217-244-6678 Urbana, IL 61801 > >> > >> On Jul 1, 2009, at 2:48 PM, Dylan Beaudette wrote: > >>> Thanks Roger. Your comments were very helpful. Unfortunately, each > >>> of > >>> the 'groups' in this example are derived from the same set of data, > >>> two of > >>> which were subsets-- so it is not that unlikely that the weighted > >>> medians > >>> were the same in some cases. > >>> > >>> This all leads back to an operation attempting to answer the > >>> question: > >>> > >>> Of the 2 subsetting methods, which one produces a distribution most > >>> like the > >>> complete data set? Since the distributions are not normal, and there > >>> are > >>> area-weights involved others on the list suggested quantile- > >>> regression. For a > >>> more complete picture of how 'different' the distributions are, I > >>> have tried > >>> looking at the differences between weighted quantiles: (0.1, 0.25, > >>> 0.5, 0.75, > >>> 0.9) as a more complete 'description' of each distribution. > >>> > >>> I imagine that there may be a better way to perform this > >>> comparison... > >>> > >>> Cheers, > >>> Dylan > >>> > >>> On Tuesday 30 June 2009, roger koenker wrote: > >>>> Admittedly this seemed quite peculiar.... but if you look at the > >>>> entrails > >>>> of the following code you will see that with the weights the first > >>>> and > >>>> second levels of your x$method variable have the same (weighted) > >>>> median > >>>> so the contrast that you are estimating SHOULD be zero. Perhaps > >>>> there is something fishy about the data construction that would > >>>> have > >>>> allowed us to anticipate this? Regarding the "fn" option, and the > >>>> non-uniqueness warning, this is covered in the (admittedly > >>>> obscure) > >>>> faq on quantile regression available at: > >>>> > >>>> http://www.econ.uiuc.edu/~roger/research/rq/FAQ > >>>> > >>>> # example: > >>>> library(quantreg) > >>>> > >>>> # load data > >>>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv')) > >>>> > >>>> # with weights > >>>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5), > >>>> se='ker') > >>>> > >>>> #Reproduction with more convenient notation: > >>>> > >>>> X0 <- model.matrix(~method, data = x) > >>>> y <- x$sand > >>>> w <- x$area_fraction > >>>> f0 <- summary(rq(y ~ X0 - 1, weights = w),se = "ker") > >>>> > >>>> #Second reproduction with orthogonal design: > >>>> > >>>> X1 <- model.matrix(~method - 1, data = x) > >>>> f1 <- summary(rq(y ~ X1 - 1, weights = w),se = "ker") > >>>> > >>>> #Comparing f0 and f1 we see that they are consistent!! How can > >>>> that > >>>> be?? > >>>> #Since the columns of X1 are orthogonal estimation of the 3 > >>>> parameters > >>>> are separable > >>>> #so we can check to see whether the univariate weighted medians are > >>>> reproducible. > >>>> > >>>> s1 <- X1[,1] == 1 > >>>> s2 <- X1[,2] == 1 > >>>> g1 <- rq(y[s1] ~ X1[s1,1] - 1, weights = w[s1]) > >>>> g2 <- rq(y[s2] ~ X1[s2,2] - 1, weights = w[s2]) > >>>> > >>>> #Now looking at the g1 and g2 objects we see that they are equal > >>>> and > >>>> agree with f1. > >>>> > >>>> > >>>> url: www.econ.uiuc.edu/~roger Roger Koenker > >>>> email rkoen...@uiuc.edu Department of Economics > >>>> vox: 217-333-4558 University of Illinois > >>>> fax: 217-244-6678 Urbana, IL 61801 > >>>> > >>>> On Jun 30, 2009, at 3:54 PM, Dylan Beaudette wrote: > >>>>> Hi, > >>>>> > >>>>> I am trying to use quantile regression to perform weighted- > >>>>> comparisons of the > >>>>> median across groups. This works most of the time, however I am > >>>>> seeing some > >>>>> odd output in summary(rq()): > >>>>> > >>>>> Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights = > >>>>> area_fraction) > >>>>> Coefficients: > >>>>> Value Std. Error t value Pr(>|t|) > >>>>> (Intercept) 45.44262 3.64706 12.46007 0.00000 > >>>>> methodmukey-HRU 0.00000 4.67115 0.00000 1.00000 > >>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > >>>>> > >>>>> When I do not include the weights, I get something a little closer > >>>>> to a > >>>>> weighted comparison of means, along with an error message: > >>>>> > >>>>> Call: rq(formula = sand ~ method, tau = 0.5, data = x) > >>>>> Coefficients: > >>>>> Value Std. Error t value Pr(>|t|) > >>>>> (Intercept) 44.91579 2.46341 18.23318 0.00000 > >>>>> methodmukey-HRU 9.57601 9.29348 1.03040 0.30380 > >>>>> Warning message: > >>>>> In rq.fit.br(x, y, tau = tau, ...) : Solution may be nonunique > >>>>> > >>>>> > >>>>> I have noticed that the error message goes away when specifying > >>>>> method='fn' to > >>>>> rq(). An example is below. Could this have something to do with > >>>>> replication > >>>>> in the data? > >>>>> > >>>>> > >>>>> # example: > >>>>> library(quantreg) > >>>>> > >>>>> # load data > >>>>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv')) > >>>>> > >>>>> # with weights > >>>>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5), > >>>>> se='ker') > >>>>> > >>>>> # without weights > >>>>> # note error message > >>>>> summary(rq(sand ~ method, data=x, tau=0.5), se='ker') > >>>>> > >>>>> # without weights, no error message > >>>>> summary(rq(sand ~ method, data=x, tau=0.5, method='fn'), se='ker') > > -- Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.