Dear Phil, I'll bypass questions about why one would want to do this, why you're using glm() rather than lm(), etc., and just point out that the *studentized* residual for the 11th observation is undefined for your example. Simplifying your code:
-------- snip ------- > y <- c(1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,13.98) > x1 <- c(4.6, 35.8, 4.1, 5.4, 5.2, 27.9, 48.2, 4.5, 5.7, 4.2, 100.5) > x2 <- c(0.2990466, 1.0497156, 0.2805028, 0.3517993, 0.3543678, 1.0178604, + 0.4933182, 0.2810865, 0.4349550, 0.3269192, 3.0889747) > mod.glm <- glm(y ~ x1 + x2) > rstudent(mod.glm) 1 2 3 4 5 6 7 8 9 0.459064418 -2.342404171 0.520003419 0.284592752 0.274181980 -2.283387024 1.236226216 0.521360367 0.007045656 10 11 0.359359609 NaN > residuals(mod.glm) 1 2 3 4 5 6 7 8 9 10 0.66006969 -2.58163338 0.74389676 0.41298738 0.39764981 -2.53508626 0.31282392 0.74657662 0.01020209 0.51813375 11 1.31437963 ------------- snip ---------- Best, John On Tue, 21 Oct 2014 09:34:07 +0200 Phil <chobop...@gmail.com> wrote: > Hi all, > > as John pointed out, there is a way to create settings where the studentized > residuals are undefined. However, after cross-checking it seems that the > residuals are getting calculated without any error. The problem comes up when > I use outlierTest to assign a p,q value,respectively. > > Below is the code and some printouts: > > > non.zero.orga.vector <- > c(1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,13.98) #targervector > > print(meta.Matrix[,3]) > x1 x28 x2 x3 x4 x51 x52 x5 > x6 x7 x82 > 4.6 35.8 4.1 5.4 5.2 27.9 48.2 4.5 > 5.7 4.2 100.5 > > print(meta.Matrix[,10]) > x1 x28 x2 x3 > 0.2990466 1.0497156 0.2805028 0.3517993 > > x4 x51 x52 x5 > 0.3543678 1.0178604 0.4933182 0.2810865 > > x6 x7 x82 > 0.4349550 0.3269192 3.0889747 > > glModel <- glm(non.zero.orga.vector ~ meta.Matrix[,3]+meta.Matrix[,10]) > #create glm for testing > > print(glModel$residuals) #check if residuals are calculated for every entry > in the target vector > x1 x28 x2 x3 > x4 x51 x52 > 0.6600696 -2.5816334 0.7438969 0.4129873 0.3976498 -2.5350861 > 0.3128240 > x5 x6 x7 x82 > 0.7465766 0.0102020 0.5181337 1.3143796 > > > > test.Res <- outlierTest(glModel,digits=4,cutoff=Inf,n.max=Inf) #test > the glm for outlier, cutoff=Inf/n.max=Inf == report everything > > print(test.Res$bonf.p) #check if q-value exists > x28 x51 x52 x5 x2 > x1 x7 x3 > 0.1915995 0.2240759 2.1637438 6.0211575 6.0306110 6.4618792 7.1932613 > 7.7595619 > x4 x6 > 7.8394477 9.9437848 > > > > As you see the glm calculates residuals for x82 (which is in fact 1.314) but > the outlierTest does not assign a p/q value to it. Does anyone know why? > > Thanks in advance, > > Phil > > On 10/17/2014 08:54 PM, John Fox wrote: > > Dear Phil, > > > > Yes, that's a bit clearer. One can invent data configurations where certain > > studentized residuals are undefined. For example, try the following: > > > > y <- c(0, 0, 0, 0, 0, 1) > > x <- 1:6 > > xx <- (1:6 - 3.5)^2 > > rstudent(lm(y ~ x)) > > rstudent(lm(y ~ xx)) > > plot(x, y) > > plot(xx, y) > > > > The plots should clarify what's going on. > > > > I'm copying to r-help since the discussion began there. > > > > I hope this helps, > > John > > > > On Fri, 17 Oct 2014 18:35:59 +0200 > > Philip Stevens <chobop...@gmail.com> wrote: > >> Dear John, > >> > >> Thank you for your fast reply. Unfortunately I am out of office right now > >> but I will get back to you on Monday asap with a toy example and some code. > >> Meanwhile let me try to explain further. > >> > >> Basically not the glm but the outlierTest afterwards fails. And it only > >> fails if all values, used to set up the glm, are exactly 1 except for one > >> value which can be arbitrary large. I construct the glm from a vector of > >> doubles (the target values) and a vector of other numeric values(metadata). > >> (So this should be fit <- glm(targetVector ~ metadata) ) And want to check > >> if one of those doubles(in the target vector) is an outlier using > >> outlierTest(fit). The residuals are calculated by the glm but the > >> outlierTest does not report a p nor a q value for the one value in the > >> target vector which is not 1. And I can't figure out why... > >> > >> I hope this makes it a bit clearer. > >> Anyways I will come back to you on Monday. > >> > >> Best, > >> > >> Phil > >> Am 17.10.2014 17:43 schrieb "John Fox" <j...@mcmaster.ca>: > >> > >>> Dear Phil, > >>> > >>> After reading your posting several times, I still don't understand what > >>> you did. As usual, having a reproducible example illustrating the error > >>> would be a great help. I do have a guess about the source of the error: > >>> glm() failed in some way for the problematic case. > >>> > >>> Best, > >>> John > >>> > >>> ------------------------------------------------ > >>> John Fox, Professor > >>> McMaster University > >>> Hamilton, Ontario, Canada > >>> http://socserv.mcmaster.ca/jfox/ > >>> > >>> > >>> On Fri, 17 Oct 2014 12:53:30 +0200 > >>> Phil <chobop...@gmail.com> wrote: > >>>> Hi guys, > >>>> > >>>> I came across a strange phenomena and can't figure out why it happens by > >>> myself so here we go. > >>>> I got a dataframe which consists of double numbers which I want to > >>> check, row-wise if there are outliers in the rows. > >>>> So I iterate over the rows and create a glm using the numbers of that > >>> particular row. Which might look like this: > >>>> case1) > >>>> x1 x2 x3 x4 x5 x6 x7 > >>>> x8 x9 x10 x11 > >>>> 0.00 3.91 0.00 0.00 0.00 68.03 40.39 0.00 > >>>> 0.00 0.00 4.11 > >>>> > >>>> or like this: > >>>> case2) > >>>> x1 x2 x3 x4 x5 x6 x7 > >>>> x8 x9 x10 x11 > >>>> 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 > >>>> 1.00 1.00 5.34 > >>>> > >>>> or any other combination of double numbers... > >>>> > >>>> however, using a glm like this: > >>>> > >>>> glModel <- glm(vector ~ some_other_meta_data_which_is_double_numbers) > >>>> > >>>> and testing it with: > >>>> > >>>> test.Res <- outlierTest(glModel,digits=4,cutoff=Inf,n.max=Inf) > >>>> > >>>> I always get a result consisting of the desired p and q values but not > >>> if the vector I use looks like case2. There is no error message and the > >>> computation does not stop either. > >>>> However, all p and q values are produced except for the last value x11. > >>>> > >>>> Any idea why this particular value gets dropped from the output of the > >>> outlierTest Method in the car package. > >>>> Here is the sessioninfo: > >>>> > >>>> sessionInfo() > >>>> R version 3.1.1 (2014-07-10) > >>>> Platform: x86_64-redhat-linux-gnu (64-bit) > >>>> > >>>> locale: > >>>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > >>>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > >>>> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > >>>> [7] LC_PAPER=en_US.utf8 LC_NAME=C > >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C > >>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > >>>> > >>>> attached base packages: > >>>> [1] stats graphics grDevices utils datasets methods base > >>>> > >>>> other attached packages: > >>>> [1] ggplot2_1.0.0 car_2.0-21 RColorBrewer_1.0-5 iNEXT_1.0 > >>>> [5] vegan_2.0-10 lattice_0.20-29 permute_0.8-3 > >>>> > >>>> loaded via a namespace (and not attached): > >>>> [1] colorspace_1.2-4 compiler_3.1.1 digest_0.6.4 grid_3.1.1 > >>>> [5] gtable_0.1.2 labeling_0.3 MASS_7.3-33 munsell_0.4.2 > >>>> [9] nnet_7.3-8 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.2 > >>>> [13] reshape2_1.4 scales_0.2.4 stringr_0.6.2 tools_3.1.1 > >>>> > >>>> Any help is highly appreciated. > >>>> > >>>> Thanks > >>>> > >>>> Phil > >>>> > >>>> ______________________________________________ > >>>> R-help@r-project.org mailing list > >>>> https://stat.ethz.ch/mailman/listinfo/r-help > >>>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>>> and provide commented, minimal, self-contained, reproducible code. > >>> > >>> > >>> > >>> > > ------------------------------------------------ > > John Fox, Professor > > McMaster University > > Hamilton, Ontario, Canada > > http://socserv.mcmaster.ca/jfox/ > > > > > > > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ------------------------------------------------ John Fox, Professor McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.