[R] Opportunities for Developing R Packages (Research-Based, Open-Source)
Dear R package community,

I am uncertain whether this is appropriate for this mailing list; please let me know. If not, would you be so kind as to point me in a better direction?

I am a mathematics graduate with well-developed R experience. I graduated two years ago and have been working in business operations at a cryptocurrency startup. I am rather rusty and wish to venture back into statistical research and R package development.

My question: for researchers interested in developing tools and algorithms for their ongoing research, be it in medical statistics, data visualisation or machine learning, is there a possibility for collaboration? This would help me extend my experience and possibly open more avenues for me to enter research. I am familiar with statistical concepts and can read research papers (I have done a research internship covering experimental design, linear algebra and data compression, particle filters, and Bayesian analysis). I do not expect to be paid and am willing to commit to a project.

Yours sincerely,
Justin

*I check my email at 9AM and 4PM every day*
*If you have an EMERGENCY, contact me at +447938674419 (UK) or +60125056192 (Malaysia)*

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] igraph problem
Run this code:

    library(igraph)
    tree <- graph_from_literal(1 -+ 2:3, 3 -+ 5, 1 -+ 4)
    graph.bfs(tree, root = 1, neimode = "out", father = TRUE,
              order = TRUE, unreachable = FALSE)

I do not understand why the father values come out as

    NA 1 1 3 1

rather than

    NA 1 1 1 3

The reason I am doing this is to obtain the values (by vertex name), or some index, of each individual branch of the tree. Does anyone have any ideas on how to do this?

Yours sincerely,
Justin
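A possible route to the branch goal (a sketch, assuming the igraph package is installed; note also that, as far as I can tell, the `father` vector is indexed by vertex id rather than by visit order, which would account for NA 1 1 3 1): take each out-neighbour of the root and collect everything reachable from it with subcomponent():

```r
library(igraph)

tree <- graph_from_literal(1 -+ 2:3, 3 -+ 5, 1 -+ 4)

# one branch per child of the root: all vertices reachable from that child
root     <- V(tree)[name == "1"]
children <- neighbors(tree, root, mode = "out")
branches <- lapply(children, function(v) as_ids(subcomponent(tree, v, mode = "out")))
branches   # a list of vertex-name vectors, one per branch
```

Each element of `branches` is the set of vertex names hanging off one child of the root, e.g. the branch through "3" also contains "5".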
[R] rmutil parameters for Pareto distribution
In https://en.wikipedia.org/wiki/Pareto_distribution, it is clear what the parameters of the Pareto distribution are: xmin, the scale parameter, and a, the shape parameter.

I am using rmutil to generate random deviates from a Pareto distribution. The documentation says the Pareto distribution has density

    f(y) = s (1 + y/(m (s-1)))^(-s-1) / (m (s-1))

where m is the mean parameter of the distribution and s is the dispersion.

Experimenting with the rpareto function from the library, using m as the scale parameter xmin and s as the shape parameter a, I found that the deviates generated are not all larger than xmin. This leads me to believe that m and s are not the scale and shape parameters respectively. What are m and s? Could they be the mean and dispersion, as the form of the density suggests, rather than the parameters shown on the Wikipedia page?

Yours sincerely,
Justin
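If the goal is deviates from the classical Pareto(xmin, a) of the Wikipedia page, one workaround (a base-R sketch of my own; the rmutil density above is a mean/dispersion form whose support starts at 0, not at xmin) is inverse-transform sampling from the classical CDF F(x) = 1 - (xmin/x)^a:

```r
# Classical Pareto(xmin, a) via inverse-transform sampling:
# F(x) = 1 - (xmin/x)^a for x >= xmin, so x = xmin * U^(-1/a) with U ~ Uniform(0,1)
rpareto_classical <- function(n, xmin, a) xmin * runif(n)^(-1 / a)

set.seed(1)
x <- rpareto_classical(1e4, xmin = 2, a = 3)
min(x)    # always >= xmin, unlike the rmutil parameterisation
mean(x)   # for a > 1 this should be close to a*xmin/(a-1) = 3
```

Every deviate is at least xmin by construction, which is a quick way to check which parameterisation a generator is using.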
[R] Kernel Density Estimation: Generate a sample from Epanechnikov Kernel
Below is how I sample from a kernel density estimate of `data` with a Gaussian kernel. I really like this solution because it is nice and elegant:

    fit <- density(data)                                       # Gaussian kernel by default
    rnorm(N, sample(data, size = N, replace = TRUE), fit$bw)   # samples from the KDE

I am, however, interested in sampling from a kernel density estimate with an Epanechnikov kernel:

    fit <- density(data, kernel = "epanechnikov")

Is there a quick way to compute the samples, incorporating the bandwidth of the kernel density estimate?

Yours sincerely,
Justin
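One possible sketch (my own construction; the only fact taken from ?density is that bw is the standard deviation of the smoothing kernel): draw Epanechnikov noise as the median of three Uniform(-1, 1) variates, a known generator for this kernel, rescale it so its standard deviation equals bw, and add it to resampled data points:

```r
set.seed(1)
data <- rnorm(100)          # toy data standing in for the real sample
N    <- 1e4
fit  <- density(data, kernel = "epanechnikov")

# Epanechnikov noise on [-1, 1]: the median of three Uniform(-1, 1) draws
# has density (3/4)(1 - y^2), i.e. the standard Epanechnikov kernel
repan <- function(n) {
  u <- matrix(runif(3 * n, -1, 1), ncol = 3)
  apply(u, 1, median)
}

# density() scales every kernel so that 'bw' is its standard deviation;
# the standard Epanechnikov on [-1, 1] has sd 1/sqrt(5), hence the sqrt(5) factor
samp <- sample(data, N, replace = TRUE) + fit$bw * sqrt(5) * repan(N)
```

This mirrors the Gaussian recipe exactly: resample a data point, then perturb it with kernel-shaped noise whose spread matches the fitted bandwidth.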
Re: [R] Estimated Effects Not Balanced
Hi,

Thanks Richard. That was me playing with too many examples and having too many variables just lying around. Thanks for the tip though.

On 22 August 2016 at 23:32, Bert Gunter <bgunter.4...@gmail.com> wrote:
> Thanks, Rich. I didn't notice that!
>
> -- Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
> On Mon, Aug 22, 2016 at 1:43 PM, Richard M. Heiberger <r...@temple.edu> wrote:
> > The problem is that you have 12 observations and 1+2+10=13 degrees of freedom.
> > There should be 1 + 2 + 8 = 11 degrees of freedom.
> > Probably one of your variables is masked by something else in your workspace.
> > Protect yourself by using a data.frame:
> >
> >> tmp <- data.frame(A=factor(c(1,1,1,1,1,1,2,2,2,2,2,2)),
> > +                   B=factor(c(1,1,2,2,3,3,1,1,2,2,3,3)),
> > +                   y=rnorm(12))
> >> mod <- aov(y ~ A+B, data=tmp)
> >> summary(mod)
> >             Df Sum Sq Mean Sq F value Pr(>F)
> > A            1  1.553   1.553   1.334  0.281
> > B            2  3.158   1.579   1.357  0.311
> > Residuals    8  9.311   1.164
> >
> > On Mon, Aug 22, 2016 at 11:15 AM, Justin Thong <justinthon...@gmail.com> wrote:
> >> Something does not make sense in R. It has to do with the question of
> >> balance and unbalance.
> >>
> >> A<-factor(c(1,1,1,1,1,1,2,2,2,2,2,2))
> >> B<-factor(c(1,1,2,2,3,3,1,1,2,2,3,3))
> >> y<-rnorm(12)
> >> mod<-aov(y~A+B)
> >>
> >> I was under the impression that the design is balanced, i.e. order does
> >> not affect the sums of squares. However, when I compute the ANOVA, R
> >> reports that the estimated effects may be unbalanced. I thought that when
> >> all combinations of levels of A and B have equal replication the design
> >> is called balanced, but R seems to think that when not all levels of A
> >> and levels of B have equal replication the "estimated effects are
> >> unbalanced". Is this the same as the design being unbalanced? Because for
> >> the example below, where the message occurred, the order does not matter
> >> (which makes me think that the design is balanced).
> >>
> >> Call:
> >>    aov(formula = y ~ A + B)
> >>
> >> Terms:
> >>                        A         B Residuals
> >> Sum of Squares  0.872572  0.025604 16.805706
> >> Deg. of Freedom        1         2        10
> >>
> >> Residual standard error: 1.296368
> >> Estimated effects may be unbalanced
> >>
> >> --
> >> Yours sincerely,
> >> Justin
[R] Estimated Effects Not Balanced
Something does not make sense to me in R. It has to do with the question of balance and unbalance.

    A <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2))
    B <- factor(c(1,1,2,2,3,3,1,1,2,2,3,3))
    y <- rnorm(12)
    mod <- aov(y ~ A + B)

I was under the impression that the design is balanced, i.e. order does not affect the sums of squares. However, when I compute the ANOVA, R reports that the estimated effects may be unbalanced. I thought that when all combinations of levels of A and B have equal replication the design is called balanced, but R seems to think that when not all levels of A and levels of B have equal replication the "estimated effects are unbalanced". Is this the same as the design being unbalanced? Because for the example below, where the message occurred, the order does not matter (which makes me think that the design is balanced).

    Call:
       aov(formula = y ~ A + B)

    Terms:
                           A         B Residuals
    Sum of Squares  0.872572  0.025604 16.805706
    Deg. of Freedom        1         2        10

    Residual standard error: 1.296368
    Estimated effects may be unbalanced

--
Yours sincerely,
Justin
[R] Intercept in Model Matrix (Parameters not what I expected)
Something has been bugging me; I even asked this on Cross Validated but did not get a response. Let's construct a simple example. Below is the code:

    A  <- gl(2,4)        # factor with 2 levels
    B  <- gl(4,2)        # factor with 4 levels
    y  <- rnorm(8)       # a response
    df <- data.frame(y, A, B)

As you can see, B is nested within A. The peculiar result I am interested in is the output of the model matrix when I fit a nested model. How does R decide what is included inside the intercept? Since we are using dummy coding, a coefficient is interpreted as the difference between a particular level and the reference level (the intercept) in a single-factor model. I understand that for the model ~A, A1 becomes the intercept, and that for the model ~A+B, A1 and B1 together become the intercept. I do not get why, when we use a nested model, A1:B2 appears as a column inside the model matrix. Why isn't the first parameter of the interaction subspace A1:B1 or A2:B1? I think I am missing the concept; I think the intercept is A1. Hence, why do we not compare the levels A1:B1 and A1 (intercept), or A2:B1 and A1 (intercept)?

    # nested model
    > mod <- aov(y ~ A + A:B)
    > model.matrix(mod)
      (Intercept) A2 A1:B2 A2:B2 A1:B3 A2:B3 A1:B4 A2:B4
    1           1  0     0     0     0     0     0     0
    2           1  0     0     0     0     0     0     0
    3           1  0     1     0     0     0     0     0
    4           1  0     1     0     0     0     0     0
    5           1  1     0     0     0     1     0     0
    6           1  1     0     0     0     1     0     0
    7           1  1     0     0     0     0     0     1
    8           1  1     0     0     0     0     0     1

--
Yours sincerely,
Justin
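A small base-R sketch of the point in question (the column names are what model.matrix() itself reports): for ~ A + A:B, A is contrast-coded (giving A2) while the interaction is built from the full set of A dummies crossed with the B contrasts (B2, B3, B4), which is why no column involves the reference level B1. With B nested in A, most of those columns are also redundant:

```r
A <- gl(2, 4)            # factor with 2 levels
B <- gl(4, 2)            # factor with 4 levels; here B is nested within A

mm <- model.matrix(~ A + A:B)
colnames(mm)             # no A1:B1 or A2:B1 column: B1 is the reference level
ncol(mm)                 # 8 columns are constructed ...
qr(mm)$rank              # ... but only 4 are linearly independent in this nesting
```

The fitting code later pivots the redundant columns out via the QR decomposition, so the printed coefficients are a subset of the columns built here.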
[R] What is "args" in this function?
Hi again, I need help.

R code:

    debug(model.matrix)
    model.matrix(~S)

model.matrix code:

    ans <- .External2(C_modelmatrix, t, data)  # t = terms(object), data = data frame of object

modelframe C code:

    SEXP modelframe(SEXP call, SEXP op, SEXP args, SEXP rho)
    {
        SEXP terms, data, names, variables, varnames, dots, dotnames, na_action;
        SEXP ans, row_names, subset, tmp;
        char buf[256];
        int i, j, nr, nc;
        int nvars, ndots, nactualdots;
        const void *vmax = vmaxget();

        args = CDR(args);
        terms = CAR(args); args = CDR(args);
        row_names = CAR(args); args = CDR(args);
        variables = CAR(args); args = CDR(args);
        varnames = CAR(args); args = CDR(args);
        dots = CAR(args); args = CDR(args);
        dotnames = CAR(args); args = CDR(args);
        subset = CAR(args); args = CDR(args);
        na_action = CAR(args);
        ...

I am sorry, I have virtually no experience in C. Can someone explain what "args" is at the point when it enters the function? I know CAR points to the first element of a list, and CDR points to the rest of the list after the first element. Does "args" represent the list of t and data? Or does "args" represent the third argument of .External2, which is data? Or something else?

I am guessing this whole process of playing with CAR and CDR is just a way of extracting variables from "args" until everything in "args" has been assigned. For instance, if args = (0, 1, 2, 3, 4, 5, 6), then the lines below would correspond to the values in square brackets:

    args = CDR(args);       [(1,2,3,4,5,6)]
    terms = CAR(args);      [1]
    args = CDR(args);       [(2,3,4,5,6)]
    row_names = CAR(args);  [2]
    args = CDR(args);       [(3,4,5,6)]
    variables = CAR(args);  [3]
    args = CDR(args);       [(4,5,6)]
    varnames = CAR(args);   [4]
    args = CDR(args);       [(5,6)]
    etc.

Is this correct? I am sorry if I am asking too many questions on C. Please advise if I am posting inappropriately.
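The CAR/CDR walk can be mimicked in R itself (a sketch; R's pairlists are the same LISP-style linked lists the C code manipulates, with x[[1]] playing the role of CAR and x[-1] the role of CDR):

```r
# a stand-in for the incoming argument list, after the leading element is skipped
args <- pairlist("terms", "row_names", "variables", "varnames")

car <- function(x) x[[1]]   # first element, like CAR(args)
cdr <- function(x) x[-1]    # everything after the first element, like CDR(args)

terms     <- car(args); args <- cdr(args)
row_names <- car(args); args <- cdr(args)
terms       # "terms"
row_names   # "row_names"
```

Each CAR/CDR pair peels one argument off the front of the list and rebinds `args` to the remainder, exactly the destructuring pattern in modelframe.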
--
Yours sincerely,
Justin
[R] Ways to understand C code (like debug function)
Hi, I need some advice. Note: I do not know anything about C apart from my two days of research.

I am currently trying to make sense of the modelmatrix function (written in C), called from the R function model.matrix() via .External2. In trying to view the R source code for model.matrix() I have been reasonably successful, thanks to the debug command. This command was good because I was able to check line by line what the code was doing and obtain output within my R console. Checking the values of each of my variables while stepping through the lines was also very useful.

However, the R source code alone is insufficient for understanding most of the computation; I have to look at the C code. In particular, within model.matrix() a .External2 call is executed to a C function named modelmatrix. I downloaded the source from the website and can view the function modelmatrix (in model.c) in a text editor. I am now looking for a way to play with the code so I understand what's going on, and I don't know the best way to do this.

I was wondering whether there is an equivalent of the debug function to step through C code line by line, i.e. each line of code is executed and output is obtained. I know the "inline" package allows you to build C functions and use them in R, but I can't find anything which does what I want. If this is not possible, is there an alternative good, easy way to run through and understand C code that anyone knows about?

--
Yours sincerely,
Justin
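One standard route (a sketch of an interactive session; it assumes gdb and an R built with debugging symbols are available, as described in the debugging section of "Writing R Extensions"): launch R under gdb, set a breakpoint on the C function, and single-step through it much as debug() does for R code:

```shell
# start R under the gdb debugger (see "Writing R Extensions" on debugging compiled code)
R -d gdb

# then, inside gdb:
#   (gdb) break modelmatrix      # stop when the C function is entered
#   (gdb) run                    # starts R; now call model.matrix(~S) at the R prompt
#   (gdb) next                   # step one C line at a time
#   (gdb) print nvars            # inspect a C variable at the current line
#   (gdb) continue               # resume R
```

This gives the line-by-line, inspect-as-you-go workflow of debug(), only at the C level.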
Re: [R] Reference for aov()
Hi Peter,

Thank you for your good answer. I am sorry for the late reply. You wrote:

> An orthogonalized model matrix generates a decomposition of the model space
> into orthogonal subspaces corresponding to the terms of the model.
> Projections onto each of the subspaces are easily worked out. E.g., for a
> two-way analysis (Y~row+col+row:col) you can decompose the model effect as
> a row effect, a column effect, and an interaction effect. This allows
> straightforward calculation of the sums of squares of the ANOVA table. As
> you are probably well aware, if the design is unbalanced, the results will
> depend on the order of terms -- col + row + col:row gives a different result.

It may be a stupid question, but how are the projections onto each subspace easily worked out, and how do the sums of squares follow? Does it matter that certain parameters of the model are not estimated? R appears to just give a sum of squares despite some of the parameters being non-estimable.

Thank you

On 14 July 2016 at 09:50, peter dalgaard <pda...@gmail.com> wrote:
> I am not aware of a detailed documentation of this beyond the actual
> source code. However, the principles are fairly straightforward, except
> that the rules for constructing the design matrix from the model formula
> can be a bit arcane at times.
>
> The two main tools are the design matrix constructor (model.matrix) and a
> Gram-Schmidt type orthogonalization of its columns (the latter is called a
> QR decomposition in R, which it is, but there are several algorithms for
> QR, and the linear models code depends on the QR algorithm being based on
> orthogonalization - so LINPACK works and LAPACK doesn't).
>
> An orthogonalized model matrix generates a decomposition of the model space
> into orthogonal subspaces corresponding to the terms of the model.
> Projections onto each of the subspaces are easily worked out. E.g., for a
> two-way analysis (Y~row+col+row:col) you can decompose the model effect as
> a row effect, a column effect, and an interaction effect. This allows
> straightforward calculation of the sums of squares of the ANOVA table. As
> you are probably well aware, if the design is unbalanced, the results will
> depend on the order of terms -- col + row + col:row gives a different
> result.
>
> What aov() does is that it first decomposes the observations according to
> the Error() term, forming the error strata, then fits the systematic part
> of the model to each stratum in turn. In the nice cases, each term of the
> model will be estimable in exactly one stratum, and part of the aov() logic
> is to detect and remove unestimable terms. E.g., if you have a balanced
> two-way layout, say individual x treatment, the variable gender is a
> subfactor of individual, so in Y ~ gender * treatment +
> Error(individual/treatment) the gender effect is estimated in the
> individual stratum, whereas treatment and gender:treatment are estimated
> in the individual:treatment stratum.
>
> It should be noted that it is very hard to interpret the results of aov()
> unless the Error() part of the model corresponds to a balanced experimental
> design. Or put more sharply: the model implied by the decomposition into
> error strata becomes nonsensical otherwise. If you do have a balanced
> design, the error strata reduce to simple combinations of means and
> observations, so the aov() algorithm is quite inefficient, but to my
> knowledge nobody has bothered to try and do better.
>
> -pd
>
> > On 13 Jul 2016, at 18:18 , Justin Thong <justinthon...@gmail.com> wrote:
> >
> > Hi
> >
> > I have been looking for a reference to explain how R uses the aov
> > command (at a deeper level). More specifically, how R reads the formula
> > and computes the sums of squares. I am not interested in understanding
> > what the difference between Type 1, 2 and 3 sums of squares is. I am
> > more interested in finding out how R computes ~x1:x2:x3 or ~A:x1,
> > emphasizing the sequential nature of the computation, and models even
> > more complicated than this.
> >
> > Yours sincerely,
> > Justin
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics
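The orthogonalization point can be checked numerically (a base-R sketch with toy data; lm stores the orthogonalized components Q'y in its $effects element, and the sequential ANOVA sum of squares for each term is just the squared norm of that term's components):

```r
set.seed(1)
d   <- data.frame(A = gl(2, 6), B = gl(3, 2, 12), y = rnorm(12))
fit <- lm(y ~ A + B, data = d)

asgn <- fit$assign                      # which model term each model-matrix column belongs to
eff  <- fit$effects[seq_along(asgn)]    # Q'y, one component per model-matrix column
ss   <- tapply(eff^2, asgn, sum)        # squared projection lengths, grouped by term

anova(fit)[, "Sum Sq"]                  # rows A, B, Residuals
ss                                      # 0 = intercept, 1 = A, 2 = B: the A and B entries match
```

The sequential (Type 1) sums of squares drop straight out of the QR decomposition this way, which is why changing the term order changes them in unbalanced designs: the orthogonalization is done in formula order.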
[R] Linear Dependence of Model Matrix and How Fitted Values / Sums of Squares Follow
Below are the covariates for the model ~ x1+x2+x3+x4+x5+x6. I noticed that when fitting this model the coefficient of x6 is not estimable. Is this merely a case of adding more columns to my model matrix eventually leading to linear dependence, so that the more terms there are in the model formula, the more likely the model matrix becomes linearly dependent? I found that for the model formula ~ x1+x2+x3+x4+x5 all the coefficients are estimable, so I suppose this example supports my statement.

That being said, since not all coefficients are estimated, how does R compute the fitted values and the ANOVA table? Does it just ignore the existence of x6 and consider the model to be ~ x1+x2+x3+x4+x5, or is there something deeper that I do not understand? The sums of squares and fitted values seem to be the same for ~ x1+x2+x3+x4+x5 as for ~ x1+x2+x3+x4+x5+x6.

However, this is not so clear-cut for models with factors, because factors are represented by a parameter for each level in the model matrix. Consider a factor F with 2 levels and G with 3 levels. R has a way of excluding certain rows from the ANOVA table. Again, it can be seen that it excludes the rows associated with the parameters which are not estimable, but this is not absolutely clear in my mind. Look at the small example below for the model ~ F*G. As you can see, the interaction parameters F2:G2 and F2:G3 are not estimable. Now, from what I was told, F1 and G1 are contained within the (Intercept) parameter, so F1:G1, F1:G2 and F2:G1 are not considered. You can see from the ANOVA table that the interaction row F:G is ignored. My main problem is why it is ignored. Does that mean that if all the parameters associated with a particular term (excluding the ones associated with the intercept) are unestimable, then the row for that term in the ANOVA table is ignored? How many unestimable parameters must there be for the row of a term to be ignored? Because if the answer is to calculate fitted values and sums of squares by ignoring unestimable parameters, then it means that the rows of sums of squares disappear for a different reason than unestimability.

Sorry for the generally wordy question. I may not be thinking of it in the correct manner and I would appreciate if anyone has an answer, and perhaps even some generalisations towards the use of the QR decomposition. (There is more code below this data)

        x1 x2 x3 x4 x5 x6
    1   12  0  0  0  0  0
    2   12  0  0  0  0  0
    3   12  0  0  0  0  0
    4   12  0  0  0  0  0
    5    0 12  0  0  0  0
    6    0 12  0  0  0  0
    7    0 12  0  0  0  0
    8    0 12  0  0  0  0
    9    0  0 12  0  0  0
    10   0  0 12  0  0  0
    11   0  0 12  0  0  0
    12   0  0 12  0  0  0
    13   0  0  0 12  0  0
    14   0  0  0 12  0  0
    15   0  0  0 12  0  0
    16   0  0  0 12  0  0
    17   0  0  0  0 12  0
    18   0  0  0  0 12  0
    19   0  0  0  0 12  0
    20   0  0  0  0 12  0
    21   0  0  0  0  0 12
    22   0  0  0  0  0 12
    23   0  0  0  0  0 12
    24   0  0  0  0  0 12
    25   6  6  0  0  0  0
    26   6  6  0  0  0  0
    27   6  6  0  0  0  0
    28   6  6  0  0  0  0
    29   6  0  6  0  0  0
    30   6  0  6  0  0  0
    31   6  0  6  0  0  0
    32   6  0  6  0  0  0
    33   6  0  0  6  0  0
    34   6  0  0  6  0  0
    35   6  0  0  6  0  0
    36   6  0  0  6  0  0
    37   6  0  0  0  6  0
    38   6  0  0  0  6  0
    39   6  0  0  0  6  0
    40   6  0  0  0  6  0
    41   6  0  0  0  0  6
    42   6  0  0  0  0  6
    43   6  0  0  0  0  6
    44   6  0  0  0  0  6
    45   0  6  6  0  0  0
    46   0  6  6  0  0  0
    47   0  6  6  0  0  0
    48   0  6  6  0  0  0
    49   0  6  0  6  0  0
    50   0  6  0  6  0  0
    51   0  6  0  6  0  0
    52   0  6  0  6  0  0
    53   0  6  0  0  6  0
    54   0  6  0  0  6  0
    55   0  6  0  0  6  0
    56   0  6  0  0  6  0
    57   0  6  0  0  0  6
    58   0  6  0  0  0  6
    59   0  6  0  0  0  6
    60   0  6  0  0  0  6
    61   0  0  6  6  0  0
    62   0  0  6  6  0  0
    63   0  0  6  6  0  0
    64   0  0  6  6  0  0
    65   0  0  6  0  6  0
    66   0  0  6  0  6  0
    67   0  0  6  0  6  0
    68   0  0  6  0  6  0
    69   0  0  6  0  0  6
    70   0  0  6  0  0  6
    71   0  0  6  0  0  6
    72   0  0  6  0  0  6
    73   0  0  0  6  6  0
    74   0  0  0  6  6  0
    75   0  0  0  6  6  0
    76   0  0  0  6  6  0
    77   0  0  0  6  0  6
    78   0  0  0  6  0  6
    79   0  0  0  6  0  6
    80   0  0  0  6  0  6
    81   0  0  0  0  6  6
    82   0  0  0  0  6  6
    83   0  0  0  0  6  6
    84   0  0  0  0  6  6
    85   4  4  4  0  0  0
    86   4  4  4  0  0  0
    87   4  4  4  0  0  0
    88   4  4  4  0  0  0
    89   4  4  0  4  0  0
    90   4  4  0  4  0  0
    91   4  4  0  4  0  0
    92   4  4  0  4  0  0
    93   4  4  0  0  4  0
    94   4  4  0  0  4  0
    95   4  4  0  0  4  0
    96   4  4  0  0  4  0
    97   4  4  0  0  0  4
    98   4  4  0  0  0  4
    99   4  4  0  0  0  4
    100  4  4  0  0  0  4
    101  4  0  4  4  0  0
    102  4  0  4  4  0  0
    103  4  0  4  4  0  0
    104  4  0  4  4  0  0
    105  4  0  4  0  4  0
    106  4  0  4  0  4  0
    107  4  0  4  0  4  0
    108  4  0  4  0  4  0
    109  4  0  4  0  0  4
    110  4  0  4  0  0  4
    111  4  0  4  0  0  4
    112  4  0  4  0  0  4
    113  4  0  0  4  4  0
    114  4  0  0  4  4  0
    115  4  0  0  4  4
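The x6 behaviour can be reproduced in miniature (a base-R sketch with made-up data; a rank-deficient fit pivots the aliased column out of the QR decomposition, reports its coefficient as NA, and gives the same fitted values as the reduced model):

```r
set.seed(42)
x1 <- rnorm(10); x2 <- rnorm(10)
x3 <- x1 + x2                    # exactly linearly dependent on x1 and x2
y  <- rnorm(10)

f_full <- lm(y ~ x1 + x2 + x3)
f_red  <- lm(y ~ x1 + x2)

coef(f_full)                     # the x3 coefficient is NA (pivoted out, not set to zero)
all.equal(fitted(f_full), fitted(f_red))   # TRUE: the fitted values are unchanged
```

So the fitted values and sums of squares come from the estimable column space only, which is consistent with the two formulas above giving identical ANOVA tables.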
[R] Soft Question: Where to find this reference.
I notice a lot of R documentation refers to the reference below. I can't seem to find it anywhere. Does anyone have a link to where I can either view it or buy it?

    Chambers, J. M., Freeny, A. and Heiberger, R. M. (1992) Analysis of variance; designed experiments.

--
Yours sincerely,
Justin
Re: [R] Missing rows anova
Hi Michael,

Thank you for the reply. I am sorry, I forgot to print out the ANOVA table to make my question clear:

                 Df   Sum Sq   Mean Sq F value   Pr(>F)
    S             2 1.93e-05 9.630e-06   0.818    0.444
    x1            1 0.000256 2.560e-04  21.751 9.44e-06 ***
    ID           47 0.003524 7.498e-05   6.370 3.35e-15 ***
    Residuals   102 0.001201 1.177e-05
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is *no* unique value of ID for each combination of S and x1. For example, S=1 and x1=0 can correspond to B, C, D, E or F. Perhaps you mean that each combination of S and x1 takes several different values; if that is the case, I think maybe it makes sense.

What I think: aov fits first-order terms (formula terms with no interactions) before second-order terms (terms with one interaction), and so on:

    first order --> second order --> third order --> ...

Therefore the formula actually fitted is not S+x1+S:x1+ID but S+x1+ID+S:x1. Hence ID is fitted before S:x1, and since ID is a finer factor than S:x1, S:x1 is already included in the fit of ID. R recognizes the linear dependence and excludes the term S:x1. In other words, S:x1 is linearly dependent on ID, and the S:x1 row disappears because it is absorbed into ID. Does this make sense?

On 19 July 2016 at 16:19, Michael Dewey <li...@dewey.myzen.co.uk> wrote:
> Presumably it disappears because there is a unique value of ID for each
> combination of S*x1 so they are indistinguishable.
>
> On 19/07/2016 12:53, Justin Thong wrote:
>> Why does the S:x1 column disappear (presumably S:x1 goes into ID but I
>> don't know why)? S is a factor, x1 is a covariate and ID is a factor.
>> >> rich.side<-aov(y~S*x1+ID) >> summary(rich.side) >> >> Below is the model frame >> >> model.frame(~S*x1+ID) >> >> S x1 ID >> 1 1 12 A >> 2 1 12 A >> 3 1 12 A >> 4 1 12 A >> 5 1 0 B >> 6 1 0 B >> 7 1 0 B >> 8 1 0 B >> 9 1 0 C >> 10 1 0 C >> 11 1 0 C >> 12 1 0 C >> 13 1 0 D >> 14 1 0 D >> 15 1 0 D >> 16 1 0 D >> 17 1 0 E >> 18 1 0 E >> 19 1 0 E >> 20 1 0 E >> 21 1 0 F >> 22 1 0 F >> 23 1 0 F >> 24 1 0 F >> 25 2 6 AB >> 26 2 6 AB >> 27 2 6 AB >> 28 2 6 AB >> 29 2 6 AC >> 30 2 6 AC >> 31 2 6 AC >> 32 2 6 AC >> 33 2 6 AD >> 34 2 6 AD >> 35 2 6 AD >> 36 2 6 AD >> 37 2 6 AE >> 38 2 6 AE >> 39 2 6 AE >> 40 2 6 AE >> 41 2 6 AF >> 42 2 6 AF >> 43 2 6 AF >> 44 2 6 AF >> 45 2 0 BC >> 46 2 0 BC >> 47 2 0 BC >> 48 2 0 BC >> 49 2 0 BD >> 50 2 0 BD >> 51 2 0 BD >> 52 2 0 BD >> 53 2 0 BE >> 54 2 0 BE >> 55 2 0 BE >> 56 2 0 BE >> 57 2 0 BF >> 58 2 0 BF >> 59 2 0 BF >> 60 2 0 BF >> 61 2 0 CD >> 62 2 0 CD >> 63 2 0 CD >> 64 2 0 CD >> 65 2 0 CE >> 66 2 0 CE >> 67 2 0 CE >> 68 2 0 CE >> 69 2 0 CF >> 70 2 0 CF >> 71 2 0 CF >> 72 2 0 CF >> 73 2 0 DE >> 74 2 0 DE >> 75 2 0 DE >> 76 2 0 DE >> 77 2 0 DF >> 78 2 0 DF >> 79 2 0 DF >> 80 2 0 DF >> 81 2 0 EF >> 82 2 0 EF >> 83 2 0 EF >> 84 2 0 EF >> 85 3 4 ABC >> 86 3 4 ABC >> 87 3 4 ABC >> 88 3 4 ABC >> 89 3 4 ABD >> 90 3 4 ABD >> 91 3 4 ABD >> 92 3 4 ABD >> 93 3 4 ABE >> 94 3 4 ABE >> 95 3 4 ABE >> 96 3 4 ABE >> 97 3 4 ABF >> 98 3 4 ABF >> 99 3 4 ABF >> 100 3 4 ABF >> 101 3 4 ACD >> 102 3 4 ACD >> 103 3 4 ACD >> 104 3 4 ACD >> 105 3 4 ACE >> 106 3 4 ACE >> 107 3 4 ACE >> 108 3 4 ACE >>
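The absorption argument above can be illustrated with a small made-up layout (a sketch; when a coarse interaction term is constant within the levels of a finer factor, its model-matrix columns lie entirely in the column space of that factor's dummies, so the QR pivoting leaves it no remaining degrees of freedom):

```r
set.seed(1)
ID <- gl(6, 2)                         # fine grouping factor
S  <- factor(rep(1:3, each = 4))       # coarser factor: constant within each ID level
x1 <- ave(rnorm(12), ID)               # covariate, also constant within each ID level

Z <- model.matrix(~ S:x1)[, -1]        # the S:x1 interaction columns
# regressing them on the ID dummies leaves no residual: S:x1 spans nothing new
leftover <- sum(lm.fit(model.matrix(~ ID), Z)$residuals^2)
leftover                               # essentially 0
```

Since every S:x1 column is fitted exactly by the ID dummies, a sequential fit that reaches ID first has nothing left for S:x1, which is why its row vanishes from the table.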
[R] Missing rows anova
Why does the S:x1 column disappear (presumably S:x1 goes into ID but I don't know why)? S is a factor, x1 is a covariate and ID is a factor.

    rich.side <- aov(y ~ S*x1 + ID)
    summary(rich.side)

Below is the model frame:

    model.frame(~ S*x1 + ID)
        S x1  ID
    1   1 12   A
    2   1 12   A
    3   1 12   A
    4   1 12   A
    5   1  0   B
    6   1  0   B
    7   1  0   B
    8   1  0   B
    9   1  0   C
    10  1  0   C
    11  1  0   C
    12  1  0   C
    13  1  0   D
    14  1  0   D
    15  1  0   D
    16  1  0   D
    17  1  0   E
    18  1  0   E
    19  1  0   E
    20  1  0   E
    21  1  0   F
    22  1  0   F
    23  1  0   F
    24  1  0   F
    25  2  6  AB
    26  2  6  AB
    27  2  6  AB
    28  2  6  AB
    29  2  6  AC
    30  2  6  AC
    31  2  6  AC
    32  2  6  AC
    33  2  6  AD
    34  2  6  AD
    35  2  6  AD
    36  2  6  AD
    37  2  6  AE
    38  2  6  AE
    39  2  6  AE
    40  2  6  AE
    41  2  6  AF
    42  2  6  AF
    43  2  6  AF
    44  2  6  AF
    45  2  0  BC
    46  2  0  BC
    47  2  0  BC
    48  2  0  BC
    49  2  0  BD
    50  2  0  BD
    51  2  0  BD
    52  2  0  BD
    53  2  0  BE
    54  2  0  BE
    55  2  0  BE
    56  2  0  BE
    57  2  0  BF
    58  2  0  BF
    59  2  0  BF
    60  2  0  BF
    61  2  0  CD
    62  2  0  CD
    63  2  0  CD
    64  2  0  CD
    65  2  0  CE
    66  2  0  CE
    67  2  0  CE
    68  2  0  CE
    69  2  0  CF
    70  2  0  CF
    71  2  0  CF
    72  2  0  CF
    73  2  0  DE
    74  2  0  DE
    75  2  0  DE
    76  2  0  DE
    77  2  0  DF
    78  2  0  DF
    79  2  0  DF
    80  2  0  DF
    81  2  0  EF
    82  2  0  EF
    83  2  0  EF
    84  2  0  EF
    85  3  4 ABC
    86  3  4 ABC
    87  3  4 ABC
    88  3  4 ABC
    89  3  4 ABD
    90  3  4 ABD
    91  3  4 ABD
    92  3  4 ABD
    93  3  4 ABE
    94  3  4 ABE
    95  3  4 ABE
    96  3  4 ABE
    97  3  4 ABF
    98  3  4 ABF
    99  3  4 ABF
    100 3  4 ABF
    101 3  4 ACD
    102 3  4 ACD
    103 3  4 ACD
    104 3  4 ACD
    105 3  4 ACE
    106 3  4 ACE
    107 3  4 ACE
    108 3  4 ACE
    109 3  4 ACF
    110 3  4 ACF
    111 3  4 ACF
    112 3  4 ACF
    113 3  4 ADE
    114 3  4 ADE
    115 3  4 ADE
    116 3  4 ADE
    117 3  4 ADF
    118 3  4 ADF
    119 3  4 ADF
    120 3  4 ADF
    121 3  4 AEF
    122 3  4 AEF
    123 3  4 AEF
    124 3  4 AEF
    125 3  0 BCD
    126 3  0 BCD
    127 3  0 BCD
    128 3  0 BCD
    129 3  0 BCE
    130 3  0 BCE
    131 3  0 BCE
    132 3  0 BCE
    133 3  0 BCF
    134 3  0 BCF
    135 3  0 BCF
    136 3  0 BCF
    137 3  0 BDE
    138 3  0 BDE
    139 3  0 BDE
    140 3  0 BDE
    141 3  0 BDF
    142 3  0 BDF
    143 3  0 BDF
    144 3  0 BDF
    145 3  0 BEF
    146 3  0 BEF
    147 3  0 BEF
    148 3  0 BEF
    149 3  0 CDE
    150 3  0 CDE
    151 3  0 CDE
    152 3  0 CDE
    153 3  0 CDF
    154 3  0 CDF
    155 3  0 CDF
    156 3  0 CDF
    157 3  0 CEF
    158 3  0 CEF
    159 3  0 CEF
    160 3  0 CEF
    161 3  0 DEF
    162 3  0 DEF
    163 3  0 DEF
    164 3  0 DEF

--
Yours sincerely,
Justin
[R] Reference for aov()
Hi

I have been looking for a reference that explains how R's aov command works at a deeper level: more specifically, how R reads the formula and computes the sums of squares. I am not interested in understanding what the difference between Type 1, 2 and 3 sums of squares is. I am more interested in finding out how R computes ~x1:x2:x3 or ~A:x1, emphasizing the sequential nature of the computation, and models even more complicated than this.

Yours sincerely,
Justin