[R] Opportunities for Developing R Packages (Research-Based, Open-Source)

2019-04-22 Thread Justin Thong
Dear R package community,

I am uncertain whether this is appropriate for this mailing list. Please
let me know. If not, would you be so kind as to point me in a better
direction?

I am a mathematics major with solid R experience. I graduated two years ago
and have since been working in business operations at a cryptocurrency
startup. I am rather rusty, and I would like to get back into statistical
research and R package development.

My question is for researchers who are developing tools and algorithms for
their new research, be it in medical statistics, data visualisation or
machine learning: is there any possibility of collaboration? This would
help me extend my experience and possibly open more avenues for me to enter
research. I am comfortable with statistical concepts and can read research
papers (I have done a research internship in experimental design, linear
algebra and data compression, particle filters and Bayesian analysis). I do
not expect to be paid and am willing to commit to a project.


Yours sincerely,
Justin

*I check my email at 9AM and 4PM everyday*
*If you have an EMERGENCY, contact me at +447938674419(UK) or
+60125056192(Malaysia)*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] igraph problem

2017-09-19 Thread Justin Thong
Run this code:

library(igraph)
tree <- graph_from_literal(1-+2:3, 3-+5, 1-+4)
graph.bfs(tree, root=1, neimode="out", father=TRUE, order=TRUE,
          unreachable=FALSE)

I do not understand why the father values are NA 1 1 3 1 rather than
NA 1 1 1 3.

I am doing this to obtain the vertex names (or some index) of each
individual branch of the tree. Does anyone have any ideas on how to do this?
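
A possible explanation, offered as a sketch rather than a confirmed answer: graph_from_literal() appears to number vertices in the order in which they first appear in the formula (1, 2, 3, 5, 4 here), and the father vector would then be indexed by that internal id rather than by vertex name. The base-R snippet below mimics that bookkeeping without calling igraph at all; `edges`, `vertex_names` and `parent_id` are illustrative names, not igraph objects.

```r
# Edges of the tree as (from, to) vertex names, in the order they are
# written in graph_from_literal(1-+2:3, 3-+5, 1-+4).
edges <- rbind(c("1", "2"), c("1", "3"), c("3", "5"), c("1", "4"))

# Assumption: internal ids follow order of first appearance: 1, 2, 3, 5, 4.
vertex_names <- unique(as.vector(t(edges)))

# Parent of each vertex, expressed as an internal id (NA for the root).
parent_name <- edges[match(vertex_names, edges[, 2]), 1]
parent_id <- match(parent_name, vertex_names)
names(parent_id) <- vertex_names

# Indexed by internal id, as a BFS routine would report it.
by_id <- unname(parent_id)

# Re-ordered by vertex name, which is what the question expected.
by_name <- unname(parent_id[order(as.integer(vertex_names))])
```

Under that assumption `by_id` reproduces the reported values and `by_name` the expected ones, so checking V(tree)$name against the father vector may be all that is needed.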

Yours sincerely,
Justin



[R] rmutil parameters for Pareto distribution

2017-08-24 Thread Justin Thong
In https://en.wikipedia.org/wiki/Pareto_distribution, it is clear what the
parameters of the Pareto distribution are: *xmin*, the scale parameter, and
*a*, the shape parameter.

I am using rmutil to generate random deviates from a Pareto distribution.
The documentation gives the probability density as follows:

The Pareto distribution has density

f(y) = s (1 + y/(m (s-1)))^(-s-1)/(m (s-1))

where m is the mean parameter of the distribution and s is the dispersion

I experimented with the rpareto function from the package, passing the
scale parameter *xmin* as m and the shape parameter *a* as s, and found
that the generated deviates are not all larger than *xmin*. This leads me
to believe that m and s are not the scale and shape parameters
respectively.

What are m and s? Could they be the mean and variance respectively, as
defined on the Wikipedia page?
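
One way to probe this without relying on rmutil itself is to code the quoted density directly in base R. The sketch below assumes the formula above is exactly what the package implements; `dpareto_ms` and `rpareto_ms` are hypothetical helper names, not rmutil functions. Read this way, the density is supported on y > 0 with mean m, so draws smaller than m are expected.

```r
# Density as quoted from the documentation: m = mean, s = dispersion.
dpareto_ms <- function(y, m, s) {
  s * (1 + y / (m * (s - 1)))^(-s - 1) / (m * (s - 1))
}

# Inverse-CDF sampler derived from that density (hypothetical helper).
rpareto_ms <- function(n, m, s) {
  lambda <- m * (s - 1)                    # scale implied by m and s
  lambda * ((1 - runif(n))^(-1 / s) - 1)
}

m <- 2
s <- 3

# The density integrates to 1 over (0, Inf) and has mean m.
total  <- integrate(dpareto_ms, 0, Inf, m = m, s = s)$value
mean_y <- integrate(function(y) y * dpareto_ms(y, m, s), 0, Inf)$value

set.seed(1)
draws <- rpareto_ms(1000, m, s)
min_draw <- min(draws)                     # can be close to 0, well below m

# Shifting by the implied scale gives draws bounded below by xmin, i.e. a
# classical Pareto with xmin = m * (s - 1) and shape a = s.
xmin <- m * (s - 1)
shifted <- draws + xmin
```

If this reading is right, a Wikipedia-style Pareto(xmin, a) deviate would correspond to m = xmin/(a - 1), s = a, plus a shift by xmin.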


Yours sincerely,
Justin



[R] Kernel Density Estimation: Generate a sample from Epanechnikov Kernel

2017-03-21 Thread Justin Thong
Below are samples drawn from a kernel density estimate of "data" with a
Gaussian kernel.
I really like this way of sampling from a kernel density estimate because
it is nice and elegant.

fit<-density(data)
# samples from the kernel density estimate
rnorm(N, sample(data, size = N, replace = TRUE), fit$bw)

I am, however, interested in doing the same for a kernel density estimate
with an Epanechnikov kernel:

fit<-density(data,kernel = "epanechnikov")
# Is there a quick way to draw the samples while INCORPORATING THE BANDWIDTH
# of the kernel density estimate?
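
A minimal sketch of one approach, under two assumptions: (1) as stated in ?density, bw is the standard deviation of the smoothing kernel, so an Epanechnikov kernel with sd bw has half-width bw * sqrt(5); (2) standard Epanechnikov noise on [-1, 1] is generated with the three-uniforms rule usually attributed to Devroye and Gyorfi. `repan`, the stand-in `data` and `N` are illustrative, not from the original post.

```r
set.seed(1)
data <- rnorm(200)     # stand-in data (assumption)
N <- 1e5
fit <- density(data, kernel = "epanechnikov")

# Standard Epanechnikov draws on [-1, 1]: take three uniforms; if the third
# has the largest absolute value return the second, otherwise the third.
repan <- function(n) {
  u1 <- runif(n, -1, 1)
  u2 <- runif(n, -1, 1)
  u3 <- runif(n, -1, 1)
  ifelse(abs(u3) >= abs(u2) & abs(u3) >= abs(u1), u2, u3)
}

# The standard kernel has sd 1/sqrt(5), so scale by bw * sqrt(5).
half_width <- fit$bw * sqrt(5)
noise <- repan(N)
smp <- sample(data, size = N, replace = TRUE) + half_width * noise
```

This mirrors the Gaussian one-liner: pick a data point at random, then add kernel-shaped noise scaled by the bandwidth.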


Yours sincerely,
Justin



Re: [R] Estimated Effects Not Balanced

2016-08-23 Thread Justin Thong
Hi,

Thanks Richard,

That was me playing with too many examples and having too many variables
just lying around. Thanks for the tip though.

On 22 August 2016 at 23:32, Bert Gunter <bgunter.4...@gmail.com> wrote:

> Thanks, Rich. I didn't notice that!
>
> -- Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Aug 22, 2016 at 1:43 PM, Richard M. Heiberger <r...@temple.edu>
> wrote:
> > The problem is that you have 12 observations and 1+2+10=13 degrees of
> > freedom.
> > There should be 1 + 2 + 8 = 11 degrees of freedom.
> > Probably one of your variables is masked by something else in your
> > workspace.
> > Protect yourself by using a data.frame
> >
> >> tmp <- data.frame(A=factor(c(1,1,1,1,1,1,2,2,2,2,2,2)),
> > + B=factor(c(1,1,2,2,3,3,1,1,2,2,3,3)),
> > + y=rnorm(12))
> >> mod <- aov(y ~ A+B, data=tmp)
> >> summary(mod)
> >             Df Sum Sq Mean Sq F value Pr(>F)
> > A            1  1.553   1.553   1.334  0.281
> > B            2  3.158   1.579   1.357  0.311
> > Residuals    8  9.311   1.164
> >
> > On Mon, Aug 22, 2016 at 11:15 AM, Justin Thong <justinthon...@gmail.com>
> > wrote:
> >> Something does not make sense in R. It has to do with the question of
> >> balance and unbalance.
> >>
> >> A<-factor(c(1,1,1,1,1,1,2,2,2,2,2,2))
> >> B<-factor(c(1,1,2,2,3,3,1,1,2,2,3,3))
> >> y<-rnorm(12)
> >> mod<-aov(y~A+B)
> >>
> >> I was under the impression that the design is balanced, i.e. order does
> >> not affect the sums of squares. However, when I compute the anova R
> >> reports that the Estimated Effects are Unbalanced. I thought that when
> >> all combinations of levels of A and B have equal replications then the
> >> design is called balanced. But, R tends to think that when not all
> >> levels of A and levels of B have equal replication, then the "Estimated
> >> Effects are unbalanced". Is this the same as the design being
> >> unbalanced? Because for the example below, where the message occurred,
> >> the order does not matter (which makes me think that the design is
> >> balanced).
> >>
> >>
> >> Call:
> >>    aov(formula = y ~ A + B)
> >>
> >> Terms:
> >>                        A         B Residuals
> >> Sum of Squares  0.872572  0.025604 16.805706
> >> Deg. of Freedom        1         2        10
> >>
> >> Residual standard error: 1.296368
> >> Estimated effects may be unbalanced
> >> --
> >> Yours sincerely,
> >> Justin



-- 
Yours sincerely,
Justin



[R] Estimated Effects Not Balanced

2016-08-22 Thread Justin Thong
Something does not make sense in R. It has to do with the question of
balance and unbalance.

A<-factor(c(1,1,1,1,1,1,2,2,2,2,2,2))
B<-factor(c(1,1,2,2,3,3,1,1,2,2,3,3))
y<-rnorm(12)
mod<-aov(y~A+B)

I was under the impression that the design is balanced, i.e. the order of
the terms does not affect the sums of squares. However, when I compute the
anova, R reports that the estimated effects may be unbalanced. I thought
that a design is called balanced when all combinations of levels of A and B
have equal replication. But R tends to think that when not all levels of A
and levels of B have equal replication, then the "Estimated Effects are
unbalanced". Is this the same as the design being unbalanced? I ask because
for the example below, where the message occurred, the order does not
matter (which makes me think that the design is balanced).


Call:
   aov(formula = y ~ A + B)

Terms:
                       A         B Residuals
Sum of Squares  0.872572  0.025604 16.805706
Deg. of Freedom        1         2        10

Residual standard error: 1.296368
Estimated effects may be unbalanced
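
As a sanity check on the design itself (a sketch, not a diagnosis of the output above), the same factors can be placed in a data frame and passed to stats::replications(), whose help page suggests !is.list(replications(formula, data)) as a test for balance. With 12 observations, a 2-level A and a 3-level B, the residual degrees of freedom should be 12 - 1 - 1 - 2 = 8, so a value of 10 hints that aov() picked up a different A, B or y than the one intended. `tmp`, `reps` and `res_df` are illustrative names.

```r
set.seed(1)
tmp <- data.frame(A = factor(c(1,1,1,1,1,1,2,2,2,2,2,2)),
                  B = factor(c(1,1,2,2,3,3,1,1,2,2,3,3)),
                  y = rnorm(12))

# Number of replicates per level of each term.
reps <- replications(~ A + B, data = tmp)

# A single number per term (no list) means every level is equally replicated.
balanced <- !is.list(reps)

# Fitting from the data frame keeps stray workspace objects out of the model.
mod <- aov(y ~ A + B, data = tmp)
res_df <- df.residual(mod)
```

Using data = tmp also makes the fit reproducible regardless of what else is lying around in the workspace.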
-- 
Yours sincerely,
Justin



[R] Intercept in Model Matrix (Parameters not what I expected)

2016-08-21 Thread Justin Thong
Something has been bugging me; I asked about it on Cross Validated but did
not get a response. Let's construct a simple example. Below is the code.

set.seed(1)  # for reproducibility
y<-rnorm(8)  # response; not defined in the original post, assumed here
A<-gl(2,4) #factor of 2 levels
B<-gl(4,2) #factor of 4 levels
df<-data.frame(y,A,B)

As you can see, B is nested within A.
The peculiar result I am interested in is the output of the model matrix
when I fit a nested model. *How does R decide what is included in the
intercept?* Since we are using dummy coding, each coefficient of a
single-factor model is interpreted as the difference between a particular
level and the reference level (the intercept). I understand that for model
~A, A1 becomes the intercept, and that for model ~A+B, A1 and B1 together
become the intercept.

*I do not get why, when we use a nested model, A1:B2 appears as a column
in the model matrix. Why isn't the first parameter of the interaction
subspace A1:B1 or A2:B1?* I think I am missing the concept. I think the
intercept is A1. *Hence, why do we not compare the levels A1:B1 and
A1 (intercept), or A2:B1 and A1 (intercept)?*

#nested model
> mod<-aov(y~A+A:B)
> model.matrix(mod)
  (Intercept) A2 A1:B2 A2:B2 A1:B3 A2:B3 A1:B4 A2:B4
1           1  0     0     0     0     0     0     0
2           1  0     0     0     0     0     0     0
3           1  0     1     0     0     0     0     0
4           1  0     1     0     0     0     0     0
5           1  1     0     0     0     1     0     0
6           1  1     0     0     0     1     0     0
7           1  1     0     0     0     0     0     1
8           1  1     0     0     0     0     0     1
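
A sketch that may help with the question above. My understanding of R's coding rule (described for the "factors" attribute in ?terms.object) is that, within an interaction, a factor is coded by contrasts only if the term obtained by dropping that factor already appears in the model; otherwise it gets a full set of indicator columns. In ~A + A:B, dropping B from A:B leaves A (present), so B is coded by contrasts (B2, B3, B4), while dropping A leaves B (absent), so A gets indicators (A1, A2). That would explain columns such as A1:B2. The names `X_nested`, `X_full` and `coding` are illustrative.

```r
A <- gl(2, 4)
B <- gl(4, 2)

X_nested <- model.matrix(~ A + A:B)      # nested formula from the post
X_full   <- model.matrix(~ A + B + A:B)  # same cells, with a B main effect

# In the "factors" attribute, 1 means "coded by contrasts" and 2 means
# "coded by a full set of indicator columns".
coding <- attr(terms(~ A + A:B), "factors")

nested_cols <- colnames(X_nested)
full_cols   <- colnames(X_full)

# Only four (A, B) cells are observed, so both matrices have rank 4 and
# several of their eight columns are aliased.
rank_nested <- qr(X_nested)$rank
rank_full   <- qr(X_full)$rank
```

Both parameterisations span the same four cell means; they differ only in which columns carry them, so the intercept is still the A1, B1 cell.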


-- 
Yours sincerely,
Justin



[R] What is "args" in this function?

2016-08-02 Thread Justin Thong
Hi again, I need help.

*R-code*
debug(model.matrix)
model.matrix(~S)

*model.matrix code*
# t = terms(object), data = "data frame of object"
ans <- .External2(C_modelmatrix, t, data)

*modelframe C-code*
SEXP modelframe(SEXP call, SEXP op, SEXP args, SEXP rho)
{
SEXP terms, data, names, variables, varnames, dots, dotnames, na_action;
SEXP ans, row_names, subset, tmp;
char buf[256];
int i, j, nr, nc;
int nvars, ndots, nactualdots;
const void *vmax = vmaxget();

args = CDR(args);
terms = CAR(args); args = CDR(args);
row_names = CAR(args); args = CDR(args);
variables = CAR(args); args = CDR(args);
varnames = CAR(args); args = CDR(args);
dots = CAR(args); args = CDR(args);
dotnames = CAR(args); args = CDR(args);
subset = CAR(args); args = CDR(args);
na_action = CAR(args);

. . . .

I am sorry, I have virtually no experience in C.
Can someone explain to me what "args" is at the point when it enters the
function? I know CAR points to the first element of an object, and CDR
points to the rest of the object after its first element.

Does "args" represent the list of t and data?
or
Does "args" represent the third argument in .External2, which is data?
or
something else?

I am guessing this whole process of applying CAR and CDR is just a way of
extracting variables from "args" until everything in "args" has been
assigned.

For instance, if args=(1,2,3,4,5,6), then the values after each step are
shown in square brackets below:

  args = CDR(args); [(1,2,3,4,5,6)]
  terms = CAR(args) ;[(1)] args = CDR(args);[(2,3,4,5,6)]
row_names = CAR(args);[(2)] args = CDR(args);[(3,4,5,6)]
variables = CAR(args);[(3)] args = CDR(args);[(4,5,6)]
varnames = CAR(args);[(4)] args = CDR(args);[(5,6)]
   etc

Is this correct?
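
The CAR/CDR walk can be imitated in plain R with a list, which may make the pattern easier to see. This is only an analogy: the real `args` is a C-level pairlist, and the assumption here (based on the .External description in "Writing R Extensions") is that its first element identifies the entry point, which is why the C code starts with a bare args = CDR(args). `car`, `cdr` and the placeholder strings are hypothetical.

```r
car <- function(x) x[[1]]   # first element, in the spirit of CAR
cdr <- function(x) x[-1]    # the list without its first element, like CDR

# Placeholder values standing in for what the R-level caller passes.
args <- list("<entry point>", "<terms>", "<row_names>", "<variables>",
             "<varnames>", "<dots>", "<dotnames>", "<subset>", "<na_action>")

args <- cdr(args)           # mirrors the first `args = CDR(args);`

slots <- c("terms", "row_names", "variables", "varnames",
           "dots", "dotnames", "subset", "na_action")
extracted <- list()
for (nm in slots) {
  extracted[[nm]] <- car(args)   # take the head ...
  args <- cdr(args)              # ... then move on to the tail
}
```

So the guess in the post looks right in spirit, except that the first CDR drops one element before `terms` is read.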

I am sorry if I am asking too many questions on C. Please advise if I am
posting inappropriately.



-- 
Yours sincerely,
Justin



[R] Ways to understand C code (like debug function)

2016-08-01 Thread Justin Thong
Hi

I need some advice. Note: I do not know anything about C apart from my 2
days of research.

I am currently trying to make sense of the modelmatrix function (written
in C), which is called from the R function model.matrix() via .External2.

In trying to follow the R source code of model.matrix(), I have been
reasonably successful thanks to the debug command. This command was good
because I was able to check line by line what the code was doing and see
the output in my R console. Checking the values of each of my variables
while stepping through the lines was also very useful. However, looking at
the R source code alone is insufficient for understanding most of the
computation; I have to look at the C code. In particular, within
model.matrix(), a .External2 call is made to a C function named
modelmatrix. I downloaded the source from the website and can view the
function modelmatrix (in model.c) in a text editor. I am now looking for a
way to play with the code so that I understand what is going on, and I
don't know the best way to do this.

*I was wondering whether there is an equivalent of the debug function for
checking C code line by line, i.e. each line of code is typed in and an
output is obtained.* I know the package "inline" allows you to build C
functions and use them in R, but I can't find anything that does what I
want. *If this is not possible, is there a good, easy alternative way to
run through and understand the commands in C that anyone knows about?*

-- 
Yours sincerely,
Justin



Re: [R] Reference for aov()

2016-07-27 Thread Justin Thong
Hi Peter,


Thank you for your good answer. I am sorry for the late reply.

*An orthogonalized model matrix generates a decomposition of the model
space into orthogonal subspaces corresponding to the terms of the model.
Projections onto each of the subspaces are easily worked out.  E.g., for a
two-way analysis (Y~row+col+row:col) you can decompose the model effect as
a row effect, a column effect, and an interaction effect. This allows
straightforward calculation of the sums of squares of the ANOVA table. As
you are probably well aware, if the design is unbalanced, the results will
depend on the order of terms -- col + row + col:row gives a different
result.*

It may be a stupid question, but how are the projections onto each subspace
easily worked out, and how do the sums of squares follow from them? Does it
matter that certain parameters of the model are not estimable? R appears to
give sums of squares even though some of the parameters are non-estimable.
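
To make the question concrete, here is a small sketch of how sequential sums of squares can be read off an orthogonalized model matrix. It is an illustration of the idea on made-up balanced data, not a description of the aov() source: the columns of the model matrix are orthogonalized with qr(), the response is rotated into that basis with qr.qty(), and the squared rotated values are summed within each term. `d`, `term_id`, `ss_terms` and `ss_resid` are illustrative names.

```r
set.seed(1)
# Balanced 3 x 2 layout with 2 replicates (made-up data).
d <- expand.grid(row = factor(1:3), col = factor(1:2), rep = 1:2)
d$y <- rnorm(nrow(d))

X <- model.matrix(~ row + col + row:col, data = d)
qrX <- qr(X)                     # orthogonalize the columns of X
eff <- qr.qty(qrX, d$y)          # Q'y: the response in the orthogonal basis

# Map each of the first `rank` rotated values to the term that produced
# its column ("0" is the intercept).
keep <- seq_len(qrX$rank)
term_id <- attr(X, "assign")[qrX$pivot[keep]]
ss_terms <- tapply(eff[keep]^2, term_id, sum)
ss_resid <- sum(eff[-keep]^2)

# Reference values from the usual route.
tab <- anova(lm(y ~ row + col + row:col, data = d))
```

Columns that are linearly dependent on earlier ones are pivoted beyond the rank and contribute nothing to any term, which seems to be how non-estimable parameters drop out of the table.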

Thank you





On 14 July 2016 at 09:50, peter dalgaard <pda...@gmail.com> wrote:

> I am not aware of a detailed documentation of this beyond the actual
> source code.
> However, the principles are fairly straightforward, except that the rules
> for constructing the design matrix from the model formula can be a bit
> arcane at times.
>
> The two main tools are the design matrix constructor (model.matrix) and a
> Gram-Schmidt type orthogonalization of its columns (the latter is called a
> QR decomposition in R, which it is, but there are several algorithms for
> QR, and the linear models codes depend on the QR algorithm being based on
> orthogonalization - so LINPACK works and LAPACK doesn't).
>
> An orthogonalized model matrix generates a decomposition of the model space
> into orthogonal subspaces corresponding to the terms of the model.
> Projections onto each of the subspaces are easily worked out.  E.g., for a
> two-way analysis (Y~row+col+row:col) you can decompose the model effect as
> a row effect, a column effect, and an interaction effect. This allows
> straightforward calculation of the sums of squares of the ANOVA table. As
> you are probably well aware, if the design is unbalanced, the results will
> depend on the order of terms -- col + row + col:row gives a different
> result.
>
> What aov() does is that it first decomposes the observations according to
> the Error() term, forming the error strata, then fits the systematic part
> of the model to each stratum in turn. In the nice cases, each term of the
> model will be estimable in exactly one stratum, and part of the aov() logic
> is to detect and remove unestimable terms. E.g., if you have a balanced two
> way layout, say individual x treatment, the variable gender is a subfactor
> of individual, so Y ~ gender * treatment + Error(individual/treatment), the
> gender effect is estimated in the individual stratum, whereas treatment and
> gender:treatment are estimated in the individual:treatment stratum.
>
> It should be noted that it is very hard to interpret the results of aov()
> unless the Error() part of the model corresponds to a balanced experimental
> design. Or put more sharply: The model implied by the decomposition into
> error strata becomes nonsensical otherwise. If you do have a balanced
> design, the error strata reduce to simple combinations of means and
> observation, so the aov() algorithm is quite inefficient, but to my
> knowledge nobody has bothered to try and do better.
>
> -pd
>
> > On 13 Jul 2016, at 18:18 , Justin Thong <justinthon...@gmail.com> wrote:
> >
> > Hi
> >
> > *I have been looking for a reference to explain how R uses the aov
> > command(at a deeper level)*. More specifically, how R reads the formulae
> > and R computes the sums of squares. I am not interested in understanding
> > what the difference of Type 1,2,3 sum of squares are. I am more
> > interested in finding out about how R computes ~x1:x2:x3  or how R
> > computes ~A:x1 emphasizing sequential nature of the way it computes, and
> > models even more complicated than this.
> >
> > Yours sincerely,
> > Justin
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics

[R] Linear Dependance of Model Matrix and How Fitted/ Sums of Squares Follow

2016-07-26 Thread Justin Thong
Below are the covariates for a model ~x1+x2+x3+x4+x5+x6. I noticed when
fitting this model that the coefficient of x6 is not estimable. *Is this
merely a case of adding more columns to my model matrix eventually leading
to linear dependence, so that the more terms I have in the model formula,
the more likely the model matrix is to become linearly dependent?* I found
that for the model formula ~x1+x2+x3+x4+x5 all the coefficients are
estimable, so I guess this example supports my statement.

That being said, since not all coefficients are estimated, how does R
compute the fitted values and the anova table? *Does it just ignore the
existence of x6 and consider the model to be ~x1+x2+x3+x4+x5? Or is there
something deeper that I do not understand?* I ask because the sums of
squares and fitted values seem to be the same for model ~x1+x2+x3+x4+x5 as
for ~x1+x2+x3+x4+x5+x6.
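
A sketch that may explain the x6 case. In the covariate table in this post every visible row appears to sum to 12 (12; 6 + 6; 4 + 4 + 4), so x1 + ... + x6 equals 12 times the intercept column and one column is redundant once the intercept is in the model. The code rebuilds one replicate of such a design (the posted data repeat each row four times) with a made-up response; `blends` and the fitted objects are illustrative names.

```r
# One row per blend: every subset of 1, 2 or 3 of the six components,
# sharing a total of 12 equally.
blends <- do.call(rbind, lapply(1:3, function(k) {
  t(combn(6, k, function(idx) {
    x <- numeric(6)
    x[idx] <- 12 / k
    x
  }))
}))
colnames(blends) <- paste0("x", 1:6)

d <- as.data.frame(blends)
set.seed(1)
d$y <- rnorm(nrow(d))            # made-up response

row_totals <- rowSums(blends)    # constant, hence collinear with intercept

fit6 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = d)
fit5 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = d)

x6_coef <- coef(fit6)[["x6"]]    # NA: aliased with the earlier columns
```

The two fits span the same column space, so fitted values and sums of squares agree; dropping the intercept (~ 0 + x1 + ... + x6) would make all six coefficients estimable.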

However, this is not so clear cut for models with factors, because a factor
is represented in the model matrix by one parameter per level. Consider
factor F with 2 levels and G with 3 levels. The problem is that R has a way
of excluding certain rows from the anova table. Again, it can be seen that
it excludes the rows associated with the parameters which are not
estimable, but this is not absolutely clear in my mind. Look at the small
example below for model ~F*G with two factors. As you can see, the
interaction parameters F2:G2 and F2:G3 are not estimable. From what I was
told, F1 and G1 are contained within the (Intercept) parameter, so F1:G1,
F1:G2 and F2:G1 are not considered. You can see from the anova table that
the interaction row F:G is omitted. My main problem is why it is omitted.

*Does that mean that if all the parameters associated with a particular
term (excluding the ones absorbed into the intercept) are not estimable,
then the row of that term in the anova table is omitted? How many
non-estimable parameters must there be for the row of a term to be
omitted?* I ask because if the answer to the second question is that fitted
values and sums of squares are calculated by ignoring non-estimable
parameters, then the rows of sums of squares must disappear for some reason
other than non-estimability.

Sorry for the generally wordy question. I may not be thinking about it in
the correct manner, and I would appreciate it if anyone has an answer and
perhaps even some generalisations regarding the use of the QR
decomposition.

(There is more code below this data)

    x1 x2 x3 x4 x5 x6
1   12  0  0  0  0  0
2   12  0  0  0  0  0
3   12  0  0  0  0  0
4   12  0  0  0  0  0
5    0 12  0  0  0  0
6    0 12  0  0  0  0
7    0 12  0  0  0  0
8    0 12  0  0  0  0
9    0  0 12  0  0  0
10   0  0 12  0  0  0
11   0  0 12  0  0  0
12   0  0 12  0  0  0
13   0  0  0 12  0  0
14   0  0  0 12  0  0
15   0  0  0 12  0  0
16   0  0  0 12  0  0
17   0  0  0  0 12  0
18   0  0  0  0 12  0
19   0  0  0  0 12  0
20   0  0  0  0 12  0
21   0  0  0  0  0 12
22   0  0  0  0  0 12
23   0  0  0  0  0 12
24   0  0  0  0  0 12
25   6  6  0  0  0  0
26   6  6  0  0  0  0
27   6  6  0  0  0  0
28   6  6  0  0  0  0
29   6  0  6  0  0  0
30   6  0  6  0  0  0
31   6  0  6  0  0  0
32   6  0  6  0  0  0
33   6  0  0  6  0  0
34   6  0  0  6  0  0
35   6  0  0  6  0  0
36   6  0  0  6  0  0
37   6  0  0  0  6  0
38   6  0  0  0  6  0
39   6  0  0  0  6  0
40   6  0  0  0  6  0
41   6  0  0  0  0  6
42   6  0  0  0  0  6
43   6  0  0  0  0  6
44   6  0  0  0  0  6
45   0  6  6  0  0  0
46   0  6  6  0  0  0
47   0  6  6  0  0  0
48   0  6  6  0  0  0
49   0  6  0  6  0  0
50   0  6  0  6  0  0
51   0  6  0  6  0  0
52   0  6  0  6  0  0
53   0  6  0  0  6  0
54   0  6  0  0  6  0
55   0  6  0  0  6  0
56   0  6  0  0  6  0
57   0  6  0  0  0  6
58   0  6  0  0  0  6
59   0  6  0  0  0  6
60   0  6  0  0  0  6
61   0  0  6  6  0  0
62   0  0  6  6  0  0
63   0  0  6  6  0  0
64   0  0  6  6  0  0
65   0  0  6  0  6  0
66   0  0  6  0  6  0
67   0  0  6  0  6  0
68   0  0  6  0  6  0
69   0  0  6  0  0  6
70   0  0  6  0  0  6
71   0  0  6  0  0  6
72   0  0  6  0  0  6
73   0  0  0  6  6  0
74   0  0  0  6  6  0
75   0  0  0  6  6  0
76   0  0  0  6  6  0
77   0  0  0  6  0  6
78   0  0  0  6  0  6
79   0  0  0  6  0  6
80   0  0  0  6  0  6
81   0  0  0  0  6  6
82   0  0  0  0  6  6
83   0  0  0  0  6  6
84   0  0  0  0  6  6
85   4  4  4  0  0  0
86   4  4  4  0  0  0
87   4  4  4  0  0  0
88   4  4  4  0  0  0
89   4  4  0  4  0  0
90   4  4  0  4  0  0
91   4  4  0  4  0  0
92   4  4  0  4  0  0
93   4  4  0  0  4  0
94   4  4  0  0  4  0
95   4  4  0  0  4  0
96   4  4  0  0  4  0
97   4  4  0  0  0  4
98   4  4  0  0  0  4
99   4  4  0  0  0  4
100  4  4  0  0  0  4
101  4  0  4  4  0  0
102  4  0  4  4  0  0
103  4  0  4  4  0  0
104  4  0  4  4  0  0
105  4  0  4  0  4  0
106  4  0  4  0  4  0
107  4  0  4  0  4  0
108  4  0  4  0  4  0
109  4  0  4  0  0  4
110  4  0  4  0  0  4
111  4  0  4  0  0  4
112  4  0  4  0  0  4
113  4  0  0  4  4  0
114  4  0  0  4  4  0
115  4  0  0  4  4 

[R] Soft Question: Where to find this reference.

2016-07-25 Thread Justin Thong
I notice that a lot of R documentation refers to the reference below, but I
can't seem to find it anywhere.
Does anyone have a link to where I can either view it or buy it?


*Chambers, J. M., Freeny, A and Heiberger, R. M. (1992) Analysis of
variance; designed experiments*

-- 
Yours sincerely,
Justin



Re: [R] Missing rows anova

2016-07-20 Thread Justin Thong
Hi Michael,

Thank you for the reply.

I am sorry, I forgot to print out the anova table to make my question clear.

             Df   Sum Sq   Mean Sq F value   Pr(>F)
S             2 0.000019 9.630e-06   0.818    0.444
x1            1 0.000256 2.560e-04  21.751 9.44e-06 ***
ID           47 0.003524 7.498e-05   6.370 3.35e-15 ***
Residuals   102 0.001201 1.177e-05
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

There is *no* unique value of ID for each combination of S and x1. For
example, S=1 and x1=0 can correspond to B, C, D, E or F.
Perhaps you mean that each combination of S and x1 has its own set of ID
values. If that is the case, I think it makes sense.

*What I think:*
Anova fits the first-order terms (formula terms with no interaction) before
it fits the second-order terms (formula terms with one interaction), and so
on.

First Order--> Second Order--> Third Order--> etc

Therefore, the formula actually fitted is not S+x1+S:x1+ID but
S+x1+ID+S:x1. Hence ID is fitted before S:x1, and since ID is a more
refined factor than S:x1, S:x1 is already included in the fit of ID, so R
recognizes the linear dependence and excludes the term S:x1.
In other words, S:x1 is linearly dependent on ID, and so the row S:x1
disappears because it is accounted for within ID.

Does this make sense?
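
A small sketch to check this reasoning on data shaped like the model frame quoted below. The response is made up and only blends of one to three components are rebuilt, so the degrees of freedom will not match the table above. It relies on two things: terms() places main effects before interactions, and x1 is constant within each ID, so the S:x1 columns lie in the space already spanned by the ID indicators. `ids`, `d` and `fit` are illustrative names.

```r
comps <- LETTERS[1:6]
# Blend labels: A..F, AB..EF, ABC..DEF (41 in total), 4 replicates each.
ids <- unlist(lapply(1:3, function(k) combn(comps, k, paste, collapse = "")))
d <- data.frame(ID = factor(rep(ids, each = 4), levels = ids))
d$S  <- factor(nchar(as.character(d$ID)))      # number of components
d$x1 <- ifelse(grepl("A", d$ID), 12 / nchar(as.character(d$ID)), 0)
set.seed(1)
d$y <- rnorm(nrow(d))                          # made-up response

# Order in which the terms are actually fitted.
term_order <- attr(terms(y ~ S * x1 + ID), "term.labels")

fit <- aov(y ~ S * x1 + ID, data = d)
table_rows <- trimws(rownames(summary(fit)[[1]]))

# The interaction coefficients are aliased (NA) once ID is in the model.
sx1_coefs <- coef(fit)[c("S2:x1", "S3:x1")]
```

On this construction the S:x1 row is absent because none of its columns adds anything beyond ID, which agrees with the explanation in the post.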








On 19 July 2016 at 16:19, Michael Dewey <li...@dewey.myzen.co.uk> wrote:

> Presumably it disappears because there is a unique value of ID for each
> combination of S*x1 so they are indistinguishable.
>
>
> On 19/07/2016 12:53, Justin Thong wrote:
>
>> Why does the S:x1 column disappear (presumably S:x1 goes into ID but I
>> dont know why)? S is a factor, x1 is a covariate and ID is a factor.
>>
>> rich.side<-aov(y~S*x1+ID)
>> summary(rich.side)
>>
>> Below is the model frame
>>
>> model.frame(~S*x1+ID)
>>
>> S x1  ID
>> 1   1 12   A
>> 2   1 12   A
>> 3   1 12   A
>> 4   1 12   A
>> 5   1  0   B
>> 6   1  0   B
>> 7   1  0   B
>> 8   1  0   B
>> 9   1  0   C
>> 10  1  0   C
>> 11  1  0   C
>> 12  1  0   C
>> 13  1  0   D
>> 14  1  0   D
>> 15  1  0   D
>> 16  1  0   D
>> 17  1  0   E
>> 18  1  0   E
>> 19  1  0   E
>> 20  1  0   E
>> 21  1  0   F
>> 22  1  0   F
>> 23  1  0   F
>> 24  1  0   F
>> 25  2  6  AB
>> 26  2  6  AB
>> 27  2  6  AB
>> 28  2  6  AB
>> 29  2  6  AC
>> 30  2  6  AC
>> 31  2  6  AC
>> 32  2  6  AC
>> 33  2  6  AD
>> 34  2  6  AD
>> 35  2  6  AD
>> 36  2  6  AD
>> 37  2  6  AE
>> 38  2  6  AE
>> 39  2  6  AE
>> 40  2  6  AE
>> 41  2  6  AF
>> 42  2  6  AF
>> 43  2  6  AF
>> 44  2  6  AF
>> 45  2  0  BC
>> 46  2  0  BC
>> 47  2  0  BC
>> 48  2  0  BC
>> 49  2  0  BD
>> 50  2  0  BD
>> 51  2  0  BD
>> 52  2  0  BD
>> 53  2  0  BE
>> 54  2  0  BE
>> 55  2  0  BE
>> 56  2  0  BE
>> 57  2  0  BF
>> 58  2  0  BF
>> 59  2  0  BF
>> 60  2  0  BF
>> 61  2  0  CD
>> 62  2  0  CD
>> 63  2  0  CD
>> 64  2  0  CD
>> 65  2  0  CE
>> 66  2  0  CE
>> 67  2  0  CE
>> 68  2  0  CE
>> 69  2  0  CF
>> 70  2  0  CF
>> 71  2  0  CF
>> 72  2  0  CF
>> 73  2  0  DE
>> 74  2  0  DE
>> 75  2  0  DE
>> 76  2  0  DE
>> 77  2  0  DF
>> 78  2  0  DF
>> 79  2  0  DF
>> 80  2  0  DF
>> 81  2  0  EF
>> 82  2  0  EF
>> 83  2  0  EF
>> 84  2  0  EF
>> 85  3  4 ABC
>> 86  3  4 ABC
>> 87  3  4 ABC
>> 88  3  4 ABC
>> 89  3  4 ABD
>> 90  3  4 ABD
>> 91  3  4 ABD
>> 92  3  4 ABD
>> 93  3  4 ABE
>> 94  3  4 ABE
>> 95  3  4 ABE
>> 96  3  4 ABE
>> 97  3  4 ABF
>> 98  3  4 ABF
>> 99  3  4 ABF
>> 100 3  4 ABF
>> 101 3  4 ACD
>> 102 3  4 ACD
>> 103 3  4 ACD
>> 104 3  4 ACD
>> 105 3  4 ACE
>> 106 3  4 ACE
>> 107 3  4 ACE
>> 108 3  4 ACE
>> 

[R] Missing rows anova

2016-07-19 Thread Justin Thong
Why does the S:x1 column disappear (presumably S:x1 goes into ID, but I
don't know why)? S is a factor, x1 is a covariate and ID is a factor.

rich.side<-aov(y~S*x1+ID)
summary(rich.side)

Below is the model frame

model.frame(~S*x1+ID)

S x1  ID
1   1 12   A
2   1 12   A
3   1 12   A
4   1 12   A
5   1  0   B
6   1  0   B
7   1  0   B
8   1  0   B
9   1  0   C
10  1  0   C
11  1  0   C
12  1  0   C
13  1  0   D
14  1  0   D
15  1  0   D
16  1  0   D
17  1  0   E
18  1  0   E
19  1  0   E
20  1  0   E
21  1  0   F
22  1  0   F
23  1  0   F
24  1  0   F
25  2  6  AB
26  2  6  AB
27  2  6  AB
28  2  6  AB
29  2  6  AC
30  2  6  AC
31  2  6  AC
32  2  6  AC
33  2  6  AD
34  2  6  AD
35  2  6  AD
36  2  6  AD
37  2  6  AE
38  2  6  AE
39  2  6  AE
40  2  6  AE
41  2  6  AF
42  2  6  AF
43  2  6  AF
44  2  6  AF
45  2  0  BC
46  2  0  BC
47  2  0  BC
48  2  0  BC
49  2  0  BD
50  2  0  BD
51  2  0  BD
52  2  0  BD
53  2  0  BE
54  2  0  BE
55  2  0  BE
56  2  0  BE
57  2  0  BF
58  2  0  BF
59  2  0  BF
60  2  0  BF
61  2  0  CD
62  2  0  CD
63  2  0  CD
64  2  0  CD
65  2  0  CE
66  2  0  CE
67  2  0  CE
68  2  0  CE
69  2  0  CF
70  2  0  CF
71  2  0  CF
72  2  0  CF
73  2  0  DE
74  2  0  DE
75  2  0  DE
76  2  0  DE
77  2  0  DF
78  2  0  DF
79  2  0  DF
80  2  0  DF
81  2  0  EF
82  2  0  EF
83  2  0  EF
84  2  0  EF
85  3  4 ABC
86  3  4 ABC
87  3  4 ABC
88  3  4 ABC
89  3  4 ABD
90  3  4 ABD
91  3  4 ABD
92  3  4 ABD
93  3  4 ABE
94  3  4 ABE
95  3  4 ABE
96  3  4 ABE
97  3  4 ABF
98  3  4 ABF
99  3  4 ABF
100 3  4 ABF
101 3  4 ACD
102 3  4 ACD
103 3  4 ACD
104 3  4 ACD
105 3  4 ACE
106 3  4 ACE
107 3  4 ACE
108 3  4 ACE
109 3  4 ACF
110 3  4 ACF
111 3  4 ACF
112 3  4 ACF
113 3  4 ADE
114 3  4 ADE
115 3  4 ADE
116 3  4 ADE
117 3  4 ADF
118 3  4 ADF
119 3  4 ADF
120 3  4 ADF
121 3  4 AEF
122 3  4 AEF
123 3  4 AEF
124 3  4 AEF
125 3  0 BCD
126 3  0 BCD
127 3  0 BCD
128 3  0 BCD
129 3  0 BCE
130 3  0 BCE
131 3  0 BCE
132 3  0 BCE
133 3  0 BCF
134 3  0 BCF
135 3  0 BCF
136 3  0 BCF
137 3  0 BDE
138 3  0 BDE
139 3  0 BDE
140 3  0 BDE
141 3  0 BDF
142 3  0 BDF
143 3  0 BDF
144 3  0 BDF
145 3  0 BEF
146 3  0 BEF
147 3  0 BEF
148 3  0 BEF
149 3  0 CDE
150 3  0 CDE
151 3  0 CDE
152 3  0 CDE
153 3  0 CDF
154 3  0 CDF
155 3  0 CDF
156 3  0 CDF
157 3  0 CEF
158 3  0 CEF
159 3  0 CEF
160 3  0 CEF
161 3  0 DEF
162 3  0 DEF
163 3  0 DEF
164 3  0 DEF

-- 
Yours sincerely,
Justin



[R] Reference for aov()

2016-07-13 Thread Justin Thong
Hi

*I have been looking for a reference that explains how R's aov command
works (at a deeper level)*. More specifically, how R reads the formula and
how it computes the sums of squares. I am not interested in the differences
between Type 1, 2 and 3 sums of squares. I am more interested in finding
out how R computes ~x1:x2:x3 or ~A:x1, with emphasis on the sequential
nature of the computation, and how it handles models even more complicated
than these.

Yours sincerely,
Justin
