Hi

> OK. So my original advice and warnings are correct.
>
> However, now there is an additional wrinkle because your response is a
> count, which is not a continuous measurement. For this, you'll need
> glm(..., family = "poisson") instead of lm(...), where the ... is the
> stuff I gave you before. A backup approach, if there aren't too many
> small counts (below about 10, say), is to take the square root of the
> counts and analyze that via lm().
>
> In either approach, your interpretation becomes more difficult; e.g.,
> have you any experience with glm's (generalized linear models)? Moreover,
> if there are large numbers of users (e.g., dozens, and you may have
> hundreds or thousands), of course the interaction will be significant,
> but so what? For this you'll need to re-frame the question.
>
> So given all this and what appears to be your relative ignorance of
> statistics, I strongly recommend that you get local statistical help. Or
> just forget about formal statistical analysis altogether and do some
> sensible plotting.
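Bert's two options for count data can be sketched roughly as follows. This is a minimal illustration, not his code: it assumes the three-user toy table quoted further down the thread stands in for post counts, and because that table has at most one observation per user x group cell, it swaps his Group*User formula for an additive `users + variable` model (users as a blocking factor). The names `long`, `pois_fit`, `pois_null` and `sqrt_fit` are hypothetical.

```r
# Assumption: the toy table quoted later in the thread stands in for
# per-user post counts (one row per user x group cell, NA = not a member).
long <- data.frame(
  users    = factor(rep(c("u1", "u2", "u3"), times = 3)),
  variable = factor(rep(c("Group1", "Group2", "Group3"), each = 3)),
  value    = c(10, 6, 5,   5, NA, 2,   NA, 4, 3)
)
long <- na.omit(long)  # drop the two empty cells; 7 rows remain

# Counts, so a Poisson GLM instead of lm(). users enters as a blocking
# factor; no interaction term, since no user x group cell is replicated.
pois_fit  <- glm(value ~ users + variable, family = poisson, data = long)
pois_null <- glm(value ~ users, family = poisson, data = long)
summary(pois_fit)

# Likelihood-ratio (deviance) test for the group effect, given users.
anova(pois_null, pois_fit, test = "Chisq")

# Backup approach: square-root the counts and analyze them with lm().
sqrt_fit <- lm(sqrt(value) ~ users + variable, data = long)
summary(sqrt_fit)
```

Six of the seven toy counts are below 10, which is exactly the situation in which Bert's square-root backup is not advised; it is included only to show the call. With 7 observations and 5 coefficients, both fits keep just 2 residual degrees of freedom, so this shows the mechanics rather than anything conclusive.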
Sensible plotting was actually my advice too:

> library(ggplot2)
> p <- ggplot(test.m, aes(x = variable, y = value, colour = users))
> p + geom_point()

Regards
Petr

> Finally, that's it for me on this. I will offer you no more advice.
>
> -- Bert
>
> On Mon, Oct 10, 2011 at 9:40 AM, gj <gaw...@gmail.com> wrote:
>
> Hi Bert,
>
> The real situation is like what you suggested: user x group interactions.
> The users can be in more than one group.
> In fact, the data that I am trying to analyse consist of users, online
> forums as groups, and the attribute under measure is the number of posts
> made by each user in a particular forum.
>
> My hypothesis is that the number of posts a user makes to a forum depends
> on the forum. For example, if the user is in a forum that is active, he
> contributes more than when he is in a forum that is less active. I guess
> there will be some users who contribute the same irrespective of the
> forum.
>
> I hope this makes sense.
>
> Regards
> Gawesh
>
> On Mon, Oct 10, 2011 at 4:50 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
>
> Yes, of course. But then one gets into additional problems with carryover
> effects, etc.
> Also, one then has a repeated measures problem (User is the experimental
> unit) and my previous advice is nonsense.
>
> Like you, I have no idea what his real situation is.
>
> -- Bert
>
> On Mon, Oct 10, 2011 at 8:39 AM, Anupam <anupa...@gmail.com> wrote:
>
> It is possible to give multiple treatments, one at a time, to the same
> pool of patients. You are correct that interactions may be important in
> this problem. I am only trying to help him frame the problem using an
> analogy.
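A profile plot is one way to eyeball Gawesh's hypothesis that a user's post count depends on the forum. The sketch below is a base-R alternative to the ggplot2 plot mentioned in the thread, assuming the toy table stands in for the forum data; `long` and `cells` are hypothetical names, and `interaction.plot()` (from the stats package) draws one line per user across the groups.

```r
# Assumption: toy table from the thread in long format (NA = not a member).
long <- data.frame(
  users    = factor(rep(c("u1", "u2", "u3"), times = 3)),
  variable = factor(rep(c("Group1", "Group2", "Group3"), each = 3)),
  value    = c(10, 6, 5,   5, NA, 2,   NA, 4, 3)
)

# Cell table: rows = groups, columns = users; NA marks a missing cell.
cells <- with(long, tapply(value, list(variable, users), mean))
print(cells)

# One line per user across the groups; gaps appear where a cell is missing.
with(long, interaction.plot(x.factor = variable, trace.factor = users,
                            response = value, type = "b",
                            xlab = "group", ylab = "value"))
```

Lines that run roughly parallel would be consistent with no user x group interaction; with this little data it is only a visual check, not a test.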
>
> Anupam.
>
> From: Bert Gunter [mailto:gunter.ber...@gene.com]
> Sent: Monday, October 10, 2011 8:21 PM
> To: Anupam
> Cc: gj
> Subject: Re: [R] help with statistics in R - how to measure the effect of users in groups
>
> If that is the case, and each user can appear in only one group, there is
> no group x user interaction, the poster's question was nonsense, and one
> analyzes the group effect only, as originally shown.
>
> -- Bert
>
> On Mon, Oct 10, 2011 at 7:43 AM, Anupam <anupa...@gmail.com> wrote:
>
> Groups are different treatments given to Users for your Outcome
> (measurement) of interest. Take this idea forward and you will have an
> answer.
>
> Anupam.
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
> Sent: Monday, October 10, 2011 7:36 PM
> To: gj
> Cc: r-help@r-project.org
> Subject: Re: [R] help with statistics in R - how to measure the effect of users in groups
>
> Assuming your data are in a data frame, yourdat, as:
>
>   User Group Value
>   u1   1     10
>   u2   2     5
>   u3   3     NA
>   ...(etc)
>
> where Group is **explicitly coerced to be a factor,** then you want the
> User x Group interaction, obtained from
>
>   lm(Value ~ Group*User, data = yourdat)
>
> However, you'll get some kind of warning message if
>
> a) Not all Group x User combinations are present in the data
>
> b) Moreover, no statistics can be calculated if there are no replicates
> of User x Group combinations.
>
> If you do not know why either of these is the case, get local help or
> study any linear models (regression) text or online tutorial, as these
> last issues have nothing to do with R.
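Bert's two caveats can be checked before fitting anything. A minimal sketch, assuming a hypothetical `yourdat` built from the toy table in the thread rather than his placeholder rows: `table()` counts observations per User x Group cell (zeros are missing combinations, ones mean no replicates), and the interaction fit is then saturated.

```r
# Hypothetical yourdat built from the toy table in the thread (one row per
# observed User x Group cell); Group is explicitly coerced to a factor.
yourdat <- data.frame(
  User  = factor(c("u1", "u1", "u2", "u2", "u3", "u3", "u3")),
  Group = factor(c(1, 2, 1, 3, 1, 2, 3)),
  Value = c(10, 5, 6, 4, 5, 2, 3)
)

# Observations per User x Group cell:
#   0 = combination absent from the data (caveat a)
#   1 = no replicate for that combination (caveat b)
cell_counts <- with(yourdat, table(User, Group))
print(cell_counts)

fit <- lm(Value ~ Group * User, data = yourdat)
print(coef(fit))   # NA entries: coefficients not estimable (empty cells)
df.residual(fit)   # 0: saturated fit, nothing left to estimate the error
```

On these data lm() reports two interaction coefficients as NA and zero residual degrees of freedom, so no standard errors or tests can be computed.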
>
> -- Bert
>
> On Mon, Oct 10, 2011 at 3:48 AM, gj <gaw...@gmail.com> wrote:
>
> Thanks Petr. I will try it on the real data.
>
> But that will only show whether or not the groups are different.
> Is there any way I can test if the users are different when they are
> in different groups?
>
> Regards
> Gawesh
>
> On Mon, Oct 10, 2011 at 11:17 AM, Petr PIKAL <petr.pi...@precheza.cz> wrote:
>
> > Hi Petr,
> >
> > It's not an equation. It's my mistake; the * are meant to be field
> > separators for the example data. I should have just used blank spaces
> > as follows:
> >
> > users Group1 Group2 Group3
> > u1    10     5      N/A
> > u2    6      N/A    4
> > u3    5      2      3
> >
> > Regards
> > Gawesh
>
> OK. You should transform your data to long format to use lm:
>
> library(reshape2)  # provides melt(), which is not in base R
> test <- read.table("clipboard", header = TRUE, na.strings = "N/A")
> test.m <- melt(test)
> Using users as id variables
> fit <- lm(value ~ variable, data = test.m)
> summary(fit)
>
> Call:
> lm(formula = value ~ variable, data = test.m)
>
> Residuals:
>    1    2    3    4    6    8    9
>  3.0 -1.0 -2.0  1.5 -1.5  0.5 -0.5
>
> Coefficients:
>                Estimate Std. Error t value Pr(>|t|)
> (Intercept)       7.000      1.258   5.563  0.00511 **
> variableGroup2   -3.500      1.990  -1.759  0.15336
> variableGroup3   -3.500      1.990  -1.759  0.15336
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 2.179 on 4 degrees of freedom
>   (2 observations deleted due to missingness)
> Multiple R-squared: 0.525,  Adjusted R-squared: 0.2875
> F-statistic: 2.211 on 2 and 4 DF,  p-value: 0.2256
>
> No difference among groups, but I am not sure this is the correct way
> to evaluate it.
>
> library(ggplot2)
> p <- ggplot(test.m, aes(x = variable, y = value, colour = users))
> p + geom_point()
>
> There is some sign that user3 has the lowest value in each group.
> However, there is not enough data to include users in the fit.
>
> Regards
> Petr
>
> On Mon, Oct 10, 2011 at 9:32 AM, Petr PIKAL <petr.pi...@precheza.cz> wrote:
>
> Hi
>
> I do not understand much about your equations. I think you should look
> at Practical Regression and Anova Using R by J. Faraway.
>
> Having a data frame DF with columns users, groups and results, you
> could do
>
> fit <- lm(results ~ groups, data = DF)
>
> Regards
> Petr
>
> > Hi,
> >
> > I'm a newbie to R. My knowledge of statistics is mostly self-taught.
> > My problem is how to measure the effect of users in groups. I can
> > calculate a particular attribute for a user in a group.
> > But my hypothesis is that the users' attributes are not independent of
> > each other and that a user's attribute depends on the group, i.e. that
> > a user's behaviour changes based on the group.
> >
> > Let me give an example:
> >
> > users*Group 1*Group 2*Group 3
> > u1*10*5*n/a
> > u2*6*n/a*4
> > u3*5*2*3
> >
> > For example, I want to be able to prove that u1's behaviour is
> > different in Group 1 than in the other groups; the particular thing
> > about Group 1 is that users in Group 1 tend to have a higher value of
> > the attribute under measurement.
> >
> > Hence, can I use R to test my hypothesis? I'm willing to learn, so if
> > this is very simple, just point me in the direction of any online
> > resources about it. At the moment, I don't even know how to define this
> > class of problems. That would be a start.
> >
> > Regards
> > Gawesh
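Gawesh's example can be read exactly as posted, with `*` as the field separator, and reshaped to the long format Petr's lm() call needs. This sketch swaps Petr's melt() (an add-on package function, not base R) for base R's stack(), so nothing extra has to be installed; `txt`, `wide`, `groups` and `long` are hypothetical names.

```r
# The example exactly as posted, with "*" as the field separator.
txt <- "users*Group 1*Group 2*Group 3
u1*10*5*n/a
u2*6*n/a*4
u3*5*2*3"

# check.names (TRUE by default) turns "Group 1" into "Group.1".
wide <- read.table(text = txt, sep = "*", header = TRUE,
                   na.strings = "n/a")

# Base-R reshape to long format: stack() returns columns `values` and `ind`,
# column by column, so the user IDs are repeated once per group column.
groups <- setdiff(names(wide), "users")
long <- data.frame(users = rep(wide$users, times = length(groups)),
                   stack(wide[groups]))
names(long) <- c("users", "value", "variable")

# Petr's group-only model; lm() drops the two NA rows by default.
fit <- lm(value ~ variable, data = long)
summary(fit)
```

The coefficients match Petr's output (intercept 7, both group effects -3.5), but they are labelled variableGroup.2 and variableGroup.3 because read.table() renames "Group 1" to "Group.1".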
> >>> > > > > > >>> > > > > > >>> > > > > >>> > > > [[alternative HTML version deleted]] > >>> > > > > >>> > > > ______________________________________________ > >>> > > > R-help@r-project.org mailing list > >>> > > > https://stat.ethz.ch/mailman/listinfo/r-help > >>> > > > PLEASE do read the posting guide > >>> > > http://www.R-project.org/posting-guide.html > >>> > > > and provide commented, minimal, self-contained, reproducible code. > >>> > > > >>> > > > >>> > > >>> > [[alternative HTML version deleted]] > >>> > > >>> > > >>> > ______________________________________________ > >>> > R-help@r-project.org mailing list > >>> > https://stat.ethz.ch/mailman/listinfo/r-help > >>> > PLEASE do read the posting guide > >>> > http://www.R-project.org/posting-guide.html > >>> > and provide commented, minimal, self-contained, reproducible code. > >>> > > >>> > > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> **** > >>> > >>> ** ** > >>> > >> > >> > >> > >> -- > >> "Men by nature long to get on to the ultimate truths, and will often be > >> impatient with elementary studies or fight shy of them. If it were possible > >> to reach the ultimate truths without the elementary studies usually prefixed > >> to them, these would not be preparatory studies but superfluous diversions." > >> > >> -- Maimonides (1135-1204) > >> > >> Bert Gunter > >> Genentech Nonclinical Biostatistics > >> 467-7374 > >> > >> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb- > biostatistics/pdb-ncb-home.htm > >> > >> > >> > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.