Re: [R] help with statistics in R - how to measure the effect of users in groups

Bert Gunter Mon, 10 Oct 2011 10:02:02 -0700

OK. So my original advice and warnings are correct.

However, there is now an additional wrinkle: your response is a count,
which is not a continuous measurement. For this, you'll need glm(...,
family = "poisson") instead of lm(...), where the ... is the formula and
data arguments I gave you before. A backup approach, if there aren't too
many small counts (below about 10, say), is to take the square root of the
counts and analyze that via lm().
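
A minimal sketch of the two approaches, for illustration only. The data
frame name yourdat and the formula Value ~ Group*User come from earlier in
this thread; the simulated counts, the number of replicates, and the object
names fit_glm and fit_sqrt are assumptions made up for the example.

```r
# Hypothetical data: 3 users x 3 forums, 4 replicate counts per combination
set.seed(1)
yourdat <- expand.grid(User  = factor(paste0("u", 1:3)),
                       Group = factor(1:3),
                       rep   = 1:4)

# Simulated post counts whose mean depends on the forum only (illustrative)
forum_means <- c(20, 12, 15)
yourdat$Value <- rpois(nrow(yourdat),
                       lambda = forum_means[as.integer(yourdat$Group)])

# Poisson GLM with the same Group*User formula as the earlier lm() call
fit_glm <- glm(Value ~ Group * User, family = "poisson", data = yourdat)

# Backup approach: square-root transform, then ordinary least squares
fit_sqrt <- lm(sqrt(Value) ~ Group * User, data = yourdat)

anova(fit_glm, test = "Chisq")  # sequential analysis-of-deviance table
anova(fit_sqrt)                 # classical ANOVA table
```

Both fits need replicates of each User x Group combination to leave any
residual degrees of freedom, which is why the example simulates four per
cell.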


In either approach, your interpretation becomes more difficult. Do you
have any experience with GLMs (generalized linear models)? Moreover, if
there are large numbers of users, say more than a few dozen (and you may
have hundreds or thousands), the interaction will of course be significant,
but so what? For this you'll need to re-frame the question.

So given all this and what appears to be your relative ignorance of
statistics, I strongly recommend that you get local statistical help. Or
just forget about formal statistical analysis altogether and do some
sensible plotting.
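
One possible starting point for such a plot, as a sketch only: it uses the
small example table posted earlier in this thread, with base R's
interaction.plot(). The object names posts and cells are made up for the
example.

```r
# Example data from earlier in the thread, in long format
# (NA means the user has no count for that forum)
posts <- data.frame(
  User  = factor(rep(c("u1", "u2", "u3"), times = 3)),
  Group = factor(rep(c("Group1", "Group2", "Group3"), each = 3)),
  Value = c(10, 6, 5,   5, NA, 2,   NA, 4, 3)
)

# User-by-forum table of counts, useful for spotting empty cells
cells <- with(posts, tapply(Value, list(User, Group), sum))
print(cells)

# One trace per user across forums; roughly parallel traces would suggest
# little user x forum interaction on this scale
with(posts, interaction.plot(Group, User, Value, type = "b",
                             xlab = "Forum", ylab = "Posts",
                             trace.label = "User"))
```

With real data and many users, the same table shows at a glance how many
user x forum combinations are missing.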

Finally, that's it for me on this. I will offer you no more advice.

-- Bert

On Mon, Oct 10, 2011 at 9:40 AM, gj <gaw...@gmail.com> wrote:

> Hi Bert,
>
> The real situation is like what you suggested, user x group interactions.
> The users can be in more than one group.
> In fact, the data that I am trying to analyse consist of users, with online
> forums as groups, and the attribute being measured is the number of posts made
> by each user in a particular forum.
>
> My hypothesis is that the number of posts a user makes to a forum is
> dependent on the forum. For example if the user is in a forum that is active
> he contributes more compared to when he is in a forum that is less active. I
> guess there will be some users who contribute the same irrespective of the
> forum.
>
> I hope this makes sense.
>
> Regards
> Gawesh
>
> On Mon, Oct 10, 2011 at 4:50 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
>
>> Yes, of course. But then one gets into additional problems with carryover
>> effects, etc.
>> Also, one then has a repeated measures problem (User is the experimental
>> unit) and my previous advice is nonsense.
>>
>> Like you, I have no idea what his real situation is.
>>
>> -- Bert
>>
>>
>> On Mon, Oct 10, 2011 at 8:39 AM, Anupam <anupa...@gmail.com> wrote:
>>
>>> It is possible to give multiple treatments, one at a time, to the same pool
>>> of patients. You are correct that interactions may be important in this
>>> problem. I am only trying to help him frame the problem using an analogy.
>>>
>>> Anupam.
>>>
>>> From: Bert Gunter [mailto:gunter.ber...@gene.com]
>>> Sent: Monday, October 10, 2011 8:21 PM
>>> To: Anupam
>>> Cc: gj
>>> Subject: Re: [R] help with statistics in R - how to measure the effect
>>> of users in groups
>>>
>>> If that is the case, and each user can appear in only one group, there is
>>> no group x user interaction, the poster's question was nonsense, and one
>>> analyzes the group effect only, as originally shown.
>>>
>>> -- Bert
>>>
>>> On Mon, Oct 10, 2011 at 7:43 AM, Anupam <anupa...@gmail.com> wrote:
>>>
>>> Groups are different treatments given to Users for your Outcome
>>> (measurement) of interest. Take this idea forward and you will have an
>>> answer.
>>>
>>> Anupam.
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>> On Behalf Of Bert Gunter
>>> Sent: Monday, October 10, 2011 7:36 PM
>>> To: gj
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] help with statistics in R - how to measure the effect of
>>> users in groups
>>>
>>> Assuming your data are in a data frame, yourdat, as:
>>>
>>> User   Group   Value
>>> u1     1       10
>>> u2     2       5
>>> u3     3       NA
>>> ...(etc)
>>>
>>> where Group is **explicitly coerced to be a factor,** then you want the
>>> User x Group interaction, obtained from
>>>
>>> lm(Value ~ Group*User, data = yourdat)
>>>
>>> However, there are two caveats:
>>>
>>> a) You'll get some kind of warning message if not all Group x User
>>> combinations are present in the data.
>>>
>>> b) Moreover, no statistics can be calculated if there are no replicates
>>> of User x Group combinations.
>>>
>>> If you do not know why either of these is the case, get local help or
>>> study any linear models (regression) text or online tutorial, as these
>>> last issues have nothing to do with R.
>>>
>>> -- Bert
>>>
>>>
>>> On Mon, Oct 10, 2011 at 3:48 AM, gj <gaw...@gmail.com> wrote:
>>>
>>> > Thanks Petr. I will try it on the real data.
>>> >
>>> > But that will only show whether the groups are different.
>>> > Is there any way I can test whether the users behave differently when they
>>> > are in different groups?
>>> >
>>> > Regards
>>> > Gawesh
>>> >
>>> > On Mon, Oct 10, 2011 at 11:17 AM, Petr PIKAL <petr.pi...@precheza.cz>
>>> > wrote:
>>> >
>>> > > >
>>> > > > Hi Petr,
>>> > > >
>>> > > > It's not an equation. It's my mistake; the * are meant to be field
>>> > > > separators for the example data. I should have just used blank
>>> > > > spaces, as follows:
>>> > > >
>>> > > > users   Group1   Group2   Group3
>>> > > > u1      10       5        N/A
>>> > > > u2      6        N/A      4
>>> > > > u3      5        2        3
>>> > > >
>>> > > >
>>> > > > Regards
>>> > > > Gawesh
>>> > >
>>> > > OK. You should transform your data to long format to use lm
>>> > >
>>> > > library(reshape2)  # provides melt(); the older reshape package also has it
>>> > > test <- read.table("clipboard", header=TRUE, na.strings="N/A")
>>> > > test.m <- melt(test)
>>> > > # melt() prints: Using users as id variables
>>> > > fit <- lm(value~variable, data=test.m)
>>> > > summary(fit)
>>> > >
>>> > > Call:
>>> > > lm(formula = value ~ variable, data = test.m)
>>> > >
>>> > > Residuals:
>>> > >   1    2    3    4    6    8    9
>>> > >  3.0 -1.0 -2.0  1.5 -1.5  0.5 -0.5
>>> > >
>>> > > Coefficients:
>>> > >               Estimate Std. Error t value Pr(>|t|)
>>> > > (Intercept)       7.000      1.258   5.563 0.00511 **
>>> > > variableGroup2   -3.500      1.990  -1.759 0.15336
>>> > > variableGroup3   -3.500      1.990  -1.759 0.15336
>>> > > ---
>>> > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>> > >
>>> > > Residual standard error: 2.179 on 4 degrees of freedom
>>> > >  (2 observations deleted due to missingness)
>>> > > Multiple R-squared: 0.525,      Adjusted R-squared: 0.2875
>>> > > F-statistic: 2.211 on 2 and 4 DF,  p-value: 0.2256
>>> > >
>>> > > No significant difference among groups, but I am not sure if this is the
>>> > > correct way to evaluate it.
>>> > >
>>> > > library(ggplot2)
>>> > > p<-ggplot(test.m, aes(x=variable, y=value, colour=users))
>>> > > p+geom_point()
>>> > >
>>> > > There is some sign that u3 has the lowest value in each group.
>>> > > However, there is not enough data to include users in the fit.
>>> > >
>>> > > Regards
>>> > > Petr
>>> > >
>>> > >
>>> > > >
>>> > > >
>>> > > > On Mon, Oct 10, 2011 at 9:32 AM, Petr PIKAL <petr.pi...@precheza.cz> wrote:
>>> > > >
>>> > > > > Hi
>>> > > > >
>>> > > > > I do not understand much about your equations. I think you should
>>> > > > > look at Practical Regression and Anova Using R by J. Faraway.
>>> > > > >
>>> > > > > Having a data frame DF with columns users, groups, and results, you
>>> > > > > could do
>>> > > > >
>>> > > > > fit <- lm(results~groups, data = DF)
>>> > > > >
>>> > > > > Regards
>>> > > > > Petr
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > >
>>> > > > > > Hi,
>>> > > > > >
>>> > > > > > I'm a newbie to R. My knowledge of statistics is mostly self-taught.
>>> > > > > > My problem is how to measure the effect of users in groups. I can
>>> > > > > > calculate a particular attribute for a user in a group. But my
>>> > > > > > hypothesis is that the users' attributes are not independent of each
>>> > > > > > other and that a user's attribute depends on the group, i.e. that a
>>> > > > > > user's behaviour changes based on the group.
>>> > > > > >
>>> > > > > > Let me give an example:
>>> > > > > >
>>> > > > > > users*Group 1*Group 2*Group 3
>>> > > > > > u1*10*5*n/a
>>> > > > > > u2*6*n/a*4
>>> > > > > > u3*5*2*3
>>> > > > > >
>>> > > > > > For example, I want to be able to prove that u1's behaviour is
>>> > > > > > different in Group 1 than in other groups, and that the particular
>>> > > > > > thing about Group 1 is that users in Group 1 tend to have a higher
>>> > > > > > value of the attribute under measurement.
>>> > > > > >
>>> > > > > >
>>> > > > > > Hence, can I use R to test my hypothesis? I'm willing to learn, so
>>> > > > > > if this is very simple, just point me in the direction of any online
>>> > > > > > resources about it. At the moment, I don't even know how to define
>>> > > > > > this class of problems. That will be a start.
>>> > > > > >
>>> > > > > > Regards
>>> > > > > > Gawesh
>>> > > > > >
>>> > > > > >    [[alternative HTML version deleted]]
>>> > > > > >
>>> > > > > > ______________________________________________
>>> > > > > > R-help@r-project.org mailing list
>>> > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > > > > > and provide commented, minimal, self-contained, reproducible code.
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> > >
>>> >
>>> >
>>> >
>>>
>>>
>>
>>
>>
>> --
>> "Men by nature long to get on to the ultimate truths, and will often be
>> impatient with elementary studies or fight shy of them. If it were possible
>> to reach the ultimate truths without the elementary studies usually prefixed
>> to them, these would not be preparatory studies but superfluous diversions."
>>
>> -- Maimonides (1135-1204)
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>> 467-7374
>>
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>>
>>
>>
>

