Re: [R] data analysis for partial two-by-two factorial design

2018-03-05 Thread Ding, Yuan Chun
Thanks a lot. After reading this message, I think I see the advantage of Bert's 
coding. Those two drugs indeed do not interact with each other, so the 
additivity assumption is valid. 

I learned a lot today. Thanks again.

Ding

-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Monday, March 05, 2018 3:55 PM
To: Bert Gunter
Cc: Ding, Yuan Chun; r-help@r-project.org
Subject: Re: [R] data analysis for partial two-by-two factorial design


> On Mar 5, 2018, at 3:04 PM, Bert Gunter  wrote:
> 
> But of course the whole point of additivity is to decompose the combined 
> effect as the sum of individual effects.

Agreed. Furthermore your encoding of the treatment assignments has the 
advantage that the default treatment contrast for A+B will have a statistical 
estimate associated with it. That was a deficiency of my encoding that Ding 
found problematic. I did have the incorrect notion that the encoding of Drug B 
in the single drug situation would have been NA and that the `lm`-function 
would produce nothing useful. Your setup had not occurred to me.

Best;
David.

> 
> "Mislead" is a subjective judgment, so no comment. The explanation I provided 
> is standard. I used it for decades when I taught in industry.
> 
> Cheers,
> Bert
> 
> 
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> On Mon, Mar 5, 2018 at 3:00 PM, David Winsemius  
> wrote:
> 
> > On Mar 5, 2018, at 2:27 PM, Bert Gunter  wrote:
> >
> > David:
> >
> > I believe your response on SO is incorrect. This is a standard OFAT (one 
> > factor at a time) design, so that assuming additivity (no interactions), 
> > the effects of drugA and drugB can be determined via the model you rejected:
> 
> >> three groups, no drugA/no drugB, yes drugA/no drugB, yes drugA/yes drug B, 
> >> omitting the fourth group of no drugA/yes drugB.
> 
> >
> > For example, if baseline control (no drugs) has a response of 0, drugA has 
> > an effect of 1, drugB has an effect of 2, and the effects are additive, 
> > with no noise we would have:
> >
> > > d <- data.frame(drugA = c("n","y","y"),drugB = c("n","n","y"))
> 
> d2 <- data.frame(trt = c("Baseline","DrugA_only","DrugA_drugB"))
> >
> > > y <- c(0,1,3)
> >
> > And a straightforward linear model recovers the effects:
> >
> > > lm(y ~ drugA + drugB, data=d)
> >
> > Call:
> > lm(formula = y ~ drugA + drugB, data = d)
> >
> > Coefficients:
> > (Intercept)   drugAy   drugBy
> >   1.282e-161.000e+002.000e+00
> 
> I think the labeling above is rather misleading, since what is labeled drugB 
> is actually A&B. I think the method I suggest is more likely to be 
> interpreted correctly:
> 
> > d2 <- data.frame(trt = c("Baseline","DrugA_only","DrugA_drugB"))
> >  y <- c(0,1,3)
> > lm(y ~ trt, data=d2)
> 
> Call:
> lm(formula = y ~ trt, data = d2)
> 
> Coefficients:
>(Intercept)  trtDrugA_drugB   trtDrugA_only
>  2.564e-16   3.000e+00   1.000e+00
> 
> --
> David.
> >
> > As usual, OFAT designs are blind to interactions, so that if they really 
> > exist, the interpretation as additive effects is incorrect.
> >
> > Cheers,
> > Bert
> >
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along and 
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > On Mon, Mar 5, 2018 at 2:03 PM, David Winsemius  
> > wrote:
> >
> > > On Mar 5, 2018, at 8:52 AM, Ding, Yuan Chun  wrote:
> > >
> > > Hi Bert,
> > >
> > > I am very sorry to bother you again.
> > >
> > > For the following question, as you suggested, I posted it in both 
> > > Biostars website and stackexchange website, so far no reply.
> > >
> > > I really hope that you can do me a great favor to share your points about 
> > > how to explain the coefficients for drug A and drug B if run anova model 
> > > (response variable = drug A + drug B). is it different from running three 
> > > separate T tests?
> > >
> > > Thank you so much!!
> > >
> > > Ding
> > >
> > > I need to analyze data generated from a part

Re: [R] data analysis for partial two-by-two factorial design

2018-03-05 Thread David Winsemius

> On Mar 5, 2018, at 3:04 PM, Bert Gunter  wrote:
> 
> But of course the whole point of additivity is to decompose the combined 
> effect as the sum of individual effects.

Agreed. Furthermore your encoding of the treatment assignments has the 
advantage that the default treatment contrast for A+B will have a statistical 
estimate associated with it. That was a deficiency of my encoding that Ding 
found problematic. I did have the incorrect notion that the encoding of Drug B 
in the single drug situation would have been NA and that the `lm`-function 
would produce nothing useful. Your setup had not occurred to me.
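
A minimal sketch of that point, using the same toy noise-free numbers as in the
examples quoted below (illustration only, not part of the original analysis):

d  <- data.frame(drugA = c("n", "y", "y"), drugB = c("n", "n", "y"))
d2 <- data.frame(trt   = c("Baseline", "DrugA_only", "DrugA_drugB"))
y  <- c(0, 1, 3)

fit_ab  <- lm(y ~ drugA + drugB, data = d)   # additive (Bert's) coding
fit_trt <- lm(y ~ trt, data = d2)            # one factor with three levels

# With the additive coding, the drugB coefficient directly estimates the
# effect of adding B, i.e. (A & B) minus (A only), assuming no interaction:
coef(fit_ab)["drugBy"]

# With the treatment coding, the same quantity is a difference of two
# coefficients (each is a contrast against the Baseline level):
coef(fit_trt)["trtDrugA_drugB"] - coef(fit_trt)["trtDrugA_only"]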

Best;
David.

> 
> "Mislead" is a subjective judgment, so no comment. The explanation I provided 
> is standard. I used it for decades when I taught in industry.
> 
> Cheers,
> Bert
> 
> 
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> On Mon, Mar 5, 2018 at 3:00 PM, David Winsemius  
> wrote:
> 
> > On Mar 5, 2018, at 2:27 PM, Bert Gunter  wrote:
> >
> > David:
> >
> > I believe your response on SO is incorrect. This is a standard OFAT (one 
> > factor at a time) design, so that assuming additivity (no interactions), 
> > the effects of drugA and drugB can be determined via the model you rejected:
> 
> >> three groups, no drugA/no drugB, yes drugA/no drugB, yes drugA/yes drug B, 
> >> omitting the fourth group of no drugA/yes drugB.
> 
> >
> > For example, if baseline control (no drugs) has a response of 0, drugA has 
> > an effect of 1, drugB has an effect of 2, and the effects are additive, 
> > with no noise we would have:
> >
> > > d <- data.frame(drugA = c("n","y","y"),drugB = c("n","n","y"))
> 
> d2 <- data.frame(trt = c("Baseline","DrugA_only","DrugA_drugB"))
> >
> > > y <- c(0,1,3)
> >
> > And a straightforward linear model recovers the effects:
> >
> > > lm(y ~ drugA + drugB, data=d)
> >
> > Call:
> > lm(formula = y ~ drugA + drugB, data = d)
> >
> > Coefficients:
> > (Intercept)   drugAy   drugBy
> >   1.282e-161.000e+002.000e+00
> 
> I think the labeling above is rather misleading, since what is labeled drugB 
> is actually A&B. I think the method I suggest is more likely to be 
> interpreted correctly:
> 
> > d2 <- data.frame(trt = c("Baseline","DrugA_only","DrugA_drugB"))
> >  y <- c(0,1,3)
> > lm(y ~ trt, data=d2)
> 
> Call:
> lm(formula = y ~ trt, data = d2)
> 
> Coefficients:
>(Intercept)  trtDrugA_drugB   trtDrugA_only
>  2.564e-16   3.000e+00   1.000e+00
> 
> --
> David.
> >
> > As usual, OFAT designs are blind to interactions, so that if they really 
> > exist, the interpretation as additive effects is incorrect.
> >
> > Cheers,
> > Bert
> >
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along and 
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > On Mon, Mar 5, 2018 at 2:03 PM, David Winsemius  
> > wrote:
> >
> > > On Mar 5, 2018, at 8:52 AM, Ding, Yuan Chun  wrote:
> > >
> > > Hi Bert,
> > >
> > > I am very sorry to bother you again.
> > >
> > > For the following question, as you suggested, I posted it in both 
> > > Biostars website and stackexchange website, so far no reply.
> > >
> > > I really hope that you can do me a great favor to share your points about 
> > > how to explain the coefficients for drug A and drug B if run anova model 
> > > (response variable = drug A + drug B). is it different from running three 
> > > separate T tests?
> > >
> > > Thank you so much!!
> > >
> > > Ding
> > >
> > > I need to analyze data generated from a partial two-by-two factorial 
> > > design: two levels for drug A (yes, no), two levels for drug B (yes, no); 
> > >  however, data points are available only for three groups, no drugA/no 
> > > drugB, yes drugA/no drugB, yes drugA/yes drug B, omitting the fourth 
> > > group of no drugA/yes drugB.  I think we can not investigate interaction 
> > > between drug A and drug B, can I still run  model using R as usual:  
> > &g

Re: [R] data analysis for partial two-by-two factorial design

2018-03-05 Thread Bert Gunter
Yuan:

IMHO you need to stop making up your own statistical analyses and get local
expert help.

I have nothing further to say. Do what you will.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Mar 5, 2018 at 2:44 PM, Ding, Yuan Chun  wrote:

> Hi Bert and David,
>
>
>
> Thank you so much for willingness to spend some time on my problem!!!  I
> have some statistical knowledge (going to get a master in applied
> statisitics), but do not have a chance to purse a phD for statistics, so I
> am always be careful before starting to do analysis and hope to gather
> supportive information from real statisticians.
>
>
>
> Sorry that I did not tell more info about experiment design.
>
>
>
> I did not do this experiment, my collaborator did it and I only got chance
> to analyze the data.
>
>
>
> There are nine dishes of cells.  Three replicates for each treatment
> combination.  So randomly select three dishes for no drug A/no drug B
> treatment, a second three dishes for drug A only, then last three dishes to
> add both A and B drugs.  After drug treatments, they measure DNA
> methylation and genes or gene expression as outcome or response
> variables(two differnet types of response variables).
>
>
>
> My boss might want to find out net effect of drug B, but I think we can
> not exclude the confounding effect of drugA. For example, it is possible
> that drug B has no effect, only has effect when drug A is present.   I
> asked my collaborator whey she omitted the fourth combination drugA only
> treatment, she said it was expensive to measure methylation or gene
> expression, so they performed the experiments based on their hypothesis
> which is too complicated here, so not illustrated here in details.  I am
> still not happy that they could just add three more replicates to do a full
> 2X2 design.
>
>
>
> On the weekend, I also thought about doing a one-way anova, but then I
> have to do three pairwise comparisons to find out the pair to show
> difference if p value for one way anova is significant.
>
>
>
> Thanks,
>
>
> Ding
>
>
>
> *From:* Bert Gunter [mailto:bgunter.4...@gmail.com]
> *Sent:* Monday, March 05, 2018 2:27 PM
> *To:* David Winsemius
> *Cc:* Ding, Yuan Chun; r-help@r-project.org
>
> *Subject:* Re: [R] data analysis for partial two-by-two factorial design
>
>
>
> David:
>
> I believe your response on SO is incorrect. This is a standard OFAT (one
> factor at a time) design, so that assuming additivity (no interactions),
> the effects of drugA and drugB can be determined via the model you rejected:
>
> For example, if baseline control (no drugs) has a response of 0, drugA has
> an effect of 1, drugB has an effect of 2, and the effects are additive,
> with no noise we would have:
>
> > d <- data.frame(drugA = c("n","y","y"),drugB = c("n","n","y"))
> > y <- c(0,1,3)
>
> And a straighforward inear model recovers the effects:
>
>
> > lm(y ~ drugA + drugB, data=d)
>
> Call:
> lm(formula = y ~ drugA + drugB, data = d)
>
> Coefficients:
> (Intercept)   drugAy   drugBy
>   1.282e-161.000e+002.000e+00
>
> As usual, OFAT designs are blind to interactions, so that if they really
> exist, the interpretation as additive effects is incorrect.
>
>
>
> Cheers,
>
> Bert
>
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
>
> On Mon, Mar 5, 2018 at 2:03 PM, David Winsemius 
> wrote:
>
>
> > On Mar 5, 2018, at 8:52 AM, Ding, Yuan Chun  wrote:
> >
> > Hi Bert,
> >
> > I am very sorry to bother you again.
> >
> > For the following question, as you suggested, I posted it in both
> Biostars website and stackexchange website, so far no reply.
> >
> > I really hope that you can do me a great favor to share your points
> about how to explain the coefficients for drug A and drug B if run anova
> model (response variable = drug A + drug B). is it different from running
> three separate T tests?
> >
> > Thank you so much!!
> >
> > Ding
> >
> > I need to analyze data generated from a partial two-by-two factorial
> design: two levels for drug A (yes, no), two levels for drug B (yes, no);
> however, data points are available only for three groups, no drugA/no

Re: [R] data analysis for partial two-by-two factorial design

2018-03-05 Thread Bert Gunter
But of course the whole point of additivity is to decompose the combined
effect as the sum of individual effects.

"Mislead" is a subjective judgment, so no comment. The explanation I
provided is standard. I used it for decades when I taught in industry.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Mar 5, 2018 at 3:00 PM, David Winsemius 
wrote:

>
> > On Mar 5, 2018, at 2:27 PM, Bert Gunter  wrote:
> >
> > David:
> >
> > I believe your response on SO is incorrect. This is a standard OFAT (one
> factor at a time) design, so that assuming additivity (no interactions),
> the effects of drugA and drugB can be determined via the model you rejected:
>
> >> three groups, no drugA/no drugB, yes drugA/no drugB, yes drugA/yes drug
> B, omitting the fourth group of no drugA/yes drugB.
>
> >
> > For example, if baseline control (no drugs) has a response of 0, drugA
> has an effect of 1, drugB has an effect of 2, and the effects are additive,
> with no noise we would have:
> >
> > > d <- data.frame(drugA = c("n","y","y"),drugB = c("n","n","y"))
>
> d2 <- data.frame(trt = c("Baseline","DrugA_only","DrugA_drugB"))
> >
> > > y <- c(0,1,3)
> >
> > And a straightforward linear model recovers the effects:
> >
> > > lm(y ~ drugA + drugB, data=d)
> >
> > Call:
> > lm(formula = y ~ drugA + drugB, data = d)
> >
> > Coefficients:
> > (Intercept)   drugAy   drugBy
> >   1.282e-161.000e+002.000e+00
>
> I think the labeling above is rather misleading, since what is labeled
> drugB is actually A&B. I think the method I suggest is more likely to be
> interpreted correctly:
>
> > d2 <- data.frame(trt = c("Baseline","DrugA_only","DrugA_drugB"))
> >  y <- c(0,1,3)
> > lm(y ~ trt, data=d2)
>
> Call:
> lm(formula = y ~ trt, data = d2)
>
> Coefficients:
>(Intercept)  trtDrugA_drugB   trtDrugA_only
>  2.564e-16   3.000e+00   1.000e+00
>
> --
> David.
> >
> > As usual, OFAT designs are blind to interactions, so that if they really
> exist, the interpretation as additive effects is incorrect.
> >
> > Cheers,
> > Bert
> >
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > On Mon, Mar 5, 2018 at 2:03 PM, David Winsemius 
> wrote:
> >
> > > On Mar 5, 2018, at 8:52 AM, Ding, Yuan Chun  wrote:
> > >
> > > Hi Bert,
> > >
> > > I am very sorry to bother you again.
> > >
> > > For the following question, as you suggested, I posted it in both
> Biostars website and stackexchange website, so far no reply.
> > >
> > > I really hope that you can do me a great favor to share your points
> about how to explain the coefficients for drug A and drug B if run anova
> model (response variable = drug A + drug B). is it different from running
> three separate T tests?
> > >
> > > Thank you so much!!
> > >
> > > Ding
> > >
> > > I need to analyze data generated from a partial two-by-two factorial
> design: two levels for drug A (yes, no), two levels for drug B (yes, no);
> however, data points are available only for three groups, no drugA/no
> drugB, yes drugA/no drugB, yes drugA/yes drug B, omitting the fourth group
> of no drugA/yes drugB.  I think we can not investigate interaction between
> drug A and drug B, can I still run  model using R as usual:  response
> variable = drug A + drug B?  any suggestion is appreciated.
> >
> > Replied on CrossValidated where this would be on-topic.
> >
> > --
> > David,
> >
> > >
> > >
> > > From: Bert Gunter [mailto:bgunter.4...@gmail.com]
> > > Sent: Friday, March 02, 2018 12:32 PM
> > > To: Ding, Yuan Chun
> > > Cc: r-help@r-project.org
> > > Subject: Re: [R] data analysis for partial two-by-two factorial design
> > >
> > > 
> > > [Attention: This email came from an external source. Do not open
> attachments or click on links from unknown senders or unexpected emails.]
> > > 
> > >
> > >

Re: [R] data analysis for partial two-by-two factorial design

2018-03-05 Thread David Winsemius

> On Mar 5, 2018, at 2:27 PM, Bert Gunter  wrote:
> 
> David:
> 
> I believe your response on SO is incorrect. This is a standard OFAT (one 
> factor at a time) design, so that assuming additivity (no interactions), the 
> effects of drugA and drugB can be determined via the model you rejected:

>> three groups, no drugA/no drugB, yes drugA/no drugB, yes drugA/yes drug B, 
>> omitting the fourth group of no drugA/yes drugB.

> 
> For example, if baseline control (no drugs) has a response of 0, drugA has an 
> effect of 1, drugB has an effect of 2, and the effects are additive, with no 
> noise we would have:
> 
> > d <- data.frame(drugA = c("n","y","y"),drugB = c("n","n","y"))

d2 <- data.frame(trt = c("Baseline","DrugA_only","DrugA_drugB"))
> 
> > y <- c(0,1,3)
> 
> And a straightforward linear model recovers the effects:
> 
> > lm(y ~ drugA + drugB, data=d)
> 
> Call:
> lm(formula = y ~ drugA + drugB, data = d)
> 
> Coefficients:
> (Intercept)   drugAy   drugBy  
>   1.282e-161.000e+002.000e+00  

I think the labeling above is rather misleading, since what is labeled drugB is 
actually A&B. I think the method I suggest is more likely to be interpreted 
correctly:

> d2 <- data.frame(trt = c("Baseline","DrugA_only","DrugA_drugB"))
>  y <- c(0,1,3)
> lm(y ~ trt, data=d2)

Call:
lm(formula = y ~ trt, data = d2)

Coefficients:
   (Intercept)  trtDrugA_drugB   trtDrugA_only  
 2.564e-16   3.000e+00   1.000e+00  

-- 
David.
> 
> As usual, OFAT designs are blind to interactions, so that if they really 
> exist, the interpretation as additive effects is incorrect.
> 
> Cheers,
> Bert
> 
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> On Mon, Mar 5, 2018 at 2:03 PM, David Winsemius  
> wrote:
> 
> > On Mar 5, 2018, at 8:52 AM, Ding, Yuan Chun  wrote:
> >
> > Hi Bert,
> >
> > I am very sorry to bother you again.
> >
> > For the following question, as you suggested, I posted it in both Biostars 
> > website and stackexchange website, so far no reply.
> >
> > I really hope that you can do me a great favor to share your points about 
> > how to explain the coefficients for drug A and drug B if run anova model 
> > (response variable = drug A + drug B). is it different from running three 
> > separate T tests?
> >
> > Thank you so much!!
> >
> > Ding
> >
> > I need to analyze data generated from a partial two-by-two factorial 
> > design: two levels for drug A (yes, no), two levels for drug B (yes, no);  
> > however, data points are available only for three groups, no drugA/no 
> > drugB, yes drugA/no drugB, yes drugA/yes drug B, omitting the fourth group 
> > of no drugA/yes drugB.  I think we can not investigate interaction between 
> > drug A and drug B, can I still run  model using R as usual:  response 
> > variable = drug A + drug B?  any suggestion is appreciated.
> 
> Replied on CrossValidated where this would be on-topic.
> 
> --
> David,
> 
> >
> >
> > From: Bert Gunter [mailto:bgunter.4...@gmail.com]
> > Sent: Friday, March 02, 2018 12:32 PM
> > To: Ding, Yuan Chun
> > Cc: r-help@r-project.org
> > Subject: Re: [R] data analysis for partial two-by-two factorial design
> >
> > 
> > [Attention: This email came from an external source. Do not open 
> > attachments or click on links from unknown senders or unexpected emails.]
> > 
> >
> > This list provides help on R programming (see the posting guide linked 
> > below for details on what is/is not considered on topic), and generally 
> > avoids discussion of purely statistical issues, which is what your query 
> > appears to be. The simple answer is yes, you can fit the model as 
> > described,  but you clearly need the off topic discussion as to what it 
> > does or does not mean. For that, you might try the 
> > stats.stackexchange.com<http://stats.stackexchange.com> statistical site.
> >
> > Cheers,
> > Bert
> >
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along and 
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
&g

Re: [R] data analysis for partial two-by-two factorial design

2018-03-05 Thread Ding, Yuan Chun
I am sorry that I made a typo. I wrote: "I asked my collaborator whey she 
omitted the fourth combination drugA only treatment". I meant to say: "I asked 
my collaborator why she omitted the fourth combination drugB only treatment".

Ding 

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ding, Yuan Chun
Sent: Monday, March 05, 2018 2:45 PM
To: Bert Gunter; David Winsemius
Cc: r-help@r-project.org
Subject: Re: [R] data analysis for partial two-by-two factorial design

Hi Bert and David,

Thank you so much for your willingness to spend some time on my problem!!!  I 
have some statistical knowledge (I am going to get a master's in applied 
statistics), but have not had a chance to pursue a PhD in statistics, so I am 
always careful before starting an analysis and hope to gather supportive 
information from real statisticians.

Sorry that I did not tell more info about experiment design.

I did not do this experiment; my collaborator did it, and I only got the chance 
to analyze the data.

There are nine dishes of cells, with three replicates for each treatment 
combination: three dishes were randomly selected for the no drug A/no drug B 
treatment, a second three dishes for drug A only, and the last three dishes for 
both drug A and drug B.  After the drug treatments, they measure DNA methylation 
and gene expression as outcome (response) variables, i.e. two different types of 
response variables.

My boss might want to find out the net effect of drug B, but I think we cannot 
exclude the confounding effect of drug A. For example, it is possible that drug 
B has no effect on its own and only has an effect when drug A is present.  I 
asked my collaborator whey she omitted the fourth combination drugA only 
treatment; she said it was expensive to measure methylation or gene expression, 
so they performed the experiments based on their hypothesis, which is too 
complicated to illustrate here in detail.  I am still not happy that they did 
not just add three more replicates to do a full 2x2 design.

Over the weekend, I also thought about doing a one-way ANOVA, but then I have 
to do three pairwise comparisons to find out which pairs differ if the p-value 
for the one-way ANOVA is significant.

Thanks,

Ding

From: Bert Gunter [mailto:bgunter.4...@gmail.com]
Sent: Monday, March 05, 2018 2:27 PM
To: David Winsemius
Cc: Ding, Yuan Chun; r-help@r-project.org
Subject: Re: [R] data analysis for partial two-by-two factorial design

David:
I believe your response on SO is incorrect. This is a standard OFAT (one factor 
at a time) design, so that assuming additivity (no interactions), the effects 
of drugA and drugB can be determined via the model you rejected:
For example, if baseline control (no drugs) has a response of 0, drugA has an 
effect of 1, drugB has an effect of 2, and the effects are additive, with no 
noise we would have:

> d <- data.frame(drugA = c("n","y","y"),drugB = c("n","n","y"))
> y <- c(0,1,3)
And a straightforward linear model recovers the effects:

> lm(y ~ drugA + drugB, data=d)

Call:
lm(formula = y ~ drugA + drugB, data = d)

Coefficients:
(Intercept)   drugAy   drugBy
  1.282e-161.000e+002.000e+00
As usual, OFAT designs are blind to interactions, so that if they really exist, 
the interpretation as additive effects is incorrect.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Mar 5, 2018 at 2:03 PM, David Winsemius 
mailto:dwinsem...@comcast.net>> wrote:

> On Mar 5, 2018, at 8:52 AM, Ding, Yuan Chun 
> mailto:ycd...@coh.org>> wrote:
>
> Hi Bert,
>
> I am very sorry to bother you again.
>
> For the following question, as you suggested, I posted it in both Biostars 
> website and stackexchange website, so far no reply.
>
> I really hope that you can do me a great favor to share your points about how 
> to explain the coefficients for drug A and drug B if run anova model 
> (response variable = drug A + drug B). is it different from running three 
> separate T tests?
>
> Thank you so much!!
>
> Ding
>
> I need to analyze data generated from a partial two-by-two factorial design: 
> two levels for drug A (yes, no), two levels for drug B (yes, no);  however, 
> data points are available only for three groups, no drugA/no drugB, yes 
> drugA/no drugB, yes drugA/yes drug B, omitting the fourth group of no 
> drugA/yes drugB.  I think we can not investigate interaction between drug A 
> and drug B, can I still run  model using R as usual:  response variable = 
> drug A + drug B?  any suggestion is appreciated.

Replied on CrossValidated where this would be on-topic.

--
David,

>
>
> From: Bert Gunter 
> [mailto:bgunter.4.

Re: [R] data analysis for partial two-by-two factorial design

2018-03-05 Thread Ding, Yuan Chun
Hi Bert and David,

Thank you so much for your willingness to spend some time on my problem!!!  I 
have some statistical knowledge (I am going to get a master's in applied 
statistics), but have not had a chance to pursue a PhD in statistics, so I am 
always careful before starting an analysis and hope to gather supportive 
information from real statisticians.

Sorry that I did not tell more info about experiment design.

I did not do this experiment; my collaborator did it, and I only got the chance 
to analyze the data.

There are nine dishes of cells, with three replicates for each treatment 
combination: three dishes were randomly selected for the no drug A/no drug B 
treatment, a second three dishes for drug A only, and the last three dishes for 
both drug A and drug B.  After the drug treatments, they measure DNA methylation 
and gene expression as outcome (response) variables, i.e. two different types of 
response variables.

My boss might want to find out the net effect of drug B, but I think we cannot 
exclude the confounding effect of drug A. For example, it is possible that drug 
B has no effect on its own and only has an effect when drug A is present.  I 
asked my collaborator whey she omitted the fourth combination drugA only 
treatment; she said it was expensive to measure methylation or gene expression, 
so they performed the experiments based on their hypothesis, which is too 
complicated to illustrate here in detail.  I am still not happy that they did 
not just add three more replicates to do a full 2x2 design.
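
A minimal sketch of that concern, with invented numbers for nine dishes (three
per group, as described above): in this partial design the drugB indicator and
the drugA:drugB interaction column are identical, so the interaction is aliased
and cannot be estimated.

dishes <- data.frame(
  drugA = rep(c("n", "y", "y"), each = 3),
  drugB = rep(c("n", "n", "y"), each = 3),
  y     = c(5, 6, 5.5, 7, 7.5, 6.8, 9, 9.5, 8.8)  # made-up response values
)
fit_int <- lm(y ~ drugA * drugB, data = dishes)
coef(fit_int)    # the drugAy:drugBy coefficient comes back NA (aliased)
alias(fit_int)   # shows the aliasing explicitly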

Over the weekend, I also thought about doing a one-way ANOVA, but then I have 
to do three pairwise comparisons to find out which pairs differ if the p-value 
for the one-way ANOVA is significant.
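
A minimal sketch of that one-way approach, using the same invented nine-dish
numbers; TukeyHSD() handles the three pairwise comparisons with a multiplicity
adjustment.

dishes <- data.frame(
  trt = factor(rep(c("Baseline", "DrugA_only", "DrugA_drugB"), each = 3),
               levels = c("Baseline", "DrugA_only", "DrugA_drugB")),
  y   = c(5, 6, 5.5, 7, 7.5, 6.8, 9, 9.5, 8.8)  # made-up response values
)
fit_aov <- aov(y ~ trt, data = dishes)
summary(fit_aov)   # overall one-way ANOVA F test
TukeyHSD(fit_aov)  # all three pairwise comparisons, multiplicity-adjusted

Under additivity, the DrugA_drugB minus DrugA_only comparison here estimates the
same quantity as the drugB coefficient in the additive two-factor model.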

Thanks,

Ding

From: Bert Gunter [mailto:bgunter.4...@gmail.com]
Sent: Monday, March 05, 2018 2:27 PM
To: David Winsemius
Cc: Ding, Yuan Chun; r-help@r-project.org
Subject: Re: [R] data analysis for partial two-by-two factorial design

David:
I believe your response on SO is incorrect. This is a standard OFAT (one factor 
at a time) design, so that assuming additivity (no interactions), the effects 
of drugA and drugB can be determined via the model you rejected:
For example, if baseline control (no drugs) has a response of 0, drugA has an 
effect of 1, drugB has an effect of 2, and the effects are additive, with no 
noise we would have:

> d <- data.frame(drugA = c("n","y","y"),drugB = c("n","n","y"))
> y <- c(0,1,3)
And a straightforward linear model recovers the effects:

> lm(y ~ drugA + drugB, data=d)

Call:
lm(formula = y ~ drugA + drugB, data = d)

Coefficients:
(Intercept)   drugAy   drugBy
  1.282e-161.000e+002.000e+00
As usual, OFAT designs are blind to interactions, so that if they really exist, 
the interpretation as additive effects is incorrect.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Mar 5, 2018 at 2:03 PM, David Winsemius 
mailto:dwinsem...@comcast.net>> wrote:

> On Mar 5, 2018, at 8:52 AM, Ding, Yuan Chun 
> mailto:ycd...@coh.org>> wrote:
>
> Hi Bert,
>
> I am very sorry to bother you again.
>
> For the following question, as you suggested, I posted it in both Biostars 
> website and stackexchange website, so far no reply.
>
> I really hope that you can do me a great favor to share your points about how 
> to explain the coefficients for drug A and drug B if run anova model 
> (response variable = drug A + drug B). is it different from running three 
> separate T tests?
>
> Thank you so much!!
>
> Ding
>
> I need to analyze data generated from a partial two-by-two factorial design: 
> two levels for drug A (yes, no), two levels for drug B (yes, no);  however, 
> data points are available only for three groups, no drugA/no drugB, yes 
> drugA/no drugB, yes drugA/yes drug B, omitting the fourth group of no 
> drugA/yes drugB.  I think we can not investigate interaction between drug A 
> and drug B, can I still run  model using R as usual:  response variable = 
> drug A + drug B?  any suggestion is appreciated.

Replied on CrossValidated where this would be on-topic.

--
David,

>
>
> From: Bert Gunter 
> [mailto:bgunter.4...@gmail.com<mailto:bgunter.4...@gmail.com>]
> Sent: Friday, March 02, 2018 12:32 PM
> To: Ding, Yuan Chun
> Cc: r-help@r-project.org<mailto:r-help@r-project.org>
> Subject: Re: [R] data analysis for partial two-by-two factorial design
>
> 
> [Attention: This email came from an external source. Do not open attachments 
> or click on links from unknown senders or unexpected emails.]
> 
>
> This list provides hel

Re: [R] data analysis for partial two-by-two factorial design

2018-03-05 Thread Bert Gunter
David:

I believe your response on SO is incorrect. This is a standard OFAT (one
factor at a time) design, so that assuming additivity (no interactions),
the effects of drugA and drugB can be determined via the model you rejected:

For example, if baseline control (no drugs) has a response of 0, drugA has
an effect of 1, drugB has an effect of 2, and the effects are additive,
with no noise we would have:

> d <- data.frame(drugA = c("n","y","y"),drugB = c("n","n","y"))
> y <- c(0,1,3)

And a straightforward linear model recovers the effects:

> lm(y ~ drugA + drugB, data=d)

Call:
lm(formula = y ~ drugA + drugB, data = d)

Coefficients:
(Intercept)   drugAy   drugBy
  1.282e-161.000e+002.000e+00

As usual, OFAT designs are blind to interactions, so that if they really
exist, the interpretation as additive effects is incorrect.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Mar 5, 2018 at 2:03 PM, David Winsemius 
wrote:

>
> > On Mar 5, 2018, at 8:52 AM, Ding, Yuan Chun  wrote:
> >
> > Hi Bert,
> >
> > I am very sorry to bother you again.
> >
> > For the following question, as you suggested, I posted it in both
> Biostars website and stackexchange website, so far no reply.
> >
> > I really hope that you can do me a great favor to share your points
> about how to explain the coefficients for drug A and drug B if run anova
> model (response variable = drug A + drug B). is it different from running
> three separate T tests?
> >
> > Thank you so much!!
> >
> > Ding
> >
> > I need to analyze data generated from a partial two-by-two factorial
> design: two levels for drug A (yes, no), two levels for drug B (yes, no);
> however, data points are available only for three groups, no drugA/no
> drugB, yes drugA/no drugB, yes drugA/yes drug B, omitting the fourth group
> of no drugA/yes drugB.  I think we can not investigate interaction between
> drug A and drug B, can I still run  model using R as usual:  response
> variable = drug A + drug B?  any suggestion is appreciated.
>
> Replied on CrossValidated where this would be on-topic.
>
> --
> David,
>
> >
> >
> > From: Bert Gunter [mailto:bgunter.4...@gmail.com]
> > Sent: Friday, March 02, 2018 12:32 PM
> > To: Ding, Yuan Chun
> > Cc: r-help@r-project.org
> > Subject: Re: [R] data analysis for partial two-by-two factorial design
> >
> > 
> > [Attention: This email came from an external source. Do not open
> attachments or click on links from unknown senders or unexpected emails.]
> > 
> >
> > This list provides help on R programming (see the posting guide linked
> below for details on what is/is not considered on topic), and generally
> avoids discussion of purely statistical issues, which is what your query
> appears to be. The simple answer is yes, you can fit the model as
> described,  but you clearly need the off topic discussion as to what it
> does or does not mean. For that, you might try the stats.stackexchange.com
> <http://stats.stackexchange.com> statistical site.
> >
> > Cheers,
> > Bert
> >
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > On Fri, Mar 2, 2018 at 10:34 AM, Ding, Yuan Chun  ycd...@coh.org>> wrote:
> > Dear R users,
> >
> > I need to analyze data generated from a partial two-by-two factorial
> design: two levels for drug A (yes, no), two levels for drug B (yes, no);
> however, data points are available only for three groups, no drugA/no
> drugB, yes drugA/no drugB, yes drugA/yes drug B, omitting the fourth group
> of no drugA/yes drugB.  I think we can not investigate interaction between
> drug A and drug B, can I still run  model using R as usual:  response
> variable = drug A + drug B?  any suggestion is appreciated.
> >
> > Thank you very much!
> >
> > Yuan Chun Ding
> >
> >
> > -
> > -SECURITY/CONFIDENTIALITY WARNING-
> > This message (and any attachments) are intended solely f...{{dropped:28}}
> >
> > __
> > R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To
> UNSUBSCRIBE and

Re: [R] data analysis for partial two-by-two factorial design

2018-03-05 Thread David Winsemius

> On Mar 5, 2018, at 8:52 AM, Ding, Yuan Chun  wrote:
> 
> Hi Bert,
> 
> I am very sorry to bother you again.
> 
> For the following question, as you suggested, I posted it in both Biostars 
> website and stackexchange website, so far no reply.
> 
> I really hope that you can do me a great favor to share your points about how 
> to explain the coefficients for drug A and drug B if run anova model 
> (response variable = drug A + drug B). is it different from running three 
> separate T tests?
> 
> Thank you so much!!
> 
> Ding
> 
> I need to analyze data generated from a partial two-by-two factorial design: 
> two levels for drug A (yes, no), two levels for drug B (yes, no);  however, 
> data points are available only for three groups, no drugA/no drugB, yes 
> drugA/no drugB, yes drugA/yes drug B, omitting the fourth group of no 
> drugA/yes drugB.  I think we can not investigate interaction between drug A 
> and drug B, can I still run  model using R as usual:  response variable = 
> drug A + drug B?  any suggestion is appreciated.

Replied on CrossValidated where this would be on-topic.

-- 
David,

> 
> 
> From: Bert Gunter [mailto:bgunter.4...@gmail.com]
> Sent: Friday, March 02, 2018 12:32 PM
> To: Ding, Yuan Chun
> Cc: r-help@r-project.org
> Subject: Re: [R] data analysis for partial two-by-two factorial design
> 
> 
> [Attention: This email came from an external source. Do not open attachments 
> or click on links from unknown senders or unexpected emails.]
> 
> 
> This list provides help on R programming (see the posting guide linked below 
> for details on what is/is not considered on topic), and generally avoids 
> discussion of purely statistical issues, which is what your query appears to 
> be. The simple answer is yes, you can fit the model as described,  but you 
> clearly need the off topic discussion as to what it does or does not mean. 
> For that, you might try the 
> stats.stackexchange.com<http://stats.stackexchange.com> statistical site.
> 
> Cheers,
> Bert
> 
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> On Fri, Mar 2, 2018 at 10:34 AM, Ding, Yuan Chun 
> mailto:ycd...@coh.org>> wrote:
> Dear R users,
> 
> I need to analyze data generated from a partial two-by-two factorial design: 
> two levels for drug A (yes, no), two levels for drug B (yes, no);  however, 
> data points are available only for three groups, no drugA/no drugB, yes 
> drugA/no drugB, yes drugA/yes drug B, omitting the fourth group of no 
> drugA/yes drugB.  I think we can not investigate interaction between drug A 
> and drug B, can I still run  model using R as usual:  response variable = 
> drug A + drug B?  any suggestion is appreciated.
> 
> Thank you very much!
> 
> Yuan Chun Ding
> 
> 
> -
> -SECURITY/CONFIDENTIALITY WARNING-
> This message (and any attachments) are intended solely...{{dropped:31}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis for partial two-by-two factorial design

2018-03-05 Thread Ding, Yuan Chun
Hi Bert,

I am very sorry to bother you again.

For the following question, as you suggested, I posted it on both the Biostars 
and StackExchange websites; so far, no reply.

I really hope that you can do me a great favor and share your thoughts on how 
to interpret the coefficients for drug A and drug B if I run an ANOVA model 
(response variable = drug A + drug B). Is it different from running three 
separate t-tests?

Thank you so much!!

Ding

I need to analyze data generated from a partial two-by-two factorial design: 
two levels for drug A (yes, no) and two levels for drug B (yes, no); however, 
data points are available only for three groups (no drugA/no drugB, yes 
drugA/no drugB, and yes drugA/yes drugB), omitting the fourth group of no 
drugA/yes drugB.  I think we cannot investigate the interaction between drug A 
and drug B. Can I still run a model using R as usual: response variable = 
drug A + drug B?  Any suggestion is appreciated.


From: Bert Gunter [mailto:bgunter.4...@gmail.com]
Sent: Friday, March 02, 2018 12:32 PM
To: Ding, Yuan Chun
Cc: r-help@r-project.org
Subject: Re: [R] data analysis for partial two-by-two factorial design


[Attention: This email came from an external source. Do not open attachments or 
click on links from unknown senders or unexpected emails.]


This list provides help on R programming (see the posting guide linked below 
for details on what is/is not considered on topic), and generally avoids 
discussion of purely statistical issues, which is what your query appears to 
be. The simple answer is yes, you can fit the model as described,  but you 
clearly need the off topic discussion as to what it does or does not mean. For 
that, you might try the stats.stackexchange.com<http://stats.stackexchange.com> 
statistical site.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Mar 2, 2018 at 10:34 AM, Ding, Yuan Chun 
mailto:ycd...@coh.org>> wrote:
Dear R users,

I need to analyze data generated from a partial two-by-two factorial design: 
two levels for drug A (yes, no), two levels for drug B (yes, no);  however, 
data points are available only for three groups, no drugA/no drugB, yes 
drugA/no drugB, yes drugA/yes drug B, omitting the fourth group of no drugA/yes 
drugB.  I think we can not investigate interaction between drug A and drug B, 
can I still run  model using R as usual:  response variable = drug A + drug B?  
any suggestion is appreciated.

Thank you very much!

Yuan Chun Ding


-
-SECURITY/CONFIDENTIALITY WARNING-
This message (and any attachments) are intended solely f...{{dropped:28}}

__
R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To 
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis for partial two-by-two factorial design

2018-03-02 Thread Ding, Yuan Chun
Hi Bert,

Thank you so much for your direction. I have asked a question on the 
StackExchange website.

Ding

From: Bert Gunter [mailto:bgunter.4...@gmail.com]
Sent: Friday, March 02, 2018 12:32 PM
To: Ding, Yuan Chun
Cc: r-help@r-project.org
Subject: Re: [R] data analysis for partial two-by-two factorial design


[Attention: This email came from an external source. Do not open attachments or 
click on links from unknown senders or unexpected emails.]


This list provides help on R programming (see the posting guide linked below 
for details on what is/is not considered on topic), and generally avoids 
discussion of purely statistical issues, which is what your query appears to 
be. The simple answer is yes, you can fit the model as described,  but you 
clearly need the off topic discussion as to what it does or does not mean. For 
that, you might try the stats.stackexchange.com<http://stats.stackexchange.com> 
statistical site.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Mar 2, 2018 at 10:34 AM, Ding, Yuan Chun 
mailto:ycd...@coh.org>> wrote:
Dear R users,

I need to analyze data generated from a partial two-by-two factorial design: 
two levels for drug A (yes, no), two levels for drug B (yes, no);  however, 
data points are available only for three groups, no drugA/no drugB, yes 
drugA/no drugB, yes drugA/yes drug B, omitting the fourth group of no drugA/yes 
drugB.  I think we can not investigate interaction between drug A and drug B, 
can I still run  model using R as usual:  response variable = drug A + drug B?  
any suggestion is appreciated.

Thank you very much!

Yuan Chun Ding


-
-SECURITY/CONFIDENTIALITY WARNING-
This message (and any attachments) are intended solely f...{{dropped:28}}

__
R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To 
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis for partial two-by-two factorial design

2018-03-02 Thread Bert Gunter
This list provides help on R programming (see the posting guide linked
below for details on what is/is not considered on topic), and generally
avoids discussion of purely statistical issues, which is what your query
appears to be. The simple answer is yes, you can fit the model as
described,  but you clearly need the off topic discussion as to what it
does or does not mean. For that, you might try the stats.stackexchange.com
statistical site.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Mar 2, 2018 at 10:34 AM, Ding, Yuan Chun  wrote:

> Dear R users,
>
> I need to analyze data generated from a partial two-by-two factorial
> design: two levels for drug A (yes, no), two levels for drug B (yes, no);
> however, data points are available only for three groups, no drugA/no
> drugB, yes drugA/no drugB, yes drugA/yes drug B, omitting the fourth group
> of no drugA/yes drugB.  I think we can not investigate interaction between
> drug A and drug B, can I still run  model using R as usual:  response
> variable = drug A + drug B?  any suggestion is appreciated.
>
> Thank you very much!
>
> Yuan Chun Ding
>
>
> -
> -SECURITY/CONFIDENTIALITY WARNING-
> This message (and any attachments) are intended solely...{{dropped:13}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis for partial two-by-two factorial design

2018-03-02 Thread Ding, Yuan Chun
Dear R users,

I need to analyze data generated from a partial two-by-two factorial design: 
two levels for drug A (yes, no) and two levels for drug B (yes, no); however, 
data points are available only for three groups (no drugA/no drugB, yes 
drugA/no drugB, and yes drugA/yes drugB), omitting the fourth group of no 
drugA/yes drugB.  I think we cannot investigate the interaction between drug A 
and drug B. Can I still run a model using R as usual: response variable = 
drug A + drug B?  Any suggestion is appreciated.

Thank you very much!

Yuan Chun Ding


-
-SECURITY/CONFIDENTIALITY WARNING-
This message (and any attachments) are intended solely f...{{dropped:28}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis problem

2012-06-04 Thread Bert Gunter
Stef:

1. Read and follow the posting guide. I could make no sense of your
post. This may be because I didn't work hard enough to decrypt it -
which I shouldn't have to do -- or because I'm too stupid -- which I
can't do anything about anyway.

2. What does this have to do with R anyway? Try posting on a
statistical list like stats.stackexchange.com if your primary concern
is "What should I do" rather than "How do I do _this_ in R?"

-- Bert

On Mon, Jun 4, 2012 at 5:11 PM, stef salvez  wrote:
> Dear R users,
>
> I have data on 4  types of interest rates. These rates evolve over
> time and across regions of countries . So for each type  of interest
> rates I want to run a regression of rates  on some other variables.
> So my regression for one type of  interest rate will be I_{ij}_t= a
> +regressors +error term.
> where I_{ij}_t is the absolute difference in rates between two
> locations i and j at time t. Note that i and j can be locations in
> the same country or locations at different countries.
> What I need  is construct a vector with all the pairs of locations for
> a specific t. Put differently, I want to see how the interest rate
> differential evolves over time for each pair of region. But the
> monthly time series data I have available are heterogeneous across
> countries
>
> Take a look at the following table
>
>  Country A                country B                  country C
> country D     country E   country F
>
>   '2-11-2002 '                07-12-2002'       '23-11-2002'
> '26-10-2002'    '27-12-2002'
> .
> .
> .
> 09-10-2004'               '06-11-2004'              02-10-2004'  09-10-2004'
>
>
> >From the above table, In  country A the time starts at  "2/11/02 , in
> country B the time starts at "07/12/02 and so forth.
> Furthermore,  in  country A the time ends at  "9/10/04 , in country B
> the time ends  at "06/11/02 and so forth.
> As a result of this anomaly in the beginning of time, the time duration for
> each country differs
>
> So I cannot construct these pairs because for a particular time, t,
> the rate exists in one location, but the rate in another location
> starts after t or ends before t.
>
> So the main thing I need to define is what I want done when data has
> not yet started or is already finished in another country. I do not
> know actually what the best solution is. This is my main question.
> I found something about extrapolation (if this is to be the solution)
> but I learn that extrapolation usually has quite a wide margin of
> error!! Apart from that, I have no idea how to implement it in R.
>
> Do you think that it would be better to try and create a more
> symmetric sample so as the start and end dates across countries to be
> very similar?
> It is a data analysis problem. I need some help
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] data analysis problem

2012-06-04 Thread stef salvez
Dear R users,

I have data on 4  types of interest rates. These rates evolve over
time and across regions of countries . So for each type  of interest
rates I want to run a regression of rates  on some other variables.
So my regression for one type of  interest rate will be I_{ij}_t= a
+regressors +error term.
where I_{ij}_t is the absolute difference in rates between two
locations i and j at time t. Note that i and j can be locations in
the same country or locations at different countries.
What I need  is construct a vector with all the pairs of locations for
a specific t. Put differently, I want to see how the interest rate
differential evolves over time for each pair of region. But the
monthly time series data I have available are heterogeneous across
countries

Take a look at the following table

             Country A      Country B      Country C      Country D      Country E      Country F
Start date   '2-11-2002'    '07-12-2002'   '23-11-2002'   '26-10-2002'   '27-12-2002'
...
End date     '09-10-2004'   '06-11-2004'   '02-10-2004'   '09-10-2004'


From the above table, in country A the time starts at 2/11/02, in country B the
time starts at 07/12/02, and so forth. Furthermore, in country A the time ends
at 9/10/04, in country B the time ends at 06/11/04, and so forth.
Because the start and end dates differ like this, the time span covered
differs from country to country.

So I cannot construct these pairs because for a particular time, t,
the rate exists in one location, but the rate in another location
starts after t or ends before t.

So the main thing I need to define is what I want done when the data has
not yet started or has already finished in another country. I actually do not
know what the best solution is. This is my main question.
I found something about extrapolation (if this is to be the solution),
but I have learned that extrapolation usually has quite a wide margin of
error!! Apart from that, I have no idea how to implement it in R.

Do you think that it would be better to try and create a more
symmetric sample, so that the start and end dates across countries are
very similar?
It is a data analysis problem. I need some help.
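
One possible sketch, with hypothetical column names and invented data: keep only
the months where both members of a pair are observed (the "more symmetric
sample" option), rather than extrapolating. Assume each country's series is a
data frame with a monthly `date` column of class Date and a `rate` column.

pair_diff <- function(x, y) {
  # inner merge keeps only the months present in both series
  m <- merge(x, y, by = "date", suffixes = c(".i", ".j"))
  m$abs_diff <- abs(m$rate.i - m$rate.j)
  m[, c("date", "abs_diff")]
}

# invented example: country A observed Nov 2002 - Oct 2004,
#                   country B observed Dec 2002 - Nov 2004
a <- data.frame(date = seq(as.Date("2002-11-01"), as.Date("2004-10-01"), by = "month"),
                rate = runif(24, 3, 5))
b <- data.frame(date = seq(as.Date("2002-12-01"), as.Date("2004-11-01"), by = "month"),
                rate = runif(24, 3, 5))
head(pair_diff(a, b))   # only the overlapping months remain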

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis

2012-02-28 Thread David Winsemius


On Feb 28, 2012, at 4:16 AM, Hans Ekbrand wrote:


> On Mon, Feb 27, 2012 at 11:04:13PM -0800, nontokozo mhlanga wrote:
> > Please assist me with all the tests including risk factor analysis I can
> > use to analyse the enclosed database established from a questionnaire survey
> > to test for the prevalence of tuberculosis in humans.
> 
> That's quite a general request. I think you should try to formulate a
> specific question.
> 
> Have you read the posting-guide? http://www.R-project.org/posting-guide.html

Yes.

> Also, I don't think the list accepts attached files.



That last statement is incorrect, but there are specific requirements.  
Generally, mail clients will send .txt, .png, .pdf, and .ps files with  
the proper MIME type, so the mail server will accept them.



--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis

2012-02-28 Thread Hans Ekbrand
On Mon, Feb 27, 2012 at 11:04:13PM -0800, nontokozo mhlanga wrote:
>  Please assist me with  all the tests including risk factor analysis i  can
> use to analyse the enclosed database established from a questionnaire survey
> to test for the prevalence of tuberculosis in humans .

That's quite a general request. I think you should try to formulate a
specific question.

Have you read the posting-guide? http://www.R-project.org/posting-guide.html

Also, I don't think the list accepts attached files.

-- 
Hans Ekbrand (http://sociologi.cjb.net) 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] data analysis

2012-02-28 Thread nontokozo mhlanga
 Please assist me with all the tests, including risk factor analysis, that I can
use to analyse the enclosed database, established from a questionnaire survey,
to test for the prevalence of tuberculosis in humans.

Thank you

Nonty

--
View this message in context: 
http://r.789695.n4.nabble.com/data-analysis-tp4427257p4427257.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data Analysis for Gas Prices

2011-12-04 Thread John
On Sat, 3 Dec 2011 14:11:39 -0800 (PST)
inferno846  wrote:

...
> 
> Also, could anyone help me figure out how to import a data table to
> R? When I try to create a .txt file from a word document and read it
> in R, the format of the first column always messes up. Any/all help
> is appreciated.
> 
For reading in data, avoid using Word, a program that is overkill for
almost all purposes.  Either use Notepad, or grab a simple text editor
off the internet - preferably capable of block selection.  

Since you do not say just how the "first column messes up," it is
rather difficult to suggest a solution to the problem. The first
line of the table and a couple of data lines, the form of the command you use
to read in the data, and the field separator in your table are all useful
pieces of information. There are, however, a great many books about R out
there and nearly all of them address at least briefly how to import
data. Portland has one of the best bookstores on the planet.  As a last
resort you might try the command ??read.table from the R prompt.
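
A minimal sketch of that workflow, with hypothetical names: the table saved out of
Word as a plain tab-delimited text file "gasprices.txt" with a single header row.

gas <- read.delim("gasprices.txt", header = TRUE)  # read.delim expects tab-separated fields
str(gas)     # check that the first column was parsed as intended
head(gas)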

JD

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data Analysis for Gas Prices

2011-12-03 Thread B77S
use a < ? > to get help on a function; example:

?read.table

If you do this you will see an option called "header"...  
use header=T if your top row contains column names.  

Learn how to read these help pages.  Also, read through a few beginner R
manuals and see this website:
http://www.statmethods.net/interface/io.html
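
For instance, a minimal hedged sketch, assuming a comma-separated file "gas.csv"
(a hypothetical name) whose first row holds the column names:

gas <- read.table("gas.csv", header = TRUE, sep = ",")  # header=TRUE takes the names from row 1
summary(gas)                                            # quick sanity check of what was read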

As for the rest of your questions: 
perhaps this is your sign to begin reading an intro-stats book, or to inquire on
a different forum.
see:
http://stats.stackexchange.com/questions

good luck


inferno846 wrote
> 
> Hi there,
> 
> I'm looking to analyze a set of data on local gas prices for a single day.
> I'm wondering what kind of questions I should be looking to ask and how to
> find and answer to them with R. Examples would be: 
> 
> Do prices differ between brands?
> Does location affect (NE, NW, SE, SW) price?
> Does the number of nearby (within .25 miles) competitors affect price?
> Do gas stations near shopping centers or highways have different prices?
> 
> Also, could anyone help me figure out how to import a data table to R?
> When I try to create a .txt file from a word document and read it in R,
> the format of the first column always messes up. Any/all help is
> appreciated.
> 


--
View this message in context: 
http://r.789695.n4.nabble.com/Data-Analysis-for-Gas-Prices-tp4155078p4155185.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Data Analysis for Gas Prices

2011-12-03 Thread inferno846
Hi there,

I'm looking to analyze a set of data on local gas prices for a single day.
I'm wondering what kind of questions I should be looking to ask and how to
find and answer to them with R. Examples would be: 

Do prices differ between brands?
Does location affect (NE, NW, SE, SW) price?
Does the number of nearby (within .25 miles) competitors affect price?
Do gas stations near shopping centers or highways have different prices?

Also, could anyone help me figure out how to import a data table to R? When
I try to create a .txt file from a word document and read it in R, the
format of the first column always messes up. Any/all help is appreciated.

--
View this message in context: 
http://r.789695.n4.nabble.com/Data-Analysis-for-Gas-Prices-tp4155078p4155078.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data analysis: normal approximation for binomial

2011-11-21 Thread John Kane
You need a statistician, or at least someone who's taken a stats course in the 
last 10 years, but that may be what the author was trying to get at.  

At least the binomial is discrete, as is 
the z, so it may be that the z was used because it was easier to calculate than a binomial.  
How old is the paper? Before, let's say, the early 1980s, a lot of people were 
still doing stats by hand (IIRC, even calculators were relatively expensive and 
rare, and calculating a binomial of any size was close to impractical -- I tried 
it once).  So using a z-distribution with the Yates correction made sense.

It's a little like early factor analysis, when "rotate the factors" actually meant 
rotating the glass plates. 
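
For what it's worth, in current R neither the exact test nor the approximation
needs hand work. A small sketch with made-up counts (20 successes out of 29
trials against a null of p = 0.5; illustrative numbers only):

binom.test(20, 29, p = 0.5)                    # exact binomial test
prop.test(20, 29, p = 0.5)                     # normal approximation with Yates continuity correction
prop.test(20, 29, p = 0.5, correct = FALSE)    # the same approximation without the correction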


--- On Sun, 11/20/11, Colstat  wrote:

From: Colstat 
Subject: Re: [R] Data analysis: normal approximation for binomial
To: "John Kane" 
Cc: r-help@r-project.org
Received: Sunday, November 20, 2011, 10:10 PM

Hey, John

I like the explicit formula they put in there.  I looked around last night and 
found this
http://www.stat.yale.edu/Courses/1997-98/101/binom.htm

which is basically the normal approximation to the binomial; I thought that was 
what the author was trying to get at?

Colin

On Sun, Nov 20, 2011 at 8:49 AM, John Kane  wrote:

Hi Colin,

I'm no statistician and it's been a very long time, but IIRC a t-test is a 
modified version of a z-test that is used on small sample sizes.  (I can hear 
some of our statisticians screaming in the background as I type.)

In any case I think a Z distribution is discrete and a standard normal is not, 
so a user can use the Yates continuity correction to interpolate values for the 
normal between the discrete z-values.  Or something like this.

I have only encountered it once, in a Psych stats course taught by an animal 
geneticist who seemed to think it was important. To be honest, it looked pretty 
trivial for the type of data I'd be likely to see.

I cannot remember ever seeing a continuity correction used in a published 
paper--for that matter I have trouble remembering a z-test.

If you want more information on the subject I found a very tiny bit of info at 
http://books.google.ca/books?id=SiJ2UB3dv9UC&pg=PA139&lpg=PA139&dq=z-test+with+continuity+correction&source=bl&ots=0vMTCUZWXx&sig=bfCPx0vynGjA0tHLRAf6B42x0mM&hl=en&ei=nQHJTo7LPIrf0gHxs6Aq&sa=X&oi=book_result&ct=result&resnum=2&ved=0CC0Q6AEwAQ#v=onepage&q=z-test%20with%20continuity%20correction&f=false

A print source that, IIRC, has a discussion of this is Hayes, W. (1981). 
Statistics, 3rd Ed., Holt Rinehart and Winston.

Have fun



--- On Sat, 11/19/11, Colstat  wrote:

> From: Colstat 
> Subject: [R] Data analysis: normal approximation for binomial
> To: r-help@r-project.org
> Received: Saturday, November 19, 2011, 6:01 PM
> Dear R experts,
>
> I am trying to analyze data from an article, the data looks
> like this
>
> Patient Age Sex Aura preCSM preFreq preIntensity postFreq
> postIntensity
> postOutcome
> 1 47 F A 4 6 9 2 8 SD
> 2 40 F A/N 5 8 9 0 0 E
> 3 49 M N 5 8 9 2 6 SD
> 4 40 F A 5 3 10 0 0 E
> 5 42 F N 5 4 9 0 0 E
> 6 35 F N 5 8 9 12 7 NR
> 7 38 F A 5 NA 10 2 9 SD
> 8 44 M A 4 4 10 0 0 E
> 9 47 M A 4 5 8 2 7 SD
> 10 53 F A 5 3 10 0 0 E
> 11 41 F N 5 6 7 0 0 E
> 12 49 F A 4 6 8 0 0 E
> 13 48 F A 5 4 8 0 0 E
> 14 63 M N 4 6 9 15 9 NR
> 15 58 M N 5 9 7 2 8 SD
> 16 53 F A 4 3 9 0 0 E
> 17 47 F N 5 4 8 1 4 SD
> 18 34 F A NA  5 9 0 0 E
> 19 53 F N 5 4 9 5 7 NR
> 20 45 F N 5 5 8 5 4 SD
> 21 30 F A 5 3 8 0 0 E
> 22 29 F A 4 5 9 0 0 E
> 23 49 F N 5 9 10 0 0 E
> 24 24 F A 5 5 9 0 0 E
> 25 63 F N 4 19 7 10 7 NR
> 26 62 F A 5 8 9 11 9 NR
> 27 44 F A 5 3 10 0 0 E
> 28 38 F N 4 8 10 1 3 SD
> 29 38 F N 5 3 10 0 0 E
>
> How do I do a binomial distribution z statistics with
> continuity
> correction? basically normal approximation.
> Could anyone give me some suggestions what I (or R) can do
> with these data?
> I have tried tried histogram, maybe t-test? or even
> lattice?  what else can
> I(or can R) do?
> help please, thanks so much.
>
>     [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data analysis: normal approximation for binomial

2011-11-20 Thread Colstat
Hey, John

I like the explicit formula they put in there.  I looked around last night
and found this
http://www.stat.yale.edu/Courses/1997-98/101/binom.htm

which is basically the normal approximation to the binomial; I thought that was
what the author was trying to get at?

Colin

On Sun, Nov 20, 2011 at 8:49 AM, John Kane  wrote:

> Hi Colin,
>
> I'm no statistician and it's been a very long time but IIRC a t-test is a
> modified version of a z-test that is used on small sample sizes.  (I can
> hear some of our statisticians screaming in the background as I type.)
>
> In any case I think a Z distribution is discrete and a standard normal is
> not so a user can use the Yates continuity correction to interpolate values for
> the normal between the discrete z-values.  Or something like this.
>
> I have only encountered it once in a Psych stats course taught by an
> animal geneticist who seemed to think it was important. To be honest, it
> looked pretty trivial for the type of data I'd be likely to see.
>
> I cannot remember ever seeing a continuity correction used in a published
> paper--for that matter I have trouble remembering a z-test.
>
> If you want more information on the subject I found a very tiny bit of
> info at
> http://books.google.ca/books?id=SiJ2UB3dv9UC&pg=PA139&lpg=PA139&dq=z-test+with+continuity+correction&source=bl&ots=0vMTCUZWXx&sig=bfCPx0vynGjA0tHLRAf6B42x0mM&hl=en&ei=nQHJTo7LPIrf0gHxs6Aq&sa=X&oi=book_result&ct=result&resnum=2&ved=0CC0Q6AEwAQ#v=onepage&q=z-test%20with%20continuity%20correction&f=false
>
> A print source that, IIRC, has a discussion of this is "Hayes, W. (1981.
> Statistics. 3rd Ed., Holt Rinehart and Winston
>
> Have fun
>
> --- On Sat, 11/19/11, Colstat  wrote:
>
> > From: Colstat 
> > Subject: [R] Data analysis: normal approximation for binomial
> > To: r-help@r-project.org
> > Received: Saturday, November 19, 2011, 6:01 PM
> > Dear R experts,
> >
> > I am trying to analyze data from an article, the data looks
> > like this
> >
> > Patient Age Sex Aura preCSM preFreq preIntensity postFreq
> > postIntensity
> > postOutcome
> > 1 47 F A 4 6 9 2 8 SD
> > 2 40 F A/N 5 8 9 0 0 E
> > 3 49 M N 5 8 9 2 6 SD
> > 4 40 F A 5 3 10 0 0 E
> > 5 42 F N 5 4 9 0 0 E
> > 6 35 F N 5 8 9 12 7 NR
> > 7 38 F A 5 NA 10 2 9 SD
> > 8 44 M A 4 4 10 0 0 E
> > 9 47 M A 4 5 8 2 7 SD
> > 10 53 F A 5 3 10 0 0 E
> > 11 41 F N 5 6 7 0 0 E
> > 12 49 F A 4 6 8 0 0 E
> > 13 48 F A 5 4 8 0 0 E
> > 14 63 M N 4 6 9 15 9 NR
> > 15 58 M N 5 9 7 2 8 SD
> > 16 53 F A 4 3 9 0 0 E
> > 17 47 F N 5 4 8 1 4 SD
> > 18 34 F A NA  5 9 0 0 E
> > 19 53 F N 5 4 9 5 7 NR
> > 20 45 F N 5 5 8 5 4 SD
> > 21 30 F A 5 3 8 0 0 E
> > 22 29 F A 4 5 9 0 0 E
> > 23 49 F N 5 9 10 0 0 E
> > 24 24 F A 5 5 9 0 0 E
> > 25 63 F N 4 19 7 10 7 NR
> > 26 62 F A 5 8 9 11 9 NR
> > 27 44 F A 5 3 10 0 0 E
> > 28 38 F N 4 8 10 1 3 SD
> > 29 38 F N 5 3 10 0 0 E
> >
> > How do I do a binomial distribution z statistics with
> > continuity
> > correction? basically normal approximation.
> > Could anyone give me some suggestions what I (or R) can do
> > with these data?
> > I have tried tried histogram, maybe t-test? or even
> > lattice?  what else can
> > I(or can R) do?
> > help please, thanks so much.
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org
> > mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> > reproducible code.
> >
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data analysis: normal approximation for binomial

2011-11-20 Thread Colstat
Hey, Joshua
Thank so much for your quick response.  Those examples you produced are
very good, I'm pretty impressed by the graphs.  When I ran the last line, I
hit an error, so I ran what's inside summary(), it give me

Error: could not find function "lmer"

Something with the package "lme4"?
Colin
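
The usual cause of that error is simply that the lme4 package is not installed,
or not loaded in the current session; a minimal fix (assuming an internet
connection for the install):

install.packages("lme4")   # one-time install from CRAN
library(lme4)              # load it in each session before calling lmer()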


On Sun, Nov 20, 2011 at 1:00 AM, Joshua Wiley wrote:

> Hi Colin,
>
> I have never heard of a binomial distribution z statistic with (or
> without for that matter) a continuity correction, but I am not a
> statistician.  Other's may have some ideas there.  As for other ways
> to analyze the data, I skimmed through the article and brought the
> data and played around with some different analyses and graphs.  I
> attached a file headache.txt with all the R script (including the data
> in an Rish format).  It is really a script file (i.e., .R) but for the
> listservs sake I saved it as a txt.  There are quite a few different
> things I tried in there so hopefully it gives you some ideas.
> Regardless of the analysis type used and whether one considers
> proportion that "significantly improved" or the raw frequency or
> intensity scores, I would say that concluding the treatment was
> effective is a good conclusion.  The only real concern could be that
> people would naturally get better on their own (a control group would
> be needed to bolster the causal inference drawn from a pre/post
> measurement).  However, given at least what I know about migraines, it
> is often a fairly chronic condition so over a relatively short time
> period, it seems implausible to conclude that as many people would be
> improving as this study reported.
>
> Cheers,
>
> Josh
>
> On Sat, Nov 19, 2011 at 7:43 PM, Colstat  wrote:
> > hey, Joshua
> > I was reading this paper, in attachment, and reproducing the results.
> >
> > I was really confused when he said in the paper "The results were then
> > statistically analyzed using binomial distribution z statistics with
> > continuity correction."  The data is binomial?  To me, this is a paired
> > t-test.
> >
> > What command should I use to get those results (the first paragraph in
> > Results section)?  Basically, it's a pre and post treatment problem.
> >
> > What other graphical analysis do you think is appropriate? reshape
> package?
> > lattice package, namely conditional graph?
> >
> > I know this might be too much, but I do really appreciate it if you do
> take
> > a look at it.
> >
> > Thanks,
> > Colin
> >
> >
> > On Sat, Nov 19, 2011 at 10:15 PM, Joshua Wiley 
> > wrote:
> >>
> >> Hi,
> >>
> >> I am not clear what your goal is.  There is a variety of data there.
> >> You could look at t-test differences in preIntensity broken down by
> >> sex, you could use regression looking at postIntensity controlling for
> >> preIntensity and explained by age, you could
> >>
> >> Why are you analyzing data from an article?  What did the article do?
> >> What you mention---some sort of z statistic (what exactly this was of
> >> and how it should be calculated did not seem like was clear even to
> >> you), histogram, t-test, lattice, are all very different things that
> >> help answer different questions, show different things, and in one is
> >> a piece of software.
> >>
> >> Without a clearer question and goal, my best advice is here are a
> >> number of different functions some of which may be useful to you:
> >>
> >> ls(pos = "package:stats")
> >>
> >> Cheers,
> >>
> >> Josh
> >>
> >> On Sat, Nov 19, 2011 at 3:01 PM, Colstat  wrote:
> >> > Dear R experts,
> >> >
> >> > I am trying to analyze data from an article, the data looks like this
> >> >
> >> > Patient Age Sex Aura preCSM preFreq preIntensity postFreq
> postIntensity
> >> > postOutcome
> >> > 1 47 F A 4 6 9 2 8 SD
> >> > 2 40 F A/N 5 8 9 0 0 E
> >> > 3 49 M N 5 8 9 2 6 SD
> >> > 4 40 F A 5 3 10 0 0 E
> >> > 5 42 F N 5 4 9 0 0 E
> >> > 6 35 F N 5 8 9 12 7 NR
> >> > 7 38 F A 5 NA 10 2 9 SD
> >> > 8 44 M A 4 4 10 0 0 E
> >> > 9 47 M A 4 5 8 2 7 SD
> >> > 10 53 F A 5 3 10 0 0 E
> >> > 11 41 F N 5 6 7 0 0 E
> >> > 12 49 F A 4 6 8 0 0 E
> >> > 13 48 F A 5 4 8 0 0 E
> >> > 14 63 M N 4 6 9 15 9 NR
> >> > 15 58 M N 5 9 7 2 8 SD
> >> > 16 53 F A 4 3 9 0 0 E
> >> > 17 47 F N 5 4 8 1 4 SD
> >> > 18 34 F A NA  5 9 0 0 E
> >> > 19 53 F N 5 4 9 5 7 NR
> >> > 20 45 F N 5 5 8 5 4 SD
> >> > 21 30 F A 5 3 8 0 0 E
> >> > 22 29 F A 4 5 9 0 0 E
> >> > 23 49 F N 5 9 10 0 0 E
> >> > 24 24 F A 5 5 9 0 0 E
> >> > 25 63 F N 4 19 7 10 7 NR
> >> > 26 62 F A 5 8 9 11 9 NR
> >> > 27 44 F A 5 3 10 0 0 E
> >> > 28 38 F N 4 8 10 1 3 SD
> >> > 29 38 F N 5 3 10 0 0 E
> >> >
> >> > How do I do a binomial distribution z statistics with continuity
> >> > correction? basically normal approximation.
> >> > Could anyone give me some suggestions what I (or R) can do with these
> >> > data?
> >> > I have tried tried histogram, maybe t-test? or even lattice?  what
> else
> >> > can
> >> > I(or can R) do?
> >> > help please, thanks so much.
> >> >
> >> >[[alternativ

Re: [R] Data analysis: normal approximation for binomial

2011-11-20 Thread John Kane
Hi Colin,

I'm no statistician and it's been a very long time, but IIRC a t-test is a 
modified version of a z-test that is used on small sample sizes.  (I can hear 
some of our statisticians screaming in the background as I type.)

In any case I think a Z distribution is discrete and a standard normal is not, 
so a user can use the Yates continuity correction to interpolate values for the 
normal between the discrete z-values.  Or something like this.  

I have only encountered it once in a Psych stats course taught by an animal 
geneticist who seemed to think it was important. To be honest, it looked pretty 
trivial for the type of data I'd be likely to see. 

I cannot remember ever seeing a continuity correction used in a published 
paper--for that matter I have trouble remembering a z-test.  

If you want more information on the subject I found a very tiny bit of info at 
http://books.google.ca/books?id=SiJ2UB3dv9UC&pg=PA139&lpg=PA139&dq=z-test+with+continuity+correction&source=bl&ots=0vMTCUZWXx&sig=bfCPx0vynGjA0tHLRAf6B42x0mM&hl=en&ei=nQHJTo7LPIrf0gHxs6Aq&sa=X&oi=book_result&ct=result&resnum=2&ved=0CC0Q6AEwAQ#v=onepage&q=z-test%20with%20continuity%20correction&f=false

A print source that, IIRC, has a discussion of this is Hayes, W. (1981). 
Statistics, 3rd Ed., Holt Rinehart and Winston.

Have fun
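
A sketch of the hand calculation being described, using made-up counts (x = 20
successes in n = 29 trials, null p0 = 0.5); this is one reading of "z with
continuity correction", not necessarily what the paper did:

x <- 20; n <- 29; p0 <- 0.5
z <- (x - n * p0 - 0.5) / sqrt(n * p0 * (1 - p0))   # subtract 0.5 because x > n*p0 (continuity correction)
2 * pnorm(-abs(z))                                  # two-sided p-value from the normal approximation
binom.test(x, n, p0)$p.value                        # exact binomial p-value, for comparison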

--- On Sat, 11/19/11, Colstat  wrote:

> From: Colstat 
> Subject: [R] Data analysis: normal approximation for binomial
> To: r-help@r-project.org
> Received: Saturday, November 19, 2011, 6:01 PM
> Dear R experts,
> 
> I am trying to analyze data from an article, the data looks
> like this
> 
> Patient Age Sex Aura preCSM preFreq preIntensity postFreq
> postIntensity
> postOutcome
> 1 47 F A 4 6 9 2 8 SD
> 2 40 F A/N 5 8 9 0 0 E
> 3 49 M N 5 8 9 2 6 SD
> 4 40 F A 5 3 10 0 0 E
> 5 42 F N 5 4 9 0 0 E
> 6 35 F N 5 8 9 12 7 NR
> 7 38 F A 5 NA 10 2 9 SD
> 8 44 M A 4 4 10 0 0 E
> 9 47 M A 4 5 8 2 7 SD
> 10 53 F A 5 3 10 0 0 E
> 11 41 F N 5 6 7 0 0 E
> 12 49 F A 4 6 8 0 0 E
> 13 48 F A 5 4 8 0 0 E
> 14 63 M N 4 6 9 15 9 NR
> 15 58 M N 5 9 7 2 8 SD
> 16 53 F A 4 3 9 0 0 E
> 17 47 F N 5 4 8 1 4 SD
> 18 34 F A NA  5 9 0 0 E
> 19 53 F N 5 4 9 5 7 NR
> 20 45 F N 5 5 8 5 4 SD
> 21 30 F A 5 3 8 0 0 E
> 22 29 F A 4 5 9 0 0 E
> 23 49 F N 5 9 10 0 0 E
> 24 24 F A 5 5 9 0 0 E
> 25 63 F N 4 19 7 10 7 NR
> 26 62 F A 5 8 9 11 9 NR
> 27 44 F A 5 3 10 0 0 E
> 28 38 F N 4 8 10 1 3 SD
> 29 38 F N 5 3 10 0 0 E
> 
> How do I do a binomial distribution z statistics with
> continuity
> correction? basically normal approximation.
> Could anyone give me some suggestions what I (or R) can do
> with these data?
> I have tried tried histogram, maybe t-test? or even
> lattice?  what else can
> I(or can R) do?
> help please, thanks so much.
> 
>     [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data analysis: normal approximation for binomial

2011-11-19 Thread Joshua Wiley
Hi Colin,

I have never heard of a binomial distribution z statistic with (or
without for that matter) a continuity correction, but I am not a
statistician.  Others may have some ideas there.  As for other ways
to analyze the data, I skimmed through the article, brought the
data in, and played around with some different analyses and graphs.  I
attached a file headache.txt with all the R script (including the data
in an R-ish format).  It is really a script file (i.e., .R), but for the
listserv's sake I saved it as a txt.  There are quite a few different
things I tried in there so hopefully it gives you some ideas.
Regardless of the analysis type used and whether one considers
proportion that "significantly improved" or the raw frequency or
intensity scores, I would say that concluding the treatment was
effective is a good conclusion.  The only real concern could be that
people would naturally get better on their own (a control group would
be needed to bolster the causal inference drawn from a pre/post
measurement).  However, given at least what I know about migraines, it
is often a fairly chronic condition so over a relatively short time
period, it seems implausible to conclude that as many people would be
improving as this study reported.

Cheers,

Josh
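
One concrete version of the pre/post comparison discussed above -- a sketch only,
assuming the posted table has been read into a data frame called headache (a
hypothetical name) with the column names shown in the original message:

## paired comparison of attack frequency before and after treatment
t.test(headache$preFreq, headache$postFreq, paired = TRUE)
## nonparametric counterpart, less sensitive to the skewed frequencies
wilcox.test(headache$preFreq, headache$postFreq, paired = TRUE)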

On Sat, Nov 19, 2011 at 7:43 PM, Colstat  wrote:
> hey, Joshua
> I was reading this paper, in attachment, and reproducing the results.
>
> I was really confused when he said in the paper "The results were then
> statistically analyzed using binomial distribution z statistics with
> continuity correction."  The data is binomial?  To me, this is a paired
> t-test.
>
> What command should I use to get those results (the first paragraph in
> Results section)?  Basically, it's a pre and post treatment problem.
>
> What other graphical analysis do you think is appropriate? reshape package?
> lattice package, namely conditional graph?
>
> I know this might be too much, but I do really appreciate it if you do take
> a look at it.
>
> Thanks,
> Colin
>
>
> On Sat, Nov 19, 2011 at 10:15 PM, Joshua Wiley 
> wrote:
>>
>> Hi,
>>
>> I am not clear what your goal is.  There is a variety of data there.
>> You could look at t-test differences in preIntensity broken down by
>> sex, you could use regression looking at postIntensity controlling for
>> preIntensity and explained by age, you could
>>
>> Why are you analyzing data from an article?  What did the article do?
>> What you mention---some sort of z statistic (what exactly this was of
>> and how it should be calculated did not seem like was clear even to
>> you), histogram, t-test, lattice, are all very different things that
>> help answer different questions, show different things, and in one is
>> a piece of software.
>>
>> Without a clearer question and goal, my best advice is here are a
>> number of different functions some of which may be useful to you:
>>
>> ls(pos = "package:stats")
>>
>> Cheers,
>>
>> Josh
>>
>> On Sat, Nov 19, 2011 at 3:01 PM, Colstat  wrote:
>> > Dear R experts,
>> >
>> > I am trying to analyze data from an article, the data looks like this
>> >
>> > Patient Age Sex Aura preCSM preFreq preIntensity postFreq postIntensity
>> > postOutcome
>> > 1 47 F A 4 6 9 2 8 SD
>> > 2 40 F A/N 5 8 9 0 0 E
>> > 3 49 M N 5 8 9 2 6 SD
>> > 4 40 F A 5 3 10 0 0 E
>> > 5 42 F N 5 4 9 0 0 E
>> > 6 35 F N 5 8 9 12 7 NR
>> > 7 38 F A 5 NA 10 2 9 SD
>> > 8 44 M A 4 4 10 0 0 E
>> > 9 47 M A 4 5 8 2 7 SD
>> > 10 53 F A 5 3 10 0 0 E
>> > 11 41 F N 5 6 7 0 0 E
>> > 12 49 F A 4 6 8 0 0 E
>> > 13 48 F A 5 4 8 0 0 E
>> > 14 63 M N 4 6 9 15 9 NR
>> > 15 58 M N 5 9 7 2 8 SD
>> > 16 53 F A 4 3 9 0 0 E
>> > 17 47 F N 5 4 8 1 4 SD
>> > 18 34 F A NA  5 9 0 0 E
>> > 19 53 F N 5 4 9 5 7 NR
>> > 20 45 F N 5 5 8 5 4 SD
>> > 21 30 F A 5 3 8 0 0 E
>> > 22 29 F A 4 5 9 0 0 E
>> > 23 49 F N 5 9 10 0 0 E
>> > 24 24 F A 5 5 9 0 0 E
>> > 25 63 F N 4 19 7 10 7 NR
>> > 26 62 F A 5 8 9 11 9 NR
>> > 27 44 F A 5 3 10 0 0 E
>> > 28 38 F N 4 8 10 1 3 SD
>> > 29 38 F N 5 3 10 0 0 E
>> >
>> > How do I do a binomial distribution z statistics with continuity
>> > correction? basically normal approximation.
>> > Could anyone give me some suggestions what I (or R) can do with these
>> > data?
>> > I have tried tried histogram, maybe t-test? or even lattice?  what else
>> > can
>> > I(or can R) do?
>> > help please, thanks so much.
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, ATS Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group

Re: [R] Data analysis: normal approximation for binomial

2011-11-19 Thread Joshua Wiley
Hi,

I am not clear what your goal is.  There is a variety of data there.
You could look at t-test differences in preIntensity broken down by
sex, you could use regression looking at postIntensity controlling for
preIntensity and explained by age, you could

Why are you analyzing data from an article?  What did the article do?
What you mention---some sort of z statistic (exactly what it was of
and how it should be calculated did not seem clear even to
you), histogram, t-test, lattice---are all very different things that
help answer different questions, show different things, and one of
them is a piece of software.

Without a clearer question and goal, my best advice is here are a
number of different functions some of which may be useful to you:

ls(pos = "package:stats")

Cheers,

Josh

On Sat, Nov 19, 2011 at 3:01 PM, Colstat  wrote:
> Dear R experts,
>
> I am trying to analyze data from an article, the data looks like this
>
> Patient Age Sex Aura preCSM preFreq preIntensity postFreq postIntensity
> postOutcome
> 1 47 F A 4 6 9 2 8 SD
> 2 40 F A/N 5 8 9 0 0 E
> 3 49 M N 5 8 9 2 6 SD
> 4 40 F A 5 3 10 0 0 E
> 5 42 F N 5 4 9 0 0 E
> 6 35 F N 5 8 9 12 7 NR
> 7 38 F A 5 NA 10 2 9 SD
> 8 44 M A 4 4 10 0 0 E
> 9 47 M A 4 5 8 2 7 SD
> 10 53 F A 5 3 10 0 0 E
> 11 41 F N 5 6 7 0 0 E
> 12 49 F A 4 6 8 0 0 E
> 13 48 F A 5 4 8 0 0 E
> 14 63 M N 4 6 9 15 9 NR
> 15 58 M N 5 9 7 2 8 SD
> 16 53 F A 4 3 9 0 0 E
> 17 47 F N 5 4 8 1 4 SD
> 18 34 F A NA  5 9 0 0 E
> 19 53 F N 5 4 9 5 7 NR
> 20 45 F N 5 5 8 5 4 SD
> 21 30 F A 5 3 8 0 0 E
> 22 29 F A 4 5 9 0 0 E
> 23 49 F N 5 9 10 0 0 E
> 24 24 F A 5 5 9 0 0 E
> 25 63 F N 4 19 7 10 7 NR
> 26 62 F A 5 8 9 11 9 NR
> 27 44 F A 5 3 10 0 0 E
> 28 38 F N 4 8 10 1 3 SD
> 29 38 F N 5 3 10 0 0 E
>
> How do I do a binomial distribution z statistics with continuity
> correction? basically normal approximation.
> Could anyone give me some suggestions what I (or R) can do with these data?
> I have tried tried histogram, maybe t-test? or even lattice?  what else can
> I(or can R) do?
> help please, thanks so much.
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Data analysis: normal approximation for binomial

2011-11-19 Thread Colstat
Dear R experts,

I am trying to analyze data from an article, the data looks like this

Patient Age Sex Aura preCSM preFreq preIntensity postFreq postIntensity
postOutcome
1 47 F A 4 6 9 2 8 SD
2 40 F A/N 5 8 9 0 0 E
3 49 M N 5 8 9 2 6 SD
4 40 F A 5 3 10 0 0 E
5 42 F N 5 4 9 0 0 E
6 35 F N 5 8 9 12 7 NR
7 38 F A 5 NA 10 2 9 SD
8 44 M A 4 4 10 0 0 E
9 47 M A 4 5 8 2 7 SD
10 53 F A 5 3 10 0 0 E
11 41 F N 5 6 7 0 0 E
12 49 F A 4 6 8 0 0 E
13 48 F A 5 4 8 0 0 E
14 63 M N 4 6 9 15 9 NR
15 58 M N 5 9 7 2 8 SD
16 53 F A 4 3 9 0 0 E
17 47 F N 5 4 8 1 4 SD
18 34 F A NA  5 9 0 0 E
19 53 F N 5 4 9 5 7 NR
20 45 F N 5 5 8 5 4 SD
21 30 F A 5 3 8 0 0 E
22 29 F A 4 5 9 0 0 E
23 49 F N 5 9 10 0 0 E
24 24 F A 5 5 9 0 0 E
25 63 F N 4 19 7 10 7 NR
26 62 F A 5 8 9 11 9 NR
27 44 F A 5 3 10 0 0 E
28 38 F N 4 8 10 1 3 SD
29 38 F N 5 3 10 0 0 E

How do I do a binomial distribution z statistic with continuity
correction? Basically the normal approximation.
Could anyone give me some suggestions as to what I (or R) can do with these data?
I have tried a histogram, maybe a t-test? or even lattice?  What else can
I (or R) do?
Help please; thanks so much.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Deducer: An R data analysis GUI

2009-12-03 Thread ian . fellows
Announcing a new version of Deducer:

Deducer 0.2-1 is an intuitive, cross-platform graphical data analysis
system. It uses menus and dialogs to guide the user efficiently through
the data manipulation and analysis process, and has an Excel-like
spreadsheet for easy data frame visualization and editing. Deducer works
best when used with the Java-based R GUI JGR, but the dialogs can be
called from the command line. Dialogs have also been integrated into the
Windows Rgui.

The statistical methods and concepts covered by the dialogs are increasing,
and currently include:

Data Manipulation: factor editing, variable recoding, subsetting, sorting,
merging, transposing, opening data (text and foreign), and saving data

Analysis: Frequencies, Descriptives, Contingency tables (and related
statistics), one-sample, two-sample, k-sample tests, as well as
correlations

Models: Linear Models (with optional HCCM), Logistic regression,
Generalized Linear Models

Since its initial release in August, there have been significant changes
to the back-end as well as the programmatic interface. This has resulted
in increased stability, and made for easier incorporation of Deducer’s R
functions into non-GUI programs. Additionally, a plug-in interface has
been added, which allows arbitrary packages to add onto Deducer’s menu
system.

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data analysis package for positively skewed data

2009-09-27 Thread KABELI MEFANE
R-helpers
 
A curious question: Can you make suggestions  as to  what to use in R for 
the data from a sample of the following:
 
Hypermarket <- matrix(rnorm(100, mean=5, sd=5000))
Supermarket <- matrix(rnorm(400, mean=34000, sd=3000))
Minimarket  <- matrix(rnorm(1000, mean=1,sd=2000))
Cornershop  <- matrix(rnorm(1500, mean=2500, sd=500))
Spazashop   <- matrix(rnorm(2000, mean=1000, sd=250))
dat=data.frame(type=c(rep("Hypermarket",100), rep("Supermarket",400),
rep("Minimarket",1000),rep("Cornershop",1500), rep("Spazashop",2000)),
value=c(Hypermarket, Supermarket, Minimarket, Cornershop,Spazashop))
#Sampling without replacement 
n<-1000
dat.srs<-dat[sample(1:dim(dat)[1], size=n,replace=F),]
dat.srs   
#Number of observations for each outlet type in a sample
dat.srs$type <- factor(dat.srs$type, 
levels = c("Hypermarket","Supermarket","Minimarket","Cornershop","Spazashop"))

(numoutlets<-data.frame(table(dat.srs$type)))
 
 
Suggest a package that can help me get all the analysis info such as 
mean, var, std dev, cv, ci, proportions, ...
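
A base-R sketch of the kind of per-group summary being asked for, built on the
dat.srs object created above (cv is taken here as sd/mean, and the interval is a
plain normal-theory 95% CI; adjust to taste):

grp.stats <- function(x) {
  m <- mean(x); s <- sd(x); n <- length(x)
  c(n = n, mean = m, var = var(x), sd = s, cv = s / m,
    ci.low = m - 1.96 * s / sqrt(n), ci.high = m + 1.96 * s / sqrt(n))
}
round(t(sapply(split(dat.srs$value, dat.srs$type), grp.stats)), 2)  # one row of summaries per outlet type
prop.table(table(dat.srs$type))                                     # sample proportions of each outlet type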



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis. R

2009-03-22 Thread UBC

Thanks for your fast response.
Sorry for asking a stupid question; I am a complete beginner with R (just trying it out
for <3 months, and I am taking my first course about it).
So, to tackle this question,
I was told to use a "nested design" method --
could you actually show me how you would attempt this problem?
(a) Determine if insulation in the house effects the average gas
consumption.
(b) How much extra gas is used when there is no insulation? Provide an
interval estimate as well as a point estimate.

I just got confused by the background information:
"We are interested in looking at the effect of insulation on gas
consumption. The average outside temperature (degrees celcius) was also
measured."

So what should my model look like?
I don't even know what my explanatory/response variables should be...

Thanks in advance.
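
One plausible way to set the model up (a sketch, and not necessarily what the
course means by a "nested design"): gas consumption is the response, insulation
status is a two-level factor, and outside temperature is a covariate. Assuming
the data have been assembled into a data frame called house with columns gas,
temp and a factor insul:

fit <- lm(gas ~ temp + insul, data = house)  # additive insulation effect, adjusting for temperature
summary(fit)   # the insul coefficient is the point estimate of the insulation effect, answering (a)
confint(fit)   # interval estimates, answering (b)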



Gabor Grothendieck wrote:
> 
> This works with the example.  If the real data is different it may not
> work.  To run the example below just copy and paste it into R.
> To run with the real data replace textConnection(Lines) with
> "insulation.txt" everywhere.
> 
> Lines <- "Before insulAfter insul.
> tempgas tempgas
> -0.87.2-0.74.8
> -0.76.90.84.6
> 0.46.41.04.7
> 2.56.01.44.0
> 2.95.81.54.2
> 3.25.81.64.2
> 3.65.62.34.1
> 3.94.72.54.0
> 4.25.82.53.5
> 4.35.23.13.2
> 5.44.93.93.9
> 6.04.94.03.5
> 6.04.34.03.7
> 6.04.44.23.5
> 6.24.54.33.5
> 6.34.64.63.7
> 6.93.74.73.5
> 7.03.94.93.4
> 7.44.24.93.7
> 7.54.04.94.0
> 7.53.95.03.6
> 7.63.55.33.7
> 8.04.06.22.8
> 8.53.67.13.0
> 9.13.17.22.8
> 10.2  2.67.52.6
>8.02.7
>8.72.8
>8.81.3
>9.71.5"
> 
> nfld <- count.fields(textConnection(Lines))
> data.lines <- readLines(textConnection(Lines))
> data.lines <- ifelse(nfld == 2, paste("NA NA", data.lines), data.lines)
> my.data <- read.table(textConnection(data.lines), header = TRUE, skip = 1)
> 
> 
> 
> 
> On Sat, Mar 21, 2009 at 8:13 PM, UBC  wrote:
>>
>> so i am having this question
>> what should i do if the give data file (.txt) has 4 columns, but
>> different
>> lengths?
>> how can i read them in R?
>> any idea for the following problem?
>>
>>
>> Gas consumption (1000 cubic feet) was measured before and after
>> insulation
>> was put into
>> a house. We are interested in looking at the effect of insulation on gas
>> consumption. The
>> average outside temperature (degrees celcius) was also measured. The data
>> are included in
>> the file "insulation.txt".
>>
>> (a) Determine if insulation in the house effects the average gas
>> consumption.
>> (b) How much extra gas is used when there is no insulation? Provide an
>> interval estimate
>> as well as a point estimate.
>>
>> heres the content in "insulation.txt"  (u can just copy and paste it to
>> the
>> notepad so can be read in R)
>>
>> Before insul    After insul.
>> temp    gas     temp    gas
>> -0.8    7.2    -0.7    4.8
>> -0.7    6.9    0.8    4.6
>> 0.4    6.4    1.0    4.7
>> 2.5    6.0    1.4    4.0
>> 2.9    5.8    1.5    4.2
>> 3.2    5.8    1.6    4.2
>> 3.6    5.6    2.3    4.1
>> 3.9    4.7    2.5    4.0
>> 4.2    5.8    2.5    3.5
>> 4.3    5.2    3.1    3.2
>> 5.4    4.9    3.9    3.9
>> 6.0    4.9    4.0    3.5
>> 6.0    4.3    4.0    3.7
>> 6.0    4.4    4.2    3.5
>> 6.2    4.5    4.3    3.5
>> 6.3    4.6    4.6    3.7
>> 6.9    3.7    4.7    3.5
>> 7.0    3.9    4.9    3.4
>> 7.4    4.2    4.9    3.7
>> 7.5    4.0    4.9    4.0
>> 7.5    3.9    5.0    3.6
>> 7.6    3.5    5.3    3.7
>> 8.0    4.0    6.2    2.8
>> 8.5    3.6    7.1    3.0
>> 9.1    3.1    7.2    2.8
>> 10.2  2.6    7.5    2.6
>>                8.0    2.7
>>                8.7    2.8
>>                8.8    1.3
>>                9.7    1.5
>>
>>
>>
>> thx and any ideas would help.
>> --
>> View this message in context:
>> http://www.nabble.com/data-analysis.-R-tp22641912p22641912.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/data-analysis.-R-tp22641912p22643290.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] data analysis. R

2009-03-21 Thread Dylan Beaudette
On Sat, Mar 21, 2009 at 5:13 PM, UBC  wrote:
>
> so i am having this question
> what should i do if the give data file (.txt) has 4 columns, but different
> lengths?
> how can i read them in R?
> any idea for the following problem?
>
>
> Gas consumption (1000 cubic feet) was measured before and after insulation
> was put into
> a house. We are interested in looking at the effect of insulation on gas
> consumption. The
> average outside temperature (degrees celcius) was also measured. The data
> are included in
> the file "insulation.txt".
>
> (a) Determine if insulation in the house effects the average gas
> consumption.
> (b) How much extra gas is used when there is no insulation? Provide an
> interval estimate
> as well as a point estimate.
>
> heres the content in "insulation.txt"  (u can just copy and paste it to the
> notepad so can be read in R)
>
> Before insul    After insul.
> temp    gas     temp    gas
> -0.8    7.2    -0.7    4.8
> -0.7    6.9    0.8    4.6
> 0.4    6.4    1.0    4.7
> 2.5    6.0    1.4    4.0
> 2.9    5.8    1.5    4.2
> 3.2    5.8    1.6    4.2
> 3.6    5.6    2.3    4.1
> 3.9    4.7    2.5    4.0
> 4.2    5.8    2.5    3.5
> 4.3    5.2    3.1    3.2
> 5.4    4.9    3.9    3.9
> 6.0    4.9    4.0    3.5
> 6.0    4.3    4.0    3.7
> 6.0    4.4    4.2    3.5
> 6.2    4.5    4.3    3.5
> 6.3    4.6    4.6    3.7
> 6.9    3.7    4.7    3.5
> 7.0    3.9    4.9    3.4
> 7.4    4.2    4.9    3.7
> 7.5    4.0    4.9    4.0
> 7.5    3.9    5.0    3.6
> 7.6    3.5    5.3    3.7
> 8.0    4.0    6.2    2.8
> 8.5    3.6    7.1    3.0
> 9.1    3.1    7.2    2.8
> 10.2  2.6    7.5    2.6
>                8.0    2.7
>                8.7    2.8
>                8.8    1.3
>                9.7    1.5
>
>
>
> thx and any ideas would help.

Dude- really? This is just a funky-format version of the whiteside
data found in the MASS package:

library(MASS)
whiteside
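
With that data set the assignment's questions reduce to a couple of lines; a
sketch, assuming the simple additive model is what is wanted:

library(MASS)
fit <- lm(Gas ~ Temp + Insul, data = whiteside)  # insulation effect adjusted for outside temperature
summary(fit)   # the test on the Insul coefficient addresses (a)
confint(fit)   # point and interval estimates for (b)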


See the posting guide (http://www.r-project.org/posting-guide.html),
especially the section on homework questions.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis. R

2009-03-21 Thread Gabor Grothendieck
This works with the example.  If the real data is different it may not
work.  To run the example below just copy and paste it into R.
To run with the real data replace textConnection(Lines) with
"insulation.txt" everywhere.

Lines <- "Before insulAfter insul.
tempgas tempgas
-0.87.2-0.74.8
-0.76.90.84.6
0.46.41.04.7
2.56.01.44.0
2.95.81.54.2
3.25.81.64.2
3.65.62.34.1
3.94.72.54.0
4.25.82.53.5
4.35.23.13.2
5.44.93.93.9
6.04.94.03.5
6.04.34.03.7
6.04.44.23.5
6.24.54.33.5
6.34.64.63.7
6.93.74.73.5
7.03.94.93.4
7.44.24.93.7
7.54.04.94.0
7.53.95.03.6
7.63.55.33.7
8.04.06.22.8
8.53.67.13.0
9.13.17.22.8
10.2  2.67.52.6
   8.02.7
   8.72.8
   8.81.3
   9.71.5"

nfld <- count.fields(textConnection(Lines))
data.lines <- readLines(textConnection(Lines))
data.lines <- ifelse(nfld == 2, paste("NA NA", data.lines), data.lines)
my.data <- read.table(textConnection(data.lines), header = TRUE, skip = 1)
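
A possible follow-on (a sketch, not part of the original reply): stack the
before/after halves of my.data into long form and fit the insulation effect.
The column order is assumed to be before-temp, before-gas, after-temp,
after-gas, as in the file.

names(my.data) <- c("temp.before", "gas.before", "temp.after", "gas.after")
long <- rbind(
  data.frame(insul = "before", temp = my.data$temp.before, gas = my.data$gas.before),
  data.frame(insul = "after",  temp = my.data$temp.after,  gas = my.data$gas.after))
long <- na.omit(long)                          # drops the NA-padded rows of the shorter column pair
summary(lm(gas ~ temp + insul, data = long))   # insulation effect adjusted for outside temperature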




On Sat, Mar 21, 2009 at 8:13 PM, UBC  wrote:
>
> so i am having this question
> what should i do if the give data file (.txt) has 4 columns, but different
> lengths?
> how can i read them in R?
> any idea for the following problem?
>
>
> Gas consumption (1000 cubic feet) was measured before and after insulation
> was put into
> a house. We are interested in looking at the effect of insulation on gas
> consumption. The
> average outside temperature (degrees celcius) was also measured. The data
> are included in
> the file "insulation.txt".
>
> (a) Determine if insulation in the house effects the average gas
> consumption.
> (b) How much extra gas is used when there is no insulation? Provide an
> interval estimate
> as well as a point estimate.
>
> heres the content in "insulation.txt"  (u can just copy and paste it to the
> notepad so can be read in R)
>
> Before insul    After insul.
> temp    gas     temp    gas
> -0.8    7.2    -0.7    4.8
> -0.7    6.9    0.8    4.6
> 0.4    6.4    1.0    4.7
> 2.5    6.0    1.4    4.0
> 2.9    5.8    1.5    4.2
> 3.2    5.8    1.6    4.2
> 3.6    5.6    2.3    4.1
> 3.9    4.7    2.5    4.0
> 4.2    5.8    2.5    3.5
> 4.3    5.2    3.1    3.2
> 5.4    4.9    3.9    3.9
> 6.0    4.9    4.0    3.5
> 6.0    4.3    4.0    3.7
> 6.0    4.4    4.2    3.5
> 6.2    4.5    4.3    3.5
> 6.3    4.6    4.6    3.7
> 6.9    3.7    4.7    3.5
> 7.0    3.9    4.9    3.4
> 7.4    4.2    4.9    3.7
> 7.5    4.0    4.9    4.0
> 7.5    3.9    5.0    3.6
> 7.6    3.5    5.3    3.7
> 8.0    4.0    6.2    2.8
> 8.5    3.6    7.1    3.0
> 9.1    3.1    7.2    2.8
> 10.2  2.6    7.5    2.6
>                8.0    2.7
>                8.7    2.8
>                8.8    1.3
>                9.7    1.5
>
>
>
> thx and any ideas would help.
> --
> View this message in context: 
> http://www.nabble.com/data-analysis.-R-tp22641912p22641912.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis. R

2009-03-21 Thread jim holtman
If the input file has a separator other than a space (e.g., tabs or
commas) then you can read it in and the missing data will be NAs and
you can decide how to handle it.  If it does not have a separator,
then maybe you can read it in with read.fwf.  Otherwise, when you read
it in, you can tell the system to 'fill' the missing data, but you
don't really know what columns that might be in.  So you have some
choices; you are able to read in data that may have different lengths
in the columns, but if it is ill-structured, it may be difficult to
determine how to handle the missing data.
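
Two hedged sketches of what that could look like for a file such as the one in
this thread (the tab separator and the column widths below are guesses and need
checking against the actual file):

## tab-separated: short rows come in with NAs, though fill= pads on the right,
## which may or may not be the correct columns -- exactly the caveat above
x <- read.table("insulation.txt", header = TRUE, sep = "\t", fill = TRUE, skip = 1)

## fixed-width layout: give the widths of the four columns explicitly
y <- read.fwf("insulation.txt", widths = c(8, 8, 8, 8), skip = 2)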

On Sat, Mar 21, 2009 at 8:13 PM, UBC  wrote:
>
> so i am having this question
> what should i do if the give data file (.txt) has 4 columns, but different
> lengths?
> how can i read them in R?
> any idea for the following problem?
>
>
> Gas consumption (1000 cubic feet) was measured before and after insulation
> was put into
> a house. We are interested in looking at the effect of insulation on gas
> consumption. The
> average outside temperature (degrees celcius) was also measured. The data
> are included in
> the file "insulation.txt".
>
> (a) Determine if insulation in the house effects the average gas
> consumption.
> (b) How much extra gas is used when there is no insulation? Provide an
> interval estimate
> as well as a point estimate.
>
> heres the content in "insulation.txt"  (u can just copy and paste it to the
> notepad so can be read in R)
>
> Before insul    After insul.
> temp    gas     temp    gas
> -0.8    7.2    -0.7    4.8
> -0.7    6.9    0.8    4.6
> 0.4    6.4    1.0    4.7
> 2.5    6.0    1.4    4.0
> 2.9    5.8    1.5    4.2
> 3.2    5.8    1.6    4.2
> 3.6    5.6    2.3    4.1
> 3.9    4.7    2.5    4.0
> 4.2    5.8    2.5    3.5
> 4.3    5.2    3.1    3.2
> 5.4    4.9    3.9    3.9
> 6.0    4.9    4.0    3.5
> 6.0    4.3    4.0    3.7
> 6.0    4.4    4.2    3.5
> 6.2    4.5    4.3    3.5
> 6.3    4.6    4.6    3.7
> 6.9    3.7    4.7    3.5
> 7.0    3.9    4.9    3.4
> 7.4    4.2    4.9    3.7
> 7.5    4.0    4.9    4.0
> 7.5    3.9    5.0    3.6
> 7.6    3.5    5.3    3.7
> 8.0    4.0    6.2    2.8
> 8.5    3.6    7.1    3.0
> 9.1    3.1    7.2    2.8
> 10.2  2.6    7.5    2.6
>                8.0    2.7
>                8.7    2.8
>                8.8    1.3
>                9.7    1.5
>
>
>
> thx and any ideas would help.
> --
> View this message in context: 
> http://www.nabble.com/data-analysis.-R-tp22641912p22641912.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] data analysis. R

2009-03-21 Thread UBC

So I have this question:
what should I do if the given data file (.txt) has 4 columns, but of different
lengths?
How can I read them into R?
Any ideas for the following problem?


Gas consumption (1000 cubic feet) was measured before and after insulation
was put into
a house. We are interested in looking at the effect of insulation on gas
consumption. The
average outside temperature (degrees celcius) was also measured. The data
are included in
the file "insulation.txt".

(a) Determine if insulation in the house effects the average gas
consumption.
(b) How much extra gas is used when there is no insulation? Provide an
interval estimate
as well as a point estimate.

Here's the content of "insulation.txt" (you can just copy and paste it into
Notepad so it can be read in R):

Before insul    After insul.
temp    gas     temp    gas
-0.8    7.2     -0.7    4.8
-0.7    6.9     0.8     4.6
0.4     6.4     1.0     4.7
2.5     6.0     1.4     4.0
2.9     5.8     1.5     4.2
3.2     5.8     1.6     4.2
3.6     5.6     2.3     4.1
3.9     4.7     2.5     4.0
4.2     5.8     2.5     3.5
4.3     5.2     3.1     3.2
5.4     4.9     3.9     3.9
6.0     4.9     4.0     3.5
6.0     4.3     4.0     3.7
6.0     4.4     4.2     3.5
6.2     4.5     4.3     3.5
6.3     4.6     4.6     3.7
6.9     3.7     4.7     3.5
7.0     3.9     4.9     3.4
7.4     4.2     4.9     3.7
7.5     4.0     4.9     4.0
7.5     3.9     5.0     3.6
7.6     3.5     5.3     3.7
8.0     4.0     6.2     2.8
8.5     3.6     7.1     3.0
9.1     3.1     7.2     2.8
10.2    2.6     7.5     2.6
                8.0     2.7
                8.7     2.8
                8.8     1.3
                9.7     1.5



Thanks, and any ideas would help.
-- 
View this message in context: 
http://www.nabble.com/data-analysis.-R-tp22641912p22641912.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data Analysis Functions in R

2008-12-09 Thread Dirk Eddelbuettel
On Mon, Dec 08, 2008 at 09:34:35PM -0800, Feanor22 wrote:
> 
> Hi experts of R,
> 
> Are there any functions in R to test a univariate series for long memory
> effects, structural breaks and time reversability?
> I've found for ARCH effects(ArchTest), for normal (Shapiro.test,
> KS.test(comparing with randn) and lillie.test) but not for the above
> mentioned.
> Where can I find a comprehensive list of functions available by type?

Please try the CRAN Task views for EmpiricalFinance, Econometrics and 
TimeSeries.
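
For example, two packages from that corner of CRAN (a sketch on simulated data;
it covers long memory and structural breaks, but I am not aware of a standard
package for time reversibility):

library(fracdiff)      # ARFIMA / long-memory estimation
library(strucchange)   # structural break tests
set.seed(1)
x <- fracdiff.sim(500, d = 0.3)$series
fracdiff(x)            # estimate the fractional differencing parameter d
breakpoints(x ~ 1)     # Bai-Perron style breakpoints in the mean of the series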

Dirk

-- 
Three out of two people have difficulties with fractions.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Data Analysis Functions in R

2008-12-08 Thread Feanor22

Hi experts of R,

Are there any functions in R to test a univariate series for long-memory
effects, structural breaks and time reversibility?
I've found tests for ARCH effects (ArchTest) and for normality (Shapiro.test,
KS.test (comparing with randn) and lillie.test), but not for the ones
mentioned above.
Where can I find a comprehensive list of functions available by type?

Thank you

Renato Costa
-- 
View this message in context: 
http://www.nabble.com/Data-Analysis-Functions-in-R-tp20909079p20909079.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis projects wi

2008-03-19 Thread Duncan Murdoch
On 3/19/2008 10:18 AM, (Ted Harding) wrote:
> On 19-Mar-08 10:34:12, Rory Winston wrote:
>> Me too. Getting directly spammed like this is really annoying.
>> I dont mind a general post to the list, but individually
>> spamming each member of the list is unacceptable. Especially
>> as I have no interest in the stupid product in question.
> 
> It's not worth giving in to negative emotions when you receive
> this stuff. It's on the same footing as a bird-dropping on
> your car windscreen -- just wipe it off, and carry on as usual.
> 
> However, I do agree that individually spamming each member of
> the list is unacceptable.
> 
> I just received my personalised copy too.
> 
> I note from the headers that it was distributed by cpbounce.com
> See:
> 
>   http://www.aboutus.org/Icpbounce.com
> 
> I also see a header:
> 
> X-List-Unsubscribe:
>  listunsubscribe.php?r=8703476&l=4762&s=VH7B&m=121192&c=224770>
> 
> [all one line]
> 
> Does that mean that either I, or us, or the R-help list,
> am/are/is now subscribed to some icpbounce spam-dissemination
> list?
> 
> It is clear that the particular sender, Ben Hinchliffe,
> has been acting reprehensibly. This would possibly be a
> breach of the Data Protection Act and/or the Misuse of Computers
> Act in the UK.
> 
> But it also seems possible that we may now collectively be
> in the clutches of this icpbounce bunch.
> 
> I hate to suggest it (they have enough on their plates in
> any case), but might the R-help owners consider following
> this up with icpbounce and/or Hinchliffe, and getting this
> undone?

I don't think this is a reasonable request.  The mailing list admins 
have no more or less ability to deal with this than anyone else on the 
list.  If you don't like the spam, talk to the spammer (or the spammer's 
service provider), or "just wipe it off".

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis projects wi

2008-03-19 Thread Ted Harding
On 19-Mar-08 10:34:12, Rory Winston wrote:
> Me too. Getting directly spammed like this is really annoying.
> I dont mind a general post to the list, but individually
> spamming each member of the list is unacceptable. Especially
> as I have no interest in the stupid product in question.

It's not worth giving in to negative emotions when you receive
this stuff. It's on the same footing as a bird-dropping on
your car windscreen -- just wipe it off, and carry on as usual.

However, I do agree that individually spamming each member of
the list is unacceptable.

I just received my personalised copy too.

I note from the headers that it was distributed by cpbounce.com
See:

  http://www.aboutus.org/Icpbounce.com

I also see a header:

X-List-Unsubscribe:


[all one line]

Does that mean that either I, or us, or the R-help list,
am/are/is now subscribed to some icpbounce spam-dissemination
list?

It is clear that the particular sender, Ben Hinchliffe,
has been acting reprehensibly. This would possibly be a
breach of the Data Protection Act and/or the Misuse of Computers
Act in the UK.

But it also seems possible that we may now collectively be
in the clutches of this icpbounce bunch.

I hate to suggest it (they have enough on their plates in
any case), but might the R-help owners consider following
this up with icpbounce and/or Hinchliffe, and getting this
undone?

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 19-Mar-08   Time: 14:18:13
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis projects with Microsoft Office for free

2008-03-19 Thread ahimsa campos-arceiz
The guys didn't need to work very hard to obtain our data. (My understanding
is that) all the mails from this list are publicly available at the R-help
archives. You will find yourself here:

http://tolstoy.newcastle.edu.au/R/e4/help/08/03/index.html

That is why we can (and must) browse through the history of the mailing list to
avoid asking an already-answered question again. The bad side is that our
names and email addresses are exposed.

just in case: I have nothing to do with the spammers





On Tue, Mar 18, 2008 at 9:08 PM, Gorden T Jemwa <[EMAIL PROTECTED]> wrote:

> Dear R Admins,
>
> I received an unsolicited e-mail from BlueInference as an R
> user. Does it mean that R is sharing its user database (our e-mails and
> names) with third parties without our
> consent? Or perhaps the BlueInference guys are using an
> e-mail address miner to get our contact details?
>
>
>
> [SNIP]
>
> Dear Gorden Jemwa,
>
> As a fellow R user, I am sure you agree with me that R is a
> dear gift from the R-project community that should enjoy
> broad use.  Towards that end, we've built a software
> solution directed at the very large community of Microsoft
> Office users, called Inference for Office.  It combines the
> powerful data-analysis capabilities of R with the familiar
> and flexible word-processing and data-preparation features
> of Microsoft Word and Excel.  We are making Inference for
> Office available for free to R users at educational and
> non-profit research institutions.  A free trial is available
> for everyone.
>
> With Inference for Office, you can assemble all the elements
> of an R data-analysis project (text, data, R objects, R
> code) into dynamic documents.  These dynamic documents can
> then be executed in real-time to create results documents
> containing all the output and graphics.  If Inference for
> Office is of no interest to you, please disregard this
> message and accept our humble apologies for having bothered you.
>
> If Inference for Office sounds like it might useful, you can
> obtain additional information by visiting our website and
> viewing a two-minute screencast overview of Inference for
> Office:
>
> http://www.inference.us
>
> While you're there, you can also download a free trial of
> Inference for Office
>
> To your success,
>
>  --Ben
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> . . . . .
> Ben Hinchliffe
> Inference Evangelist
> BlueReference, Inc.
> [EMAIL PROTECTED]
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> . . . . .
> website:  www.inference.us
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
ahimsa campos-arceiz
PhD candidate
Lab of Biodiversity Science
The University of Tokyo
www.camposarceiz.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis projects with Microsoft Office for free

2008-03-19 Thread Rory Winston

Me too. Getting directly spammed like this is really annoying. I don't mind a
general post to the list, but individually spamming each member of the list
is unacceptable, especially as I have no interest in the stupid product in
question.



Gorden T Jemwa wrote:
> 
> Dear R Admins,
> 
> I received an unsolicited e-mail from BlueInference as an R 
> user. Does it mean that R is sharing its user database (our 
> e-mails and names) with third parties without our consent? Or 
> perhaps the BlueInference guys are using an e-mail address 
> miner to get our contact details?
> 
> 
> 
> [SNIP]
> 
> Dear Gorden Jemwa,
> 
> As a fellow R user, I am sure you agree with me that R is a 
> dear gift from the R-project community that should enjoy 
> broad use.  Towards that end, we’ve built a software 
> solution directed at the very large community of Microsoft 
> Office users, called Inference for Office.  It combines the 
> powerful data-analysis capabilities of R with the familiar 
> and flexible word-processing and data-preparation features 
> of Microsoft Word and Excel.  We are making Inference for 
> Office available for free to R users at educational and 
> non-profit research institutions.  A free trial is available 
> for everyone.
> 
> With Inference for Office, you can assemble all the elements 
> of an R data-analysis project (text, data, R objects, R 
> code) into dynamic documents.  These dynamic documents can 
> then be executed in real-time to create results documents 
> containing all the output and graphics.  If Inference for 
> Office is of no interest to you, please disregard this 
> message and accept our humble apologies for having bothered you.
> 
> If Inference for Office sounds like it might be useful, you can 
> obtain additional information by visiting our website and 
> viewing a two-minute screencast overview of Inference for 
> Office:
> 
> http://www.inference.us
> 
> While you're there, you can also download a free trial of 
> Inference for Office
> 
> To your success,
> 
>   --Ben
> 
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
> . . . . .
> Ben Hinchliffe
> Inference Evangelist
> BlueReference, Inc.
> [EMAIL PROTECTED]
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
> . . . . .
> website:  www.inference.us
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/UNSOLITED-E_MAILS%3A-Integrate-R-data-analysis-projects-with-Microsoft-Office-for-free-tp16119878p16142681.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis projects wi

2008-03-18 Thread Doran, Harold
I think on the python list, when you review the archives, the poster
address is obfuscated, a bit like a CAPTCHA. So it makes it slightly more
difficult (though not impossible) to pull out poster email addresses
displayed as john.doe at domainname.com and turn them back into real
addresses.
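
As a rough illustration (nothing to do with the python archive's real code),
that kind of display-side masking is just a one-line substitution in R; the
helper name below is made up:

  ## Hypothetical helper: show an address the way such archives display it,
  ## so a naive search for "@" finds nothing to harvest.
  mask_address <- function(x) sub("@", " at ", x, fixed = TRUE)
  mask_address("john.doe@domainname.com")   # "john.doe at domainname.com"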



> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf 
> Of Douglas Bates
> Sent: Tuesday, March 18, 2008 8:39 AM
> To: Doran, Harold
> Cc: [EMAIL PROTECTED]; Gorden T Jemwa; r-help@r-project.org
> Subject: Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis 
> projects wi
> 
> Usually a captcha is used to prevent creation of email 
> accounts for use by spammers.  (There was an interesting 
> article recently on whether the Gmail captcha scheme had been 
> broken so that spammers could create masses of gmail 
> accounts.  The general conclusion is that the captcha scheme 
> is intact but spammers hire people in low-wage countries to 
> manually respond to the captcha challenge.)
> 
> What Ted has suggested and what I am confident is the case is 
> that email addresses of posters were obtained from list 
> archives or something like that.  I know for a fact that the 
> R Foundation is not selling any email lists. The idea that R 
> Core has engaged in a nefarious money-making scheme of 
> spending more than a decade developing high-quality open 
> source software, providing support, enhancements, 
> conferences, email lists, etc. so they could "cash out"
> by selling a mailing list for a modest amount of money seems, 
> well, unlikely.
> 
> If email addresses are being extracted from the archives then 
> the only place a captcha would help is when viewing the 
> archives.  Requiring everyone to submit the solution to a 
> captcha before retrieving a message from the archives would 
> be tedious and make the archives essentially useless.  
> Besides, all that is required is for one person to 
> legitimately subscribe to the lists and run their own filters 
> on the incoming email to extract the addresses of posters.  
> My guess is that Ben Hinchliffe or someone else at 
> Bluereference.com is already subscribed.
> 
> The best way to discourage such questionable practices is not 
> to patronize organizations that use them.
> 
> On Tue, Mar 18, 2008 at 7:48 AM, Doran, Harold <[EMAIL PROTECTED]> wrote:
> > Can a CAPTCHA be implemented as a preventative measure?
> >
> >  > -Original Message-
> >  > From: [EMAIL PROTECTED]
> >  > [mailto:[EMAIL PROTECTED] On Behalf Of
> >  > [EMAIL PROTECTED]
> >  > Sent: Tuesday, March 18, 2008 7:33 AM
> >  > To: Gorden T Jemwa
> >  > Cc: r-help@r-project.org
> >  > Subject: Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis
> >  > projects wi
> >  >
> >  > On 18-Mar-08 12:08:44, Gorden T Jemwa wrote:
> >  > > Dear R Admins,
> >  > >
> >  > > I received an unsolicited e-mail from BlueInference as an R
> >  > > user. Does it mean that R is sharing its user database (our
> >  > > e-mails and names) with third parties without our consent? Or
> >  > > perhaps the BlueInference guys are using an e-mail address
> >  > > miner to get our contact details?
> >  > > [SNIP]
> >  > > Dear Gorden Jemwa,
> >  > >
> >  > > As a fellow R user, I am sure you agree with me that R is a
> >  > > dear gift from the R-project community that should enjoy
> >  > > broad use.
> >  > > [...]
> >  > > Ben Hinchliffe
> >  > > Inference Evangelist
> >  > > BlueReference, Inc.
> >  > > [EMAIL PROTECTED]
> >  >
> >  > It would not be difficult to mine a database of email addresses
> >  > from the R-help archives. Each month's postings can be downloaded
> >  > as a .gz file. Each posting in the resulting unzipped .txt file
> >  > has a line of the form
> >  >
> >  >   From: user.name at email.domain
> >  >
> >  > and all that's then needed is to replace " at " with "@", and
> >  > you have the email address.
> >  >
> >  > On a Unix system, a quick 'grep | sed' would do the job in a second!
> >  >
> >  > In this case, the spam was clearly carefully targeted at R users,
> >  > so quite possibly they took a bit more trouble over it (to the
> >  > point of extracting full names as well).
> >  >
> >  > I can't see the R people deliberately sharing their database,
> >  > and the list of subscribed email addresses is accessible only
> >  > to the list owners. [...]

Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis projects wi

2008-03-18 Thread Douglas Bates
Usually a captcha is used to prevent creation of email accounts for
use by spammers.  (There was an interesting article recently on
whether the Gmail captcha scheme had been broken so that spammers
could create masses of gmail accounts.  The general conclusion is that
the captcha scheme is intact but spammers hire people in low-wage
countries to manually respond to the captcha challenge.)

What Ted has suggested and what I am confident is the case is that
email addresses of posters were obtained from list archives or
something like that.  I know for a fact that the R Foundation is not
selling any email lists. The idea that R Core has engaged in a
nefarious money-making scheme of spending more than a decade
developing high-quality open source software, providing support,
enhancements, conferences, email lists, etc. so they could "cash out"
by selling a mailing list for a modest amount of money seems, well,
unlikely.

If email addresses are being extracted from the archives then the only
place a captcha would help is when viewing the archives.  Requiring
everyone to submit the solution to a captcha before retrieving a
message from the archives would be tedious and make the archives
essentially useless.  Besides, all that is required is for one person
to legitimately subscribe to the lists and run their own filters on
the incoming email to extract the addresses of posters.  My guess is
that Ben Hinchliffe or someone else at Bluereference.com is already
subscribed.

The best way to discourage such questionable practices is not to
patronize organizations that use them.

On Tue, Mar 18, 2008 at 7:48 AM, Doran, Harold <[EMAIL PROTECTED]> wrote:
> Can a CAPTCHA be implemented as a preventative measure?
>
>
>  > -Original Message-
>  > From: [EMAIL PROTECTED]
>  > [mailto:[EMAIL PROTECTED] On Behalf Of
>  > [EMAIL PROTECTED]
>
>
> > Sent: Tuesday, March 18, 2008 7:33 AM
>  > To: Gorden T Jemwa
>  > Cc: r-help@r-project.org
>  > Subject: Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis
>  > projects wi
>  >
>  > On 18-Mar-08 12:08:44, Gorden T Jemwa wrote:
>  > > Dear R Admins,
>  > >
>  > > I received an unsolicited e-mail from BlueInference as an R
>  > > user. Does it mean that R is sharing its user database (our
>  > > e-mails and names) with third parties without our consent? Or
>  > > perhaps the BlueInference guys are using an e-mail address
>  > > miner to get our contact details?
>  > > [SNIP]
>  > > Dear Gorden Jemwa,
>  > >
>  > > As a fellow R user, I am sure you agree with me that R is a
>  > dear gift
>  > > from the R-project community that should enjoy broad use.
>  > > [...]
>  > > Ben Hinchliffe
>  > > Inference Evangelist
>  > > BlueReference, Inc.
>  > > [EMAIL PROTECTED]
>  >
>  > It would not be difficult to mine a database of email
>  > addresses from the R-help archives. Each month's postings can
>  > be downloaded as a .gz file. Each posting in the resulting
>  > unzipped .txt file has a line of the form
>  >
>  >   From: user.name at email.domain
>  >
>  > and all that's then needed is to replace " at " with "@", and
>  > you have the email address.
>  >
>  > On a Unix system, a quick 'grep | sed' would do the job in a second!
>  >
>  > In this case, the spam was clearly carefully targeted at R
>  > users, so quite possibly they took a bit more trouble over it
>  > (to the point of extracting full names as well).
>  >
>  > I can't see the R people deliberately sharing their database,
>  > and the list of subscribed email addresses is accessible only
>  > to the list owners. So it seems much more likely that the
>  > publicly readable archives have been mined along the lines I
>  > suggest above.
>  >
>  > Best wishes,
>  > Ted.
>  >
>  > 
>  > E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
>  > Fax-to-email: +44 (0)870 094 0861
>  > Date: 18-Mar-08   Time: 12:32:30
>  > -- XFMail --
>  >
>  > __
>  > R-help@r-project.org mailing list
>  > https://stat.ethz.ch/mailman/listinfo/r-help
>  > PLEASE do read the posting guide
>  > http://www.R-project.org/posting-guide.html
>  > and provide commented, minimal, self-contained, reproducible code.
>  >
>
>  __
>  R-help@r-project.org mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-help
>  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>  and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis projects wi

2008-03-18 Thread Doran, Harold
Can a CAPTCHA be implemented as a preventative measure?

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> [EMAIL PROTECTED]
> Sent: Tuesday, March 18, 2008 7:33 AM
> To: Gorden T Jemwa
> Cc: r-help@r-project.org
> Subject: Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis 
> projects wi
> 
> On 18-Mar-08 12:08:44, Gorden T Jemwa wrote:
> > Dear R Admins,
> > 
> > I received an unsolicited e-mail from BlueInference as an R 
> > user. Does it mean that R is sharing its user database (our 
> > e-mails and names) with third parties without our consent? Or 
> > perhaps the BlueInference guys are using an e-mail address 
> > miner to get our contact details?
> > [SNIP]
> > Dear Gorden Jemwa,
> > 
> > As a fellow R user, I am sure you agree with me that R is a 
> dear gift 
> > from the R-project community that should enjoy broad use.
> > [...]
> > Ben Hinchliffe
> > Inference Evangelist
> > BlueReference, Inc.
> > [EMAIL PROTECTED]
> 
> It would not be difficult to mine a database of email 
> addresses from the R-help archives. Each month's postings can 
> be downloaded as a .gz file. Each posting in the resulting 
> unzipped .txt file has a line of the form
> 
>   From: user.name at email.domain
> 
> and all that's then needed is to replace " at " with "@", and 
> you have the email address.
> 
> On a Unix system, a quick 'grep | sed' would do the job in a second!
> 
> In this case, the spam was clearly carefully targeted at R 
> users, so quite possibly they took a bit more trouble over it 
> (to the point of extracting full names as well).
> 
> I can't see the R people deliberately sharing their database, 
> and the list of subscribed email addresses is accessible only 
> to the list owners. So it seems much more likely that the 
> publicly readable archives have been mined along the lines I 
> suggest above.
> 
> Best wishes,
> Ted.
> 
> 
> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> Fax-to-email: +44 (0)870 094 0861
> Date: 18-Mar-08   Time: 12:32:30
> -- XFMail --
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNSOLITED E_MAILS: Integrate R data-analysis projects wi

2008-03-18 Thread Ted Harding
On 18-Mar-08 12:08:44, Gorden T Jemwa wrote:
> Dear R Admins,
> 
> I received an unsolicited e-mail from BlueInference as an R 
> user. Does it mean that R is sharing its user database (our 
> e-mails and names) with third parties without our consent? Or 
> perhaps the BlueInference guys are using an e-mail address 
> miner to get our contact details?
> [SNIP]
> Dear Gorden Jemwa,
> 
> As a fellow R user, I am sure you agree with me that R is a 
> dear gift from the R-project community that should enjoy 
> broad use.
> [...]
> Ben Hinchliffe
> Inference Evangelist
> BlueReference, Inc.
> [EMAIL PROTECTED]

It would not be difficult to mine a database of email addresses
from the R-help archives. Each month's postings can be downloaded
as a .gz file. Each posting in the resulting unzipped .txt file
has a line of the form

  From: user.name at email.domain

and all that's then needed is to replace " at " with "@", and
you have the email address.

On a Unix system, a quick 'grep | sed' would do the job
in a second!
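
For concreteness, a minimal sketch in R of the same recipe (an illustration
only, not anyone's actual harvesting code; the archive file name below is
made up):

  ## Follow the "From: user.name at email.domain" line format described
  ## above for one (hypothetical) unzipped monthly archive file.
  arch <- readLines("2008-March.txt", warn = FALSE)
  from <- grep("^From: ", arch, value = TRUE)   # keep only the From: lines
  addr <- sub("^From: ", "", from)              # drop the "From: " prefix
  addr <- sub(" at ", "@", addr, fixed = TRUE)  # undo the " at " obfuscation
  unique(addr)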

In this case, the spam was clearly carefully targeted at R users,
so quite possibly they took a bit more trouble over it (to the
point of extracting full names as well).

I can't see the R people deliberately sharing their database,
and the list of subscribed email addresses is accessible only
to the list owners. So it seems much more likely that the
publicly readable archives have been mined along the lines
I suggest above.

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Mar-08   Time: 12:32:30
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] UNSOLITED E_MAILS: Integrate R data-analysis projects with Microsoft Office for free

2008-03-18 Thread Gorden T Jemwa
Dear R Admins,

I received an unsolicited e-mail from BlueInference as an R 
user. Does it mean that R is sharing its user database (our 
e-mails and names) with third parties without our 
consent? Or perhaps the BlueInference guys are using an 
e-mail address miner to get our contact details?



[SNIP]

Dear Gorden Jemwa,

As a fellow R user, I am sure you agree with me that R is a 
dear gift from the R-project community that should enjoy 
broad use.  Towards that end, we’ve built a software 
solution directed at the very large community of Microsoft 
Office users, called Inference for Office.  It combines the 
powerful data-analysis capabilities of R with the familiar 
and flexible word-processing and data-preparation features 
of Microsoft Word and Excel.  We are making Inference for 
Office available for free to R users at educational and 
non-profit research institutions.  A free trial is available 
for everyone.

With Inference for Office, you can assemble all the elements 
of an R data-analysis project (text, data, R objects, R 
code) into dynamic documents.  These dynamic documents can 
then be executed in real-time to create results documents 
containing all the output and graphics.  If Inference for 
Office is of no interest to you, please disregard this 
message and accept our humble apologies for having bothered you.

If Inference for Office sounds like it might be useful, you can 
obtain additional information by visiting our website and 
viewing a two-minute screencast overview of Inference for 
Office:

http://www.inference.us

While you're there, you can also download a free trial of 
Inference for Office

To your success,

  --Ben

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . .
Ben Hinchliffe
Inference Evangelist
BlueReference, Inc.
[EMAIL PROTECTED]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . .
website:  www.inference.us

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.