Re: [R] p values from GLM

2016-04-03 Thread John Maindonald
How small does a p-value need to be to warrant attention, however?

Witness Fisher’s comment that:
“. . . we may, if we prefer it, draw the line at one in fifty (the 2 per cent 
point), or one in a hundred (the 1 per cent point). Personally, the writer 
prefers to set a low standard of significance at the 5 per cent point, and 
ignore entirely all results which fail to reach this level. A scientific fact 
should be regarded as experimentally established only if a properly designed 
experiment rarely fails to give this level of significance.”
[Fisher RA (1926), “The Arrangement of Field Experiments,” Journal of the 
Ministry of Agriculture of Great Britain, 33, 503-513.]

See the selection of Fisher quotes at http://www.jerrydallal.com/lhsp/p05.htm .

In contexts where p <= 0.05 is more likely under the alternative than under 
the NULL (not the case if the experiment might just as well have been a random 
number generator), small p-values shift the weight of evidence.  An alternative 
that is a priori highly unlikely takes a lot of shifting.
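
A rough numerical sketch of that last point, using Bayes' rule on the event "p <= 0.05" rather than on an exact p-value. The helper name post_prob, the power of 0.8, and the two prior probabilities are illustrative assumptions, not figures from Fisher or from the thread:

```r
# Probability that the alternative is true given that p <= alpha was observed.
# prior = a priori probability of the alternative (assumed value),
# power = P(p <= alpha | alternative) (assumed value),
# alpha = P(p <= alpha | NULL).
post_prob <- function(prior, power = 0.8, alpha = 0.05) {
  prior * power / (prior * power + (1 - prior) * alpha)
}

post_prob(prior = 0.5)   # about 0.94
post_prob(prior = 0.01)  # about 0.14
```

With an even prior, one significant result moves the alternative to roughly 94%; with a 1-in-100 prior the same result only gets it to roughly 14%.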


John Maindonald email: 
john.maindon...@anu.edu.au


On 3/04/2016, at 22:00, 
r-help-requ...@r-project.org wrote:

From: Heinz Tuechler
Subject: Re: [R] p values from GLM
Date: 3 April 2016 11:00:50 NZST
To: Bert Gunter, Duncan Murdoch
Cc: r-help



Bert Gunter wrote on 01.04.2016 23:46:
... of course, whether one **should** get them is questionable...

http://www.nature.com/news/statisticians-issue-warning-over-misuse-of-p-values-1.19503#/ref-link-1

This paper repeats the commonplace statement that a small p-value does not 
necessarily indicate an important finding. Agreed, but maybe I overlooked 
examples of important findings with large p-values.
If there are some, I would be happy to get to know some of them. Otherwise a 
small p-value is no guarantee of importance, but a prerequisite.

best regards,

Heinz


Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Apr 1, 2016 at 3:26 PM, Duncan Murdoch wrote:
On 01/04/2016 6:14 PM, John Sorkin wrote:
How can I get the p values from a glm ? I want to get the p values so I
can add them to a custom report


  fitwean <- glm(JWean ~ Group, data = data, family = binomial(link = "logit"))
  summary(fitwean)         # This lists the coefficients, SEs, z and p values,
                           # but I can't isolate the p values.
  names(summary(fitwean))  # I see the coefficients, but not the p values
  names(fitwean)           # p values are not found here.

Doesn't summary(fitwean) give a matrix? Then it's
colnames(summary(fitwean)$coefficients) you want, not names(fitwean).

Duncan Murdoch

P.S. If you had given a reproducible example, I'd try it myself.
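
A minimal reproducible sketch of that suggestion. The original data were not posted, so the built-in mtcars data set stands in here, with am playing the role of JWean and wt the role of Group; both are assumptions made purely for illustration:

```r
# Stand-in data: mtcars ships with R; am (0/1) is used as the binary response.
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

ctab <- coef(summary(fit))      # same matrix as summary(fit)$coefficients
colnames(ctab)                  # "Estimate" "Std. Error" "z value" "Pr(>|z|)"
pvals <- ctab[, "Pr(>|z|)"]     # named numeric vector, one p value per term
pvals
```

For families with an estimated dispersion (gaussian, quasibinomial, and so on) the last column is named "Pr(>|t|)" instead, so selecting column 4 by position, ctab[, 4], is a name-independent alternative.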




Thank you!
John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and
Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for the sole use of the
intended recipient(s) and may contain confidential and privileged
information. Any unauthorized use, disclosure or distribution is prohibited.
If you are not the intended recipient, please contact the sender by reply
email and destroy all copies of the original message.
__
R-help@r-project.org mailing list -- To 
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] apply mean function to a subset of data

2016-04-03 Thread David L Carlson
Here are several ways to get there, but your original loop is fine once it is 
corrected:

> for (i in 1:2)  smean[i] <- mean(toy$diam[toy$group==i][1:nsel[i]])
> smean
[1] 0.271489 1.117015

Using sapply() to hide the loop:
> smean <- sapply(1:2, function(x) mean((toy$diam[toy$group==x])[1:nsel[x]]))
> smean
[1] 0.271489 1.117015

Or use head()
> smean <- sapply(1:2, function(x) mean(head(toy$diam[toy$group==x], nsel[x])))
> smean
[1] 0.271489 1.117015

Or mapply() instead of sapply
> smean <- mapply(function(x, y) mean(head(x, y)), x=split(toy$diam, toy$group), y=nsel)
> smean
       1        2 
0.271489 1.117015
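
If a fully vectorized version is wanted, one possible sketch uses ave() to number the rows within each group and then keeps the rows whose number is within the quota. The seed and the names pos and keep are assumptions for illustration, and because the data are random the means will not match the output printed above:

```r
set.seed(1)  # arbitrary seed so the toy data are reproducible
toy  <- data.frame(group = c(rep(1, 10), rep(2, 8)),
                   diam  = c(rnorm(10), rnorm(8)))
nsel <- c(6, 4)  # rows to keep from group 1 and group 2

# Position of each row within its own group: 1, 2, 3, ...
pos  <- ave(seq_along(toy$group), toy$group, FUN = seq_along)
# Keep a row if its position does not exceed the quota for its group.
# match() maps group labels to positions in nsel (groups in sorted order).
keep <- pos <= nsel[match(toy$group, sort(unique(toy$group)))]
smean <- tapply(toy$diam[keep], toy$group[keep], mean)
smean
```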

--
David L. Carlson
Department of Anthropology
Texas A&M University

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Lemon
Sent: Saturday, April 2, 2016 6:14 PM
To: Pedro Mardones 
Cc: r-help mailing list 
Subject: Re: [R] apply mean function to a subset of data

Hi Pedro,
This may not be much of an improvement, but it was a challenge.

selvec<-as.vector(matrix(c(nsel,unlist(by(toy$diam,toy$group,length))-nsel),
 ncol=2,byrow=TRUE))
TFvec<-rep(c(TRUE,FALSE),length.out=length(selvec))
toynsel<-rep(TFvec,selvec)
by(toy[toynsel,]$diam,toy[toynsel,]$group,mean)

Jim

On 4/3/16, Pedro Mardones  wrote:
> Dear all;
>
> This must have a rather simple answer but haven't been able to figure it
> out: I have a data frame with say 2 groups (group 1 & 2). I want to select
> from group 1 say "n" rows and calculate the mean; then select "m" rows from
> group 2 and calculate the mean as well. So far I've been using a for loop
> for doing it but when it comes to a large data set is rather inefficient.
> Any hint to vectorize this would be appreciated.
>
> toy = data.frame(group = c(rep(1,10),rep(2,8)), diam =
> c(rnorm(10),rnorm(8)))
> nsel = c(6,4)
> smean <- c(0,0)
> for (i in 1:2)  smean[i] <- mean(toy$diam[1:nsel[i]])
>
> Thanks
>
> Pedro
>


[R] use one way ANOVA to select genes

2016-04-03 Thread hehsham alpukhity via R-help
I want to select the significant genes from 5 clusters (groups) by one-way 
ANOVA in R.

# I want to use one-way ANOVA to select the significant genes from the clusters above
selectgene <- function(GroupData, pvalue = 0.05, na.rm = TRUE, file = 1:5) {
  # if each group is in one txt file
  fdata <- list.files(data, full.names = TRUE)
  for (i in file) {
    anova()
  }
}
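
One possible way to fill in a skeleton like the one above, as a sketch only. The real files and their layout are not shown, so this assumes the data have already been read into a genes-by-samples matrix (here simulated and called expr) with a factor cluster giving each sample's group; all names and sizes are hypothetical:

```r
set.seed(1)
n_genes <- 50
cluster <- factor(rep(1:5, each = 4))          # 5 clusters, 4 samples each
expr <- matrix(rnorm(n_genes * length(cluster)), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
expr[1, ] <- expr[1, ] + 3 * as.integer(cluster)  # plant one real effect

# One-way ANOVA per gene; take the p value of the cluster term.
pvals <- apply(expr, 1, function(y) anova(lm(y ~ cluster))[["Pr(>F)"]][1])

selected    <- names(pvals)[pvals < 0.05]                    # unadjusted cut-off
selected_bh <- names(pvals)[p.adjust(pvals, "BH") < 0.05]    # multiplicity-adjusted
```

With many genes, the unadjusted cut-off will pick up false positives at roughly the 5% rate, which is why the second selection applies a Benjamini-Hochberg adjustment via p.adjust().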

Hisham AL-bukhaiti Ph.D Student (Information system ) China, changsha,Hunan 
university. Mobile: 0068-15 111 4246 91.

Re: [R] before-after control-impact analysis with R

2016-04-03 Thread Bert Gunter
This has nothing to do with R. Post on an ecology list or talk with
your teachers.

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Apr 2, 2016 at 11:24 PM, MAHFUZATUL IZYAN wrote:
> Hi! I’m Zatul from Malaysia. I’m currently doing simple task on BACI approach 
> in ecology study. I’m a newbie in ecology study. Perhaps, I can get link and 
> some idea regarding how to analyse BACI data. Tq.
>
>
>
>
>
> Regards.
>
>
> Sent from Windows Mail

Re: [R] R model developing & validating - Open to Discussion

2016-04-03 Thread Bert Gunter
This is way OT for this list, and really has nothing to do with R.
Post on a statistical list like stats.stackexchange.com if you want to
repeat a discussion that has gone on for decades and has no
resolution.

You really should be spending time with the literature, though. Have
you? "Cross validation" and "penalized regression" might be a couple
of terms to start you off, although they are far from sufficient, and
others might suggest better ones.
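
For orientation only, a minimal sketch of what k-fold cross validation looks like in base R. The built-in mtcars data, the model formula, and k = 5 are assumptions chosen for illustration, not a recommendation:

```r
set.seed(1)
k <- 5
# Assign each row of the stand-in data to one of k folds at random.
fold <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

cv_rmse <- sapply(seq_len(k), function(i) {
  fit  <- lm(mpg ~ wt + hp, data = mtcars[fold != i, ])   # fit without fold i
  pred <- predict(fit, newdata = mtcars[fold == i, ])     # predict fold i
  sqrt(mean((mtcars$mpg[fold == i] - pred)^2))
})
mean(cv_rmse)  # cross-validated estimate of prediction error
```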

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Apr 2, 2016 at 2:40 PM, Norman Polozka  wrote:
> Throughout my R journey I have noticed the way we can use given data to
> develop and validate a model.
>
> Assume that you have given data for a problem
>
> 1. train.csv
> 2. test.csv
>
> *Method A*
>
> *Combine train+test data* and develop a model using the combined data. Then
> use test.data to validate the model based on predicted error analysis.
>
> *Method B*
>
> Use *train data* to develop the model and then use *test data* to validate
> the model based on predicted error analysis.
>
> *Method C*
>
> Sub divided 75% as training data and 25% test data on *train.csv *file and
> use new training data for developing the model. Then use new test data to
> validate the model.
> After that use initial given test data to double check the performance of
> the model.
>
> I have identified 3 methods so it is bit confusing which one to use.
>
> *Are there any other methods other than these methods?*
>
> I need opinions from R experts on
>
> 1. What is the best practice?
>
> 2. Does that depend on the scale of the problem (smaller data or big data)?
>
> 3. a) Confusion matrix is the only way that can we use to check the
> performance of a model?
>
> b) Is there any other matrices to check the performance?
>
> c) Does it depend on the type of the model(lm(),glm(),tree(),svm()
> etc..)?
>
> d) Do we have different matrices for different models to evaluate the
> model?
>
>
> PS: I have asked this question in stack but no response so I thought to ask
> from you guys
>
> Many thanks
>


Re: [R] File 1 is not in sorted order Error

2016-04-03 Thread Jeff Newmiller
This question belongs on R-devel.
-- 
Sent from my phone. Please excuse my brevity.

On April 2, 2016 7:40:36 PM PDT, Michael Morrison wrote:
>Hi, I'm trying to build R on windows and i'm getting the following
>error
>when i run the "make all recommended" command:
>
>C:/Rtools/mingw_64/bin/windres -F pe-x86-64   -i dllversion.rc -o
>dllversion.o
>comm: file 1 is not in sorted order
>make[4]: *** [Rgraphapp.def] Error 1
>make[3]: *** [rlibs] Error 1
>make[2]: *** [../../bin/x64/R.dll] Error 2
>make[1]: *** [rbuild] Error 2
>make: *** [all] Error 2
>
>
>Can someone please help me figure out this error. I've tried
>researching on
>my own but i'm out of options. Thanks in advance.
>
>Michael Morrison
>


Re: [R] before-after control-impact analysis with R

2016-04-03 Thread MAHFUZATUL IZYAN
Hi! I’m Zatul from Malaysia. I’m currently working on a simple task using the 
BACI approach in an ecology study. I’m a newbie in ecology. Could I perhaps get 
some links and ideas on how to analyse BACI data? Thank you.





Regards. 



[R] File 1 is not in sorted order Error

2016-04-03 Thread Michael Morrison
Hi, I'm trying to build R on Windows and I'm getting the following error
when I run the "make all recommended" command:

C:/Rtools/mingw_64/bin/windres -F pe-x86-64   -i dllversion.rc -o dllversion.o
comm: file 1 is not in sorted order
make[4]: *** [Rgraphapp.def] Error 1
make[3]: *** [rlibs] Error 1
make[2]: *** [../../bin/x64/R.dll] Error 2
make[1]: *** [rbuild] Error 2
make: *** [all] Error 2


Can someone please help me figure out this error? I've tried researching on
my own but I'm out of options. Thanks in advance.

Michael Morrison



[R] R model developing & validating - Open to Discussion

2016-04-03 Thread Norman Polozka
Throughout my R journey I have noticed the way we can use given data to
develop and validate a model.

Assume that you are given data for a problem

1. train.csv
2. test.csv

*Method A*

*Combine train+test data* and develop a model using the combined data. Then
use test.data to validate the model based on predicted error analysis.

*Method B*

Use *train data* to develop the model and then use *test data* to validate
the model based on predicted error analysis.

*Method C*

Subdivide the *train.csv* file into 75% training data and 25% test data, and
use the new training data for developing the model. Then use the new test data to
validate the model.
After that, use the initially given test data to double-check the performance of
the model.
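
As a sketch of what Method C amounts to in code: the built-in mtcars data stand in for train.csv, and the model formula and the RMSE criterion are assumptions made only so that the example runs:

```r
set.seed(1)
full <- mtcars   # stand-in for the contents of train.csv
idx  <- sample(nrow(full), size = floor(0.75 * nrow(full)))
train <- full[idx, ]    # 75% used to develop the model
valid <- full[-idx, ]   # 25% held out to validate it

fit  <- lm(mpg ~ wt + hp, data = train)
rmse <- sqrt(mean((valid$mpg - predict(fit, newdata = valid))^2))
rmse  # prediction error on the held-out 25%
```

The final step of Method C would repeat the last two lines with the rows of test.csv in place of valid.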

I have identified 3 methods, so it is a bit confusing which one to use.

*Are there any other methods other than these methods?*

I need opinions from R experts on

1. What is the best practice?

2. Does that depend on the scale of the problem (smaller data or big data)?

3. a) Is a confusion matrix the only way we can check the
performance of a model?

b) Are there any other metrics to check the performance?

c) Does it depend on the type of the model (lm(), glm(), tree(), svm(),
etc.)?

d) Do we have different metrics for different models to evaluate the
model?


PS: I have asked this question in stack but no response so I thought to ask
from you guys

Many thanks



Re: [R] [Rd] TensorFlow in R

2016-04-03 Thread Florian Schwendinger

Hi Axel,

Maybe the following works for you.
https://bitbucket.org/Fl/tensorflow-r-examples/src/

Florian Schwendinger

On 2016-04-01 18:32, Axel Urbiz wrote:

Hi All,

I didn't have much success through my Google search in finding any active
R-related projects to create a wrapper around TensorFlow in R. Anyone know
if this is on the go?

Thanks,
Axel.


__
r-de...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [R] p values from GLM

2016-04-03 Thread peter dalgaard

> On 03 Apr 2016, at 01:00 , Heinz Tuechler  wrote:
> 
> 
> Bert Gunter wrote on 01.04.2016 23:46:
>> ... of course, whether one **should** get them is questionable...
>> 
>> http://www.nature.com/news/statisticians-issue-warning-over-misuse-of-p-values-1.19503#/ref-link-1
>> 
> This paper repeats the commonplace statement that a small p-value does not 
> necessarily indicate an important finding. Agreed, but maybe I overlooked 
> examples of important findings with large p-values.
> If there are some, I would be happy to get to know some of them. Otherwise a 
> small p-value is no guarantee of importance, but a prerequisite.

This is getting seriously off-topic, but lots of underdimensioned studies would 
qualify. However, the effects found are almost indistinguishable from Type I 
errors. Later, larger, studies would be required to confirm that the effect is 
really there. (Like, halving or doubling the risk of some cancer is hardly 
unimportant, but knowing that that is often the detection limit in 
medium-scaled epidemiological studies may make you a bit jaded when hearing 
such reports.)
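
To put a rough number on "underdimensioned", a small sketch with power.t.test() from the stats package. The effect size of 0.5 SD and the two sample sizes are assumptions chosen for illustration, not values from any study mentioned here:

```r
# Power of a two-sample t test for a true difference of 0.5 SD (assumed effect).
small <- power.t.test(n = 10,  delta = 0.5, sd = 1, sig.level = 0.05)$power
large <- power.t.test(n = 100, delta = 0.5, sd = 1, sig.level = 0.05)$power
c(small = small, large = large)
```

With 10 per group the real effect is missed far more often than it is detected, so a large p-value there says little about importance; with 100 per group the same effect is detected most of the time.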

-pd


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com
