Re: [R] I'm writing this letter to enquire where can I download the package of "lmtest".

2011-11-20 Thread Achim Zeileis

On Mon, 21 Nov 2011, R. Michael Weylandt wrote:

> Please do keep your follow up on list for archival reasons:
>
> This isn't an area of particular expertise for me, but Google suggests
> the lmtest package does do the Durbin-Watson test:


Yes, dwtest() in "lmtest" can perform both one-sided and two-sided 
Durbin-Watson tests, using asymptotic p-values.


If you want to apply the test to a series y without covariates, simply use 
dwtest(y ~ 1).


An alternative implementation, durbinWatsonTest() in "car", uses bootstrap
p-values and can also assess lags greater than 1.
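
For example (a minimal sketch on simulated data; the alternative argument
selects the one- or two-sided version):

library(lmtest)
set.seed(1)
y <- arima.sim(list(ar = 0.5), n = 100)    # simulated AR(1) series
dwtest(y ~ 1, alternative = "two.sided")   # two-sided test, no covariates
dwtest(y ~ 1, alternative = "greater")     # one-sided test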



> http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/lmtest/html/dwtest.html
>
> Michael

2011/11/20 Sophy <100...@mail.tku.edu.tw>:
> Dear Michael:
>   Thanks for your quick reply. I did successfully install the package
> "lmtest". My intention is to do a Durbin-Watson test for a vector of data.
> But the "lmtest" package only tests the residual vector of an lm model.
> Sorry to bother you again. Is there any package and syntax to do a
> "two-sided" Durbin-Watson test for a vector of data and output its p-value?
>
> Best regards,
> Shu-Fei Wu
>

>>  Original Message 
>> Date: Sun, 20 Nov 2011 14:27:21 -0500
>> From: "R. Michael Weylandt"
>> Subject: Re: [R] I'm writing this letter to enquire where can I download
>> the package of "lmtest".
>> To: 100...@mail.tku.edu.tw
>> Cc: r-help@r-project.org
>>
>> http://cran.r-project.org/web/packages/lmtest/index.html
>>
>> Or automatically within R by typing:
>>
>> install.packages("lmtest")
>>
>> Michael

>> On Sun, Nov 20, 2011 at 1:18 PM, Sophy <100...@mail.tku.edu.tw> wrote:
>>> Dear editor:
>>> I'm writing this letter to enquire where I can download the package of
>>> "lmtest". Can you send me this package?
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Shu-Fei Wu

>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating Sequence of Dates

2011-11-20 Thread Rainer Schuermann
> n = 2
> sort( rep( seq(as.Date("2000/1/1"), by="month", length.out=3), n ) )
[1] "2000-01-01" "2000-01-01" "2000-02-01" "2000-02-01" "2000-03-01"
[6] "2000-03-01"


On Sunday 20 November 2011 22:16:36 arunkumar wrote:
> Hi.
>
>   I need to generate a sequence of dates over a period. Each date
> should be generated n times.
>
> E.g.
>
> seq(as.Date("2000/1/1"), by="month", length.out=3)
>
> the output of this is "2000-01-01" "2000-02-01" "2000-03-01"
>
> But I need each date to be generated n times; if n = 2, my output is
>
> "2000-01-01" "2000-01-01" "2000-02-01" "2000-02-01" "2000-03-01"
> "2000-03-01"
> 
> 
> 
> 
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Generating-Sequence-of-Dates-tp4090672p4090672.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating Sequence of Dates

2011-11-20 Thread Jorge I Velez
Hi,

Try

rep(seq(as.Date("2000/1/1"), by="month", length.out=3),  each = 2)

and see ?rep for more info.

HTH,
Jorge.-


On Mon, Nov 21, 2011 at 1:16 AM, arunkumar <> wrote:

> Hi.
>
>   I need to generate a sequence of dates over a period. Each date
> should be generated n times.
>
> E.g.
>
> seq(as.Date("2000/1/1"), by="month", length.out=3)
>
> the output of this is "2000-01-01" "2000-02-01" "2000-03-01"
>
> But I need each date to be generated n times; if n = 2, my output is
>
> "2000-01-01" "2000-01-01" "2000-02-01" "2000-02-01" "2000-03-01"
> "2000-03-01"
>
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Generating-Sequence-of-Dates-tp4090672p4090672.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Need help

2011-11-20 Thread dilshan benaragama
Hi,  R development team,
 
I am trying to use PCA in the labdsv package. I need to build the ordination
plot from scratch. I used the following code (which works in RDA), but I
cannot get the species (variable centroids) onto the ordination plot; I can
only plot the sample-unit scores. Can someone help me with this?
 
pca.tr1<-pca(pca1,dim=2)
plot(pca.tr1$scores, type="n", main="plot")
text(pca.tr1$scores, display="species", col="blue", pch=16)
 
Thanks.
 
Dilshan
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Generating Sequence of Dates

2011-11-20 Thread arunkumar1111
Hi.

  I need to generate a sequence of dates over a period. Each date should be
generated n times.

E.g.

seq(as.Date("2000/1/1"), by="month", length.out=3)

the output of this is "2000-01-01" "2000-02-01" "2000-03-01"

But I need each date to be generated n times; if n = 2, my output is

"2000-01-01" "2000-01-01" "2000-02-01" "2000-02-01" "2000-03-01"
"2000-03-01"




--
View this message in context: 
http://r.789695.n4.nabble.com/Generating-Sequence-of-Dates-tp4090672p4090672.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] coverage plot

2011-11-20 Thread sutada Mungpakdee
Hi,

I'm very much a beginner with R, but I think it is time to start as it is very useful.

I have a read-coverage file (illusmp454merCbed) for a whole genome of ~450 Mbp.
This is the head of the file:
Scaffold  sca_position  coverage
Scaffold1   1   0
Scaffold1   2   0
Scaffold1   3   0
Scaffold1   4   0
Scaffold1   5   0
Scaffold1   6   0
Scaffold1   7   1
Scaffold1   8   3
Scaffold1   9   3

I would like to plot average coverage for every 1 kbp across the whole genome.
May I use R as below? Should I use IRanges?

data <-read.table(file="~/q20snpref/illusmp454merCbed",sep="\t",header=F)
colnames(data)<-c("Scaffold","sca_position","coverage")
depth<-mean(data[,"coverage'])
#depth now has the mean (overall)coverage
#set the bin-size
window<-1001 
rangefrom<-0
rangeto<-length(data[,"sca_position"])
data.1kb<-runmean(data[,"coverage"],k=window)
png(file="cov_1k.png",width=400,height=1000)
plot(x=data[rangefrom:rangeto,"sca_position"],y=data.1kb[rangefrom:rangeto],pch=".",cex1,xlab="bp
 position",ylab="depth",type="p")
dev.off()

I got these errors:
Error: unexpected symbol in:
"rangefrom<-0
rangeto<-length(data[,"sca_position"
> data.1kb<-runmed(data[,"coverage"],k=window)
Error in as.integer(k) : 
  cannot coerce type 'closure' to vector of type 'integer'
> png(file="cov_1k.png",width=400,height=1000)
> plot(x=data[rangefrom:rangeto,"sca_position"],y=data.1kb[rangefrom:rangeto],pch=".",cex1,xlab="bp
>  position",ylab="depth",type="p")
Error in `[.data.frame`(data, rangefrom:rangeto, "sca_position") : 
  object 'rangefrom' not found

I think I have done something wrong; any suggestion would be very helpful.
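
A repaired version of the snippet above, as a sketch: it assumes runmean()
comes from the caTools package, closes the unbalanced quote in "coverage",
fixes cex1 to cex=1, and starts rangefrom at 1 (R indexing is 1-based):

library(caTools)   # assumption: runmean() from caTools; any running-mean helper works
data <- read.table(file="~/q20snpref/illusmp454merCbed", sep="\t", header=FALSE)
colnames(data) <- c("Scaffold", "sca_position", "coverage")
depth <- mean(data[, "coverage"])   # overall mean coverage
window <- 1001                      # bin size
rangefrom <- 1
rangeto <- length(data[, "sca_position"])
data.1kb <- runmean(data[, "coverage"], k=window)
png(file="cov_1k.png", width=400, height=1000)
plot(x=data[rangefrom:rangeto, "sca_position"], y=data.1kb[rangefrom:rangeto],
     pch=".", cex=1, xlab="bp position", ylab="depth", type="p")
dev.off()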

Best regards,
Sutada 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I'm writing this letter to enquire where can I download the package of "lmtest".

2011-11-20 Thread R. Michael Weylandt
Please do keep your follow up on list for archival reasons:

This isn't an area of particular expertise for me, but Google suggests
the lmtest package does do the Durbin-Watson test:

http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/lmtest/html/dwtest.html

Michael

2011/11/20 Sophy <100...@mail.tku.edu.tw>:
> Dear Michael:
>   Thanks for your quick reply. I did successfully install the package
> "lmtest". My intention is to do a Durbin-Watson test for a vector of data.
> But the "lmtest" package only tests the residual vector of an lm model.
> Sorry to bother you again. Is there any package and syntax to do a
> "two-sided" Durbin-Watson test for a vector of data and output its p-value?
>
> Best regards,
> Shu-Fei Wu
>
>
>  Original Message 
>>Date: Sun, 20 Nov 2011 14:27:21 -0500
>>From: "R. Michael Weylandt" 
>>Subject: Re: [R] I'm writing this letter to enquire where can I download the 
>>package of "lmtest".
>>To: 100...@mail.tku.edu.tw
>>Cc: r-help@r-project.org
>>
>>http://cran.r-project.org/web/packages/lmtest/index.html
>>
>>Or automatically within R by typing:
>>
>>install.packages("lmtest")
>>
>>Michael
>>
>>On Sun, Nov 20, 2011 at 1:18 PM, Sophy <100...@mail.tku.edu.tw> wrote:
>>> Dear editor:
>>> I'm writing this letter to enquire where I can download the package of
>>> "lmtest". Can you send me this package?
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Shu-Fei Wu
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lattice xyplot log scale labels help!

2011-11-20 Thread Deepayan Sarkar
On Fri, Sep 16, 2011 at 6:56 PM, Deepayan Sarkar
 wrote:
> On Fri, Sep 16, 2011 at 2:17 AM, Cram Rigby  wrote:
>> I have a problem with lattice log scales that I could use some help with.
>>
>> I'm trying to print log y-axis scales without exponents in the labels.
>>  A similar thread with Deepayan's recommendation is here:
>> http://tolstoy.newcastle.edu.au/R/e11/help/10/09/9865.html.  For
>> example, this code using xyplot produces a logged y-axis but the
>> labels  (e.g. "10^1.2") are not very user-friendly:
>>
>> xyplot(24:300~24:300, scales=list(y=list(log=T)))
>>
>> So, trying another y.scale.component function, we get something more
>> agreeable for y-axis scale labels:
>>
>> xyplot(24:300~24:300, scales=list(y=list(log=T)), yscale.components =
>> yscale.components.log10.3)
>>
>>
>> However, my problem is that occasionally I'll have to plot data that
>> doesn't quite "work".  For example, in the following example, I only
>> get one y-axis scale label:
>>
>> xyplot(11:30~11:30, scales=list(y=list(log=T)), yscale.components =
>> yscale.components.log10.3)
>>
>> or worse, no y-axis scale labels:
>>
>> xyplot(11:19~11:19, scales=list(log=T), yscale.components =
>> yscale.components.log10.3)
>>
>>
>> What would be most helpful is if someone can show me how to use an
>> xyplot y-scale function to mimic log y-scale labels generated with the
>> standard plot command.  This seems to work regardless of the
>> underlying data range:
>>
>> plot(11:30,11:30,log = "y")
>> plot(24:300,24:300,log="y")
>
> That is because the standard graphics log-axis rules (which are
> codified in axTicks(), and depend critically on par("yaxp")) are more
> sophisticated than latticeExtra:::logTicks() (which was originally
> written as a proof-of-concept demo).
>
> To make logTicks() equally sophisticated, we need to replicate the
> logic used to compute par("yaxp") (in src/main/graphics.c, I think),
> which is doable, but not trivial.

And thanks to Martin Maechler, these nontrivial computations are now
more easily accessible outside the traditional graphics world. In the
development version of lattice (on r-forge, not yet released to CRAN),
you can do

xyplot(11:30~11:30, scales=list(y=list(log=T, equispaced.log = FALSE)))

etc. This requires R 2.14.0 or better.

-Deepayan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] install.package tseries

2011-11-20 Thread David Winsemius


On Nov 20, 2011, at 6:14 PM, Joaquim Andrade wrote:

I have not been successful in downloading the tseries package in R
on my MacBook Air.


The message sent is:Error in dyn.load(file, DLLpath = DLLpath, ...) :
 impossível carregar objeto compartilhado '/Library/Frameworks/


If you upgraded to 2.14 before 2011-11-10, then you should upgrade
again (as has been advised several times, both on this list and the
Mac-SIG list). If that doesn't work, then read the Posting Guide for
what is considered a minimal problem report.





Do you have any clue?


Only when you give sufficient evidence.
--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regarding as. Date function

2011-11-20 Thread David Winsemius


On Nov 20, 2011, at 10:46 PM, Shant Ch wrote:



Hi all,

I have a data set containing dates in the format m/d/y starting from 1950.
I wanted to use the standard date format R uses, so I read the data set
into R and converted it using as.Date(). But for dates from 1950-1968, I
have the following problem:

as.Date("1/1/50","%m/%d/%y",origin="1900/01/01") = "2050-01-01"
instead of "1950-01-01"

But for dates from 1969 onwards it is working fine.


As is documented in ?strptime

%y
Year without century (00–99). On input, values 00 to 68 are prefixed  
by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004  
and 2008 POSIX standards, but they do also say ‘it is expected that in  
a future version the default century inferred from a 2-digit year will  
change’.






as.Date("1/1/69","%m/%d/%y") = "1969-01-01"


Perhaps some grepping surgery?

> as.Date(sub("(.+/.+/)([0-6])", "\\119\\2", "1/1/69"),"%m/%d/%Y")
[1] "1969-01-01"

You have not really described the problem completely: what is the
range of dates we need to work with? Do you have any "1/1/05"-type
values, and if so, what century should be assumed?
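
One explicit alternative is to paste the century on before parsing, sketched
here with a hypothetical fix_century() helper and a cutoff of 50 that you
would adjust to your data's actual range:

fix_century <- function(d, cutoff = 50) {
  parts <- strsplit(d, "/", fixed = TRUE)
  sapply(parts, function(p) {
    yy <- as.integer(p[3])
    century <- if (yy >= cutoff) 1900 else 2000  # two-digit years >= cutoff -> 1900s
    sprintf("%s/%s/%d", p[1], p[2], century + yy)
  })
}
as.Date(fix_century(c("1/1/50", "1/1/69", "1/1/05")), "%m/%d/%Y")
# [1] "1950-01-01" "1969-01-01" "2005-01-01"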




Can anyone please help me find out the problem?

Thanks
Shant
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Deleting multiple rows from a data matrix based on exp value

2011-11-20 Thread Dennis Murphy
Without a reproducible example this is just a guess, but try

Matrix[apply(Matrix, 1, function(x) any(x > 1.11)), ]

This will retain all rows of Matrix where at least one value in a row
is above the threshold 1.11. If that doesn't do what you want, please
provide a small reproducible example and a clearer statement of the
desired output. It's not at all clear to me what 'exp. values' means -
I can devise at least three meanings off the top of my head, none of
which may conform to what you mean.
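
For instance, applied to the small test matrix from your post (a quick
check that the filter keeps rows with at least one value above 1.11):

M <- matrix(c(2, 5, 8, 0.4, 0.8, 0.5, 4, 12, 3), nrow = 3, byrow = TRUE)
M[apply(M, 1, function(x) any(x > 1.11)), ]
#      [,1] [,2] [,3]
# [1,]    2    5    8
# [2,]    4   12    3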

HTH,
Dennis

On Sun, Nov 20, 2011 at 2:45 PM, Peter Davidsen  wrote:
> Dear List,
>
> I have a data matrix that consists of ~4500 rows and 25 columns (i.e.
> an exprSet object that I converted via the 'exprs' function into a
> data matrix)
>
> Now I want to remove/delete the rows where all exp. values in that
> particular row are below or equal to a specific cut-off value (e.g
> 1.11)
>
> I have tried using several commands to address this issue:
>>Matrix[rowSums(Matrix <= 1.11) <= 1.11, ]
>
> or
>
>>Matrix[ !apply(Matrix<=1.11,1,all), ]
>
>
> The above commands seem to work fine when I generate a small "test" matrix 
> like:
>>M <- matrix(c(2,5,8,0.4,0.8,0.5,4,12,3), nrow=3, byrow=T)
>
> However, neither of the two commands decreases the number of rows in my
> real matrix!
>
> Any help would be appreciated
>
> Kind regards,
> Peter
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding two or more columns of a data frame for each row when NAs are present.

2011-11-20 Thread David Winsemius


On Nov 20, 2011, at 3:38 PM, Ian Strang wrote:



I am fairly new to R and would like help with the problem below. I
am trying to sum and count several rows in the data frame yy below.
All works well as in Example 1. When I try to add the columns, with
an NA in Q21, I get NA as mySum. I would like NA to be treated as
0, or ignored.


"Ignored" is by far the better option and that is easily accomplished  
by reading the help page for 'sum' and using the obvious parameter  
settings.


?sum


I wrote a function to try to count an NA element as 0 (the Example 3
function). It works with a few warnings (Example 4), but still gives
NA instead of the sum when there is an NA in an element.


In Examples 6 & 7, I tried using sum(), but it just sums the whole
data frame, I think.


It sums whatever you give it.



How do I add together several columns giving the result for each row  
in mySum?


?rowSums  # which also has the same parameter setting for dealing with  
NAs.
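
On the yy data frame quoted below, that amounts to a one-liner (a minimal
sketch, summing only the selected columns and skipping NAs):

rowSums(yy[, c("Q20", "Q22", "Q24")], na.rm = TRUE)
## 1 2 3
## 6 7 8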




NA should be treated as a 0.


Nooo, no, no. If it's missing it's not 0.

Please, note, I do not want to sum all the columns, as I think  
rowSums would do, just the selected ones.


Fine. Then select them:

?"["

--
David.


Thanks for your help.
Ian,

> yy <- read.table( header = T, sep=",", text =     ## to create a data frame
+ "Q20, Q21, Q22, Q23, Q24
+  0,1, 2,3,4
+  1,NA,2,3,4
+  2,1, 2,3,4")
> yy
  Q20 Q21 Q22 Q23 Q24
1   0   1   2   3   4
2   1  NA   2   3   4
3   2   1   2   3   4

> x <- transform( yy,     ## Example 1
+   mySum = as.numeric(Q20) + as.numeric(Q22) + as.numeric(Q24),
+   myCount = as.numeric(!is.na(Q20)) + as.numeric(!is.na(Q21)) + as.numeric(!is.na(Q24))
+ )
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4     6       3
2   1  NA   2   3   4     7       2
3   2   1   2   3   4     8       3

> x <- transform( yy,     ## Example 2
+   mySum = as.numeric(Q20) + as.numeric(Q21) + as.numeric(Q24),
+   myCount = as.numeric(!is.na(Q20)) + as.numeric(!is.na(Q21)) + as.numeric(!is.na(Q24))
+ )
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4     5       3
2   1  NA   2   3   4    NA       2
3   2   1   2   3   4     7       3

> NifAvail <- function(x) { if (is.na(x)) x<-0 else x <- x     ### Example 3
+   return(as.numeric(x))
+ } #end function
> NifAvail(5)
[1] 5
> NifAvail(NA)
[1] 0

> x <- transform( yy,
+   mySum = NifAvail(Q20) + NifAvail(Q22) + NifAvail(Q24),     ### Example 4
+   myCount = as.numeric(!is.na(Q20)) + as.numeric(!is.na(Q21)) + as.numeric(!is.na(Q24))
+ )
Warning messages:
1: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
2: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
3: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4     6       3
2   1  NA   2   3   4     7       2
3   2   1   2   3   4     8       3

> x <- transform( yy,
+   mySum = NifAvail(Q20) + NifAvail(Q21) + NifAvail(Q24),     ### Example 5
+   myCount = as.numeric(!is.na(Q20)) + as.numeric(!is.na(Q21)) + as.numeric(!is.na(Q24))
+ )
Warning messages:
1: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
2: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
3: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4     5       3
2   1  NA   2   3   4    NA       2
3   2   1   2   3   4     7       3

> x <- transform( yy,     ## Example 6
+   mySum = sum(as.numeric(Q20), as.numeric(Q21), as.numeric(Q23), na.rm=T),
+   myCount = as.numeric(!is.na(Q20)) + as.numeric(!is.na(Q21)) + as.numeric(!is.na(Q24))
+ )
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4    14       3
2   1  NA   2   3   4    14       2
3   2   1   2   3   4    14       3

> x <- transform( yy,     ## Example 7
+   mySum = sum(as.numeric(Q20), as.numeric(Q22), as.numeric(Q23), na.rm=T),
+   myCount = as.numeric(!is.na(Q20)) + as.numeric(!is.na(Q21)) + as.numeric(!is.na(Q24))
+ )
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4    18       3
2   1  NA   2   3   4    18       2
3   2   1   2   3   4    18       3

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] logistic regression by glm

2011-11-20 Thread David Winsemius


On Nov 20, 2011, at 7:26 PM, 屠鞠传礼 wrote:


Thank you very much :)
I searched on the net and found that the response in a logistic model
can have more than 2 values; this kind of regression is called "Ordinal
Logistic Regression", and we can even calculate it the same way, I mean
with glm in R.

here are some references:
1. http://en.wikipedia.org/wiki/Ordered_logit
2. http://www.stat.ubc.ca/~rollin/teach/643w04/lec/node62.html
above two tell us what is "Ordinal Logistic Regression".
3. http://www.ats.ucla.edu/stat/r/dae/ologit.htm
this shows that we can use glm to model it


When I looked through the UCLA code it appeared they were using the  
Design package (now superseded by the `rms` package) and that the  
function was `lrm` rather than `glm`. In addition to Harrell's  
excellent text which has a full chapter on this topic you might also  
want to look at Laura Thompson's Companion to Agresti's text:


https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf
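
A minimal sketch of such a fit with rms::lrm on simulated (purely
hypothetical) data; lrm treats a factor response with more than two levels
as a proportional-odds model:

library(rms)
set.seed(1)
d <- data.frame(diagnostic = factor(sample(0:3, 200, replace = TRUE)),
                snp1 = factor(sample(1:3, 200, replace = TRUE)),
                snp2 = factor(sample(1:3, 200, replace = TRUE)))
lrm(diagnostic ~ snp1 + snp2, data = d)
# prints intercepts for y>=1, y>=2, y>=3 plus one slope per factor contrast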

--
David.



On 2011-11-21 00:56:33, "Uwe Ligges" dortmund.de> wrote:



On 20.11.2011 17:27, 屠鞠传礼 wrote:

I worried about it too. Do you have ideas about what tools I can use?



Depends on your aims - what you want to do with the fitted model.
A multinomial model, some kind of discriminant analysis (lda, qda), tree
based methods, svm and so on come to mind. You probably want to discuss
this on some statistics mailing list/forum or among local experts rather
than on the R list, since this is actually not that R related.

Uwe Ligges








On 2011-11-21 00:13:26, "Uwe Ligges" dortmund.de> wrote:



On 20.11.2011 16:58, 屠鞠传礼 wrote:

Thank you Ligges :)
one more question:
my response value "diagnostic" has 4 levels (0, 1, 2 and 3), so
I use it like this:

"as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517)"
Is it all right?



Uhh. 4 levels? Then I doubt logistic regression is the right tool for
you. Please revisit the theory first: it is intended for 2
levels...



Uwe Ligges










On 2011-11-20 23:45:23, "Uwe Ligges" d.de> wrote:



On 20.11.2011 12:46, tujchl wrote:

HI

I use glm in R to do logistic regression, and treat both response and
predictor as factors.
In my first try:

***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
as.factor(2281517), family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


(Dispersion parameter for binomial family taken to be 1)

Null deviance: 678.55 on 498 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

And I remodel it and *want no intercept*:
***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
as.factor(7161521) - 1, family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(>|z|)
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


(Dispersion parameter for binomial family taken to be 1)

Null deviance: 691.76 on 499 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

As shown above, my second model returns no intercept, but look at this:

Model1:
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
Model2:
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

They are exactly the same. Could you please tell me what happen?


Actually it does not make sense to estimate the model without an
intercept unless you assume that it is exactly zero for the  
first levels
of your factors. Think about the contrasts you are interested  
in. Looks

like not the default?

Uwe Ligges




Thank you in advance


--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Cox proportional hazards confidence intervals

2011-11-20 Thread David Winsemius


On Nov 20, 2011, at 6:34 PM, Paul Johnston wrote:


I am calculating Cox proportional hazards models with the coxph
function from the survival package.  My data relates to failure of
various types of endovascular interventions.  I can successfully
obtain the LR, Wald, and Score test p-values from the coxph.object, as
well as the hazard ratio as follows:

formula.obj = Surv(days, status) ~ type
coxph.model = coxph(formula.obj, df)
fit = summary(coxph.model)
hazard.ratio = fit$conf.int[1]
lower95 = fit$conf.int[3]
upper95 = fit$conf.int[4]
logrank.p.value = fit$sctest[3]
wald.p.value = fit$waldtest[3]
lr.p.value = fit$logtest[3]

I had intended to report logrank P values with the hazard ratio and CI
obtained from this function.  In one case the P was 0.04 yet the CI
crossed one, which confused me, and certainly will raise questions by
reviewers.  In retrospect I can see that the CI calculated by coxph is
intimately related to the Wald p-value (which in this specific case
was 0.06), so this would appear to me not a good strategy for
reporting my results (mixing the logrank test with the HR and CIs from
coxph).

I can report the Wald p-values instead, but I have read that the Wald
test is inferior to the score test or LR test.  My questions for
survival analysis jockeys out there:

1. Should I just stop here and use the wald.p.value?  This appears to
be what Stata does with the stcox function (albeit Breslow method).


I don't understand two things: why would you report the inferior
result, and I suppose I also wonder why it makes that much
difference? The estimate is what it is, and a p-value of .04 is not
that different from one of .06. Or are we dealing with religious
beliefs here?





2. Should I calculate HR and CIs that "agree" with the LR or logrank
P?  How do I do that?


Therneau and Grambsch show how to calculate profile likelihood curves
that can be used to generate confidence intervals on pages 57-59 of
"Modeling Survival Data". This "survival analysis jockey" considers
that book an essential reference. They basically use the offset
capacity to construct 50 likelihoods around the estimate for one
particular variable in a more complete model, and then show where the
97.5th and 2.5th percentile points are for a beta estimate, based on
a chi-square distribution for these log-likelihoods.


Further code not possible in the absence of the complete formula.
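
On a built-in dataset, though, the mechanics look roughly like this (a
sketch using the lung data from survival; the grid of 50 values spanning
4 standard errors is an arbitrary choice):

library(survival)
fit  <- coxph(Surv(time, status) ~ age, data = lung)
bhat <- coef(fit)["age"]
se   <- sqrt(vcov(fit)["age", "age"])
grid <- seq(bhat - 4 * se, bhat + 4 * se, length.out = 50)
# fix the coefficient at each grid value via offset() and keep the log-likelihood
ll <- sapply(grid, function(b)
  tail(coxph(Surv(time, status) ~ offset(b * age), data = lung)$loglik, 1))
# the approximate 95% profile CI is where twice the log-likelihood drop
# stays below the chi-square cutoff
range(grid[2 * (max(ll) - ll) <= qchisq(0.95, 1)])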

--
David.




Thank you,
Paul


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Continuasly Compunded Returns with quantmod-data

2011-11-20 Thread Dennis Murphy
Hi:

Start by writing a small utility function to compute the CCR (not to
be confused with the rock band):

ccr <- function(x) 100 * (log(x[-1]) - log(x[-length(x)]))

# Small test:
x <- round(rnorm(10, 5, 0.5), 2)
ccr(x)  # Works on vectors...

# Load up the stock data:
library(quantmod)
getSymbols("GOOG",from="2011-11-01")
GOOG1<-GOOG[,1]

# Try it directly:
> ccr(GOOG[, 1])
   GOOG.Open
2011-11-02 0
2011-11-03 0
2011-11-04 0
2011-11-07 0
2011-11-08 0
2011-11-09 0
2011-11-10 0
2011-11-11 0
2011-11-14 0
2011-11-15 0
2011-11-16 0
2011-11-17 0

## Hmm, no go. Try this instead:

> apply(GOOG1, 2, ccr)
 GOOG.Open
2011-11-02  0.82403900
2011-11-03  0.35839274
2011-11-04  1.10123942
2011-11-07 -0.03033316
2011-11-08  2.60843853
2011-11-09 -0.78136988
2011-11-10  0.27598990
2011-11-11 -0.76704898
2011-11-14  1.10809039
2011-11-15  0.78637365
2011-11-16 -0.11756255
2011-11-17 -0.33220719
2011-11-18 -1.32834757

If you look at the structure of GOOG1 [str(GOOG1)], you'll see that it
is a one column matrix, so apply() is one way to go, especially if you
want to run the function on multiple columns of a matrix.
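
Another route that stays in the xts world (a sketch; diff() and log() both
have zoo/xts methods, so the first observation comes back as NA rather than
being dropped):

ccr.xts <- 100 * diff(log(GOOG1))
head(ccr.xts)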

HTH,
Dennis

On Sun, Nov 20, 2011 at 3:52 PM, barb  wrote:
> Hey guys,
>
> I want to calculate the continuously compounded returns for stock prices.
>
> Formula for CCR:
> R_t = ln(P_t/P_{t-1})*100
>
> With R:
>
> First I have to modify the vectors so that they have the same length
> and start at the second observation.
>
> log(GOOG1[-1]/GOOG1[1:length(GOOG1)-1])*100
>
> That does work with normal vectors.
>
> My Questions:
>
> 1) I want to use this for stock prices.
>
>
> so I use:
>
> library(quantmod)
> getSymbols("GOOG",from="2011-11-01")
> GOOG1<-GOOG[,1]
>
>
> If I use my formula I get only the value "1" for every observation :(
>
>
>
> Thanks for your time and help!
> I appreciate it
>
> Regards
> Tonio
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Continuasly-Compunded-Returns-with-quantmod-data-tp4090014p4090014.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding two or more columns of a data frame for each row when NAs are present.

2011-11-20 Thread Dennis Murphy
Hi:

Does this work for you?

> yy
  Q20 Q21 Q22 Q23 Q24
1   0   1   2   3   4
2   1  NA   2   3   4
3   2   1   2   3   4

rowSums(yy, na.rm = TRUE)
 1  2  3
10 10 12

# Use a subset of the variables in yy:
selectVars <- paste('Q', c(20, 21, 24), sep = '')
rowSums(yy[, selectVars], na.rm = TRUE)
1 2 3
5 5 7

HTH,
Dennis
On Sun, Nov 20, 2011 at 12:38 PM, Ian Strang  wrote:
>
> I am fairly new to R and would like help with the problem below. I am trying
> to sum and count several rows in the data frame yy below. All works well as
> in example 1. When I try to add the columns, with an NA in Q21, I get as NA
> as mySum. I would like NA to be treated as O, or igored.
> I wrote a function to try to count an NA element as 0, Example 3 function.
> It works with a few warnings, Example 4, but still gives NA instead of the
> addition when there is an NA in an element.
>
> In Example 6 & 7, I tried using sum() but it just sums the whole data frame,
> I think,
>
> How do I add together several columns giving the result for each row in
> mySum? NA should be treated as a 0. Please, note, I do not want to sum all
> the columns, as I think rowSums would do, just the selected ones.
>
> Thanks for your help.
> Ian,
>
>> yy <- read.table( header = T, sep=",", text =     ## to create a data
>> frame
> + "Q20, Q21, Q22, Q23, Q24
> +  0,1, 2,3,4
> +  1,NA,2,3,4
> +  2,1, 2,3,4")
> +  yy
>  Q20 Q21 Q22 Q23 Q24
> 1   0   1    2   3   4
> 2   1  NA   2   3   4
> 3   2   1    2   3   4
>
>> x <- transform( yy,     ## Example 1
> +   mySum = as.numeric(Q20) + as.numeric(Q22) + as.numeric(Q24),
> +   myCount =
> as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
> + )
> + x
>  Q20 Q21 Q22 Q23 Q24 mySum myCount
> 1   0   1    2   3   4     6       3
> 2   1  NA   2   3   4     7       2
> 3   2   1    2   3   4     8       3
>>
> + x <- transform( yy,      Example 2
> +   mySum = as.numeric(Q20) + as.numeric(Q21) + as.numeric(Q24),
> +   myCount =
> as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
> + )
> + x
>  Q20 Q21 Q22 Q23 Q24 mySum myCount
> 1   0   1    2   3   4     5       3
> 2   1  NA   2   3   4    NA       2
> 3   2   1    2   3   4     7       3
>
>> NifAvail <- function(x) { if (is.na(x)) x<-0 else x <- x   ###
>> Example 3
> +   return(as.numeric(x))
> + } #end function
> + NifAvail(5)
> [1] 5
> + NifAvail(NA)
> [1] 0
>
>> x <- transform( yy,
> +   mySum = NifAvail(Q20) + NifAvail(Q22) + NifAvail(Q24),
>  ### Example 4
> +   myCount =
> as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
> + )
> Warning messages:
> 1: In if (is.na(x)) x <- 0 else x <- x :
>  the condition has length > 1 and only the first element will be used
> 2: In if (is.na(x)) x <- 0 else x <- x :
>  the condition has length > 1 and only the first element will be used
> 3: In if (is.na(x)) x <- 0 else x <- x :
>  the condition has length > 1 and only the first element will be used
>> x
>  Q20 Q21 Q22 Q23 Q24 mySum myCount
> 1   0   1    2   3   4     6       3
> 2   1  NA   2   3   4     7       2
> 3   2   1    2   3   4     8       3
>> x <- transform( yy,
> +   mySum = NifAvail(Q20) + NifAvail(Q21) + NifAvail(Q24),
>  Example 5
> +   myCount =
> as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
> + )
> Warning messages:
> 1: In if (is.na(x)) x <- 0 else x <- x :
>  the condition has length > 1 and only the first element will be used
> 2: In if (is.na(x)) x <- 0 else x <- x :
>  the condition has length > 1 and only the first element will be used
> 3: In if (is.na(x)) x <- 0 else x <- x :
>  the condition has length > 1 and only the first element will be used
>> x
>  Q20 Q21 Q22 Q23 Q24 mySum myCount
> 1   0   1    2   3   4     5       3
> 2   1  NA   2   3   4    NA       2
> 3   2   1    2   3   4     7       3
>
>
>> x <- transform( yy,                                        
>> Example 6
> +   mySum = sum(as.numeric(Q20), as.numeric(Q21), as.numeric(Q23), na.rm=T),
> +   myCount =
> as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
> + )
> + x
>  Q20 Q21 Q22 Q23 Q24 mySum myCount
> 1   0   1    2   3   4    14       3
> 2   1  NA   2   3   4    14       2
> 3   2   1    2   3   4    14       3
>
>> x <- transform( yy,                                       #
>> Example 7
> +   mySum = sum(as.numeric(Q20), as.numeric(Q22), as.numeric(Q23), na.rm=T),
> +   myCount =
> as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
> + )
> + x
>  Q20 Q21 Q22 Q23 Q24 mySum myCount
> 1   0   1    2   3   4    18       3
> 2   1  NA   2   3   4    18       2
> 3   2   1    2   3   4    18       3
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Re: [R] Inserting blank records in a barplot

2011-11-20 Thread Richard M. Heiberger
## Kodar,

tmp <- textConnection("Date   Time ts_mean_RR ts_sdnn_RR ts_mean_HR ts_sdnn_HR ts_rmssd
20110905 07:21:50.0  1139.8298 40.3053 52.7393 2.2824 45.7958
20110906 07:11:37.0  1182.7333 49.1861 50.8665 2.4983 60.2329
20110907 07:21:31.0  1136.6028 49.4682 52.9566 2.9231 47.6896
20110908 07:23:53.0  1133.3475 53.7714 53.1378 3.1837 54.5673
20110909 07:29:21.0  1110.3194 43.1768 54.2002 2.8508 40.4889
20110910 09:34:02.0  1041.7597 58.5150 57.9255 4.1735 53.6907
20110911 11:19:24.0   994.8509 72.9786 60.7633 5.2805 63.8904
20110912 07:03:06.0  1133.4255 37.4771 53.0426 2.1921 46.8450
20110913 07:18:43.0  1165.3796 68.6048 51.8129 3.7769 65.2377"
)

heart <- read.table(tmp, header=TRUE); close(tmp)
heart$DateTime <- as.POSIXlt(paste(as.character(heart$Date),
as.character(heart$Time)), format="%Y%m%d %H:%M:%S")

library(lattice)
xyplot(ts_mean_RR + ts_sdnn_RR + ts_mean_HR + ts_sdnn_HR + ts_rmssd ~ DateTime,
       data=heart, outer=TRUE, pch=16,
       scales=list(y=list(relation="free")), layout=c(1,5))


## This handles missing days correctly.  I am changing the first of your
## values to August 26.
## And the fourth to 7PM
heart$DateTime[1] <- heart$DateTime[1] - 10*24*60*60
heart$DateTime[4] <- heart$DateTime[4] + 12*60*60

xyplot(ts_mean_RR + ts_sdnn_RR + ts_mean_HR + ts_sdnn_HR + ts_rmssd ~ DateTime,
       data=heart, outer=TRUE, pch=16,
       scales=list(y=list(relation="free")), layout=c(1,5))

## You probably should not use barcharts for this data.
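
## If a barchart is still wanted, one sketch is to expand to the full day
## sequence first (an assumption here: barplot() skips NA heights, which
## leaves visible gaps for the unrecorded days):
heart$Day <- as.Date(as.character(heart$Date), format="%Y%m%d")
all.days  <- data.frame(Day = seq(min(heart$Day), max(heart$Day), by="day"))
full <- merge(all.days, heart, all.x=TRUE)   # NA rows for unrecorded days
barplot(full$ts_mean_HR, names.arg=format(full$Day, "%m-%d"),
        las=2, ylab="ts_mean_HR")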

## Read about time classes in these two documents
## Please see the articles
## Grothendieck & Petzoldt (2004). Date and Time Classes in R.
##  R News, 4(1), 29-32.
http://www.R-project.org/doc/Rnews/
## for a good introduction.
## Also see the related JSS publication:
##  Garrett Grolemund, Hadley Wickham (2011).
##  Dates and Times Made Easy with lubridate.
##  Journal of Statistical Software, 40(3), 1-25.
##  http://www.jstatsoft.org/v40/i03/.

## Rich



On Sun, Nov 20, 2011 at 4:25 PM, kodar  wrote:

> Hi everyone,
>
> I'm currently doing some statistics on my heart rate variability. I have a
> CSV file which looks like this:
>
>  Date   Time ts_mean_RR ts_sdnn_RR ts_mean_HR ts_sdnn_HR ts_rmssd
> 1  20110905 07:21:50.0  1139.8298 40.3053 52.7393 2.2824 45.7958
> 2  20110906 07:11:37.0  1182.7333 49.1861 50.8665 2.4983 60.2329
> 3  20110907 07:21:31.0  1136.6028 49.4682 52.9566 2.9231 47.6896
> 4  20110908 07:23:53.0  1133.3475 53.7714 53.1378 3.1837 54.5673
> 5  20110909 07:29:21.0  1110.3194 43.1768 54.2002 2.8508 40.4889
> 6  20110910 09:34:02.0  1041.7597 58.5150 57.9255 4.1735 53.6907
> 7  20110911 11:19:24.0   994.8509 72.9786 60.7633 5.2805 63.8904
> 8  20110912 07:03:06.0  1133.4255 37.4771 53.0426 2.1921 46.8450
> 9  20110913 07:18:43.0  1165.3796 68.6048 51.8129 3.7769 65.2377
>
> I'd like to plot one of these columns as a barplot with the 'Date' field on
> the x-axis. But since on some days I miss recording my heart rate, the days
> in the first column are not always consecutive. Therefore I'm looking for a
> technique with which I can visually show these blank records in my barplot
> diagram. I know I can add these blank records manually directly in the CSV
> file, but I'd like to avoid this since the CSV file is generated
> automatically and can be overwritten.
>
> I think I should first create an array of the days I want to plot and try
> to match the 'Date' column against this array. But as I'm new to R, I've no
> idea how I can do that in an R script.
>
> Anyone can put me on the right track or give me a simple example ?
>
> Thanks in advance for the help.
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Inserting-blank-records-in-a-barplot-tp4089619p4089619.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] install.package tseries

2011-11-20 Thread Hasan Diwan
On 21 November 2011 00:14, Joaquim Andrade  wrote:
> Do you have any clue?
Works for me... How about some further details?
> install.packages('tseries')
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
also installing the dependency ‘quadprog’

trying URL 
'http://mirrors.nics.utk.edu/cran/bin/macosx/leopard/contrib/2.14/quadprog_1.5-4.tgz'
Content type 'application/x-gzip' length 83092 bytes (81 Kb)
opened URL
==
downloaded 81 Kb

trying URL 
'http://mirrors.nics.utk.edu/cran/bin/macosx/leopard/contrib/2.14/tseries_0.10-27.tgz'
Content type 'application/x-gzip' length 393905 bytes (384 Kb)
opened URL
==
downloaded 384 Kb


The downloaded packages are in

/var/folders/3r/6x1cwkvn5lj6wk5pysgyxmm8gn/T//Rtmpp158in/downloaded_packages



-- 
Sent from my mobile device
Envoyait de mon portable

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Regarding as. Date function

2011-11-20 Thread Shant Ch

Hi all,

I have a data set containing dates in the format m/d/y starting from 1950.
I wanted to use the standard date format R uses, so I read the data set in
R and converted it using as.Date(). But for dates from 1950-1968, I have
the following problem:

as.Date("1/1/50","%m/%d/%y",origin="1900/01/01") = "2050-01-01" instead of
"1950-01-01"

But for dates from 1969 onwards it is working fine:

as.Date("1/1/69","%m/%d/%y") = "1969-01-01"


Can anyone please help me find out the problem?

Thanks
Shant
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data analysis: normal approximation for binomial

2011-11-20 Thread Colstat
Hey, John

I like the explicit formula they put in there.  I looked around last night
and found this
http://www.stat.yale.edu/Courses/1997-98/101/binom.htm

which is basically the normal approximation to the binomial; I thought that
was what the author was trying to get at?
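
For instance, with hypothetical counts (x successes out of n trials;
prop.test() applies the same Yates correction by default, so the two
p-values below agree):

x <- 23; n <- 29; p0 <- 0.5   # hypothetical counts, not taken from the data
z <- (abs(x - n * p0) - 0.5) / sqrt(n * p0 * (1 - p0))   # continuity correction
2 * pnorm(-z)                                            # two-sided p-value
prop.test(x, n, p = p0, correct = TRUE)$p.value          # same result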

Colin

On Sun, Nov 20, 2011 at 8:49 AM, John Kane  wrote:

> Hi Colin,
>
> I'm no statistician and it's been a very long time, but IIRC a t-test is a
> 'modified' version of a z-test that is used on small sample sizes. (I can
> hear some of our statisticians screaming in the background as I type.)
>
> In any case I think a Z distribution is discrete and a standard normal is
> not, so a user can use the Yates continuity correction to interpolate
> values for the normal between the discrete z-values. Or something like this.
>
> I have only encountered it once in a Psych stats course taught by an
> animal geneticist who seemed to think it was important. To be honest, it
> looked pretty trivial for the type of data I'd be likely to see.
>
> I cannot remember ever seeing a continuity correction used in a published
> paper--for that matter I have trouble remembering a z-test.
>
> If you want more information on the subject I found a very tiny bit of
> info at
> http://books.google.ca/books?id=SiJ2UB3dv9UC&pg=PA139&lpg=PA139&dq=z-test+with+continuity+correction&source=bl&ots=0vMTCUZWXx&sig=bfCPx0vynGjA0tHLRAf6B42x0mM&hl=en&ei=nQHJTo7LPIrf0gHxs6Aq&sa=X&oi=book_result&ct=result&resnum=2&ved=0CC0Q6AEwAQ#v=onepage&q=z-test%20with%20continuity%20correction&f=false
>
> A print source that, IIRC, has a discussion of this is Hayes, W. (1981).
> Statistics. 3rd Ed., Holt, Rinehart and Winston.
>
> Have fun
>
> --- On Sat, 11/19/11, Colstat  wrote:
>
> > From: Colstat 
> > Subject: [R] Data analysis: normal approximation for binomial
> > To: r-help@r-project.org
> > Received: Saturday, November 19, 2011, 6:01 PM
> > Dear R experts,
> >
> > I am trying to analyze data from an article, the data looks
> > like this
> >
> > Patient Age Sex Aura preCSM preFreq preIntensity postFreq postIntensity postOutcome
> > 1 47 F A 4 6 9 2 8 SD
> > 2 40 F A/N 5 8 9 0 0 E
> > 3 49 M N 5 8 9 2 6 SD
> > 4 40 F A 5 3 10 0 0 E
> > 5 42 F N 5 4 9 0 0 E
> > 6 35 F N 5 8 9 12 7 NR
> > 7 38 F A 5 NA 10 2 9 SD
> > 8 44 M A 4 4 10 0 0 E
> > 9 47 M A 4 5 8 2 7 SD
> > 10 53 F A 5 3 10 0 0 E
> > 11 41 F N 5 6 7 0 0 E
> > 12 49 F A 4 6 8 0 0 E
> > 13 48 F A 5 4 8 0 0 E
> > 14 63 M N 4 6 9 15 9 NR
> > 15 58 M N 5 9 7 2 8 SD
> > 16 53 F A 4 3 9 0 0 E
> > 17 47 F N 5 4 8 1 4 SD
> > 18 34 F A NA  5 9 0 0 E
> > 19 53 F N 5 4 9 5 7 NR
> > 20 45 F N 5 5 8 5 4 SD
> > 21 30 F A 5 3 8 0 0 E
> > 22 29 F A 4 5 9 0 0 E
> > 23 49 F N 5 9 10 0 0 E
> > 24 24 F A 5 5 9 0 0 E
> > 25 63 F N 4 19 7 10 7 NR
> > 26 62 F A 5 8 9 11 9 NR
> > 27 44 F A 5 3 10 0 0 E
> > 28 38 F N 4 8 10 1 3 SD
> > 29 38 F N 5 3 10 0 0 E
> >
> > How do I do a binomial-distribution z statistic with continuity
> > correction? Basically, a normal approximation.
> > Could anyone give me some suggestions on what I (or R) can do
> > with these data?
> > I have tried a histogram, maybe a t-test? or even
> > lattice? What else can
> > I (or R) do?
> > Help please, thanks so much.
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org
> > mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> > reproducible code.
> >
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression by glm

2011-11-20 Thread 屠鞠传礼
Thank you very much :)
I searched on the net and found that the response in a logistic model can have
more than 2 values; this kind of regression is called "Ordinal Logistic
Regression", and we can even calculate it the same way, I mean with glm in
R.
here are some references:
1. http://en.wikipedia.org/wiki/Ordered_logit
2. http://www.stat.ubc.ca/~rollin/teach/643w04/lec/node62.html
above two tell us what is "Ordinal Logistic Regression".
3. http://www.ats.ucla.edu/stat/r/dae/ologit.htm
this shows that we can use glm to model it





On 2011-11-21 00:56:33, "Uwe Ligges" wrote:
>
>
>On 20.11.2011 17:27, 屠鞠传礼 wrote:
>> I worried about it too. Do you have ideas about what tools I can use?
>
>
>Depends on your aims - what you want to do with the fitted model.
>A multinomial model, some kind of discriminant analysis (lda, qda), tree
>based methods, svm and so on come to mind. You probably want to discuss
>this on some statistics mailing list/forum or among local experts rather
>than on the R list, since this is actually not that R related.
>
>Uwe Ligges
>
>
>
>>
>>
>>
>>
>> On 2011-11-21 00:13:26, "Uwe Ligges" wrote:
>>>
>>>
>>> On 20.11.2011 16:58, 屠鞠传礼 wrote:
>>>> Thank you Ligges :)
>>>> one more question:
>>>> my response value "diagnostic" has 4 levels (0, 1, 2 and 3), so I use it
>>>> like this:
>>>> "as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517)"
>>>> Is it all right?
>>>
>>>
>>> Uhh. 4 levels? Then I doubt logistic regression is the right tool for
>>> you. Please revisit the theory first: it is intended for 2 levels...
>>>
>>>
>>> Uwe Ligges
>>>
>>>
>>>
>>>
>>>




 On 2011-11-20 23:45:23, "Uwe Ligges" wrote:
>
>
> On 20.11.2011 12:46, tujchl wrote:
>> HI
>>
>> I use glm in R to do logistic regression, and treat both response and
>> predictor as factors.
>> In my first try:
>>
>> ***
>> Call:
>> glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
>> as.factor(2281517), family = binomial())
>>
>> Deviance Residuals:
>> Min 1Q Median 3Q Max
>> -1.5370 -1.0431 -0.9416 1.3065 1.4331
>>
>> Coefficients:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -0.58363 0.27948 -2.088 0.0368 *
>> as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
>> as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
>> as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
>> as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> (Dispersion parameter for binomial family taken to be 1)
>>
>> Null deviance: 678.55 on 498 degrees of freedom
>> Residual deviance: 671.20 on 494 degrees of freedom
>> AIC: 681.2
>>
>> Number of Fisher Scoring iterations: 4
>> ***
>>
>> And I remodel it and *want no intercept*:
>> ***
>> Call:
>> glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
>> as.factor(7161521) - 1, family = binomial())
>>
>> Deviance Residuals:
>> Min 1Q Median 3Q Max
>> -1.5370 -1.0431 -0.9416 1.3065 1.4331
>>
>> Coefficients:
>> Estimate Std. Error z value Pr(>|z|)
>> as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
>> as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
>> as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
>> as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
>> as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> (Dispersion parameter for binomial family taken to be 1)
>>
>> Null deviance: 691.76 on 499 degrees of freedom
>> Residual deviance: 671.20 on 494 degrees of freedom
>> AIC: 681.2
>>
>> Number of Fisher Scoring iterations: 4
>> ***
>>
>> As shown above, my second model returns no intercept, but look at this:
>> Model1:
>> (Intercept) -0.58363 0.27948 -2.088 0.0368 *
>> Model2:
>> as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **
>>
>> They are exactly the same. Could you please tell me what happen?
>
> Actually it does not make sense to estimate the model without an
> intercept unless you assume that it is exactly zero for the first levels
> of your factors. Think about the contrasts you are interested in. Looks
> like not the default?
>
> Uwe Ligges
>
>
>>
>> Thank you in advance
>>
>>
>> --
>> View this message in context: 
>> http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
>> 

Re: [R] Factor Analysis/Inputting Correlation Matrix

2011-11-20 Thread biorafas
Hey man!!

You need to change the format of your data to a matrix:


data <- as.matrix(correlation_data) 

solucion <- factanal(covmat = data, factors=2)

that's all!!!

enjoy

Rafael Ch.

--
View this message in context: 
http://r.789695.n4.nabble.com/Factor-Analysis-Inputting-Correlation-Matrix-tp3591163p4090012.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to read the contents .dat file

2011-11-20 Thread biorafas
Hey man!!

One option to read a .dat file is the following:

data <- read.delim("document.dat", sep=" ")

I hope it is useful for you.

Rafael



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-read-the-contents-dat-file-tp4089943p409.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] install.package tseries

2011-11-20 Thread Joaquim Andrade
I have not been successful in downloading the tseries package in R on my
MacBook Air.

The message sent is:Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  impossível carregar objeto compartilhado '/Library/Frameworks/

Do you have any clue?
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Deleting multiple rows from a data matrix based on exp value

2011-11-20 Thread Peter Davidsen
Dear List,

I have a data matrix that consists of ~4500 rows and 25 columns (i.e.
an exprSet object that I converted via the 'exprs' function into a
data matrix)

Now I want to remove/delete the rows where all exp. values in that
particular row are below or equal to a specific cut-off value (e.g
1.11)

I have tried using several commands to address this issue:
>Matrix[rowSums(Matrix <= 1.11) <= 1.11, ]

or

>Matrix[ !apply(Matrix<=1.11,1,all), ]


The above commands seem to work fine when I generate a small "test" matrix like:
>M <- matrix(c(2,5,8,0.4,0.8,0.5,4,12,3), nrow=3, byrow=T)

However, neither of the two commands decreases the number of rows in my real matrix!

Any help would be appreciated

Kind regards,
Peter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using GIS data in R

2011-11-20 Thread Zuofeng Shang

Many thanks Rolf! Your help is highly appreciated,
since it helps me see how to go about solving my problem.

Have a wonderful Sunday!
Jeff

On 2011/11/20 15:52, Rolf Turner wrote:

On 21/11/11 10:19, Zuofeng Shang wrote:

Hi Rolf,

Thanks very much for the suggestions! Sorry if my question was not very
clear.

The context is as follows. I have a set of locations with coordinates
(latitude and longitude) from the Texas region. Some of the locations are
within Texas while some of them are not. I am trying to find the locations
within Texas by checking whether the coordinate of each location is within
the boundary of Texas.

Now I agree with you that the boundary of Texas is a polygon, although it
looks "irregular". I find your previous post very helpful. inside.owin()
will do the job, but I have to figure out the boundary (as a polygon) of
Texas. Since I do not have the shapefile of Texas, this will be difficult
for me. I am not sure if shapefiles of the US states are available in R.
If not, how can I get the shapefiles? Thanks a lot for your kind help!

This is not my area of expertise (if indeed I have such an area at all!
:-) ) but
I'm sure that maps of the states of the USA --- including Texas, even!
--- are readily available
from various sources.  I.e. from various GIS's or perhaps from the
"maps" package.

With a bit of luck someone on the list will chime in with something more
specific.

Or you could try asking "how/where can I get a shapefile specifying the
boundary of Texas" on the R-Sig-Geo list.

HTH

  cheers,

  Rolf
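A minimal sketch of the lookup, assuming the "maps" package is installed:
its map.where() returns the name of the map region containing each point,
so no explicit polygon handling is needed (the two coordinates below are
hypothetical test points, one in Austin and one in Memphis).

library(maps)
lon <- c(-97.74, -90.05)
lat <- c( 30.27,  35.15)
region <- map.where("state", lon, lat)            # e.g. "texas", "tennessee"
in.texas <- !is.na(region) & grepl("^texas", region)
in.texas                                          # TRUE for points inside Texas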



--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Searching for the first two characters of a string

2011-11-20 Thread Remko Duursma

dfr <- data.frame(txt = c("abab", "ghghg", "ththt", "dfdfdf"), yvar = 1:4)

ind <- substr(dfr$txt, 1, 2)  # first two characters of each string

dfr[ind == "ab", ]            # keep rows whose 'txt' starts with "ab"



greetings,
Remko

--
View this message in context: 
http://r.789695.n4.nabble.com/Searching-for-the-first-two-characters-of-a-string-tp4089862p4090042.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Inserting blank records in a barplot

2011-11-20 Thread kodar
Hi everyone,

I currently do some statistics on my heart rate variability. I have a CSV
file which looks like this: 

   Date     Time        ts_mean_RR ts_sdnn_RR ts_mean_HR ts_sdnn_HR ts_rmssd
1  20110905 07:21:50.0    1139.8298    40.3053    52.7393     2.2824  45.7958
2  20110906 07:11:37.0    1182.7333    49.1861    50.8665     2.4983  60.2329
3  20110907 07:21:31.0    1136.6028    49.4682    52.9566     2.9231  47.6896
4  20110908 07:23:53.0    1133.3475    53.7714    53.1378     3.1837  54.5673
5  20110909 07:29:21.0    1110.3194    43.1768    54.2002     2.8508  40.4889
6  20110910 09:34:02.0    1041.7597    58.5150    57.9255     4.1735  53.6907
7  20110911 11:19:24.0     994.8509    72.9786    60.7633     5.2805  63.8904
8  20110912 07:03:06.0    1133.4255    37.4771    53.0426     2.1921  46.8450
9  20110913 07:18:43.0    1165.3796    68.6048    51.8129     3.7769  65.2377

I plot one of these columns as a barplot with the 'Date' field on the
x-axis. But since on some days I miss recording my heart rate, the days in
the first column are not always consecutive. Therefore I'm looking for a
technique with which I can visually show these blank records in my barplot
diagram. I know I can add these blank records manually in the CSV file,
but I'd like to avoid that since the CSV file is generated automatically
and can be overwritten.

I think I should first create an array of the days I want to plot and try
to match the 'Date' column with this array. But as I'm new to R I've no
idea how to do that in an R script.

Can anyone put me on the right track or give me a simple example?

Thanks in advance for the help. 
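A minimal sketch of the matching step, assuming the CSV has already been
read into a data frame 'hrv' with the columns shown above (the file name
is hypothetical):

hrv <- read.csv("hrv.csv")
hrv$Date <- as.Date(as.character(hrv$Date), format = "%Y%m%d")
all.days <- data.frame(Date = seq(min(hrv$Date), max(hrv$Date), by = "day"))
full <- merge(all.days, hrv, all.x = TRUE)   # unrecorded days become NA rows
h <- full$ts_rmssd
h[is.na(h)] <- 0                             # show gaps as zero-height bars
barplot(h, names.arg = format(full$Date, "%d.%m"), las = 2)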


--
View this message in context: 
http://r.789695.n4.nabble.com/Inserting-blank-records-in-a-barplot-tp4089619p4089619.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Continuasly Compunded Returns with quantmod-data

2011-11-20 Thread barb
Hey guys,

I want to calculate the continuously compounded returns for stock prices.

Formula for CCR:
R_t = ln(P_t / P_{t-1}) * 100

With R:

First I have to modify the vectors so that they have the same length
and we start at the second observation.

log(GOOG1[-1] / GOOG1[1:(length(GOOG1) - 1)]) * 100

That does work with normal vectors.

My questions:

1) I want to use this for stock prices.

So I use:

library(quantmod) 
getSymbols("GOOG", from = "2011-11-01")
GOOG1 <- GOOG[, 1]

If I use my formula I get only the value "1" for every observation :( 



Thanks for your time and help!
I appreciate it

Regards
Tonio
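A minimal sketch of why this happens and one fix: getSymbols() returns an
xts object, and arithmetic on xts objects aligns by date, so the shifted
series lines up with itself again and every ratio collapses to 1. Taking
differences of the log series sidesteps the manual shifting entirely:

library(quantmod)
getSymbols("GOOG", from = "2011-11-01")
ccr <- 100 * diff(log(GOOG[, 1]))  # continuously compounded returns in percent
head(ccr)                          # first value is NA (no prior price)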

--
View this message in context: 
http://r.789695.n4.nabble.com/Continuasly-Compunded-Returns-with-quantmod-data-tp4090014p4090014.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Adding two or more columns of a data frame for each row when NAs are present.

2011-11-20 Thread Ian Strang


I am fairly new to R and would like help with the problem below. I am 
trying to sum and count several columns, row by row, in the data frame yy 
below. All works well as in Example 1. When I try to add the columns with 
an NA in Q21, I get NA as mySum. I would like NA to be treated as 0, or 
ignored. I wrote a function to try to count an NA element as 0 (Example 3). 
It works with a few warnings (Example 4), but still gives NA instead of 
the sum when there is an NA in an element (Example 5).

In Examples 6 & 7, I tried using sum(), but it just sums the whole data 
frame, I think.

How do I add together several columns, giving the result for each row in 
mySum, with NA treated as 0? Please note, I do not want to sum all the 
columns, as I think rowSums would do, just the selected ones.


Thanks for your help.
Ian,

> yy <- read.table( header = T, sep=",", text = ## to create a data frame
+ "Q20, Q21, Q22, Q23, Q24
+  0,1, 2,3,4
+  1,NA,2,3,4
+  2,1, 2,3,4")
> yy
  Q20 Q21 Q22 Q23 Q24
1   0   1   2   3   4
2   1  NA   2   3   4
3   2   1   2   3   4

> x <- transform( yy, ## Example 1
+   mySum = as.numeric(Q20) + as.numeric(Q22) + as.numeric(Q24),
+   myCount = 
as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))

+ )
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4     6       3
2   1  NA   2   3   4     7       2
3   2   1   2   3   4     8       3
> x <- transform( yy,   ### Example 2
+   mySum = as.numeric(Q20) + as.numeric(Q21) + as.numeric(Q24),
+   myCount = 
as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))

+ )
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4     5       3
2   1  NA   2   3   4    NA       2
3   2   1   2   3   4     7       3

> NifAvail <- function(x) { if (is.na(x)) x <- 0 else x <- x   ### Example 3
+   return(as.numeric(x))
+ } # end function
> NifAvail(5)
[1] 5
> NifAvail(NA)
[1] 0

> x <- transform( yy,
+   mySum = NifAvail(Q20) + NifAvail(Q22) + NifAvail(Q24),   ### Example 4
+   myCount = 
as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))

+ )
Warning messages:
1: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
2: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
3: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4     6       3
2   1  NA   2   3   4     7       2
3   2   1   2   3   4     8       3
> x <- transform( yy,
+   mySum = NifAvail(Q20) + NifAvail(Q21) + NifAvail(Q24),   ### Example 5
+   myCount = 
as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))

+ )
Warning messages:
1: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
2: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
3: In if (is.na(x)) x <- 0 else x <- x :
  the condition has length > 1 and only the first element will be used
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4     5       3
2   1  NA   2   3   4    NA       2
3   2   1   2   3   4     7       3


> x <- transform( yy,   ### Example 6
+   mySum = sum(as.numeric(Q20), as.numeric(Q21), as.numeric(Q23), na.rm=T),
+   myCount = 
as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))

+ )
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4    14       3
2   1  NA   2   3   4    14       2
3   2   1   2   3   4    14       3

> x <- transform( yy,   ### Example 7
+   mySum = sum(as.numeric(Q20), as.numeric(Q22), as.numeric(Q23), na.rm=T),
+   myCount = 
as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))

+ )
> x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
1   0   1   2   3   4    18       3
2   1  NA   2   3   4    18       2
3   2   1   2   3   4    18       3
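A minimal sketch of one idiom that does exactly this: rowSums() works on
any column subset, not just the whole frame, and its na.rm argument
treats NA as 0:

sel <- c("Q20", "Q21", "Q24")
x <- transform(yy,
               mySum   = rowSums(yy[, sel], na.rm = TRUE),
               myCount = rowSums(!is.na(yy[, sel])))
x
#   Q20 Q21 Q22 Q23 Q24 mySum myCount
# 1   0   1   2   3   4     5       3
# 2   1  NA   2   3   4     5       2
# 3   2   1   2   3   4     7       3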

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using GIS data in R

2011-11-20 Thread Zuofeng Shang

Hi Rolf,

Thanks very much for the suggestions! Sorry if my question was not very 
clear.


The context is as follows. I have a set of locations with coordinates 
(latitude and longitude)
from Texas region. Some of the locations are within Texas while some of 
them are not.
I am trying to find those locations within Texas by looking at if the 
coordinate of each location

is within the boundary of Texas.

Now I agree with you that the boundary of Texas is a polygon although it 
looks "irregular".
I find your previous post very helpful. inside.owin() will do the job 
but I have to figure out
the boundary (as a polygon) of Texas. Since I do not have the shapefile 
of Texas, this will be
difficult for me. I am not sure if the shapefiles of the US states are 
available in R. If not,

how can I get the shapefiles? Thanks a lot for your kind help!

Best regards,
Jeff





On 21/11/11 08:01, JeffND wrote:

Hi Rolf,

I have a similar question. I want to test whether a point with certain
coordinates is inside
a state, say Texas. It seems that inside.owin() only works for testing if a
point lies in a
regular region, say a polygon. Since Texas has an irregular boundary, how
do we achieve this?
Or is there some alternative way we can use?

Thanks!
Jeff

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-GIS-data-in-R-tp1748266p4089242.html
Sent from the R help mailing list archive at Nabble.com.

It would be nice to have some context ... nabble strikes again.

But to get to your question:  How do you think the boundary of Texas
(irregular as it may be) is specified?  I expect that it is specified as
a polygon!!!  It may be *very* ``poly'' --- i.e. have a large number of
edges --- but it's still a polygon and inside.owin() will still work on it.

Have a look at the data set "nbfires" in the spatstat package.  New
Brunswick has an ``irregular boundary'' too!

Have you *tried* making your specification of Texas into an owin object?
Follow the instructions in the vignette given by:

  vignette("shapefiles")

  cheers,

  Rolf Turner


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Cox proportional hazards confidence intervals

2011-11-20 Thread Paul Johnston
I am calculating Cox proportional hazards models with the coxph
function from the survival package.  My data relates to failure of
various types of endovascular interventions.  I can successfully
obtain the LR, Wald, and Score test p-values from the coxph.object, as
well as the hazard ratio as follows:

formula.obj = Surv(days, status) ~ type
coxph.model = coxph(formula.obj, df)
fit = summary(coxph.model)
hazard.ratio = fit$conf.int[1]    # exp(coef)
lower95 = fit$conf.int[3]         # Wald-based lower 95% limit
upper95 = fit$conf.int[4]         # Wald-based upper 95% limit
logrank.p.value = fit$sctest[3]   # score (log-rank) test
wald.p.value = fit$waldtest[3]
lr.p.value = fit$logtest[3]       # likelihood-ratio test

I had intended to report logrank P values with the hazard ratio and CI
obtained from this function.  In one case the P was 0.04 yet the CI
crossed one, which confused me, and certainly will raise questions by
reviewers.  In retrospect I can see that the CI calculated by coxph is
intimately related to the Wald p-value (which in this specific case
was 0.06), so this would appear to me not a good strategy for
reporting my results (mixing the logrank test with the HR and CIs from
coxph).

I can report the Wald p-values instead, but I have read that the Wald
test is inferior to the score test or LR test.  My questions for
survival analysis jockeys out there:

1. Should I just stop here and use the wald.p.value?  This appears to
be what Stata does with the stcox function (albeit Breslow method).

2. Should I calculate HR and CIs that "agree" with the LR or logrank
P?  How do I do that?

Thank you,
Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data analysis: normal approximation for binomial

2011-11-20 Thread Colstat
Hey, Joshua
Thanks so much for your quick response.  Those examples you produced are
very good; I'm pretty impressed by the graphs.  When I ran the last line, I
hit an error, so I ran what's inside summary(), and it gives me

Error: could not find function "lmer"

Something with the package "lme4"?
Colin


On Sun, Nov 20, 2011 at 1:00 AM, Joshua Wiley wrote:

> Hi Colin,
>
> I have never heard of a binomial distribution z statistic with (or
> without for that matter) a continuity correction, but I am not a
> statistician.  Other's may have some ideas there.  As for other ways
> to analyze the data, I skimmed through the article and brought the
> data and played around with some different analyses and graphs.  I
> attached a file headache.txt with all the R script (including the data
> in an Rish format).  It is really a script file (i.e., .R) but for the
> listservs sake I saved it as a txt.  There are quite a few different
> things I tried in there so hopefully it gives you some ideas.
> Regardless of the analysis type used and whether one considers
> proportion that "significantly improved" or the raw frequency or
> intensity scores, I would say that concluding the treatment was
> effective is a good conclusion.  The only real concern could be that
> people would naturally get better on their own (a control group would
> be needed to bolster the causal inference drawn from a pre/post
> measurement).  However, given at least what I know about migraines, it
> is often a fairly chronic condition so over a relatively short time
> period, it seems implausible to conclude that as many people would be
> improving as this study reported.
>
> Cheers,
>
> Josh
>
> On Sat, Nov 19, 2011 at 7:43 PM, Colstat  wrote:
> > hey, Joshua
> > I was reading this paper, in attachment, and reproducing the results.
> >
> > I was really confused when he said in the paper "The results were then
> > statistically analyzed using binomial distribution z statistics with
> > continuity correction."  The data is binomial?  To me, this is a paired
> > t-test.
> >
> > What command should I use to get those results (the first paragraph in
> > Results section)?  Basically, it's a pre and post treatment problem.
> >
> > What other graphical analysis do you think is appropriate? reshape
> package?
> > lattice package, namely conditional graph?
> >
> > I know this might be too much, but I do really appreciate it if you do
> take
> > a look at it.
> >
> > Thanks,
> > Colin
> >
> >
> > On Sat, Nov 19, 2011 at 10:15 PM, Joshua Wiley 
> > wrote:
> >>
> >> Hi,
> >>
> >> I am not clear what your goal is.  There is a variety of data there.
> >> You could look at t-test differences in preIntensity broken down by
> >> sex, you could use regression looking at postIntensity controlling for
> >> preIntensity and explained by age, you could
> >>
> >> Why are you analyzing data from an article?  What did the article do?
> >> What you mention---some sort of z statistic (what exactly this was of
> >> and how it should be calculated did not seem like was clear even to
> >> you), histogram, t-test, lattice, are all very different things that
> >> help answer different questions, show different things, and in one is
> >> a piece of software.
> >>
> >> Without a clearer question and goal, my best advice is here are a
> >> number of different functions some of which may be useful to you:
> >>
> >> ls(pos = "package:stats")
> >>
> >> Cheers,
> >>
> >> Josh
> >>
> >> On Sat, Nov 19, 2011 at 3:01 PM, Colstat  wrote:
> >> > Dear R experts,
> >> >
> >> > I am trying to analyze data from an article, the data looks like this
> >> >
> >> > Patient Age Sex Aura preCSM preFreq preIntensity postFreq postIntensity postOutcome
> >> > 1 47 F A 4 6 9 2 8 SD
> >> > 2 40 F A/N 5 8 9 0 0 E
> >> > 3 49 M N 5 8 9 2 6 SD
> >> > 4 40 F A 5 3 10 0 0 E
> >> > 5 42 F N 5 4 9 0 0 E
> >> > 6 35 F N 5 8 9 12 7 NR
> >> > 7 38 F A 5 NA 10 2 9 SD
> >> > 8 44 M A 4 4 10 0 0 E
> >> > 9 47 M A 4 5 8 2 7 SD
> >> > 10 53 F A 5 3 10 0 0 E
> >> > 11 41 F N 5 6 7 0 0 E
> >> > 12 49 F A 4 6 8 0 0 E
> >> > 13 48 F A 5 4 8 0 0 E
> >> > 14 63 M N 4 6 9 15 9 NR
> >> > 15 58 M N 5 9 7 2 8 SD
> >> > 16 53 F A 4 3 9 0 0 E
> >> > 17 47 F N 5 4 8 1 4 SD
> >> > 18 34 F A NA  5 9 0 0 E
> >> > 19 53 F N 5 4 9 5 7 NR
> >> > 20 45 F N 5 5 8 5 4 SD
> >> > 21 30 F A 5 3 8 0 0 E
> >> > 22 29 F A 4 5 9 0 0 E
> >> > 23 49 F N 5 9 10 0 0 E
> >> > 24 24 F A 5 5 9 0 0 E
> >> > 25 63 F N 4 19 7 10 7 NR
> >> > 26 62 F A 5 8 9 11 9 NR
> >> > 27 44 F A 5 3 10 0 0 E
> >> > 28 38 F N 4 8 10 1 3 SD
> >> > 29 38 F N 5 3 10 0 0 E
> >> >
> >> > How do I do a binomial distribution z statistic with continuity
> >> > correction? basically a normal approximation.
> >> > Could anyone give me some suggestions what I (or R) can do with these
> >> > data?
> >> > I have tried tried histogram, maybe t-test? or even lattice?  what
> else
> >> > can
> >> > I(or can R) do?
> >> > help please, thanks so much.
> >> >
> >> > [[alternative HTML version deleted]]
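A minimal sketch of the statistic the paper seems to describe: a normal
approximation to the binomial with a 0.5 continuity correction (the
counts below are hypothetical, not taken from the article):

x <- 20; n <- 29; p0 <- 0.5              # successes, trials, null proportion
z <- (abs(x - n * p0) - 0.5) / sqrt(n * p0 * (1 - p0))
p.value <- 2 * pnorm(-z)                 # two-sided p-value
c(z = z, p.value = p.value)
# prop.test(x, n, p = p0) applies the same correction via its chi-square form

(The "lmer" error at the top of the thread just means the lme4 package is
not installed/loaded: install.packages("lme4"); library(lme4).)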

[R] Searching for the first two characters of a string

2011-11-20 Thread aclinard
Hi,

I'm trying to search the data in a column for the first two characters, and
return all rows that match the search criteria. 

Any suggestions?

Thanx

--
View this message in context: 
http://r.789695.n4.nabble.com/Searching-for-the-first-two-characters-of-a-string-tp4089862p4089862.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] rJava and multicore

2011-11-20 Thread James McCreight
Hello MasteRs-

Because I want to parallelize several calls to the glmulti package, what
I'm essentially doing is trying to parallelize different calls to rJava.
I'm using plyr functions, which use foreach, and then doMC, which means
multicore is my backend for parallelizing.

I've tried several approaches to this but have not succeeded; I also find
virtually no record of folks trying this or any mention of how it might
work. I'm hoping my failure comes from ignorance of how to call .jinit with
the appropriate parameters at the right time. A simple example call:

all.glmultis <- llply( some.list, function.which.calls.glmulti,
.parallel=TRUE )

If I set .parallel=FALSE, the call completes and I'm able to merge the
results using a glmulti command. However, when parallelizing, it never
returns (even though it appears to run the separate processes
simultaneously and finish the heavy computation).

Can I expect to get this to work or not? If so, what are the basics?

Thanks for any help,

James

[[alternative HTML version deleted]]
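A hedged sketch of one workaround: a running JVM generally does not
survive the fork() that multicore-style backends rely on, so a socket
cluster, where every worker is a fresh R process with its own JVM, tends
to be safer (some.list and function.which.calls.glmulti are the names
from the post):

library(snow)
library(doSNOW)
library(plyr)
cl <- makeCluster(2, type = "SOCK")
clusterEvalQ(cl, library(glmulti))  # each worker starts its own JVM here
registerDoSNOW(cl)
all.glmultis <- llply(some.list, function.which.calls.glmulti,
                      .parallel = TRUE)
stopCluster(cl)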

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Setting hostname in the .Renvironment

2011-11-20 Thread Jonathan Greenberg
This is a follow-up to a question I asked a few years back.  We have a pair
of computers that share a common home directory (and therefore a common
.Renviron) with identical hardware, but very different sets of libraries,
such that using a "shared" R_LIBS between the two computers does not work.
They both use this as the default library for user installations:

'~/R/x86_64-pc-linux-gnu-library/2.13'

I would like to modify the .Renviron in such a way that each host resolves to its own tree:

'~/R/hostname-1/x86_64-pc-linux-gnu-library/2.13'
'~/R/hostname-2/x86_64-pc-linux-gnu-library/2.13'

How can I modify my R_LIBS_USER in the .Renviron to match this?  The
hostname would need to be "dynamically" set based on which computer I log
into, but I tried something like this:

R_LIBS_USER="~/lib/R/library/"$HOSTNAME"_%p_%a_%o_R%V"

and R did not resolve the $HOSTNAME environment variable, even though I can
(from bash) echo $HOSTNAME and it returns the correct name.

Thoughts?  Thanks!

--j


-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html

[[alternative HTML version deleted]]
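A hedged sketch of one workaround via ~/.Rprofile instead of .Renviron:
HOSTNAME is a bash shell variable that is often not exported to the
environment R reads, whereas R can always ask the OS for the node name
itself (the paths mirror the ones above):

local({
  host <- Sys.info()[["nodename"]]
  lib  <- file.path("~", "R", host, "x86_64-pc-linux-gnu-library", "2.13")
  if (!file.exists(lib)) dir.create(lib, recursive = TRUE)
  .libPaths(c(lib, .libPaths()))   # per-host library searched first
})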

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help with table() and apply()

2011-11-20 Thread Stuart Luppescu
On Sun, 2011-11-20 at 17:43 -0500, jim holtman wrote:
> It might be good if you told us the problem you are trying to solve.
> Why do you have factors in the dataframe?  Can you just have the
> values?  Do you want to count the 'levels' of the factors in a row, or
> do you want to count the numeric they represent (in your case it is
> the same, so I wonder why the factor).
> 
> Here is one way of doing it to count what the 'level' values are:
> 
> > apply(df, 1, function(x) tabulate(as.integer(x), nbins = 4))
>  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> [1,]    2    3    2    2    1    2    1    2    2     2
> [2,]    1    4    7    3    6    5    0    1    1     2
> [3,]    3    1    1    4    2    1    6    5    5     3
> [4,]    4    2    0    1    1    2    3    2    2     3
> >
> 
> So tell us what you want to do, not how you want to do it.

I see. The reason I turned the original numeric into factors with 4
levels is so table() would tell me when I had 0 counts of some factor
levels. Your method works very well, and will save me the extra step of
converting to factors. Also, thanks for the explanation on cbind()
converting to numerics. I appreciate the help.

> 
> 2011/11/20 jim holtman :
> > The answer to your question as to why you had to convert back to
> > factors is that you "undid" the factors when you did the 'cbind' to
> > create the dataframe.  Here is what you should have done:
> >
> >> df <- data.frame(rating.1 , rating.2 , rating.3 , rating.4 ,
> > +  rating.5 , rating.6 , rating.7 , rating.8 ,
> > +  rating.9 , rating.10)
> >>
> >> str(df)
> > 'data.frame':   10 obs. of  10 variables:
> >  $ rating.1 : Factor w/ 4 levels "1","2","3","4": 4 1 2 4 3 2 4 1 2 1
> >  $ rating.2 : Factor w/ 4 levels "1","2","3","4": 2 3 2 3 2 2 1 3 3 3
> >  $ rating.3 : Factor w/ 4 levels "1","2","3","4": 3 1 1 3 2 1 3 3 1 3
> >  $ rating.4 : Factor w/ 4 levels "1","2","3","4": 4 2 2 2 2 4 3 3 3 4
> >  $ rating.5 : Factor w/ 4 levels "1","2","3","4": 1 2 2 2 1 2 3 3 4 4
> >  $ rating.6 : Factor w/ 4 levels "1","2","3","4": 3 2 2 1 2 2 3 3 3 2
> >  $ rating.7 : Factor w/ 4 levels "1","2","3","4": 3 4 2 2 4 3 4 4 4 4
> >  $ rating.8 : Factor w/ 4 levels "1","2","3","4": 4 1 3 1 3 1 4 4 3 3
> >  $ rating.9 : Factor w/ 4 levels "1","2","3","4": 4 4 2 3 2 4 3 2 3 2
> >  $ rating.10: Factor w/ 4 levels "1","2","3","4": 1 2 1 3 2 2 3 1 1 1
> >
> > Notice that the factors are maintained.
> >
> > When having problems, break up the steps and see what happens at each
> > one.  Here is the output of your 'cbind':
> >
> >> x <- (cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
> > +  rating.5 , rating.6 , rating.7 , rating.8 ,
> > +  rating.9 , rating.10)
> > + )
> >> str(x)
> >  int [1:10, 1:10] 4 1 2 4 3 2 4 1 2 1 ...
> >  - attr(*, "dimnames")=List of 2
> >  ..$ : NULL
> >  ..$ : chr [1:10] "rating.1" "rating.2" "rating.3" "rating.4" ...
> >>
> >
> > notice it is just an integer array.
> >
> > Also if you had looked at the HELP page, you would have seen:
> >
> > In the default method, all the vectors/matrices must be atomic (see
> > vector) or lists. Expressions are not allowed. Language objects (such
> > as formulae and calls) and pairlists will be coerced to lists: other
> > objects (such as names and external pointers) will be included as
> > elements in a list result. Any classes the inputs might have are
> > discarded (in particular, factors are replaced by their internal
> > codes).
> >
> > Notice the last sentence.
> >
> > 2011/11/20 Stuart Luppescu :
> >> Hello, I am having trouble getting counts of values in rows of a data
> >> frame. I'm trying to use apply, but it's not working.
> >>
> >> This gives a sample of the kind of data I'm working with:
> >>
> >> rating.1 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
> >> rating.2 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
> >> rating.3 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
> >> rating.4 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
> >> rating.5 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
> >> rating.6 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
> >> rating.7 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
> >> rating.8 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
> >> rating.9 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
> >> rating.10 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
> >>
> >> df <- as.data.frame(cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
> >>  rating.5 , rating.6 , rating.7 , rating.8 ,
> >>  rating.9 , rating.10))
> >>
> >> for(i in 1:10) {
> >>  df[,i] <- factor(df[,i], levels=1:4)
> >> }
> >>
> >> [Aside: why does the original df have columns of class "integer" when
> >> the original data are factors? Why is it necessary to reconvert them
> >> into factors? Also, is it possible to d

Re: [R] Need help with table() and apply()

2011-11-20 Thread jim holtman
It might be good if you told us the problem you are trying to solve.
Why do you have factors in the dataframe?  Can you just have the
values?  Do you want to count the 'levels' of the factors in a row, or
do you want to count the numeric they represent (in your case it is
the same, so I wonder why the factor).

Here is one way of doing it to count what the 'level' values are:

> apply(df, 1, function(x) tabulate(as.integer(x), nbins = 4))
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    3    2    2    1    2    1    2    2     2
[2,]    1    4    7    3    6    5    0    1    1     2
[3,]    3    1    1    4    2    1    6    5    5     3
[4,]    4    2    0    1    1    2    3    2    2     3
>

So tell us what you want to do, not how you want to do it.

2011/11/20 jim holtman :
> The answer to your question as to why you had to convert back to
> factors is that you "undid" the factors when you did the 'cbind' to
> create the dataframe.  Here is what you should have done:
>
>> df <- data.frame(rating.1 , rating.2 , rating.3 , rating.4 ,
> +                          rating.5 , rating.6 , rating.7 , rating.8 ,
> +                          rating.9 , rating.10)
>>
>> str(df)
> 'data.frame':   10 obs. of  10 variables:
>  $ rating.1 : Factor w/ 4 levels "1","2","3","4": 4 1 2 4 3 2 4 1 2 1
>  $ rating.2 : Factor w/ 4 levels "1","2","3","4": 2 3 2 3 2 2 1 3 3 3
>  $ rating.3 : Factor w/ 4 levels "1","2","3","4": 3 1 1 3 2 1 3 3 1 3
>  $ rating.4 : Factor w/ 4 levels "1","2","3","4": 4 2 2 2 2 4 3 3 3 4
>  $ rating.5 : Factor w/ 4 levels "1","2","3","4": 1 2 2 2 1 2 3 3 4 4
>  $ rating.6 : Factor w/ 4 levels "1","2","3","4": 3 2 2 1 2 2 3 3 3 2
>  $ rating.7 : Factor w/ 4 levels "1","2","3","4": 3 4 2 2 4 3 4 4 4 4
>  $ rating.8 : Factor w/ 4 levels "1","2","3","4": 4 1 3 1 3 1 4 4 3 3
>  $ rating.9 : Factor w/ 4 levels "1","2","3","4": 4 4 2 3 2 4 3 2 3 2
>  $ rating.10: Factor w/ 4 levels "1","2","3","4": 1 2 1 3 2 2 3 1 1 1
>
> Notice that the factors are maintained.
>
> When having problems, break up the steps and see what happens at each
> one.  Here is the output of your 'cbind':
>
>> x <- (cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
> +                          rating.5 , rating.6 , rating.7 , rating.8 ,
> +                          rating.9 , rating.10)
> + )
>> str(x)
>  int [1:10, 1:10] 4 1 2 4 3 2 4 1 2 1 ...
>  - attr(*, "dimnames")=List of 2
>  ..$ : NULL
>  ..$ : chr [1:10] "rating.1" "rating.2" "rating.3" "rating.4" ...
>>
>
> notice it is just an integer array.
>
> Also if you had looked at the HELP page, you would have seen:
>
> In the default method, all the vectors/matrices must be atomic (see
> vector) or lists. Expressions are not allowed. Language objects (such
> as formulae and calls) and pairlists will be coerced to lists: other
> objects (such as names and external pointers) will be included as
> elements in a list result. Any classes the inputs might have are
> discarded (in particular, factors are replaced by their internal
> codes).
>
> Notice the last sentence.
>
> 2011/11/20 Stuart Luppescu :
>> Hello, I am having trouble getting counts of values in rows of a data
>> frame. I'm trying to use apply, but it's not working.
>>
>> This gives a sample of the kind of data I'm working with:
>>
>> rating.1 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
>> rating.2 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
>> rating.3 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
>> rating.4 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
>> rating.5 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
>> rating.6 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
>> rating.7 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
>> rating.8 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
>> rating.9 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
>> rating.10 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
>>
>> df <- as.data.frame(cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
>>                          rating.5 , rating.6 , rating.7 , rating.8 ,
>>                          rating.9 , rating.10))
>>
>> for(i in 1:10) {
>>  df[,i] <- factor(df[,i], levels=1:4)
>> }
>>
>> [Aside: why does the original df have columns of class "integer" when
>> the original data are factors? Why is it necessary to reconvert them
>> into factors? Also, is it possible to do this without a for loop?]
>>
>> If I do this:
>>
>> apply(df[,1:10], 1, table)
>>
>> I get a 4x10 array, the contents of which I do not understand.
>>
>> apply(df[,1:10], 2, table)
>>
>> gives 10 tables for the columns, but it leaves out factor levels which
>> do not occur. For example,
>>
>>  rating.6 : 'table' int [1:3(1d)] 7 1 2
>>  ..- attr(*, "dimnames")=List of 1
>>  .. ..$ : chr [1:3] "1" "2" "3"
>>
>> lapply(df[, 1:10], table)
>>
>> gives tables of the columns keeping the levels with 0 counts:
>>
>> $ rating.6 : 'table' int [1:4(1d)] 7 1 2 0
>>  ..- attr(*, 

Re: [R] Need help with table() and apply()

2011-11-20 Thread jim holtman
The answer to your question as to why you had to convert back to
factors is that you "undid" the factors when you did the 'cbind' to
create the dataframe.  Here is what you should have done:

> df <- data.frame(rating.1 , rating.2 , rating.3 , rating.4 ,
+  rating.5 , rating.6 , rating.7 , rating.8 ,
+  rating.9 , rating.10)
>
> str(df)
'data.frame':   10 obs. of  10 variables:
 $ rating.1 : Factor w/ 4 levels "1","2","3","4": 4 1 2 4 3 2 4 1 2 1
 $ rating.2 : Factor w/ 4 levels "1","2","3","4": 2 3 2 3 2 2 1 3 3 3
 $ rating.3 : Factor w/ 4 levels "1","2","3","4": 3 1 1 3 2 1 3 3 1 3
 $ rating.4 : Factor w/ 4 levels "1","2","3","4": 4 2 2 2 2 4 3 3 3 4
 $ rating.5 : Factor w/ 4 levels "1","2","3","4": 1 2 2 2 1 2 3 3 4 4
 $ rating.6 : Factor w/ 4 levels "1","2","3","4": 3 2 2 1 2 2 3 3 3 2
 $ rating.7 : Factor w/ 4 levels "1","2","3","4": 3 4 2 2 4 3 4 4 4 4
 $ rating.8 : Factor w/ 4 levels "1","2","3","4": 4 1 3 1 3 1 4 4 3 3
 $ rating.9 : Factor w/ 4 levels "1","2","3","4": 4 4 2 3 2 4 3 2 3 2
 $ rating.10: Factor w/ 4 levels "1","2","3","4": 1 2 1 3 2 2 3 1 1 1

Notice that the factors are maintained.

When having problems, break up the steps and see what happens at each
one.  Here is the output of your 'cbind':

> x <- (cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
+  rating.5 , rating.6 , rating.7 , rating.8 ,
+  rating.9 , rating.10)
+ )
> str(x)
 int [1:10, 1:10] 4 1 2 4 3 2 4 1 2 1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "rating.1" "rating.2" "rating.3" "rating.4" ...
>

notice it is just an integer array.

Also if you had looked at the HELP page, you would have seen:

In the default method, all the vectors/matrices must be atomic (see
vector) or lists. Expressions are not allowed. Language objects (such
as formulae and calls) and pairlists will be coerced to lists: other
objects (such as names and external pointers) will be included as
elements in a list result. Any classes the inputs might have are
discarded (in particular, factors are replaced by their internal
codes).

Notice the last sentence.

2011/11/20 Stuart Luppescu :
> Hello, I am having trouble getting counts of values in rows of a data
> frame. I'm trying to use apply, but it's not working.
>
> This gives a sample of the kind of data I'm working with:
>
> rating.1 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
> rating.2 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
> rating.3 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
> rating.4 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
> rating.5 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
> rating.6 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
> rating.7 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
> rating.8 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
> rating.9 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
> rating.10 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
>
> df <- as.data.frame(cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
>  rating.5 , rating.6 , rating.7 , rating.8 ,
>  rating.9 , rating.10))
>
> for(i in 1:10) {
>  df[,i] <- factor(df[,i], levels=1:4)
> }
>
> [Aside: why does the original df have columns of class "integer" when
> the original data are factors? Why is it necessary to reconvert them
> into factors? Also, is it possible to do this without a for loop?]
>
> If I do this:
>
> apply(df[,1:10], 1, table)
>
> I get a 4x10 array, the contents of which I do not understand.
>
> apply(df[,1:10], 2, table)
>
> gives 10 tables for the columns, but it leaves out factor levels which
> do not occur. For example,
>
>  rating.6 : 'table' int [1:3(1d)] 7 1 2
>  ..- attr(*, "dimnames")=List of 1
>  .. ..$ : chr [1:3] "1" "2" "3"
>
> lapply(df[, 1:10], table)
>
> gives tables of the columns keeping the levels with 0 counts:
>
> $ rating.6 : 'table' int [1:4(1d)] 7 1 2 0
>  ..- attr(*, "dimnames")=List of 1
>  .. ..$ : chr [1:4] "1" "2" "3" "4"
>
> But I really want tables of the rows. Do I have to write my own function
> to count the numbers of values?
>
> Thanks in advance.
>
> --
> Stuart Luppescu -=- slu .at. ccsr.uchicago.edu
> University of Chicago -=- CCSR
> 才文と智奈美の父  -=- Kernel 3.0.6-gentoo
>  You say yourself it wasn't reproducible. So it could have been anything
> that "crashed" your R, cosmic radiation, a bolt of lightning reversing a
> bit in your computer memory, ... :-) -- Martin Maechler (replying to a
> bug report) R-devel (July 2005)
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

[R] Need help with table() and apply()

2011-11-20 Thread Stuart Luppescu
Hello, I am having trouble getting counts of values in rows of a data
frame. I'm trying to use apply, but it's not working.

This gives a sample of the kind of data I'm working with:

rating.1 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
rating.2 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
rating.3 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
rating.4 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
rating.5 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
rating.6 <- factor(sample(1:3, size=10, replace=T), levels=1:4)
rating.7 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
rating.8 <- factor(sample(1:4, size=10, replace=T), levels=1:4)
rating.9 <- factor(sample(2:4, size=10, replace=T), levels=1:4)
rating.10 <- factor(sample(1:3, size=10, replace=T), levels=1:4)

df <- as.data.frame(cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
  rating.5 , rating.6 , rating.7 , rating.8 ,
  rating.9 , rating.10))

for(i in 1:10) {
  df[,i] <- factor(df[,i], levels=1:4)
}

[Aside: why does the original df have columns of class "integer" when
the original data are factors? Why is it necessary to reconvert them
into factors? Also, is it possible to do this without a for loop?]

If I do this:

apply(df[,1:10], 1, table)

I get a 4x10 array, the contents of which I do not understand.

apply(df[,1:10], 2, table)

gives 10 tables for the columns, but it leaves out factor levels which
do not occur. For example,

 rating.6 : 'table' int [1:3(1d)] 7 1 2
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr [1:3] "1" "2" "3"

lapply(df[, 1:10], table)

gives tables of the columns keeping the levels with 0 counts:

$ rating.6 : 'table' int [1:4(1d)] 7 1 2 0
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr [1:4] "1" "2" "3" "4"

But I really want tables of the rows. Do I have to write my own function
to count the numbers of values?

Thanks in advance.

-- 
Stuart Luppescu -=- slu .at. ccsr.uchicago.edu 
University of Chicago -=- CCSR 
才文と智奈美の父  -=- Kernel 3.0.6-gentoo
 You say yourself it wasn't reproducible. So it could have been anything
that "crashed" your R, cosmic radiation, a bolt of lightning reversing a
bit in your computer memory, ... :-) -- Martin Maechler (replying to a
bug report) R-devel (July 2005)
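A minimal sketch pulling the thread's advice together: build the data so
the factors survive, convert all columns without a for loop via lapply(),
and tabulate each row while keeping zero counts:

set.seed(1)
df <- as.data.frame(replicate(10, sample(1:4, 10, replace = TRUE)))
names(df) <- paste("rating.", 1:10, sep = "")
df[] <- lapply(df, factor, levels = 1:4)   # no for loop needed
t(apply(df, 1, function(x) tabulate(factor(x, levels = 1:4), nbins = 4)))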

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using GIS data in R

2011-11-20 Thread Rolf Turner

On 21/11/11 10:19, Zuofeng Shang wrote:

Hi Rolf,

Thanks very much for the suggestions! Sorry if my question was not very 
clear.


The context is as follows. I have a set of locations with coordinates 
(latitude and longitude)
from Texas region. Some of the locations are within Texas while some 
of them are not.
I am trying to find those locations within Texas by looking at if the 
coordinate of each location

is within the boundary of Texas.

Now I agree with you that the boundary of Texas is a polygon although it 
looks "irregular".
I find your previous post very helpful. inside.owin() will do the job 
but I have to figure out
the boundary (as a polygon) of Texas. Since I do not have the 
shapefile of Texas, this will be
difficult for me. I am not sure if the shapefiles of the US states are 
available in R. If not,

how can I get the shapefiles? Thanks a lot for your kind help!


This is not my area of expertise (if indeed I have such an area at all! 
:-) ) but
I'm sure that maps of the states of the USA --- including Texas, even! 
--- are readily available
from various sources, e.g. from various GIS's or perhaps from the 
"maps" package.


With a bit of luck someone on the list will chime in with something more 
specific.


Or you could try asking "how/where can I get a shapefile specifying the 
boundary

of Texas" on the R-Sig-Geo list.

HTH

cheers,

Rolf

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using GIS data in R

2011-11-20 Thread Rolf Turner

On 21/11/11 08:01, JeffND wrote:

Hi Rolf,

I have a similar question. I want to test whether a point with certain
coordinates is inside
a state, say Texas. It seems that inside.owin() only works for testing if a
point lies in a
regular region, say a polygon. Since Texas has an irregular boundary, how
do we achieve this?
Or is there some alternative way we can use?

Thanks!
Jeff

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-GIS-data-in-R-tp1748266p4089242.html
Sent from the R help mailing list archive at Nabble.com.


It would be nice to have some context ... nabble strikes again.

But to get to your question:  How do you think the boundary of Texas
(irregular as it may be) is specified?  I expect that it is specified as
a polygon!!!  It may be *very* ``poly'' --- i.e. have a large number of
edges --- but it's still a polygon and inside.owin() will still work on it.

Have a look at the data set "nbfires" in the spatstat package.  New
Brunswick has an ``irregular boundary'' too!

Have you *tried* making your specification of Texas into an owin object?
Follow the instructions in the vignette given by:

vignette("shapefiles")

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] I'm writing this letter to enquire where can I download the package of "lmtest".

2011-11-20 Thread Sophy
Dear editor:
   I'm writing this letter to enquire where I can download the package of 
"lmtest". Can you send me this package?

Thanks a lot.
Best regards,
Shu-Fei Wu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I'm writing this letter to enquire where can I download the package of "lmtest".

2011-11-20 Thread R. Michael Weylandt
http://cran.r-project.org/web/packages/lmtest/index.html

Or automatically within R by typing:

install.packages("lmtest")

Michael

On Sun, Nov 20, 2011 at 1:18 PM, Sophy <100...@mail.tku.edu.tw> wrote:
> Dear editor:
>   I'm writing this letter to enquire where I can download the package of 
> "lmtest". Can you send me this package?
>
> Thanks a lot.
> Best regards,
> Shu-Fei Wu
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using GIS data in R

2011-11-20 Thread JeffND
Hi Rolf,

I have a similar question. I want to test whether a point with certain
coordinates is inside 
a state, say Texas. It seems that inside.owin() only works for testing if a
point lies in a
regular region, say a polygon. Since Texas has irregular boundary, how do we
achieve this?
Or there is some alternative way we can use.

Thanks!
Jeff  

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-GIS-data-in-R-tp1748266p4089242.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Transformations

2011-11-20 Thread David Winsemius


On Nov 20, 2011, at 11:16 AM, d.s91 wrote:

Hi, I'm having some trouble with R code for one of my assignment
questions on transforms. I've never used R for transformations so I'm
not sure where to begin. Our prof did give us some partial R code:

par(mfrow=c(2,2))
plot(X[,2],e)
plot(X[,3],e)
plot(X[,4],e)
plot(X[,5],e)

Any help??

3. Try the following transformations on the response variable:
(a) ỹ = e^y
(b) ỹ = log(y)
(c) ỹ = 1/y
(d) ỹ = 1/y^2
(e) ỹ = 1/sqrt(y)
Carry out the regression ỹ = β0 + β1*x1 + β2*x2 + β3*x3 + β4*x4 + ε
for each case, including the residual analysis.



This mailing list specifically discourages homework questions, but  
especially discourages ones where the student remains anonymous and  
shows no evidence of prior effort beyond typing the question at their  
keyboard.



--
View this message in context: 
http://r.789695.n4.nabble.com/Transformations-tp4088908p4088908.html
Sent from the R help mailing list archive at Nabble.com.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression by glm

2011-11-20 Thread 屠鞠传礼
Thank you Ligges :)
one more question:
my response value "diagnostic" has 4 levels (0, 1, 2 and 3), so I use it like 
this:
"as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517) "
Is it all right?
 



On 2011-11-20 23:45:23, "Uwe Ligges" wrote:
>
>
>On 20.11.2011 12:46, tujchl wrote:
>> HI
>>
>> I use glm in R to do logistic regression. and treat both response and
>> predictor as factor
>> In my first try:
>>
>> ***
>> Call:
>> glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
>> as.factor(2281517), family = binomial())
>>
>> Deviance Residuals:
>> Min 1Q Median 3Q Max
>> -1.5370 -1.0431 -0.9416 1.3065 1.4331
>>
>> Coefficients:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -0.58363 0.27948 -2.088 0.0368 *
>> as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
>> as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
>> as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
>> as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> (Dispersion parameter for binomial family taken to be 1)
>>
>> Null deviance: 678.55 on 498 degrees of freedom
>> Residual deviance: 671.20 on 494 degrees of freedom
>> AIC: 681.2
>>
>> Number of Fisher Scoring iterations: 4
>> ***
>>
>> And I remodel it and *want no intercept*:
>> ***
>> Call:
>> glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
>> as.factor(7161521) - 1, family = binomial())
>>
>> Deviance Residuals:
>> Min 1Q Median 3Q Max
>> -1.5370 -1.0431 -0.9416 1.3065 1.4331
>>
>> Coefficients:
>> Estimate Std. Error z value Pr(>|z|)
>> as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
>> as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
>> as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
>> as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
>> as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> (Dispersion parameter for binomial family taken to be 1)
>>
>> Null deviance: 691.76 on 499 degrees of freedom
>> Residual deviance: 671.20 on 494 degrees of freedom
>> AIC: 681.2
>>
>> Number of Fisher Scoring iterations: 4
>> ***
>>
>> *As shown above, in my second model it returns no intercept, but look at this:
>> Model1:
>> (Intercept) -0.58363 0.27948 -2.088 0.0368 *
>> Model2:
>> as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **
>>
>> They are exactly the same. Could you please tell me what happened?
>
>Actually it does not make sense to estimate the model without an 
>intercept unless you assume that it is exactly zero for the first levels 
>of your factors. Think about the contrasts you are interested in. Looks 
>like not the default?
>
>Uwe Ligges
>
>
>>
>> Thank you in advance
>>
>>
>> --
>> View this message in context: 
>> http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression by glm

2011-11-20 Thread 屠鞠传礼
I worry about it too. Do you have an idea of what tools I can use?





On 2011-11-21 00:13:26, "Uwe Ligges" wrote:
>
>
>On 20.11.2011 16:58, 屠鞠传礼 wrote:
>> Thank you Ligges :)
>> one more question:
>> my response value "diagnostic" has 4 levels (0, 1, 2 and 3), so I use it 
>> like this:
>> "as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517)"
>> Is it all right?
>
>
>Uhh. 4 levels? Then I doubt logistic regression is the right tool for 
>you. Please revisit the theory first: It is intended for 2 levels...
>
>
>Uwe Ligges
>
>
>
>
>
>>
>>
>>
>>
>> On 2011-11-20 23:45:23, "Uwe Ligges" wrote:
>>>
>>>
>>> On 20.11.2011 12:46, tujchl wrote:
 HI

 I use glm in R to do logistic regression. and treat both response and
 predictor as factor
 In my first try:

 ***
 Call:
 glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
 as.factor(2281517), family = binomial())

 Deviance Residuals:
 Min 1Q Median 3Q Max
 -1.5370 -1.0431 -0.9416 1.3065 1.4331

 Coefficients:
 Estimate Std. Error z value Pr(>|z|)
 (Intercept) -0.58363 0.27948 -2.088 0.0368 *
 as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
 as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
 as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
 as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
 ---
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 (Dispersion parameter for binomial family taken to be 1)

 Null deviance: 678.55 on 498 degrees of freedom
 Residual deviance: 671.20 on 494 degrees of freedom
 AIC: 681.2

 Number of Fisher Scoring iterations: 4
 ***

 And I remodel it and *want no intercept*:
 ***
 Call:
 glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
 as.factor(7161521) - 1, family = binomial())

 Deviance Residuals:
 Min 1Q Median 3Q Max
 -1.5370 -1.0431 -0.9416 1.3065 1.4331

 Coefficients:
 Estimate Std. Error z value Pr(>|z|)
 as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
 as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
 as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
 as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
 as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
 ---
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 (Dispersion parameter for binomial family taken to be 1)

 Null deviance: 691.76 on 499 degrees of freedom
 Residual deviance: 671.20 on 494 degrees of freedom
 AIC: 681.2

 Number of Fisher Scoring iterations: 4
 ***

 *As shown above, in my second model it returns no intercept, but look at this:
 Model1:
 (Intercept) -0.58363 0.27948 -2.088 0.0368 *
 Model2:
 as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

 They are exactly the same. Could you please tell me what happened?
>>>
>>> Actually it does not make sense to estimate the model without an
>>> intercept unless you assume that it is exactly zero for the first levels
>>> of your factors. Think about the contrasts you are interested in. Looks
>>> like not the default?
>>>
>>> Uwe Ligges
>>>
>>>

 Thank you in advance


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Transformations

2011-11-20 Thread d.s91
Hi, I'm having some trouble with R code for one of my assignment questions on
transforms. I've never used R for transformations so I'm not sure where to
begin. Our prof did give us some partial R code:

par(mfrow=c(2,2)) 
plot(X[,2],e) 
plot(X[,3],e) 
plot(X[,4],e) 
plot(X[,5],e) 

Any help??

3. Try the following transformations on the response variable: 
(a) ỹ = e^y 
(b) ỹ = log(y) 
(c) ỹ = 1/y 
(d) ỹ = 1/y^2 
(e) ỹ = 1/sqrt(y) 
Carry out the regression ỹ = β0 + β1*x1 + β2*x2 + β3*x3 + β4*x4 + ε for
each case, including the residual analysis.

--
View this message in context: 
http://r.789695.n4.nabble.com/Transformations-tp4088908p4088908.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression by glm

2011-11-20 Thread Uwe Ligges



On 20.11.2011 17:27, 屠鞠传礼 wrote:

I worry about it too. Do you have an idea of what tools I can use?



Depends on your aims - what you want to do with the fitted model.
A multinomial model, some kind of discriminant analysis (lda, qda), tree 
based methods, svm and so on come to mind. You probably want to discuss 
this on some statistics mailing list/forum or among local experts rather 
than on the R list, since this is actually not that R related.


Uwe Ligges
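A hedged sketch of the multinomial route mentioned above, assuming the
nnet package; the data frame and column names are made up, standing in
for the SNP-style factors in the thread:

library(nnet)
set.seed(42)
d <- data.frame(diagnostic = factor(sample(0:3, 200, replace = TRUE)),
                snp1 = factor(sample(1:3, 200, replace = TRUE)),
                snp2 = factor(sample(1:3, 200, replace = TRUE)))
fit <- multinom(diagnostic ~ snp1 + snp2, data = d)  # one logit per non-baseline level
summary(fit)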








On 2011-11-21 00:13:26, "Uwe Ligges" wrote:



On 20.11.2011 16:58, 屠鞠传礼 wrote:

Thank you Ligges :)
one more question:
my response value "diagnostic" has 4 levels (0, 1, 2 and 3), so I use it like 
this:
"as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517)"
Is it all right?



Uhh. 4 levels? Then I doubt logistic regression is the right tool for
you. Please revisit the theory first: It is intended for 2 levels...


Uwe Ligges










On 2011-11-20 23:45:23, "Uwe Ligges" wrote:



On 20.11.2011 12:46, tujchl wrote:

HI

I use glm in R to do logistic regression. and treat both response and
predictor as factor
In my first try:

***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
as.factor(2281517), family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 678.55 on 498 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

And I remodel it and *want no intercept*:
***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
as.factor(7161521) - 1, family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(>|z|)
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 691.76 on 499 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

*As shown above, in my second model it returns no intercept, but look at this:
Model1:
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
Model2:
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

They are exactly the same. Could you please tell me what happened?


Actually it does not make sense to estimate the model without an
intercept unless you assume that it is exactly zero for the first levels
of your factors. Think about the contrasts you are interested in. Looks
like not the default?

Uwe Ligges




Thank you in advance


--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] place values into a matrix efficiently?

2011-11-20 Thread Carl Witthoft

Excellent.  That's what I was looking for.
Thanks, Rolf.

Carl

On 11/19/11 11:16 PM, Rolf Turner wrote:

On 20/11/11 16:27, Carl Witthoft wrote:

This question attacked me as I was thinking about matrix value updates.

I probably will never need to do this, but wanted to ask if there are
efficient methods to perform the for-loop in the following sequence.

%xymat<-matrix(rep(0,100), nr=10, nc=10) # empty matrix
%x<-1:10
%y<-sample.int(10,10,rep=T)

%for (j in 1:10) xymat[x[j],y[j]] <- some_function(x[j],y[j]) # to
create either a false-color or 3D map:
%plot(0:1,0:1,t='n')
% rasterImage(xymat/max(xymat),0,0,1,1,interp=F)

This will produce a raster image of the original data(x vs y) that
looks like plot(x,y) .

Anyway, I just seem to be blanking: is there some vectorized way to
place values, or even a constant value, into all elements of xymat
whose row,col coordinates match ordered pairs in x and y ?


Yes. Use ``xymat[m]'' where m is a 2-column matrix of row and column
indices.

E.g.:

xymat <- matrix(0,nrow=10,ncol=10)
x <- 1:10
y <- sample.int(10,10,rep=TRUE)
set.seed(42)
for (j in 1:10) xymat[x[j],y[j]] <- runif(1)

gorp <- matrix(0,nrow=10,ncol=10)
set.seed(42)
gorp[cbind(x,y)] <- runif(10)
all.equal(xymat,gorp) # TRUE!!!
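
The same indexing also covers the constant-value case from the question, and
the function case whenever some_function is vectorized (a sketch):

xymat[cbind(x, y)] <- 7                     # one constant for all (x, y) pairs
xymat[cbind(x, y)] <- some_function(x, y)   # no loop needed if vectorized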

cheers,

Rolf Turner



--

Sent from my Cray XK6
"Pendeo-navem mei anguillae plena est."

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ANOVA and LSD TEST - contrasting results

2011-11-20 Thread Roberto
I didn't mean to ask anyone to rewrite a textbook. I asked only for a
clarification of my doubt, because I have started studying statistics recently.

--
View this message in context: 
http://r.789695.n4.nabble.com/ANOVA-and-LSD-TEST-contrasting-results-tp4086090p4088810.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] create variable names using paste

2011-11-20 Thread Uwe Ligges



On 20.11.2011 15:32, Alaios wrote:

Dear all,

Based on a variable's value, like threshold <- 90, I would like to create
variable names that will hold loaded data sets.

For example

paste('mystring',threshold,sep="")<-load('myfile')


This is a FAQ!
You probably want to assign into a list. But if you really want to do 
the above, see ?assign
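
A minimal sketch of both routes (assuming 'myfile' is an .RData file
containing a single object):

threshold <- 90
obj <- get(load('myfile'))  # load() returns the name(s) of the restored object(s)
assign(paste('mystring', threshold, sep=""), obj)  # creates 'mystring90'

# usually cleaner: collect the data sets in a named list instead
mydata <- list()
mydata[[paste('mystring', threshold, sep="")]] <- obj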


Uwe Ligges



How can I do something like that in R?


B.R
Alex
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] alpha_1 + beta_1 >1 in GARCH(1,1)

2011-11-20 Thread Patrick Burns

This would have been a more appropriate
question on r-sig-finance.

The short answer is: no, you can't trust
the coefficients.

You don't say how much data you have, but
this situation is common when you don't
have much data (meaning fewer than 2000
daily observations).  The sum of those
two parameters is saying how fast shocks
dissipate (in the variance).  If it looks
like there is a trend in volatility over
the in-sample period, then a reasonable
answer given that information is that the
shocks do not dissipate -- meaning the sum
of the parameters is greater than 1.
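
A minimal sketch of how to inspect this after fitting (assuming 'x' holds the
return series):

library(fGarch)
fit <- garchFit(~ garch(1, 1), data = x, trace = FALSE)
sum(coef(fit)[c("alpha1", "beta1")])  # persistence; at or above 1, shocks to
                                      # the variance barely dissipate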

On 20/11/2011 10:25, user84 wrote:

Hi,

As far as I know, in a stationary GARCH(1,1) model the sum of alpha and
beta has to be smaller than 1.
But if I use the garchFit() function from the package fGarch on my
time series, the sum is bigger than 1.
The adf.test, however, gives me a p-value smaller than 0.01.
What does this mean for me?

Can I trust the coefficients in this case?

Best regards, user84

--
View this message in context: 
http://r.789695.n4.nabble.com/alpha-1-beta-1-1-in-GARCH-1-1-tp4088342p4088342.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ANOVA and LSD TEST - contrasting results

2011-11-20 Thread peter dalgaard

On Nov 19, 2011, at 15:37 , Roberto wrote:

> First of all, thank you for the reply to my topic.
> 
> Anova summary has shown no significant difference for Treatment (Pr
> 0.374), whereas the LSD test has shown a difference between 2 treatments.
> I suppose that's strange.
> 

It isn't. You need to read up on "familywise error rate" and its connection to 
LSD and Tukey HSD tests (Google gets you there soon enough; I doubt people in 
here will rewrite textbook material just for you).
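
For the record, a minimal sketch of the contrast (made-up data frame 'dat'
with columns y and Treatment):

fit <- aov(y ~ Treatment, data = dat)
summary(fit)                              # overall F test
pairwise.t.test(dat$y, dat$Treatment,
                p.adjust.method = "none") # LSD-style, no familywise control
TukeyHSD(fit)                             # controls the familywise error rate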

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] create variable names using paste

2011-11-20 Thread Alaios
Dear all,

Based on a variable's value, like threshold <- 90, I would like to create
variable names that will hold loaded data sets.

For example

paste('mystring',threshold,sep="")<-load('myfile')

How can I do something like that in R?


B.R
Alex
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ltm: Simplified approach to bootstrapping 2PL-Models?

2011-11-20 Thread Dimitris Rizopoulos
Note that you can also fit 2PL models using the gpcm() function that 
fits the Generalized Partial Credit Model, and therefore you can use the 
GoF.gpcm() function for goodness-of-fit, e.g.,


library(ltm)

fit2PL.1 <- ltm(LSAT ~ z1)
fit2PL.2 <- gpcm(LSAT)

fit2PL.1
fit2PL.2

GoF.gpcm(fit2PL.2, B = 199)


I hope it helps.

Best,
Dimitris


On 11/20/2011 1:16 PM, Christian Montel wrote:

Dear R-List,

to assess the model fit for 2PL-models, I tried to mimic the
bootstrap-approach chosen in the GoF.rasch()-function. Not being a
statistician, I was wondering whether the following simplification
(omitting the chi-squared model-fit step) would be appropriate:

GoF.ltm <- function(object, B = 50, ...){
  liFits <- list()
  for(i in 1:B){
    rndDat <- rmvlogis(nrow(object$X), coef(object))  # simulate data under the fitted model
    liFits[[i]] <- ltm(rndDat ~ z1)                   # refit the 2PL to each simulated set
  }
  distr <- sort(sapply(liFits, function(x) x$log.Lik))
  return(max(which(distr <= object$log.Lik))/length(distr))  # rank of the observed logLik
}

The rationale was to directly use the sorted sequence of (log-)likelihoods of
models fitted to datasets generated under the fitted 2PL parameters. The
return value was intended to roughly mirror the proportion of such
demonstrably 2PL-conforming datasets that fit no better than the model in
question.

Any comments which may help me figure out whether I'm on the right
track are greatly appreciated.

Thank you in advance,
best regards,
Christian


Dr. Christian Montel
E-Mail: christian.mon...@eligo.de
eligo GmbH -- Büro Berlin   
Tel.:   +49 (0) 30 695 399 95-2
Arndtstr. 34
Fax:+49 (0) 30 695 399 95-1
10965 Berlin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data analysis: normal approximation for binomial

2011-11-20 Thread John Kane
Hi Colin,

I'm no statistician and it's been a very long time, but IIRC a t-test is a
modified version of a z-test that is used on small sample sizes.  (I can hear
some of our statisticians screaming in the background as I type.)

In any case, I think the point is that the binomial distribution is discrete
while the standard normal is not, so a user can apply Yates' continuity
correction to make the continuous normal better approximate the discrete
probabilities.  Or something like this.

I have only encountered it once in a Psych stats course taught by an animal 
geneticist who seemed to think it was important. To be honest, it looked pretty 
trivial for the type of data I'd be likely to see. 

I cannot remember ever seeing a continuity correction used in a published 
paper--for that matter I have trouble remembering a z-test.  

If you want more information on the subject I found a very tiny bit of info at 
http://books.google.ca/books?id=SiJ2UB3dv9UC&pg=PA139&lpg=PA139&dq=z-test+with+continuity+correction&source=bl&ots=0vMTCUZWXx&sig=bfCPx0vynGjA0tHLRAf6B42x0mM&hl=en&ei=nQHJTo7LPIrf0gHxs6Aq&sa=X&oi=book_result&ct=result&resnum=2&ved=0CC0Q6AEwAQ#v=onepage&q=z-test%20with%20continuity%20correction&f=false

A print source that, IIRC, has a discussion of this is Hays, W. (1981),
Statistics, 3rd Ed., Holt, Rinehart and Winston.
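
For the original question, a minimal sketch of the two standard options in
base R (counts are made up):

binom.test(34, 50, p = 0.5)                  # exact binomial test
prop.test(34, 50, p = 0.5, correct = TRUE)   # normal approximation with
                                             # Yates' continuity correction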

Have fun

--- On Sat, 11/19/11, Colstat  wrote:

> From: Colstat 
> Subject: [R] Data analysis: normal approximation for binomial
> To: r-help@r-project.org
> Received: Saturday, November 19, 2011, 6:01 PM
> Dear R experts,
> 
> I am trying to analyze data from an article, the data looks
> like this
> 
> Patient Age Sex Aura preCSM preFreq preIntensity postFreq postIntensity postOutcome
> 1 47 F A 4 6 9 2 8 SD
> 2 40 F A/N 5 8 9 0 0 E
> 3 49 M N 5 8 9 2 6 SD
> 4 40 F A 5 3 10 0 0 E
> 5 42 F N 5 4 9 0 0 E
> 6 35 F N 5 8 9 12 7 NR
> 7 38 F A 5 NA 10 2 9 SD
> 8 44 M A 4 4 10 0 0 E
> 9 47 M A 4 5 8 2 7 SD
> 10 53 F A 5 3 10 0 0 E
> 11 41 F N 5 6 7 0 0 E
> 12 49 F A 4 6 8 0 0 E
> 13 48 F A 5 4 8 0 0 E
> 14 63 M N 4 6 9 15 9 NR
> 15 58 M N 5 9 7 2 8 SD
> 16 53 F A 4 3 9 0 0 E
> 17 47 F N 5 4 8 1 4 SD
> 18 34 F A NA  5 9 0 0 E
> 19 53 F N 5 4 9 5 7 NR
> 20 45 F N 5 5 8 5 4 SD
> 21 30 F A 5 3 8 0 0 E
> 22 29 F A 4 5 9 0 0 E
> 23 49 F N 5 9 10 0 0 E
> 24 24 F A 5 5 9 0 0 E
> 25 63 F N 4 19 7 10 7 NR
> 26 62 F A 5 8 9 11 9 NR
> 27 44 F A 5 3 10 0 0 E
> 28 38 F N 4 8 10 1 3 SD
> 29 38 F N 5 3 10 0 0 E
> 
> How do I do a binomial distribution z statistic with continuity
> correction? Basically a normal approximation.
> Could anyone give me some suggestions what I (or R) can do
> with these data?
> I have tried a histogram, maybe a t-test? or even
> lattice?  What else can I (or R) do?
> Help please, thanks so much.
> 
>     [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] logistic regression by glm

2011-11-20 Thread tujchl
Hi,

I use glm in R to do logistic regression, and treat both the response and the
predictors as factors.
In my first try:

***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(7161521) + 
as.factor(2281517), family = binomial())

Deviance Residuals: 
Min 1Q Median 3Q Max 
-1.5370 -1.0431 -0.9416 1.3065 1.4331 

Coefficients:
Estimate Std. Error z value Pr(>|z|) 
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
as.factor(7161521)3 0.28192 0.83255 0.339 0.7349 
as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 678.55 on 498 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

And I remodel it and *want no intercept*:
***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(2281517) + 
as.factor(7161521) - 1, family = binomial())

Deviance Residuals: 
Min 1Q Median 3Q Max 
-1.5370 -1.0431 -0.9416 1.3065 1.4331 

Coefficients:
Estimate Std. Error z value Pr(>|z|) 
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663 
as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
as.factor(7161521)3 0.2819 0.8325 0.339 0.7349 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 691.76 on 499 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

*As shown above, my second model returns no intercept, but look at this:
Model1:
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
Model2:
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *

They are exactly the same. Could you please tell me what happened?

Thank you in advance 


--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ltm: Simplified approach to bootstrapping 2PL-Models?

2011-11-20 Thread Christian Montel
Dear R-List, 

to assess the model fit for 2PL-models, I tried to mimic the
bootstrap-approach chosen in the GoF.rasch()-function. Not being a
statistician, I was wondering whether the following simplification
(omitting the chi-squared model-fit step) would be appropriate:

GoF.ltm <- function(object, B = 50, ...){
  liFits <- list()
  for(i in 1:B){
rndDat <- rmvlogis(nrow(object$X), coef(object))
liFits[[i]] <- ltm(rndDat ~ z1)
  }
  distr <- sort(sapply(liFits, function(x)return(x$log.Lik)))
  return(max(which(distr <= object$log.Lik))/length(distr))
}

fit <- ltm(Abortion ~ z1, control = list(GHk = 20, iter.em = 20))
GoF.ltm(fit)

The rationale was to directly use the sorted sequence of (log-)likelihoods of
models fitted to datasets generated under the fitted 2PL parameters. The
return value was intended to roughly mirror the proportion of such
demonstrably 2PL-conforming datasets that fit no better than the model in
question.

Any comments which may help me figure out whether I'm on the right
track are greatly appreciated.

Thank you, best regards, 
Christian

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] anova to pca

2011-11-20 Thread Giovanni Azua
Hello,

I would like to reinforce my anova results using PCA, i.e. determine which
factors are most important because they explain most of the variance (i.e.
signal) in my 2^k*r experiment. However, I get the following error while
trying to run PCA:

> throughput.prcomp <- 
> prcomp(~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
Error in prcomp.formula(~No_databases + Partitioning + No_middlewares +  : 
  PCA applies only to numerical variables

What is the most R-like concise way to map/transform those factor values into 
numerical values in a suitable way for PCA analysis? My first attempt would be:

# C++ "style"
throughput$No_databases_num <- (throughput$No_databases == 1) ? -1 : 1 
throughput$Partitioning_num <- (throughput$Partitioning == "sharding") ? -1 : 1 
etc.
How can I do this in the R way?

Would these -1, 1 be sensible for a PCA analysis, or does it just not matter?
How about a factor with 3 levels? -1, 0 and 1?
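
A minimal sketch of the idiomatic translation (reusing the column names from
the failing prcomp() call above; untested against the actual data):

throughput$No_databases_num <- ifelse(throughput$No_databases == 1, -1, 1)
throughput$Partitioning_num <- ifelse(throughput$Partitioning == "sharding", -1, 1)

# or let model.matrix() build numeric dummy columns for all factors at once:
X <- model.matrix(~ No_databases + Partitioning + No_middlewares + Queue_size,
                  data = throughput)[, -1]  # drop the intercept column
throughput.prcomp <- prcomp(X, scale. = TRUE)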

Many thanks in advance,
Best regards,
Giovanni
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] alpha_1 + beta_1 >1 in GARCH(1,1)

2011-11-20 Thread user84
Hi,

As far as I know, in a stationary GARCH(1,1) model the sum of alpha and
beta has to be smaller than 1.
But if I use the garchFit() function from the package fGarch on my
time series, the sum is bigger than 1.
The adf.test, however, gives me a p-value smaller than 0.01.
What does this mean for me?

Can I trust the coefficients in this case?

Best regards, user84

--
View this message in context: 
http://r.789695.n4.nabble.com/alpha-1-beta-1-1-in-GARCH-1-1-tp4088342p4088342.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.