Re: [R] Common elements in columns

2012-09-02 Thread arun
Hi,
Maybe this will help:

set.seed(1)
df1<-data.frame(C1=sample(LETTERS[1:25],20,replace=FALSE),value=sample(50,20,replace=FALSE))
set.seed(15)
df2<-data.frame(C1=sample(LETTERS[1:25],15,replace=FALSE),C2=1:15)
set.seed(3)
df3<-data.frame(C1=sample(LETTERS[1:10],10,replace=FALSE),B1=rnorm(10,3))
set.seed(5)
df4<-data.frame(C1=sample(LETTERS[1:15],10,replace=FALSE),A2=rnorm(10,15))
df1$C1[df1$C1%in%df2$C1[df2$C1%in%df3$C1[df3$C1%in%df4$C1]]]
#[1] G E H J
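
With more than two or three files, the nested %in% calls get hard to read. A hedged alternative (same df1..df4 as defined above) is Reduce() over intersect():

```r
## Sketch: elements of C1 shared by all four data frames, in one call.
## as.character() guards against factor columns (the data.frame default here).
common <- Reduce(intersect,
                 list(as.character(df1$C1), as.character(df2$C1),
                      as.character(df3$C1), as.character(df4$C1)))
common  # same elements as the nested %in% version; order may differ
```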
A.K.



- Original Message -
From: Silvano Cesar da Costa 
To: r-help@r-project.org
Cc: 
Sent: Sunday, September 2, 2012 7:05 PM
Subject: [R] Common elements in columns

Hi,

I have 4 files with 1 individuals in each file and 10 columns each.
One of the columns, say C1, may have elements in common with the other
columns C1 of other files.

If I have only 2 files, I can do this check with the command:

data1[data1 %in% data2]
data2[data2 %in% data1]

How do I check which elements are common to the C1 columns of all 4 files?

Thanks,

-
Silvano Cesar da Costa

Universidade Estadual de Londrina
Centro de Ciências Exatas
Departamento de Estatística

Fone: (43) 3371-4346

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] splits with 0s in middle columns

2012-09-02 Thread arun

Hi,
It's working for me.
Try this:
dat1<-read.csv("test.csv")
dat2<-na.omit(dat1)
 
nrow(dat1)
#[1] 635
 nrow(dat2)
#[1] 627

B<-gsub(" slope{0,1}\\s\\((.*)\\)","\\1",dat2$Composition_percent_part)
fun1<-function(x){
C<-gsub("(.*)/(.*)","\\1 0 0 \\2", gsub("(.*)/(.*)/(.*)","\\1 \\2 0 \\3", 
gsub("(.*)/(.*)/(.*)/(.*)","\\1 \\2 \\3 \\4",x)))
res<-data.frame(do.call(rbind,strsplit(C," ")))
colnames(res)<-paste("A",1:4,sep="")
res
}
fun1(B)
head(fun1(B),15)
 #  A1 A2 A3 A4
#1  60 25  0 15
#2  60 25  0 15
#3  40 35 15 10
#4  40 35 15 10
#5  40 35 15 10
#6  40 35 15 10
#7  40 35 15 10
#8  50 40  0 10
#9  50 40  0 10
#10 50 40  0 10
#11 50 40  0 10
#12 50 30  0 20
#13 50 30  0 20
#14 50 30  0 20
#15 50 30  0 20

#or,
fun1<-function(x){
B<-gsub(" slope{0,1}\\s\\((.*)\\)","\\1",x)
C<-gsub("(.*)/(.*)","\\1 0 0 \\2", gsub("(.*)/(.*)/(.*)","\\1 \\2 0 \\3", 
gsub("(.*)/(.*)/(.*)/(.*)","\\1 \\2 \\3 \\4",B)))
res<-data.frame(do.call(rbind,strsplit(C," ")))
colnames(res)<-paste("A",1:4,sep="")
res
}
fun1(dat2$Composition_percent_part)    


 head(fun1(dat2$Composition_percent_part),5)
#  A1 A2 A3 A4
#1 60 25  0 15
#2 60 25  0 15
#3 40 35 15 10
#4 40 35 15 10
#5 40 35 15 10



A.K.





From: Sapana Lohani 
To: arun  
Sent: Sunday, September 2, 2012 8:39 PM
Subject: Re: [R] splits with 0s in middle columns


Hi Arun,

I do not know what's wrong with my data, so I am sending you the whole column I 
wanted to split. Could you please have a look and point out the error? I am 
totally stuck at this point of my analysis




From: arun 
To: Rui Barradas  
Cc: Sapana Lohani ; R help  
Sent: Sunday, September 2, 2012 1:54 PM
Subject: Re: [R] splits with 0s in middle columns

HI,
You can also try this as a function:
fun1<-function(x){
B<-gsub("percent{0,1}\\s\\((.*)\\)","\\1",x)
C<-gsub("(.*)/(.*)","\\1 0 0 \\2", gsub("(.*)/(.*)/(.*)","\\1 \\2 0 \\3", 
gsub("(.*)/(.*)/(.*)/(.*)","\\1 \\2 \\3 \\4",B)))
dat1<-data.frame(do.call(rbind,strsplit(C," ")))
colnames(dat1)<-paste0("A",1:4)
dat1
}
 fun1(A)
#  A1 A2 A3 A4
#1 10 20  0 30
#2 40  0  0 20
#3 60 10 10  5
#4 80  0  0 10
A.K.




- Original Message -
From: Rui Barradas 
To: Sapana Lohani 
Cc: r-help 
Sent: Sunday, September 2, 2012 1:05 PM
Subject: Re: [R] splits with 0s in middle columns

Hello,

You don't need a new function, what you need is to prepare your data in 
such a way that the function can process it.


A <- c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)",
"percent (80/10)")
B <- gsub("\\(|\\)|percent| ", "", A)
fun(B)

Also, please use dput to post the data examples,

dput(A)
c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)",
"percent (80/10)")

Then copy&paste in your post.

Rui Barradas

Em 02-09-2012 04:22, Sapana Lohani escreveu:
> Dear Rui,
>
> The new code works fine for what I wanted. I have another similar column but 
> it looks like
>
> A
> percent (10/20/30)
> percent (40/20)
> percent (60/10/10/5)
> percent (80/10)
>
> I want a similar split but delete the percent in the front. The output should
> look like
>
> A1 A2 A3 A4
> 10 20 0 30
> 40 0 0 20
> 60 10 10 5
> 80 0 0 10
>
> Could you please make the small change in the code that you gave me. It must 
> be a small edition but I could not figure that out. FYI the code that worked 
> was
>
> fun <- function(X){
>       xname <- deparse(substitute(X))
>       s <- strsplit(X, "/")
>       n <- max(sapply(s, length))
>       tmp <- numeric(n)
>
>       f <- function(x){
>           x <- as.numeric(x)
>           m <- length(x)
>           if(m > 1){
>               tmp[n] <- x[m]
>               tmp[seq_len(m - 1)] <- x[seq_len(m - 1)]
>           }else tmp[1] <- x
>           tmp
>       }
>
>       res <- do.call(rbind, lapply(s, f))
>       colnames(res) <- paste(xname, seq_along(s), sep = "")
>       data.frame(res)
> }
>
> fun(A)
>
> Thank you so very much.



Re: [R] Coxph not converging with continuous variable

2012-09-02 Thread Keil, Alex
The coxph function in R is not working for me when I use a continuous predictor 
in the model. Specifically, it fails to converge, even when bumping up the 
number of max iterations or setting reasonable initial values. The estimated 
Hazard ratio from the model is incorrect (verified by an AFT model). I've 
isolated it to the "x1" variable in the example below, which is log-normally 
distributed. The x1 here has extreme values, but I've been able to reproduce 
the problem intermittently with less extreme values. It seemed odd that I 
couldn't find this question asked anywhere, so I'm wondering if I'm just not 
seeing a mistake I've made.

Any help is appreciated to get this model to converge. Code, output, 
sessionInfo follows signature.

Alex Keil
UNC Chapel Hill


Here's an example:

library(survival)
set.seed(8182828)
#data
N = 10
shape = 0.75
hr = 3.5
betax <- -log(hr)/(shape)
ctime = 5

z1 <- rbinom(N, 1, 0.5)
z2 <- rbinom(N, 1, 0.5)
x1 <- exp(rnorm(N, 0 + 2*z1, 1))
wscale1 <- exp(0 + betax*x1 + z1 + z2 )
time1 <- rweibull(N, shape, wscale1)
t1 <- pmin(time1, rep(ctime, N))
d1 <- 1*(t1 == time1)
dat <- as.data.frame(cbind(t1, time1, d1, z1, z2, x1))
 summary(dat)

#output
> summary(dat)
    t1                 time1             d1               z1             z2               x1
 Min.   :0.00       Min.   : 0.       Min.   :0.       Min.   :0.0    Min.   :0.       Min.   :  0.0147
 1st Qu.:0.04       1st Qu.: 0.       1st Qu.:1.       1st Qu.:0.0    1st Qu.:0.       1st Qu.:  0.9552
 Median :0.008400   Median : 0.0084   Median :1.       Median :0.5    Median :0.       Median :  2.7052
 Mean   :0.295397   Mean   : 0.3260   Mean   :0.9906   Mean   :0.5    Mean   :0.4997   Mean   :  6.8948
 3rd Qu.:0.178325   3rd Qu.: 0.1783   3rd Qu.:1.       3rd Qu.:1.0    3rd Qu.:1.       3rd Qu.:  7.7409
 Max.   :5.00       Max.   :52.8990   Max.   :1.       Max.   :1.0    Max.   :1.       Max.   :431.2779

> exp((cox1 <- coxph(Surv(t1, d1)~  x1 + z1+ z2, ties="breslow"))[[1]]) #hrs
       x1        z1        z2 
3.3782387 0.4925040 0.4850214 
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Ran out of iterations and did not converge


#accelerated failure time model confirms estimate is off from coxph

> aft1 <- survreg(Surv(t1, d1)~ x1 + z1 + z2, dist="weibull"); coefs <- aft1$coefficients; scale <- aft1$scale
> data.frame(psi = -coefs[2], scale, hr=exp(-coefs[2]*(1/scale))) #hr, scale is "weibull shape" in sas
        psi scale       hr
x1 1.670406  1.91 3.499958
#end output
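
One thing worth trying (my suggestion, not something verified against this exact seed): partial-likelihood maximisation can be fragile when a covariate is as heavily right-skewed as this x1, and centring/scaling the covariate often restores convergence without changing the per-unit hazard ratio:

```r
## Sketch: refit with a standardised covariate, then convert the
## per-SD coefficient back to a per-unit hazard ratio for x1.
## Assumes t1, d1, x1, z1, z2 live in `dat` as constructed above.
library(survival)
cox2 <- coxph(Surv(t1, d1) ~ scale(x1) + z1 + z2, data = dat, ties = "breslow")
exp(coef(cox2)[1] / sd(dat$x1))  # HR per one-unit increase in x1
```

Since the Cox partial likelihood is invariant to linear rescaling of a covariate, the back-transformed estimate should agree with the unscaled fit whenever both converge.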

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] survival_2.36-14

loaded via a namespace (and not attached):
[1] timereg_1.6-5


Re: [R] a newbie seeking for a simple problem

2012-09-02 Thread Daniel Malter
As df2 has only one column and is thus effectively a variable in this case,
you don't even need to merge.

df1[df1$gene_name%in%df2$gene_name , ] 

should do.
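
If the underlying goal is the one stated in the question -- pulling df1's values into df2 for the matching gene names -- match() does it without a merge (column names gene_name/value taken from the question):

```r
## Sketch: copy df1$value into df2 wherever the gene name matches;
## gene names absent from df1 get NA.
df2$value <- df1$value[match(df2$gene_name, df1$gene_name)]
```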

HTH,
Daniel


wong, honkit (Stephen) wrote
> 
> Dear Experienced R users,
> 
> I have a looks-like simple but complicated problem urgently needed to be
> solved. Below is the detail:
> 
> I have two dataframes, df1, df2. df1 contains two column and many
> thousands rows: column 1 is a "gene_name", column 2 is "value". df2
> contains only one column which is "gene_name" with couple hundred rows. I
> want to change "value" of df2 for those "gene_name" also appear in df2
> "gene_name". How to do that? Millions thanks.
> 
> 
> Ste
> 
> 







[R] R suitability for development project

2012-09-02 Thread Eric Langley
Hello All,

Eric Langley here with my first post to this list. I am looking to
determine if R is suitable for a development project I am working on
and if so possibly finding someone proficient in R that would be
interested in doing the coding.

I would like to preface my inquiry that while I am not a programmer I
can communicate in a dialog my objectives.

An array of rank ordered data looks like this:
Item-Rank  First  Second  Third  Fourth  Totals
Item1          6       8      0       0      14
Item2          7       5      2       0      14
Item3          1       1     11       1      14
Item4          0       0      1      13      14
Totals        14      14     14      14

The required output of R will be twofold:

1. a numerical score for each of the Items (1-4), from highest to
lowest and lowest to highest, on a scale of 0-99 that is statistically
sound. For this example the scores would run from Item1 with the highest
number down to Item4 with the lowest. In reverse, Item4 would have the
highest number down to Item1 with the lowest. For the highest, something
like this: Item1=94, Item2=88, Item3=48, Item4=2 (just guessing here on the
scores...:)

2. a graphical output of the data based on the scores, in three special
graphs with a middle line at '0' and numbers increasing to both the left
AND the right. The graphs plot the highest-ranked Items, the lowest-ranked
Items, and a combination of the two.
Sample graphs are here: http://community.abeo.us/sample-graphs/

Looking forward to hearing if R will be able to accomplish this.

TYIA,

~eric

--
Eric Langley
Founder
abeo

e...@abeo.us
404-326-5382



Re: [R] [newbie] scripting remote check for R package

2012-09-02 Thread Daróczi Gergely
I would call something like this via ssh (please note: I used "ggplot2" in
the example):

Rscript -e 'as.numeric(suppressWarnings(suppressPackageStartupMessages(require(ggplot2))))'

You could easily extend this to loop through the required packages and
tweak the output for your needs.
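
Generalising that one-liner into the loop Tom sketched might look like the sketch below. The host names and package list are placeholders, and it assumes Rscript is on each remote PATH:

```shell
#!/bin/sh
# Sketch: for each server and each required package, print 1 if the
# package loads, 0 otherwise. Hosts and packages are placeholders.
for RSERVER in foo bar baz; do
  for PKG in ggplot2 rgdal; do
    printf '%s / %s: ' "$RSERVER" "$PKG"
    ssh "$RSERVER" "Rscript -e \
      'as.numeric(suppressWarnings(suppressPackageStartupMessages(require($PKG))))'"
  done
done
```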

Best,
Gergely

On Sun, Sep 2, 2012 at 10:42 PM, Tom Roche  wrote:

>
> summary: e.g., how to replace ''
> in the following:
>
> for RSERVER in 'foo' 'bar' 'baz' ; do
>   ssh ${RSERVER} ''
> done
>
> or is there a better way to script checking for an R package?
>
> details:
>
> For my work I need a few non-standard R packages. I do that work on 2
> clusters. One uses environment modules, so any given server on that
> cluster can provide (via `module add`) pretty much the same resources.
>
> The other cluster, on which I must also unfortunately work, has
> servers managed individually, so I get in the habit of doing, e.g.,
>
> EXEC_NAME='whatever'
> for S in 'foo' 'bar' 'baz' ; do
>   ssh ${S} "${EXEC_NAME} --version"
> done
>
> periodically. I'm wondering, what incantation to utter (bash preferred)
> via ssh to query a given server's R for a given package?
>
> TIA, Tom Roche 
>
>



Re: [R] Help on finding specific columns in matrix

2012-09-02 Thread arun


HI Andras,
No problem.
But I guess using colMeans(), as suggested by Rainer, should be faster on large
datasets.

A.K.
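
For completeness, the colMeans() variant referred to above might look like this (using the same matrix a from the original post):

```r
## Sketch: one vectorised pass over the columns, then pick the extremes.
a <- matrix(c(runif(500,10,15), runif(500,15,20),
              runif(500,20,25), runif(500,25,30)), ncol=4)
m  <- colMeans(a)
lo <- which.min(m)  # column with the lowest mean (here: 1)
hi <- which.max(m)  # column with the highest mean (here: 4)
```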

- Original Message -
From: Andras Farkas 
To: smartpink...@yahoo.com
Cc: 
Sent: Sunday, September 2, 2012 4:59 PM
Subject: Re: [R] Help on finding specific columns in  matrix




this works nicely, thanks arun...

Andras

--
On Sun, Sep 2, 2012 11:50 AM EDT arun wrote:

>Hi,
>Try this:
> which.min(apply(a,2,mean))
>#[1] 1
> which.max(apply(a,2,mean))
>#[1] 4
>A.K.
>
>
>
>
>- Original Message -
>From: Andras Farkas 
>To: "r-help@r-project.org" 
>Cc: 
>Sent: Sunday, September 2, 2012 5:49 AM
>Subject: [R] Help on finding specific columns in  matrix
>
>Dear All,
> 
>I have a matrix with 33 columns and 5000 rows. I would like to find 2 specific 
>columns in the set: the one that holds the highest values and the one that 
>holds the lowest values. In this case the column's mean is appropriate for 
>finding those specific columns, because each column's mean is different and 
>they all change together based on the same rate constants as a function of 
>time (i.e., time is the most important determinant of the magnitude of that 
>mean). The columns need not be named, if possible. 
>Any thoughts on that? Would appreciate the help
> 
>example:
> 
>a 
><-matrix(c(runif(500,10,15),runif(500,15,20),runif(500,20,25),runif(500,25,30)),ncol=4)
> 
>I would need to find and plot with a box percentile plot column 1, the column 
>with the lowest mean, and column 4, the column with the highest mean
> 
>thanks,
> 
>Andras 
>
>







[R] Skew-Normal CDF using psn

2012-09-02 Thread Boris Beranger
Dear R-users,

I have been using the code below in order to verify how the CDF of a
skew-normal distribution was calculated:

library(sn)

s=seq(-30,30,by=0.1)
a<-matrix(nrow=length(s),ncol=5)
lambda=1

for(i in 1:length(s)){
a[i,1]=pnorm(s[i],mean=0,sd=1);
a[i,2]=T.Owen(s[i],lambda);
a[i,3]=a[i,1]-2*a[i,2];  # pnorm minus 2*T.Owen, i.e. columns 1 and 2 above
a[i,4]=pnorm(s[i])-2*T.Owen(s[i],lambda);
a[i,5]=psn(s[i],shape=lambda);
}


From the literature I was expecting columns 3, 4 and 5 to give the
exact same results, but they actually don't. There seem to be some
approximations for small values of x. Does anyone know where this
comes from?

Any help would be greatly appreciated,

Cheers,
Boris



Re: [R] splits with 0s in middle columns

2012-09-02 Thread Rui Barradas

Hello,

You should Cc the list, there are others presenting solutions.
What's going on should be obvious, your data example had "percent" in 
it, and your data file has "slope"!

How could you expect it to work?
Just in case, I'm changing the regular expression to remove everything
except slashes and digits.


dat <- read.csv("test.csv")
B <- gsub("[^/[:digit:]]+", "", dat$Composition_percent_part)
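
A quick check of that character class against both spellings from this thread (everything except digits and slashes is dropped):

```r
## Sketch: the class [^/[:digit:]] matches any run of characters that is
## neither a slash nor a digit, so "percent"/"slope" and parentheses all go.
A <- c("percent (10/20/30)", "slope (40/20)")
gsub("[^/[:digit:]]+", "", A)
#[1] "10/20/30" "40/20"
```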

Rui Barradas

Em 03-09-2012 01:38, Sapana Lohani escreveu:

Hello Rui,


I do not know what's wrong with my data, so I am sending you the whole column I 
wanted to split. Could you please have a look and point out the error? I am 
totally stuck at this point of my analysis




  From: Rui Barradas 
To: Sapana Lohani 
Cc: r-help 
Sent: Sunday, September 2, 2012 10:05 AM
Subject: Re: [R] splits with 0s in middle columns
  
Hello,


You don't need a new function, what you need is to prepare your data in
such a way that the function can process it.


A <- c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)",
"percent (80/10)")
B <- gsub("\\(|\\)|percent| ", "", A)
fun(B)

Also, please use dput to post the data examples,

dput(A)
c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)",
"percent (80/10)")

Then copy&paste in your post.

Rui Barradas

Em 02-09-2012 04:22, Sapana Lohani escreveu:

Dear Rui,

The new code works fine for what I wanted. I have another similar column but it 
looks like

A
percent (10/20/30)
percent (40/20)
percent (60/10/10/5)
percent (80/10)

I want a similar split but delete the percent in the front. The output should 
look like

A1 A2 A3 A4
10 20 0 30
40 0 0 20
60 10 10 5
80 0 0 10

Could you please make the small change in the code that you gave me. It must be 
a small edition but I could not figure that out. FYI the code that worked was

fun <- function(X){
xname <- deparse(substitute(X))
s <- strsplit(X, "/")
n <- max(sapply(s, length))
tmp <- numeric(n)

f <- function(x){
x <- as.numeric(x)
m <- length(x)
if(m > 1){
tmp[n] <- x[m]
tmp[seq_len(m - 1)] <- x[seq_len(m - 1)]
}else tmp[1] <- x
tmp
}

res <- do.call(rbind, lapply(s, f))
colnames(res) <- paste(xname, seq_along(s), sep = "")
data.frame(res)
}

fun(A)

Thank you so very much.




Re: [R] Impact of cex changing as a function of mfrow

2012-09-02 Thread Fisher Dennis
Duncan

I am getting closer to an understanding.
1.  "index" is not relevant to the discussion; I am focused solely on
margin text.
2.  the size of the margin text does not change between pages.
3.  however, the location does change, more so on the top / right than
on the left / bottom.
I rewrote the code to better illustrate the problem:
###
FINDCEX <- function()
{
CORRECT <- 1
MFROW   <- par()$mfrow
MFCOL   <- par()$mfcol
TEST<- all(MFROW == MFCOL)
if (TEST && MFROW == c(2,2))    CORRECT <- 1 / 0.83
if (TEST && (MFROW[1] >= 3 | MFROW[2] >= 3))CORRECT <- 1 / 0.66
if (!TEST)  cat("MFROW does not equal MFCOL\n")
cat(MFROW, CORRECT, "\n")
return(CORRECT)
}
DRAW<- function(N)
{
for (i in 1:N)
{
plot(1, bty="n")
box(which="inner")
}
for (SIDE in 1:2)
{
mtext(outer=T, side=SIDE, line=0, cex=1, "line 0")
mtext(outer=T, side=SIDE, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=SIDE, line=-FINDCEX(), cex=1, "line FINDCEX")
}
for (SIDE in 3:4)
{
mtext(outer=T, side=SIDE, line=0, cex=1, "line 0")
mtext(outer=T, side=SIDE, line=1, cex=1, "line 1")
}
}
pdf("TestCEX.pdf", 10, 8)
par(mfrow=c(1,1), omi=c(1,1,1,1))
DRAW(1)
par(mfrow=c(2,2), omi=c(1,1,1,1))
DRAW(4)
par(mfrow=c(3,3), omi=c(1,1,1,1))
DRAW(9)
graphics.off()
system("open TestCEX.pdf")

system("open TestCEX.pdf")  ## in Windows, "start" instead of "open"


If you execute this code, you will see:
1.  the box at the outer border is identical on all pages (as expected)
2.  the margin text on the left side superimposes between pages
3.  on the bottom, margin text is spaced the same on each page (unlike at the 
top). However, the vertical position differs between pages (most obvious when 
comparing pages 1 and 3).

I could correct the vertical positioning by brute force, i.e., adding a 
constant to each mtext command (applying a different value based on the value 
of mfrow). It turns out that a correction of 0.2 (for mfrow=c(2,2)) and 0.4 
(for mfrow=c(3,3)) aligns everything properly. However, this is not elegant: 
the correction factor took care of line-to-line spacing but not vertical 
position.

Dennis



Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

On Sep 2, 2012, at 10:40 AM, Dennis Fisher wrote:

> R 2.15.1
> OS X (MLion)
> 
> Colleagues,
> 
> I am aware that changes in mfrow / mfcol in par() affect cex (from help: In a 
> layout with exactly two rows and columns the base value of ‘"cex"’ is reduced 
> by a factor of 0.83: if there are three or more of either rows or columns, 
> the reduction factor is 0.66).
> 
> I generate a multipage PDF in which mfrow varies such that cex is impacted.  
> This affect mtext in the outer margin.  Sample code is pasted at the bottom 
> of this email.  The impact is most obvious if one examines the text at the 
> bottom of each page as one moves page-to-page.
> 
> Does anyone have a suggestion for how to overcome this (other than using 
> brute force).
> 
> Dennis
> 
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
> 
> # In a layout with exactly two rows and columns the base value of ‘"cex"’ is 
> reduced by a factor of 0.83: if 
> # there are three or more of either rows or columns, the reduction factor is 
> 0.66.
> 
> FINDCEX   <- function()
>   {
>   CORRECT <- 1
>   MFROW   <- par()$mfrow
>   MFCOL   <- par()$mfcol
>   TEST<- all(MFROW == MFCOL)
>   if (TEST && MFROW == c(2,2))    CORRECT <- 1 / 0.83
>   if (TEST && (MFROW[1] >= 3 | MFROW[2] >= 3))CORRECT <- 1 / 0.66
>   if (!TEST)  cat("MFROW does not equal MFCOL\n")
>   cat(MFROW, CORRECT, "\n")
>   return(CORRECT)
>   }
> pdf("TestCEX.pdf", 8, 6)
> par(mfrow=c(1,1), omi=c(1,1,1,1))
> plot(1)
> mtext(outer=T, side=1, line=0, cex=1, "line 0")
> mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
> mtext(outer=T, side=3, line=0, cex=1, "line 0")
> mtext(outer=T, side=3, line=1, cex=1, "line 1")
> mtext(outer=T, side=2, line=0, cex=1, "line 0")
> mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
> mtext(outer=T, side=4, line=0, cex=1, "line 0")
> mtext(outer=T, side=4, line=1, cex=1, "line 1")
> 
> 
> par(mfrow=c(1,2), omi=c(1,1,1,1))
> plot(1)
> mtext(outer=T, side=1, line=0, cex=1, "line 0")
> mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")

Re: [R] a newbie seeking for a simple problem

2012-09-02 Thread Jeff Newmiller
Read the posting guide... you need to provide more specific information such as 
sample data (?dput).

For this problem you should probably read

?merge
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

"Hon Kit (Stephen) Wong"  wrote:

>Dear Experienced R users,
>
>I have a looks-like simple but complicated problem urgently needed to
>be solved. Below is the detail:
>
>I have two dataframes, df1, df2. df1 contains two column and many
>thousands rows: column 1 is a "gene_name", column 2 is "value". df2
>contains only one column which is "gene_name" with couple hundred rows.
>I want to change "value" of df2 for those "gene_name" also appear in
>df2 "gene_name". How to do that? Millions thanks.
>
>
>Ste
>





Re: [R] why variations in accuracy between R to ARCGIS for the same point reprojection?

2012-09-02 Thread Camilo Mora
I apologize, I did not intend to blame anyone. I actually thought  
there might be some underlying differences in the formulas used. The  
patterns overall are very similar, but they do not line up precisely with  
one another. The function I use in R is called spTransform, from the  
package "rgdal".


Again, my guess is that it has something to do with the functions  
used. As an example, I found two different ways to measure the radius  
of the Earth. So perhaps there is an additional parameter that  
differentiates the scripts of R and ArcGIS?


Sorry again for the confusion and thanks,

Camilo
p.s. I always praise R. It is just hard to imagine academic life without it...

Camilo Mora, Ph.D.
Department of Geography, University of Hawaii
Currently available in Colombia
Phone:   Country code: 57
 Provider code: 313
 Phone 776 2282
 From the USA or Canada you have to dial 011 57 313 776 2282
http://www.soc.hawaii.edu/mora/



Quoting Prof Brian Ripley :

There is no 'reprojection' in R (which is upper case).  Please  
attribute blame correctly.


You seem to be talking about some contributed add-on package, not  
specified. But I think you should be asking this on R-sig-geo.


On 02/09/2012 19:24, Camilo Mora wrote:

Hi everyone,

I wonder if anyone knows the reason why the outputs of the same
reprojection in R and ArcGIS are different? The magnitude of the difference
can be up to 40 km at the poles.
Basically, I have a database of points equally separated by one degree
over the globe.
In ARCGIS, I am projecting the data in GCS-WGS-1984 and then
reprojecting it to Behrmann to ensure equal-area distribution of the points.
In R, I am using:
spPoint <-
SpatialPoints(coords=coordinates(Data),proj4string=CRS("+proj=longlat
+datum=WGS84"))
and then reprojecting it to Behrmann with:
spPointReprj <- spTransform(spPoint,CRS("+proj=cea +lon_0=0 +lat_ts=30 +x_0=0
+y_0=0 +ellps=WGS84"))

If I put the two outputs of the reprojections on the same map, they are
off by a few meters in the tropics and by up to 40 km at the poles.

I decided to investigate how the reprojections are done and my
calculations are different from both R and ArcGis:

First, I calculated the radius of the Earth in two different ways:
=Re * (1 - e^2)/ (1 - e^2 *SIN(RADIANS(Latitude))^2)^(3/2)
=Re * SQRT(1 - e^2) / (1 - e^2 * (SIN(RADIANS(Latitude)))^2)

where Re is the equatorial radius of the Earth (6378 km) and e is the
eccentricity of the ellipsoid (0.081082).

According to several books I consulted, the position of a point on the Y-axis
in the Behrmann projection can be estimated as:
=Re*(SIN(RADIANS(Latitude))/COS(RADIANS(Spll)))
where Spll is the standard parallel, which in the Behrmann projection
is 30.
Unfortunately, the resulting output, although similar in shape to the
outputs of R and ArcGIS, is still not quite the same. Any thoughts on why
these differences arise in supposedly the same calculations?

Any input will be greatly appreciated,

Thanks,

Camilo




Camilo Mora, Ph.D.
Department of Geography, University of Hawaii
Currently available in Colombia
Phone:   Country code: 57
 Provider code: 313
 Phone 776 2282
 From the USA or Canada you have to dial 011 57 313 776 2282
http://www.soc.hawaii.edu/mora/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NANs in test Breslow-Day for svytable??

2012-09-02 Thread Diana Marcela Martinez Ruiz

Hello,

Thanks for the guidance on using epi.2by2 with svytable. However, running 
the code throws warnings
that reference NaNs; could you help me with this? The results are shown below:

> int19<-svytable(~Perdida_peso_no_intensionada+APES_DICOT+tabaco,Muestra.comp,Ntotal
>  = NULL)
> int19
, , tabaco = Nunca fumador

APES_DICOT
Perdida_peso_no_intensionada  Buena   Mala
  No/No sabe 2108.3  599.8
  Si  264.8  253.6

, , tabaco = Activo pasivo

APES_DICOT
Perdida_peso_no_intensionada  Buena   Mala
  No/No sabe  200.8   32.0
  Si   25.0   32.0

> epi.2by2(dat = int19, method = "case.control", conf.level = 0.95, 
+ units = 100, homogeneity = "breslow.day", verbose = TRUE)$OR.homog

  test.statistic dfp.value
1   6.566923  1 0.01038914
There were 12 warnings (use warnings() to see them)

> warnings(int19)
Warning messages:
1: In qbinom(p, size, prob, lower.tail, log.p) : NaNs produced 2108.3 
264.8 599.8 253.6 200.8 25 32 32
[warnings 2 through 12 are identical]


Thanks

> Date: Sat, 1 Sep 2012 12:34:11 +1200
> Subject: Re: [R] test Breslow-Day for svytable??
> From: tlum...@uw.edu
> To: dwinsem...@comcast.net
> CC: dianamm...@hotmail.com; r-help@r-project.org; m.steven...@massey.ac.nz
> 
> On Sat, Sep 1, 2012 at 4:27 AM, David Winsemius  
> wrote:
> >
> > On Aug 31, 2012, at 7:20 AM, Diana Marcela Martinez Ruiz wrote:
> >
> >> Hi all,
> >>
> >> I want to know how to perform the test Breslow-Day test for homogeneity of
> >> odds ratios (OR) stratified for svytable. This test is obtained with the 
> >> following code:
> >>
> >> epi.2by2 (dat = daty, method = "case.control" conf.level = 0.95,
> >
> > missing comma here ...^
> >
> >>units = 100, homogeneity = "breslow.day", verbose = TRUE)
> >>
> >> where "daty" is the svytable object (a table), but when I 
> >> run the code
> >> it does not produce the homogeneity test.
> >
> > You are asked in the Posting guide to copy all errors and warnings when 
> > asking about unexpected behavior. When I run epi.2by2 on the output of a 
> > svytable object I get no errors, but I do get warnings which I think are 
> > due to non-integer entries in the weighted table. I also get from 
> > svytable(), using its first example on the help page, an object that is NOT a 
> > set of 2 x 2 tables in an array of the structure expected by epi.2by2(). 
> > The fact that epi.2by2() will report numbers with labels for a 2 x 3 table 
> > means that its error checking is weak.
> >
> > This is the output of str(dat) from one of the example on epi.2by2's help 
> > page:
> >
> >> str(dat)
> >  table [1:2, 1:2, 1:3] 41 13 6 53 66 37 25 83 23 37 ...
> >  - attr(*, "dimnames")=List of 3
> >   ..$ Exposure: chr [1:2] "+" "-"
> >   ..$ Disease : chr [1:2] "+" "-"
> >   ..$ Strata  : chr [1:3] "20-29 yrs" "30-39 yrs" "40+ yrs"
> >
> > Notice that it is a 2 x 2 x n array. (Caveat: from here on out I am simply 
> > reading the help pages and using str() to look at the objects created to 
> > get an idea regarding success or failure. I am not an experienced user of 
> > either package.)  I doubt that what you got from svytable is a 2 x 2 
> > table. As another example you can build a 2 x 2 x n table from the built-in 
> > dataset UCBAdmissions:
> >
> > DF <- as.data.frame(UCBAdmissions)
> > ## Now 'DF' is a data frame with a grid of the factors and the counts
> > ## in variable 'Freq'.
> > dat2 <- xtabs(Freq ~ Gender + Admit+Dept, DF)
> > epiR::epi.2by2(dat = dat2, method = "case.control", conf.level = 0.95,
> >  units = 100, homogeneity = "breslow.day", verbose = TRUE)$OR.homog
> > #-
> >   test.statistic dfp.value
>

[R] [newbie] scripting remote check for R package

2012-09-02 Thread Tom Roche

summary: e.g., how to replace ''
in the following:

for RSERVER in 'foo' 'bar' 'baz' ; do
  ssh ${RSERVER} ''
done

or is there a better way to script checking for an R package?

details:

For my work I need a few non-standard R packages. I do that work on 2
clusters. One uses environment modules, so any given server on that
cluster can provide (via `module add`) pretty much the same resources.

The other cluster, on which I must also unfortunately work, has
servers managed individually, so I am in the habit of doing, e.g.,

EXEC_NAME='whatever'
for RSERVER in 'foo' 'bar' 'baz' ; do
  ssh ${RSERVER} "${EXEC_NAME} --version"
done

periodically. I'm wondering what incantation to utter (bash preferred)
via ssh to query a given server's R for a given package?

TIA, Tom Roche 
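[A minimal sketch of one possible incantation, continuing the bash loop above. Assumptions: Rscript is on each server's PATH, and the package name 'rgdal' and host names are placeholders. Rscript exits 0 if the package is installed and 1 otherwise, and ssh propagates that exit status back to the calling shell.]

```shell
# Hypothetical sketch -- package and host names are placeholders.
# The R expression checks installed.packages() rather than loading
# the package, and reports the result through the exit status.
PKG='rgdal'
CHECK="Rscript -e 'quit(status = as.integer(!is.element(\"$PKG\", rownames(installed.packages()))))'"
for RSERVER in 'foo' 'bar' 'baz' ; do
  # echo for illustration; drop the echo to actually run the check
  echo ssh "${RSERVER}" "${CHECK}"
done
```

Checking rownames(installed.packages()) avoids attaching the package just to detect it.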

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Impact of cex changing as a function of mfrow

2012-09-02 Thread Duncan Murdoch

On 12-09-02 3:52 PM, Dennis Fisher wrote:

Duncan

Perhaps I did not explain sufficiently -- the code that I sent makes the 
correction in FINDCEX.  But, that correction does not accomplish the intended 
goal -- as evidenced by the graphic that is created from the code that I sent.

The option of "brute force" would require trial and error.  I was hoping that 
invoking some other option in mtext, such as padj, would accomplish my goal of needing 
only to multiply the line= value by the value returned from my function.


You're right, I didn't understand you.

Here's what I see in your sample code display (in 2.15.0 on a Mac with 
OSX 10.6.8):


The spacing between "line 0" and "line 1" (on the top and right) gets 
smaller from page 1 to page 4.


The size of the margin text is the same in all pages.

The size of "Index" within the subfigures is smaller in the later pages.

There are some other small movements of text from page to page, but not 
much.


What do you see?

Duncan Murdoch


Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

On Sep 2, 2012, at 11:43 AM, Duncan Murdoch  wrote:


On 12-09-02 1:40 PM, Dennis Fisher wrote:

R 2.15.1
OS X (MLion)

Colleagues,

I am aware that changes in mfrow / mfcol in par() affect cex (from help: In a layout with 
exactly two rows and columns the base value of ‘"cex"’ is reduced by a factor 
of 0.83: if there are three or more of either rows or columns, the reduction factor is 
0.66).

I generate a multipage PDF in which mfrow varies such that cex is impacted.  
This affects mtext in the outer margin.  Sample code is pasted at the bottom of 
this email.  The impact is most obvious if one examines the text at the bottom 
of each page as one moves page to page.

Does anyone have a suggestion for how to overcome this (other than using brute 
force)?


I am not sure what's wrong with brute force here.  You know the formula for the 
reduction; just apply a corresponding increase first.

Duncan Murdoch



Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

# In a layout with exactly two rows and columns the base value of ‘"cex"’ is 
reduced by a factor of 0.83: if
# there are three or more of either rows or columns, the reduction factor is 
0.66.

FINDCEX <- function()
{
    CORRECT <- 1
    MFROW   <- par()$mfrow
    MFCOL   <- par()$mfcol
    TEST    <- all(MFROW == MFCOL)
    if (TEST && all(MFROW == c(2, 2)))            CORRECT <- 1 / 0.83
    if (TEST && (MFROW[1] >= 3 || MFROW[2] >= 3)) CORRECT <- 1 / 0.66
    if (!TEST) cat("MFROW does not equal MFCOL\n")
    cat(MFROW, CORRECT, "\n")
    return(CORRECT)
}
pdf("TestCEX.pdf", 8, 6)
par(mfrow=c(1,1), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")


par(mfrow=c(1,2), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")

par(mfrow=c(2,2), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")

par(mfrow=c(3,3), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")

graphics.off()
system("open TestCEX.pdf")

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Impact of cex changing as a function of mfrow

2012-09-02 Thread Dennis Fisher
Duncan

Perhaps I did not explain sufficiently -- the code that I sent makes the 
correction in FINDCEX.  But, that correction does not accomplish the intended 
goal -- as evidenced by the graphic that is created from the code that I sent.

The option of "brute force" would require trial and error.  I was hoping that 
invoking some other option in mtext, such as padj, would accomplish my goal of 
needing only to multiply the line= value by the value returned from my function.

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

On Sep 2, 2012, at 11:43 AM, Duncan Murdoch  wrote:

> On 12-09-02 1:40 PM, Dennis Fisher wrote:
>> R 2.15.1
>> OS X (MLion)
>> 
>> Colleagues,
>> 
>> I am aware that changes in mfrow / mfcol in par() affect cex (from help: In 
>> a layout with exactly two rows and columns the base value of ‘"cex"’ is 
>> reduced by a factor of 0.83: if there are three or more of either rows or 
>> columns, the reduction factor is 0.66).
>> 
>> I generate a multipage PDF in which mfrow varies such that cex is impacted.  
>> This affects mtext in the outer margin.  Sample code is pasted at the bottom 
>> of this email.  The impact is most obvious if one examines the text at the 
>> bottom of each page as one moves page to page.
>> 
>> Does anyone have a suggestion for how to overcome this (other than using 
>> brute force)?
> 
> I am not sure what's wrong with brute force here.  You know the formula for 
> the reduction; just apply a corresponding increase first.
> 
> Duncan Murdoch
> 
>> 
>> Dennis
>> 
>> Dennis Fisher MD
>> P < (The "P Less Than" Company)
>> Phone: 1-866-PLessThan (1-866-753-7784)
>> Fax: 1-866-PLessThan (1-866-753-7784)
>> www.PLessThan.com
>> 
>> # In a layout with exactly two rows and columns the base value of ‘"cex"’ is 
>> reduced by a factor of 0.83: if
>> # there are three or more of either rows or columns, the reduction factor is 
>> 0.66.
>> 
>> FINDCEX <- function()
>> {
>>     CORRECT <- 1
>>     MFROW   <- par()$mfrow
>>     MFCOL   <- par()$mfcol
>>     TEST    <- all(MFROW == MFCOL)
>>     if (TEST && all(MFROW == c(2, 2)))            CORRECT <- 1 / 0.83
>>     if (TEST && (MFROW[1] >= 3 || MFROW[2] >= 3)) CORRECT <- 1 / 0.66
>>     if (!TEST) cat("MFROW does not equal MFCOL\n")
>>     cat(MFROW, CORRECT, "\n")
>>     return(CORRECT)
>> }
>> pdf("TestCEX.pdf", 8, 6)
>> par(mfrow=c(1,1), omi=c(1,1,1,1))
>> plot(1)
>> mtext(outer=T, side=1, line=0, cex=1, "line 0")
>> mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
>> mtext(outer=T, side=3, line=0, cex=1, "line 0")
>> mtext(outer=T, side=3, line=1, cex=1, "line 1")
>> mtext(outer=T, side=2, line=0, cex=1, "line 0")
>> mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
>> mtext(outer=T, side=4, line=0, cex=1, "line 0")
>> mtext(outer=T, side=4, line=1, cex=1, "line 1")
>> 
>> 
>> par(mfrow=c(1,2), omi=c(1,1,1,1))
>> plot(1)
>> mtext(outer=T, side=1, line=0, cex=1, "line 0")
>> mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
>> mtext(outer=T, side=3, line=0, cex=1, "line 0")
>> mtext(outer=T, side=3, line=1, cex=1, "line 1")
>> mtext(outer=T, side=2, line=0, cex=1, "line 0")
>> mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
>> mtext(outer=T, side=4, line=0, cex=1, "line 0")
>> mtext(outer=T, side=4, line=1, cex=1, "line 1")
>> 
>> par(mfrow=c(2,2), omi=c(1,1,1,1))
>> plot(1)
>> mtext(outer=T, side=1, line=0, cex=1, "line 0")
>> mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
>> mtext(outer=T, side=3, line=0, cex=1, "line 0")
>> mtext(outer=T, side=3, line=1, cex=1, "line 1")
>> mtext(outer=T, side=2, line=0, cex=1, "line 0")
>> mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
>> mtext(outer=T, side=4, line=0, cex=1, "line 0")
>> mtext(outer=T, side=4, line=1, cex=1, "line 1")
>> 
>> par(mfrow=c(3,3), omi=c(1,1,1,1))
>> plot(1)
>> mtext(outer=T, side=1, line=0, cex=1, "line 0")
>> mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
>> mtext(outer=T, side=3, line=0, cex=1, "line 0")
>> mtext(outer=T, side=3, line=1, cex=1, "line 1")
>> mtext(outer=T, side=2, line=0, cex=1, "line 0")
>> mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
>> mtext(outer=T, side=4, line=0, cex=1, "line 0")
>> mtext(outer=T, side=4, line=1, cex=1, "line 1")
>> 
>> graphics.off()
>> system("open TestCEX.pdf")
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Loading Chess Data

2012-09-02 Thread Rui Barradas

Hello,

Try the following


library(XML)

url <- "http://ratings.fide.com/top.phtml?list=men"
chess <- readHTMLTable(url, header = TRUE, which = 5)

str(chess)  # See what we have

# All variables are factors;
# convert the numeric ones to integer via as.character()
# (as.integer() on a factor would return the level codes,
# not the printed values)
chess$Rating <- as.integer(as.character(chess$Rating))
chess$Games <- as.integer(as.character(chess$Games))
chess$`B-Year` <- as.integer(as.character(chess$`B-Year`))

head(chess, 20)  # See first 20 rows


Hope this helps,

Rui Barradas
Em 02-09-2012 17:41, David Arnold escreveu:

All,

What would be the most efficient way to load the data at the following
address into a dataframe?

http://ratings.fide.com/top.phtml?list=men

Thanks,

David



--
View this message in context: 
http://r.789695.n4.nabble.com/Loading-Chess-Data-tp4642006.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predict.lm(...,type="terms") question

2012-09-02 Thread Rui Barradas

Hello,

In the meantime, I've "discovered" an R package that might do the job: 
chemCal. It only implements first-order (weighted) linear regression, 
and, as I said in my first reply and as was repeated several times, the 
standard errors are not reversed; the prediction bands follow Massart et 
al. (1988).

Take a look at it, it's a very simple package.

Rui Barradas

Em 02-09-2012 18:07, John Thaden escreveu:

Thank you all. My muddle about predict.lm(..., type = "terms") was evident even 
in my first sentence of my original posting


How can I actually use the output of
predict.lm(..., type="terms") to predict
new term values from new response values?

the answer being that I cannot; that new response values, if included in 
newdata, will simply be ignored by predict.lm, as well they should.

As for the calibration issue, I am reviewing literature now as suggested.

Though predict.lm performed to spec (no bug), may I suggest a minor
change to ?predict.lm text?

Existing:
  newdata  An optional data frame in which to
   look for variables with
   which to predict. If omitted,
   the fitted values are used.
Proposed:
  newdata  An optional data frame in which to
   look for new values of terms with
   which to predict. If omitted, the
   fitted values are used.

-John Thaden, Ph.D.
  College Station, TX

--- On Sun, 9/2/12, peter dalgaard  wrote:


From: peter dalgaard 
Subject: Re: [R] predict.lm(...,type="terms") question
To: "David Winsemius" 
Cc: "Rui Barradas" , r-help@r-project.org, "jjthaden" 

Date: Sunday, September 2, 2012, 1:35 AM

On Sep 2, 2012, at 03:38 , David Winsemius wrote:


Why should predict not complain when it is offered a
newdata argument that does not contain a vector of values for
"x"? The whole point of the terms method of prediction is to
"x"? The whole point of the terms method of prediction is to
offer estimates for specific values of items on the RHS of
the formula. The OP seems to have trouble understanding that
point. Putting in a vector with the name of the LHS item
makes no sense to me. I certainly cannot see that any
particular behavior for this pathological input is described
for predict.lm in its help page, but throwing an error seems
perfectly reasonable to me.

Yes. Lots of confusion going on here.

First, data= is _always_ used as the _first_ place to look
for variables; if things are not in it, the search continues
into the formula's environment. To be slightly perverse,
notice that even this works:


y <- rnorm(10)
x <- rnorm(10)
d <- data.frame(z=rnorm(9))
lm(y ~ x, d)

Call:
lm(formula = y ~ x, data = d)

Coefficients:
(Intercept)            x
    -0.2760       0.2328

Secondly, what is predict(..., type="terms") supposed to
have to do with inverting a regression equation? That's just
not what it does, it only splits the prediction formula into
its constituent terms.

Thirdly: no, you do not invert a regression equation by
regressing y on x. That only works if you can be sure that
your new (x, y) are sampled from the same population as the
data, which is not going to be the case if you are fitting
to data with, say, selected equispaced x values. There's a
whole literature on how to do this properly, Google e.g.
"inverse calibration" for enlightenment.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk
Priv: pda...@gmail.com


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why variations in accuracy between R to ARCGIS for the same point reprojection?

2012-09-02 Thread Prof Brian Ripley
There is no 'reprojection' in R (which is upper case).  Please attribute 
blame correctly.


You seem to be talking about some contributed addon package, not 
specified.  But I think you should be asking this on R-sig-geo.


On 02/09/2012 19:24, Camilo Mora wrote:

Hi everyone,

I wonder if anyone knows the reason why the outputs of the same
reprojection in R and ArcGIS are different? The magnitude of the change
can be up to 40 km at the poles.
Basically, I have a database of points equally separated by one degree
over the globe.
In ArcGIS, I am projecting the data in GCS-WGS-1984 and then
reprojecting it to Behrmann to ensure equal-area distribution of the points.
In R, I am using:
spPoint <-
SpatialPoints(coords=coordinates(Data),proj4string=CRS("+proj=longlat
+datum=WGS84"))
and then reprojecting it to Behrmann with:
spPointReprj=spTransform(Data,CRS("+proj=cea +lon_0=0 +lat_ts=30 +x_0=0
+y_0=0 +ellps=WGS84"))

If I put the two outputs of the reprojections on the same map, they are
off by a few meters in the tropics and by up to 40 km at the poles.

I decided to investigate how the reprojections are done and my
calculations are different from both R and ArcGis:

First, I calculated the radius of the Earth in two different ways:
=Re * (1 - e^2)/ (1 - e^2 *SIN(RADIANS(Latitude))^2)^(3/2)
=Re * SQRT(1 - e^2) / (1 - e^2 * (SIN(RADIANS(Latitude)))^2)

where Re is the equatorial radius of the Earth (6378 km) and e is the
eccentricity of the ellipsoid (0.081082).

According to several books I consulted, the position of a point on the Y-axis
in the Behrmann projection can be estimated as:
=Re*(SIN(RADIANS(Latitude))/COS(RADIANS(Spll)))
where Spll is the standard parallel, which in the Behrmann projection
is 30.
Unfortunately, the resulting output, although similar in shape to the
outputs of R and ArcGIS, is still not quite the same. Any thoughts on why
these differences arise in supposedly the same calculations?

Any input will be greatly appreciated,

Thanks,

Camilo




Camilo Mora, Ph.D.
Department of Geography, University of Hawaii
Currently available in Colombia
Phone:   Country code: 57
  Provider code: 313
  Phone 776 2282
  From the USA or Canada you have to dial 011 57 313 776 2282
http://www.soc.hawaii.edu/mora/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Impact of cex changing as a function of mfrow

2012-09-02 Thread Duncan Murdoch

On 12-09-02 1:40 PM, Dennis Fisher wrote:

R 2.15.1
OS X (MLion)

Colleagues,

I am aware that changes in mfrow / mfcol in par() affect cex (from help: In a layout with 
exactly two rows and columns the base value of ‘"cex"’ is reduced by a factor 
of 0.83: if there are three or more of either rows or columns, the reduction factor is 
0.66).

I generate a multipage PDF in which mfrow varies such that cex is impacted.  
This affects mtext in the outer margin.  Sample code is pasted at the bottom of 
this email.  The impact is most obvious if one examines the text at the bottom 
of each page as one moves page to page.

Does anyone have a suggestion for how to overcome this (other than using brute 
force)?


I am not sure what's wrong with brute force here.  You know the formula 
for the reduction; just apply a corresponding increase first.
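[As an illustration of the point above, the documented reduction can be inverted directly. A minimal language-neutral sketch in Python — the 0.83/0.66 factors are the ones quoted from the help page; the helper names are mine, not part of R:]

```python
# Sketch of the cex bookkeeping described in ?par: the base cex shrinks
# by 0.83 for an exactly-2x2 layout and by 0.66 when either dimension is
# three or more, so multiplying line= (or cex=) by 1/factor restores it.
def cex_factor(nrow, ncol):
    """Reduction factor R applies to the base cex for an mfrow layout."""
    if (nrow, ncol) == (2, 2):
        return 0.83
    if nrow >= 3 or ncol >= 3:
        return 0.66
    return 1.0

def corrected_line(line, nrow, ncol):
    """Pre-scaled line= value that compensates for the reduction."""
    return line / cex_factor(nrow, ncol)

for layout in [(1, 1), (1, 2), (2, 2), (3, 3)]:
    print(layout, cex_factor(*layout), corrected_line(1, *layout))
```

This is exactly what FINDCEX computes; the remaining subtlety in the thread is that line= spacing in the outer margin involves more than this single factor.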


Duncan Murdoch



Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

# In a layout with exactly two rows and columns the base value of ‘"cex"’ is 
reduced by a factor of 0.83: if
# there are three or more of either rows or columns, the reduction factor is 
0.66.

FINDCEX <- function()
{
    CORRECT <- 1
    MFROW   <- par()$mfrow
    MFCOL   <- par()$mfcol
    TEST    <- all(MFROW == MFCOL)
    if (TEST && all(MFROW == c(2, 2)))            CORRECT <- 1 / 0.83
    if (TEST && (MFROW[1] >= 3 || MFROW[2] >= 3)) CORRECT <- 1 / 0.66
    if (!TEST) cat("MFROW does not equal MFCOL\n")
    cat(MFROW, CORRECT, "\n")
    return(CORRECT)
}
pdf("TestCEX.pdf", 8, 6)
par(mfrow=c(1,1), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")


par(mfrow=c(1,2), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")

par(mfrow=c(2,2), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")

par(mfrow=c(3,3), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")

graphics.off()
system("open TestCEX.pdf")

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] why variations in accuracy between R to ARCGIS for the same point reprojection?

2012-09-02 Thread Camilo Mora

Hi everyone,

I wonder if anyone knows the reason why the outputs of the same  
reprojection in R and ArcGIS are different? The magnitude of the  
change can be up to 40 km at the poles.
Basically, I have a database of points equally separated by one degree  
over the globe.
In ArcGIS, I am projecting the data in GCS-WGS-1984 and then  
reprojecting it to Behrmann to ensure equal-area distribution of the  
points.

In R, I am using:
spPoint <-  
SpatialPoints(coords=coordinates(Data),proj4string=CRS("+proj=longlat  
+datum=WGS84"))

and then reprojecting it to Behrmann with:
spPointReprj=spTransform(Data,CRS("+proj=cea +lon_0=0 +lat_ts=30  
+x_0=0 +y_0=0 +ellps=WGS84"))


If I put the two outputs of the reprojections on the same map, they  
are off by a few meters in the tropics and by up to 40 km at the poles.


I decided to investigate how the reprojections are done and my  
calculations are different from both R and ArcGis:


First, I calculated the radius of the Earth in two different ways:
=Re * (1 - e^2)/ (1 - e^2 *SIN(RADIANS(Latitude))^2)^(3/2)
=Re * SQRT(1 - e^2) / (1 - e^2 * (SIN(RADIANS(Latitude)))^2)

where Re is the equatorial radius of the Earth (6378 km) and e is  
the eccentricity of the ellipsoid (0.081082).


According to several books I consulted, the position of a point on the  
Y-axis in the Behrmann projection can be estimated as:

=Re*(SIN(RADIANS(Latitude))/COS(RADIANS(Spll)))
where Spll is the standard parallel, which in the Behrmann projection is 30.
Unfortunately, the resulting output, although similar in shape to the  
outputs of R and ArcGIS, is still not quite the same. Any thoughts on why  
these differences arise in supposedly the same calculations?
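[A sketch of the arithmetic may help locate the discrepancy. On a sphere the Behrmann y-coordinate is Re*sin(lat)/cos(30), as above, but an ellipsoidal cylindrical equal-area projection — which is what `+ellps=WGS84` requests — goes through the authalic-latitude function q, and the two differ by tens of kilometres at high latitudes. The Python below is an illustrative sketch using Snyder's standard formulas; the WGS84 constants are textbook values, not taken from this thread, and it is not a reimplementation of spTransform:]

```python
import math

A  = 6378137.0            # WGS84 semi-major axis, metres (assumed constant)
E2 = 0.00669437999014     # WGS84 first eccentricity squared
E  = math.sqrt(E2)
TS = math.radians(30.0)   # Behrmann standard parallel

def y_sphere(lat_deg):
    # Textbook spherical formula: y = R * sin(lat) / cos(lat_ts)
    return A * math.sin(math.radians(lat_deg)) / math.cos(TS)

def q(phi):
    # Authalic-latitude helper used by ellipsoidal equal-area projections
    s = math.sin(phi)
    return (1.0 - E2) * (s / (1.0 - E2 * s * s)
                         - math.log((1.0 - E * s) / (1.0 + E * s)) / (2.0 * E))

def y_ellipsoid(lat_deg):
    # Ellipsoidal cylindrical equal-area: y = a * q / (2 * k0)
    k0 = math.cos(TS) / math.sqrt(1.0 - E2 * math.sin(TS) ** 2)
    return A * q(math.radians(lat_deg)) / (2.0 * k0)

for lat in (0, 30, 60, 90):
    print(lat, round(y_sphere(lat)), round(y_ellipsoid(lat)))
```

Under these assumptions the spherical and ellipsoidal y values agree at the equator but drift apart by roughly 20 km at high latitudes — the same order as the offset reported — so mixing a spherical formula (or a spherical radius) with an ellipsoidal projection is a plausible source of the mismatch.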


Any input will be greatly appreciated,

Thanks,

Camilo




Camilo Mora, Ph.D.
Department of Geography, University of Hawaii
Currently available in Colombia
Phone:   Country code: 57
 Provider code: 313
 Phone 776 2282
 From the USA or Canada you have to dial 011 57 313 776 2282
http://www.soc.hawaii.edu/mora/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loading Chess Data

2012-09-02 Thread Hasan Diwan
Mr Arnold,

> What would be the most efficient way to load the data at the following
>> address into a dataframe?
>>
>
To what end? In other words, what are you trying to achieve with the
ratings list? -- H
-- 
Sent from my mobile device
Envoyait de mon portable




Re: [R] Loading Chess Data

2012-09-02 Thread jones

You might take a look at the link titled "Download Rating List",
in the blue box on the right side of that page...

albyn
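If the goal is really to scrape the HTML page itself, the table can be read straight into a data frame. A sketch using the XML package's readHTMLTable(); the live-URL call is left commented out and untested, so the mechanism is demonstrated on a tiny, made-up inline table instead:

```r
library(XML)

# on the live page (untested here; network required):
# tables <- readHTMLTable("http://ratings.fide.com/top.phtml?list=men",
#                         stringsAsFactors = FALSE)

# the same mechanism, shown on a small inline table:
doc <- htmlParse("<table><tr><th>Rank</th><th>Name</th></tr>
                  <tr><td>1</td><td>Carlsen</td></tr></table>", asText = TRUE)
tab <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)[[1]]
tab
```

readHTMLTable() returns one data frame per <table> element; on the live page you would still need to pick out the ratings table from the list.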

On 2012-09-02 9:41, David Arnold wrote:

All,

What would be the most efficient way to load the data at the 
following

address into a dataframe?

http://ratings.fide.com/top.phtml?list=men

Thanks,

David



--
View this message in context:
http://r.789695.n4.nabble.com/Loading-Chess-Data-tp4642006.html
Sent from the R help mailing list archive at Nabble.com.



[R] Impact of cex changing as a function of mfrow

2012-09-02 Thread Dennis Fisher
R 2.15.1
OS X (MLion)

Colleagues,

I am aware that changes in mfrow / mfcol in par() affect cex (from help: In a 
layout with exactly two rows and columns the base value of ‘"cex"’ is reduced 
by a factor of 0.83: if there are three or more of either rows or columns, the 
reduction factor is 0.66).

I generate a multipage PDF in which mfrow varies such that cex is affected.
This in turn affects mtext in the outer margin. Sample code is pasted at the
bottom of this email. The impact is most obvious if one examines the text at
the bottom of each page while moving from page to page.

Does anyone have a suggestion for how to overcome this (other than using
brute force)?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

# In a layout with exactly two rows and columns the base value of ‘"cex"’ is 
reduced by a factor of 0.83: if 
# there are three or more of either rows or columns, the reduction factor is 
0.66.

FINDCEX <- function()
{
CORRECT <- 1
MFROW   <- par()$mfrow
MFCOL   <- par()$mfcol
TEST    <- all(MFROW == MFCOL)
# all() is needed below: MFROW is a length-2 vector, so a bare '==' in if() fails
if (TEST && all(MFROW == c(2, 2)))            CORRECT <- 1 / 0.83
if (TEST && (MFROW[1] >= 3 || MFROW[2] >= 3)) CORRECT <- 1 / 0.66
if (!TEST)  cat("MFROW does not equal MFCOL\n")
cat(MFROW, CORRECT, "\n")
return(CORRECT)
}
pdf("TestCEX.pdf", 8, 6)
par(mfrow=c(1,1), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")


par(mfrow=c(1,2), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")

par(mfrow=c(2,2), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")

par(mfrow=c(3,3), omi=c(1,1,1,1))
plot(1)
mtext(outer=T, side=1, line=0, cex=1, "line 0")
mtext(outer=T, side=1, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=3, line=0, cex=1, "line 0")
mtext(outer=T, side=3, line=1, cex=1, "line 1")
mtext(outer=T, side=2, line=0, cex=1, "line 0")
mtext(outer=T, side=2, line=FINDCEX(), cex=1, "line FINDCEX")
mtext(outer=T, side=4, line=0, cex=1, "line 0")
mtext(outer=T, side=4, line=1, cex=1, "line 1")

graphics.off()
system("open TestCEX.pdf")
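One candidate workaround (a sketch, not verified against the exact layouts above): reset the base cex in a second par() call after changing mfrow, since it is the mfrow call that applies the 0.83/0.66 reduction. The output filename here is a throwaway:

```r
# write to a temporary file; the interesting part is the pair of par() calls
pdf(file.path(tempdir(), "TestCEX2.pdf"), 8, 6)
par(mfrow = c(3, 3), omi = c(1, 1, 1, 1))  # this call shrinks the base cex to 0.66
par(cex = 1)                               # a separate call restores cex = 1
plot(1)
mtext(outer = TRUE, side = 1, line = 1, cex = 1, "line 1")
dev.off()
```

With the base cex restored, margin-line heights may stay consistent across pages, so the FINDCEX() correction could become unnecessary.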



[R] Loading Chess Data

2012-09-02 Thread David Arnold
All,

What would be the most efficient way to load the data at the following
address into a dataframe?

http://ratings.fide.com/top.phtml?list=men

Thanks,

David



--
View this message in context: 
http://r.789695.n4.nabble.com/Loading-Chess-Data-tp4642006.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] predict.lm(...,type="terms") question

2012-09-02 Thread John Thaden
Thank you all. My muddle about predict.lm(..., type = "terms") was evident
even in the first sentence of my original posting:

> How can I actually use the output of 
> predict.lm(..., type="terms") to predict 
> new term values from new response values?

the answer being that I cannot; that new response values, if included in 
newdata, will simply be ignored by predict.lm, as well they should.

As for the calibration issue, I am reviewing literature now as suggested.

Though predict.lm performed to spec (no bug), may I suggest a minor
change to ?predict.lm text?

Existing: 
 newdata  An optional data frame in which to 
  look for variables with  
  which to predict. If omitted, 
  the fitted values are used.
Proposed: 
 newdata  An optional data frame in which to 
  look for new values of terms with 
  which to predict. If omitted, the 
  fitted values are used.

-John Thaden, Ph.D.
 College Station, TX

--- On Sun, 9/2/12, peter dalgaard  wrote:

> From: peter dalgaard 
> Subject: Re: [R] predict.lm(...,type="terms") question
> To: "David Winsemius" 
> Cc: "Rui Barradas" , r-help@r-project.org, "jjthaden" 
> 
> Date: Sunday, September 2, 2012, 1:35 AM
> 
> On Sep 2, 2012, at 03:38 , David Winsemius wrote:
> 
> > 
> > Why should predict not complain when it is offered a
> newdata argument that does not contain a vector of values for
> "x"? The whole point of the terms method of prediction is to
> offer estimates for specific values of items on the RHS of
> the formula. The OP seems to have trouble understanding that
> point. Putting in a vector with the name of the LHS item
> makes no sense to me. I certainly cannot see that any
> particular behavior for this pathological input is described
> for predict.lm in its help page, but throwing an error seems
> perfectly reasonable to me.
> 
> Yes. Lots of confusion going on here. 
> 
> First, data= is _always_ used as the _first_ place to look
> for variables, if things are not in it, search continues
> into the formula's environment. To be slightly perverse,
> notice that even this works:
> 
> > y <- rnorm(10)
> > x <- rnorm(10)
> > d <- data.frame(z=rnorm(9))
> > lm(y ~ x, d)
> 
> Call:
> lm(formula = y ~ x, data = d)
> 
> Coefficients:
> (Intercept)            x 
>     -0.2760       0.2328 
> 
> Secondly, what is predict(..., type="terms") supposed to
> have to do with inverting a regression equation? That's just
> not what it does, it only splits the prediction formula into
> its constituent terms.
> 
> Thirdly; no, you do not invert a regression equation by
> regressing y on x. That only works if you can be sure that
> your new (x, y) are sampled from the same population as the
> data, which is not going to be the case if you are fitting
> to data with, say, selected equispaced x values. There's a
> whole literature on how to do this properly, Google e.g.
> "inverse calibration" for enlightenment.  
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk 
> Priv: pda...@gmail.com
> 
> 
> 
> 
> 
> 
> 
> 
>
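To make that last point concrete, a toy sketch of the classical direction: fit y on x, then invert the fitted line for a new response (deliberately ignoring the interval-estimation subtleties that the inverse-calibration literature addresses):

```r
set.seed(42)
x <- seq(1, 10, by = 0.5)
y <- 2 + 3 * x + rnorm(length(x), sd = 0.2)   # true line: y = 2 + 3x

fit <- lm(y ~ x)

# classical calibration: solve yhat = b0 + b1 * x for x at a new response y0
y0 <- 20
b  <- coef(fit)
x0 <- unname((y0 - b[1]) / b[2])
x0   # the noise-free answer would be (20 - 2) / 3 = 6
```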



[R] Simulation of genetic data

2012-09-02 Thread james power
Hi,

I am looking for the best way, or the best package available, to simulate a
genetic association between a specific SNP and a quantitative phenotype. All
of the packages I saw in R seem to be specialised in pedigree data or in
population data where coalescence and other evolutionary factors are
specified, but I don't have any experience in population genetics and I only
want to simulate the simple case of a European population, which is the most
similar to my real data: a normal distribution for the trait and an additive
effect for the genotype.
I am looking for something in R similar to the function in PLINK where one
specifies a range of allele frequencies, a range for the phenotype, and a
variant that should come out associated with the phenotype.

Can someone please help me?
Thank you very much.

James
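For the simple additive scenario described, base R may already suffice; a minimal sketch (the allele frequency, effect size, and trait scale are made-up illustration values, not estimates for any real population):

```r
set.seed(1)
n    <- 1000
maf  <- 0.3    # minor allele frequency (illustrative)
beta <- 0.5    # additive effect per allele (illustrative)

g <- rbinom(n, 2, maf)          # genotype coded 0/1/2 under Hardy-Weinberg
y <- 10 + beta * g + rnorm(n)   # normally distributed quantitative trait

summary(lm(y ~ g))$coefficients # recovers an estimate close to beta
```

For more realistic linkage-disequilibrium structure, a dedicated simulator would still be needed, but this reproduces the additive SNP-trait association itself.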




Re: [R] R_closest date

2012-09-02 Thread arun
Hi,
No problem.

If you use join() instead of merge(), the original order of columns may not get 
altered.

dat3<-aggregate(DAYS_DIFF~PT_ID,data=dat1,min)
library(plyr)
 join(dat1,dat3,type="inner")
#Joining by: PT_ID, DAYS_DIFF
 # PT_ID IDX_DT   OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
#1  4549 2002-08-21 2002-08-20    -1   183    2
#2  4839 2006-11-28 2006-11-28 0   179    2
A.K.







From: Weijia Wang 
To: arun  
Sent: Saturday, September 1, 2012 5:11 PM
Subject: Re: [R] R_closest date


Thank you Arun, for your help again.

Best
__
WANG WEIJIA
Graudate Research and Teaching Assistant
Department of Environmental Medicine
New York University, School of Medicine
wwang@gmail.com




On Sep 1, 2012, at 5:04 PM, arun  wrote:

Hi,
>Try this:
>dat1 <- read.table(text="
>  PT_ID    IDX_DT  OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
>13  4549 2002-08-21 2002-08-20    -1  183    2
>14  4549 2002-08-21 2002-11-14    85    91    1
>15  4549 2002-08-21 2003-02-18  181    89    1
>16  4549 2002-08-21 2003-05-15  267  109    2
>17  4549 2002-08-21 2003-12-16  482    96    1
>128  4839 2006-11-28 2006-11-28    0  179    2
>", header=TRUE)
>dat3<-aggregate(DAYS_DIFF~PT_ID,data=dat1,min)
>merge(dat1,dat3)
>#  PT_ID DAYS_DIFF IDX_DT   OBS_DATE OBS_VALUE CATEGORY
>#1  4549    -1 2002-08-21 2002-08-20   183    2
>#2  4839 0 2006-11-28 2006-11-28   179    2
>
>#or,
>dat2<- tapply(dat1$DAYS_DIFF,dat1$PT_ID,min)
>dat4<-data.frame(PT_ID=row.names(data.frame(dat2)),DAYS_DIFF=dat2)
> row.names(dat4)<-1:nrow(dat4)
>merge(dat1,dat4)
>#  PT_ID DAYS_DIFF IDX_DT   OBS_DATE OBS_VALUE CATEGORY
>#1  4549    -1 2002-08-21 2002-08-20   183    2
>#2  4839 0 2006-11-28 2006-11-28   179    2
>A.K.
>
>
>
>
>
>- Original Message -
>From: WANG WEIJIA 
>To: "r-help@R-project.org" 
>Cc: 
>Sent: Saturday, September 1, 2012 1:10 PM
>Subject: [R] R_closest date
>
>Hi, 
>
>I have encountered an issue about finding a date closest to another date
>
>So this is how the data frame looks like:
>
>    PT_ID     IDX_DT   OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
>13   4549 2002-08-21 2002-08-20        -1       183        2
>14   4549 2002-08-21 2002-11-14        85        91        1
>15   4549 2002-08-21 2003-02-18       181        89        1
>16   4549 2002-08-21 2003-05-15       267       109        2
>17   4549 2002-08-21 2003-12-16       482        96        1
>128  4839 2006-11-28 2006-11-28         0       179        2
>
>I need to find, the single observation, which has the closest date of 
>'OBS_DATE' to 'IDX_DT'.
>
>For example, for 'PT_ID' of 4549, I need row 13, of which the OBS_DATE is just 
>one day away from IDX_DT. 
>
>I was thinking about using abs(), and I got this:
>
>baseline<- function(x){
>+  
>+  #remove all uncessary variables
>+  baseline<- x[,c("PT_ID","DAYS_DIFF")]
>+  
>+  #get a list of every unique ID
>+  uniqueID <- unique(baseline$PT_ID)
>+  
>+  #make a vector that will contain the smallest DAYS_DIFF
>+  first <- rep(-99,length(uniqueID))
>+  
>+  i = 1
>+  #loop through each unique ID
>+  for (PT_ID in uniqueID){
>+  
>+  #for each iteration get the smallest DAYS_DIFF for that ID
>+  first[i] <- 
>min(baseline[which(baseline$PT_ID==PT_ID),abs(baseline$DAYS_DIFF)])
>+  
>+  #up the iteration counter
>+  i = i + 1
>+  
>+  }
>+  #make a data frame with the lowest DAYS_DIFF and ID
>+  newdata <- data.frame(uniqueID,first)
>+  names(newdata) <- c("PT_ID","DAYS_DIFF")
>+  
>+  #return the data frame containing the lowest GPI for each ID
>+  return(newdata)
>+  }
>
>ldl.b<-baseline(ldl) #get all baseline ldl patient ID, total 11368 obs, all 
>unique#
>>Error in `[.data.frame`(baseline, which(baseline$PT_ID == PT_ID), 
>>abs(baseline$DAYS_DIFF)) : 
>  undefined columns selected
>
>Can anyone help me in figuring out how to get the minimum value of the 
>absolute value of DAYS_DIFF for unique ID?
>
>Thanks a lot
>    [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>
>
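For the archives, a compact base-R answer to the question as asked: keep, per ID, the row whose DAYS_DIFF is smallest in absolute value (reusing the example data quoted above):

```r
dat1 <- read.table(text = "
PT_ID IDX_DT OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
4549 2002-08-21 2002-08-20 -1 183 2
4549 2002-08-21 2002-11-14 85 91 1
4549 2002-08-21 2003-02-18 181 89 1
4549 2002-08-21 2003-05-15 267 109 2
4549 2002-08-21 2003-12-16 482 96 1
4839 2006-11-28 2006-11-28 0 179 2
", header = TRUE)

# split by ID, keep the row minimising |DAYS_DIFF|, recombine
closest <- do.call(rbind, lapply(split(dat1, dat1$PT_ID),
  function(d) d[which.min(abs(d$DAYS_DIFF)), ]))
closest
```

Note which.min(abs(...)) rather than min(): a DAYS_DIFF of -1 is closer than +85, which a plain min() would not capture.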



Re: [R] Help on finding specific columns in matrix

2012-09-02 Thread arun
Hi,
Try this:
 which.min(apply(a,2,mean))
#[1] 1
 which.max(apply(a,2,mean))
#[1] 4
A.K.




- Original Message -
From: Andras Farkas 
To: "r-help@r-project.org" 
Cc: 
Sent: Sunday, September 2, 2012 5:49 AM
Subject: [R] Help on finding specific columns in  matrix

Dear All,
 
I have a matrix with 33 columns and 5000 rows. I would like to find 2 specific
columns in the set: the one that holds the highest values and the one that
holds the lowest values. The column mean is appropriate for identifying them,
because each column's mean is different and they all change together based on
the same rate constants as a function of time (i.e., time is the most
important determinant of the magnitude of that mean). The columns are not to
be named, if possible. Any thoughts on that? I would appreciate the help.
 
example:
 
a <- matrix(c(runif(500,10,15),runif(500,15,20),runif(500,20,25),runif(500,25,30)),ncol=4)
 
I would need to find and plot, with a box percentile plot, column 1 (the
column with the lowest mean) and column 4 (the column with the highest mean)
 
thanks,
 
Andras 




Re: [R] glmulti runs indefinitely when using genetic algorithm with lme4

2012-09-02 Thread Thomas
Right, I've worked this one out - the problem is that the example (above)
I was using to test-run this package contains only 3 variables. When you
add in a fourth it works fine:


d = runif(30)

And run again telling it to use GA:

babs <- glmulti(y~a*b*c*d, level = 2, fitfunc = lmer.glmulti, random =  
"+(1|x)", method = "g")


Returns:

...

After 190 generations:
Best model: y~1
Crit= 159.374382952181
Mean crit= 163.380382861026
Improvements in best and average IC have been below the
specified goals.

Algorithm is declared to have converged.
Completed.

Using glmulti out-of-the-box with a GLM gives the same result if you try
to use GA with three or fewer variables. This is not really an issue,
however, as with only three variables an exhaustive search is possible.
The problem was the example.


Thomas


--- Original message ---
From: Thomas 
To: r-help@r-project.org
Cc:
Subject: glmulti runs indefinitely when using genetic algorithm with lme4
Date: Sun, 02 Sep 2012 11:23:23 +0100

Dear List,

I'm using glmulti for model averaging in R. There are ~10 variables in my
model, making exhaustive screening impractical - I therefore need to use
the genetic algorithm (GA) (call: method = "g").

I need to include random effects so I'm using glmulti as a wrapper for
lme4. Methods for doing this are available here
http://www.inside-r.org/packages/cran/glmulti/docs/glmulti and there is
also a pdf included with the glmulti package that goes into more detail.
The problem is that when telling glmulti to use GA in this setting it runs
indefinitely, even after the best model has been found.

This is the example taken from the pdf included in the glmulti package:

  library(lme4)
  library(glmulti)

  # create a function for glmulti to act as a wrapper for lmer:
  lmer.glmulti <- function (formula, data, random = "", ...) {
  lmer(paste(deparse(formula), random), data = data, REML=F, ...)
  }

  # set some random variables:
  y = runif(30,0,10) # mock dependent variable
  a = runif(30) # dummy covariate
  b = runif(30) # another dummy covariate
  c = runif(30) # an another one
  x = as.factor(round(runif(30),1))# dummy grouping factor

  # run exhaustive screening with lmer:
  bab <- glmulti(y~a*b*c, level = 2, fitfunc = lmer.glmulti, random =
"+(1|x)")

This works fine. The problem is when I tell it to use the genetic
algorithm:

  babs <- glmulti(y~a*b*c, level = 2, fitfunc = lmer.glmulti, random =
"+(1|x)", method = "g")

It just keeps running indefinitely and the AIC does not change:

  After 19550 generations:
  Best model: y~1
  Crit= 161.038899734164
  Mean crit= 164.13629335762
  Change in best IC: 0 / Change in mean IC: 0

  After 19560 generations:
  Best model: y~1
  Crit= 161.038899734164
  Mean crit= 164.13629335762
  Change in best IC: 0 / Change in mean IC: 0

  After 19570 generations:
  Best model: y~1
  Crit= 161.038899734164
  Mean crit= 164.13629335762

  etc.

I have tried using calls that tell glmulti when to stop (deltaB = 0,
deltaM = 0.01, conseq = 6) but nothing seems to work. I think the problem
must lie with setting the function (?). It may be something really obvious
however I'm new to R and I can't work it out.

I am using R v.2.15.0, glmulti v.1.0.6 and lme4 v.0.99-0 on Windows.

Any help with this would be much appreciated.

Thomas



[R] most efficient plyr solution

2012-09-02 Thread Philip de Witt Hamer
Dear list members,

Any help on this efficiency issue would be greatly appreciated.

I would like to find the most efficient way to run a non-vectorized function
(here: the Fisher exact test p-value) iteratively using 4 matrices with
identical dimensions. As a result I aim for an array with identical
dimensions containing the corresponding p-values. Please consider some code
using a trivial example with 3x4 arrays below. Eventually I would like to run
the code on 2e3 x 7e6 arrays, for which someone has already suggested Amazon
EC2...

Q1: would you agree that fisher.test() is not vectorizable? e.g. fisher.test( 
matrix(c(Ax,Ay,Bx,By),ncol=2) ) does not work
Q2: direct use of Ax, Ay, Bx, By as input instead of a (list) transform for the 
input would seem beneficial for speed
Q3: parallelization of the iterative process seems to make sense.
Q4: a progress bar gives peace of mind when the runtime is unknown.
Q5: avoiding an output transform to get an array back from a vector
Q6: for Q2/3/4/5, plyr seems ideal (e.g. maply)

Please also find some solutions below.  
solution 1: using mapply
solution 2: using lapply
solution 3: using mclapply
attempt 4: stuck on plyr implementation

--Philip

### CODE START ###

Ax <- matrix(c(2,3,5,6,
   3,7,8,9,
   8,2,1,3), ncol = 4)
Ay <- matrix(c(9,8,5,7,
   4,9,9,9,
   8,7,5,4), ncol = 4)
Bx <- matrix(c(1,5,9,8,
   4,7,8,9,
   2,3,2,1), ncol = 4)
By <- matrix(c(5,5,9,9,
   9,8,8,9,
   5,5,3,2), ncol = 4)

### solution 1 using mapply
# proper answer, no input transform, output transform, no parallelization, no 
progress update
sol1 <- function() {
 res1 <- mapply(
   function(i,j,k,l) { fisher.test( matrix(c(i,j,k,l), ncol=2), 
conf.int=F)$p.value },
   i=Ax, j=Ay, k=Bx, l=By, 
   SIMPLIFY=TRUE)
 ans1 <- matrix(res1,ncol=4)
 return(ans1)
}
s1 <- sol1()

### solution 2 using lapply
# proper answer, input transform, output transform, no parallelization, no 
progress update
sol2 <- function() {
 tmp.list <- as.data.frame(rbind(as.numeric(Ax), as.numeric(Ay), 
as.numeric(Bx), as.numeric(By)))
 # determine fisher.test p-values as list
 res2 <- lapply(tmp.list, 
function(x) { fisher.test( matrix(unlist(x), ncol=2), conf.int=F)$p.value 
})  
 ans2 <- matrix(unlist(res2),ncol=4)
 return(ans2)
}
s2 <- sol2()

### solution 3 using mclapply
# proper answer using input transform, output transform, parallelization, no 
progress update
library(multicore)
sol3 <- function() {
  tmp.list <- as.data.frame(rbind(as.numeric(Ax), as.numeric(Ay), 
as.numeric(Bx), as.numeric(By)))
  # determine fisher.test p-values as list
  res3 <- mclapply(tmp.list, 
function(x) { fisher.test( matrix(unlist(x), ncol=2), conf.int=F)$p.value },
mc.cores=4)  
  ans3 <- matrix(unlist(res3),ncol=4)
  return(ans3)
}
s3 <- sol3()

### solution 4 using plyr::maply
# difficulty finding equivalent code
# benefit could be: no input transform, no output transform, parallelization, 
and progress update
 library(plyr)
 library(abind)
 library(doMC)
 registerDoMC(cores=4)
 sol4 <- function() {
  ans4 <- maply(
   #.data = abind(i=Ax,j=Ay,k=Bx,l=By,along=0),
   #.data = abind(Ax,Ay,Bx,By,along=3),
   #.data = data.frame(i=Ax, j=Ay, k=Bx, l=By),
   #.data = cbind(i=as.vector(Ax), j=as.vector(Ay), k=as.vector(Bx), 
l=as.vector(By)),
   #.data = list(i=Ax, j=Ay, k=Bx, l=By),
   .data = abind(i=Ax, j=Ay, k=Bx, l=By, along=3),
   .fun = function(i,j,k,l) { fisher.test( matrix(c(i,j,k,l), ncol=2), 
conf.int=F)$p.value },
   .progress = "text",
   .parallel = TRUE
  )
  return(ans4)
}
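For completeness, one formulation of attempt 4 that should run: aaply() over the first two margins of the bound array, so each call receives the length-4 vector c(Ax[i,j], Ay[i,j], Bx[i,j], By[i,j]). A sketch (the matrices are repeated to keep it self-contained; .parallel and .progress are omitted but accepted as in the other attempts):

```r
library(plyr)
library(abind)

Ax <- matrix(c(2,3,5,6, 3,7,8,9, 8,2,1,3), ncol = 4)
Ay <- matrix(c(9,8,5,7, 4,9,9,9, 8,7,5,4), ncol = 4)
Bx <- matrix(c(1,5,9,8, 4,7,8,9, 2,3,2,1), ncol = 4)
By <- matrix(c(5,5,9,9, 9,8,8,9, 5,5,3,2), ncol = 4)

AB <- abind(Ax, Ay, Bx, By, along = 3)   # a 3 x 4 x 4 array
# iterate over margins 1 and 2; each slice AB[i, j, ] is one 2x2 table
ans4 <- aaply(AB, c(1, 2),
  function(v) fisher.test(matrix(v, ncol = 2), conf.int = FALSE)$p.value)
```

This keeps the no-input-transform, no-output-transform benefits listed under Q2/Q5: the result comes back as a 3x4 array directly.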

all.equal(s1,s2) # TRUE
all.equal(s1,s3) # TRUE

library(microbenchmark)
microbenchmark(sol1(), times=1000)
microbenchmark(sol2(), times=1000)
microbenchmark(sol3(), times=1000)
microbenchmark(sol4(), times=1000)





Re: [R] for help post-hoc comparisons

2012-09-02 Thread David Winsemius


On Sep 2, 2012, at 12:16 AM, Luo Yinling wrote:



Dear R community,
   In my experiment there are three treatments (F, R, and S), and the storage
temperatures of the seeds were 4, 10 and 15. The number of seeds was the same
in each treatment. The question is whether the treatments and storage
temperatures changed seed germination. The data are attached as a ".csv"
file named "g.csv".


The mail server checks to see what file type is attached. If it is not  
mime-text, it gets discarded, and I suspect that your mail client sent  
it as something else. You might get better success with a vanilla  
editor like Notepad on Windows or TextEdit.app on a Mac and make sure  
to save with extension ".txt".


If you want everyone to "be on the same page" you should also include  
your code.



I performed a generalized linear mixed model using glm. Time and Temp are
significant (p < 0.0001) and I would like to perform post-hoc comparisons to
check where the differences among the Time and Temp categories lie, but I did
not find a solution in the R-help list or elsewhere.


You might want to look at ?p.adjust to see if it will meet your post- 
hoc testing needs. If not you can also look at the multcomp package.




Thank you very much in advance for your help.


--
Xishuangbanna Tropical Botanical Garden
Chinese Academy of Sciences
Menglun Town, Mengla County
Yunnan 666303
Tel.: +86-691-8716759


--

David Winsemius, MD
Alameda, CA, USA



Re: [R] splits with 0s in middle columns

2012-09-02 Thread Rui Barradas

Hello,

You don't need a new function, what you need is to prepare your data in 
such a way that the function can process it.



A <- c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)",
"percent (80/10)")
B <- gsub("\\(|\\)|percent| ", "", A)
fun(B)

Also, please use dput to post the data examples,

dput(A)
c("percent (10/20/30)", "percent (40/20)", "percent (60/10/10/5)",
"percent (80/10)")

Then copy&paste in your post.
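Putting the two messages together, a self-contained sketch (the fun() from the earlier reply plus the gsub() above; the column-naming line is adjusted to use the number of columns n rather than the number of input strings):

```r
fun <- function(X){
  xname <- deparse(substitute(X))
  s <- strsplit(X, "/")
  n <- max(sapply(s, length))
  tmp <- numeric(n)

  f <- function(x){
    x <- as.numeric(x)
    m <- length(x)
    if (m > 1) {
      tmp[n] <- x[m]                      # last value goes to the last column
      tmp[seq_len(m - 1)] <- x[seq_len(m - 1)]
    } else tmp[1] <- x
    tmp
  }

  res <- do.call(rbind, lapply(s, f))
  colnames(res) <- paste(xname, seq_len(n), sep = "")
  data.frame(res)
}

A <- c("percent (10/20/30)", "percent (40/20)",
       "percent (60/10/10/5)", "percent (80/10)")
B <- gsub("\\(|\\)|percent| ", "", A)  # strip "percent", spaces, parentheses
fun(B)
```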

Rui Barradas

Em 02-09-2012 04:22, Sapana Lohani escreveu:

Dear Rui,

The new code works fine for what I wanted. I have another similar column but it 
looks like

A
percent (10/20/30)
percent (40/20)
percent (60/10/10/5)
percent (80/10)

I want a similar split, but with the leading "percent" deleted. The output
should look like

A1 A2 A3 A4
10 20 0 30
40 0 0 20
60 10 10 5
80 0 0 10

Could you please make the small change in the code that you gave me? It must
be a small edit, but I could not figure it out. FYI, the code that worked was

fun <- function(X){
  xname <- deparse(substitute(X))
  s <- strsplit(X, "/")
  n <- max(sapply(s, length))
  tmp <- numeric(n)

  f <- function(x){
  x <- as.numeric(x)
  m <- length(x)
  if(m > 1){
  tmp[n] <- x[m]
  tmp[seq_len(m - 1)] <- x[seq_len(m - 1)]
  }else tmp[1] <- x
  tmp
  }

  res <- do.call(rbind, lapply(s, f))
  colnames(res) <- paste(xname, seq_along(s), sep = "")
  data.frame(res)
}

fun(A)

Thank you so very much.




Re: [R] Computing 'exp(1e3)*0' correctly....

2012-09-02 Thread CHEL HEE LEE
I very much appreciate the good comments and tips regarding my question. All
the postings are excellent things to know when writing such expressions in R.
Thank you so much; my question is completely resolved by your postings.

On Sun, 2012-09-02 at 00:50 +0100, Rui Barradas wrote:
> Em 02-09-2012 00:10, Jeff Newmiller escreveu:
> > I disagree that this answer is "wrong". If you want a mathematically 
> > correct answer you are going to have to obtain it by applying intelligence 
> > to the algorithm in which this calculation occurred.
> 
> Logarithms are the product of intelligence.
> And the standard trick to make this sort of computation.
> 
> x <- 1e3
> exp(x + log(0))  # zero
> 
> x <- 1e300
> exp(x + log(0))  # zero
> 
> Rui Barradas
> >   This is not a mailing list about numerical methods in general, so it 
> > probably isn't appropriate to pursue that conversation here.
> > ---
> > Jeff NewmillerThe .   .  Go Live...
> > DCN:Basics: ##.#.   ##.#.  Live Go...
> >Live:   OO#.. Dead: OO#..  Playing
> > Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> > /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
> > ---
> > Sent from my phone. Please excuse my brevity.
> >
> > CHEL HEE LEE  wrote:
> >
> >> I have some trouble to deal the value of 'NaN'.  For example,
> >>
> >>> exp(1e3)
> >> [1] Inf
> >>> exp(1e3)*0
> >> [1] NaN
> >>
> >> The correct answer should be 0 rather than NaN.  I will very appreciate
> >> if anyone can share some technique to get a correct answer.
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Help on finding specific columns in matrix

2012-09-02 Thread Rainer Schuermann
If I understand your question correctly, you want to identify the one column 
that has the lowest mean of all columns, and the one column that has the 
highest mean of all columns.

Using your provided sample data, this gives you the indices:

> colMeans(a)
[1] 12.48160 17.46868 22.51761 27.59880
> which.min( colMeans( a ) ) 
[1] 1
> which.max( colMeans( a ) ) 
[1] 4

Does that help?

Rgds,
Rainer
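To finish the original request (plotting the two columns), a sketch with plain boxplot(); the box-percentile variant bpplot() from the Hmisc package would take the same two vectors:

```r
set.seed(1)
a <- matrix(c(runif(500, 10, 15), runif(500, 15, 20),
              runif(500, 20, 25), runif(500, 25, 30)), ncol = 4)

lo <- which.min(colMeans(a))   # index of the column with the lowest mean
hi <- which.max(colMeans(a))   # index of the column with the highest mean
boxplot(a[, lo], a[, hi], names = c("lowest mean", "highest mean"))
```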


On Sunday 02 September 2012 02:49:11 Andras Farkas wrote:
> Dear All,
>  
> I have a matrix with 33 columns and 5000 rows. I would like to find 2 
specific columns in the set: the one that holds the highest values and the one 
that holds the lowest values. In this case the column's mean would be 
apropriate to use to try to find those specific columns because each columns 
mean is different and they all change together based on the same "change of 
rate constants" as a function of time (ie: tme is the most important 
determinant of the magnitude of that mean).  The columns are not to be named, 
if possible Any thoughts on that? Would apreciate the help
>  
> example:
>  
> a <-
matrix(c(runif(500,10,15),runif(500,15,20),runif(500,20,25),runif(500,25,30)),ncol=4)
>  
> I would need to find and plot with a box percentile plot column 1, the 
column with the lowest mean, and column 4, the column with the highest mean
>  
> thanks,
>  
> Andras 
>   [[alternative HTML version deleted]]
> 






- - - - -

Boycott Apple!



[R] aov(rt~(F1*F2*F3)+Error(s/(F1*F2*F3)), three_way) question + data

2012-09-02 Thread David G
Dear list members,

I am picking up some experimental data that was collected last year and
analysed using a within subject repeated measure ANOVA using SPSS (glm).
There are 3 factors (F1, F2, F3), each with 2 levels (foc - per; on - off; l
- r) recorded within a balanced 2x2x2 design.

I want to rerun the analysis using R, as I no longer have access to SPSS.
Current attempts have produced different results from the previous SPSS runs,
so I need a bit of help to make sure I am using the right R code and
re-ordering the data correctly:

Attached is the original matrix data used for the SPSS analysis:
http://r.789695.n4.nabble.com/file/n4641999/mean_RTs_0811_rnabble.csv
I understand that the data needs to be formatted lengthwise, which is also
attached:
http://r.789695.n4.nabble.com/file/n4641999/three_way_rnabble.csv
The data represents mean reaction times for each subject across conditions.

I want (I think) to run a repeated measures within subject ANOVA in order to
test main effects of 3 factors as well as any between factor interactions.

The R code that I have used so far (but I am not sure it is the correct way
to get what I want) is:

three_way <- read.csv(file.choose(), header = TRUE)
three_way
aov.3way <- aov(Reaction_time ~ (F1*F2*F3) + Error(Subject/(F1*F2*F3)),
                data = three_way)
summary(aov.3way)

This gives a significant main effect of F1 (p = 0.001) and F2 (p = 0.026),
with no main effect of F3 and no interactions.

These results differ from those of SPSS, which were: F1 (p = 0.001), F2
(p = 0.031), no main effect of F3, and an interaction F1*F3 (p = 0.042).

Question: Am I using the most appropriate R code for this?

Many thanks in advance.

Best wishes,

David 
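
One common cause of aov()/SPSS disagreement worth ruling out first: aov()
treats numeric columns as continuous covariates, so Subject, F1, F2, and F3
must all be factors before fitting. A hedged sketch on simulated data (the
column names follow the post; the data are invented, since the attachment is
not reproduced here):

```r
# simulated stand-in for the posted csv: 2x2x2 within-subject design
set.seed(1)
d <- expand.grid(Subject = factor(1:10),
                 F1 = factor(c("foc", "per")),
                 F2 = factor(c("on", "off")),
                 F3 = factor(c("l", "r")))
d$Reaction_time <- rnorm(nrow(d), mean = 500, sd = 50)

# Subject and the factors must be factors; numeric codes read in from a
# csv would otherwise be treated as covariates, changing the F tests
str(d)

aov.3way <- aov(Reaction_time ~ F1 * F2 * F3 +
                  Error(Subject / (F1 * F2 * F3)), data = d)
summary(aov.3way)
```

With real csv input, check str(three_way) and wrap any numeric coding columns
in factor() before calling aov().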



--
View this message in context: 
http://r.789695.n4.nabble.com/aov-rt-F1-F2-F3-Error-s-F1-F2-F3-three-way-question-data-tp4641999.html
Sent from the R help mailing list archive at Nabble.com.



[R] for help post-hoc comparisons

2012-09-02 Thread Luo Yinling

Dear R community 
In my experiment there are three treatments (F, R, and S), and the storage
temperatures of the seeds were 4, 10, and 15. The number of seeds per
treatment was the same. The question is whether the treatments and storage
temperatures changed seed germination. The data are attached as a ".csv" file
named "g.csv". I fitted a generalized linear model using glm. Time and Temp
are significant (p < 0.0001), and I would like to perform post-hoc
comparisons to check where the differences among the Time and Temp categories
lie, but I did not find a solution in the R-help list or elsewhere.

Thank you very much in advance for your help.
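
Since the attachment did not reach the list, here is a hedged base-R sketch on
simulated germination counts (the column names Treat, Temp, and germ are
invented for illustration). The dedicated tool for such contrasts is
multcomp::glht(fit, mcp(Temp = "Tukey")), which needs the multcomp package; a
package-free alternative is to refit with each level as the reference and
adjust the p-values:

```r
# simulated germination data: 3 treatments x 3 temperatures, 10 dishes each
set.seed(7)
g <- expand.grid(Treat = factor(c("F", "R", "S")),
                 Temp  = factor(c(4, 10, 15)))
g <- g[rep(seq_len(nrow(g)), each = 10), ]
g$germ  <- rbinom(nrow(g), size = 20, prob = 0.5)  # germinated out of 20
g$total <- 20

fit <- glm(cbind(germ, total - germ) ~ Treat + Temp,
           family = binomial, data = g)

# refitting with Temp = 10 as the reference exposes the 10-vs-4 and
# 10-vs-15 contrasts directly; the resulting p-values should then be
# multiplicity-adjusted, e.g. with p.adjust(..., method = "holm")
fit2 <- update(fit, data = transform(g, Temp = relevel(Temp, ref = "10")))
summary(fit2)$coefficients
```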


--
Xishuangbanna Tropical Botanical Garden
Chinese Academy of Sciences
Menglun Town, Mengla County
Yunnan 666303
Tel.: +86-691-8716759




[R] Help on finding specific columns in matrix

2012-09-02 Thread Andras Farkas
Dear All,
 
I have a matrix with 33 columns and 5000 rows. I would like to find 2 specific
columns in the set: the one that holds the highest values and the one that
holds the lowest values. In this case the column's mean would be appropriate to
use to try to find those specific columns, because each column's mean is
different and they all change together based on the same "change of rate
constants" as a function of time (i.e., time is the most important determinant
of the magnitude of that mean). The columns are not to be named, if possible.
Any thoughts on that? Would appreciate the help.
 
example:
 
a <- matrix(c(runif(500,10,15), runif(500,15,20),
              runif(500,20,25), runif(500,25,30)), ncol = 4)
 
I would need to find, and plot with a box-percentile plot, column 1 (the
column with the lowest mean) and column 4 (the column with the highest mean).
 
thanks,
 
Andras 



[R] glmulti runs indefinitely when using genetic algorithm with lme4

2012-09-02 Thread Thomas

Dear List,

I'm using glmulti for model averaging in R. There are ~10 variables in my  
model, making exhaustive screening impractical - I therefore need to use  
the genetic algorithm (GA) (call: method = "g").


I need to include random effects so I'm using glmulti as a wrapper for  
lme4. Methods for doing this are available here  
http://www.inside-r.org/packages/cran/glmulti/docs/glmulti and there is  
also a pdf included with the glmulti package that goes into more detail.  
The problem is that when telling glmulti to use GA in this setting it runs  
indefinitely, even after the best model has been found.


This is the example taken from the pdf included in the glmulti package:

library(lme4)
library(glmulti)

# create a function for glmulti to act as a wrapper for lmer:
lmer.glmulti <- function(formula, data, random = "", ...) {
    lmer(paste(deparse(formula), random), data = data, REML = FALSE, ...)
}

# set some random variables:
y = runif(30, 0, 10)               # mock dependent variable
a = runif(30)                      # dummy covariate
b = runif(30)                      # another dummy covariate
c = runif(30)                      # and another one
x = as.factor(round(runif(30), 1)) # dummy grouping factor

# run exhaustive screening with lmer:
bab <- glmulti(y~a*b*c, level = 2, fitfunc = lmer.glmulti, random =  
"+(1|x)")


This works fine. The problem is when I tell it to use the genetic  
algorithm:


babs <- glmulti(y~a*b*c, level = 2, fitfunc = lmer.glmulti, random =  
"+(1|x)", method = "g")


It just keeps running indefinitely and the AIC does not change:

After 19550 generations:
Best model: y~1
Crit= 161.038899734164
Mean crit= 164.13629335762
Change in best IC: 0 / Change in mean IC: 0

After 19560 generations:
Best model: y~1
Crit= 161.038899734164
Mean crit= 164.13629335762
Change in best IC: 0 / Change in mean IC: 0

After 19570 generations:
Best model: y~1
Crit= 161.038899734164
Mean crit= 164.13629335762

etc.

I have tried using calls that tell glmulti when to stop (deltaB = 0,  
deltaM = 0.01, conseq = 6) but nothing seems to work. I think the problem  
must lie with setting the function (?). It may be something really obvious  
however I'm new to R and I can't work it out.


I am using R v.2.15.0, glmulti v.1.0.6 and lme4 v.0.99-0 on Windows.

Any help with this would be much appreciated.

Thomas



Re: [R] (Newbie) help cannot find chrome

2012-09-02 Thread Bert Gunter
I should have added

?help.start

tells you what I did already. So you should make it your first order of
business to learn to use R's Help system.

-- Bert

On Sat, Sep 1, 2012 at 2:39 PM, Carilda Thomas wrote:

> Following the beginning tutorial, I typed help.start() and was given a
> choice of browsers. I picked chrome but got back that chrome is not found.
> I cannot seem to change or get rid of it now.
> I looked at /etc/R/Renviron but don't see anywhere to set browser
> (R_BROWSER = ${R_BROWSER ...etc) - nothing about chrome.
> I tried starting as "R_BROWSER= R" - same thing.
> How do I change the default browser, for instance, to firefox? (Using Linux
> Mint xfce)
> Thanks.
>
>
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm




Re: [R] (Newbie) help cannot find chrome

2012-09-02 Thread Bert Gunter
?options
options(browser = "firefox")  # or the path to whatever browser you prefer

If you don't know how to use the Help system, read the Intro to R tutorial
-- or type ?help at the command prompt in interactive mode for the help for
help.
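
Concretely, the two standard ways to point help.start() at Firefox (the bare
command name assumes firefox is on your PATH; otherwise give the full path):

```r
# per session, or put this line in ~/.Rprofile to make it permanent
options(browser = "firefox")

# confirm the setting took effect
getOption("browser")
```

Alternatively, set R_BROWSER=firefox in ~/.Renviron (or site-wide in
/etc/R/Renviron), which R reads at startup.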

-- Bert


On Sat, Sep 1, 2012 at 2:39 PM, Carilda Thomas wrote:

> Following the beginning tutorial, I typed help.start() and was given a
> choice of browsers. I picked chrome but got back that chrome is not found.
> I cannot seem to change or get rid of it now.
> I looked at /etc/R/Renviron but don't see anywhere to set browser
> (R_BROWSER = ${R_BROWSER ...etc) - nothing about chrome.
> I tried starting as "R_BROWSER= R" - same thing.
> How do I change the default browser, for instance, to firefox? (Using Linux
> Mint xfce)
> Thanks.
>
>
>




