[R] Fitting 3 beta distributions
Hi, I want to fit a mixture of 3 beta distributions to my data, which ranges between 0 and 1. What functions can I easily call, specifying that 3 beta distributions should be fitted? I have already looked at normalmixEM and fitdistr, but they don't seem to be applicable (normalmixEM only fits mixtures of normal distributions, and fitdistr fits a single distribution, not a mixture of 3). Is that right? Also, my data has 26 million data points. What can I do to reduce the computation time with the suggested function? Thanks a lot in advance; eagerly waiting for any input. Best, Nitin

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] deSolve - Function daspk on DAE system - Error
I'm getting this error on the attached code and breaking my head, but I can't figure it out. Any help is much appreciated. Thanks, Vince

CODE:

library(deSolve)
Res_DAE <- function(t, y, dy, pars) {
  with(as.list(c(y, dy, pars)), {
    res1 <- -dS - dES - k2*ES
    res2 <- -dP + k2*ES
    eq1  <- Eo - E - ES
    eq2  <- So - S - ES - P
    return(list(c(res1, res2, eq1, eq2)))
  })
}
pars  <- c(Eo = 0.02, So = 0.02, k2 = 250, E = 0.01); pars
yini  <- c(S = 0.01, ES = 0.01, P = 0.0, E = 0.01); yini
times <- seq(0, 0.01, by = 0.0001); times
dyini <- c(dS = 0.0, dES = 0.0, dP = 0.0)
## Tabular output check of matrix output
DAE <- daspk(y = yini, dy = dyini, times = times, res = Res_DAE,
             parms = pars, atol = 1e-10, rtol = 1e-10)

ERROR:

daspk-- warning.. At T(=R1) and stepsize H (=R2) the
nonlinear solver failed to converge repeatedly or with abs(H) = HMIN
preconditioner had repeated failures
0.0D+00 0.5960464477539D-14
Warning messages:
1: In daspk(y = yini, dy = dyini, times = times, res = Res_DAE, parms = pars, :
  repeated convergence test failures on a step - inaccurate Jacobian or preconditioner?
2: In daspk(y = yini, dy = dyini, times = times, res = Res_DAE, parms = pars, :
  Returning early. Results are accurate, as far as they go

-- View this message in context: http://r.789695.n4.nabble.com/deSolve-Function-daspk-on-DAE-system-Error-tp3864298p3864298.html Sent from the R help mailing list archive at Nabble.com.
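Two things worth checking before digging deeper (a hedged suggestion, not a confirmed diagnosis): daspk generally expects dy to have one entry per state, but yini here has four components while dyini has only three; and E is defined in both pars and yini, so the with(as.list(...)) namespace contains a duplicate name. A quick consistency check is to evaluate the residual function by hand at t = 0 — for a consistent start all residuals should be (near) zero. The sketch below assumes the model as posted, with E moved out of pars:

```r
library(deSolve)

Res_DAE <- function(t, y, dy, pars) {
  with(as.list(c(y, dy, pars)), {
    res1 <- -dS - dES - k2*ES
    res2 <- -dP + k2*ES
    eq1  <- Eo - E - ES
    eq2  <- So - S - ES - P
    list(c(res1, res2, eq1, eq2))
  })
}

pars  <- c(Eo = 0.02, So = 0.02, k2 = 250)       # E removed: it is a state, not a parameter
yini  <- c(S = 0.01, ES = 0.01, P = 0, E = 0.01)
dyini <- c(dS = 0, dES = 0, dP = 0, dE = 0)      # one entry per state

# sanity check: residuals at t = 0; large values mean yini/dyini are inconsistent
unlist(Res_DAE(0, yini, dyini, pars))
```

With these values the algebraic constraints (eq1, eq2) are satisfied, but the first two residuals come out as -2.5 and 2.5 (since k2*ES = 2.5 while all initial derivatives are zero), which suggests dyini is not consistent with yini — a common cause of the repeated convergence failures daspk reports at start-up.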
Re: [R] Entering data into a multi-way array?
I am trying to replicate the script, appended below. My data is in OOCalc files. The script (below) synthesizes a dataset (it serves as a tutorial), but I will need to get my data from OOCalc into R for use in that script (which uses arrays). I've worked my way through the script and understand how most of it works (except the first bit, Step 1, which is irrelevant to me anyway).

[begin script]

### Supplementary material with the paper
### "Interpretation of ANOVA models for microarray data using PCA"
### J.R. de Haan et al. Bioinformatics (2006)
### Please cite this paper when you use this code in a publication.
### Written by J.R. de Haan, December 18, 2006

### Step 1: a synthetic dataset of 500 genes is generated with 5 classes
###   1 unresponsive genes (300 genes)
###   2 constant genes (50 genes)
###   3 profile 1 (50 genes)
###   4 profile 2 (50 genes)
###   5 profile 3 (50 genes)

# generate synthetic dataset with similar dimensions:
# 500 genes, 3 replicates, 10 timepoints, 4 treatments
X <- array(0, c(500, 3, 10, 4))
labs.synth <- c(rep(1, 300), rep(2, 50), rep(3, 50), rep(4, 50), rep(5, 50))
gnames <- cbind(labs.synth, labs.synth)
# print(dim(gnames))
gnames[1:300, 2]   <- "A"
gnames[301:350, 2] <- "B"
gnames[351:400, 2] <- "C"
gnames[401:450, 2] <- "D"
gnames[451:500, 2] <- "E"

### generate 300 noise genes with expressions slightly larger than
### the detection limit (class 1)
X[labs.synth==1,1,,] <- rnorm(length(X[labs.synth==1,1,,]), mean=50, sd=40)
X[labs.synth==1,2,,] <- X[labs.synth==1,1,,] + rnorm(length(X[labs.synth==1,1,,]), mean=0, sd=10)
X[labs.synth==1,3,,] <- X[labs.synth==1,1,,] + rnorm(length(X[labs.synth==1,1,,]), mean=0, sd=10)

# generate 50 stable genes at two levels (class 2)
X[301:325,1,,] <- rnorm(length(X[301:325,1,,]), mean=1500, sd=40)
X[301:325,2,,] <- X[301:325,1,,] + rnorm(length(X[301:325,1,,]), mean=0, sd=10)
X[301:325,3,,] <- X[301:325,1,,] + rnorm(length(X[301:325,1,,]), mean=0, sd=10)
X[326:350,1,,] <- rnorm(length(X[326:350,1,,]), mean=3000, sd=40)
X[326:350,2,,] <- X[326:350,1,,] + rnorm(length(X[326:350,1,,]), mean=0, sd=10)
X[326:350,3,,] <- X[326:350,1,,] + rnorm(length(X[326:350,1,,]), mean=0, sd=10)

# generate 50 genes with profile 1 (class 3)
increase.range <- matrix(rep(1:50, 10), ncol=10, byrow=FALSE)
profA3 <- matrix(rep(c(10, 60, 110, 150, 150, 150, 150, 150, 150, 150), 50), ncol=10, byrow=TRUE) * increase.range
X[351:400,1,,1] <- profA3 + rnorm(length(profA3), mean=0, sd=40)
profB3 <- matrix(rep(c(10, 100, 220, 280, 280, 280, 280, 280, 280, 280), 50), ncol=10, byrow=TRUE) * increase.range
X[351:400,1,1:10,2] <- profB3 + rnorm(length(profA3), mean=0, sd=40)
profC3 <- matrix(rep(c(10, 120, 300, 300, 280, 280, 280, 280, 280, 280), 50), ncol=10, byrow=TRUE) * increase.range
X[351:400,1,1:10,3] <- profC3 + rnorm(length(profA3), mean=0, sd=40)
profD3 <- matrix(rep(c(100, 75, 50, 50, 50, 50, 50, 50, 75, 100), 50), ncol=10, byrow=TRUE)
X[351:400,1,1:10,4] <- profD3 + rnorm(length(profA3), mean=0, sd=40)
# again replicates
X[351:400,2,,] <- X[351:400,1,,] + rnorm(length(X[351:400,2,,]), mean=0, sd=10)
X[351:400,3,,] <- X[351:400,1,,] + rnorm(length(X[351:400,3,,]), mean=0, sd=10)

# generate 50 genes with profile 2 (class 4)
increase.range <- matrix(rep(1:50, 10), ncol=10, byrow=FALSE)
profA4 <- matrix(rep(c(10, 60, 110, 150, 125, 100, 75, 50, 50, 50), 50), ncol=10, byrow=TRUE) * increase.range
X[401:450,1,,1] <- profA4 + rnorm(length(profA4), mean=0, sd=40)
profB4 <- matrix(rep(c(10, 100, 220, 280, 200, 150, 100, 50, 50, 50), 50), ncol=10, byrow=TRUE) * increase.range
X[401:450,1,1:10,2] <- profB4 + rnorm(length(profA4), mean=0, sd=40)
profC4 <- matrix(rep(c(10, 150, 300, 220, 150, 100, 50, 50, 50, 50), 50), ncol=10, byrow=TRUE) * increase.range
X[401:450,1,1:10,3] <- profC4 + rnorm(length(profA4), mean=0, sd=40)
profD4 <- matrix(rep(c(150, 100, 50, 50, 75, 75, 75, 100, 100, 100), 50), ncol=10, byrow=TRUE)
X[401:450,1,1:10,4] <- profD4 + rnorm(length(profA4), mean=0, sd=40)
# again replicates
X[401:450,2,,] <- X[401:450,1,,] + rnorm(length(X[401:450,2,,]), mean=0, sd=10)
X[401:450,3,,] <- X[401:450,1,,] + rnorm(length(X[401:450,3,,]), mean=0, sd=10)

# generate 50 genes with profile 3 (class 5)
increase.range <- matrix(rep(1:25, 20), ncol=10, byrow=FALSE)
profA4 <- matrix(rep((200 - c(10, 60, 110, 150, 125, 100, 75, 50, 50, 50)), 50), ncol=10, byrow=TRUE) * increase.range
X[451:500,1,,1] <- profA4 + rnorm(length(profA4), mean=0, sd=40)
profB4 <- matrix(rep((200 - c(10, 100, 180, 200, 200, 150, 100, 50, 50, 50)), 50), ncol=10, byrow=TRUE) * increase.range
X[451:500,1,1:10,2] <- profB4 + rnorm(length(profA4), mean=0, sd=40)
profC4 <- matrix(rep((200 - c(10, 150, 200, 180, 150, 100, 50, 50, 50, 50)), 50), ncol=10, byrow=TRUE) * increase.range
X[451:500,1,1:10,3] <- profC4 + rnorm(length(profA4), mean=0, sd=40)
profD4 <-
Re: [R] Poor performance of Optim
Thank you for your response! But the problem is: when I estimate a model without knowing the true coefficients, how can I know which reltol is good enough, 1e-8 or 1e-10? Why can commercial packages automatically determine the right reltol, but R cannot?
Re: [R] Poor performance of Optim
What I tried is just a simple binary probit model. Create random data and use optim to maximize the log-likelihood function to estimate the coefficients. (E.g. u = 0.1 + 0.2*x + e, where e is standard normal, and y = (u > 0), y indicating a binary choice variable.) If I estimate the coefficient of x, I should be able to get a value close to 0.2 if the sample is large enough. Say I got 0.18. If I multiply x by two and reestimate the model, which coefficient should I get? 0.09, right? But with optim, I got something different. When I do the same thing in both Gauss and Matlab, I get exactly 0.09, which suggests the coefficient estimator there is reliable. But R's optim does not give me a reliable estimator.
[R] Multivariate Laplace density
Can anyone show how to calculate a multivariate Laplace density? Thanks.
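For the symmetric multivariate Laplace distribution in the Kotz/Kozubowski/Podgorski parameterization, the density involves a modified Bessel function of the second kind, which base R provides as besselK(). The sketch below is written from that parameterization; please check the formula against your own reference before relying on it, since "multivariate Laplace" means different things in different papers:

```r
# Hedged sketch: density of the symmetric multivariate Laplace ML(0, Sigma),
# f(x) = 2 (2*pi)^(-d/2) |Sigma|^(-1/2) (m/2)^(nu/2) K_nu(sqrt(2*m)),
# with m = x' Sigma^{-1} x and nu = (2 - d)/2.
dmvlaplace <- function(x, Sigma) {
  d  <- length(x)
  m  <- drop(t(x) %*% solve(Sigma) %*% x)   # Mahalanobis-type quadratic form
  nu <- (2 - d) / 2
  2 / ((2 * pi)^(d / 2) * sqrt(det(Sigma))) *
    (m / 2)^(nu / 2) * besselK(sqrt(2 * m), nu)
}

dmvlaplace(c(0.5, -0.2), diag(2))
```

Note the density is singular at x = 0 for d >= 2 (m = 0), so evaluate it away from the origin. Some contributed packages (e.g. LaplacesDemon, if I recall correctly) also ship multivariate Laplace densities, which would be worth comparing against.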
Re: [R] Poor performance of Optim
Oh, I think I got it. Commercial packages limit the number of decimals shown.
[R] plot: how to fix the ratio of the plot box?
Dear all, this should be trivial, but I couldn't figure out how to solve it... I would like to have a plot with a fixed aspect ratio of 1. Whenever I resize the Quartz window, the axes are extended so that the plot fills the whole window. However, if the axes are extended differently, the plot does not look like a square anymore (i.e., aspect ratio 1). The same of course happens if you print it to .pdf (the ultimate goal). How can I fix the plot box (formed by the axes) to have ratio 1, meaning that the plot box is a square no matter how I resize the Quartz window? I searched for this and found: http://tolstoy.newcastle.edu.au/R/help/05/04/2888.html It is more or less recommended to use lattice's xyplot for that. Is there no solution for base graphics? [I know that the extension is by default 4% and that's great, but the size of the Quartz window should not change this (which it does if you resize the window accordingly).] Cheers, Marius

Minimal example:

u <- runif(10)
pdf(width=5, height=5)
plot(u, u, asp=1, xlim=c(0,1), ylim=c(0,1), main="My title")
dev.off()
Re: [R] Poor performance of Optim
Ben Bolker sent me a private email rightfully correcting me: I was factually wrong when I wrote that ML /is/ a numerical method (I had written sloppily and under time pressure). He is of course right to point out that not all maximum likelihood estimators require numerical methods to solve. Further, only numerical optimization will show the behavior discussed in this post, for the given reasons. (I hope this post isn't yet another blooper of mine at 5 a.m.) Best, Daniel

Daniel Malter wrote:
With respect, your statement that R's optim does not give you a reliable estimator is bogus. As pointed out before, this depends on when optim believes it's good enough and stops optimizing. In particular, if you stretch out x, then it is plausible that the likelihood function will become flat enough earlier, so that the numerical optimization will stop earlier (i.e., optim will think that the slope of the likelihood function is flat enough to be considered zero and stop earlier than it would for more condensed data). After all, maximum likelihood is a numerical method and thus an approximation. I would venture to say that what you describe lies in the nature of this method. You could also follow the good advice given earlier, by increasing the number of iterations or decreasing the tolerance. However, check the example below: for all purposes it's really close enough and has nothing to do with optim being unreliable.

n <- 1000
x <- rnorm(n)
y <- 0.5*x + rnorm(n)
z <- ifelse(y > 0, 1, 0)
X <- cbind(1, x)
b <- matrix(c(0, 0), nrow=2)

# Probit
reg <- glm(z ~ x, family=binomial("probit"))

# Optim reproducing probit (with minor deviations due to difference in method)
LL <- function(b) { -sum(z*log(pnorm(X %*% b)) + (1 - z)*log(1 - pnorm(X %*% b))) }
optim(c(0, 0), LL)

# Multiply x by 2 and repeat optim
X[,2] <- 2*X[,2]
optim(c(0, 0), LL)

HTH, Daniel

yehengxin wrote:
What I tried is just a simple binary probit model. Create random data and use optim to maximize the log-likelihood function to estimate the coefficients. (E.g. u = 0.1 + 0.2*x + e, where e is standard normal, and y = (u > 0), y indicating a binary choice variable.) If I estimate the coefficient of x, I should be able to get a value close to 0.2 if the sample is large enough. Say I got 0.18. If I multiply x by two and reestimate the model, which coefficient should I get? 0.09, right? But with optim, I got something different. When I do the same thing in both Gauss and Matlab, I get exactly 0.09. But R's optim does not give me a reliable estimator.

-- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3864681.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Poor performance of Optim
And there I caught myself with the next blooper: it wasn't Ben Bolker, it was Bert Gunter who pointed that out. :)

Daniel Malter wrote:
Ben Bolker sent me a private email rightfully correcting me: I was factually wrong when I wrote that ML /is/ a numerical method (I had written sloppily and under time pressure). He is of course right to point out that not all maximum likelihood estimators require numerical methods to solve. Further, only numerical optimization will show the behavior discussed in this post, for the given reasons. (I hope this post isn't yet another blooper of mine at 5 a.m.) Best, Daniel

[earlier quoted text snipped]
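The tolerance point made above can be seen directly by tightening optim's stopping criterion. The sketch below (my own illustration, not from the thread) fits the same probit log-likelihood with the default Nelder-Mead settings and with BFGS under a much smaller reltol, then compares both to glm(); the tightened run should land closer to the glm() answer:

```r
set.seed(1)
n <- 10000
x <- rnorm(n)
u <- 0.1 + 0.2 * x + rnorm(n)
z <- as.numeric(u > 0)
X <- cbind(1, x)

# numerically stable negative probit log-likelihood (log-scale pnorm)
LL <- function(b) -sum(z * pnorm(X %*% b, log.p = TRUE) +
                       (1 - z) * pnorm(X %*% b, lower.tail = FALSE, log.p = TRUE))

o1 <- optim(c(0, 0), LL)                                            # default Nelder-Mead
o2 <- optim(c(0, 0), LL, method = "BFGS", control = list(reltol = 1e-12))

cbind(default = o1$par, tight = o2$par,
      glm = coef(glm(z ~ x, family = binomial("probit"))))
```

The remaining differences between the three columns are on the order of the optimizer's tolerance, which is the point: they reflect where optimization stops, not an unreliable estimator.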
[R] Ipad on R
Is it possible to install R on an iPad 2? -- Oscar Ramírez A. Universidad Nacional, Escuela de Ciencias Biológicas. M.Sc. en Conservación y Manejo de Vida Silvestre osorami...@gmail.com
Re: [R] plot: how to fix the ratio of the plot box?
On 10/02/2011 07:20 PM, Hofert Jan Marius wrote:
Dear all, this should be trivial, but I couldn't figure out how to solve it... I would like to have a plot with a fixed aspect ratio of 1. Whenever I resize the Quartz window, the axes are extended so that the plot fills the whole window. However, if the axes are extended differently, the plot does not look like a square anymore (i.e., aspect ratio 1). The same of course happens if you print it to .pdf (the ultimate goal). How can I fix the plot box (formed by the axes) to have ratio 1, meaning that the plot box is a square no matter how I resize the Quartz window? I searched for this and found: http://tolstoy.newcastle.edu.au/R/help/05/04/2888.html It is more or less recommended to use lattice's xyplot for that. Is there no solution for base graphics? Cheers, Marius

Minimal example:
u <- runif(10)
pdf(width=5, height=5)
plot(u, u, asp=1, xlim=c(0,1), ylim=c(0,1), main="My title")
dev.off()

Hi Marius,
Have you tried: par(pty = "s") after you open the device and before plotting?
Jim
Re: [R] plot: how to fix the ratio of the plot box?
ahh, perfect, thanks. Cheers, Marius

On 2011-10-02, at 13:08, Jim Lemon wrote:
Hi Marius,
Have you tried: par(pty = "s") after you open the device and before plotting?
Jim
[earlier quoted text snipped]
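Putting the thread's pieces together, the original minimal example with Jim's suggestion applied looks like this (the filename "square.pdf" is just an illustration):

```r
u <- runif(10)
pdf("square.pdf", width = 5, height = 5)
par(pty = "s")   # force a square plotting region, independent of device shape
plot(u, u, asp = 1, xlim = c(0, 1), ylim = c(0, 1), main = "My title")
dev.off()
```

par(pty = "s") keeps the plot region square no matter how the device or window is resized, while asp = 1 keeps one data unit the same physical length on both axes; together they give a square plot box with a true 1:1 aspect ratio.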
Re: [R] Ipad on R
2011/10/2 Oscar Ramírez osorami...@gmail.com: It is possible to install R on Ipad 2? This discussion predates the iPad 2, but the licensing restrictions likely still apply: http://www.r-statistics.com/2010/06/could-we-run-a-statistical-analysis-on-iphoneipad-using-r/ One-word answer: no. Two-word answer: Not legally. But do read the discussion at the above link. -- Sarah Goslee http://www.functionaldiversity.org
[R] subset in dataframes
I need help in subsetting a dataframe:

data1 <- data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004,2001,2002,2003,2004,2001,2002,2003,2004),
                    firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                    x=c(11,22,-32,25,-26,47,85,98,101,14,87,56,12,43,67,54),
                    y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540))
data1

I want to keep the firms where all x > 0 (where there are no negative values in x). So my output should be:

  year firm   x    y
1 2001    3 101 1010
2 2002    3  14  140
3 2003    3  87  870
4 2004    3  56  560
5 2001    4  12  120
6 2002    4  43  430
7 2003    4  67  670
8 2004    4  54  540

So I'm doing:

data2 <- data1[data1$firm %in% subset(data1, data1$x > 0), ]
data2

But the result is:

[1] year firm x    y
<0 rows> (or 0-length row.names)

Thank you for any help. Cecília Carmo (Universidade de Aveiro)
Re: [R] subset in dataframes
Hi,

On Sun, Oct 2, 2011 at 7:48 AM, Cecilia Carmo cecilia.ca...@ua.pt wrote:
I need help in subsetting a dataframe:
data1 <- data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004,2001,2002,2003,2004,2001,2002,2003,2004),
                    firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                    x=c(11,22,-32,25,-26,47,85,98,101,14,87,56,12,43,67,54),
                    y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540))

Thank you for providing a reproducible example.

data1
I want to keep the firms where all x > 0 (where there are no negative values in x). So my output should be:

  year firm   x    y
1 2001    3 101 1010
2 2002    3  14  140
3 2003    3  87  870
4 2004    3  56  560
5 2001    4  12  120
6 2002    4  43  430
7 2003    4  67  670
8 2004    4  54  540

So I'm doing:
data2 <- data1[data1$firm %in% subset(data1, data1$x > 0), ]
data2

What about finding which ones have negative values and should be deleted,

unique(data1$firm[data1$x <= 0])
[1] 1 2

And then deleting them?

data1[!(data1$firm %in% unique(data1$firm[data1$x <= 0])), ]
   year firm   x    y
9  2001    3 101 1010
10 2002    3  14  140
11 2003    3  87  870
12 2004    3  56  560
13 2001    4  12  120
14 2002    4  43  430
15 2003    4  67  670
16 2004    4  54  540

But the result is
[1] year firm x    y
<0 rows> (or 0-length row.names)

If you look at just the result of part of your code,

subset(data1, data1$x > 0)

it isn't giving at all what you need for the next step: the entire data frame for x > 0.

Sarah -- Sarah Goslee http://www.functionaldiversity.org
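An equivalent one-liner, offered as an alternative sketch (not from the thread): ave() computes a per-group summary aligned with the rows, so "keep firms where all x > 0" becomes "keep rows where the firm's minimum x is positive":

```r
data1 <- data.frame(
  year = rep(2001:2004, 4),
  firm = rep(1:4, each = 4),
  x = c(11, 22, -32, 25, -26, 47, 85, 98, 101, 14, 87, 56, 12, 43, 67, 54),
  y = c(110, 220, 302, 250, 260, 470, 850, 980, 1010, 140, 870, 560, 120, 430, 670, 540)
)

# per-row group minimum of x; a firm passes only if its minimum is > 0
data2 <- data1[ave(data1$x, data1$firm, FUN = min) > 0, ]
data2
```

This selects exactly the rows of firms 3 and 4, the same result as the two-step unique()/%in% approach, and scales fine to thousands of firms.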
[R] Difference between ~lp() or simply ~ in R's locfit?
As I think it is not spam but helpful, let me repeat my stats.stackexchange.com question here, from http://stats.stackexchange.com/questions/16346/difference-between-lp-or-simply-in-rs-locfit

I am not sure I see the difference between the examples for local logistic regression in the documentation of the gold-standard locfit package for R: http://cran.r-project.org/web/packages/locfit/locfit.pdf

I get starkly different results with

fit2 <- scb(closed_rule ~ lp(bl), deg=1, xlim=c(0,1), ev=lfgrid(100), family='binomial', alpha=cbind(0, 0.3), kern='parm')

from

fit2 <- scb(closed_rule ~ bl, deg=1, xlim=c(0,1), ev=lfgrid(100), family='binomial', alpha=cbind(0, 0.3), kern='parm')

What is the nature of the difference? Maybe that can help me phrase what I want. I had in mind an index linear in bl within a logistic link function predicting the probability of closed_rule. The documentation of lp says that it fits a local polynomial, which is great, but I thought that would happen even if I left it out. And in any case, the documentation has examples for local logistic regression either way.
Re: [R] Fitting 3 beta distributions
On Sat, 1 Oct 2011, Nitin Bhardwaj wrote:
Hi, I want to fit 3 beta distributions to my data which ranges between 0 and 1. What are the functions that I can easily call and specify that 3 beta distributions should be fitted? I have already looked at normalmixEM and fitdistr but they don't seem to be applicable (normalmixEM is only for fitting normal dist and fitdistr will only fit 1 distribution, not 3). Is that right?

From your description above, I guess that (a) you want to fit a _mixture_ of 3 beta distributions, and (b) have tried to use mixtools and MASS so far. Based on these assumptions: fitdistr() does not fit mixture models. mixtools does fit mixtures, and the accompanying paper has an example where a nonparametric model is applied to mixtures of beta distributions. Furthermore, the betareg package has a function betamix() which can fit mixtures of beta regression models (including the special case of no covariates). Both mixtools and betareg have been published in JSS, as indicated when calling citation("mixtools") and citation("betareg"):

http://www.jstatsoft.org/v32/i06/
http://www.jstatsoft.org/v34/i02/

The latter does not yet contain the betamix() function. As an example, one can use the artificial data generated in Section 5.2:

set.seed(123)
y1 <- c(rbeta(150, 0.3 * 4, 0.7 * 4), rbeta(50, 0.5 * 4, 0.5 * 4))
y2 <- c(rbeta(100, 0.3 * 4, 0.7 * 4), rbeta(100, 0.3 * 8, 0.7 * 8))
d  <- data.frame(y1, y2)
bm1 <- betamix(y1 ~ 1 | 1, data = d, k = 2)
bm2 <- betamix(y2 ~ 1 | 1, data = d, k = 2)

where one should note that, compared to R's parametrization of the beta distribution, two transformations are employed: from shape1/shape2 to mu/phi, and then adding logit/log link functions.

Also, my data has 26 million data points. What can I do to reduce the computation time with the suggested function?

I think all functions above will have problems with 26 million observations directly. One alternative - if the fitting function takes weights - would be to use a representative sample, or to compute weights on a possibly coarsened grid.

hth, Z
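The two alternatives mentioned for the 26-million-point problem can be sketched as follows (a hedged illustration; `y` below is a placeholder for the real data vector, and the subsample/bin sizes are arbitrary choices):

```r
set.seed(42)
y <- rbeta(1e6, 2, 5)   # placeholder standing in for the 26-million-point data

## (a) fit on a representative random subsample
ys <- sample(y, 1e5)

## (b) coarsen onto a grid and use bin counts as case weights,
##     if the fitting function accepts a weights argument
br  <- seq(0, 1, length.out = 1001)
mid <- (head(br, -1) + tail(br, -1)) / 2   # bin midpoints as pseudo-observations
w   <- tabulate(cut(y, br), nbins = 1000)  # counts per bin = weights
```

Either way the optimizer only sees 1e5 points (or 1000 weighted midpoints) instead of 26 million, which turns an infeasible fit into a fast one at the cost of some discretization or sampling error.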
Re: [R] subset in dataframes
Thank you very much. My dataframe has thousands of firms; how can I delete all of those with x <= 0 and keep another dataframe with the firms where all x > 0? Thank you again. Cecília Carmo (Universidade de Aveiro - Portugal)

-----Original message----- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Sunday, 2 October 2011 13:01 To: Cecilia Carmo Cc: r-help@r-project.org Subject: Re: [R] subset in dataframes

[earlier reply quoted; snipped]
Re: [R] subset in dataframes
Hi, On Sun, Oct 2, 2011 at 9:08 AM, Cecilia Carmo cecilia.ca...@ua.pt wrote: Thank you very much. My dataframe has thousands of firms, how can I delete all of those with x0 and keep another dataframe with firms where all x0? How does that differ from your original question? What doesn't work for you in the answer I already gave? Sarah Thank you again. Cecília Carmo (Universidade de Aveiro - Portugal) -Mensagem original- De: Sarah Goslee [mailto:sarah.gos...@gmail.com] Enviada: domingo, 2 de Outubro de 2011 13:01 Para: Cecilia Carmo Cc: r-help@r-project.org Assunto: Re: [R] subset in dataframes Hi, On Sun, Oct 2, 2011 at 7:48 AM, Cecilia Carmo cecilia.ca...@ua.pt wrote: I need help in subseting a dataframe: data1-data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004, 2001,2002,2003,2004,2001,2002,2003,2004), firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),x=c(11,22,-32,25,-26,47,85,98, 101,14,87,56,12,43,67,54), y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540)) Thank you for providing a reproducible example. data1 I want to keep the firms where all x0 (where there are no negative values in x) So my output should be: year firm x y 1 2001 3 101 1010 2 2002 3 14 140 3 2003 3 87 870 4 2004 3 56 560 5 2001 4 12 120 6 2002 4 43 430 7 2003 4 67 670 8 2004 4 54 540 So I'm doing: data2-data1[data1$firm%in%subset(data1,data1$x0),] data2 What about finding which ones have negative values and should be deleted, unique(data1$firm[data1$x = 0]) [1] 1 2 And then deleting them? data1[!(data1$firm %in% unique(data1$firm[data1$x = 0])),] year firm x y 9 2001 3 101 1010 10 2002 3 14 140 11 2003 3 87 870 12 2004 3 56 560 13 2001 4 12 120 14 2002 4 43 430 15 2003 4 67 670 16 2004 4 54 540 But the result is [1] year firm x y 0 rows (or 0-length row.names) If you look at just the result of part of your code, subset(data1,data1$x0) it isn't giving at all what you need for the next step: the entire data frame for x0. 
--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] subset in dataframes
Sarah,

Sorry for being ignorant. I was doing something wrong. It works perfectly. Thank you.

Cecília Carmo
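A compact alternative for the same task (a sketch, using the data1 defined in the thread): compute the per-firm "all x > 0" condition in a single step with ave().

```r
# data1 as posted in the thread
data1 <- data.frame(year = rep(2001:2004, 4),
                    firm = rep(1:4, each = 4),
                    x = c(11, 22, -32, 25, -26, 47, 85, 98,
                          101, 14, 87, 56, 12, 43, 67, 54),
                    y = c(110, 220, 302, 250, 260, 470, 850, 980,
                          1010, 140, 870, 560, 120, 430, 670, 540))
# per-row flag: TRUE when every x value in this row's firm is > 0
keep <- as.logical(ave(data1$x > 0, data1$firm, FUN = all))
data1[keep, ]   # firms 3 and 4 only
```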
Re: [R] Overlapping plot in lattice
Thanks Gabor, that was exactly what I needed.

On Sep 30, 9:00 pm, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> On Fri, Sep 30, 2011 at 3:01 AM, Kang Min ngokang...@gmail.com wrote:
>> Hi all, I was wondering if there's an equivalent to par(new=T) of the plot function in lattice. I'm plotting an xyplot, and I would like to highlight one point by plotting that one point again using a different symbol. For example, where 6 is highlighted:
>> plot(1:10, xlim=c(0,10), ylim=c(0,10))
>> par(new=T)
>> plot(6, 6, xlim=c(0,10), ylim=c(0,10), pch=16)
>
> Try this:
> library(lattice)
> xyplot(1:10 ~ 1:10, xlim=c(0,10), ylim=c(0,10))
> trellis.focus()
> panel.points(6, 6, pch = 6)
> trellis.unfocus()
>
> --
> Statistics Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
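The same highlight can also be drawn inside a custom panel function, which keeps everything in a single xyplot() call (a sketch along the lines of Gabor's answer):

```r
library(lattice)
xyplot(1:10 ~ 1:10, xlim = c(0, 10), ylim = c(0, 10),
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)                    # the ordinary points
         panel.points(6, 6, pch = 16, col = "red")  # the highlighted point
       })
```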
[R] Arimax First-Order Transfer Function
Dear list members,

I am a (very) recent convert to R and I am hoping you can help me with a problem I'm having. I'm trying to fit a first-order transfer function to an ARIMA intervention analysis using the arimax function. The data were obtained from McCleary & Hay (1980) (via Rob Hyndman's Time Series Library: http://robjhyndman.com/tsdldata/data/schizo.dat). The series has 120 time points with an intervention occurring at the 60th unit. So far I've been able to run a simple zero-order intervention model, which I've done like this:

Model1 <- arimax(x, order=c(0,1,1), xreg=Intv)

where Intv <- as.matrix(c(rep(0,60), rep(1,60))) (the dummy intervention variable).

I'd like to add a first-order transfer function in order to test for gradual, permanent effects. I understand this can be done by adding the xtransf and transfer arguments; however, after playing around with this I've been unsuccessful in replicating the results found in McCleary & Hay (1980). I've looked, in depth, at the 'airline' example; however, despite the guidance provided by Chan on this (see below), it's not immediately clear to me how the xtransf (i.e. I911=1*(seq(airmiles)==69)) and transfer (i.e. transfer=list(c(0,0),c(1,0))) arguments are generated, and what they consist of. I've looked extensively for further information on this, but to no avail. Is anyone able to offer any further advice/directions on how to go about this?

Best wishes,
David

Example provided by Chan (2008) (airline example):

air.m1 = arimax(log(airmiles), order=c(0,1,1),
                seasonal=list(order=c(0,1,1), period=12),
                xtransf=data.frame(I911=1*(seq(airmiles)==69),
                                   I911=1*(seq(airmiles)==69)),
                transfer=list(c(0,0), c(1,0)),
                xreg=data.frame(Dec96=1*(seq(airmiles)==12),
                                Jan97=1*(seq(airmiles)==13),
                                Dec02=1*(seq(airmiles)==84)),
                method='ML')
# Additive outliers are incorporated as dummy variables in xreg.
# Transfer function components are incorporated by the xtransf and transfer
# arguments.
# Here, the transfer function consists of two parts, omega0*P(t) and
# omega1/(1-omega2*B)P(t), where the inputs of the two transfer
# functions are identical and equal the dummy variable that is 1 at September
# 2001 (the 69th data point) and zero otherwise.
# xtransf is a matrix whose columns are the input variables.
# transfer is a list consisting of the pair of (AR order, MA order) of each
# transfer function, which in this example is (0,0) and (1,0).
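Following the pattern of the quoted airline example, a first-order transfer function for a step intervention might look like the sketch below. This is an untested outline, not a verified replication of McCleary & Hay: `x` is assumed to be the 120-point schizophrenia series from the post, and `c(1, 0)` follows the quoted example's convention for a transfer function with a first-order denominator (a gradual, permanent effect).

```r
library(TSA)  # provides arimax()
# step input: 0 before the intervention, 1 from t = 61 onwards
Step <- 1 * (seq_along(x) >= 61)
Model2 <- arimax(x, order = c(0, 1, 1),
                 xtransf = data.frame(Step = Step),
                 transfer = list(c(1, 0)))  # omega0/(1 - delta1*B) * Step(t)
```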
[R] Find all duplicate records
Hello,

In a data frame I want to identify ALL duplicate IDs in the example, to be able to examine OS and time.

(df <- data.frame(ID=c("userA", "userB", "userA", "userC"),
                  OS=c("Win", "OSX", "Win", "Win64"),
                  time=c("12:22", "23:22", "04:44", "12:28")))
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

My desired output is that ALL records with the same IDs are found:
userA Win 12:22
userA Win 04:44
preferably by returning logical values (TRUE FALSE TRUE FALSE). Is there a simple way to do that?

With duplicated(df$ID) the output will be
[1] FALSE FALSE TRUE FALSE
i.e. not all userA records are found.
With unique(df$ID)
[1] userA userB userC
Levels: userA userB userC
i.e. one of each ID is found.

Erik Svensson
Re: [R] Find all duplicate records
On 02.10.2011 16:05, Erik Svensson wrote:
> In a data frame I want to identify ALL duplicate IDs in the example, to be able to examine OS and time. [...]
> My desired output is that ALL records with the same IDs are found:
> userA Win 12:22
> userA Win 04:44

See ?split or ?subset

Uwe Ligges
Re: [R] Poor performance of Optim
Hi,

You really need to study the documentation of optim carefully before you make broad generalizations. There are several algorithms available in optim. The default is a simplex-type algorithm called Nelder-Mead. I think this is an unfortunate choice as the default algorithm. Nelder-Mead is a robust algorithm that can work well for almost any kind of objective function (smooth or nasty). However, the trade-off is that it is very slow in terms of convergence rate. For simple, smooth problems such as yours, you should use BFGS (or L-BFGS-B if you have simple box constraints). Also, take a look at the optimx package and the most recent paper in the Journal of Statistical Software on optimx for a better understanding of the wide array of optimization options available in R.

Best,
Ravi.
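A small illustration of the difference on a smooth objective (a sketch; the exact counts will vary):

```r
f <- function(p) sum((p - c(1, 2))^2)        # smooth quadratic, minimum at (1, 2)
nm   <- optim(c(0, 0), f)                    # default method: Nelder-Mead
bfgs <- optim(c(0, 0), f, method = "BFGS")   # gradient-based quasi-Newton
nm$counts    # Nelder-Mead typically needs many more function evaluations
bfgs$counts  # BFGS converges in far fewer on problems like this
```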
[R] generating Venn diagram with 6 sets
Dear r-helpers,

Here I would like to have your kind help on generating a Venn diagram. There are some packages within R for this task, like venneuler, VennDiagram, and Vennerable. But Vennerable cannot be installed on my MacBook, VennDiagram does not seem to work on my data, and venneuler may have generated a wrong Venn diagram for me. Do you have any experience/expertise with those Venn diagram packages? Could you please give me any directions on that? Thanks in advance.

Best wishes,
Jian-Feng,

##
# (1) my code for venneuler (intersections are written with "&")
vd <- venneuler(c("A&B&C&D&E&F"=69604, "A&B&C&D&E"=426120, "A&B&C&D&F"=20297, "A&B&C&D"=123063, "A&B&C&E&F"=12695, "A&B&C&E"=115100, "A&B&C&F"=11667, "A&B&C"=95656, "A&B&D&E&F"=1755, "A&B&D&E"=20113, "A&B&D&F"=1903, "A&B&D"=19218, "A&B&E&F"=2831, "A&B&E"=38362, "A&B&F"=4950, "A&B"=68289, "A&C&D&E&F"=11657, "A&C&D&E"=107235, "A&C&D&F"=14883, "A&C&D"=193338, "A&C&E&F"=6284, "A&C&E"=79985, "A&C&F"=14710, "A&C"=271416, "A&D&E&F"=1069, "A&D&E"=17628, "A&D&F"=3152, "A&D"=71573, "A&E&F"=2786, "A&E"=57511, "A&F"=13684, "A"=475970, "B&C&D&E&F"=2722, "B&C&D&E"=30528, "B&C&D&F"=2740, "B&C&D"=30986, "B&C&E&F"=3579, "B&C&E"=55443, "B&C&F"=7789, "B&C"=101005, "B&D&E&F"=917, "B&D&E"=14894, "B&D&F"=1436, "B&D"=24972, "B&E&F"=3975, "B&E"=105527, "B&F"=16877, "B"=718570, "C&D&E&F"=1587, "C&D&E"=26289, "C&D&F"=4902, "C&D"=101947, "C&E&F"=3326, "C&E"=77289, "C&F"=20125, "C"=689330, "D&E&F"=892, "D&E"=22666, "D&F"=4661, "D"=200020, "E&F"=8518, "E"=521290, "F"=401622))
pdf("myvenn.pdf")
plot(vd)
dev.off()
#
# (2) the problem with the plot venneuler generated is that the sets (A,B,C,D,E,F)
# should share 69604 elements, but it illustrated nothing for this 6-set intersection.
#
# (3) I prepared my code for the Vennerable package, but it cannot be installed now.
myVenn <- Venn(SetNames = c("Norway", "Russia", "Iceland", "Scotland", "Austria", "North American"),
  Weight = c('111111'=69604, '111110'=426120, '111101'=20297, '111100'=123063, '111011'=12695, '111010'=115100, '111001'=11667, '111000'=95656, '110111'=1755, '110110'=20113, '110101'=1903, '110100'=19218, '110011'=2831, '110010'=38362, '110001'=4950, '110000'=68289, '101111'=11657, '101110'=107235, '101101'=14883, '101100'=193338, '101011'=6284, '101010'=79985, '101001'=14710, '101000'=271416, '100111'=1069, '100110'=17628, '100101'=3152, '100100'=71573, '100011'=2786, '100010'=57511, '100001'=13684, '100000'=475970, '011111'=2722, '011110'=30528, '011101'=2740, '011100'=30986, '011011'=3579, '011010'=55443, '011001'=7789, '011000'=101005, '010111'=917, '010110'=14894, '010101'=1436, '010100'=24972, '010011'=3975, '010010'=105527, '010001'=16877, '010000'=718570, '001111'=1587, '001110'=26289, '001101'=4902, '001100'=101947, '001011'=3326, '001010'=77289, '001001'=20125, '001000'=689330, '000111'=892, '000110'=22666, '000101'=4661, '000100'=200020, '000011'=8518, '000010'=521290, '000001'=401622))
pdf("myVenn.pdf")
plot(myVenn, doWeight = TRUE, type = "circles")
dev.off()
[R] regarding specifying criteria for Cointegration
Dear All,

I am learning R and Time Series Econometrics for the first time. I have a doubt regarding the cointegration specification criteria. The problem follows:

test1 <- ca.jo(data1, ecdet="const", type="trace", K=2, spec="transitory")  # when to specify "transitory"?
test1 <- ca.jo(data1, ecdet="const", type="trace", K=2, spec="longrun")     # when to specify "longrun"?

With regards,
Upananda
Re: [R] error while using shapiro.test()
On 2011-10-01 09:24, spicymchaggis101 wrote:
> Thank you very much! Your response solved my issue. I needed to determine the probability of normality for word types per page.

You may want to review just what the test does. It certainly does not give you the 'probability of normality'. A worthwhile exercise might be to test several other distributions on your data.

Peter Ehlers
Re: [R] On-line machine learning packages?
Hello Jay,

Did you find the answer to your question on incremental machine learning? If not, I found some links that might help.

It appears that you might be able to do streaming/incremental machine learning in Weka: http://moa.cs.waikato.ac.nz/details/classification/using-weka/
On the above page there is a link to a free online book on data stream mining: http://heanet.dl.sourceforge.net/project/moa-datastream/documentation/StreamMining.pdf
While Weka is a separate project from R, there is an R-to-Weka interface available at http://cran.r-project.org/web/packages/RWeka/index.html

Sadly, I didn't see any streaming/incremental machine learning packages on the CRAN machine learning task view. I would guess that your best bet is using Weka with the RWeka interface, but I'm a neophyte in the machine learning field, so please take this advice with a grain of salt.

Sincerely,
Jason

On 09/13/2011 02:35 AM, Jay wrote:
> "How does sequential classification differ from running a one-off classifier for each run?" - Because feedback from the previous round can and needs to be incorporated into the next round.
> http://lmgtfy.com/?q=R+machine+learning - That is a new low. I was hoping to get help; obviously I was wrong to use this forum in the hope that somebody had already battled these kinds of problems in R.
>
> On Sep 13, 1:52 am, Jason Edgecombe ja...@rampaginggeek.com wrote:
>> I already provided the link to the task view, which provides a list of the more popular machine learning algorithms for R. Do you have a particular algorithm or technique in mind? Does it have a name? How does sequential classification differ from running a one-off classifier for each run?
>>
>> On 09/12/2011 05:24 AM, Jay wrote:
>>> In my mind this sequential classification task with feedback is somewhat different from a completely offline, once-off classification. Am I wrong? However, it looks like the mentality on this topic is to refer me to cran/google in order to look for solutions myself.
>>> Obviously I know about these sources, and as I said, I used rseek.org among other sources to look for solutions. I did not start this topic for fun; I'm asking for help to find suitable machine learning packages that readily incorporate feedback loops and online learning. If somebody has experience with these kinds of problems in R, please respond. Or will "http://cran.r-project.org - Look for 'Task Views'" be my next piece of advice?
>>>
>>> On Sep 12, 11:31 am, Dennis Murphy djmu...@gmail.com wrote:
>>>> http://cran.r-project.org/web/views/ Look for 'machine learning'. Dennis
>>>>
>>>> On Sun, Sep 11, 2011 at 11:33 PM, Jay josip.2...@gmail.com wrote:
>>>>> If the answer is so obvious, could somebody please spell it out?
>>>>>
>>>>> On Sep 11, 10:59 pm, Jason Edgecombe ja...@rampaginggeek.com wrote:
>>>>>> Try this: http://cran.r-project.org/web/views/MachineLearning.html
>>>>>>
>>>>>> On 09/11/2011 12:43 PM, Jay wrote:
>>>>>>> Hi, I used the rseek search engine to look for suitable solutions; however, as I was unable to find anything useful, I'm asking for help. Anybody have experience with these kinds of problems? I looked into dynaTree, but as information is a bit scarce and, as I understand it, it might not be what I'm looking for..(?)
>>>>>>> BR, Jay
>>>>>>>
>>>>>>> On Sep 11, 7:15 pm, David Winsemius dwinsem...@comcast.net wrote:
>>>>>>>> On Sep 11, 2011, at 11:42 AM, Jay wrote:
>>>>>>>>> What R packages are available for performing classification tasks? That is, when the predictor has done its job on the dataset (based on the training set and a range of variables), feedback about the true label will be available and this information should be integrated for the next classification round.
>>>>>>>> You should look at CRAN Task Views. Extremely easy to find from the main R-project page.
>>>>>>>> -- David Winsemius, MD, West Hartford, CT
Re: [R] On-line machine learning packages?
Hi Jay,

I see this thread is a bit (ok, quite) old at this point, but I see you never really got an answer to your question that was satisfactory. I figured you might be interested to know that Dirk has started to wrap vowpal wabbit [1,2] into an R package, RVowpalWabbit [3,4]. The package itself is still rather bare-bones, but perhaps it can be useful to you in its current state, or perhaps the raw vowpal wabbit can be.

You might also consider the shogun toolbox [5]. As of its 1.0 release, I believe it has incorporated vowpal wabbit in some form or another to do online learning, but it might have other online learning algorithms baked in as well. It has its own flavor of an R interface (r_static or r_modular), which might work for you if you can get it to compile.

-steve

[1] Vowpal Wabbit (home page): http://hunch.net/~vw/
[2] Vowpal Wabbit (github): https://github.com/JohnLangford/vowpal_wabbit
[3] RVowpalWabbit (CRAN): http://cran.r-project.org/web/packages/RVowpalWabbit/index.html
[4] RVowpalWabbit (R-forge): https://r-forge.r-project.org/projects/rvowpalwabbit/
[5] The shogun toolbox: http://www.shogun-toolbox.org/

On Mon, Sep 12, 2011 at 5:24 AM, Jay josip.2...@gmail.com wrote:
> In my mind this sequential classification task with feedback is somewhat different from a completely offline, once-off classification. Am I wrong? However, it looks like the mentality on this topic is to refer me to cran/google in order to look for solutions myself. Obviously I know about these sources, and as I said, I used rseek.org among other sources to look for solutions. I did not start this topic for fun; I'm asking for help to find suitable machine learning packages that readily incorporate feedback loops and online learning. If somebody has experience with these kinds of problems in R, please respond. Or will "http://cran.r-project.org - Look for 'Task Views'" be my next piece of advice?
--
Steve
Re: [R] Keep ALL duplicate records
Erik Svensson wrote:
> In a data frame I want to identify ALL duplicate IDs in the example, to be able to examine OS and time.
> (df <- data.frame(ID=c("userA", "userB", "userA", "userC"),
>                   OS=c("Win", "OSX", "Win", "Win64"),
>                   time=c("12:22", "23:22", "04:44", "12:28")))
> My desired output is that ALL records with the same IDs are found, preferably by returning logical values (TRUE FALSE TRUE FALSE). Is there a simple way to do that?

How about ...

# All records
ALL_RECORDS <- df[df$ID == df$ID[duplicated(df$ID)], ]
print(ALL_RECORDS)

# Logical vector
TRUE_FALSE <- df$ID == df$ID[duplicated(df$ID)]
print(TRUE_FALSE)

HTH
Pete
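Note that the == comparison in Pete's code works here only because a single ID ("userA") is duplicated; with several distinct duplicated IDs the recycled comparison would misbehave. A more general sketch of the logical-vector answer combines duplicated() scanned from both ends:

```r
df <- data.frame(ID = c("userA", "userB", "userA", "userC"),
                 OS = c("Win", "OSX", "Win", "Win64"),
                 time = c("12:22", "23:22", "04:44", "12:28"))
# TRUE for every row whose ID occurs more than once, regardless of position
dup <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)
dup        # TRUE FALSE TRUE FALSE
df[dup, ]  # all userA records
```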
[R] difference between createPartition and createfold functions
Hello,

I'm trying to separate my dataset into 4 parts, with the 4th one as the test dataset and the other three used to fit a model. I've been searching for the difference between these two functions in the caret package, but the most I can get is this:

"A series of test/training partitions are created using createDataPartition, while createResample creates one or more bootstrap samples. createFolds splits the data into k groups."

Am I missing something here? What is the difference between createDataPartition and createFolds? I guess they wouldn't be equivalent.

Thank you.
Bonnie Yuan
Re: [R] Is the output of survfit.coxph survival or baseline survival?
On Sat, Oct 1, 2011 at 2:31 PM, koshihaku koshih...@gmail.com wrote:
> Dear all, I am confused with the output of survfit.coxph. Someone said that the survival given by summary(survfit.coxph) is the baseline survival S_0, but others said it is the survival S = S_0^exp(beta'x). Which one is correct?

The baseline hazard as estimated in survfit.coxph is the hazard when all covariates are equal to the sample mean (or the stratum mean for a stratified model). The means that it is using are available in the $means component of the coxph object. It is not the hazard extrapolated to all covariates equal to zero.

The centering at the sample mean is done for three reasons:
1/ it's computationally convenient
2/ it's numerically more stable
3/ it makes the baseline hazard more interpretable, since at least it is the hazard for a set of covariate values somewhere in the interior of your data.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
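A short sketch showing where those centering values live (using the lung data shipped with the survival package):

```r
library(survival)
fit <- coxph(Surv(time, status) ~ age, data = lung)
fit$means            # the sample mean of age used for centering
sf <- survfit(fit)   # survival curve at that mean age, not at age = 0
summary(sf)
```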
Re: [R] Advice on approach to weighting survey
On Sat, Oct 1, 2011 at 4:59 AM, Farley, Robert farl...@metro.net wrote:
> I'm about to add weights to a bus on-board survey dataset with ~150 variables and ~28,000 records. My intention is to weight (for each bus run) by boarding stop and alighting stop. I've seen the rake function of the survey package, but it seems that converting to a svydesign might be excessive for my purpose. My dataset has a huge number of unique Run-Boarding and Run-Alighting groups, each with a small number of records to expand. Would it be easier to manually implement Iterative Proportional Fitting (raking/Fratar/Furness) on the data? Or are there benefits to converting the data to a svydesign that would make it valuable?
> This traditional weighting expands what we call unlinked trips (based on each boarding). I'm thinking of also using IPF/raking to estimate linked trips (based on each individual). Would this change the consideration of using the svydesign process?

If you're planning to do any analysis afterwards it would be useful to have the data in a svydesign object, or if you end up needing to do weight trimming or bounding, or other slightly more complicated weight adjustments. Otherwise it might well just be easier to do your own IPF algorithm.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
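For the svydesign route, a minimal sketch of the raking step (all object and variable names here are hypothetical stand-ins for the poster's data):

```r
library(survey)
# onboard: the ~28,000-record data frame; board_stop/alight_stop: its margin variables
des <- svydesign(ids = ~1, data = onboard)
# board_totals and alight_totals: data frames of population counts,
# each holding the margin variable plus a Freq column
des_raked <- rake(des,
                  sample.margins     = list(~board_stop, ~alight_stop),
                  population.margins = list(board_totals, alight_totals))
```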
Re: [R] difference between createPartition and createfold functions
Hi,

On Sun, Oct 2, 2011 at 2:47 PM, bby2...@columbia.edu wrote:
> I'm trying to separate my dataset into 4 parts, with the 4th one as the test dataset and the other three to fit a model. [...] What is the difference between createDataPartition and createFolds?

Well -- you could always look at the source code to find out (enter the name of the function into your R console and hit return), but you can also do some experimentation. Using the data from the Examples section of caret::createFolds:

R> library(caret)
R> data(oil)
R> part <- createDataPartition(oilType, 2)
R> fold <- createFolds(oilType, 2)
R> length(Reduce(intersect, part))
[1] 27
R> length(Reduce(intersect, fold))
[1] 0

It looks like `createDataPartition` splits your data into smaller pieces but allows the same example to appear in different splits, while `createFolds` never puts the same example into more than one fold.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
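For the original goal (4 parts, the 4th held out as a test set), createFolds is the natural fit, since its folds are disjoint (a sketch with made-up outcome data):

```r
library(caret)
set.seed(42)
y <- factor(rep(c("a", "b"), each = 50))     # stand-in outcome vector
folds <- createFolds(y, k = 4)               # 4 disjoint sets of row indices
test_idx  <- folds[[4]]                      # hold out the 4th fold
train_idx <- setdiff(seq_along(y), test_idx) # fit the model on the other three
```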
[R] Scatterplot with the 3rd dimension = color?
I have 3 columns of data and want to plot each row as a point in a scatter plot, with one column represented as a color gradient (e.g. larger values being more red). Anyone know the command or package for this?

Thanks,
KB
Re: [R] is member
Dear all, I would like to thank you for you answers This worked for me Browse[1] match(Test,seq(1,C,FrN),nomatch=FALSE) [1] 1 0 2 3 0 0 4 0 0 5 0 0 6 7 0 0 8 0 [19] 0 9 0 10 11 0 0 12 0 0 13 14 0 15 0 16 0 0 [37] 17 18 19 0 0 20 21 22 23 0 0 24 0 25 0 0 26 0 [55] 0 27 29 0 30 31 0 32 0 0 33 34 0 0 37 0 38 0 [73] 0 0 39 0 40 0 41 0 42 43 46 47 0 48 0 0 49 51 [91] 0 0 52 0 53 0 0 54 55 0 0 56 0 57 58 59 0 0 [109] 60 61 0 0 62 63 64 65 67 68 69 70 71 72 73 74 75 0 [127] 76 77 79 0 80 0 81 82 83 84 85 86 0 87 0 88 89 90 [145] 0 0 91 92 93 94 0 95 0 96 97 98 99 0 0 100 0 0 [163] 101 102 0 0 103 0 0 0 104 0 0 105 0 0 106 0 107 0 [181] 108 0 109 110 111 0 0 112 0 113 0 114 0 115 116 117 118 119 [199] 120 121 122 123 124 125 126 127 129 130 131 132 133 134 135 0 136 137 [217] 0 138 0 139 140 141 0 142 0 0 143 144 0 0 145 0 146 0 [235] 0 147 0 148 149 150 0 151 152 153 0 0 154 156 157 158 0 159 [253] 160 161 162 163 164 165 166 167 0 168 169 170 171 172 173 0 0 174 [271] 0 175 176 177 178 179 180 181 182 183 184 185 0 186 187 0 188 0 [289] 189 190 191 192 0 193 194 195 196 197 198 199 200 What I want to do now is to keep all the vector elements (only numbers) without the zeros!. How I can do that? B.R Alex From: William Dunlap wdun...@tibco.com Sent: Saturday, October 1, 2011 12:11 AM Subject: RE: [R] is member Someone already suggested that you use match(), which does what I think you want. Read its help file for details. A - seq(1,113,4) match(c(9, 17, 18), A) [1] 3 5 NA Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com Sent: Friday, September 30, 2011 2:07 PM To: William Dunlap; R-help@r-project.org Subject: Re: [R] is member Thanks a lot! This works. Now I want to do the opposite let's say that I have one sequence for example check in image http://imageshack.us/photo/my-images/4/unleduso.png/ column A (this is a seq(1,113,4) and I want when I get the number 9 to say that this is the third number in the seq (1,113,4). 
Everything about seq(1,113,4) is known, and when I get one of the numbers of the sequence I want to say which is its position. How can I do that? B.R Alex

From: William Dunlap wdun...@tibco.com Sent: Friday, September 30, 2011 6:34 PM Subject: RE: [R] is member

is.element(myvector, seq(1, 800, 4))

or, if you like typing percent signs,

myvector %in% seq(1, 800, 4)

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Alaios Sent: Friday, September 30, 2011 9:26 AM To: R-help@r-project.org Subject: [R] is member

Dear all, I have a vector with numbers, some of which are part of seq(1,800,4). How can I check which of the numbers belong to seq(1,800,4)? Let's say the vector with the numbers is called myvector. Is there in R something like this?

is.member(myvector, seq(1, 800, 4))

I would like to thank you in advance for your help B.R Alex
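A minimal sketch of the zero-dropping follow-up asked in this thread. The names `myvector` and `idx` are illustrative, standing in for the poster's data and the vector returned by match():

```r
myvector <- c(9, 14, 17, 21, 100)            # example data, made up
idx <- match(myvector, seq(1, 800, 4), nomatch = 0)
idx[idx != 0]                                 # keep only the non-zero positions

# equivalently, letting match() return NA for misses:
idx <- match(myvector, seq(1, 800, 4))
idx[!is.na(idx)]
```

Simple logical subsetting (`idx[idx != 0]`) is all that is needed; no extra function is required.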
Re: [R] Scatterplot with the 3rd dimension = color?
Here is one: http://cran.r-project.org/web/packages/scatterplot3d/index.html In the future, consider first searching: http://finzi.psych.upenn.edu/search.html http://rseek.org/ etc... Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Sun, Oct 2, 2011 at 7:11 PM, Kerry kbro...@gmail.com wrote: I have 3 columns of data and want to plot each row as a point in a scatter plot and want one column to be represented as a color gradient (e.g. larger values being more red). Anyone know the command or package for this? Thanks, KB
Re: [R] difference between createPartition and createfold functions
Hi Steve, Thanks for the note. I did try the example and the result didn't make sense to me. For splitting a vector, what you describe is a big difference between them. For splitting a data frame, I now wonder if these 2 functions are the wrong choice: they seem to split the columns, at least in the few things I tried. Bonnie

Quoting Steve Lianoglou mailinglist.honey...@gmail.com: Hi, On Sun, Oct 2, 2011 at 2:47 PM, bby2...@columbia.edu wrote: Hello, I'm trying to separate my dataset into 4 parts, with the 4th one as the test dataset and the other three used to fit a model. I've been searching for the difference between these 2 functions in the caret package, but the most I can get is this: "A series of test/training partitions are created using createDataPartition while createResample creates one or more bootstrap samples. createFolds splits the data into k groups." Am I missing something here? What is the difference between createDataPartition and createFolds? I guess they wouldn't be equivalent.

Well -- you could always look at the source code to find out (enter the name of the function into your R console and hit return), but you can also do some experimentation to find out. Using the data from the Examples section of caret::createFolds:

R> library(caret)
R> data(oil)
R> part <- createDataPartition(oilType, 2)
R> fold <- createFolds(oilType, 2)
R> length(Reduce(intersect, part))
[1] 27
R> length(Reduce(intersect, fold))
[1] 0

Looks like createDataPartition splits your data into smaller pieces but allows the same example to appear in different splits; createFolds does not allow the same example to appear in more than one fold.
HTH, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Scatterplot with the 3rd dimension = color?
On 11-10-02 1:11 PM, Kerry wrote: I have 3 columns of data and want to plot each row as a point in a scatter plot and want one column to be represented as a color gradient (e.g. larger values being more red). Anyone know the command or package for this?

It's not a particularly effective display, but here's how to do it. Use rainbow(101) in place of rev(heat.colors(101)) if you like.

x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
colors <- rev(heat.colors(101))
zcolor <- colors[(z - min(z))/diff(range(z))*100 + 1]
plot(x, y, col = zcolor)

Duncan Murdoch
Re: [R] R Studio and Rcmdr/RcmdrPlugins
Hi Erin, The last I checked, it was not possible. However, the place to ask this is here: http://support.rstudio.org/help/discussions Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Sun, Oct 2, 2011 at 4:42 AM, Erin Hodgess erinm.hodg...@gmail.com wrote: Dear R People: Hope you're having a great weekend! Anyhow, I'm currently experimenting with R Studio on a web server, which is the best thing since sliced bread, Coca Cola, etc. My one question: there is a way to show plots. Is there a way to show Rcmdr or its Plugins, please? I tried, but it doesn't seem to work. Thanks so much, Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com
Re: [R] difference between createPartition and createfold functions
Hi, On Sun, Oct 2, 2011 at 3:54 PM, bby2...@columbia.edu wrote: Hi Steve, Thanks for the note. I did try the example and the result didn't make sense to me. For splitting a vector, what you describe is a big difference between them. For splitting a data frame, I now wonder if these 2 functions are the wrong choice: they seem to split the columns, at least in the few things I tried.

Sorry, I'm a bit confused now as to what you are after. You don't pass a data.frame into any of the createFolds/createDataPartition functions from the caret package. You pass in a *vector* of labels, and these functions tell you which indices into the vector to use as examples to hold out (or keep, depending on the value you pass in for the `returnTrain` argument) in each fold/partition of your learning scenario (e.g. cross validation with createFolds). You would then use these indices to keep (or remove) the rows of a data.frame, if that is how you are storing your examples. Does that make sense? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] error while using shapiro.test()
Em 1/10/2011 13:24, spicymchaggis101 escreveu: Thank you very much! Your response solved my issue. I needed to determine the probability of normality for word types per page.

You need to ensure this assumption is reasonable for your problem domain, as word types per page looks like count data to me, and for this kind of data Gaussian distributions are at the very best last-resort approximations. -- Cesar Rabak
Re: [R] Keep ALL duplicate records
Here is a function I use to find all duplicate records:

allDup <- function (value) {
  duplicated(value) | duplicated(value, fromLast = TRUE)
}

x
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

x[allDup(x$ID), ]
     ID  OS  time
1 userA Win 12:22
3 userA Win 04:44

On Sun, Oct 2, 2011 at 2:18 PM, Pete Brecknock peter.breckn...@bp.com wrote: Erik Svensson wrote: Hello, In a data frame I want to identify ALL duplicate IDs in the example to be able to examine OS and time.

(df <- data.frame(ID = c("userA", "userB", "userA", "userC"),
                  OS = c("Win", "OSX", "Win", "Win64"),
                  time = c("12:22", "23:22", "04:44", "12:28")))
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

My desired output is that ALL records with the same IDs are found:

userA Win 12:22
userA Win 04:44

preferably by returning logical values (TRUE FALSE TRUE FALSE). Is there a simple way to do that? [-- With duplicated(df$ID) the output will be [1] FALSE FALSE TRUE FALSE, i.e. not all userA records are found. With unique(df$ID): [1] userA userB userC, Levels: userA userB userC, i.e. one of each ID is found --] Erik Svensson

How about ...

# All records
ALL_RECORDS <- df[df$ID == df$ID[duplicated(df$ID)], ]
print(ALL_RECORDS)

# Logical Records
TRUE_FALSE <- df$ID == df$ID[duplicated(df$ID)]
print(TRUE_FALSE)

HTH Pete -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
Re: [R] Find all duplicate records
On Sun, Oct 2, 2011 at 10:05 AM, Erik Svensson erik.b.svens...@gmail.com wrote: Hello, In a data frame I want to identify ALL duplicate IDs in the example to be able to examine OS and time.

(df <- data.frame(ID = c("userA", "userB", "userA", "userC"),
                  OS = c("Win", "OSX", "Win", "Win64"),
                  time = c("12:22", "23:22", "04:44", "12:28")))
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

My desired output is that ALL records with the same IDs are found:

userA Win 12:22
userA Win 04:44

preferably by returning logical values (TRUE FALSE TRUE FALSE)

Try this:

ave(rownames(df), df$ID, FUN = length) > 1
[1] TRUE FALSE TRUE FALSE

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] difference between createPartition and createfold functions
Basically, createDataPartition is used when you need to make one or more simple two-way splits of your data. For example, if you want to make a training and test set and keep your classes balanced, this is what you could use. It can also make multiple splits of this kind (or leave-group-out CV, aka Monte Carlo CV, aka repeated training/test splits). createFolds is exclusively for k-fold CV. Their usage is similar when you use the returnTrain = TRUE option in createFolds. Max
-- Max
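The two-way-split versus k-fold distinction described above can be sketched on made-up data; the column names and sizes here are illustrative, not from the thread:

```r
library(caret)
set.seed(1)
df <- data.frame(y = factor(rep(c("a", "b"), 50)), x = rnorm(100))

## one stratified 75/25 training/test split:
## createDataPartition returns row indices for the training set
inTrain <- createDataPartition(df$y, p = 0.75, list = FALSE)
train <- df[inTrain, ]
test  <- df[-inTrain, ]

## 4-fold CV: each list element holds the row indices of one held-out fold,
## and every row appears in exactly one fold
folds <- createFolds(df$y, k = 4)
fold1_test  <- df[folds[[1]], ]
fold1_train <- df[-folds[[1]], ]
```

As Steve notes earlier in the thread, both functions take a vector of labels and return indices; subsetting the data.frame rows is done by the caller.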
Re: [R] Is the output of survfit.coxph survival or baseline survival?
Dear all, I am confused with the output of survfit.coxph. Some say that the survival given by summary(survfit.coxph) is the baseline survival S_0, but others say that it is the survival S = S_0^exp(beta*x). Which one is correct?

The "baseline survival", which is the survival for a hypothetical subject with all covariates = 0, may be useful mathematical shorthand when writing a book, but I cannot think of a single case where the resulting curve would be of any practical interest in medical data. For this reason my survival routines in R NEVER return it. (Ask yourself "what is the survival for someone with blood pressure = 0, cholesterol = 0, weight = 0, ...". The answer is that they are either non-existent or dead.) The intention with survfit is that you will give it a second data set containing one or more lines, each of which describes a subject whose predicted survival is of interest. If no such data is given, the survival for someone with all covariates equal to the mean is given. This is better than covariates = 0, but sometimes not by much. (What if sex were coded as a 0/1 numeric: do we get the survival of a hermaphrodite?) Your best approach is to forget the phrase "baseline survival" and focus on covariate sets of interest to you. Terry Therneau
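Terry's advice above, to pass survfit a second data set describing subjects of interest, can be sketched with the lung data shipped in the survival package (the covariate values below are arbitrary examples):

```r
library(survival)

## Cox model on the package's example data
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

## predicted survival curves for two concrete subjects of interest,
## rather than the all-covariates-at-the-mean default
newdat <- data.frame(age = c(50, 70), sex = c(1, 2))
sf <- survfit(fit, newdata = newdat)
summary(sf, times = c(100, 365))   # one survival column per row of newdat
```

Each row of `newdat` yields its own curve, which is exactly the "covariate set of interest" usage the reply recommends.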
[R] about the array transpose
Hi, all, I am a newbie to R. Would anyone help me understand how to transpose a 3x3x3 array of 1:27? E.g.

A <- array(1:27, c(3, 3, 3))
B <- aperm(A, c(3, 2, 1))

What is the logic of this transpose? I cannot picture how it rearranges the elements, so I cannot predict which number ends up where. Most importantly I want to be able to get the number I expect; if I cannot figure this out I will have a confused concept that will affect my future learning of 3D models in R. Highly appreciated and thanks. VD
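The logic asked about here can be checked directly: with perm = c(3, 2, 1), aperm builds B so that B[i, j, k] equals A[k, j, i], i.e. the first and third subscripts swap roles, just as t() swaps rows and columns of a matrix. A small sketch:

```r
A <- array(1:27, c(3, 3, 3))
B <- aperm(A, c(3, 2, 1))

## predict an element by hand: A is filled column-major, so
## A[i, j, k] = i + (j - 1) * 3 + (k - 1) * 9
A[1, 2, 3]                 # 1 + 3 + 18 = 22
B[3, 2, 1]                 # same element after the subscript swap: 22

## verify the rule for every element
all(B == aperm(A, c(3, 2, 1)))               # TRUE by construction
identical(A, aperm(B, c(3, 2, 1)))            # applying the swap twice restores A
```

Thinking of each slice A[, , k] as a page helps: perm = c(3, 2, 1) turns pages into rows while leaving the middle (column) index alone.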
Re: [R] Scatterplot with the 3rd dimension = color?
Yes, perfect! This I can work with. Thanks, KB
[R] patients.txt data
Please send me the patients.txt data. Thanks.
[R] patients.txt data
I'm new to learning R. I'm taking a course and will need access to the patients.txt data to be able to do the required exercises using this dataset. Thanks.
Re: [R] Scatterplot with the 3rd dimension = color?
Duncan Murdoch murdoch.duncan at gmail.com writes: It's not a particularly effective display, but here's how to do it. Use rainbow(101) in place of rev(heat.colors(101)) if you like.

x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
colors <- rev(heat.colors(101))
zcolor <- colors[(z - min(z))/diff(range(z))*100 + 1]
plot(x, y, col = zcolor)

or

d <- data.frame(x, y, z)
library(ggplot2)
qplot(x, y, colour = z, data = d)

I agree about the "not particularly effective display" comment, but if you have two continuous predictors and a continuous response you've got a tough display problem -- your choices are:
1. use color, size, or some other graphical characteristic (pretty far down on the Cleveland hierarchy)
2. use a perspective plot (hard to get the right viewing angle, often confusing)
3. use coplots/small multiples/faceting (requires discretizing one dimension)
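Choice 3 in the list above can be sketched in a few lines; the data here are made up for illustration:

```r
library(ggplot2)
set.seed(1)

## two continuous predictors plus a continuous response
d <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))

## discretize the third variable into bands, then facet on the bands
d$zbin <- cut(d$z, breaks = 3)
ggplot(d, aes(x, y)) + geom_point() + facet_wrap(~ zbin)
```

This trades the hard-to-read color gradient for one small panel per band of z, at the cost of losing z's exact values.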
[R] rolling regression
Dear all, I have spent the last few days on a seemingly simple and previously documented rolling regression. I have a 60-year data set organized in a ts matrix. The matrix has 5 columns: cash_ret, epy1, ism1, spread1, unemp1. Based on previous help threads I have been able to come up with the following, which seems to work fine. The trouble is that I get regression coefficients but need the immediate next-period forecast.

cash_fit <- rollapply(cash_data, width = 60,
  function(x) coef(lm(cash_ret ~ epy1 + ism1 + spread1 + unemp1,
                      data = as.data.frame(x))),
  by.column = FALSE, align = "right")
cash_fit

I tried to replace coef above with predict, but I get a whole bunch of results too big to display. I would be grateful if someone could guide me on how to get the next-period forecast after each regression. If there is also a way of getting the significance of each regressor and the standard error, in addition to R-squared, without spending the next week on it, that would be helpful as well. Many thanks, Darius
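One way to get the one-step-ahead forecast, sketched under the assumption that cash_data is the poster's ts matrix with the column names above, is to loop over the windows explicitly and call predict() on the row just after each window:

```r
## one-step-ahead forecast from each 60-period window;
## cash_data is assumed to be the poster's ts matrix
cash_df <- as.data.frame(cash_data)
n <- nrow(cash_df)
w <- 60

fc <- sapply(seq_len(n - w), function(i) {
  train <- cash_df[i:(i + w - 1), ]
  fit <- lm(cash_ret ~ epy1 + ism1 + spread1 + unemp1, data = train)
  predict(fit, newdata = cash_df[i + w, , drop = FALSE])  # forecast period i + w
})

## per-window diagnostics, if wanted: inside the function, use
##   s <- summary(fit); coef(s)       # estimates, std. errors, t, p
##   s$r.squared                      # R-squared
```

This is slower than a single rollapply call but makes the "next period" explicit, and the same window function can return the coefficient table and R-squared alongside the forecast.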
Re: [R] patients.txt data
Hi, On Sun, Oct 2, 2011 at 4:31 PM, Melhem, Nadine mel...@upmc.edu wrote: I'm new to learning R. I'm taking a course and will need access to the patients.txt data to be able to do the exercises required using this dataset.

Without more context, I'm doubtful that anybody will be able to help you. I reckon your best bet will be to ask your instructor where you can find this sample data. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
[R] How to format R superscript 2 followed by = value
I am trying to put an R-squared value on a plot, with the R2 formatted with a superscript 2, followed by "=" and the value. The first mtext below prints the R2 correctly formatted but follows it with round(summary(mylm)$r.squared,3) as literal text; the second prints "R^2 =" followed by the value of round(summary(mylm)$r.squared,3). How do I correctly write the expression to get a formatted R2 followed by the value?

x <- runif(10)
y <- runif(10)
summary(mylm <- lm(y ~ x))
plot(x, y)
abline(mylm)
mtext(expression(paste(R^2, "=", round(summary(mylm)$r.squared, 3))), 1)
mtext(paste(expression(R^2), "=", round(summary(mylm)$r.squared, 3)), 3)

thanks Nevil Amos
[R] function recode within sapply
Dear List, I am using function recode, from package car, within sapply, as follows:

L3 <- LETTERS[1:3]
(d <- data.frame(cbind(x = 1, y = 1:10),
                 fac1 = sample(L3, 10, replace = TRUE),
                 fac2 = sample(L3, 10, replace = TRUE),
                 fac3 = sample(L3, 10, replace = TRUE)))
str(d)
d[, c("fac1", "fac2")] <- sapply(d[, c("fac1", "fac2")], recode,
                                 "c('A','B') = 'XX'", as.factor.result = TRUE)
d[, "fac3"] <- recode(d[, "fac3"], "c('A','B') = 'XX'")
str(d)

However, the class of columns fac1 and fac2 is character as opposed to factor, even though I specify the option as.factor.result = TRUE; this option works fine with a single column. Any thoughts? Many thanks, Lara
Re: [R] How to format R superscript 2 followed by = value
Hi Nevil, Here is one option:

## function definition
r2format <- function(object, digits = 3, output, sub, expression = TRUE, ...) {
  if (inherits(object, "lm")) {
    x <- summary(object)
  } else if (inherits(object, "summary.lm")) {
    x <- object
  } else stop("object is an unmanageable class")
  out <- format(x$r.squared, digits = digits)
  if (!missing(output)) {
    output <- gsub(sub, out, output)
  } else {
    output <- out
  }
  if (expression) {
    output <- parse(text = output)
  }
  return(output)
}

## model
m <- lm(mpg ~ hp * wt, data = mtcars)

## demonstration
r2format(object = m, output = "R^2 == rval", sub = "rval", expression = TRUE)

## your problem
x <- runif(10)
y <- runif(10)
mylm <- lm(y ~ x)
plot(x, y)
abline(mylm)

## simplified version of demo
mtext(r2format(m, 3, "R^2 == rval", "rval"), 3)

The real key is using == instead of =. The lengthy response is because I have been toying with different stylers and formatters to try to facilitate getting output from R into publication format, so I was interested in playing with this and thinking about what might be useful abstractions. Anyway, more specific to your usage might be something like:

substitute(expression(R^2 == rval), list(rval = round(summary(mylm)$r.squared, 3)))

Cheers, Josh On Sun, Oct 2, 2011 at 9:49 PM, Nevil Amos nevil.a...@gmail.com wrote: I am trying to put an R2 value on a plot, with R2 formatted with a superscript 2, followed by "=" and the value. How do I correctly write the expression to get a formatted R2 followed by the value?
-- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
Re: [R] function recode within sapply
Hi Lara, Use lapply here instead of sapply, or specify simplify = FALSE. See ?sapply for details.

d[, c("fac1", "fac2")] <- lapply(d[, c("fac1", "fac2")], recode,
                                 "c('A','B') = 'XX'", as.factor.result = TRUE)
d[, "fac3"] <- recode(d[, "fac3"], "c('A','B') = 'XX'")
str(d)

Cheers, Josh -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/