[R] Error message related to 'weights for observations' argument in glmpath

2011-08-24 Thread Robert Nee
Hello-

I am new to R and am trying to use glmpath for LASSO logistic regression. My 
data set contains 40 observations and a mix of dichotomous, categorical, and 
continuous potential predictor variables.

I am having a problem with the 'weight = rep(1, n)' argument that relates to an 
optional vector of weights for observations.

I would like to have the same weight for all 40 observations. However, if I 
enter 'weight = rep(1, 40)' as part of my code, I get the following error 
message:

"Error in if (length(weight) != n) { : argument is of length zero".

I would appreciate any help that people can offer and apologize in advance if I 
have missed an obvious answer.
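
A minimal sketch of how this particular message can arise in base R, assuming (this is not checked against the glmpath source) that n is derived from the predictor object and comes out empty. nrow() of a plain vector is NULL, and comparing a length with NULL gives a zero-length logical, which if() rejects. The names x_vec, n and cond are illustrative only.

```r
# Hypothetical stand-in: a plain numeric vector rather than a matrix.
x_vec <- rnorm(40)
weight <- rep(1, 40)

n <- nrow(x_vec)               # NULL, because a vector has no dim attribute
cond <- length(weight) != n    # comparison with NULL gives logical(0)

# if() needs a single TRUE/FALSE, so a zero-length condition is an error.
msg <- tryCatch(
  if (cond) "lengths differ" else "lengths match",
  error = function(e) conditionMessage(e)
)
msg

# With a one-column matrix the same check works.
n_ok <- nrow(as.matrix(x_vec))
length(weight) != n_ok
```

If that is what is happening here, checking is.matrix() and dim() on the predictor object before the call would be a reasonable first step.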

Kind regards,

Bob

Robert J. Nee, PhD Candidate
Division of Physiotherapy
School of Health and Rehabilitation Sciences
The University of Queensland
St. Lucia, QLD 4072
AUSTRALIA
r@uq.edu.au





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] question on silhouette colours

2011-08-24 Thread Gordon Robertson
I'm fairly new to the silhouette functionality in the cluster package, so 
apologize if I'm asking something naive. 

If I run the 'agnes(ruspini)' example from the silhouette section of the 
cluster package vignette, and assign colours to clusters, two clusters have 
what appear to be incorrect colours in the silhouette plot. 

library(cluster)
data(ruspini)
ar <- agnes(ruspini)
si3 <- silhouette(cutree(ar, k = 5), daisy(ruspini))
# 1. This gives a mid-gray silhouette plot, which does not show the problem
plot(si3, nmax = 80, cex.names = 0.5)
# 2. This gives a multicolour silhouette plot, but there are three black
#    lines/bars in the yellow cluster, and the cluster that should be black is
#    actually yellow?
plot(si3, nmax = 80, cex.names = 0.5,
     col = c("red", "blue", "yellow", "black", "green"))
# 3. Check sorting by writing out sorted results to a file, then plotting from
#    the file
si3.sorted <- sortSilhouette(si3)
write.table(si3.sorted, "/...myPath.../si3.sorted.txt", sep = "\t")

Inspecting the si3.sorted.txt file, cluster numbers are ordered as expected 
(1's then 2's then...), and sil_width's within each cluster appear correctly 
sorted (descending). Given this, if I load the file into say Mathematica, and 
plot it with colours, I easily generate a graphic that is like the one from R, 
but in which all cluster colours are as expected, i.e. there are no black bars 
in the yellow region, and the cluster that should be black -is- black. 

Again, I apologize if I'm missing something simple. Thanks for your help in 
understanding this behaviour.

Gordon
--
sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 
other attached packages:
[1] cluster_1.14.0
loaded via a namespace (and not attached):
[1] tools_2.13.1

--
Gordon Robertson
BC Cancer Agency Genome Sciences Centre



Re: [R] Suppressing error messages printed in xyplot() with panel function

2011-08-24 Thread Adam Zeilinger

Dear Deepayan and Dennis,

Both of your proposed fixes worked perfectly.  Thank you!

Adam


On 8/24/2011 9:47 PM, Deepayan Sarkar wrote:
> On Thu, Aug 25, 2011 at 1:00 AM, Adam Zeilinger  wrote:
>> Hello,
>>
>> I am using the xyplot() function to create a series of scatterplot panels
>> with lines of best fit.  To draw the lines of best fit for each panel, I am
>> using a panel function.  Here's an example:
>>
>>> species <- as.character(c(rep(list("A", "B", "A"), 10), "B"))
>>> year <- as.character(c(rep(list("2009", "2009", "2010"), 10), "2010"))
>>> x <- rnorm(31, mean = 50, sd = 1)
>>> y <- 3*x + 20
>>> ex.data <- data.frame(cbind(species, year, x, y))
>>> ex.data$x <- as.numeric(ex.data$x)
>>> ex.data$y <- as.numeric(ex.data$y)
>>> xyplot(y ~ x|species*year, data = ex.data,
>> +   panel = function(x, y) {
>> +   panel.xyplot(x, y, pch=16, col="black")
>> +   panel.abline(lm(y ~ x))})
>>
>> With my data set, there are some panels with less than 2 data points.  In
>> these panels, an error message is printed in the panel, something like:
>> "Error using packet 4 missing value where TRUE/FALSE needed."
>>
>> In the panels with error messages, I want to keep the panels but suppress
>> the error message, such that the panel is blank or has only one datum.  How
>> do I suppress the printing of the error message?
>
> Generally speaking, suppressing error messages should not be your
> first instinct. If you know why the error is happening, it's better to
> make sure it never happens, e.g., with something like
>
>    if (length(x) >= 2) panel.abline(lm(y ~ x))
>
> Of course, as Dennis says, in this case you can take advantage of the
> 'type' argument, which already implements what you want.
>
> -Deepayan



--

Adam Zeilinger
Ph. D Candidate
Conservation Biology Program
University of Minnesota
Saint Paul, MN
www.linkedin.com/in/adamzeilinger



Re: [R] How to combine two learned regression models?

2011-08-24 Thread David Winsemius


On Aug 25, 2011, at 12:38 AM, Andra Isan wrote:


> Hi All,
>
> I have a set of features of size p and I would like to separate my
> feature space into two sets so that p = p1 + p2, p1 is a set of
> features and p2 is another set of features and I want to fit a glm
> model for each set of features separately.


You will be extracting the parameters and creating a summary variable
for each model? That sounds pretty straightforward. That's not the
point you have questions about, right? Let's call the functions that
you create p1pred and p2pred.


> Then I want to combine the results of two glm models with a
> parameter beta. For example, beta * F(p1) + (1-beta) * F(p2) where
> F(p1) is a learned model for feature set p1 and F(p2) is the learned
> model for feature set p2. Is there any way to do that in R?


In any GLM fitting program that provides an offset term (as does R's  
glm(.) ), you can construct:


y = beta1*(p1pred  -p2pred) + offset(p2pred)

# you would create the difference score first or wrap the difference  
of predictions in the I() function.


This has the same fitted values as would:

y = beta1*p1pred + (1-beta1)*p2pred

I'm not sure about the inferential statistics. Seems to me that they
would be acceptable, but I am not a statistician.  Have you looked at
the 'flexmix' package? I suspect it's got that all mapped out in the
vignette. If it can model different components with varying
distributions, it should be able to model "parceled features". Too bad
I don't understand the theoretical presentation.


http://finzi.psych.upenn.edu/R/library/flexmix/doc/regression-examples.pdf
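
A minimal sketch of the offset construction described above, on simulated data. The data frame d, the feature names x1 to x4, and the choice of the linear predictor as p1pred and p2pred are assumptions made for illustration; the "0 +" (no intercept) is also an assumption, added so that the fitted combination has exactly the form beta * p1pred + (1 - beta) * p2pred.

```r
set.seed(42)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.8 * d$x1 - 0.6 * d$x3))

# One glm per feature set (here p1 = {x1, x2} and p2 = {x3, x4}).
fit1 <- glm(y ~ x1 + x2, family = binomial, data = d)
fit2 <- glm(y ~ x3 + x4, family = binomial, data = d)

# The linear predictors play the role of p1pred and p2pred.
d$p1pred <- unname(predict(fit1, type = "link"))
d$p2pred <- unname(predict(fit2, type = "link"))

# Estimate beta in beta * (p1pred - p2pred) + p2pred, with p2pred as offset.
comb <- glm(y ~ 0 + I(p1pred - p2pred) + offset(p2pred),
            family = binomial, data = d)
beta <- unname(coef(comb))
beta

# The same linear predictor written as beta * p1pred + (1 - beta) * p2pred.
eta_manual <- beta * d$p1pred + (1 - beta) * d$p2pred
all.equal(as.numeric(comb$linear.predictors), as.numeric(eta_manual))
```

Nothing constrains beta to lie between 0 and 1 in this fit; if that matters, the estimate needs to be checked afterwards.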



> There is a package called mixtools which can fit a mixture of two
> regression models but it does not separate the features. I would
> also like to separate features and fit a model for each feature set
> and then combine them.
>
> Thanks,
> Andra


--

David Winsemius, MD
West Hartford, CT



Re: [R] How to store the output of a loop into a matrix??

2011-08-24 Thread Dennis Murphy
Hi:

It's straightforward to write a simple function to do this:

simmat <- function(N, t1) matrix(rep(runif(N, min = -1, max = 1), each
= t1), nrow = N, byrow = TRUE)
v <- simmat(45, 10)
head(v, 3)
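
If the N*T x 1 layout from the original post is what is needed, the loop itself can also be repaired. A minimal sketch, assuming the intent was one block of T consecutive rows per value; n_rep replaces T here, and set.seed() is only there to make the sketch reproducible.

```r
set.seed(123)
N <- 45
n_rep <- 10                    # plays the role of T in the original post
a <- matrix(runif(N, min = -1, max = 1), nrow = N)

mymat <- matrix(NA_real_, nrow = N * n_rep, ncol = 1)
for (i in seq_len(N)) {        # the original looped over i:N before i existed
  rows <- ((i - 1) * n_rep + 1):(i * n_rep)  # the n_rep rows reserved for a[i, ]
  mymat[rows, 1] <- rep(a[i, ], n_rep)       # fill the whole block, not one row
}
dim(mymat)
```

The original assignment mymat[i,] <- b tried to put T values into a single row of a one-column matrix, which is why the full output never ended up in mymat.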

HTH,
Dennis

PS: T is predefined in R as shorthand for the logical TRUE (it is an
ordinary variable rather than a reserved word, so reassigning it silently
changes its meaning) and should not be used as a variable name; similarly,
t is the name of a very common function (for matrix transpose) so it
shouldn't be used as a variable name, either. This is why I chose t1 in
the function above.

On Wed, Aug 24, 2011 at 6:10 PM, Soberon Velez, Alexandra Pilar
 wrote:
> Hello,
>
> I want to create a matrix of N random numbers with a uniform distributions. 
> Later, I want to repeat T times each row of this matrix. For this I do the 
> following loop:
>
> N<-45
> T<-10
> n<-N*T
> a<-matrix(runif(N,min=-1,max=1),nr=N)
>
> mymat<-matrix(rep(NA,n),nr=n,nc=1)
> for(i in i:N){
> b<-rep(a[i,],T)
> mymat[i,]<-b
> }
>
> My problem is that with this loop I can see the output of the loop but I 
> cannot get the matrix (mymat) that contains the full output of the loop.
>
> Please, somebody know how can I create a matrix that contains the output of 
> the loop with a dimension NTx1 (450x1).
>
> Thanks a lot for your help.
> Alexandra


Re: [R] dput data frame

2011-08-24 Thread Dennis Murphy
Hi:

Try this instead:

m <- matrix(rpois(400000, 10), nrow = 10000)
dim(m)
[1] 10000    40
r <- m[1:10, 3:6]
dput(r)
structure(c(12, 7, 15, 8, 6, 7, 14, 10, 11, 4, 9, 16, 12, 5,
9, 10, 9, 9, 8, 7, 12, 9, 10, 12, 12, 11, 11, 8, 12, 8, 15, 21,
3, 3, 13, 9, 8, 13, 7, 11), .Dim = c(10L, 4L))

# Alternatively,
dput(r <- m[1:10, 3:6])
structure(c(12, 7, 15, 8, 6, 7, 14, 10, 11, 4, 9, 16, 12, 5,
9, 10, 9, 9, 8, 7, 12, 9, 10, 12, 12, 11, 11, 8, 12, 8, 15, 21,
3, 3, 13, 9, 8, 13, 7, 11), .Dim = c(10L, 4L))
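
One common reason a subset of a data frame still produces a very long dput() is that factor columns keep all of their original levels. The original post does not say whether the columns are factors, so this is only a sketch under that assumption; the data frame results and its columns are made up for illustration.

```r
# Hypothetical data frame with a factor column that has 1000 levels.
results <- data.frame(
  id    = factor(sprintf("s%04d", 1:1000)),
  value = 1:1000
)

small <- results[1:10, ]
nlevels(small$id)              # the subset still carries every level

small_clean <- droplevels(small)
nlevels(small_clean$id)        # only the levels that actually occur

# Compare the size of the two dput() representations.
len_before <- sum(nchar(capture.output(dput(small))))
len_after  <- sum(nchar(capture.output(dput(small_clean))))
c(before = len_before, after = len_after)
```

If the columns are factors, dput(droplevels(results[1:10, 3:6])) should give the short output that was expected.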

HTH,
Dennis

On Wed, Aug 24, 2011 at 3:48 PM, Jeffrey Joh  wrote:
>
> I have a data frame that is about 40 columns by 10000 rows.  I want to get 
> the dput of small portion of that by using dput(results[1:10,3:6]).  The dput 
> is very long and includes all the values from the original data frame.  Why 
> is that?
>
>
>
> Jeffrey


Re: [R] Creating new variable with maximum visit date by group_id

2011-08-24 Thread Dennis Murphy
Hi:

Since you tried several functions (reasonably so IMO), here is how
they would work in this problem, in addition to the solutions already
supplied.

Some data massaging before starting, taking your data as input, saved
into an object named visits:

visits <- structure(list(unique_id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 4L, 4L), visit_date = structure(c(14615, 14975, 14980, 14245,
14250, 14615, 14975, 14981, 13879, 14245, 14610), class = "Date")),
.Names = c("unique_id",
"visit_date"), row.names = c(NA, -11L), class = "data.frame")

# plyr package:
library(plyr)
ddply(visits, .(unique_id), transform, last_visit_date = max(visit_date))

# Faster version, using the more recent function mutate() inside ddply():
ddply(visits, .(unique_id), mutate, last_visit_date = max(visit_date))

# data.table:
library(data.table)
# Create the data table from a data frame, using unique_id as a key:
visDT <- data.table(visits, key = 'unique_id')

#  list() is used to output multiple variables:
visDT[, list(visit_date, last_visit_date = max(visit_date)), by = 'unique_id']

# doBy package:
library(doBy)
transformBy(~ unique_id, data = visits, last_visit_date = max(visit_date))

# base package, using transform():
transform(visits, last_visit_date = ave(visit_date, unique_id, FUN = max))

# ...which you could have gotten from the original data as follows, including
# the conversion of visit_date to a Date variable assuming it was read in as
# character rather than factor, as below:

visits0 <- structure(list(unique_id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 4L, 4L), visit_date = c("01/06/2010", "01/01/2011", "01/06/2011",
"01/01/2009", "01/06/2009", "01/06/2010", "01/01/2011", "01/07/2011",
"01/01/2008", "01/01/2009", "01/01/2010")), .Names = c("unique_id",
"visit_date"), row.names = c(NA, -11L), class = "data.frame")
str(visits0)

within(visits0, {
 visit_date <- as.Date(visit_date, format = '%m/%d/%Y')
 last_visit_date <- ave(visit_date, unique_id, FUN = max)
   }  )

# All of the above produce
   unique_id visit_date last_visit_date
1          1 2010-01-06      2011-01-06
2          1 2011-01-01      2011-01-06
3          1 2011-01-06      2011-01-06
4          2 2009-01-01      2011-01-07
5          2 2009-01-06      2011-01-07
6          2 2010-01-06      2011-01-07
7          2 2011-01-01      2011-01-07
8          2 2011-01-07      2011-01-07
9          3 2008-01-01      2008-01-01
10         4 2009-01-01      2010-01-01
11         4 2010-01-01      2010-01-01

With basic summaries such as this, there is a wealth of options
available. Using mutate() inside ddply() is faster than using transform(),
data.table can be very fast, especially in large data sets, and the
within() statement above shows how to perform the entire task
(including conversion to dates) in one fell swoop. Notice that by
using within(), one can convert visit_date to a Date object and then
use it as input to the next function.
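
The "replacement has fewer rows" error mentioned in the original post is what one would expect from assigning a tapply() result directly, since tapply() returns one value per group rather than one per row. A minimal sketch of expanding it back to row length, using the same eleven visits as above (the dates are typed in directly here so the block stands alone):

```r
visits <- data.frame(
  unique_id  = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 4L),
  visit_date = as.Date(c("2010-01-06", "2011-01-01", "2011-01-06",
                         "2009-01-01", "2009-01-06", "2010-01-06",
                         "2011-01-01", "2011-01-07", "2008-01-01",
                         "2009-01-01", "2010-01-01"))
)

# One maximum per unique_id (4 values), computed on the underlying day counts
# so that the Date class is not an issue inside tapply().
last_by_id <- tapply(as.numeric(visits$visit_date), visits$unique_id, max)
length(last_by_id)

# Look each row's group up by name to get one value per row (11 values),
# then convert the day counts back to Date.
per_row <- as.vector(last_by_id[as.character(visits$unique_id)])
visits$last_visit_date <- as.Date(per_row, origin = "1970-01-01")
visits
```

ave(), as used above, does this expansion in one step, which is why it is usually the more convenient choice for this task.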

You should be able to do this with the mysql package, too, but my SQL
programming skills are pretty limited so I'll pass on that one. I made
a weak effort but no.

HTH,
Dennis


On Wed, Aug 24, 2011 at 2:15 PM, Kathleen Rollet
 wrote:
>
>
>
>
>
> Dear R users,
>
> I am encountering the following problem: I have a dataset with a 'unique_id' 
> and different 'visit_date' (formatted as.Date, "%d/%m/%Y") per unique_id. I 
> would like to create a new variable with the most recent date of visit per 
> unique_id as shown below.
>
> unique_id visit_date last_visit_date
> 1  01/06/2010  01/06/2011
> 1  01/01/2011  01/06/2011
> 1  01/06/2011  01/06/2011
> 2  01/01/2009  01/07/2011
> 2  01/06/2009  01/07/2011
> 2  01/06/2010  01/07/2011
> 2  01/01/2011  01/07/2011
> 2  01/07/2011  01/07/2011
> 3  01/01/2008  01/01/2008
> 4  01/01/2009  01/01/2010
> 4  01/01/2010  01/01/2010
>
> I know the coding to easily do this in Stata, SAS, and Excel but I cannot 
> find how to do it in R. I tried multiple functions such as tapply( ), ave( ), 
> ddply ( ), and transform ( ) after looking into previous postings. The code 
> runs but only NA values are generated or I get error messages that the 
> replacement has fewer rows than the data has (there are about 1000 unique_id 
> and over 4000 rows in my dataset presently).
> I would greatly appreciate it if someone could help me.
>
> Thank you!
>
> Kathleen R.
> Epidemiologist
> Montreal, QC, Canada


Re: [R] Suppressing error messages printed in xyplot() with panel function

2011-08-24 Thread Deepayan Sarkar
On Thu, Aug 25, 2011 at 1:00 AM, Adam Zeilinger  wrote:
> Hello,
>
> I am using the xyplot() function to create a series of scatterplot panels
> with lines of best fit.  To draw the lines of best fit for each panel, I am
> using a panel function.  Here's an example:
>
>> species <- as.character(c(rep(list("A", "B", "A"), 10), "B"))
>> year <- as.character(c(rep(list("2009", "2009", "2010"), 10), "2010"))
>> x <- rnorm(31, mean = 50, sd = 1)
>> y <- 3*x + 20
>> ex.data <- data.frame(cbind(species, year, x, y))
>> ex.data$x <- as.numeric(ex.data$x)
>> ex.data$y <- as.numeric(ex.data$y)
>> xyplot(y ~ x|species*year, data = ex.data,
> +   panel = function(x, y) {
> +   panel.xyplot(x, y, pch=16, col="black")
> +   panel.abline(lm(y ~ x))})
>
> With my data set, there are some panels with less than 2 data points.  In
> these panels, an error message is printed in the panel, something like:
> "Error using packet 4 missing value where TRUE/FALSE needed."
>
> In the panels with error messages, I want to keep the panels but suppress
> the error message, such that the panel is blank or has only one datum.  How
> do I suppress the printing of the error message?

Generally speaking, suppressing error messages should not be your
first instinct. If you know why the error is happening, it's better to
make sure it never happens, e.g., with something like

   if (length(x) >= 2) panel.abline(lm(y ~ x))

Of course, as Dennis says, in this case you can take advantage of the
'type' argument, which already implements what you want.
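
A minimal sketch of the guard inside a complete panel function. The example data are rebuilt here with data.frame() directly (an assumption, so that x and y stay numeric and the grouping variables are explicit factors); the species/year pattern is the one from the original post, which leaves exactly one point in the B/2010 panel.

```r
library(lattice)

set.seed(1)
ex.data <- data.frame(
  species = factor(c(rep(c("A", "B", "A"), 10), "B")),
  year    = factor(c(rep(c("2009", "2009", "2010"), 10), "2010")),
  x       = rnorm(31, mean = 50, sd = 1)
)
ex.data$y <- 3 * ex.data$x + 20

# Number of points in the sparse panel.
n_sparse <- sum(ex.data$species == "B" & ex.data$year == "2010")

p <- xyplot(y ~ x | species * year, data = ex.data,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, pch = 16, col = "black")
              # Only fit and draw a line when there are enough points.
              if (length(x) >= 2) panel.abline(lm(y ~ x))
            })
# print(p) draws the plot; the sparse panel shows its single point and no line.
```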

-Deepayan



[R] How to combine two learned regression models?

2011-08-24 Thread Andra Isan
Hi All, 

I have a set of features of size p and I would like to separate my feature 
space into two sets so that p = p1 + p2, where p1 is one set of features and 
p2 is another, and I want to fit a glm model for each set of features 
separately. Then I want to combine the results of the two glm models with a 
parameter beta. For example, beta * F(p1) + (1-beta) * F(p2) where F(p1) is 
the learned model for feature set p1 and F(p2) is the learned model for 
feature set p2. Is there any way to do that in R? 

There is a package called mixtools which can fit a mixture of two regression 
models but it does not separate the features. I would also like to separate 
features and fit a model for each feature set and then combine them. 

Thanks,
Andra



Re: [R] (OT) Puzzled over reinstalling R on new Linux distro...

2011-08-24 Thread Peter Langfelder
On Wed, Aug 24, 2011 at 6:19 PM, Brian Lunergan  wrote:
> Evening all:
>
> Redid my home box using VectorLinux (Slackware variation) and now I'm not sure
> which game trail to follow to reinstall R.
>
> Could somebody with more knowledge in this share their thoughts off-list. Just
> need a pointer to the appropriate trail head. I can take it from there.
>

Simple installation is very, well, simple - download the
R-2.13.1.tar.gz bundle, for example from here,

http://cran.r-project.org/src/base/R-2/R-2.13.1.tar.gz

then unpack it and follow the usual steps of ./configure, make and
make install.

See the R Installation and Administration manual at
http://cran.r-project.org/doc/manuals/R-admin.html for more details,
configuration and installation options, etc.

HTH,

Peter



Re: [R] pooled hazard model with aftreg and time-dependent variables

2011-08-24 Thread JPF
This is for coxph:

The cluster term is used to compute a robust variance for the model. The
term + cluster(id) where each value of id is unique is equivalent to
specifying the robust=T argument, and produces an approximate jackknife
estimate of the variance. If the id variable were not unique, but instead
identifies clusters of correlated observations, then the variance estimate
is based on a grouped jackknife. 
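
A minimal sketch of the coxph side of this, on simulated data; the data frame d and its columns are made up for illustration. It only shows the cluster() term described above and does not cover aftreg or phreg.

```r
library(survival)

set.seed(1)
d <- data.frame(
  id     = rep(1:50, each = 2),     # two correlated rows per subject
  time   = rexp(100),
  status = rbinom(100, 1, 0.7),
  x      = rnorm(100)
)

# + cluster(id) requests the grouped-jackknife (robust) variance.
fit <- coxph(Surv(time, status) ~ x + cluster(id), data = d)

coef_table <- summary(fit)$coefficients
colnames(coef_table)                # includes a "robust se" column
```

The model-based variance is still kept in the fit as naive.var, so the two can be compared.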

How could this be done using aftreg and/or phreg?



Re: [R] Suppressing error messages printed in xyplot() with panel function

2011-08-24 Thread Dennis Murphy
Hi:

Here's one way out:

xyplot(y ~ x|species*year, data = ex.data, type = c('p', 'r'))

type = is a very useful argument to know in xyplot(). See p.75 of the
Lattice book.

HTH,
Dennis

On Wed, Aug 24, 2011 at 12:30 PM, Adam Zeilinger  wrote:
> Hello,
>
> I am using the xyplot() function to create a series of scatterplot panels
> with lines of best fit.  To draw the lines of best fit for each panel, I am
> using a panel function.  Here's an example:
>
>> species <- as.character(c(rep(list("A", "B", "A"), 10), "B"))
>> year <- as.character(c(rep(list("2009", "2009", "2010"), 10), "2010"))
>> x <- rnorm(31, mean = 50, sd = 1)
>> y <- 3*x + 20
>> ex.data <- data.frame(cbind(species, year, x, y))
>> ex.data$x <- as.numeric(ex.data$x)
>> ex.data$y <- as.numeric(ex.data$y)
>> xyplot(y ~ x|species*year, data = ex.data,
> +   panel = function(x, y) {
> +   panel.xyplot(x, y, pch=16, col="black")
> +   panel.abline(lm(y ~ x))})
>
> With my data set, there are some panels with less than 2 data points.  In
> these panels, an error message is printed in the panel, something like:
> "Error using packet 4 missing value where TRUE/FALSE needed."
>
> In the panels with error messages, I want to keep the panels but suppress
> the error message, such that the panel is blank or has only one datum.  How
> do I suppress the printing of the error message?
>
> Thanks in advance for your help.
> Adam
>
> --
>
> Adam Zeilinger
> Conservation Biology Program
> University of Minnesota


Re: [R] Bold in expression in Y label

2011-08-24 Thread David Winsemius


On Aug 24, 2011, at 9:10 PM, Katherine Lizama Allende wrote:


> Hi all:
>
> I need to put bold font in the y label, which is an expression at the same
> time
>
> When putting font.lab=2 in plot, it only puts bold font for the x axis
> label.. what should I do? Thanks very much
>
> plot(jitter(c(1, 4, 7, 9, 11, 13), a=0.1), y = Bllim.m, xlab = "Week", ylab
> = expression(paste("B removal rate", " (mg/m"^{3}, "-d)")),font.lab=2, ylim
> = c(-1000,9000), xlim = c(0, 14), pch = 22,  axes= FALSE, cex = 0.1)

I don't know why (and I was not familiar with font.lab as an
argument, nor can I find its documentation) ... but I can offer a
solution:


plot(1,1, xlab = "Week", font.lab=2,ylab =  
expression(bold(B~removal~rate~mg/m^{3}*-d)), cex = 0.1)
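
A self-contained variant of the same idea, as a sketch: the label is built once as a plotmath expression wrapped in bold(), with the text kept as quoted strings and only the exponent left as a plotmath term. The name ylab_bold and the temporary PDF file are assumptions made so the block can run without a screen device.

```r
# Bold y label with a superscript; strings are joined to m^3 with "*".
ylab_bold <- expression(bold("B removal rate (mg/" * m^3 * "-d)"))

out <- tempfile(fileext = ".pdf")
pdf(out)
# font.lab = 2 is kept for the plain-text x label.
plot(1:10, xlab = "Week", ylab = ylab_bold, font.lab = 2)
dev.off()
```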


--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] how to compare two dataset with same sampling

2011-08-24 Thread William Dunlap
merge() can align the values in the obs columns
so those with the same date can be compared.  E.g.,
set up the data with the following copy-and-pastable
code:

A <- read.table(header=TRUE, textConnection(" year  mon  day  obs
 2010   03   12   12
 2010   03   18   22
 2010   04   12   62
 2010   07   24   29
"))
B <- read.table(header=TRUE, textConnection("year  mon  day  obs
 2010   03   12   15
 2010   04   12   57
 2010   07   24   32
 2010   08   23   15
"))
AB <- merge(A, B, by=c("year", "mon", "day"), suffixes=c("A","B"))

and you can use AB to do comparisons:

> AB
  year mon day obsA obsB
1 2010   3  12   12   15
2 2010   4  12   62   57
3 2010   7  24   29   32
> with(AB, obsA - obsB)
[1] -3  5 -3

Look at help("merge") for more options.
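
One of those options, as a sketch: all = TRUE keeps the dates that appear in only one data set and fills the missing side with NA, which makes it easy to see what was left out of the comparison. The data are the same as above, entered with data.frame() so the block stands alone; AB_all is a name chosen for illustration.

```r
A <- data.frame(year = 2010, mon = c(3, 3, 4, 7), day = c(12, 18, 12, 24),
                obs = c(12, 22, 62, 29))
B <- data.frame(year = 2010, mon = c(3, 4, 7, 8), day = c(12, 12, 24, 23),
                obs = c(15, 57, 32, 15))

# Keep unmatched dates from both sides; the missing obs become NA.
AB_all <- merge(A, B, by = c("year", "mon", "day"),
                suffixes = c("A", "B"), all = TRUE)
AB_all

# Differences are only defined where both data sets have an observation.
matched <- AB_all[complete.cases(AB_all), ]
with(matched, obsA - obsB)
```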

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Jie TANG
> Sent: Wednesday, August 24, 2011 6:57 PM
> To: Jeff Newmiller; r-help@r-project.org
> Subject: Re: [R] how to compare two dataset with same sampling
> 
> thanks.
>  Merge? I am just looking for a method to comparison on homogeneous sample
> between two dataset.
> 
> 
> 2011/8/25 Jeff Newmiller 
> 
> > ?merge may be what you are looking for. If not, you should clarify what you
> > want to do.
> > ---
> > Jeff Newmiller The . . Go Live...
> > DCN: Basics: ##.#. ##.#. Live Go...
> > Live: OO#.. Dead: OO#.. Playing
> > Research Engineer (Solar/Batteries O.O#. #.O#. with
> > /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
> > ---
> >
> > Sent from my phone. Please excuse my brevity.
> >
> > Jie TANG  wrote:
> >
> >> hi ,
> >> Now I have two dataset and want to compare them with same sample.
> >> Dataset A:
> >>  year  mon  day  obs
> >>  2010   03   12   12
> >>  2010   03   18   22
> >>  2010   04   12   62
> >>  2010   07   24   29
> >>
> >> Dataset B:
> >>  year  mon  day  obs
> >>  2010   03   12   15
> >>  2010   04   12   57
> >>  2010   07   24   32
> >>  2010   08   23   15
> >>
> >>
> >> As you see,dataset A and B have several observation data but their obs data
> >> is not same everyday.
> >> Now I want to compare the data from two dataset in the same time.
> >> How could I write the R script?
> >>
> >> errdata<-subset(A,A$V11==Bdata$V11 & A$V12==B$V12)
> >>
> >> this command seems failed.How could I?
> >>
> 
> 
> --
> TANG Jie
> Email: totang...@gmail.com
> 


Re: [R] Bold in expression in Y label

2011-08-24 Thread Jorge I Velez
Hi Kathy,

Try

plot(10, xlab = "Week",
     ylab = expression(bold("B removal rate "*"(mg/"*m^3*"-d)")))

HTH,
Jorge


On Wed, Aug 24, 2011 at 9:10 PM, Katherine Lizama Allende <> wrote:

> Hi all:
>
> I need to put bold font in the y label, which is an expression at the same
> time
>
> When putting font.lab=2 in plot, it only puts bold font for the x axis
> label.. what should I do? Thanks very much
>
> plot(jitter(c(1, 4, 7, 9, 11, 13), a=0.1), y = Bllim.m, xlab = "Week", ylab
> = expression(paste("B removal rate", " (mg/m"^{3}, "-d)")),font.lab=2, ylim
> = c(-1000,9000), xlim = c(0, 14), pch = 22,  axes= FALSE, cex = 0.1)
>
>
> Kathy


Re: [R] How to store the output of a loop into a matrix??

2011-08-24 Thread Jorge I Velez
Hi Alexandra,

Here is an alternative without using a loop:

matrix(sapply(a, rep, T), ncol = 1)
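
An equivalent formulation, as a sketch, uses the each argument of rep() and avoids relying on T as a variable name. The names n_rep and stacked are assumptions for illustration; the last line checks that both approaches give the same 450 x 1 matrix.

```r
set.seed(123)
N <- 45
n_rep <- 10                    # plays the role of T in the original post
a <- matrix(runif(N, min = -1, max = 1), nrow = N)

# Repeat each value n_rep times in a row, then shape as a single column.
stacked <- matrix(rep(a[, 1], each = n_rep), ncol = 1)
dim(stacked)

# Same result as the sapply() version above.
same <- isTRUE(all.equal(stacked, matrix(sapply(a, rep, n_rep), ncol = 1)))
same
```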

HTH,
Jorge


On Wed, Aug 24, 2011 at 9:10 PM, Soberon Velez, Alexandra Pilar <> wrote:

> Hello,
>
> I want to create a matrix of N random numbers with a uniform distributions.
> Later, I want to repeat T times each row of this matrix. For this I do the
> following loop:
>
> N<-45
> T<-10
> n<-N*T
> a<-matrix(runif(N,min=-1,max=1),nr=N)
>
> mymat<-matrix(rep(NA,n),nr=n,nc=1)
> for(i in i:N){
> b<-rep(a[i,],T)
> mymat[i,]<-b
> }
>
> My problem is that with this loop I can see the output of the loop but I
> cannot get the matrix (mymat) that contains the full output of the loop.
>
> Please, somebody know how can I create a matrix that contains the output of
> the loop with a dimension NTx1 (450x1).
>
> Thanks a lot for your help.
> Alexandra


Re: [R] how to compare two dataset with same sampling

2011-08-24 Thread Jie TANG
Thanks.
Merge? I am just looking for a method to compare homogeneous samples
between the two datasets.


2011/8/25 Jeff Newmiller 

> ?merge may be what you are looking for. If not, you should clarify what you
> want to do.
> ---
> Jeff Newmiller The . . Go Live...
> DCN: Basics: ##.#. ##.#. Live Go...
> Live: OO#.. Dead: OO#.. Playing
> Research Engineer (Solar/Batteries O.O#. #.O#. with
> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
> ---
>
> Sent from my phone. Please excuse my brevity.
>
> Jie TANG  wrote:
>
>> hi ,
>> Now I have two dataset and want to compare them with same sample.
>> Dataset A:
>>  year  mon  day  obs
>>  2010   03   12   12
>>  2010   03   18   22
>>  2010   04   12   62
>>  2010   07   24   29
>>
>> Dataset B:
>>  year  mon  day  obs
>>  2010   03   12   15
>>  2010   04   12   57
>>  2010   07   24   32
>>  2010   08   23   15
>>
>>
>> As you see,dataset A and B have several observation data but their obs data
>> is not same everyday.
>> Now I want to compare the data from two dataset in the same time.
>> How could I write the R script?
>>
>> errdata<-subset(A,A$V11==Bdata$V11 & A$V12==B$V12)
>>
>> this command seems failed.How could I?
>>


-- 
TANG Jie
Email: totang...@gmail.com



[R] Bold in expression in Y label

2011-08-24 Thread Katherine Lizama Allende
Hi all:

I need to make the y-axis label bold, but the label is also an expression.

When I put font.lab=2 in plot(), only the x-axis label comes out bold.
What should I do? Thanks very much.

plot(jitter(c(1, 4, 7, 9, 11, 13), a=0.1), y = Bllim.m, xlab = "Week",
     ylab = expression(paste("B removal rate", " (mg/m"^{3}, "-d)")),
     font.lab=2, ylim = c(-1000,9000), xlim = c(0, 14), pch = 22,
     axes= FALSE, cex = 0.1)
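
One possible approach, offered as a sketch and not as an answer taken from the thread: as the post observes, font.lab=2 does not seem to reach a label given as an expression, so the bold face can be requested inside the expression itself with plotmath's bold(). The Bllim.m values below are made up, since the post does not show them, and pdf(NULL) is only there so the example runs without opening a window.

```r
# Hypothetical values standing in for Bllim.m, which the post does not show
Bllim.m <- c(500, 1200, 3000, 4500, 6000, 8000)
week <- c(1, 4, 7, 9, 11, 13)

# Wrap the whole label in bold() so the expression itself asks for bold text
ylab_bold <- expression(bold(paste("B removal rate (mg/m"^{3}, "-d)")))

pdf(NULL)  # null device: nothing is written to disk
plot(week, Bllim.m, xlab = "Week", ylab = ylab_bold, font.lab = 2,
     ylim = c(-1000, 9000), xlim = c(0, 14), pch = 22)
invisible(dev.off())
```

See ?plotmath for the other font-face functions (italic(), bolditalic(), plain()).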


Kathy



[R] (OT) Puzzled over reinstalling R on new Linux distro...

2011-08-24 Thread Brian Lunergan
Evening all:

Redid my home box using VectorLinux (Slackware variation) and now I'm not sure 
which game trail to follow to reinstall R.

Could somebody with more knowledge in this share their thoughts off-list. Just 
need a pointer to the appropriate trail head. I can take it from there.

Regards...
-- 
Brian Lunergan
Nepean, Ontario
Canada



[R] How to store the output of a loop into a matrix??

2011-08-24 Thread Soberon Velez, Alexandra Pilar
Hello,

I want to create a matrix of N random numbers drawn from a uniform distribution. 
Later, I want to repeat each row of this matrix T times. For this I use the 
following loop:

N<-45
T<-10
n<-N*T
a<-matrix(runif(N,min=-1,max=1),nr=N)

mymat<-matrix(rep(NA,n),nr=n,nc=1)
for(i in i:N){
b<-rep(a[i,],T)
mymat[i,]<-b
}

My problem is that with this loop I can see the output of the loop, but I cannot 
get the matrix (mymat) that contains the full output of the loop.

Does anybody know how I can create a matrix of dimension NT x 1 (450 x 1) that 
contains the output of the loop?
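
A minimal sketch of one way to get the 450 x 1 matrix, not taken from the thread. It assumes the intent is for each of the N values to appear T times in a row. Two things stand out in the loop as posted: `i:N` uses `i` before it exists (probably meant `1:N`), and `mymat[i,]` has room for one value while `b` holds T of them. The name n_rep is used here in place of T only to avoid masking R's shorthand for TRUE.

```r
N <- 45
n_rep <- 10                      # plays the role of T in the post
a <- matrix(runif(N, min = -1, max = 1), nrow = N)

# Vectorised: rep(..., each = ) repeats every element n_rep times in place
mymat <- matrix(rep(a[, 1], each = n_rep), ncol = 1)

# Equivalent loop: row i of 'a' fills a block of n_rep consecutive rows
mymat_loop <- matrix(NA_real_, nrow = N * n_rep, ncol = 1)
for (i in seq_len(N)) {
  rows <- ((i - 1) * n_rep + 1):(i * n_rep)
  mymat_loop[rows, 1] <- a[i, 1]
}

dim(mymat)
```

The vectorised form is usually preferable; the loop is shown only because it mirrors the original attempt.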

Thanks a lot for your help.
Alexandra



Re: [R] how to compare two dataset with same sampling

2011-08-24 Thread Jeff Newmiller
?merge may be what you are looking for. If not, you should clarify what you 
want to do.
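
A minimal sketch of the merge() idea, assuming the two tables from the question are data frames named A and B with columns year, mon, day and obs (the V11/V12 names in the posted subset() call are not used here). By default merge() keeps only the key combinations present in both inputs, which gives the common dates.

```r
# Data copied from the question; the data frame layout is an assumption
A <- data.frame(year = 2010, mon = c(3, 3, 4, 7), day = c(12, 18, 12, 24),
                obs = c(12, 22, 62, 29))
B <- data.frame(year = 2010, mon = c(3, 4, 7, 8), day = c(12, 12, 24, 23),
                obs = c(15, 57, 32, 15))

# Inner join on the date columns; obs from each source gets its own suffix
both <- merge(A, B, by = c("year", "mon", "day"), suffixes = c(".A", ".B"))

# Compare the two sources on the shared dates
both$diff <- both$obs.A - both$obs.B
both
```

Adding all = TRUE to the merge() call would keep the unmatched dates as well, with NA in the missing obs column.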
---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Jie TANG  wrote:

hi ,
Now I have two dataset and want to compare them with same sample.
Dataset A:
year mon day obs
2010 03 12 12
2010 03 18 22
2010 04 12 62
2010 07 24 29

Dataset B:
year mon day obs
2010 03 12 15
2010 04 12 57
2010 07 24 32
2010 08 23 15


As you see,dataset A and B have several observation data but their obs data
is not same everyday.
Now I want to compare the data from two dataset in the same time.
How could I write the R script?

errdata<-subset(A,A$V11==Bdata$V11 & A$V12==B$V12)

this command seems failed.How could I?



[R] how to compare two dataset with same sampling

2011-08-24 Thread Jie TANG
Hi,
I have two datasets and want to compare them on the same sample.
Dataset A:
 year  mon  day  obs
 2010  03   12   12
 2010  03   18   22
 2010  04   12   62
 2010  07   24   29

Dataset B:
 year  mon  day  obs
 2010  03   12   15
 2010  04   12   57
 2010  07   24   32
 2010  08   23   15


As you can see, datasets A and B each contain several observations, but they
do not have observations on the same days.
I want to compare the data from the two datasets at the times they have in common.
How could I write the R script?

errdata<-subset(A,A$V11==Bdata$V11 & A$V12==B$V12)

This command seems to fail. What should I do instead?



Re: [R] Boxplot orders

2011-08-24 Thread David Winsemius


On Aug 24, 2011, at 8:36 PM, Weidong Gu wrote:


By default, factor levels (here, months) are sorted alphabetically. You can
explicitly re-level months:

months<-factor(months,levels=c('Jan','Feb','Mar',...,'Dec'))



> ?Constants
> month.abb
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"


No pi in the sky, that!

--
david.


Then it should work.

Weidong Gu

On Wed, Aug 24, 2011 at 10:45 AM, Phoebe Jekielek  
 wrote:

Hi there,

I have length data of an organism over the year and I want to make a
boxplot. I get the boxplot just fine but the months are all out of order. In
the data set they are in order from Jan-Dec...how can I fix this problem?


Thanks so much in advance!!

Phoebe

   [[alternative HTML version deleted]]


David Winsemius, MD
West Hartford, CT



Re: [R] Importing data from MS EXCEL (.xls) to R XXXX

2011-08-24 Thread B77S
I agree with Ken: if you can, save it as a CSV file. But if you have a
bunch of these files, converting each one by hand isn't very efficient. I use
read.xlsx() from the package "xlsx".

I notice that you are using the full path. Have you tried changing the working
directory? I find it best to compartmentalize my work and (with a
few exceptions) work within a single folder.

good luck.

 


Dan Abner wrote:
> 
> Hello everyone,
> 
> What is the simplest, most RELIABLE way to import data from MS EXCEL
> (.xls)
> format to R? In the past I have used the read.xls() function from the
> xlsReadWrite package, however, I have been wrestling with it all afternoon
> long with no success. I continue to receive the following error message:
> 
> 
>> {widge<-read.xls("F:\\Classes\\Z1.Data\\stat.3010\\WidgeOne.xls",
> + colNames=TRUE,sheet=1)}
> Error in .Call("ReadXls", file, colNames, sheet, type, from, rowNames,  :
>   Incorrect number of arguments (11), expecting 10 for 'ReadXls'
> 
> Any insight/suggestions/assistance is appreciated.
> 
> Thank you,
> 
> Dan
> 
>   [[alternative HTML version deleted]]
> 

--
View this message in context: 
http://r.789695.n4.nabble.com/Importing-data-from-MS-EXCEL-xls-to-R--tp3766864p3767063.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Boxplot orders

2011-08-24 Thread Weidong Gu
By default, factor levels (here, months) are sorted alphabetically. You can
explicitly re-level months:

months<-factor(months,levels=c('Jan','Feb','Mar',...,'Dec'))

Then it should work.
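
A short sketch of the re-levelling, using the built-in constant month.abb for the twelve abbreviations. The data are made up (five lengths per month), and pdf(NULL) is only there so the plot call runs without opening a window.

```r
# Hypothetical data: five measurements for each month
months <- rep(month.abb, each = 5)
len <- seq_along(months) / 10

# Default factor levels are sorted alphabetically, so "Apr" comes first
default_levels <- levels(factor(months))

# Supplying the levels explicitly fixes the calendar order
months <- factor(months, levels = month.abb)

pdf(NULL)
boxplot(len ~ months, xlab = "Month", ylab = "Length")
invisible(dev.off())
```

boxplot() draws one box per factor level in level order, which is why setting the levels is enough.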

Weidong Gu

On Wed, Aug 24, 2011 at 10:45 AM, Phoebe Jekielek  wrote:
> Hi there,
>
> I have length data of an organism over the year and I want to make a
> boxplot. I get the boxplot just fine but the months are all out of order. In
> the data set they are in order from Jan-Dec...how can I fix this problem?
>
> Thanks so much in advance!!
>
> Phoebe
>
>        [[alternative HTML version deleted]]
>


Re: [R] Howto convert Linear Regression data to text

2011-08-24 Thread B77S
If I understand you correctly, see ?paste

and the following to extract the values you require:

summary(res)[[4]][1]   # intercept estimate (b)

summary(res)[[4]][2]   # slope estimate (a)

summary(res)[[8]]      # R-squared
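
A sketch of how those pieces could be assembled into a "y=ax+b, r2" string, using the data from the question. It uses the named accessors coef() and $r.squared, which return the same numbers as the positional indexing above; the three-decimal formatting is an arbitrary choice.

```r
x <- 18:29
y <- c(7.1, 7, 7.7, 8.2, 8.8, 9.7, 9.9, 7.1, 7.2, 8.8, 8.7, 8.5)
res <- lm(y ~ x)

b <- coef(res)[[1]]              # intercept
a <- coef(res)[[2]]              # slope
r2 <- summary(res)$r.squared     # coefficient of determination

# %+.3f prints the intercept with an explicit sign, so negative values read correctly
label <- sprintf("y=%.3fx%+.3f, r2=%.3f", a, b, r2)
label
```

The string can then be passed to text(), legend() or mtext() to annotate a plot.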

HTH



ashz wrote:
> 
> Dear all,
> 
> How can I convert lm data to text in the form of "y=ax+b, r2" and how do I
> calculate R-squared (r2)?
> 
> Thanks. 
>  
> Code:
> x=18:29
> y=c(7.1,7,7.7,8.2,8.8,9.7,9.9,7.1,7.2,8.8,8.7,8.5)
> res=lm(y~x)
> 

--
View this message in context: 
http://r.789695.n4.nabble.com/Howto-convert-Linear-Regression-data-to-text-tp3766230p3767009.html
Sent from the R help mailing list archive at Nabble.com.



[R] Package missForest changes data types?

2011-08-24 Thread nima82
Hi,

I'm trying to use the missForest package to impute a data set consisting of
mixed-type variables: mostly logical, but also ordered and non-ordered factors
and numeric variables. Although the help file of missForest states
that the resulting data matrix 'ximp' has the same type as the original data
'xmis', the ximp I get has all numeric variables with decimal numbers. Does
anyone have an explanation for this behavior? Can the data types of ximp be
controlled? If not, is there an easy way to coerce the variables back to
their original data types by rounding to the nearest levels of the factors?


Thanks

Nima

--
View this message in context: 
http://r.789695.n4.nabble.com/Package-missForest-changes-data-types-tp3766921p3766921.html
Sent from the R help mailing list archive at Nabble.com.



[R] Refit for flexmix

2011-08-24 Thread ericmak
Hi all,

Just a small question: after fitting a multivariate mixture using flexmix, I
wish to use refit to get the parameters of the covariates and their standard
errors. However, using refit I can only see the components for the first
dependent variable. What should I do if I want to see the others?

Thanks a lot,
Eric

--
View this message in context: 
http://r.789695.n4.nabble.com/Refit-for-flexmix-tp3766984p3766984.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] dput data frame

2011-08-24 Thread David Winsemius


On Aug 24, 2011, at 6:48 PM, Jeffrey Joh wrote:



I have a data frame that is about 40 columns by 1 rows.  I want  
to get the dput of small portion of that by using  
dput(results[1:10,3:6]).  The dput is very long and includes all the  
values from the original data frame.  Why is that?




I am guessing that you have factors with numerous levels that you are  
incorrectly interpreting as "all the values from the original data  
frame". Hard to do more than guess, given the description.
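
A small sketch of the factor-level effect guessed at above, on made-up data (the real `results` data frame is not shown in the thread). Subsetting rows keeps every level of a factor column, so dput() still writes them all out; droplevels() removes the unused ones first.

```r
# Hypothetical data frame with a factor that has one level per row
results <- data.frame(id = 1:1000,
                      grp = factor(sprintf("g%04d", 1:1000)))

small <- results[1:10, ]
nlevels(small$grp)        # still counts all the original levels

small_clean <- droplevels(small)
nlevels(small_clean$grp)  # only the levels that actually occur

dput(small_clean)
```

If the long output comes from something other than factor levels, this will not help, and a small reproducible example would be needed to say more.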


--
David Winsemius, MD
West Hartford, CT



Re: [R] dput data frame

2011-08-24 Thread R. Michael Weylandt
How are you storing the elements of the data frame? I'm working with a data
frame of doubles with no names and having trouble observing the same
problem. If they are factor levels though, that *might* account for it.
sessionInfo() might also help. Obviously it's not convenient to print this
example, but if you could reproduce the irregularity on a smaller data
frame, that would be great as well.

Michael Weylandt

On Wed, Aug 24, 2011 at 6:48 PM, Jeffrey Joh  wrote:

>
> I have a data frame that is about 40 columns by 1 rows.  I want to get
> the dput of small portion of that by using dput(results[1:10,3:6]).  The
> dput is very long and includes all the values from the original data frame.
>  Why is that?
>
>
>
> Jeffrey


[R] dput data frame

2011-08-24 Thread Jeffrey Joh

I have a data frame that is about 40 columns by 1 rows.  I want to get the 
dput of small portion of that by using dput(results[1:10,3:6]).  The dput is 
very long and includes all the values from the original data frame.  Why is 
that?

 

Jeffrey   


Re: [R] Importing data from MS EXCEL (.xls) to R XXXX

2011-08-24 Thread Jorge I Velez
Hi Dan,

You might try

require(gdata)
?read.xls

HTH,
Jorge


On Wed, Aug 24, 2011 at 6:20 PM, Dan Abner wrote:

> Hello everyone,
>
> What is the simplest, most RELIABLE way to import data from MS EXCEL (.xls)
> format to R? In the past I have used the read.xls() function from the
> xlsReadWrite package, however, I have been wrestling with it all afternoon
> long with no success. I continue to receive the following error message:
>
>
> > {widge<-read.xls("F:\\Classes\\Z1.Data\\stat.3010\\WidgeOne.xls",
> + colNames=TRUE,sheet=1)}
> Error in .Call("ReadXls", file, colNames, sheet, type, from, rowNames,  :
>  Incorrect number of arguments (11), expecting 10 for 'ReadXls'
>
> Any insight/suggestions/assistance is appreciated.
>
> Thank you,
>
> Dan
>


Re: [R] Creating new variable with maximum visit date by group_id

2011-08-24 Thread Jean V Adams
Try this:

require(zoo)
lvd <- tapply(df$visit_date, df$unique_id, max)
index <- tapply(df$visit_date, df$unique_id)
df$last_visit_date <- as.Date(lvd[index])

Jean

Kathleen Rollet wrote on 08/24/2011 04:15:45 PM:
> 
> Dear R users,
> 
> I am encountering the following problem: I have a dataset with a 
> 'unique_id' and different 'visit_date' values (formatted as.Date, "%d/%m/%Y")
> per unique_id. I would like to create a new variable with the 
> most recent date of visit per unique_id as shown below.
> 
> unique_id visit_date last_visit_date 
> 1  01/06/2010  01/06/2011 
> 1  01/01/2011  01/06/2011 
> 1  01/06/2011  01/06/2011 
> 2  01/01/2009  01/07/2011 
> 2  01/06/2009  01/07/2011 
> 2  01/06/2010  01/07/2011 
> 2  01/01/2011  01/07/2011 
> 2  01/07/2011  01/07/2011 
> 3  01/01/2008  01/01/2008 
> 4  01/01/2009  01/01/2010 
> 4  01/01/2010  01/01/2010 
> 
> I know the coding to easily do this in Stata, SAS, and Excel but I 
> cannot find how to do it in R. I tried multiple functions such as 
> tapply( ), ave( ), ddply ( ), and transform ( ) after looking into 
> previous postings. The code runs, but only NA values are 
> generated or I get error messages that the replacement has fewer rows 
> than the data has (there are about 1000 unique_id and over 4000 rows
> in my dataset presently). 
> I would greatly appreciate it if someone could help me.
> 
> Thank you!
> 
> Kathleen R.
> Epidemiologist
> Montreal, QC, Canada 


[R] Fwd: Importing data from MS EXCEL (.xls) to R XXXX

2011-08-24 Thread Ken Hutchison
-- Forwarded message --
From: Ken Hutchison 
Date: Wed, Aug 24, 2011 at 6:27 PM
Subject: Re: [R] Importing data from MS EXCEL (.xls) to R 
To: Dan Abner 


save as csv.
?read.csv
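
A minimal sketch of the CSV route. The file written here is a made-up stand-in for WidgeOne.xls after it has been saved as CSV from Excel; the column names and values are hypothetical.

```r
# Hypothetical stand-in for the workbook after "Save As > CSV" in Excel
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,weight,colour",
             "1,2.5,red",
             "2,3.1,blue",
             "3,4.0,red"), csv_file)

# header = TRUE takes the column names from the first line of the file
widge <- read.csv(csv_file, header = TRUE, stringsAsFactors = FALSE)
str(widge)
```

read.csv() needs no add-on package, which is the main appeal when a reader package is misbehaving.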
   Ken


On Wed, Aug 24, 2011 at 6:20 PM, Dan Abner  wrote:

> Hello everyone,
>
> What is the simplest, most RELIABLE way to import data from MS EXCEL (.xls)
> format to R? In the past I have used the read.xls() function from the
> xlsReadWrite package, however, I have been wrestling with it all afternoon
> long with no success. I continue to receive the following error message:
>
>
> > {widge<-read.xls("F:\\Classes\\Z1.Data\\stat.3010\\WidgeOne.xls",
> + colNames=TRUE,sheet=1)}
> Error in .Call("ReadXls", file, colNames, sheet, type, from, rowNames,  :
>  Incorrect number of arguments (11), expecting 10 for 'ReadXls'
>
> Any insight/suggestions/assistance is appreciated.
>
> Thank you,
>
> Dan
>


[R] Fwd: help with "by" command

2011-08-24 Thread Ken Hutchison
-- Forwarded message --
From: Ken Hutchison 
Date: Wed, Aug 24, 2011 at 6:06 PM
Subject: Re: [R] help with "by" command
To: amalka 


?tapply
or more specifically
?ave
 Hope this helps,
 Ken


On Wed, Aug 24, 2011 at 2:51 PM, amalka  wrote:

> Hello,
>
> I am a new user of R, and I'd be grateful if someone could help me with the
> following:
>
> I would like to compute the mean of variable "trust" in dataframe "foo",
> but
> separately for each level of variable V2.  That is, I'd like to compute the
> mean of trust at each level of V2.
>
> I have done this:
>
> > tmp <- by(foo, foo$V2, function(x) mean(foo$trust, na.rm=T))
> >tmp
>
> Doing this does indeed give me a mean for variable trust at each level of
> V2
> - but the problem is that I get the exact same mean for each level of V2.
> The means are not the really the same across levels of V2, but instead of
> getting the mean for each level of V2 I get the overall mean for the
> variable in the dataframe listed over and over for each level of V2.
>
> I have been attempting to figure this out for a while now, but I just can't
> seem to figure it out.
>
> Thanks for your time!
> Ari
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/help-with-by-command-tp3766285p3766285.html
> Sent from the R help mailing list archive at Nabble.com.
>


Re: [R] Creating new variable with maximum visit date by group_id

2011-08-24 Thread David Winsemius


On Aug 24, 2011, at 5:15 PM, Kathleen Rollet wrote:


Dear R users,

I am encountering the following problem: I have a dataset with a  
'unique_id' and different 'visit_date' values (formatted as.Date, "%d/%m/%Y")  
per unique_id. I would like to create a new variable with the  
most recent date of visit per unique_id as shown below.


That should not result in what is below unless you have changed  
something in options() forcing a different data output format. (Is  
that even possible?)


unique_id visit_date last_visit_date
1  01/06/2010  01/06/2011
1  01/01/2011  01/06/2011
1  01/06/2011  01/06/2011
2  01/01/2009  01/07/2011
2  01/06/2009  01/07/2011
2  01/06/2010  01/07/2011
2  01/01/2011  01/07/2011
2  01/07/2011  01/07/2011
3  01/01/2008  01/01/2008
4  01/01/2009  01/01/2010
4  01/01/2010  01/01/2010


Read it in as dfrm named "dat" with:
colClasses=c("numeric", "character", "character")

Then:

dat$visit_date <-as.Date(dat$visit_date, format="%d/%m/%Y",  
origin="1970-01-01")
dat$last_visit_date <-as.Date(dat$last_visit_date, format="%d/%m/%Y",  
origin="1970-01-01")


I know the coding to easily do this in Stata, SAS, and Excel but I  
cannot find how to do it in R. I tried multiple functions such as  
tapply( ), ave( ), ddply ( ), and transform ( ) after looking into  
previous postings. The code runs, but only NA values are  
generated or I get error messages that the replacement has fewer rows  
than the data has (there are about 1000 unique_id and over 4000 rows  
in my dataset presently).


The 'ave' function should be able to do it. It returns a vector as  
long as the dataframe has rows.
You are asked to post your failures as well as a reproducible example,  
which is best produced with dput(). (This applies doubly when you choose  
non-standard formats for Date objects.)


Please read:
?dput
?ave

Worked example:

dat$most_recent<- format(ave(dat$visit_date, dat$unique_id, FUN=max),  
format="%d/%m/%Y")

dat

NOTE: that last column is not an R date but rather a character vector.
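
A self-contained sketch of the same ave() call on a few rows copied from the posted table. Here the result is kept as a Date column and only formatted for display, which is a small variation on the worked example above.

```r
# A subset of the rows shown in the question
dat <- data.frame(
  unique_id  = c(1, 1, 1, 2, 2, 3),
  visit_date = c("01/06/2010", "01/01/2011", "01/06/2011",
                 "01/01/2009", "01/07/2011", "01/01/2008"),
  stringsAsFactors = FALSE
)
dat$visit_date <- as.Date(dat$visit_date, format = "%d/%m/%Y")

# ave() applies max within each unique_id and returns one value per row
dat$last_visit_date <- ave(dat$visit_date, dat$unique_id, FUN = max)

# Format only when printing, so the column stays a Date
format(dat$last_visit_date, "%d/%m/%Y")
```

Keeping the column as a Date means later comparisons and differences between dates still work.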




I would greatly appreciate if someone could help me.

Thank you!

Kathleen R.
Epidemiologist
Montreal, QC, Canada


David Winsemius, MD
West Hartford, CT



[R] Importing data from MS EXCEL (.xls) to R XXXX

2011-08-24 Thread Dan Abner
Hello everyone,

What is the simplest, most RELIABLE way to import data from MS EXCEL (.xls)
format to R? In the past I have used the read.xls() function from the
xlsReadWrite package, however, I have been wrestling with it all afternoon
long with no success. I continue to receive the following error message:


> {widge<-read.xls("F:\\Classes\\Z1.Data\\stat.3010\\WidgeOne.xls",
+ colNames=TRUE,sheet=1)}
Error in .Call("ReadXls", file, colNames, sheet, type, from, rowNames,  :
  Incorrect number of arguments (11), expecting 10 for 'ReadXls'

Any insight/suggestions/assistance is appreciated.

Thank you,

Dan



Re: [R] Split data frame by date (POSIXlt)

2011-08-24 Thread Jean V Adams
You could try using the numeric representation of date, and split the data 
frame using that variable.  For example:

src$date.num <- as.numeric(src$date)
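
A related sketch, under the assumption that the split() error comes from strptime() returning a POSIXlt object, which is a list internally and so makes a poor grouping variable. Converting the character dates with as.Date() gives an atomic value that split() orders chronologically. The rows are copied from the question; the column layout is an assumption.

```r
src <- data.frame(
  date  = c("02.01.2011", "02.01.2011", "01.02.2011", "30.03.2011", "30.03.2011"),
  time  = c("09:00:00", "09:05:02", "10:01:21", "12:02:12", "17:00:00"),
  open  = c(1000, 1200, 1029, 1231, 1200),
  close = c(1200, 1203, 1110, 1200, 1190),
  stringsAsFactors = FALSE
)

# German day.month.year notation parsed into a Date (the time column is ignored)
src$date <- as.Date(src$date, format = "%d.%m.%Y")

# One data frame per day, in calendar order
Intraday <- split(src, src$date)
names(Intraday)
```

Splitting on as.numeric(src$date) as suggested above gives the same grouping, only with numeric list names.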

Jean

Franc Lucas wrote on 08/24/2011 02:42:58 PM:
> 
>Hello everyone,
>I want to split a data.frame by the column date. The data frame looks like
>this
>date        time      open  close
>02.01.2011  09:00:00  1000  1200
>02.01.2011  09:05:02  1200  1203
>...
>01.02.2011  10:01:21  1029  1110
>.
>30.03.2011  12:02:12  1231  1200
>30.03.2011  17:00:00  1200  1190
>Please note that this is the German version of the date notation. So
>02.01.2011 is January 2nd 2011.
>So the column date is class: character.
>When I now split the dataframe by date, e.g.
>Intraday <- split(x=src, f=src$date, drop=FALSE)
>..I get a list which is not sorted...for example: "01.02.2011" (February
>1st) comes before "02.01.2011" (January 2nd).
>My approach was to transform the column date into POSIXct by using strptime
>(btw: I don't care about the time information):
>src$date <- strptime(tickdata$date, "%d.%m.%Y")
>The data frame then looks like this:
>date        time      open  close
>01-02-2011  09:00:00  1000  1200
>01-02-2011  09:05:02  1200  1203
>...
>02-01-2011  10:01:21  1029  1110
>.
>03-30-2011  12:02:12  1231  1200
>03-30-2011  17:00:00  1200  1190
>which is totally fine. But when I now try to split the data frame it says
>that I am indexing out of bounds... (German: "Fehler in args[[i]] :
>Indizierung außerhalb der Grenzen")
>Can anybody help me?
>Thanks in advance!
>Best
>Franc
>BSc. Student
>University of Mannheim
> 


Re: [R] as.numeric() and POSIXct format

2011-08-24 Thread Justin Haynes
as.POSIXct(518400,origin='2001-01-01')
[1] "2001-01-07 PST"


as.POSIXct(as.numeric(as.POSIXct(518400,origin='2001-01-01')),origin='1970-01-01')
[1] "2001-01-07 08:00:00 PST"
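
A sketch of the inverse operation being asked about. as.numeric() on a POSIXct value returns seconds since 1970-01-01 00:00:00 UTC, whatever origin was used to build it, so getting the original offset back means subtracting the origin. The tz = "UTC" arguments are an assumption added here to keep the local time zone out of the arithmetic.

```r
origin_ct <- as.POSIXct("2001-01-01", tz = "UTC")

# Build the time from an offset in seconds relative to the chosen origin
t <- as.POSIXct(518400, origin = origin_ct, tz = "UTC")

# Two ways to recover the offset in seconds
secs_a <- as.numeric(t) - as.numeric(origin_ct)
secs_b <- as.numeric(difftime(t, origin_ct, units = "secs"))

c(secs_a, secs_b)
```

difftime() with explicit units is the safer of the two, because plain subtraction of two POSIXct values picks its own units.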


On Wed, Aug 24, 2011 at 9:22 AM, Agustin Lobo wrote:

> Hi!
>
> I'm confused by this:
> > as.numeric(as.POSIXct(518400,origin="2001-01-01"))
> [1] 978822000
>
> I guess the problem is that as.numeric() assumes a different origin, but
> cannot find
> any default origin.
>
> How can I get back the seconds from the POSIXct format? In other words,
> which the inverse function of as.POSIXct()?
> I've tried as.numeric and unclass() using a origin= argument, but this does
> not work.
>
> Thanks
>
> Agus
>
> --
> Dr. Agustin Lobo
> Institut de Ciencies de la Terra "Jaume Almera" (CSIC)
> LLuis Sole Sabaris s/n
> 08028 Barcelona
> Spain
> Tel. 34 934095410
> Fax. 34 934110012
> email: agustin.l...@ija.csic.es
>


Re: [R] help with "by" command

2011-08-24 Thread Jorge I Velez
Hi Ari,

Try this instead

with(foo, tapply(trust, V2, mean, na.rm = TRUE))

See ?tapply and ?with for more information.
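
A small sketch on made-up data of why the by() call in the question returns the same number for every level: the function ignores its argument x and averages foo$trust, the whole column, each time. Using x$trust, or tapply() with trust as the values and V2 as the grouping, gives per-level means.

```r
# Hypothetical data standing in for 'foo'
foo <- data.frame(V2 = c("a", "a", "b", "b"),
                  trust = c(1, 3, 10, NA))

# As posted: refers to foo inside the function, so every group gets the overall mean
by_wrong <- by(foo, foo$V2, function(x) mean(foo$trust, na.rm = TRUE))

# Corrected: x is the subset of rows for one level of V2
by_right <- by(foo, foo$V2, function(x) mean(x$trust, na.rm = TRUE))

# Same result with tapply(values, groups, function)
tap <- with(foo, tapply(trust, V2, mean, na.rm = TRUE))
tap
```

Note the argument order in tapply(): the variable being averaged comes first and the grouping variable second.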

HTH,
Jorge


On Wed, Aug 24, 2011 at 2:51 PM, amalka wrote:

> Hello,
>
> I am a new user of R, and I'd be grateful if someone could help me with the
> following:
>
> I would like to compute the mean of variable "trust" in dataframe "foo",
> but
> separately for each level of variable V2.  That is, I'd like to compute the
> mean of trust at each level of V2.
>
> I have done this:
>
> > tmp <- by(foo, foo$V2, function(x) mean(foo$trust, na.rm=T))
> >tmp
>
> Doing this does indeed give me a mean for variable trust at each level of
> V2
> - but the problem is that I get the exact same mean for each level of V2.
> The means are not the really the same across levels of V2, but instead of
> getting the mean for each level of V2 I get the overall mean for the
> variable in the dataframe listed over and over for each level of V2.
>
> I have been attempting to figure this out for a while now, but I just can't
> seem to figure it out.
>
> Thanks for your time!
> Ari
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/help-with-by-command-tp3766285p3766285.html
> Sent from the R help mailing list archive at Nabble.com.
>


Re: [R] Regression by factor using "sapply"

2011-08-24 Thread Jorge I Velez
Hi elh,

You could try using split() and lapply() instead (untested):

mymodels <- lapply(split(usage, usage$ActNo),
                   function(l) lm(AvgKWh ~ AvgHDD + AvgCDD, data = l))

To access the coefficients for all models you can do

lapply(mymodels, coef)

and, to access model number one (first ActNo),

mymodels[[1]]
summary(mymodels[[1]])

See ?split and ?lapply for more information.
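
A runnable sketch of the split()/lapply() approach on simulated data, since the real `usage` table is not shown. The account numbers, sample size and true coefficients below are all made up. It also shows one way to collect the coefficients and R-squared per account into compact objects.

```r
set.seed(1)
# Hypothetical usage table: two accounts, twelve months each
usage <- data.frame(ActNo  = rep(c("A1", "A2"), each = 12),
                    AvgHDD = runif(24, 0, 30),
                    AvgCDD = runif(24, 0, 15))
usage$AvgKWh <- 100 + 5 * usage$AvgHDD + 3 * usage$AvgCDD + rnorm(24)

# One regression per account
mymodels <- lapply(split(usage, usage$ActNo),
                   function(d) lm(AvgKWh ~ AvgHDD + AvgCDD, data = d))

# sapply() needs a function, not a value: one row of coefficients per account
coefs <- t(sapply(mymodels, coef))
r2 <- sapply(mymodels, function(m) summary(m)$r.squared)

coefs
r2
```

The error quoted in the question likely arises because sapply() was handed summary(byActno)$coef, an already evaluated value, where it expects a function to apply to each model.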

HTH,
Jorge


On Wed, Aug 24, 2011 at 1:51 PM, elh wrote:

> Apologies for the elementary nature of the question (yes, I'm another
> newbie)...
>
> I'd like to perform a multiple regression on a single data set containing a
> representation of energy consumption and temperatures containing account
> number, usage (KWh), heating degree days (HDD) and cooling degree (CDD)
> days.  I want to get the coefficients back from the following equation:
>lm(AvgKWh ~ AvgHDD + AvgCDD, data=usage)
>
> Given that the data set contains the usage of different accounts (e.g. some
> large energy users and some small energy users) I do not want to fit the
> model just once.  Instead, I want to re-calculate the coefficients
> (and associated measures of goodness of fit) for each account using the
> same formula and return the corresponding coefficients by the account
> number identifier.
>
> I thought I had figured out how to do this using "by" and "sapply",
> but I keep getting the error message: "$ operator is invalid for atomic
> vectors"
>
> Here is what I've been trying to use
> # data is stored in a table called "usage"; other than the "ActNo" field,
> all the fields are numeric
> byDD <- function(data) {lm(AvgKWh~ AvgHDD + AvgCDD, data=data)}
> byActNo <- by(usage, usage$ActNo, FUN=byDD)
> sapply(byActNo, summary(byActno)$coef)
>
> Thanks in advance!  I'm sure a similar question has been covered somewhere
> but every time I follow the message stream I hit a dead end.
>


Re: [R] Threads in R

2011-08-24 Thread Immanuel
Thanks again, you are perfectly right. I checked and saw I was indeed
polluting my machine with unclosed threads.

On 08/23/2011 01:34 AM, Peter Langfelder wrote:
> On Mon, Aug 22, 2011 at 3:12 PM, Immanuel  wrote:
>> Hello,
>>
>> thanks for the input. Below is a small example, simpler than expected :)
>>  I'm just curious why I can't see any output from print(i).
>>
>> --
>> library(multicore)
>>
>> f_long <- function() {
>>for (i in 1:1){ a=i}
>>print(i)
>>return("finished")
>> }
>>
>> p_long <- parallel(f_long() ,silent =FALSE)
>> collect(p_long, wait=FALSE, 10)
>> # stops the execution since its not finished after 10sec
>> # on my machine anyway ;)
> I could be wrong on this, but according to the help file, collect(wait
> = FALSE) will not kill the child process, it will only collect
> whatever result the child process has sent until then using
> sendMaster(). I believe you have to kill() the child process to stop
> it completely.
>
> Peter
>
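
A hedged sketch of the point above, using mcparallel() and mccollect() from the base parallel package rather than the multicore functions in the thread (this substitution is an assumption; these fork-based functions are not available on Windows). It polls a long-running child without stopping it, then terminates the child explicitly with tools::pskill().

```r
library(parallel)  # fork-based functions; Unix-alikes only

# Start a child process that would take a long time to finish
job <- mcparallel({
  Sys.sleep(60)
  "finished"
})

# Poll for a result for up to 1 second; NULL means nothing has arrived yet.
# This does not stop the child, which keeps running in the background.
res <- mccollect(job, wait = FALSE, timeout = 1)

# Terminate the child explicitly, then collect once more to clean it up
tools::pskill(job$pid)
invisible(suppressWarnings(mccollect(job)))
```

Without the explicit kill, the child would keep running after the polling call returns, which matches the leftover processes described in this thread.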



Re: [R] unused argument(s) (Header = True) help!

2011-08-24 Thread Daniel Nordlund


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of shardman
> Sent: Wednesday, August 24, 2011 8:10 AM
> To: r-help@r-project.org
> Subject: [R] unused argument(s) (Header = True) help!
> 
> Hi,
> 
> I'm really new to R so I apologise if this is a stupid question.
> 
> I'm trying to import data from a .txt file into R using the read.table
> command. The headers for the data columns are already in the text file, so
> I add Header = True after the file location. The problem is I keep getting
> the error message *unused argument(s) (Header = True)*. Does anyone know why?
> 
> The format of the text file is like this (I've also tried spaces rather
> than tabs to separate the columns):
> 
> TRAP  SHANNON_INDEX
> 1 3.347
> 2 3.096
> 3 3.521
> 4 2.871
> 5 2.678
> 
> The command looks like this:
> 
> Trap1_data<-read.table("C:/Documents and
> Settings/Samuel/Desktop/Biology/Independent study/Stats/Diversity
> indices/shannon index results trap 1.txt", Header = True)
> 
 

Sam,

the unused parameter, Header = True, is the problem.  The parameter name is 
header not Header (no caps).  And, the value for true should be TRUE (ALL caps).
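
A small sketch of the corrected call, assuming a tab-separated layout like the one shown in the question. The data are passed through the `text` argument instead of a file path so the example is self-contained; `txt`, `trap1` and `err` are illustrative names.

```r
# Tab-separated sample in the same layout as the poster's file
txt <- "TRAP\tSHANNON_INDEX\n1\t3.347\n2\t3.096\n3\t3.521\n4\t2.871\n5\t2.678\n"

# Correct spelling: lower-case argument name, upper-case logical constant
trap1 <- read.table(text = txt, header = TRUE, sep = "\t")
names(trap1)

# The misspelled argument name is rejected because R is case-sensitive
err <- tryCatch(read.table(text = txt, Header = TRUE),
                error = function(e) conditionMessage(e))
err
```

With a real file, the first argument would be the file path and `text` would be dropped.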

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA



Re: [R] unused argument(s) (Header = True) help!

2011-08-24 Thread Jorge I Velez
Hi Sam,

It is "header", not "Header".  See ?read.table.

HTH,
Jorge




Re: [R] unused argument(s) (Header = True) help!

2011-08-24 Thread R. Michael Weylandt
The proper argument is "header = TRUE".

R is case-sensitive, so capitalization matters for both *h*eader and T*RUE*.

Hope this helps,
Michael Weylandt



Re: [R] regarding changing of title of decompose graph

2011-08-24 Thread R. Michael Weylandt
Read the plot documentation by typing ?plot, particularly the optional
argument main.
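
Whether plot() honours main for a decomposed series may depend on the R version, since the plotting method for the result of decompose() can set its own title. A minimal sketch that sidesteps this by plotting the components directly, assuming the built-in co2 series as stand-in data (the title text and the name `comp` are illustrative):

```r
# Decompose a built-in monthly series
parts <- decompose(co2)

# Bind the components into one multivariate time series
comp <- cbind(observed = co2,
              trend    = parts$trend,
              seasonal = parts$seasonal,
              random   = parts$random)

# plot() on a multivariate ts draws one panel per column and accepts main
plot(comp, main = "CO2 at Mauna Loa: additive decomposition")
```

Passing main = "" in the last call would suppress the title altogether.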

Hope this helps,

Michael Weylandt

On Wed, Aug 24, 2011 at 1:40 PM, upani1982  wrote:

> Hi All,
>
> I am new to this forum and have just started learning R. When I use
> plot(decompose(x)), I get the title " Additive time series
> decomposition". How can I turn this title off or change it to another title?
>
> Any help regarding this is highly appreciated.
>
> With sincerer regards,
> Upananda
>
>
>


Re: [R] Append a value to a vector

2011-08-24 Thread Jean V Adams
Claudio Zanettini wrote on 08/24/2011 04:33:50 PM:
> 
> Thank you, this work fine,
> and is not contorted like mine:)
> In this case lastV=LastI but depending on the data that I obtain
> lastV can be = LastA.
> 
> Any way it works very good:)
> 
> Thank you
> very much :)
> 
> 
> PS: but I still do not understand what was wrong in the script that I used.
> It was not very appropriate, but it is strange that it was not working.

The main thing you missed with your code was that you didn't "save" the 
changes you made to activeT and activeR.

In other words, instead of

if (lastV > lastA){ 
append(activeT, lastV) 
lastR <- tail(activeR, 1) 
append(activeR, lastR) 
} 

You should have used

if (lastV > lastA){ 
activeT <- append(activeT, lastV) 
lastR <- tail(activeR, 1) 
activeR <- append(activeR, lastR) 
} 
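
A tiny illustration of the same point, with made-up values: append() returns a new vector and leaves its argument untouched, so the result has to be assigned back.

```r
activeT <- c(26, 341, 376)
lastV <- 400

append(activeT, lastV)  # returns a new vector; activeT itself is unchanged
n_before <- length(activeT)

activeT <- append(activeT, lastV)  # assigning the result keeps the change
n_after <- length(activeT)
```

Here `n_before` is 3 and `n_after` is 4.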

Jean

> 2011/8/24 Jean V Adams 
> 
> I'm still a little confused about lastV and lastI.  The code you 
> provide uses lastV, but your description seems to refer to lastI. 
>  Test out this code and see if it is doing what you want it to do. 
> 
> lastI 
> lastA 
> activeT 
> activeR 
> if(lastI > lastA) { 
> activeT <- c(activeT, lastI) 
> activeR <- c(activeR, tail(activeR,1)) 
> } 
> activeT 
> activeR 
> 
> By the way, it's helpful to others if you cc r-help@r-project.org in
> any replies to keep the thread going. 
> 
> Jean 
> 
> Claudio Zanettini  wrote on 08/24/2011 
> 04:05:10 PM:
> 
> > 
> > Sure, sorry that I was not very clear.
> > I did not mention that there was another vector!
> > The vector lat contains the values of both activeT and inactT;
> > activeT and inactT were created by subsetting lat.
> > The values are labeled so that the decimal part indicates the
> > kind of information:
> > #.11 = active responses = activeT
> > #.13 = inactive responses = inactT
> > All the values are sorted in ascending order, so
> > lastV is the last value in the vector lat (composed of the 2 vectors),
> > and so it is also the max value of all the values of activeT and inactT.
> > 
> > this is the vector lat:
> > > lat
> >  [1] "26.11""316.13"   "341.11"   "376.11"   "459.11"   "466.21"  
> >  [7] "516.61"   "532.11"   "656.13"   "935.11"   "1163.11"  "1721.11" 
> > [13] "6167.11"  "6378.13"  "6513.11"  "7114.21"  "7165.61"  "7225.11" 
> > [19] "7254.11"  "7728.11"  "7964.11"  "8098.13"  "8099.13"  "8630.11" 
> > [25] "8803.11"  "9186.11"  "9453.11"  "10132.11" "10669.21" "10720.61"
> > [31] "10755.13" "11326.11" "11440.13" "11486.11" "11508.11" "11711.11"
> > [37] "11726.11" "13450.11" "13465.11" "15463.13" "15965.11" "15979.11"
> > [43] "16324.11" "16827.11" "16959.11" "17809.11" "19048.21" "19098.61"
> > [49] "22474.13" "22600.13" "22673.11" "23268.11" "27936.13" "27944.13"
> > [55] "30757.13" "32503.13" "32506.13" "32522.13" "32596.11" "33082.13"
> > [61] "33148.11" "46717.11" "51436.13"
> > 
> > 
> > thanks for you reply :)
> > Claudio
> 
> > 2011/8/24 Jean V Adams  
> > 
> > Claudio Zanettini wrote on 08/24/2011 03:04:39 PM: 
> > 
> > 
> > > This should be easy but it does not work.
> > > I have 3 vectors *(activeT, inactT, activeR)*.
> > > The idea is that if the last value in inactT is higher than the last
> > > value in activeT, this value has to be appended to activeT
> > 
> > 
> > 
> > When you say "this value" which one do you mean, the last value in 
> > inactT or the last value in activeT? 
> > 
> > 
> > > and the last value in another vector called activeR has to be repeated.
> > > (at the bottom you can find the vectors)
> > > I have done this:
> > > 
> > > activeT=round(as.numeric(activeT))
> > > inactT= round(as.numeric(inactT))
> > > lastV<-round(as.numeric(tail(lat,1))) 
> > 
> 
> > When I submit this line of your code, I get this error: 
> > Error in tail(lat, 1) : object 'lat' not found 
> > 
> > You didn't provide any information on the vector "lat". 
> > 
> > Jean 
> > 
> > 
> > > lastA<-round(as.numeric(tail(activeT,1)))
> > > lastI<-round(as.numeric(tail(inactT,1)))
> > > 
> > > if (lastV!=lastA){
> > > append(lastV, activeT)
> > > lastR=tail(activeR,1)
> > > append(activeR,lastR)
> > > }
> > > 
> > > lastR has been appended to activeR
> > > but not lastV to activeV
> > > 
> > > I guess that this is related to the attributes of the vectors; this is
> > > why I applied as.numeric to all the vectors.
> > > 
> > > Thank you for you time and your patience
> > > :)
> > > Claudio
> > > 
> > > *this are the vectors:*
> > > > activeT
> > >  [1]    26.11   341.11   376.11   459.11   466.21   532.11   935.11  1163.11
> > >  [9]  1721.11  6167.11  6513.11  7114.21  7225.11  7254.11  7728.11  7964.11
> > > [17]  8630.11  8803.11  9186.11  9453.11 10132.11 10669.21 11326.11 11486.11
> > > [25] 11508.11 11711.11 11726.11 13450.11 13465.11 15965.11 15979.11 16324.11
> > > [33] 16827.11 16959.11 17809.11 19048.21 226

Re: [R] df of numerator and denominator

2011-08-24 Thread Jorge I Velez
Hi,

All the information is contained in your aov() object.  Take a look at the
first example at

?aov
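
As a hedged illustration, here is a one-way ANOVA on the built-in PlantGrowth data (a stand-in for the poster's data). For a single-factor model the "Df" column of the summary table holds the numerator degrees of freedom in the factor row and the denominator degrees of freedom in the Residuals row.

```r
fit <- aov(weight ~ group, data = PlantGrowth)
tab <- summary(fit)[[1]]  # the ANOVA table as a data frame

dfn <- tab$Df[1]  # numerator df: 3 groups - 1 = 2
dfd <- tab$Df[2]  # denominator (residual) df: 30 observations - 3 groups = 27

# The same p-value as in the table, recomputed from F, dfn and dfd
p <- pf(tab[["F value"]][1], dfn, dfd, lower.tail = FALSE)
```

The denominator degrees of freedom are also available directly as df.residual(fit).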

HTH,
Jorge

On Wed, Aug 24, 2011 at 10:33 AM, martinas wrote:

> hello
>
> I need to know the numerator and denominator degrees of freedom (dfn and
> dfd) of my ANOVA, but the ANOVA output shows only "Df".
> Is this the dfn or the dfd? And how do I get both of them in R?
>
> Thanks for any answers
>


[R] lodplot help

2011-08-24 Thread Al-Sabban, Shaza
I have a data frame (narrow) with 431 rows and 6 columns containing information 
on chromosome, position, lod1, lod2, lod3, lod4, looking like this:

> narrow
   chr pos   lod1   lod2   lod3   lod4
1 1   3.456 -0.025 -0.003 -0.209 -0.057
2 1   5.697 -0.029 -0.005 -0.200 -0.058
3 1   8.434 -0.049 -0.012 -0.247 -0.092
4 1   9.466 -0.074 -0.025 -0.300 -0.136
5 1   9.706 -0.074 -0.025 -0.298 -0.134
6 1  12.022 -0.067 -0.018 -0.280 -0.112
7 1  13.031 -0.061 -0.015 -0.268 -0.099
8 1  13.050 -0.061 -0.015 -0.268 -0.099
9 1  13.719 -0.055 -0.012 -0.252 -0.090


I am trying to plot the positions vs. scores using:

chromosome.viewlinkage (narrow, chrom=1, type="layout", statistic = "lod", 
with.X= TRUE, min.sat= -2, max.stat= 4, pheno.names = NULL, units = "cM", col 
=1:6, ltyp= 1, lwd = 2, chromosome.cex= 0.9)

or 

plot.scan  (narrow, chrom=1, type="layout", statistic = "lod", with.X= TRUE, 
min.sat= -2, max.stat= 4, pheno.names = NULL, units = "cM", col =1:6, ltyp= 1, 
lwd = 2, chromosome.cex= 0.9)

I get this error message: Error in if (length(col) < ncol(stat)) { : argument 
is of length zero.

Can you help me understand what I'm doing wrong? 

Thanks 
Shaza



[R] looking for REML from a gnls fit or a nonlinear function (without random effects) fit using REML

2011-08-24 Thread Daniel Okamoto
Dear R users,
  I am fitting nonlinear mixed effects models with autocorrelated errors
(an AR(1) model on the residuals) using nlme and am comparing a set of
nested models that share the same fixed effects structure but have different
or no random effects.  The issue I've come across is that I'm unable to
fit the model with fixed effects only (i.e. no random effects) using
restricted maximum likelihood (REML), which I need in order to compare the
models with and without random effects appropriately.  Does anyone know of a
function that will fit nonlinear models using REML or that will extract a
restricted log-likelihood from a gnls object?


Daniel K. Okamoto
PhD Student
1206 Marine Science Research Building
Department of Ecology, Evolution and Marine Biology
University of California, Santa Barbara
okam...@lifesci.ucsb.edu



Re: [R] Two-levels labels on x-axis?

2011-08-24 Thread Sébastien Vigneau
I figured out a solution by myself. In brief, I used different axis commands
to specify the ticks (with labels set to FALSE) and the labels (with tick
set to FALSE). For instance (with width=1 and space=1):
axis(side=1,at=c(2,6),labels=FALSE,tck=-0.1)
axis(side=1,at=c(0,4,8),labels=FALSE,tck=-0.2)
axis(side=1,at=c(1,3,5,7),labels=c("a","b","c","d"),tick=FALSE)
axis(side=1,at=c(2,6),labels=c("A","B"),tick=FALSE,padj=2)
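
A self-contained variant of the same idea, as a sketch only: instead of hard-coding positions, it takes the bar midpoints returned by barplot() and derives the tick positions from them. The data, the tick lengths and the names `mids`, `step` and `edges` are illustrative assumptions.

```r
# Four stacked bars; barplot() returns the x position of each bar's centre
heights <- matrix(1:8, nrow = 2)
mids <- barplot(heights)

# Boundaries halfway between neighbouring bars, plus the two outer edges
step  <- diff(mids)[1]
edges <- c(mids[1] - step / 2, mids + step / 2)

# Short ticks between bars of the same group, long ticks between groups
axis(side = 1, at = edges[c(2, 4)],    labels = FALSE, tck = -0.03)
axis(side = 1, at = edges[c(1, 3, 5)], labels = FALSE, tck = -0.08)

# Bar labels on the first line, group labels on a second line below
axis(side = 1, at = mids, labels = c("a", "b", "c", "d"), tick = FALSE)
axis(side = 1, at = c(mean(mids[1:2]), mean(mids[3:4])),
     labels = c("A", "B"), tick = FALSE, line = 1.5)
```

Deriving the positions from `mids` keeps the ticks aligned if the bar width or spacing changes.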

Sebastien

On Mon, Aug 22, 2011 at 6:33 PM, Sébastien Vigneau <
sebastien.vign...@gmail.com> wrote:

> Thank you for your answer!
>
> I have two additional questions, in line with the previous one:
>
>    1. How can I obtain tick marks flanking the labels, instead of being
>       aligned with them (similar to the pipe symbols in my example)?
>    2. How can I obtain tick marks of different sizes, so that the marks
>       separating the groups are longer?
>
> Thank you for your help!
>
> Sebastien
>
>
> On Mon, Aug 22, 2011 at 2:07 PM, Joshua Wiley wrote:
>
>> Hi Sébastien,
>>
>> Not sure about an elegant, general way but here is something quick and
>> dirty:
>>
>> p <- barplot(matrix(1:8, 2))
>> axis(1, at = p, labels = letters[1:4])
>> axis(1, at = c(mean(p[1:2]), mean(p[3:4])), labels = paste("\n",
>> LETTERS[1:2]), padj = 1)
>>
>> Cheers,
>>
>> Josh
>>
>>
>>
>> On Mon, Aug 22, 2011 at 10:14 AM, Sébastien Vigneau
>>  wrote:
>> > Hi,
>> >
>> > I would like to draw a stacked bar chart with four bars (say "a", "b",
>> "c",
>> > "d") . Two bars belong to group A and the two others to group B.
>> Therefore,
>> > I would like to have, on the x-axis, a label for each bar and an
>> additional
>> > label for each group, positioned underneath. To give an idea, the x-axis
>> > labels should look like this:
>> > |a|b|c|d|
>> > | A | B |
>> >
>> > Do you know how I can generate such two-levels labels in R?
>> >
>> > Thank you for your help!
>> >
>> > Sebastien
>> >
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, ATS Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/
>>
>
>



[R] Suppressing error messages printed in xyplot() with panel function

2011-08-24 Thread Adam Zeilinger

Hello,

I am using the xyplot() function to create a series of scatterplot 
panels with lines of best fit.  To draw the lines of best fit for each 
panel, I am using a panel function.  Here's an example:


> species <- as.character(c(rep(list("A", "B", "A"), 10), "B"))
> year <- as.character(c(rep(list("2009", "2009", "2010"), 10), "2010"))
> x <- rnorm(31, mean = 50, sd = 1)
> y <- 3*x + 20
> ex.data <- data.frame(cbind(species, year, x, y))
> ex.data$x <- as.numeric(ex.data$x)
> ex.data$y <- as.numeric(ex.data$y)
> xyplot(y ~ x|species*year, data = ex.data,
+   panel = function(x, y) {
+   panel.xyplot(x, y, pch=16, col="black")
+   panel.abline(lm(y ~ x))})

With my data set, there are some panels with fewer than 2 data points.
In these panels, an error message is printed in the panel, something
like: "Error using packet 4 missing value where TRUE/FALSE needed."


In the panels with error messages, I want to keep the panels but
suppress the error message, so that the panel is blank or shows only one
datum.  How do I suppress the printing of the error message?


Thanks in advance for your help.
Adam

--

Adam Zeilinger
Conservation Biology Program
University of Minnesota



[R] Creating new variable with maximum visit date by group_id

2011-08-24 Thread Kathleen Rollet





Dear R users,
 
I am encountering the following problem: I have a dataset with a 'unique_id' and
several 'visit_date' values (formatted with as.Date, "%d/%m/%Y") per unique_id. I
would like to create a new variable holding the most recent date of visit per
unique_id, as shown below.
 
unique_id visit_date last_visit_date 
1  01/06/2010  01/06/2011 
1  01/01/2011  01/06/2011 
1  01/06/2011  01/06/2011 
2  01/01/2009  01/07/2011 
2  01/06/2009  01/07/2011 
2  01/06/2010  01/07/2011 
2  01/01/2011  01/07/2011 
2  01/07/2011  01/07/2011 
3  01/01/2008  01/01/2008 
4  01/01/2009  01/01/2010 
4  01/01/2010  01/01/2010 
 
I know how to do this easily in Stata, SAS, and Excel, but I cannot find
how to do it in R. I have tried multiple functions such as tapply(), ave(),
ddply(), and transform() after looking at previous postings. The code runs,
but either only NA values are generated or I get an error message that the
replacement has fewer rows than the data (there are currently about 1000
unique_id values and over 4000 rows in my dataset).
I would greatly appreciate it if someone could help me.
 
Thank you!
 
Kathleen R.
Epidemiologist
Montreal, QC, Canada  


[R] Extracting and using fitted values and residuals with missing data

2011-08-24 Thread Matt1299
Hi folks,

I've a basic question concerning missing data.  I'm running mixed effects
analyses using nlme.  I've a sizable chunk of missing data on the outcome
being modeled, and am using "na.action=na.omit" when running the models. 
After fitting the models, I'm then trying to extract and use the fitted
values and/or residuals for additional analysis, but keep hitting walls
which I believe stem from the missing data.  For instance, when I tried to
plot the residuals against one of the predictors in the models, I get the
error:

Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ.

Similarly, if I try to get the variance of the residuals across levels of a
categorical predictor, I get the error:

Error in tapply(fitted(fm.1, level = 1), multi.2$catpred, var,  : 
  arguments must have same length

My question is how to circumvent the problem, as it's becoming a hindrance.
I've tried basic/obvious solutions (e.g., including "na.rm=T" when trying to
get the residual variances above), but nothing helps, and this seems too
simple a problem for there not to be a solution (I've just not
stumbled across it).  Thanks in advance for any ideas.


Thanks,
Matt

--
View this message in context: 
http://r.789695.n4.nabble.com/Extracting-and-using-fitted-values-and-residuals-with-missing-data-tp3766156p3766156.html
Sent from the R help mailing list archive at Nabble.com.



[R] Split data frame by date (POSIXlt)

2011-08-24 Thread Franc Lucas

   Hello everyone,
   I want to split a data.frame by the column date. The data frame looks like
   this:
   date        time      open  close
   02.01.2011  09:00:00  1000  1200
   02.01.2011  09:05:02  1200  1203
   ...
   01.02.2011  10:01:21  1029  1110
   ...
   30.03.2011  12:02:12  1231  1200
   30.03.2011  17:00:00  1200  1190
   Please note that this is the German date notation, so 02.01.2011 is
   January 2nd 2011. The column date has class character.
   When I now split the data frame by date, e.g.
   Intraday <- split(x=src, f=src$date, drop=FALSE)
   I get a list which is not sorted chronologically. For example, "01.02.2011"
   (February 1st) comes before "02.01.2011" (January 2nd).
   My approach was to convert the column date to a date-time class using
   strptime, which returns POSIXlt (by the way, I don't care about the time
   information):
   src$date <- strptime(tickdata$date, "%d.%m.%Y")
   The data frame then looks like this:
   date        time      open  close
   01-02-2011  09:00:00  1000  1200
   01-02-2011  09:05:02  1200  1203
   ...
   02-01-2011  10:01:21  1029  1110
   ...
   03-30-2011  12:02:12  1231  1200
   03-30-2011  17:00:00  1200  1190
   which is totally fine. But when I now try to split the data frame, R says
   that I am indexing out of bounds (German: "Fehler in args[[i]] :
   Indizierung außerhalb der Grenzen", i.e. "Error in args[[i]] : subscript
   out of bounds").
   Can anybody help me?
   Thanks in advance!
   Best
   Franc
   BSc. Student
   University of Mannheim



[R] regarding changing of title of decompose graph

2011-08-24 Thread upani1982
Hi All,

I am new to this forum and have just started learning R. When I use
plot(decompose(x)), I get the title " Additive time series
decomposition". How can I turn this title off or change it to another title?

Any help regarding this is highly appreciated.

With sincerer regards,
Upananda



--
View this message in context: 
http://r.789695.n4.nabble.com/regarding-changing-of-title-of-decompose-graph-tp3766114p3766114.html
Sent from the R help mailing list archive at Nabble.com.



[R] help with "by" command

2011-08-24 Thread amalka
Hello,

I am a new user of R, and I'd be grateful if someone could help me with the
following:

I would like to compute the mean of variable "trust" in dataframe "foo", but
separately for each level of variable V2.  That is, I'd like to compute the
mean of trust at each level of V2.

I have done this:

> tmp <- by(foo, foo$V2, function(x) mean(foo$trust, na.rm=T))
>tmp

Doing this does indeed give me a mean for variable trust at each level of V2,
but the problem is that I get the exact same mean for each level of V2.
The means are not really the same across levels of V2; instead of
getting the mean for each level of V2, I get the overall mean for the
variable in the dataframe listed over and over for each level of V2.

I have been attempting to figure this out for a while now, but I just can't
seem to figure it out.

Thanks for your time!
Ari

--
View this message in context: 
http://r.789695.n4.nabble.com/help-with-by-command-tp3766285p3766285.html
Sent from the R help mailing list archive at Nabble.com.



[R] nlminb - how to avoid evaluating initial parameters infinite in integrate

2011-08-24 Thread Newbie
Dear R-users.

I am faced with a problem I don't know how to solve.
I need to calibrate the Heston stochastic volatility model, and have (I
believe) written code for calculating option prices under this
model. However, when I calibrate the model using nlminb, my initial
parameters also get evaluated up to infinity by the integrate function, and
this is wrong!
I believe that this is the reason why nlminb keeps returning the initial
parameters as the estimates. At 0 the values of C and D (see below) should
be 0, so that phi is equal to 1 (which keeps the parameters from being
evaluated at infinity). It is a bit difficult to explain, and I hope you can
understand what I want to do.
Is it possible to incorporate a statement in integrate() saying that at
0 the conditions are different?
Or should this be done in some sort of loop? Please help me.

Also, I am very sorry that this message is a "repost" on my part. In my
original message the problem was a wrong use of nlminb. I hope that this new
header will make it easier for people who have knowledge in this field to
find my post. I have searched, but have not found a way to solve this
problem - I hope you are able to help me. (the problem is in the integrate
in the Price_call function where integrand refers to the phi function)


THANK you
Rikke

my code is: 

setwd("F:/Data til speciale/")

## Calibration of Heston model parameters
marketdata <- read.csv(file="S&P 500 calls, jan-jun 2010.csv", header=TRUE,
sep=";")

BS_Call <- function(S0, K, T, r, sigma, q)
{
sig <- sigma * sqrt(T)
d1 <- (log (S0*exp((r-q)*T)/K) + (sigma^2/2) * T ) / sig
d2 <- d1 - sig
Presentvalue <- exp(-r*T)
return (Presentvalue*(S0 * exp((r-q)*T) * pnorm(d1) - K*pnorm(d2)))
}


# ---------- Values ----------
## Data imported
S0 <- 1136.03
X <- marketdata[1:460,9]
t <- marketdata[1:460,17]/365   #Notice the T is measured in years now
implvol <- marketdata[1:460,12]

## Initial values
kappa <- 0.0663227  # Lambda = -kappa
rho <- -0.6678461
eta <- 0.002124704
theta <- 0.0001421415
v0 <- 0.0001421415

q <- 0.02145608
r <- 0.01268737

smallk <- log(X/(S0*exp(r-q)*t))
parameters0 <- c(kappa, rho, eta, theta, v0)
#-


## The price of a Call option (Eq. (5.6) of The Volatility Surface,
## Gatheral)
# In terms of log moneyness

Price_call <- function(phi, smallk, t)
{
integrand <-  function(u) {Re(exp(-1i*u*smallk)*phi(u - 1i/2, t)/(u^2 +
1/4))}
res <- S0*exp(-q*t) -
exp(smallk/2)/pi*integrate(Vectorize(integrand),lower=0,upper=Inf,
subdivisions=460)$value
return(res)
}

# The characteristic function for the Heston model (Eq. XX)

phiHeston <- function(parameters)
{   
lambda <- - kappa   
function(u, t)
{
alpha <- -u*u/2 - 1i*u/2
beta <- lambda - rho*eta*1i*u   
gamma <- eta^2/2
d <- sqrt(beta*beta - 4*alpha*gamma)
rplus <- (beta + d)/(eta^2)
rminus <- (beta - d)/(eta^2)
g <- rminus / rplus
D <- rminus * (1 - exp(-d*t))/ (1 - g*exp(-d*t))
C <- lambda* (rminus * t - 2/eta^2 * log( (1 - 
g*exp(-(d*t)))/(1 - g)) )
return(exp(C*theta + D*v0))
}
}


## Calculating the Heston model price with fourier
HestonCall<-function(smallk, t)
{
res<-Price_call(phiHeston(parameters),smallk,t)
return(res)
}

# Vectorizing the function to handle vectors of strikes and maturities
HestonCallVec <- function(smallk,t)
{
mapply (HestonCall, smallk, t)
}


lb <- c(0, -1, 0, 0, 0)
ub <- c(Inf, 1, Inf, Inf, 2)


difference <- function(smallk, t, S0, r, implvol, q, parameters)
{
return(HestonCallVec(smallk,t) - BS_Call(S0, exp(smallk), t, r, implvol, q))
}

y <- function(x) {kappa<-x[1]; rho<-x[2]; eta<- x[3]; theta<- x[4];
v0<-x[5]; sum(difference(smallk, t, S0, r, implvol, q, x)^2/BS_Call(S0,
exp(smallk), t, r, implvol, q))}
nlminb(start=parameters0, objective = y, lower =lb, upper =ub)
http://r.789695.n4.nabble.com/file/n3765932/S%26P_500_calls%2C_jan-jun_2010.csv
S%26P_500_calls%2C_jan-jun_2010.csv 
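
One possible reason for nlminb() returning the starting values, sketched on a toy problem rather than on the Heston code itself: in the code above, y() assigns kappa, rho, eta, theta and v0 as local variables, while phiHeston() looks them up in the global environment, so the objective may not depend on x at all. The names slope, make_model_global, make_model, objective_bad and objective_good below are hypothetical and only illustrate the scoping issue.

```r
# Toy illustration (not the Heston model): find slope so that slope * 2 == 6.
slope <- 1  # global value, plays the role of the global kappa, rho, ... above

# Reads the global 'slope', much as phiHeston() reads the global kappa etc.
make_model_global <- function() function(u) slope * u

# The local assignment is invisible to make_model_global(), so this
# objective is constant in x and the optimiser has nothing to improve.
objective_bad <- function(x) {
  slope <- x[1]
  (make_model_global()(2) - 6)^2
}

# Passing the parameter vector explicitly makes the objective depend on x.
make_model <- function(p) function(u) p[1] * u
objective_good <- function(x) (make_model(x)(2) - 6)^2

fit_bad  <- nlminb(start = 1, objective = objective_bad)
fit_good <- nlminb(start = 1, objective = objective_good)
fit_bad$par   # stays at the starting value
fit_good$par  # moves towards 3
```

If this is indeed the cause, having phiHeston() unpack its parameters argument and passing x down through difference() and HestonCall() would be one way to let the optimiser see the parameters.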

--
View this message in context: 
http://r.789695.n4.nabble.com/nlminb-how-to-avoid-evaluating-initial-parameters-infinite-in-integrate-tp3765932p3765932.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] as.numeric() and POSIXct format

2011-08-24 Thread Agustin Lobo

Hi!

I'm confused by this:
> as.numeric(as.POSIXct(518400,origin="2001-01-01"))
[1] 978822000

I guess the problem is that as.numeric() assumes a different origin, but I cannot
find any default origin.

How can I get back the seconds from the POSIXct format? In other words, which
is the inverse function of as.POSIXct()?
I've tried as.numeric() and unclass() with an origin= argument, but this does not
work.


Thanks

Agus

--
Dr. Agustin Lobo
Institut de Ciencies de la Terra "Jaume Almera" (CSIC)
LLuis Sole Sabaris s/n
08028 Barcelona
Spain
Tel. 34 934095410
Fax. 34 934110012
email: agustin.l...@ija.csic.es
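
A minimal sketch of what is going on, assuming UTC throughout to keep time zones out of the picture: a POSIXct value is stored as seconds since 1970-01-01 UTC, so as.numeric() always reports seconds from that epoch, whatever origin was used to build the value. Seconds relative to a chosen origin can be recovered by subtraction. The variable names are illustrative.

```r
origin <- as.POSIXct("2001-01-01", tz = "UTC")
x <- as.POSIXct(518400, origin = origin, tz = "UTC")

as.numeric(x)  # seconds since 1970-01-01 UTC, not since 'origin'

# Two equivalent ways to get back the seconds relative to 'origin'
secs_by_subtraction <- as.numeric(x) - as.numeric(origin)
secs_by_difftime    <- as.numeric(difftime(x, origin, units = "secs"))
```

With a local time zone instead of UTC, the same subtraction still works as long as the origin is converted with the same time zone as the data.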



[R] Append a value to a vector

2011-08-24 Thread heverkuhn
This should be easy but it does not work
I have 3 vectors*(activeT,inactT, activeR)*,
the idea is that if the last value in inactT is higher than the last in
activeT
this value has to be append in active T
and the last value in another vector call activeR has to be repeated.
(at the bottom you can find the vectors)
I have done this:

activeT=round(as.numeric(activeT))
inactT= round(as.numeric(inactT))
lastV<-round(as.numeric(tail(lat,1)))
lastA<-round(as.numeric(tail(activeT,1)))
lastI<-round(as.numeric(tail(inactT,1)))

if (lastV!=lastA){
append(lastV, activeT)
lastR=tail(activeR,1)
append(activeR,lastR)
}   

lastR has been appended to activeR
but not lastV to activeV

I guess that this is related to the attributes of the vectors this is why I
applied as.numeric at all the vectors.

Thank you for you time and your patience
:)
Claudio

*this are the vectors:*
> activeT
 [1]    26.11   341.11   376.11   459.11   466.21   532.11   935.11  1163.11
 [9]  1721.11  6167.11  6513.11  7114.21  7225.11  7254.11  7728.11  7964.11
[17]  8630.11  8803.11  9186.11  9453.11 10132.11 10669.21 11326.11 11486.11
[25] 11508.11 11711.11 11726.11 13450.11 13465.11 15965.11 15979.11 16324.11
[33] 16827.11 16959.11 17809.11 19048.21 22673.11 23268.11 32596.11 33148.11
[41] 46717.11

> inactT
 [1] "316.13"   "656.13"   "6378.13"  "8098.13"  "8099.13"  "10755.13"
 [7] "11440.13" "15463.13" "22474.13" "22600.13" "27936.13" "27944.13"
[13] "30757.13" "32503.13" "32506.13" "32522.13" "33082.13" "51436.13"


> activeR
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41


--
View this message in context: 
http://r.789695.n4.nabble.com/Append-a-value-to-a-vector-tp3766180p3766180.html
Sent from the R help mailing list archive at Nabble.com.



[R] Regression by factor using "sapply"

2011-08-24 Thread elh
Apologies for the elementary nature of the question (yes, I'm another
newbie)...

I'd like to perform a multiple regression on a single data set containing a
representation of energy consumption and temperatures containing account
number, usage (KWh), heating degree days (HDD) and cooling degree (CDD)
days.  I want to get the coefficients back from the following equation:  
lm(AvgKWh ~ AvgHDD + AvgCDD, data=usage)

Given that the data set contains the usage of different accounts (e.g. some
large energy users and some small energy users) I do not want to perform the
equation just one time.  Instead, I want to re-calculate the coefficients
(and associated measures of goodness of fit) for each account using the same
equation and return the corresponding coefficients by the account number
identifier.   

I thought I had figured out how to do this using  "by" and "sapply" formula
but I keep getting an error message of: "$ operator is invalid for atomic
vectors"

Here is what I've been trying to use
# data is stored in a table called "usage"; other than the "ActNo" field,
all the fields are numeric
byDD <- function(data) {lm(AvgKWh~ AvgHDD + AvgCDD, data=data)}
byActNo <- by(usage, usage$ActNo, FUN=byDD)
sapply(byActNo, summary(byActno)$coef)

Thanks in advance!  I'm sure a similar question has been covered somewhere
but every time I follow the message stream I hit a dead end.
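
A sketch of one way to get per-account coefficients, under stated assumptions: the second argument of sapply() has to be a function, whereas summary(byActno)$coef is evaluated straight away (and byActno differs in case from byActNo), which is likely where the error comes from. The usage data frame below is simulated purely for illustration; only the column names come from the post.

```r
# Simulated stand-in for the real 'usage' table (values are made up)
set.seed(1)
usage <- data.frame(
  ActNo  = rep(c("A", "B"), each = 12),
  AvgHDD = runif(24, 0, 30),
  AvgCDD = runif(24, 0, 20)
)
usage$AvgKWh <- 500 + 10 * usage$AvgHDD + 15 * usage$AvgCDD + rnorm(24, sd = 5)

# One lm fit per account
fits <- lapply(split(usage, usage$ActNo),
               function(d) lm(AvgKWh ~ AvgHDD + AvgCDD, data = d))

# One row of coefficients per account, plus R-squared as a goodness-of-fit measure
coefs <- t(sapply(fits, coef))
r2    <- sapply(fits, function(f) summary(f)$r.squared)
coefs
r2
```

The same pattern works on the by() result from the post: sapply(byActNo, coef) passes the function coef itself rather than an already evaluated expression.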

--
View this message in context: 
http://r.789695.n4.nabble.com/Regression-by-factor-using-sapply-tp3766145p3766145.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Controling R from MS Access

2011-08-24 Thread lowman
Hello

did you happen to figure this out? I am just learning about using R, i have
a whack of fish data in MSAccess...and i want to take whatever functions
access is limited by with stats, and then call R to do them

i know the package RODBC works great to read data from your mdb, but i want
to have a command button, once clicked, activate R, and run a script, then
send the analysis back to a table in the mdb,

any help would be greatly appreciated

thanks,

doug

--
View this message in context: 
http://r.789695.n4.nabble.com/Controling-R-from-MS-Access-tp2719751p3765671.html
Sent from the R help mailing list archive at Nabble.com.



[R] df of numerator and denominator

2011-08-24 Thread martinas
hello

I need to know the dfn and dfd of my Anova. But in the Anova output there is
only "Df".
Is this the dfn or the dfd? and how do I get both of it in R?

Thanks for any answers
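
A small illustration using the built-in PlantGrowth data, assuming a one-way design: in the anova table the "Df" column holds both numbers. The row for the model term gives the numerator df and the "Residuals" row gives the denominator df.

```r
fit <- lm(weight ~ group, data = PlantGrowth)
tab <- anova(fit)
tab

dfn <- tab["group", "Df"]      # numerator df: number of groups minus 1
dfd <- tab["Residuals", "Df"]  # denominator df, same as df.residual(fit)
```

With several terms in the model, each term has its own numerator df in its row, and all F tests in such a table share the residual df.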

--
View this message in context: 
http://r.789695.n4.nabble.com/df-of-numerator-and-denominator-tp3765526p3765526.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Controling R from MS Access

2011-08-24 Thread lowman
answered my own question, just use the call shell function in vb

woohoo



--
View this message in context: 
http://r.789695.n4.nabble.com/Controling-R-from-MS-Access-tp2719751p3766037.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R (&stats) newcomer.... help!

2011-08-24 Thread geigercounter120
Many thanks for your response.

unfortunately, it appears that I'm the closest thing in the vicinity to a
local expert (chilling times indeed...), but i will certainly look at the
booklist

in terms of the number of data points, we have:
two shores,
three treatments,
three replicates of each treatment (arranged in a randomised block design),
Shore A housed 6 species (total of 54 samples)
Shore B housed 4 species (total of 36 samples)

I do wish to treat each species as a separate comparison, so will look at
nlme for more info of how to do this.

many thanks again.

--
View this message in context: 
http://r.789695.n4.nabble.com/R-stats-newcomer-help-tp3764819p3765487.html
Sent from the R help mailing list archive at Nabble.com.



[R] ddply from plyr package - any alternatives?

2011-08-24 Thread AdamMarczak
Hello everyone,
I was asked to repost this again, sorry for any inconvenience.

I'm looking replacement for ddply function from plyr package. 
Function allows to apply function by category stored in any column/columns.

Regular loops or lapplys slow down greatly because my unique combination
count exceeds 9000. Is there any available solution which allow me to apply
function by category? 

currently my code looks like snippet below 

ddply(myData, c("country_name", "product_name"), myFunction) 

Please note that I'm looking for decently performing resolution. 

Thanks in advance! 

With regards, 
Adam.
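
A base R sketch of two alternatives, with no claim about how they compare in speed on the real data: aggregate() for simple summaries, and split() followed by a single rbind for an arbitrary per-group function. The myData contents and the body of myFunction are invented here for illustration; only the column names come from the post.

```r
# Made-up data with the two grouping columns from the post
set.seed(1)
myData <- data.frame(
  country_name = sample(c("PL", "DE", "FR"), 200, replace = TRUE),
  product_name = sample(c("a", "b"), 200, replace = TRUE),
  value        = runif(200)
)

# Simple summaries: aggregate() handles the grouping itself
agg <- aggregate(value ~ country_name + product_name, data = myData, FUN = mean)

# General case: split once, apply the function per piece, bind once at the end
myFunction <- function(d) {
  data.frame(country_name = d$country_name[1],
             product_name = d$product_name[1],
             mean_value   = mean(d$value))
}
pieces <- split(myData, list(myData$country_name, myData$product_name), drop = TRUE)
res <- do.call(rbind, lapply(pieces, myFunction))
```

Much of the cost in this kind of code tends to come from building a data frame per group, so returning plain vectors from the per-group function and assembling the result once may help. Packages such as data.table are also often suggested for grouped operations.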

--
View this message in context: 
http://r.789695.n4.nabble.com/ddply-from-plyr-package-any-alternatives-tp3765936p3765936.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Replacing NAs in one variable with values of another variable

2011-08-24 Thread StellathePug
Thank you Dan and Ista!

Both of you are correct, I should have used NA rather than "NA" in my
example. So the correct code should be:

X <-as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA,NA,NA,NA,
   6, 4, 3,NA, NA, NA, 5, 4, 1, 3), ncol=2))
names(X)<-c("X1","X2")  

X$X1[is.na(X$X1)] <- X$X2[is.na(X$X1)] 

Where the last line replaces the missing observations of X1 by those of X2.
The "if else" statement also works.
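
For comparison, a short sketch of the ifelse() form next to the indexing form, using the same data as above. The names by_index and by_ifelse are illustrative.

```r
X <- as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA, NA, NA, NA,
                            6, 4, 3, NA, NA, NA, 5, 4, 1, 3), ncol = 2))
names(X) <- c("X1", "X2")

# Indexing form: overwrite only the missing positions
by_index <- X$X1
by_index[is.na(by_index)] <- X$X2[is.na(by_index)]

# ifelse() form: pick X2 where X1 is missing, X1 otherwise
by_ifelse <- ifelse(is.na(X$X1), X$X2, X$X1)
```

Where both X1 and X2 are missing, the result stays NA in either form.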

Thank you very much, again!
Rita

--
View this message in context: 
http://r.789695.n4.nabble.com/Replacing-NAs-in-one-variable-with-values-of-another-variable-tp3763269p3765317.html
Sent from the R help mailing list archive at Nabble.com.



[R] unused argument(s) (Header = True) help!

2011-08-24 Thread shardman
Hi,

I'm really new to R so I apologise if this is a stupid question.

I'm trying to import data from a .txt file into R using the read.table
command, the headers for the data columns are already in the text file so I
add Header = True after the file location. The problem is I keep getting the
error message *unused argument(s) (Header = True)*, does anyone know why?

The format of the text file is like this (I've also tried spaces rather than
tab to separate the columns):

TRAP    SHANNON_INDEX
1   3.347
2   3.096
3   3.521
4   2.871
5   2.678

The command looks like this:

Trap1_data<-read.table("C:/Documents and
Settings/Samuel/Desktop/Biology/Independent study/Stats/Diversity
indices/shannon index results trap 1.txt", Header = True)

I would really appreciate some help,
Yours,
Sam
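
A minimal sketch of the likely fix: R is case-sensitive, so the argument is header (lower case) and the logical constant is TRUE (upper case). To keep the example self-contained, the data are read from an inline string via the text= argument instead of the file path; trap_text is an illustrative name.

```r
# Inline copy of the file contents, tab-separated
trap_text <- "TRAP\tSHANNON_INDEX
1\t3.347
2\t3.096
3\t3.521
4\t2.871
5\t2.678"

Trap1_data <- read.table(text = trap_text, header = TRUE, sep = "\t")
str(Trap1_data)
```

With the real file, the same call would take the path as its first argument in place of text = trap_text.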


--
View this message in context: 
http://r.789695.n4.nabble.com/unused-argument-s-Header-True-help-tp3765651p3765651.html
Sent from the R help mailing list archive at Nabble.com.



[R] Howto convert Linear Regression data to text

2011-08-24 Thread ashz
Dear all,

How can I convert lm data to text in the form of "y=ax+b, r2" and how do I
calculate R-squared (r2)?

Thanks. 
 
Code:
x=18:29
y=c(7.1,7,7.7,8.2,8.8,9.7,9.9,7.1,7.2,8.8,8.7,8.5)
res=lm(y~x)
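
One way this could be done, as a sketch: coef() returns the intercept and slope, summary()$r.squared returns R-squared, and sprintf() assembles the text. The names a, b, r2 and label, and the choice of three decimals, are illustrative.

```r
x <- 18:29
y <- c(7.1, 7, 7.7, 8.2, 8.8, 9.7, 9.9, 7.1, 7.2, 8.8, 8.7, 8.5)
res <- lm(y ~ x)

a  <- coef(res)[["x"]]            # slope
b  <- coef(res)[["(Intercept)"]]  # intercept
r2 <- summary(res)$r.squared      # for simple regression this equals cor(x, y)^2

# Build "y = ax + b, r2 = ..." with the sign of b handled explicitly
label <- sprintf("y = %.3fx %s %.3f, r2 = %.3f",
                 a, if (b < 0) "-" else "+", abs(b), r2)
label
```

The resulting string can be passed to text(), legend() or mtext() to annotate a plot.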

--
View this message in context: 
http://r.789695.n4.nabble.com/Howto-convert-Linear-Regression-data-to-text-tp3766230p3766230.html
Sent from the R help mailing list archive at Nabble.com.



[R] Boxplot orders

2011-08-24 Thread Phoebe Jekielek
Hi there,

I have length data of an organism over the year and I want to make a
boxplot. I get the boxplot just fine but the months are all out of order. In
the data set they are in order from Jan-Dec...how can I fix this problem?

Thanks so much in advance!!

Phoebe
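
A sketch of the usual cause and fix, assuming the months are stored as three-letter names such as "Jan": R sorts factor levels alphabetically by default, so the boxes come out as Apr, Aug, Dec, and so on. Setting the levels explicitly restores calendar order. The data frame d is simulated for illustration.

```r
# Simulated lengths, five per month
set.seed(42)
d <- data.frame(month  = rep(month.abb, each = 5),
                length = rnorm(60, mean = 10))

# Fix the order of the levels to the calendar order
d$month <- factor(d$month, levels = month.abb)

boxplot(length ~ month, data = d)
```

For full month names the built-in constant month.name can be used in the same way.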



Re: [R] Append a value to a vector

2011-08-24 Thread Claudio Zanettini
Thank you, this work fine,
and is not contorted like mine:)
In this case lastV=LastI but depending on the data that I obtain
lastV can be = LastA.

Any way it works very good:)

Thank you
very much :)


PS: but I still do not understand what was wrong in the script that I used,
It was not very appropriate but it is strange that was not working,
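
On the PS, a short sketch of what may have gone wrong in the original script: append() returns a new, longer vector and leaves its argument untouched, so the result has to be assigned. At the console the value of the last expression inside the if block is printed, which may be why activeR looked as if it had been extended. Note also that append(lastV, activeT) would put lastV first. The small vectors below are stand-ins for the real ones.

```r
activeT <- c(26, 341, 376)
activeR <- 1:3
lastI   <- 400

append(activeT, lastI)   # returns a longer copy; activeT itself is unchanged
length(activeT)          # still 3

# Assigning the result keeps it (same effect as c(activeT, lastI))
activeT <- append(activeT, lastI)
activeR <- append(activeR, tail(activeR, 1))
```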


Re: [R] Append a value to a vector

2011-08-24 Thread Jean V Adams
I'm still a little confused about lastV and lastI.  The code you provide 
uses lastV, but your description seems to refer to lastI.  Test out this 
code and see if it is doing what you want it to do.

lastI
lastA
activeT
activeR
if(lastI > lastA) {
activeT <- c(activeT, lastI)
activeR <- c(activeR, tail(activeR,1))
}
activeT
activeR

By the way, it's helpful to others if you cc r-help@r-project.org in any 
replies to keep the thread going.

Jean

Claudio Zanettini  wrote on 08/24/2011 
04:05:10 PM:
> 
> Sure, sorry for that I was not very clear
>  I did not mention that there was another the vector!
> The vector lat is a vector containing both the values of activeT 
> and  of inactT.
> activeT and inactT have been created subsetting the vector lat'
> The values are labeled such as that the decimal points indicate the 
> kind of information
> #.11 = active responses= activeT
> #.13 = inactive responses= inacT
> All the values are sorted in crescent way so
>  the lastV is the last values in the vector lat (composed by the 2 vectors),
> and so it is also the max value of all the values of activeT and inactiveT
> 
> this is the vector lat:
> > lat
>  [1] "26.11"    "316.13"   "341.11"   "376.11"   "459.11"   "466.21"  
>  [7] "516.61"   "532.11"   "656.13"   "935.11"   "1163.11"  "1721.11" 
> [13] "6167.11"  "6378.13"  "6513.11"  "7114.21"  "7165.61"  "7225.11" 
> [19] "7254.11"  "7728.11"  "7964.11"  "8098.13"  "8099.13"  "8630.11" 
> [25] "8803.11"  "9186.11"  "9453.11"  "10132.11" "10669.21" "10720.61"
> [31] "10755.13" "11326.11" "11440.13" "11486.11" "11508.11" "11711.11"
> [37] "11726.11" "13450.11" "13465.11" "15463.13" "15965.11" "15979.11"
> [43] "16324.11" "16827.11" "16959.11" "17809.11" "19048.21" "19098.61"
> [49] "22474.13" "22600.13" "22673.11" "23268.11" "27936.13" "27944.13"
> [55] "30757.13" "32503.13" "32506.13" "32522.13" "32596.11" "33082.13"
> [61] "33148.11" "46717.11" "51436.13"
> 
> 
> thanks for you reply :)
> Claudio



Re: [R] Column of probabilities

2011-08-24 Thread David Winsemius


On Aug 24, 2011, at 3:31 PM, Jim Silverton wrote:


Hi all,
I have a vector xm say:  xm = c(1,2,3,4,5,5,5,6,6)

I want to return a vector with the corresponding probabilities based  
on the

amount of times the numbers occurred. For example, I should get the
following vector for xm:
prob.xm = c(1/9, 1/9, 1/9, 1/9, 3/9, 3/9, 3/9, 2/9, 2/9)


?prop.table

Usage (with table)

> prob.xm <- round( prop.table(table(xm)), digits=3)
> prob.xm
xm
    1     2     3     4     5     6 
0.111 0.111 0.111 0.111 0.333 0.222 
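
The table above has one entry per distinct value. To get a vector of the same length as xm, as in the question, the table can be indexed by the values themselves, or ave() can count the occurrences directly. A minimal sketch; the name prob.xm.ave is illustrative:

```r
xm <- c(1, 2, 3, 4, 5, 5, 5, 6, 6)

# Look up each element's proportion in the table by name
prob.xm <- as.vector(prop.table(table(xm))[as.character(xm)])

# Same result without a table: group size divided by total length
prob.xm.ave <- ave(xm, xm, FUN = length) / length(xm)

prob.xm
```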

--

David Winsemius, MD
West Hartford, CT



Re: [R] Append a value to a vector

2011-08-24 Thread Jean V Adams
Claudio Zanettini wrote on 08/24/2011 03:04:39 PM:

> This should be easy but it does not work
> I have 3 vectors*(activeT,inactT, activeR)*,
> the idea is that if the last value in inactT is higher than the last in
> activeT
> this value has to be append in active T


When you say "this value" which one do you mean, the last value in inactT 
or the last value in activeT?


> and the last value in another vector call activeR has to be repeated.
> (at the bottom you can find the vectors)
> I have done this:
> 
> activeT=round(as.numeric(activeT))
> inactT= round(as.numeric(inactT))
> lastV<-round(as.numeric(tail(lat,1)))


When I submit this line of your code, I get this error:
Error in tail(lat, 1) : object 'lat' not found

You didn't provide any information on the vector "lat".

Jean


> lastA<-round(as.numeric(tail(activeT,1)))
> lastI<-round(as.numeric(tail(inactT,1)))
> 
> if (lastV!=lastA){
> append(lastV, activeT)
> lastR=tail(activeR,1)
> append(activeR,lastR)
> }
> 
> lastR has been appended to activeR
> but not lastV to activeV
> 
> I guess that this is related to the attributes of the vectors this is why I
> applied as.numeric at all the vectors.
> 
> Thank you for you time and your patience
> :)
> Claudio
> 
> *this are the vectors:*
> > activeT
>  [1]    26.11   341.11   376.11   459.11   466.21   532.11   935.11  1163.11
>  [9]  1721.11  6167.11  6513.11  7114.21  7225.11  7254.11  7728.11  7964.11
> [17]  8630.11  8803.11  9186.11  9453.11 10132.11 10669.21 11326.11 11486.11
> [25] 11508.11 11711.11 11726.11 13450.11 13465.11 15965.11 15979.11 16324.11
> [33] 16827.11 16959.11 17809.11 19048.21 22673.11 23268.11 32596.11 33148.11
> [41] 46717.11
> 
> > inactT
>  [1] "316.13"   "656.13"   "6378.13"  "8098.13"  "8099.13"  "10755.13"
>  [7] "11440.13" "15463.13" "22474.13" "22600.13" "27936.13" "27944.13"
> [13] "30757.13" "32503.13" "32506.13" "32522.13" "33082.13" "51436.13"
> 
> 
> > activeR
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
> 


[R] Append a value to a vector

2011-08-24 Thread Claudio Zanettini
This should be easy but it does not work
I have 3 vectors*(activeT,inactT, activeR)*,
the idea is that if the last value in inactT is higher than the last in
activeT
this value has to be append in active T
and the last value in another vector call activeR has to be repeated.
(at the bottom you can find the vectors)
I have done this:

activeT=round(as.numeric(activeT))
inactT= round(as.numeric(inactT))
lastV<-round(as.numeric(tail(lat,1)))
lastA<-round(as.numeric(tail(activeT,1)))
lastI<-round(as.numeric(tail(inactT,1)))

if (lastV!=lastA){
append(lastV, activeT)
lastR=tail(activeR,1)
append(activeR,lastR)
}

lastR has been appended to activeR
but not lastV to activeV

I guess that this is related to the attributes of the vectors this is why I
applied as.numeric at all the vectors.

Thank you for you time and your patience
:)
Claudio

*this are the vectors:*
> activeT
 [1]    26.11   341.11   376.11   459.11   466.21   532.11   935.11  1163.11
 [9]  1721.11  6167.11  6513.11  7114.21  7225.11  7254.11  7728.11  7964.11
[17]  8630.11  8803.11  9186.11  9453.11 10132.11 10669.21 11326.11 11486.11
[25] 11508.11 11711.11 11726.11 13450.11 13465.11 15965.11 15979.11 16324.11
[33] 16827.11 16959.11 17809.11 19048.21 22673.11 23268.11 32596.11 33148.11
[41] 46717.11

> inactT
 [1] "316.13"   "656.13"   "6378.13"  "8098.13"  "8099.13"  "10755.13"
 [7] "11440.13" "15463.13" "22474.13" "22600.13" "27936.13" "27944.13"
[13] "30757.13" "32503.13" "32506.13" "32522.13" "33082.13" "51436.13"


> activeR
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41



Re: [R] Opening package manual from within R

2011-08-24 Thread Tyler Rinker

Apparently my request to view the help pages is not a popular method among R 
users for gaining information.  For me these pages are very helpful, so I will 
follow up to complete this thread for future searchers.
 
First, thanks to Prof. Brian Ripley.  Your idea was spot on what I was looking 
for: generating a PDF from the package library.
 
I worked out the following code, which I added to my .First() library:
#=
manual <- function(library, method = web) {

    # Capture the unquoted arguments as character strings
    LIB <- substitute(library)
    LIB <- as.character(LIB)

    METH <- substitute(method)
    METH <- as.character(METH)

    switch(METH,
        # Open the reference manual PDF hosted on CRAN
        web = browseURL(paste("http://cran.r-project.org/web/packages/",
                              LIB, "/", LIB, ".pdf", sep = "")),
        # Build the PDF locally from the installed package's help pages
        system = {
            unlink(paste(getwd(), "/", LIB, ".pdf", sep = ""))
            path <- find.package(LIB)
            system(paste(shQuote(file.path(R.home("bin"), "R")),
                         "CMD", "Rd2pdf", shQuote(path)))
        })
}
#=
 
library is the package name
method is either web or system (web is Internet based and faster where as 
system creates the pdf from the library latex code and is slower)
 
#=
Thanks for your responses!

Tyler
 

> Date: Wed, 24 Aug 2011 07:12:24 +0100
> From: rip...@stats.ox.ac.uk
> To: tyler_rin...@hotmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Opening package manual from within R
> 
> On Tue, 23 Aug 2011, Tyler Rinker wrote:
> 
> >
> > Simple question but searching rseek did not yield the results I wanted.
> >
> > Question: Is there a way to open a help manual for a package from within R.
> >
> > For instance I would like to type a function in r for the tm package 
> > and R would open that PDF as seen here: 
> > http://cran.r-project.org/web/packages/tm/tm.pdf
> >
> > -The vignette function exists for vignettes 
> > [vignette("package.name")] so I assume the same exists for manuals.
> 
> You assume wrong. Vignettes PDFs are installed as part of the package 
> (and often take minutes to regenerate): the PDF version of the help 
> pages (what you seem to call 'the package manual') is not (in 
> general). In many cases what other people (including the author, e.g. 
> me for RODBC) call the 'package manual' is a PDF in the doc directory 
> (which may or may not be a vignette).
> 
> The assumption is that people will use search facilities or the hints 
> given by the help titles in help(package="tm") or browse the HTML 
> version of the same information (e.g. via help.start).
> 
> But you can (provided you have pdflatex etc in your path) generate the 
> PDF version of the help pages by
> 
> R CMD Rd2pdf /path/to/installed/package
> 
> It will even open it in a browser for you (unless you use 
> --no-preview). You could easily encapsulate this in a function by 
> e.g.
> 
> showPDFmanual <- function(package, lib.loc=NULL)
> {
> path <- find.package(package, lib.loc)
> system(paste(shQuote(file.path(R.home("bin"), "R")),
> "CMD", "Rd2pdf",
> shQuote(path)))
> }
> 
> Alternatively *for packages on CRAN only* you can access the version 
> on CRAN by browseURL.
> 
> > -I do not want library(help="package.name") as this is not detailed enough.
> >
> > I am running R 2.14.0 beta on a windows 7 machine
> > Reproducible code does not seem appropriate in this case.
> 
> But accurate 'at a minimum' information (and no HTML) does. There is 
> no such version as '2.14.0 beta', and will not be for a couple of 
> months. If you are running a beta version of R it is old, so please 
> update to a released or patched version. (Also, any version 
> calling itself '2.14.0 Under development' is old and needs updating: 
> the current R-devel displays no version number.)
> 
> 
> 
> -- 
> Brian D. Ripley, rip...@stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
  


Re: [R] Model selection and model efficiency - Search for opinions

2011-08-24 Thread Bert Gunter
1. As this is not really appropriate for R, I suggest replies be private.

2. You might try posting on various statistical forums, e.g. on
http://stats.stackexchange.com/


-- Cheers, Bert

On Wed, Aug 24, 2011 at 12:15 PM, Arnaud Mosnier  wrote:
> Hi,
>
> In order to find the best models I use AIC; more specifically, I calculate
> Akaike weights and then the evidence ratio (ER), and consider models with an
> ER < 2 to be equally likely.
> But the same problem remains each time I do that. I select the best models
> from a set of them, but I don't know whether those models predict (or at
> least represent) my data well.
> I may have selected the best element(s) from a list of bad models.
>
> Do you think it is correct to calculate R2 or pseudo-R2 for the best "set of
> models" in order to get an idea of how well those models represent the data,
> and to use this value to select the most efficient model?
>
> I would be glad to hear your opinions about this !
>
> Thanks,
>
> Arnaud
>



Re: [R] Column of probabilities

2011-08-24 Thread R. Michael Weylandt
If your numbers are all positive integers, this should work:

(tabulate(xm)[xm])/length(xm)

it can be put into a function for ease of use:

probVec <- function(x) {(tabulate(x)[x])/length(x)}

You'll have some trouble if you have non-positive integers or non-integers.
Let me know if you need to handle that case: it's not much harder (just a
transform in and out of integers).
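
One way that transform might look, as a sketch rather than a definitive version: match() maps every value to a positive integer code, after which the tabulate() approach above applies unchanged. The name probVecGeneral is made up for this illustration and does not appear elsewhere in the thread.

```r
# Hypothetical helper (not from the thread): relative frequency of each
# element's value, without requiring positive integers.
probVecGeneral <- function(x) {
  idx <- match(x, unique(x))       # positive integer code for each value
  tabulate(idx)[idx] / length(x)   # count per code, expanded back to x's order
}

xm <- c(1, 2, 3, 4, 5, 5, 5, 6, 6)
probVecGeneral(xm)                    # same values as c(1,1,1,1,3,3,3,2,2)/9
probVecGeneral(c(-1.5, 0, 0, 2.25))   # negative and non-integer values
```

The result is a plain numeric vector in the same order as the input, so it can be bound to the data as a new column.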

Hope this helps,

Michael

On Wed, Aug 24, 2011 at 3:31 PM, Jim Silverton wrote:

> Hi all,
> I have a vector xm say:  xm = c(1,2,3,4,5,5,5,6,6)
>
> I want to return a vector with the corresponding probabilities based on the
> amount of times the numbers occurred. For example, I should get the
> following vector for xm:
> prob.xm = c(1/9, 1/9, 1/9, 1/9, 3/9, 3/9, 3/9, 2/9, 2/9)
> Any help greatly appreciated.
>
> --
> Thanks,
> Jim.
>



Re: [R] Column of probabilities

2011-08-24 Thread Jean V Adams
Try this:

prob.xm <- (table(xm)/length(xm))[match(xm, sort(unique(xm)))]

Jean


Jim Silverton wrote on 08/24/2011 02:31:05 PM:

> Hi all,
> I have a vector xm say:  xm = c(1,2,3,4,5,5,5,6,6)
> 
> I want to return a vector with the corresponding probabilities based on the
> amount of times the numbers occurred. For example, I should get the
> following vector for xm:
> prob.xm = c(1/9, 1/9, 1/9, 1/9, 3/9, 3/9, 3/9, 2/9, 2/9)
> Any help greatly appreciated.
> 
> -- 
> Thanks,
> Jim.



[R] Column of probabilities

2011-08-24 Thread Jim Silverton
Hi all,
I have a vector xm say:  xm = c(1,2,3,4,5,5,5,6,6)

I want to return a vector with the corresponding probabilities based on the
amount of times the numbers occurred. For example, I should get the
following vector for xm:
prob.xm = c(1/9, 1/9, 1/9, 1/9, 3/9, 3/9, 3/9, 2/9, 2/9)
Any help greatly appreciated.

-- 
Thanks,
Jim.



[R] Model selection and model efficiency - Search for opinions

2011-08-24 Thread Arnaud Mosnier
Hi,

In order to find the best models I use AIC; more specifically, I calculate
Akaike weights and then the evidence ratio (ER), and consider models with an
ER < 2 to be equally likely.
But the same problem remains each time I do that. I select the best models
from a set of them, but I don't know whether those models predict (or at
least represent) my data well.
I may have selected the best element(s) from a list of bad models.
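
For readers unfamiliar with the quantities involved, here is a small sketch of how Akaike weights and evidence ratios are usually computed from a vector of AIC values. The AIC numbers and model names are invented for illustration only.

```r
# Hypothetical AIC values for three candidate models (made-up numbers)
aic <- c(m1 = 100.0, m2 = 101.2, m3 = 110.5)

delta <- aic - min(aic)                        # AIC difference from the best model
w <- exp(-delta / 2) / sum(exp(-delta / 2))    # Akaike weights, summing to 1
er <- max(w) / w                               # evidence ratio of best model vs. each

round(cbind(delta, weight = w, ER = er), 3)
er < 2    # models treated as "equally likely" under the ER < 2 rule
```

Both quantities are relative to the candidate set, which is the concern raised here: they rank the models against each other and say nothing about absolute fit.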

Do you think it is correct to calculate R2 or pseudo-R2 for the best "set of
models" in order to get an idea of how well those models represent the data,
and to use this value to select the most efficient model?

I would be glad to hear your opinions about this !

Thanks,

Arnaud



Re: [R] silently testing for data from another package for .Rd examples

2011-08-24 Thread Michael Friendly

On 8/24/2011 12:40 PM, Uwe Ligges wrote:
Actually it is recommended to test for the availability of a valid
package with find.package(), particularly in this case where the name
of the package is already known.


Best,
Uwe


Thanks. So I guess the idiom I'm looking for is

> length((find.package("", quiet=TRUE)))
[1] 0
> length((find.package("lattice", quiet=TRUE)))
[1] 1
>

However, AFAICS, find.package() was only introduced in R 2.13.x, so I'm a bit
reluctant to use it in a new package that need not otherwise require the
latest major version.

The following (from Yihui) works for earlier versions, at least R 2.12.2

> "" %in% .packages(all=TRUE)
[1] FALSE
> "lattice" %in% .packages(all=TRUE)
[1] TRUE
>
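
Put together for an .Rd example, the check might look like the following sketch. has_pkg is a hypothetical helper name; the all = TRUE used above is a partial match for the all.available argument of .packages(), which is spelled out here.

```r
# Hypothetical helper: TRUE if a package is installed, without loading it.
has_pkg <- function(pkg) pkg %in% .packages(all.available = TRUE)

if (has_pkg("ElemStatLearn")) {
  # loads only the data set; the package itself is not attached
  data(prostate, package = "ElemStatLearn")
  # rest of example, using 'prostate'
}
```

When the package is missing, the whole block is skipped silently, which is the behaviour asked for in the original question.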



On 24.08.2011 18:29, Yihui Xie wrote:

.packages(all = TRUE) will give you a list of all available packages
without really loading them like require().

Regards,
Yihui
--
Yihui Xie
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



On Wed, Aug 24, 2011 at 9:28 AM, Michael Friendly wrote:
In an .Rd example for a package, I want to use data from another package,
but avoid loading the entire package and avoid errors/warnings if that other
package is not available.


If I don't care about loading the other package, I can just do:

if (require("ElemStatLearn", quietly=TRUE)) {
data(prostate)
  #  rest of example
}

I'd rather just be able to do something like:

if (data(prostate, package="ElemStatLearn")) {
  #  rest of example
}

but it appears that data() doesn't return anything useful (like FALSE or
NULL) in case the named data set doesn't exist, or the package cannot be
found.  Below are some test cases in a fresh R 2.13.1 session.

Is there some way I can incorporate such a data example silently, without
errors or warnings if the package doesn't exist, as is the case with require()?


data(prostate, package="ElemStatLearn")
dd<- data(prostate, package="ElemStatLearn")
dd

[1] "prostate"

dd2<- data(x, package="ElemStatLearn")

Warning message:
In data(x, package = "ElemStatLearn") : data set 'x' not found

dd2

[1] "x"

dd2<- data(x, package="ElemStatLearn", verbose=FALSE)

Warning message:
In data(x, package = "ElemStatLearn", verbose = FALSE) :
  data set 'x' not found


dd3<- data(z, package="foobar")

Error in find.package(package, lib.loc, verbose = verbose) :
  there is no package called 'foobar'

dd3

Error: object 'dd3' not found




try() doesn't seem to help here:


ddtry<- try(data(z, package="foobar"))

Error in find.package(package, lib.loc, verbose = verbose) :
  there is no package called 'foobar'

ddtry
[1] "Error in find.package(package, lib.loc, verbose = verbose) : \n  there is no package called 'foobar'\n"
attr(,"class")
[1] "try-error"




--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA




--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



Re: [R] Help: convert entry of a list into a matrix

2011-08-24 Thread R. Michael Weylandt
Rereading your email, still not sure what the question is -- perhaps you
could give a better code example to illustrate the difference between a[[2]]
and mat1 -- but, since you mentioned briefly lists of lists, have you looked
at unlist(, recursive = F)? If applied to a list of lists, it won't unlist
the sub-lists and that gives you a list of matrices, which, as I said
before, are matrices when subsetted with `[[`, but not `[`.
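
The points above can be sketched as follows. The last part rests on an assumption about the original data: an element that prints like a matrix while is.list() returns TRUE may be a list carrying a dim attribute, and the object lm2 is built here only to imitate that case.

```r
a <- list(matrix(1:6, 2), matrix(5:10, 2))
is.matrix(a[[2]])   # `[[` extracts the element itself
is.list(a[2])       # `[` returns a list of length one

# A list of lists: unlist(recursive = FALSE) removes one level of nesting only
nested <- list(list(matrix(1:6, 2)), list(matrix(5:10, 2)))
flat <- unlist(nested, recursive = FALSE)
is.matrix(flat[[2]])

# Assumed shape of the problem data: a list with dimensions ("list-matrix")
lm2 <- matrix(list(5, 7, 6, 8, 9, 10), nrow = 2)
is.list(lm2)

# Rebuild a numeric matrix; column-major order keeps every [i, j] position
mat1 <- matrix(unlist(lm2), nrow = nrow(lm2), ncol = ncol(lm2))
mat1
```

If the elements really are list-matrices, lapply(a, function(x) matrix(unlist(x), nrow = nrow(x))) would apply the same conversion to the whole list.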

Michael

On Wed, Aug 24, 2011 at 2:37 PM, R. Michael Weylandt <
michael.weyla...@gmail.com> wrote:

> I'm not sure I understand your question: a[[2]] is a matrix.
>
> > a <- list(matrix(1:6,2),matrix(5:10,2))
> > is.matrix(a[[2]])
> [1] TRUE
> > x = a[[2]]
> > is.matrix(x)
> [1] TRUE
> > x+2
>      [,1] [,2] [,3]
> [1,]    7    9   11
> [2,]    8   10   12
> > a[[2]] + 2
>      [,1] [,2] [,3]
> [1,]    7    9   11
> [2,]    8   10   12
>
> What else do you need?
>
> Michael Weylandt
>
>
> On Wed, Aug 24, 2011 at 2:28 PM, Chee Chen  wrote:
>
>> Dear All,
>> As always, I appreciate all your help.
>> I would like to know the easiest way to convert each of the homogeneous
>> elements of a numeric list into a matrix. Each element of this list is also
>> a list such that when displayed, looks like a 2-by-3 matrix , I would like
>> to convert each of them into a matrix, without changing the double index of
>> each entry.
>>
>> Suppose:
>> a<- vector("list",2)
>> > a[[1]]
>>      [,1] [,2] [,3]
>> [1,]    1    2    0
>> [2,]    3    4    5
>>
>> > a[[2]]
>>      [,1] [,2] [,3]
>> [1,]    5    6    9
>> [2,]    7    8   10
>>
>> > is.list(a[[1]])
>> [1] TRUE
>>
>> Target: I would like to convert a[[2]] into a matrix, keeping the double
>> index, into
>> > mat1
>>      [,1] [,2] [,3]
>> [1,]    5    6    9
>> [2,]    7    8   10
>>
>> The list I have is huge and so is each of its elements, and do not want to
>> use unlist because I do not understand it fully.
>> Thank you,
>> Chee
>>
>>
>
>



Re: [R] Help: convert entry of a list into a matrix

2011-08-24 Thread R. Michael Weylandt
I'm not sure I understand your question: a[[2]] is a matrix.

> a <- list(matrix(1:6,2),matrix(5:10,2))
> is.matrix(a[[2]])
[1] TRUE
> x = a[[2]]
> is.matrix(x)
[1] TRUE
> x+2
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
> a[[2]] + 2
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

What else do you need?

Michael Weylandt

On Wed, Aug 24, 2011 at 2:28 PM, Chee Chen  wrote:

> Dear All,
> As always, I appreciate all your help.
> I would like to know the easiest way to convert each of the homogeneous
> elements of a numeric list into a matrix. Each element of this list is also
> a list such that when displayed, looks like a 2-by-3 matrix , I would like
> to convert each of them into a matrix, without changing the double index of
> each entry.
>
> Suppose:
> a<- vector("list",2)
> > a[[1]]
>      [,1] [,2] [,3]
> [1,]    1    2    0
> [2,]    3    4    5
>
> > a[[2]]
>      [,1] [,2] [,3]
> [1,]    5    6    9
> [2,]    7    8   10
>
> > is.list(a[[1]])
> [1] TRUE
>
> Target: I would like to convert a[[2]] into a matrix, keeping the double
> index, into
> > mat1
>      [,1] [,2] [,3]
> [1,]    5    6    9
> [2,]    7    8   10
>
> The list I have is huge and so is each of its elements, and do not want to
> use unlist because I do not understand it fully.
> Thank you,
> Chee
>
>



Re: [R] read.table truncated data?

2011-08-24 Thread Sarah Goslee
Hi,

On Wed, Aug 24, 2011 at 2:18 PM, zhenjiang xu  wrote:
> Hi R users,
>
> I was using read.table to read a file. The data.frame looked alright, but I
> found that not all rows were read by read.table. What's wrong with it? It
> didn't give me any warning or error messages. Why are the data truncated?
> Thanks.
>
> $ wc -l all/isoform_exp.diff
> 42847 all/isoform_exp.diff
>
>> a=read.table('all/isoform_exp.diff', header=T, sep='\t')
>> nrow(a)
> [1] 21423

This is a common problem. You need to take a look at the last row that
was imported, and the rows around 21423 in the original file.

Common causes include stray single or double quotation marks, and
other special characters in your file like the default comment.char #
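
A hedged sketch of how one might check for these causes. The small temporary file below is invented to stand in for the real one (it contains apostrophes and a '#' inside fields); with the actual data, the path would be 'all/isoform_exp.diff'.

```r
# Small stand-in file with an apostrophe and a '#' inside fields
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tvalue",
             "1\talpha\t10",
             "2\tO'Brien\t20",
             "3\tgene#5\t30",
             "4\tD'Arcy\t40"), tmp)

# Fields per line with quotes and comments treated as ordinary characters;
# every line should report the same count
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")
table(n_fields)

# Read with quoting and comment handling switched off
a <- read.table(tmp, header = TRUE, sep = "\t", quote = "",
                comment.char = "", stringsAsFactors = FALSE)

# One data row per non-header line is what we expect to see
nrow(a) == length(readLines(tmp)) - 1
```

If the row count matches the line count once quote = "" and comment.char = "" are set, stray quotes or '#' characters were the likely cause of the truncation.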

Sarah
-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] Help: convert entry of a list into a matrix

2011-08-24 Thread Chee Chen
Dear All,
As always, I appreciate all your help.
I would like to know the easiest way to convert each of the homogeneous 
elements of a numeric list into a matrix. Each element of this list is also a 
list such that when displayed, looks like a 2-by-3 matrix , I would like to 
convert each of them into a matrix, without changing the double index of each 
entry.

Suppose: 
a<- vector("list",2)
> a[[1]]
     [,1] [,2] [,3]
[1,]    1    2    0
[2,]    3    4    5

> a[[2]]
     [,1] [,2] [,3]
[1,]    5    6    9
[2,]    7    8   10

> is.list(a[[1]])
[1] TRUE

Target: I would like to convert a[[2]] into a matrix, keeping the double index,
into
> mat1
     [,1] [,2] [,3]
[1,]    5    6    9
[2,]    7    8   10

The list I have is huge and so is each of its elements, and do not want to use 
unlist because I do not understand it fully.
Thank you,
Chee




[R] read.table truncated data?

2011-08-24 Thread zhenjiang xu
Hi R users,

I was using read.table to read a file. The data.frame looked alright, but I
found that not all rows were read by read.table. What's wrong with it? It
didn't give me any warning or error messages. Why are the data truncated?
Thanks.

$ wc -l all/isoform_exp.diff
42847 all/isoform_exp.diff

> a=read.table('all/isoform_exp.diff', header=T, sep='\t')
> nrow(a)
[1] 21423

-- 
Best,
Zhenjiang



Re: [R] Function rank() for data frames (or multiple vectors)?

2011-08-24 Thread David Winsemius


On Aug 24, 2011, at 1:37 PM, Sebastian Bauer wrote:


Hi!


in R? Basically, what I need is a mixture of order() and rank().
While the former allows to specify multiple vectors, it doesn't
provide the flexibility of rank() such that I can specify what
happens if ties can not be broken.

An example of this "simple problem" would clarify this greatly. I
cannot tell what "flexibility" in 'rank' is missing in 'order'.


Thanks for your answer. For instance, if I have two vectors such as

1 1
1 2
1 2
1 3
2 1

that I want combinedly ranked. I'd like to get an output

1
2
2
4
5

or (ties.method=average)

1
2.5
2.5
4
5

Basically, I need a function similar to the rank() function that accepts
more than one vector (as order() does).

Can't you just paste the columns and run rank on the results? 'rank'
accepts character vectors.


I was looking for an elegant solution ;) In the real case I have double
values and this would be quite inefficient then.


Still no r-code:

Then what about rank(order(...) , further-ties.method-argument) ?

I'm perhaps not seeing the problem clearly?
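
One caveat on the suggestion above: order() returns a permutation, which contains no ties, so rank() applied to it has nothing left to break. A possible alternative, offered only as a sketch, is to sort by all columns, give identical rows the same group number, and let rank() handle the ties on that number. multi_rank is a hypothetical name, and the sketch assumes a data frame without missing values.

```r
# Hypothetical helper: rank rows of a data frame by column 1, then column 2,
# and so on; rows that are equal in every column are treated as ties.
multi_rank <- function(df, ties.method = "average") {
  n <- nrow(df)
  o <- do.call(order, unname(as.list(df)))   # sort order over all columns
  sorted <- df[o, , drop = FALSE]
  # TRUE where a sorted row differs from the previous row in any column
  changed <- Reduce(`|`, lapply(sorted, function(col) col[-1] != col[-n]))
  dense <- integer(n)
  dense[o] <- cumsum(c(TRUE, changed))       # same number for identical rows
  rank(dense, ties.method = ties.method)
}

df <- data.frame(a = c(1, 1, 1, 1, 2), b = c(1, 2, 2, 3, 1))
multi_rank(df, ties.method = "min")
multi_rank(df, ties.method = "average")
```

This compares the double values directly, so it avoids the paste() round trip through character strings.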



Best,
Sebastian



David Winsemius, MD
West Hartford, CT



Re: [R] How to do cross validation with glm?

2011-08-24 Thread Frank Harrell
What is your sample size?  I've had trouble getting reliable estimates using
simple data splitting when N < 20,000.

Note that the following functions in the rms package facilitate
cross-validation and bootstrapping for validating models: ols, validate,
calibrate.
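
For the train/test workflow asked about below, a minimal base-R sketch of k-fold cross-validation with glm might look like this. The data are simulated and the variable names (x1, x2, y) are invented; it is an illustration, not the rms or boot implementation. Note also that predict() for a glm takes newdata; an argument named data is not used, so predictions are then returned for the data the model was fitted on.

```r
set.seed(1)
n <- 200
# Simulated data standing in for 'dat' (hypothetical variables)
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + dat$x1 - 0.5 * dat$x2))

k <- 5
fold <- sample(rep(seq_len(k), length.out = n))   # random fold label per row
pred <- numeric(n)

for (i in seq_len(k)) {
  test <- fold == i
  fit <- glm(y ~ x1 + x2, family = binomial, data = dat[!test, ])
  # predictions only for rows that were held out of this fit
  pred[test] <- predict(fit, newdata = dat[test, ], type = "response")
}

mean((dat$y - pred)^2)   # cross-validated Brier score
```

Every row is predicted exactly once by a model that did not see it, so pred can be compared with dat$y using whatever loss is appropriate.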

Frank

Andra Isan wrote:
> 
> Hi,
> 
> Thanks for the reply. What I meant is that, I would like to partition my
> dat data (a data frame) into training and testing data and then evaluate
> the performance of the model on test data. So, I thought cross validation
> is the natural choice to see how the prediction works on the hold-out
> data. Is there any example that I can take a look to see how to do cross
> validation and get the prediction results on my data?
> 
> Thanks a lot,
> Andra
> 
> --- On Wed, 8/24/11, Prof Brian Ripley 
> wrote:
> 
>> From: Prof Brian Ripley 
>> Subject: Re: [R] How to do cross validation with glm?
>> To: "Andra Isan" 
>> Cc: r-help@r-project.org
>> Date: Wednesday, August 24, 2011, 10:11 AM
>> What you describe is not cross-validation, so I am afraid we do not
>> know what you mean.  And cv.glm does 'prediction for the hold-out
>> data' for you: you can read the code to see how it does so.
>> 
>> I suspect you mean you want to do validation on a test set, but that
>> is not what you actually claim.  There are lots of examples of this
>> sort of thing in MASS (the book, scripts in the package).
>> 
>> On Wed, 24 Aug 2011, Andra Isan wrote:
>> 
>> > Hi All,
>> > 
>> > I have a fitted model called glm.fit, which I fit with glm; dat is my
>> > data frame. I used
>> > 
>> > pred= predict(glm.fit, data = dat, type="response")
>> > 
>> > to see how it predicts on my whole data, but obviously I have to do
>> > cross-validation to train the model on one part of my data and predict
>> > on the other part. So I searched for it and found the function cv.glm
>> > in package boot. I tried to use it as:
>> > 
>> > cv.glm = (cv.glm(dat, glm.fit, cost, K=nrow(dat))$delta)
>> > 
>> > but I am not sure how to do the prediction for the hold-out data. Is
>> > there any better way for cross-validation to learn a model on training
>> > data and test it on test data in R?
>> > 
>> > Thanks,
>> > Andra
>> > 
>> 
>> -- Brian D. Ripley, rip...@stats.ox.ac.uk
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595
>>
> 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-do-cross-validation-with-glm-tp3765994p3766108.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Function rank() for data frames (or multiple vectors)?

2011-08-24 Thread Sebastian Bauer
Hi!

 in R? Basically, what I need is a mixture of order() and rank().
 While the former allows to specify multiple vectors, it doesn't
 provide the flexibility of rank() such that I can specify what
 happens if ties can not be broken.
>>> An example of this "simple problem" would clarify this greatly. I
>>> cannot tell what "flexibility" in 'rank' is missing in 'order'.
>>
>> Thanks for your answer. For instance, if I have two vectors such as
>>
>> 1 1
>> 1 2
>> 1 2
>> 1 3
>> 2 1
>>
>> that I want combinedly ranked. I'd like to get an output
>>
>> 1
>> 2
>> 2
>> 4
>> 5
>>
>> or (ties.method=average)
>>
>> 1
>> 2.5
>> 2.5
>> 4
>> 5
>>
>> Basically, I need a function similar to the rank() function that accepts
>> more than one vector (as order() does).
> Can't you just paste the columns and run rank on the results? 'rank'
> accepts character vectors.

I was looking for an elegant solution ;) In the real case I have double
values and this would be quite inefficient then.

Best,
Sebastian



Re: [R] data manipulation and summaries with few million rows

2011-08-24 Thread Juliet Hannah
Thanks Dennis! I'll check this out.

Just to clarify, I need the total number of switches/changes
regardless of if that state
had occurred in the past. So A-A-B-A, would have 2 changes: A to B and B to A.

Thanks again.


On Wed, Aug 24, 2011 at 1:28 PM, Dennis Murphy  wrote:
> Hi Juliet:
>
> Here's a Q & D solution:
>
> # (1) plyr
>> f <- function(d) length(unique(d$mygroup)) - 1
>> ddply(myData, .(id), f)
>  id V1
> 1  1  0
> 2  2  2
> 3  3  1
> 4  4  0
>
> # (2) data.table
>
> myDT <- data.table(myData, key = 'id')
> myDT[, list(nswitch = length(unique(mygroup)) - 1), by = 'id']
>
> If one can switch back and forth between levels more than once, then
> the above is clearly not appropriate. A more robust method would be to
> employ rle() [run length encoding]:
>
> g <- function(d) length(rle(d$mygroup)$lengths) - 1
> ddply(myData, .(id), g)    # gives the same answer as above
> myDT[, list(nswitch = length(rle(mygroup)$lengths) - 1), by = 'id']   # ditto
>
>
> HTH,
> Dennis
>
> On Wed, Aug 24, 2011 at 9:48 AM, Juliet Hannah  
> wrote:
>> I have a data set with about 6 million rows and 50 columns. It is a
>> mixture of dates, factors, and numerics.
>>
>> What I am trying to accomplish can be seen with the following
>> simplified data, which is given as dput output below.
>>
>>> head(myData)
>>      mydate gender mygroup id
>> 1 2012-03-25      F       A  1
>> 2 2005-05-23      F       B  2
>> 3 2005-09-08      F       B  2
>> 4 2005-12-07      F       B  2
>> 5 2006-02-26      F       C  2
>> 6 2006-05-13      F       C  2
>>
>> For each id, I want to count the number of changes of the variable
>> 'mygroup' that occur. For example, id=1 has 0 changes because it is
>> observed only once.  id=2 has 2 changes (B to C, and C to D).  I also
>> need to calculate the total observation time for each id using the
>> variable mydate.  In the end, I am trying to have a new data set in
>> which each row has an id, days observed, number of changes, and
>> gender.
>>
>> I made some simple summaries using data.table and plyr, but I'm stuck
>> on this reformatting.
>>
>> Thanks for your help.
>>
>> myData <- structure(list(mydate = c("2012-03-25", "2005-05-23", "2005-09-08",
>> "2005-12-07", "2006-02-26", "2006-05-13", "2006-09-01", "2006-12-12",
>> "2006-02-19", "2006-05-03", "2006-04-23", "2007-12-08", "2011-03-19",
>> "2007-12-20", "2008-06-15", "2008-12-16", "2009-06-07", "2009-10-09",
>> "2010-01-28", "2007-06-05"), gender = c("F", "F", "F", "F", "F",
>> "F", "F", "F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M",
>> "M", "M"), mygroup = c("A", "B", "B", "B", "C", "C", "C", "D",
>> "D", "D", "D", "D", "D", "A", "A", "A", "B", "B", "B", "A"),
>>    id = c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>>    3L, 3L, 3L, 3L, 3L, 3L, 4L)), .Names = c("mydate", "gender",
>> "mygroup", "id"), class = "data.frame", row.names = c(NA, -20L
>> ))
>>
>>> sessionInfo()
>> R version 2.13.1 (2011-07-08)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>



Re: [R] Function rank() for data frames (or multiple vectors)?

2011-08-24 Thread David Winsemius


On Aug 24, 2011, at 1:11 PM, Sebastian Bauer wrote:


Hi!


I'd like to rank rows of a data frame similar to what rank() does
for vectors. However, ties should be broken by columns that I
specify. If it is not possible to break a ties (because the row data
is essentially the same), I'd like to have the same flexibility that
rank() offers. Is there an elegant solution to this simple problem
in R? Basically, what I need is a mixture of order() and rank().
While the former allows to specify multiple vectors, it doesn't
provide the flexibility of rank() such that I can specify what
happens if ties can not be broken.

An example of this "simple problem" would clarify this greatly. I
cannot tell what "flexibility" in 'rank' is missing in 'order'.


Thanks for your answer. For instance, if I have two vectors such as

1 1
1 2
1 2
1 3
2 1

that I want combinedly ranked. I'd like to get an output

1
2
2
4
5

or (ties.method=average)

1
2.5
2.5
4
5

Basically, I need a function similar to the rank() function that accepts
more than one vector (as order() does).


Can't you just paste the columns and run rank on the results? 'rank'  
accepts character vectors.




Best,
Sebastian



David Winsemius, MD
West Hartford, CT



Re: [R] data manipulation and summaries with few million rows

2011-08-24 Thread Dennis Murphy
Hi Juliet:

Here's a Q & D solution:

# (1) plyr
> f <- function(d) length(unique(d$mygroup)) - 1
> ddply(myData, .(id), f)
  id V1
1  1  0
2  2  2
3  3  1
4  4  0

# (2) data.table

myDT <- data.table(myData, key = 'id')
myDT[, list(nswitch = length(unique(mygroup)) - 1), by = 'id']

If one can switch back and forth between levels more than once, then
the above is clearly not appropriate. A more robust method would be to
employ rle() [run length encoding]:

g <- function(d) length(rle(d$mygroup)$lengths) - 1
ddply(myData, .(id), g)# gives the same answer as above
myDT[, list(nswitch = length(rle(mygroup)$lengths) - 1), by = 'id']   # ditto
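
The remaining pieces of the request (days observed and gender per id) could be added along these lines. This is a base-R sketch on a cut-down, partly invented version of myData, and it assumes the rows of each id are already in observation order; with several million rows, the same per-id expressions would more likely be placed inside the data.table call above.

```r
# Cut-down stand-in for myData (id 3 is altered to show an A-B-A pattern)
myData <- data.frame(
  id      = c(1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  mydate  = c("2012-03-25", "2005-05-23", "2005-09-08", "2006-02-26",
              "2006-12-12", "2007-12-20", "2009-06-07", "2010-01-28"),
  gender  = c("F", "F", "F", "F", "F", "M", "M", "M"),
  mygroup = c("A", "B", "B", "C", "D", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Number of switches = number of runs minus one; as.character() because
# rle() does not accept factors
n_changes <- function(g) length(rle(as.character(g))$lengths) - 1L

res <- do.call(rbind, lapply(split(myData, myData$id), function(d) {
  dates <- as.Date(d$mydate)
  data.frame(id            = d$id[1],
             gender        = d$gender[1],
             days_observed = as.numeric(diff(range(dates))),  # last minus first
             n_changes     = n_changes(d$mygroup))
}))
res
```

Because rle() counts runs, a sequence such as A, B, A is counted as two changes, which unique() would miss.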


HTH,
Dennis

On Wed, Aug 24, 2011 at 9:48 AM, Juliet Hannah  wrote:
> I have a data set with about 6 million rows and 50 columns. It is a
> mixture of dates, factors, and numerics.
>
> What I am trying to accomplish can be seen with the following
> simplified data, which is given as dput output below.
>
>> head(myData)
>      mydate gender mygroup id
> 1 2012-03-25      F       A  1
> 2 2005-05-23      F       B  2
> 3 2005-09-08      F       B  2
> 4 2005-12-07      F       B  2
> 5 2006-02-26      F       C  2
> 6 2006-05-13      F       C  2
>
> For each id, I want to count the number of changes of the variable
> 'mygroup' that occur. For example, id=1 has 0 changes because it is
> observed only once.  id=2 has 2 changes (B to C, and C to D).  I also
> need to calculate the total observation time for each id using the
> variable mydate.  In the end, I am trying to have a new data set in
> which each row has an id, days observed, number of changes, and
> gender.
>
> I made some simple summaries using data.table and plyr, but I'm stuck
> on this reformatting.
>
> Thanks for your help.
>
> myData <- structure(list(mydate = c("2012-03-25", "2005-05-23", "2005-09-08",
> "2005-12-07", "2006-02-26", "2006-05-13", "2006-09-01", "2006-12-12",
> "2006-02-19", "2006-05-03", "2006-04-23", "2007-12-08", "2011-03-19",
> "2007-12-20", "2008-06-15", "2008-12-16", "2009-06-07", "2009-10-09",
> "2010-01-28", "2007-06-05"), gender = c("F", "F", "F", "F", "F",
> "F", "F", "F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M",
> "M", "M"), mygroup = c("A", "B", "B", "B", "C", "C", "C", "D",
> "D", "D", "D", "D", "D", "A", "A", "A", "B", "B", "B", "A"),
>    id = c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>    3L, 3L, 3L, 3L, 3L, 3L, 4L)), .Names = c("mydate", "gender",
> "mygroup", "id"), class = "data.frame", row.names = c(NA, -20L
> ))
>
>> sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>



Re: [R] How to do cross validation with glm?

2011-08-24 Thread Andra Isan
Hi,

Thanks for the reply. What I meant is that I would like to partition my
data, dat (a data frame), into training and testing sets and then evaluate the
performance of the model on the test data. So I thought cross-validation was the
natural choice for seeing how the prediction works on the hold-out data. Is there
an example I can look at to see how to do cross-validation and get
the prediction results on my data?

Thanks a lot,
Andra
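
A minimal sketch of validation on a held-out test set, which is what the message above describes. The data are simulated and the names (dat, x1, x2, y, test_idx) are hypothetical stand-ins for the poster's data frame. Note that predict() for glm objects takes the new data through the newdata argument; an argument called data is not used for prediction.

```r
set.seed(1)

# Simulated stand-in for the poster's data frame
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + 1.2 * dat$x1 - 0.8 * dat$x2))

# Hold out 30% of the rows as a test set
test_idx <- sample(n, size = 60)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Fit on the training rows only
fit <- glm(y ~ x1 + x2, family = binomial, data = train)

# Predicted probabilities for the held-out rows (note: newdata, not data)
p_test <- predict(fit, newdata = test, type = "response")

# Misclassification rate on the test set, using a 0.5 cut-off
pred_class <- as.integer(p_test > 0.5)
test_error <- mean(pred_class != test$y)
```

The 0.5 cut-off and the 70/30 split are arbitrary choices for the sketch; a single split gives a noisier estimate than repeated splits or K-fold cross-validation.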

--- On Wed, 8/24/11, Prof Brian Ripley  wrote:

> From: Prof Brian Ripley 
> Subject: Re: [R] How to do cross validation with glm?
> To: "Andra Isan" 
> Cc: r-help@r-project.org
> Date: Wednesday, August 24, 2011, 10:11 AM
> What you describe is not cross-validation, so I am afraid we do not
> know what you mean.  And cv.glm does 'prediction for the hold-out
> data' for you: you can read the code to see how it does so.
> 
> I suspect you mean you want to do validation on a test set, but that
> is not what you actually claim.   There are lots of examples of this
> sort of thing in MASS (the book, scripts in the package).
> 
> On Wed, 24 Aug 2011, Andra Isan wrote:
> 
> > Hi All,
> > 
> > I have a fitted model called glm.fit, which I fitted with glm; dat is
> > my data frame. I used
> > 
> > pred= predict(glm.fit, data = dat, type="response")
> > 
> > to see how it predicts on my whole data, but obviously I have to do
> > cross-validation to train the model on one part of my data and predict
> > on the other part. So I searched for it and found a function cv.glm in
> > package boot. I tried to use it as:
> > 
> > cv.glm = (cv.glm(dat, glm.fit, cost, K=nrow(dat))$delta)
> > 
> > but I am not sure how to do the prediction for the hold-out data. Is
> > there any better way for cross-validation to learn a model on training
> > data and test it on test data in R?
> > 
> > Thanks,
> > Andra
> > 
> 
> -- 
> Brian D. Ripley,                  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>



Re: [R] How to do cross validation with glm?

2011-08-24 Thread Prof Brian Ripley
What you describe is not cross-validation, so I am afraid we do not 
know what you mean.  And cv.glm does 'prediction for the hold-out 
data' for you: you can read the code to see how it does so.


I suspect you mean you want to do validation on a test set, but that 
is not what you actually claim.   There are lots of examples of this 
sort of thing in MASS (the book, scripts in the package).
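
For reference, a short sketch of what a cv.glm() call can look like, on simulated data (dat, x1, x2, y and the fit name are hypothetical). cv.glm() refits the model on each set of training folds and predicts the held-out fold internally, so no separate predict() call is needed; the cost function below is the misclassification cost shown in the cv.glm help page.

```r
library(boot)

set.seed(1)

# Simulated stand-in data
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + 1.2 * dat$x1 - 0.8 * dat$x2))

fit <- glm(y ~ x1 + x2, family = binomial, data = dat)

# Misclassification cost: r is the observed 0/1 response,
# pi the predicted probability for the held-out observations
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

# 5-fold cross-validation; delta holds the raw and the adjusted estimate
cv_err <- cv.glm(dat, fit, cost, K = 5)$delta
```

Storing the result under a name other than cv.glm avoids masking the function. K = nrow(dat), as in the quoted call below, gives leave-one-out cross-validation.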


On Wed, 24 Aug 2011, Andra Isan wrote:


> Hi All,
>
> I have a fitted model called glm.fit, which I fitted with glm; dat is
> my data frame. I used
>
> pred= predict(glm.fit, data = dat, type="response")
>
> to see how it predicts on my whole data, but obviously I have to do
> cross-validation to train the model on one part of my data and predict
> on the other part. So I searched for it and found a function cv.glm in
> package boot. I tried to use it as:
>
> cv.glm = (cv.glm(dat, glm.fit, cost, K=nrow(dat))$delta)
>
> but I am not sure how to do the prediction for the hold-out data. Is
> there any better way for cross-validation to learn a model on training
> data and test it on test data in R?
>
> Thanks,
> Andra




--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] Function rank() for data frames (or multiple vectors)?

2011-08-24 Thread Sebastian Bauer
Hi!

>> I'd like to rank rows of a data frame similar to what rank() does
>> for vectors. However, ties should be broken by columns that I
>> specify. If it is not possible to break a tie (because the row data
>> is essentially the same), I'd like to have the same flexibility that
>> rank() offers. Is there an elegant solution to this simple problem
>> in R? Basically, what I need is a mixture of order() and rank().
>> While the former allows specifying multiple vectors, it doesn't
>> provide the flexibility of rank() such that I can specify what
>> happens if ties cannot be broken.
> An example of this "simple problem" would clarify this greatly. I
> cannot tell what "flexibility" in 'rank' is missing in 'order'.

Thanks for your answer. For instance, if I have two vectors such as

1 1
1 2
1 2
1 3
2 1

that I want ranked jointly, I'd like to get the output

1
2
2
4
5

or (ties.method=average)

1
2.5
2.5
4
5

Basically, I need a function similar to the rank() function that accepts
more than one vector (as order() does).
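
One way the requested behaviour could be sketched is to sort the rows with order(), number the distinct rows in sorted order, and then hand those group numbers to rank() so that its ties.method applies. The function name rank_multi is hypothetical (it is not part of base R), and the sketch assumes equal-length vectors without NAs.

```r
# Rank rows defined by several vectors; later vectors break ties in earlier ones.
# Rows that are identical in all vectors are handled by rank()'s ties.method.
rank_multi <- function(..., ties.method = "average") {
  cols <- unname(list(...))
  n <- length(cols[[1]])
  if (n == 0) return(numeric(0))

  ord <- do.call(order, cols)                  # sort by col 1, then col 2, ...
  sorted <- lapply(cols, function(v) v[ord])

  # Mark positions where a sorted row differs from the previous sorted row
  is_new <- c(TRUE, rep(FALSE, n - 1))
  for (v in sorted) {
    is_new[-1] <- is_new[-1] | (v[-1] != v[-n])
  }

  # Group number of each row, mapped back to the original row order
  group <- integer(n)
  group[ord] <- cumsum(is_new)

  rank(group, ties.method = ties.method)
}

a <- c(1, 1, 1, 1, 2)
b <- c(1, 2, 2, 3, 1)
rank_multi(a, b, ties.method = "min")      # 1 2 2 4 5
rank_multi(a, b)                           # 1.0 2.5 2.5 4.0 5.0
```

Because the comparison is done on the original values rather than on pasted strings, numeric columns keep their numeric ordering.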

Best,
Sebastian



[R] How to do cross validation with glm?

2011-08-24 Thread Andra Isan
Hi All, 

I have a fitted model called glm.fit, which I fitted with glm; dat is my data
frame. I used

pred= predict(glm.fit, data = dat, type="response")

to see how it predicts on my whole data, but obviously I have to do
cross-validation to train the model on one part of my data and predict on the
other part. So I searched for it and found a function cv.glm in package boot.
I tried to use it as:

cv.glm = (cv.glm(dat, glm.fit, cost, K=nrow(dat))$delta)

but I am not sure how to do the prediction for the hold-out data. Is there any
better way for cross-validation to learn a model on training data and test it
on test data in R?

Thanks,
Andra



[R] data manipulation and summaries with few million rows

2011-08-24 Thread Juliet Hannah
I have a data set with about 6 million rows and 50 columns. It is a
mixture of dates, factors, and numerics.

What I am trying to accomplish can be seen with the following
simplified data, which is given as dput output below.

> head(myData)
      mydate gender mygroup id
1 2012-03-25      F       A  1
2 2005-05-23      F       B  2
3 2005-09-08      F       B  2
4 2005-12-07      F       B  2
5 2006-02-26      F       C  2
6 2006-05-13      F       C  2

For each id, I want to count the number of changes of the variable
'mygroup' that occur. For example, id=1 has 0 changes because it is
observed only once.  id=2 has 2 changes (B to C, and C to D).  I also
need to calculate the total observation time for each id using the
variable mydate.  In the end, I am trying to have a new data set in
which each row has an id, days observed, number of changes, and
gender.

I made some simple summaries using data.table and plyr, but I'm stuck
on this reformatting.

Thanks for your help.

myData <- structure(list(mydate = c("2012-03-25", "2005-05-23", "2005-09-08",
"2005-12-07", "2006-02-26", "2006-05-13", "2006-09-01", "2006-12-12",
"2006-02-19", "2006-05-03", "2006-04-23", "2007-12-08", "2011-03-19",
"2007-12-20", "2008-06-15", "2008-12-16", "2009-06-07", "2009-10-09",
"2010-01-28", "2007-06-05"), gender = c("F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M",
"M", "M"), mygroup = c("A", "B", "B", "B", "C", "C", "C", "D",
"D", "D", "D", "D", "D", "A", "A", "A", "B", "B", "B", "A"),
id = c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 4L)), .Names = c("mydate", "gender",
"mygroup", "id"), class = "data.frame", row.names = c(NA, -20L
))

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base



Re: [R] Function rank() for data frames (or multiple vectors)?

2011-08-24 Thread David Winsemius


On Aug 24, 2011, at 11:09 AM, Sebastian Bauer wrote:


> Hello,
>
> I'd like to rank rows of a data frame similar to what rank() does
> for vectors. However, ties should be broken by columns that I
> specify. If it is not possible to break a tie (because the row data
> is essentially the same), I'd like to have the same flexibility that
> rank() offers. Is there an elegant solution to this simple problem
> in R? Basically, what I need is a mixture of order() and rank().
> While the former allows specifying multiple vectors, it doesn't
> provide the flexibility of rank() such that I can specify what
> happens if ties cannot be broken.


An example of this "simple problem" would clarify this greatly. I  
cannot tell what "flexibility" in 'rank' is missing in 'order'.


--

David Winsemius, MD
West Hartford, CT



Re: [R] silently testing for data from another package for .Rd examples

2011-08-24 Thread Uwe Ligges
Actually it is recommended to test for the availability of a valid 
package with find.package(), particularly in this case where the name of 
the package is already known.
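
A small sketch of how the find.package() check might look in an example. The helper name has_pkg is hypothetical; the sketch relies on find.package(quiet = TRUE) returning a zero-length character vector, without a warning or error, when the package is not found.

```r
# TRUE if the package is installed; does not load or attach it
has_pkg <- function(pkg) length(find.package(pkg, quiet = TRUE)) > 0

if (has_pkg("ElemStatLearn")) {
  # Loads only the data set, without attaching the package
  data(prostate, package = "ElemStatLearn")
  # rest of example
}
```

If ElemStatLearn is not installed, the block is skipped silently, much like the require() version quoted below.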


Best,
Uwe



On 24.08.2011 18:29, Yihui Xie wrote:

.packages(all = TRUE) will give you a list of all available packages
without really loading them like require().

Regards,
Yihui
--
Yihui Xie
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



On Wed, Aug 24, 2011 at 9:28 AM, Michael Friendly  wrote:

In an .Rd example for a package, I want to use data from another package,
but avoid loading the entire
package and avoid errors/warnings if that other package is not available.

If I don't care about loading the other package, I can just do:

if (require("ElemStatLearn", quietly=TRUE)) {
data(prostate)
  #  rest of example
}

I'd rather just be able to do something like:

if (data(prostate, package="ElemStatLearn")) {
  #  rest of example
}

but it appears that data() doesn't return anything useful (like FALSE or
NULL) in case the named data
set doesn't exist, or the package cannot be found.  Below are some test
cases in a fresh R 2.13.1 session.

Is there some way I can incorporate such a data example silently without
errors or warnings if the
package doesn't exist, as is the case with require()?


> data(prostate, package="ElemStatLearn")
> dd <- data(prostate, package="ElemStatLearn")
> dd
[1] "prostate"
> dd2 <- data(x, package="ElemStatLearn")
Warning message:
In data(x, package = "ElemStatLearn") : data set 'x' not found
> dd2
[1] "x"
> dd2 <- data(x, package="ElemStatLearn", verbose=FALSE)
Warning message:
In data(x, package = "ElemStatLearn", verbose = FALSE) :
  data set 'x' not found
> dd3 <- data(z, package="foobar")
Error in find.package(package, lib.loc, verbose = verbose) :
  there is no package called 'foobar'
> dd3
Error: object 'dd3' not found



try() doesn't seem to help here:


> ddtry <- try(data(z, package="foobar"))
Error in find.package(package, lib.loc, verbose = verbose) :
  there is no package called 'foobar'
> ddtry
[1] "Error in find.package(package, lib.loc, verbose = verbose) : \n  there is no package called 'foobar'\n"
attr(,"class")
[1] "try-error"
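
A possible sketch for the silent test asked about above: try() prints the error message unless silent = TRUE is given, and a missing data set produces a warning rather than an error, so tryCatch() with handlers for both conditions may be closer to what is wanted. The helper name try_data is hypothetical.

```r
# Try to load a data set from a package; return TRUE on success and
# FALSE (silently) if the package or the data set cannot be found.
try_data <- function(name, package, envir = parent.frame()) {
  tryCatch({
    data(list = name, package = package, envir = envir)
    TRUE
  },
  warning = function(w) FALSE,   # e.g. "data set 'x' not found"
  error   = function(e) FALSE)   # e.g. "there is no package called 'foobar'"
}

if (try_data("prostate", "ElemStatLearn")) {
  # rest of example
}
```

Used this way the example is skipped without output when either the package or the data set is unavailable.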




--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA







