[R] [R-pkgs] new package 'trackObjs' - mirror objects to files, provide summaries & modification times

2007-09-10 Thread Tony Plate
[...] variables are saved on
disk and will no longer be accessible until tracking is
started again.

 *  Each object is stored in its own file in the
tracking directory, in the format used by 'save()'/'load()'
(RData files).

List of basic functions and common calling patterns:

  Six functions cover the majority of common usage of the trackObjs
  package:

 *  'track.start(dir=...)': start tracking the global
environment, with files saved in 'dir'

 *  'track.stop()': stop tracking (any unsaved tracked variables
are saved to disk and all tracked variables become
unavailable until tracking starts again)

 *  'track(x)': start tracking 'x' - 'x' in the global
environment is replaced by an active binding and 'x' is
saved in its corresponding file in the tracking directory
and, if caching is on, in the tracking environment

 *  'track(x <- value)': start tracking 'x'

 *  'track(list=c('x', 'y'))': start tracking specified
variables

 *  'track(all=TRUE)': start tracking all untracked variables in
the global environment

 *  'untrack(x)': stop tracking variable 'x' - the R object 'x'
is put back as an ordinary object in the global environment

 *  'untrack(all=TRUE)': stop tracking all variables in the
global environment (but tracking is still set up)

 *  'untrack(list=...)': stop tracking specified variables

 *  'track.summary()': print a summary of the basic
characteristics of tracked variables: name, class, extent,
and creation, modification and access times.

 *  'track.remove(x)': completely remove all traces of 'x' from
the global environment, tracking environment and tracking
directory.   Note that if variable 'x' in the global
environment is tracked, 'remove(x)' will make 'x' an
"orphaned" variable: 'remove(x)' will just remove the active
binding from the global environment, and leave 'x' in the
tracked environment and on file, and 'x' will reappear after
restarting tracking.
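The calling patterns above fit together into a session like the following. This is a sketch only: it assumes the trackObjs package is installed (the guard skips the tracking calls otherwise), the directory and variable names are made up, and it uses only the functions listed above.

```r
# Sketch of a trackObjs session; the tracking calls run only if the package is installed.
track_dir <- file.path(tempdir(), "rdatadir")   # hypothetical tracking directory
dir.create(track_dir, showWarnings = FALSE)

if (requireNamespace("trackObjs", quietly = TRUE)) {
  library(trackObjs)
  track.start(dir = track_dir)   # mirror the global environment to track_dir
  track(big <- rnorm(1e5))       # assign 'big' and start tracking it
  small <- letters
  track(small)                   # start tracking an existing variable
  print(track.summary())         # name, class, extent, creation/modification/access times
  untrack(small)                 # 'small' becomes an ordinary object again
  track.remove(big)              # remove 'big' from env, tracking env and directory
  track.stop()                   # save anything unsaved and stop tracking
}
```

Using track.remove() rather than remove() at the end avoids leaving 'big' behind as an orphaned variable.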

Complete list of functions and common calling patterns:

  The 'trackObjs' package provides many additional functions for
  controlling how tracking is performed (e.g., whether or not
  tracked variables are cached in memory), examining the state of
  tracking (show which variables are tracked, untracked, orphaned,
  masked, etc.) and repairing tracking environments and databases
  that have become inconsistent or incomplete (this may result from
  resource limitations, e.g., being unable to write a save file due
  to lack of disk space, or from manual tinkering, e.g., dropping a
  new save file into a tracking directory.)

[truncated here -- see ?trackObjs]

-- Tony Plate

PS: to give credit where due, the end of ?trackObjs says:

References:
  Roger D. Peng. Interacting with data using the filehash package. R
  News, 6(4):19-24, October 2006.
  'http://cran.r-project.org/doc/Rnews' and
  'http://sandybox.typepad.com/software'

  David E. Brahm. Delayed data packages. R News, 2(3):11-12,
  December 2002.  'http://cran.r-project.org/doc/Rnews'

See Also:
  [...]
  Inspiration from the packages 'g.data' and 'filehash'.

___
R-packages mailing list
[EMAIL PROTECTED]
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Q: selecting a name when it is known as a string

2007-09-05 Thread Tony Plate
For the column names of the result of expand.grid(), I would just assign 
them the values I wanted, like this:

 > x <- expand.grid(tmp=1:3,y=1:2)
 > x
   tmp y
1   1 1
2   2 1
3   3 1
4   1 2
5   2 2
6   3 2
 > colnames(x)[1] <- "whatever"
 > x
   whatever y
1        1 1
2        2 1
3        3 1
4        1 2
5        2 2
6        3 2
 >
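The same idea works when the new name is only known as a string at run time, which is what the original question needs. A minimal sketch; `ORDINATE`, `MINVAL` and friends mirror the names in the question, but the values given to them here are made up:

```r
# Name of the varying column, known only as a string (value is hypothetical)
ORDINATE <- "dsl"
MINVAL <- 1; MAXVAL <- 3      # hypothetical ranges
MINCMS <- 1; MAXCMS <- 2

# Option 1: build the grid with a placeholder name, then rename the column
grid <- expand.grid(tmp = MINVAL:MAXVAL, ncms = MINCMS:MAXCMS)
colnames(grid)[1] <- ORDINATE

# Option 2: give expand.grid() a single named list; setNames() supplies the names
grid2 <- expand.grid(setNames(list(MINVAL:MAXVAL, MINCMS:MAXCMS),
                              c(ORDINATE, "ncms")))

print(grid2)
```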

-- Tony Plate

D. R. Evans wrote:
> D. R. Evans said the following at 09/04/2007 04:14 PM :
>> I am 100% certain that there is an easy way to do this, but after
> 
> I have reconsidered this and now believe it to be essentially impossible
> (or at the very least remarkably difficult) although I don't understand why
> it is so :-(
> 
> At least, I spent another two hours trying variations on the suggestions I
> received, but still nothing worked properly.
> 
> It sure seems like it _ought_ to be easy, because of the following argument:
> 
> If I type an expression such as "A <- <expression>" then R is perfectly
> capable of parsing the <expression> and executing it and assigning the
> result to A. So it seems to follow that it ought to be able to parse a
> string that contains exactly the same sequence of characters (after all,
> why should the R parsing engine care whether the input string comes from
> the terminal or from a variable?) and therefore it should be possible to
> assign "<expression>" to a variable and then have R parse that variable
> precisely as if it had been typed.
> 
> That was my logic as to why this ought to be easy, anyway. (And there was
> the subsidiary argument that this is easy in the other languages I use, but
> R is sufficiently different that I'm not certain that that argument carries
> much force.)
> 
> It does seem that there are several ways to make the
> 
>   lo <- loess(percent ~ ncms * ds, d, control=loess.control(trace.hat =
>     'approximate'))
> 
> command work OK if the right hand side is in a character variable, but I
> haven't been able to find a way to make
> 
>   grid <- data.frame(expand.grid(ds=MINVAL:MAXVAL, ncms=MINCMS:MAXCMS))
> 
> work.
> 
> I always end up with a parse error or a complaint that "'newdata' does not
> contain the variables needed" when I perform the next task:
> 
>   plo <- predict(lo, grid).
> 
> So I guess I have to stick with half a dozen compound "if" statements, all
> of which do essentially the same thing :-(
> 



Re: [R] Q: selecting a name when it is known as a string

2007-09-04 Thread Tony Plate
You can use substitute() for this.  The drawback with this approach is 
that the formula in the call in the printed value of loess() is ugly.

 > x <- data.frame(y=rnorm(20), x1=rnorm(20), x2=rnorm(20))
 > loess(y~x2, data=x)
Call:
loess(formula = y ~ x2, data = x)

Number of Observations: 20
Equivalent Number of Parameters: 4.68
Residual Standard Error: 1.208
 > loess(substitute(y~X, list(X=as.name('x2'))), data=x)
Call:
loess(formula = substitute(y ~ X, list(X = as.name("x2"))), data = x)

Number of Observations: 20
Equivalent Number of Parameters: 4.68
Residual Standard Error: 1.208
 > loess(y~x1, data=x)
Call:
loess(formula = y ~ x1, data = x)

Number of Observations: 20
Equivalent Number of Parameters: 4.87
Residual Standard Error: 1.179
 > loess(substitute(y~X, list(X=as.name('x1'))), data=x)
Call:
loess(formula = substitute(y ~ X, list(X = as.name("x1"))), data = x)

Number of Observations: 20
Equivalent Number of Parameters: 4.87
Residual Standard Error: 1.179
 >
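Applied to the original code, another option is to build the formula from the string with reformulate() and the prediction grid from a named list, so one code path handles every value of ORDINATE. This is a sketch: the data frame `d` is simulated only so the block runs (column names follow the question), and the printed call of the fit will show `fml` rather than the expanded formula.

```r
ORDINATE <- "dsl"                       # column chosen at run time

# Simulated stand-in for the poster's data frame 'd'
set.seed(1)
d <- data.frame(ncms = rep(1:5, times = 8), dsl = rep(1:8, each = 5))
d$percent <- d$ncms + 2 * d$dsl + rnorm(nrow(d), sd = 0.5)

# percent ~ ncms * <ORDINATE>, built from strings
fml <- reformulate(paste("ncms *", ORDINATE), response = "percent")
lo <- loess(fml, data = d,
            control = loess.control(trace.hat = "approximate"))

# Prediction grid whose first column is named by ORDINATE
grid <- expand.grid(setNames(list(2:6, 2:4), c(ORDINATE, "ncms")))
plo <- predict(lo, grid)
print(fml)
```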

hope this helps,

Tony Plate


D. R. Evans wrote:
> I am 100% certain that there is an easy way to do this, but after
> experimenting off and on for a couple of days, and searching everywhere I
> could think of, I haven't been able to find the trick.
> 
> I have this piece of code:
> 
> ...
>   attach(d)
> 
>   if (ORDINATE == 'ds')
>   { lo <- loess(percent ~ ncms * ds, d, control=loess.control(trace.hat =
> 'approximate'))
> grid <- data.frame(expand.grid(ds=MINVAL:MAXVAL, ncms=MINCMS:MAXCMS))
> ...
> 
> then there several almost-identical "if" statements for different values of
> ORDINATE. For example, the next "if" statement starts with:
> 
> ...
>   if (ORDINATE == 'dsl')
>   { lo <- loess(percent ~ ncms * dsl, d, control=loess.control(trace.hat =
> 'approximate'))
> grid <- data.frame(expand.grid(dsl=MINVAL:MAXVAL, ncms=MINCMS:MAXCMS))
> ...
> 
> This is obviously pretty silly code (although of course it does work).
> 
> I imagine that my question is obvious: given that I have a variable,
> ORDINATE, whose value is a string, how do I re-write statements such as the
> "lo <-" and "grid <-" statements above so that they use ORDINATE instead of
> the hard-coded names "ds" and "dsl".
> 
> I am almost sure (almost) that it has something to do with "deparse()", but
> I couldn't find the right incantation, and the ?deparse() help left my head
> swimming.
> 



Re: [R] how do i use the get function to obtain an element from a list...

2007-08-21 Thread Tony Plate
One simple way that I haven't seen mentioned yet is to do:

 > get("a")$x

(which of course allows further variants such as get("a")$x[3:6] ...)
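If the column name is also held in a string, `[[` accepts it directly, whereas `$` takes its argument literally. Note that `a$1` will not parse; `a[[1]]` selects a column by position. A short sketch using the data from the question (`obj_name` and `col` are made-up variable names):

```r
x <- 1:12
y <- 13:24
a <- data.frame(x = x, y = y)

obj_name <- "a"   # object name as a string
col <- "x"        # column name as a string

v1 <- get(obj_name)$x        # fixed column name
v2 <- get(obj_name)[[col]]   # column name taken from a string
v3 <- get(obj_name)[[1]]     # column by position, e.g. inside a loop over seq_along(a)
print(v2[3:6])
```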


-- Tony Plate

Juan Manuel Barreneche wrote:
> my problem can be explained with the following example:
> 
> x <- 1:12
> y <- 13:24
> a <- data.frame(x = x, y = y)
> 
> ## if i write
> a$x
> ## it returns
> [1]  1  2  3  4  5  6  7  8  9 10 11 12
> 
> ## but the function get doesn't recognize a$x. Instead it produces the
> following error:
> get("a$x")
> Error in get(x, envir, mode, inherits) : variable "a$x" was not found
> 
> i intend to do it inside a loop, using a new object (and hence, a new
> name) for each iteration (i.e., instead of a$x, it would be a$1, a$2,
> a$3, and so on, for a million times).
> 
> i would greatly appreciate it if someone could help me on this issue,
> 
> thanks in advance,
> 
> Juan Manuel Barreneche,
> Zoología de Vertebrados,
> Facultad de Ciencias,
> UDELAR, Uruguay.
> 



Re: [R] poor rbind performance

2007-07-18 Thread Tony Plate

As Jim points out, building up a data frame by rbinding in a loop can be 
a slow way to do things in R.

Here's an example of how you can easily read data frames into a list:

 > # Create 3 files
 > invisible(lapply(1:3, function(i)
 +     write.csv(file=paste("tmp",i,".csv",sep=""),
 +               data.frame(i=2*i+(1:2),c=letters[2*i+(1:2)]))))
 > # Read the files into a list of data frames
 > list.of.dfs <- lapply(paste("tmp",1:3,".csv",sep=""), read.csv,
 +                       row.names=1)
 > # rbind the data frames
 > myData <- do.call("rbind", list.of.dfs)
 > myData
   i c
1 3 c
2 4 d
3 5 e
4 6 f
5 7 g
6 8 h
 >

(and of course, these last two expressions can be composed into a single 
expression if you want)
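Applied to the loop in the question, the same pattern stores each piece in a preallocated list and calls rbind once at the end instead of growing `myData` on every iteration. A sketch; the helper name `read_all` and the temporary files are made up for the demonstration:

```r
# Read every file into a list, then rbind once (avoids re-copying myData each time)
read_all <- function(files) {
  pieces <- vector("list", length(files))
  for (k in seq_along(files)) {
    pieces[[k]] <- read.table(files[k], header = TRUE, sep = ",")
  }
  do.call("rbind", pieces)
}

# Demonstration with three small temporary files
files <- file.path(tempdir(), paste0("myData_", 1:3, ".txt"))
for (k in 1:3) {
  write.csv(data.frame(i = 2 * k + (1:2), c = letters[2 * k + (1:2)]),
            files[k], row.names = FALSE)
}
myData <- read_all(files)
print(myData)
```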

-- Tony Plate

Aydemir, Zava (FID) wrote:
> Hi
>  
> I rbind data frames in a loop in a cumulative way and the performance
> deteriorates very quickly. 
>  
> My code looks like this:
>  
> for( k in 1:N)
> {
> filename <- paste("/tmp/myData_",as.character(k),".txt",sep="")
> myDataTmp <- read.table(filename,header=TRUE,sep=",")
> if( k == 1) {
> myData <- myDataTmp
> }
> else{
> myData <- rbind(myData,myDataTmp)
> }  
> }
>  
> Some more details:
> - the size of the stored text files is about 100,000 rows and 50 columns
> each
> - for k=1: rbind takes 0.0004 seconds
> - for k=2: rbind takes 13 seconds
> - for k=3: rbind takes 30 seconds
> - for k=4: rbind takes 36 seconds
> etc
>  
> Any suggestions to improve speed?
>  
> Thanks
>  
> Zava
> 



Re: [R] Problem with RSVGTipsDevice

2007-06-18 Thread Tony Plate
The new version of RSVGTipsDevice (0.7.1) that is now available on CRAN 
should fix this problem.  Please let me know if it doesn't, or if there 
are other problems.

-- Tony Plate

mister_bluesman wrote:
> Hi there.
> 
> I am still trying to get the RSVGTipsDevice to work, yet I can not.
> 
> I have copied the first example from RSVGTipsDevice documentation:
> 
> library(RSVGTipsDevice)
> devSVGTips("C:\\svgplot1.svg", toolTipMode=1,
> title="SVG example plot 1: shapes and points, tooltips are title + 1 line")
> plot(c(0,10),c(0,10), type="n", xlab="x", ylab="y",
> main="Example SVG plot with title + 1 line tips (mode=1)")
> setSVGShapeToolTip(title="A rectangle", desc="that is yellow")
> rect(1,1,4,6, col='yellow')
> setSVGShapeToolTip(title="1st circle with title only")
> points(5.5,7.5,cex=20,pch=19,col='red')
> setSVGShapeToolTip(title="A triangle", desc="big and green")
> polygon(c(3,6,8), c(3,6,3), col='green')
> # no tooltips on these points
> points(2:8, 8:2, cex=3, pch=19, col='black')
> # tooltips on each these points
> invisible(sapply(1:7, function(x)
> {setSVGShapeToolTip(title=paste("point", x))
> points(x+1, 8-x, cex=3, pch=1, col='black')}))
> dev.off()
> 
> This results in the following output:
> 
> http://www.nabble.com/file/p11064573/svgplot1.svg svgplot1.svg 
> 
> It opens, but when I try to hover over the triangle, for example, no tooltip
> box appears. I have tried opening the file through Firefox and XP IE, and
> on more than one computer, yet it does not work. Do I need to install
> something else as well?
> 
> Many thanks



Re: [R] how to find how many modes in 2 dimensions case

2007-06-09 Thread Tony Plate
If you want to count the local maxima in the n x n matrix returned by 
kde2d, AND you know there are no ties, you could do something like the 
following:

 > set.seed(1)
 > x <- matrix(sample(10, 25, rep=TRUE), 5, 5)
 > x
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    9    3    5   10
[2,]    4   10    2    8    3
[3,]    6    7    7   10    7
[4,]   10    7    4    4    2
[5,]    3    1    8    8    3
 > sum(x > cbind(0, x[,-5]) & x > cbind(x[,-1], 0) & x > rbind(x[-1,], 0) &
 +     x > rbind(0, x[-5,]))
[1] 4
 >

Just be careful that your counting formula matches your definition of 
"neighbor" (the above formula does not include diagonal neighbors).

And of course, ties make things more complicated (note that the above 
simple algorithm misses the local maximum consisting of two 8's in the 
last row.)
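A slightly more general sketch of the same comparison, written as a made-up helper `count_local_max()`: it pads with -Inf instead of 0 (so matrices with negative values also work), can optionally include diagonal neighbours, and still counts only strict maxima, so plateaus like the pair of 8's are still missed. The test matrix is the one printed above, entered explicitly because sample() in current R versions may not reproduce the 2007 random numbers.

```r
# Count cells strictly greater than all their neighbours (hypothetical helper)
count_local_max <- function(m, diagonal = FALSE) {
  nr <- nrow(m); nc <- ncol(m)
  # Pad with -Inf so edge cells are compared only with real neighbours
  p <- matrix(-Inf, nr + 2, nc + 2)
  p[2:(nr + 1), 2:(nc + 1)] <- m
  shifts <- list(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
  if (diagonal) shifts <- c(shifts, list(c(-1, -1), c(-1, 1), c(1, -1), c(1, 1)))
  is_max <- matrix(TRUE, nr, nc)
  for (s in shifts) {
    nb <- p[(2:(nr + 1)) + s[1], (2:(nc + 1)) + s[2], drop = FALSE]
    is_max <- is_max & (m > nb)
  }
  sum(is_max)
}

m <- matrix(c(3, 4, 6, 10, 3,  9, 10, 7, 7, 1,  3, 2, 7, 4, 8,
              5, 8, 10, 4, 8,  10, 3, 7, 2, 3), 5, 5)
print(count_local_max(m))
print(count_local_max(m, diagonal = TRUE))
```

For a kernel density estimate, the same helper could be applied to the `z` component of the kde2d() result (e.g. `count_local_max(MASS::kde2d(x, y)$z)`), with the usual caveat that the count depends on the bandwidth and grid size.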

-- Tony Plate


Patrick Wang wrote:
> Hi,
> 
> Does anyone know how to count the number of modes in 2 dimensions using
> kde2d function?
> 
> Thanks
> Pat
> 



Re: [R] Interactive plots?

2007-05-25 Thread Tony Plate
The package RSVGTipsDevice allows you to do just that: you create a 
plot in an SVG file that can be viewed in a browser like Firefox, and 
the points (or shapes) in that plot can have pop-up tooltips.
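A sketch for the distance matrix in the question: classical MDS with cmdscale(), then one tooltip per point using the devSVGTips() and setSVGShapeToolTip() calls from the package's first documentation example. The output file name is arbitrary, and the plotting part runs only if RSVGTipsDevice is installed.

```r
places <- c("Chelt", "Exeter", "London", "Birm")
d <- matrix(c(  0, 118,  96,  50,
              118,   0, 118, 163,
               96, 118,   0, 118,
               50, 163, 118,   0),
            4, 4, dimnames = list(places, places))

# Two-dimensional classical multidimensional scaling
xy <- cmdscale(as.dist(d), k = 2)

if (requireNamespace("RSVGTipsDevice", quietly = TRUE)) {
  library(RSVGTipsDevice)
  svg_file <- file.path(tempdir(), "mds.svg")   # arbitrary output file
  devSVGTips(svg_file, toolTipMode = 1, title = "MDS of place distances")
  plot(xy, type = "n", xlab = "", ylab = "")
  for (i in seq_len(nrow(xy))) {
    setSVGShapeToolTip(title = places[i])       # tooltip for the next shape drawn
    points(xy[i, 1], xy[i, 2], pch = 19)
  }
  dev.off()
}
```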

-- Tony Plate

mister_bluesman wrote:
> Hi there. 
> 
> I have a matrix that provides place names and the distances between them:
> 
>         Chelt Exeter London Birm
> Chelt       0    118     96   50
> Exeter    118      0    118  163
> London     96    118      0  118
> Birm       50    163    118    0
> 
> After performing multidimensional scaling I get the following points plotted
> as follows
> 
> http://www.nabble.com/file/p10810700/demo.jpeg 
> 
> I would like to know how if I hover a point I can get a little box telling
> me which place the point refers to. Does anyone know?
> 
> Many thanks.



Re: [R] R2 always increases as variables are added?

2007-05-24 Thread Tony Plate
The answer to your third question is that the calculation of r-squared 
in summary.lm does depend on whether or not an intercept is included in 
the model.  (Another part of the reason for your puzzlement is, I think, 
that you are computing R-squared as SSR/SST, which is only valid when 
the model has an intercept.)

The code is in summary.lm, here are the relevant excerpts (assuming your 
model does not have weights):

     r <- z$residuals
     f <- z$fitted
     w <- z$weights
     if (is.null(w)) {
         mss <- if (attr(z$terms, "intercept"))
             sum((f - mean(f))^2)
         else sum(f^2)
         rss <- sum(r^2)
     }
 ...
     ans$r.squared <- mss/(mss + rss)

If you want to compare models with and without an intercept based on 
R^2, then I suspect it's most appropriate to use the version of R^2 that 
does not use a mean.

It's also worthwhile thinking about what you are actually doing.  I find 
the most intuitive definition of R^2 
(http://en.wikipedia.org/wiki/R_squared) is

R2 = 1 - SSE / SST

where SSE = sum_i (yhat_i - y_i)^2 (sum of squared errors in the 
predictions of your model)
and SST = sum_i (y_i - mean(y))^2 (sum of squared errors in the 
predictions of an intercept-only model)

This means that the standard definition of R2 effectively compares the 
model with an intercept-only model.  As the error in predictions goes 
down, R2 goes up, and the model that uses the mean(y) as a prediction 
(i.e., the intercept-only model) provides a scale for these errors.

If you think or know that the true mean of y is zero then it may be 
appropriate to compare against a zero model rather than an 
intercept-only model (in SST).  And if the sample mean of y is quite 
different from zero, and you compare a no-intercept model against an 
intercept-only model, then you're going to get results that are not 
easily interpreted.
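The two versions of R^2 differ only in the baseline used for SST. A small sketch (the helper name `r2_vs_baseline()` is made up) computes 1 - SSE/SST against either mean(y) or zero; for unweighted lm() fits it should agree with summary.lm when the mean baseline is used for a model with an intercept and the zero baseline for a model without one.

```r
# Hypothetical helper: 1 - SSE/SST, with SST measured around the chosen baseline
r2_vs_baseline <- function(fit, baseline = c("mean", "zero")) {
  baseline <- match.arg(baseline)
  y <- model.response(model.frame(fit))
  sse <- sum(residuals(fit)^2)
  ref <- if (baseline == "mean") mean(y) else 0
  sst <- sum((y - ref)^2)
  1 - sse / sst
}

set.seed(1)
dat <- data.frame(x1 = rnorm(10), x2 = rnorm(10), y = rnorm(10))
fit_int   <- lm(y ~ x1 + x2, data = dat)      # with intercept
fit_noint <- lm(y ~ x1 + x2 - 1, data = dat)  # without intercept

print(c(mean_baseline = r2_vs_baseline(fit_int, "mean"),
        summary_lm    = summary(fit_int)$r.squared))
print(c(zero_baseline = r2_vs_baseline(fit_noint, "zero"),
        summary_lm    = summary(fit_noint)$r.squared))
```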

Note that a common way of expressing and computing R^2 is as SSR/SST 
(which you used).  (Where SSR = sum_i (yhat_i - mean(y))^2 ). However, 
this is only valid when the model has an intercept (i.e., SSR/SST = 1 - 
SSE/SST ONLY when the model has an intercept.)

Here are some examples, based on your example:

 > set.seed(1)
 > data <- data.frame(x1=rnorm(10), x2=rnorm(10), y=rnorm(10), I=1)
 >
 > lm1 <- lm(y~1, data=data)
 > summary(lm1)$r.squared
[1] 0
 > y.hat <- fitted(lm1)
 > sum((y.hat-mean(data$y))^2)/sum((data$y-mean(data$y))^2)
[1] 5.717795e-33
 >
 > # model with no intercept
 > lm2 <- lm(y~x1+x2-1, data=data)
 > summary(lm2)$r.squared
[1] 0.6332317
 > y.hat <- fitted(lm2)
 > # no-intercept version of R^2 (2 ways to compute)
 > 1-sum((y.hat-data$y)^2)/sum((data$y)^2)
[1] 0.6332317
 > sum((y.hat)^2)/sum((data$y)^2)
[1] 0.6332317
 > # standard (assuming model has intercept) computations for R^2:
 > SSE <- sum((y.hat - data$y)^2)
 > SST <- sum((data$y - mean(data$y))^2)
 > SSR <- sum((y.hat - mean(data$y))^2)
 > 1 - SSE/SST
[1] 0.6252577
 > # Note that SSR/SST != 1 - SSE/SST (because the model doesn't have an intercept)
 > SSR/SST
[1] 0.6616612
 >
 > # model with intercept included in data
 > lm3 <- lm(y~x1+x2+I-1, data=data)
 > summary(lm3)$r.squared
[1] 0.6503186
 > y.hat <- fitted(lm3)
 > # no-intercept version of R^2 (2 ways to compute)
 > 1-sum((y.hat-data$y)^2)/sum((data$y)^2)
[1] 0.6503186
 > sum((y.hat)^2)/sum((data$y)^2)
[1] 0.6503186
 > # standard (assuming model has intercept) computations for R^2:
 > SSE <- sum((y.hat - data$y)^2)
 > SST <- sum((data$y - mean(data$y))^2)
 > SSR <- sum((y.hat - mean(data$y))^2)
 > 1 - SSE/SST
[1] 0.6427161
 > SSR/SST
[1] 0.6427161
 >
 >

hope this helps,

Tony Plate

Disclaimer: I too do not have any degrees in statistics, but I'm 95% 
sure the above is mostly correct :-)  If there are any major mistakes, 
I'm sure someone will point them out.

??? wrote:
> Hi, everybody,
> 
> 3 questions about R-square:
> -(1)--- Does R2 always increase as variables are added?
> -(2)--- Does R2 always greater than 1?
> -(3)--- How is R2 in summary(lm(y~x-1))$r.squared
> calculated? It is different from (r.square=sum((y.hat-mean
> (y))^2)/sum((y-mean(y))^2))
> 
> I will illustrate these problems by the following codes:
> -(1)---  R2  doesn't always increase as variables are added
> 
>> x=matrix(rnorm(20),ncol=2)
>> y=rnorm(10)
>>
>> lm=lm(y~1)
>> y.hat=rep(1*lm$coefficients,length(y))
>> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> [1] 2.646815e-33
>> lm=lm(y~x-1)
>> y.hat=x%*%lm$coefficients
>> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> [1] 0.4443356
>>  This is the biggest model, but its R2 is not the biggest,
> why?
>> lm=lm(

Re: [R] getting informative error messages

2007-05-11 Thread Tony Plate
Prof Brian Ripley wrote:
> It is not clear to me what you want here.

I just wanted to be able to quickly find the expression in which an 
error occurred when it was inside a lengthy function.  I now know that 
'debug()' can help with this (debug() allows me to easily step through 
the function and see where the error occurs.)
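Besides debug(), the condition object itself can be captured with tryCatch() and its call inspected, and options(error = recover) offers a browser in the failing frame. A sketch using the toy function from this thread; what conditionCall() reports depends on the R version (newer versions may name the subscripting expression, whereas R 2.5.0 reported only f(1:3)), so the printed call is illustrative only.

```r
f <- function(x) {a <- 1; y <- x[list(1:3)]; b <- 2; return(y)}

# Capture the error instead of letting it propagate
err <- tryCatch(f(1:3), error = function(e) e)
print(conditionMessage(err))  # the message text
print(conditionCall(err))     # the call R attributes the error to

# Interactive alternatives (not run here):
# debug(f); f(1:3)                  # step through f one expression at a time
# options(error = recover); f(1:3)  # pick a frame to browse after the error
```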

> Errors are tagged by a 'call', and f(1:3) is the innermost 'call' (special 
> primitives do not set a context and so do not count if you consider '[' 
> to be a function).

Thanks for the explanation.  I suspected that it had something to do 
with primitive functions, but was unable to confirm that by searching.

> 
> The message could tell you what the type was, but it does not and we have 
> lost the pool of active contributors we once had to submit tested patches 
> for things like that.

What is required to test patches for things like this?  Is there 
anything written up on that anywhere?  I've not been able to clearly 
discern what the desired output of 'make check' is -- there seem to be 
reported differences that don't actually matter, but I didn't see a fast 
and easy way of distinguishing those from the ones that do matter.  I 
did look in R-exts, and on developer.r-project.org but was unable to 
find clear guidance there either.

-- Tony Plate
> 
> 
> On Mon, 7 May 2007, Tony Plate wrote:
> 
>> Certain errors seem to generate messages that are less informative than
>> most -- they just tell you which function an error happened in, but
>> don't indicate which line or expression the error occurred in.
>>
>> Here's a toy example:
>>
>>> f <- function(x) {a <- 1; y <- x[list(1:3)]; b <- 2; return(y)}
>>> options(error=NULL)
>>> f(1:3)
>> Error in f(1:3) : invalid subscript type
>>> traceback()
>> 1: f(1:3)
>> In this function, it's clear that the error is in subscripting 'x', but
>> it's not always so immediately obvious in lengthier functions.
>>
>> Is there anything I can do to get a more informative error message in
>> this type of situation?  I couldn't find any help in the section
>> "Debugging R Code" in "R-exts" (or anything at all relevant in "R-intro").
>>
>> (Different values for options(error=...) and different formatting of the
>> function made no difference.)
>>
>> -- Tony Plate
>>
>>> sessionInfo()
>> R version 2.5.0 (2007-04-23)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] "stats" "graphics"  "grDevices" "utils" "datasets"  "methods"
>> [7] "base"
>>
>> other attached packages:
>> tap.misc
>>"1.0"
>



[R] getting informative error messages

2007-05-07 Thread Tony Plate
Certain errors seem to generate messages that are less informative than 
most -- they just tell you which function an error happened in, but 
don't indicate which line or expression the error occurred in.

Here's a toy example:

 > f <- function(x) {a <- 1; y <- x[list(1:3)]; b <- 2; return(y)}
 > options(error=NULL)
 > f(1:3)
Error in f(1:3) : invalid subscript type
 > traceback()
1: f(1:3)
 >

In this function, it's clear that the error is in subscripting 'x', but 
it's not always so immediately obvious in lengthier functions.

Is there anything I can do to get a more informative error message in 
this type of situation?  I couldn't find any help in the section 
"Debugging R Code" in "R-exts" (or anything at all relevant in "R-intro").

(Different values for options(error=...) and different formatting of the 
function made no difference.)

-- Tony Plate

 > sessionInfo()
R version 2.5.0 (2007-04-23)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] "stats" "graphics"  "grDevices" "utils" "datasets"  "methods"
[7] "base"

other attached packages:
tap.misc
"1.0"
 >



[R] [R-pkgs] new package: RSVGTipsDevice: create SVG plots with tooltips & hyperlinks

2007-05-03 Thread Tony Plate
the DESCRIPTION file:

Package: RSVGTipsDevice
Version: 0.7.0
Date: 04/30/2007
Title: An R SVG graphics device with dynamic tips and hyperlinks
Author: Tony Plate <[EMAIL PROTECTED]>, based on RSvgDevice by T Jake
        Luciani <[EMAIL PROTECTED]>
Maintainer: Tony Plate <[EMAIL PROTECTED]>
Depends: R (>= 1.4)
Description: A graphics device for R that uses the w3.org xml standard
 for Scalable Vector Graphics.  This version supports
 tooltips with 1 to 3 lines, hyperlinks, and line styles.
License: GPL version 2 or newer. http://www.gnu.org/copyleft/gpl.html




Re: [R] applying rbind to list elements

2007-04-25 Thread Tony Plate
do.call("rbind", l)

or, in the case of matrices, using the abind package:

abind(l, along=1)

 > library(abind)
 > l <- list(matrix(1:6, ncol=2), matrix(11:14, ncol=2))
 > abind(l, along=1)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
[4,]   11   13
[5,]   12   14
 >
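For a list of data frames, as in the question, do.call() builds the single call rbind(l[[1]], l[[2]], ..., l[[n]]) for you. A small sketch with made-up data; the abind() example above is for matrices, and do.call() also handles that case.

```r
# Made-up list of data frames with identical columns
l <- list(data.frame(id = 1:2, v = c(0.5, 1.5)),
          data.frame(id = 3:4, v = c(2.5, 3.5)),
          data.frame(id = 5L,  v = 4.5))

x <- do.call("rbind", l)   # same as rbind(l[[1]], l[[2]], l[[3]])
print(x)

# do.call() works the same way for a list of matrices
m <- do.call("rbind", list(matrix(1:6, ncol = 2), matrix(11:14, ncol = 2)))
print(m)
```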

Hendrik Fuß wrote:
> Hi,
> 
> I have a list of n data.frames (or matrices) which I would like to
> convert to a single data.frame using rbind:
> 
>x <- rbind( l[[1]], l[[2]], l[[3]], l[[4]], ..., l[[n]] )
> 
> Is there a simple way to do this?
> 
> thanks
> Hendrik
>



Re: [R] regular expressions with grep() and negative indexing

2007-04-25 Thread Tony Plate
I use regexpr() instead of grep() in cases like this, e.g.:

x2[regexpr("exclude",x2)==-1]

(regexpr() returns a vector of the same length as the character vector 
it is given, so there's no problem with it returning a zero-length vector.)
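A sketch contrasting the approaches on both vectors from the question. The helper name `drop_matching()` is made up; grepl(), which returns a logical vector and so avoids the problem the same way, is only available in later versions of R than the one in this thread.

```r
x  <- c("seal.0", "seal.1-exclude")
x2 <- c("dolphin.0", "dolphin.1")

# regexpr() gives -1 for every element without a match, one result per element
drop_matching <- function(v, pattern) v[regexpr(pattern, v) == -1]

print(drop_matching(x, "exclude"))    # "seal.0"
print(drop_matching(x2, "exclude"))   # both elements kept

# The pitfall: -integer(0) selects nothing
print(x2[-grep("exclude", x2)])       # character(0)

# Logical indexing with grepl() keeps everything when nothing matches
print(x2[!grepl("exclude", x2)])
```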

-- Tony Plate

Peter Dalgaard wrote:
> Stephen Tucker wrote:
>> Dear R-helpers,
>>
>> Does anyone know how to use regular expressions to return vector elements
>> that don't contain a word? For instance, if I have a vector
>>   x <- c("seal.0","seal.1-exclude")
>> I'd like to get back the elements which do not contain the word "exclude",
>> using something like (I know this doesn't work) but:
>>   grep("[^(exclude)]",x)
>>
>> I can use 
>>   x[-grep("exclude",x)]
>> for this case but then if I use this expression in a recursive function, it
>> will not work for instances in which the vector contains no elements with
>> that word. For instance, if I have
>>   x2 <- c("dolphin.0","dolphin.1")
>> then
>>   x2[-grep("exclude",x2)]
>> will give me 'character(0)'
>>
>> I know I can accomplish this in several steps, for instance:
>>   myfunc <- function(x) {
>> iexclude <- grep("exclude",x)
>> if(length(iexclude) > 0) x2 <- x[-iexclude] else x2 <- x
>> # do stuff with x2 <...?
>>   }
>>
>> But this is embedded in a much larger function and I am trying to minimize
>> intermediate variable assignment (perhaps a futile effort). But if anyone
>> knows of an easy solution, I'd appreciate a tip.
>>   
> It has come up a couple of times before, and yes, it is a bit of a pain.
> 
> Probably the quickest way out is
> 
> negIndex <- function(i)
>     if(length(i))
>         -i
>     else
>         TRUE
>



Re: [R] intersect more than two sets

2007-04-24 Thread Tony Plate
I don't think there's that sort of "apply-reduce" function in R, but for 
this problem, the last line below happens to be a "one-liner":

 > set.seed(1)
 > x <- lapply(1:10, function(i) sample(letters, 20))
 > table(unlist(x))

  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
  6  8  7  8  9  9 10  9  8 10  6  7  9  7  6  8  8  6  9  6  9  6  9  7  6  7
 > which(table(unlist(x))==10)
  g  j
  7 10
 > names(which(table(unlist(x))==10))
[1] "g" "j"
 >
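Two notes on this. The table() one-liner counts occurrences, so it is only correct when no element is repeated within a set; applying unique() to each set first removes that assumption. Also, later versions of R (2.6.0 onwards) provide Reduce(), which folds intersect() over the list directly. A sketch with small made-up sets:

```r
sets <- list(c("a", "b", "c", "b"),   # note the repeated "b"
             c("b", "c", "d"),
             c("c", "b", "e"))

# Fold intersect() across the list: intersect(intersect(s1, s2), s3)
common1 <- Reduce(intersect, sets)

# table() approach, made safe by dropping within-set duplicates first
counts  <- table(unlist(lapply(sets, unique)))
common2 <- names(which(counts == length(sets)))

print(common1)
print(common2)
```

Without the unique() step, "b" would be counted four times here and missed by the `== 3` test.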


Weiwei Shi wrote:
> assume t2 is a list of size 11 and each element is a vector of characters.
> 
> the following codes can get what I wanted but I assume there might be
> a one-line code for that:
> 
> t3 <- t2[[1]]
> for ( i in 2:11){
>   t3 <- intersect(t2[[i]], t3)
> }
> 
> or there is no such "apply"?
> 
> On 4/24/07, Weiwei Shi <[EMAIL PROTECTED]> wrote:
>> Hi,
>> I searched the archives and did not find a good solution to that.
>>
>> assume I have 10 sets and I want to have the common character elements of 
>> them.
>>
>> how could i do that?
>>
>> --
>> Weiwei Shi, Ph.D
>> Research Scientist
>> GeneGO, Inc.
>>
>> "Did you always know?"
>> "No, I did not. But I believed..."
>> ---Matrix III
>>
> 
>



Re: [R] Handling of arrays

2007-04-24 Thread Tony Plate
Try the following and look at what they return:

str(ca)
dimnames(ca)
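A sketch of why names(ca) comes back empty, using a made-up stand-in for the imported object and assuming the import stored the element names as the first component of dimnames (which is what dimnames(ca) will reveal). For an array with more than one dimension, names() does not return the dimnames.

```r
# Hypothetical stand-in for the imported object: a 3 x 1 x 1 list-array
ca <- array(list(1:3, "abc", TRUE), dim = c(3, 1, 1),
            dimnames = list(c("alpha", "beta", "gamma"), NULL, NULL))
names(ca)           # NULL: names() does not return dimnames for a 3-d array
dimnames(ca)[[1]]   # the element names live here
ca[["beta", 1, 1]]  # extract a single element by name
```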

-- Tony Plate

[EMAIL PROTECTED] wrote:
> Dear R-Experts,
> 
> I just imported a workspace from Matlab. I know that I can get the names of 
> the imported variables with names(). It works. The variable "ca" consists of 
> several elements. I want to get the names of the elements to handle my output 
> better. But names(ca) doesn't work. Why? I did the following commands:
> 
>> class(ca)
> [1] "array"
>> mode(ca)
> [1] "list"
>> dim(ca)
> [1] 66  1  1
>> length(ca)
> [1] 66
> 
> How can I now get the names which are stored in ca? When I use the command 
> "ca[18]" I receive the content which stands there but not the name collables 
> which I wanted to extract. 
> 
> Any ideas?
> 
> Thanks, Corinna


Re: [R] Fastest way to repeatedly subset a data frame?

2007-04-20 Thread Tony Plate
This type of information about the speeds of various techniques can 
really only be found by trying things out, especially because R-core has 
recently made a fair number of improvements to some of the underlying 
code in R.  That's part of the reason I put these tests together -- I 
wanted to know for myself what sort of speed differences there were now 
among the various approaches.

-- Tony Plate

Iestyn Lewis wrote:
> This is fantastic.  I just tested the first match() method and it is 
> acceptably fast.  I'll look into some of the even better methods 
> later.   Thank you for taking the time to put this together.
> 
> Is this kind of optimization information on the web anywhere?  I can 
> imagine that a lot of people have slow sets of commands that could be 
> optimized with this kind of knowledge. 
> 
> Thank you so much,
> 
> Iestyn

Re: [R] Fastest way to repeatedly subset a data frame?

2007-04-20 Thread Tony Plate
Here are some timings on seemingly minor variations of data structure, 
showing times that vary by a factor of 100 (a factor of 3 if the worst is 
omitted).  One of the keys is to avoid the partial string matching 
that happens with ordinary data frame subscripting.

-- Tony Plate

 > n <- 10000 # number of rows in data frame
 > k <- 500   # number of vectors in indexing list
 > # use a data frame with regular row names and id as factor
 > # (defaults for data.frame)
 > df <- data.frame(id=paste("ID", seq(len=n), sep=""),
 +                  result=seq(len=n), stringsAsFactors=TRUE)
 > object.size(df)
[1] 440648
 > df[1:3,,drop=FALSE]
   id result
1 ID1      1
2 ID2      2
3 ID3      3
 > set.seed(1)
 > ids <- lapply(seq(k), function(i) paste("ID", sample(n,
 +     size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
 > sum(sapply(ids, length))
[1] 1263508
 > system.time(lapply(ids, function(i) df[match(i, df$id),,drop=FALSE]))
   user  system elapsed
   3.00    0.00    3.03
 >
 > # use a data frame with automatic row names (should be low overhead)
 > # and id as factor
 > df <- data.frame(id=paste("ID", seq(len=n), sep=""),
 +                  result=seq(len=n), row.names=NULL, stringsAsFactors=TRUE)
 > object.size(df)
[1] 440648
 > df[1:3,,drop=FALSE]
   id result
1 ID1      1
2 ID2      2
3 ID3      3
 > set.seed(1)
 > ids <- lapply(seq(k), function(i) paste("ID", sample(n,
 +     size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
 > sum(sapply(ids, length))
[1] 1263508
 > system.time(lapply(ids, function(i) df[match(i, df$id),,drop=FALSE]))
   user  system elapsed
   2.68    0.00    2.70
 >
 > # use a data frame with automatic row names (should be low overhead)
 > # and id as character
 > df <- data.frame(id=paste("ID", seq(len=n), sep=""),
 +                  result=seq(len=n), row.names=NULL, stringsAsFactors=FALSE)
 > object.size(df)
[1] 400448
 > df[1:3,,drop=FALSE]
   id result
1 ID1      1
2 ID2      2
3 ID3      3
 > set.seed(1)
 > ids <- lapply(seq(k), function(i) paste("ID", sample(n,
 +     size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
 > sum(sapply(ids, length))
[1] 1263508
 > system.time(lapply(ids, function(i) df[match(i, df$id),,drop=FALSE]))
   user  system elapsed
   1.54    0.00    1.59
 >
 > # use a data frame with ids as the row names & subscripting for
 > # matching (should be high overhead)
 > df <- data.frame(id=paste("ID", seq(len=n), sep=""),
 +                  result=seq(len=n), row.names="id")
 > object.size(df)
[1] 400384
 > df[1:3,,drop=FALSE]
    result
ID1      1
ID2      2
ID3      3
 > set.seed(1)
 > ids <- lapply(seq(k), function(i) paste("ID", sample(n,
 +     size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
 > sum(sapply(ids, length))
[1] 1263508
 > system.time(lapply(ids, function(i) df[i,,drop=FALSE]))
   user  system elapsed
 109.15    0.04  111.28
 >
 > # use a data frame with ids as the row names & match()
 > df <- data.frame(id=paste("ID", seq(len=n), sep=""),
 +                  result=seq(len=n), row.names="id")
 > object.size(df)
[1] 400384
 > df[1:3,,drop=FALSE]
    result
ID1      1
ID2      2
ID3      3
 > set.seed(1)
 > ids <- lapply(seq(k), function(i) paste("ID", sample(n,
 +     size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
 > sum(sapply(ids, length))
[1] 1263508
 > system.time(lapply(ids, function(i) df[match(i,
 +     rownames(df)),,drop=FALSE]))
   user  system elapsed
   1.53    0.00    1.58
 >
 > # use a named numeric vector to store the same data as was stored in
 > # the data frame
 > x <- seq(len=n)
 > names(x) <- paste("ID", seq(len=n), sep="")
 > object.size(x)
[1] 400104
 > x[1:3]
ID1 ID2 ID3
  1   2   3
 > set.seed(1)
 > ids <- lapply(seq(k), function(i) paste("ID", sample(n,
 +     size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
 > sum(sapply(ids, length))
[1] 1263508
 > system.time(lapply(ids, function(i) x[match(i, names(x))]))
   user  system elapsed
   1.14    0.05    1.19
 >

Iestyn Lewis wrote:
> Good tip - an Rprof trace over my real data set resulted in a file 
> filled with:
> 
> pmatch [.data.frame [ FUN lapply
> pmatch [.data.frame [ FUN lapply
> pmatch [.data.frame [ FUN lapply
> pmatch [.data.frame [ FUN lapply
> pmatch [.data.frame [ FUN lapply
> ...
> with very few other calls in there.  pmatch seems to be the string 
> search function, so I'm guessing there's no hashing going on, or not 
> very good hashing.
> 
> I'll let you know how the environment option works - the Bioconductor 
> project seems to make extensive use of it, so I'm guessing it's the way 
> to go.
> 
>
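For comparison, here is a small sketch of two exact-matching lookups on a character id column: match(), as in the timings above, and the hashed-environment approach Iestyn mentions. The data, the query vector and the use of list2env() (available in R 2.10.0 and later) are assumptions for illustration, not the code that produced the timings.

```r
# Hypothetical data shaped like the example above: ids "ID1".."IDn"
n <- 10000
df <- data.frame(id = paste0("ID", seq_len(n)), result = seq_len(n),
                 stringsAsFactors = FALSE)
set.seed(1)
query <- paste0("ID", sample(n, 50))

# Exact matching on a character column via match()
rows_match <- df[match(query, df$id), , drop = FALSE]

# Environment-based lookup: build a hashed environment once, then query it
env <- list2env(setNames(as.list(seq_len(n)), df$id), hash = TRUE)
rows_env <- df[unlist(mget(query, envir = env)), , drop = FALSE]
```

Building the environment has an up-front cost, so it mainly pays off when the same table is queried many times.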

Re: [R] Replacement in an expression - can't use parse()

2007-03-27 Thread Tony Plate
Peter Dalgaard wrote:
> Daniel Berg wrote:
>> Dear all,
>>
>> Suppose I have a very long expression e. Lets assume, for simplicity, that 
>> it is
>>
>> e = expression(u1+u2+u3)
>>
>> Now I wish to replace u2 with x and u3 with 1. I.e. the 'new'
>> expression, after replacement, should be:
>>
>>
>> > e
>> expression(u1+x+1)
>>
>> My question is how to do the replacement?
>>
>> I have tried using:
>>
>>
>> > e = parse(text=gsub("u2","x",e))
>> > e = parse(text=gsub("u3",1,e))
>>
>> Even though this works fine in this simple example, the use of parse
>> when e is very long will fail since parse has a maximum line length
>> and will cut my expressions. I need to keep mode(e)=expression since I
>> will use e further in symbolic derivation and division.
>>
>> Any suggestions are most welcome.
>>   
> The short answer is substitute().
> 
> However, this is not entirely trivial to apply if you have your
> expression already inside an expression() object.
> 
> The easy thing to do is
> 
>> substitute(u1+u2+u3, list(u2=quote(x),u3=1))
> u1 + x + 1
> 
> but notice that this "autoquotes" the first argument, so
> 
>> substitute(e, list(u2=quote(x),u3=1))
> e
> 
> which is pretty much useless.
> 
> (Arguably it would have been a better design to avoid this feature and
> require substitute(quote(.)) for the former case.)
> 
> The way around this is to add a further layer of substitute() to insert
> the value of e:
> 
> eval(substitute(substitute(call,list(u2=quote(x),u3=1)),list(call=e[[1]])))
> u1 + x + 1
> 
> Notice that substitute will not go inside expression objects, so we need
> to extract the mode "call" object using e[[1]]. Also, the result is
> "call" not "expression". You may need an as.expression construct around
> the result to get exactly what you asked.
> 

I usually use do.call() to do this kind of thing:

 > e <- expression(u1+u2+u3)
 > e
expression(u1 + u2 + u3)
 > do.call("substitute", list(e[[1]], list(u2=quote(x),u3=1)))
u1 + x + 1
 >

(and of course one can wrap the result in as.expression() to get an 
expression back).
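The same idiom can be wrapped in a small helper that works on a whole expression vector and returns an expression. This is a sketch; replace_in_expr() is a hypothetical name, not an existing function.

```r
# Hypothetical helper built on the do.call("substitute", ...) idiom above
replace_in_expr <- function(e, subs) {
  # substitute() does not look inside expression vectors, so apply it to
  # each call in turn and rebuild an expression from the results
  as.expression(lapply(e, function(call) do.call("substitute", list(call, subs))))
}
e <- expression(u1 + u2 + u3)
res <- replace_in_expr(e, list(u2 = quote(x), u3 = 1))
res
D(res[[1]], "x")   # the result can still be passed to symbolic derivation
```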

Are there any circumstances where this construct will produce different 
results from the nested substitute suggested by Peter?

-- Tony


Re: [R] Prefered date and date/time classes

2007-03-27 Thread Tony Plate
I put a list of date/time classes and pointers to documents describing 
them on the R-wiki at 
http://wiki.r-project.org/rwiki/doku.php?id=guides:times-dates

The various reasons one might use each of them are described in the 
documents. (If anyone feels like adding summaries to the "tips" section 
on the Wiki, please go ahead!)

-- Tony Plate


Petr Pikal wrote:
> Hi
> 
> On 27 Mar 2007 at 9:09, Charles Dupont wrote:
> 
> Date sent:Tue, 27 Mar 2007 09:09:27 -0500
> From: Charles Dupont <[EMAIL PROTECTED]>
> Organization: Vanderbilt University; Department of Biostatistics 
> To:   r-help@stat.math.ethz.ch
> Subject:  [R] Prefered date and date/time classes
> Send reply to:[EMAIL PROTECTED]
>   <mailto:[EMAIL PROTECTED]>
>   <mailto:[EMAIL PROTECTED]>
> 
>> What are the preferred date, and data/time classes for R?
> 
> It is probably a personal choice. You can use POSIX, chron or other 
> options. They are nicely described in RNEWS 4-1 in section Help Desk.
> 
> Regards
> Petr
> 
> 
>> Thanks
>>
>> Charles Dupont
>>
>>
>> -- 
>> Charles Dupont   Computer System Analyst School of Medicine
>>   Department of BiostatisticsVanderbilt University
>>
> 
> Petr Pikal
> [EMAIL PROTECTED]

Re: [R] data.frame handling

2007-03-19 Thread Tony Plate
'table()' can compute your desired result in this particular case 
(though I don't know if it's what you want in general):

 > y <- factor(c("a","b","c")[c(1,1,1,2,2,3,3,3)])
 > x <- factor(c("x","y","z")[c(1,2,3,1,2,1,2,3)])
 > table(x, y)
   y
x   a b c
  x 1 1 1
  y 1 1 1
  z 1 0 1
 >

If x and y are already columns in a data frame, then just do

 > table(X$factor1, X$factor2)
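A sketch of two variations, using Michela's data frame X: table() also accepts the data frame itself (the dimnames are then named after its columns), xtabs() gives the same counts through a formula, and unclass() strips the "table" class if a plain matrix is wanted.

```r
y <- factor(c("a","b","c")[c(1,1,1,2,2,3,3,3)])
x <- factor(c("x","y","z")[c(1,2,3,1,2,1,2,3)])
X <- data.frame(factor1 = x, factor2 = y)

tab <- table(X)                              # a data frame of factors is accepted directly
xt  <- xtabs(~ factor1 + factor2, data = X)  # formula interface, same counts
m   <- unclass(tab)                          # plain integer matrix, "table" class dropped
m
```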

hope this helps,

Tony Plate


Michela Cameletti wrote:
> Dear R-users,
> I have a little problem that I can't solve by myself.
> I have a data frame with 2 factors and 8 observations (see the following
> code):
> 
>  y <- c(1,1,1,2,2,3,3,3)
>  y <- factor(y)
>  levels(y) <- c("a","b","c")
>  x <- c(1,2,3,1,2,1,2,3)
>  x <- factor(x)
>  levels(x) <- c("x","y","z")
>  X  <- data.frame(factor1=x,factor2=y)
> 
> and the final result is
> 
>   factor1 factor2
> 1   x   a
> 2   y   a
> 3   z   a
> 4   x   b
> 5   y   b
> 6   x   c
> 7   y   c
> 8   z   c
> 
>>From the above data I'd like to obtain the following matrix:
>   a   b   c
> x 1   1   1
> y 1   1   1
> z 1   0   1
> 
> Do you have any advice? Can you help me please?
> Thank you in advance,
> Michela


Re: [R] timeDate & business day

2007-03-13 Thread Tony Plate
There are two articles describing time and date classes in R News:

Brian D. Ripley and Kurt Hornik. Date-time classes. R News, 1(2):8-11, 
June 2001.
http://cran.r-project.org/doc/Rnews/Rnews_2001-2.pdf

Gabor Grothendieck and Thomas Petzoldt. R help desk: Date and time 
classes in R. R News, 4(1):29-32, June 2004.
http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

The Ripley and Hornik article discusses the "POSIXt" (Posix time) 
classes: "POSIXlt" (POSIX local time) and "POSIXct" (POSIX calendar time).

The Grothendieck and Petzoldt article discusses the "Date", "chron" and 
"POSIXt" classes, and has a very helpful table of how to do various 
operations on "Date", "chron" and "POSIXct" objects.

There is also the fCalendar package, which includes a timeDate class and 
has support for holidays, operations on timeDate objects, and various 
other features useful for dealing with times and dates as they are used 
in financial data.

Obviously, there is the online help for the fCalendar package, but there 
are also three other documents describing how to work with timeDate objects:

Computing with R and S-Plus For Financial Engineers 1 - Part I - 
Markets, Basic Statistics, Date and Time Management, Diethelm Würtz
http://www.itp.phys.ethz.ch/econophysics/R/docs/fBasics.pdf

R and Rmetrics for Teaching.  Financial Engineering and Computational 
Finance, Part II, Dates, Time, and Calendars, Diethelm Würtz
http://www.itp.phys.ethz.ch/econophysics/R/docs/rCalendar.pdf

S4 'timeDate' and 'timeSeries' Classes for R, Diethelm Würtz
http://www.itp.phys.ethz.ch/econophysics/R/pdf/calendar.pdf
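As a small base-R illustration of the business-day question in this thread's subject, here is a sketch using the "Date" class. It is not fCalendar's API: add_business_days() is a hypothetical helper that only skips weekends, whereas holiday calendars are what the timeDate class adds.

```r
# Hypothetical helper: move forward k business days (Mon-Fri) from a Date.
# Holidays are ignored.
add_business_days <- function(d, k) {
  while (k > 0) {
    d <- d + 1
    wday <- as.POSIXlt(d)$wday   # 0 = Sunday, 6 = Saturday
    if (!(wday %in% c(0, 6))) k <- k - 1
  }
  d
}
add_business_days(as.Date("2005-01-07"), 1)   # 2005-01-07 was a Friday
```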


-- Tony Plate


Michael Toews wrote:
> Sadly, I don't know of any tutorials or much help on the web for R ... 
> that doesn't mean it doesn't exist ... you might just have to look 
> around for it (www.rseek.org is a good place to start)
> I've learned almost everything I know through:
> ?strptime
> 
> Also check out the methods for the classes, for example:
> 
> methods(class="Date")
> methods(class="POSIXct")
> 
> And certainly check their help pages ... there is loads of stuff here 
> that I haven't discovered myself. (Note, if you are new to S3 classes .. 
> if it begins with the method, then "." class, you only need to type the 
> beginning. For example "summary(ymd)" ... not "summary.Date(ymd)" if 
> "ymd" has `class(ymd) == "Date" `.
> 
> I think the fundamental things to know are there are three main 
> DateTimeClasses:
> 
>   1. "POSIXct" - has date, time and optionally time-zone info -- very
>  handy for using in data.frame objects (and frankly I think it
>  should be renamed to "DateTime" since the class "POSIXct" has
>  nothing really to do directly with date/times)
>   2. "POSIXlt" - as far as I'm concerned, this is has the same
>  functionality as "POSIXct", but it cannot be used in data.frame
>  objects (and frankly, I think it should be deprecated in favour of
>  #1 to reduce future confusion)
>   3. "Date" - use this if you don't care about times or time-zones
> 
> But it would be nice to track down a good tutorial somewhere.
> +mt
> 
> Young Cho wrote:
>> Thanks so Michael! If you know of a tutorial or introductory document 
>> about timeDate manipulation or time series manipulation in R, can you 
>> share it? It is hard to find by googling... I'd very appreciate any 
>> advice.


Re: [R] timeDate & business day

2007-03-13 Thread Tony Plate
The R timeDate class is in the fCalendar package.

Does anyone know how to change the output format of a timeDate object? 
(Other than by explicitly supplying a format= argument to the format() 
function.)  I tried creating a timeDate object, and then changing the 
format slot.  However, all the functions I used on the object ('print', 
'format', 'as.character', 'show') seemed to ignore the value in the 
format slot.

And does anyone else find it a little confusing that print() and show() 
convert timeDate to the local time zone, but as.character() and format() 
display it in the time zone of its "FinCenter" slot?

Here is a transcript:

 > library(fCalendar)
 > tt <- c("2005-01-04", "2005-01-05", "2005-01-06", "2005-01-07")
 > x <- timeDate(tt)
 > x@format
[1] "%Y-%m-%d"
 > # Change the format on the timeDate object
 > x@format <- "%Y%m%d"
 > x
An object of class "timeDate"
Slot "Data":
[1] "2005-01-03 17:00:00 Mountain Standard Time"
[2] "2005-01-04 17:00:00 Mountain Standard Time"
[3] "2005-01-05 17:00:00 Mountain Standard Time"
[4] "2005-01-06 17:00:00 Mountain Standard Time"

Slot "Dim":
[1] 4

Slot "format":
[1] "%Y%m%d"

Slot "FinCenter":
[1] "GMT"

 > # Can get what I want by explicitly supplying format
 > # argument to format()
 > format(x, format="%Y%m%d")
[1] "20050104" "20050105" "20050106" "20050107"
 > # But format() seems to ignore the format slot
 > format(x)
[1] "2005-01-04" "2005-01-05" "2005-01-06" "2005-01-07"
 > print(x)
GMT
[1] [2005-01-04] [2005-01-05] [2005-01-06] [2005-01-07]
 > as.character(x)
[1] "2005-01-04" "2005-01-05" "2005-01-06" "2005-01-07"
attr(,"control")
FinCenter
 "GMT"
 >
 > sessionInfo()
R version 2.4.1 (2006-12-18)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] "stats" "graphics"  "grDevices" "utils" "datasets"  "methods"
[7] "base"

other attached packages:
   fCalendar fEcofin
"240.10068" "240.10067"
 > Sys.getenv("TZ")
TZ
""
 >

-- Tony Plate

Michael Toews wrote:
> Those numbers look like ... well, numbers. You want characters! Try 
> converting the integer to a character before trying to do a string 
> parse, e.g.:
> 
> ymd.int <- c(20050104, 20050105, 20050106, 20050107, 20050110, 20050111, 
> 20050113, 20050114)
> ymd <- as.Date(as.character(ymd.int),"%Y%m%d")
> 
> As far as the other functions you are looking at ("timeDate", 
> "timeRelative") -- I've never seen these, so I'm guessing they are 
> S-PLUS. In R, you can use "diff" or "difftime" (which works with "Date" 
> and "POSIXlt"-or Date-Time classes) , e.g.:
> 
> diff(ymd)
> diff(ymd,2)
> diff(ymd,3)
> 
> or do some arithmetic:
> 
> difftime(ymd[1],ymd[4])
> difftime(ymd[1],ymd[4],unit="weeks")
> 
> Hopefully this is helpful to you!
> +mt


Re: [R] optim(method="L-BFGS-B") abnormal termination

2007-02-23 Thread Tony Plate
I usually see this message only when my gradient and objective functions 
do not match each other.  I debug by comparing a finite-difference 
approximation to the gradient with the result of the gradient function.
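A minimal sketch of that check. The objective, the two gradient functions, the starting point and the (0,1) bounds are all made up for illustration; fd_grad() is a hypothetical helper that computes a central finite-difference gradient.

```r
# Hypothetical objective with a correct and a deliberately wrong gradient
fn <- function(p) sum((p - 0.3)^2) + prod(p)
gr_ok  <- function(p) 2 * (p - 0.3) + sapply(seq_along(p), function(j) prod(p[-j]))
gr_bad <- function(p) 2 * (p - 0.3)            # forgets the prod() term

# Central finite-difference approximation to the gradient of f at p
fd_grad <- function(f, p, h = 1e-6) {
  sapply(seq_along(p), function(j) {
    step <- replace(numeric(length(p)), j, h)
    (f(p + step) - f(p - step)) / (2 * h)
  })
}

p0 <- c(0.5, 0.2, 0.7)
max(abs(fd_grad(fn, p0) - gr_ok(p0)))    # close to zero: gradient agrees
max(abs(fd_grad(fn, p0) - gr_bad(p0)))   # clearly nonzero: gradient is wrong

fit <- optim(p0, fn, gr_ok, method = "L-BFGS-B", lower = 0, upper = 1)
fit$convergence                          # 0 means optim() reports success
```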

I think you can also run optim() without supplying a gr() function - 
optim() will then use a finite difference approximation.  If optim() 
works fine like this with your function, that's a strong sign that your 
gradient function doesn't match your objective function.

It is of course possible that your gradient function is properly 
specified, and the function along the line being searched is so badly 
behaved that the line search can't find a minimum in 20 steps.  If 
that's the case you might want to look into scaling issues, or 
reformulating the problem.

It's also possible that even if you have a theoretically well-behaved 
objective and gradient, your computation of them may be subject to 
rounding error, giving apparently discontinuous results to optim().

I'd look into all of the above possibilities before I tried increasing 
the limit of 20 evaluations in the line search - in my experience 20 
steps is plenty to find an adequate point for a reasonably well-behaved 
function.  It may be possible to increase the number of steps, but I 
don't see how from the docs for ?optim.  Of course, the source is available.

hope this helps,

Tony Plate

Petr Klasterecky wrote:
> Hi,
> my call of optim() with the L-BFGS-B method ended with the following 
> error message: ERROR: ABNORMAL_TERMINATION_IN_LNSRCH
> 
> Further tracing shows:
> Line search cannot locate an adequate point after 20 function and 
> gradient evaluations
> final  value 0.086627
> stopped after 7 iterations
> 
> Could someone pls tell me whether it is possible to increase the limit 
> of 20 evaluations? Is it even worth doing so?
> 
> My function(s) to be minimized are polynomial functions of tens of 
> variables - let say 10 - 60 variables, all of them constrained to the 
> (0,1) interval. Is it even possible and meaningfull to attempt such 
> minimization? (Suppose I have good starting values.)
> 
> Thaks, Petr


Re: [R] How to print a double quote

2007-02-22 Thread Tony Plate
 > cat('Open fnd "test"\n')
Open fnd "test"
 > cat("Open fnd \"test\"\n")
Open fnd "test"
 >
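Since the goal is a batch file, it may help to see that the backslashes exist only in the source code, not in the string. A small sketch; the temporary file stands in for the real batch file and is an assumption.

```r
s <- 'Open fnd "test"'
identical(s, "Open fnd \"test\"")   # TRUE: both spellings give the same string
print(s)                            # print() shows the escaped form
cat(s, "\n", sep = "")              # cat() writes the raw characters
nchar(s)                            # 15: the backslashes are not stored

f <- tempfile(fileext = ".bat")     # hypothetical batch file location
writeLines(s, f)                    # the file contains the double quotes as-is
readLines(f)
```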

Bos, Roger wrote:
> Can anyone tell me how to get R to include a double quote in the middle
> of a character string?  
> 
> For example, the following code is close:
> 
>> fnd<-"Open fnd 'test'"
>>cat(fnd)
> Open fnd 'test'>
> 
> But instead of Open fnd 'test' I need: Open fnd "test".  Difference
> seems minor, but I am writing batch files for another program to read in
> and it has to have the double quotes to work.  
> 
> Thanks in advance for any help or ideas,
> 
> Roger


Re: [R] indexing

2007-02-01 Thread Tony Plate
 > a <- data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2))
 > x <- c(1,1,2,7,6,5,4,3,2,2,2)
 > match(x, a$class)
  [1]  1  1  4 NA NA  3 NA  2  4  4  4
 > a[match(x, a$class), "value"]
  [1]  6.5  6.5 12.0   NA   NA  8.5   NA  7.5 12.0 12.0 12.0
 >
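An alternative sketch uses a named vector as the lookup table. It relies on the classes being whole numbers, so that as.character() gives the same names on both sides; unknown classes come back as NA, just as with match().

```r
a <- data.frame(value = c(6.5, 7.5, 8.5, 12.0), class = c(1, 3, 5, 2))
x <- c(1, 1, 2, 7, 6, 5, 4, 3, 2, 2, 2)

# Lookup table as a named vector: names are the classes, values the values
lookup <- setNames(a$value, a$class)
x.value <- unname(lookup[as.character(x)])   # classes not in the table give NA
x.value
```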

-- Tony Plate

javier garcia-pintado wrote:
> Hello,
> In a nutshell, I've got a data.frame like this:
> 
> 
>>assignation <- data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2))
>>assignation
> 
>   value class
> 1   6.5 1
> 2   7.5 3
> 3   8.5 5
> 4  12.0 2
> 
>>  
> 
> 
> and a long vector of classes like this:
> 
> 
>>x <- c(1,1,2,7,6,5,4,3,2,2,2...)
> 
> 
> And would like to obtain  a vector of length = length(x), with the
> corresponding values extracted from assignation table. Like this:
> 
>>x.value
> 
>  [1]  6.5  6.5 12.0   NA   NA  8.5   NA  7.5 12.0 12.0 12.0
> 
> Could you help me with an elegant way to do this ?
> (I just can do it with looping for each class in the assignation table,
> what a think is not perfect in R's sense)
> 
> Wishes,
> Javier


Re: [R] Outlook does threading

2007-01-31 Thread Tony Plate
Your final paragraph has the take-home message for everyone (not just MS 
Outlook users): "just create, from scratch, a new message when 
initiating a new subject."

Viewing threads can be completely different from sorting by the 
subject line.  Your initial post with the subject "regexpr and parsing 
question" was in fact a reply to the message from Gabor Grothendieck in 
the thread "Re: [R] change plotting symbol for groups in trellis graph." 
(I can see this by looking at the header information: there is an 
"In-reply-to:" header item.)

When I view threads in the Thunderbird mail reader, your post and 
replies with the subject "regexpr and parsing question" do in fact show 
up under the thread in which Gabor's message appeared, not in their own 
thread.

According to 
http://office.microsoft.com/en-us/outlook/HA011356671033.aspx, one can 
view threads in Outlook by selecting "View->Arrange By->Conversation".

Hope this helps (in case the horse was not thoroughly dead already.)

-- Tony Plate

Kimpel, Mark William wrote:
> See below for Bert Gunter's off list reply to me (which I do
> appreciate). I'm putting it back on the list because it seems there is
> still confusion regarding the difference between threading and sorting
> by subject. I thought the example I will give below will serve as
> instructional for other Outlook users who may be similarly confused as I
> was (am?). 
> 
> Per Bert's instructions, I just set up my inbox to sort by subject. I
> sent one email to myself with the subject "test1" and then replied to it
> without changing the subject. The reply correctly went to "test1" in the
> inbox sorter. I then changed the subject heading in the test1 reply to
> "test2" and sent it to myself. This time Outlook re-categorized it and
> put it in a separate compartment in the view called "test2".
> 
> If Outlook can do threading the way the R mail server does, I don't
> think this is the way to do it.
> 
> Unless someone has an idea of how to correctly set up Outlook to do
> threading in the manner that the R mail server does, I think the message
> for us Outlook users is to just create, from scratch, a new message when
> initiating a new subject.
> 
> Thanks for all your help. 
> 
> Mark
> 
> -Original Message-
> From: Bert Gunter [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, January 31, 2007 7:03 PM
> To: Kimpel, Mark William
> Subject: Outlook does threading
> 
>  Mark:
> 
> No need to bother the R list with this. Outlook does threading. Just
> sort on
> Subject in the viewer.
> 
> Bert Gunter
> Genentech Nonclinical Statistics
> South San Francisco, CA 94404
> 650-467-7374
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Kimpel, Mark
> William
> Sent: Wednesday, January 31, 2007 3:36 PM
> To: Peter Dalgaard
> Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
> Subject: Re: [R] possible spam alert
> 
> Peter,
> 
> Thanks you for your explanation, I had taken Mr. Connolly's message to
> me to imply that I was not changing the subject line. I use MS Outlook
> 2007 and, unless I am just not seeing it, Outlook does not normally
> display the "in reply to" header, I was under the mistaken impression
> that that was what the Subject line was for. See, for example, the
> header to your message to me below. Outlook will, however, sort messages
> by Subject, and that is what I thought was meant by threading.
> 
> Well, I learned something today and apologize for any inconvenience my
> posts may have caused.
> 
> BTW, I use Outlook because it is supported by my university server and
> will synch my appointments and contacts with my PDA, which runs Windows
> CE. If anyone has a suggestion for me of a better email program that
> will provide proper threading AND work with a MS email server and synch
> with Windows CE, I'd love to hear it.
> 
> Thanks again,
> 
> Mark
> 
> Mark W. Kimpel MD
> (317) 490-5129 Work, & Mobile
> (317) 663-0513 Home (no voice mail please)
> 1-(317)-536-2730 FAX
> 
> 
> -Original Message-
> From: Peter Dalgaard [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, January 31, 2007 6:25 PM
> To: Kimpel, Mark William
> Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
> Subject: Re: [R] possible spam alert
> 
> Kimpel, Mark William wrote:
> 
>>The last two times I have originated message threads on R or
>>Bioconductor I have received the message included below from someone
>>named Patrick Connolly. Both times I was the originator of the message

Re: [R] Simple Date problems with cbind

2007-01-30 Thread Tony Plate

> It is probably something blindingly simple but can
> anyone suggest something?

You need to use the format code "%Y" for 4-digit years.
You need to create a data frame using 'data.frame()' (cbind() creates a 
matrix when given just vectors).

 > as.Date(c("2005/01/24", "2006/01/23", "2006/01/23"), "%Y/%m/%d")
[1] "2005-01-24" "2006-01-23" "2006-01-23"
 > data.frame(int=1:3, date=as.Date(c("2005/01/24", "2006/01/23", "2006/01/23"), "%Y/%m/%d"))
  int       date
1   1 2005-01-24
2   2 2006-01-23
3   3 2006-01-23
 > (x <- data.frame(int=1:3, date=as.Date(c("2005/01/24", "2006/01/23", "2006/01/23"), "%Y/%m/%d")))
  int       date
1   1 2005-01-24
2   2 2006-01-23
3   3 2006-01-23
 > class(x)
[1] "data.frame"
 > sapply(x, class)
      int      date 
"integer"    "Date" 
 >
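A minimal sketch of both fixes side by side (the object names are made up for illustration). It relies on R's own strptime rules: "%y" reads at most two digits, so a four-digit year makes the parse fail and gives NA, while "%Y" reads the full year. It also shows that cbind() on plain vectors returns a numeric matrix, so the Date class is lost, whereas data.frame() keeps it.

```r
dates_chr <- c("2005/01/24", "2006/01/23", "2006/01/23")

# "%y" expects a two-digit year, so "2005/..." does not match the format.
bad <- as.Date(dates_chr, "%y/%m/%d")

# "%Y" reads the full four-digit year.
good <- as.Date(dates_chr, "%Y/%m/%d")

# cbind() on vectors builds a matrix, which cannot hold a Date column:
# the dates end up stored as plain day counts.
m <- cbind(int = 1:3, date = good)

# data.frame() keeps each column's class.
df <- data.frame(int = 1:3, date = good)

print(class(m))
print(sapply(df, class))
```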

-- Tony Plate

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to avoid test for NAs in foreign function call

2006-12-14 Thread Tony Plate
Supply NAOK=TRUE argument to .C; the help page for .C() contains the 
following:

Usage
 .C(name, ..., NAOK = FALSE, DUP = TRUE, PACKAGE)

Also, you might want to consider using the "raw" data type instead of 
integers -- that way you should have fewer problems with R code making 
unwanted interpretations of certain bit patterns.
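To illustrate why packed flags can collide with NA, and what the "raw" alternative looks like, here is a small sketch using base R's packBits() and rawToBits() (the flag vectors are invented). An integer whose only set bit is the sign bit has the same bit pattern as NA_integer_, which is what .C() objects to unless NAOK=TRUE. Raw vectors have no NA value, and '&' and '|' operate bitwise on them.

```r
# 32 logical flags with only the last (most significant) bit set.
flags <- c(rep(FALSE, 31), TRUE)

# Packed into one 32-bit integer, this bit pattern is NA_integer_.
packed_int <- packBits(flags, type = "integer")
print(is.na(packed_int))

# Packed as raw bytes (8 flags per byte, least significant bit first).
packed_raw <- packBits(flags, type = "raw")
print(packed_raw)

# Bitwise AND / OR work directly on raw vectors, with no NA checks involved.
set.seed(42)
fa <- sample(c(TRUE, FALSE), 32, replace = TRUE)
fb <- sample(c(TRUE, FALSE), 32, replace = TRUE)
ra <- packBits(fa, type = "raw")
rb <- packBits(fb, type = "raw")
and_flags <- as.logical(rawToBits(ra & rb))
or_flags  <- as.logical(rawToBits(ra | rb))
```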

-- Tony Plate

Knut M. Wittkowski wrote:
> We have packed logical vectors into integers, 32 flags at a time and 
> then want to AND or OR these vectors of "integers" using other C functions.
> 
> The problem: occasionally, the packed sequence of 32 logical values 
> resembles NA, causing the error message:
> 
> Error in bitAND(packed1, packed2, lenx) :
>  NAs in foreign function call (arg 1)
> 
> How does one instruct R to avoid checking for NAs?
> 
> Knut M. Wittkowski, PhD,DSc
> --
> The Rockefeller University,
> Center for Clinical and Translational Science
> Research Design and Biostatistics,
> 1230 York Ave #121B, Box 322, NY,NY 10021
> +1(212)327-7175, +1(212)327-8450 (Fax), [EMAIL PROTECTED]
> http://www.rockefeller.edu/ccts/rdbs.php
> 

Re: [R] ifelse question

2006-12-12 Thread Tony Plate
I think you can find your answer if you study this part of the 
documentation for ifelse:

Details:
If yes or no are too short, their elements are recycled. yes will be 
evaluated if and only if any element of test is true, and analogously 
for no.

Also, consider this call:

ifelse(1:12 > 5, 1:3, 11:14)
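A small sketch of the behaviour described in the Details section (random data, nothing from the original post): rnorm(1) is evaluated once per argument and its single value is recycled, whereas an argument that is already as long as 'test' supplies a separate value for each element. That is also why cos(x) and sin(x) appear to be evaluated "for each row": they return vectors as long as x.

```r
set.seed(1)
x <- rnorm(10)

# 'yes' and 'no' are each evaluated once; rnorm(1) yields one number,
# which is recycled, so y1 holds at most two distinct values.
y1 <- ifelse(x > 0, rnorm(1), rnorm(1))

# Full-length arguments give a separate draw for every element.
y2 <- ifelse(x > 0, rnorm(length(x)), rnorm(length(x)))

# Recycling of short 'yes'/'no' vectors, as in the call above.
r <- ifelse(1:12 > 5, 1:3, 11:14)
print(r)
```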

-- Tony Plate

Jacques Ropers wrote:
>>But you got only two (eventually one) distinct values, right? Look at
>>the code for 'ifelse': yes and no are only called once each, then
>>recycled to desired length.
>>
>>I guess you want something like
>>
>>x <- rnorm(10)
>>y <- rnorm(10)
>>z <- rnorm(10)
>>y1 <- ifelse(x > 0, y, z)
>>  
> 
> Thanks for the help.
> 
> Although this would do the trick, is there a way to call repetitively 
> rnorm (rpois...) *inside the ifelse* rather than constructing the vector 
> outside ? Like in the following where cos() and sin() functions are 
> evaluated for each row :
> x <- rnorm(10)
> y1 <- ifelse(x > 0, cos(x), sin(x))
> 
> I am trying to understand the difference of behaviour. R acts as if 
> rnorm(1) return value were known after the first call and does not 
> evaluate rnorm(1) in
> 
> y1 <- ifelse(x > 0, rnorm(1) ,  rnorm(1))
> 
> again after the first evaluation.
> 
> 
> Jacques.
> 

Re: [R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder

2006-11-24 Thread Tony Plate
Did you try supplying gradient information to nlminb?  (I note that 
nlminb is used for the optimization, but I don't see any gradient 
information supplied to it.) I would suspect that supplying gradient 
information would greatly speed up the computation (as you note in 
comments at http://otter-rsch.ca/tresults.htm.)

I'm curious -- when you say "R may not be a suitable platform for 
development for such models", what aspect of R do you feel is lacking? 
Is it the specific optimization routines available, or is it some other 
more general aspect?

Also, another optimization algorithm available in R is the "L-BFGS-B"
method for optim() (in the standard 'stats' package).  I've had extremely good
experiences with using this code in S-PLUS.  It can take box 
constraints, and can use gradient information.  It is my first choice 
for most optimization problems, and I believe it is very widely used. 
Did you try using that optimization routine with this problem?
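For anyone wanting to try the gradient suggestion, here is a minimal sketch on the Rosenbrock test function (a stand-in, not the fisheries model from the comparison). nlminb() takes an analytic gradient through its 'gradient' argument, and optim(method="L-BFGS-B") takes one through 'gr', with optional box constraints in 'lower' and 'upper'. This sketch does not test whether supplying gradients closes the speed gap on the real model.

```r
# Rosenbrock "banana" function: minimum of 0 at c(1, 1).
fr <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2

# Its analytic gradient.
grr <- function(p) {
  c(-400 * p[1] * (p[2] - p[1]^2) - 2 * (1 - p[1]),
     200 * (p[2] - p[1]^2))
}

start <- c(-1.2, 1)

# nlminb() accepts the gradient through its 'gradient' argument.
fit_nlminb <- nlminb(start, fr, gradient = grr)

# optim() with "L-BFGS-B" takes the gradient as 'gr' plus box constraints.
fit_lbfgsb <- optim(start, fr, gr = grr, method = "L-BFGS-B",
                    lower = c(-2, -2), upper = c(2, 2))

print(fit_nlminb$par)
print(fit_lbfgsb$par)
```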

-- Tony Plate

dave fournier wrote:
> There has recently been some discussion on the list about
> AD Model builder and the suitability of R for constructing the
> types of models used in fisheries management.
> 
>https://stat.ethz.ch/pipermail/r-help/2006-January/086841.html
> 
>https://stat.ethz.ch/pipermail/r-help/2006-January/086858.html
> 
> I think that many R users underestimate the numerical challenges
> that some of the typical nonlinear statistical models used in different
> fields present. R may not be a suitable platform for development for
> such models.
> 
> Around 10 years ago John Schnute, Laura Richards, and Norm Olsen
> with Canadian federal fisheries undertook an investigation
> comparing various statistical modeling packages for a simple
> age-structured statistical model of the type commonly used in
> fisheries. They compared AD Model Builder, Gauss, Matlab, and
> Splus. Unfortunately a working model could not be produced with Splus
> so its times could not be included in the comparison. It is possible
> to produce a working model with the present day version of R so that
> R can now be directly compared with AD Model Builder for this type of model.
> 
> I have put the results of the test together with the original
> Schnute and Richards paper and the working R and AD Model Builder
> codes on Otter's web site
> 
>  http://otter-rsch.ca/tresults.htm
> 
> The results are that AD Model builder is roughly 1000 times faster than
> R for this problem. ADMB takes about 2 seconds to converge while
> R takes over 90 minutes.
> 
> This is a simple toy example. Real fisheries models are often hundreds of
> times more computationally intensive than this one.
> 
> Cheers,
> 
>  Dave
> ~


Re: [R] data storage/cubes and pointers in R

2006-11-09 Thread Tony Plate
What kind of operations do you need to be able to do?  I frequently use 
3 and higher dimensional arrays for storing data, and then I use 
indexing operations to extract slices of data, or sometimes apply() and 
friends to process the data.

The abind() function (in the 'abind' package) will bind together vectors 
and arrays into higher dimensional arrays -- it might come in handy for you.
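A minimal sketch of the array approach with named dimensions (the dimension names and contents below are hypothetical). Indexing by name extracts slices along any axis, and apply() summarises over chosen margins. abind() from the 'abind' package can be used to grow such an array, but it is not needed for the slicing itself.

```r
# Hypothetical 3-d array: 2 genes x 3 time points x 4 concentrations.
cube <- array(seq_len(2 * 3 * 4), dim = c(2, 3, 4),
              dimnames = list(gene = c("g1", "g2"),
                              time = c("t1", "t2", "t3"),
                              conc = paste0("c", 1:4)))

# Everything for gene g1: a 3 x 4 (time x conc) matrix.
g1 <- cube["g1", , ]

# Every gene and time point at concentration c2: a 2 x 3 matrix.
at_c2 <- cube[, , "c2"]

# Mean over time for each gene/concentration pair (margins 1 and 3).
time_means <- apply(cube, c(1, 3), mean)
print(time_means)
```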

-- Tony Plate

Piet van Remortel wrote:
> Hi all,
> 
> I am faced with the situation where I want to store/analyze  
> relatively large, organized sets of numerical data, which depend on a  
> number of conditions (biological properties, exposure times,  
> concentrations etc etc).  Imagine about a hundred dataframes of a few  
> thousand numerical values, with some annotation in text for some  
> entries.
> 
> Intuitively, I would like to be able to slice the data in a 'data-cube'
> kind of way to query, analyze, cluster, fit etc., which
> resembles the database data-cube way of thinking common in the db
> world these days. ( http://en.wikipedia.org/wiki/Data_cube )
> 
> I have no knowledge of a package that supports such things in an
> elegant way within R.  If this exists, please point me to it.
> 
> Also considering implementing a similar setup myself, I started
> wondering about the possibility of using references (or "pointers"
> aargh) to dataframes and store them in a list etc.   Separate lists  
> can then represent different 'views' on the shared instance  
> dataframes etc.   I have no knowledge if that is even possible in R,  
> and if that is even the smart way to do it.  If someone could provide  
> some help, that would be great.
> 
> Other option is of course to link to MySQL and do all data handling  
> in that way.  Also considering that.
> 
> Any thoughts/hints would be appreciated !
> 
> thanks,
> 
> Piet
> 
> 
> 
> --
> Dr. P. van Remortel
> Intelligent Systems Lab
> Dept. of Mathematics and Computer Science
> University of Antwerp
> Belgium
> http://www.islab.ua.ac.be
> +32 3 265 33 57 (secr.)
> 
> 

Re: [R] problem about using list element in for cycle

2006-10-23 Thread Tony Plate
Your problem is that you are using cat() on a factor.  Use 
as.character() or format() to convert the factor to character data, 
which cat will then print in the way you want.

 > x <- data.frame(L=letters[1:3])
 > x
   L
1 a
2 b
3 c
 > x$L
[1] a b c
Levels: a b c
 > cat(x$L, "\n")
1 2 3
 > cat(as.character(x$L), "\n")
a b c
 > cat(format(x$L), "\n")
a b c
 >
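A short sketch of the same point with gene-style labels (values invented). A factor is stored as integer codes into its sorted levels, and cat() prints those codes. Reading the file with read.csv(..., stringsAsFactors=FALSE) avoids creating factors in the first place; FALSE became the default in R 4.0.0, so newer versions of R behave this way unless told otherwise.

```r
ids <- factor(c("YHR165C", "YJL130C", "YDL171C"))

# Levels are sorted, and the factor stores integer codes into them.
print(levels(ids))
codes <- as.integer(ids)

# cat() sees only the integer codes.
cat(ids, "\n")

# Converting to character first prints the labels.
labels <- as.character(ids)
cat(labels, "\n")
```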


Hu Chen wrote:
> sorry, pressed "sent" by mistake.
> for example
> 
>>data <- read.csv("data.txt")
>>single
> 
> V1  V2
> 1  YHR165C  CG8877
> 2  YJL130C CG18572
> 3  YDL171C  CG9674
> 4  YKR054C  CG7507
> 5  YDL140C  CG1554
> 6  YLR106C CG13185
> 7  YGL206C  CG9012
> 8  YNL262W  CG6768
> 9  YER172C  CG5931
> 
> 
>>typeof(data)
> 
> [1] "list"
> 
>>for (i in 1:nrow(data)){
> 
>  cat(data[i,1])
>}
> 
> it doesn't return things like "YHR165C" but numbers like 6,7,9..
> is this a new feature of list? how do I turn it off?
> thanks
> 
> On 10/23/06, Hu Chen <[EMAIL PROTECTED]> wrote:
> 
>>for example
>>data <- read.csv("data.txt")
>>typeof(data)
>>[1] "list"
>>for (i in 1:nrow(data)){
>>
>>
> 
> 

Re: [R] not understanding a do.call

2006-10-18 Thread Tony Plate
Suppose you have a list of equal-length numeric vectors and you want to
bind them together in a matrix.  You want a piece of code that will
work no matter how many vectors are in the list.  That's what this
construct with do.call() is useful for.

e.g.:

 > a <- 1:3
 > b <- 4:6
 > c <- 7:9
 > x1 <- list(a=a,b=b)
 > x2 <- list(a=a,b=b,c=c)
 > do.call("cbind", x1)
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
 > do.call("cbind", x2)
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
 >
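To make the "however many vectors" point concrete, here is a small sketch (make_vectors() is a hypothetical helper that builds a named list of k vectors). do.call() spreads the list into separate arguments, so the call is equivalent to writing them out by hand. Passing the list itself to cbind() gives something quite different: a one-column matrix whose cells are list elements.

```r
# Hypothetical helper: a named list of k integer vectors of length 3.
make_vectors <- function(k) {
  vecs <- lapply(seq_len(k), function(i) (3 * (i - 1) + 1):(3 * i))
  setNames(vecs, letters[seq_len(k)])
}

x4 <- make_vectors(4)

# do.call() passes each list element as a separate, named argument.
m <- do.call("cbind", x4)

# The equivalent hand-written call.
same <- cbind(a = x4$a, b = x4$b, c = x4$c, d = x4$d)

# Passing the list itself gives a 4 x 1 matrix whose cells are list elements.
wrong <- cbind(x4)
```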

-- Tony Plate

Leeds, Mark (IED) wrote:
> I did a ?do.call but i don't think i understand it.
>  
> if a, b,c,d are numeric vectors then could someone explain the
> difference between
>  
> do.call("cbind",list(a,b,c,d))
>  
> and cbind(a,b,c,d).
>  
> or point to an archive on it.
>  
> the return value of cbind is a matrix or dataframe depending on what is
> sent in but i don't understand when it would be useful to use do.call.
> i realize it takes a list but that's all i know about why one would use
> it. thanks.
> 

Re: [R] Fwd: rarefy a matrix of counts

2006-10-11 Thread Tony Plate
Two things to note:

(1) rep() can be vectorized:
 > rep(1:3, 2:4)
[1] 1 1 2 2 2 3 3 3 3
 >

(2) you will likely get much better performance if you work with 
integers and convert to strings after sampling (or use factors), e.g.:

 > c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)]
[1] "red"  "blue" "red"  "red"  "red"
 >
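A sketch of rarefying every column without replacement using the integer-code idea above (rarefy_column() is a hypothetical helper, and the counts are the candy example from this thread). Each column is expanded into one code per individual, 100 codes are drawn with sample(), and tabulate() turns the draws back into counts per category. By contrast, sample(..., prob=counts, replace=TRUE) samples with replacement, as discussed further down the thread.

```r
counts <- matrix(c(400, 100, 300,
                   300,   0, 1000,
                   2500, 200, 500),
                 nrow = 3,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

# Hypothetical helper: rarefy one column of counts to 'size' individuals.
rarefy_column <- function(n, size) {
  population <- rep(seq_along(n), n)     # one integer code per individual
  drawn <- sample(population, size)      # replace = FALSE by default
  tabulate(drawn, nbins = length(n))     # counts per category
}

set.seed(1)
rarefied <- apply(counts, 2, rarefy_column, size = 100)
rownames(rarefied) <- rownames(counts)
print(rarefied)
```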

-- Tony Plate

Brian Frappier wrote:
> I tried all of the approaches below. 
> 
> the problem with:
> 
>  > x <- data.frame(matrix(NA,100,3))
>  > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
>  > if you want result in data frame
>  > or
>  > x<-vector("list", 3)
>  > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> 
> is that this code still samples the rows, not the elements, i.e. returns 
> 100 or 300 in the matrix cells instead of "red" or a matrix of counts by 
> color (object type) like:
>         x1    x2    x3
> red     32     5    60
> gr      68    95    40
> sum    100   100   100
> 
>  It looks like Tony is right: sampling without replacement requires 
> listing of all elements to be sampled.  But, the code Petr provided
> 
> x1 <- sample(c(rep("red",400),rep("green", 100),rep("black",300)),100)
> 
> did give me a clue of how to quickly make such a list using the 'rep' 
> command.  I will for-loop a rep statement using my original matrix to 
> create a list of elements for each sample:
> 
> Thanks Petr and Tony for your help!
>

Re: [R] Fwd: rarefy a matrix of counts

2006-10-11 Thread Tony Plate
Here's a way using apply(), and the prob= argument of sample():

 > df <- data.frame(sample1=c(red=400,green=100,black=300), sample2=c(300,0,1000), sample3=c(2500,200,500))
 > df
      sample1 sample2 sample3
red       400     300    2500
green     100       0     200
black     300    1000     500
 > set.seed(1)
 > apply(df, 2, function(counts) sample(seq(along=counts), replace=TRUE, size=7, prob=counts))
     sample1 sample2 sample3
[1,]       1       3       1
[2,]       1       3       1
[3,]       3       3       1
[4,]       2       3       2
[5,]       1       3       1
[6,]       2       3       1
[7,]       2       3       3
 >

Note that this does sampling WITH replacement.
AFAIK, sampling without replacement requires enumerating the entire 
population to be sampled from.  I.e., you cannot do
 > sample(1:3, prob=1:3, replace=FALSE, size=4)
instead of
 > sample(c(1,2,2,3,3,3), replace=FALSE, size=4)

-- Tony Plate

 From reading ?sample, I was a little unclear on whether sampling 
without replacement could work

Petr Pikal wrote:
> Hi
> 
> a litle bit different story. But
> 
> x1 <- sample(c(rep("red",400),rep("green", 100), 
> rep("black",300)),100)
> 
> is maybe close. With data frame (if it is not big)
> 
> 
>>DF
> 
>   color sample1 sample2 sample3
> 1   red     400     300    2500
> 2 green     100       0     200
> 3 black     300    1000     500
> 
> x <- data.frame(matrix(NA,100,3))
> for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> if you want result in data frame
> or
> x<-vector("list", 3)
> for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> 
> if you want it in list. Maybe somebody is clever enough to discard 
> for loop but you said you have 80 columns which shall be no problem.
> 
> HTH
> Petr
> 
> 
> 
> 
> 
> 
> 
> On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> 
> Date sent:Wed, 11 Oct 2006 10:11:33 -0400
> From: "Brian Frappier" <[EMAIL PROTECTED]>
> To:   "Petr Pikal" <[EMAIL PROTECTED]>
> Subject:  Fwd: [R] rarefy a matrix of counts
> 
> 
>>-- Forwarded message --
>>From: Brian Frappier <[EMAIL PROTECTED]>
>>Date: Oct 11, 2006 10:10 AM
>>Subject: Re: [R] rarefy a matrix of counts
>>To: r-help@stat.math.ethz.ch
>>
>>Hi Petr,
>>
>>Thanks for your response.  I have data that looks like the following:
>>
>>              sample 1  sample 2  sample 3
>>red candy          400       300      2500
>>green candy        100         0       200
>>black candy        300      1000       500
>>
>>I don't want to randomly select either the samples (columns) or the
>>"candy" types (rows), which sample as you state would allow me. 
>>Instead, I want to randomly sample 100 candies from each sample and
>>retain info on their associated type.  I could make a list of all the
>>candies in each sample:
>>
>>sample 1
>>red
>>red
>>red
>>red
>>green
>>green
>>black
>>red
>>black
>>...
>>
>>and then randomly sample those rows.  Repeat for each sample.  But, I
>>am not sure how to do that without a lot of loops, and am wondering if
>>there is an easier way in R.  Thanks!  I should have laid this out in
>>the first email...sorry.
>>
>>
>>On 10/11/06, Petr Pikal <[EMAIL PROTECTED]> wrote:
>>
>>>Hi
>>>
>>>I am not experienced in Matlab and from your explanation I do not
>>>understand what exactly you want. It seems that you want to randomly
>>>choose a sample of 100 rows from your matrix, which can be achieved by
>>>sample.
>>>
>>>DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
>>>DF[sample(1:100, 10),]
>>>
>>>If you want to do this several times, you need to save your result
>>>and then it depends on what you want to do next. One suitable form
>>>is a list of matrices, the other is an array, and you can use a for
>>>loop for completing it.
>>>
>>>HTH
>>>Petr
>>>
>>>
>>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
>>>
>>>Date sent:  Tue, 10 Oct 2006 17:40:47 -0400
>>>From:   "Brian Frappier" <[EMAIL PROTECTED]>
>>>To: r-help@stat.math.ethz.ch
>>>Subject:   [R] rarefy a matrix of counts
>>>
>>>
>>>>Hi all,
>>

Re: [R] shifting a huge matrix left or right efficiently ?

2006-10-09 Thread Tony Plate
If you're able to work with the transpose of your matrix, you might 
consider the function 'filter()', e.g.:

 > filter(diag(1:5), c(2,3), sides=1)
Time Series:
Start = 1
End = 5
Frequency = 1
  [,1] [,2] [,3] [,4] [,5]
1   NA   NA   NA   NA   NA
2    3    4    0    0    0
3    0    6    6    0    0
4    0    0    9    8    0
5    0    0    0   12   10
 >

I don't know if the conversion to and from a time-series class will 
impact the timing, but if this might serve your purposes, it's easy to 
do some experiments to find out.
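A sketch that checks the idea on a small random matrix. shift_right() is a hypothetical helper written in the style of the question's shiftMatrixL(), but shifting right (lagging). With sides=1, stats::filter() computes f[1]*x[i] + f[2]*x[i-1] + f[3]*x[i-2] down each column of t(X), which matches 3 * (X lagged by 1 column) + 5 * (X lagged by 2 columns) except in the first two columns, where filter() returns NA instead of zero padding. Left shifts, as in the question, should correspond to the same computation on the column-reversed matrix X[, ncol(X):1]. No timings are claimed here.

```r
# Hypothetical helper: shift the columns of X right by 'shift' places,
# filling the vacated columns with 'padding'.
shift_right <- function(X, shift, padding = 0) {
  cbind(matrix(padding, nrow(X), shift),
        X[, seq_len(ncol(X) - shift), drop = FALSE])
}

set.seed(1)
X <- matrix(rnorm(4 * 10), nrow = 4)

# Direct computation with explicit shifted copies of X.
direct <- shift_right(X, 1) * 3 + shift_right(X, 2) * 5

# stats::filter() treats each column of t(X) as a series; with sides = 1
# the output at i is 0 * x[i] + 3 * x[i - 1] + 5 * x[i - 2].
filtered <- stats::filter(t(X), c(0, 3, 5), sides = 1)

# Drop the time-series attributes and transpose back to the shape of X.
via_filter <- t(matrix(as.numeric(filtered), nrow = ncol(X)))

# The first two columns are NA (window runs off the start); the rest agree.
print(all.equal(via_filter[, -(1:2)], direct[, -(1:2)]))
```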

- Tony Plate

Huang-Wen Chen wrote:
> I'm wondering what's the best way to shift a huge matrix left or right.
> My current implementation is the following:
> 
> shiftMatrixL <- function(X, shift, padding=0) {
>   cbind(X[, -1:-shift], matrix(padding, dim(X)[1], shift))
> }
> 
> X <- shiftMatrixL(X, 1)*3 + shiftMatrixL(X,2)*5...
> 
> However, it's still slow due to heavy use of this function.
> The resulting matrix will only be read once and then discarded,
> so I believe the best implementation of this function is in C,
> manipulating the internal data structure of this matrix.
> Anyone know similar package for doing this job ?
> 
> Huang-Wen
> 

Re: [R] how to replace the diagonal of a matrix

2006-10-03 Thread Tony Plate
You are indexing with numeric 0's and 1's, which will refer to only the 
matrix element 1,1 (multiple times), cf:

 > matrix(1:9,3)[diag(3)]
[1] 1 1 1
 >

Try one of these:

 > idx <- diag(3) > 0
 > idx <- which(diag(3)>0)
 > idx <- cbind(seq(len=n), seq(len=n))

(For very large matrices, the third will be more efficient, I believe.)
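For completeness, a minimal sketch of three equivalent ways to set the diagonal (using n = 3 as in the question): the replacement function diag<-, which is the simplest, a logical index, and a two-column (row, column) index matrix.

```r
n <- 3

# The simplest route: the replacement function diag<-.
mps <- matrix(0.4, nrow = n, ncol = n)
diag(mps) <- 0.6

# A logical index: TRUE exactly on the diagonal.
mps2 <- matrix(0.4, nrow = n, ncol = n)
mps2[diag(n) > 0] <- 0.6

# A two-column (row, column) index matrix.
mps3 <- matrix(0.4, nrow = n, ncol = n)
mps3[cbind(seq_len(n), seq_len(n))] <- 0.6

print(mps)
```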

-- Tony Plate

roger bos wrote:
> Dear useRs,
> 
> Trying to replace the diagonal of a matrix is not working for me.  I
> want a matrix with .6 on the diag and .4 elsewhere.  The following
> code looks like it should work--when I look at mps and idx they look
> how I want them to--but it only replaces the first element, not each
> element on the diagonal.
> 
> mps <- matrix(rep(.4, 3*3), nrow=n, byrow=TRUE)
> idx <- diag(3)
> mps
> idx
> mps[idx] <- rep(.6,3)
> 
> I also tried something along the lines of diag(mps=.6, ...) but it
> didn't know what mps was.
> 
> Thanks,
> 
> Roger
> 

Re: [R] List-manipulation

2006-09-29 Thread Tony Plate
Does this do what you want?

 > x <- list(1,2,3:7,8,9:10)
 > sapply(x, function(xx) xx[1])
[1] 1 2 3 8 9
 >
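A variant on the same idea using the list from the question (values copied from the post): vapply() additionally checks that each extracted value is a single number, and it keeps the probe-set names.

```r
x <- list("121_at"    = -113691170,
          "1255_g_at" = 42231151,
          "1316_at"   = c(35472685, 35472588),
          "1320_at"   = -88003869)

# Take the first element of each component; numeric(1) declares that
# every result must be a single number.
firsts <- vapply(x, function(xx) xx[1], numeric(1))
print(firsts)
```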

-- Tony Plate

Benjamin Otto wrote:
> Hi,
> 
> Sorry for the question, I know it should be basic knowledge but I have
> been struggling for two hours now.
> 
> How do I select only the first entry of each list member and ignore the
> rest?
> 
> So for
> 
>>$"121_at"
>>-113691170
>>
>>$"1255_g_at"
>>42231151
>>
>>$"1316_at"
>>35472685 35472588
>>
>>$"1320_at"
>>-88003869
> 
> I only want to select
> 
> -113691170, 42231151, 35472685 and -88003869.
> 
> Regards
> 
> Benjamin
> 
> --
> Benjamin Otto
> Universitaetsklinikum Eppendorf Hamburg
> Institut fuer Klinische Chemie
> Martinistrasse 52
> 20246 Hamburg
> 
>  
> 
> 

Re: [R] symbolic matrix elements...

2006-09-18 Thread Tony Plate
If I construct the matrix by list()ing together the expressions rather 
than c()ing, then it works OK:

 > x <- matrix(list( expression(x3-5*x+4), expression(log(x2-4*x
 > x[1,1]
[[1]]
expression(x3 - 5 * x + 4)

 > x[[1,1]]
expression(x3 - 5 * x + 4)
 > D(x[[1,1]], "x")
-5
 >

The reason c() doesn't work properly here might have something to do 
with it creating a language object of an unconventional type:

 > c( expression(x3-5*x+4), expression(log(x2-4*x)))
expression(x3 - 5 * x + 4, log(x2 - 4 * x))
 > expression(x3-5*x+4)
expression(x3 - 5 * x + 4)
 >

Using list() with language objects is much safer if you just want to 
make lists of them.
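A minimal sketch of the list-based construction with the expressions from the question. It assumes the intended powers x^3 and x^2 and an arbitrary 2 x 1 layout, since the original dimensions are cut off. A cell is extracted with [[ ]] and passed to D().

```r
# A 2 x 1 matrix of mode "list", one expression per cell.
exprs <- matrix(list(expression(x^3 - 5 * x + 4),
                     expression(log(x^2 - 4 * x))),
                nrow = 2, ncol = 1)

# [ ] returns a one-element list; [[ ]] returns the expression itself.
print(class(exprs[1, 1]))
d1 <- D(exprs[[1, 1]], "x")
d2 <- D(exprs[[2, 1]], "x")
print(d1)
print(d2)
```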

-- Tony Plate

Evan Cooch wrote:
> 
> Eik Vettorazzi wrote:
> 
>>test=matrix(c( expression(x^3-5*x+4), expression(log(x^2-4*x
>>works.
> 
> Well, not really (or I'm misunderstanding). Your code enters fine (no 
> errors), but I can't access individual elements - e.g., test[1,1] gives 
> me an error:
> 
>  > test=matrix(c( expression(x^3-5*x+4), expression(log(x^2-4*x
>  > test[1,1]
> Error: matrix subscripting not handled for this type
> 
> Meaning...what?
> 
> 
>>btw. you received an error because D expects an expression and you 
>>offered a list
> 
> 
> OK - so why then is each of the elements identified as an expression
> when I print out the vector? Each element is reported to be an
> expression. OK, if so, then I remain puzzled as to how this is a 'list'.
> 

Re: [R] Access Rows in a Data Frame by Row Name

2006-09-13 Thread Tony Plate
Matrix-style indexing works for both columns and rows of data frames.

E.g.:
 > x <- data.frame(a=1:5, b=6:10, d=11:15)
 > x
  a  b  d
1 1  6 11
2 2  7 12
3 3  8 13
4 4  9 14
5 5 10 15
 > x[2:4,c(1,3)]
  a  d
2 2 12
3 3 13
4 4 14
 >

Time spent reading the help document "An Introduction to R" will 
probably be well worth it.  The relevant sections are "5 Arrays and 
matrices", and "6.3 Data frames".
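Since the question was about row names specifically, here is a small sketch using character indices (the row names r1 to r5 are made up). The same matrix-style indexing accepts a vector of row names in the first position and column names in the second.

```r
x <- data.frame(a = 1:5, b = 6:10, d = 11:15,
                row.names = c("r1", "r2", "r3", "r4", "r5"))

# Rows chosen by a vector of row names, columns by name.
sub <- x[c("r2", "r4"), c("a", "d")]

# One row by name, all columns (still a data frame).
one <- x["r3", ]
print(sub)
```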

-- Tony Plate

Michael Gormley wrote:
> I have created a data frame using the read.table command.  I want to be able 
> to access the rows by the row name, or a vector of row names. I know that you 
> can access columns by using the data.frame.name$col.name.  Is there a way to 
> access row names in a similar manner?
> 

Re: [R] rename cols

2006-09-11 Thread Tony Plate
The following works for data frames and matrices (you didn't say which 
you were working with).

 > x <- data.frame(V1=1:3,V2=4:6)
 > x
  V1 V2
1  1  4
2  2  5
3  3  6
 > colnames(x) <- c("Apple", "Orange")
 > x
  Apple Orange
1     1      4
2     2      5
3     3      6
 >

For a data frame, 'names(x) <- c("Apple", "Orange")' also works, because 
a dataframe is stored internally as a list of columns.
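If only some columns need new names, here is a sketch that renames by matching current names instead of position (the names V1 to V3, Apple, Orange and Pear are illustrative). Any other columns are left untouched.

```r
x <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)

# Rename a single column by its current name.
names(x)[names(x) == "V2"] <- "Orange"

# Rename several columns from an old -> new lookup vector.
new_names <- c(V1 = "Apple", V3 = "Pear")
hit <- names(x) %in% names(new_names)
names(x)[hit] <- unname(new_names[names(x)[hit]])

print(names(x))
```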

-- Tony Plate

Ethan Johnsons wrote:
> A quick question please!
> 
> How do you rename column names?  i.e. V1 --> Apple; V2 --> Orange, etc.
> 
> thx much
> 
> ej
> 

Re: [R] problem with putting objects in list

2006-09-06 Thread Tony Plate
I suspect you are not thinking about the list and the 
subsetting/extraction operators in the right way.

A list contains a number of components.

To get a subset of the list, use the '[' operator.  The subset can 
contain zero or more components of the list, and it is a list itself. 
So, if x is a list, then x[2] is a list containing a single component.

To extract a component from the list, use the '[[' operator.  You can 
only extract one component at a time.  If you supply a vector index with 
more than one element, it will index recursively.

 > x <- list(1,2:3,letters[1:3])
 > x
[[1]]
[1] 1

[[2]]
[1] 2 3

[[3]]
[1] "a" "b" "c"

 > # a subset of the list
 > x[2:3]
[[1]]
[1] 2 3

[[2]]
[1] "a" "b" "c"

 > # a list with one component:
 > x[2]
[[1]]
[1] 2 3

 > # the second component itself
 > x[[2]]
[1] 2 3
 > # recursive indexing
 > x[[c(2,1)]]
[1] 2
 > x[[c(3,2)]]
[1] "b"
 >
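This applies directly to the quoted question. lapply() always returns a list, so dr[i] is a one-element list, while dr[[i]] is the "density" object that plot() knows how to draw. In the sketch below, simulated data stands in for positions$X, which isn't shown in the question. The mean and spread are chosen so that bw = 3, cut = -2 still leaves a valid range:

```r
set.seed(1)
# Simulated stand-in for the 'positions' data in the question
positions <- data.frame(X = rnorm(500, mean = 50, sd = 20),
                        run = rep(1:5, each = 100))

dens <- function(run) {
  density(positions$X[positions$run == run], bw = 3, cut = -2)
}
dr <- lapply(1:5, dens)

class(dr[1])     # "list": a sub-list holding one component
class(dr[[1]])   # "density": the component itself
# plot(dr[[1]])  # draws the density estimate for run 1
```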

Rainer M Krug wrote:
> Hi
> 
> I use the following code and it stores the results of density() in the
> list dr:
> 
> dens <- function(run) { density( positions$X[positions$run==run], bw=3,
> cut=-2 ) }
> dr <- lapply(1:5, dens)
> 
> but the results are stored in dr[[i]] and not dr[i], i.e. plot(dr[[1]])
> works, but plot(dr[1]) doesn't.
> 
> Is there any way that I can store them in dr[i]?
> 
> Thanks a lot,
> 
> Rainer


Re: [R] Cannot get simple data.frame binding.

2006-08-28 Thread Tony Plate
Maybe I'm missing something, but your "Real life code" looks like it 
should work.  What happens when you do:

 > ire1 <- data.frame(md1[, 1:11], other)
Error in data.frame(md1[, 1:11], other) : arguments
imply differing number of rows: 11, 75
 > str(md1[, 1:11])
 > str(other)

?

Maybe the labelled data frame is causing the problem?  Did you try 
as.data.frame(md1[,1:11])? (I'm guessing that will strip off extra 
attributes).

-- Tony Plate
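One way to narrow down a "differing number of rows" error is to check the pieces before binding them. The sketch below rebuilds the small working example from the quoted message. It swaps apply(..., 1, sum, na.rm=T) for rowSums(..., na.rm = TRUE), which computes the same row totals. It then uses stopifnot() to confirm that the row counts agree before calling data.frame(). It does not reproduce the labelled data loaded from the .Rdata file, so it only illustrates the check itself:

```r
cata <- c(1, 1, 6, 1, 1, NA);      catb <- 1:6
doga <- c(3, 5, 3, 6, 4, 0);       dogb <- c(2, 4, 6, 8, 10, 12)
rata <- c(NA, 9, 9, 8, 9, 8);      ratb <- 1:6
bata <- c(12, 42, NA, 45, 32, 54); batb <- c(13, 15, 17, 19, 21, 23)
id <- c("a", "b", "b", "c", "a", "b"); site <- c(1, 1, 4, 4, 1, 4)
data1 <- data.frame(site, id, cata, catb, doga, dogb,
                    rata, ratb, bata, batb)

aa <- which(names(data1) == "rata")
bb <- ncol(data1)

# Row totals of columns rata..batb, ignoring NAs
food <- rowSums(data1[, aa:bb], na.rm = TRUE)

# as.data.frame() strips extra classes if present; then check the shapes
keep <- as.data.frame(data1[, 1:6])
stopifnot(nrow(keep) == length(food))

abba <- data.frame(keep, food)
abba
```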

John Kane wrote:
> I am stuck on a simple problem where an example works
> fine but the real one does not.
> 
> I have a data.frame where I wish to sum up some values
> across the rows and create a new data.frame with some
> of old data.frame variables and the new summed
> variable.
> 
> It works fine in my simple example but I am doing
> something wrong in the real world.  In the real world
> I am loading a labeled data.frame. The original data
> comes from a spss file imported using spss.get but the
> current data.frame is a subset of the original spss
> file.
> 
> EXAMPLE
> cata <- c( 1,1,6,1,1,NA)
> catb <- c( 1,2,3,4,5,6)
> doga <- c(3,5,3,6,4, 0)
> dogb <- c(2,4,6,8,10, 12)
> rata <- c (NA, 9, 9, 8, 9, 8)
> ratb <- c( 1,2,3,4,5,6)
> bata <- c( 12, 42,NA, 45, 32, 54)
> batb <- c( 13, 15, 17,19,21,23)
> id <- c('a', 'b', 'b', 'c', 'a', 'b')
> site <- c(1,1,4,4,1,4)
> mat1 <-  cbind(cata, catb, doga, dogb, rata, ratb,
> bata, batb)
> 
> data1 <- data.frame(site, id, mat1)
> attach(data1)
> data1
> aa <- which(names(data1)=="rata")
> bb <- length(names(data1))
> 
> mat1 <- as.matrix(data1[,aa:bb])
> food <- apply( mat1, 1, sum , na.rm=T)
> food
> 
> abba <- data.frame(data1[, 1:6], food)
> abba
> 
> --
> Real life problem
> 
> 
>>load("C:/start/R.objects/partly.corrected.materials.Rdata")
>>md1<-partly.corrected.materials
>>aa <- which(names(md1)=="oaks")
>>bb <- length(names(md1))
>>
>># sum the values of the "other" variables
>>mat1 <- as.matrix( md1[, aa:bb] )
>>other <- apply(mat1,1, sum, na.rm=T)
>>ire1 <- data.frame(md1[, 1:11], other)
> 
> Error in data.frame(md1[, 1:11], other) : arguments
> imply differing number of rows: 11, 75
> 
> -
> 
> I have simply worked around the problem by using 
> ire1 <- data.frame(md1$site, md1$colour, md1$ss1 ... ,
> other) 
> but I would like to know what stupid thing I am doing.
> 
> Thanks


Re: [R] regex scares me

2006-08-28 Thread Tony Plate
I think this does the trick.  Note that it is case sensitive.

 > x <- c("lad.tab", "xxladyy.tab", "xxyy.tab", "lad.tabx", "LAD.tab",
+ "lad.TAB")
 > grep("lad.*\\.tab$", x, value=T)
[1] "lad.tab" "xxladyy.tab"
 >
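If upper-case names such as "LAD.tab" should also match, grep() has an ignore.case argument. grepl() takes the same pattern and returns a logical vector, which is handy for subsetting. A short sketch using the same test vector as above:

```r
x <- c("lad.tab", "xxladyy.tab", "xxyy.tab", "lad.tabx", "LAD.tab", "lad.TAB")

# Same pattern, but "lad" and ".tab" may be in any mix of case
hits <- grep("lad.*\\.tab$", x, value = TRUE, ignore.case = TRUE)
hits

# Case-sensitive logical version, usable as x[keep]
keep <- grepl("lad.*\\.tab$", x)
x[keep]
```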

Jon Minton wrote:
> Hi, apologies if this is too simple but I've been stuck on the following for
> a while:
> 
> I have a vector of strings: filenames with a name before the extension and a
> variety of possible extensions
> 
> I want to select only those files with:
> 
>  1) a ".tab" extension
> 
> AND 
> 
> 2) the character sequence "lad" anywhere in the name of the file before the
> extension.
> 
> Surely this won't take long to do, I thought. (But I was wrong.)
> 
> What's the regexp pattern to specify here?
> 
> Thanks,
> 
> Jon Minton


Re: [R] meta characters in file path

2006-08-03 Thread Tony Plate
What is the problem you are having?  It seems to work fine for me running 
under Windows 2000:

 > write.table(data.frame(a=1:3,b=4:6), file="@# x.csv", sep=",")
 > read.csv(file="@# x.csv")
   a b
1 1 4
2 2 5
3 3 6
 > sessionInfo()
Version 2.3.1 (2006-06-01)
i386-pc-mingw32

attached base packages:
[1] "methods"   "stats" "graphics"  "grDevices" "utils" "datasets"
[7] "base"

other attached packages:
  XML
"0.99-8"
 >
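Here is a self-contained version of the same check, as a sketch. It writes into tempdir() so the working directory stays clean. It also uses row.names = FALSE so the file reads back as the same two columns. The file name is passed to write.csv() and read.csv() as an ordinary string, with no escaping of '@', '#' or the space:

```r
# Hypothetical file name containing '@', '#' and a space
path <- file.path(tempdir(), "@# x.csv")

dat <- data.frame(a = 1:3, b = 4:6)
write.csv(dat, path, row.names = FALSE)
file.exists(path)        # TRUE

back <- read.csv(path)
back
unlink(path)             # tidy up the temporary file
```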

Li,Qinghong,ST.LOUIS,Molecular Biology wrote:
> Hi,
> 
> I need to read in some files. The file names contain some meta characters 
> such as @, #, and white spaces etc, In read.csv, file= option, is there any 
> way that one can make the function to recognize a file path with those 
> characters?
> 
> Thanks
> Johnny


Re: [R] deleting a directory

2006-08-01 Thread Tony Plate
?unlink says that unlink() can remove directories (and has a 'recursive' 
argument).  'unlink' is in the "SEE ALSO" section in ?file.remove.

-- Tony Plate
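Below is a sketch of the quoted foo() rewritten around unlink(..., recursive = TRUE), with no dependence on an external rm. One deliberate change: it creates a fresh subdirectory with tempfile() rather than using tempdir() itself. tempdir() is R's per-session temporary directory, and deleting it would also remove anything else the session keeps there. The file note.txt is only a placeholder for "some stuff":

```r
foo <- function() {
  # A fresh, uniquely named directory under the session temp dir
  mydir <- tempfile("work")
  dir.create(mydir, showWarnings = FALSE, recursive = TRUE)
  # Remove the directory and everything in it when foo() exits
  on.exit(unlink(mydir, recursive = TRUE))
  ## placeholder for "do some stuff in mydir"
  writeLines("hello", file.path(mydir, "note.txt"))
  invisible(mydir)
}

d <- foo()
file.exists(d)   # FALSE: the directory is gone
```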

Sundar Dorai-Raj wrote:
> Hi, all,
> 
> I'm looking for a utility for removing a directory from within R. Currently, 
> I'm using:
> 
> foo <- function(...) {
>     mydir <- tempdir()
>     dir.create(mydir, showWarnings = FALSE, recursive = TRUE)
>     on.exit(system(sprintf("rm -rf %s", mydir)))
>     ## do some stuff in "mydir"
>     invisible()
> }
> 
> However, this assumes "rm" is available. I know of ?dir.create, but 
> there is no opposite. And ?file.remove appears to work only on files and 
> not directories.
> 
> Any advice? Or is my current approach the only solution?
> 
>  > R.version
>                _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          2
> minor          3.1
> year           2006
> month          06
> day            01
> svn rev        38247
> language       R
> version.string Version 2.3.1 (2006-06-01)
> 
> 
> Thanks,
> 
> --sundar


Re: [R] Functions ,Optim, & Dataframe

2006-07-31 Thread Tony Plate
I added an example of passing additional arguments through optim() to 
the objective and gradient functions to the Discussion section of the 
Wiki-fied R documentation.  See it at 
http://wiki.r-project.org/rwiki/doku.php?id=rdoc:stats:optim

-- Tony Plate

PS.  I had to add "&purge=true" to the end of the URL, i.e., 
http://wiki.r-project.org/rwiki/doku.php?id=rdoc:stats:optim&purge=true 
in order to see the original documentation the first time -- it's 
something to do with bad cache entries for the page.

Michael Papenfus wrote:
> I think I need to clarify a little further on my original question.
> 
> I have the following two rows of data:
> mydat<-data.frame(d1=c(3,5),d2=c(6,10),p1=c(.55,.05),p2=c(.85,.35))
>  >mydat
>   d1 d2 p1 p2
> 1 3 6 0.55 0.85
> 2 5 10 0.05 0.35
> 
> I need to optimize the following function using  optim for each row in mydat
> fr<-function(x) {
> u<-x[1]
> v<-x[2]
> sqrt(sum((plnorm(c(d1,d2,u,v)-c(p1,p2))^2))
> }
> x0<-c(1,1)# starting values for two unknown parameters
> y<-optim(x0,fr)
> 
> In my defined function fr, (d1 d2 p1 p2) are known values which I need 
> to read in from my dataframe and u & v are the TWO unknown parameters.  
> I want to solve this equation for each row of my dataframe.
> 
> I can get this to work when I manually plug in the known values (d1 d2 
> p1 p2).  However, I would like to apply this to each row in my dataframe 
> where the known values are automatically passed to my function which 
> then is sent to optim which solves for the two unknown parameters for 
> each row in the dataframe.
> 
> thanks again,
> mike
>



Re: [R] Functions ,Optim, & Dataframe

2006-07-31 Thread Tony Plate
Supply your additional arguments to optim() and they will get passed to 
your function:

 > mydat<-data.frame(d1=c(3,5),d2=c(6,10),p1=c(.55,.05),p2=c(.85,.35))
 >
 > fr<-function(x, d) {
+ # d is a vector of d1, d2, p1 & p2
+ u <- x[1]
+ v <- x[2]
+ d1 <- d[1]
+ d2 <- d[2]
+ p1 <- d[3]
+ p2 <- d[4]
+ sqrt(sum((plnorm(c(d1,d2,u,v)-c(p1,p2))^2)))
+ }
 > x0 <- c(1,1)# starting values for two unknown parameters
 > y1 <- optim(x0,fr,d=unlist(mydat[1,]))
 > y2 <- optim(x0,fr,d=unlist(mydat[2,]))
 > y1$par
[1] 0.462500 0.828125
 > y2$par
[1] -1.0937500  0.2828125
 > yall <- apply(mydat, 1, function(d) optim(x0,fr,d=d))
 > yall[[1]]$par
[1] 0.462500 0.828125
 > yall[[2]]$par
[1] -1.0937500  0.2828125
 >
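A side note on the objective function itself. As written, plnorm(c(d1,d2,u,v)-c(p1,p2)) evaluates the standard log-normal CDF at four shifted values, so u and v enter only as points at which that CDF is evaluated. Suppose, and this is an assumption about the intent, that u and v were meant to be the meanlog and sdlog of a log-normal with plnorm(d1) close to p1 and plnorm(d2) close to p2. Then the parentheses would go as plnorm(c(d1, d2), u, v). The sketch below makes that change and optimises log(v) so the standard deviation stays positive. It minimises the sum of squares, which has the same minimiser as its square root:

```r
mydat <- data.frame(d1 = c(3, 5), d2 = c(6, 10),
                    p1 = c(.55, .05), p2 = c(.85, .35))

# ASSUMPTION: u = meanlog, v = sdlog; x[2] is log(v) so v > 0
fr2 <- function(x, d) {
  u <- x[1]
  v <- exp(x[2])
  sum((plnorm(d[1:2], meanlog = u, sdlog = v) - d[3:4])^2)
}

# One fit per row; d is the named vector (d1, d2, p1, p2) for that row
fits <- t(apply(mydat, 1, function(d) {
  res <- optim(c(1, 0), fr2, d = d)
  c(u = res$par[1], v = exp(res$par[2]))
}))
fits
```

With two equations and two unknowns there is also a closed form, log(d) = u + v * qnorm(p), which is a useful cross-check on the optim() results.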

One thing you must be careful of is that none of the arguments to your 
function match or partially match the named arguments of optim(), which are:
 > names(formals(optim))
[1] "par" "fn"  "gr"  "method"  "lower"   "upper"   "control"
[8] "hessian" "..."
 >

For example, if your function has an argument 'he=', you will not be 
able to pass it, because if you say optim(x0, fr, he=3), the 'he' will 
match the 'hessian=' argument of optim(), and it will not be interpreted 
as being a '...' argument.
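One way around such name clashes is to wrap the objective in an anonymous function. The extra argument is then fixed inside the closure and never passes through optim()'s own argument matching. The sketch below uses a hypothetical function f with an argument he, mirroring the example above. In current R, optim() is documented with '...' placed right after gr, as optim(par, fn, gr = NULL, ..., method, lower, upper, control, hessian). Formal arguments after '...' can only be matched exactly, so today only par, fn and gr are open to partial matching. The closure approach works either way:

```r
# Hypothetical objective whose extra argument is named 'he'
f <- function(x, he) sum((x - he)^2)

# Fix 'he' inside a closure; optim() only sees a function of x
res <- optim(0, function(x) f(x, he = 3), method = "BFGS")
res$par   # close to 3
```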

-- Tony Plate



Re: [R] transformation matrice of vector into array

2006-07-27 Thread Tony Plate
Here's a way to convert a matrix of vectors like you have into an array:

 > x <- array(lapply(seq(0,len=6,by=4), "+", c(a=1,b=2,c=3,d=4)),
+ dim=c(2,3), dimnames=list(c("X","Y"),c("e","f","g")))
 > x
   e f g
X Numeric,4 Numeric,4 Numeric,4
Y Numeric,4 Numeric,4 Numeric,4
 > x[["Y","e"]]
a b c d
5 6 7 8
 > xa <- array(unlist(x, use.names=F), dim=c(length(x[[1,1]]),dim(x)),
+ dimnames=c(list(names(x[[1,1]])),dimnames(x)))
 > x["Y","e"]
[[1]]
a b c d
5 6 7 8

 > xa[,"Y","e"]
a b c d
5 6 7 8
 >

Then you can do whatever sums you want over the array.

I have not extensively checked the above code, and if I were going to 
use it, I would do numerous spot checks of elements to make sure all the 
elements are going to the right places -- it's not too difficult to make 
mistakes when pulling apart and reassembling arrays like this.  (For 
simpler cases involving lists of vectors or matrices, the abind() 
function can help.)

-- Tony Plate
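Once the list-matrix has been turned into an array, the sums asked about in the quoted message (each vector component summed over all cells) take one line. The sketch below repeats the x and xa construction from above so it runs on its own. It assumes every cell holds the same number of numeric values. unlist() also flattens cells that are themselves lists, as the "List,6" display in the question suggests:

```r
x <- array(lapply(seq(0, length.out = 6, by = 4), "+", c(a = 1, b = 2, c = 3, d = 4)),
           dim = c(2, 3), dimnames = list(c("X", "Y"), c("e", "f", "g")))
xa <- array(unlist(x, use.names = FALSE),
            dim = c(length(x[[1, 1]]), dim(x)),
            dimnames = c(list(names(x[[1, 1]])), dimnames(x)))

# Sum each component (first dimension) over all rows and columns
apply(xa, 1, sum)

# Same result; rowSums() with dims = 1 sums over dimensions 2 and 3
rowSums(xa, dims = 1)
```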

Jessica Gervais wrote:
> Hi,
> 
> I need some help
> 
> I have a matrix M(m,n) in which each element is a vector V of length 6
>  1  2  3  4  5  6  7
> 1   List,6 List,6 List,6 List,6 List,6 List,6 List,6
> 2   List,6 List,6 List,6 List,6 List,6 List,6 List,6
> 3   List,6 List,6 List,6 List,6 List,6 List,6 List,6
> 4   List,6 List,6 List,6 List,6 List,6 List,6 List,6
> 
> 
> i would like to make the sum on the matrix of each element of the
> matrix, that is to say 
> sum(on the matrix)(M[j,][[j]][[1]])
> sum(on the matrix)(M[j,][[j]][[2]])
> ...
> sum(on the matrix)(M[j,][[j]][[6]])  
> 
> I don't really know how to do.
> I thought it was possible to transform the matrix M into an array A of
> dimension (m,n,6), and then use the command sum(colSums(A[,,1])), which
> seems to be possible and quite fast.
> ...but I don't know how to convert a matrix of vectors into an array
> 
> Has anyone any little idea about that?
> 
> Thanks in advance
> 
> Jessica


Re: [R] References verifying accuracy of R for basic statistical calculations and tests

2006-07-13 Thread Tony Plate
This might be a place to start:

http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

Among the references listed there are:

Assessing the Reliability of Statistical Software: Part I by B. D. 
McCullough (1998)
http://www.amstat.org/publications/tas/mccull-1.pdf

Assessing the Reliability of Statistical Software: Part II by B. D. 
McCullough (1999)
http://www.amstat.org/publications/tas/mccull.pdf

Those might have some relevance.

Then, running this within an R session:

 > RSiteSearch("Assessing Reliability Statistical Software")

turns up 14 hits, many of them looking relevant.

[leaving "the" and "of" in the query results in the search engine timing 
out - odd?]

-- Tony Plate


Corey Powell wrote:
> Do you know of any references that verify the accuracy of R for basic 
> statistical calculations and tests.  The results of these studies should 
> indicate that R results are the same as the results of other statistical 
> packages to a certain number of decimal places on some benchmark calculations.
> 
> Thanks,
> 
> Corey Powell
> Clinical Data Analyst
> Broncus Technologies
> [EMAIL PROTECTED]


Re: [R] max / pmax

2006-05-30 Thread Tony Plate
Here's an example of how I think you can do what you want.  Play with 
the definition of the function highest.use() to get random selection of 
multiple maxima.

 > drug.names <- c("marijuana", "crack", "cocaine", "heroin")
 > drugs <- factor(drug.names, levels=drug.names)
 > drugs
[1] marijuana crack cocaine   heroin
Levels: marijuana crack cocaine heroin
 > as.numeric(drugs)
[1] 1 2 3 4
 > N <- 20
 > set.seed(1)
 > primary.drug <- sample(drugs, N, rep=T)
 > primary.drug[sample(1:20, 10)] <- NA
 > primary.drug
 [1] <NA>      crack     <NA>      <NA>      <NA>      <NA>      heroin
 [8] cocaine   cocaine   marijuana <NA>      <NA>      cocaine   crack
[15] heroin    <NA>      cocaine   heroin    <NA>      <NA>
Levels: marijuana crack cocaine heroin
 > # usage frequencies
 > marijuana <- sample(1:3, N, rep=T)
 > crack <- sample(1:3, N, rep=T)
 > cocaine <- sample(1:3, N, rep=T)
 > heroin <- sample(1:3, N, rep=T)
 > cbind(marijuana, crack, cocaine, heroin)
      marijuana crack cocaine heroin
 [1,]         2     2       2      1
 [2,]         2     3       3      1
 [3,]         2     2       2      2
 [4,]         1     1       2      3
 [5,]         3     1       2      3
 [6,]         3     1       3      3
 [7,]         3     1       3      2
 [8,]         1     2       2      2
 [9,]         3     2       3      3
[10,]         2     2       3      2
[11,]         3     3       2      2
[12,]         2     1       3      2
[13,]         3     2       2      1
[14,]         2     1       1      3
[15,]         2     2       3      2
[16,]         3     1       1      1
[17,]         1     2       3      1
[18,]         2     3       1      2
[19,]         3     1       1      3
[20,]         3     3       1      2
 > highest.use <- function(x) {y <- which(x==max(x, na.rm=T));
+ if (length(y)==1) return(y) else return(NA)}
 > apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use)
 [1] NA NA NA  4 NA NA NA NA NA  3 NA  3  1  4  3  1  3  2 NA NA
 > impute.primary.drug <- drugs[ifelse(is.na(primary.drug),
+ apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use),
+ as.numeric(primary.drug))]
 > data.frame(primary.drug, impute.primary.drug)
   primary.drug impute.primary.drug
1          <NA>                <NA>
2         crack               crack
3          <NA>                <NA>
4          <NA>              heroin
5          <NA>                <NA>
6          <NA>                <NA>
7        heroin              heroin
8       cocaine             cocaine
9       cocaine             cocaine
10    marijuana           marijuana
11         <NA>                <NA>
12         <NA>             cocaine
13      cocaine             cocaine
14        crack               crack
15       heroin              heroin
16         <NA>           marijuana
17      cocaine             cocaine
18       heroin              heroin
19         <NA>                <NA>
20         <NA>                <NA>
 >
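This sketch follows up the suggestion to modify highest.use() so that ties are broken at random. highest.use.random is a hypothetical name. It uses y[sample(length(y), 1)] rather than sample(y, 1): when y is a single number k, sample(y, 1) draws from 1:k instead of returning k. Base R's max.col() also returns the column of each row's maximum and breaks ties at random by default. See ?max.col for how it treats NAs and near-ties before relying on it here:

```r
# Hypothetical variant of highest.use(): ties are broken at random
highest.use.random <- function(x) {
  y <- which(x == max(x, na.rm = TRUE))
  y[sample(length(y), 1)]   # safe even when length(y) == 1
}

use <- rbind(c(2, 2, 2, 1),   # three-way tie in columns 1 to 3
             c(1, 1, 2, 3),   # unique maximum in column 4
             c(NA, 3, 1, 2))  # the NA is skipped by na.rm = TRUE
set.seed(1)
apply(use, 1, highest.use.random)

# Base R alternative for the complete rows
max.col(use[1:2, ], ties.method = "random")
```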


Brian Perron wrote:
> Hello R users,
> 
> I am relatively new to R and cannot seem to crack a coding problem.  I 
> am working with substance abuse data, and I have a variable called 
> "primary.drug" which is considered the drug of choice for each 
> subject.   I have just a few missing values on that variable.  Instead 
> of using a multiple imputation method like chained equations, I would 
> prefer to derive these values from other survey responses.  
> Specifically, I have a frequency of use (in days) for each of the major 
> drugs, so I would like the missing values to be replaced by that drug 
> with the highest level of use.  I am starting with the "ifelse" and 
> "max" statements, but I know it is wrong:
> 
> impute.primary.drug <-   ifelse(is.na(primary.drug), max(marijuana, 
> crack, cocaine, heroin), primary.drug)
> 
> Here are the problems.  First, the max statement (should it be "pmax"?), 
> returns the highest numeric quantity rather than the variable itself.  
> In other words, I want to test which drug has the highest value, but 
> return the variable name rather than the observed value.   Second, if 
> ties are observed, how can I specify the value to be NA?  Or, how can I 
> specify one of the values to be randomly selected?   
> 
>  Thanks in advance for your assistance.
> 
> Regards,
> Brian


Re: [R] how to multiply a constant to a matrix?

2006-05-26 Thread Tony Plate
I still can't see why this is a problem.  If a 1x1 matrix should be 
treated as a scalar, then it can just be wrapped in drop(), and the 
arithmetic will be computed correctly by R.

Are there any cases where this cannot be done?  More specifically, are 
there any matrix algebra expressions where, depending on the particular 
dimensions of the variables used, drop() must be used in some cases, and 
not in other cases?

A related but different behavior is the default dropping of dimensions 
with extent one by indexing operations.  This can be problematic 
because, if one is not careful, an expression can give incorrect results 
for particular values of the indices used in it.

For example, consider the following, in which we are trying to compute 
the cross product of some columns of x with some rows of y.  If x has n 
rows and y has n columns, then the result should always be an nxn 
matrix.  However, if we are not careful with using drop=F in the 
indexing expressions, we can inadvertently end up with a 1x1 inner 
product matrix result for the case where we just use one column of x and 
one row of y.  The solution to this is to always use drop=F in indexing 
in situations where this can occur.

 > x <- matrix(1:9, ncol=3)
 > y <- matrix(-(1:9), ncol=3)
 > i <- 1:2
 > x[,i] %*% y[i,]
  [,1] [,2] [,3]
[1,]   -9  -24  -39
[2,]  -12  -33  -54
[3,]  -15  -42  -69
 > i <- 1:3
 > x[,i] %*% y[i,]
  [,1] [,2] [,3]
[1,]  -30  -66 -102
[2,]  -36  -81 -126
[3,]  -42  -96 -150
 > # i has just one element -- the expression without drop=F
 > # no longer computes an outer product
 > i <- 2
 > x[,i] %*% y[i,]
  [,1]
[1,]  -81
 > x[,i,drop=F] %*% y[i,,drop=F]
  [,1] [,2] [,3]
[1,]   -8  -20  -32
[2,]  -10  -25  -40
[3,]  -12  -30  -48
 >

Cannot all cases in the situations you mention be handled in an 
analogous manner, by always wrapping appropriate quadratic expressions 
in drop(), or are there some cases where the result of the quadratic 
expression must be treated as a matrix, and other cases where the result 
of the quadratic expression must be treated as a scalar?

-- Tony Plate
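Here is a sketch of the drop() approach applied to the computation in the quoted message. A, a and D are made-up values. The matrix called F there is renamed Fmat here, since F is also R's shorthand for FALSE. B is a 1 x 1 matrix, so B / D * solve(A) is an operation between arrays of different dimensions and stops with a "non-conformable arrays" error. drop() turns B / D into a plain number first:

```r
set.seed(1)
n <- 5
A <- crossprod(matrix(rnorm(n * n), n)) + diag(n)   # hypothetical invertible A
Fmat <- diag(n)                                     # stands in for F
a <- rnorm(n)
D <- 2

B <- t(a) %*% Fmat %*% a      # a 1 x 1 matrix, not a plain number
dim(B)                        # 1 1

# drop() removes the length-one dimensions, leaving a scalar
g <- drop(B / D) * solve(A)
dim(g)                        # 5 5
```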

Michael wrote:
> imagine when you have complicated matrix algebra computation using R,
> 
> you cannot prevent some middle-terms become quadratic and absorbs into one
> scalar, right?
> 
> if R cannot intelligently determine this, and you  have to manually add
> "drop" everywhere,
> 
> do you think it is reasonable?
> 
> On 5/23/06, Patrick Burns <[EMAIL PROTECTED]> wrote:
> 
>>I think
>>
>>drop(B/D) * solve(A)
>>
>>would be a more transparent approach.
>>
>>It isn't that R can not do what you want, it is that
>>it is saving you from shooting yourself in the foot
>>in your attempt.  What you are doing is not really
>>a matrix computation.
>>
>>
>>Patrick Burns
>>[EMAIL PROTECTED]
>>+44 (0)20 8525 0696
>>http://www.burns-stat.com
>>(home of S Poetry and "A Guide for the Unwilling S User")
>>
>>Michael wrote:
>>
>>
>>>This is very strange:
>>>
>>>I want compute the following in R:
>>>
>>>g = B/D * solve(A)
>>>
>>>where B and D are  quadratics so they are just a scalar number, e.g.
>>>B=t(a) %*% F %*% a;
>>>
>>>I want to multiply B/D to A^(-1),
>>>
>>>but R just does not allow me to do that and it keeps complaining that
>>>"nonconformable array, etc."
>>>
>>>
>>>I tried the following two tricks and they worked:
>>>
>>>as.numeric(B/D) * solve(A)
>>>
>>>diag(as.numeric(B/D), 5, 5) %*% solve (A)
>>>
>>>
>>>
>>>But if R cannot intelligently do scalar and matrix multiplication, it is
>>>really problematic.
>>>
>>>It basically cannot be used to do computations, since in complicated matrix
>>>algebras, you have to distinguish where is scalar, and scalars obtained from
>>>quadratics cannot be directly used to multiply another matrix, etc. It is
>>>going to a huge mess...
>>>
>>>Any thoughts?


Re: [R] Subset dataframe based on condition

2006-04-17 Thread Tony Plate
Works OK for me:

 > x <- data.frame(a=10^(-2:7), b=10^(10:1))
 > subset(x, a > 1)
       a     b
4  1e+01 1e+07
5  1e+02 1e+06
6  1e+03 1e+05
7  1e+04 1e+04
8  1e+05 1e+03
9  1e+06 1e+02
10 1e+07 1e+01
 > subset(x, a > 1 & b < a)
       a    b
8  1e+05 1000
9  1e+06  100
10 1e+07   10
 >

Do you get all "numeric" for the following?

 > sapply(x, class)
        a         b
"numeric" "numeric"
 >

If not, then your data frame is probably encoding the information in 
some way that you don't want (though if the columns were factors, I would 
have expected a warning from the comparison operator).

You might get more help by distilling your problem to a simple example 
that can be tried out by others.

-- Tony Plate
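If sapply(x, class) shows "character" or "factor" for a column that should be numeric, one common fix is to convert the column before subsetting. The values below are made up. A factor must go through as.character() first, because as.numeric() on a factor returns the internal level codes, not the numbers that are printed:

```r
# Hypothetical data frame where 'a' was read in as text
x <- data.frame(a = c("0.01", "5", "10"), b = c(1, 2, 3),
                stringsAsFactors = FALSE)
sapply(x, class)          # a is "character"

x$a <- as.numeric(x$a)    # convert text to numbers
subset(x, a > 1 & b <= a)

# For a factor column, convert via the labels, not the level codes
f <- factor(c("0.01", "5", "10"))
as.numeric(as.character(f))
```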

Sachin J wrote:
> Hi,
>
>   I am trying to extract a subset of data from my original data frame 
> based on some condition. For example: (mydf - original data frame, submydf 
> - subset data frame)
>
>   >submydf = subset(mydf, a > 1 & b <= a), 
>
>   here column a contains values ranging from 0.01 to 10. I want to 
> extract only those rows matching condition 1, i.e. a > 1. But when I execute 
> this command it is not giving me the appropriate result. The subset df 
> submydf contains rows with 0.01 also. Please help me to resolve this 
> problem.
>
>   Thanks in advance.
>
>   Sachin


Re: [R] for loop should check the looping index !!

2006-01-13 Thread Tony Plate
Yep, you missed the fact that 2:1 generates the sequence c(2,1).

Personally, I'd excuse you for missing this, as the documentation for 
seq says:

  The operator ':' and the 'seq(from, to)' form generate the
  sequence 'from, from+1, ..., to'.

Maybe I'm missing something, but I don't see anywhere on the help page 
for seq and ":" any mention of the fact that seq() generates a descending 
sequence if 'to' is less than 'from'.

In programming, *never* use a construct like 1:length(x) or 2:length(x); 
always use something like seq(1, len=length(x)) (or simply 
seq(len=length(x))), or seq(2, len=length(x)-1) or seq(along=x)[-1].

-- Tony Plate
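In current R the same advice is usually written with seq_along() and seq_len(), which never count downwards. seq_along(x)[-1] also copes with an empty x. By contrast, seq(2, len=length(x)-1) would ask for a negative length there and stop with an error. In the sketch below, from_second() is a hypothetical helper name:

```r
a <- c(1)

# The trap: 2:length(a) is 2:1, which counts down to c(2, 1)
2:length(a)

# seq_along(a)[-1] is empty when a has fewer than two elements
for (i in seq_along(a)[-1]) {
  print(a[[i]])            # not reached when length(a) == 1
}

# Hypothetical helper: indices from the second element onward
from_second <- function(x) seq_along(x)[-1]
from_second(c(10, 20, 30))
from_second(numeric(0))
```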


johan Faux wrote:
> Hello ,
>
>   a<-c(1)
>   for(i in 2:length(a))
>   do.something with a[[i]]
>
>   I get :
>   Error in a[[i]] : subscript out of bounds
>
>   Am I missing something here?  Doesn't R check the value of i inside "for" 
> and, if the condition is not true, not do anything?
>
>   thanks,
>   johan


Re: [R] Convert matrix to data.frame

2006-01-12 Thread Tony Plate
When I try converting a matrix to a data frame, it works for me:

 > x <- matrix(1:6,ncol=2,dimnames=list(LETTERS[1:3],letters[24:25]))
 > data.frame(x)
   x y
A 1 4
B 2 5
C 3 6
 > str(data.frame(x))
`data.frame':   3 obs. of  2 variables:
  $ x: int  1 2 3
  $ y: int  4 5 6
 >

You can also use as.data.frame() to convert a matrix to a data frame 
(but note that if colnames are missing from the matrix, as.data.frame() 
constructs different colnames than data.frame() does).

You say "it didn't work" -- it's difficult to help with such a 
non-specific complaint.  Can you explain exactly how it didn't work for 
you?  (e.g., show the exact error message).

-- Tony Plate
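The following sketch shows the colnames difference mentioned above. It also shows that giving the matrix column names first makes A$x1-style access work after either conversion. The column names x1 and x2 are illustrative:

```r
A <- matrix(1:6, ncol = 2)        # no column names
names(data.frame(A))              # "X1" "X2"
names(as.data.frame(A))           # "V1" "V2"

# Name the columns first; then both conversions agree
colnames(A) <- c("x1", "x2")
df <- as.data.frame(A)
df$x1
```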

Chia, Yen Lin wrote:
> Hi all,
> 
> I wonder how I could convert a matrix A to a data frame such that
> whenever I'm running a linear model such as lme, I can use A$x1?  I tried
> data.frame(A), it didn't work.  Should I initialize A not as a matrix?
> Thanks.
> 
> Yen Lin


Re: [R] Correct way to test for exact dimensions of matrix or array

2006-01-10 Thread Tony Plate
There's a gotcha in using identical() to compare dimensions -- it also 
compares names, e.g.:

 > x <- array(1:14, dim=c(rows=3,cols=5))
 > dim(x)
rows cols
   3    5
 > identical(dim(x)+0, c(3,5))
[1] FALSE
 > identical(as.numeric(dim(x)+0), c(3,5))
[1] TRUE
 >
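This sketch combines the points in this thread into one check; has_dims() is a hypothetical helper name. It compares lengths first, because dim() of a plain vector is NULL and all(NULL == d) is TRUE (all() of an empty logical vector is TRUE). Comparing with == then ignores both names and the integer/double distinction:

```r
# Hypothetical helper: TRUE only if obj has exactly the dimensions in d
has_dims <- function(obj, d) {
  dd <- dim(obj)
  # && short-circuits, so all() is never applied to a zero-length result
  length(dd) == length(d) && all(dd == d)
}

has_dims(matrix(1, nrow = 3, ncol = 5), c(3, 5))   # TRUE
has_dims(1:15, c(3, 5))                            # FALSE: no dim
has_dims(array(1, c(3, 5, 2)), c(3, 5))            # FALSE: 3-d
```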

Gabor Grothendieck wrote:
> If its just succint you are after then this is slightly
> shorter:
> 
>identical(dim(x)+0, c(3,5))
> 
> 
> On 1/10/06, Gregory Jefferis <[EMAIL PROTECTED]> wrote:
> 
>>Thanks for suggestions.  This is a simple question in principle, but there
>>seem to be some wrinkles - I am always having to think quite carefully about
>>how to test for equality in R.  I should also have said that I would like
>>the check to be efficient as well as safe and succinct.
>>
>>One suggestion was:
>>
>>   isTRUE(all.equal(dim(obj), c(3, 5)))
>>
>>But that is not so efficient because all.equal does lots of work esp if
>>the objects are not equal.
>>
>>Another suggestion was:
>>
>>   all( dim( obj) == c(3,5) )
>>
>>But that is not safe eg because dim(vector(10)) is NULL and
>>all(NULL==c(3,5)) is actually TRUE (to my initial surprise) so vectors would
>>pass through the net.
>>
>>So, so far the only way that is efficient, safe and succinct is:
>>
>>   identical( dim( obj) , as.integer(c(3,5)))
>>
>>Martin Maechler pointed out that at the beginning of a function you might
>>want to break down the test into something less succinct, that printed more
>>specific error messages - a good suggestion for a top level function that is
>>supposed to be user friendly.
>>
>>Any other suggestions?  Many thanks,
>>
>>Greg Jefferis.
>>
>>On 10/1/06 15:13, "Martin Maechler" <[EMAIL PROTECTED]> wrote:
>>
>>
"Gregory" == Gregory Jefferis <[EMAIL PROTECTED]>
on Tue, 10 Jan 2006 14:47:43 + writes:
>>>
>>>Gregory> Dear R Users,
>>>
>>> Gregory> I want to test the dimensions of an incoming
>>> Gregory> vector, matrix or array safely
>>>
>>>
>>>Gregory> and succinctly.  Specifically I want to check if
>>>Gregory> the unknown object has exactly 2 dimensions with a
>>>Gregory> specified number of rows and columns.
>>>
>>>Gregory> I thought that the following would work:
>>>
>>>
>obj=matrix(1,nrow=3,ncol=5)
>identical( dim( obj) , c(3,5) )
>>>
>>>Gregory> [1] FALSE
>>>
>>>Gregory> But it doesn't because c(3,5) is numeric and the dims are
>>>Gregory> integer.  I therefore ended up doing something like:
>>>
>>>
>identical( dim( obj) , as.integer(c(3,5)))
>>>
>>>Gregory> OR
>>>
>>>
>isTRUE(all( dim( obj) == c(3,5) ))
>>>
>>>the last one is almost perfect if you leave out the superfluous
>>>isTRUE(..).
>>>
>>>But you say that it's part of your function checking its
>>>arguments.
>>>In that case, I'd recommend
>>>
>>> if(length(d <- dim(obj)) != 2)
>>>  stop("'obj' must be matrix-like")
>>> if(!all(d == c(3,5)))
>>>  stop("the matrix must be  3 x 5")
>>>
>>>which also provides for nice error messages in case of error.
>>>A more concise form with less nice error messages is
>>>
>>>  stopifnot(length(d <- dim(obj)) == 2,
>>>            d == c(3,5))
>>>
>>>  ## you can omit  all(.)  for things in stopifnot(.)
>>>
>>>
>>>
>>>
>>>Gregory> Neither of which feels quite right.  Is there a 'correct' way to
>>>Gregory> do this?
>>>
>>>Gregory> Many thanks,
>>>
>>>You're welcome,
>>>Martin Maechler, ETH Zurich
>>>
>>>Gregory> Greg Jefferis.
>>>
>>>Gregory> PS Thinking about it, the second form is (doubly) wrong because:
>>>
>>>
>obj=array(1,dim=c(3,5,3,5))
>isTRUE(all( dim( obj) == c(3,5) ))
>>>
>>>Gregory> [1] TRUE
>>>
>>>Gregory> OR
>>>
>obj=numeric(10)
>isTRUE(all( dim( obj) == c(3,5) ))
>>>
>>>Gregory> [1] TRUE
>>>
>>>Gregory> (neither of which are equalities that I am happy with!)
>>>
>>
>>--
>>Gregory Jefferis, PhD   and:
>>Research Fellow
>>Department of Zoology   St John's College
>>University of Cambridge Cambridge
>>Downing Street  CB2 1TP
>>Cambridge, CB2 3EJ
>>United Kingdom
>>
>>Tel: +44 (0)1223 336683 +44 (0)1223 339899
>>Fax: +44 (0)1223 336676 +44 (0)1223 337720
>>
>>[EMAIL PROTECTED]
>>
>>
> 
>
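A small helper that is safe against the cases raised in this thread (plain vectors whose dims are `NULL`, arrays with extra dimensions, integer versus double, and named dims), sketched under the assumption that comparing the values with `==` is all that is wanted. `has.dims` is a hypothetical name, not a base R function:

```r
# Hypothetical helper: TRUE only if obj has exactly the dimensions in 'expected'
has.dims <- function(obj, expected) {
  d <- dim(obj)
  # length() rejects NULL dims (plain vectors) and a wrong number of dimensions;
  # == ignores both the integer/double difference and any names on dim(obj)
  length(d) == length(expected) && all(d == expected)
}

has.dims(matrix(1, nrow = 3, ncol = 5), c(3, 5))              # TRUE
has.dims(array(1, dim = c(3, 5, 3, 5)), c(3, 5))              # FALSE: four dimensions
has.dims(numeric(10), c(3, 5))                                # FALSE: dim() is NULL
has.dims(array(1:15, dim = c(rows = 3, cols = 5)), c(3, 5))   # TRUE even with named dims
```

Because `&&` short-circuits, `all()` is never evaluated when the number of dimensions is wrong, which keeps the check cheap and avoids the `all(logical(0))` trap.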



Re: [R] Wikis etc.

2006-01-06 Thread Tony Plate
I second Frank's comment!  I wonder if questioners who receive a bunch 
of useful replies could be encouraged to enter a summary of those on a 
Wiki, in much the same way as users of S-news were expected to post a 
summary of their answers as a way of giving something back.

An existing R Wiki is located at 
http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl?RwikiHome

However, there's currently not much on it.  Recently on R-help there was 
  a summary of using databases with R, which looked very useful, so I 
put that on the Wiki.  Maybe if others just start putting things there 
it can gather momentum?

-- Tony Plate

Frank E Harrell Jr wrote:
> I feel that as long as people continue to provide help on r-help, wikis 
> will not be successful.  I think we need to move to a central wiki or 
> discussion board and to move away from e-mail.  People are extremely 
> helpful but e-mail seems to always be memory-less and messages get 
> too long without factorization of old text.  R-help is now too active 
> and too many new users are asking questions asked dozens of times for 
> e-mail to be effective.
> 
> The wiki also needs to collect and organize example code, especially for 
> data manipulation.  I think that new users would profit immensely from a 
> compendium of examples.
> 
> Just my .02 Euros
> 
> Frank



[R] update to posting guide: use 'sessionInfo()' instead of 'version'

2005-12-29 Thread Tony Plate
Some changes have been made to the posting guide, based on suggestions 
from various R-help contributors over the past year.

The most significant change is the recommendation to use 'sessionInfo()' 
  rather than 'version' when asking questions about unexpected behavior 
or bugs.  This change was made because 'sessionInfo()' reports the 
version and a list of packages currently attached.  As more and more 
packages become available, it becomes more likely that unexpected 
behavior is due to conflicts between packages, so this is relevant 
information.

[Note that sessionInfo() currently does not report all the information 
that 'version' does (it omits at least "Status" and "svn rev").  R-core 
members are aware of this -- whether or not they change this is up to them.]

-- Tony Plate
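A quick way to compare the two calls at the prompt. What gets printed depends on the R version and on which packages are attached, so no output is shown here:

```r
version                      # R version, platform and build details only
info <- sessionInfo()        # R version plus attached and loaded packages
print(info)

# Names of the non-base packages currently attached (NULL if there are none)
names(info$otherPkgs)
info$basePkgs                # attached base packages, e.g. "stats", "utils", "base"
```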



Re: [R] R and databases - a comment

2005-12-09 Thread Tony Plate
This is very useful, thanks for posting!

I created a page for this at the R Wiki: 
http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl?DataBases

If any one has any info to add, go at it!

-- Tony Plate

charles loboz wrote:
> 1. That was a part of a private email exchange. It has
> been suggested that more people may be interested. 
> 
> 2. I did use various databases (significant part of my
> job) for the last 15 years. Some with R for the last 3
> years as a hobby. Some comments on the ones used
> below. Sorry, no links - I am time-constrained at the
> moment - please google if interested in details. The
> remarks are from the point of view of R user, not that
> of 'general database user'.
>  
> 3. SQLITE. www.sqlite.org - probably the best database
> to use with R. No setup, no administration, embedded -
> so less connection overhead. All data in one file - so
> easy to transfer. Solid. Very functional SQL, fast if
> you play it right (almost as fast as SQLServer on
> Windows...) . Some limitations - no stored procedures.
> Some preprocessing/parsing can be done using TCL -
> well integrated with sqlite if you need that. Due to
> the implementation quirk you can even compute
> recursive functions (like exponential moving average
> or Fibonacci numbers) with SQL :-). Easy import/export
> of data to text files. After trying few other dbs I
> settled down on this one. Even considered writing a
> tutorial on SQLite use with R (like how to process
> gigabytes of data on a 128mb computer :-) ) - but time
> constraints stopped me. [Personally I think that
> SQLite should come bundled with the standard R
> installation. Could even be used to keep a lot of R's
> internal stuff, would probably simplify overall
> coding. But that is for others to decide]
>  
> All other databases (including mysql) require typical
> setup - installation, administration, user rights,
> keeping track of ports, services/daemons, directories,
> backups etc - so some db administrative skills are
> required. I am not sure how many R users are willing to
> go through that. For the ones who may be interested, see
> the notes below.
>  
> 4. www.postgres.org Postgres. Free. As complete as one
> can wish, small download, great functionality.
> Interfaces well to other languages, so you can do
> numerics in C++ and store that in the database (though
> why not do numerics in R?). Current version 8.1, much
> improved. 
>  
> 5. Firebird. open source version of Interbase. Easy
> setup and can have all data in one file. But... slow
> development - not many developers there. SQL full but
> somewhat quirky (when porting from other dialects). 
>  
> 6. Mysql. the inheritance from the original ISAM
> system still shows. Nice user interface, but... if you
> need real db why not use postgres? if you need
> something simpler, without administration, why not use
> SQLITE? No doubt mysql is fine for many simple
> websites etc - this is mysql's niche.
>  
> 7. derby and hsqldb. both are written in Java, open
> source. HSQLDB (used now by OpenOffice) allows
> creation of in-memory tables and it's fast there - but
> its usage from inside R is tricky - there is no
> easily available, installable and current ODBC driver.
> Similar for derby - the ODBC driver is there, but
> installation can be tricky to non-professionals. May
> be in the future...
>  
> There are three 'express' versions of commercial
> databases. They all share some restrictions, like max
> disc data size 2-4gb, max mem size 1-2gb and usage of
> single processor only. Plus various licensing
> restrictions, so be careful how you use them. 
>  
>  - Microsoft - in beta now, over 100mb download
> (windows only) (the old version, MSDE, is also
> available)
>  - Oracle - 150mb download, if I remember correctly
> even free to distribute, but check the license
>  - DB2 - 500mb download, currently 90 day version, IBM
> strong rumour is that early next year the new version
> will be free. 
>  
> Each commercial DB has some OLAP capability, but I am
> not sure how much of it is/will be available in the
> Express version.
> 
> 
>



Re: [R] Still a bug with NA in sd() or var()?

2005-10-31 Thread Tony Plate
Roger Dungan wrote:
> [snip]>
> There are obvious work-rounds, like
> 
>>sd(x, is.na(x)==F)
> 
> which gives the result (with error message)
> [1] 1.707825
> Warning message:
> the condition has length > 1 and only the first element will be used in:
> if (na.rm) "complete.obs" else "all.obs"
> 

What you are doing here looks very odd to me -- you are passing a vector 
of logicals as the value for the argument na.rm.  This is odd because 
na.rm should be just a single logical value, not a vector of the same 
length as x (hence the warning message).  Only the first element of that 
vector is used, so you are passing essentially a random value.  By luck, 
in your example, the first element was T, which is why you got a value 
of 1.707825 as the result, and not NA.  The rest might fall into place 
when this understanding is cleared up.

-- Tony Plate
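The intended call, for comparison: `na.rm` takes a single `TRUE` or `FALSE`. The data below are made up. (R 4.2.0 and later treat a logical vector of length greater than one in an `if()` condition as an error rather than a warning, so the original work-round would no longer run at all.)

```r
x <- c(2.1, 3.5, NA, 4.0, 5.2)   # made-up data with one missing value

sd(x)                  # NA: a missing value propagates by default
sd(x, na.rm = TRUE)    # drop the NA, then compute the standard deviation
sd(x[!is.na(x)])       # the same result, removing the NA explicitly
```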



Re: [R] unvectorized option for outer()

2005-10-28 Thread Tony Plate
Apologies for the cross post.  I explicitly tried to avoid this but 
somehow r-help got tacked onto the end of the To: line without my 
realizing it.

-- Tony Plate

Tony Plate wrote:
> [following on from a thread on R-help, but my post here seems more 
> appropriate to R-devel]
> 
...



[R] unvectorized option for outer()

2005-10-28 Thread Tony Plate
[following on from a thread on R-help, but my post here seems more 
appropriate to R-devel]

Would a patch to make outer() work with non-vectorized functions be
considered?  It seems to come up moderately often on the list, which
probably indicates that many many people get bitten by the same
incorrect expectation, despite the documentation and the FAQ entry.  It
looks pretty simple to modify outer() appropriately: one extra function
argument and an if-then-else clause to call mapply(FUN, ...) instead of
calling FUN directly.

Here's a function demonstrating this:

outer2 <- function (X, Y, FUN = "*", ..., VECTORIZED=TRUE)
{
    no.nx <- is.null(nx <- dimnames(X <- as.array(X)))
    dX <- dim(X)
    no.ny <- is.null(ny <- dimnames(Y <- as.array(Y)))
    dY <- dim(Y)
    if (is.character(FUN) && FUN == "*") {
        # fast path: the outer product as a matrix multiplication
        robj <- as.vector(X) %*% t(as.vector(Y))
        dim(robj) <- c(dX, dY)
    }
    else {
        FUN <- match.fun(FUN)
        # expand X and Y so that together they list every (x, y) pair
        Y <- rep(Y, rep.int(length(X), length(Y)))
        if (length(X) > 0)
            X <- rep(X, times = ceiling(length(Y)/length(X)))
        if (VECTORIZED)
            robj <- FUN(X, Y, ...)                          # one call, whole vectors
        else
            robj <- mapply(FUN, X, Y, MoreArgs=list(...))   # one call per pair
        dim(robj) <- c(dX, dY)
    }
    if (no.nx)
        nx <- vector("list", length(dX))
    else if (no.ny)
        ny <- vector("list", length(dY))
    if (!(no.nx && no.ny))
        dimnames(robj) <- c(nx, ny)
    robj
}
# Some examples
f <- function(x, y, p=1) {cat("in f\n"); (x*y)^p}
outer2(1:2, 3:5, f, 2)
outer2(numeric(0), 3:5, f, 2)
outer2(1:2, numeric(0), f, 2)
outer2(1:2, 3:5, f, 2, VECTORIZED=F)
outer2(numeric(0), 3:5, f, 2, VECTORIZED=F)
outer2(1:2, numeric(0), f, 2, VECTORIZED=F)

# Output on examples
> f <- function(x, y, p=1) {cat("in f\n"); (x*y)^p}
> outer2(1:2, 3:5, f, 2)
in f
     [,1] [,2] [,3]
[1,]    9   16   25
[2,]   36   64  100
> outer2(numeric(0), 3:5, f, 2)
in f
     [,1] [,2] [,3]
> outer2(1:2, numeric(0), f, 2)
in f
    
[1,]
[2,]
> outer2(1:2, 3:5, f, 2, VECTORIZED=F)
in f
in f
in f
in f
in f
in f
     [,1] [,2] [,3]
[1,]    9   16   25
[2,]   36   64  100
> outer2(numeric(0), 3:5, f, 2, VECTORIZED=F)
     [,1] [,2] [,3]
> outer2(1:2, numeric(0), f, 2, VECTORIZED=F)
    
[1,]
[2,]
>

If a patch to add this feature would be considered, I'd be happy to
submit one (including documentation).  If so, and if there are any
potential traps I should bear in mind, please let me know!

-- Tony Plate
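For comparison, base R now provides `Vectorize()`, which returns a version of a function that is evaluated element by element through `mapply()`, giving much the same effect as `VECTORIZED=F` in `outer2` above. A sketch reusing `f` from the examples; fixing `p = 2` in a small anonymous wrapper is just one way of supplying the extra argument:

```r
f <- function(x, y, p=1) {cat("in f\n"); (x*y)^p}

# Vectorize() wraps the function in mapply(), so outer() can pass it
# whole vectors while f itself only ever sees one (x, y) pair at a time
fv <- Vectorize(function(x, y) f(x, y, p = 2))
res <- outer(1:2, 3:5, fv)   # prints "in f" six times, once per cell
res
```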

Rau, Roland wrote:
> Dear all,
> 
> a big thanks to Thomas Lumley, James Holtman and Tony Plate for their
> answers. They all pointed in the same direction => I need a vectorized
> function to be applied. Hence, I will try to work with a 'wrapper'
> function as described in the FAQ.
> 
> Thanks again,
> Roland
> 
> 
> 
>>-Original Message-
>>From: Thomas Lumley [mailto:[EMAIL PROTECTED] 
>>Sent: Thursday, October 27, 2005 11:39 PM
>>To: Rau, Roland
>>Cc: r-help@stat.math.ethz.ch
>>Subject: Re: [R] outer-question
>>
>>
>>You want FAQ 7.17 Why does outer() behave strangely with my function?
>>
>>  -thomas

Re: [R] outer-question

2005-10-27 Thread Tony Plate
It looks like you didn't vectorize the function you gave "outer" in your 
longer example.

Consider your short example with a diagnostic printout:

 > a <- 1:3
 > b <- 1:4
 > f <- function(a,b,d) {
+ cat("In f:", length(a), length(b), "\n")
+ return(a*b+(sum(d)))
+ }
 > additional <- runif(100)
 > outer(X=a, Y=b, FUN=f, d=additional)
In f: 12 12
         [,1]     [,2]     [,3]     [,4]
[1,] 53.61985 54.61985 55.61985 56.61985
[2,] 54.61985 56.61985 58.61985 60.61985
[3,] 55.61985 58.61985 61.61985 64.61985
 >

Note that "f" is called only once, with vectors for "a" and "b".

-- Tony Plate

Rau, Roland wrote:
> Dear all,
> 
> This is a rather lengthy message, but I don't know what I made wrong in
> my real example since the simple code works.
> I have two variables a, b and a function f for which I would like to
> calculate all possible combinations of the values of a and b.
> If f is multiplication, I would simply do:
> 
> a <- 1:5
> b <- 1:5
> outer(a,b)
> 
> ## A bit more complicated is this:
> f <- function(a,b,d) {
>   return(a*b+(sum(d)))
> }
> additional <- runif(100)
> outer(X=a, Y=b, FUN=f, d=additional)
> 
> ## So far so good. But now my real example. I would like to plot the
> ## log-likelihood surface for two parameters alpha and beta of 
> ## a Gompertz distribution with given data
> 
> ### I have a function to generate random-numbers from a
> ### Gompertz-Distribution (using the 'inversion method')
> 
> random.gomp <- function(n, alpha, beta) {
>   return( (log(1-(beta/alpha*log(1-runif(n)))))/beta )
> }
> 
> ## Now I generate some 'lifetimes'
> no.people <- 1000
> al <- 0.1
> bet <- 0.1
> lifetimes <- random.gomp(n=no.people, alpha=al, beta=bet)
> 
> ### Since I neither have censoring nor truncation in this simple case,
> ### the log-likelihood should be simply the sum of the log of
> ### the densities (following the parametrization of Klein/Moeschberger
> ### Survival Analysis, p. 38)
> 
> loggomp <- function(alphas, betas, timep) {
>   return(sum(log(alphas) + betas*timep + (alphas/betas *
>              (1-exp(betas*timep)))))
> }
> 
> ### Now I thought I could obtain a matrix of the log-likelihood surface
> ### by specifying possible values for alpha and beta with the given
> ### data.
> ### I was able to produce this matrix with two for-loops. But I thought
> ### I could use also 'outer' in this case.
> ### This is what I tried:
> 
> possible.alphas <- seq(from=0.05, to=0.15, length=30)
> possible.betas <- seq(from=0.05, to=0.15, length=30)
> 
> outer(X=possible.alphas, Y=possible.betas, FUN=loggomp, timep=lifetimes)
> 
> ### But the result is:
> 
> > outer(X=possible.alphas, Y=possible.betas, FUN=loggomp, timep=lifetimes)
> Error in outer(X = possible.alphas, Y = possible.betas, FUN = loggomp,  : 
>   dim<- : dims [product 900] do not match the length of object [1]
> In addition: Warning messages:
> ...
> 
> ### Can somebody give me some hint where the problem is?
> ### I checked my definition of 'loggomp' but I thought this looks fine:
> loggomp(alphas=possible.alphas[1], betas=possible.betas[1], timep=lifetimes)
> loggomp(alphas=possible.alphas[4], betas=possible.betas[10], timep=lifetimes)
> loggomp(alphas=possible.alphas[3], betas=possible.betas[11], timep=lifetimes)
> 
> 
> ### I'd appreciate any kind of advice.   
> ### Thanks a lot in advance.
> ### Roland
>
> 
>
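Following the diagnosis above, a sketch of the wrapper-style fix (the approach FAQ 7.17 describes) applied to the Gompertz example: `outer()` still passes whole vectors to its `FUN`, but the wrapper uses `mapply()` to evaluate `loggomp` once per (alpha, beta) pair. The parenthesization of `random.gomp` and `loggomp` is reconstructed from the formulas in the question, and the seed is arbitrary:

```r
random.gomp <- function(n, alpha, beta) {
  # inversion method: solve F(t) = U for t, where the survival function is
  # S(t) = exp(-(alpha/beta) * (exp(beta*t) - 1))
  (log(1 - (beta/alpha) * log(1 - runif(n)))) / beta
}

# Log-likelihood for ONE (alpha, beta) pair; sum() collapses over the data,
# which is why passing it whole vectors from outer() returns a single number
loggomp <- function(alphas, betas, timep) {
  sum(log(alphas) + betas*timep + (alphas/betas) * (1 - exp(betas*timep)))
}

set.seed(1)
lifetimes <- random.gomp(n = 1000, alpha = 0.1, beta = 0.1)
possible.alphas <- seq(from = 0.05, to = 0.15, length.out = 30)
possible.betas  <- seq(from = 0.05, to = 0.15, length.out = 30)

# outer() calls the wrapper once with two length-900 vectors;
# mapply() then calls loggomp once per element pair
ll <- outer(possible.alphas, possible.betas,
            function(a, b) mapply(loggomp, a, b,
                                  MoreArgs = list(timep = lifetimes)))
dim(ll)   # 30 30
```

Element `ll[i, j]` is the log-likelihood at `possible.alphas[i]` and `possible.betas[j]`, so the matrix can go straight to `contour()` or `persp()`.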



Re: [R] R on a supercomputer

2005-10-10 Thread Tony Plate
In general, R is not written in such a way that data remain in cache. 
However, R can use optimized BLAS libraries, and these are written to keep 
data in cache.  So if your version of R is compiled to use an optimized BLAS 
library appropriate to the machine (e.g., ATLAS, or Prof. Goto's BLAS), AND a considerable 
amount of the computation done in your R program involves basic linear 
algebra (matrix multiplication, etc.), then you might see a good speedup.

-- Tony Plate

Kimpel, Mark William wrote:
> I am using R with Bioconductor to perform analyses on large datasets
> using bootstrap methods. In an attempt to speed up my work, I have
> inquired about using our local supercomputer and asked the administrator
> if he thought R would run faster on our parallel network. I received the
> following reply:
> 
> "The second benefit is that the processors have large caches. 
> 
> Briefly, everything is loaded into cache before going into the
> processor.  With large caches, there is less movement of data between
> memory and cache, and this can save quite a bit of time.  Indeed, when
> programmers optimize code they usually think about how to do things to
> keep data in cache as long as possible. 
> 
> Whether you would receive any benefit from larger cache depends on how
> R is written. If it's written such that data remain in cache, the
> speed-up could be considerable, but I have no way to predict it."
> 
> My question is, "is R written such that data remain in cache?" 
> 
> Thanks,
> 
> Mark W. Kimpel MD 
> Indiana University School of Medicine
>
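A rough way to see whether this applies to a given installation is to time a large matrix product: `%*%` and `crossprod()` on double matrices are normally handed to whichever BLAS R was built against, so running the same script under builds linked to the reference BLAS and to an optimized BLAS shows how much the library matters. The matrix size is arbitrary and the timings will vary by machine:

```r
set.seed(1)
n <- 500
m <- matrix(rnorm(n * n), nrow = n)

# Dense matrix multiplication: the kind of work an optimized BLAS speeds up
print(system.time(p <- m %*% m))

# crossprod(m) computes t(m) %*% m without explicitly forming t(m)
print(system.time(cp <- crossprod(m)))
```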



Re: [R] Assign references

2005-10-07 Thread Tony Plate
Looking at what objects exist after the call to myFunk() should give you 
a clue as to what happened:

 > remove(list=objects())
 > myFunk<-function(a,b,foo,bar) {foo<<-a+b; bar<<-a*b;}
 > x<-0; y<-0;
 > myFunk(4,5,x,y)
 > x
[1] 0
 > y
[1] 0
 > objects()
[1] "bar"    "foo"    "myFunk" "x"      "y"     
 > bar
[1] 20
 > foo
[1] 9
 >

I suspect that you might have slightly misinterpreted Thomas Lumley's 
explanations of how the <<- operator works in different situations (the 
LHS must exist if you are assigning using a replacement operator, e.g., 
as in "foo[1] <<- ...", but not when you are assigning the whole object 
as in "foo <<- ...").

But I really would suggest careful consideration of what might be the 
best way to approach your problem -- modifying global data from within a 
function is not the standard way of using R.  Unless you are very 
careful about how you do it, it is likely to cause headaches for 
yourself and/or others down the road (because R is just not intended to 
be used that way).

The standard way of doing this sort of thing in R is to modify a local 
copy of the dataframe and return that, or if you have to return several 
dataframes, then return a list of dataframes.

-- Tony Plate

[EMAIL PROTECTED] wrote:
> Folks,
> 
> I've run into trouble while writing functions that I hope will create
> and modify a dataframe or two.  To that end I've written a toy function
> that simply sets a couple of variables (well, tries but fails).
> Searching the archives, Thomas Lumley recently explained the <<-
> operator, showing that it was necessary for x and y to exist prior to
> the function call, but I haven't the faintest idea why this isn't working:
> 
> 
>>myFunk<-function(a,b,foo,bar) {foo<<-a+b; bar<<-a*b;}
>>x<-0; y<-0;
>>myFunk(4,5,x,y)
>>x<-0; y<-0;
>>myFunk(4,5,x,y)
>>x
> 
> [1] 0
> 
>>y
> 
> [1] 0
> 
> What (no doubt simple) reason is there for x and y not changing?
> 
> Thank you,
> cur
> --
> Curt Seeliger, Data Ranger
> CSC, EPA/WED contractor
> 541/754-4638
> [EMAIL PROTECTED]
> 
>
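A minimal sketch of that standard approach, applied to the toy function from the question: return the results and assign them at the call site. `addRatio` and the `num`/`den` columns are hypothetical names used only for illustration:

```r
# Return values instead of assigning into the global environment
myFunk <- function(a, b) {
  list(foo = a + b, bar = a * b)
}
res <- myFunk(4, 5)
x <- res$foo   # 9
y <- res$bar   # 20

# The same pattern for a data frame: change a local copy and return it
addRatio <- function(df) {
  df$ratio <- df$num / df$den   # modifies only the function's own copy
  df
}
d <- data.frame(num = c(2, 6), den = c(1, 3))
d <- addRatio(d)                # reassign to keep the modified version
```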



Re: [R] books about MCMC to use MCMC R packages?

2005-09-23 Thread Tony Plate
I've found "Bayesian Data Analysis" by Gelman, Carlin, Stern & Rubin 
(2nd ed) to be quite useful for understanding how MCMC can be used for 
Bayesian models.  It has a little bit of R code in it too.

-- Tony Plate

Molins, Jordi wrote:
> Dear list users,
> 
> I need to learn about MCMC methods, and since there are several packages in
> R that deal with this subject, I want to use them. 
> 
> I want to buy a book (or more than one, if necessary) that satisfies the
> following requirements:
> 
> - it teaches MCMC methods well;
> 
> - it is easy to implement numerically the ideas of the book, and notation
> and concepts are similar to the corresponding R packages that deal with MCMC
> methods.
> 
> I have done a search and 2 books seem to satisfy my requirements:
> 
> - Markov Chain Monte Carlo In Practice, by W.R. Gilks and others.
> 
> - Monte Carlo Statistical Methods, Robert and Casella.
> 
> What do people think about these books? Is there some other
> book that could better satisfy my requirements?
> 
> Thank you very much in advance.
> 
>



Re: [R] Regular expressions & sub

2005-08-18 Thread Tony Plate
 > x <- scan("clipboard", what="")
Read 7 items
 > x
[1] "1.11"   "10.11"  "11.11"  "113.31" "114.2"  "114.3"  "114.8"
 > gsub("[0-9]*\\.", "", x)
[1] "11" "11" "11" "31" "2"  "3"  "8"
 >


Bernd Weiss wrote:
> Dear all,
> 
> I am struggling with the use of regular expression. I got
> 
> 
>>as.character(test$sample.id)
> 
>  [1] "1.11"   "10.11"  "11.11"  "113.31" "114.2"  "114.3"  "114.8"  
> 
> and need
> 
> [1] "11"   "11"  "11"  "31" "2"  "3"  "8"
> 
> I.e. remove everything before the "." .
> 
> TIA,
> 
> Bernd
> 
>
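The same result without the "clipboard" connection (whose behaviour depends on the platform), typing the vector in directly. The second call is an alternative pattern, not from the thread: `^[^.]*\\.` removes everything up to and including the first ".", whatever those characters are, whereas `[0-9]*\\.` only removes runs of digits followed by a ".":

```r
x <- c("1.11", "10.11", "11.11", "113.31", "114.2", "114.3", "114.8")

gsub("[0-9]*\\.", "", x)   # pattern from the reply above
sub("^[^.]*\\.", "", x)    # strip everything up to and including the first "."
```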



Re: [R] queer data set

2005-08-15 Thread Tony Plate
Here's one way of working with the data you gave:

 > x <- read.table(file("clipboard"), fill=T, header=T)
 > x
  HEADER1 HEADER2 HEADER3           HEADER3.1
1      A1      B1      C1         X11;X12;X13
2      A2      B2      C2 X21;X22;X23;X24;X25
3      A3      B3      C3                    
4      A4      B4      C4         X41;X42;X43
5      A5      B5      C5                 X51
 > apply(x, 1, function(x) strsplit(x[4], ";")[[1]])
$"1"
[1] "X11" "X12" "X13"

$"2"
[1] "X21" "X22" "X23" "X24" "X25"

$"3"
character(0)

$"4"
[1] "X41" "X42" "X43"

$"5"
[1] "X51"

 > do.call("rbind", apply(x, 1, function(x) {
+y <- strsplit(x[4], ";")[[1]]
+x3 <- matrix(x[1:3], ncol=3, nrow=max(1,length(y)), byrow=T)
+return(cbind(x3, if (length(y)) y else "NA"))
+ }))
   [,1] [,2] [,3] [,4]
  [1,] "A1" "B1" "C1" "X11"
  [2,] "A1" "B1" "C1" "X12"
  [3,] "A1" "B1" "C1" "X13"
  [4,] "A2" "B2" "C2" "X21"
  [5,] "A2" "B2" "C2" "X22"
  [6,] "A2" "B2" "C2" "X23"
  [7,] "A2" "B2" "C2" "X24"
  [8,] "A2" "B2" "C2" "X25"
  [9,] "A3" "B3" "C3" "NA"
[10,] "A4" "B4" "C4" "X41"
[11,] "A4" "B4" "C4" "X42"
[12,] "A4" "B4" "C4" "X43"
[13,] "A5" "B5" "C5" "X51"
 >

This of course is a matrix; you can convert it back to a dataframe using 
as.data.frame() if you desire.  Use either "NA" (with quotes) or NA 
(without quotes) to control whether you get just the string "NA" or an 
actual character NA value in column 4.  If you're processing a huge 
amount of data, you can probably do better by rewriting the above code 
to avoid implicit coercions of data types.

hope this helps,

Tony Plate
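A vectorized alternative along the lines suggested above for large amounts of data: split the fourth column once, repeat each row by its number of pieces, and keep a single `NA` row where the field is empty. The data frame is typed in directly rather than read from the clipboard, and the column name `items` is an assumption standing in for the duplicated `HEADER3`:

```r
x <- data.frame(HEADER1 = paste0("A", 1:5),
                HEADER2 = paste0("B", 1:5),
                HEADER3 = paste0("C", 1:5),
                items   = c("X11;X12;X13", "X21;X22;X23;X24;X25", "",
                            "X41;X42;X43", "X51"),
                stringsAsFactors = FALSE)

parts <- strsplit(x$items, ";", fixed = TRUE)  # "" splits to character(0)
n <- lengths(parts)
parts[n == 0] <- NA_character_                 # empty rows keep one NA entry
n <- pmax(n, 1)

# Repeat the first three columns once per piece, then attach the pieces
long <- data.frame(x[rep(seq_len(nrow(x)), n), 1:3],
                   item = unlist(parts),
                   row.names = NULL, stringsAsFactors = FALSE)
long
```

This avoids `apply()`, which converts each row to a character vector, so the first three columns keep their original types.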

S.O. Nyangoma wrote:
> I have a dataset that is basically structureless. Its dimension varies 
> from row to row and sep(s) are a mixture of tab and semi colon (;) and 
> example is
> 
> HEADER1 HEADER2 HEADER3   HEADER3
> A1   B1  C1   X11;X12;X13
> A2   B2  C2   X21;X22;X23;X24;X25
> A3   B3  C3   
> A4   B4  C4   X41;X42;X43
> A5   B5  C5   X51
> 
> etc., say. Note that a blank under HEADER3 corresponds to non 
> occurance and all semi colon (;) delimited variables are under 
> HEADER3. These values run into tens of thousands. I want to give some 
> order to this queer matrix to something like:
> 
> HEADER1 HEADER2 HEADER3   HEADER3
> A1   B1  C1   X11
> A1   B1  C1   X12
> A1   B1  C1   X13
> A1   B1  C1   X14
> A2   B2  C2   X21
> A2   B2  C2   X22
> A2   B2  C2   X23
> A2   B2  C2   X24
> A2   B2  C2   X25
> A2   B2  C2   X26
> A3   B3  C3   NA
> A4   B4  C4   X41
> A4   B4  C4   X42
> A4   B4  C4   X43
> 
> Is there a brilliant R-way of doing such task?
> 
> Goodday. Stephen.
> 
> 
> 
> 
> 
> 
> 
> 
> - Original Message -
> From: Prof Brian Ripley <[EMAIL PROTECTED]>
> Date: Monday, August 15, 2005 11:13 pm
> Subject: Re: [R] How to get a list work in RData file
> 
> 
>>On Mon, 15 Aug 2005, Xiyan Lon wrote:
>>
>>
>>>Dear R-Helper,
>>
>>(There are quite a few of us.)
>>
>>
>>>I want to know how I get a list  work which I saved in RData
>>>file. For example,
>>
>>I don't understand that at all, but it looks as if you want to 
>>save an 
>>unevaluated call, in which case see ?quote and use something like
>>
>>xyadd <- quote(test.xy(x=2, y=3))
>>
>>load and saving has nothing to do with this: it doesn't change the 
>>meaning 
>>of objects in the workspace.
>>
>>
>>>>test.xy <- function(x,y) {
>>>
>>>+xy <- x+y
>>>+xy
>>>+ }
>>>
>>>>xyadd <- test.xy(x=2, y=3)
>>>>xyadd
>>>
>>>[1] 5
>>>
>>>>x1 <- c(2,43,60,8)
>>>>y1 <- c(91,7,5,30)
>>>>
>>>>xyad

Re: [R] Why only a "" string for heading for row.names with write.csv with a matrix?

2005-08-10 Thread Tony Plate
Here's a relatively easy way to get what I think you want.  Note that 
converting x to a data frame before cbind'ing allows the type of the 
elements of x to be preserved:

 > x <- matrix(1:6, 2,3)
 > rownames(x) <- c("ID1", "ID2")
 > colnames(x) <- c("Attr1", "Attr2", "Attr3")
 > x
    Attr1 Attr2 Attr3
ID1     1     3     5
ID2     2     4     6
 > write.table(cbind(id=row.names(x), as.data.frame(x)), 
row.names=FALSE, sep=",")
"id","Attr1","Attr2","Attr3"
"ID1",1,3,5
"ID2",2,4,6
 >

As to why you can't get this via an argument to write.table (or 
write.csv), I suspect that part of the answer is a wish to avoid 
"creeping featuritis".  Transferring data between programs is 
notoriously infuriating.  There are more data formats than there are 
programs, but few programs use the same format as their default & 
preferred format.  So to accommodate everyone's preferred format would 
require an extremely large number of features in the data import/export 
functions.  Maintaining software that contains a large number of 
features is difficult -- it's easy for errors to creep in because there 
are so many combinations of how different features can be used on 
different functions.

The alternative to having lots of features on each function is to have a 
relatively small set of powerful functions that can be used to construct 
the behavior you want.  This type of software is thought by many to be 
easier to maintain and extend.  I think this is pretty much the preferred 
approach in R.  The above one-liner for writing the data in the form you 
want is really not much more complex than using an additional argument 
to write.table().  (And if you need to do this kind of thing frequently, 
then it's easy in R to create your own wrapper function for 'write.table'.)
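A minimal sketch of such a wrapper, built from the one-liner above. The name 'write_with_id' and its 'id.name' argument are hypothetical, chosen only for illustration; they are not part of R.

```r
# Hypothetical wrapper: write a matrix (or data frame) as CSV with a named
# first column holding the row names. 'write_with_id' and 'id.name' are
# illustrative names, not part of base R.
write_with_id <- function(x, file = "", id.name = "id", ...) {
  # Convert with as.data.frame() first so numeric columns stay numeric
  # (and therefore unquoted) in the output
  df <- data.frame(rownames(x), as.data.frame(x),
                   stringsAsFactors = FALSE, check.names = FALSE)
  names(df)[1] <- id.name
  write.table(df, file = file, row.names = FALSE, sep = ",", ...)
}

x <- matrix(1:6, 2, 3,
            dimnames = list(c("ID1", "ID2"), c("Attr1", "Attr2", "Attr3")))
write_with_id(x)                  # file = "" writes to the console
write_with_id(x, id.name = "ID")  # same, with "ID" as the first heading
```

Extra arguments (for example 'file' or 'quote') are passed straight through to write.table().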

One might object to this line of explanation by noting that many 
functions already have many arguments and lots of features.  I think the 
situation is that the original author of any particular function gets to 
decide what features the function will have, and after that there is 
considerable reluctance (justifiably) to add new features, especially in 
cases where the desired functionality can be easily achieved in other 
ways with existing functions.

-- Tony Plate

Earl F. Glynn wrote:
> Consider:
> 
>>x <- matrix(1:6, 2,3)
>>rownames(x) <- c("ID1", "ID2")
>>colnames(x) <- c("Attr1", "Attr2", "Attr3")
> 
> 
>>x
> 
>     Attr1 Attr2 Attr3
> ID1     1     3     5
> ID2     2     4     6
> 
> 
>>write.csv(x,file="x.csv")
> 
> "","Attr1","Attr2","Attr3"
> "ID1",1,3,5
> "ID2",2,4,6
> 
> Have I missed an easy way to get the "" string to be something meaningful?
> 
> There is no information in the "" string.  This column heading for the row
> names often could be used as a database key, but the "" entry would need to be
> manually edited first.  Why not provide a way to specify the string instead
> of putting "" as the heading for the rownames?
> 
> From http://finzi.psych.upenn.edu/R/doc/manual/R-data.html
> 
>   Header line
>   R prefers the header line to have no entry for the row names,
>   . . .
>   Some other systems require a (possibly empty) entry for the row names,
> which is what write.table will provide if argument col.names = NA  is
> specified. Excel is one such system.
> 
> Why is an "empty" entry the only option here?
> 
> A quick solution that comes to mind seems a bit kludgy:
> 
> 
>>y <- cbind(rownames(x), x)
>>colnames(y)[1] <- "ID"
>>y
> 
>     ID    Attr1 Attr2 Attr3
> ID1 "ID1" "1"   "3"   "5"
> ID2 "ID2" "2"   "4"   "6"
> 
> 
>>write.table(y, row.names=F, col.names=T, sep=",", file="y.csv")
> 
> "ID","Attr1","Attr2","Attr3"
> "ID1","1","3","5"
> "ID2","2","4","6"
> 
> Now the rownames have an "ID" header, which could be used as a key in a
> database if desired without editing (but all the "numbers" are now
> character strings, too).
> 
> It's also not clear why I had to use write.table above, instead of
> write.csv:
> 
>>write.csv(y, row.names=F, col.names=T, file="y.csv")
> 
> Error in write.table(..., col.names = NA, sep = ",", qmethod = "double") :
> col.names = NA makes no sense when row.names = FALSE
> 
> Thanks for any insight about this.
> 
> efg
> --
> Earl F. Glynn
> Bioinformatics
> Stowers Institute
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>



Re: [R] Seeking help with a loop

2005-08-03 Thread Tony Plate
 > x <- data.frame(q33a=3:4,q33b=5:6,q35a=1:2,q35b=2:1)
 > y <- list()
 > for (i in grep("q33", colnames(x), value=TRUE))
+y[[sub("q33","",i)]] <- ifelse(x[[sub("q33","q35",i)]]==1, x[[i]], NA)
 > as.data.frame(y)
   a  b
1  3 NA
2 NA  6
 > # if you really want to create new variables rather
 > # than have them in a data frame:
 > # (use paste() or sub() to modify the names if you
 > #  want something like "newfielda")
 > for (i in names(y)) assign(i, y[[i]])
 > a
[1]  3 NA
 > b
[1] NA  6
 >
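The same pairing of q33/q35 columns can also be written without an explicit loop. This is a sketch using base R's Map(); it assumes every q33 column has a q35 partner with the same suffix, and the "newfield" prefix simply follows the names in Greg's post.

```r
x <- data.frame(q33a = 3:4, q33b = 5:6, q35a = 1:2, q35b = 2:1)

q33 <- grep("^q33", names(x), value = TRUE)  # "q33a" "q33b"
q35 <- sub("^q33", "q35", q33)               # partners (assumed to exist)

# For each pair, keep the q33 value where the q35 value is 1, else NA
y <- Map(function(a, b) ifelse(x[[b]] == 1, x[[a]], NA), q33, q35)
names(y) <- paste0("newfield", sub("^q33", "", q33))

res <- as.data.frame(y)
res
```

To attach the new columns to the original data, cbind(x, res) is one option.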

hope this helps,

Tony Plate

Greg Blevins wrote:
> Hello R Helpers,
> 
> After spending considerable time attempting to write a loop (and searching 
> the help archives) I have decided to post my problem.  
> 
> In a dataframe I have columns labeled:
> 
> q33a q33b q33c...q33r q35a q35b q35c...q35r
> 
> What I want to do is create new variables based on the following logic:
> newfielda <- ifelse(q35a==1, q33a, NA)
> newfieldb <- ifelse(q35b==1, q33b, NA)
> ...
> newfieldr
> 
> What I did was create two new dataframes, one containing q33a-r the other 
> q35a-r and tried to loop over both, but I could not get any of the loop 
> syntax I tried to give me the result I was seeking.
> 
> Any help would be much appreciated.
> 
> Greg Blevins
> Partner
> The Market Solutions Group, Inc.
> Minneapolis, MN
> 
> Windows XP, R 2.1.1
> 


Re: [R] Generating correlated data from uniform distribution

2005-07-01 Thread Tony Plate
Isn't this a little trickier with non-normal variables?  It sounds like 
Menghui Chen wants variables that have uniform marginal distribution, 
and a specified correlation.

When I look at histograms (or just the quantiles) of the rows of dat2 in 
your example, I see something for dat2[2,] that does not look much like 
it comes from a uniform distribution.

 > dat<-matrix(runif(2000),2,1000)
 > rho<-.77
 > R<-matrix(c(1,rho,rho,1),2,2)
 > ch<-chol(R)
 > dat2<-t(ch)%*%dat
 > cor(dat2[1,],dat2[2,])
[1] 0.7513892
 > hist(dat2[1,])
 > hist(dat2[2,])
 >
 > quantile(dat2[1,])
         0%         25%         50%         75%        100% 
0.000655829 0.246216035 0.507075912 0.745158441 0.16418
 > quantile(dat2[2,])
       0%       25%       50%       75%      100% 
0.0393046 0.4980066 0.7150426 0.9208855 1.3864704
 >
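One standard construction that does keep uniform marginals is a Gaussian copula. This technique is swapped in here and is not taken from this thread. The idea is to generate correlated normals and map each one through pnorm(). Because that transform is nonlinear, the normal correlation has to be adjusted. For this copula, the correlation of the resulting uniforms is (6/pi)*asin(rho/2), so choosing rho = 2*sin(pi*r/6) targets a correlation of r. The function name 'runif_cor' is hypothetical.

```r
# Sketch (Gaussian copula): X1, X2 ~ Uniform(-0.5, 0.5) with correlation r.
# 'runif_cor' is a hypothetical helper name.
runif_cor <- function(n, r) {
  rho <- 2 * sin(pi * r / 6)                   # normal corr giving corr r after pnorm()
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)  # corr(z1, z2) = rho
  # pnorm() gives Uniform(0, 1) marginals; subtract 0.5 to shift to (-0.5, 0.5)
  cbind(X1 = pnorm(z1) - 0.5, X2 = pnorm(z2) - 0.5)
}

set.seed(1)
u <- runif_cor(1e5, 0.77)
cor(u[, 1], u[, 2])    # close to 0.77
quantile(u[, 2])       # roughly -0.5, -0.25, 0, 0.25, 0.5
```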

-- Tony Plate

Jim Brennan wrote:
> dat<-matrix(runif(2000),2,1000)
> rho<-.77
> R<-matrix(c(1,rho,rho,1),2,2)
> ch<-chol(R)
> dat2<-t(ch)%*%dat
> cor(dat2[1,],dat2[2,])
[1] 0.7513892
> 
>>dat<-matrix(runif(2),2,1)
>>rho<-.28
>>R<-matrix(c(1,rho,rho,1),2,2)
>>ch<-chol(R)
>>dat2<-t(ch)%*%dat
>>cor(dat2[1,],dat2[2,])
> 
> [1] 0.2681669
> 
>>dat<-matrix(runif(20),2,10)
>>rho<-.28
>>R<-matrix(c(1,rho,rho,1),2,2)
>>ch<-chol(R)
>>dat2<-t(ch)%*%dat
>>cor(dat2[1,],dat2[2,])
> 
> [1] 0.2814035
> 
> See  ?choleski
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Menghui Chen
> Sent: July 1, 2005 4:49 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Generating correlated data from uniform distribution
> 
> Dear R users,
> 
> I want to generate two random variables (X1, X2) from uniform
> distribution (-0.5, 0.5) with a specified correlation coefficient r.
> Does anyone know how to do it in R?
> 
> Many thanks!
> 
> Menghui
> 


Re: [R] function for cumulative occurrence of elements

2005-06-28 Thread Tony Plate
I'm not entirely sure what you want, but is it "9 5 3" for this data? (9 
"new" species occur at the first point, 5 "new" at the second, and 3 
"new" at the third).  If this is right, then to get "accumulation curve 
when random Points are considered", you can probably just index rows of 
dt appropriately.

 > dd <- read.table("clipboard", header=T)
 > dd[,1:3]
   Point        species frequency
1      7   American_elm         7
2      7          apple         2
3      7   black_cherry         8
4      7      black_oak         1
5      7    chokecherry         1
6      7         oak_sp         1
7      7 pignut_hickory         1
8      7      red_maple         1
9      7      white_oak         5
10     9   black_spruce         2
11     9    blue_spruce         2
12     9        missing        12
13     9  Norway_spruce         8
14     9   white_spruce         3
15    12          apple         2
16    12   black_cherry         1
17    12   black_locust         1
18    12   black_walnut         1
19    12          lilac         3
20    12        missing         2
 > # dt: table of which species occur at which "Points"
 > dt <- table(dd$Point, dd$species)
 > # doc: for each species, the index of the "Point" where
 > # it first occurs
 > doc <- apply(dt, 2, function(x) which(x==1)[1])
 > doc
  American_elm          apple   black_cherry   black_locust      black_oak
             1              1              1              3              1
  black_spruce   black_walnut    blue_spruce    chokecherry          lilac
             2              3              2              1              3
       missing  Norway_spruce         oak_sp pignut_hickory      red_maple
             2              2              1              1              1
     white_oak   white_spruce
             1              2
 > table(doc)
doc
1 2 3
9 5 3
 >
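Building on 'dt' and 'doc' above, here is a minimal sketch covering both of Steve's goals. It first accumulates the new-species counts, then repeats the calculation with the points taken in random orders. The helper 'accum_curve' is hypothetical, and the small data frame simply re-enters the 20 records shown. The sketch tests presence with '> 0' rather than '== 1', so a species recorded more than once at a point is still counted.

```r
dd <- data.frame(
  Point   = rep(c(7, 9, 12), c(9, 5, 6)),
  species = c("American_elm", "apple", "black_cherry", "black_oak", "chokecherry",
              "oak_sp", "pignut_hickory", "red_maple", "white_oak",
              "black_spruce", "blue_spruce", "missing", "Norway_spruce", "white_spruce",
              "apple", "black_cherry", "black_locust", "black_walnut", "lilac", "missing"))

# Hypothetical helper: cumulative number of species seen, with points
# visited in the order given by 'order'
accum_curve <- function(point, species, order = sort(unique(point))) {
  # presence/absence: rows = points in the given order, columns = species
  dt <- table(factor(point, levels = order), species) > 0
  # row index of the first point at which each species occurs
  first <- apply(dt, 2, function(x) which(x)[1])
  # new species per point, accumulated
  cumsum(tabulate(first, nbins = nrow(dt)))
}

accum_curve(dd$Point, dd$species)   # goal 1: points in id order

# Goal 2: average curve over random orderings of the points
pts <- unique(dd$Point)
rowMeans(replicate(200, accum_curve(dd$Point, dd$species, pts[sample.int(length(pts))])))
```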

hope this helps,

Tony Plate

Steven K Friedman wrote:
> Hello, 
> 
> I have a data set with 9700 records, and 7 parameters. 
> 
> The data were collected for a survey of forest communities.  Sample plots 
> (1009) and species (139) are included in this data set. I need to determine 
> how species are accumulated as new plots are considered. Basically, I want 
> to develop a species area curve. 
> 
> I've included the first 20 records from the data set.  Point represents the 
> plot id. The other parameters are parts of the information statistic H'. 
> 
> Using "Table", I can construct a data set that lists the occurrence of a 
> species at any Point (it produces a binary 0/1 data table). From there it 
> get confusing, regarding the most efficient approach to determining the 
> addition of new and or repeated species occurrences. 
> 
> ptcount <-  table(sppoint.freq$species, sppoint.freq$Point) 
> 
>  From here I've played around with colSums to calculate the number of species 
> at each Point.  The difficulty is determining if a species is new or 
> repeated.  Also since there are 1009 points a function is needed to screen 
> every Point. 
> 
> Two goals are of interest: 1) the species accumulation curve, and 2) an 
> accumulation curve when random Points are considered. 
> 
> Any help would be greatly appreciated. 
> 
> Thank you
> Steve Friedman 
> 
> 
>    Point        species frequency point.list point.prop   log.prop point.hprime
> 1      7   American elm         7         27 0.25925926 -1.3499267    0.3499810
> 2      7          apple         2         27 0.07407407 -2.6026897    0.1927918
> 3      7   black cherry         8         27 0.29629630 -1.2163953    0.3604134
> 4      7      black oak         1         27 0.03703704 -3.2958369    0.1220680
> 5      7    chokecherry         1         27 0.03703704 -3.2958369    0.1220680
> 6      7         oak sp         1         27 0.03703704 -3.2958369    0.1220680
> 7      7 pignut hickory         1         27 0.03703704 -3.2958369    0.1220680
> 8      7      red maple         1         27 0.03703704 -3.2958369    0.1220680
> 9      7      white oak         5         27 0.18518519 -1.6863990    0.3122961
> 10     9   black spruce         2         27 0.07407407 -2.6026897    0.1927918
> 11     9    blue spruce         2         27 0.07407407 -2.6026897    0.1927918
> 12     9        missing        12         27         0. -0.8109302    0.3604134
> 13     9  Norway spruce         8         27 0.29629630 -1.2163953    0.3604134
> 14     9   white spruce         3         27         0. -2.1972246    0.2441361
> 15    12          apple         2         27 0.07407407 -2.6026897    0.1927918
> 16    12   black cherry         1         27 

Re: [R] summary(as.factor(x) - force to not sort the result according factor levels

2005-05-02 Thread Tony Plate
Christoph Lehmann wrote:
Hi
The result of a summary(as.factor(x)) (see example below) call is sorted 
according to the factor level. How can I get the result not sorted but 
in the original order of the levels in x?
By creating the factor with the levels in the order you want:
> test <- c(120402, 120402, 120402, 1323, 1323,200393, 200393, 200393, 
200393, 200393)
> summary(factor(test, levels=unique(test)))
120402   1323 200393 
     3      2      5 



Re: [R] na.action

2005-04-29 Thread Tony Plate
Maybe this does what you want:
> x <- as.matrix(read.table("clipboard"))
> x
  V1 V2 V3 V4
1 NA  0  0  0
2  0 NA  0 NA
3  0  0 NA  2
4  0  0  2 NA
> rowSums(x==2, na.rm=T)
1 2 3 4
0 0 1 1
>
There's probably at least 5 or 6 other quite sensible ways of doing 
this, but this is probably the fastest (and the least versatile).

A more general building block is the sum() function, as in:
> sum(x[3,]==2, na.rm=T)
[1] 1
>
The key is the use of the 'na.rm=T' argument value.
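If you want to keep Tim's explicit loop, the error arises because 'mat3[i, j] == 2' is NA wherever mat3 is NA, and if() cannot branch on NA. Below is a sketch of the loop with the test guarded by isTRUE(); mat3 is re-entered from the small example.

```r
mat3 <- matrix(c(NA,  0,  0,  0,
                  0, NA,  0, NA,
                  0,  0, NA,  2,
                  0,  0,  2, NA), nrow = 4, byrow = TRUE)

counter <- integer(nrow(mat3))
for (i in seq_len(nrow(mat3))) {
  for (j in seq_len(ncol(mat3))) {
    # isTRUE() returns FALSE for NA, so if() never receives a missing value
    if (isTRUE(mat3[i, j] == 2)) counter[i] <- counter[i] + 1L
  }
}
counter   # same counts as rowSums(mat3 == 2, na.rm = TRUE)
```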
hope this helps,
Tony Plate
Tim Smith wrote:
Hi,
 
I had the following code:

  testp <- rcorr(t(datcm1),type = "pearson")
  mat1 <- testp[[1]][,] > 0.6
  mat2 <- testp[[3]][,] < 0.05
  mat3 <- mat1 + mat2
 
The resulting mat3 (smaller version) matrix looks like:
 
 NA   0   0   0
  0  NA   0  NA
  0   0  NA   2
  0   0   2  NA
 
To get to the number of times a '2' appears in the rows, I was trying to run the following code:
 
numrow = nrow(mat3)
  counter <- matrix(nrow = numrow,ncol =1)
  for(i in 1:numrow){
   count = 0;
   for(j in 1:numrow){
if(mat3[i,j] == 2){
 count = count + 1
}
   }
  counter[i,1] = count
  }
 
However, I get the following error:
 
'Error in if (mat3[i, j] == 2) { : missing value where TRUE/FALSE needed'
 
I also tried to use the na.action, but couldn't get anything. I'm sure there must be a relatively easy fix to this. Is there a workaround this problem?
 
thanks,
 
Tim
 



Re: [R] Subarrays

2005-04-29 Thread Tony Plate
Here's one way:
> subarray <- function(x, marginals, intervals) {
+ if (length(marginals) != length(intervals))
+ stop("marginals and intervals must be the same length (intervals can be a list)")
+ if (any(marginals<1 | marginals>length(dim(x))))
+ stop("marginals must contain values in 1:length(dim(x))")
+ ic <- quote(x[, drop=T])
+ # ic has 4 elts with one empty index arg
+ ic2 <- ic[c(1, 2, rep(3, length(dim(x))), 4)]
+ # ic2 has an empty arg for each dim of x
+ ic2[marginals+2] <- intervals
+ eval(ic2)
+ }

> subarray(v, c(1,4), c(3,2))
 [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127
> subarray(v, c(1,4), list(3,2))
 [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127
> subarray(v, c(1,3,4), list(c(1,3,4),1,2))
 [,1] [,2] [,3] [,4]
[1,]   65   69   73   77
[2,]   67   71   75   79
[3,]   68   72   76   80
>
Question for language experts: is this the best way to create and 
manipulate R language expressions that contain empty arguments, or are 
there other preferred ways?
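One alternative that sidesteps empty arguments entirely: put TRUE (which selects every index along a dimension) in each position that would otherwise be empty, and call `[` through do.call(). This is a sketch with a hypothetical name, 'subarray2', not a claim about the preferred idiom.

```r
# Hypothetical variant of subarray(): TRUE plays the role of an empty index
subarray2 <- function(x, marginals, intervals) {
  d <- length(dim(x))
  stopifnot(length(marginals) == length(intervals),
            all(marginals >= 1 & marginals <= d))
  idx <- rep(list(TRUE), d)             # TRUE selects everything along a dim
  idx[marginals] <- as.list(intervals)  # fill in the requested dimensions
  do.call(`[`, c(list(x), idx, list(drop = TRUE)))
}

v <- array(1:256, dim = rep(4, 4))
subarray2(v, c(1, 4), c(3, 2))                    # same as v[3, , , 2]
subarray2(v, c(1, 3, 4), list(c(1, 3, 4), 1, 2))  # same as v[c(1,3,4), , 1, 2]
```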

-- Tony Plate
Gunnar Hellmund wrote:
Define an array

v<-1:256
dim(v)<-rep(4,4)

Subarrays can be obtained as follows:

v[3,2,,2]
[1]  71  87 103 119
v[3,,,2]
 [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127
In the general case this procedure is very tedious. 

Given an array 
A, dim(A)=(dim_1,dim_2,...,dim_d) 
and two vectors
v1=(n_i1,...n_ik), v2=(int_1,...,int_k) ('marginals' and relevant
'interval numbers')
is there a smart way to obtain 
A[,...,int_1,,int_2,,,int_k,]
?

Best wishes
Gunnar Hellmund


Re: [R] Reconstruction of a "valid" expression within a function

2005-04-28 Thread Tony Plate
You are passing just a string to subset().  At the very least you need 
to parse it (but still this does not work easily with subset() -- see 
below).  But are you sure you need to do this?  subset() for dataframes 
already accepts subset expressions involving the columns of the 
dataframe, e.g.:

> df <- data.frame(x=1:10,y=rep(1:5,2))
> subset(df, y==2)
  x y
2 2 2
7 7 2
>
However, it's tricky to get subset() to work with an expression for its 
subset argument.  This is because of the way it evaluates its subset 
expression (look at the code for subset.data.frame()).

> subset(df, parse(text="df$y==2"))
Error in subset.data.frame(df, parse(text = "df$y==2")) :
'subset' must evaluate to logical
> subset(df, parse(text="y==2"))
Error in subset.data.frame(df, parse(text = "y==2")) :
'subset' must evaluate to logical
>
It's a little tricky in general passing R language expressions around, 
because many functions that work with expressions work with the 
unevaluated form of the actual argument, rather than with an R language 
expression as the value of a variable.  E.g.:

> with(df, y==2)
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
> cond <- parse(text="y==2")
> cond
expression(y == 2)
> with(df, cond)
expression(y == 2)
One way to make these types of functions work with R language 
expressions as the value of a variable is to use do.call():

> do.call("with", list(df, cond))
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
>
So, returning to subset(), you can give it an expression that is stored 
in the value of a variable like this:

> do.call("subset", list(df, cond))
  x y
2 2 2
7 7 2
>
However, if you're a beginner at R, I suspect that you'll get much 
further if you avoid such meta-language constructs and just find a way 
to make subset() work for you without trying to paste together R 
language expressions.
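For reference, a common base-R idiom for something like Pascal's SUB() is to capture the bare condition with substitute() and evaluate it with eval(), using the data frame as the environment. The body below is a sketch, not code from the thread. It shares the usual pitfalls of non-standard evaluation, such as being awkward to call from inside other functions.

```r
# Sketch: SUB(DF, Fact3 == 1) style subsetting without pasting strings
SUB <- function(DF, subset = TRUE) {
  if (missing(subset)) return(DF)            # no condition: keep all rows
  cond <- substitute(subset)                 # unevaluated, e.g. y == 2
  keep <- eval(cond, DF, parent.frame())     # names looked up in DF first
  keep <- keep & !is.na(keep)                # treat NA as FALSE, as subset() does
  DF[rep_len(keep, nrow(DF)), , drop = FALSE]
}

df <- data.frame(x = 1:10, y = rep(1:5, 2))
SUB(df, y == 2)    # rows 2 and 7
```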

Hope this helps,
-- Tony Plate
Pascal Boisson wrote:
Hello all,
I have some trouble in reconstructing a valid expression within a
function,
here is my question.
I am building a function :
SUB<-function(DF,subset=TRUE) {
#where DF is a data frame, with Var1, Var2, Fact1, Fact2, Fact3
#and subset would be an expression, eg. Fact3 == 1 

#in a first time I want to build a subset from DF
#I managed to, with an expression like eg. DF$Fact3,
# but I would like to skip the DF$ for convenience
# so I tried something like this :
tabsub<-deparse(substitute(subset))
dDF<-deparse(substitute(DF))
if (tabsub[1]!="TRUE") {
subset<-paste(dDF,"$",tabsub,sep="")}
#At this point, I have a string that seems to be the expression that I want
sDF<-subset(DF, subset)
}
#But I have an error message :
Error in r & !is.na(r) : operations are possible only for numeric or
logical types
I can not understand why is that, even after I've tried to convert
properly the string into an expression.
I've been all the day trying to sort that problem ...
Maybe this attempt is awkward and I have not understood what is really
behind an expression. 
But if anyone could give me a tip concerning this problem or point me to
relevant references, I would really appreciate.

Thanks
Pascal Boisson


Re: [R] Getting the name of an object as character

2005-04-28 Thread Tony Plate
If you're trying to find the textual form of an actual argument, here's 
one way:

> foo <- function(x) {
+ xn <- substitute(x)
+ if (is.name(xn) && !exists(as.character(xn)))
+ as.character(xn)
+ else
+ x
+ }
> foo(x)
[1] 3
> foo(xx)
[1] "xx"
> foo(list(xx))
Error in foo(list(xx)) : Object "xx" not found
>
If you want the textual form of arguments that are expressions, use 
deparse() and a different test (& beware that deparse() can return a 
vector of character data).
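A minimal sketch of the deparse() route for expressions follows. The name 'arg_text' is hypothetical. The function never evaluates its argument, and it pastes deparse()'s possibly multi-element result into a single string.

```r
# Hypothetical helper: source text of the actual argument, unevaluated
arg_text <- function(x) paste(deparse(substitute(x)), collapse = " ")

arg_text(blabla)          # "blabla" (blabla need not exist)
arg_text(list(xx, 1:3))   # "list(xx, 1:3)"
```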

Although you can do this in R, it is not always advisable practice. 
Many people who have written functions with non-standard evaluation 
rules like this have come to regret it (one reason is that it makes 
these functions difficult to use in programs, another is that the 
behavior of the function can depend upon what global variables exist, 
another is that when the function works as intended, that's great, but 
when it doesn't, users can get quite confused trying to figure out what 
it's doing.)  The R function help() is an example of a commonly used 
function with a non-standard evaluation rule.

-- Tony Plate

Ali - wrote:
This could be really trivial, but I cannot find the right function to 
get the name of an object as a character.

Assume we have a function like:
getName <- function(obj)
Now if we call the function like:
getName(blabla)
and 'blabla' is not a defined object, I want getName to return "blabla". 
In other words, if

paste("blabla")
returns
"blabla"
I want to define a paste function which returns the same character by:
paste(blabla)


Re: [R] Defining binary indexing operators

2005-04-27 Thread Tony Plate
Excuse me!  I misunderstood the question, and indeed, it is necessary to be 
that complicated when you try to make x$y behave the same as foo(x,y), 
rather than foo(x,"y") (doing the former would be inadvisable, as I 
think someone else pointed out too.)

Tony Plate wrote:
It's not necessary to be that complicated, is it?  AFAIK, the '$' 
operator is treated specially by the parser so that its RHS is treated 
as a string, not a variable name.  Hence, a method for "$" can just take 
the indexing argument directly as given -- no need for any fancy 
language tricks (eval(), etc.)

 > x <- structure(3, class = "myclass")
 > y <- 5
 > foo <- function(x,y) paste(x, " indexed by '", y, "'", sep="")
 > foo(x, y)
[1] "3 indexed by '5'"
 > "$.myclass" <- foo
 > x$y
[1] "3 indexed by 'y'"
 >
The point of the above example is that foo(x,y) behaves differently from 
x$y even when both call the same function: foo(x,y) uses the value of 
the variable 'y', whereas x$y uses the string "y".  This is as desired 
for an indexing operator "$".

-- Tony Plate

Gabor Grothendieck wrote:
On 4/27/05, Ali - <[EMAIL PROTECTED]> wrote:
Assume we have a function like:
foo <- function(x, y)
how is it possible to define a binary indexing operator, denoted by 
$, so
that

x$y
functions the same as
foo(x, y)

  Here is an example. Note that $ does not evaluate y so you have
to do it yourself:
x <- structure(3, class = "myclass")
y <- 5
foo <- function(x,y) x+y
"$.myclass" <- function(x, i) { i <- eval.parent(parse(text=i)); 
foo(x, i) }
x$y # structure(8, class = "myclass")



Re: [R] Summarizing factor data in table?

2005-04-26 Thread Tony Plate
Do you want to count the number of non-NA divisions and organizations in 
the data for each year (where duplicates are counted as many times as 
they appear)?

> tapply(!is.na(foo$div), foo$yr, sum)
1998 1999 2000 
   0    4    2 
> tapply(!is.na(foo$org), foo$yr, sum)
1998 1999 2000 
   4    4    2 
>
Or perhaps the number of unique non-NA divisions and organizations in 
the data for each year?

> tapply(foo$div, foo$yr, function(x) length(na.omit(unique(x))))
1998 1999 2000 
   0    4    2 
> tapply(foo$org, foo$yr, function(x) length(na.omit(unique(x))))
1998 1999 2000 
   4    4    2 
>
(I don't understand where the "3" in your desired output comes from 
though, which maybe indicates I completely misunderstand your request.)
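If both counts are wanted in one object, here is a sketch using split() and sapply(). This is one choice among several base-R routes, not the only one. It counts distinct non-NA values per year, so it follows the second reading above.

```r
foo <- data.frame(yr  = c(rep(1998, 4), rep(1999, 4), rep(2000, 2)),
                  div = factor(c(rep(NA, 4), "A", "B", "C", "D", "A", "C")),
                  org = factor(c(1:4, 1:4, 1, 2)))

# For each year, count the distinct non-NA values in each column
counts <- sapply(split(foo[c("div", "org")], foo$yr),
                 function(d) sapply(d, function(col) length(unique(na.omit(col)))))
counts      # rows: div, org; columns: years
t(counts)   # one row per year, if preferred
```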

Andy Bunn wrote:
I have a very simple query with regard to summarizing the number of factors
present in a certain snippet of a data frame.
Given the following data frame:
foo <- data.frame(yr = c(rep(1998,4), rep(1999,4), rep(2000,2)), div =
factor(c(rep(NA,4),"A","B","C","D","A","C")),
org = factor(c(1:4,1:4,1,2)))
I want to get two new variables. Object ndiv would give the number of
divisions by year:
 1998 0
 1999 3
 2000 2
Object norgs would give the number of organizations
 1998 4
 1999 4
 2000 2
I figure xtabs should be able to do it, but I'm stuck without a for loop.
Any suggestions? -Andy


Re: [R] Index matrix to pick elements from 3-dimensional matrix

2005-04-26 Thread Tony Plate
I'm assuming what you want to do is randomly sample from slices of A 
selected on the 3rd dimension, as specified by J.  Here's a way that 
uses indexing by a matrix.  The cbind() builds a three column matrix of 
indices, the first two of which are randomly selected.  The use of 
replace() is to make the result have the same attributes, e.g., dim and 
dimnames, as J.

> A <- array(letters[1:12],c(2,2,3))
> J <- matrix(c(1,2,3,3),2,2)
> replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T), 
sample(dim(A)[2], length(J), rep=T), as.vector(J))])
 [,1] [,2]
[1,] "b"  "l"
[2,] "f"  "k"
> replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T), 
sample(dim(A)[2], length(J), rep=T), as.vector(J))])
 [,1] [,2]
[1,] "b"  "l"
[2,] "h"  "i"
> replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T), 
sample(dim(A)[2], length(J), rep=T), as.vector(J))])
 [,1] [,2]
[1,] "c"  "l"
[2,] "h"  "k"
>
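If instead each element should come from the same (row, column) position of the slice chosen by J, as Juhana's X suggests, the same matrix-indexing idea works with row() and col() in place of sample(). This is a sketch. Note that R fills arrays column by column, so A[,,1] here is "a c / b d" rather than the "a b / c d" layout written in the question. As a result, the letters differ from Juhana's X even though the positions are the ones requested.

```r
A <- array(letters[1:12], c(2, 2, 3))
J <- matrix(c(1, 2, 3, 3), 2, 2)
# One index row per element of J: (row of J, column of J, slice given by J)
X <- replace(J, TRUE, A[cbind(as.vector(row(J)), as.vector(col(J)), as.vector(J))])
X
```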

-- Tony Plate
Robin Hankin wrote:
Hello Juhana
try this (but there must be a better way!)

stratified.select <- function(A,J){
  out <- sapply(J,function(i){sample(A[,,i],1)})
  attributes(out) <- attributes(J)
  return(out)
}
A <- array(letters[1:12],c(2,2,3))
J <- matrix(c(1,2,3,3),2,2)
R>  stratified.select(A,J)
 [,1] [,2]
[1,] "b"  "i"
[2,] "g"  "k"
R>   stratified.select(A,J)
 [,1] [,2]
[1,] "d"  "j"
[2,] "f"  "l"
R>
best wishes
Robin

On Apr 26, 2005, at 05:16 am, juhana vartiainen wrote:
Hi all
Suppose I have a dim=c(2,2,3) matrix A, say:
A[,,1]=
a b
c d
A[,,2]=
e f
g h
A[,,3]=
i j
k l
Suppose that I want to create a 2x2 matrix X, which picks elements 
from the above-mentioned submatrices according to an index matrix J 
referring to the "depth" dimension:
J=
1 3
2 3

In other words, I want X to be
X=
a j
g l
since the matrix J says that the (1,1)-element should be picked from 
A[,,1], the (1,2)-element should be picked from A[,,3], etc.

I have A and I have J. Is there an expression in A and J that creates X?
Thanks
Juhana
[EMAIL PROTECTED]
--
Juhana Vartiainen
docent in economics
Director, FIEF (Trade Union Foundation for Economic Research, 
Stockholm), http://www.fief.se
gsm +46 70 360 9915
office +46 8 696 9915
email [EMAIL PROTECTED]
homepage http://www.fief.se/staff/Juhana/index.html


--
Robin Hankin
Uncertainty Analyst
Southampton Oceanography Centre
European Way, Southampton SO14 3ZH, UK
 tel  023-8059-7743


Re: [R] Proba( Ut+2=1 / ((Ut+1==1) && (Ut==1))) ?

2005-04-25 Thread Tony Plate
table() can return all the n-gram statistics, e.g.:
> v <- sample(c(-1,1), 1000, rep=TRUE)
> table("v_{t-2}"=v[-seq(to=length(v), len=2)], 
"v_{t-1}"=v[-c(1,length(v))], "v_t"=v[-(1:2)])
, , v_t = -1

   v_{t-1}
v_{t-2}  -1   1
 -1 136 134
 1  131 112
, , v_t = 1
   v_{t-1}
v_{t-2}  -1   1
 -1 131 113
 1  115 126
>
This says that there were 136 cases in which a -1 followed two -1's (and 
126 cases in which a 1 followed two 1's).

If you're really only interested in particular contexts, you can do 
something like:

> table(v[-seq(to=length(v), len=2)]==1 & v[-c(1,length(v))]==1 & 
v[-(1:2)]==1)

FALSE  TRUE
  872   126
> table(v[-seq(to=length(v), len=2)]==-1 & v[-c(1,length(v))]==-1 & 
v[-(1:2)]==-1)

FALSE  TRUE
  862   136
or
> sum(v[-seq(to=length(v), len=2)]==-1 & v[-c(1,length(v))]==-1 & 
v[-(1:2)]==-1)
[1] 136
>
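To turn those counts into the frequencies vincent asked for, divide by the number of times the two-element context occurs. In the simulated table above, for example, the -1 case gives 136/(136+131), about 0.51. A sketch follows; the helper name 'cond_freq' is hypothetical, and it assumes length(v) is at least 3.

```r
# Hypothetical helper: relative frequency of v[t] == s among positions t
# where v[t-1] == s and v[t-2] == s
cond_freq <- function(v, s) {
  n <- length(v)
  prev2 <- v[1:(n - 2)]
  prev1 <- v[2:(n - 1)]
  cur   <- v[3:n]
  ctx <- prev2 == s & prev1 == s    # positions preceded by two s's
  sum(cur[ctx] == s) / sum(ctx)     # NaN if the context never occurs
}

v <- c(1, 1, -1, -1, -1, 1, 1, 1, 1, -1, 1)
cond_freq(v, 1)    # P(next = 1 | two 1's)
cond_freq(v, -1)   # P(next = -1 | two -1's)
```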
vincent wrote:
Dear all,
First I apologize if my question is quite simple,
but i'm very newbie with R.
I have vectors of the form v = c(1,1,-1,-1,-1,1,1,1,1,-1,1)
(longer than this one of course).
The elements are only +1 or -1.
I would like to calculate :
- the frequencies of -1 occurences after 2 consecutives -1
- the frequencies of +1 occurences after 2 consecutives +1
It looks probably something like :
Proba( Ut+2=1 / ((Ut+1==1) && (Ut==1)))
could someone please give me a little hint about how
i should/could begin to proceed ?
Thanks
(Thanks also to the R creators/contributors, this soft
seems really great !)


[R] pointer to comments re Paul Murrell's new book, R, & SAS on Andrew Gelman's blog

2005-04-21 Thread Tony Plate
There are some interesting comments re Paul Murrell's new book, R, & SAS 
on Andrew Gelman's blog:

http://www.stat.columbia.edu/~cook/movabletype/archives/2005/04/a_new_book_on_r.html
-- Tony Plate


Re: [R] terminate R program when trying to access out-of-bounds array element?

2005-04-13 Thread Tony Plate
Oops.
The message in the 'stop' should be something more like "numeric index 
out of range".

-- Tony Plate
Tony Plate wrote:
One way could be to make a special class with an indexing method that 
checks for out-of-bounds numeric indices.  Here's an example for vectors:

 > setOldClass(c("oobcvec"))
 > x <- 1:3
 > class(x) <- "oobcvec"
 > x
[1] 1 2 3
attr(,"class")
[1] "oobcvec"
 > "[.oobcvec" <- function(x, ..., drop=T) {
+     if (!missing(..1) && is.numeric(..1) && any(is.na(..1) | ..1 < 1 | ..1 > length(x)))
+         stop("numeric vector out of range")
+     NextMethod("[")
+ }
 > x[2:3]
[1] 2 3
 > x[2:4]
Error in "[.oobcvec"(x, 2:4) : numeric vector out of range
 >

Then, for vectors for which you want out-of-bounds checks done when they are 
indexed, set the class to "oobcvec".  This should work for simple 
vectors (I checked, and it works if the vectors have names).

If you want to write a method like this for indexing matrices, you can 
use ..1 and ..2 to refer to the i and j indices.  If you want to also be 
able to check for missing character indices, you'll just need to add 
more code.  Note that the above example disallows 0 and negative 
indices, which may or may not be what you want.
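A minimal sketch of the matrix case. It names the two indices i and j rather than using ..1 and ..2, which amounts to the same thing for calls of the form x[i, j]. The class name "oobcmat" is hypothetical; like the vector version it rejects 0 and negative indices, and single-index calls such as x[5] are not handled correctly by this sketch.

```r
# Hypothetical class "oobcmat": numeric row/column indices are checked
# against dim(x) before the ordinary matrix subsetting is done.
# Assumes matrix-style calls x[i, j]; single-index calls x[k] are not handled.
"[.oobcmat" <- function(x, i, j, ..., drop = TRUE) {
  out.of.range <- function(idx, n)
    is.numeric(idx) && any(is.na(idx) | idx < 1 | idx > n)
  d <- dim(x)
  if (!missing(i) && out.of.range(i, d[1]))
    stop("numeric row index out of range")
  if (!missing(j) && out.of.range(j, d[2]))
    stop("numeric column index out of range")
  NextMethod("[")
}

m <- matrix(1:6, nrow = 2)
class(m) <- "oobcmat"
m[2, 3]        # 6
m[, 2]         # 3 4
try(m[3, 1])   # Error: numeric row index out of range
```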

If you're extensively using other classes that you've defined, and you 
want out-of-bounds checking for them, then you need to integrate the 
checks into the subsetting methods for those classes -- you can't just 
use the above approach.

hope this helps,
Tony Plate
Vivek Rao wrote:
I want R to stop running a script (after printing an
error message) when an array subscript larger than the
length of the array is used, for example
x = c(1)
print(x[2])
rather than printing NA, since trying to access such
an element may indicate an error in my program. Is
there a way to get this behavior in R? Explicit
testing with the is.na() function everywhere does not
seem like a good solution. Thanks.




Re: [R] problem using uniroot with integrate

2005-03-09 Thread Tony Plate
At Wednesday 09:27 AM 3/9/2005, Ken Knoblauch wrote:
Hi,
I'm trying to calculate the value of the variable, dp, below, in the
argument to the integral of dnorm(x-dp) * pnorm(x)^(m-1).  This
corresponds to the estimate of the sensitivity of an observer in an
m-alternative forced choice experiment, given the probability of
a correct response, Pc, a Gaussian assumption for the noise and
no bias.  The function that I wrote below gives me an error:
Error in f(x, ...) : recursive default argument reference
The problem seems to be at the statement using uniroot,
because the function est.dp works fine outside of the main function.
I've been using R for awhile but there are still many nuances
about the scoping and the use of environments that I'm weak on
and would like to understand better.  I would appreciate any
suggestions or solutions that anyone might offer for fixing
my error.  Thank you.
dprime.mAFC <- function(Pc, m) {
    est.dp <- function(dp, Pc = Pc, m = m) {
        pr <- function(x, dpt = dp, m0 = m) {
            dnorm(x - dpt) * pnorm(x)^(m0 - 1)
        }
        Pc - integrate(pr, lower = -Inf, upper = Inf,
                       dpt = dp, m0 = m)$value
    }
    dp.res <- uniroot(est.dp, interval = c(0,5), Pc = Pc, m = m)
    dp.res$root
}
You've got several problems here:
* recursive argument defaults: these are unnecessary, and they cause the 
particular error message you are seeing (e.g., in the definition of est.dp, 
the default value for the argument 'm' is the value of the argument 'm' 
itself -- default values for arguments are evaluated in the frame of the 
function itself)
* the argument m=m you supply to uniroot() is being interpreted as 
specifying the 'maxiter' argument to uniroot() (partial argument matching)

I think you can fix it by renaming the 'm' argument of function est.dp to 
'm0', and specifying 'm0' in the call to uniroot().  (But I can't tell for 
sure, because you didn't supply a working example -- when I just guess at 
values to pass in, I get numerical errors.)
Also, it would be best to remove the incorrect recursive default arguments 
for the functions est.dp and pr.
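A sketch of the fix described above: the recursive defaults are removed and the extra argument is passed to uniroot() as 'm0', so it cannot be mistaken for 'maxiter'. It has not been checked against the original poster's data. The check uses the m = 2 case, where the integral equals pnorm(dp/sqrt(2)), so Pc = pnorm(1/sqrt(2)) should give dp close to 1. It assumes Pc lies strictly between 1/m (the value of the integral at dp = 0) and its value at dp = 5, so that uniroot() sees a sign change.

```r
dprime.mAFC <- function(Pc, m) {
  # No default values, so no argument refers back to itself.
  est.dp <- function(dp, Pc, m0) {
    pr <- function(x, dpt, m0) dnorm(x - dpt) * pnorm(x)^(m0 - 1)
    Pc - integrate(pr, lower = -Inf, upper = Inf, dpt = dp, m0 = m0)$value
  }
  # 'm0' (not 'm') so that it cannot be partially matched to 'maxiter'.
  uniroot(est.dp, interval = c(0, 5), Pc = Pc, m0 = m)$root
}

dprime.mAFC(pnorm(1 / sqrt(2)), m = 2)  # approximately 1
```

In more recent versions of R, uniroot() declares 'maxiter' after its '...' argument, where partial matching does not apply, but the renaming does no harm either way.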

-- Tony Plate


Re: [R] glm and percentage data with many zero values

2005-03-08 Thread Tony Plate
A very quick and easy thing to do with count data is to add 1 (or 0.5) to 
all your counts (I'm sure you can work backwards from abundance data to 
counts and then forward again).  This gets rid of zero problems.  In some 
cases this approximates a Bayesian approach with a low-information prior 
(though I'm not at all sure whether this is the case with a glm with 
Poisson errors).
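As a sketch of the add-0.5 idea applied to the logit transform mentioned in the question (often called the empirical logit), assuming the number of cells counted per sample is known. The total of 1000 below is a hypothetical placeholder, not a value from the post, and only the first few percentages are used.

```r
# Empirical logit: add 0.5 to both the count and its complement so that
# zero counts give a finite value instead of -Inf.
emp.logit <- function(count, total) log((count + 0.5) / (total - count + 0.5))

Bacteria <- c(2.23, 0, 0.03, 0.71, 2.34, 0)  # first few percentages from the post
total <- 1000                                # hypothetical number of cells counted
count <- round(Bacteria / 100 * total)       # back-transform percentages to counts
emp.logit(count, total)                      # all finite, even for zero counts
```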

-- Tony Plate
At Wednesday 08:02 AM 4/20/2005, Christian Kamenik wrote:
Dear all,
I am interested in correctly testing effects of continuous environmental 
variables and ordered factors on bacterial abundance. Bacterial abundance 
is derived from counts and expressed as percentage. My problem is that the 
abundance data contain many zero values:
Bacteria <- 
c(2.23,0,0.03,0.71,2.34,0,0.2,0.2,0.02,2.07,0.85,0.12,0,0.59,0.02,2.3,0.29,0.39,1.32,0.07,0.52,1.2,0,0.85,1.09,0,0.5,1.4,0.08,0.11,0.05,0.17,0.31,0,0.12,0,0.99,1.11,1.78,0,0,0,2.33,0.07,0.66,1.03,0.15,0.15,0.59,0,0.03,0.16,2.86,0.2,1.66,0.12,0.09,0.01,0,0.82,0.31,0.2,0.48,0.15)

First I tried transforming the data (e.g., logit) but because of the zeros 
I was not satisfied. Next I converted the percentages into integer values 
by round(Bacteria*10) or ceiling(Bacteria*10) and calculated a glm with a 
Poisson error structure; however, I am not very happy with this approach 
because it changes the original percentage data substantially (e.g., 0.03 
becomes either 0 or 1). The same is true for converting the percentages 
into factors and calculating a multinomial or proportional-odds model 
(anyway, I do not know if this would be a meaningful approach).
I was searching the web and the best answer I could get was 
http://www.biostat.wustl.edu/archives/html/s-news/1998-12/msg00010.html in 
which several persons suggested quasi-likelihood. Would it be reasonable 
to use a glm with quasipoisson? If yes, how can I find the appropriate 
variance function? Any other suggestions?

Many thanks in advance, Christian

Christian Kamenik
Institute of Plant Sciences
University of Bern
Altenbergrain 21
3013 Bern
Switzerland


Re: [R] two dimensional array of object elements

2005-02-11 Thread Tony Plate
Create your original matrix as a list datatype.  When assigning elements, 
be careful with the list structure, as the example indicates.

> m <- 2; n <- 3
> a <- array(list(),c(m,n))
> a[1,2] <- list(b=1,c=2)
Error in "[<-"(`*tmp*`, 1, 2, value = list(b = 1, c = 2)) :
number of items to replace is not a multiple of replacement length
> a[1,2] <- list(list(b=1,c=2))
>
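Applied to the loop from the question, this might look like the sketch below. My.Obj is a hypothetical stand-in for the poster's object; wrapping it in list() makes the whole object fill a single cell.

```r
m <- 2; n <- 3
My.Obj <- list(b = 1, c = 2)     # hypothetical stand-in for the poster's object
a <- array(list(), c(m, n))      # every cell starts out as NULL
for (i in 1:m) {
  for (j in 1:n) {
    a[i, j] <- list(My.Obj)      # list() wrapper: one object per cell
  }
}
a[[2, 3]]$c   # 2
```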

At Friday 11:36 AM 2/11/2005, Weijie Cai wrote:
Hi list,
I want to create a two (possibly three) dimensional array of objects. 
These objects are classes in object oriented style. I failed by using
a<-array(NA,c(m,n))
for (i in 1:m){
 for (j in 1:n){
   a[i,j]<-My.Obj
 }
}

The elements are still NA. Any suggestions?
Thanks


Re: [R] reading the seed from a simulation

2004-12-17 Thread Tony Plate
With most modern random number generators you can't capture the current 
state in a single 32-bit integer.  (I suspect the 626 numbers you are 
seeing in .Random.seed are one integer recording which generator is in use, 
followed by the generator's state, contained in 625 integers.)

The easiest way to run reproducible simulations is to explicitly set the 
seed, using an integer, before each run.  Then it's easy to put the random 
number generator into the same state again, e.g.:

for (sim.num in 1:100) {
  set.seed(sim.num)
  # ... run simulation ...
}
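Building on that loop, a minimal sketch that records which seeds led to a failing run, so the failure can be reproduced afterwards. The function names find.failing.runs and flaky.sim are hypothetical; flaky.sim only stands in for a real simulation that sometimes stops with an error.

```r
# Run n.sims simulations, seeding each with its own number, and return
# the numbers of the runs that stopped with an error.
find.failing.runs <- function(n.sims, sim.fun) {
  failed <- integer(0)
  for (sim.num in 1:n.sims) {
    set.seed(sim.num)
    ok <- tryCatch({ sim.fun(); TRUE }, error = function(e) FALSE)
    if (!ok) failed <- c(failed, sim.num)
  }
  failed
}

# Hypothetical stand-in for a simulation that occasionally fails
flaky.sim <- function() if (runif(1) > 0.9) stop("fit failed") else "ok"
bad <- find.failing.runs(50, flaky.sim)
# To look at a failing run again:  set.seed(bad[1]); flaky.sim()
```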
If you can't do this, you can record the value of .Random.seed prior to the 
simulation, and then when you want to reproduce that simulation again, set 
.Random.seed to that value, e.g.:

> set.seed(1)
> sample(1:100, 5)
[1] 27 37 57 89 20
> sample(1:100, 5)
[1] 90 94 65 62  6
> set.seed(1)
> sample(1:100, 5)
[1] 27 37 57 89 20
> saved.seed <- .Random.seed
> sample(1:100, 5)
[1] 90 94 65 62  6
> .Random.seed <- saved.seed
> sample(1:100, 5)
[1] 90 94 65 62  6
>
This is not guaranteed to work with all random-number generators; see the 
NOTE section in ?set.seed
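One caution when doing this inside a function rather than at the prompt: '.Random.seed <- saved.seed' would only create a local variable, because the generator uses the copy of .Random.seed in the global environment. A sketch of a helper that handles this (the name replay.from.seed is hypothetical):

```r
# Reset the generator to a previously saved state, then call sim.fun.
# assign() puts the state in the global environment, where the RNG looks for it.
replay.from.seed <- function(saved.seed, sim.fun) {
  assign(".Random.seed", saved.seed, envir = globalenv())
  sim.fun()
}

set.seed(1)
saved.seed <- .Random.seed
first <- sample(1:100, 5)
again <- replay.from.seed(saved.seed, function() sample(1:100, 5))
identical(first, again)  # TRUE
```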

-- Tony Plate
At Friday 09:50 AM 12/17/2004, Suzette Blanchard wrote:
Greetings,
I have a simulation of a nonlinear model that
is failing.  But it does not fail til way into the simulation.
I would like to look at the run that is failing
and maybe I could if I could capture the seed for the
failing run.  The help file on set.seed says you can do it
but when I tried
rs<-.Random.seed
print(paste("rs",rs,sep=" "))
I got 626 of them so I don't know how to identify the right
one.  Please can you help?
Thank you,
Suzette
=
Suzette Blanchard, Ph.D.
UCSD-PPRU


RE: [R] Percentages in contingency tables *warning trivial question*

2004-12-13 Thread Tony Plate
The 'abind' function in the 'abind' package is a generalized binding 
function for arrays.  (I've never tried it with tables.)

At Monday 04:36 AM 12/13/2004, BXC (Bendix Carstensen) wrote:
[...snip...]
The last step is necessary in the absence of a generalized cbind/rbind
for tables/arrays.
Please correct me if such a thing exists. If it does, it should be
referenced under "see also" in the help page for cbind.


Re: [R] Re: Protocol for answering basic questions

2004-12-04 Thread Tony Plate
Perhaps something like the following paragraph should be added to the start 
of the "Posting Guide" (as a new paragraph right after the existing first 
paragraph):

Note that R-help is *not* intended for questions that are easily answered 
by consulting one of the FAQs or other introductory material (see "Do your 
homework before posting" below).  Such questions are actively discouraged 
and are likely to evoke a brusque response.  Questions about seemingly 
simple matters that are mentioned in the FAQs or other introductory 
material *are welcomed* on R-help *when the questioner obviously has done 
their homework and the question is accompanied by an explanation* like "FAQ 
7.2.1 seems to be relevant to this but I couldn't understand/apply the 
answer because ...".

Something like this would make it very clear up front what type of 
questions are not appropriate.  (I'm not trying at all to dictate the 
policy, but as far as I can tell, the above summarizes the attitude of the 
majority of very knowledgeable helpers that respond to questions on R-help.)

Also, I think that John Maindonald's idea of an "I am new to R, where do I 
start?" page, with a link from the posting guide, is an excellent idea.

I'm aware that some feel that the posting guide is already too long, but my 
feeling is that if users don't read a very easily accessible posting guide 
AND post inappropriate questions AND become offended by brusque responses, 
then they are beyond where they can easily be helped.  The most important 
thing is to make it very clear what types of questions are and are not 
considered appropriate, so that beginning users know what they are getting 
into.

And the following might merit inclusion in the FAQ:
Why is R-help not for hand-holding beginner questions?
R-help is a high traffic list and the general sentiment is that too many 
very simple questions will overwhelm everyone and most importantly result 
in the knowledgeable helpers ceasing to participate.  The reason that there 
is no "R-help-me-quickly-I-dont-want-to-read-the-documentation" list is 
that no-one has felt that it would work well -- it is unlikely that many 
knowledgeable users of R would be willing to participate.  Without such 
users participating, it is likely that sometimes bad advice would be 
offered and stand uncorrected, because R is a complex language with many 
ways of doing things, some markedly inferior to others.  For these reasons, 
some feel it would be a very bad idea to create such a list.  (However, 
anyone who believes otherwise and wishes to start and maintain such a list 
or other similar service is free to do so.)  One reason for this overall 
state of affairs is that R is free software and consequently there is no 
revenue stream to support a hand-holding support service with paid 
employees.  So although the actual software is free, some investment in 
terms of time spent reading documentation is required in order to use 
it.  Furthermore, many of the frequent helpers on R-help have written 
introductory documents intended to help beginners with many aspects of 
learning and using R (e.g., "An Introduction to R", and the various 
FAQs).  Consequently they sometimes get fed up getting asked again and 
again the same question they have already written a document to 
explain.  Nonetheless, the general sentiment on R-help is very helpful -- a 
quote summarizes it well: "It's OK if you need some spoonfeeding (I need 
that quite often myself), but at least show how you have tried to use the 
spoon yourself, instead of just showing us your open mouth."  [Attribution 
to Andy Liaw, or remain anonymous?]

As some feel that sufficient time and bandwidth has already been spent on 
this issue, if anyone has any comments on this particular matter of an 
addition to the posting guide (or FAQ), feel free to choose to respond to 
me privately, and I will summarize as appropriate.

-- Tony Plate


Re: [R] hashing using named lists

2004-11-18 Thread Tony Plate
Use match() for exact matching,
i.e.,
> test[[match("name", names(test))]]
Yes, it is more cumbersome.  This partial matching is considered by some to 
be a design fault, but changing it would break too many programs that 
depend upon it.
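A small wrapper around the match() idiom that makes the no-match case explicit by returning NULL. The name get.exact is hypothetical. (In later versions of R, '[[' matches character indices exactly by default, through its 'exact' argument, so test[["na"]] returns NULL, while '$' still does partial matching.)

```r
test <- list()
test$name <- c(1, 2, 3)
test$na              # partial matching: returns 1 2 3

# Hypothetical helper: exact-name lookup, NULL when there is no exact match
get.exact <- function(lst, key) {
  idx <- match(key, names(lst))
  if (is.na(idx)) NULL else lst[[idx]]
}
get.exact(test, "na")    # NULL
get.exact(test, "name")  # 1 2 3
```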

I don't understand your question about all.equal.list() -- it does seem to 
require exact matches on names, e.g.:

> all.equal(list(a=1:3), list(aa=1:3))
[1] "Names: 1 string mismatches"
> all.equal(list(aa=1:3), list(a=1:3))
[1] "Names: 1 string mismatches"
>
(the above run in R 2.0.0)
-- Tony Plate
(BTW, in R this operation is generally called "indexing" or "subscripting" 
or "extraction", but not "hashing".  "Hashing" is a specific technique for 
managing and looking up indices, which is why some other programming 
languages refer to list-like objects that are indexed by character strings 
as "hashes".  I don't think hashing is used for list names in R, but 
someone please correct me if I'm wrong!)

At Thursday 09:29 AM 11/18/2004, ulas karaoz wrote:
hi all,
I am trying to use named list to hash a bunch of vector by name, for instance:
test = list()
test$name = c(1,2,3)
the problem is that when i try to get the values back by using the name, 
the matching isn't done in an exact way, so
test$na is not NULL.

is there a way around this?
Why doesn't all.equal.list require an exact match by default?
How can I do hashing in R?
thanks.
ulas.


Re: [R] Resources for optimizing code

2004-11-05 Thread Tony Plate
Have you tried reading the manual "An Introduction to R", with special 
attention to "Array Indexing" (indexing for data frames is pretty similar 
to indexing for matrices)?

Unless I'm misunderstanding, what you want to do is very simple.  It is 
possible to use numeric vectors with 0 and 1 to indicate whether you want 
to keep the row, but it's a little easier with logical vectors.  Here's an 
example:

> x <- data.frame(a=1:5,b=letters[1:5])
> keep.num <- ifelse(x$a %% 2 == 1, 1, 0)
> keep.num
[1] 1 0 1 0 1
> keep.logical <- (x$a %% 2) == 1
> keep.logical
[1]  TRUE FALSE  TRUE FALSE  TRUE
> x[keep.num==1,,drop=F]
  a b
1 1 a
3 3 c
5 5 e
> x[keep.logical,,drop=F]
  a b
1 1 a
3 3 c
5 5 e
>
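Put in terms of the function from the question, the whole loop reduces to one indexing step. This sketch keeps the same name and arguments as remove.xtra.f() in the question; it keeps rows where asst is 1 and drops rows where asst is 0 or NA.

```r
# Vectorized replacement for the loop: keep rows where asst == 1
remove.xtra.f <- function(asst, DATAFRAME) {
  DATAFRAME[!is.na(asst) & asst == 1, , drop = FALSE]
}

x <- data.frame(a = 1:5, b = letters[1:5])
remove.xtra.f(c(1, 0, 1, NA, 1), x)   # rows 1, 3 and 5
```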

At Friday 10:34 AM 11/5/2004, Janet Elise Rosenbaum wrote:
I want to eliminate certain observations in a large dataframe (21000x100).
I have written code which does this using a binary vector (0=delete obs,
1=keep), but it uses for loops, and so it's slow and in the extreme it
causes R to hang for indefinite time periods.
I'm looking for one of two things:
1.  A document which discusses how to avoid for loops and situations in
which it's impossible to avoid for loops.
or
2.  A function which can do the above better than mine.
My code is pasted below.
Thanks so much,
Janet
# asst is a binary vector of length= nrow(DATAFRAME).
# 1= observations you want to keep.  0= observation to get rid of.
remove.xtra.f <-function(asst, DATAFRAME) {
n<-sum(asst, na.rm=T)
newdata<-matrix(nrow=n, ncol=ncol(DATAFRAME))
j<-1
for(i in 1:length(data)) {
if (asst[i]==1) {
newdata[j,]<-DATAFRAME[i,]
j<-j+1
}
}
newdata.f<-as.data.frame(newdata)
names(newdata.f)<-names(DATAFRAME)
return(newdata.f)
}
--
Janet Rosenbaum [EMAIL PROTECTED]
PhD Candidate in Health Policy, Harvard GSAS
Harvard Injury Control Research Center, Harvard School of Public Health


Re: [R] Reading word by word in a dataset

2004-11-01 Thread Tony Plate
Trying to make it work when not all rows have the same numbers of fields 
seems like a good place to use the "flush" argument to scan() (to skip 
everything after the first field on the line):

With the following copied to the clipboard:
i1-apple        10$   New_York
i2-banana
i3-strawberry    7$   Japan
do:
> scan("clipboard", "", flush=T)
Read 3 items
[1] "i1-apple"  "i2-banana" "i3-strawberry"
> sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=T))
Read 3 items
[1] "apple"  "banana" "strawberry"
>
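The same approach can be tried without the clipboard by reading from a text connection, which is convenient in a script. This sketch assumes the fields are separated by whitespace as in the example; with a real file, 'lines' would come from readLines() instead.

```r
lines <- c("i1-apple        10$   New_York",
           "i2-banana",
           "i3-strawberry    7$   Japan")
con <- textConnection(lines)
# flush = TRUE skips the rest of each line after its first field
first.fields <- scan(con, what = "", flush = TRUE, quiet = TRUE)
close(con)
words <- sub("^[A-Za-z0-9]*-", "", first.fields)
words  # "apple" "banana" "strawberry"
```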
-- Tony Plate
At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:
 Uwe and Andy's solutions are great for many applications but won't 
work if not all rows have the same numbers of fields.  Consider for 
example the following modification of Lee's example:
i1-apple        10$   New_York
i2-banana
i3-strawberry    7$   Japan

 If I copy this to "clipboard" and run Andy's code, I get the following:
> read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
dec,  :
   line 2 did not have 3 elements

 We can get around this using "scan", then splitting things apart 
similar to the way Uwe described:
> dat <-
+ scan("clipboard", character(0), sep="\n")
Read 3 items
> dash <- regexpr("-", dat)
> dat2 <- substring(dat, pmax(0, dash)+1)
>
> blank <- regexpr(" ", dat2)
> if(any(blank<0))
+   blank[blank<0] <- nchar(dat2[blank<0])
> substring(dat2, 1, blank)
[1] "apple "  "banana"  "strawberry "

 hope this helps.  spencer graves
Uwe Ligges wrote:
Liaw, Andy wrote:
Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:

read.table("clipboard", colClasses=c("character", "NULL", "NULL"))

 V1
1  i1-apple
2 i2-banana
3 i3-strawberry

... and if only the words after "-" are of interest, the statement can be 
followed by

 sapply(strsplit(, "-"), "[", 2)
Uwe Ligges

HTH,
Andy

From: j lee
Hello All,
I'd like to read first words in lines into a new file.
If I have a data file the following, how can I get the
first words: apple, banana, strawberry?
i1-apple        10$   New_York
i2-banana        5$   London
i3-strawberry    7$   Japan
Is there any similar question already posted to the
list? I am a bit new to R, having a few months of
experience now.
Cheers,
John

--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567


Re: [R] make apply() return a list

2004-11-01 Thread Tony Plate
for()-loops aren't so bad.  Look inside the code of apply() and see what it 
uses!  The important thing is that you use vectorized functions to 
manipulate vectors.  It's often fine to use for-loops to manipulate the 
rows or columns of a matrix, but once you've extracted a row or a column, 
then use a vectorized function to manipulate that data.

In any case, one way to get apply() to return a list is to wrap the result 
from the subfunction inside a list, e.g.:

> x <- apply(matrix(1:6,2), 1, function(x) list(c(mean=mean(x), sd=sd(x))))
> x
[[1]]
[[1]][[1]]
mean   sd
   3    2

[[2]]
[[2]][[1]]
mean   sd
   4    2

> # to remove the extra level of listing here, do:
> lapply(x, "[[", 1)
[[1]]
mean   sd
   3    2

[[2]]
mean   sd
   4    2

>
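For the problem in the question, another option is to use lapply() over the row numbers, which always returns a list whatever the dimensions of the per-row results. A sketch, with rowMatrix a hypothetical placeholder for the real per-row calculation:

```r
myData <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
rowMatrix <- function(values) matrix(values, 2, 2)  # placeholder calculation
myList <- lapply(seq_len(nrow(myData)), function(i)
  rowMatrix(unlist(myData[i, ], use.names = FALSE)))
names(myList) <- rownames(myData)   # "1", "2", "3", as in the apply() version
myList[["1"]]
#      [,1] [,2]
# [1,]    1    1
# [2,]    4    4
```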
At Monday 11:37 AM 11/1/2004, Arne Henningsen wrote:
Hi,
I have a dataframe (say myData) and want to get a list (say myList) that
contains a matrix for each row of the dataframe myData. These matrices are
calculated based on the corresponding row of myData. Using a for()-loop to do
this is very slow. Thus, I tried to use apply(). However, afaik apply() does
only return a list if the matrices have different dimensions, while my
matrices have all the same dimension. To get a list I could change the
dimension of one matrix artificially and restore it after apply():
This is a (very much) simplified example of what I did:
> myData <- data.frame( a = c( 1,2,3 ), b = c( 4,5,6 ) )
> myFunction <- function( values ) {
+myMatrix <- matrix( values, 2, 2 )
+if( all( values == myData[ 1, ] ) ) {
+   myMatrix <- cbind( myMatrix, rep( 0, 2 ) )
+}
+return( myMatrix )
+ }
> myList <- apply( myData, 1, myFunction )
> myList[[ 1 ]] <- myList[[ 1 ]][ 1:2, 1:2 ]
> myList
$"1"
     [,1] [,2]
[1,]    1    1
[2,]    4    4
$"2"
     [,1] [,2]
[1,]    2    2
[2,]    5    5
$"3"
     [,1] [,2]
[1,]    3    3
[2,]    6    6
This exactly does what I want and really speeds up the calculation, but I
wonder if there is an easier way to make apply() return a list.
Thanks for your help,
Arne
--
Arne Henningsen
Department of Agricultural Economics
University of Kiel
Olshausenstr. 40
D-24098 Kiel (Germany)
Tel: +49-431-880 4445
Fax: +49-431-880 1397
[EMAIL PROTECTED]
http://www.uni-kiel.de/agrarpol/ahenningsen/


Re: [R] why should you set the mode in a vector?

2004-10-29 Thread Tony Plate
It's useful when you need to be certain of the mode of a vector.  One such 
situation is when you are about to call a C-language function using the 
.C() interface.  As you point out, some assignments (even just to vector 
elements) can change the mode of the entire vector.  This is why it's 
important to check the mode of vectors passed to external language 
functions immediately before the call.
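A minimal sketch of such a check, to be called just before .C(). The helper name as.c.double is hypothetical; it refuses non-numeric data (including logical vectors) and otherwise makes sure the storage mode is double.

```r
# Hypothetical pre-.C() check: stop if x is not numeric, otherwise
# make sure it is stored as double precision.
as.c.double <- function(x) {
  if (!is.numeric(x))
    stop("expected numeric data but got mode '", mode(x), "'")
  storage.mode(x) <- "double"
  x
}

storage.mode(as.c.double(1:3))   # "double" (integers are converted)
try(as.c.double(c("0", "foo")))  # error: character data is rejected
```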

As to what assigning the mode does, it specifies (or changes, if necessary) 
the underlying type of storage of the vector.  In R, all the elements in a 
vector have the same storage mode.  In the example below, the storage is 
initially double-precision floats, but after the assignment of character 
data to element 2, the vector is stored as character data (with suitably 
coerced values of the other elements).  After assignment of list data to 
element 1, the entire vector becomes a list (i.e., a vector of pointers to 
general objects).  [The terminology I'm using here is a little loose, but 
someone please correct me if it is outright wrong.]  Finally, the assigning 
of mode "numeric" to the list fails because not all elements can be 
coerced.  (And I'm not sure why the last assignment succeeds and produces 
the results it does.)

> v <- vector(mode="numeric",length=4)
> v[3:4] <- 3:4
> storage.mode(v)
[1] "double"
> v[2] <- "foo"
> v
[1] "0"   "foo" "3"   "4"
> storage.mode(v)
[1] "character"
>
> v[1] <- list(1:3)
> v
[[1]]
[1] 1 2 3
[[2]]
[1] "foo"
[[3]]
[1] "3"
[[4]]
[1] "4"
> mode(v) <- "numeric"
Error in as.double.default(list(as.integer(c(1, 2, 3)), "foo", "3", "4")) :
(list) object cannot be coerced to double
> x <- v[2:4]
> mode(x) <- "numeric"
> x
[1] NA NA NA
>
-- Tony Plate
At Friday 03:41 PM 10/29/2004, Joel Bremson wrote:
Hi all,
If I write
v = vector(mode="numeric",length=10)
I'm still allowed to assign non-numerics to v.
Furthermore, R figures out what kind of vector I've got anyway
when I use the mode() function.
So what is it that assigning a mode does?
Joel


Re: [R] gsub() on Matrix

2004-10-28 Thread Tony Plate
Many more recent regular expression implementations have ways of indicating 
a match on a word boundary.  It's usually "\b".

Here's what you did:
> gsub("x1", "i1", "x1 + x2 + x10 + xx1")
[1] "i1 + x2 + i10 + xi1"
The following worked for me to just change "x1" to "i1", while leaving 
alone any larger "word" that contains "x1":

> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1")
[1] "i1 + x2 + x10 + xx1"
>
Note that the backslash must be escaped itself to get past the R lexical 
analyser, which is independent of the regexp processor.  What the regexp 
processor sees is just a single backslash.

For more on this, look for perl documentation of regular expressions.  Be 
aware that to use full perl regexps, you must supply the perl=T argument to 
gsub().  Also note that "\b" seems to be part of the most basic regular 
expression language in R; it even works with extended=F:

> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=T)
[1] "i1 + x2 + x10 + xx1"
> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F)
[1] "i1 + x2 + x10 + xx1"
> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F, ext=F)
[1] "i1 + x2 + x10 + xx1"
>
(I assumed the fact that you have a matrix of strings is not relevant.)
Hope this helps,
Tony Plate
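Applied to the original question, one way is to loop over the pairs of names, wrapping each name in word boundaries. This is a sketch using perl = TRUE; the three-element matrix is a small made-up stand-in for the poster's matrix. Replacing one name at a time is safe here only because none of the replacement strings contains a name that a later pattern would match.

```r
orig <- paste("x", 1:14, sep = "")
repl <- c("i7", "i14", "i13", "d2", "i8", "i5", "i6",
          "i3", "A", "i9", "i2", "i4", "i15", "i21")
mat <- matrix(c("x1 + x3 + x4 + x5 + x1:x3 + x1:x4",
                "x1 + x2 + x3 + x5 + x1:x2 + x1:x5",
                "x10 + x1:x10"), ncol = 1)
# \\b (word boundary) stops "x1" from matching inside "x10";
# mat[] <- keeps the matrix dimensions.
for (k in seq_along(orig))
  mat[] <- gsub(paste("\\b", orig[k], "\\b", sep = ""), repl[k], mat, perl = TRUE)
mat[, 1]
```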
At Wednesday 09:07 PM 10/27/2004, Kevin Wang wrote:
Hi,
Suppose I've got a matrix, and the first few elements look like
  "x1 + x3 + x4 + x5 + x1:x3 + x1:x4"
  "x1 + x2 + x3 + x5 + x1:x2 + x1:x5"
  "x1 + x3 + x4 + x5 + x1:x3 + x1:x5"
and so on (have got terms from x1 ~ x14).
If I want to replace all the x1 with i7, all x2 with i14, all x3 with i13,
for example.  Is there an easy way?
I tried to put what I want to replace in a vector, like:
 repl = c("i7", "i14", "i13", "d2", "i8", "i5",
  "i6", "i3", "A", "i9", "i2",
  "i4", "i15", "i21")
and have another vector, say:
  > orig
 [1] "x1"  "x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10"
[11] "x11" "x12" "x13" "x14"
Then I tried something like
  gsub(orig, repl, mat)
## mat is the name of my matrix
but it didn't work *_*.it would replace terms like x10 with i70.
(I know it may be an easy question...but I haven't done much regular
expression)
Cheers,
Kevin

Ko-Kang Kevin Wang
PhD Student
Centre for Mathematics and its Applications
Building 27, Room 1004
Mathematical Sciences Institute (MSI)
Australian National University
Canberra, ACT 0200
Australia
Homepage: http://wwwmaths.anu.edu.au/~wangk/
Ph (W): +61-2-6125-2431
Ph (H): +61-2-6125-7407
Ph (M): +61-40-451-8301
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] indexing problem

2004-10-19 Thread Tony Plate
Maybe this does what you want:
> dm <- cbind(1:2,11:12,101:102)
> idx <- cbind(c(1,2),c(2,3))
> row(idx)
     [,1] [,2]
[1,]    1    1
[2,]    2    2
> cbind(as.vector(row(idx)), as.vector(idx))
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    1    2
[4,]    2    3
> dm[cbind(as.vector(row(idx)), as.vector(idx))]
[1]   1  12  11 102
> array(dm[cbind(as.vector(row(idx)), as.vector(idx))], dim=dim(idx))
     [,1] [,2]
[1,]    1   11
[2,]   12  102
>
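The same steps can be wrapped in a function that also accepts a data frame. This is a sketch; pick_cols is a hypothetical name, and it assumes that converting a data frame with as.matrix() is acceptable (a data frame with mixed column types becomes a character matrix).

```r
# Hypothetical helper (not part of R): for each row i of dm, take the
# columns listed in row i of idx, returning a matrix shaped like idx.
pick_cols <- function(dm, idx) {
  dm  <- as.matrix(dm)   # a data frame of mixed types becomes character
  idx <- as.matrix(idx)
  stopifnot(nrow(idx) == nrow(dm))
  # Two-column (row, column) index matrix, in column-major order of idx
  ij <- cbind(as.vector(row(idx)), as.vector(idx))
  matrix(dm[ij], nrow = nrow(idx), ncol = ncol(idx))
}

dm  <- cbind(1:2, 11:12, 101:102)
idx <- cbind(c(1, 2), c(2, 3))
pick_cols(dm, idx)
pick_cols(as.data.frame(dm), idx)   # same values for a data frame
```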
At Tuesday 12:43 PM 10/19/2004, you wrote:
Ah, sorry, here's an example:
> dm = cbind(1:2,11:12,101:102)
> dm
     [,1] [,2] [,3]
[1,]    1   11  101
[2,]    2   12  102
> idx=cbind(c(1,2),c(2,3))
> idx
     [,1] [,2]
[1,]    1    2
[2,]    2    3
The result I want to get:
 1   11
12  102
That is, each row of idx gives the column indices to take from the
corresponding row of dm.
diana
Sundar Dorai-Raj wrote:
[EMAIL PROTECTED] wrote:
Hi,
I have the following indexing problem; can you help me, please?
Given:
dm = a data.frame or a matrix,
idx = a matrix with 2 (or any number of) columns and the same number of rows as dm,
I want to get a subset of dm: for each row, the columns specified by idx.
thank you, diana
Diana,
  From what I gather, it appears that you want to split dm by the 
unique rows of idx. Is that right? If so, you can do the following:
x <- split(dm, do.call("paste", as.data.frame(idx)))
This will split dm into a list with each element a subset of dm 
corresponding to a unique row in idx. The length of x will be the 
number of unique rows in idx.
If this is not what you want, please provide an example and what you 
expect to see.
--sundar
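A small run of the split() approach suggested above, using made-up data and a data frame for dm, so that split() keeps whole rows together:

```r
# Made-up data: 4 rows, with the rows of idx following two distinct patterns
dm  <- data.frame(a = 1:4, b = 11:14)
idx <- cbind(c(1, 1, 2, 2), c(2, 2, 3, 3))

# One key string per row of idx, e.g. "1 2"; rows with equal keys go together
key <- do.call("paste", as.data.frame(idx))
x <- split(dm, key)
names(x)    # "1 2" "2 3"
length(x)   # 2, the number of unique rows of idx
```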



Re: [R] read "4-jan-02" as date

2004-10-11 Thread Tony Plate
Works fine when you give as.Date() a character vector.  I suspect the Date 
column in your data frame is a factor.

> d <- c("12-Jan-01", "11-Jan-01", "10-Jan-01", "9-Jan-01", "8-Jan-01", 
+ "5-Jan-01")
> d
[1] "12-Jan-01" "11-Jan-01" "10-Jan-01" "9-Jan-01"  "8-Jan-01"  "5-Jan-01"
> as.Date(d, format="%d-%b-%y")
[1] "2001-01-12" "2001-01-11" "2001-01-10" "2001-01-09" "2001-01-08"
[6] "2001-01-05"
> as.Date(factor(d), format="%d-%b-%y")
Error in fromchar(x) : character string is not in a standard unambiguous format
>
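So if the column is a factor, converting it with as.character() before calling as.Date() avoids the problem. This is a sketch of doing that and then sorting the whole data frame by date; the value column is made up for illustration, and %b assumes the current locale uses English month abbreviations.

```r
# Dates as they might arrive in a data frame column: a factor
df <- data.frame(Date = factor(c("12-Jan-01", "11-Jan-01", "10-Jan-01",
                                 "9-Jan-01", "8-Jan-01", "5-Jan-01")),
                 value = 1:6)

# Convert the factor to character first, then parse with an explicit format.
# %b matches abbreviated month names in the current locale (assumed English).
df$Date <- as.Date(as.character(df$Date), format = "%d-%b-%y")

# Sort the whole data frame by date
df <- df[order(df$Date), ]
df
```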

Hope this helps,
Tony Plate
At Monday 09:04 AM 10/11/2004, bogdan romocea wrote:
Dear R users,
I have a column with dates (character) in a data frame:
12-Jan-01 11-Jan-01 10-Jan-01 9-Jan-01  8-Jan-01  5-Jan-01
and I need to convert them to (Julian) dates so that I can
sort the whole data frame by date. I thought it would be
very simple, but after checking the documentation and the
list I still don't have something that works.
1. as.Date returns the error below. What am I doing wrong?
As far as I can see the character strings are in standard
format.
d$Date <- as.Date(d$Date, format="%d-%b-%y")
Error in fromchar(x) : character string is not in a
standard unambiguous format
2. as.date {survival} produces this error:
d$Date <- as.date(d$Date, order = "dmy")
Error in as.date(d$Date, order = "dmy") : Cannot coerce to
date format
3. If all else fails, is there a text function
similar to SCAN in SAS? Given a string like "9-Jan-01" and
"-" as separator, I'd like a function that can read the
first, second and third values (9, Jan, 01), so that I can
get Julian dates with mdy.date {survival}.
Thanks in advance,
b.
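For question 3, base R's strsplit() splits strings at a separator. A minimal sketch follows; treating two-digit years as 2000-2099 is an assumption, and month.abb is R's built-in vector of English month abbreviations. As Tony's reply shows, as.Date() with an explicit format is usually simpler once the column is character.

```r
s <- c("12-Jan-01", "9-Jan-01")
# Split each string at "-" and stack the pieces as rows of a character matrix
parts <- do.call(rbind, strsplit(s, "-", fixed = TRUE))
day <- as.integer(parts[, 1])
mon <- match(parts[, 2], month.abb)     # month.abb is "Jan", "Feb", ...
yr  <- 2000L + as.integer(parts[, 3])   # assumes all years are 2000-2099
```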

