[R] encoding accents and tildes in R Mac OS X
Hello,

In R under Mac OS X 10.5.4 I have had problems reading a data.frame whose characters include tildes and accents. For instance, Floreña is changed to Flore\x96a and Ranchería is changed to Rancher\x92a.

The code is:

    section<-read.table('Sectiondic.txt',sep='\t',header=T,stringsAsFactors=F,encoding=" ")

I have tried changing the "encoding" argument but have not been able to find a solution. Any suggestions?

Thanks a lot,
Carlos Cuartas

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
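The escaped bytes in the question are consistent with a file saved in the classic Mac OS Roman encoding, where byte 0x96 is ñ and byte 0x92 is í. A minimal sketch, assuming that this really is the file's encoding and that the local iconv knows it under the name "macintosh" (the name is platform dependent; iconvlist() shows what is available):

```r
# Strings carrying the raw bytes shown in the question (assumed Mac OS Roman).
raw_names <- c("Flore\x96a", "Rancher\x92a")

# Convert from Mac OS Roman to UTF-8.
fixed <- iconv(raw_names, from = "macintosh", to = "UTF-8")
print(fixed)

# The same idea applied while reading the file named in the question:
# section <- read.table("Sectiondic.txt", sep = "\t", header = TRUE,
#                       stringsAsFactors = FALSE, fileEncoding = "macintosh")
```

If the conversion returns NA, the encoding name is probably not known on that platform and a different alias would have to be tried.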
[R] Mode value
Hello everyone,

I would like to know whether there is a function to calculate the mode (the most frequent value), or whether I have to write one myself.

Thanks so much,
Carlos
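Base R's mode() reports the storage mode of an object, not the statistical mode, so a small helper is one option. A minimal sketch (the name stat_mode is made up for illustration) that returns every most frequent value, so ties give more than one result:

```r
# Return the most frequent value(s) of a vector; ties are all returned.
stat_mode <- function(x) {
  ux <- unique(x)                      # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))     # how often each distinct value occurs
  ux[counts == max(counts)]
}

stat_mode(c(1, 2, 2, 3))
stat_mode(c("a", "b", "a", "b"))
```

Returning all tied values is a design choice; taking the first element of the result would give a single mode instead.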
[R] Problem with read.table in Windows and Linux
Hello everyone,

I'm trying to open the same file under Linux and Windows. Under Windows everything is OK, but under Linux I get an error and I don't know why. This is the error:

    Error in make.names(col.names,unique=TRUE): invalid multibyte string 1

Under Windows I write:

    zz.info<-read.table(file("C:/Documents and Settings/Administrador/Desktop/carlos/aCGH/aCGH/examples/Anal_sin_norm_1Mb86_Segmentos3.txt","r+"),header=TRUE,sep="\t",dec=".")

and under Linux:

    zz.info<-read.table(file("/home/carlos/Desktop/Anal_sin_norm_1Mb86_Segmentos3.txt","r+"),header=TRUE,sep="\t",dec=".")

Why do I have problems under Linux? If you need the text file, let me know.

Thanks so much for your help,
Carlos
[R] Error in X11
Hello everyone,

I'm trying to draw a plot in Linux. When I type X11() I get the following error:

    Error in X11(d$display, d$width, d$height, d$pointsize, d$gamma, d$colortype, : unable to start device X11cairo

Why? What must I do to fix it?

Thanks so much,
Carlos
[R] problem with plotting table
Hi all,

Let us have:

    x<-1:10
    y<-x/2
    plot(table(x), type="p")
    points(table(y), pch=2)

Why does the last command plot the values of table(y) using the x coordinates of table(x)? Am I doing something wrong? What would be a way of plotting the points of table(y) in their proper place?

    #this problem also occurs with:
    plot(table(y), type="p")
    points(table(x), pch=2)

Thank you very much,
Carlos
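One way to sidestep the issue, whatever a given R version does when a table is passed to points(), is to supply explicit coordinates: positions from the table names and heights from the counts. A minimal sketch (the variable names are illustrative, and this is not necessarily the fix that was suggested on the list):

```r
x <- 1:10
y <- x / 2
tx <- table(x)
ty <- table(y)

# Recover the numeric positions from the table names.
x_pos <- as.numeric(names(tx))
y_pos <- as.numeric(names(ty))

# Plot counts against their actual values, with an axis wide enough for both.
plot(x_pos, as.vector(tx), type = "p",
     xlim = range(c(x_pos, y_pos)), xlab = "value", ylab = "count")
points(y_pos, as.vector(ty), pch = 2)
```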
Re: [R] problem with plotting table
Thank you very much Duncan, that did the trick. Thank you also Gavin and Bernardo for your feedback.

Best regards,
Carlos
[R] Writing multiple matrices
Hello all :)

I have a for loop in which each cycle creates a matrix object, let's say X. I would like to write it out with the write.table function, but I want as many files as there are cycles. That is, I would like to use the loop variable, let's say y, as in:

    for (y in 1:100)

and then, when I write the matrix to a file, produce files with different names, for example:

    my_matrix_1.dat
    my_matrix_2.dat
    my_matrix_3.dat
    . . .
    my_matrix_100.dat

Is there any way to do this with the write.table function?

Thank you very much in advance,
Carlos
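The file name can be built from the loop variable with sprintf() (or paste()) and passed to write.table(). A minimal sketch, assuming a random matrix stands in for X, three cycles instead of 100, and a temporary directory as the output location:

```r
out_dir <- tempdir()   # assumed output location, for illustration only

for (y in 1:3) {
  X <- matrix(rnorm(6), nrow = 2)      # placeholder for the real matrix
  out_file <- file.path(out_dir, sprintf("my_matrix_%d.dat", y))
  write.table(X, file = out_file, row.names = FALSE, col.names = FALSE)
}

list.files(out_dir, pattern = "^my_matrix_")
```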
Re: [R] character count
Hi Ista,

one way could be:

    ncharacters<-unlist(lapply(x,function(x)nchar(gsub(' ','',x))))
    ncharacters

From: Ista Zahn
To: r-help@r-project.org
Sent: Friday, December 12, 2008 10:31:10 AM
Subject: [R] character count

Dear list,

I have a variable that consists of typed responses. I wish to compute a variable equal to the number of characters in the original variable. For example:

> x <- c("convert this to 32 because it has 32 characters", "this one has 22 characters", "12 characters")
[Some magic function here]
> x
[1] 32 22 12

Any ideas?
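Since nchar() and gsub() are both vectorised, the lapply() wrapper is optional. A minimal sketch on the example vector from the question (the result names are illustrative), counting characters with and without spaces:

```r
x <- c("convert this to 32 because it has 32 characters",
       "this one has 22 characters",
       "12 characters")

n_with_spaces <- nchar(x)                                   # every character
n_without_spaces <- nchar(gsub(" ", "", x, fixed = TRUE))   # spaces removed
n_without_spaces
```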
Re: [R] Useful books for learning the R software and the S programming language
Robert,

I have Peter's book and I think it can be a very good place to start from... despite the discount... :)

If you like spatial analysis you can try Roger Bivand et al., "Applied Spatial Data Analysis with R". If you are into something else, try the "Use R" collection from Springer... you may find something not that "pricey" that you can use.

Best regards,
Carlos

On 2009/01/12, at 22:07, Peter Dalgaard wrote:

Robert Wilk wrote:
> any useful books for learning the R statistical software? are they pricey?

Many. "Useful" depends on the reader, though, so look around. Here's a starting point:

http://www.r-project.org/doc/bib/R-books.html

(modesty should forbid me to point at item 18 on the list and the fact that Amazon US has it currently 19% discounted)

In general R books are cheaper than statistical monographs, but more expensive than the large market computer science books.

> and if the books recommended focus on S, how compatible will they be for someone learning R?

Such books are strongly outnumbered by now. One important book from that group, Venables+Ripley's Modern Applied Statistics with S, explicitly addresses R issues.

> thank you in advance for your help.
> P.S. specialized survey statistical procedures? Is R good at that?

Not R in itself, but the "survey" package for it is rumoured to be state of the art, and its author has a book on it in its final stages.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
[R] number of Mondays
Dear all,

I'm trying to calculate the number of Mondays, Tuesdays, etc. in each month within a date range. I have time series data that spans 60 months, and I want to count the Mondays, Tuesdays, Wednesdays, etc. of each month. (I want to control for weekly seasonality, but my data is monthly.)

Is there an easy way to do this in R, or is there a package I could use? I did a quick search in the help files and R sites but could not find any answers.

I appreciate any hint you could give, thanks.
Carlos
Re: [R] number of Mondays
Indeed, I overlooked weekdays. Thank you all for your replies!

On Jan 15, 2009, at 21:23, Prof Brian Ripley wrote:

Or for those not allergic to reading help, see ?weekdays . Just how hard do you have to work to miss that? E.g. ??day works.

On Thu, 15 Jan 2009, Peter Dalgaard wrote:

Carlos Hernandez wrote:

dear All, i'm trying to calculate the number of Mondays, Tuesdays, etc that each month within a date range has. I have time series data that spans 60 months and i want to calculate the number of Mondays, Tuesdays, Wed, etc of each month. (I want to control for weekly seasonality but my data is monthly). Is there an easy way to do this in R? or is there a package i could use? i did some quick search in the help files and R sites but could not find any answers. i appreciate any hint you could give,

This is where POSIXlt objects are useful:

> unlist(unclass(as.POSIXlt(ISOdate(1959,3,11))))
  sec   min  hour  mday   mon  year  wday  yday isdst
    0     0    12    11     2    59     3    69     0

Which means that I was born on a Wednesday (wday==3) in March (mon==2) (some of the fields count from 0 and others, like mday, from 1; presumably some UNIX vendor back in the Stone Age got their implementation turned into a standard...).

This allows you to do stuff like:

> dd <- seq(Sys.Date(),as.Date("2009-3-11"),1)
> dd <- as.POSIXlt(dd)
> with(dd, table(mon,wday))
   wday
mon 0 1 2 3 4 5 6
  0 2 2 2 2 3 3 3
  1 4 4 4 4 4 4 4
  2 2 2 2 2 1 1 1

which I think is pretty much what you were looking for.

thanks.
Carlos

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
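The same POSIXlt idea extends to a 60-month span by tabulating a year-month label against the weekday. A minimal sketch, assuming the range 2004 to 2008 purely for illustration (wday runs from 0 for Sunday to 6 for Saturday):

```r
# Every day in an assumed 60-month range.
dates <- seq(as.Date("2004-01-01"), as.Date("2008-12-31"), by = "day")
lt <- as.POSIXlt(dates)

# Rows are months ("YYYY-MM"), columns are weekdays 0 (Sunday) to 6 (Saturday).
counts <- table(month = format(dates, "%Y-%m"), wday = lt$wday)
head(counts)
```

Using a "YYYY-MM" label rather than lt$mon keeps the same calendar month in different years apart.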
[R] permutation of the rows of a matrix that minimizes the distance to another matrix
Hi!

I have 2 matrices of numbers, m1 and m2, with the same number of columns and rows. I would like to compute m2', the permutation of the rows of m2 such that the distance (e.g., sum(m1-m2') or sum((m1-m2')^2)) is minimized.

Do you know of any function/algorithm to obtain such a permutation?

Best regards,
Carlos
[R] van der Corput sequences
Alberto,

I think the functions below do what you want:

> vanDerCorput(12,6)
[1] 0.1667 0.3333 0.5000 0.6667 0.8333 0.0278
[7] 0.1944 0.3611 0.5278 0.6944 0.8611 0.0556

Regards,
Carlos

number2digits=function(n,base){
  #first digit in output is the least significant
  digit=n%%base
  if (n [...]
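The van der Corput sequence in a given base is obtained by writing each index in that base and mirroring its digits around the radix point. A minimal sketch of that idea; the function name van_der_corput and its structure are illustrative assumptions, not a reconstruction of the code posted above:

```r
# First n terms of the van der Corput sequence in the given base.
van_der_corput <- function(n, base = 2) {
  sapply(seq_len(n), function(i) {
    value <- 0
    denom <- base
    while (i > 0) {
      value <- value + (i %% base) / denom   # least significant digit first
      i <- i %/% base
      denom <- denom * base
    }
    value
  })
}

round(van_der_corput(12, 6), 4)
```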
[R] Datasets in R
I'm trying to find datasets that, after applying the lm function, give residuals showing non-normality, non-linearity and heteroscedasticity, so I can illustrate those cases in the linear regression model. Can you give any advice on which datasets would be appropriate? I can't use the ones in the alr3 package because those have already been seen in class.

Thank you very much :-)
natorro
[R] left-aligned title?
Hi,

I am trying to insert a letter in a plot corner outside the plotting area. Thus, "legend" and "text" don't seem to work. "title" does the trick, but I cannot find a way of moving it from the center to the left corner... I already tried a few parameters from par, but title does not take them.

Would anyone have an idea on how to pull this one off?

Thank you very much,
Carlos Gershenson
http://homepages.vub.ac.be/~cgershen/
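One option that may do this is mtext(), which writes into the margins and takes adj to control justification. A minimal sketch, assuming a panel letter "A" in the top margin, aligned with the left edge of the plot region:

```r
plot(1:10, main = "")

# side = 3 is the top margin; adj = 0 left-justifies the text.
mtext("A", side = 3, line = 1, adj = 0, font = 2, cex = 1.5)

# title() also accepts adj, which would move the main title itself:
# title(main = "A", adj = 0)
```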
[R] Constrained regression
Dear list members,

I am trying to get information on how to fit a linear regression with constrained parameters. Specifically, I have 8 predictors; their coefficients should all be non-negative and add up to 1. I understand it is a quadratic programming problem, but I have no experience in the subject. I searched the archives but the results were inconclusive.

Could someone provide suggestions and references to the literature, please?

Thank you very much.

Carlos

Carlos Alzola
[EMAIL PROTECTED]
(703) 242-6747
[R] Constrained regression
I would like to acknowledge the answers I received from Tom Filloon, Mike Cheung and Berwin Turlach. Berwin's response was exactly what I needed: use solve.QP from the quadprog package in R. S-Plus has the equivalent function solveQP in the NuOpt module.

Berwin's response is below.

G'day Carlos,

On Mon, Mar 3, 2008 at 11:52 AM Carlos Alzola <[EMAIL PROTECTED]> wrote:
> I am trying to get information on how to fit a linear regression with
> constrained parameters. Specifically, I have 8 predictors, their
> coefficients should all be non-negative and add up to 1. I understand
> it is a quadratic programming problem but I have no experience in the
> subject. I searched the archives but the results were inconclusive.
>
> Could someone provide suggestions and references to the literature,
> please?

A suggestion:

> library(MASS)   ## to access the Boston data
> designmat <- model.matrix(medv~., data=Boston)
> Dmat <- crossprod(designmat, designmat)
> dvec <- crossprod(designmat, Boston$medv)
> Amat <- cbind(1, diag(NROW(Dmat)))
> bvec <- c(1, rep(0,NROW(Dmat)))
> meq <- 1
> library(quadprog)
> res <- solve.QP(Dmat, dvec, Amat, bvec, meq)

The solution seems to contain values that are, for all practical purposes, actually zero:

> res$solution
 [1]  4.535581e-16  2.661931e-18  1.016929e-01 -1.850699e-17
 [5]  1.458219e-16 -3.892418e-15  8.544939e-01  0.000000e+00
 [9]  2.410742e-16  2.905722e-17 -5.700600e-20 -4.227261e-17
[13]  4.381328e-02 -3.723065e-18

So perhaps better:

> zapsmall(res$solution)
 [1] 0.0000000 0.0000000 0.1016929 0.0000000 0.0000000 0.0000000
 [7] 0.8544939 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[13] 0.0438133 0.0000000

So the estimates seem to follow the constraints.

And the unconstrained solution is:

> res$unconstrainted.solution
 [1]  3.645949e+01 -1.080114e-01  4.642046e-02  2.055863e-02
 [5]  2.686734e+00 -1.776661e+01  3.809865e+00  6.922246e-04
 [9] -1.475567e+00  3.060495e-01 -1.233459e-02 -9.527472e-01
[13]  9.311683e-03 -5.247584e-01

which seems to coincide with what lm() thinks it should be:

> coef(lm(medv~., Boston))
  (Intercept)          crim            zn         indus          chas
 3.645949e+01 -1.080114e-01  4.642046e-02  2.055863e-02  2.686734e+00
          nox            rm           age           dis           rad
-1.776661e+01  3.809865e+00  6.922246e-04 -1.475567e+00  3.060495e-01
          tax       ptratio         black         lstat
-1.233459e-02 -9.527472e-01  9.311683e-03 -5.247584e-01

So there seem to be no numeric problems. Otherwise we could have done something else (e.g. calculate the QR factorization of the design matrix, say X, and give the R factor to solve.QP, instead of calculating X'X and giving that one to solve.QP).

If the intercept is not supposed to be included in the set of constrained estimates, then something like the following can be done:

> Amat[1,] <- 0
> res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
> zapsmall(res$solution)
 [1] 6.073972 0.000000 0.109124 0.000000 0.000000 0.000000 0.863421
 [8] 0.000000 0.000000 0.000000 0.000000 0.000000 0.027455 0.000000

Of course, since after the first command in that last block the second column of Amat contains only zeros

> Amat[,2]
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0

we might as well have removed it (and the corresponding entry in bvec)

> Amat <- Amat[, -2]
> bvec <- bvec[-2]

before calling solve.QP().

Note, the Boston data set was only used to illustrate how to fit such models, I do not want to imply that these models are sensible for these data. :-)

Hope this helps.

Cheers,
Berwin

Carlos Alzola
[EMAIL PROTECTED]
(703) 242-6747
[R] Problem with script
          e_counter] <- x_list[random_sq]
          y_list[range_counter] <- y_list[random_sq] + 1
          myImagePlot(mexico_matrix + random_matrix)
        }
      }
      if (aleatorio < 0.50){
        if(random_matrix[x_list[random_sq]+1, y_list[random_sq]] == 0) {
          random_matrix[x_list[random_sq] + 1, y_list[random_sq]] <- 1
          flag <- 1
          range_counter <- range_counter + 1
          x_list[range_counter] <- x_list[random_sq] + 1
          y_list[range_counter] <- y_list[random_sq]
          myImagePlot(mexico_matrix + random_matrix)
        }
      }
      if (aleatorio < 0.25) {
        if(random_matrix[x_list[random_sq]-1, y_list[random_sq]] == 0 ) {
          random_matrix[x_list[random_sq] - 1, y_list[random_sq]] <- 1
          flag <- 1
          range_counter <- range_counter + 1
          x_list[range_counter] <- x_list[random_sq] - 1
          y_list[range_counter] <- y_list[random_sq]
          myImagePlot(mexico_matrix + random_matrix)
        }
      }
    }
  }
}
sum(random_matrix)
#---

I do the "sum(random_matrix)" just to check whether the number of zeroes needed has been set (it has never reached 100 so far :-(

I'm not sure what I am doing wrong. I've been working with this file on Windows and Mac at the same time in a collaboration.

Any help will be greatly appreciated.
Carlos
[R] About Pareto distribution
Hi again :-)

I finally was able to fix the program, thank you all very much for your help :-)

Now I have a problem and I don't know if it is possible to solve it with R. I have a data set, and because it is salary data I suspect it comes from a Pareto distribution. My questions are:

1. Is there any hypothesis test in R to check whether data come from a Pareto distribution?
2. How could I estimate the parameters with R? Are there any functions for this?

Thank you very much again :)
Carlos
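For the second question, if the scale x_m is taken as the smallest observation, the maximum-likelihood estimate of the shape has a closed form, alpha_hat = n / sum(log(x_i / x_m)). A minimal sketch on simulated data; the simulated salaries and every variable name here are illustrative assumptions, not the poster's data:

```r
set.seed(1)
n <- 10000
true_alpha <- 3
true_xm <- 1000

# Inverse-transform sampling: if U ~ Uniform(0, 1), then
# xm * U^(-1 / alpha) follows a Pareto(xm, alpha) distribution.
salaries <- true_xm * runif(n)^(-1 / true_alpha)

# Maximum-likelihood estimates of scale and shape.
xm_hat <- min(salaries)
alpha_hat <- n / sum(log(salaries / xm_hat))
c(xm = xm_hat, alpha = alpha_hat)
```

With real salary data only the last three lines would be needed, applied to the observed vector.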
[R] How to plot with different colours
Hello everyone,

I'm trying to plot 3600 points. My idea is that if a value is higher than 0.35 the point must appear in green, if it is smaller than -0.35 it must appear in red, and if it is between -0.35 and 0.35 it must appear in yellow. I have been trying many things but have not managed it. Any ideas?

Thanks so much,
Carlos Morales Diego
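Because the col argument of plot() accepts one colour per point, a vector of colours can be built from the values first. A minimal sketch, assuming random values stand in for the real 3600 points:

```r
set.seed(42)
values <- runif(3600, min = -1, max = 1)   # placeholder for the real data

# One colour per point, chosen by threshold.
cols <- ifelse(values > 0.35, "green",
               ifelse(values < -0.35, "red", "yellow"))

plot(values, col = cols, pch = 16)
```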
[R] licensing of R packages
I know the standard answer to this kind of question is "get legal advice from a lawyer", but I would like to hear the (hopefully informed) opinion of other people.

I would say that, according to the FSF's interpretation of the GPL, any R code using GPL packages can be distributed legally only using GPL-compatible licenses.

http://www.gnu.org/licenses/gpl-faq.html#IfInterpreterIsGPL
> Another similar and very common case is to provide libraries with the
> interpreter which are themselves interpreted. For instance, Perl comes
> with many Perl modules, and a Java implementation comes with many Java
> classes. These libraries and the programs that call them are always
> dynamically linked together.
>
> A consequence is that if you choose to use GPL'd Perl modules or Java
> classes in your program, you must release the program in a
> GPL-compatible way, regardless of the license used in the Perl or Java
> interpreter that the combined Perl or Java program will run on.

If the reasoning above applies to R as it does to Perl, all R code would be affected given that core packages like "base" are GPL.

The interpretation of the R Foundation (the copyright holder in this case) seems more relaxed, but I wonder what is the intent of other people distributing R packages under the GPL. Maybe some of them would protest if R code using their package was distributed under a non-GPL-compatible license.

For example, I would expect the authors of the GNU Scientific Library to defend that any package using "gsl" (a wrapper on their GPL library) should be published under a GPL-compatible license, being a derivative work (the FSF thinks so).

Another question is if that "strict" interpretation of the GPL could be actually enforced, of course. Coming back to the GSL example, it seems a more flagrant violation of the license is already happening: http://www.numerit.com/gsl.htm (apparently the publisher of that product thinks that linking to a GPL dll doesn't impose any obligation on him, but the usual view of the FSF is quite the opposite; I just found that page by chance, I don't know anything else about that particular case).

I've noticed that this question was posed in r-devel a couple of years ago; I'm surprised it didn't provoke more than one reply: https://stat.ethz.ch/pipermail/r-devel/2006-September/042715.html

Cheers,
Carlos

PS: By the way, I think FAQ 2.11 should be fixed: it states that "R is released under the GNU General Public License (GPL)", without specifying the version and linking to http://www.gnu.org/copyleft/gpl.html (GPLv3). However, the COPYING file in the R directory corresponds to GPL2.
Re: [R] licensing of R packages
Prof Brian Ripley wrote:
>
> I'm not going into the original question except to point out that R is
> licensed under GPL-2 and the quote was from the GPL-3 FAQ. As FSF
> themselves insist, the two licences are incompatible.
>

Let me quote the corresponding section in the GPL2 FAQ, then:

http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#IfInterpreterIsGPL
> Another similar and very common case is to provide libraries with the
> interpreter which are themselves interpreted. For instance, Perl comes
> with many Perl modules, and a Java implementation comes with many Java
> classes. These libraries and the programs that call them are always
> dynamically linked together.
>
> A consequence is that if you choose to use GPL'd Perl modules or Java
> classes in your program, you must release the program in a
> GPL-compatible way, regardless of the license used in the Perl or Java
> interpreter that the combined Perl or Java program will run on.

Core R packages included in the R distribution are in fact "GPL (>= 2)" [*], but choosing GPLv2 or GPLv3 seems to make no difference in regard to the issue being discussed (again, according to the interpretation given by the FSF).

Regards,
Carlos

[*] this is not the case for all the recommended packages in the distribution
Re: [R] licensing of R packages
Barry Rowlingson wrote:
>
> This misconception of the license terms comes about because of the
> use of the word 'use'. If I distribute a short C program that has a
> call in it to a function that has the same name as something in the
> GSL, does my C program use the GSL? No. Maybe it _mentions_ the GSL,
> but the GPL has no problems with that.

Maybe the GPL has no problems with that, but GSL authors will have. For example, regarding a similar situation one of the GSL authors commented:

[http://sourceware.org/ml/gsl-discuss/2001-q4/msg00033.html]
> Any distributed code which refers to GSL functions should be licensed
> to the end-user under the GPL. The intent of the GPL is that we make
> our code free to other people if they do the same for us --- two-way
> cooperation. The current R-quant license is not a free software
> license so there should not be anything distributed under that license
> which directly refers to GSL functions.

Barry Rowlingson wrote:
>
> I'm distributing my C program, and not the GPL-covered code, so I can
> license it how I like.
>

And the copyright owners have recourse to legal action if they think there is a license violation. Again, I don't know what a court would decide, but if you want to test the limits of the GPL license I would avoid challenging a GNU project :-)

Cheers,
Carlos
Re: [R] licensing of R packages
Duncan Murdoch-2 wrote:
>
> The way to lose a GPL lawsuit is to incorporate GPL'd code into your own
> project, and then not follow the GPL when you redistribute. There's
> evidence of that.
>
> But I've never heard of anyone linking to but not distributing GPL'd
> code and being sued for it, let alone losing lawsuits over it. That's
> evidence enough for me that it is a safe thing to do.
>

The LGPL covers the first point as well as the GPL does (I think), while explicitly allowing dynamic linking (so the second point is clearly not a problem). The FSF encourages using the GPL (and not the LGPL) precisely to make libraries available only to GPL projects. It's not surprising therefore that the GPL license scares people off.

The safest option is to take the GPL at face value and accept the FSF interpretation, but depending on the jurisdiction, the details of the situation, and the level of risk-aversion, people might decide to do otherwise. The concept of "derivative work" is not really well defined, see for example http://www.rosenlaw.com/html/GPL.PDF

To give another example related to R, the FSF's view of the world is that RPy, because it links dynamically to libR.so (or R.dll, etc), has to be distributed with a GPL (-compatible?) license. And the same restriction applies in turn to a python program using RPy (again, according to the FSF; because RPy and the "derivative work" are dynamically linked by the python interpreter). However, not everyone shares that view:

http://mail.python.org/pipermail/python-list/2005-January/304974.html
> On the basis of these clauses, the legal advice to us was that merely
> including "import rpy" and making calls to RPy-wrapped R functions does
> not invoke the provisions of the GPL because these calls only relate to
> run-time linking, which is not covered by the GPL. However, combining
> GPLed source code or static linking would invoke the GPL provisions.
> [...]
> IANAL, and the above constitutes mangled paraphrasing of carefully
> worded formal legal advice, the scope of which was restricted to
> Australian law. However, the sections of the GPL quoted above are pretty
> unambiguous.
> The other, informal advice, was to ignore the FAQs and other opinions on
> the FSF web site regarding interpretation of the GPL - it's only the
> license text which counts.

Cheers,
Carlos
[R] applying a function to another function
Hello,

I wrote a function (sec_conop) whose arguments are syndic, well and wellconop.

    sec_conop(syndic='01syndic.txt',well='well-1.csv',wellconop='well-1.dat');closeAllConnections()

This function takes “well” and “syndic”, matches them and then does some transformations. The result is exported to “wellconop”.

I will apply this function to one hundred different “wells”. Therefore, for each well I use, the “wellconop” argument will change too. For instance, if “well” is now well-2.csv, the call will be

    sec_conop(syndic='01syndic.txt',well='well-2.csv',wellconop='well-2.dat');closeAllConnections()

I am trying to apply this function automatically to all the “wells” I have, but I cannot find a way. The last thing I tried, for three different “wells”, was:

    wells<-data.frame(funct=rep('sec_conop(',3),syndic=c('01syndic.txt','01syndic.txt','01syndic.txt'),well=c('well-1.csv','well-2.csv','well-3-1.csv'),wellconop=c('well-1.dat','well-2.dat','well-3.dat'))
    funct_3wells<-paste(wells$funct,"'",wells$syndic,"'", "," ,"'", wells$well,"'", "," , "'" ,wells$wellconop,"'",")",";","closeAllConnections()",sep='')
    lapply(funct_3wells,as.formula)

This works only partially, because the results in “wellconop” are truncated. Does anyone have a suggestion?

Thanks in advance,
Carlos
[R] applying a function several times
I am sorry. I am not sure whether the mail I sent to this list earlier was rejected because of its header (subject), which may not have been appropriate, so I have changed it. I wrote a function (sec_conop) whose arguments are syndic, well and wellconop: sec_conop(syndic='01syndic.txt',well='well-1.csv',wellconop='well-1.dat');closeAllConnections() This function takes well and syndic, matches them against each other and then does some transformations. The result is exported to wellconop. I will apply this function to one hundred different wells. Therefore, for each well I use, the wellconop argument will change too. For instance, if well is now well-2.csv, the call will be sec_conop(syndic='01syndic.txt',well='well-2.csv',wellconop='well-2.dat');closeAllConnections() I am trying to apply this function automatically to all the wells I have, but I cannot find a way. My last attempt, for three different wells, was: wells<-data.frame(funct=rep('sec_conop(',3),syndic=c('01syndic.txt','01syndic.txt','01syndic.txt'),well=c('well-1.csv','well-2.csv','well-3-1.csv'),wellconop=c('well-1.dat','well-2.dat','well-3.dat')) funct_3wells<-paste(wells$funct,"'",wells$syndic,"'", "," ,"'", wells$well,"'", "," , "'" ,wells$wellconop,"'",")",";","closeAllConnections()",sep='') lapply(funct_3wells,as.formula) This only partially works, because the results in wellconop are truncated. Does anyone have any suggestions? Thanks in advance Carlos
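Rather than pasting calls together as text, the function can be called directly in a loop over the file names. The following is only a sketch: sec_conop here is a hypothetical stand-in (the poster's real function is not shown), and the file names and temporary output directory are made up for illustration.

```r
# Hypothetical stand-in for the poster's sec_conop(); the real function
# reads 'syndic' and 'well' and writes its result to 'wellconop'.
sec_conop <- function(syndic, well, wellconop) {
  writeLines(paste(syndic, well), wellconop)
  invisible(wellconop)
}

# Made-up input names; in practice something like
# list.files(pattern = "\\.csv$") could supply them.
wells <- c("well-1.csv", "well-2.csv", "well-3.csv")

# Derive each output name from the input name: well-1.csv -> well-1.dat
outputs <- file.path(tempdir(), sub("\\.csv$", ".dat", wells))

# One ordinary function call per well, no string pasting or parsing
for (i in seq_along(wells)) {
  sec_conop(syndic = "01syndic.txt", well = wells[i], wellconop = outputs[i])
}
```

If the real sec_conop opens connections itself, closing them inside the function (for example with on.exit(close(con))) may make the global closeAllConnections() call unnecessary; that is an assumption about code not shown in the post.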
[R] Error in var(x, na.rm = na.rm) : no complete element pairs
Hello all, I'm trying to calculate the standard deviation using sd(x,na.rm=TRUE) and I get this error: Error in var(x, na.rm = na.rm) : no complete element pairs. Why does this happen? What can I do to solve it? x is a list of three numbers which I took from a table. Thanks so much from Spain Carlos Morales Diego
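One possible cause, assuming x really is a list (or a row of a data frame) rather than a plain numeric vector: sd() expects an atomic numeric vector, and if nothing numeric is left after the missing values are dropped there is nothing to compute. The post does not show x, so the values below are made up; this is a sketch of how to check, not a diagnosis.

```r
# Made-up values standing in for three numbers taken from a table
x <- list("1.5", "2.5", "3.5")

# Flatten the list and convert to numeric; anything that is not a
# number becomes NA (with a warning), which is worth inspecting
v <- as.numeric(unlist(x))

# How many usable values are there? sd() needs at least two.
n_ok <- sum(!is.na(v))

s <- sd(v, na.rm = TRUE)
```

If n_ok comes out as 0 or 1 on the real data, the conversion from the table is the place to look (for example a factor or character column, or a decimal comma).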
[R] Error in var(x, na.rm = na.rm) : no complete element pairs
Hello all, I'm trying to calculate the standard deviation with sd(x,na.rm=TRUE) and I don't know why I get this error when I try to calculate it: Error in var(x, na.rm = na.rm) : no complete element pairs. I have been looking for information about this error but found nothing. Why does it happen? What can I do to fix it? Thanks so much from Spain Carlos Morales Diego
[R] Error in var(x, na.rm = na.rm) : no complete element pairs
Hello, I still have the same error which I have written in the Subject field, I leave here the code and I hope you can help me with this: filter.clones<-function(zz.info,crom.info) { clones.info<-zz.info cat("Removing clones which has a flag minor than 0\n") ord <- order(clones.info$Flags) clones.info<- clones.info[ ord, ] #for(j in 1:nrow(clones.info)) #{ del<-0 #print(j) del<-which(as.numeric(clones.info$Flags)<0) if (length(del)!=0) { #print(j) clones.info<-clones.info[-del,] #eliminados.info<-clones.info[del,] #if(j==1) #{ #j<-0 #} } #} ##Eliminar levaduras, moscas etc #for(j in 1:nrow(clones.info)) #{ del1<-0 del1<-grep("mix",clones.info$Name) if (length(del1)!=0) { #print(j) clones.info<-clones.info[-del1,] } #} #for(j in 1:nrow(clones.info)) #{ del2<-0 del2<-grep("fly",clones.info$Name) if (length(del2)!=0) { #print(j) clones.info<-clones.info[-del2,] } #} #for(j in 1:nrow(clones.info)) #{ del3<-0 del3<-grep("pombe",clones.info$Name) if (length(del3)!=0) { #print(j) clones.info<-clones.info[-del3,] } #} #for(j in 1:nrow(clones.info)) #{ del4<-0 del4<-grep("DMSO",clones.info$Name) if (length(del4)!=0) { #print(j) clones.info<-clones.info[-del4,] } #} #Eliminar los clones que estan unidos por un + o un menos #for(j in 1:nrow(clones.info)) #{ del5<-0 del5<-grep("[+]",clones.info$Name) if (length(del5)!=0) { #print(j) clones.info<-clones.info[-del5,] } #} #for(j in 1:nrow(clones.info)) #{ del6<-0 del6<-grep("[-]",clones.info$Name) if(length(del6)!=0) { #print(j) clones.info<-clones.info[-del6,] } #} #for(j in 1:nrow(clones.info)) #{ del7<-0 del7<-grep("rep",clones.info$Name) if(length(del7)!=0) { #print(j) clones.info<-clones.info[-del7,] } #} del8<-0 del8<-grep("REP",clones.info$Name) if(length(del8)!=0) { #print(j) clones.info<-clones.info[-del8,] } #cat("Numero de clones:",NROW(clones.info$Name),"\n") #chroms.info<-croms.info(PruebaDefinitiva.obj) #cat("Reordering the chromosomes\n") #ord <- order(chroms.info$picked_off_as_SI_name) #chroms.info<- chroms.info[ 
ord, ] #ord <- order(PruebaDefinitiva.obj$crom.info$picked_off_as_SI_name) ##crom.info <- crom.info[ ord, ] nrow(clones.info) #a<-PruebaDefinitiva.obj$zz.info #PruebaDefinitiva.obj$zz.info<-0 #PruebaDefinitiva.obj$zz.info<-clones.info #PruebaDefinitiva.obj$zz.info clones.info cat("Reordering the chromosomes\n") ord <- order(crom.info$picked_off_as_SI_name) crom.info<- crom.info[ ord, ] #arch.info<-cbind(arch.info,000) #names(arch.info)[NCOL(arch.info)]<-"Cromosomas" clones2.info<-clones.info clones2.info<-cbind(clones2.info,000) names(clones2.info)[NCOL(clones2.info)]<-"Cromosomas" clones2.info ##Añadir columna con los cromosomas #ncol(arch.info) #arch.info<-arch.info #arch.info<-cbind(arch.info,000) #names(arch.info)[NCOL(arch.info)]<-"Cromosomas" ord <- order(clones2.info$Name) clones2.info<- clones2.info[ ord, ] for(i in 1:nrow(clones2.info)) { cat("Processing clon ",i,"\n") find<-match(clones2.info$Name[i],crom.info$picked_off_as_SI_name,nomatch=0) print(find) if((length(find)!=0) &&(find!=0)) { clones2.info$Cromosomas[i]<-paste(crom.info$current_chromosome[find]) } find<-0 } del1<-0 del1<-grep("X",clones2.info$Cromosomas) if (length(del1)!=0) { #print(j) clones2.info<-clones2.info[-del1,] } del1<-0 del1<-grep("Y",clones2.info$Cromosomas) if (length(del1)!=0) { #print(j) clones2.info<-clones2.info[-del1,] } del1<-0 del1<-grep("Un_",clones2.info$Cromosomas) if (length(del1)!=0) { #print(j) clones2.info<-clones2.info[-del1,] } del1<-0 del1<-grep("DR",clones2.info$Cromosomas) if (length(del1)!=0) { #print(j)
Re: [R] Append to a csv file
Maybe each data frame you add during the loop includes the column names. I would add col.names=F: write.csv(mydata, file= "data.csv"=F, append=T,col.names=F) Hope that helps Carlos To: r-help@r-project.org Sent: Monday, April 20, 2009 4:39:48 PM Subject: [R] Append to a csv file I am looping over a data set and at each loop I am creating a dataframe "mydata" that I want to save in a .csv file, but I want all the results to be saved in the same file, and this is the way I do it: write.csv(mydata, file= "data.csv"=F, append=T) . The csv file looks fine but I always get the following warning message: Warning messages: 1: In write.table(mydata, file ="data.csv", ... : appending column names to file Does anyone see why R prints this warning message? -- View this message in context: http://www.nabble.com/Append-to-a-csv-file-tp23145471p23145471.html Sent from the R help mailing list archive at Nabble.com.
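A note for readers on current R versions: write.csv ignores attempts to set append or col.names (with a warning), so appending in a loop is usually done with write.table and sep = ",". The sketch below uses a made-up data frame and a temporary file; it writes the header only on the first pass.

```r
out <- tempfile(fileext = ".csv")   # stand-in for "data.csv"

for (i in 1:3) {
  # Made-up result for iteration i
  mydata <- data.frame(iter = i, value = i^2)

  first <- !file.exists(out)        # TRUE only before the first write
  write.table(mydata, file = out, sep = ",",
              row.names = FALSE,
              col.names = first,    # header once
              append = !first)      # append afterwards
}

result <- read.csv(out)
```

Writing the header only once is what avoids the "appending column names to file" warning quoted in the question.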
[R] Logistic regression and R
Hello everybody :-) I have some data that I want to model with a logistic regression. Most of the independent variables are numeric and the only dependent variable is categorical. I was thinking that I could apply a logistic regression using glm, but I wanted to deepen my knowledge of this, so I did some reading and found the "iris" dataset. Now I would like to ask two things. First, do you know of any bibliography where I could read more about logistic regression and R, so that I can better understand and interpret the output? Second, what can I do when some of the independent variables are categorical rather than numeric, i.e. mixed (categorical and numerical)? Can I still use a logistic regression? Thank you very much!!! :-D
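On the second question: glm accepts numeric and categorical predictors in the same formula, provided the categorical ones are factors, which R then expands into indicator (dummy) columns. A minimal sketch follows; the infert dataset that ships with R is my choice for illustration and is not the poster's data.

```r
# infert: 'case' is 0/1, 'age' is numeric, 'education' is a factor
fit <- glm(case ~ age + education, data = infert, family = binomial)

# One coefficient for the intercept, one for age, and one for each
# education level except the reference level
coefs <- coef(fit)

# summary(fit) prints estimates on the log-odds scale;
# exp(coefs) turns them into odds ratios
odds_ratios <- exp(coefs)
```

A character column can be converted with factor() before fitting; relevel() changes which level serves as the reference.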
[R] extract and replace columns of matrices stored in a list
Dear All, I created a list (of length Z) in the following way: my.array <- vector("list", Z) then i assigned a matrix (of T rows by N columns) in each of the elements of the list my.array in the following way: my.array[[i]] <- matrix.data ##( matrix.data has dimensions TxN, and i repeated this command for i from 1 to Z, the matrix.data contains only numeric data) and 1. i would like to extract all the third columns of each of the Z matrices stored in my.array (such that i get a new list only with the 3rd columns of each matrix in the elements of a new list) 2. i would like to know how could i replace all the 3rd columns of each matrix in my.array if i have a second matrix (size ZxT) with these columns. is there a simple way to do these tasks? i appreciate any hints or advice. Carlos [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
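Both tasks can be handled with lapply over the list. The sketch below assumes, as the post states, that row z of the Z x T replacement matrix holds the new third column for matrix z; the sizes and values are made up.

```r
Z <- 3; T_rows <- 4; N <- 5   # made-up sizes

# Build a list of Z matrices, each T_rows x N
my.array <- lapply(seq_len(Z), function(i) matrix(i, nrow = T_rows, ncol = N))

# 1. Extract the third column of every matrix into a new list
third.cols <- lapply(my.array, function(m) m[, 3])

# 2. Replace the third column of matrix z with row z of new.cols (Z x T)
new.cols <- matrix(seq_len(Z * T_rows), nrow = Z, ncol = T_rows)
my.array2 <- lapply(seq_len(Z), function(z) {
  m <- my.array[[z]]
  m[, 3] <- new.cols[z, ]
  m
})
```

Indexing by seq_len(Z) rather than by the list elements keeps the matrix and its replacement row paired by position.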
Re: [R] extract and replace columns of matrices stored in a list
On Thu, Sep 3, 2009 at 4:34 PM, Henrique Dallazuanna wrote: > Try this: > > #1 > lapply(my.array, '[', , 3) > > this works! thank you a lot! > #2 > newThirdColumn <- sample(3) > lapply(my.array, replace, list = 7:9, values = newThirdColumn) > > i did not understand this last line, so far i couldn't make it work. would it be easier to replace the values (the third column of each matrix in my.array) using an array like in #1? thank you for your reply! > On Thu, Sep 3, 2009 at 11:16 AM, Carlos Hernandez > wrote: > >> Dear All, >> I created a list (of length Z) in the following way: >> >> my.array <- vector("list", Z) >> >> then i assigned a matrix (of T rows by N columns) in each of the elements >> of >> the list my.array in the following way: >> >> my.array[[i]] <- matrix.data ##( matrix.data has dimensions TxN, and i >> repeated this command for i from 1 to Z, the matrix.data contains only >> numeric data) >> >> and >> 1. i would like to extract all the third columns of each of the Z matrices >> stored in my.array (such that i get a new list only with the 3rd columns >> of >> each matrix in the elements of a new list) >> >> 2. i would like to know how could i replace all the 3rd columns of each >> matrix in my.array if i have a second matrix (size ZxT) with these >> columns. >> >> is there a simple way to do these tasks? i appreciate any hints or advice. >> >> Carlos >> >>[[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
>> > > > > -- > Henrique Dallazuanna > Curitiba-Paraná-Brasil > 25° 25' 40" S 49° 16' 22" O > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SAS vs. R in web application
Good evening, I have been asked to investigate the pros and cons of using SAS vs. R in a web application. Either SAS or R would be the engine used to make some very simple calculations and to produce graphs, preferably in png format. The advantages of R are pretty obvious as there would be no licensing issues. The only drawback I can see is that when calling it in batch (using R CMD BATCH), a DOS window appears. Thus I have some basic questions: a) Is it possible to have R operate in the background without the DOS window appear? How? b) Is it correct that there will be no licensing issues? c) What would be an efficient way to run it? I am thinking of having R running in the client's local machine and upload the results to a central server. If using SAS, would the model described in c) above be the best way to design it, or would it be better to upload the raw data to the server and have SAS perform the calculations there. Would this option require a multi-user SAS license? (I know, I should check with SAS Institute, but I thought I'd ask anyway. Someone in the list may have done something similar). Thanks in advance for any suggestions. Carlos Alzola [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] more efficient vectorization of a function ?
dear All, i'm using the following two functions: share.vector <- function (vec1) { vec1 <- vec1 - max(vec1,na.rm=TRUE) -0.1 ## this line avoids overflow vec1 <- exp(vec1) vec2 <- vec1/(1+sum(vec1,na.rm=TRUE)) vec2 } share.matrix <- function (mat1) { out1 <- apply(mat1,2,share.vector) return(out1) } vec1 is a vector (of numeric data, usually small numbers), mat1 is a matrix with many vec1's is there another way to program them such that they are more efficient (in terms of time)? i appreciate any hints or advice. best regards, Carlos [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
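One way to avoid calling share.vector once per column is to do the arithmetic on the whole matrix with sweep and colSums. This is a sketch that reproduces the posted functions on made-up data; whether it is actually faster depends on the matrix shape and should be checked with system.time.

```r
# The two functions as posted
share.vector <- function(vec1) {
  vec1 <- vec1 - max(vec1, na.rm = TRUE) - 0.1
  vec1 <- exp(vec1)
  vec1 / (1 + sum(vec1, na.rm = TRUE))
}
share.matrix <- function(mat1) apply(mat1, 2, share.vector)

# Whole-matrix version: same arithmetic, one pass per step
share.matrix.fast <- function(mat1) {
  col.max <- apply(mat1, 2, max, na.rm = TRUE)
  e <- exp(sweep(mat1, 2, col.max + 0.1, "-"))     # subtract per-column shift
  sweep(e, 2, 1 + colSums(e, na.rm = TRUE), "/")   # divide by 1 + column sum
}

set.seed(1)
m <- matrix(rnorm(20), nrow = 5)   # made-up test data
same <- isTRUE(all.equal(share.matrix(m), share.matrix.fast(m)))
```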
Re: [R] could not find function "Varcov" after upgrade of R?
Did you type library(Hmisc,T) before loading Design? Carlos -- From: "David Freedman" <3.14da...@gmail.com> Sent: Saturday, September 12, 2009 8:26 AM To: Subject: Re: [R] could not find function "Varcov" after upgrade of R? I've had the same problem with predict.Design, and have sent an email to the maintainer of the Design package at Vanderbilt University. I wasn't even able to run the examples given on the help page of predict.Design - I received the same error about Varcov that you did. I *think* it's a problem with the package, rather than R 2.9.2, and I hope the problem will soon be fixed. I was able to use predict.Design with 2.9.2 until I updated the Design package a few days ago. david freedman zhu yao wrote: I uses the Design library. take this example: library(Design) n <- 1000 set.seed(731) age <- 50 + 12*rnorm(n) label(age) <- "Age" sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4))) cens <- 15*runif(n) h <- .02*exp(.04*(age-50)+.8*(sex=='Female')) dt <- -log(runif(n))/h label(dt) <- 'Follow-up Time' e <- ifelse(dt <= cens,1,0) dt <- pmin(dt, cens) units(dt) <- "Year" dd <- datadist(age, sex) options(datadist='dd') Srv <- Surv(dt,e) f <- cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE) cox.zph(f, "rank") # tests of PH anova(f) # Error in anova.Design(f) : could not find function "Varcov" Yao Zhu Department of Urology Fudan University Shanghai Cancer Center No. 270 Dongan Road, Shanghai, China 2009/9/12 Ronggui Huang I cannot reproduce the problem you mentioned. 
> ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) > trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) > group <- gl(2,10,20, labels=c("Ctl","Trt")) > weight <- c(ctl, trt) > anova(lm.D9 <- lm(weight ~ group)) > sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=Chinese (Simplified)_People's Republic of China.936;LC_CTYPE=Chinese (Simplified)_People's Republic of China.936;LC_MONETARY=Chinese (Simplified)_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_People's Republic of China.936 attached base packages: [1] stats graphics grDevices utils datasets methods base 2009/9/12 zhu yao : > After upgrading R to 2.9.2, I can't use the anova() fuction. > It says "could not find function "Varcov" ". > What's wrong with my computer? Help needed, thanks! > > Yao Zhu > Department of Urology > Fudan University Shanghai Cancer Center > No. 270 Dongan Road, Shanghai, China > >[[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- HUANG Ronggui, Wincent Doctoral Candidate Dept of Public and Social Administration City University of Hong Kong Home page: http://asrr.r-forge.r-project.org/rghuang.html [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/could-not-find-function-%22Varcov%22-after-upgrade-of-R--tp25412881p25414017.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] fitting a curve to data points
Hello all. This is likely to be a silly question, but I have a set of data points and I want to fit a curve to it, like this: http://www.igc.usp.br/pessoais/guano/temp/curve.png. which function should I use? many thanks Carlos -- +---+ Carlos Henrique Grohmann - Guano Geologist M.Sc - Doctorate Student at IGc-USP - Brazil Linux User #89721 - carlos dot grohmann at gmail dot com +---+ _ "Good morning, doctors. I have taken the liberty of removing Windows 95 from my hard drive." --The winning entry in a "What were HAL's first words" contest judged by 2001: A SPACE ODYSSEY creator Arthur C. Clarke Can't stop the signal. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] truehist?
Hello, After a long time, I needed the truehist function, but my system couldn't find it. I tried to install the package MAAS, but I couldn't find it either! Did something happen? Carlos -- +---+ Carlos Henrique Grohmann - Guano Visiting Researcher at Kingston University London - UK Geologist M.Sc - Doctorate Student at IGc-USP - Brazil Linux User #89721 - carlos dot grohmann at gmail dot com +---+
Re: [R] truehist?
Oh yes. I did search for help, but I just didn't read carefully: I read "MAAS" instead of MASS. Carlos On 9/25/07, Peter Dalgaard <[EMAIL PROTECTED]> wrote: > Carlos "Guâno" Grohmann wrote: > > Hello, > > After a long time, I needed the truehist function, but my system > > couldn't found it. I tried to install the package MAAS, but I couldn't > > found it! Something happened? > > > > Carlos > > > > > It's in MASS (sic). > > help.search("truehist") would have told you. > > > -- >O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B > c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K > (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 > ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
[R] Help Problems formatting date and using Regul function
Hello, My problem is with the format of the date. My time series looks like this: 2006070100 1244 6162 2006070101 1221 6060 2006070102 1214 6060 2006070103 1194 5959 2006070104 1182 5858 2006070105 1178 5858 2006070106 1176 5858 2006070107 1173 5858 2006070108 1179 5859 2006070109 1246 6162 When I try to format the time like this: A <- read.table("file", sep="\t", col.names=c("date", "my1", "my2", "my3")) temp <- as.Date(A$date, format="%Y%m%d%H") temp I get [1] "4403-05-21" "4403-05-22" "4403-05-23" "4403-05-24" "4403-05-25" [6] "4403-05-26" "4403-05-27" "4403-05-28" "4403-05-29" "4403-05-30" Another problem is with regul: I am using the variables created when reading the data, but the regulation does not work: Ts.regul<-regul(A$date, y=A$my2, xmin=2006070100, n=800, units="hours", frequency=1, deltat=1/3600, datemin=NULL, dateformat="m/d/Y", tol=NULL, tol.type="both", methods="linear", rule=1, f=0, periodic=FALSE, window=(2006080316 - 2006070100)/(800 - 1), split=100, specs=NULL) I think that once the date question is resolved, regul will work too. Can someone help me? I have only just started to use R. Thanks in advance for the help, João Santos
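A likely cause of the odd dates is that read.table reads the stamp column as a number, and as.Date cannot represent hours in any case. A sketch of one way around this, using three of the stamps from the post: convert the number to text, parse it with strptime, and keep it as a date-time. The UTC time zone is my assumption.

```r
stamp <- c(2006070100, 2006070101, 2006070109)   # as read by read.table

# Number -> text without scientific notation, then parse year, month,
# day and hour; POSIXct keeps the hour, which Date would drop
when <- as.POSIXct(strptime(sprintf("%.0f", stamp),
                            format = "%Y%m%d%H", tz = "UTC"))

# Elapsed hours since the first observation, a plain numeric time axis
hours <- as.numeric(difftime(when, when[1], units = "hours"))
```

A numeric axis such as hours may be a more suitable x argument for regul than the raw stamps, since consecutive stamps are not evenly spaced as numbers (for example across midnight); that suggestion is untested against the poster's data.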
[R] code optimization problem ... using or not using "which" function
Hello all, I have two data sets that share certain fields of interest (facility, unit, date) which I want to match up, and from this extract information from one data set and store it in the other. My initial idea (which I know is bad) goes like this: ## capacity and new_trayloc are datasets in example code: for( i in 1: nrow( new_trayloc)) { theshifts<-which(as.Date(capacity$shift_dt) == new_trayloc$admit_dt[i] & as.character(capacity$unit)==as.character(new_trayloc$UNIT_1[i]) & as.character(capacity$fac_id)==as.character(new_trayloc$ORIG_FAC_ID[i])) thenightshifts<-which(as.Date(capacity$shift_dt) == new_trayloc$admit_dt[i]-1 & as.character(capacity$unit)==as.character(new_trayloc$UNIT_1[i]) & as.character(capacity$fac_id)==as.character(new_trayloc$ORIG_FAC_ID[i])) . obtain information by using theshifts and thenightshifts objects and store in new_trayloc } . Running system.time on the entire for loop for 5 iterations, I get user system elapsed 25.66 1.04 26.72 That seems really bad, and I need to run it for over 100,000 iterations. Any suggestions on either the way I match the fields or my approach to the problem? Cheers, Juan Carlos
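A common alternative to scanning capacity once per row is to let merge do the matching on the three key columns in a single call. The sketch below keeps the column names from the post but uses tiny made-up data, and the beds column is hypothetical; the previous-day match is obtained by shifting the capacity dates forward by one day before merging.

```r
# Made-up data with the column names used in the post
capacity <- data.frame(
  fac_id   = c("A", "A", "B"),
  unit     = c("U1", "U1", "U2"),
  shift_dt = as.Date(c("2009-01-01", "2009-01-02", "2009-01-02")),
  beds     = c(10, 12, 8),            # hypothetical payload column
  stringsAsFactors = FALSE
)
new_trayloc <- data.frame(
  ORIG_FAC_ID = c("A", "B"),
  UNIT_1      = c("U1", "U2"),
  admit_dt    = as.Date(c("2009-01-02", "2009-01-02")),
  stringsAsFactors = FALSE
)

keys.x <- c("ORIG_FAC_ID", "UNIT_1", "admit_dt")
keys.y <- c("fac_id", "unit", "shift_dt")

# Same-day shifts: one vectorised match instead of a which() per row
same.day <- merge(new_trayloc, capacity,
                  by.x = keys.x, by.y = keys.y, all.x = TRUE)

# Previous-day shifts: move capacity dates one day forward, then match
prev <- capacity
prev$shift_dt <- prev$shift_dt + 1
names(prev)[names(prev) == "beds"] <- "beds_prev"
prev.day <- merge(new_trayloc, prev,
                  by.x = keys.x, by.y = keys.y, all.x = TRUE)
```

If several capacity rows share the same key, merge returns one output row per match, so a summarising step (for example with aggregate) may be needed before merging.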
[R] longitudinal analysis with latent construct
Folks, I need to test a model that has one predictor (a construct with three indicators) influencing four other variables. Something like what I try to show below. X ---> Y1, Y2, Y3, Y4 / | \ i1 i2 i3 Also, each variable was measured at 5 points in time. So, I'd like to model their change, in a longitudinal fashion (mixed model?). I know it would be too much to find a script that does it all, but I thought that maybe you guys have a reference for me to read and learn the steps I'll have to follow to perform this analysis. Any guidance would be very welcome as I'm trying to migrate from EQS to R and thus have no experience with R yet. Thanks in advance, Carlos Santos Jr.
Re: [R] Paste in a FOR loop
?eval Carlos J. Gil Bellosta http://www.datanalytics.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Package installation
Why don't you post your message in the Bioconductor list? People there will be able to help you better. Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Wed, 2008-12-31 at 08:00 -0500, jianying...@med.unc.edu wrote: > Dear all, > > I tried to install bioconductor package using biocLite(). I got warning that > c:/program file/R../library is not writable. In any rate, after the > downloading/installing, I looked for packages like "affy" "gcrma" etc. but I > did not seem them in the "library" folder. However, when I try to load the > library, it worked. > > Does anybody have any experience? Where are those packages installed? > > P.S. I am using window vista > > Thanks. > > Jianying > > - Original Message - > From: "Carlos J. Gil Bellosta" > Date: Wednesday, December 31, 2008 6:30 am > Subject: Re: [R] Paste in a FOR loop > To: Michael Pearmain > Cc: r-help@r-project.org > > > ?eval > > > > Carlos J. Gil Bellosta > > http://www.datanalytics.com > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R- > > project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I plot multiple XY plots on the same graph
Hello, You can use plot for the first plot and points for the subsequent ones. Points will add new points to the existing plot reusing the axes, labels, etc. Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Wed, 2008-12-31 at 15:36 -0800, George Chen wrote: > Hello, > I have multiple data sets I would like to plot on the same XY scatterplot. > The data sets have in common the same Y values. > Could somebody tell me how to do this? > I tried par(T=new) (I think this was it) but it literally overlays plots on > each other. > > Thanks in advance. > > George > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
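A small sketch of the plot-then-points approach, with made-up data sharing the same y values. The xlim argument is set from both data sets up front, because points does not rescale the axes.

```r
set.seed(42)
y  <- 1:20                      # shared y values (made up)
x1 <- y + rnorm(20)
x2 <- 2 * y + rnorm(20)

pdf(NULL)                       # null device; drop this line to draw on screen

# First data set creates the axes; make them wide enough for both sets
plot(x1, y, xlim = range(x1, x2), xlab = "x", ylab = "y", pch = 1)

# Later data sets are added to the existing axes
points(x2, y, pch = 2, col = "red")
legend("bottomright", legend = c("set 1", "set 2"),
       pch = c(1, 2), col = c("black", "red"))

invisible(dev.off())
```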
Re: [R] the first and last observation for each subject
Hello, First, order your data by ID and time. The columns you want in your output dataframe are then unique(ID), tapply( x, ID, function( z ) z[ 1 ] ) and tapply( y, ID, function( z ) z[ length( z ) ] - z[ 1 ] ) Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Fri, 2009-01-02 at 17:20 +0800, gallon li wrote: > I have the following data > > ID x y time > 1 10 20 0 > 1 10 30 1 > 1 10 40 2 > 2 12 23 0 > 2 12 25 1 > 2 12 28 2 > 2 12 38 3 > 3 5 10 0 > 3 5 15 2 > . > > x is time invariant, ID is the subject id number, y is changing over time. > > I want to find out the difference between the first and last observed y > value for each subject and get a table like > > ID x y > 1 10 20 > 2 12 15 > 3 5 5 > .. > > Is there any easy way to generate the data set?
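Put together on the data from the question, the suggestion above can be sketched as follows (the data frame name a is my choice):

```r
a <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# Order by subject and time so "first" and "last" are well defined
a <- a[order(a$ID, a$time), ]

out <- data.frame(
  ID = unique(a$ID),
  x  = as.vector(tapply(a$x, a$ID, function(z) z[1])),
  # last observed y minus first observed y, per subject
  y  = as.vector(tapply(a$y, a$ID, function(z) z[length(z)] - z[1]))
)
```

Because unique and tapply both follow the sorted ID order here, the three columns line up row by row.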
Re: [R] the first and last observation for each subject
Hello, Is is truly y=max(y)-min(y) what you want below? Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Fri, 2009-01-02 at 13:16 -0500, Stavros Macrakis wrote: > I think there's a pretty simple solution here, though probably not the > most efficient: > > t(sapply(split(a,a$ID), > function(q) with(q,c(ID=unique(ID),x=unique(x),y=max(y)-min(y) > > Using 'unique' instead of min or [[1]] has the advantage that if x is > in fact not time-invariant, this gives an error rather than silently > ignore inconsistencies. > > Trying to package up this idiom into a function leads to: > > select <- > function(df, groupby, selection) >{ > pf <- parent.frame() > fields <- substitute(selection) > t(sapply(split(df,eval(substitute(groupby),df,enclos=pf)), > function(q) eval(fields,q,enclos=pf))) } > > which I admit is rather ugly (and does no error-checking), but it does work: > > > select(a,ID,list(min(ID),unique(x),max(y)-min(y))) > [,1] [,2] [,3] > 1 110 20 > 2 212 15 > 3 355 > > Perhaps some of the more experienced people on the list could show me > how to write this more cleanly. > >-s > > > On Fri, Jan 2, 2009 at 4:20 AM, gallon li wrote: > > I have the following data > > > > ID x y time > > 1 10 20 0 > > 1 10 30 1 > > 1 10 40 2 > > 2 12 23 0 > > 2 12 25 1 > > 2 12 28 2 > > 2 12 38 3 > > 3 5 10 0 > > 3 5 15 2 > > . > > > > x is time invariant, ID is the subject id number, y is changing over time. > > > > I want to find out the difference between the first and last observed y > > value for each subject and get a table like > > > > ID x y > > 1 10 20 > > 2 12 15 > > 3 5 5 > > .. > > > > Is there any easy way to generate the data set? > > > >[[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
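The point raised in the reply matters because max(y) - min(y) equals "last minus first" only when y moves in one direction within each subject. A minimal sketch of the literal request, assuming the data frame is called `a` as in the thread and that "first" and "last" are defined by `time`; the helper name `first_last_diff` is made up for this illustration:

```r
# Data as posted in the thread
a <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# Hypothetical helper: last observed y minus first observed y, per subject
first_last_diff <- function(df) {
  df <- df[order(df$ID, df$time), ]          # make sure time runs forward
  pieces <- lapply(split(df, df$ID), function(q) {
    data.frame(ID = q$ID[1], x = q$x[1], y = q$y[nrow(q)] - q$y[1])
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

res <- first_last_diff(a)
```

For the posted data both definitions agree, since y only increases within each ID; they would differ for a subject whose y goes up and then down.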
Re: [R] Equivalent of match for data.frame
Hello, Why not something like lapply(mydf, function(x) match(myarg, x) ) ? Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Sat, 2009-01-03 at 07:24 -0500, Sébastien wrote: > Dear R-users, > > I am translating a S script into R and having some troubles with the > match function. This function appears to work with vector and data.frame > in S, but not in R, e.g.: > a <- rep((1:4), each = 10) > b <- rep((1:10), times = 4) > mydf <- data.frame(a,b) > myarg <- mydf[1,] > match(myarg, mydf) > > # S returns 1 but R returns NA NA > > I guess one could use match(interaction(myarg), interaction(mydf)) to do > the job but I was just wondering if there was a more direct function. > > Thanks, > > Sebastien > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
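If the aim is to find the row of `mydf` that equals `myarg` as a whole (which is what the S result of 1 suggests), one option is to collapse every row into a single key and match the keys. This is a sketch along the lines of the poster's own interaction() idea, not the lapply suggestion above; `row_key` is a hypothetical helper name:

```r
a <- rep(1:4, each = 10)
b <- rep(1:10, times = 4)
mydf  <- data.frame(a, b)
myarg <- mydf[1, ]

# Collapse each row into one string key, then match keys.
# "\r" is used as a separator that is unlikely to occur in the data.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

idx <- match(row_key(myarg), row_key(mydf))
```

Matching on pasted keys compares the printed form of the values, so it is best suited to integer, character, or factor columns.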
Re: [R] if statement
Hello,

If you do

    C <- A
    C[A > X & A < Y] <- 0

you get what it seems you want.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Mon, 2009-01-05 at 03:41 -0800, Shruthi Jayaram wrote:
> A <- ts(rnorm(120), freq=12, start=c(1992,8))
> X <- 0.5
> Y <- 0.8
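For reference, the suggestion can be run end to end as below. Only the three setup lines of the original question survive in the archive, so the condition (zero out values strictly between X and Y) is taken from the reply rather than from the question itself:

```r
set.seed(1)
A <- ts(rnorm(120), frequency = 12, start = c(1992, 8))
X <- 0.5
Y <- 0.8

C <- A                      # the copy keeps the time-series attributes
C[A > X & A < Y] <- 0       # zero out values strictly between X and Y
```

The logical index replaces an element-by-element `if` inside a loop, which is the usual way to express this kind of condition in R.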
Re: [R] Changing Matrix Header
Hello,

    colnames( dat ) <- NULL

will do the trick.

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Tue, 2009-01-06 at 17:14 +0900, Gundala Viswanath wrote:
> Dear all,
>
> I have the following matrix.
>
> > dat
>      A A A A A A A A A A
> [1,] 0 0 0 0 0 0 0 0 0 0
> [2,] 0 0 0 0 0 0 0 0 0 1
> [3,] 0 0 0 0 0 0 0 0 0 2
>
> How can I change it into:
>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> [1,]    0    0    0    0    0    0    0    0    0     0
> [2,]    0    0    0    0    0    0    0    0    0     1
> [3,]    0    0    0    0    0    0    0    0    0     2
>
> I tried:
>
> > as.matrix(x)
>
> But failed.
>
> - Gundala Viswanath
> Jakarta - Indonesia
Re: [R] rbind for matrices - rep argument
On Wed, 2009-01-07 at 16:22 +0100, Niccolò Bassani wrote:
> Dear R users, I'm facing a trivial problem, but I really can't solve it. I've
> tried a dozen of codes, but I can't get the result I want.
> The question is: I have a dataframe like this one
>
>       [,1] [,2] [,3] [,4] [,5]
>  [1,]    1    2    3    4    5
>  [2,]    2    5    5    4    9
>  [3,]    1    6    8    1    2
>  [4,]    8    6    4    1    5
>
> made up of decimal numbers, of course.
> I want to append this dataframe to itself a number x of times, i.e. 3. That
> is I want a dataframe like this
>
>       [,1] [,2] [,3] [,4] [,5]
>  [1,]    1    2    3    4    5
>  [2,]    2    5    5    4    9
>  [3,]    1    6    8    1    2
>  [4,]    8    6    4    1    5
>  [5,]    1    2    3    4    5
>  [6,]    2    5    5    4    9
>  [7,]    1    6    8    1    2
>  [8,]    8    6    4    1    5
>  [9,]    1    2    3    4    5
> [10,]    2    5    5    4    9
> [11,]    1    6    8    1    2
> [12,]    8    6    4    1    5
>
> I'm searching for an "authomatic" way to do this (I've already used the
> rbind re-writing x times the name of the frame...), as it must enter a
> function where one argument is exactly the number x of times to repeat this
> frame.
>
> Any ideas??
> Thanks in advance!

Hello,

If your matrix is

    kk <- matrix( 1:16, 4, 4)

you can do

    kkk <- lapply( 1:5, function(x) kk )
    do.call(rbind, kkk)

You can write your code in a single line, though. I used 5 here as a matter of example. You can build a function on these lines with an arbitrary argument if need be.

Carlos J. Gil Bellosta
http://www.datanalytics.com

> Niccolò
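Wrapped into a function with the number of repetitions as an argument, the idea could look like the sketch below. The name `rep_rows` is hypothetical; `replicate(..., simplify = FALSE)` is used here in place of the `lapply` call, which is a stylistic choice and not what the reply prescribes:

```r
# Hypothetical helper: stack n copies of a matrix (or data frame) row-wise
rep_rows <- function(m, n) do.call(rbind, replicate(n, m, simplify = FALSE))

kk  <- matrix(1:16, 4, 4)
out <- rep_rows(kk, 3)
dim(out)

# Same result by repeating the row index instead of the object
out2 <- kk[rep(seq_len(nrow(kk)), times = 3), ]
```

The indexing variant avoids building a list of copies, which may matter for large inputs.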
Re: [R] VCOV Source Code
Hello,

You can do

    stats:::vcov.lm

to see the source code for that particular method. To see which methods are available for vcov, write

    methods("vcov")

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Wed, 2009-01-07 at 21:37 -0600, Yang Wan wrote:
> Dear R Help,
>
> I wonder the way to show the source code of [vcov] command. Usually, it
> can show the source code after input the command and enter. But for
> [vcov], it shows
>
> function (object, ...)
> UseMethod("vcov")
>
> I appreciate for your help. Best wishes.
>
> Christina
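A short session sketch of the steps involved when a function only shows `UseMethod(...)`. `getAnywhere()` comes from the utils package and is an addition to what the reply mentions; it helps when it is not obvious which namespace holds the method:

```r
methods("vcov")                    # list the S3 methods registered for vcov
getAnywhere("vcov.lm")             # locate the method even if it is not exported
src <- deparse(stats:::vcov.lm)    # the source as a character vector
```

The set of methods listed depends on which packages are loaded, so the output differs between sessions.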
Re: [R] Dataframe with unequal rows
Hello, You are not very precise there. Do you mean that the rows in your text file do not all have the same number of separators (commas, in your case)? Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Thu, 2009-01-08 at 04:38 -0500, rahul-a.agar...@ubs.com wrote: > I have a data frame with unequal rows length separated by comma.I > have to read the data first and then calculate number of comma in each > row...how can I do that > > Regards Rahul
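If the answer to that question is yes, base R can count the fields per line before anything is read into a data frame. A minimal sketch, assuming a plain comma-separated text file; the temporary file and its contents are invented for the illustration:

```r
# Write a small example file with a varying number of commas per row
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "d,e", "f,g,h,i"), tmp)

n_fields <- count.fields(tmp, sep = ",")   # fields per line
n_commas <- n_fields - 1                   # separators per line

# fill = TRUE pads the short rows with blank fields so the file can be read
dat <- read.table(tmp, sep = ",", fill = TRUE, header = FALSE,
                  col.names = paste0("V", seq_len(max(n_fields))))
```

Passing `col.names` sized from `max(n_fields)` makes sure the widest row decides the number of columns.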
Re: [R] VaR-Monte carlo Simulation, Historic simulation, Variance-Covariance Simulation
Yes, there are: replicate and quantile are your friends. You will find better support in the R-Finance list, though. Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Thu, 2009-01-08 at 01:36 -0800, Maithili Shiva wrote: > Dear R helpers > > Suppose I have a portfolio of securities with exposure to Equity, Bonds and > Forex (say $ 100 each). > > Is there any fucntion in R that will help me calculate Value at Risk (VaR) > using Monte carlo Simulation , Historic simulation and Variance - Covariance > Simulation. > > > With regards > > Maithili
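To make the hint concrete, here is a rough Monte Carlo sketch built on `replicate` and `quantile`. The exposures come from the question; the volatilities, the independence of the three asset classes, and the normal one-day returns are all assumptions made for this illustration and would need replacing with a properly estimated model:

```r
set.seed(42)

exposure <- c(equity = 100, bonds = 100, forex = 100)   # from the question
# Assumed daily return volatilities; illustrative values only
sigma    <- c(equity = 0.02, bonds = 0.005, forex = 0.01)

# One simulated one-day portfolio P&L, assuming independent normal returns
simulate_pnl <- function() sum(exposure * rnorm(length(exposure), 0, sigma))

pnl <- replicate(10000, simulate_pnl())

# 99% VaR: the loss exceeded in only 1% of the simulated scenarios
var_99 <- -quantile(pnl, probs = 0.01, names = FALSE)
```

Historical simulation follows the same last step: apply `quantile` to a vector of observed portfolio P&L instead of simulated values.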
Re: [R] R in the NY Times
On Thu, 2009-01-08 at 10:42 -0600, Stas Kolenikov wrote:
> A really good measure for R will be the total # of the downloads of
> r-base for all platforms from all CRAN mirrors (and I would expect
> that # can be found from the servers' logs).

Hello,

You overlook here that many of us download R directly from our Linux distribution repositories. Besides, given the free nature of R, some of us install it on several computers, even, in my case, briefly on somebody else's computer if I have an urgent task to solve. Of course, I would never do (or be able to do) this with SAS...

So, the number of downloads from CRAN servers seems like a lousy proxy for the total number of users of R.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
Re: [R] R in the NY Times
On Thu, 2009-01-08 at 13:52 -0600, Marc Schwartz wrote:
> Reading the posts on SAS-L since yesterday via Google RSS, where the NYT
> article was also posted, some have noted that SAS itself offers online
> support forums (http://support.sas.com/forums/index.jspa). From a quick
> review, it looks like the SAS.com forums date back to perhaps early
> 2006, thus possibly accounting for some of the leveling of the posts on
> SAS-L recently.

Hello,

Not only that: the corporate intranet of SAS (sections of which are sometimes open to external consultants for certain products) also contains forums with an uneven traffic flow. These will certainly absorb part of the traffic that would otherwise hit lists like SAS-L. In fact, in my five years' experience working (also as) a SAS consultant, I have never posted to SAS-L. However, I have posted (or had my requests posted by other SAS employees) on these lists.

Having said that, I should also add that R represents a threat to SAS (which has not stood for Statistical Analysis System for a long time already) in a business segment that is very unlikely to account for more than 5-10% of their revenue. They have to sell about 1000 licenses of SAS/BASE and SAS/STAT in order to match the annual revenues from a single license for a single "solution" in a single top tier bank.

It is quite amusing, though, to browse SAS marketing internal documentation --to which I had access some time ago-- on "how to compete" against R. The SAS salesperson statement in the article seems to have been extracted verbatim from them.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
Re: [R] Boxplot from matrices
Hello, The following code may help you: > my.matrix <- matrix( rnorm(16), ncol = 4 ) > boxplot( my.matrix ~ col( my.matrix ) ) Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Sun, 2009-01-11 at 05:23 -0800, johnhj wrote: > Hii, > > I will create boxplots from matrices. I have the following data sets: > 5.0 1.78 2.99 2.019 0 > 10.0 1.79 3.00 1.744 0 > 15.0 1.78 2.98 1.936 0 > 20.0 1.78 2.99 1.975 0 > 25.0 1.73 2.91 3.591 0 > 30.0 1.79 3.00 1.966 0 > 35.0 1.79 3.00 2.451 0 > 40.0 1.79 3.00 1.853 0 > 45.0 1.79 3.00 2.077 0 > 50.0 1.79 3.00 1.943 0 > 55.0 1.79 3.00 2.608 0 > 60.0 1.79 3.00 1.790 0 > 65.0 1.79 3.00 1.893 0 > 70.0 1.79 3.00 2.079 0 > 75.0 1.77 2.97 2.200 0 > 80.0 1.79 3.01 1.868 0 > 85.0 1.78 2.99 2.179 0 > 90.0 1.70 2.85 2.305 0 > 95.0 1.71 2.87 1.854 0 > 100.0 1.79 3.00 2.362 0 > 105.0 1.79 3.00 3.634 0 > 110.0 1.79 3.00 1.578 0 > 115.0 1.79 3.00 1.835 0 > 120.0 1.79 3.00 2.359 0 > 125.0 1.79 3.00 2.542 0 > 130.0 1.76 2.95 2.620 0 > 135.0 1.79 3.00 4.181 0 > 140.0 1.79 3.00 1.375 0 > 145.0 1.79 3.00 2.872 0 > 150.0 1.79 3.00 3.002 0 > 155.0 1.79 3.00 3.712 0 > 160.0 1.79 3.01 3.175 0 > 165.0 1.79 3.00 2.821 0 > 170.0 1.79 3.00 3.320 0.078 > 175.0 1.79 3.00 2.076 0 > 180.0 1.77 2.97 2.186 0 > 185.0 1.78 2.99 4.652 0 > 190.0 1.79 3.01 2.051 0 > 195.0 1.79 3.00 1.922 0 > 200.0 1.79 3.00 1.945 0 > > The first thing I do is, to run the command > y<-matrix(c(test$V3),ncol=8) > to divide the third column in 8 matrices to create 8 boxplots. > The I run the command > w<-summary(y) > to get the values min, max, mean, median, 1.Quan, 3.Quan > > My problem is, I cann't run the plot command to create the 8 boxplots in a > graph... > The command > plot(y) > gives me an error.. > > Can anybody help me to create the boxplot from matrices in a graph ? 
> > greetings, > j
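Applied to the poster's setup (40 values of a third column split into 8 groups), the idea can be sketched as follows. The random numbers stand in for `test$V3`, which is not reproduced here, and `plot = FALSE` is used only so the statistics can be inspected without opening a graphics device:

```r
set.seed(1)
v3 <- rnorm(40, mean = 3, sd = 0.05)   # stand-in for test$V3 (40 values)
y  <- matrix(v3, ncol = 8)             # 8 groups of 5 consecutive values

# One box per column; plot = FALSE returns the statistics without drawing
bp <- boxplot(split(as.vector(y), as.vector(col(y))), plot = FALSE)

# To draw the graph itself, call boxplot(y) or boxplot(as.data.frame(y))
```

`plot(y)` fails to give boxplots because, for a numeric matrix, it draws a scatterplot of the first two columns; `boxplot()` is the function that treats each column as a group.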
Re: [R] How to reference previous row?
Hello,

The solution to your problem will seem far easier if you think in a different way. For instance, you may want to consider the extra dummy column

    previous.first.value <- c( NA, first[ - length(first) ] )

Then you can "horizontally" compare first with its previous value.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Mon, 2009-01-12 at 18:57 +1100, Heston Capital wrote:
> I am trying to write some code where the factor references its
> previous value, but can't find a solution searching through the
> archive.
>
> > X
>   first second
> 1     A      1
> 2     A      2
> 3     B      3
> 4     B      4
> 5     B      5
> 6     C      6
> 7     C      7
>
> I need a third column, in pseudo code-
> If value of first=previous value of first:
> third=previous value of third
> else third = second
>
> So the third column would look like:
> 0
> 0
> 3
> 3
> 3
> 6
> 6
>
> Thanks!
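Carrying the shifted-column idea through to the requested third column might look like this sketch. It follows the pseudo code literally, so the first run gets the value of `second` in row 1 (that is, 1) rather than the 0 shown in the poster's expected output; which of the two is intended is not clear from the thread:

```r
X <- data.frame(first  = c("A", "A", "B", "B", "B", "C", "C"),
                second = 1:7,
                stringsAsFactors = FALSE)

# Shifted copy of 'first': element i holds the value from row i - 1
previous.first <- c(NA, X$first[-nrow(X)])

# A row starts a new run when it differs from the previous row (row 1 always does)
new.run <- is.na(previous.first) | X$first != previous.first

# Carry the 'second' value seen at the start of each run down the run
X$third <- X$second[new.run][cumsum(new.run)]
```

`cumsum(new.run)` numbers the runs 1, 2, 3, ..., so indexing the run-start values by it repeats each start value along its run without any loop.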
Re: [R] meaning of asymmetric on help page for intersect
Hello, The symmetric set difference of A and B is the set of elements in A or B but not in A intersection B, i.e., ( (A U B) \ (A intersection B) ). The asymmetric set difference of A and B is the set of elements of A except those in B, i.e., (A \ B). Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com
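In R terms, `setdiff` is the asymmetric version, and the symmetric one can be assembled from the base set functions. A small sketch; `sym_diff` is a hypothetical helper, not a base R function:

```r
A <- c(1, 2, 3, 4)
B <- c(3, 4, 5)

setdiff(A, B)   # asymmetric: elements of A that are not in B
setdiff(B, A)   # swapping the arguments gives a different set

# Symmetric difference, built from base set functions
sym_diff <- function(a, b) union(setdiff(a, b), setdiff(b, a))
sym_diff(A, B)
```

Swapping the arguments changes the result of `setdiff` but not of `sym_diff`, which is exactly the distinction the help page refers to.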
Re: [R] Comparing elements for equality
Hello, You could build your output dataframe along the following lines: foo <- function(x) length( unique(x) ) == 1 results <- data.frame( freq = tapply( dat$id, dat$id, length ), var1 = tapply( dat$var1, dat$id, foo ), var2 = tapply( dat$var2, dat$id, foo ) ) Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Tue, 2009-01-13 at 14:17 -0500, Doran, Harold wrote: > Suppose I have a dataframe as follows: > > dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25), var2 = > c('foo', 'foo', 'foo', 'foobar', 'foo')) > > Now, if I were to subset by id, such as: > > > subset(dat, id==1) > id var1 var2 > 1 1 10 foo > 2 1 10 foo > > I can see that the elements in var1 are exactly the same and the > elements in var2 are exactly the same. However, > > > subset(dat, id==2) > id var1 var2 > 3 2 20foo > 4 2 20 foobar > 5 2 25foo > > Shows the elements are not the same for either variable in this > instance. So, what I am looking to create is a data frame that would be > like this > > idfreqvar1var2 > 1 2 TRUETRUE > 2 3 FALSE FALSE > > Where freq is the number of times the ID is repeated in the dataframe. A > TRUE appears in the cell if all elements in the column are the same for > the ID and FALSE otherwise. It is insignificant which values differ for > my problem. > > The way I am thinking about tackling this is to loop through the ID > variable and compare the values in the various columns of the dataframe. > The problem I am encountering is that I don't think all.equal or > identical are the right functions in this case. > > So, say I was wanting to compare the elements of var1 for id ==1. I > would have > > x <- c(10,10) > > Of course, the following works > > > all.equal(x[1], x[2]) > [1] TRUE > > As would a similar call to identical. However, what if I only have a > vector of values (or if the column consists of names) that I want to > assess for equality when I am trying to automate a process over > thousands of cases? 
As in the example above, the vector may contain only > two values or it may contain many more. The number of values in the > vector differ by id. > > Any thoughts? > > Harold > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
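The reply's approach, run on the data from the question and generalised so the value columns need not be listed by hand. The name `all_same` replaces `foo` for readability, and the `sapply` over columns is an addition to what the reply shows:

```r
dat <- data.frame(id   = c(1, 1, 2, 2, 2),
                  var1 = c(10, 10, 20, 20, 25),
                  var2 = c("foo", "foo", "foo", "foobar", "foo"))

# TRUE when every element of x is the same value
all_same <- function(x) length(unique(x)) == 1

value_cols <- setdiff(names(dat), "id")
results <- data.frame(
  id   = sort(unique(dat$id)),
  freq = as.vector(table(dat$id)),
  sapply(dat[value_cols],
         function(col) as.vector(tapply(col, dat$id, all_same)))
)
```

This reproduces the table asked for in the question (freq 2 and 3, TRUE for id 1, FALSE for id 2). The sketch assumes at least two distinct ids; with a single id `sapply` returns a plain vector and the layout would need adjusting.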
Re: [R] Howto access object of object
Hello, Use "@" instead of "$" to extract slots from a S4 object. Regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Wed, 2009-01-14 at 17:07 +0900, Gundala Viswanath wrote: > Dear all, > > I have the following object: > > > print(x) > An object of class "matrix.csr" > Slot "ra": > [1] 0.992056718 1.0 1.0 1.0 1.0 1.0 > [7] 1.0 1.0 1.0 1.0 1.0 1.0 > [13] 1.0 1.0 1.0 1.0 1.0 1.0 > [19] 1.0 1.0 1.0 0.002647761 0.000882587 0.000882587 > [25] 0.000882587 0.000882587 0.000882587 0.000882587 > > Slot "ja": > [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > > Slot "ia": > [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 22 22 > 22 > [26] 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 > 22 > [51] 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 > 22 > [76] 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 > 22 > > Slot "dimension": > [1] 100 1 > > __ END__ > > How can I acces "Slot 'ra'" only? > > I tried > > print(x$ra) > > but fail. > > - Gundala Viswanath > Jakarta - Indonesia > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
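A self-contained illustration of slot access. The class `Demo` below is a made-up stand-in defined only for this example; it is not the real `matrix.csr` class from the poster's session:

```r
library(methods)

# Hypothetical S4 class for illustration (not the real matrix.csr class)
setClass("Demo", representation(ra = "numeric", dimension = "integer"))
x <- new("Demo", ra = c(0.99, 1, 1), dimension = c(3L, 1L))

x@ra              # direct slot access
slot(x, "ra")     # same thing, useful when the slot name is held in a variable
slotNames(x)      # list the available slots
```

For a real package class, an accessor function documented by the package is usually preferable to reaching into slots directly, where one exists.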
Re: [R] Vectorization of three embedded loops
Hello, I believe that your bottleneck lies at this piece of code: sum<-c(); for(j in 1:length(val)){ sum[j]<-euc[rownames(start.b)[i],val[j]] } In order to speed up your code, there are two alternatives: 1) Try to reorder the euc matrix so that the sum vector corresponds to (part of) a row or column of euc. 2) For each i value, create a matrix with the coordinates corresponding to ( rownames(start.b)[i], val[j] ) and index the matrix by this matrix in order to create sum. This will be easiest if you can reorder euc in a way that accessing its elements will be easy (and then you would be back into (1)). Creating a variable sum as c() and increasing its size in a loop is one of the easiest ways to uselessly burn your CPU. Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Wed, 2009-01-14 at 10:32 +0300, Thomas Terhoeven-Urselmans wrote: > Dear R-programmer, > > I wrote an adapted implementation of the Kennard-Stone algorithm for > sample selection of multivariate data (R 2.7.1 under MacBook Pro, > Processor 2.2 GHz Intel Core 2 Duo, Memory 2 GB 667 MHZ DDR2 SDRAM). > I used for the heart of the script three embedded loops. This makes it > especially for huge datasets very slow. For a datamatrix of 1853*1853 > and the selection of 556 samples needed computation time of more than > 24 hours. > I did some research on vecotrization, but I could not figure out how > to do it better/faster. Which ways are there to replace the time > consuming loops? 
> > Here are some information: > > # val.n<-24; > # start.b<-matrix(nrow=1812, ncol=20); > # val is a vector of the rownames of 22 in an earlier step chosen > extrem samples; > # euc<-<-matrix(nrow=1853, ncol=1853); [contains the Euclidean > distance calculations] > > The following calculation of the system.time was for the selection of > two samples: > system.time(KEN.STO(val.n,start.b,val.start,euc)) > user system elapsed > 25.294 13.262 38.927 > > The function: > > KEN.STO<-function(val.n,start.b,val,euc){ > > for(k in 1:val.n){ > sum.dist<-c(); > for(i in 1:length(start.b[,1])){ > sum<-c(); > for(j in 1:length(val)){ > sum[j]<-euc[rownames(start.b)[i],val[j]] > } > sum.dist[i]<-min(sum); > } > bla<-rownames(start.b)[which(sum.dist==max(sum.dist))] > val<-c(val,bla[1]); > start.b<-start.b[-(which(match(rownames(start.b),val[length(val)])! > ="NA")),]; > if(length(val)>=val.n)break; > } > return(val); > } > > Regards, > > Thomas > > Dr. Thomas Terhoeven-Urselmans > Post-Doc Fellow > Soil infrared spectroscopy > World Agroforestry Center (ICRAF) > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
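One way to act on the advice above is to index `euc` by names once and keep a running vector of nearest-selected distances, so the two inner loops disappear. This is a sketch under assumptions, not the poster's function: `ken_sto` and its argument `candidates` (the row names of `start.b`) are hypothetical, and `euc` is assumed to carry matching row and column names:

```r
# Hypothetical rewrite of the selection loop; 'euc' must have row and
# column names, 'candidates' and 'val' are character vectors of sample names
ken_sto <- function(val.n, candidates, val, euc) {
  # Distance from every candidate to its nearest already-selected sample
  min.dist <- apply(euc[candidates, val, drop = FALSE], 1, min)
  while (length(val) < val.n && length(candidates) > 0) {
    pick <- candidates[which.max(min.dist)]   # farthest from the selected set
    val  <- c(val, pick)
    keep <- candidates != pick
    candidates <- candidates[keep]
    # Only the newly picked sample can lower the nearest-neighbour distance
    min.dist <- pmin(min.dist[keep], euc[candidates, pick])
  }
  val
}

# Small usage example on random points
set.seed(1)
pts   <- matrix(rnorm(40), ncol = 2, dimnames = list(paste0("s", 1:20), NULL))
euc   <- as.matrix(dist(pts))
start <- c("s1", "s2")
sel   <- ken_sto(6, setdiff(rownames(euc), start), start, euc)
```

Each iteration now costs one vectorised `pmin` over the remaining candidates instead of a full pass over all candidate/selected pairs, and no vector is grown element by element.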
Re: [R] Obtain numbers from vector of NAs and numbers
Hello, new.dat <- dat[ ! is.na(dat) ] should do the trick. Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Wed, 2009-01-14 at 19:32 +0900, Gundala Viswanath wrote: > Dear all, > > I have this set of vectors generated via a loop. > > > for (i in 1:nrow(dat)) { > + print(dat$v1) } > > [1] NA NA NA NA NA NA NA NA NA NA 9 NA NA NA NA NA NA NA NA NA > [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] NA NA NA NA NA NA NA NA NA NA NA NA NA 18 NA NA NA NA NA NA > [1] NA NA NA NA NA 8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] 17 18 NA 13 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] NA NA NA NA 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] 5 6 7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] 9 10 11 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] 13 14 3 17 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] 2 1 14 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] 3 3 13 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > [1] 4 4 16 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > > What I want to do is to extract only integer (i.e. every numbers except NA) > yielding 1 single vector that contain all. > > [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 9 18 8 17 18 > [26] 13 9 5 6 7 9 10 11 13 14 3 17 2 1 14 3 3 13 4 4 16 > > Is there a quick way to do it? > > I tried "grep("[0-9]", vect)" but fail. > > - Gundala Viswanath > Jakarta - Indonesia > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
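A tiny worked example of the subsetting idiom from the reply. The vectors are invented stand-ins for the printed output in the question; when the values arrive as several vectors (one per loop iteration), they can be flattened first:

```r
dat <- c(NA, NA, 9, NA, 18, NA, 8, 17, 18, NA, 13)

new.dat <- dat[!is.na(dat)]          # keep only the non-missing values

# For a list of vectors (one per loop iteration), flatten first
pieces  <- list(c(NA, 9, NA), c(NA, NA), c(17, 18, NA, 13))
all.num <- unlist(pieces)
all.num <- all.num[!is.na(all.num)]
```

`grep("[0-9]", vect)` is not the right tool here because it returns the positions of matching elements, not the elements themselves.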
Re: [R] Value Lookup from File without Slurping
On Fri, 2009-01-16 at 18:02 +0900, Gundala Viswanath wrote: > Dear all, > > I have a repository file (let's call it repo.txt) > that contain two columns like this: > > # tag value > AAA0.2 > AAT0.3 > AAC 0.02 > AAG 0.02 > ATA0.3 > ATT 0.7 > > Given another query vector > > > qr <- c("AAC", "ATT") > > I would like to find the corresponding value for each query above, > yielding: > > 0.02 > 0.7 > > However, I want to avoid slurping whole repo.txt into an object (e.g. hash). > Is there any ways to do that? > > The reason I want to do that because repo.txt is very2 large size > (milions of lines, > with tag length > 30 bp), and my PC memory is too small to keep it. > > - Gundala Viswanath > Jakarta - Indonesia Hello, You can always store your repo.txt into a database, say, SQLite, and select only the values you want via an SQL query. Thus, you will prevent loading the full file into memory. Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
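Where setting up a database is not an option, a chunked scan over a file connection also avoids holding the whole file in memory. Note that this is a different technique from the SQLite route suggested above, and it rereads the file on every call, so a database remains the better fit for repeated queries. The helper `lookup_values` and its arguments are hypothetical:

```r
# Hypothetical helper: scan a two-column "tag value" file in chunks and
# keep only the rows whose tag is in 'query'
lookup_values <- function(path, query, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  found <- setNames(rep(NA_real_, length(query)), query)
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    lines <- lines[!grepl("^\\s*(#|$)", lines)]     # skip comments and blanks
    if (length(lines) == 0) next
    fields <- strsplit(trimws(lines), "\\s+")
    tags   <- vapply(fields, `[`, character(1), 1)
    hit    <- tags %in% query
    found[tags[hit]] <- as.numeric(vapply(fields[hit], `[`, character(1), 2))
  }
  found
}

# Usage with the sample rows from the question
repo <- tempfile()
writeLines(c("# tag value", "AAA 0.2", "AAT 0.3", "AAC 0.02",
             "AAG 0.02", "ATA 0.3", "ATT 0.7"), repo)
res <- lookup_values(repo, c("AAC", "ATT"), chunk_size = 2L)
```

Only one chunk of lines is in memory at a time; tags that never appear in the file stay `NA` in the result.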
Re: [R] problem with applying where condition
Hello,

You can merge both tables first and then select the rows and columns you want. Do it the other way around if your tables are too big. Everything you need is documented in

    ?merge
    ?subset

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Tue, 2009-01-20 at 11:03 +0530, venkata kirankumar wrote:
> Hi all,
>
> I am a biggener in R-Project
>
> I got one problem with applying *where condition*
> like
>
> if 2 tables like
> table1:
>
> empid  name   dep
> 101    kiran  solutions
> 102    ram    testing
> 103    pavan  database
>
> table2:
>
> empid  month  sal
> 101    Dec    9500
> 102    Dec    9800
> 103    Dec    8500
>
> in first table i have to take *empid* with using the *name(kiran)*
> and after getting that *empid* i have to get *sal* with using that *empid*
>
> can any one suggest how can I acheave this
>
> Thanks & regards;
> kiran
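Both orders described in the reply, written out for the two tables in the question (the data frames are rebuilt by hand from the posted listing):

```r
table1 <- data.frame(empid = c(101, 102, 103),
                     name  = c("kiran", "ram", "pavan"),
                     dep   = c("solutions", "testing", "database"))
table2 <- data.frame(empid = c(101, 102, 103),
                     month = "Dec",
                     sal   = c(9500, 9800, 8500))

# Merge first, then select rows and columns
both      <- merge(table1, table2, by = "empid")
kiran.sal <- subset(both, name == "kiran", select = sal)

# The other way around: filter first (handy when the tables are big)
id       <- table1$empid[table1$name == "kiran"]
sal.only <- table2$sal[table2$empid %in% id]
```

The first form corresponds to a SQL join followed by a WHERE clause; the second avoids building the merged table at all.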
Re: [R] Merging tables
Hello, Use merge. Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Tue, 2009-01-20 at 13:41 +, Dry, Jonathan R wrote: > I am relatively new to R and am trying to do some basic data manipulation. > Basically I have a table (csv - table 1) of data for a set of samples (rows), > and a second table (table 2) of information about a subset of samples of > particular interest. I want to pull out the data from table 1 for the > samples in table 2, either by: > * Merging the two tables based on a common identifier (SampleID - may > have a different header in the two tables), and filter for overlapping > entries (preferred approach) > * OR filter table 1 for entries where SampleID matches to one in a list > taken from table 2 > > Any help would be gratefully recieved. > > -- > AstraZeneca UK Limited is a company incorporated in Engl...{{dropped:21}} > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
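Since the question mentions that the identifier may have a different header in each table, the `by.x` and `by.y` arguments of `merge` are the relevant part. A sketch with invented sample data; the column names `SampleID` and `Sample` are assumptions for the illustration:

```r
table1 <- data.frame(SampleID = c("s1", "s2", "s3", "s4"),
                     value    = c(0.1, 0.7, 0.3, 0.9))
table2 <- data.frame(Sample   = c("s2", "s4", "s9"),    # same key, other header
                     group    = c("ctl", "trt", "trt"))

# Inner join: only samples present in both tables survive
merged <- merge(table1, table2, by.x = "SampleID", by.y = "Sample")

# Alternative: just filter table 1 by the IDs listed in table 2
filtered <- table1[table1$SampleID %in% table2$Sample, ]
```

`merge` keeps the columns of both tables, while the `%in%` filter keeps only those of table 1, which matches the two options listed in the question.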
Re: [R] from matrix to data.frame
Hello,

The columns in your output dataframe are the following vectors:

    X1: as.vector( row(a) )
    X2: colnames(a)[as.vector( col(a) )]
    X3: as.vector( a )

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Tue, 2009-01-20 at 15:10 +0100, Antje wrote:
> Hello,
>
> I have a question how to reshape a given matrix to a data frame.
>
> # --
> > a <- matrix(1:25, nrow=5)
> > a
>      [,1] [,2] [,3] [,4] [,5]
> [1,]    1    6   11   16   21
> [2,]    2    7   12   17   22
> [3,]    3    8   13   18   23
> [4,]    4    9   14   19   24
> [5,]    5   10   15   20   25
>
> > colnames(a) <- LETTERS[1:5]
> > rownames(a) <- as.character(1:5)
> > a
>   A  B  C  D  E
> 1 1  6 11 16 21
> 2 2  7 12 17 22
> 3 3  8 13 18 23
> 4 4  9 14 19 24
> 5 5 10 15 20 25
>
> # ---
>
> This is an example on how my matrix looks like.
> Now, I'd like to reshape the data that I get a data frame with three columns:
>
> - the row name of the enty (X1)
> - the column name of the entry (X2)
> - the entry itself (X3)
>
> like:
>
> X1 X2 X3
> 1  A  1
> 2  A  2
> 3  A  3
>
> 1  B  6
> 2  B  7
>
> 5  E  25
>
> How would you solve this problem in an elegant way?
>
> Antje
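Assembled into a data frame, the three vectors give the requested layout. One small change from the reply is made here and is an assumption about intent: X1 is looked up in `rownames(a)`, so it holds the row name rather than the row index (the two coincide in this example):

```r
a <- matrix(1:25, nrow = 5, dimnames = list(as.character(1:5), LETTERS[1:5]))

long <- data.frame(
  X1 = rownames(a)[as.vector(row(a))],   # row name of each entry
  X2 = colnames(a)[as.vector(col(a))],   # column name of each entry
  X3 = as.vector(a)                      # the entry itself
)
head(long, 3)
```

`as.data.frame(as.table(a))` produces the same long layout in one call, with the columns named Var1, Var2 and Freq.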
Re: [R] Does anyone has this paper in pdf?
> Note there is a link to issues dealing with teaching material, and I'd > imagine colleagues at your institution are likely to have the same access > rights, so technically its just as easy to send them a link to download a > paper themselves. Hello, On this point, I remember taking courses at university in which the professor was not allowed to make and distribute copies of certain articles for the students. However we were free to go and make the copies (legally) ourselves. There seems to be a point beyond which capitalism introduces more inefficiencies into the market than it solves. Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com
Re: [R] Mode (statistics) in R?
Hello, You can try ?table. Best regards, Carlos J. Gil Bellosta http://www.datanaytics.com On Mon, 2009-01-26 at 05:28 -0800, Jason Rupert wrote: > Hopefully this is a pretty simple question: > > Is there a function in R that calculates the "mode" of a sample? That is, I > would like to be able to determine the value that occurs the most frequently > in a data set. > > I tried the default R "mode" function, but it appears to provide a storage > type or something else. > > I tried the RSeek and some R documentation that I downloaded, but nothing > seems to mention calculating the "mode". > > Thanks again. > > > > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
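Building on the `table` hint, a small helper for the statistical mode could be sketched as follows. `stat_mode` is a hypothetical name (base R's `mode()` reports the storage mode, as the question notes), and returning all tied values is a design choice made here:

```r
# Hypothetical helper: most frequent value(s) of a vector; ties are all returned
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[as.vector(counts) == max(counts)]
}

stat_mode(c(2, 3, 3, 5, 5, 5, 7))   # returns "5": table() names are character
stat_mode(c("a", "b", "b", "a"))    # returns c("a", "b")
```

Because the result comes from the names of the table, numeric input comes back as character and may need `as.numeric()`; missing values are ignored by `table` by default.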
Re: [R] D'Hondt method
Hello,

I believe that a "productionized" version of the following would do:

dHont <- function( candidates, votes, seats ){
    tmp <- data.frame(
        candidates = rep( candidates, each = seats ),
        scores = as.vector(sapply( votes, function(x) x / 1:seats ))
    )
    tmp <- tmp$candidates[order( - tmp$scores )] [1:seats]
    table(tmp)
}

> votes <- sample(1:10000, 5)
> votes
[1]  448 7685 5445  482 6266
> dHont(letters[1:5], votes, 10 )
tmp
a b c d e
0 4 3 0 3

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Wed, 2009-02-04 at 12:16 +0100, Thomas Steiner wrote:
> Is there a R function to calculate the seats in parliament given the
> total number of seats and the votes for each party -- for different
> methods including the method of D'Hondt?
> http://en.wikipedia.org/wiki/D%27Hondt_method
> Thanks,
> thomas
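One caveat when rerunning the function above today: since R 4.0.0, data.frame() no longer converts strings to factors by default, so parties with zero seats would silently drop out of the table. A hedged variant (the name dhondt and the explicit factor levels are my additions, not part of the original post) that keeps them:

```r
# Sketch of a D'Hondt allocation that reports zero-seat parties explicitly.
dhondt <- function(candidates, votes, seats) {
  quotients <- data.frame(
    # explicit levels so that table() keeps parties without seats
    candidate = factor(rep(candidates, each = seats), levels = candidates),
    # votes / 1, votes / 2, ..., votes / seats for each party, party by party
    score = as.vector(sapply(votes, function(v) v / seq_len(seats)))
  )
  winners <- quotients$candidate[order(-quotients$score)][seq_len(seats)]
  table(winners)
}

# Same votes as in the post above
alloc <- dhondt(letters[1:5], c(448, 7685, 5445, 482, 6266), 10)
```

With these votes the allocation matches the output quoted in the post.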
[R] R Interface Coming to SAS/IML Studio
Hello,

I thought this link could be of interest to the list.

http://support.sas.com/rnd/app/studio/Rinterface2.html

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
Re: [R] How do you apply a function to each variable in a data frame?
Data frames are lists themselves. Something like

    do.call( rbind, lapply( my.data.frame, quantile, probs=c(0.1,0.9)) )

should work.

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Mon, 2008-11-03 at 07:03 -0800, zerfetzen wrote:
> I want to apply a more complicated function than what I use in my example,
> but the idea is the same:
>
> Suppose you have a data frame named x and you want to a function applied to
> each variable, we'll just use the quantile function for this example. I'm
> trying all sorts of apply functions, but not having luck. My best guess
> would be:
>
> sapply(x, FUN=quantile)
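For illustration, here is the suggestion applied to a small made-up data frame (the data are an assumption; only the do.call/lapply idiom comes from the reply). Each column yields one row of the result:

```r
# Toy data frame; any data frame with numeric columns would do.
my.data.frame <- data.frame(a = 1:10, b = seq(10, 100, by = 10))

# lapply() walks over the columns (a data frame is a list of columns),
# and do.call(rbind, ...) stacks the per-column quantiles into a matrix.
res <- do.call(rbind, lapply(my.data.frame, quantile, probs = c(0.1, 0.9)))
res
```

The row names of res are the column names of the data frame and the column names are the requested probabilities.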
Re: [R] CROSSTABULATION
Hello...

Which code are you using to perform the breakup into the three classes? Can you be more specific on that?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Thu, 2008-11-13 at 09:57 +0000, Sohail wrote:
> I want to form a 3x3 crosstabulation for the signs of two vectors (i.e.
> Negative, Zero, Positive). The problem is that I am simulating the data so
> for some iterations one of the categories is absent. Thus the resulting
> table shrinks to 3x2. I want it to be 3x3 with zero column corresponding to
> the missing category. Moreover, I have tried but failed to give the
> dimension names.
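Without knowing the original poster's code, one common way to keep the table at 3x3 is to turn the signs into factors with all three levels declared up front. The sketch below is only an assumption about what is wanted; sign.factor and the sample vectors are made up for illustration:

```r
sign.levels <- c("Negative", "Zero", "Positive")

# Hypothetical helper: map a numeric vector to a factor of its signs,
# declaring all three levels even if some never occur.
sign.factor <- function(v) {
  factor(sign(v), levels = c(-1, 0, 1), labels = sign.levels)
}

x <- c(-2, 0, 3, 1, -1)
y <- c(1, 2, 0, 5, 3)      # no negative values at all

# Naming the arguments of table() sets the dimension names.
tab <- table(x = sign.factor(x), y = sign.factor(y))
tab
```

The "Negative" column for y is present and filled with zeros.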
Re: [R] SAS Institute Adding Support for R
On Thu, 2009-02-12 at 21:32 -0300, milton ruser wrote:
> Dear all,
> I was thinking how much of R capabilities SAS Institute could incorporate on
> SAS support?
>
> Cheers
>
> miltinho
> brazil

Most likely, as many as they currently provide for Tomcat, Postgres, Apache or other open source products they bundle along with their solutions.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
Re: [R] everybody loves R...
Hello, I do not know any such "community of R users in Spain and Latin America" but it sounds like a great idea. A number of R official documents have already been translated into Spanish, but enhancing local language support on basic documentation would facilitate adoption of the language by universities and other institutions currently working under a "Spanish only" restriction. Adding a directory of local providers of coding and consultancy resources would also increase the speed of adoption of R in the industry, for sure. Please, do contact me so that we can develop the idea further. Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Fri, 2009-02-20 at 09:59 +, UsuarioR España wrote: > Hi all > > This > topic is very interesting to me as I was planning to do something > similar, but in Spanish. In my opinion with the existing infrastructure > in English, new resources are not necessary. However, local language > support, and, in particular, in Spanish, is rather weak. > > I don't > actually know if there is something already done like "community of R > users in Spain and Latin America" but your opinions, ideas, and offers > of collaboration to create it, would be very useful for me if finally we > decide to do something. > > Best regards > > > Date: Thu, 19 Feb 2009 23:42:31 +0100 > > From: waclaw.marcin.kusnierc...@idi.ntnu.no > > To: landronim...@gmail.com > > CC: r-help@r-project.org > > Subject: Re: [R] everybody loves R... > > > > Liviu Andronic wrote: > > > On Thu, Feb 19, 2009 at 5:29 PM, Gábor Csárdi wrote: > > > > > >> I don't want to be mean, I really like wikidot, but isn't it a better > > >> solution to use the R wiki instead? > > >> > > >> http://wiki.r-project.org/rwiki/doku.php > > >> > > >> > > > Or even to contribute to existing well-structured sites such as > > > Quick-R [1]? It would avoid doubling efforts, and dispersing similar > > > information accross too many places. 
> > well, if the purpose is to have the message 'everybody loves r' imposed
> > on as many as possible, dispersing similar information across places is
> > one way to go ;)
> >
> > vQ
Re: [R] modifying a built in function from the stats package (fixing arima)
Hello,

I do not think that is the way to go. If you believe that your algorithm is better than the existing one, talk to the author of the package and discuss the improvement. The whole community will benefit.

If you want to tune the existing function and tailor it to your needs, you have several ways to go, among them:

1) Copy the existing function into a new file, edit it and load it via source.
2) Download the source package and modify it for your own purposes.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Tue, 2009-03-03 at 18:20 +0100, Marc Vinyes wrote:
> Dear members of the list,
>
> I'm a beginner in R and I'm having some trouble with: "Error in
> optim(init[mask], armafn, method = "BFGS", hessian = TRUE, control =
> optim.control, :
> non-finite finite-difference value [8]"
>
> when running "arima".
>
> I've seen that some people have come across the same problem:
> https://stat.ethz.ch/pipermail/r-help/2008-August/169660.html
>
> So I'd like to modify the code of arima to change the optimization function
> with another one that handles these problems automatically, however I don't
> find the way to do it and
> http://tolstoy.newcastle.edu.au/R/e6/help/09/01/2476.html points out a way
> that doesn't work for me:
>
> * If I type edit(arima) and I modify it, changes are not saved,
> * If I copy the code and I save it like a different function, I get the hard
> error: "Error in Delta %+% c(1, -1) : object "R_TSconv" not found"
>
> Anybody can give me a hint? I miss matlab's easy way of doing this ("edit
> function.m").
>
> Thanks in advance
>
> MarC (AleaSoft)
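A note on option 1, offered as a sketch rather than as the only route: a function recreated with source() lives in the global environment, so it can no longer see the unexported objects of the stats namespace, which is the kind of "object not found" error quoted above. Pointing the copy back at the namespace usually cures that. The name my.arima is hypothetical and lh is a dataset shipped with R; here the effect of source() is mimicked by resetting the environment by hand:

```r
# Start from a copy of the function (in practice: the edited, sourced version).
my.arima <- stats::arima

# Mimic what source() gives you: a function living in the global environment.
environment(my.arima) <- globalenv()
broken <- try(my.arima(lh, order = c(1, 0, 0)), silent = TRUE)  # internal objects not found

# Point the copy back at the stats namespace so internals are visible again.
environment(my.arima) <- asNamespace("stats")
fit <- my.arima(lh, order = c(1, 0, 0))
```

Once the environment is restored, edits to the body of my.arima (for instance, a different optimizer call) can be tried out without touching the installed package.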
Re: [R] Fast Fourier Transform w.r.t. CreditRisk+
Hello,

You have a link on the subject here:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1122844

The author has extra literature and code on the subject. Also, there was a thread in R-SIG-Finance list on the subject a few months ago.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Thu, 2009-03-05 at 03:48 -0800, Maithili Shiva wrote:
> Dear R Helpers,
>
> Is there any literature available (including R code) on Fast Fourier Transform
> being used in CreditRisk+? I need to learn how to apply the Fast Fourier
> Transform. I agree I am too vague in my question and sincerely apologize for
> the same, but I am not able to understand as to where do I start for this
> particular assignment. I tried to search google for CRAN and Fast Fourier
> Transform, but I got something for FFT image. Basically I need to understand
> what is Fast Fourier Transform is and its use in CreditRisk+?
>
> With regards and thanking in advance
>
> Maithili
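As a starting point only, and not the CreditRisk+ model itself, the sketch below shows the generic role the FFT plays in this kind of problem: computing the distribution of a compound Poisson sum of discrete losses. All numbers (grid size n, intensity lambda, a loss of exactly one unit per default) are assumptions chosen so the answer can be checked against dpois():

```r
n      <- 64          # size of the loss grid: losses 0, 1, ..., n - 1 units
lambda <- 2           # expected number of defaults (assumed)

# Severity distribution on the grid: every default costs exactly one unit.
severity <- numeric(n)
severity[2] <- 1      # position 2 corresponds to a loss of 1

# Discrete Fourier transform of the severity distribution
phi <- fft(severity)

# Compound Poisson: transform of the total loss is exp(lambda * (phi - 1));
# invert it and normalise (R's inverse fft is unnormalised).
agg <- Re(fft(exp(lambda * (phi - 1)), inverse = TRUE)) / n

# agg[k + 1] approximates P(total loss = k)
head(round(agg, 4))
```

With unit losses the total is simply Poisson distributed, which makes the result easy to verify; replacing severity with a realistic exposure distribution is where the FFT starts to pay off.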
[R] Describing clusters
Dear R-helpers,

I am writing to the list in order to inquire whether there exists any R package or program that will help me "describe" clusters. The situation is as follows:

1) I create some clusters (say, with any clustering method in R).
2) I want to "describe" and assign some kind of "label" to each of them.

In order to "label" each cluster, I want to compare the distribution of the variables in each cluster with respect to the distribution of the variables in the original dataset. I would like to do it graphically, if possible. In this way I could review this output and say: this cluster corresponds to, say, "older patients who were not treated before", etc.

I am aware this is not sound scientific practice, but I am asked to do something like that. I have some ideas about how to do it, but I would like to know if I am walking on a well trodden path.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
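To make the idea concrete, here is one hedged way to build such a "profile": express each cluster mean as a z-score relative to the whole dataset. The choice of kmeans, of the iris data and of the z-score summary are all assumptions for illustration, not an established package or method from this thread:

```r
set.seed(1)
x  <- iris[, 1:4]                               # example data shipped with R
cl <- kmeans(scale(x), centers = 3)$cluster     # any clustering method would do

overall.mean <- colMeans(x)
overall.sd   <- apply(x, 2, sd)

# One row per cluster: how far each cluster mean sits from the overall mean,
# measured in overall standard deviations.
profile <- t(sapply(split(x, cl), function(d) {
  (colMeans(d) - overall.mean) / overall.sd
}))
round(profile, 2)
```

Large positive or negative entries point at the variables that characterise a cluster; a graphical counterpart would be, for example, boxplot(x$Sepal.Length ~ cl) for each variable.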
Re: [R] I have a question about a programa in R
Hello,

And what exactly is your problem?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
Re: [R] How to: Read Multi-filtes and sort to different files
Dear Mr. Li,

To make things simpler, you could place the files corresponding to different stations in different directories. Then:

1) I would loop over the directories.
2) I would use dir and loop through the resulting vector (that would contain the file names).
3) I would use read.table with the skip parameter (to skip the station line) and the header option set to TRUE.
4) I would aggregate the resulting data frames into a single big file.

There are ways to do that. Some involve using for loops; you can also use lapply to loop over files and rbind to stack the results, if you feel confident with a command similar to

    do.call( rbind, lapply( dir(), read.table, skip = 1, header = TRUE ) )

I have not been able to test the expression above and it may not even parse in R but it is close to something that should work.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Wed, 2009-03-25 at 14:30 -0700, Qianfeng Li wrote:
> new R user has a question:
>
> I have several hundreds of .txt files from different monitoring sites over
> several years.
> (1) different site has a unique name (such as: ST2.20090321.txt = Station 2
> 2009 March 21 data, ST3.20090322 = Station 3, 2009, March 22 data).
> (2) different site has different file header, but for the same site, the
> header is the same.
> for example:
> Station 2
> date time wind CO2
> 2009 10:30 2 3
> station 3
> data time solar NO
> 2009 10:20 4 5
>
> Question:
> How to write a "R" program to read all these files, and combine the data from
> each station to one file (such as: ST2.master will save all the data from
> station 2, and ST1.master will save all the data from station 1)?
>
> Thanks a million times!
>
> Jeff
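The steps above can be sketched as follows. This variant keeps all files in one directory and groups them by the station prefix of the file name instead of using one directory per station; the function name, the file-name pattern and the demo files are assumptions based on the examples in the question:

```r
# Hypothetical helper: read every STn.yyyymmdd.txt file in data.dir and
# return one stacked data frame per station.
combine.station.files <- function(data.dir) {
  files   <- list.files(data.dir, pattern = "^ST[0-9]+\\.[0-9]{8}\\.txt$")
  station <- sub("\\..*$", "", files)             # "ST2.20090321.txt" -> "ST2"
  lapply(split(files, station), function(fs) {
    # skip = 1 drops the station line; the next line holds the column names
    do.call(rbind, lapply(file.path(data.dir, fs), read.table,
                          skip = 1, header = TRUE))
  })
}

# Demo with made-up files in a temporary directory
demo.dir <- file.path(tempdir(), "stations")
dir.create(demo.dir, showWarnings = FALSE)
writeLines(c("Station 2", "date time wind CO2", "2009 10:30 2 3"),
           file.path(demo.dir, "ST2.20090321.txt"))
writeLines(c("Station 2", "date time wind CO2", "2009 10:40 5 6"),
           file.path(demo.dir, "ST2.20090322.txt"))
writeLines(c("Station 3", "date time solar NO", "2009 10:20 4 5"),
           file.path(demo.dir, "ST3.20090322.txt"))

masters <- combine.station.files(demo.dir)

# One master file per station, e.g. ST2.master
for (st in names(masters)) {
  write.table(masters[[st]], file.path(demo.dir, paste0(st, ".master")),
              row.names = FALSE, quote = FALSE)
}
```

For several hundred files, rbind inside do.call is usually fast enough; a for loop that appends to the master file would also work.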
[R] "Overloading" some non-dispatched S3 methods for new classes
Hello,

I am building a package that creates a new kind of object not unlike a dataframe. However, it is not an extension of a dataframe, as the data themselves reside elsewhere. It only contains "metadata".

I would like to be able to retrieve data from my objects such as the number of rows, the number of columns, the colnames, etc.

I --quite naively-- thought that ncol, nrow, colnames, etc. would be dispatched, so I would only need to create a, say, ncol.myclassname function so as to be able to invoke "ncol" directly and transparently.

However, it is not the case. The only alternative I can think about is to create decorated versions of ncol, nrow, etc. to avoid naming conflicts. But I would still prefer my package users to be able to use the undecorated function names.

Do I have a chance?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
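One possible way out, sketched under the assumption that the metadata object knows its row count and column names: nrow and ncol are thin wrappers around dim(), and colnames falls back to dimnames() for anything that is not a data frame. Both dim and dimnames are internal generics, so providing S3 methods for them is enough. The class name "remotetable" and its fields are hypothetical:

```r
# Hypothetical metadata-only object: the data live elsewhere.
meta <- structure(
  list(n.rows = 100L, col.names = c("a", "b", "c")),
  class = "remotetable"
)

# S3 methods for the internal generics that nrow/ncol/colnames rely on
dim.remotetable      <- function(x) c(x$n.rows, length(x$col.names))
dimnames.remotetable <- function(x) list(NULL, x$col.names)

n.r <- nrow(meta)       # calls dim(meta)[1]
n.c <- ncol(meta)       # calls dim(meta)[2]
cn  <- colnames(meta)   # calls dimnames(meta)[[2]]
```

In a package these methods would also need to be registered in the NAMESPACE file with S3method().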
[R] Segmentation fault in package rJava on CentOS server
Hello,

I just installed rJava on

[r...@ug13 ~]# R --version
R version 2.9.0 (2009-04-17)

running on a

[r...@ug13 ~]# cat /etc/redhat-release
CentOS release 5.3 (Final)

This is the output of

[r...@ug13 ~]# R CMD javareconf
Java interpreter : /usr/bin/java
Java version     : 1.4.2_18
Java home path   : /usr/java/j2sdk1.4.2_18/jre
Java compiler    : /usr/bin/javac
Java headers gen.: /usr/bin/javah
Java archive tool: /usr/bin/jar
Java library path: $(JAVA_HOME)/lib/i386/client:$(JAVA_HOME)/lib/i386:$(JAVA_HOME)/../lib/i386
JNI linker flags : -L$(JAVA_HOME)/lib/i386/client -L$(JAVA_HOME)/lib/i386 -L$(JAVA_HOME)/../lib/i386 -ljvm
JNI cpp flags    : -I$(JAVA_HOME)/../include -I$(JAVA_HOME)/../include/linux

Package rJava got properly installed (there were a number of warnings, though, in the installation process). However,

> library(rJava)
> .jinit("")

 *** caught segfault ***
address 0xc, cause 'memory not mapped'

Traceback:
 1: .External("RinitJVM", boot.classpath, parameters, PACKAGE = "rJava")
 2: .jinit("")

Whenever I try to interact with Java from R --I am interested in the RJDBC package--, I get the same segmentation fault at the .jinit call. In particular, when .jinit calls RinitJVM.

Any ideas?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
[R] Bug in truncgof package?
Dear R-helpers,

I was testing the truncgof CRAN package, found something that looked like a bug, and did my job: contacted the maintainer. But he did not reply, so I am resending my query here.

I installed package truncgof and ran the example for function ad.test. I got the following output:

set.seed(123)
treshold <- 10
xc <- rlnorm(100, 2, 2)      # complete sample
xt <- xc[xc >= treshold]     # left truncated sample
ad.test(xt, "plnorm", list(meanlog = 2, sdlog = 2), H = 10)

        Supremum Class Anderson-Darling Test

data:  xt
AD = 3.124, p-value = 0.12
alternative hypothesis: two.sided

treshold = 10, simulations: 100

So I cannot reject the hypothesis (at a standard significance level) that the original sample comes from a lognormal distribution (which is indeed the case).

But let us try to iterate on this example:

set.seed( 123 )
treshold <- 10
foo <- function(){
    xc <- rlnorm(100, 2, 2)      # complete sample
    xt <- xc[xc >= treshold]     # left truncated sample
    ks.test(xt, "plnorm", list(meanlog = 2, sdlog = 2), H = 10)$p.value
}
results <- replicate( 100, foo() )

Then:

> table( results )
results
   0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09  0.1 0.11 0.16 0.18 0.19  0.2
  25    7    9    3    1    2    3    4    1    1    2    2    1    1    3    2
0.21 0.22 0.26 0.27 0.28  0.3 0.31 0.32 0.33 0.36 0.38  0.4 0.44 0.49 0.54 0.55
   2    2    1    3    1    2    1    1    1    2    1    2    1    1    2    1
0.56 0.57 0.62  0.7 0.76 0.78 0.96 0.98
   1    2    1    1    1    1    1    1

That is, in 45% of the cases you would reject the H_0 hypothesis, which happens to be true, at the 5% "standard" significance level.

Do you think this behaviour is buggy? If so, given that the maintainer does not seem to be contactable, what would be the next step to take?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
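For comparison, this is what the same kind of check looks like for a test that is known to be well calibrated. The sketch uses stats::ks.test on untruncated normal samples (an assumption made so that it runs without truncgof); when the null hypothesis is true, the share of p-values below 0.05 should be close to 5%, not 45%:

```r
set.seed(123)

# 1000 samples drawn from the very distribution being tested, so H_0 is true
p.values <- replicate(1000, ks.test(rnorm(50), "pnorm")$p.value)

# Proportion of (false) rejections at the 5% level
rejection.rate <- mean(p.values < 0.05)
rejection.rate
```

A rejection rate far above the nominal level, as in the table above, is what suggests that something is off in the p-value computation.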
[R] Recursive partitioning algorithms in R vs. alia
Dear R-helpers,

I had a conversation with a guy working in a "business intelligence" department at a major Spanish bank. They rely on recursive partitioning methods to rank customers according to certain criteria.

They use both SAS EM and Salford Systems' CART. I have used the R package rpart in the past, but I could not provide any kind of feature comparison or the like as I have no access to any installation of the first two proprietary products.

Has anybody experience with them? Is there any public benchmark available? Is there any very good --although solely technical-- reason to pay hefty software licences? How would the algorithms implemented in rpart compare to those in SAS and/or CART?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
Re: [R] Large Stata file Import in R
Hello,

You are dealing with two different problems at the same time: importing Stata data and importing a relatively big file.

Can you try to export your data to a txt file first and then import it from there directly?

Secondly, problems concerning reading big files with R occur quite often and there are plenty of discussions and workarounds described in previous posts.

I am the author of a new package aimed at reading files column-wise. It is quite frugal with memory, as the data reside mostly in dumped R files of the objects representing the columns of your data. You can install and test it via

    install.packages("colbycol", repos="http://R-Forge.R-project.org")

Comments and bug reports are more than welcome!

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Mon, 2009-06-29 at 15:50 +0100, saurav pathak wrote:
> Hi
>
> I am using Stata 10 and I need to import a data set in stata 10 to R, I have
> saved the dataset in lower versions of Stata as well by using saveold
> command in Stata.
>
> My RAM is 4gb and the stata file is 600MB, I am getting an error message
> which says :
>
> "Error: cannot allocate vector of size 3.4 Mb
> In addition: There were 50 or more warnings (use warnings() to see the first
> 50)"
>
> Thus far I have already tried the following
>
> 1. By right clicking on the R icon I have used --max-mem-size=1000M in the
> "target" under "properties of the R icon
> 2. I have used library(foreign) at the command prompt
> 3. then I use trialfile <- read.dta("C:/filename.dta")
>
> Here I get error for a Stata data file that is 600MB in size, however, with
> data set in Stata 10 and Stata 8 of the size of 200KB, I have successfully
> being able to import the stata file in R
>
> I am therefore confused whether there is problem with the version of my stata
> file (which should not be the case as I the smaller file of both versions
> are working fine) or is it the size issue,
>
> Its pretty important for me, kindly address this question
> Thanks
> Saurav
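Independently of the package mentioned above, base R already allows a crude form of column-wise reading once the data have been exported to text: a colClasses entry of "NULL" makes read.table skip that column entirely. The sketch below uses a small made-up file; the column layout is an assumption:

```r
# Small tab-separated file standing in for the exported Stata data
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:5, x = c(0.1, 0.2, 0.3, 0.4, 0.5),
                       note = letters[1:5]),
            tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# Read only the second column; "NULL" columns are never stored in memory
one.col <- read.table(tmp, header = TRUE, sep = "\t",
                      colClasses = c("NULL", "numeric", "NULL"))
str(one.col)
```

Reading a few columns at a time and saving each with save() or saveRDS() keeps the peak memory use well below that of reading the whole file at once.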
Re: [R] Aggregate, max and time of max
Hello,

I believe that

    by( data.ex, data.ex[,c(3,4)], function(x) x[which.max(x[,1]),] )

does what you want. Then,

    do.call( rbind, by( data.ex, data.ex[,c(3,4)], function(x) x[which.max(x[,1]),] ) )

looks somewhat nicer.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Fri, 2009-07-24 at 13:06 -0400, Afshartous, David wrote:
> All,
>
> For data consisting of serial measurements on subjects, one may use the
> aggregate function to say compute the peak response for each subject for
> each design condition. Is there a way to alter this or another one-liner to
> also retain the time at which the peak occurred and thus avoid
> doing this via a loop? I suppose one could attempt to employ the split
> function but that's probably no simpler than employing a loop. Sample code
> below:
>
> data = expand.grid(time = seq(1,6), subject = seq(1,20), treatment =
> c("placebo", "drug"))
> data.ex = cbind(y = rnorm(dim(data)[1], 5, 1), data)
>
> data.peak = aggregate(data.ex[c(1)], data.ex[c(3,4)], max)
> ## this provides the peak of each subject on each treatment, but time is
> ## lost. Including time in the statement doesn't help clearly as then
> ## the peak of all the times will be calculated
>
> David
>
> --
> David Afshartous, Ph.D.
> Research Assistant Professor
> University of Miami, Miller School of Medicine
> Division of Clinical Pharmacology
> 1500 N.W. 12th Avenue, 15th Floor West
> Miami, Florida 33136
>
> E-mail: afs...@med.miami.edu
> Phone: +1 305-243-1549
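Put together as a self-contained run, with the sample data rebuilt from the question (the seed and the use of column names instead of positions are my additions):

```r
set.seed(42)
data    <- expand.grid(time = 1:6, subject = 1:20,
                       treatment = c("placebo", "drug"))
data.ex <- cbind(y = rnorm(nrow(data), 5, 1), data)

# For each subject/treatment cell keep the whole row where y peaks,
# so the time of the peak is retained alongside the peak itself.
peaks <- do.call(rbind, by(data.ex, data.ex[, c("subject", "treatment")],
                           function(d) d[which.max(d$y), ]))
head(peaks)

# Peaks only, for comparison with the aggregate() approach in the question
agg <- aggregate(data.ex["y"], data.ex[c("subject", "treatment")], max)
```

peaks has one row per subject and treatment, with columns y, time, subject and treatment.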
Re: [R] Question about rpart decision trees (being used to predict customer churn)
Hello,

If you do

    my.tree <- rpart(cancel ~ experience)

and then you check my.tree$frame you will note that the complexity parameter there is 0. Check ?rpart.object to get a description of what this output means. But essentially, you will not be able to break the leaf unless you set a complexity parameter below that value, that is, never.

You may need to go into the internals of the function (and the C code) in order to understand how this parameter is calculated. It looks like an oddity to me and it is worth trying to understand why.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

P.S.: Note that there is a bug in your submitted code that requires some hand fixing.

On Sun, 2009-07-26 at 11:37 -0700, Robert Smith wrote:
> Hi,
>
> I am using rpart decision trees to analyze customer churn. I am finding that
> the decision trees created are not effective because they are not able to
> recognize factors that influence churn. I have created an example situation
> below. What do I need to do to for rpart to build a tree with the variable
> experience? My guess is that this would happen if rpart used the loss matrix
> while creating the tree.
>
> > experience <- as.factor(c(rep("good",90), rep("bad",10)))
> > cancel <- as.factor(c(rep("no",85), rep("yes",5), rep("no",5),
> rep("yes",5)))
> > table(experience, cancel)
>           cancel
> experience no yes
>       bad   5   5
>       good 85   5
> > rpart(cancel ~ experience)
> n= 100
> node), split, n, loss, yval, (yprob)
> * denotes terminal node
> 1) root 100 10 no (0.900 0.100) *
>
> I tried the following commands with no success.
> rpart(cancel ~ experience, control=rpart.control(cp=.0001))
> rpart(cancel ~ experience, parms=list(split='information'))
> rpart(cancel ~ experience, parms=list(split='information'),
> control=rpart.control(cp=.0001))
> rpart(cancel ~ experience, parms=list(loss=matrix(c(0,1,1,0), nrow=2,
> ncol=2)))
>
> Thanks a lot for your help.
>
> Best regards,
> Robert
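To look at the frame mentioned in the reply, something along these lines should do. The data are the ones from the question; the selection of columns shown is my own choice:

```r
library(rpart)

experience <- factor(c(rep("good", 90), rep("bad", 10)))
cancel     <- factor(c(rep("no", 85), rep("yes", 5),
                       rep("no", 5),  rep("yes", 5)))

my.tree <- rpart(cancel ~ experience)

# One row per node; with the default settings only the root is left
my.tree$frame[, c("var", "n", "dev", "complexity")]
```

Both children of the candidate split would still predict "no", so splitting does not reduce the misclassification count, which is why lowering cp alone does not help here.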
Re: [R] Question about rpart decision trees (being used to predict customer churn)
Hello,

Isn't it totally counter-intuitive that if you penalize the error less the tree finds it? See:

library(rpart)
experience <- as.factor(c(rep("good",90), rep("bad",10)))
cancel <- as.factor(c(rep("no",85), rep("yes",5), rep("no",5), rep("yes",5)))

foo <- function( i ){
    tmp <- rpart(cancel ~ experience,
                 parms=list(loss=matrix(c(0,i,1,0), byrow=TRUE, nrow=2)))
    nrow( tmp$frame )
}

sapply( 1:20, foo )

The output I get is:

 [1] 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 1

So, something unexpected happens after penalization exceeds 16... Should it be?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Sun, 2009-08-02 at 08:41 +1000, Graham Williams wrote:
> 2009/7/27 Robert Smith
>
> > Hi,
> >
> > I am using rpart decision trees to analyze customer churn. I am finding
> > that the decision trees created are not effective because they are not
> > able to recognize factors that influence churn. I have created an example
> > situation below. What do I need to do to for rpart to build a tree with
> > the variable experience? My guess is that this would happen if rpart used
> > the loss matrix while creating the tree.
> >
> > > experience <- as.factor(c(rep("good",90), rep("bad",10)))
> > > cancel <- as.factor(c(rep("no",85), rep("yes",5), rep("no",5),
> > rep("yes",5)))
> > > table(experience, cancel)
> >           cancel
> > experience no yes
> >       bad   5   5
> >       good 85   5
> > > rpart(cancel ~ experience)
> > n= 100
> > node), split, n, loss, yval, (yprob)
> > * denotes terminal node
> > 1) root 100 10 no (0.900 0.100) *
> >
> > I tried the following commands with no success.
> > rpart(cancel ~ experience, control=rpart.control(cp=.0001))
> > rpart(cancel ~ experience, parms=list(split='information'))
> > rpart(cancel ~ experience, parms=list(split='information'),
> > control=rpart.control(cp=.0001))
> > rpart(cancel ~ experience, parms=list(loss=matrix(c(0,1,1,0), nrow=2,
> > ncol=2)))
> >
> > Thanks a lot for your help.
> >
> > Best regards,
> > Robert
>
> Hi Robert,
>
> Perhaps try a less extreme loss matrix:
>
> rpart(cancel ~ experience, parms=list(loss=matrix(c(0,5,1,0), byrow=TRUE,
> nrow=2)))
>
> Output from Rattle:
>
> Summary of the Tree model for Classification (built using rpart):
>
> n= 100
>
> node), split, n, loss, yval, (yprob)
> * denotes terminal node
>
> 1) root 100 50 no (0.9000 0.1000)
>   2) experience=good 90 25 no (0.9444 0.0556) *
>   3) experience=bad 10 5 yes (0.5000 0.5000) *
>
> Classification tree:
> rpart(formula = cancel ~ ., data = crs$dataset, method = "class",
> parms = list(loss = matrix(c(0, 5, 1, 0), byrow = TRUE, nrow = 2)),
> control = rpart.control(cp = 0.0001, usesurrogate = 0, maxsurrogate =
> 0))
>
> Variables actually used in tree construction:
> [1] experience
>
> Root node error: 50/100 = 0.5
>
> n= 100
>
>       CP nsplit rel error xerror xstd
> 1 0.4000      0       1.0    1.0 0.30
> 2 0.0001      1       0.6    0.6 0.22
>
> TRAINING DATA Error Matrix - Counts
>
>          Actual
> Predicted no yes
>       no  85   5
>       yes  5   5
>
> TRAINING DATA Error Matrix - Percentages
>
>          Actual
> Predicted no yes
>       no  85   5
>       yes  5   5
>
> Time taken: 0.01 secs
>
> Generated by Rattle 2009-08-02 08:24:50 gjw
Re: [R] how to avoid a script from hanging up
Hello,

One thing you can do is save your strings to an external text file (using cat, for instance). That way you do not need much memory while extracting your data. Once you have extracted it, you can look at the external file to see whether it is too big, decide what to do with it, etc. You can even consider saving your data into a database if need be.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Sun, 2009-08-02 at 20:02 +0200, mau...@alice.it wrote:
> I am submitting this problem to the R forum, rather than the Bioconductor forum, because its nature is closer to programming style than to any bioinformatic content.
>
> I have implemented an R script to extract many strings by querying 3 bioinformatic databases in the same loop cycle. Ideally, the script should perform as many cycles as necessary to extract all available data of interest. Inevitably it triggers a BioMart exception after running many cycles in a row. The exception seems to be independent of the script instructions, because if I restart the script from the point where it got interrupted then it runs for another while, also extracting the data where the exception occurred with no problem at all.
>
> Sometimes, though, the script does not respond any more; it hangs, even if no exception has apparently occurred, and the only way to regain control is to kill the R process. This way I lose track of how many data have been processed and stored to disk files (unless I manually count them ... there are thousands ..). If I restart the script then it restarts processing the data strings from scratch. I guess it may be a memory problem, as the task manager (Windows/XP) shows that the hung R script is taking more than 70% of the available RAM.
>
> I wonder whether there is any system command to make the script self-aware of its memory requirements and running time.
>
> Ideally the script should be able to trap the exception and be sensitive to its current RAM / CPU time requirements, self-exit after freezing and saving the current program status so that when rerun it would not restart from scratch but rather pick up from where it exited.
>
> Maybe this is asking too much from a non-compiled language?
>
> Thank you in advance,
> Maura
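The advice above (write each result to disk as soon as it is extracted) can be combined with tryCatch() and a small log of finished items, so that a rerun resumes instead of starting from scratch. The following is only a minimal sketch under stated assumptions: fetch_one() is a hypothetical stand-in for the real database query, and the ids, the file names and the max_seconds time budget are invented for illustration.

```r
# Hypothetical stand-in for one remote query; it fails on purpose for one id.
fetch_one <- function(id) {
  if (id == "g3") stop("simulated BioMart exception")
  paste0("sequence-for-", id)
}

ids      <- c("g1", "g2", "g3", "g4")
out_file <- tempfile(fileext = ".txt")   # results, one line per id
log_file <- tempfile(fileext = ".txt")   # ids that were processed successfully

# Read the log of finished ids, or return an empty vector if there is none yet.
read_done <- function(log_file) {
  if (file.exists(log_file)) readLines(log_file) else character(0)
}

run_batch <- function(ids, out_file, log_file, max_seconds = 60) {
  start <- proc.time()[["elapsed"]]
  for (id in setdiff(ids, read_done(log_file))) {
    # Stop cleanly when the time budget is used up; a rerun resumes from here.
    if (proc.time()[["elapsed"]] - start > max_seconds) break
    # Trap the exception instead of letting it abort the whole loop.
    res <- tryCatch(fetch_one(id), error = function(e) NULL)
    if (is.null(res)) next   # failed ids are not logged, so a rerun retries them
    cat(id, "\t", res, "\n", sep = "", file = out_file, append = TRUE)
    cat(id, "\n", sep = "", file = log_file, append = TRUE)
  }
  invisible(read_done(log_file))
}

processed <- run_batch(ids, out_file, log_file)
```

Because nothing is accumulated in memory, killing a hung R process loses at most the item in flight; calling run_batch() again with the same files skips everything already listed in the log.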
[R] subset of a matrix
Hello everyone,

I would appreciate any help with the following.

My dataset is a list containing matrices. So if you type e.g.

data[[1]]

you get something like:

       [,1] [,2]
361a   A    T
456b   A    G
72145a T    G

As you can see, my rows have names which are character strings containing numbers and letters. I want something similar to a histogram, per column, i.e. I want to know how many times a character occurs exactly once in a column, how many times a character occurs exactly twice, and so on. Maybe there is an easy way to do this, but I wrote my own code which works perfectly, so don't bother to correct it unless extremely necessary. I write down the code so you know exactly what I'm trying to do:

table <- vector()
for (i in (1:length(data))){
  for (j in (1:length(data[[i]][1,]))){
    t <- table(data[[i]][,j])
    table <- c(table, t)
  }
}
ncount <- table[names(table) != "-"] # this line is necessary to eliminate "-" characters, which should not be included in the analysis
sfs <- table(ncount)

And with this code I get something like:

  1   2   3   4   5   6   7   8   9  10
542 125  98  49  47  41  26  31  22  18

which is what I'm looking for.

Now comes THE problem:

As I said before, my rows have names. Each name is unique. I want to apply my analysis to a subset of rows in each matrix, namely all rows whose names start with 3, all that start with 4, all that start with 721. In most cases only the first character is important, but since I have names of different lengths, in some cases I need the first three characters to differentiate the groups. I want to integrate this into the loop so that I get a vector (such as the one called "table" in my code) for each subset analyzed.

I tried using the subset function, but I couldn't figure out how to use it, because it's intended to use row values to define the subset, not row names.

I hope someone can help me out, but please bear in mind I am really new at R and most commands and parameters are really unfamiliar to me.

Thanks.
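One possible way to restrict the counting loop from the question to rows whose names start with a given code is to build a logical index from the row names with grepl() and subset each matrix before tabulating. This is only a sketch: the toy matrix and its values are made up, and sfs_for_prefix, dat and prefixes are hypothetical names chosen for the illustration.

```r
# Toy data: a list holding one character matrix with coded row names.
m1 <- matrix(c("A", "A", "T", "A",    # column 1
               "T", "G", "G", "-"),   # column 2
             ncol = 2,
             dimnames = list(c("361a", "456b", "72145a", "302c"), NULL))
dat <- list(m1)

# Frequency spectrum computed only on rows whose names match `pattern`.
sfs_for_prefix <- function(mats, pattern) {
  counts <- integer(0)
  for (m in mats) {
    # drop = FALSE keeps a matrix even when a single row matches.
    sub <- m[grepl(pattern, rownames(m)), , drop = FALSE]
    for (j in seq_len(ncol(sub))) {
      counts <- c(counts, table(sub[, j]))
    }
  }
  counts <- counts[names(counts) != "-"]   # ignore "-" as in the original code
  table(counts)
}

# "^" anchors the match at the start of the row name.
prefixes <- c("3" = "^3", "4" = "^4", "721" = "^721")
spectra  <- lapply(prefixes, sfs_for_prefix, mats = dat)
```

spectra is a named list with one spectrum per prefix, so spectra[["721"]] plays the role of sfs for the rows starting with 721.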
Re: [R] subset of a matrix
Hi Milton,

Thanks for trying to help anyway.

From: milton ruser
Cc: r-help@r-project.org
Sent: Thursday, August 27, 2009 6:48:41 PM
Subject: Re: [R] subset of a matrix

Hi Carlos,

I think I made a wrong suggestion. Sorry about that. I was thinking that having row names of the same length would help you with the data handling. Is that true? If so, I can try to suggest another, automatic way of getting there.

bests
milton

On Thu, Aug 27, 2009 at 12:39 PM, milton ruser wrote:
> Hi Carlos,
>
> how about this step first:
>
> rownames(mydata)<-gsub("361a","00361a",rownames(mydata))
> rownames(mydata)<-gsub("456a","00456a",rownames(mydata))
>
> good luck
> milton
Re: [R] subset of a matrix
Hi Henrique,

I tried your code. I simply copied and pasted it because I have no idea how it works. What I get is the total number of A's and T's and all other characters, which was not my intention. Maybe I need to make some modifications to your script before I can apply it within mine? Can you explain what you are using those commands for?

Thanks for the help anyway.

Cheers,
Carlos

From: Henrique Dallazuanna
Cc: r-help@r-project.org
Sent: Thursday, August 27, 2009 7:00:45 PM
Subject: Re: [R] subset of a matrix

Try this:

lapply(data, function(r) lapply(split(r, substr(sprintf("%05d", as.numeric(gsub("[a-z]", "", row.names(r)))), 1, 3)), table))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
[R] regexp help needed.
Hi, I posted yesterday with a problem in a script. I still have the same problem, but I think I found a better way to explain my problem. I have a vector of character strings. Each string is unique, including numbers and letters. In the real world they represent a list of codes, so each position in the string has a meaning to me. I want to make a subset of the vector using "wildcards". So for example, take all the strings that start with 3. Any ideas how to do that? Thanks for any help.
Re: [R] regexp help needed.
Thanks, this is what I was looking for.

From: Henrique Dallazuanna
Cc: r-help@r-project.org
Sent: Friday, August 28, 2009 1:42:25 PM
Subject: Re: [R] regexp help needed.

See this example:

Str <- c("345asd", "31qwe", "234tyu", "40kjhg")
grep("^3", Str, value = TRUE)
split(Str, substr(Str, 1, 1))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
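The example above covers a one-character prefix. When the groups are told apart by prefixes of different lengths (as with 3, 4 and 721 in the earlier thread), a logical index per prefix may be more convenient than split(). A small sketch follows; the extra string "72145a" and the names starts_3, prefixes and groups are assumptions made for the illustration.

```r
Str <- c("345asd", "31qwe", "234tyu", "40kjhg", "72145a")

# Logical index: TRUE where the string starts with "3" ("^" anchors at the start).
starts_3 <- grepl("^3", Str)
Str[starts_3]

# startsWith() tests a fixed prefix without regular expressions (base R >= 3.3.0).
Str[startsWith(Str, "721")]

# Several prefixes of different lengths, collected in a named list.
prefixes <- c("3", "4", "721")
groups <- lapply(setNames(prefixes, prefixes),
                 function(p) Str[startsWith(Str, p)])
```

The same kind of logical vector can index the rows of a matrix by name, for example m[grepl("^3", rownames(m)), , drop = FALSE] for a matrix m.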
Re: [R] DOE in R?
Hello,

This is your starting point:

http://cran.r-project.org/web/views/ExperimentalDesign.html

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Thu, 2009-09-03 at 17:38 -0700, B_miner wrote:
> Hello!
>
> This is not a topic I am well versed in but required to become well versed in...I welcome any assistance!
>
> Using R, I want to create an optimal design for an experiment. I'll be analyzing the results with logistic regression or some generalized linear model. I am thinking that the algdesign package can help (but no idea where to start?).
>
> I'm presenting an example here that I have seen the answer to (in SAS) in order to make sure I would have gotten it *right*.
>
> There are 5 factors: 4 are quantitative with three levels each and 1 is qualitative with two levels.
>
> Factor and levels:
>
> Intro: 0, 1.99, 2.99
> Duration: 6, 9, 12
> GOTO: 3.99, 4.99, 5.99
> Fee: 0, 15, 45
> Color: Red, White
>
> In order to screen these factors, I would want to get a design where I could evaluate all main effects, all first order interactions and the squared terms of Intro, Duration, GOTO and FEE (for example Intro*Intro).
>
> Looking for the D-optimal design.
>
> Is this something that R can provide?
>
> These are, according to the SAS paper I read, the following:
>
> Obs intro duration goto fee color
>   1  0.00        6 3.99   0 WHITE
>   2  0.00        6 3.99  45 RED
>   3  0.00        6 5.99   0 RED
>   4  0.00        6 5.99  45 WHITE
>   5  0.00        9 3.99  45 RED
>   6  0.00        9 4.99  15 WHITE
>   7  0.00        9 5.99   0 RED
>   8  0.00       12 3.99   0 RED
>   9  0.00       12 3.99  45 WHITE
>  10  0.00       12 5.99   0 WHITE
>  11  0.00       12 5.99  45 RED
>  12  0.00       12 5.99  45 WHITE
>  13  1.99        6 3.99  15 RED
>  14  1.99        6 4.99  45 WHITE
>  15  1.99        6 5.99   0 WHITE
>  16  1.99        9 5.99  45 RED
>  17  1.99       12 3.99   0 WHITE
>  18  1.99       12 5.99  15 RED
>  19  2.99        6 3.99   0 WHITE
>  20  2.99        6 3.99  45 WHITE
>  21  2.99        6 4.99   0 RED
>  22  2.99        6 5.99  15 WHITE
>  23  2.99        6 5.99  45 RED
>  24  2.99        9 3.99   0 RED
>  25  2.99       12 3.99  15 WHITE
>  26  2.99       12 3.99  45 RED
>  27  2.99       12 4.99   0 WHITE
>  28  2.99       12 4.99  45 RED
>  29  2.99       12 5.99   0 RED
>  30  2.99       12 5.99  45 WHITE
Re: [R] DOE in R?
Hello,

Maybe you want something like this:

desD2 <- optFederov(~(Intro+Duration+GOTO+Fee+Color)^2, dat, nTrials = 30)

In any case, both SAS's proc optex and R's optFederov implement a non-exhaustive search algorithm, and nothing guarantees that the final designs will be the same. However, you can compare SAS's and R's D values to see how far each design is from the "optimal" one.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Sun, 2009-09-06 at 20:33 -0400, b miner wrote:
> I attempted to use the package algdesign. I used the following code. However, the results were very much not matching the reference I noted (which is located at http://www2.sas.com/proceedings/sugi31/196-31.pdf). Instead of 30 design points, I received 25 and of those that were returned, only half or so matched the reference. I am inexperienced with optimal designs so I don't know if I am doing something wrong, the package is not the correct one for the task or a combination. Here is the code in case anyone has any input:
>
> #CODE:
>
> runif(1) #for a bug in program (assumes random seed object exists)
>
> dat<-gen.factorial(c(3,3,3,3,2),varNames=c("Intro","Duration","GOTO","Fee","Color"))
> dat #show design plan
>
> desD<-optFederov(~Intro+Duration+GOTO+Fee+Color
> +quad(Intro)+quad(Duration)+quad(GOTO)+quad(Fee)+Intro*Duration
> +Intro*GOTO+Intro*Fee+Intro*Color+Duration*GOTO+Duration*Fee
> +Duration*Color+GOTO*Fee+GOTO*Color
> +Fee*Color,dat,crit="D",maxIteration=1000,eval=TRUE)
>
> #D
> desD$D
>
> #design
> desD$design
> design<-desD$design
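Putting the pieces of this thread together, a run that asks for 30 trials explicitly might look like the sketch below. It rests on several assumptions that are not in the original messages: the AlgDesign package is installed, Color is declared a factor through the factors argument of gen.factorial(), the seed value is arbitrary, and the quadratic terms are written with AlgDesign's quad() formula shorthand. Because the search is not exhaustive, the selected points need not match the SAS table.

```r
if (requireNamespace("AlgDesign", quietly = TRUE)) {
  library(AlgDesign)

  # Fixing a seed also creates the random seed object the search relies on.
  set.seed(1)

  # Candidate set: the full 3 x 3 x 3 x 3 x 2 factorial (162 points).
  # factors = 5 declares the fifth variable (Color) as a factor (an assumption).
  dat <- gen.factorial(c(3, 3, 3, 3, 2), factors = 5,
                       varNames = c("Intro", "Duration", "GOTO", "Fee", "Color"))

  # Main effects, two-way interactions and squared terms of the four
  # quantitative variables, plus Color and its two-way interactions.
  desD <- optFederov(~ quad(Intro, Duration, GOTO, Fee) + Color +
                       Intro:Color + Duration:Color + GOTO:Color + Fee:Color,
                     data = dat, nTrials = 30, criterion = "D")

  print(desD$D)        # D criterion value, for comparison across runs
  print(desD$design)   # the selected design points
}
```

Rerunning with a different seed, or with a larger nRepeats value, is a cheap way to see how much the D value varies between searches.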
Re: [R] DOE in R?
Hello,

I think I misread the link you sent me yesterday.

In any case, the reason why SAS generates a design with 30 trials whereas R produces one with 25 is that, if the number of trials is not specified in the proc/function call, the two systems use different defaults: the number of columns in the design matrix plus 10 for SAS, and plus 5 for R. You can check the proc optex and optFederov documentation for details.

Beyond that, SAS implements several optimization methods, Federov's among them. R implements just (one version of) Federov's.

Finally, the non-deterministic nature of the search for optimality (and the existence of local optima) may lead to discrepancies between the outputs.

You will have to work through the different proc/function inputs to make sure you are running an analogous algorithm on both systems.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
Re: [R] R Memory Usage Concerns
Hello,

I do not know whether my package "colbycol" may help you. It can help you read files that would not otherwise fit into memory. Internally, as the name indicates, data is read into R column by column. IO times increase, but you need just a fraction of the "intermediate memory" to read the files.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Tue, 2009-09-15 at 00:10 -0700, Evan Klitzke wrote:
> On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson wrote:
> > As already suggested, you're (much) better off if you specify colClasses, e.g.
> >
> > tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double"));
> >
> > Otherwise, R has to load all the data, make a best guess of the column classes, and then coerce (which requires a copy).
>
> Thanks Henrik, I tried this as well as a variant that another user sent me privately. When I tell R the colClasses, it does a much better job of allocating memory (ending up with 96M of RSS memory, which isn't great but is definitely acceptable).
>
> A couple of notes I made from testing some variants, if anyone else is interested:
> * giving it an nrows argument doesn't help it allocate less memory (just a guess, but maybe because it's trying the powers-of-two allocation strategy in both cases)
> * there's no difference in memory usage between telling it a column is "numeric" vs "double"
> * when telling it the types in advance, loading the table is much, much faster
>
> Maybe if I gather some more fortitude in the future, I'll poke around at the internals and see where the extra memory is going, since I'm still curious where the extra memory is going. Is that just the overhead of allocating a full object for each value (i.e. rather than just a double[] or whatever)?
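The colClasses advice quoted above can be tried on a small file. The sketch below is illustrative only: the three-row table and the temporary file are made up, and "numeric" is used for the two numeric columns (the quoted message reports no difference in memory usage between "numeric" and "double").

```r
# Write a small whitespace-separated file without a header.
tmp <- tempfile(fileext = ".tab")
write.table(data.frame(id = c("a", "b", "a"),
                       x  = c(1.5, 2, 3),
                       y  = c(0.1, 0.2, 0.3)),
            tmp, row.names = FALSE, col.names = FALSE, quote = FALSE)

# Declaring the column classes up front spares read.table() from guessing
# them and then converting the columns.
tab <- read.table(tmp, colClasses = c("factor", "numeric", "numeric"))

sapply(tab, class)                    # class of each column as read
print(object.size(tab), units = "Kb") # size of the resulting data frame

unlink(tmp)
```

On a real file, comparing object.size() and the timing from system.time() with and without colClasses gives a quick idea of the saving.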