[R] Sweave: \Sexpr and variables with special chars
I am using \Sexpr to include a variable in the title of a Sweave document:

\documentclass[a4paper]{article}
<<>>=
#mytitlevar <- "Stuff"        # case 1, everything is fine
mytitlevar <- "Stuff_first"   # case 2, 'f' is turned into a subscript
@
\title{MyTitle: \\ \Sexpr{mytitlevar}}
\begin{document}
\maketitle
\end{document}

When doing this, the variable's value seems to be subject to interpretation by LaTeX. In case #2, the 'f' of 'Stuff_first' is printed as a subscript because of the underscore. Is there a way to convert the variable's value (the text) into a form that is not interpreted by LaTeX and/or Sweave? I understand that this is perhaps more of a LaTeX question than an R question...

Thanks,
Ralf

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
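Since the underscore is only special to LaTeX (not to Sweave), one fix is to escape the value on the R side before it reaches \Sexpr{}. A minimal sketch; sanitizeTex is a made-up helper name, and the set of escaped characters can be extended as needed:

```r
# Escape characters that LaTeX treats specially; the backslash must be
# handled first so the escapes added below are not escaped again.
sanitizeTex <- function(x) {
  x <- gsub("\\\\", "\\\\textbackslash ", x)
  x <- gsub("([_%$#&{}])", "\\\\\\1", x)
  x
}
sanitizeTex("Stuff_first")  # "Stuff\\_first", typeset by LaTeX as Stuff_first
```

In the document one would then write \Sexpr{sanitizeTex(mytitlevar)}.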
[R] Sweave: Conditional code chunks?
I have a code chunk that produces a figure. In my case, however, the data does not always exist. When the data exists, the chunk is of course trivial (case #1); but what do I do in case #2, where the data does not exist? I can prevent the R code from being executed by checking for the existence of the object x, but on the Sweave level I am still left with a static figure chunk. Here is an example that should be reproducible:

# case 1
x <- c(1,2,3)
# case 2 - no definition of the variable
#x <- c(1,2,3)

<<fig=TRUE>>=
if (exists("x")) {
  plot(x)
}
@

In a way I would need a conditional chunk, or a chunk that draws a figure only if one was generated and is ignored otherwise. Any ideas?

Ralf
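One workaround that avoids conditional chunk options entirely: a results=tex chunk that writes the figure file itself and emits the \includegraphics line only when the data exists. This is a sketch, with an illustrative file name:

```r
# Body of a <<results=tex, echo=FALSE>>= chunk in the .Rnw file:
if (exists("x")) {
  pdf("fig-x.pdf")   # create the figure by hand instead of using fig=TRUE
  plot(x)
  dev.off()
  cat("\\includegraphics{fig-x}\n")  # emit the LaTeX only when a plot exists
}
```

If x is missing, the chunk produces no LaTeX output and the document simply has no figure at that point.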
Re: [R] Sweave question
Thank you for all your comments. As a result of my own research I found this method, which seems to do what I want in addition to your suggestions:

tools::texi2dvi("myfile.tex", pdf=TRUE)

Thanks again,
Ralf

On Mon, Nov 15, 2010 at 6:42 AM, Duncan Murdoch wrote:
> On 15/11/2010 6:22 AM, Dieter Menne wrote:
>> Duncan Murdoch-2 wrote:
>>> See SweavePDF() in the patchDVI package on R-forge.
>>
>> In case googling patchDVI only shows a few Japanese pages, and searching for
>> patchDVI on R-Forge gives nothing: try
>>
>> https://r-forge.r-project.org/projects/sweavesearch/
>>
>> (or did I miss something obvious, Duncan?)
>
> No, I just didn't realize that it was hard to find. But you can always
> select R-forge as a repository, and then install.packages() will find it.
>
> Duncan Murdoch
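For reference, the two steps can be chained so the whole pipeline runs from R (this assumes a LaTeX installation is on the PATH; file names are illustrative):

```r
Sweave("myfile.Rnw")                       # .Rnw -> myfile.tex (runs the R chunks)
tools::texi2dvi("myfile.tex", pdf = TRUE)  # myfile.tex -> myfile.pdf (runs LaTeX)
```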
[R] Full path to currently executed script file
I am looking for a way to determine the full file path to the currently executed script. Any ideas?

Ralf
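One common approach, at least for scripts started with Rscript or R CMD BATCH, is to parse the --file= argument from commandArgs(); getScriptPath is a hypothetical helper name:

```r
getScriptPath <- function() {
  args <- commandArgs(trailingOnly = FALSE)
  fileArg <- grep("^--file=", args, value = TRUE)
  if (length(fileArg) == 0) return(NA_character_)  # e.g. interactive session
  normalizePath(sub("^--file=", "", fileArg[1]))
}
```

This does not cover files run via source() from an interactive session; there the path is whatever was passed to source(), which the caller already knows.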
Re: [R] Sweave question
Thank you. The article you cited explains on the last page how this is done: it shows how Sweave is run from within R and says that this creates the .tex file. My remaining question is whether there is a way to process this Sweave .tex output by running LaTeX from R. In other words, what is the command to run latex from within R? Or do I perhaps think too complicated, and there is a single command that creates the .tex and the pdf/ps in one step? In the end, I would like to create everything between the Sweave document and the final pdf/ps output from within R, without the need for external calls.

Ralf

On Sat, Nov 13, 2010 at 4:29 PM, Johannes Huesing wrote:
> Ralf B [Sat, Nov 13, 2010 at 10:03:49PM CET]:
>> It seems that Sweave is supposed to be used from Latex and R is called
>> during the LaTeX compilation process whenever R chunks appear.
>
> This is not how it works.
>
> The first page of
> http://www.statistik.lmu.de/~leisch/Sweave/Sweave-Rnews-2002-3.pdf
> explains that the file is first processed by R before it can be typeset by
> LaTeX.
>
>> What about the other way round? I would like to run it triggered by R. Is
>> this possible?
>
> To my understanding this is how it's done.
>
>> I understand that this does not correspond to the idea
>> of literate programming since it means that there is R code running
>> outside the document,
>
> You lost me here.
>
>> but for my practical approach, I would like to
>> use Sweave more like a report extension at the end of my already
>> existing R scripts that combine a number of diagrams into a pdf file.
>>
>> My second question is: does Sweave create a potential performance
>> bottleneck when used with very big data analyses, compared with
>> using R directly?
>
> Not really, because the only overhead is tangling the Sweave file.
> If it is very big, you may want to process only the parts you have
> changed last. The package weaver seems to come in handy then, see
> http://bioconductor.org/packages/2.6/bioc/vignettes/weaver/inst/doc/weaver_howTo.pdf
> --
> Johannes Hüsing            There is something fascinating about science.
>                            One gets such wholesale returns of conjecture
> mailto:johan...@huesing.name  from such a trifling investment of fact.
> http://derwisch.wikidot.com   (Mark Twain, "Life on the Mississippi")
[R] Sweave question
It seems that Sweave is supposed to be used from LaTeX, with R called during the LaTeX compilation process whenever R chunks appear. What about the other way round? I would like to run it triggered by R. Is this possible? I understand that this does not correspond to the idea of literate programming, since it means that there is R code running outside the document; but for my practical approach I would like to use Sweave more like a report extension at the end of my already existing R scripts, which combine a number of diagrams into a pdf file.

My second question: does Sweave create a potential performance bottleneck when used with very big data analyses, compared with using R directly?

Ralf
Re: [R] Merge postscript files into ps/pdf
Assuming I would go to the trouble of changing the existing R scripts that create the mentioned postscripts/pdfs, how can I achieve that an array of scripts appends to a single ps/pdf? I would want the first script to create the file if it does not yet exist, and all others to append new pages to it. I tried this simple example:

#first.R
pdfFileName <- "C:/testfile.pdf"
pdf(pdfFileName, onefile=TRUE)
plot(c(1,2,3))
abline(v = 2)
dev.off()

#second.R
pdfFileName <- "C:/testfile.pdf"
pdf(pdfFileName, onefile=TRUE)
plot(c(1,2,3))
abline(h = 2)
dev.off()

The second overwrites the first, and I cannot accumulate pages across different scripts. I also cannot do it by opening several pdf devices in the same script, despite them sharing the same file and having 'onefile' set to TRUE. Is it really limited to a single device?

Ralf

On Fri, Nov 12, 2010 at 2:28 AM, Joshua Wiley wrote:
> Hi Ralf,
>
> It is easy to make a bunch of graphs in one file (each on its own
> page), using the onefile = TRUE argument to postscript() or pdf()
> (depending what type of file you want). I usually use Ghostscript for
> tinkering with already created postscript or PDF files. To me there
> is more appropriate software than R to use if you want to
> edit/merge/manipulate postscript or PDF files.
>
> Cheers,
>
> Josh
>
> On Thu, Nov 11, 2010 at 11:07 PM, Ralf B wrote:
>> I created multiple postscript files using ?postscript. How can I merge
>> them into a single postscript file using R? How can I merge them into
>> a single pdf file?
>>
>> Thanks a lot,
>> Ralf
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
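One way around the overwriting, sketched under the assumption that the plotting scripts can be stripped of their own pdf()/dev.off() calls: open a single device in a controlling script and let every sourced script add pages to it (paths are illustrative):

```r
pdf("C:/testfile.pdf", onefile = TRUE)  # one device for everything
source("first.R")    # first.R now only calls plot(...), abline(...), etc.
source("second.R")   # each new plot() starts a new page in the same file
dev.off()
```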
Re: [R] Merge postscript files into ps/pdf
I know such programs; however, for my specific problem I have an R script that creates a report (which I have to create many times), and I would like to append about 100 single-page postscripts at the end as an appendix. The file names are controlled, so it would be easy to detect them; I just miss a useful function/package that allows me to, say, print them to a postscript graphics device.

Ralf

On Fri, Nov 12, 2010 at 11:47 AM, Greg Snow wrote:
> The best approach if creating all the files using R is to change how you
> create the graphs so that they all go to one file to begin with (as mentioned
> by Joshua), but if some of the files are created differently (rgl, external
> programs), then this is not an option.
>
> One external program that is fairly easy to use is pdftk, which will
> concatenate multiple pdf files into 1 (among other things). If you want more
> control of layout then you can use LaTeX, which will read and include ps/pdf.
>
> If you need to use R, then you can read ps files using the grImport package
> and then replot them to a postscript/pdf device with onefile set to TRUE.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ralf B
>> Sent: Friday, November 12, 2010 12:07 AM
>> To: r-help Mailing List
>> Subject: [R] Merge postscript files into ps/pdf
>>
>> I created multiple postscript files using ?postscript. How can I merge
>> them into a single postscript file using R? How can I merge them into
>> a single pdf file?
>>
>> Thanks a lot,
>> Ralf
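A rough sketch of the grImport route Greg describes (it requires Ghostscript to be installed, and the file/directory names here are illustrative):

```r
library(grImport)  # provides PostScriptTrace(), readPicture(), grid.picture()
library(grid)

pdf("report_with_appendix.pdf", onefile = TRUE)
# ... draw the report pages here ...
for (ps in list.files("appendix", pattern = "\\.ps$", full.names = TRUE)) {
  PostScriptTrace(ps, "tmp.xml")  # convert the .ps to grImport's XML format
  pic <- readPicture("tmp.xml")
  grid.newpage()
  grid.picture(pic)               # one appendix page per input file
}
dev.off()
```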
[R] Merge postscript files into ps/pdf
I created multiple postscript files using ?postscript. How can I merge them into a single postscript file using R? How can I merge them into a single pdf file?

Thanks a lot,
Ralf
[R] time question
I have this script, which I use to get an epoch timestamp with an accuracy of 1 second (based on R's apparent inability to produce millisecond-accurate timestamps -- at least I have not seen a straightforward solution :) ):

nowInSeconds <- as.numeric(Sys.time())
nowInMS <- nowInSeconds * 1000
print(nowInSeconds)
print(as.character(nowInMS))

When running this I get the following:

> nowInSeconds <- as.numeric(Sys.time())
> nowInMS <- nowInSeconds * 1000
> print(nowInSeconds)
[1] 1289312002
> print(as.character(nowInMS))
[1] "1289312002093"

I wonder where the 93 milliseconds come from. Is this a random number? A rounding error? Can somebody explain this?

Best,
Ralf
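The 93 ms are neither random nor a rounding error: Sys.time() carries sub-second resolution, and the default printing simply rounds the fraction away, while as.numeric() keeps it. The fraction can be made visible directly:

```r
op <- options(digits.secs = 3)  # show milliseconds when printing times
now <- Sys.time()
print(now)                      # e.g. "2010-11-09 08:33:22.093"
print(as.numeric(now) * 1000)   # the same fractional seconds, as milliseconds
options(op)                     # restore the previous setting
```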
[R] Rserve alternative?
The Rserve documentation at http://rosuda.org/Rserve/doc.shtml#start states that, even when making multiple connections to Rserve, Windows cannot physically separate the workspaces, so connections share an environment; this will obviously cause problems and should therefore not be relied on. Are there any alternatives for the Windows platform?

Ralf
[R] Rserve causes Perl error
Hi all, I tried to run Rserve: I installed it from CRAN using install.packages("Rserve") and tried to run it from the command line using:

R CMD Rserve

I am getting an error telling me that the command perl cannot be found. What is wrong, and what can I do to fix this? Do I need to install any other packages, or is it just a path problem?

Ralf
Re: [R] Create variable by name
Hi all,

I have scripts that each have one variable called 'output': the script creates a data frame and stores it in 'output'; then the data frame is written to disk. This is simple when working with a single script. However, as soon as one script calls another, the variables overwrite each other. Here is a little example that fixes the problem for one particular case by using variables with different names:

#
# Script A, either called directly or from another script
#
outputA <- NULL
getOutput <- function(){
  return(outputA)
}
outputA <- "script A"

#
# Script B, includes and executes script A as part of its own
#
output <- NULL
output <- "Script B"
source("C:/data/poodle/coder/overwritingTest/scriptA.R")
outputA <- getOutput()
print(paste("Output script B:", output))
print(paste("Output script A:", outputA))

However, I simply want to ensure that script A's output does not interfere with the one produced by script B, without script B needing to know what A called its variable; and I want script B to easily access the output of script A. What I would ideally need (I think) is an OO approach. I was thinking that I could perhaps store the output in a generic variable whose name is generated from the script name (i.e. scriptA_output, scriptB_output).

On Wed, Oct 6, 2010 at 1:24 PM, Greg Snow wrote:
> Possible? Yes (as others have shown), advisable? No, see fortune(106),
> fortune(236), and possibly fortune(181).
>
> What is your ultimate goal? Maybe we can help you find a better way.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ralf B
>> Sent: Wednesday, October 06, 2010 10:32 AM
>> To: r-help Mailing List
>> Subject: [R] Create variable by name
>>
>> Can one create a variable through a function by name
>>
>> createVariable <- function(name) {
>>   outputVariable = name
>>   name <- NULL
>> }
>>
>> after calling
>>
>> createVariable("myVar")
>>
>> I would like to have a variable myVar initialized with NULL in my
>> environment. Is this possible?
>>
>> Ralf
[R] Create variable by name
Can one create a variable through a function by name?

createVariable <- function(name) {
  outputVariable = name
  name <- NULL
}

After calling

createVariable("myVar")

I would like to have a variable myVar initialized with NULL in my environment. Is this possible?

Ralf
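assign() does exactly this: it creates a binding from a character name, and get() reads it back. A sketch:

```r
createVariable <- function(name) {
  # create (or overwrite) a variable of the given name in the global
  # environment, initialized with NULL
  assign(name, NULL, envir = .GlobalEnv)
}

createVariable("myVar")
exists("myVar")  # TRUE
```

Whether writing into .GlobalEnv from inside a function is advisable is another matter (see Greg's fortune() references in this thread); passing values around explicitly is usually cleaner.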
[R] Source awareness?
Here the general (perhaps silly) question first: is it possible for a script to find out whether it was sourced by another script or run directly? Here is a small example with two scripts:

# script A
print("This is script A")

# script B
source("C:/scriptA.R")
print("This is script B")

I would like to modify script A so that it only outputs 'This is script A' if it was called directly, but keeps quiet in the other case. In addition to that, is it possible to access the stack of script calls from the environment?

Ralf
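One heuristic, not an official API: code evaluated via source() runs inside the frames of the source() call, whereas a script executed directly runs at top level, where sys.nframe() is 0. So script A could guard its output like this:

```r
# script A
if (sys.nframe() == 0) {
  print("This is script A")  # only when run directly, silent under source()
}
```

For the second question, sys.calls() returns the current call stack; inside a sourced file it shows the chain of source() calls, and at top level it is NULL.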
Re: [R] Inheritance and automatic function call on script exit
Here is the modified script with what I learned from Joshua:

#
# superscript
#
output <- NULL
writeOutput <- function() {
  processTime <- proc.time()
  outputFilename <- paste("C:/myOutput_", processTime[3], ".csv", sep="")
  write.csv(output, file = outputFilename)
}
on.exit(writeOutput, add=TRUE)

#
# subscript
#
source("C:/superscript.R")
output <- data.frame(a=c(1,2,3), b=c(4,5,6))

For some reason the file is not created, so the call does not seem to happen. What am I doing wrong?

Ralf
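There seem to be two separate problems here. First, on.exit(writeOutput, ...) registers the function object itself as the expression; evaluating a bare symbol returns the function without calling it, so it would need to be on.exit(writeOutput()). Second, and more fundamentally, on.exit() attaches to the function call frame it is executed in; at the top level of a script there is no such frame, so nothing is registered for when R exits. One session-level alternative is a finalizer (a sketch; writeOutput is assumed to be defined as above):

```r
# Run writeOutput() when the R session ends: finalizers registered with
# onexit = TRUE fire at the end of the session.
e <- new.env()
reg.finalizer(e, function(e) writeOutput(), onexit = TRUE)
```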
Re: [R] Inheritance and automatic function call on script exit
I think base::on.exit() will do the trick. Thank you :)

Ralf

On Wed, Oct 6, 2010 at 11:24 AM, Ralf B wrote:
>> If you are running these interactively, you could make your own
>> source() function. In that function you could define the super and
>> subscripts, and have it call writeOutput on.exit(). I suspect you
>> could get something like that to work even in batch mode by having R
>> load the function by default and some tweaking of your scripts.
>
> What if I do not control the subscripts but only the superscript? In
> other words, other people keep adding subscripts, and the function of
> my superscript is only to ensure certain standard behaviors.
>
> Ralf
>
>>> Best,
>>> Ralf
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> University of California, Los Angeles
>> http://www.joshuawiley.com/
Re: [R] Inheritance and automatic function call on script exit
> If you are running these interactively, you could make your own
> source() function. In that function you could define the super and
> subscripts, and have it call writeOutput on.exit(). I suspect you
> could get something like that to work even in batch mode by having R
> load the function by default and some tweaking of your scripts.

What if I do not control the subscripts but only the superscript? In other words, other people keep adding subscripts, and the function of my superscript is only to ensure certain standard behaviors.

Ralf

>> Best,
>> Ralf
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
[R] Inheritance and automatic function call on script exit
Hi all, in order to add certain standard functionality to a large set of scripts that I maintain, I developed a superscript that I manually include at the beginning of every script. Here is an example of a very simplified superscript and subscript:

#
# superscript
#
output <- NULL
writeOutput <- function() {
  processTime <- proc.time()
  outputFilename <- paste("C:/myOutput_", processTime[3], ".csv", sep="")
  write.csv(output, file = outputFilename)
}

#
# subscript
#
source("C:/superscript.R")
output <- data.frame(a=c(1,2,3), b=c(4,5,6))
writeOutput()

I would like to a) avoid the need to include the superscript manually -- does R support some kind of script inheritance? -- and b) know whether it is possible to call writeOutput() automatically when a script is exiting.

Best,
Ralf
[R] rimage package problems
Hi all, I tried to install the rimage package in order to get the function ?read.jpeg. However, I get this error, independent of what mirror I choose:

> install.packages("rimage")
--- Please select a CRAN mirror for use in this session ---
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package ‘rimage’ is not available

Does anybody know what happened to the package? Is there an alternative? I simply want to draw a background picture for a plot using the standard graphics package.

Thanks,
Ralf
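rimage appears to have been archived on CRAN. For the stated goal -- a picture as a plot background -- one alternative (a sketch; the file name is illustrative) is the jpeg package together with rasterImage():

```r
library(jpeg)                      # provides readJPEG()
img <- readJPEG("background.jpg")  # returns an array of pixel values
plot(1:10, type = "n")             # set up the coordinate system, draw nothing
lim <- par("usr")                  # plot region extent in user coordinates
rasterImage(img, lim[1], lim[3], lim[2], lim[4])  # fill the region with the image
points(1:10)                       # now draw the data on top
```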
Re: [R] Determine area between two density plots
I wonder what the best way is to access those values. I am using the following code:

x1 <- c(1,2,1,3,5,6,6,7,7,8)
x2 <- c(1,2,1,3,5,6,5,3,8,7)
d1 <- density(x1, na.rm = TRUE)
d2 <- density(x2, na.rm = TRUE)
plot(d1, lwd=3, main="bla")
lines(d2, lty=2, lwd=3)
d1[1]
d1[2]

The last two lines let me access 1000 values, but I don't know if this is the right approach; I also don't know why they come in two columns. Does density have a safer way to get at those values?

Ralf

On Wed, Sep 22, 2010 at 5:25 PM, Peter Alspach wrote:
> Tena koe Ralf
>
> If you save the results of density()
>
> x1Den <- density(x1)
>
> you get the x and y values of the line which is plotted. Similarly for x2 --
> you can then use these to shade the joint area and find the area. Tinkering
> with the arguments of density to make the x values for each the same will
> make this process easier. Let me know if you'd like more details.
>
> HTH
>
> Peter Alspach
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ralf B
>> Sent: Thursday, 23 September 2010 8:55 a.m.
>> To: r-help Mailing List
>> Subject: [R] Determine area between two density plots
>>
>> Hi group,
>>
>> I am creating two density plots as shown in the code below:
>>
>> x1 <- c(1,4,5,3,2,3,4,5,6,5,4,3,2,1,1,1,2,3)
>> x2 <- c(1,4,5,3,5,7,4,5,6,1,1,1,2,1,1,1,2,3)
>> plot(density(x1, na.rm = TRUE))
>> polygon(density(x2, na.rm = TRUE), border="blue")
>>
>> How can I determine the area that is covered between the two plots as
>> a number and how can I grey (or highlight with a pattern) the area
>> that lies between the two lines?
>>
>> Thanks,
>> Ralf
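The two "columns" indexed with d1[1] and d1[2] above are just the first two components of the list that density() returns; the documented way in is by name:

```r
d1 <- density(c(1,2,1,3,5,6,6,7,7,8), na.rm = TRUE)
head(d1$x)    # the grid of points where the density was evaluated
head(d1$y)    # the estimated density at those points
length(d1$x)  # 512 by default (the n argument of density())
```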
[R] Length of vector without NA's
Hi, the following code:

x <- c(1,2,NA)
length(x)

returns 3, counting the numbers as well as the NA. How can I exclude NA's from this count?

Ralf
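Counting the non-missing values directly does it:

```r
x <- c(1, 2, NA)
sum(!is.na(x))      # 2: is.na() flags the NA, ! inverts, sum() counts TRUEs
length(na.omit(x))  # 2: same idea, dropping the NAs first
```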
[R] Plotting densities
Hi group, I am currently plotting two densities using the following code:

x1 <- c(1,2,1,3,5,6,6,7,7,8)
x2 <- c(1,2,1,3,5,6,5,7)
plot(density(x1, na.rm = TRUE))
polygon(density(x2, na.rm = TRUE), border="blue")

However, I would like to avoid drawing the second density with polygon(), as it adds a nasty bottom line. I would also rather have a dashed or dotted line for the second (currently blue) density, again without the bottom part. Any idea how to do that?

Best,
Ralf
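lines() draws only the curve itself, so there is no closing bottom edge, and lty selects a dashed or dotted line:

```r
x1 <- c(1,2,1,3,5,6,6,7,7,8)
x2 <- c(1,2,1,3,5,6,5,7)
plot(density(x1, na.rm = TRUE))
lines(density(x2, na.rm = TRUE), lty = 2, col = "blue")  # dashed, no bottom edge
```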
[R] Determine area between two density plots
Hi group, I am creating two density plots as shown in the code below:

x1 <- c(1,4,5,3,2,3,4,5,6,5,4,3,2,1,1,1,2,3)
x2 <- c(1,4,5,3,5,7,4,5,6,1,1,1,2,1,1,1,2,3)
plot(density(x1, na.rm = TRUE))
polygon(density(x2, na.rm = TRUE), border="blue")

How can I determine, as a number, the area that is covered between the two curves, and how can I grey out (or highlight with a pattern) the area that lies between the two lines?

Thanks,
Ralf
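A sketch of one way to do both: evaluate the two densities on a common grid (the from/to/n arguments of density() make the grids identical), integrate the pointwise minimum with the trapezoidal rule to get the overlap area, and shade the region between the curves with polygon():

```r
x1 <- c(1,4,5,3,2,3,4,5,6,5,4,3,2,1,1,1,2,3)
x2 <- c(1,4,5,3,5,7,4,5,6,1,1,1,2,1,1,1,2,3)
rng <- range(x1, x2)
d1 <- density(x1, from = rng[1], to = rng[2], n = 512)
d2 <- density(x2, from = rng[1], to = rng[2], n = 512)

# overlap area: integral of the pointwise minimum of the two curves
lo <- pmin(d1$y, d2$y)
overlap <- sum(diff(d1$x) * (head(lo, -1) + tail(lo, -1)) / 2)

plot(d1, main = "")
lines(d2, lty = 2)
polygon(c(d1$x, rev(d2$x)), c(d1$y, rev(d2$y)), col = "grey", border = NA)
```

Note that clipping the evaluation range to range(x1, x2) cuts off the bandwidth tails, so the curves (and the overlap number) differ slightly from an unclipped density().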
[R] Multiple Lorenz curves in one diagram
Hi group, I would like to draw multiple Lorenz curves in a single plot using data already prepared. Here is a simple example:

require("lawstat")
lorenz.curve(c(1,2,3), c(4,5,4))
lorenz.curve(c(1,2,3), c(4,2,1))

This example draws two separate graphs. How can I combine them in a distinguishable way? I tried ?polygon without success...

Ralf
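One route, sketched with the ineq package instead of lawstat: its Lc() objects have plot() and lines() methods, so curves can be overlaid on one set of axes. Whether Lc(x, n) reproduces lawstat's weighting exactly should be checked against ?lorenz.curve:

```r
library(ineq)  # provides Lc() and plot/lines methods for Lorenz curves
plot(Lc(c(4,5,4), n = c(1,2,3)))            # first curve, solid
lines(Lc(c(4,2,1), n = c(1,2,3)), lty = 2)  # second curve, dashed, same axes
```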
[R] Extracting bins and frequencies from frequency table
Dear R users, I would like to create a frequency table from raw data and then access the classes/bins and their respective frequencies separately. Here is the code to create the frequency table:

x1 <- c(1,5,1,1,2,2,3,4,5,3,2,3,6,4,3,8)
t1 <- table(x1)
print(t1[1])

It is easy to plot this, but how do I actually access the frequencies alone and the bins alone? Basically I am looking to get:

bins <- c(1, 2, 3, 4, 5, 6, 8)
freq <- c(3, 3, 4, 2, 2, 1, 1)

When running print(t1[1]) I only get one bin/frequency pair; the table seems to be organized that way. Is there a better way? Perhaps 'table' is not the right approach?

Thanks a lot,
Ralf
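The names of the table are the bins (as character strings) and the table's entries are the frequencies, so both can be pulled out in one line each:

```r
x1 <- c(1,5,1,1,2,2,3,4,5,3,2,3,6,4,3,8)
t1 <- table(x1)
bins <- as.numeric(names(t1))  # 1 2 3 4 5 6 8
freq <- as.vector(t1)          # 3 3 4 2 2 1 1 (strips the table's dimnames)
```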
[R] Combined plot: Scatter + density plot
Hi, in order to save space in a publication, it would be nice to have a combined scatter and density plot similar to what is shown on http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=78 . I wonder if anybody has already developed code for this and is willing to share. Here is the reproducible code for the histogram version obtained from the site:

def.par <- par(no.readonly = TRUE) # save defaults, for resetting
x <- pmin(3, pmax(-3, rnorm(50)))
y <- pmin(3, pmax(-3, rnorm(50)))
xhist <- hist(x, breaks=seq(-3,3,0.5), plot=FALSE)
yhist <- hist(y, breaks=seq(-3,3,0.5), plot=FALSE)
top <- max(c(xhist$counts, yhist$counts))
xrange <- c(-3,3)
yrange <- c(-3,3)
nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE)
par(mar=c(3,3,1,1))
plot(x, y, xlim=xrange, ylim=yrange, xlab="", ylab="")
par(mar=c(0,3,1,1))
barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
par(mar=c(3,0,1,1))
barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
par(def.par)

I am basically stuck from line 6 onward, where the bin information from the histograms is used to determine the plotting sizes. Densities are different: they don't have (equal) bins, and their size would need to be determined differently. I wonder if somebody here has created such a diagram already and is willing to share ideas/code/pointers to similar examples. Your effort is highly appreciated.

Thanks a lot,
Ralf
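A sketch of the density version: the same layout() trick, but with the marginal barplots replaced by density curves. The variable top is now the maximum of the two density estimates, and the vertical margin simply swaps x and y:

```r
x <- pmin(3, pmax(-3, rnorm(50)))
y <- pmin(3, pmax(-3, rnorm(50)))
dx <- density(x)
dy <- density(y)
top <- max(dx$y, dy$y)  # common vertical scale for both marginal panels

layout(matrix(c(2,0,1,3), 2, 2, byrow = TRUE), widths = c(3,1), heights = c(1,3))
par(mar = c(3,3,1,1))
plot(x, y, xlim = c(-3,3), ylim = c(-3,3), xlab = "", ylab = "")
par(mar = c(0,3,1,1))   # top margin: density of x, drawn upright
plot(dx$x, dx$y, type = "l", axes = FALSE, xlim = c(-3,3), ylim = c(0, top))
par(mar = c(3,0,1,1))   # right margin: density of y, axes swapped
plot(dy$y, dy$x, type = "l", axes = FALSE, ylim = c(-3,3), xlim = c(0, top))
```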
[R] Baumgartner-Weiss-Schindler test
Hi R group, I am wondering if there is any implementation of the Baumgartner-Weiss-Schindler test in R, as described in http://www.jstor.org/stable/2533862 . It is a non-parametric test that, like KS and others, tests the null hypothesis that two sets of data originate from the same population.

Best,
Ralf
Re: [R] offlist comment Re: KS Test question (2)
Hi David, I would like to apologize for what I wrote earlier. It was late and I was frustrated. Please give me time to adapt to the formal structures of the forum.

Best,
Ralf

On Thu, Aug 5, 2010 at 7:32 AM, David Winsemius wrote:
> On Aug 5, 2010, at 4:10 AM, Ralf B wrote:
>
>> This is unbelievable. Now people like yourself start doing background
>> searches on one and accusing one of not being professional
>
> Your words, not mine.
>
>> plus posting cheeky R code.
>
> It appeared that you were having problems and did not have an efficient
> strategy for searching the archives, so I shared with you code that I
> developed and have put in my .Rprofile setup file. I do not see where that is
> "posting cheeky R code". I saw it as trying to be constructive.
>
>> The reason why I submitted the questions I have
>> submitted was that these answers did not satisfy my particular problem
>> (or perhaps I mistakenly thought so). The point here is that the forum
>> should be a forum where one is allowed to ask questions without
>> first studying the history of the entire forum in fear that
>> someone might have asked it before.
>
> If you read the Posting Guide I think you will find precisely the opposite
> expectation explicitly presented. Using my "cheeky code" would only be part
> of the recommended actions to take before posting if you follow the
> recommendations of the "Do your homework before posting:" section. This list
> was not set up to be a chat room or a tutoring center for general questions
> in statistics.
>
> While you are reading the Posting Guide, please note that it expresses this
> advice regarding posting messages that were sent privately:
>
> "Take care when you quote other people's comments to respect their rights,
> e.g., as summarized here. In particular
>
> * Private messages should never be quoted without permission."
>
>> I was hoping that I could find
>> clearer answers than what I was able to read. I do know how to search
>> in Google. But I am not an expert in statistics, as you already found
>> in your background check. If I were fluent in statistics and R,
>> and if past answers had exactly satisfied my problem, I would
>> not post here and I certainly would not have occupied your expensive
>> attention.
>>
>> On Wed, Aug 4, 2010 at 6:16 PM, David Winsemius wrote:
>>> On Aug 4, 2010, at 5:49 PM, Ralf B wrote:
>>>
>>>> Hi R Users,
>>>>
>>>> I have two vectors, x and y, of equal length, representing two types of
>>>> data from two studies. I would like to test if they are similar enough
>>>> to use them interchangeably. No assumptions about the distributions can
>>>> be made (initial tests clearly show that they are not normal).
>>>> Here is a result:
>>>>
>>>> Two-sample Kolmogorov-Smirnov test
>>>>
>>>> data: x and y
>>>> D = 0.1091, p-value < 2.2e-16
>>>> alternative hypothesis: two-sided
>>>>
>>>> Warning message:
>>>> In ks.test(x[1:nx], y[1:nx], exact = FALSE) :
>>>>   cannot compute correct p-values with ties
>>>>
>>>> Here some questions:
>>>>
>>>> a) What does the warning message mean and what does it imply?
>>>> b) The data is very noisy and the initial result shows that there is
>>>> no relation between x and y. Is there a way to calculate an effect
>>>> size?
>>>> c) When running tests over a large number of different data sets, can
>>>> the p-value be used as a metric for ranking similarity between the x
>>>> and y data sets?
>>>
>>> There has been quite a bit of discussion on this list over the years about
>>> why the KS test is not good in this situation. If I read the results of a
>>> search on your name correctly, you are in a department of Information
>>> Sciences. I would have thought that the first reaction of someone in that
>>> field would be to do a search on a question. Why are you filling up the
>>> archives with questions that have been repeatedly asked and answered?
>>>
>>> Do you need help in this area?
>>>
>>> rhelpSearch <- function(string,
>>>     restrict = c("Rhelp10", "Rhelp08", "Rhelp02", "functions"),
>>>     matchesPerPage = 100, ...)
>>>   RSiteSearch(string = string, restrict = restrict,
>>>               matchesPerPage = matchesPerPage, ...)
>>>
>>> rhelpSearch("KS.test ties p-value")
>>>
>>>> Best
>>>> R.
>
> --
> David Winsemius, MD
> West Hartford, CT
Re: [R] Error: cannot allocate vector of size xxx Mb
Thank you for such a careful and thorough analysis of the problem and your comparison with your configuration. I very much appreciate it. For completeness and (perhaps) further comparison, I have executed 'version' and sessionInfo() as well: > version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status RC major 2 minor 10.0 year 2009 month 10 day 25 svn rev 50206 language R version.string R version 2.10.0 RC (2009-10-25 r50206) > sessionInfo() R version 2.10.0 RC (2009-10-25 r50206) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] splines stats4 grid stats graphics grDevices utils [8] datasets methods base other attached packages: [1] flexmix_2.2-7 multcomp_1.1-7 survival_2.35-8 mvtnorm_0.9-9 [5] modeltools_0.2-16 lattice_0.18-3 car_1.2-16 psych_1.0-88 [9] nortest_1.0 gplots_2.8.0 caTools_1.10 bitops_1.0-4.1 [13] gdata_2.8.0 gtools_2.6.2 ggplot2_0.8.7 digest_0.4.2 [17] reshape_0.8.3 plyr_0.1.9 proto_0.3-8 RJDBC_0.1-5 [21] rJava_0.8-2 DBI_0.2-5 loaded via a namespace (and not attached): [1] tools_2.10.0 > memory.limit() [1] 2047 Also, the example I presented was a simplified reproduction of the real data structure. My real data structure does not have reused vectors. I merely wanted to show the error occurring when processing large vectors into data frames and then binding these data frames together. I hope this additional information helps. I might add that I am running this in StatET under Eclipse with 512 MB of RAM allocated to the environment. Besides adding more memory, can you spot simple ways in which memory use could be improved? I know that I am carrying quite a bit of baggage. Unfortunately my script is rather comprehensive and my example is really just a simplified part that I created to reproduce the problem.
Thanks, Ralf On Thu, Aug 5, 2010 at 4:44 AM, Petr PIKAL wrote: > Hi > > r-help-boun...@r-project.org napsal dne 05.08.2010 09:53:21: > >> I am dealing with very large data frames, artificially created with >> the following code, that are combined using rbind. >> >> >> a <- rnorm(5000000) >> b <- rnorm(5000000) >> c <- rnorm(5000000) >> d <- rnorm(5000000) >> first <- data.frame(one=a, two=b, three=c, four=d) >> second <- data.frame(one=d, two=c, three=b, four=a) > > Up to this point there is no error on my system > >> version > _ > platform i386-pc-mingw32 > arch i386 > os mingw32 > system i386, mingw32 > status Under development (unstable) > major 2 > minor 12.0 > year 2010 > month 05 > day 31 > svn rev 52164 > language R > version.string R version 2.12.0 Under development (unstable) (2010-05-31 > r52164) > >> sessionInfo() > R version 2.12.0 Under development (unstable) (2010-05-31 r52164) > Platform: i386-pc-mingw32/i386 (32-bit) > > attached base packages: > [1] stats grDevices datasets utils graphics methods base > > other attached packages: > [1] lattice_0.18-8 fun_1.0 > > loaded via a namespace (and not attached): > [1] grid_2.12.0 tools_2.12.0 > >> rbind(first, second) > > Although size of first and second is only roughly 160 MB their > concatenation probably consumes all remaining memory space as you already > have a-d, first and second in memory.
> > Regards > Petr > >> >> which results in the following error for each of the statements: >> >> > a <- rnorm(5000000) >> Error: cannot allocate vector of size 38.1 Mb >> > b <- rnorm(5000000) >> Error: cannot allocate vector of size 38.1 Mb >> > c <- rnorm(5000000) >> Error: cannot allocate vector of size 38.1 Mb >> > d <- rnorm(5000000) >> Error: cannot allocate vector of size 38.1 Mb >> > first <- data.frame(one=a, two=b, three=c, four=d) >> Error: cannot allocate vector of size 38.1 Mb >> > second <- data.frame(one=d, two=c, three=b, four=a) >> Error: cannot allocate vector of size 38.1 Mb >> > rbind(first, second) >> >> When running memory.limit() I am getting this: >> >> memory.limit() >> [1] 2047 >> >> Which shows me that I have 2 GB of memory available. What is wrong? >> Shouldn't 38 MB be very feasible? >> >> Best, >> Ralf >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > >
Re: [R] offlist comment Re: KS Test question (2)
This is unbelievable. Now people like yourself start doing background searches on one and accusing one of not being professional plus posting cheeky R code. The reason why I submitted the questions I have submitted was that these answers did not satisfy my particular problem (or perhaps I mistakenly thought so). The point here is that the forum should be a forum where one should be allowed to ask questions without first studying the history of the entire forum in fear that someone might have asked it before. I was hoping that I could find clearer answers than what I was able to read. I do know how to search in Google. But I am not an expert in statistics, as you already found in your background check. If I were fluent in statistics and R and if past answers had exactly satisfied my problem I would not post here and I certainly would not have occupied your expensive attention. On Wed, Aug 4, 2010 at 6:16 PM, David Winsemius wrote: > > On Aug 4, 2010, at 5:49 PM, Ralf B wrote: > >> Hi R Users, >> >> I have two vectors, x and y, of equal length representing two types of >> data from two studies. I would like to test if they are similar enough >> to use them interchangeably. No assumptions about distributions can be >> made (initial tests clearly show that they are not normal). >> Here some result: >> >> Two-sample Kolmogorov-Smirnov test >> >> data: x and y >> D = 0.1091, p-value < 2.2e-16 >> alternative hypothesis: two-sided >> >> Warning message: >> In ks.test(x[1:nx], y[1:nx], exact = FALSE) : >> cannot compute correct p-values with ties >> >> Here some questions: >> >> a) What does the error message means and what does it imply? >> b) The data is very noisy and the initial result shows that there is >> no relation between x and y. Is there a way to calculate and effect >> size? >> c) Can the p-value be used, when running tests over a large amount of >> different data sets, as a metric for ranking similarity between x and >> y data sets?
> > There has been quite a bit of discussion on this list over the years about > why KS test is not good in this situation. If I read the results of a search > on your name correctly, you are in a department of Information Sciences. I > would have thought that the first reaction of someone in that field would be > do do a search on a question. Why are you filling up the archives with > questions that have been repeatedly asked and answered? > > Do you need help in this area? > > rhelpSearch <- function(string, > restrict = c("Rhelp10", "Rhelp08", "Rhelp02", "functions" > ), > matchesPerPage = 100, ...) > RSiteSearch(string=string, restrict = restrict, matchesPerPage = > matchesPerPage, ...) > > > rhelpSearch("KS.test ties p-value") > >> >> Best >> R. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error: cannot allocate vector of size xxx Mb
I am dealing with very large data frames, artificially created with the following code, that are combined using rbind. a <- rnorm(5000000) b <- rnorm(5000000) c <- rnorm(5000000) d <- rnorm(5000000) first <- data.frame(one=a, two=b, three=c, four=d) second <- data.frame(one=d, two=c, three=b, four=a) rbind(first, second) which results in the following error for each of the statements: > a <- rnorm(5000000) Error: cannot allocate vector of size 38.1 Mb > b <- rnorm(5000000) Error: cannot allocate vector of size 38.1 Mb > c <- rnorm(5000000) Error: cannot allocate vector of size 38.1 Mb > d <- rnorm(5000000) Error: cannot allocate vector of size 38.1 Mb > first <- data.frame(one=a, two=b, three=c, four=d) Error: cannot allocate vector of size 38.1 Mb > second <- data.frame(one=d, two=c, three=b, four=a) Error: cannot allocate vector of size 38.1 Mb > rbind(first, second) When running memory.limit() I am getting this: memory.limit() [1] 2047 Which shows me that I have 2 GB of memory available. What is wrong? Shouldn't 38 MB be very feasible? Best, Ralf
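As a side note, 38.1 Mb is exactly the size of a single vector of five million doubles, and the objects involved here add up to well over half a gigabyte, which a 32-bit session with a 2 GB ceiling can easily fail to satisfy with one contiguous allocation. A quick sanity check of the arithmetic (plain base R):

```r
# One double takes 8 bytes, so a vector of 5e6 values needs:
5e6 * 8 / 2^20            # about 38.15 Mb -- the figure in the error message

# The four source vectors, the two 4-column data frames, and the
# rbind() result together come to roughly 20 vector-copies of 38 Mb,
# i.e. ~0.75 GB, before counting any temporary copies R makes.
object.size(rnorm(5e6))   # ~40 MB for one such vector, measured live
```
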
Re: [R] split / lapply over multiple columns
Besides beauty, is there an actual advantage in terms of run-time and/or memory use? Ralf On Wed, Aug 4, 2010 at 3:44 PM, Bert Gunter wrote: > It's not that it's "bad" -- it's just unnecessarily clumsy. Almost > always, tapply/by will do the same thing more simply. > > -- Bert > > On Wed, Aug 4, 2010 at 10:10 AM, Ralf B wrote: >>> In general, the lapply(split(...)) construction should never be used. >> >> Why? What makes it so bad to use? >>
[R] Output (graphics and table/text)
Hi R Users, I need to produce a simple report consisting of some graphs and a statistic. Here is a simplification of it: # graphics output test a <- c(1,3,2,1,4) b <- c(2,1,1,1,2) c <- c(4,7,2,4,5) d <- rnorm(500) e <- rnorm(600) op <- par(mfrow=c(3,2)) pie(a) pie(b) pie(c) text(ks.test(d,e)) Obviously, the ks.test result does not make it to the output. How can this be achieved so that a) the text is simply dumped into the fourth quadrant, with coordinates relative to that quadrant, and b) the output is actually presented as a little table without the need to use a LaTeX solution? Thanks a lot, Ralf
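One way to get the test result into a plot panel is sketched below. It assumes the gplots package (used elsewhere in this thread) is available: its textplot() renders a character vector into the current plotting region, and capture.output() turns the printed htest object into text.

```r
library(gplots)                      # for textplot()

a <- c(1, 3, 2, 1, 4)
b <- c(2, 1, 1, 1, 2)
c2 <- c(4, 7, 2, 4, 5)               # renamed to avoid masking base::c
d <- rnorm(500)
e <- rnorm(600)

op <- par(mfrow = c(2, 2))           # 2x2 grid: three pies plus one text panel
pie(a); pie(b); pie(c2)
# render the printed ks.test() output as text in the fourth quadrant
textplot(capture.output(ks.test(d, e)), valign = "top")
par(op)
```

The same capture.output() trick works with any print method, so a small summary table could be dropped into a panel the same way.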
[R] KS Test question (2)
Hi R Users, I have two vectors, x and y, of equal length representing two types of data from two studies. I would like to test if they are similar enough to use them interchangeably. No assumptions about distributions can be made (initial tests clearly show that they are not normal). Here is the result: Two-sample Kolmogorov-Smirnov test data: x and y D = 0.1091, p-value < 2.2e-16 alternative hypothesis: two-sided Warning message: In ks.test(x[1:nx], y[1:nx], exact = FALSE) : cannot compute correct p-values with ties Here are some questions: a) What does the error message mean and what does it imply? b) The data is very noisy and the initial result shows that there is no relation between x and y. Is there a way to calculate an effect size? c) Can the p-value be used, when running tests over a large amount of different data sets, as a metric for ranking similarity between x and y data sets? Best R.
[R] KS Test questions
1) When running ks.test, I am getting the following warning after the test presents its result: 'ks.test(x, y) : cannot compute correct p-values with ties' I wonder what it means and what causes it. 2) Also, how do I calculate an effect size from this statistic? R. __ R-help@r-project.org mailing list
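For context, the warning arises because the KS test assumes continuous distributions, under which tied values occur with probability zero; with ties the reported p-value is only approximate. A small sketch showing how rounding induces the warning, and how jitter() (a common if ad hoc workaround) makes it go away:

```r
set.seed(42)
x <- round(rnorm(200), 1)   # rounding to one decimal creates tied values
y <- round(rnorm(200), 1)

ks.test(x, y)               # warns: cannot compute correct p-values with ties

# Adding tiny random noise breaks the ties; the p-value is then still
# an approximation, but the warning disappears:
ks.test(jitter(x), jitter(y))
```
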
Re: [R] split / lapply over multiple columns
> In general, the lapply(split(...)) construction should never be used. Why? What makes it so bad to use?
[R] Kullback–Leibler divergence question (flexmix::KLdiv) Urgent!
Hi all, x <- cbind(rnorm(500),rnorm(500)) KLdiv(x, eps=1e-4) KLdiv(x, eps=1e-5) KLdiv(x, eps=1e-6) KLdiv(x, eps=1e-7) KLdiv(x, eps=1e-8) KLdiv(x, eps=1e-9) KLdiv(x, eps=1e-10) ... KLdiv(x, eps=1e-100) ... KLdiv(x, eps=1e-1000) When calling flexmix::KLdiv with the given code, I get results of increasing value the smaller I pick the accuracy parameter 'eps', until finally reaching infinity. If I pick the number too low, I get NA as a result. What is the best value for eps and how should one deal with this? Should I simply pick a value that returns a result and then keep the accuracy value constant at this level for all my analyses in order to get comparable results? Ralf
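This pattern is what one would expect if eps acts as a floor on the estimated densities: the smaller the floor, the larger log(p/q) can become wherever one of the estimates is near zero. A toy illustration of the flooring idea on two hand-made discrete distributions (this is not the flexmix internals, just the mechanism):

```r
# Discrete KL divergence with an eps floor on both distributions.
kl <- function(p, q, eps) {
  p <- pmax(p, eps)
  q <- pmax(q, eps)
  sum(p * log(p / q))
}

p <- c(0.4, 0.4, 0.2)
q <- c(0.5, 0.5, 0.0)    # q has a true zero where p has mass

kl(p, q, eps = 1e-4)     # moderate value
kl(p, q, eps = 1e-12)    # larger: the floored zero inflates log(p/q)
# As eps -> 0 the term p*log(p/eps) grows without bound, mirroring the
# divergence -> Inf behaviour observed above.
```
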
Re: [R] Adding collumn to existing data frame
Data frame is given by the rest of the script and not really an option. Other than that, you are absolutely right. Ralf On Tue, Aug 3, 2010 at 11:34 PM, Dennis Murphy wrote: > Wouldn't a list be a better object type if the variables you want to add > have variable lengths? This way you don't have to worry about nuisances such > as NA padding. Just a thought... > > Dennis > > On Tue, Aug 3, 2010 at 7:54 PM, Ralf B wrote: >> >> Actually it does -- one has to use feed the result back into the >> original variable: >> >> add.col <- function(df, vec, namevec){ >> if (nrow(df) < length(vec) ){ df <- # pads rows if needed >> rbind(df, matrix(NA, length(vec)-nrow(df), ncol(df), >> dimnames=list( NULL, names(df) ) ) ) >> } >> length(vec) <- nrow(df) # pads with NA's >> df[, namevec] <- vec; # names new col properly >> return(df) >> } >> >> mydata <- NULL >> mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6), taskid = c(1, 1, 2, 2, >> 3, 3), >> stuff = 11:16) >> mydata <- add.col(mydata, c(1,2,3,4),"test1") >> mydata <- add.col(mydata, c(1,2,3,4,5,6,7,8),"test2") >> mydata >> >> >> Thanks a lot, David and all others here you made the effort! >> Ralf >> >> >> On Tue, Aug 3, 2010 at 10:37 PM, David Winsemius >> wrote: >> > >> > On Aug 3, 2010, at 10:35 PM, David Winsemius wrote: >> > >> >> >> >> On Aug 3, 2010, at 8:32 PM, Ralf B wrote: >> >> >> >>> Hi experts, >> >>> >> >>> I am trying to write a very flexible method that allows me to add a >> >>> new column to an existing data frame. 
This is what I have so far: >> >>> >> >>> add.column <- function(df, new.col, name) { >> >>> n.row <- dim(df)[1] >> >>> length(new.col) <- n.row >> >>> names(new.col) <- name >> >>> return(cbind(df, new.col)) >> >>> } >> >>> >> >>> df <- NULL >> >>> df <- data.frame(a=c(1,2,3)) >> >>> df >> >>> # corect: added NA to new collumn >> >>> df <- add.column(df,c(1,2),'myNewColumn2') >> >>> df >> >>> # problem: not added, data frame should be extended with NAs >> >>> add.column(df,c(1,2,3,4),'myNewColumn3') >> >>> df >> >>> >> >>> >> >>> However, there are two problems: >> >>> >> >>> 1) The column name is not renamed accurately but always set to >> >>> 'new.col' . Surely this could be done outside the function, but it >> >>> would be better if its self contained. >> >> >> >> Try this: >> >> >> >> add.col <- function(df, vec, namevec){ >> >> length(vec) <- nrow(df) # pads with NA's >> >> cbind(df, namevec=vec)} # names new col >> >> properly >> >> >> > Actually it doesn't name column correctky... see below for a method >> > with "[ >> > <-" . >> > >> >>> 2) It does not work for cases where new.col is longer than the length >> >>> of the data frame. In such cases, I would like to add NA's to the data >> >>> frame if it has less rows. >> >> >> >> Don't have a compact answer to this. (Tried re-dimensioning with "dim() >> >> <-" but it was not accepted by the interpreter. Would need to add a >> >> test >> >> at the beginning and then pad with rows of NA's using rbind before >> >> cbinding >> >> as above. >> >> >> >> add.col <- function(df, vec, namevec){ >> >> if (nrow(df) < length(vec) ){ df <- # pads rows if needed >> >> rbind(df, matrix(NA, length(vec)-nrow(df), ncol(df), >> >> dimnames=list( NULL, names(df) ) ) >> >> ) } >> >> length(vec) <- nrow(df) # pads with NA's >> >> df[, namevec] <- vec; # names new col properly >> >> return(df)} >> >> >> >>> >> >>> Any ideas to to solve this? >> >> >> >> Has not been tested with columns of varying types. 
>> >> >> > >> > David Winsemius, MD >> > West Hartford, CT >> > >> > >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding collumn to existing data frame
Actually it does -- one has to feed the result back into the original variable: add.col <- function(df, vec, namevec){ if (nrow(df) < length(vec) ){ df <- # pads rows if needed rbind(df, matrix(NA, length(vec)-nrow(df), ncol(df), dimnames=list( NULL, names(df) ) ) ) } length(vec) <- nrow(df) # pads with NA's df[, namevec] <- vec; # names new col properly return(df) } mydata <- NULL mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6), taskid = c(1, 1, 2, 2, 3, 3), stuff = 11:16) mydata <- add.col(mydata, c(1,2,3,4),"test1") mydata <- add.col(mydata, c(1,2,3,4,5,6,7,8),"test2") mydata Thanks a lot, David and all others here who made the effort! Ralf On Tue, Aug 3, 2010 at 10:37 PM, David Winsemius wrote: > > On Aug 3, 2010, at 10:35 PM, David Winsemius wrote: > >> >> On Aug 3, 2010, at 8:32 PM, Ralf B wrote: >> >>> Hi experts, >>> >>> I am trying to write a very flexible method that allows me to add a >>> new column to an existing data frame. This is what I have so far: >>> >>> add.column <- function(df, new.col, name) { >>> n.row <- dim(df)[1] >>> length(new.col) <- n.row >>> names(new.col) <- name >>> return(cbind(df, new.col)) >>> } >>> >>> df <- NULL >>> df <- data.frame(a=c(1,2,3)) >>> df >>> # corect: added NA to new collumn >>> df <- add.column(df,c(1,2),'myNewColumn2') >>> df >>> # problem: not added, data frame should be extended with NAs >>> add.column(df,c(1,2,3,4),'myNewColumn3') >>> df >>> >>> >>> However, there are two problems: >>> >>> 1) The column name is not renamed accurately but always set to >>> 'new.col' . Surely this could be done outside the function, but it >>> would be better if its self contained. >> >> Try this: >> >> add.col <- function(df, vec, namevec){ >> length(vec) <- nrow(df) # pads with NA's >> cbind(df, namevec=vec)} # names new col properly >> > Actually it doesn't name column correctly... see below for a method with "[ > <-" . > >>> 2) It does not work for cases where new.col is longer than the length
In such cases, I would like to add NA's to the data >>> frame if it has less rows. >> >> Don't have a compact answer to this. (Tried re-dimensioning with "dim() >> <-" but it was not accepted by the interpreter. Would need to add a test >> at the beginning and then pad with rows of NA's using rbind before cbinding >> as above. >> >> add.col <- function(df, vec, namevec){ >> if (nrow(df) < length(vec) ){ df <- # pads rows if needed >> rbind(df, matrix(NA, length(vec)-nrow(df), ncol(df), >> dimnames=list( NULL, names(df) ) ) ) } >> length(vec) <- nrow(df) # pads with NA's >> df[, namevec] <- vec; # names new col properly >> return(df)} >> >>> >>> Any ideas to to solve this? >> >> Has not been tested with columns of varying types. >> > > David Winsemius, MD > West Hartford, CT > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] split / lapply over multiple columns
Hi all, I have a data frame with columns over which I would like to run repeated functions for data analysis. Currently I am running recursively over only two columns, where column 1 has two states over which I split and column 2 has 3 states. The function therefore runs 2 x 3 = 6 times, as shown when running the following code: mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6), taskid = c(1, 1, 2, 2, 3, 3), stuff = 11:16) mydata mydata <- mydata[with(mydata, order(userid, taskid)), ] mydata lapply(split(mydata, mydata[,1]), function(x){ lapply(split(x, x[,2]), function(y){ print(paste("result:",y)) }) }) This traverses the tree like this: 5,1 5,2 5,3 6,1 6,2 6,3 Is there an easier way of doing that? I would like to provide the two columns (index 1 and index 2) directly and have the lapply function apply its lambda function directly to each member of the tree automatically. How can I do that? Best, Ralf
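For the record, both split() and by() accept several grouping columns at once, so the nesting can be flattened into a single call. A sketch on the same data:

```r
mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6),
                     taskid = c(1, 1, 2, 2, 3, 3),
                     stuff  = 11:16)

# split() on both columns at once: one lapply instead of two nested ones.
# Group names come out as "userid.taskid", e.g. "5.1", "6.1", ...
lapply(split(mydata, mydata[, c("userid", "taskid")]),
       function(y) paste("result:", y$userid[1], y$taskid[1]))

# by() performs the split-and-apply in a single step:
by(mydata, mydata[, c("userid", "taskid")],
   function(y) paste("result:", y$userid[1], y$taskid[1]))
```
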
[R] Adding collumn to existing data frame
Hi experts, I am trying to write a very flexible method that allows me to add a new column to an existing data frame. This is what I have so far: add.column <- function(df, new.col, name) { n.row <- dim(df)[1] length(new.col) <- n.row names(new.col) <- name return(cbind(df, new.col)) } df <- NULL df <- data.frame(a=c(1,2,3)) df # correct: added NA to new column df <- add.column(df,c(1,2),'myNewColumn2') df # problem: not added, data frame should be extended with NAs add.column(df,c(1,2,3,4),'myNewColumn3') df However, there are two problems: 1) The column name is not set accurately but always becomes 'new.col'. Surely this could be done outside the function, but it would be better if it were self-contained. 2) It does not work for cases where new.col is longer than the length of the data frame. In such cases, I would like to add NA's to the data frame if it has fewer rows. Any ideas how to solve this? Ralf
[R] Flip axis on hist2d plot
I am plotting a heatmap using the hist2d function: require("gplots") x <- rnorm(2000) y <- rnorm(2000) hist2d(x, y, freq=TRUE, nbins=50, col = c("white",heat.colors(256))) However, I would like to flip the vertical y axis so that the upper left corner serves as the y-origin. How can I do that? Ralf
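One possibility, assuming hist2d() forwards extra graphical arguments to image() (which honours a reversed ylim): pass the y-range in decreasing order, so the axis increases downwards and the origin moves to the top-left corner.

```r
require("gplots")
x <- rnorm(2000)
y <- rnorm(2000)

# Reversing ylim flips the vertical axis of the underlying image() call
hist2d(x, y, nbins = 50, col = c("white", heat.colors(256)),
       ylim = rev(range(y)))
```
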
[R] Unique rows in data frame (with condition)
I have to deal with data frames that contain multiple entries of the same item (based on an identifying column 'id'). The second column mostly corresponds to the id column, which means that double entries can be eliminated with ?unique. a <- unique(data.frame(timestamp=c(3,3,3,5,8), mylabel=c("a","a","a","b","c"))) However, sometimes I have dataframes like this: a <- unique(data.frame(timestamp=c(3,3,3,5,8), mylabel=c("a","z","a","b","c"))) which then results in: timestamp mylabel 1 3 a 2 3 z 4 5 b 5 8 c However, I want only the first occurrence of each timestamp, selected over the first label, resulting in an output like this: timestamp mylabel 1 3 a 4 5 b 5 8 c Is there something like GROUP BY (as in SQL)? Best, Ralf
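A base-R sketch of the "first row per group" pattern: duplicated() marks second and later occurrences of each timestamp, so negating it keeps exactly the first row, which matches the desired output above; aggregate() gives the same result in a GROUP BY style.

```r
a <- data.frame(timestamp = c(3, 3, 3, 5, 8),
                mylabel   = c("a", "z", "a", "b", "c"))

# Keep only the first occurrence of every timestamp value
a[!duplicated(a$timestamp), ]

# Equivalent GROUP BY-style formulation: first label per timestamp
aggregate(mylabel ~ timestamp, data = a, FUN = function(x) x[1])
```
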
[R] Linear Interpolation question
Hi R experts, I have the following timeseries data: #example data structure a <- c(NA,1,NA,5,NA,NA,NA,10,NA,NA) c <- c(1:10) df <- data.frame(timestamp=a, sequence=c) print(df) where I would like to linearly interpolate between the points 1, 5, and 10 in 'timestamp'. Original timestamps should not be modified. Here is the code I use to run the interpolation (so far): # linear interpolation print(c) results <- approx(df$sequence, df$timestamp, n=NROW(df)) print(results) df$timestamp <- results$y # plotting plot(c, a, main = "Linear Interpolation with approx") points(results, col = 2, pch = "*") # new dataframe print(df) When looking at the result dataframe, however, I can see that the original timestamps have been shifted as well. What I would like to have is a result where the timestamps at positions 2, 4 and 8 remain unchanged at the values 1, 5, and 10. I also would like values before the first item to be constant. So the dataframe should look like this:

   timestamp sequence
1       1.00        1
2       1.00        2
3       3.00        3
4       5.00        4
5       6.25        5
6       7.50        6
7       8.75        7
8      10.00        8
9      10.00        9
10     10.00       10

How do I have to change the syntax of my script to make that work? Ralf
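A sketch of how approx() can be told to evaluate at exactly the original sequence positions: pass only the known (non-NA) points as x/y, the full sequence as xout, and rule = 2 so the boundary values are carried outwards as constants.

```r
a <- c(NA, 1, NA, 5, NA, NA, NA, 10, NA, NA)
s <- 1:10
ok <- !is.na(a)

# Interpolate between the known points only, evaluated at every
# original position; rule = 2 extends the edge values as constants.
res <- approx(s[ok], a[ok], xout = s, rule = 2)
data.frame(timestamp = res$y, sequence = s)
# rows 2, 4 and 8 keep their original values 1, 5 and 10 exactly
```
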
[R] KLdiv question
I have a data set that causes flexmix::KLdiv to produce NA as a result, and I was told that increasing the sensitivity of the 'esp' value can be used to avoid a lot of values being set to a default (which presumably causes the problem). Now here is my question. When running KLdiv on a normal distribution: a <- rnorm(5) b <- rnorm(5) mydata <- cbind(a,b) KLdiv(mydata, esp=1e-4) KLdiv(mydata, esp=1e-5) KLdiv(mydata, esp=1e-6) KLdiv(mydata, esp=1e-7) KLdiv(mydata, esp=1e-8) KLdiv(mydata, esp=1e-9) KLdiv(mydata, esp=1e-10) KLdiv(mydata, esp=1e-100) the result is stable independent of the chosen esp accuracy. However, when I run it on a distribution such as values in a given range, I get NA, and the method seems not to work regardless of how high I choose the accuracy. y1 <- sample(1:1280, 20, replace=T) y2 <- sample(1:1280, 20, replace=T) mydata2 <- cbind(y1,y2) KLdiv(mydata2, esp=1e-4) KLdiv(mydata2, esp=1e-5) KLdiv(mydata2, esp=1e-6) KLdiv(mydata2, esp=1e-7) KLdiv(mydata2, esp=1e-8) KLdiv(mydata2, esp=1e-9) KLdiv(mydata2, esp=1e-10) KLdiv(mydata2, esp=1e-100) Am I doing something wrong here? Does KL have any distributional assumptions that I am violating? Best, Ralf
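One thing worth checking before tuning anything: the flexmix argument is spelled eps, not esp. A misspelled argument would typically be absorbed by the generic's ... and the default used throughout, which would also explain why the results appear insensitive to the value being passed. A corrected call (sketch):

```r
library(flexmix)

a <- rnorm(5)
b <- rnorm(5)
mydata <- cbind(a, b)

KLdiv(mydata, eps = 1e-8)   # note: 'eps', not 'esp'
```
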
Re: [R] Reset R environment through R command
With environment I actually meant workspace. On Thu, Jul 29, 2010 at 1:22 PM, Ralf B wrote: > Is it possible to remove all variables in the current environment > through an R command? > > Here is what I want: > > x <- 5 > y <- 10:20 > reset() > print(x) > print(y) > > Output should be NULL for x and y, and not 5 and 10:20. > > Can one do that in R? > > Best, > Ralf >
[R] Reset R environment through R command
Is it possible to remove all variables in the current environment through an R command? Here is what I want: x <- 5 y <- 10:20 reset() print(x) print(y) Output should be NULL for x and y, and not 5 and 10:20. Can one do that in R? Best, Ralf
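For reference, a minimal sketch of such a reset in base R. Note that after the reset, print(x) raises an "object not found" error rather than printing NULL, because the binding is removed entirely:

```r
x <- 5
y <- 10:20

# Remove every object from the global environment (the workspace)
rm(list = ls())

exists("x")   # FALSE
exists("y")   # FALSE
```
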
[R] Spearman's Correlation Coefficient to compare distributions?
Hi, I have distributions from two different data sets and I would like to measure how similar their distributions (in terms of their bin frequencies) are. In other words, I am not interested in the exact sequence of data points but rather in their distributional properties and in their similarities. Spearman's correlation coefficient is used to compare data without the assumption of normality. I wonder if this measure can also be used to compare distributional data rather than the data points that are summarized in a distribution. Here is the example code that exemplifies what I would like to check: aNorm <- rnorm(100) bNorm <- rnorm(100) cUni <- runif(100) ha <- hist(aNorm) hb <- hist(bNorm) hc <- hist(cUni) print(ha$counts) print(hb$counts) print(hc$counts) # relatively similar n <- min(c(NROW(ha$counts),NROW(hb$counts))) cor.test(ha$counts[1:n], hb$counts[1:n], method="spearman") # quite different n <- min(c(NROW(ha$counts),NROW(hc$counts))) cor.test(ha$counts[1:n], hc$counts[1:n], method="spearman") Does this make sense or am I violating some assumptions of the coefficient? Thanks, R.
[R] Statistical mailing list
I am looking for a mailing list for general statistical questions that are not R related. Do you have any suggestions for lists that are busy and helpful and/or lists that you use and recommend? Thanks in advance, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about KLdiv and large datasets
Is the 'eps' argument part of KLdiv (I was not able to find it in the help pages) or part of a more general environment (such as the graphics parameters in 'par')? I am asking so that I can read up on what it actually does and resolve the question you already raised about its reliability... Ralf On Fri, Jul 16, 2010 at 10:41 AM, Peter Ehlers wrote: > On 2010-07-16 7:56, Ralf B wrote: >> >> Hi all, >> >> when running KL on a small data set, everything is fine: >> >> require("flexmix") >> n <- 20 >> a <- rnorm(n) >> b <- rnorm(n) >> mydata <- cbind(a,b) >> KLdiv(mydata) >> >> however, when this dataset increases >> >> require("flexmix") >> n <- 1000 >> a <- rnorm(n) >> b <- rnorm(n) >> mydata <- cbind(a,b) >> KLdiv(mydata) >> >> >> KL seems to be not defined. Can somebody explain what is going on? >> >> Thanks, >> Ralf > > Ralf, > > You can adjust the 'eps=' argument. But I don't know > what this will do to the reliability of the results. > > KLdiv(mydata, eps = 1e-7) > > -Peter Ehlers > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Question about KLdiv and large datasets
Hi all, when running KL on a small data set, everything is fine: require("flexmix") n <- 20 a <- rnorm(n) b <- rnorm(n) mydata <- cbind(a,b) KLdiv(mydata) however, when this dataset increases require("flexmix") n <- 1000 a <- rnorm(n) b <- rnorm(n) mydata <- cbind(a,b) KLdiv(mydata) KL seems to be not defined. Can somebody explain what is going on? Thanks, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to transform: 4 columns into two columns stacked
I have the following data structure: n=5 mydata <- data.frame(id=1:n, x=rnorm(n), y=rnorm(n), id=1:n, x=rnorm(n), y=rnorm(n)) print(mydata) producing the following representation:

  id          x           y id.1       x.1        y.1
1  1  0.5326855 -2.07633703    1 0.7930274 -1.0530558
2  2  0.7888909  0.63354693    2 0.5908323 -1.3543282
3  3  0.5350803 -0.20108931    3 2.5079242 -0.4657274
4  4 -1.3041960 -0.25195129    4 1.6294046 -1.4094830
5  5  0.3109767 -0.02305981    5 0.5183756  1.3084776

however I need to transform this data into this form:

   id          x           y
1   1  0.5326855 -2.07633703
2   2  0.7888909  0.63354693
3   3  0.5350803 -0.20108931
4   4 -1.3041960 -0.25195129
5   5  0.3109767 -0.02305981
6   1  0.7930274 -1.0530558
7   2  0.5908323 -1.3543282
8   3  2.5079242 -0.4657274
9   4  1.6294046 -1.4094830
10  5  0.5183756  1.3084776

what is the simplest way to do that? Thanks a lot in advance! Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
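Assuming the two id/x/y triples should simply be stacked on top of each other, one way is to split the frame in half, restore the column names on the second half, and rbind():

```r
n <- 5
mydata <- data.frame(id = 1:n, x = rnorm(n), y = rnorm(n),
                     id = 1:n, x = rnorm(n), y = rnorm(n))
# data.frame() has auto-renamed the duplicated columns to id.1, x.1, y.1

left  <- mydata[, 1:3]
right <- setNames(mydata[, 4:6], names(left))  # restore the names id, x, y
stacked <- rbind(left, right)                  # 10 rows, 3 columns
```

rbind() on data frames matches columns by name, which is why the right half must get the original names back first.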
[R] KLdiv question (data.frame)
Hi all, I wonder why KLdiv does not work with data.frames: n <- 50 mydata <- data.frame( sequence=c(1:n), data1=c(rnorm(n)), data2=c(rnorm(n)) ) # does NOT work KLdiv(mydata) # works fine dataOnly <- cbind(mydata$data1, mydata$data2) KLdiv(dataOnly) Any ideas? Is there a better implementation that can deal with data.frames, or is there a simpler way of converting? Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
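A hedged sketch of the simpler conversion, assuming the flexmix package is installed: its KLdiv() method dispatches on numeric matrices, so as.matrix() on just the data columns is enough (the column selection below is an assumption about which columns matter):

```r
library(flexmix)

n <- 50
mydata <- data.frame(sequence = 1:n, data1 = rnorm(n), data2 = rnorm(n))

# KLdiv() works on a numeric matrix; coerce the relevant columns first
m <- as.matrix(mydata[, c("data1", "data2")])
KLdiv(m)
```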
[R] Repeated analysis over groups / Splitting by group variable
I am performing some analysis over a large data frame and would like to conduct repeated analysis over grouped-up subsets. How can I do that? Here is some example code for clarification: require("flexmix") # for Kullback-Leibler divergence n <- 23 groups <- c(1,2,3) mydata <- data.frame( sequence=c(1:n), data1=c(rnorm(n)), data2=c(rnorm(n)), group=rep(sample(groups, n, replace=TRUE)) ) # Part 1: full stats (works fine) dataOnly <- cbind(mydata$data1, mydata$data2, mydata$group) KLdiv(dataOnly) # # Part 2: again - but once for each group (error) # by(dataOnly, groups, KLdiv(dataOnly)) The error I am getting is: Error in tapply(1L:23L, list(INDICES = c(1, 2, 3)), function (x) : arguments must have same length Are there better ways than 'by'? I would like to use different stats and functions, and therefore I am looking for a splitter whose output I can hand to any statistical function I want. Any ideas? Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
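A hedged sketch of the split/lapply route, assuming flexmix is installed. The error above comes from passing the vector of group *levels* (length 3) as INDICES; the grouping argument must be the full-length per-row group column:

```r
library(flexmix)

set.seed(1)
n <- 23
mydata <- data.frame(sequence = 1:n, data1 = rnorm(n), data2 = rnorm(n),
                     group = sample(1:3, n, replace = TRUE))

# split by the per-row group labels (length n), then apply any statistic
bygroup <- split(mydata[, c("data1", "data2")], mydata$group)
lapply(bygroup, function(d) KLdiv(as.matrix(d)))
```

Because split() returns a plain list of data frames, the same bygroup object can be handed to lapply() with any other statistical function, which addresses the "generic splitter" requirement.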
[R] Merging columns along time line
I am resending this, as I believe it has not arrived on the mailing list when I first emailed it. I have a set of labels arranged along a timeframe in a. Each label has a timestamp and marks a state until the next label. The dataframe a contains 3 such timestamps and 3 associated labels. This means, on a continuous scale between 1 and 10, there are 3 markers. E.g. 'abc' marks the timestamps between 3 and 4, 'def' marks the timestamps between 5 and 7, and so on. a <- data.frame(timestamp=c(3,5,8), mylabel=c("abc","def","ghi")) b <- data.frame(timestamp=c(1:10)) I would like to assign these labels as an extra column 'label' to the data.frame b, which currently consists only of the timestamp. The output would then look like this:

   timestamp label
1          1    NA
2          2    NA
3          3 "abc"
4          4 "abc"
5          5 "def"
6          6 "def"
7          7 "def"
8          8 "ghi"
9          9 "ghi"
10        10 "ghi"

What is the simplest way to assign these labels based on timestamps to get this output? The real dataset is several millions of rows... Thanks, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
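A sketch using findInterval(), which is vectorised and should scale to millions of rows. findInterval() returns 0 for timestamps before the first marker, so the lookup vector is shifted by one to map that case to NA:

```r
a <- data.frame(timestamp = c(3, 5, 8),
                mylabel = c("abc", "def", "ghi"),
                stringsAsFactors = FALSE)
b <- data.frame(timestamp = 1:10)

idx <- findInterval(b$timestamp, a$timestamp)  # 0 before the first marker
b$label <- c(NA, a$mylabel)[idx + 1]           # shift so index 0 maps to NA
b
```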
Re: [R] StartsWith over vector of Strings?
When running the combined code with your suggested line: content <- data.frame(urls=c( "http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3VU8TJqcMJHuzASm9qyBBgAAAKoEBU_QsmVh", "http://search.yahoo.com/search;_ylt=Atvki9MVpnxuEcPmXLEWgMqbvZx4?p=stuff&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-701") ) searchset <- data.frame(signatures=c("http://www.google.com/search")) content[na.omit(pmatch(searchset, content$urls))] print(content) I am getting both URLs as results, but in fact, would expect only the first URL. Am I overlooking something? Ralf On Tue, Jul 13, 2010 at 12:03 PM, Greg Snow wrote: > content[na.omit(pmatch(searchset, content,,TRUE))] > > -- > Gregory (Greg) L. Snow Ph.D. > Statistical Data Center > Intermountain Healthcare > greg.s...@imail.org > 801.408.8111 > > >> -----Original Message- >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- >> project.org] On Behalf Of Ralf B >> Sent: Tuesday, July 13, 2010 5:47 AM >> To: r-help@r-project.org >> Subject: [R] StartsWith over vector of Strings? >> >> Given vectors of strings of arbitrary length >> >> content <- c("abc", "def") >> searchset <- c("a", "abc", "abcdef", "d", "def", "defghi") >> >> Is it possible to determine the content String set that matches the >> searchset in the sense of 'startswith' ? This would be a vector of all >> strings in content that start with the string of any of the strings in >> the searchset. In the little example here, this would be: >> >> result <- c("abc", "abc", "def", "def") >> >> Best, >> Ralf >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting- >> guide.html >> and provide commented, minimal, self-contained, reproducible code. 
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Substring function?
Hi all, I would like to detect all strings in the vector 'content' that contain the strings from the vector 'search'. Here is a code example: content <- data.frame(urls=c( "http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3", "http://search.yahoo.com/search;_ylt=Atvki9MVpnxuEcPmXLEWgMqbvZx4?p=stuff&toggle=1") ) search <- data.frame(signatures=c("http://www.google.com/search")) subset(content, search$signatures %in% content$urls) I am getting an empty result: [1] urls <0 rows> (or 0-length row.names) What I would like to achieve is the return of "http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3". Is that possible? In practice I would like to run this over 1000s of strings in 'content' and 100s of strings in 'search'. Could I run into performance issues with this approach and, if so, are there better ways? Best, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
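For literal substring detection, grepl() with fixed = TRUE sidesteps both the %in% mismatch above (%in% tests whole-string equality, not containment) and regex surprises from characters like ? and . in URLs. A hedged sketch with simplified stand-in URLs:

```r
content <- c("http://www.google.com/search?q=stuff",
             "http://search.yahoo.com/search?p=stuff")
signatures <- c("http://www.google.com/search")

# TRUE for each content string containing any signature as a literal substring
hit <- Reduce(`|`, lapply(signatures, function(s) grepl(s, content, fixed = TRUE)))
content[hit]  # only the Google URL
```

grepl(fixed = TRUE) is vectorised over content, so with 100s of signatures and 1000s of strings this is one pass per signature rather than a double loop.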
[R] StartsWith over vector of Strings?
Given vectors of strings of arbitrary length content <- c("abc", "def") searchset <- c("a", "abc", "abcdef", "d", "def", "defghi") Is it possible to determine the set of content strings that match the searchset in the sense of 'startswith'? This would be a vector of all strings in content that start with any of the strings in the searchset. In the little example here, this would be: result <- c("abc", "abc", "def", "def") Best, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
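A sketch using substr(), which reproduces the expected result above (each searchset entry contributes every content string it is a prefix of):

```r
content <- c("abc", "def")
searchset <- c("a", "abc", "abcdef", "d", "def", "defghi")

# TRUE where each element of x begins with prefix
starts_with <- function(x, prefix) substr(x, 1, nchar(prefix)) == prefix

result <- unlist(lapply(searchset, function(s) content[starts_with(content, s)]))
result  # "abc" "abc" "def" "def"
```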
Re: [R] Fast string comparison
I see. I did not get these performances since I did not directly compare vectors but ran seemingly expensive for-loops to do it iteratively... :( R. On Tue, Jul 13, 2010 at 1:42 AM, Hadley Wickham wrote: > strings <- replicate(1e5, paste(sample(letters, 100, rep = T), collapse = > "")) > system.time(strings[-1] == strings[-1e5]) > # user system elapsed > # 0.016 0.000 0.017 > > So it takes ~1/100 of a second to do ~100,000 string comparisons. You > need to provide a reproducible example that illustrates why you think > string comparisons are slow. > > Hadley > > > On Tue, Jul 13, 2010 at 6:52 AM, Ralf B wrote: >> I am asking this question because String comparison in R seems to be >> awfully slow (based on profiling results) and I wonder if perhaps '==' >> alone is not the best one can do. I did not ask for anything >> particular and I don't think I need to provide a self-contained source >> example for the question. So, to re-phrase my question, are there more >> (runtime) effective ways to find out if two strings (about 100-150 >> characters long) are equal? >> >> Ralf >> >> >> >> >> >> >> On Sun, Jul 11, 2010 at 2:37 PM, Sharpie wrote: >>> >>> >>> Ralf B wrote: >>>> >>>> What is the fastest way to compare two strings in R? >>>> >>>> Ralf >>>> >>> >>> Which way is not fast enough? >>> >>> In other words, are you asking this question because profiling showed one of >>> R's string comparison operations is causing a massive bottleneck in your >>> code? If so, which one and how are you using it? >>> >>> -Charlie >>> >>> - >>> Charlie Sharpsteen >>> Undergraduate-- Environmental Resources Engineering >>> Humboldt State University >>> -- >>> View this message in context: >>> http://r.789695.n4.nabble.com/Fast-string-comparison-tp2285156p2285409.html >>> Sent from the R help mailing list archive at Nabble.com. 
>>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Assistant Professor / Dobelman Family Junior Chair > Department of Statistics / Rice University > http://had.co.nz/ > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fast string comparison
I am asking this question because String comparison in R seems to be awfully slow (based on profiling results) and I wonder if perhaps '==' alone is not the best one can do. I did not ask for anything particular and I don't think I need to provide a self-contained source example for the question. So, to re-phrase my question, are there more (runtime) effective ways to find out if two strings (about 100-150 characters long) are equal? Ralf On Sun, Jul 11, 2010 at 2:37 PM, Sharpie wrote: > > > Ralf B wrote: >> >> What is the fastest way to compare two strings in R? >> >> Ralf >> > > Which way is not fast enough? > > In other words, are you asking this question because profiling showed one of > R's string comparison operations is causing a massive bottleneck in your > code? If so, which one and how are you using it? > > -Charlie > > - > Charlie Sharpsteen > Undergraduate-- Environmental Resources Engineering > Humboldt State University > -- > View this message in context: > http://r.789695.n4.nabble.com/Fast-string-comparison-tp2285156p2285409.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fast string comparison
What is the fastest way to compare two strings in R? Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Mirror axis on hist2d plot - how?
The following code produces a heatmap based on normalized data. I would like to mirror x and y axis for this plot. Any idea how to do that? require("gplots") x <- rnorm(500) y <- rnorm(500) hist2d(x, y, freq=TRUE, nbins=50, col = c("white",heat.colors(256))) Best, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
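A hedged sketch, assuming gplots::hist2d() forwards extra arguments on to image(): if so, passing reversed xlim/ylim should mirror the corresponding axis (this passthrough is an assumption worth checking against the hist2d help page):

```r
library(gplots)

x <- rnorm(500)
y <- rnorm(500)

# reversed axis limits mirror the plot left-right and top-bottom
hist2d(x, y, nbins = 50, col = c("white", heat.colors(256)),
       xlim = rev(range(x)), ylim = rev(range(y)))
```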
Re: [R] Current script name from R
I am using RGUI, the command line or the StatET Eclipse environment. Should this not all be the same? Ralf On Fri, Jul 9, 2010 at 7:11 AM, Allan Engelhardt wrote: > I'm assuming you are using Rscript (please provide self-contained examples > when posting) in which case you could look for the element in > (base|R.utils)::commandArgs() that begin with the string "--file=" - the > rest is the file name. See the asValues= parameter in help("commandArgs", > package="R.utils") for a nice way to get the parameter. > > For an invocation of the form R < foo.R you'd need to inspect your system's > process table (so don't do that). > > Hope this helps. > > Allan > > On 09/07/10 10:48, Ralf B wrote: >> >> Is there a way for a script to find out about its own name ? >> >> Ralf >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
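Under Rscript (not RGui or StatET, where no --file argument exists), the commandArgs() approach Allan describes looks roughly like this sketch:

```r
# run as: Rscript myscript.R
args <- commandArgs(trailingOnly = FALSE)
file.arg <- grep("^--file=", args, value = TRUE)  # character(0) outside Rscript
script.name <- sub("^--file=", "", file.arg)
```

In an interactive session file.arg is empty, so script.name ends up as a zero-length character vector rather than a name; that is exactly the limitation discussed above.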
[R] KLdiv produces NA. Why?
I am trying to calculate a Kullback-Leibler divergence from two vectors with integers but get NA as a result when trying to calculate the measure. Why?

x <- cbind(stuff$X, morestuff$X)
x[1:5,]
     [,1] [,2]
[1,]  293  938
[2,]  293  942
[3,]  297  949
[4,]  290  956
[5,]  294  959
KLdiv(x)
     [,1] [,2]
[1,]    0   NA
[2,]   NA    0

Best, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Plotting text in existing plot?
I would like to plot some text in an existing plot. Is there a very simple way to do that? It does not need to be pretty at all (just maybe a way to center it or define a position within the plot). Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
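A minimal sketch with text(): par("usr") returns the current plot limits, so a label can be centered without knowing the data range:

```r
plot(rnorm(50))

usr <- par("usr")  # c(x1, x2, y1, y2) of the current plot region
text(x = mean(usr[1:2]), y = mean(usr[3:4]), labels = "centered note")
```

For text in the margins rather than inside the plot region, mtext() is the analogous base-graphics function.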
[R] Current script name from R
Is there a way for a script to find out about its own name ? Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R^2 in loess and predict?
Parametric regression produces R^2 as a measure of how well the model predicts the sample and adjusted R^2 as a measure of how well it models the population. What is the equivalent for non-parametric regression (e.g. the loess function)? Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
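There is no universally agreed R^2 for loess, but one common informal analogue is the squared correlation between observed and fitted values; a hedged sketch (this is a descriptive statistic, not an exact equivalent of parametric or adjusted R^2):

```r
set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)

fit <- loess(y ~ x)
pseudo.r2 <- cor(y, fitted(fit))^2  # share of variation tracked by the smooth
pseudo.r2
```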
[R] Non-parametric regression
I have two data sets, each a vector of 1000 numbers, each vector representing a distribution (i.e. 1000 numbers, each representing a frequency at one point on a scale between 1 and 1000). For simplification, here is a short version with only 5 points. a <- c(8,10,8,12,4) b <- c(7,11,8,10,5) Leaving the obvious discussion about causality aside for a moment, I would like to see how well I can predict b from a using a regression. Since I do not know anything about the distribution type and have already discovered non-normality, I cannot use parametric regression or anything GLM for that matter. How should I proceed in using non-parametric regression to model vector a and see how well it predicts b? Perhaps you could extend the given lines into a short example script to give me an idea? Are there any other options? Best, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
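A hedged sketch of the loess route. With only the 5 points above loess would not have enough data, so the vectors below are stand-ins mimicking the real 1000-point frequency data (the simulated data is an assumption, not the poster's):

```r
set.seed(1)
# stand-in frequency vectors for two similar distributions
a <- as.numeric(table(cut(rnorm(5000), breaks = 50)))
b <- a + rpois(length(a), lambda = 2)

fit <- loess(b ~ a)                        # non-parametric local regression
rmse <- sqrt(mean((b - fitted(fit))^2))    # how far the smooth falls from b
rmse
```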
[R] Fast String operations in R ? Cost of String operations
Hi experts, I am currently developing some code that checks a large number of strings for the existence of sub-strings and patterns (detecting sub-strings within URLs). I wonder if there is information about how expensive particular string operations and comparisons are in R. Are there recommendations (based on such information) regarding which operations should be used and which should be avoided? Are there libraries and functions that provide optimized string operations for such needs, or is R simply not the right choice for that? Best, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Profiler for R ?
Hi, is there such a thing as a profiler for R that informs about a) how much processing time is used by particular functions and commands and b) how much memory is used for creating how many objects (or types of data structures)? In a way I am looking for something similar to the Java profiler (which is started from the command line and provides profiling information collected from the run of a particular program). Is there such a tool through the R command line or RGUI? Are there profilers available for Eclipse StatET or through another package or extension? Thanks, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
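R ships with a sampling profiler, Rprof(); a minimal sketch of the timing side (memory profiling is available via Rprof(memory.profiling = TRUE) if R was built with memory-profiling support):

```r
Rprof("profile.out")                       # start the sampling profiler
invisible(replicate(20, sort(runif(1e5)))) # some work worth profiling
Rprof(NULL)                                # stop profiling

# per-function time summary, analogous to a Java profiler report
head(summaryRprof("profile.out")$by.self)
```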
[R] Good Package(s) for String and URL processing?
Are there packages that allow improved string and URL processing? E.g. extract parts of a URL such as sub-domains, the top-level domain, protocols (e.g. https, http, ftp), the file type based on endings, check if a URL is valid or not, etc... I am currently only using split and paste. Are there better and more efficient ways to handle strings, e.g. finding sub-strings or doing pattern matching? What packages do you use if you have to do a lot of string processing and you don't have the option to go to another language such as Perl or Python? Thanks, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
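Base R's regex functions already go a long way here; a hedged sketch pulling apart a URL with sub() (real-world URLs have many edge cases these simple patterns ignore, so treat them as illustrations rather than a validator):

```r
url <- "https://www.example.com/docs/page.html"

protocol <- sub("^([a-z]+)://.*$", "\\1", url)        # "https"
host     <- sub("^[a-z]+://([^/]+).*$", "\\1", url)   # "www.example.com"
filetype <- sub("^.*\\.([[:alnum:]]+)$", "\\1", url)  # "html"
```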
[R] Assigning variable value as name to cbind column
Hi all, I have this (non-working) script: dataTest <- data.frame(col1=c(1,2,3)) new.data <- c(1,2) name <- "test" n.row <- dim(dataTest)[1] length(new.data) <- n.row names(new.data) <- name cbind(dataTest, name=new.data) print(dataTest) and would like to bind the new column 'new.data' to 'dataTest' by using the value of the variable 'name' as the column name. The end result should look like this:

  col1 test
1    1    1
2    2    2
3    3   NA

The best I got was that 'name' became the column name but never the actual value of 'name'. How can I do that? (This is actually a function that runs many times -- this means a manual workaround is not feasible). Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
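A sketch using [[<- indexing, which evaluates name to its value; in cbind(dataTest, name = new.data) the word name is a quoted argument label, which is why the literal word kept showing up:

```r
dataTest <- data.frame(col1 = c(1, 2, 3))
new.data <- c(1, 2)
name <- "test"

length(new.data) <- nrow(dataTest)  # pad the shorter column with NA
dataTest[[name]] <- new.data        # column named after the *value* of name
dataTest
```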
Re: [R] Simple qqplot question
Short rep: I have two distributions, data and data2; each built from about 3 million data points; they appear similar when looking at densities and histograms. I plotted qqplots for further eye-balling: qqplot(data, data2, xlab = "1", ylab = "2") and get an almost perfect diagonal line, which means they are in fact very alike. Now I tried to check normality using qqnorm -- and I think I am doing something wrong here: qqnorm(data, main = "Q-Q normality plot for 1") qqnorm(data2, main = "Q-Q normality plot for 2") I am getting perfect S-shaped curves (??) for both distributions. Am I missing something here? (rough ASCII sketch of the S-shaped Q-Q curve omitted) Thanks, Ralf On Thu, Jun 24, 2010 at 8:23 PM, Ralf B wrote: > Unfortunately not. I want a qqplot from two variables. > > Ralf > > > On Thu, Jun 24, 2010 at 7:23 PM, Joris Meys wrote: >> Also take a look at qq.plot in the package "car". Gives you exactly >> what you want. >> Cheers >> Joris >> >> On Fri, Jun 25, 2010 at 12:55 AM, Ralf B wrote: >>> More details... >>> >>> I have two distributions which are very similar. I have plotted >>> density plots already from the two distributions. In addition, >>> I created a qqplot that shows an almost straight line. What I want is a >>> line that represents the ideal case in which the two >>> distributions match perfectly. I would use this line to see how much >>> the errors divert at different stages of the plot. >>> >>> Ralf >>> >>> >>> >>> On Thu, Jun 24, 2010 at 5:56 PM, stephen sefick wrote: >>>> You are going to have to define the question a little better. Also, >>>> please provide a reproducible example. >>>> >>>> On Thu, Jun 24, 2010 at 4:44 PM, Ralf B wrote: >>>>> I am a beginner in R, so please don't step on me if this is too >>>>> simple. I have two data sets datax and datay for which I created a >>>>> qqplot >>>>> >>>>> qqplot(datax,datay) >>>>> >>>>> but now I want a line that indicates the perfect match so that I can >>>>> see how much the plot diverts from the ideal. 
This ideal however is >>>>> not normal, so I think qqnorm and qqline cannot be applied. >>>>> >>>>> Perhaps you can help? >>>>> >>>>> Ralf >>>>> >>>>> __ >>>>> R-help@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>> >>>> >>>> >>>> -- >>>> Stephen Sefick >>>> >>>> | Auburn University | >>>> | Department of Biological Sciences | >>>> | 331 Funchess Hall | >>>> | Auburn, Alabama | >>>> | 36849 | >>>> |___| >>>> | sas0...@auburn.edu | >>>> | http://www.auburn.edu/~sas0025 | >>>> |___| >>>> >>>> Let's not spend our time and resources thinking about things that are >>>> so little or so large that all they really do for us is puff us up and >>>> make us feel like gods. We are mammals, and have not exhausted the >>>> annoying little problems of being mammals. >>>> >>>> -K. Mullis >>>> >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> >> >> -- >> Joris Meys >> Statistical consultant >> >> Ghent University >> Faculty of Bioscience Engineering >> Department of Applied mathematics, biometrics and process control >> >> tel : +32 9 264 59 87 >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php >> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] what "density" is plotting ?
The density function works empirically, based on your data. It makes no assumption about an underlying distribution. Ralf On Thu, Jun 24, 2010 at 10:48 PM, Carrie Li wrote: > Hello, Ralf, > > Sorry I was being unclear. > I mean the probability density function, > like the normal f(x) = 1/(sd*sqrt(2*pi)) * exp(-(x-mean)^2/(2*sd^2)), something like that. > Sorry about the confusion > > Carrie > > On Thu, Jun 24, 2010 at 10:43 PM, Ralf B wrote: >> >> Hi Carrie, >> >> the output is defined by you; density() only creates the function >> which you need to plot using the plot() function. When you call >> plot(density(x)) you get the output on the screen. You need to use >> pdf() if you want to create a pdf file, png() for creating a png file >> or postscript if you like ps; there are many others. >> >> Ralf >> >> On Thu, Jun 24, 2010 at 10:35 PM, Carrie Li >> wrote: >> > Hi everyone, >> > >> > I am confused regarding the function "density". >> > suppose that there is a sample x of 100 data points, and >> > plot(density(x)) >> > gives its pdf ? >> > or it's more like histogram only ? >> > >> > thanks for any answering >> > >> > Carrie >> > >> > [[alternative HTML version deleted]] >> > >> > __ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] what "density" is plotting ?
Hi Carrie, the output is defined by you; density() only creates the function which you need to plot using the plot() function. When you call plot(density(x)) you get the output on the screen. You need to use pdf() if you want to create a pdf file, png() for creating a png file or postscript if you like ps; there are many others. Ralf On Thu, Jun 24, 2010 at 10:35 PM, Carrie Li wrote: > Hi everyone, > > I am confused regarding the function "density". > suppose that there is a sample x of 100 data points, and plot(density(x)) > gives it's pdf ? > or it's more like histogram only ? > > thanks for any answering > > Carrie > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
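The distinction discussed in this thread can be seen by overlaying the kernel estimate on the theoretical pdf; a minimal sketch:

```r
set.seed(1)
x <- rnorm(1000)

plot(density(x), main = "Kernel estimate vs normal pdf")  # empirical, from the data
curve(dnorm(x), add = TRUE, lty = 2)                      # theoretical N(0,1) density
```

The solid kernel curve wiggles with the sample; the dashed dnorm() curve is the assumed model. A histogram is the cruder, binned cousin of the kernel estimate.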
[R] Handouts / Reports or just simply printing text to PDF?
I assume R won't easily generate nice reports (unless one starts using Sweave and LaTeX) but perhaps somebody here knows a package that can create report like output for special cases? How can I simply plot output into PDF? Perhaps you know a package I should check out? What do you guys do to create handouts (before actually publishing)? Thanks in advance, Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
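For a quick handout without Sweave, the pdf() device alone gets surprisingly far: figures and plain text pages can be written page by page; a minimal sketch (the file name and layout are arbitrary choices):

```r
pdf("handout.pdf")

plot(rnorm(100), main = "Figure page")     # page 1: an ordinary plot

plot.new()                                 # page 2: a blank page used for text
text(0.5, 0.8, "Notes", cex = 1.5)
text(0.5, 0.5, "Plain text can be placed anywhere on the page.")

dev.off()
```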
Re: [R] Simple qqplot question
Unfortunately not. I want a qqplot from two variables. Ralf On Thu, Jun 24, 2010 at 7:23 PM, Joris Meys wrote: > Also take a look at qq.plot in the package "car". Gives you exactly > what you want. > Cheers > Joris > > On Fri, Jun 25, 2010 at 12:55 AM, Ralf B wrote: >> More details... >> >> I have two distributions which are very similar. I have plotted >> density plots already from the two distributions. In addition, >> I created a qqplot that show an almost straight line. What I want is a >> line that represents the ideal case in which the two >> distributions match perfectly. I would use this line to see how much >> the errors divert at different stages of the plot. >> >> Ralf >> >> >> >> On Thu, Jun 24, 2010 at 5:56 PM, stephen sefick wrote: >>> You are going to have to define the question a little better. Also, >>> please provide a reproducible example. >>> >>> On Thu, Jun 24, 2010 at 4:44 PM, Ralf B wrote: >>>> I am a beginner in R, so please don't step on me if this is too >>>> simple. I have two data sets datax and datay for which I created a >>>> qqplot >>>> >>>> qqplot(datax,datay) >>>> >>>> but now I want a line that indicates the perfect match so that I can >>>> see how much the plot diverts from the ideal. This ideal however is >>>> not normal, so I think qqnorm and qqline cannot be applied. >>>> >>>> Perhaps you can help? >>>> >>>> Ralf >>>> >>>> __ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. 
>>>> >>> >>> >>> >>> -- >>> Stephen Sefick >>> >>> | Auburn University | >>> | Department of Biological Sciences | >>> | 331 Funchess Hall | >>> | Auburn, Alabama | >>> | 36849 | >>> |___| >>> | sas0...@auburn.edu | >>> | http://www.auburn.edu/~sas0025 | >>> |___| >>> >>> Let's not spend our time and resources thinking about things that are >>> so little or so large that all they really do for us is puff us up and >>> make us feel like gods. We are mammals, and have not exhausted the >>> annoying little problems of being mammals. >>> >>> -K. Mullis >>> >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Joris Meys > Statistical consultant > > Ghent University > Faculty of Bioscience Engineering > Department of Applied mathematics, biometrics and process control > > tel : +32 9 264 59 87 > joris.m...@ugent.be > --- > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple qqplot question
More details... I have two distributions which are very similar. I have plotted density plots already from the two distributions. In addition, I created a qqplot that show an almost straight line. What I want is a line that represents the ideal case in which the two distributions match perfectly. I would use this line to see how much the errors divert at different stages of the plot. Ralf On Thu, Jun 24, 2010 at 5:56 PM, stephen sefick wrote: > You are going to have to define the question a little better. Also, > please provide a reproducible example. > > On Thu, Jun 24, 2010 at 4:44 PM, Ralf B wrote: >> I am a beginner in R, so please don't step on me if this is too >> simple. I have two data sets datax and datay for which I created a >> qqplot >> >> qqplot(datax,datay) >> >> but now I want a line that indicates the perfect match so that I can >> see how much the plot diverts from the ideal. This ideal however is >> not normal, so I think qqnorm and qqline cannot be applied. >> >> Perhaps you can help? >> >> Ralf >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Stephen Sefick > > | Auburn University | > | Department of Biological Sciences | > | 331 Funchess Hall | > | Auburn, Alabama | > | 36849 | > |___| > | sas0...@auburn.edu | > | http://www.auburn.edu/~sas0025 | > |___| > > Let's not spend our time and resources thinking about things that are > so little or so large that all they really do for us is puff us up and > make us feel like gods. We are mammals, and have not exhausted the > annoying little problems of being mammals. > > -K. 
Mullis
[R] Simple qqplot question
I am a beginner in R, so please don't step on me if this is too simple. I have two data sets datax and datay for which I created a qqplot qqplot(datax,datay) but now I want a line that indicates the perfect match so that I can see how much the plot deviates from the ideal. This ideal however is not normal, so I think qqnorm and qqline cannot be applied. Perhaps you can help? Ralf
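For a two-sample Q-Q plot the "perfect match" reference is simply the identity line y = x, which abline() can draw without any normality assumption; a minimal sketch with simulated data standing in for datax/datay:

```r
set.seed(1)
datax <- rnorm(100)            # placeholder data
datay <- rnorm(100, sd = 1.1)  # placeholder data

qqplot(datax, datay)
abline(0, 1, col = "red")  # identity line: where points would fall if both
                           # samples came from exactly the same distribution
```

Points drifting away from the red line show where the two distributions diverge.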
[R] Install package automatically if not there?
Hi fans, is it possible for a script to check if a library has been installed? I want to automatically install it if it is missing, to keep scripts from crashing when running on a new machine... Ralf
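One common idiom for this (the package name is just an example; in recent R requireNamespace() is preferred, older code used require() the same way):

```r
# Install 'ggplot2' only if it is not already available, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
```

Wrapping this in a small helper that takes a vector of package names makes it easy to reuse at the top of every script.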
Re: [R] read.csv does not find my file (windows xp)
jep! I forgot to use sep="" for paste and introduced a space in front of the filename... damn, 1 hour of my life! Ralf 2010/6/24 Uwe Ligges : > > > On 24.06.2010 19:02, Ralf B wrote: >> >> I try to load a file >> >> myData<- read.csv(file="C:\\myfolder\\mysubfolder\\mydata.csv", >> head=TRUE, sep=";") >> >> and get this error: >> >> Error in file(file, "rt") : cannot open the connection >> In addition: Warning message: >> In file(file, "rt") : >> cannot open file 'C:\myfolder\mysubfolder\mydata.csv: No such file >> or directory >> >> am I overlooking something? >> >> I am getting the same error when I write the path in '/' notation... >> Does R not tolerate drive letters? > > > It does, and if you can open the file with other software, then you > probably misspelled folder or filename. > > Uwe Ligges > > > >> Ralf >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code.
[R] read.csv does not find my file (windows xp)
I try to load a file myData <- read.csv(file="C:\\myfolder\\mysubfolder\\mydata.csv", head=TRUE, sep=";") and get this error: Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file 'C:\myfolder\mysubfolder\mydata.csv: No such file or directory am I overlooking something? I am getting the same error when I write the path in '/' notation... Does R not tolerate drive letters? Ralf
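A quick way to catch this class of error is to build the path with file.path() and test it before reading; paste() with its default sep = " " silently inserts spaces, which turned out to be the culprit in this thread. Folder and file names below are illustrative:

```r
# file.path() joins components with "/" and cannot introduce stray spaces
f <- file.path("C:", "myfolder", "mysubfolder", "mydata.csv")

if (file.exists(f)) {
  myData <- read.csv(f, header = TRUE, sep = ";")
} else {
  stop("File not found: ", f)  # fail early with the exact path that was tried
}
```

Printing the constructed path (or checking nchar() on it) makes an accidental leading space visible immediately.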
Re: [R] RJDBC vs RMySQL vs ???
Unfortunately, I have a lot of errors with RMySQL -- but that is another thread... Ralf On Thu, Jun 24, 2010 at 10:31 AM, James W. MacDonald wrote: > Hi Ralf, > > Ralf B wrote: >> >> Sorry for the lack of details. Since I run the same SQL first directly >> on MySQL (using the MySQL Query Browser) and then again using R >> through the RJDBC interface, I assume that I won't simply have a badly >> constructed SQL query. However, just to clear possible objection, here >> the SQL: >> >> >> # Extracts vector of data points >> getData <- function(connection) { >> queryStart <- "SELECT id1, id2, x, y FROM `mytable` " >> queryEnd <- ";" >> query <- paste(queryStart, " WHERE id1 IN(", id1s, ") AND id2 IN(", >> id2s, ") AND subtype='TYPE1'", queryEnd) >> # execute query >> data = dbGetQuery(connection, query) >> return(data) >> } >> >> When running this method using either RGUI or the command line, I have >> a runtime that reaches an incredible 10 minutes (!) for selecting >> about 50k - 80k data points (which I consider not much) based on the >> range of IDs I choose. The table size is about 5-8 million data points >> total. The same SQL query directly executed in MySQL Query Browser >> takes about 20 seconds which I would consider fine. There are no >> indices created for any of the fields but since the query runs a lot >> faster in the query browser I don't suspect this to be the main >> reason. >> >> Any ideas? > > Well, the RJDBC rforge page has this note: > > Note: The current implementation of RJDBC is done entirely in R, no Java > code is used. This means that it may not be extremely efficient and could be > potentially sped up by using Java native code. However, it was sufficient > for most tasks we tested. If you have performance issues with RJDBC, please > let us know and tell us more details about your test case. > > And from my quick peek at the page, it appears RJDBC is designed to allow > one to query any DBMS. 
Since RMySQL is MySQL-specific, it may be more > efficient. Anyway, why don't you just try it and see? > > Best, > > Jim > > >> >> Best, >> Ralf >> >> >> >> >> On Wed, Jun 23, 2010 at 4:36 PM, James W. MacDonald >> wrote: >>> >>> Hi Ralf, >>> >>> Ralf B wrote: >>>> >>>> I am running a simple SQL SELECT statement that involvs 50k + data >>>> points using R and the RJDBC interface. I am facing very slow response >>>> times in both the RGUI and the R console. When running this SQL >>>> statement directly in a SQL client I have processing times that are a >>>> lot lot faster (which means that the SQL statement itself is not the >>>> problem). >>>> >>>> Did any of you compare RJDBC vs RMySQL or is there a better, more >>>> efficient way to extract large data from databases using R? Would you >>>> recommend dumping data out completely into flat files and working with >>>> flat files instead? I expected that this would not be such a problem >>>> given that businesses maintain their data in DBs and R is supposed to >>>> be good in shifting around data. Am I doing something wrong? >>> >>> Well, if you don't show people what you have done, how can anybody tell >>> if >>> you are doing something wrong or not? >>> >>> I have no experience with RJDBC, so cannot say anything about that. >>> However, >>> I have always found RMySQL to be speedy enough. As an example: >>> >>>> library(RMySQL) >>> >>> Loading required package: DBI >>>> >>>> con <- dbConnect("MySQL", host="genome-mysql.cse.ucsc.edu", user = >>>> "genome", dbname = "hg18") >>>> system.time(a <- dbGetQuery(con, "select name, chromEnd from snp129 >>>> where >>>> chrom='chr1' and chromStart between 1 and 1e8;") >>> >>> + ) >>> user system elapsed >>> 7.95 0.06 38.59 >>>> >>>> dim(a) >>> >>> [1] 508676 2 >>> >>> So 40 seconds to get half a million records. Since this is via the >>> internet, >>> I have to imagine things would be much faster querying a local DB. 
>>> >>> But then you never say what constitutes 'slow' for you, so maybe this is slow as well?
Re: [R] Comparing distributions
The diagram only serves as a rough example to give you an idea. To be more precise I would like to give more detail: The data represents movements from two types of pointing device (e.g. mouse, pointer, ) along an axis. The data has different parameters -- such as different pointing devices, different axes, split by different experiment conditions etc. but the problem is always the same: I would like to find out if their distributions correlate and would like to have some kind of 'objective' (Yes, I know -- nothing is objective -- but eye-balling isn't either.) measure, test, etc. These would be accompanied by Q-Q plots and density plots to get a general feeling of what is going on and become part of the discussion. I don't expect a solution from here, but perhaps a general direction where I could find my kind of problem being understood. Ralf On Wed, Jun 23, 2010 at 10:07 PM, Robert A LaBudde wrote: > Your "*" curve apparently dominates your "+" curve. > > If they have the same total number of data each, as you say, they both > cannot sum to the same value (e.g., N = 1 or 1.000). > > So there is something going on that you aren't mentioning. > > Try comparing CDFs instead of pdfs. > > At 03:33 PM 6/23/2010, Ralf B wrote: >> >> I am trying to do something in R and would appreciate a push into the >> right direction. I hope some of you experts can help. >> >> I have two distributions obtained from 1 datapoints each (about >> 1 datapoints each, non-normal with multi-modal shape (when >> eye-balling densities) but other than that I know little about its >> distribution). When plotting the two distributions together I can see >> that the two densities are alike with a certain distance to each other >> (e.g. 50 units on the X axis).
I tried to plot a simplified picture of >> the density plot below: >> >> [garbled ASCII sketch: two similar multi-modal density curves, one drawn with '*' and one with '+', offset from each other along the X axis] >> >> What I would like to do is to formally test their similarity or >> otherwise measure it more reliably than just showing and discussing a >> plot. Is there a general approach other than using a Mann-Whitney test >> which is very strict and seems to assume a perfect match. Is there a >> test that takes in a certain 'band' (e.g. 50, 100, 150 units on X) or >> are there any other similarity measures that could give me a statistic >> about how close these two distributions are to each other? All I can >> say from eye-balling is that they seem to follow each other and it >> appears that one distribution is shifted by an amount from the other. >> Any ideas? >> >> Ralf >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: r...@lcfltd.com > Least Cost Formulations, Ltd. URL: http://lcfltd.com/ > 824 Timberlake Drive Tel: 757-467-0954 > Virginia Beach, VA 23464-3239 Fax: 757-467-2947 > > "Vere scire est per causas scire"
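LaBudde's suggestion to compare CDFs instead of densities can be done directly with base R's ecdf(); a sketch with placeholder data standing in for the two pointing-device samples:

```r
set.seed(1)
x <- rnorm(1000)             # placeholder: measurements from device 1
y <- rnorm(1000, mean = 50)  # placeholder: device 2, shifted by ~50 units

# Overlay the two empirical CDFs; a pure shift shows up as a horizontal offset
plot(ecdf(x), main = "Empirical CDFs", xlim = range(x, y))
plot(ecdf(y), add = TRUE, col = "red")
```

Unlike density plots, the CDFs cannot be distorted by bandwidth choices, so a systematic offset between the curves is easy to read off.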
Re: [R] RJDBC vs RMySQL vs ???
Sorry for the lack of details. Since I run the same SQL first directly on MySQL (using the MySQL Query Browser) and then again using R through the RJDBC interface, I assume that I won't simply have a badly constructed SQL query. However, just to clear possible objection, here the SQL: # Extracts vector of data points getData <- function(connection) { queryStart <- "SELECT id1, id2, x, y FROM `mytable` " queryEnd <- ";" query <- paste(queryStart, " WHERE id1 IN(", id1s, ") AND id2 IN(", id2s, ") AND subtype='TYPE1'", queryEnd) # execute query data = dbGetQuery(connection, query) return(data) } When running this method using either RGUI or the command line, I have a runtime that reaches an incredible 10 minutes (!) for selecting about 50k - 80k data points (which I consider not much) based on the range of IDs I choose. The table size is about 5-8 million data points total. The same SQL query directly executed in MySQL Query Browser takes about 20 seconds which I would consider fine. There are no indices created for any of the fields but since the query runs a lot faster in the query browser I don't suspect this to be the main reason. Any ideas? Best, Ralf On Wed, Jun 23, 2010 at 4:36 PM, James W. MacDonald wrote: > Hi Ralf, > > Ralf B wrote: >> >> I am running a simple SQL SELECT statement that involvs 50k + data >> points using R and the RJDBC interface. I am facing very slow response >> times in both the RGUI and the R console. When running this SQL >> statement directly in a SQL client I have processing times that are a >> lot lot faster (which means that the SQL statement itself is not the >> problem). >> >> Did any of you compare RJDBC vs RMySQL or is there a better, more >> efficient way to extract large data from databases using R? Would you >> recommend dumping data out completely into flat files and working with >> flat files instead? 
I expected that this would not be such a problem >> given that businesses maintain their data in DBs and R is supposed to >> be good in shifting around data. Am I doing something wrong? > > Well, if you don't show people what you have done, how can anybody tell if > you are doing something wrong or not? > > I have no experience with RJDBC, so cannot say anything about that. However, > I have always found RMySQL to be speedy enough. As an example: > >> library(RMySQL) > Loading required package: DBI >> con <- dbConnect("MySQL", host="genome-mysql.cse.ucsc.edu", user = >> "genome", dbname = "hg18") >> system.time(a <- dbGetQuery(con, "select name, chromEnd from snp129 where >> chrom='chr1' and chromStart between 1 and 1e8;") > + ) > user system elapsed > 7.95 0.06 38.59 >> dim(a) > [1] 508676 2 > > So 40 seconds to get half a million records. Since this is via the internet, > I have to imagine things would be much faster querying a local DB. > > But then you never say what constitutes 'slow' for you, so maybe this is > slow as well? > > Best, > > Jim > > >> >> Ralf >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- > James W. MacDonald, M.S. > Biostatistician > Douglas Lab > University of Michigan > Department of Human Genetics > 5912 Buhl > 1241 E. Catherine St. > Ann Arbor MI 48109-5618 > 734-615-7826 > ** > Electronic Mail is not secure, may not be read every day, and should not be > used for urgent or sensitive issues > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RJDBC vs RMySQL vs ???
I am running a simple SQL SELECT statement that involves 50k + data points using R and the RJDBC interface. I am facing very slow response times in both the RGUI and the R console. When running this SQL statement directly in a SQL client I have processing times that are a lot faster (which means that the SQL statement itself is not the problem). Did any of you compare RJDBC vs RMySQL or is there a better, more efficient way to extract large data from databases using R? Would you recommend dumping data out completely into flat files and working with flat files instead? I expected that this would not be such a problem given that businesses maintain their data in DBs and R is supposed to be good in shifting around data. Am I doing something wrong? Ralf
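Independent of the driver, the DBI interface lets you pull a large result set in chunks instead of one dbGetQuery() call, which bounds memory use and makes progress visible; a sketch assuming an already-open RMySQL connection `con` and the table from this thread:

```r
library(DBI)

res <- dbSendQuery(con, "SELECT id1, id2, x, y FROM mytable WHERE subtype = 'TYPE1'")

chunks <- list()
repeat {
  chunk <- fetch(res, n = 10000)  # fetch 10k rows at a time
  if (nrow(chunk) == 0) break
  chunks[[length(chunks) + 1]] <- chunk
}
dbClearResult(res)

data <- do.call(rbind, chunks)  # assemble the full result
```

If one chunk is fast but the total is slow, the bottleneck is row transfer rather than query planning, which points at the driver.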
[R] Comparing distributions
I am trying to do something in R and would appreciate a push into the right direction. I hope some of you experts can help. I have two distributions obtained from 1 datapoints each (about 1 datapoints each, non-normal with multi-modal shape (when eye-balling densities) but other than that I know little about its distribution). When plotting the two distributions together I can see that the two densities are alike with a certain distance to each other (e.g. 50 units on the X axis). I tried to plot a simplified picture of the density plot below: [garbled ASCII sketch: two similar multi-modal density curves, one drawn with '*' and one with '+', offset from each other along the X axis] What I would like to do is to formally test their similarity or otherwise measure it more reliably than just showing and discussing a plot. Is there a general approach other than using a Mann-Whitney test which is very strict and seems to assume a perfect match. Is there a test that takes in a certain 'band' (e.g. 50, 100, 150 units on X) or are there any other similarity measures that could give me a statistic about how close these two distributions are to each other? All I can say from eye-balling is that they seem to follow each other and it appears that one distribution is shifted by an amount from the other. Any ideas? Ralf
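One way to get the shift-tolerant comparison asked for above is to estimate the offset between the samples, remove it, and then test whether the aligned distributions agree, e.g. with a two-sample Kolmogorov-Smirnov test; a sketch with placeholder data:

```r
set.seed(1)
x <- rnorm(1000)             # placeholder sample 1
y <- rnorm(1000, mean = 50)  # placeholder sample 2: same shape, shifted ~50

shift <- median(y) - median(x)  # crude, robust estimate of the offset
ks.test(x + shift, y)           # tests whether the aligned samples share a distribution
```

A large p-value after alignment supports the "same shape, shifted by a constant" reading; the estimated `shift` itself is the distance between the curves.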
[R] About normality tests (2) ...
In addition to the previous email: What plots would you suggest in addition to density / histogram plots and how can I produce them with R? Perhaps one of you has an example? Thanks a lot, Ralf
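The standard graphical normality check beyond histograms and densities is the normal Q-Q plot; a minimal base-R example:

```r
set.seed(1)
x <- rnorm(200)         # placeholder data

qqnorm(x)               # sample quantiles vs. theoretical normal quantiles
qqline(x, col = "red")  # reference line through the first and third quartiles
```

Points hugging the line indicate approximate normality; systematic curvature at the tails is usually easier to spot here than in a density plot.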
[R] About normality tests...
Hi all, I have two very large samples of data (1+ data points) and would like to perform normality tests on it. I know that p < .05 means that a data set is considered as not normal with any of the two tests. I am also aware that large samples tend to lead more likely to normal results (Andy Field, 2005). I have a few questions to ensure that I am using them right. 1) The Shapiro-Wilk test requires to provide mean and sd. Is it correct to add here the mean and sd of the data itself (since I am comparing to a normal distribution with the same parameters)? mySD <- sd(mydata$myfield) myMean <- mean(mydata$myfield) shapiro.test(rnorm(100, mean = myMean, sd = mySD)) 2) If I just want to test each distribution individually, I assume that I am doing a one-sample Kolmogorov-Smirnov test. Is that correct? 3) If I simply want to know if normality exists or not, what should I put for the parameter 'alternative'? Does it actually matter? alternative = c("two.sided", "less", "greater") Thank you, Ralf
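For reference, shapiro.test() takes the data vector directly and has no mean/sd arguments (the snippet in the question tests a freshly simulated rnorm() sample, not the data), and it accepts at most 5000 observations, so very large samples must be subsampled; a sketch:

```r
set.seed(1)
x <- rnorm(1e5)                # placeholder for mydata$myfield

shapiro.test(sample(x, 5000))  # Shapiro-Wilk on a random subsample (n <= 5000)

# One-sample Kolmogorov-Smirnov against a normal fitted to the data
ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
```

Note that estimating the normal's parameters from the same data makes the KS p-value conservative; with samples this large, any tiny deviation from normality will be "significant" anyway, so the Q-Q plot is usually more informative.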
[R] RJDBC - sloooooow - HELP!
Hi all, I am suffering from a very slow RJDBC (7 rows from a simple select take like 10 minutes). Does anybody know if RMySQL is faster? Or RODBC in that respect? What are alternatives and what can be done to get a realistic performance out of MySQL when connected to R's JRI? Best, Ralf
Re: [R] Column name defined by function variable
Sorry, its late and I am getting tired ;) I modified based on your suggestion: #combine data add.col <- function(df, new.col, name) { n.row <- dim(df)[1] length(new.col) <- n.row names(new.col) <- name cbind(df, new.col) } data <- data.frame(stuff1=as.numeric(d2$points)) data <- add.col(data, as.numeric(d1$morepoints), "stuff2") but the column in the data frame is still called 'new.col' and not 'stuff2'. Any further ideas? Best, Ralf On Thu, Jun 17, 2010 at 5:14 AM, Ivan Calandra wrote: > Hi, > > I haven't check much of what you wrote, so just a blind guess. What about in > the function's body before cbind(): > names(new.col) <- "more stuff" > ? > > HTH, > Ivan > > Le 6/17/2010 11:09, Ralf B a écrit : >> >> Hi all, >> >> probably a simple problem for you but I am stuck. >> >> This simple function adds columns (with differing length) to data frames: >> >> add.col<- function(df, new.col) { >> n.row<- dim(df)[1] >> length(new.col)<- n.row >> cbind(df, new.col) >> } >> >> Now I would like to extend that method. A new parameter 'name' shouild >> allow people to pass in a name for that new column. Is that possible >> and how can this be achieved? >> >> Example: >> >> myData<- data.frame(c(1,2,3)) >> add.col(myData, c(5,6,7,8), 'more stuff') >> >> adds a new column named 'more stuff' to the dataframe myData. >> >> >> Any ideas? >> >> Best, >> Ralf >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> > > -- > Ivan CALANDRA > PhD Student > University of Hamburg > Biozentrum Grindel und Zoologisches Museum > Abt. 
Säugetiere > Martin-Luther-King-Platz 3 > D-20146 Hamburg, GERMANY > +49(0)40 42838 6231 > ivan.calan...@uni-hamburg.de > > ** > http://www.for771.uni-bonn.de > http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
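The column keeps the name `new.col` because cbind()'s data-frame method names the column after the deparsed argument, while names() on an atomic vector only labels its elements. Assigning through `[[` sidesteps both problems; a sketch of one way to write it:

```r
# Add a column (padded with NA to the frame's row count) under a chosen name
add.col <- function(df, new.col, name) {
  length(new.col) <- nrow(df)  # pad or truncate to match the frame
  df[[name]] <- new.col        # `[[<-` uses the string as the column name
  df
}

d <- data.frame(stuff1 = c(1, 2, 3))
d <- add.col(d, c(5, 6), "stuff2")
names(d)  # "stuff1" "stuff2"
```

The same `df[[name]] <-` idiom works for names with spaces like 'more stuff', though such columns then need backticks or `[[` to access.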
[R] Testing for differences between 2 unknown distributions/densities
Hi all, I have two distributions / densities (drew density plots and eye-balled some data). Given that I don't want to make any assumptions about the data (e.g. normality, existence of certain distribution types and parameters), what are my options for testing that the distributions are the same? Thanks, Ralf
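The usual assumption-free choice for this is the two-sample Kolmogorov-Smirnov test, which compares the empirical CDFs directly without any distributional model:

```r
set.seed(1)
a <- rexp(500)  # placeholder samples; any continuous data works
b <- rexp(500)

ks.test(a, b)   # H0: both samples are drawn from the same distribution
```

The test statistic D is the maximum vertical gap between the two empirical CDFs, so it doubles as a distribution-free similarity measure even when the p-value is not of interest.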
[R] Column name defined by function variable
Hi all, probably a simple problem for you but I am stuck. This simple function adds columns (with differing length) to data frames: add.col <- function(df, new.col) { n.row <- dim(df)[1] length(new.col) <- n.row cbind(df, new.col) } Now I would like to extend that method. A new parameter 'name' should allow people to pass in a name for that new column. Is that possible and how can this be achieved? Example: myData <- data.frame(c(1,2,3)) add.col(myData, c(5,6,7,8), 'more stuff') adds a new column named 'more stuff' to the dataframe myData. Any ideas? Best, Ralf
[R] Strange behavior when plotting with ggplot2 and lattice
Hi all, I have the following script, which won't plot (tried in RGUI and also in Eclipse StatET): library(ggplot2)# for plotting results userids <- c(1,2,3) for (userid in userids){ qplot(c(1:10), c(1:20)) } print ("end") No plot shows up. If I run the following: library(ggplot2)# for plotting results userids <- c(1,2,3) for (userid in userids){ blabla))) qplot(c(1:10), c(1:20)) } print ("end") which contains a syntax mistake in line 4, then I get the plot output on the screen. I have the same issue when using lattice, but its ok when using standard graphics plot. WHAT IS GOING ON??? Ralf
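This is the classic R FAQ 7.22 situation: ggplot2 and lattice calls return plot objects that are only drawn when printed, and auto-printing is disabled inside for loops and functions (base graphics draw as a side effect, which is why plot() works). Wrapping the call in print() fixes it:

```r
library(ggplot2)

userids <- c(1, 2, 3)
for (userid in userids) {
  # explicit print() forces the grid-based plot to render inside the loop
  print(qplot(1:10, (1:10) * userid))
}
```

The syntax-error version "worked" only because the parse failure dumped the loop body to the top level, where auto-printing is active again.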
[R] ggplot2: qplot won't work
I have a script running in the StatET Eclipse environment that executes the ggplot2 command qplot in a function: # Creates the plot createPlot <- function(){ print("Lets plot!") qplot(1:10, letters[1:10]) } When executing the qplot line directly, it works. When executing the script it does not open a window and it it does not plot. Is there something important I have forgotten? I know that the function is called because I always get my 'Lets plot' When using normal graphics plot functions, its also works seamless. Of course I am importing the library ggplot2 at the beginning of my script - here is the import log: library(gdata) # for trim function gdata: Unable to locate valid perl interpreter gdata: gdata: read.xls() will be unable to read Excel XLS and XLSX files gdata: unless the 'perl=' argument is used to specify the location of a gdata: valid perl intrpreter. gdata: gdata: (To avoid display of this message in the future, please ensure gdata: perl is installed and available on the executable search path.) gdata: Unable to load perl libaries needed by read.xls() gdata: to support 'XLX' (Excel 97-2004) files. gdata: Unable to load perl libaries needed by read.xls() gdata: to support 'XLSX' (Excel 2007+) files. gdata: Run the function 'installXLSXsupport()' gdata: to automatically download and install the perl gdata: libaries needed to support Excel XLS and XLSX formats. Attaching package: 'gdata' The following object(s) are masked from package:utils : object.size Warning message: package 'gdata' was built under R version 2.10.1 > library(TTR) # for moving averages (SMA,...) 
smoothing Loading required package: xts Loading required package: zoo Warning messages: 1: package 'TTR' was built under R version 2.10.1 2: package 'xts' was built under R version 2.10.1 3: package 'zoo' was built under R version 2.10.1 > library(ggplot2) # for plotting results Loading required package: proto Loading required package: grid Loading required package: reshape Loading required package: plyr Loading required package: digest Attaching package: 'ggplot2' The following object(s) are masked from package:gdata : interleave Warning messages: 1: package 'ggplot2' was built under R version 2.10.1 2: package 'proto' was built under R version 2.10.1 3: package 'reshape' was built under R version 2.10.1 4: package 'plyr' was built under R version 2.10.1 5: package 'digest' was built under R version 2.10.1 What is wrong here? Ralf
[R] Smoothing Techniques - short stepwise functions with spikes
R Friends, I have data from which I would like to learn a more general (smoothed) trend by applying data smoothing methods. Data points follow a positive stepwise function. [garbled ASCII sketch: an increasing step function with one or two isolated spike points above the steps] Data points from each step should not be interacting with any other step. The outliers I want to remove are spikes as shown in the diagram. These spikes do not have more than one or two points. I consider larger groups as relevant and want to keep them in. I sometimes have less than 5 points for each step, and up to 50 at max. Given these conditions would you suggest using one of the moving averages (e.g. SMA, EMA, DEMA, ...) or the locally linear regression (lowess) method. Are there any other options? Does anybody know a good site that overviews all methods without going too much into mathematical details but rather focusing on the requirements and underlying assumptions of each method? Is there perhaps even a package that runs and visualizes a comparison on the data similar to packages like 'party'? (with 1000s of active packages, one can always hope for that) Thanks in advance! Ralf
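Given that the spikes are only one or two points wide and the step edges should stay sharp, a running median is often a better fit than any moving average (averages smear step transitions and get pulled toward spikes); base R's runmed() is enough for a sketch:

```r
# Synthetic step function with two one-point spikes
x <- c(rep(1, 10), rep(3, 10), rep(6, 10))
x[c(5, 22)] <- 20

sm <- runmed(x, k = 5)  # running median, window 5: removes 1-2 point spikes
                        # while leaving flat steps and their edges intact

plot(x, type = "b")
lines(sm, col = "red", lwd = 2)
```

A window of k = 5 (k must be odd) deletes any outlier group of up to two points; with steps as short as 5 points, larger windows would start eroding the steps themselves.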
[R] Corrupt R installation?
I installed the lattice package, and got an error that R was not able to remove the previous version of lattice. Now my installation seems to be corrupt, even affecting other packages. I am getting this error when loading TTR: > library(TTR) Loading required package: xts Loading required package: zoo Error in loadNamespace(i, c(lib.loc, .libPaths())) : there is no package called 'lattice' In addition: Warning messages: 1: package 'TTR' was built under R version 2.10.1 2: package 'xts' was built under R version 2.10.1 3: package 'zoo' was built under R version 2.10.1 Error: package 'zoo' could not be loaded My question now is, is there a way to manually remove lattice (or what's left of it)? Or do I have to go through the process of completely re-installing? What do you guys do to prevent such a situation - is there an easy way to secure an R installation? Ralf
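A sketch of the usual recovery path, assuming the library directory itself is otherwise intact (remove.packages() will error rather than break anything if the package is already fully gone):

```r
# Clear out whatever is left of the broken package, then reinstall it
remove.packages("lattice")
install.packages("lattice")

library(TTR)  # dependent packages (xts/zoo) should now load again
```

If remove.packages() cannot delete the remnants either (e.g. a DLL locked by another R session on Windows), closing all R sessions and deleting the lattice folder inside the directory reported by .libPaths() accomplishes the same thing.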
[R] Bug in DEMA (Moving Average smoothing algorithm)?
When running DEMA(data, 5) on a vector 'data' of length 5, my R engine stops. Is this function or the R environment facing a bug here or am I doing something wrong? DEMA should work if the smoothing window size is the same size as the data length, right? (I am working with Eclipse 3.5. and the StatET environment.) Ralf
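For what it's worth, DEMA is defined as 2*EMA(x, n) - EMA(EMA(x, n), n); the inner EMA only produces its first value at point n, and the outer EMA then needs another n values on top of that, so a series of exactly n points cannot yield any DEMA value at all — the failure is an input-length problem, not necessarily a bug. A hedged sketch assuming the TTR package:

```r
library(TTR)

set.seed(1)
x <- cumsum(rnorm(20))

DEMA(x, n = 5)  # leading values are NA; roughly the first 2*n - 2 points
                # are consumed before the first defined DEMA value appears

# DEMA(rnorm(5), n = 5) leaves no room for the second EMA pass:
# with length(x) == n there is at most one EMA value to smooth again.
```

With only 5 points per step, SMA with a smaller n, or a running median, is a more realistic choice than DEMA.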