[R] as.numeric in data.frame, but only where it is possible
Hi, I would like to convert the "character" sequences in my matrix/data.frame into "numeric" where it is possible, while retaining my alphabetic character strings in their original forms: "5.1" > 5.1, "hm" > "hm".

k <- matrix(c("aa", "bb", 1, 2, 4.3, 0), nrow=2)
mode(k) <- "numeric"
# ln1 coerces numeric values into "character" and
# ln2 replaces alphabetic strings with NA (with warnings)
# => OK, as a matrix can't have mixed types

k <- matrix(c("aa", "bb", 1, 2, 4.3, 0), nrow=2)
g <- as.data.frame(k, stringsAsFactors=FALSE)
g[,2]
# returns:
# [1] 1 2
# Levels: 1 2
# hm... let them be levels then...
sum(g[,2])                 # going down in flames
g[,2] <- as.numeric(g[,2])
sum(g[,2])                 # is the winner of the day

but I have a hunch that there must be a more trivial/general solution.
- I'm looking for something like as.data.frame(..., as.numeric.if.possible=TRUE), if possible.
- Mainly, I don't know in advance whether a column holds alphabetic or numeric characters.
- I have relatively large matrices (or a relatively slow PC) (max 100,000 x 100 "cells"), if that matters.

*** *** *** *** ***

Another related problem of mine is creating data.frames with mixed types (it should be possible, shouldn't it?).

d <- data.frame("b"=seq(1,3))
d <- cbind(a,b)
typeof(d)   # "d" was coerced into "character"

any help is greatly appreciated,
gabor

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
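[Editor's note: a general per-column approach, sketched with the small example matrix from the message above (not tested on the poster's real 100,000 x 100 data). Base R's type.convert() turns a character vector into numeric when every value parses, and with as.is = TRUE leaves it as character otherwise:

```r
# Rebuild the example data frame, keeping characters as characters
k <- matrix(c("aa", "bb", 1, 2, 4.3, 0), nrow = 2)
g <- as.data.frame(k, stringsAsFactors = FALSE)

# Convert each column to numeric only where every value parses;
# as.is = TRUE keeps non-numeric columns as plain character
g[] <- lapply(g, type.convert, as.is = TRUE)
sapply(g, class)   # first column stays "character", the rest become "numeric"
```

This avoids the factor detour entirely, so sum(g[,2]) works directly.]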
Re: [R] How to create a string containing '\/' to be used with SED?
What is the problem error message? I can say

> fred <- "blah1\\/blah2\\/blah3"

and then the string looks like...

> cat("#", fred, '#\n', sep='')
#blah1\/blah2\/blah3#

If you just ask R to print it then it looks like...

> fred
[1] "blah1\\/blah2\\/blah3"

When you're "playing" with strings and regular expressions, it's vital to understand the backslash quoting mechanism...

Best regards,
Sean

ikarus wrote:
> I still can't create a string with "\/" inside (e.g., a <- "..\\/path\\/file"
> doesn't work, R complains and the \\ are removed), ...
Re: [R] ggplot2 problem
Yes - that's it. thank you Steve
Re: [R] ggplot2 problem
aes() does not have an argument "Year", but it does have an argument "x", so try:

df <- data.frame(Year = rep(1:5,2))
m <- ggplot(df, aes(x=Year))
m + geom_bar()

(It works for me.)

Eric

steve wrote:
> I'm using ggplot2 2.0.8 and R 2.8.0
> df = data.frame(Year = rep(1:5,2))
> m = ggplot(df, aes(Year=Year))
> m + geom_bar()
> Error in get("calculate", env = ., inherits = TRUE)(., ...) :
>   attempt to apply non-function
Re: [R] base:::rbind
timeSeries is calling methods:::bind_activation. You can turn this off and on yourself with:

methods:::bind_activation(FALSE)
methods:::bind_activation(TRUE)

See ?cbind2

On Wed, Nov 26, 2008 at 9:00 PM, Fernando Saldanha <[EMAIL PROTECTED]> wrote:
> It seems that after loading the package timeSeries the function
> base:::rbind is replaced by methods:::rbind
>
>> identical(base:::rbind, methods:::rbind)
> [1] FALSE
>> rbb = base:::rbind
>> library(timeSeries)
> Loading required package: timeDate
>> identical(base:::rbind, methods:::rbind)
> [1] TRUE
>> identical(rbb, base:::rbind)
> [1] FALSE
>
> Is there a way to access the original function base:::rbind after
> loading timeSeries?
>
> I am running R 2.8.0 under Vista Home Premium.
>
> Thanks for any help.
>
> FS
[R] ggplot2 problem
I'm using ggplot2 2.0.8 and R 2.8.0

df = data.frame(Year = rep(1:5,2))
m = ggplot(df, aes(Year=Year))
m + geom_bar()

Error in get("calculate", env = ., inherits = TRUE)(., ...) :
  attempt to apply non-function
[R] base:::rbind
It seems that after loading the package timeSeries the function base:::rbind is replaced by methods:::rbind

> identical(base:::rbind, methods:::rbind)
[1] FALSE
> rbb = base:::rbind
> library(timeSeries)
Loading required package: timeDate
> identical(base:::rbind, methods:::rbind)
[1] TRUE
> identical(rbb, base:::rbind)
[1] FALSE

Is there a way to access the original function base:::rbind after loading timeSeries?

I am running R 2.8.0 under Vista Home Premium.

Thanks for any help.

FS
Re: [R] How to create a string containing '\/' to be used with SED?
Thanks Sean!! I followed your suggestion to use gsub() and it worked perfectly! I still can't create a string with "\/" inside (e.g., a <- "..\\/path\\/file" doesn't work, R complains and the \\ are removed), but I don't care: gsub() does the same job as sed, and without using any system call.

seanpor wrote:
> Good morning,
>
> You do not need to quote a forward slash / in R, but you do need to quote
> a backslash when you're inputting it... so to get a string which actually
> contains "blah\/blah" you need to use "blah\\/blah"
>
> http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-backslash-behave-strangely-inside-strings_003f
>
> Unless this is a very very big file you shouldn't need to go out to sed,
> as gsub() should work adequately... and probably quicker and cleaner. So
> something along the lines of... (UNTESTED!!! since I don't have a
> reproducible example)
>
> tmp1 <- readLines(configurationFile)
> tmp1 <- gsub("^instance .*", paste("instance = ", data$instancePath, "/",
>              data$newInstance, sep = ""), tmp1)
>
> I'm working on 50mb text files, doing all sorts of manipulations, and I
> do it all inside R under Windows XP... reading a 50mb text file across
> the 100mb network and doing a gsub() on most lines takes an elapsed 16
> seconds on this office desktop.
>
> hth...
>
> Regards,
> Sean
>
> ikarus wrote:
>> Hi guys,
>> I've been struggling to find a solution to the following issue:
>> I need to change strings in .ini files that are given in input to a
>> program whose output is processed by R. The strings to be changed look
>> like:
>> "instance = /home/TSPFiles/TSPLIB/berlin52.tsp"
>>
>> I normally use Sed for this kind of thing. So, inside R I'd like to
>> write something like:
>>
>> command <- paste("sed -i 's/^instance .*/instance = ", data$instancePath,
>>                  data$newInstance, "/' ", configurationFile, sep = "")
>> system(command)
>>
>> This will overwrite the line starting with "instance " using
>> "instance = the_new_instance".
>> In the example I gave, data$instancePath = /home/TSPFiles/TSPLIB/ and
>> data$newInstance = berlin52.tsp
>>
>> The problem is that I need to pass the above path string to sed in the
>> form:
>> "\/home\/TSPFiles\/TSPLIB\/"
>>
>> However, I couldn't find a way to create such a string in R. I tried
>> several different ways, but it always complains that '\/' is an
>> unrecognized escape!
>>
>> Any suggestion?
>>
>> Thanks!
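[Editor's note: if sed is wanted after all, the escaping problem disappears if a different s-command delimiter is chosen, since '/' only needs escaping because it doubles as the delimiter. An untested sketch, with hypothetical literals standing in for data$instancePath and data$newInstance:

```r
# Hypothetical stand-ins for the values from the thread
instancePath <- "/home/TSPFiles/TSPLIB/"
newInstance  <- "berlin52.tsp"

# Using '|' as the s-command delimiter means no '/' needs escaping at all
command <- paste("sed -i 's|^instance .*|instance = ",
                 instancePath, newInstance, "|' config.ini", sep = "")
cat(command, "\n")
# sed -i 's|^instance .*|instance = /home/TSPFiles/TSPLIB/berlin52.tsp|' config.ini
```

sed accepts any character as the delimiter after s, so no "\/" string ever has to be built in R.]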
[R] Business Data Sets
Can someone please help me, or point me in the right direction, to find some “Business” data sets. In the next couple of weeks I will be teaching a “small business” class/seminar, and my intention is to preach the importance of collecting data on business operations and of how to analyze that data. E.g., sales can be forecasted by calculating the correlations between x1, x2, and x3. It’s nothing complex; I’m just looking for some relatively straightforward data sets. So, are there any simple data sets within libraries and packages in R, or on other websites that you know of? Any help would be appreciated.
[R] Boundary value problem
Hi,

I want to solve an ordinary differential equation as a boundary value problem. I already did it as an initial value problem, but now I have constraints on variable values at more than one point. I searched with the key words "boundary value problem" and "shooting method", but couldn't find any help online. I am wondering if there is a package available out there to do this? As far as I know, it is not in either the package odesolve or deSolve.

thanks
karthik
Re: [R] R and SPSS
On Nov 26, 2008, at 7:57 PM, Andrew Choens wrote:
> It lets you access R from within SPSS. Best of both worlds.
> --
> Insert something humorous here. :-)

OK, I'll bite. "It lets you access R from within SPSS. Best of both worlds."
Re: [R] R and SPSS
On Wed, 2008-11-26 at 12:25 -0800, Applejus wrote:
> Hi,
>
> I have a code in R. Could anyone give me the best possible way (or just
> ways!) to integrate it in SPSS?
>
> Thanks!

You will need an SPSS registration, but go here and get the SPSS R plugin:

http://www.spss.com/devcentral/

It lets you access R from within SPSS. Best of both worlds.
--
Insert something humorous here. :-)
Re: [R] Reshape with var as fun.aggregate
Hi Jason,

The reason you are getting this error is that the latest version of reshape uses fun.aggregate(numeric(0)) to figure out what missing values should be filled in with. Unfortunately there was a brief period of R versions in which var(numeric(0)) returned an error rather than a missing value. You can work around the problem by manually specifying fill = NA.

Regards,
Hadley

On Wed, Nov 26, 2008 at 1:05 PM, <[EMAIL PROTECTED]> wrote:
> I used to be able to use variance for fun.aggregate, but since I
> upgraded from R 2.6 to 2.7 (Ubuntu hardy to intrepid), it no longer
> works.
>
>> library("reshape")
>> data("french_fries")
>> ffm <- melt(french_fries, id=1:4, na.rm = TRUE)
>> cast(ffm, rep ~ treatment, length)
>   rep   1   2   3
> 1   1 579 578 575
> 2   2 580 579 580
>> cast(ffm, rep ~ treatment, mean)
>   rep        1        2        3
> 1   1 3.207772 3.177336 3.024522
> 2   2 3.181207 3.115544 3.277759
>> cast(ffm, rep ~ treatment, var)
> Error in fun.aggregate(data$value[0]) : 'x' is empty
>
> Any ideas?

--
http://had.co.nz/
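[Editor's note: Hadley's workaround, spelled out as a runnable sketch against the example data from the thread (assumes the reshape package and its french_fries dataset are available):

```r
library(reshape)
data(french_fries)

# Melt as in the original post, dropping missing values
ffm <- melt(french_fries, id = 1:4, na.rm = TRUE)

# Supplying fill = NA sidesteps the call to var(numeric(0)),
# so the variance aggregation no longer errors
cast(ffm, rep ~ treatment, var, fill = NA)
```

The explicit fill value is only needed on R versions where var(numeric(0)) throws an error instead of returning NA.]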
[R] Drawing a tree in R
Hi,

I've been trying to make use of the dendrogram class to plot a tree. However, my attempts have failed so far because I'm not sure how to build up the nested list (where do I have to store the nodes?). The tree I want drawn is like this, in the form (node, left child, right child):

(root, (n1, 1, (n2, 2, 3)), 4)

If it is not easily possible to draw a tree with the help of the dendrogram class, I'm open to any other method to draw a tree in R!!

Thanks!
-Severin
Re: [R] memory limit
On Wed, Nov 26, 2008 at 1:16 PM, Stavros Macrakis <[EMAIL PROTECTED]> wrote:
> I routinely compute with a 2,500,000-row dataset with 16 columns, which
> takes 410MB of storage; my Windows box has 4GB, which avoids thrashing.
> As long as I'm careful not to compute and save multiple copies of the
> entire data frame (because 32-bit Windows R is limited to about 1.5GB
> address space total, including any intermediate results), R works
> impressively well and fast with this dataset for selections,
> calculations, cross-tabs, plotting, etc. For example, simple
> single-column statistics and cross-tabs take << 1 sec., summary of the
> whole thing takes 16 sec. A linear regression between two numeric
> columns takes < 20 sec. Plotting of all 2.5M points takes a while, but
> that is no surprise (and is usually pointless anyway). I have not tried
> to do any compute-intensive statistical calculations on the whole data
> set.
>
> The main (but minor) annoyance with it is that it takes about 90 secs
> to load into memory using R's native binary "save" format, so I tend to
> keep the process lying around rather than re-starting and re-loading
> for each analysis. Fortunately, garbage collection is very effective in
> reclaiming unused storage as long as I'm careful to remove unnecessary
> objects.

FYI, objects saved with save(..., compress=FALSE) are notably faster to read back.

/Henrik

> -s
>
> On Wed, Nov 26, 2008 at 7:42 AM, iwalters <[EMAIL PROTECTED]> wrote:
>> I'm currently working with very large datasets that consist of
>> 1,000,000+ rows. Is it at all possible to use R for datasets this
>> size, or should I rather consider C++/Java?
Re: [R] Installing packages based on the license
On 26/11/2008 4:43 PM, Ted Mendum wrote:
> Hello,
>
> I would like to have an automatic way to avoid installing packages that
> I cannot use due to license restrictions. For example, the conf.design
> package is limited to non-commercial use, and since I work for a
> for-profit business, I cannot use it. I found out about the license
> terms of conf.design by chance; I would like to avoid any possibility
> of a mistake in the future.
>
> Is there some clever combination of grep and install.packages that I
> could use to limit my downloads to, say, GPL-only packages?

In theory, you should be able to look at the License field of

available.packages(fields="License")

and limit yourself to those with acceptable licenses. However, I don't think CRAN reports the license there. After you have installed a set of packages, though, you can scan their licenses using this code:

installed.packages(fields="License")[,"License"]

Duncan Murdoch
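[Editor's note: building on Duncan's one-liner, a rough sketch that flags already-installed packages whose License field does not mention "GPL". This is a crude substring match, not a real license parser, so treat the output as a starting point for manual review:

```r
# License strings of all installed packages, named by package
lic <- installed.packages(fields = "License")[, "License"]

# Packages whose License field lacks the substring "GPL";
# regexpr() returns -1 where there is no match
lic[regexpr("GPL", lic) < 0]
```

Note that many GPL-compatible licenses (MIT, BSD, Artistic) would also be flagged by this filter, so absence of "GPL" does not by itself mean the package is restricted.]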
Re: [R] Estimates of coefficient variances and covariances from a multinomial logistic regression?
On Wed, 26 Nov 2008, Grant Gillis wrote:

> Hello and thanks in advance for any help,
>
> I am using the 'multinom' function from the nnet package to calculate a
> multinomial logistic regression. I would like to get a matrix of the
> estimated coefficient variances and covariances. Am I missing some easy
> way to extract these?

See ?vcov

Chuck

Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]
UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, San Diego 92093-0901
Re: [R] Installing packages based on the license
On Wed, 26 Nov 2008, Ted Mendum wrote:

> Hello,
>
> I would like to have an automatic way to avoid installing packages that
> I cannot use due to license restrictions. For example, the conf.design
> package is limited to non-commercial use, and since I work for a
> for-profit business, I cannot use it. I found out about the license
> terms of conf.design by chance; I would like to avoid any possibility
> of a mistake in the future.
>
> Is there some clever combination of grep and install.packages that I
> could use to limit my downloads to, say, GPL-only packages?

Something like:

txt <- readLines(url("http://cran.r-project.org/web/packages/ALS/index.html"))
aok <- regexpr("GPL", txt[grep("License", txt) + 1]) > 0

will work on CRAN as it is formatted now. Maybe roll it up in a function, say license.ok(), that returns the name of the package if aok, or prints the license and stops if not. Then

install.packages(license.ok(pkg.name))

should do it. Of course, this does not handle dependencies. For that you might want to parse the index.html with XML, follow the dependencies, and check them all out first.

HTH,
Chuck
[R] Estimates of coefficient variances and covariances from a multinomial logistic regression?
Hello and thanks in advance for any help,

I am using the 'multinom' function from the nnet package to calculate a multinomial logistic regression. I would like to get a matrix of the estimated coefficient variances and covariances. Am I missing some easy way to extract these?

Grant
[R] SVM
Hi All,

I fitted several classifiers in a two-class problem. I then used the package 'yaImpute' to apply my predictive models to ASCII grids and thereby generate probability maps. So far I have successfully used yaImpute to generate maps for Random Forests, classification trees, generalized linear models (GLMs) and generalized additive models (GAMs). But when I try to use it with two other classifiers, support vector machine (SVM) and linear discriminant analysis (LDA), I get error messages which I am not really sure how to interpret. Can you please take a few minutes of your time to help me understand them?

I used the following code to apply the fitted LDA and SVM models to grids:

AsciiGridPredict(lda.fit, xfiles=namelist, outfiles=as.character(outfile))
AsciiGridPredict(svm.fit, xfiles=namelist, outfiles=as.character(outfile))

I get the following error message for LDA:

Rows per dot: 1
Rows to do: 163
ToDo: ...
Done: .
First six lines of predicted data for map row: 2
  predict.class predict.posterior.0 predict.posterior.1 predict.LD1
1             -                   -                   -           -
2             -                   -                   -           -
3             -                   -                   -           -
4             -                   -                   -           -
5             -                   -                   -           -
6             -                   -                   -           -
Error in AsciiGridImpute(object, xfiles, outfiles, xtypes = xtypes, lon = lon, :
  predict is not present in the predicted data
In addition: Warning message:
In `[<-.factor`(`*tmp*`, ri, value = c(-, -, -, -, :
  invalid factor level, NAs generated

I get the following for SVM:

Rows per dot: 1
Rows to do: 163
ToDo: ...
Done: ...
Legend of levels in output grids:
  predict
1 0
2 1
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = c(-, -, -, ... :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, ri, value = c(-, -, -, ... :
  invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, ri, value = c(-, -, -, ... :
  invalid factor level, NAs generated
4: In `[<-.factor`(`*tmp*`, ri, value = c(-, -, -, ... :
  invalid factor level, NAs generated
5: In `[<-.factor`(`*tmp*`, ri, value = c(-, -, -, ... :
  invalid factor level, NAs generated

I hope my writing is clear and my questions make sense.
Re: [R] R and SPSS
Hello,

On Wed, Nov 26, 2008 at 9:25 PM, Applejus <[EMAIL PROTECTED]> wrote:
> I have a code in R. Could anyone give me the best possible way (or just
> ways!) to integrate it in SPSS?

I doubt you could do this, but at the very least, "provide commented, minimal, self-contained, reproducible code." It would help if you were more specific.

Liviu

--
Do you know how to read? http://www.alienetworks.com/srtest.cfm
Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
[R] Installing packages based on the license
Hello,

I would like to have an automatic way to avoid installing packages that I cannot use due to license restrictions. For example, the conf.design package is limited to non-commercial use, and since I work for a for-profit business, I cannot use it. I found out about the license terms of conf.design by chance; I would like to avoid any possibility of a mistake in the future.

Is there some clever combination of grep and install.packages that I could use to limit my downloads to, say, GPL-only packages?

Thanks,
Ted

Ted Mendum | Senior Scientist
Warner Babcock Institute for Green Chemistry
66 Cummings Park, Woburn, MA 01801
p: 781-937-9000  f: 781-937-9001
[EMAIL PROTECTED]
www.warnerbabcock.com
Re: [R] memory limit
I routinely compute with a 2,500,000-row dataset with 16 columns, which takes 410MB of storage; my Windows box has 4GB, which avoids thrashing. As long as I'm careful not to compute and save multiple copies of the entire data frame (because 32-bit Windows R is limited to about 1.5GB address space total, including any intermediate results), R works impressively well and fast with this dataset for selections, calculations, cross-tabs, plotting, etc. For example, simple single-column statistics and cross-tabs take << 1 sec., summary of the whole thing takes 16 sec. A linear regression between two numeric columns takes < 20 sec. Plotting of all 2.5M points takes a while, but that is no surprise (and is usually pointless anyway). I have not tried to do any compute-intensive statistical calculations on the whole data set.

The main (but minor) annoyance with it is that it takes about 90 secs to load into memory using R's native binary "save" format, so I tend to keep the process lying around rather than re-starting and re-loading for each analysis. Fortunately, garbage collection is very effective in reclaiming unused storage as long as I'm careful to remove unnecessary objects.

-s

On Wed, Nov 26, 2008 at 7:42 AM, iwalters <[EMAIL PROTECTED]> wrote:
> I'm currently working with very large datasets that consist of
> 1,000,000+ rows. Is it at all possible to use R for datasets this size,
> or should I rather consider C++/Java?
Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
You can do much better by doing the correlations as a matrix operation:

> system.time({
+   m1 <- scale(m1)
+   m2 <- scale(m2)
+   r <- crossprod(m1, m2)/100
+   df <- 100
+   tstat <- sqrt(df)*r/sqrt(1-r^2)
+   p <- pt(tstat, df)
+ })
   user  system elapsed
  0.025   0.004   0.028

There might be a factor of n/(n-1) missing somewhere, which would be fixable if you could bring yourself to care about it.

-thomas

On Wed, 26 Nov 2008, Jorge Ivan Velez wrote:

> Hi Daren,
>
> Here is another approach, a little bit faster, using your original
> matrices. My session info is at the end. I'm using a 2.4 GHz Core 2 Duo
> processor and 3 GB of RAM.
>
> # Data
> set.seed(123)
> m1 <- matrix(rnorm(10), ncol=100)
> m2 <- matrix(rnorm(10), ncol=100)
> colnames(m1) = paste('m1_', 1:100, sep="")
> colnames(m2) = paste('m2_', 1:100, sep="")
>
> # Combinations
> combs = expand.grid(colnames(m1), colnames(m2))
>
> # ---
> # Option 1
> # ---
> system.time(apply(combs, 1, function(x) cor.test(m1[,x[1]], m2[,x[2]])$p.value) -> pvalues1)
> #  user  system elapsed
> #  8.12    0.01    8.20
>
> # ---
> # Option 2
> # ---
> require(Hmisc)
> system.time(apply(combs, 1, function(x) rcorr(m1[,x[1]], m2[,x[2]])$P[2]) -> pvalues2)
> #  user  system elapsed
> #  7.00    0.00    7.02
>
> HTH,
> Jorge
>
> # - Session Info
> R version 2.8.0 Patched (2008-11-08 r46864), i386-pc-mingw32
> locale: LC_COLLATE=English_United States.1252; LC_CTYPE=English_United States.1252; LC_MONETARY=English_United States.1252; LC_NUMERIC=C; LC_TIME=English_United States.1252
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan <[EMAIL PROTECTED]> wrote:
>> My two matrices are roughly the sizes of m1 and m2. I tried using two
>> apply and cor.test to compute the correlation p.values. More than an
>> hour, and the code is still running. Please help to make it more
>> efficient.
>>
>> m1 <- matrix(rnorm(10), ncol=100)
>> m2 <- matrix(rnorm(1000), ncol=100)
>> cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value }) })

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle
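[Editor's note: Thomas's t-statistic trick can be sanity-checked against cor.test() on a single pair of vectors. A self-contained sketch with its own small data; note the usual degrees of freedom are n - 2 and the two-sided p-value uses both tails, which may account for the n/(n-1)-style discrepancy he mentions:

```r
set.seed(1)
n <- 50
x <- rnorm(n)
y <- rnorm(n)

# Pearson correlation, t statistic, and two-sided p-value by hand
r     <- cor(x, y)
tstat <- r * sqrt((n - 2) / (1 - r^2))
p     <- 2 * pt(-abs(tstat), df = n - 2)

# Compare with the packaged test
all.equal(p, cor.test(x, y)$p.value)   # should be TRUE
```

The same formula applied to the full correlation matrix r from crossprod() gives all pairwise p-values without any apply() loop.]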
Re: [R] Creating a vector based on lookup function
On Wed, 26 Nov 2008, PDXRugger wrote: I am still searching for a solution to what i think is a simple problem i am having with building a vector in a for loop. I have built a more understandable example so hopefully that will help..help you, help me, if you know what i mean. dev=400 #test location model TAZs to reference cands=c(101,105,109) #Create Object of length of cands candslength=length(cands) #TEST TAZ Vector CANDS=(100:110) #Test Dev Vector TAZDEVS=c(120,220,320,420,520,620,720,820,920,1020,1120) #Creates datframe of CANDS and DEVS TAZLIST=data.frame(CANDS,TAZDEVS) for(i in 1:candslength){ cand=cands[i] #Creates an object based on the value of "cands" in TAZLIST TAZDet=TAZLIST[TAZLIST$CANDS==cand[i],2] } What i would like to see is TAZDet filled with the "cands" corresponding values found in "TAZLIST"'s second column TAZDEVS, they would be 120,520,920. No. 101 is the second element of CANDS 220 is the second element of TAZDEVS So, 120 is not 'corresponding' in any sense I know of. et cetera. So if things worked the way i would like them TAZDet would be a vector containing these three values (102,520,920). At this point it gives me NAs. Which happens because you have an obvious bug. What is TAZLIST$CANDS==cand[3] ?? Any help would be desirable, Thanks guys and gals. Is this what you seek? with(TAZLIST,TAZDEVS[match(cands,CANDS)]) HTH, Chuck Cheers, JR -- View this message in context: http://www.nabble.com/Creating-a-vector-based-on-lookup-function-tp20706588p20706588.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. 
Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
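A quick check of Chuck's match() suggestion against the example data from the post (a minimal sketch, rebuilding only the objects needed):

```r
cands   <- c(101, 105, 109)
CANDS   <- 100:110
TAZDEVS <- c(120, 220, 320, 420, 520, 620, 720, 820, 920, 1020, 1120)
TAZLIST <- data.frame(CANDS, TAZDEVS)

# match() returns the positions of cands within CANDS (2, 6, 10),
# which then index TAZDEVS directly -- no loop required
TAZDet <- with(TAZLIST, TAZDEVS[match(cands, CANDS)])
TAZDet  # 220 620 1020
```

As Chuck points out, the values that actually correspond are 220, 620, 1020 -- not the 120, 520, 920 the original post expected.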
[R] R and SPSS
Hi, I have a code in R. Could anyone give me the best possible way (or just ways!) to integrate it in SPSS? Thanks! -- View this message in context: http://www.nabble.com/R-and-SPSS-tp20708367p20708367.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SPSSyntax function
Dear all, I've created a function called "spssyntax" that creates a syntax for variable labels and value labels in SPSS (in the *sps* syntax format that was recently discussed), using a list of such labels in R. The entry list looks like this (between the ### signs): ### labels <- list() labels$varlab$id <- "ID" labels$varlab$v1 <- "A label for the first variable" labels$vallab$v1 <- c( "Very low"=1, "Low"=2, "Middle"=3, "High"=4, "Very high"=5, "Not Applicable"=-1, "Not Answered"=-2) ### And the result syntax file looks like this (again, between the ### signs): ### VARIABLE LABELS id "ID" v1 "A label for the first variable" . VALUE LABELS v1 1 "Very low" 2 "Low" 3 "Middle" 4 "High" 5 "Very high" -1 "Not Applicable" -2 "Not Answered" / . MISSING VALUES v1 (-1, -2) . ### Should there be any package interested in adopting this function (maybe foreign?), I'd be happy to contribute it. Best regards, Adrian __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
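Adrian's code isn't attached, but the mapping from such a labels list to the syntax shown can be sketched roughly as follows (`spssyntax_sketch` is a made-up name, and its output format only approximates the one above -- in particular it omits the MISSING VALUES block):

```r
spssyntax_sketch <- function(labels, file = "") {
  out <- "VARIABLE LABELS"
  # one quoted label line per variable
  for (v in names(labels$varlab))
    out <- c(out, sprintf(' %s "%s"', v, labels$varlab[[v]]))
  out <- c(out, " .")
  # one VALUE LABELS block per variable with value labels
  for (v in names(labels$vallab)) {
    vl  <- labels$vallab[[v]]
    out <- c(out, sprintf("VALUE LABELS %s", v),
             sprintf(' %g "%s"', vl, names(vl)), " / .")
  }
  cat(out, sep = "\n", file = file)
}

labels <- list()
labels$varlab$id <- "ID"
labels$varlab$v1 <- "A label for the first variable"
labels$vallab$v1 <- c("Very low" = 1, "Low" = 2, "Middle" = 3,
                      "High" = 4, "Very high" = 5,
                      "Not Applicable" = -1, "Not Answered" = -2)
spssyntax_sketch(labels)
```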
Re: [R] Smoothed 3D plots
Hi Jorge, >> In general terms, I would like to plot a smoothed surface with h4 and h11 >> in >> the x and y axis, respectively, and y in the z axis. I think that a >> smoothing procedure should work but I don't know how to get that work in >> R. There are quite a few options. If I were you I would have a look at package locfit: plot(locfit(y ~ lp(h4, h11), data=res), type="persp", theta=-50, phi=25, tick="detailed") See, amongst other options, ?locfit.raw Regards, Mark. PS: Don't worry about the warning messages. They refer to switches that should be set for panel functions. Jorge Ivan Velez wrote: > > DeaR list, > > I'm trying to represent some information via 3D plots. My data and session > info are at the end of this message. So far, I have tried scatterplot3d > (scatterplot3d), > persp3d (rgl), persp (graphics) and scatter3d (Rmcdr) but any of them gave > me what I'd like to have as final result (please see [1] for a similar 3D > plot changing > PF by ypred, pdn by h4 and pup by h11). > > In general terms, I would like to plot a smoothed surface with h4 and h11 > in > the x and y axis, respectively, and y in the z axis. I think that a > smoothing procedure should > work but I don't know how to get that work in R. > > I would really appreciate any help you can provide me. 
> > Thanks in advance, > > Jorge Velez > > [1] http://www.meyersanalytics.com/pfpuppdn.gif > > > # --- > # Data > # --- > res=structure(list(ypred = c(0.718230085350727, 0.654361814214854, > 0.644756374003864, 0.652006785905442, 0.695903817103918, > 0.812977870063322, > 0.639339808892218, 0.701034703527025, 0.740110233651808, > 0.664942946000872, > 0.866847509624424, 0.652502006511821, 0.650617983864292, > 0.713152860580267, > 0.717738908917513, 0.73869721161, 0.666120460155577, > 0.652242288736383, > 0.596880009623886, 0.671197684926038, 0.45, 0.820174620312253, > 0.665066954282449, 0.735164674202043, 0.696027875479989, > 0.646119215131913, > 0.644235192484384, 0.663550099786557), h4 = c(1, 0, 1, 0, 1, > 1, 1, 0, 0, 0, 1, 2, 2, 1, 0, 0, 0, 0, 2, 0, 2, 1, 1, 0, 0, 0, > 0, 1), h11 = c(0, 0, 1, 0, 1, 2, 1, 0, 1, 2, 2, 1, 1, 0, 2, 1, > 2, 0, 1, 2, 2, 2, 0, 1, 1, 0, 0, 0)), .Names = c("y", "h4", > "h11"), class = "data.frame", row.names = c(NA, -28L)) > > > # > # Some plots > # > > # Option 1 > require(Rcmdr) > with(res, > scatter3d(h4,ypred,h11, xlab='Chr4',zlab='Chr11', > ylab='Ratio',fit=c("linear","quadratic")) > ) > > # Option 2 > require(scatterplot3d) > with(res, > scatterplot3d(h4,ypred,h11, xlab='Chr4',zlab='Chr11',ylab='Ratio') > ) > > > # - > # Session info > # - > R version 2.8.0 Patched (2008-11-08 r46864) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] tcltk stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] rgl_0.81 mgcv_1.4-1 Rcmdr_1.4-4 > car_1.2-9 > > [5] scatterplot3d_0.3-27 lattice_0.17-15 > > loaded via a namespace (and not attached): > [1] grid_2.8.0 > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > 
http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- View this message in context: http://www.nabble.com/Smoothed-3D-plots-tp20706730p20708262.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Request for Assistance in R with NonMem
Hi, I am having some problems running a covariate analysis with my colleague using R with the NonMem program we are using for a graduate school project. R and NonMem run fine without adding in the covariates, but the program is giving us a problem when the covariate analysis is added. We think the problem is with the R code to run the covariate data analysis. We have the control stream, R code (along with the error), and data attached. If anyone can help, we would really appreciate it. Thank you very much. Michael R Code: # load MItools from within R library(MItools) ProjectDir <- "~/EssentialsOpenCourseware/continuous_PKPD/Examples" NMcom <- "nm6osxg77big.pl" cov <- c("AGE", "BW") # run NONMEM using 3032.ctl NONR(ProjectDir=ProjectDir, NMcom=NMcom, dvname="Test (mcg/L)", diag=1, covplt=1, b=3032, boot=0, nochecksum=TRUE, grp=c("FLAG") ) ERROR MESSAGE: ld warning: for symbol _cm41_ tentative definition of size 32 from /usr/local/nm6osxg77big/nm/nonmem.a(INITL.o) is smaller than the real definition of size 28 from /usr/local/nm6osxg77big/nm/BLKDAT.o /usr/local/nm6osxg77big/test/nm6osxg77big.pl is starting /Users/mpdubb/EssentialsOpenCourseware/continuous_PKPD/Examples/3033/nonmem.exe /usr/local/nm6osxg77big/test/nm6osxg77big.pl is complete No TAD item for run 3033. Error in as.vector(x, mode) : invalid 'mode' argument Run 3033 complete. NONR complete. 
CONTROL STREAM (in NONMEM): $PROB RUN# 3033 EFFECT COMP POPULATION PK-PD MODEL 2 $INPUT C REPL ID ARM TIME AGE BW GNDR DV MDV FLAG AMT EVID ;FLAG 1 = PK, FLAG 2=PD, ;CMT 1=GUT, 2=CENTRAL ;DV =PK ORDERED BY ID, TIME $DATA ../projectRSD1.csv IGNORE=C $SUBROUTINE ADVAN2 TRANS2 INFN=../MItoolsRunlogNM6.for $ABB COMRES=8 $PK FLA2=FLAG TVKA=THETA(1) TVCL=THETA(2) TVV=THETA(3) CL=TVCL*EXP(ETA(1)) V=TVV*EXP(ETA(2)) KA=TVKA*EXP(ETA(3)) S2=V $ERROR EMAX=THETA(4)*EXP(ETA(4)) EC50=THETA(5)*EXP(ETA(5)) N = THETA(6)*EXP(ETA(6)) EFF= (EMAX*F**N)/(EC50**N+F**N) + ERR(2) CONC= F + F*ERR(1) Y = CONC*(2-FLAG) + EFF*(FLAG-1) IPRED=Y " LAST " COM(1)=G(1,1) " COM(2)=G(2,1) " COM(3)=G(3,1) " COM(4)=G(4,1) " COM(5)=G(5,1) " COM(6)=G(6,1) " COM(7)=HH(1,1) " COM(8)=HH(2,1) $THETA (0, 1.7) ;KA (0, 4.7) ;CL (0, 38) ;V (0.0, .5) ; EMAX (0.0, 100) ; EC50 (0.0, 1, 10); N $OMEGA 0.04 0.04 0.04 0.04 0.04 0.04 $SIGMA 0.04 5 $ESTIMATION MAXEVAL= NOABORT METHOD=0 POSTHOC PRINT=5 $COVARIANCE $TABLE ID TIME AGE BW GNDR TVKA TVV TVCL ETA1 ETA2 ETA3 EVID NOPRINT ONEHEADER FILE=TABLE.PAR $TABLE NOPRINT FILE=../3033.TAB ONEHEADER ID TIME AGE BW GNDR KA V CL TVKA TVCL TVV FLAG FLA2 EVID EFF CONC IPRED FLAG $TABLE NOPRINT ONEHEADER FILE=../3033par.TAB ID TIME AGE BW GNDR KA V CL TVKA TVCL TVV EMAX EC50 ETA1 ETA2 ETA3 EVID $TABLE ID TIME COM(1)=G11 COM(2)=G21 COM(3)=G31 COM(4)=G41 COM(5)=G51 COM(6)=G61 COM(7)=H11 COM(8)=H21 IPRED MDV NOPRINT ONEHEADER FILE=cwtab1.deriv __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Creating a vector based on lookup function
I am still searching for a solution to what i think is a simple problem i am having with building a vector in a for loop. I have built a more understandable example so hopefully that will help..help you, help me, if you know what i mean. dev=400 #test location model TAZs to reference cands=c(101,105,109) #Create Object of length of cands candslength=length(cands) #TEST TAZ Vector CANDS=(100:110) #Test Dev Vector TAZDEVS=c(120,220,320,420,520,620,720,820,920,1020,1120) #Creates data frame of CANDS and DEVS TAZLIST=data.frame(CANDS,TAZDEVS) for(i in 1:candslength){ cand=cands[i] #Creates an object based on the value of "cands" in TAZLIST TAZDet=TAZLIST[TAZLIST$CANDS==cand[i],2] } What i would like to see is TAZDet filled with the "cands" corresponding values found in "TAZLIST"'s second column TAZDEVS, they would be 120,520,920. So if things worked the way i would like them TAZDet would be a vector containing these three values (120, 520, 920). At this point it gives me NAs. Any help would be desirable, Thanks guys and gals. Cheers, JR -- View this message in context: http://www.nabble.com/Creating-a-vector-based-on-lookup-function-tp20706588p20706588.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Reshape with var as fun.aggregate
I used to be able to use variance for fun.aggregate, but since I upgraded from R 2.6 to 2.7 (Ubuntu hardy to intrepid), it no longer works. > library("reshape") > data("french_fries") > ffm <- melt(french_fries, id=1:4, na.rm = TRUE) > cast(ffm, rep ~ treatment, length) rep 1 2 3 1 1 579 578 575 2 2 580 579 580 > cast(ffm, rep ~ treatment, mean) rep 1 2 3 1 1 3.207772 3.177336 3.024522 2 2 3.181207 3.115544 3.277759 > cast(ffm, rep ~ treatment, var) Error in fun.aggregate(data$value[0]) : 'x' is empty Any ideas? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
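The error comes from cast() probing fun.aggregate with a zero-length vector (the data$value[0] in the message), which var() refuses. A possible workaround until the package catches up is a small guard wrapper (`safe_var` is a name made up here):

```r
# var() errors on the empty probe vector; hand back NA for it instead
safe_var <- function(x) if (length(x) > 1) var(x) else NA
safe_var(numeric(0))  # NA rather than an error
safe_var(c(1, 2, 3))  # 1
# then, with ffm built as above: cast(ffm, rep ~ treatment, safe_var)
```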
Re: [R] Chi-Square Test Disagreement
> Next time the launch of an incoming nuclear strike is detected, > set them to work as follows (following Karl Pearson's historical > precedent): > > "Anti-aircraft guns all day long": Computing for the > Ministry of Munitions > JUNE BARROW GREEN (Open University) > From January 1917 until March 1918 Pearson and his > staff of mathematicians and human computers at the > Drapers Biometric Laboratory worked tirelessly on > the computing of ballistic charts, high-angle range > tables and fuze-scales for AV Hill of the Anti-Aircraft > Experimental Section. Things did not always go smoothly > -- Pearson did not take kindly to the calculations of > his staff being questioned -- and Hill sometimes had > to work hard to keep the peace. > > If you have enough of them (and Pearson undoubtedly did, so you > can quote that in your requisition request), then you might just > get the answer in time! > > [ The above excerpted from http://tinyurl.com/6byoub ] > > Good luck! > Ted. > That is absolutely classic. -- Insert something humorous here. :-) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] weighted ftable
I didn't know that. That is exactly what I need. On Tue, 2008-11-25 at 13:00 -0800, Thomas Lumley wrote: > On Mon, 24 Nov 2008, Andrew Choens wrote: > > > I need to do some fairly deep tables, and ftable() offers most of what I > > need, except for the weighting. With smaller samples, I've just used > > replicate to let me have a weighted data set, but with this data set, > > I'm afraid replicate is going to make my data set too big for R to > > handle comfortably. > > > > That being said, is there some way to weight my table (similar to > > wtd.table) but offer the nuanced control and depth of ftable? > > xtabs() will take a weight as the left-hand side of the formula, and its > output can then be processed by ftable(). > > -thomas > > Thomas Lumley Assoc. Professor, Biostatistics > [EMAIL PROTECTED] University of Washington, Seattle -- Insert something humorous here. :-) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
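A small self-contained sketch of what Thomas describes (the data and variable names here are invented for illustration):

```r
d <- data.frame(
  region = c("N", "N", "S", "S", "N", "S"),
  sex    = c("F", "M", "F", "M", "F", "M"),
  smoker = c("no", "no", "yes", "yes", "yes", "no"),
  w      = c(10, 5, 8, 7, 2, 4)   # survey weights
)
# a weight vector on the left-hand side of the formula gives
# weighted counts instead of raw counts ...
tab <- xtabs(w ~ region + sex + smoker, data = d)
# ... and the result flattens with ftable() like any other table
ftable(tab)
```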
[R] Smoothed 3D plots
DeaR list, I'm trying to represent some information via 3D plots. My data and session info are at the end of this message. So far, I have tried scatterplot3d (scatterplot3d), persp3d (rgl), persp (graphics) and scatter3d (Rmcdr) but any of them gave me what I'd like to have as final result (please see [1] for a similar 3D plot changing PF by ypred, pdn by h4 and pup by h11). In general terms, I would like to plot a smoothed surface with h4 and h11 in the x and y axis, respectively, and y in the z axis. I think that a smoothing procedure should work but I don't know how to get that work in R. I would really appreciate any help you can provide me. Thanks in advance, Jorge Velez [1] http://www.meyersanalytics.com/pfpuppdn.gif # --- # Data # --- res=structure(list(ypred = c(0.718230085350727, 0.654361814214854, 0.644756374003864, 0.652006785905442, 0.695903817103918, 0.812977870063322, 0.639339808892218, 0.701034703527025, 0.740110233651808, 0.664942946000872, 0.866847509624424, 0.652502006511821, 0.650617983864292, 0.713152860580267, 0.717738908917513, 0.73869721161, 0.666120460155577, 0.652242288736383, 0.596880009623886, 0.671197684926038, 0.45, 0.820174620312253, 0.665066954282449, 0.735164674202043, 0.696027875479989, 0.646119215131913, 0.644235192484384, 0.663550099786557), h4 = c(1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 2, 2, 1, 0, 0, 0, 0, 2, 0, 2, 1, 1, 0, 0, 0, 0, 1), h11 = c(0, 0, 1, 0, 1, 2, 1, 0, 1, 2, 2, 1, 1, 0, 2, 1, 2, 0, 1, 2, 2, 2, 0, 1, 1, 0, 0, 0)), .Names = c("y", "h4", "h11"), class = "data.frame", row.names = c(NA, -28L)) # # Some plots # # Option 1 require(Rcmdr) with(res, scatter3d(h4,ypred,h11, xlab='Chr4',zlab='Chr11', ylab='Ratio',fit=c("linear","quadratic")) ) # Option 2 require(scatterplot3d) with(res, scatterplot3d(h4,ypred,h11, xlab='Chr4',zlab='Chr11',ylab='Ratio') ) # - # Session info # - R version 2.8.0 Patched (2008-11-08 r46864) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] tcltk stats graphics grDevices utils datasets methods base other attached packages: [1] rgl_0.81 mgcv_1.4-1 Rcmdr_1.4-4 car_1.2-9 [5] scatterplot3d_0.3-27 lattice_0.17-15 loaded via a namespace (and not attached): [1] grid_2.8.0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error in sqlCopy in RODBC
I tried your suggestion... library(RODBC) channel = odbcConnectAccess("db.mdb") sqlCopy(channel,"Select * from tab","newtab",destchannel=channel, safer=TRUE,append=TRUE,rownames=FALSE,fast=FALSE) odbcClose(channel) however, I am still running into errors, both when appending to an existing table, or creating a table if the destination table does not exist. The code I am using is: query <- " select * from tblHistorical where MyDate between '2008-11-21' and '2008-11-25' ; " sqlCopy(RemoteChannel, query, "NewTable", destchannel=LocalChannel, safer=TRUE, append=TRUE, rownames=FALSE, fast=FALSE) If I run this when "NewTable" does not yet exist in the database, it returns the error: ERROR: Could not SQLExecDirect 42000 102 [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '16'. And when this runs when "NewTable" is already defined with correct data types, I get the error: Error in sqlSave(destchannel, dataset, destination, verbose = verbose, : unable to append to table ‘NewTable’ It seems I can't get it to work in either scenario. Any insight anyone could provide would be greatly appreciated. Dieter Menne wrote: > > > > BKMooney wrote: >> >> I am trying to copy portions of tables from one SQL database to another, >> using sqlCopy in the RODBC package. >> >> ... >> I am currently getting an error: >> Error in sqlSave(destchannel, dataset, destination, verbose = verbose, : >> table 'LocalTable' already exists >> > > I can reproduce your error with my example file and fast=TRUE. You might > try fast=FALSE > and append=TRUE when things like this happens. The following works for me > > library(RODBC) > channel = odbcConnectAccess("db.mdb") > sqlCopy(channel,"Select * from tab","newtab",destchannel=channel, > safer=TRUE,append=TRUE,rownames=FALSE,fast=FALSE) > odbcClose(channel) > > > > -- View this message in context: http://www.nabble.com/Error-in-sqlCopy-in-RODBC-tp20691929p20706667.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Chi-Square Test Disagreement
On 26-Nov-08 17:57:52, Andrew Choens wrote: > [...] > And, since I do work for government, if I ask for a roomful of > calculators, I might just get them. And really, what am I going > to do with a roomful of calculators? > > --andy > Insert something humorous here. :-) Next time the launch of an incoming nuclear strike is detected, set them to work as follows (following Karl Pearson's historical precedent): "Anti-aircraft guns all day long": Computing for the Ministry of Munitions JUNE BARROW GREEN (Open University) From January 1917 until March 1918 Pearson and his staff of mathematicians and human computers at the Drapers Biometric Laboratory worked tirelessly on the computing of ballistic charts, high-angle range tables and fuze-scales for AV Hill of the Anti-Aircraft Experimental Section. Things did not always go smoothly -- Pearson did not take kindly to the calculations of his staff being questioned -- and Hill sometimes had to work hard to keep the peace. If you have enough of them (and Pearson undoubtedly did, so you can quote that in your requisition request), then you might just get the answer in time! [ The above excerpted from http://tinyurl.com/6byoub ] Good luck! Ted. E-Mail: (Ted Harding) <[EMAIL PROTECTED]> Fax-to-email: +44 (0)870 094 0861 Date: 26-Nov-08 Time: 18:35:25 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] S4 slot containing either aov or NULL
Dear Thomas, take a look at setOldClass ... ## register "aov" (an S3-class) as a formally defined class setOldClass("aov") ## "list" and some others are special cases; cf. class ? list ## now your code should work setClassUnion("aovOrNULL", c("aov", "NULL")) setClass("c1", representation(value = "aovOrNULL")) y1 <- new("c1", value = NULL) ## assigning an aov object to the slot now works too utils::data(npk, package="MASS") npk.aov <- aov(yield ~ block + N*P*K, npk) y2 <- new("c1", value = npk.aov ) Best, Matthias Thomas Kaliwe wrote: Dear listmembers, I would like to define a class with a slot that takes either an object of class aov or NULL. I have been reading "S4 Classes in 15 pages more or less" and "Lecture: S4 classes and methods" #First I tried with list and NULL setClass("listOrNULL") setIs("list", "listOrNULL") setIs("NULL", "listOrNULL") #doesn't work #defining a union class it works with list and null setClassUnion("listOrNULL", c("list", "NULL")) setClass("c1", representation(value = "listOrNULL")) y1 = new("c1", value = NULL) y2 = new("c1", value = list(a = 10)) #but it won't work with aov or null setClassUnion("aovOrNULL", c("aov", "NULL")) setClass("c1", representation(value = "aovOrNULL")) y1 = new("c1", value = NULL) #trying to assign an aov object to the slot doesn't work utils::data(npk, package="MASS") npk.aov <- aov(yield ~ block + N*P*K, npk) y2 = new("c1", value = npk.aov ) Any ideas? Thank you Thomas Kaliwe -- Dr. Matthias Kohl www.stamats.de __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with aovlmer.fnc in languageR
Dear R list, I have a recurring problem with the languageR package, specifically the aovlmer.fnc function. When I try to run the following code (from R. H. Baayen's textbook): # Example 1: library(languageR) latinsquare.lmer <- lmer(RT ~ SOA + (1 | Word) + (1 | Subject), data = latinsquare) x <- pvals.fnc(latinsquare.lmer, withMCMC = TRUE) aovlmer.fnc(latinsquare.lmer, mcmc = x$mcmc, which = c("SOAmedium", "SOAshort")) I get the following error message (German locale): Fehler in anova(object) : Calculated PWRSS for a LMM is negative Invoking traceback yields the following result: > traceback() 4: .Call(mer_update_projection, object) 3: anova(object) 2: anova(object) 1: aovlmer.fnc(latinsquare.lmer, mcmc = x$mcmc, which = c("SOAmedium", "SOAshort")) By contrast, the following code (without the aovlmer.fnc command) runs without error: # Example 2: library(languageR) latinsquare.lmer <- lmer(RT ~ SOA + (1 | Word) + (1 | Subject), data = latinsquare) pvals.fnc(latinsquare.lmer, withMCMC = TRUE) Similarly, the following code (without the pvals.fnc command, and consequently running aovlmer.fnc *without* MCMC sampling) also runs without error: # Example 3: library(languageR) latinsquare.lmer <- lmer(RT ~ SOA + (1 | Word) + (1 | Subject), data = latinsquare) aovlmer.fnc(latinsquare.lmer, noMCMC = TRUE) However, the following code results in exactly the same error message as given above, which puzzles me because up to and including the pvals.fnc command, the code is basically the same as in Example 2 (which runs fine); and the code in the final aovlmer.fnc command is the same as in Example 3 (which also runs fine) and, crucially, does not refer to the object created by pvals.fnc (named "x") at all: # Example 4: library(languageR) latinsquare.lmer <- lmer(RT ~ SOA + (1 | Word) + (1 | Subject), data = latinsquare) x <- pvals.fnc(latinsquare.lmer, withMCMC = TRUE) aovlmer.fnc(latinsquare.lmer, noMCMC = TRUE) Trying to run the examples given in the languageR documentation, no 
error occurs for me with the example in ?pvals.fnc; however, the same error as above occurs when running the example given in ?aovlmer.fnc. I am using Windows XP SP3, R version 2.8.0, and updated packages. The following is the result of sessionInfo: > sessionInfo() R version 2.8.0 (2008-10-20) i386-pc-mingw32 locale: LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] languageR_0.953lme4_0.999375-27 Matrix_0.999375-16 zipfR_0.6-5lattice_0.17-17 loaded via a namespace (and not attached): [1] grid_2.8.0 tools_2.8.0 To sum up, the error occurs when invoking aovlmer.fnc, but somehow seems to be caused by previous invocation of pvals.fnc. I have reproduced the error on several other computers (running on Windows XP SP2, Windows XP SP3, and Windows Vista) with fresh installations of R, so I do hope I have excluded any trivial blunders on my part. Any help is greatly appreciated. Thank you very much in advance! Mats Exter -- Mats Exter Institut für Linguistik Allgemeine Sprachwissenschaft Universität zu Köln 50923 Köln Germany __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Chi-Square Test Disagreement
On Thu, 2008-11-27 at 00:46 +0800, Berwin A Turlach wrote: > Chuck explained already the reason for this small difference. I just > take issue about it being an important difference. In my opinion, > this difference is not important at all. It would only be important > to people who are still sticking to arbitrary cut-off points that are > mainly due to historical coincidences and the lack of computing power > at those time in history. If somebody tells you that this difference > is important, ask him or her whether he or she will be willing to > finance you a room full of calculators (in the sense of Pearson's > time) and whether he or she wants you to do all your calculations and > analyses with these calculators in future. Alternatively, you could > ask the person whether he or she would like the anaesthetist during > his or her next operation to use chloroform given his or her nostalgic > penchant for out-dated rituals/methods. Yes, he did, and when I realized the source of my confusion I was appropriately chastised. I felt like a bit of a fool. Of course, I should try comparing apples to apples. Oranges are another thing entirely. As to the importance of the difference, I am of two minds. On the one hand I fully agree with you. It is an anachronistic approach. On the other hand we don't all have the pleasure of working in a math department where such subtleties are well understood. I work for a consulting firm that advises state and local governments (USA). I personally do try to expand my understanding of statistics and math (I do not have a degree in math), but my clients do not. When I'm working with someone from the government, it is sometimes easier to simply tell them that relationship x is significant at a certain level of certainty. Although I doubt they could really explain the details, they have some basic understanding of what I am talking about. Subtleties are sometimes lost on our public servants. 
And, since I do work for government, if I ask for a roomful of calculators, I might just get them. And really, what am I going to do with a roomful of calculators? --andy -- Insert something humorous here. :-) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Chi-Square Test Disagreement
G'day Andy, On Wed, 26 Nov 2008 14:51:50 + Andrew Choens <[EMAIL PROTECTED]> wrote: > I was asked by my boss to do an analysis on a large data set, and I am > trying to convince him to let me use R rather than SPSS. Very laudable of you. :) > This is the output from R: > > chisq.test(test29) > > Pearson's Chi-squared test > > data: test29 > X-squared = 9.593, df = 4, p-value = 0.04787 > > But, the same data in SPSS generates a p value of .051. It's a small > but important difference. Chuck explained already the reason for this small difference. I just take issue about it being an important difference. In my opinion, this difference is not important at all. It would only be important to people who are still sticking to arbitrary cut-off points that are mainly due to historical coincidences and the lack of computing power at those time in history. If somebody tells you that this difference is important, ask him or her whether he or she will be willing to finance you a room full of calculators (in the sense of Pearson's time) and whether he or she wants you to do all your calculations and analyses with these calculators in future. Alternatively, you could ask the person whether he or she would like the anaesthetist during his or her next operation to use chloroform given his or her nostalgic penchant for out-dated rituals/methods. > I played around and rescaled things, and tried different values for > B, but I never could get R to reach .051. Well, I have no problem when using simulated p-values to get something close to 0.051; look at the last try. The second one might also be noteworthy. Unfortunately, I didn't save the seed beforehand. 
> test29 <- matrix(c(110,358,71,312,29,139,31,77,13,32), byrow=TRUE, > ncol=2) test29 [,1] [,2] [1,] 110 358 [2,] 71 312 [3,] 29 139 [4,] 31 77 [5,] 13 32 > chisq.test(test29, simul=TRUE) Pearson's Chi-squared test with simulated p-value (based on 2000 replicates) data: test29 X-squared = 9.593, df = NA, p-value = 0.04798 > chisq.test(test29, simul=TRUE) Pearson's Chi-squared test with simulated p-value (based on 2000 replicates) data: test29 X-squared = 9.593, df = NA, p-value = 0.05697 > chisq.test(test29, simul=TRUE, B=2) Pearson's Chi-squared test with simulated p-value (based on 2 replicates) data: test29 X-squared = 9.593, df = NA, p-value = 0.0463 > chisq.test(test29, simul=TRUE, B=2) Pearson's Chi-squared test with simulated p-value (based on 2 replicates) data: test29 X-squared = 9.593, df = NA, p-value = 0.0499 > chisq.test(test29, simul=TRUE, B=2) Pearson's Chi-squared test with simulated p-value (based on 2 replicates) data: test29 X-squared = 9.593, df = NA, p-value = 0.0486 > chisq.test(test29, simul=TRUE, B=2) Pearson's Chi-squared test with simulated p-value (based on 2 replicates) data: test29 X-squared = 9.593, df = NA, p-value = 0.05125 Cheers, Berwin === Full address = Berwin A TurlachTel.: +65 6516 4416 (secr) Dept of Statistics and Applied Probability+65 6516 6650 (self) Faculty of Science FAX : +65 6872 3919 National University of Singapore 6 Science Drive 2, Blk S16, Level 7 e-mail: [EMAIL PROTECTED] Singapore 117546http://www.stat.nus.edu.sg/~statba __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
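Berwin's runs above can't be replayed exactly because no seed was saved beforehand; a seeded version of the same simulation looks like this (the seed and B are arbitrary choices):

```r
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)
set.seed(1)  # any fixed seed makes the Monte Carlo p-value reproducible
chisq.test(test29, simulate.p.value = TRUE, B = 10000)$p.value
```

With B this large, the run-to-run wobble shrinks well below the 0.048-versus-0.051 difference discussed in the thread.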
Re: [R] multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?
Charlie Brush wrote: Frank E Harrell Jr wrote: Charlie Brush wrote: I am doing multiple imputation with Hmisc, and can't figure out how to replace the NA values with the imputed values. Here's a general outline of the process: > set.seed(23) > library("mice") > library("Hmisc") > library("Design") > d <- read.table("DailyDataRaw_01.txt",header=T) > length(d);length(d[,1]) [1] 43 [1] 2666 So for this data set, there are 43 columns and 2666 rows Here is a piece of data.frame d: > d[1:20,4:6] P01 P02 P03 1 0.1 0.16 0.16 2 NA 0.00 0.00 3 NA 0.60 0.04 4 NA 0.15 0.00 5 NA 0.00 0.00 6 0.7 0.00 0.75 7 NA 0.00 0.00 8 NA 0.00 0.00 9 0.0 0.00 0.00 10 0.0 0.00 0.00 11 0.0 0.00 0.00 12 0.0 0.00 0.00 13 0.0 0.00 0.00 14 0.0 0.00 0.00 15 0.0 0.00 0.03 16 NA 0.00 0.00 17 NA 0.01 0.00 18 0.0 0.00 0.00 19 0.0 0.00 0.00 20 0.0 0.00 0.00 These are daily precipitation values at NCDC stations, and NA values at station P01 will be filled using multiple imputation and data from highly correlated stations P02 and P08: > f <- aregImpute(~ I(P01) + I(P02) + I(P08), n.impute=10,match='closest',data=d) Iteration 13 > fmi <- fit.mult.impute( P01 ~ P02 + P08 , ols, f, d) Variance Inflation Factors Due to Imputation: Intercept P02 P08 1.01 1.39 1.16 Rate of Missing Information: Intercept P02 P08 0.01 0.28 0.14 d.f. 
for t-distribution for Tests of Single Coefficients: Intercept P02 P08 242291.18116.05454.95 > r <- apply(f$imputed$P01,1,mean) > r 2 3 4 5 7 81617 249 250 251 0.002 0.430 0.044 0.002 0.002 0.002 0.002 0.123 0.002 0.002 0.002 252 253 254 255 256 257 258 259 260 261 262 1.033 0.529 1.264 0.611 0.002 0.513 0.085 0.002 0.705 0.840 0.719 263 264 265 266 267 268 269 270 271 272 273 1.489 0.532 0.150 0.134 0.002 0.002 0.002 0.002 0.002 0.055 0.135 274 275 276 277 278 279 280 281 282 283 284 0.009 0.002 0.002 0.002 0.008 0.454 1.676 1.462 0.071 0.002 1.029 285 286 287 288 289 418 419 420 421 422 700 0.055 0.384 0.947 0.002 0.002 0.008 0.759 0.066 0.009 0.002 0.002 -- So far, this is working great. Now, make a copy of d: > dnew <- d And then fill in the NA values in P01 with the values in r For example: > for (i in 1:length(r)){ dnew$P01[r[i,1]] <- r[i,2] } This doesn't work, because each 'piece' of r is two numbers: > r[1] 2 0.002 > r[1,1] Error in r[1, 1] : incorrect number of dimensions My question: how can I separate the the two items in (for example) r[1] to use the first part as an index and the second as a value, and then use them to replace the NA values with the imputed values? Or is there a better way to replace the NA values with the imputed values? Thanks in advance for any help. You didn't state your goal, and why fit.mult.impute does not do what you want. But you can look inside fit.mult.impute to see how it retrieves the imputed values. Also see the example in documentation for transcan in which the command impute(xt, imputation=1) to retrieve one of the multiple imputations. Note that you can say library(Design) (omit the quotes) to access both Design and Hmisc. Frank Thanks for your help. My goal is to replace the NA values in the (copy of the) data frame with the means of the imputed values (which are now in variable 'r'). fit.mult.impute works fine. 
I just can't figure out the last step, taking the results of fit.mult.impute (which are in variable 'r') and replacing the NA values in the (copy of the) data frame. A simple for loop doesn't work because the items in 'r' don't look like a normal vector, as for example r[1] returns 2 0.002 Is there a command to replace the NA values in the data frame with the means of the imputed values? Thanks, Charlie Don't do that, as this would no longer be multiple imputation. If you want single conditional mean imputation use transcan. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
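For the archive, the fill-in step the original poster was after needs no loop: `r` here is a named vector, and its names are the row numbers of the originally missing `P01` values, so the names can be used as indices directly. A sketch using the thread's own variable names (Frank's caveat that overwriting NAs with imputed means is no longer multiple imputation still stands):

```r
dnew <- d
# names(r) holds the row numbers of the NAs in P01 (as character strings)
dnew$P01[as.numeric(names(r))] <- r
# equivalent, without going through r at all:
# dnew$P01[is.na(d$P01)] <- rowMeans(f$imputed$P01)
```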
[R] Second y-axis
Hi list,

In the following code, how can I place the percentage label away from the numbers on the second y-axis (let's say everything should be inside the plot area)?

Thanks
Alireza

=
require(grid)
vp <- viewport(x=.1, y=.1, width=.6, height=.6, just=c("left","bottom"))
pushViewport(vp)
plotDATA=data.frame(
  Loss=c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10),
  Level=c("AvgAll","AvgAll","AvgAll","AvgAll","AvgAll","AvgAll","AvgAll",
          "AvgAll","AvgAll","AvgAll","AvgAll","AvgAll","GUL","GUL","GUL",
          "GUL","GUL","GUL","GUL","GUL"),
  Line=c(1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8))
library(lattice)
xyplot(Loss ~ Line, data=plotDATA, t="p",
  scales=list(relation="free",
              x=list(draw=TRUE, tick.number=12, labels=1:12)),
  par.settings = list(clip = list(panel = "off")))
p <- xyplot(Loss ~ Line, data=plotDATA, t="p",
  scales=list(relation="free", x=list(at = 1:12)),
  panel=function(x, y, subscripts, groups, ...){
    panel.xyplot(subset(plotDATA, Level=="AvgAll")$Line,
                 subset(plotDATA, Level=="AvgAll")$Loss,
                 col=Lloydscolour(colIncP), lwd=3, origin=0, ...)
    panel.axis(side = "right", at=unique(plotDATA$Loss),
               labels=unique(plotDATA$Loss)/max(plotDATA$Loss)*100,
               outside=FALSE, ticks=TRUE, half=FALSE)
    panel.axis(side = "right", at=median(plotDATA$Loss), labels="Percentage",
               outside=FALSE, ticks=FALSE, half=FALSE, rot=90)
    panel.axis(side = "right", at=c(4,8), labels=c(200,400),
               outside=TRUE, ticks=TRUE, half=FALSE)
    panel.barchart(subset(plotDATA, Level=="GUL")$Line,
                   subset(plotDATA, Level=="GUL")$Loss,
                   box.ratio=1, horizontal = FALSE, stack = TRUE,
                   reference = TRUE, col="blue", border="blue") #, origin=0)
  })
print(p, position = c(0.1, 0.1, 0.9, .9))
=

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] survreg and pweibull
Dear all - I have followed the thread, the replies to which were led by Thomas Lumley, about using pweibull to generate fitted survival curves for survreg models.

http://tolstoy.newcastle.edu.au/R/help/04/11/7766.html

Using the lung data set:

data(lung)
lung.wbs <- survreg(Surv(time, status) ~ 1, data=lung, dist='weibull')
curve(pweibull(x, scale=exp(coef(lung.wbs)), shape=1/lung.wbs$scale,
               lower.tail=FALSE), from=0, to=max(lung$time))
lines(survfit(Surv(time,status)~1, data=lung), col="red")

Assuming this is correct, why does the inflection point of this curve not match up to the exp(scale parameter)? Am I wrong in assuming that the scale represents the inflection, and the shape adjusts the shape around this point? I think I am perhaps confusing the scale and the median with the inflection point calculation? One can visualise the mismatch with:

abline(v=exp(coef(lung.wbs)), lty=2)
abline(h=0.5, lty=2)

Many thanks for the clarification

R version 2.8.0 (2008-10-20) i386-apple-darwin8.11.1
locale: en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] splines datasets utils stats graphics grDevices methods base
other attached packages:
[1] survival_2.34-1 Hmisc_3.4-3 lattice_0.17-15 MASS_7.2-44
loaded via a namespace (and not attached):
[1] cluster_1.11.11 grid_2.8.0 tools_2.8.0

Andrew
-
Dr. Andrew Beckerman
Department of Animal and Plant Sciences, University of Sheffield,
Alfred Denny Building, Western Bank, Sheffield S10 2TN, UK
ph +44 (0)114 222 0026; fx +44 (0)114 222 0002
http://www.beckslab.staff.shef.ac.uk/
http://www.flickr.com/photos/apbeckerman/
http://www.warblefly.co.uk

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
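On the question itself: for a Weibull survival curve, `exp(coef(lung.wbs))` is the scale parameter, and the scale is the time at which survival drops to exp(-1) (about 0.368) regardless of the shape; it is not the median, so the dashed lines above are not expected to cross on the curve. A short check, assuming the `lung.wbs` fit from the post:

```r
scale.w <- exp(coef(lung.wbs))
shape.w <- 1 / lung.wbs$scale
# survival at t = scale is exp(-1) ~ 0.368 for any shape:
pweibull(scale.w, shape = shape.w, scale = scale.w, lower.tail = FALSE)
# the curve crosses 0.5 at the median, scale * log(2)^(1/shape):
qweibull(0.5, shape = shape.w, scale = scale.w)
```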
[R] ts subscripting problem
hi, i am having trouble getting a particular time series to plot. this is what i have: > class(irradiance) [1] "ts" > irradiance[1:30] 197811 197812 197901 197902 197903 197904 197905 197906 1366.679 1366.729 1367.476 1367.739 1368.339 1367.883 1367.916 1367.055 197907 197908 197909 197910 197911 197912 198001 198002 1367.484 1366.887 1366.935 1367.034 1366.997 1367.310 1367.041 1366.459 198003 198004 198005 198006 198007 198008 198009 198010 1367.143 1366.553 1366.597 1366.854 1366.814 1366.901 1366.622 1366.669 198011 198012 198101 198102 198103 198104 1365.874 1366.098 1367.141 1366.239 1366.323 1366.388 > plot(irradiance[1:30]) > plot(irradiance) Error in dn[[2]] : subscript out of bounds so, if i plot a subset of the data it works fine. but if i try to plot the whole thing it breaks. the ts object was created using: irradiance = ts(tapply(d$number, f, mean), freq = 12, start = c(1978, 11)) and other ts objects that i have defined using basically the same approach work fine. any ideas greatly appreciated! cheers, andrew. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
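One hedged guess at the cause, without seeing the poster's data: `tapply()` returns a one-dimensional array, and `ts()` keeps that `dim` attribute, which the `ts` plotting code can trip over. Dropping the dimensions before building the series is a common fix:

```r
# as.vector() strips the dim/dimnames that tapply() attaches,
# so ts() sees a plain numeric vector
irradiance <- ts(as.vector(tapply(d$number, f, mean)),
                 frequency = 12, start = c(1978, 11))
plot(irradiance)
```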
[R] RES: memory limit
It depends on the number of variables. If you are using 2 or 3 variables you can do some things. I suggest you read about the ff and ASOR packages; they manage the dataset by doing some kind of I/O.

Regards,

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of iwalters
Sent: Wednesday, 26 November 2008 09:42
To: r-help@r-project.org
Subject: [R] memory limit

I'm currently working with very large datasets that consist out of 1,000,000 + rows. Is it at all possible to use R for datasets this size or should I rather consider C++/Java.

-- View this message in context: http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20699700.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] RES: S4 object
Take a look at the links you will find in this previous post to the list:

http://tolstoy.newcastle.edu.au/R/help/06/01/18259.html

I myself don't know anything about this subject, sorry, but you will probably find what you need there.

Best Wishes
Rodrigo.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Laurina Guerra
Sent: Wednesday, 26 November 2008 11:43
To: r-help@r-project.org
Subject: [R] S4 object

Hello, good day! Could someone explain to me, as simply as possible, what the S4 object class is and how it works? I would appreciate a reply as soon as possible. Thank you!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotting density for truncated distribution
thank you, both solutions are really helpful! -- View this message in context: http://www.nabble.com/plotting-density-for-truncated-distribution-tp20684995p20703469.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
Out of desperation, I made the following function -- which Hadley beat me to :P. Thanks everyone for the great help.

cor.p.values <- function(r, n) {
  df <- n - 2
  STATISTIC <- c(sqrt(df) * r / sqrt(1 - r^2))
  p <- pt(STATISTIC, df)
  return(2 * pmin(p, 1 - p))
}

> Date: Wed, 26 Nov 2008 09:33:59 -0600
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
> CC: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>
> On Wed, Nov 26, 2008 at 8:14 AM, jim holtman wrote:
>> Your time is being taken up in cor.test because you are calling it
>> 100,000 times. So grin and bear it with the amount of work you are
>> asking it to do.
>>
>> Here I am only calling it 100 times:
>>
>>> m1 <- matrix(rnorm(1), ncol=100)
>>> m2 <- matrix(rnorm(1), ncol=100)
>>> Rprof('/tempxx.txt')
>>> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
>>> function(y) { cor.test(x,y)$p.value }) }))
>>    user  system elapsed
>>    8.86    0.00    8.89
>>>
>>
>> so my guess is that calling it 100,000 times will take: 100,000 *
>> 0.0886 seconds or about 3 hours.
> > You can make it ~3 times faster by vectorising the testing: > > m1 <- matrix(rnorm(1), ncol=100) > m2 <- matrix(rnorm(1), ncol=100) > > system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, > function(y) { cor.test(x,y)$p.value })})) > > > system.time({ > r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })}) > > df <- nrow(m1) - 2 > t <- sqrt(df) * r / sqrt(1 - r ^ 2) > p <- pt(t, df) > p <- 2 * pmin(p, 1 - p) > }) > > > all.equal(cor.pvalues, p) > > > You can make cor much faster by stripping away all the error checking > code and calling the internal c function directly (suggested by the > Rprof output): > > > system.time({ > r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })}) > }) > > system.time({ > r2 <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { > .Internal(cor(x, y, 4L, FALSE)) })}) > }) > > 1.5s vs 0.2 s on my computer. Combining both changes gives me a ~25 > time speed up - I suspect you can do even better if you think about > what calculations are being duplicated in the computation of the > correlations. > > Hadley > > -- > http://had.co.nz/ _ [[elided Hotmail spam]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
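Building on the exchange above, the double `apply()` can be eliminated altogether: `cor()` called on two matrices returns the whole cross-correlation matrix in one pass, which the posted helper then converts to p-values. A sketch, assuming complete data (no NAs) and the rows of `m1`/`m2` as the variables being correlated:

```r
cor.p.values <- function(r, n) {
  df <- n - 2
  STATISTIC <- sqrt(df) * r / sqrt(1 - r^2)
  p <- pt(STATISTIC, df)
  2 * pmin(p, 1 - p)
}

r <- cor(t(m1), t(m2))              # nrow(m1) x nrow(m2) matrix of correlations
p <- cor.p.values(r, n = ncol(m1))  # matching matrix of two-sided p-values
```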
[R] odfWeave and XML... on a Mac
I'm trying out odfWeave in a Mac environment and getting some odd behavior. Having figured out that the code snippets only work if they're in certain fonts, I was able to get R to run a test document through and produce an output document. After running it, though, I get a warning message:

Warning message:
In file.remove("styles_2.xml") :
  cannot remove file 'styles_2.xml', reason 'No such file or directory'

This message is interesting given that about 20 lines earlier I see:

Renaming styles_2.xml to styles.xml

If I run the test doc in results=verbatim mode, I see that warning but my results appear in the appropriate places in the output document:

*output*
This is the basic text stuff. Now I will try to input the other stuff:
[1] 5
And this is the after-text.
*end*

If I run the test document in results=xml mode, though, the output is blank:

*output*
This is the basic text stuff. Now I will try to input the other stuff:
And this is the after-text.
*end*

Earlier posts on this forum suggest that the solution may involve loading an earlier build of XML. Is that likely to work? And if so - stupid question I'm sure, but how do I do that? Thanks in advance for the time and attention of people more experienced than myself...

-- View this message in context: http://www.nabble.com/odfWeave-and-XML...-on-a-Mac-tp20702670p20702670.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] S4 slot containing either aov or NULL
Dear listmembers,

I would like to define a class with a slot that takes either an object of class aov or NULL. I have been reading "S4 Classes in 15 pages more or less" and "Lecture: S4 classes and methods".

# First I tried with list and NULL
setClass("listOrNULL")
setIs("list", "listOrNULL")
setIs("NULL", "listOrNULL")  # doesn't work

# defining a union class, it works with list and NULL
setClassUnion("listOrNULL", c("list", "NULL"))
setClass("c1", representation(value = "listOrNULL"))
y1 = new("c1", value = NULL)
y2 = new("c1", value = list(a = 10))

# but it won't work with aov or NULL
setClassUnion("aovOrNULL", c("aov", "NULL"))
setClass("c1", representation(value = "aovOrNULL"))
y1 = new("c1", value = NULL)

# trying to assign an aov object to the slot doesn't work
utils::data(npk, package="MASS")
npk.aov <- aov(yield ~ block + N*P*K, npk)
y2 = new("c1", value = npk.aov)

Any ideas? Thank you

Thomas Kaliwe

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
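A hedged suggestion: aov is an S3 class, so it may need to be registered with the S4 system via `setOldClass()` before it can be a member of a class union. Something along these lines (a sketch, untested against the poster's exact setup):

```r
setOldClass(c("aov", "lm"))  # register the S3 class (and its parent) for S4
setClassUnion("aovOrNULL", c("aov", "NULL"))
setClass("c1", representation(value = "aovOrNULL"))

utils::data(npk, package = "MASS")
npk.aov <- aov(yield ~ block + N*P*K, npk)
y1 <- new("c1", value = NULL)
y2 <- new("c1", value = npk.aov)
```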
Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
Hi Daren,

Here is another approach, a little bit faster, taking into account that I'm using your original matrices. My session info is at the end. I'm using a 2.4 GHz Core 2 Duo processor and 3 GB of RAM.

# Data
set.seed(123)
m1 <- matrix(rnorm(10), ncol=100)
m2 <- matrix(rnorm(10), ncol=100)
colnames(m1)=paste('m1_',1:100,sep="")
colnames(m2)=paste('m2_',1:100,sep="")

# Combinations
combs=expand.grid(colnames(m1),colnames(m2))

# ---
# Option 1
# ---
system.time(apply(combs,1,function(x) cor.test(m1[,x[1]],m2[,x[2]])$p.value)->pvalues1)
#  user  system elapsed
#  8.12    0.01    8.20

# ---
# Option 2
# ---
require(Hmisc)
system.time(apply(combs,1,function(x) rcorr(m1[,x[1]],m2[,x[2]])$P[2])->pvalues2)
#  user  system elapsed
#  7.00    0.00    7.02

HTH,

Jorge

# - Session Info
R version 2.8.0 Patched (2008-11-08 r46864) i386-pc-mingw32
locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base

On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan <[EMAIL PROTECTED]> wrote:
>
> My two matrices are roughly the sizes of m1 and m2. I tried using two apply
> and cor.test to compute the correlation p.values. More than an hour, and the
> codes are still running. Please help to make it more efficient.
>
> m1 <- matrix(rnorm(10), ncol=100)
> m2 <- matrix(rnorm(1000), ncol=100)
>
> cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
> cor.test(x,y)$p.value }) })
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Chi-Square Test Disagreement
On 11/26/2008 9:51 AM, Andrew Choens wrote: > I was asked by my boss to do an analysis on a large data set, and I am > trying to convince him to let me use R rather than SPSS. I think Sweave > could make my life much much easier. To get me a little closer to this > goal, I ran my analysis through R and SPSS and compared the resulting > values. In all but one case, they were the same. Given the matrix > > [,1] [,2] > [1,] 110 358 > [2,] 71 312 > [3,] 29 139 > [4,] 31 77 > [5,] 13 32 > > This is the output from R: >> chisq.test(test29) > > Pearson's Chi-squared test > > data: test29 > X-squared = 9.593, df = 4, p-value = 0.04787 > > But, the same data in SPSS generates a p value of .051. It's a small but > important difference. I played around and rescaled things, and tried > different values for B, but I never could get R to reach .051. > > I'd like to know which program is correct - R or SPSS? I know, this is a > biased place to ask such a question. I also appreciate all input that > will help me use R more effectively. The difference could be the result > of my own ignorance. The SPSS p-value is for the Likelihood Ratio Chi-squared test, not Pearson's. For Pearson's Chi-squared test in SPSS (16.0.2), I get p=0.04787, so the results do match if you do the same Chi-squared test. > thanks > --andy -- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
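For reference, SPSS's "Likelihood Ratio" line is the G² statistic, which can be reproduced in R from the observed and expected counts; this is presumably where the 0.051 came from:

```r
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)
obs <- test29
exp <- chisq.test(test29)$expected
G2 <- 2 * sum(obs * log(obs / exp))      # likelihood-ratio chi-squared
pchisq(G2, df = 4, lower.tail = FALSE)   # compare with SPSS's 0.051
```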
[R] S4 object
Hello, good day! Could someone explain to me, as simply as possible, what the S4 object class is and how it works? I would appreciate a reply as soon as possible. Thank you!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
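Since the question asks for the simplest possible picture of S4, here is a minimal illustration of defining a class, creating an instance, and dispatching a method:

```r
setClass("Person", representation(name = "character", age = "numeric"))
p <- new("Person", name = "Ada", age = 36)
p@age  # slots are read with @, not $

setGeneric("greet", function(x) standardGeneric("greet"))
setMethod("greet", "Person", function(x) cat("Hello,", x@name, "\n"))
greet(p)
```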
[R] plm pakage
Hi everyone,

I'm doing a panel data analysis and I want to run three estimation methods against my available dataset: pooled OLS, random effects and fixed effects. I have 9 individuals and 5 observations for each individual. This is my code; what's wrong?

X <- cbind(y,x)
X <- data.frame(X)
ooo <- pdata.frame(X,9)
vedo <- plm(y~x, data=ooo)

and this is the error:

Errore in X.m[, coef.within, drop = F] : numero di dimensione errato
(in English: "Error in X.m[, coef.within, drop = F] : wrong number of dimensions")

thanks

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
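One thing worth checking, offered as a guess without seeing the data: pdata.frame/plm behave most predictably when the data frame carries explicit individual and time identifiers rather than relying on the second argument to split the rows. A sketch with hypothetical `id` and `year` columns added for illustration:

```r
library(plm)
# 'id' and 'year' are made-up identifier columns for the 9 x 5 panel
X <- data.frame(y = y, x = x,
                id   = rep(1:9, each = 5),
                year = rep(1:5, times = 9))
pooled <- plm(y ~ x, data = X, index = c("id", "year"), model = "pooling")
within <- plm(y ~ x, data = X, index = c("id", "year"), model = "within")
random <- plm(y ~ x, data = X, index = c("id", "year"), model = "random")
```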
[R] Needs suggestions for choosing appropriate R packages
Dear all,

I am thinking of fitting a multilevel dataset with R. I have found several possible packages for this task, such as glmmPQL (MASS), glmm (repeated) and glmer (lme4), among others. I am a little confused by these functions. Could anybody tell me which function/package is the correct one for analysing my dataset?

My dataset is as follows: the response variable P is a binary variable (the subject is a patient or not); there are two explanatory variables, X1 (age) and X2 (sex); the data were sampled at three different levels (district, school, individual), so this is a multilevel dataset. I hope to fit the 3-level model (Y is a binary variable):

Logit(Pijk) = (a0 + b0k + c0jk) + b1*X1 + b2*X2

i - individual, first level; j - school, 2nd level; k - district, 3rd level.

I know that the GLIMMIX procedure in the latest version SAS 9.2 is a choice for that, but unfortunately we do not have the latest version. R must have similar functions to do that; can anybody give me some suggestions or help on analysing my dataset?

Q1: Which package/functions are appropriate for my task? Could you show me some example code if possible?

Q2: Logit(Pijk) = (a0 + b0k + c0jk) + (b1 + b1j)*X1 + b2*X2
If the random effect is also specified on X1, as above, which package/functions can handle that?

Thanks a lot.

--
With Kind Regards,

ZhiJie Zhang, PhD
Dept. of Epidemiology, School of Public Health, Fudan University
Office: Room 443, Building 8
Office Tel./Fax.: +86-21-54237410
Address: No. 138 Yi Xue Yuan Road, Shanghai, China
Postcode: 200032
Email: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Website: www.statABC.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
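For the model written above, the lme4 notation would look roughly like the following (a sketch only; the data frame and column names are assumed, not taken from the post):

```r
library(lme4)
# Q1: random intercepts for district, and for school nested within district
m1 <- glmer(P ~ X1 + X2 + (1 | district/school),
            family = binomial, data = mydata)
# Q2: add a random slope for X1 at the school-within-district level
m2 <- glmer(P ~ X1 + X2 + (1 | district) + (1 + X1 | district:school),
            family = binomial, data = mydata)
```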
[R] memory limit
I'm currently working with very large datasets that consist out of 1,000,000 + rows. Is it at all possible to use R for datasets this size or should I rather consider C++/Java. -- View this message in context: http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20699700.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
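As a rough feasibility check before choosing a tool: numeric data cost 8 bytes per cell, so the raw size of such a dataset is easy to estimate. Assuming, say, 100 numeric columns:

```r
rows <- 1e6; cols <- 100
rows * cols * 8 / 2^20  # about 763 MB of raw numeric data
# R usually needs several times that as headroom for temporary copies,
# so a 64-bit build with plenty of RAM (or an on-disk package) is advisable
```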
Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
He might try rcorr from Hmisc instead. Using your test suite, it gives about a 20% improvement on my MacPro:

> m1 <- matrix(rnorm(1), ncol=100)
> m2 <- matrix(rnorm(1), ncol=100)
> Rprof('/tempxx.txt')
> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
function(y) { rcorr(x,y)$P }) }))
   user  system elapsed
  4.221   0.049   4.289

> m1 <- matrix(rnorm(1), ncol=100)
> m2 <- matrix(rnorm(1), ncol=100)
> Rprof('/tempxx.txt')
> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
function(y) { cor.test(x,y)$p.value }) }))
   user  system elapsed
  5.328   0.038   5.355

I'm not a smart enough programmer to figure out whether there might be an even more efficient method that takes advantage of rcorr's implicit "looping" through a set of columns to produce an all-combinations return.

--
David Winsemius, MD
Heritage Labs

On Nov 26, 2008, at 9:14 AM, jim holtman wrote:

Your time is being taken up in cor.test because you are calling it 100,000 times. So grin and bear it with the amount of work you are asking it to do.

Here I am only calling it 100 times:

m1 <- matrix(rnorm(1), ncol=100)
m2 <- matrix(rnorm(1), ncol=100)
Rprof('/tempxx.txt')
system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
function(y) { cor.test(x,y)$p.value }) }))
   user  system elapsed
   8.86    0.00    8.89

so my guess is that calling it 100,000 times will take: 100,000 * 0.0886 seconds or about 3 hours. If you run Rprof, you will see it is spending most of its time there:

 0    8.8 root
 1.     8.8 apply
 2. .     8.8 FUN
 3. . .     8.8 apply
 4. . . .     8.7 FUN
 5. . . . .     8.6 cor.test
 6. . . . . .     8.4 cor.test.default
 7. . . . . . .     2.4 match.arg
 8. . . . . . . .     1.7 eval
 9. . . . . . . . .     1.4 deparse
10. . . . . . . . . .     0.6 .deparseOpts
11. . . . . . . . . . .     0.2 pmatch
11. . . . . . . . . . .     0.1 sum
10. . . . . . . . . .     0.5 %in%
11. . . . . . . . . . .     0.3 match
12. . . . . . . . . . . .     0.3 is.factor
13. . . . . . . . . . . . .     0.3 inherits
 8. . . . . . . .     0.2 formals
 9. . . . . . . . .     0.2 sys.function
 7. . . . . . .     2.1 cor
 8. . . . . . . .     1.1 match.arg
 9. . . . . . . . .     0.7 eval
10. . . . . . . . . .     0.6 deparse
11. . . . . . . . . . .     0.3 .deparseOpts
12. . . . . . . . . . . .     0.1 pmatch
11. . . . . . . . . . .     0.2 %in%
12. . . . . . . . . . . .     0.2 match
13. . . . . . . . . . . . .     0.1 is.factor
14. . . . . . . . . . . . . .     0.1 inherits
 9. . . . . . . . .     0.1 formals
 8. . . . . . . .     0.5 stopifnot
 9. . . . . . . . .     0.2 match.call
 8. . . . . . . .     0.1 pmatch
 8. . . . . . . .     0.1 is.data.frame
 9. . . . . . . . .     0.1 inherits
 7. . . . . . .     1.5 paste
 8. . . . . . . .     1.4 deparse
 9. . . . . . . . .     0.6 .deparseOpts
10. . . . . . . . . .     0.3 pmatch
10. . . . . . . . . .     0.1 any
 9. . . . . . . . .     0.6 %in%
10. . . . . . . . . .     0.6 match
11. . . . . . . . . . .     0.5 is.factor
12. . . . . . . . . . . .     0.4 inherits
13. . . . . . . . . . . . .     0.2 mode
 7. . . . . . .     0.4 switch
 8. . . . . . . .     0.1 qnorm
 7. . . . . . .     0.2 pt
 5. . . . .     0.1 $

On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan <[EMAIL PROTECTED]> wrote:

My two matrices are roughly the sizes of m1 and m2. I tried using two apply and cor.test to compute the correlation p.values. More than an hour, and the codes are still running. Please help to make it more efficient.

m1 <- matrix(rnorm(10), ncol=100)
m2 <- matrix(rnorm(1000), ncol=100)

cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value }) })

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Finding Stopping time
Can anyone help me solve a problem in my code? I am trying to find the stopping index N. First I generate random numbers from normal distributions. There is no problem in finding the first stopping index. Now I want to find the second stopping index using observations starting from the one after the first stopping index. E.g., if my first stopping index was 5, I want to set the 6th observation from the generated normal variables as the first random number, and stop at the second stopping index. This is my code:

alpha <- 0.05
beta <- 0.07
a <- log((1-beta)/alpha)
b <- log(beta/(1-alpha))
theta1 <- 2
theta2 <- 3
cumsm <- function(n) {
  y <- NULL
  for (i in 1:n) { y[i] <- x[i]^2 }
  s <- sum(y)
  return(s)
}
psum <- function(p, q) {
  z <- NULL
  for (l in p:q) { z[l-p+1] <- x[l]^2 }
  ps <- sum(z)
  return(ps)
}
smm <- NULL
sm <- NULL
N <- NULL
Nout <- NULL
T <- NULL
k <- 0
x <- rnorm(100, theta1, theta1)
for (i in 1:length(x)) {
  sm[i] <- psum(1, i)
  T[i] <- ((i/2)*log(theta1/theta2)) + (((theta2-theta1)/(2*theta1*theta2))*sm[i]) - (i*(theta2-theta1)/2)
  if (T[i] <= b | T[i] >= a) { N[1] <- i; break }
}
for (j in 2:200) {
  for (k in (N[j-1]+1):length(x)) {
    smm[k] <- psum((N[j-1]+1), k)
    T[k] <- ((k/2)*log(theta1/theta2)) + (((theta2-theta1)/(2*theta1*theta2))*smm[k]) - (k*(theta2-theta1)/2)
    if (T[k] <= b | T[k] >= a) { N[j] <- k; break }
  }
}

But I cannot get the stopping index after the first one.

Thanks
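[Editor's note: one likely culprit in the code above is that, after a restart, the sample-size term in T[k] still uses the absolute index k rather than the number of observations since the last stop (k - N[j-1]); and once x is exhausted, N[j] is never assigned. A hedged sketch of a restart-aware loop, assuming the same SPRT-style statistic (the helper function name is illustrative, not the poster's verified intent):]

```r
# Find the next stopping index scanning from 'start', resetting both the
# running sum of x^2 and the observation count at each restart.
next.stop <- function(x, start, a, b, theta1, theta2) {
  s <- 0
  for (k in start:length(x)) {
    n <- k - start + 1                       # observations since the restart
    s <- s + x[k]^2
    T <- (n/2)*log(theta1/theta2) +
         ((theta2 - theta1)/(2*theta1*theta2))*s -
         n*(theta2 - theta1)/2
    if (T <= b || T >= a) return(k)
  }
  NA                                         # ran out of data: no decision
}

N <- c(); start <- 1
repeat {
  k <- next.stop(x, start, a, b, theta1, theta2)
  if (is.na(k)) break
  N <- c(N, k)                               # successive stopping indices
  start <- k + 1
}
```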
Re: [R] eclipse and R
Hi Ruud, I forwarded your message to the StatET (R in Eclipse) list; there might be StatET users with a setup similar to yours on that list (and the StatET developer is more likely to pick up your question there). Best, Tobias

>Hello, I am trying to install Eclipse and R on an amd64 machine running
>Suse linux 9.3. I have compiled R 2.8.0 with --enable-R-shlib and it
>seems that compilation was successful. After starting R, I installed
>the latest rJava package; from the output:
>checking whether JRI is requested... yes
>cp src/libjri.so libjri.so
>it seems JRI support has been compiled successfully. However, when I try
>to open R from within Eclipse, I receive an error message:
>
>Launching the R Console was cancelled, because it seems starting the
>Java process/R engine failed.
>Please make sure that R package 'rJava' with JRI is installed.
>
>I can open an R console from the command line, and attach the rJava
>library without problems. What am I doing wrong here?
>Thanks, Ruud
[R] Chi-Square Test Disagreement
I was asked by my boss to do an analysis on a large data set, and I am trying to convince him to let me use R rather than SPSS. I think Sweave could make my life much, much easier. To get me a little closer to this goal, I ran my analysis through R and SPSS and compared the resulting values. In all but one case, they were the same. Given the matrix:

     [,1] [,2]
[1,]  110  358
[2,]   71  312
[3,]   29  139
[4,]   31   77
[5,]   13   32

This is the output from R:

> chisq.test(test29)

        Pearson's Chi-squared test

data:  test29
X-squared = 9.593, df = 4, p-value = 0.04787

But the same data in SPSS generates a p-value of .051. It's a small but important difference. I played around and rescaled things, and tried different values for B, but I never could get R to reach .051. I'd like to know which program is correct - R or SPSS? I know, this is a biased place to ask such a question. I also appreciate all input that will help me use R more effectively. The difference could be the result of my own ignorance.

Thanks
--andy
--
Insert something humorous here. :-)
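[Editor's note: the discrepancy can be probed directly in R; a minimal sketch re-entering the matrix from the post. The B mentioned above is the Monte Carlo replicate count for simulate.p.value, which is a different p-value than the asymptotic one and is not expected to match SPSS's figure exactly:]

```r
# Rebuild the 5 x 2 table shown above and run both flavors of the test.
test29 <- matrix(c(110, 71, 29, 31, 13,
                   358, 312, 139, 77, 32), ncol = 2)
chisq.test(test29)                     # asymptotic Pearson test (p = 0.04787 above)
set.seed(1)
chisq.test(test29, simulate.p.value = TRUE, B = 1e5)  # Monte Carlo p-value
```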
[R] how to check linearity in Cox regression
> On examining non-linearity of Cox coefficients with penalized splines - I
> have not been able to dig up a completely clear description of the test
> performed in R or S-plus.

One "iron clad" way to test is to fit a model that has the variable of interest "x" as a linear term, then a second model with splines, and do a likelihood ratio test with 2*(difference in log-likelihood) on (difference in df) degrees of freedom. With a penalized model this test is conservative: the chi-square is not quite the right distribution; the true distribution has the same mean but smaller variance.

The pspline function uses an evenly spaced set of symmetric basis functions. A neat consequence of this is that the Wald test for linear vs 'more general' is a test that the coefficients of the spline terms fall in a linear series. That is, a linear trend test on the coefficients. This is what coxph does. As with the LR test, the chi-square distribution is conservative. I have not worked at putting in the more correct distribution. See Eilers and Marx, Statistical Science 1996.

> And what is the null for the non-linear test?

The linear test is "is a linear term better than nothing"; the non-linear one is a sequential test, "is the non-linear better than the linear". The second test of course depends on the total number of df you allowed for the pspline fit. As a silly example, adding "+ pspline(x, df=200)" would likely show that the nonlinear term was not a significant addition, i.e., not worth 199 more degrees of freedom.

Terry Therneau
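[Editor's note: the likelihood-ratio comparison described above can be sketched as follows; the veteran data set, the variable age, and df = 4 are illustrative stand-ins, and the effective-df bookkeeping for the penalized fit is an assumption about the coxph.penal object:]

```r
library(survival)

# Linear term vs. penalized-spline term for the same covariate.
fit.lin <- coxph(Surv(time, status) ~ age, data = veteran)
fit.spl <- coxph(Surv(time, status) ~ pspline(age, df = 4), data = veteran)

lrt <- 2 * (fit.spl$loglik[2] - fit.lin$loglik[2])  # 2 * (difference in log-lik)
df  <- sum(fit.spl$df) - length(coef(fit.lin))      # difference in (effective) df
pchisq(lrt, df = df, lower.tail = FALSE)            # conservative, as noted above
```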
Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
Your time is being taken up in cor.test because you are calling it 100,000 times. So grin and bear it with the amount of work you are asking it to do. Here I am only calling it 100 times:

> m1 <- matrix(rnorm(1), ncol=100)
> m2 <- matrix(rnorm(1), ncol=100)
> Rprof('/tempxx.txt')
> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
> function(y) { cor.test(x,y)$p.value }) }))
   user  system elapsed
   8.86    0.00    8.89

so my guess is that calling it 100,000 times will take: 100,000 * 0.0886 seconds, or about 3 hours. If you run Rprof, you will see it is spending most of its time there:

0 8.8 root
1. 8.8 apply
2. . 8.8 FUN
3. . . 8.8 apply
4. . . . 8.7 FUN
5. . . . . 8.6 cor.test
6. . . . . . 8.4 cor.test.default
7. . . . . . . 2.4 match.arg
8. . . . . . . . 1.7 eval
9. . . . . . . . . 1.4 deparse
10. . . . . . . . . . 0.6 .deparseOpts
11. . . . . . . . . . . 0.2 pmatch
11. . . . . . . . . . . 0.1 sum
10. . . . . . . . . . 0.5 %in%
11. . . . . . . . . . . 0.3 match
12. . . . . . . . . . . . 0.3 is.factor
13. . . . . . . . . . . . . 0.3 inherits
8. . . . . . . . 0.2 formals
9. . . . . . . . . 0.2 sys.function
7. . . . . . . 2.1 cor
8. . . . . . . . 1.1 match.arg
9. . . . . . . . . 0.7 eval
10. . . . . . . . . . 0.6 deparse
11. . . . . . . . . . . 0.3 .deparseOpts
12. . . . . . . . . . . . 0.1 pmatch
11. . . . . . . . . . . 0.2 %in%
12. . . . . . . . . . . . 0.2 match
13. . . . . . . . . . . . . 0.1 is.factor
14. . . . . . . . . . . . . . 0.1 inherits
9. . . . . . . . . 0.1 formals
8. . . . . . . . 0.5 stopifnot
9. . . . . . . . . 0.2 match.call
8. . . . . . . . 0.1 pmatch
8. . . . . . . . 0.1 is.data.frame
9. . . . . . . . . 0.1 inherits
7. . . . . . . 1.5 paste
8. . . . . . . . 1.4 deparse
9. . . . . . . . . 0.6 .deparseOpts
10. . . . . . . . . . 0.3 pmatch
10. . . . . . . . . . 0.1 any
9. . . . . . . . . 0.6 %in%
10. . . . . . . . . . 0.6 match
11. . . . . . . . . . . 0.5 is.factor
12. . . . . . . . . . . . 0.4 inherits
13. . . . . . . . . . . . . 0.2 mode
7. . . . . . . 0.4 switch
8. . . . . . . . 0.1 qnorm
7. . . . . . . 0.2 pt
5. . . . . 0.1 $

On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan <[EMAIL PROTECTED]> wrote:
>
> My two matrices are roughly the sizes of m1 and m2. I tried using two apply
> and cor.test to compute the correlation p.values. More than an hour, and the
> code is still running. Please help to make it more efficient.
>
> m1 <- matrix(rnorm(10), ncol=100)
> m2 <- matrix(rnorm(1000), ncol=100)
>
> cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
> cor.test(x,y)$p.value }) })

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
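[Editor's note: beyond the profiling above, cor.test's Pearson p-value is just a t-test on r, so the whole grid can be computed without calling cor.test at all. A hedged sketch, assuming complete data and the default two-sided Pearson test; the matrix sizes are illustrative, chosen to give the 100,000 pairs discussed (note the result is the transpose of the nested-apply version):]

```r
# All row-vs-row correlations in a single cor() call, then p-values from
# t = r * sqrt(n-2) / sqrt(1-r^2) with n-2 degrees of freedom.
m1 <- matrix(rnorm(10000),  ncol = 100)   # 100 rows   (illustrative size)
m2 <- matrix(rnorm(100000), ncol = 100)   # 1000 rows  (illustrative size)
n  <- ncol(m1)                            # observations per row
r  <- cor(t(m1), t(m2))                   # nrow(m1) x nrow(m2) correlations
t.stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p <- 2 * pt(-abs(t.stat), df = n - 2)     # matrix of two-sided p-values
```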
Re: [R] memory limit
Good afternoon,

The short answer is "yes"; the long answer is "it depends". It all depends on what you want to do with the data. I'm working with dataframes of a couple of million lines on this plain desktop machine, and for my purposes it works fine. I read in text files, manipulate them, convert them into dataframes, do some basic descriptive stats and tests on them, a couple of columns at a time, all quick and simple in R.

There are some libraries which are set up to handle very large datasets, e.g. biglm [1]. If you're using algorithms which require vast quantities of memory, then as the previous emails in this thread suggest, you might need R running on 64-bit. If you're working with a problem which is "embarrassingly parallel" [2], then there are a variety of solutions - if you're in between, then the solutions are much more data dependent.

The flip question: how long would it take you to get up and running with the functionality (tried and tested in R) you require if you're going to be re-working things in C++? I suggest that you have a look at R, possibly using a subset of your full set to start with - you'll be amazed how quickly you can get up and running.

As suggested at the start of this email... "it depends"...

Best Regards,
Sean O'Riordain
Dublin

[1] http://cran.r-project.org/web/packages/biglm/index.html
[2] http://en.wikipedia.org/wiki/Embarrassingly_parallel

iwalters wrote:
>
> I'm currently working with very large datasets that consist of
> 1,000,000+ rows. Is it at all possible to use R for datasets this size,
> or should I rather consider C++/Java.
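[Editor's note: the chunked-fitting idea behind biglm [1] can be sketched like this; the file name, formula, and chunk size are placeholders, and the read-until-exhausted loop is one possible pattern, not the package's only interface:]

```r
library(biglm)

con <- file("big.csv", "r")                    # placeholder file name
chunk <- read.csv(con, nrows = 100000)         # first chunk carries the header
fit <- biglm(y ~ x1 + x2, data = chunk)        # placeholder formula

repeat {
  # Subsequent chunks: same columns, no header; stop when the file runs out.
  chunk <- tryCatch(read.csv(con, nrows = 100000, header = FALSE,
                             col.names = names(chunk)),
                    error = function(e) NULL)
  if (is.null(chunk) || nrow(chunk) == 0) break
  fit <- update(fit, chunk)                    # fold in the next chunk
}
close(con)
summary(fit)
```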
Re: [R] Question about Kolmogorov-Smirnov Test
Ricardo Ríos wrote:

Hi wizards, I have the following code for a Kolmogorov-Smirnov test:

z <- c(1.6,10.3,3.5,13.5,18.4,7.7,24.3,10.7,8.4,4.9,7.9,12,16.2,6.8,14.7)
ks.test(z, "pexp", 1/10)$statistic

The Kolmogorov-Smirnov statistic is:

        D
 0.293383

However, I have calculated the Kolmogorov-Smirnov statistic with the following R code:

z <- c(1.6,10.3,3.5,13.5,18.4,7.7,24.3,10.7,8.4,4.9,7.9,12,16.2,6.8,14.7)
a <- sort(z)
d <- pexp(a, rate = 1/10, lower.tail = TRUE, log.p = FALSE)
w <- numeric(length = length(a))
for (i in 1:length(a)) w[i] <- i/15
max(abs(w - d))

But I have obtained the following result:

[1] 0.2267163

Why are these results not equal?

w is calculated as follows:

w <- (seq(along = a) - 1)/length(a)

[ {0, ..., n-1} rather than {1, ..., n} ]

Uwe Ligges

Thanks in advance
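[Editor's note: more generally, the two-sided KS statistic is the larger of the two one-sided discrepancies, since the ECDF is a step function evaluated just before and just after each jump. A sketch using the data from the post, which should agree with ks.test(z, "pexp", 1/10)$statistic:]

```r
z  <- c(1.6,10.3,3.5,13.5,18.4,7.7,24.3,10.7,8.4,4.9,7.9,12,16.2,6.8,14.7)
n  <- length(z)
Fz <- pexp(sort(z), rate = 1/10)
D.plus  <- max(seq_len(n)/n - Fz)        # ECDF just after each jump: i/n
D.minus <- max(Fz - (seq_len(n) - 1)/n)  # ECDF just before each jump: (i-1)/n
D <- max(D.plus, D.minus)                # two-sided KS statistic
```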
Re: [R] construct a vector
Thanks, works fine. Armin
Re: [R] construct a vector
axionator wrote:
> Hi all,
> I have an unknown number of vectors (>=2), all of the same length. Out
> of these, I want to construct a new one as follows:
> having vectors u, v and w, the resulting vector z should have entries:
> z[1] = u[1], z[2] = v[1], z[3] = w[1]
> z[4] = u[2], z[5] = v[2], z[6] = w[2]
> ...
> i.e. go through the vectors u, v, w, take each time the 1st, 2nd, ...
> elements and store them consecutively in z.
> Is there an efficient way in R to do this?

Suppose you have your vectors collected into a list, say vs; then the following will do:

as.vector(do.call(rbind, vs))

vQ
Re: [R] multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?
Charlie Brush wrote:

I am doing multiple imputation with Hmisc, and can't figure out how to replace the NA values with the imputed values. Here's a general outline of the process:

> set.seed(23)
> library("mice")
> library("Hmisc")
> library("Design")
> d <- read.table("DailyDataRaw_01.txt", header=T)
> length(d); length(d[,1])
[1] 43
[1] 2666

So for this data set, there are 43 columns and 2666 rows. Here is a piece of data.frame d:

> d[1:20,4:6]
    P01  P02  P03
1   0.1 0.16 0.16
2    NA 0.00 0.00
3    NA 0.60 0.04
4    NA 0.15 0.00
5    NA 0.00 0.00
6   0.7 0.00 0.75
7    NA 0.00 0.00
8    NA 0.00 0.00
9   0.0 0.00 0.00
10  0.0 0.00 0.00
11  0.0 0.00 0.00
12  0.0 0.00 0.00
13  0.0 0.00 0.00
14  0.0 0.00 0.00
15  0.0 0.00 0.03
16   NA 0.00 0.00
17   NA 0.01 0.00
18  0.0 0.00 0.00
19  0.0 0.00 0.00
20  0.0 0.00 0.00

These are daily precipitation values at NCDC stations, and NA values at station P01 will be filled using multiple imputation and data from highly correlated stations P02 and P08:

> f <- aregImpute(~ I(P01) + I(P02) + I(P08), n.impute=10, match='closest', data=d)
Iteration 13
> fmi <- fit.mult.impute(P01 ~ P02 + P08, ols, f, d)

Variance Inflation Factors Due to Imputation:
Intercept       P02       P08
     1.01      1.39      1.16

Rate of Missing Information:
Intercept       P02       P08
     0.01      0.28      0.14

d.f. for t-distribution for Tests of Single Coefficients:
Intercept       P02       P08
242291.18    116.05    454.95

> r <- apply(f$imputed$P01, 1, mean)
> r
    2     3     4     5     7     8    16    17   249   250   251
0.002 0.430 0.044 0.002 0.002 0.002 0.002 0.123 0.002 0.002 0.002
  252   253   254   255   256   257   258   259   260   261   262
1.033 0.529 1.264 0.611 0.002 0.513 0.085 0.002 0.705 0.840 0.719
  263   264   265   266   267   268   269   270   271   272   273
1.489 0.532 0.150 0.134 0.002 0.002 0.002 0.002 0.002 0.055 0.135
  274   275   276   277   278   279   280   281   282   283   284
0.009 0.002 0.002 0.002 0.008 0.454 1.676 1.462 0.071 0.002 1.029
  285   286   287   288   289   418   419   420   421   422   700
0.055 0.384 0.947 0.002 0.002 0.008 0.759 0.066 0.009 0.002 0.002

So far, this is working great.

Now, make a copy of d:

> dnew <- d

And then fill in the NA values in P01 with the values in r. For example:

> for (i in 1:length(r)){ dnew$P01[r[i,1]] <- r[i,2] }

This doesn't work, because each 'piece' of r is two numbers:

> r[1]
    2
0.002
> r[1,1]
Error in r[1, 1] : incorrect number of dimensions

My question: how can I separate the two items in (for example) r[1] to use the first part as an index and the second as a value, and then use them to replace the NA values with the imputed values? Or is there a better way to replace the NA values with the imputed values? Thanks in advance for any help.

You didn't state your goal, and why fit.mult.impute does not do what you want. But you can look inside fit.mult.impute to see how it retrieves the imputed values. Also see the example in the documentation for transcan, in which the command impute(xt, imputation=1) retrieves one of the multiple imputations. Note that you can say library(Design) (omit the quotes) to access both Design and Hmisc.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
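[Editor's note: to answer the indexing question directly: r as built above is a named numeric vector, not a two-column object, so r[1,1] fails; the names carry the row indices and the values carry the imputed means. A sketch, assuming r came from apply(f$imputed$P01, 1, mean) as shown:]

```r
dnew <- d
idx  <- as.numeric(names(r))   # row numbers of the originally missing values
dnew$P01[idx] <- r             # fill the NAs with the across-imputation means
stopifnot(!any(is.na(dnew$P01[idx])))
```

Note that averaging the 10 imputations gives a single filled-in column, which may be fine for display, but it discards the between-imputation variability that fit.mult.impute is designed to propagate (Frank's point above).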
Re: [R] construct a vector
> I have an unknown number of vectors (>=2), all of the same length. Out
> of these, I want to construct a new one as follows:
> having vectors u, v and w, the resulting vector z should have entries:
> z[1] = u[1], z[2] = v[1], z[3] = w[1]
> z[4] = u[2], z[5] = v[2], z[6] = w[2]
> ...
> i.e. go through the vectors u, v, w, take each time the 1st, 2nd, ...
> elements and store them consecutively in z.
> Is there an efficient way in R to do this?

u <- 1:5
v <- (1:5) + 0.1
w <- (1:5) + 0.2
as.vector(rbind(u,v,w))
# [1] 1.0 1.1 1.2 2.0 2.1 2.2 3.0 3.1 3.2 4.0 4.1 4.2 5.0 5.1 5.2

Regards,
Richie.
Mathematical Sciences Unit, HSL
Re: [R] construct a vector
on 11/26/2008 07:11 AM axionator wrote:
> Hi all,
> I have an unknown number of vectors (>=2), all of the same length. Out
> of these, I want to construct a new one as follows:
> having vectors u, v and w, the resulting vector z should have entries:
> z[1] = u[1], z[2] = v[1], z[3] = w[1]
> z[4] = u[2], z[5] = v[2], z[6] = w[2]
> ...
> i.e. go through the vectors u, v, w, take each time the 1st, 2nd, ...
> elements and store them consecutively in z.
> Is there an efficient way in R to do this?
>
> Thanks in advance
> Armin

Is this what you want?

u <- 1:10
v <- 11:20
w <- 21:30
z <- as.vector(rbind(u, v, w))

> z
 [1]  1 11 21  2 12 22  3 13 23  4 14 24  5 15 25  6 16 26  7 17 27  8
[23] 18 28  9 19 29 10 20 30

Essentially, we are creating a matrix from the 3 vectors:

> rbind(u, v, w)
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
u    1    2    3    4    5    6    7    8    9    10
v   11   12   13   14   15   16   17   18   19    20
w   21   22   23   24   25   26   27   28   29    30

Then coercing that to a vector, taking advantage of the way in which matrix elements are stored (column-major order).

HTH,
Marc Schwartz
[R] construct a vector
Hi all, I have an unknown number of vectors (>=2), all of the same length. Out of these, I want to construct a new one as follows: having vectors u, v and w, the resulting vector z should have entries:

z[1] = u[1], z[2] = v[1], z[3] = w[1]
z[4] = u[2], z[5] = v[2], z[6] = w[2]
...

i.e. go through the vectors u, v, w, take each time the 1st, 2nd, ... elements and store them consecutively in z. Is there an efficient way in R to do this?

Thanks in advance
Armin
Re: [R] plotting density for truncated distribution
Another option:

mydata <- rnorm(10)
mydata <- mydata[mydata > 0]
plot(density(c(mydata, -mydata), from=0))

If you want the area under the curve to be one, you'll need to double the density estimate:

dx <- density(c(mydata, -mydata), from=0)
dx$y <- dx$y * 2
plot(dx)

Chris

Jeroen Ooms wrote:
>
> I am using density() to plot density curves. However, one of my
> variables is truncated at zero, but has most of its density around zero. I
> would like to know how to plot this with the density function.
>
> The problem is that if I do this the regular way with density(), values near
> zero automatically get a very low value because there are no observed
> values below zero. Furthermore, there is some density below zero, although
> there are no observed values below zero.
>
> This illustrates the problem:
>
> mydata <- rnorm(10);
> mydata <- mydata[mydata>0];
> plot(density(mydata));
>
> The 'real' density is exactly the right half of a normal distribution,
> truncated at zero. However, using the default options, the line seems to
> decrease with a nice curve at the left, with some density below zero. This
> is pretty confusing for the reader. I have tried to decrease the bw, which
> masks (but does not fix) some of the problem, but then the rest of the
> curve also loses smoothness. I would like to make a plot of this data that
> looks like the right half of a normal distribution, while keeping the
> curve relatively smooth.
>
> Is there any way to specify this truncation in the density function, so
> that it will only use the positive domain to calculate density?
Re: [R] how to read .sps (SPSS file extension)?
sorry for the typo, help(importer, package="memisc") will do the trick.

Eik Vettorazzi schrieb:

maybe the importers of the memisc package will help you, but I never tried them; see help(importers, package="memisc"). At first glance it seems that you have to split your file manually, but maybe there is another way. hth.

livio finos schrieb:

sorry, you are completely right! sps is not the extension for a portable file! sorry for the time I made you spend. I will try to make my problem more clear. I am exporting a dataset from limesurvey (a free software package for internet surveys). It works very well and it allows exporting to different formats such as csv and excel. This is fine, but what I like about the spss formats is the variable labels. limesurvey declares that it exports to spss, but it exports to the sps format, which is not a data format but actually code. I realized this just now, sorry. I attach an extract of the code below. Do you have any suggestion on how to manage that? I think it would be great if we could improve the interconnectivity among free software. thanks again.. livio

NEW FILE.
FILE TYPE NESTED RECORD=1(A).
- RECORD TYPE 'A'.
- DATA LIST LIST / i0(A1) d1(N3) d2(DATETIME20.0) d3(A15) d4(N2) d5(N1) d6(N1) d7(N1) d8(N1) d9(N1) d10(N1) d11(N1) d12(N1) d13(N1) d14(N1) d15(N1) d16(N1) d17(N2) d18(N1) d19(N1) d20(N1) .
- RECORD TYPE 'B'.
- DATA LIST LIST / i1(A1) d21(N1) d22(N1) d23(N1) d24(N1) d25(N1) d26(N1) d27(N1) d28(N1) d29(N1) d30(N1) d31(N1) d32(N1) d33(N1) d34(N1) d35(N1) d36(N1) d37(N1) d38(A37) d39(N1) d40(N1) .
- RECORD TYPE 'C'.
- DATA LIST LIST / i2(A1) d41(N1) d42(N1) d43(N1) d44(N1) d45(N1) d46(N1) d47(N1) d48(N1) d49(N1) d50(N1) d51(N1) d52(N1) d53(N1) d54(N1) d55(N1) d56(N1) d57(N1) d58(N1) d59(N1) d60(N1) .
- RECORD TYPE 'D'.
- DATA LIST LIST / i3(A1) d61(N1) d62(N1) d63(N1) d64(N1) d65(N1) d66(N1) d67(N1) d68(N1) d69(N1) d70(N1) d71(N1) d72(N1) d73(N1) d74(N1) d75(N1) d76(N1) d77(N1) d78(N1) d79(N1) d80(N1) .
- RECORD TYPE 'E'.
- DATA LIST LIST / i4(A1) d81(N1) d82(N1) d83(N1) d84(N1) d85(N1) d86(N1) d87(N1) d88(N1) d89(N1) d90(N1) d91(N1) d92(N1) d93(N1) d94(N1) d95(N1) d96(N1) d97(N1) d98(N1) d99(N1) d100(N1) . - RECORD TYPE 'F'. - DATA LIST LIST / i5(A1) d101(N1) d102(N1) d103(N1) d104(N1) d105(N1) d106(N1) d107(N1) d108(N1) d109(N1) . END FILE TYPE. BEGIN DATA A '8' '01-01-1980 00:00:00' 'it' '13' '1' '1' '' '1' '' '1' '1' '1' '1' '1' '4' '0' '' '1' '' '1' B '' '1' '1' '1' '1' '0' '' '0' '' '2' '2' '1' '4' '2' '7' '3' '2' '0''2' '3' C '3' '4' '4' '2' '2' '1' '4' '4' '2' '3' '1' '4' '1' '4' '1' '1' '4' '3' '4' '3' D '1' '3' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '2' '2' '2' '2' '3' '3' '3' '2' E '2' '3' '2' '2' '0''0''0''0''0''0''0''3' '4' '4' '2' '1' '5' '2' '5' '4' F '1' '2' '2' '2' '2' '1' '2' '2' '5' A '9' '01-01-1980 00:00:00' 'it' '13' '2' '1' '' '1' '' '0' '' '1' '1' '1' '3' '0' '' '1' '' '1' B '' '0' '' '1' '1' '0' '' '0' '' '2' '2' '1' '4' '3' '7' '3' '2' '0''2' '4' C '3' '4' '4' '3' '3' '3' '4' '3' '2' '2' '1' '3' '1' '4' '1' '4' '4' '4' '4' '3' D '1' '2' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '3' '1' '3' '3' '3' '3' '3' '3' E '3' '3' '3' '2' '0''0''0''0''0''0''0''3' '4' '2' '2' '5' '3' '5' '2' '5' F '1' '5' '5' '5' '5' '5' '5' '2' '5' A '10' '01-01-1980 00:00:00' 'it' '13' '1' '1' '' '1' '' '1' '2' '0' '' '0' '' '0' '' '1' '' '1' B '' '1' '2' '0' '' '0' '' '0' '' '1' '2' '1' '4' '6' '7' '3' '2' '0''3' '3' C '4' '4' '4' '4' '3' '4' '4' '3' '2' '2' '1' '4' '1' '4' '1' '3' '4' '4' '4' '1' D '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '3' '3' '3' '3' '3' '3' '3' '3' '3' E '3' '3' '3' '2' '0''0''0''0''0''0''0''5' '4' '5' '2' '5' '5' '5' '5' '5' F '1' '5' '5' '5' '5' '5' '5' '1' '5' END DATA. EXECUTE. *Define Variable Properties. VARIABLE LABELS d1 'Record ID'. VARIABLE LABELS d2 'Data di completamento'. VARIABLE LABELS d3 'Lingua di partenza'. VARIABLE LABELS d4 'Età :'. VARIABLE LABELS d5 'Sesso:'. VARIABLE LABELS d6 '3 - Papà '. 
VARIABLE LABELS d7 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d8 '3 - Mamma'. VARIABLE LABELS d9 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d10 '3 - Fratelli n°'. VARIABLE LABELS d11 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d12 '3 - Sorelle n°'. VARIABLE LABELS d13 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d14 '3 - Nonni n°'. VARIABLE LABELS d15 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d16 '3 - Altre figure parentali (zii, cugini, ecc.) n°'. VARIABLE LABELS d17 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d18 '4 - Papà '. VARIABLE LABELS d19 'Quali di queste persone vivono in casa con te? - COMMENT'. VARIABLE LABELS d20 '4 - Mamma'. VARIABLE LABELS d21 'Quali di queste persone vivono in casa con te? - COMMENT'. VARIABLE LABELS d22 '4 - Fratelli n°'. VARIABLE LABELS d23 'Quali di queste persone vivono in casa con te? - COMMENT'. VARIABLE LABELS d24 '4 - Sorelle n°'. VARIABLE
Re: [R] how to read .sps (SPSS file extension)?
maybe the importers of the memisc package will help you, but I never tried them; see help(importers, package="memisc"). At first glance it seems that you have to split your file manually, but maybe there is another way. hth.

livio finos schrieb:

sorry, you are completely right! sps is not the extension for a portable file! sorry for the time I made you spend. I will try to make my problem more clear. I am exporting a dataset from limesurvey (a free software package for internet surveys). It works very well and it allows exporting to different formats such as csv and excel. This is fine, but what I like about the spss formats is the variable labels. limesurvey declares that it exports to spss, but it exports to the sps format, which is not a data format but actually code. I realized this just now, sorry. I attach an extract of the code below. Do you have any suggestion on how to manage that? I think it would be great if we could improve the interconnectivity among free software. thanks again.. livio

NEW FILE.
FILE TYPE NESTED RECORD=1(A).
- RECORD TYPE 'A'.
- DATA LIST LIST / i0(A1) d1(N3) d2(DATETIME20.0) d3(A15) d4(N2) d5(N1) d6(N1) d7(N1) d8(N1) d9(N1) d10(N1) d11(N1) d12(N1) d13(N1) d14(N1) d15(N1) d16(N1) d17(N2) d18(N1) d19(N1) d20(N1) .
- RECORD TYPE 'B'.
- DATA LIST LIST / i1(A1) d21(N1) d22(N1) d23(N1) d24(N1) d25(N1) d26(N1) d27(N1) d28(N1) d29(N1) d30(N1) d31(N1) d32(N1) d33(N1) d34(N1) d35(N1) d36(N1) d37(N1) d38(A37) d39(N1) d40(N1) .
- RECORD TYPE 'C'.
- DATA LIST LIST / i2(A1) d41(N1) d42(N1) d43(N1) d44(N1) d45(N1) d46(N1) d47(N1) d48(N1) d49(N1) d50(N1) d51(N1) d52(N1) d53(N1) d54(N1) d55(N1) d56(N1) d57(N1) d58(N1) d59(N1) d60(N1) .
- RECORD TYPE 'D'.
- DATA LIST LIST / i3(A1) d61(N1) d62(N1) d63(N1) d64(N1) d65(N1) d66(N1) d67(N1) d68(N1) d69(N1) d70(N1) d71(N1) d72(N1) d73(N1) d74(N1) d75(N1) d76(N1) d77(N1) d78(N1) d79(N1) d80(N1) .
- RECORD TYPE 'E'.
- DATA LIST LIST / i4(A1) d81(N1) d82(N1) d83(N1) d84(N1) d85(N1) d86(N1) d87(N1) d88(N1) d89(N1) d90(N1) d91(N1) d92(N1) d93(N1) d94(N1) d95(N1) d96(N1) d97(N1) d98(N1) d99(N1) d100(N1) . - RECORD TYPE 'F'. - DATA LIST LIST / i5(A1) d101(N1) d102(N1) d103(N1) d104(N1) d105(N1) d106(N1) d107(N1) d108(N1) d109(N1) . END FILE TYPE. BEGIN DATA A '8' '01-01-1980 00:00:00' 'it' '13' '1' '1' '' '1' '' '1' '1' '1' '1' '1' '4' '0' '' '1' '' '1' B '' '1' '1' '1' '1' '0' '' '0' '' '2' '2' '1' '4' '2' '7' '3' '2' '0''2' '3' C '3' '4' '4' '2' '2' '1' '4' '4' '2' '3' '1' '4' '1' '4' '1' '1' '4' '3' '4' '3' D '1' '3' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '2' '2' '2' '2' '3' '3' '3' '2' E '2' '3' '2' '2' '0''0''0''0''0''0''0''3' '4' '4' '2' '1' '5' '2' '5' '4' F '1' '2' '2' '2' '2' '1' '2' '2' '5' A '9' '01-01-1980 00:00:00' 'it' '13' '2' '1' '' '1' '' '0' '' '1' '1' '1' '3' '0' '' '1' '' '1' B '' '0' '' '1' '1' '0' '' '0' '' '2' '2' '1' '4' '3' '7' '3' '2' '0''2' '4' C '3' '4' '4' '3' '3' '3' '4' '3' '2' '2' '1' '3' '1' '4' '1' '4' '4' '4' '4' '3' D '1' '2' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '3' '1' '3' '3' '3' '3' '3' '3' E '3' '3' '3' '2' '0''0''0''0''0''0''0''3' '4' '2' '2' '5' '3' '5' '2' '5' F '1' '5' '5' '5' '5' '5' '5' '2' '5' A '10' '01-01-1980 00:00:00' 'it' '13' '1' '1' '' '1' '' '1' '2' '0' '' '0' '' '0' '' '1' '' '1' B '' '1' '2' '0' '' '0' '' '0' '' '1' '2' '1' '4' '6' '7' '3' '2' '0''3' '3' C '4' '4' '4' '4' '3' '4' '4' '3' '2' '2' '1' '4' '1' '4' '1' '3' '4' '4' '4' '1' D '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '3' '3' '3' '3' '3' '3' '3' '3' '3' E '3' '3' '3' '2' '0''0''0''0''0''0''0''5' '4' '5' '2' '5' '5' '5' '5' '5' F '1' '5' '5' '5' '5' '5' '5' '1' '5' END DATA. EXECUTE. *Define Variable Properties. VARIABLE LABELS d1 'Record ID'. VARIABLE LABELS d2 'Data di completamento'. VARIABLE LABELS d3 'Lingua di partenza'. VARIABLE LABELS d4 'Età :'. VARIABLE LABELS d5 'Sesso:'. VARIABLE LABELS d6 '3 - Papà '. 
VARIABLE LABELS d7 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d8 '3 - Mamma'. VARIABLE LABELS d9 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d10 '3 - Fratelli n°'. VARIABLE LABELS d11 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d12 '3 - Sorelle n°'. VARIABLE LABELS d13 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d14 '3 - Nonni n°'. VARIABLE LABELS d15 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d16 '3 - Altre figure parentali (zii, cugini, ecc.) n°'. VARIABLE LABELS d17 'Com'è composta la tua famiglia? - COMMENT'. VARIABLE LABELS d18 '4 - Papà '. VARIABLE LABELS d19 'Quali di queste persone vivono in casa con te? - COMMENT'. VARIABLE LABELS d20 '4 - Mamma'. VARIABLE LABELS d21 'Quali di queste persone vivono in casa con te? - COMMENT'. VARIABLE LABELS d22 '4 - Fratelli n°'. VARIABLE LABELS d23 'Quali di queste persone vivono in casa con te? - COMMENT'. VARIABLE LABELS d24 '4 - Sorelle n°'. VARIABLE LABELS d25 'Quali di queste persone vivono in casa con te? - COMMENT'. VARIABLE LABELS d26 '4 - N
Re: [R] Running rtest - how to/ help
Well, after trying a couple of things:

1. For the rtest.java example with the command-line argument --zero-init, this is the error:

Creating Rengine (with arguments)
Rengine created, waiting for R
#
# An unexpected error has been detected by Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x6c733b9d, pid=3640, tid=5016
#
# Java VM: Java HotSpot(TM) Client VM (10.0-b19 mixed mode, sharing windows-x86)
# Problematic frame:
# C [R.dll+0x33b9d]
#
# An error report file with more information is saved as:
# C:\workspaceVE\XTest\hs_err_pid3640.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

2. For rtest2.java it remains the same, that is, it terminates after "Creating Rengine (with arguments)", the difference being that some window/frame opens for a fraction of a second.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Efficient passing through big data.frame and modifying select
Marvelous! Thanks guys for your hints and time! Very smooth now!

Joh

On Wednesday 26 November 2008 03:41:49 Henrik Bengtsson wrote:
> Alright, here are another $.02: using 'use.names=FALSE' in unlist() is
> much faster than the default 'use.names=TRUE'. /Henrik
>
> On Tue, Nov 25, 2008 at 6:40 PM, Henrik Bengtsson <[EMAIL PROTECTED]> wrote:
> > My $.02: Using argument 'fixed=TRUE' in strsplit() is much faster than
> > the default 'fixed=FALSE'. /Henrik
> >
> > On Tue, Nov 25, 2008 at 1:02 PM, William Dunlap <[EMAIL PROTECTED]> wrote:
> >>> -Original Message-
> >>> From: William Dunlap
> >>> Sent: Tuesday, November 25, 2008 9:16 AM
> >>> To: '[EMAIL PROTECTED]'
> >>> Subject: Re: [R] Efficient passing through big data.frame and
> >>> modifying select fields
> >>>
> >>> > Johannes Graumann johannes_graumann at web.de
> >>> > Tue Nov 25 15:16:01 CET 2008
> >>> >
> >>> > Hi all,
> >>> >
> >>> > I have relatively big data frames (> 1 rows by 80 columns)
> >>> > that need to be exposed to "merge". Works marvelously well in
> >>> > general, but some fields of the data frames actually contain
> >>> > multiple ";"-separated values encoded as a character string without
> >>> > defined order, which makes the fields not match each other.
> >>> >
> >>> > Example:
> >>> > > frame1[1,1]
> >>> > [1] "some;thing"
> >>> >
> >>> > > frame2[2,1]
> >>> > [2] "thing;some"
> >>> >
> >>> > In order to enable merging/duplicate identification of columns
> >>> > containing these strings, I wrote the following function, which
> >>> > passes through the rows one by one, identifies ";"-containing cells,
> >>> > splits and resorts them.
> >>> >
> >>> > ResortCombinedFields <- function(dframe){
> >>> >   if(!is.data.frame(dframe)){
> >>> >     stop("\"ResortCombinedFields\" input needs to be a data frame.")
> >>> >   }
> >>> >   for(row in seq(nrow(dframe))){
> >>> >     for(mef in grep(";",dframe[row,])){
> >>>
> >>> I needed to add drop=TRUE to the above dframe[row,] for this to work.
> >>>
> >>> >       dframe[row,mef] <-
> >>> >         paste(sort(unlist(strsplit(dframe[row,mef],";"))),collapse=";")
> >>> >     }
> >>> >   }
> >>> >   return(dframe)
> >>> > }
> >>> >
> >>> > works fine, but is horribly inefficient. How might this be
> >>> tackled more elegantly?
> >>>
> >>> > Thanks for any input, Joh
> >>>
> >>> It is usually faster to loop over columns of a data frame and use row
> >>> subscripting, if needed, on individual columns. E.g., the following
> >>> 2 are much quicker on a sample 1000 by 4 dataset I made with
> >>>
> >>> dframe <- data.frame(lapply(c(One=1,Two=2,Three=3),
> >>>     function(i)sapply(1:1000,
> >>>       function(i)
> >>>         paste(sample(LETTERS[1:5],size=sample(3,size=1),repl=FALSE),
> >>>               collapse=";"))),
> >>>   stringsAsFactors=FALSE)
> >>> dframe$Four <- sample(LETTERS[1:5], size=nrow(dframe),
> >>>   replace=TRUE)  # no ;'s in column Four
> >>>
> >>> The first function, f1, doesn't try to find which rows may
> >>> need adjusting and the second, f2, does.
> >>>
> >>> f1 <- function(dframe){
> >>>   if(!is.data.frame(dframe)){
> >>>     stop("\"ResortCombinedFields\" input needs to be a data frame.")
> >>>   }
> >>>   for(icol in seq_len(ncol(dframe))){
> >>>     dframe[,icol] <- unlist(lapply(strsplit(dframe[,icol], ";"),
> >>>       function(parts) paste(sort(parts), collapse=";")))
> >>>   }
> >>>   return(dframe)
> >>> }
> >>>
> >>> f2 <- function(dframe){
> >>>   if(!is.data.frame(dframe)){
> >>>     stop("\"ResortCombinedFields\" input needs to be a data frame.")
> >>>   }
> >>>   for(icol in seq_len(ncol(dframe))){
> >>>     col <- dframe[,icol]
> >>>     irow <- grep(";", col)
> >>>     if (length(irow)) {
> >>>       col[irow] <- unlist(lapply(strsplit(col[irow], ";"),
> >>>         function(parts) paste(sort(parts), collapse=";")))
> >>>       dframe[,icol] <- col
> >>>     }
> >>>   }
> >>>   return(dframe)
> >>> }
> >>>
> >>> Times were
> >>>
> >>> > unix.time(z<-ResortCombinedFields(dframe))
> >>>    user  system elapsed
> >>>   2.526   0.022   2.559
> >>>
> >>> > unix.time(f1z<-f1(dframe))
> >>>    user  system elapsed
> >>>   0.509   0.000   0.508
> >>>
> >>> > unix.time(f2z<-f2(dframe))
> >>>    user  system elapsed
> >>>   0.259   0.004   0.264
> >>>
> >>> > identical(z, f1z)
> >>> [1] TRUE
> >>>
> >>> > identical(z, f2z)
> >>> [1] TRUE
> >>
> >> In R 2.7.0 (April 2008) f1() and f2() both take time proportional
> >> to nrow(dframe), while your original ResortCombinedFields() takes
> >> time proportional to the square of nrow(dframe). E.g., for 50,000
> >> rows ResortCombinedFields takes 4252 seconds while f2 takes 14 seconds.
> >> It looks like 2.9 acts about the same.
> >>
> >> Bill Dunlap
> >> TIBCO Software Inc - Spotfire Division
> >> wdunlap tibco.com
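For what it's worth, the two hints from earlier in the thread can be combined into one small sketch (toy data, not the poster's real frames): fixed=TRUE makes strsplit() treat ";" as a literal separator instead of a regular expression, and use.names=FALSE spares unlist() the name bookkeeping.

```r
# Toy ";"-joined strings standing in for the real data frame columns.
x <- c("some;thing", "thing;some", "plain")

parts <- strsplit(x, ";", fixed = TRUE)      # fixed = TRUE: no regex engine
sorted <- unlist(lapply(parts, function(p) paste(sort(p), collapse = ";")),
                 use.names = FALSE)          # use.names = FALSE: no name copying
sorted  # "some;thing" "some;thing" "plain"
```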
Re: [R] How to measure the significant difference between two time series
Q1: Quick answer.

a) You need to remove the seasonality - there should be a tool in the time series package to do this, though I'm not familiar enough with R to know it.
b) Check the resulting series to see if it is stationary - the acf decays quickly, i.e. within a couple of lags.
c) If the two series are stationary, then overlay the acf plots; if the second plot lies within the CI of the first, they are the same.
d) On the seasonal parts of the two series, do a regression of year vs b1*q1 + b2*q2 + b3*q3 + b4*q4, and if the coefficients of the second are within the 2-SE band of the first, the seasonality is similar.

If c) and d) are not significantly different, the two series are similar.

Also, standardise all your series so that they have mean 0 (or 1) and variance 1 before you start - it'll make the numerics and comparisons better. Also, before starting, you may need to take logs if the series oscillations increase with time.

Q2 - I don't know what you mean by interannual variation - within-year variation, possibly.

Hope this helps.
Gerard

"Bin Zhao" <[EMAIL PROTECTED]> wrote on 24/11/2008 04:13 (Subject: [R] How to measure the significant difference between two time series):

Dear R experts and statisticians,

I have some time series datasets: several years of vegetation indices (about 50 data points per year) sampled from different stations. These indices have similar dynamics with seasonal change. My questions are:

1) How can I compare the difference among the indices, and how can I say there is a significant difference between two time series? They don't have the same values on the same dates, but they may have similar tendencies and change curves.

2) I also want to compare the interannual variation of the vegetation index at the same station. Of course, the index can be separated into several sub-series, i.e., one year per sub-series.
Now I have the same question as 1): how can I compare the significant difference between different years? If there is a mature theoretical framework that can resolve my questions, is there any package in R that can do this? Any further advice is highly appreciated.

Regards,
Bin Zhao
2008-11-24, 10:53:58
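As an illustration of steps a)-c) from the reply above, here is a rough sketch with base/stats tools. The data are simulated stand-ins (about 50 points per "year", as in the question), stl() is just one of several ways to deseasonalize, and the tolerance in the final check is ad hoc rather than a formal test.

```r
set.seed(1)
t  <- seq(0, 4, length.out = 200)                      # ~4 "years", 50 points/year
s1 <- ts(sin(2 * pi * t) + rnorm(200, sd = 0.2), frequency = 50)
s2 <- ts(sin(2 * pi * t) + rnorm(200, sd = 0.2), frequency = 50)

# a) remove seasonality; stl()'s "remainder" is the deseasonalized, detrended part
r1 <- stl(s1, s.window = "periodic")$time.series[, "remainder"]
r2 <- stl(s2, s.window = "periodic")$time.series[, "remainder"]

# standardise to mean 0, variance 1, as suggested
z1 <- as.numeric(scale(r1))
z2 <- as.numeric(scale(r2))

# b)/c) compare the autocorrelation structure: compute acf values and check
# whether the two acfs agree to within an approximate 95% confidence band
a1 <- acf(z1, plot = FALSE)
a2 <- acf(z2, plot = FALSE)
band    <- qnorm(0.975) / sqrt(length(z1))
similar <- all(abs(a1$acf[-1] - a2$acf[-1]) < 2 * band)
```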
[R] Running rtest - how to/ help
I am trying to run the rtest example (XP, Eclipse) available with the code. JRI etc. are used from the rJava package (i.e. I haven't built them myself). The code doesn't run properly: it shows

'Creating Rengine (with arguments)'

and then gets terminated (I tried passing the argument --save; what else is possible?). When I try to use run.bat:

>run.bat examples\rtest.java

it says that '-cp' is not recognized as an internal or external command:

C:\Documents and Settings\shlahoti\Desktop\JRI_0.4-1\JRI>set PATH=C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\system32\;C:\WINDOWS\system32\Wbem;C:\Program Files\WinZip;C:\Program Files\Visual Networks\Dial Analysis\;c:\Program Files\Common Files\Roxio Shared\DLLShared\;C:\Program Files\Windows Imaging\;C:\Program Files\Subversion\bin;"C:\Graphviz2.16\Bin";C:\jEdit;C:\QuickTime\QTSystem\;C:\php;C:\Program Files\R\R-2.8.0\library\rJava\jri\;C:\Program Files\R\R-2.8.0\bin\;\bin;\lib;\bin;\lib;\bin;\lib;\bin;\lib;\bin;\lib;\bin;\lib

C:\Documents and Settings\shlahoti\Desktop\JRI_0.4-1\JRI>-cp .;examples;src/JRI.jar rtest $
'-cp' is not recognized as an internal or external command,
operable program or batch file.
[R] eclipse and R
Hello,

I am trying to install Eclipse and R on an amd64 machine running Suse Linux 9.3. I have compiled R 2.8.0 with --enable-R-shlib and it seems that compilation was successful. After starting R, I installed the latest rJava package; from the output:

checking whether JRI is requested... yes
cp src/libjri.so libjri.so

It seems JRI support has been compiled successfully. However, when I try to open R from within Eclipse, I receive an error message:

Launching the R Console was cancelled, because it seems starting the Java process/R engine failed. Please make sure that R package 'rJava' with JRI is installed.

I can open an R console from the command line, and attach the rJava library without problems. What am I doing wrong here?

Thanks, Ruud
Re: [R] Error in sqlCopy in RODBC
BKMooney wrote:
> I am trying to copy portions of tables from one SQL database to another,
> using sqlCopy in the RODBC package.
> ...
> I am currently getting an error:
> Error in sqlSave(destchannel, dataset, destination, verbose = verbose, :
>   table 'LocalTable' already exists

I can reproduce your error with my example file and fast=TRUE. You might try fast=FALSE and append=TRUE when things like this happen. The following works for me:

library(RODBC)
channel <- odbcConnectAccess("db.mdb")
sqlCopy(channel, "Select * from tab", "newtab", destchannel = channel,
        safer = TRUE, append = TRUE, rownames = FALSE, fast = FALSE)
odbcClose(channel)
Re: [R] LaTeX and R-scripts/-results
On Wed, Nov 26, 2008 at 08:50:33AM +0100, Oliver Bandel wrote:
> at some places I read about good interaction of
> LaTeX and R.
>
> Can you give me a starting point, where I can find
> information about it?

Have a look at these:

Sweave()
xtable() (package xtable)
latex() (package Hmisc)

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
[R] multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?
I am doing multiple imputation with Hmisc, and can't figure out how to replace the NA values with the imputed values. Here's a general outline of the process:

> set.seed(23)
> library("mice")
> library("Hmisc")
> library("Design")
> d <- read.table("DailyDataRaw_01.txt", header=T)
> length(d); length(d[,1])
[1] 43
[1] 2666

So for this data set, there are 43 columns and 2666 rows. Here is a piece of data.frame d:

> d[1:20,4:6]
   P01  P02  P03
1  0.1 0.16 0.16
2   NA 0.00 0.00
3   NA 0.60 0.04
4   NA 0.15 0.00
5   NA 0.00 0.00
6  0.7 0.00 0.75
7   NA 0.00 0.00
8   NA 0.00 0.00
9  0.0 0.00 0.00
10 0.0 0.00 0.00
11 0.0 0.00 0.00
12 0.0 0.00 0.00
13 0.0 0.00 0.00
14 0.0 0.00 0.00
15 0.0 0.00 0.03
16  NA 0.00 0.00
17  NA 0.01 0.00
18 0.0 0.00 0.00
19 0.0 0.00 0.00
20 0.0 0.00 0.00

These are daily precipitation values at NCDC stations, and NA values at station P01 will be filled using multiple imputation and data from highly correlated stations P02 and P08:

> f <- aregImpute(~ I(P01) + I(P02) + I(P08), n.impute=10, match='closest', data=d)
Iteration 13
> fmi <- fit.mult.impute(P01 ~ P02 + P08, ols, f, d)

Variance Inflation Factors Due to Imputation:
Intercept   P02   P08
     1.01  1.39  1.16

Rate of Missing Information:
Intercept   P02   P08
     0.01  0.28  0.14

d.f. for t-distribution for Tests of Single Coefficients:
Intercept    P02    P08
242291.18 116.05 454.95

> r <- apply(f$imputed$P01, 1, mean)
> r
    2     3     4     5     7     8    16    17   249   250   251
0.002 0.430 0.044 0.002 0.002 0.002 0.002 0.123 0.002 0.002 0.002
  252   253   254   255   256   257   258   259   260   261   262
1.033 0.529 1.264 0.611 0.002 0.513 0.085 0.002 0.705 0.840 0.719
  263   264   265   266   267   268   269   270   271   272   273
1.489 0.532 0.150 0.134 0.002 0.002 0.002 0.002 0.002 0.055 0.135
  274   275   276   277   278   279   280   281   282   283   284
0.009 0.002 0.002 0.002 0.008 0.454 1.676 1.462 0.071 0.002 1.029
  285   286   287   288   289   418   419   420   421   422   700
0.055 0.384 0.947 0.002 0.002 0.008 0.759 0.066 0.009 0.002 0.002

-- So far, this is working great.
Now, make a copy of d:

> dnew <- d

And then fill in the NA values in P01 with the values in r. For example:

> for (i in 1:length(r)) { dnew$P01[r[i,1]] <- r[i,2] }

This doesn't work, because each 'piece' of r is two numbers:

> r[1]
    2
0.002
> r[1,1]
Error in r[1, 1] : incorrect number of dimensions

My question: how can I separate the two items in (for example) r[1] to use the first part as an index and the second as a value, and then use them to replace the NA values with the imputed values? Or is there a better way to replace the NA values with the imputed values?

Thanks in advance for any help.
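(An editorial aside, a sketch of one possible answer: what looks like "two numbers" in r[1] is actually a single named value, because apply(f$imputed$P01, 1, mean) returns a named numeric vector whose names are the row indices of the NAs. So names(r) gives the indices and r itself the values, and no loop is needed. Toy data below stand in for the real d and r.)

```r
# Toy stand-ins: in the real case, names(r) come from the rownames of
# f$imputed$P01, i.e. the row indices of the NAs in d$P01.
d <- data.frame(P01 = c(0.1, NA, NA, 0.7))
r <- c("2" = 0.002, "3" = 0.430)

dnew <- d
dnew$P01[as.integer(names(r))] <- r   # fill the NAs by row index
dnew$P01                              # 0.100 0.002 0.430 0.700
```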
Re: [R] LaTeX and R-scripts/-results
Oliver Bandel wrote:
> Hello,
> at some places I read about good interaction of LaTeX and R.
> Can you give me a starting point, where I can find information about it?
> Are there special LaTeX-packages for the support, or does R have packages
> for support of LaTeX? Or will an external Code-Generator be used?
> TIA, Oliver

Hi Oliver,

you are right, LaTeX and R are perfect companions. Look for "Sweave"(*). You find an introduction by Friedrich Leisch in R News 2002, Vol. 2/3:
http://cran.r-project.org/doc/Rnews/Rnews_2002-3.pdf
and an entire homepage about it:
http://www.statistik.lmu.de/~leisch/Sweave/

HTH
Thomas P.

(*) Many thanks to Friedrich Leisch for this great piece of software!
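To make the Sweave workflow concrete, here is a minimal sketch (the file name report.Rnw is hypothetical): R code lives in noweb-style chunks between <<...>>= and @, and \Sexpr{} inlines an R value into the surrounding text. Run R CMD Sweave report.Rnw, then run LaTeX on the generated report.tex.

```latex
\documentclass{article}
\begin{document}

% R code chunk: executed by Sweave, code and output appear in the document
<<sim, echo=TRUE>>=
x <- rnorm(100)
summary(x)
@

% Inline result computed from the chunk above
The sample mean is \Sexpr{round(mean(x), 2)}.

\end{document}
```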
Re: [R] How to create a string containing '\/' to be used with SED?
Good morning,

You do not need to quote a forward slash / in R, but you do need to quote a backslash when you're inputting it... so to get a string which actually contains "blah\/blah" you need to use "blah\\/blah"

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-backslash-behave-strangely-inside-strings_003f

Unless this is a very, very big file you shouldn't need to go out to sed, as gsub() should work adequately... and probably quicker and cleaner. So something along the lines of... (UNTESTED!!! since I don't have a reproducible example)

tmp1 <- readLines(configurationFile)
tmp1 <- gsub("^instance .*",
             paste("instance = ", data$instancePath, "/", data$newInstance, sep = ""),
             tmp1)

I'm working on 50 MB text files, doing all sorts of manipulations, and I do it all inside R under Windows XP... reading a 50 MB text file across the 100 Mb network and doing a gsub() on most lines takes an elapsed 16 seconds on this office desktop.

hth...

Regards,
Sean

ikarus wrote:
>
> Hi guys,
> I've been struggling to find a solution to the following issue:
> I need to change strings in .ini files that are given in input to a
> program whose output is processed by R. The strings to be changed look
> like:
> "instance = /home/TSPFiles/TSPLIB/berlin52.tsp"
>
> I normally use Sed for this kind of things. So, inside R I'd like to write
> something like:
>
> command <- paste("sed -i 's/^instance .*/instance = ", data$instancePath,
>                  data$newInstance, "/' ", configurationFile, sep = "")
> system(command)
>
> This will overwrite the line starting with "instance " using
> "instance = the_new_instance"
> In the example I gave, data$instancePath = /home/TSPFiles/TSPLIB/ and
> data$newInstance = berlin52.tsp
>
> The problem is that I need to pass the above path string to sed in the
> form:
> "\/home\/TSPFiles\/TSPLIB\/"
>
> However, I couldn't find a way to "create" such a string in R.
> I tried in several different ways,
> but it always complains saying that '\/' is an unrecognized escape!
>
> Any suggestion?
>
> Thanks!
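An editorial footnote sketching the escaping rule discussed in this thread: in R source code each \\ stands for a single backslash character, / needs no escaping at all, print() shows the escaped (source-code) form, and cat() shows the raw characters. The path is the one from the thread; the gsub() line shows the no-sed alternative on a made-up config line.

```r
path <- "\\/home\\/TSPFiles\\/TSPLIB\\/"  # four \\ = four single backslashes

nchar(path)      # 26, not 30: each \\ is ONE character in the string
cat(path, "\n")  # prints the raw characters: \/home\/TSPFiles\/TSPLIB\/
print(path)      # shows the escaped (source-code) form again

# The gsub() route avoids sed's "\/" issue entirely:
line <- "instance = /old/path/old.tsp"
gsub("^instance .*", "instance = /home/TSPFiles/TSPLIB/berlin52.tsp", line)
```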