Re: [R] The KJV
On 07-Feb-10 01:06:40, Ben Bolker wrote:
> Jim Lemon jim at bitwrit.com.au writes:
>> On 02/06/2010 06:57 PM, Charlotte Maia wrote:
>>> Hey all,
>>>
>>> Does anyone know if there are any R packages with a copy of the KJV?
>>> I'm guessing the answer is no...
>>> So the next question, and the more important one is:
>>> Does anyone think it would be useful (e.g. for text-mining purposes)?
>>>
>>> I know almost nothing about theology, so I'm not sure what kind of
>>> questions theologists might have (that R could answer).
>>>
>>> An alternative, that would achieve a similar result (I think), would
>>> be an R interface to another open source system, such as Sword.
>>
>> Hi Charlotte,
>> Try http://www.gutenberg.org/etext/10
>>
>> Jim
>
> I couldn't help it:
>
>   x <- url("http://www.gutenberg.org/dirs/etext90/kjv10.txt",open="r")
>   X <- readLines(x,n=2)
>   z <- grep("First Book of Moses",X)
>   X <- X[-(1:z)]
>   X <- X[nchar(X)>0]
>   length(X) ## 15058
>   words <- tolower(unlist(strsplit(X,"[ .,:;()]")))
>   words2 <- grep("[^0-9]",words,value=TRUE)
>   tt <- rev(sort(table(words2)))
>   barplot(rev(tt[1:100]),horiz=TRUE,las=1,cex.names=0.4,log="x")

Delightful! And fascinating in the detail too.

  length(tt)
  # [1] 5078

with slight changes like:

  barplot(rev(tt[1:50]),horiz=TRUE,las=1,cex.names=0.6,log="x")
  # ...
  barplot(rev(tt[101:150]),horiz=TRUE,las=1,cex.names=0.6,log="x")
  # ...

and see the likes of

  tt["lord"]
  # lord
  # 1939
  tt["god"]
  # god
  #  822
  tt["men"]
  # men
  # 204
  tt["women"]
  # women
  #    26

I'm now wondering how it matches up with Zipf's Law (or perhaps
Fisher's logarithmic ... )

Thanks, Ben!
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Feb-10 Time: 08:28:30
-- XFMail --
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
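One rough way to look at the Zipf's Law question is to regress log frequency on log rank and see whether the slope is near -1. The sketch below is only an illustration: it uses four verses typed in by hand as a stand-in for the full text (an assumption made so that it runs without a download), and the names `zipf` and `fit` are made up for the example.

```r
# Toy stand-in for the KJV lines; the real analysis would use the full text
X <- c("In the beginning God created the heaven and the earth.",
       "And the earth was without form, and void;",
       "and darkness was upon the face of the deep.",
       "And the Spirit of God moved upon the face of the waters.")

# Same tokenisation idea as in the thread: split on spaces and punctuation
words <- tolower(unlist(strsplit(X, "[ .,:;()]")))
words <- words[nzchar(words)]            # drop empty strings left by the split

# Frequency table, most frequent word first
tt <- sort(table(words), decreasing = TRUE)

# Zipf's law predicts log(freq) roughly linear in log(rank), slope near -1
zipf <- data.frame(rank = seq_along(tt), freq = as.numeric(tt))
fit <- lm(log(freq) ~ log(rank), data = zipf)
print(coef(fit))

# Visual check on log-log axes (commented out to keep the sketch non-graphical)
# plot(zipf$rank, zipf$freq, log = "xy"); abline(fit, untf = FALSE)
```

With only a handful of verses the fitted slope means little; the point is the shape of the check, which could be applied unchanged to the `tt` table built in the thread.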
Re: [R] embedFonts with pdf files and Windows 7
Do your systems actually have the fonts you are trying to embed? I doubt it: Helvetica is a commercial font, and most likely the Linux system is embedding a substitute.

It would be better to do

  pdf("test.pdf", family="NimbusSan", useDingbats=FALSE)
  plot(matrix(rnorm(200),nc=2))
  dev.off()
  myCall <- embedFonts("test.pdf",outfile = "test-a.pdf")

This is what ?pdf says you should expect:

  Since ‘embedFonts’ makes use of Ghostscript, it should be able to
  embed the URW-based families for use with other viewers.

If that does not work, you need to get help with your Ghostscript installation (it is all to do with how it is set up to handle font substitution, which the above should avoid).

On Sat, 6 Feb 2010, James M. Curran wrote:
> I am trying to embed fonts in my PDF images so that they are embedded
> for the publisher of my book. I am running:
>
>   Windows 7 - 64 Enterprise
>   R 2.10.1
>   Ghostscript 8.70
>   Ghostview 4.9
>   MiKTeX 2.8
>
> I have this tiny test script:
>
>   pdf("test.pdf")
>   plot(matrix(rnorm(200),nc=2))
>   graphics.off()
>   myCall = embedFonts("test.pdf",outfile = "test-a.pdf")
>
> which successfully issues this command to ghostscript:
>
>   myCall
>   [1] "gswin32c.exe -dNOPAUSE -dBATCH -q -dAutoRotatePages=/None -sDEVICE=pdfwrite -sOutputFile=C:\\Users\\curran\\AppData\\Local\\Temp\\RtmpSkHosh\\Rembed136f65f3 -sFONTPATH= test.pdf"
>
> The file test.pdf is about 9kb with no fonts embedded.
> The file test-a.pdf is about 4kb with no fonts embedded.
>
> I have tried altering the options:
>
>   options = "-dEmbedAllFonts=true",
>
> and the font path
>
>   fontpath = "C:\\Windows\\Fonts"
>
> To no avail. The only way I can get embedFonts to work is to shift the
> work over to our Linux system.
>
> Any help would be greatly appreciated.

-- 
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
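The suggestion above can be tried in a way that does not fail when Ghostscript is missing. This is a hedged sketch, not a fix for the Windows 7 setup in the question: the file names are made up, the files go to a temporary directory, and whether the fonts really end up embedded still depends on the local Ghostscript installation.

```r
# Write the plot with a URW-based family, as suggested in the reply
pdf_file <- file.path(tempdir(), "test.pdf")       # hypothetical file names
emb_file <- file.path(tempdir(), "test-a.pdf")

pdf(pdf_file, family = "NimbusSan", useDingbats = FALSE)
plot(matrix(rnorm(200), ncol = 2))
dev.off()

# Look for a Ghostscript executable; find_gs_cmd() returns "" if none is found
gs <- tools::find_gs_cmd()
if (nzchar(gs)) {
  Sys.setenv(R_GSCMD = gs)                         # tell embedFonts() which binary to use
  res <- try(embedFonts(pdf_file, outfile = emb_file), silent = TRUE)
  if (inherits(res, "try-error")) {
    message("Ghostscript ran but embedding failed: check its font setup")
  }
} else {
  message("Ghostscript not found; skipping embedFonts()")
}
```

If the embedded file is produced, a tool such as `pdffonts` (from Xpdf or Poppler, if installed) can confirm which fonts are actually embedded.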
Re: [R] Is there an R implementation for the Barnard's exact test (a substitute for fisher.test) ?
Hello all,

After almost half a year, I received a friendly e-mail from Peter Calhoun, sharing his R implementation of Barnard's exact test. With his permission, I posted his code here:

http://www.r-statistics.com/2010/02/barnards-exact-test-a-powerful-alternative-for-fishers-exact-test-implemented-in-r/
http://www.r-statistics.com/2010/02/barnards-exact-test-a-non-parametric-alternative-for-fishers-exact-test-implemented-in-r/

I hope others will find it useful. Please note that the code is not as fast as it could be. If anyone would like to contribute a faster version of the code, please let me know and I'll gladly post it.

Cheers,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Sun, Jul 26, 2009 at 2:09 PM, Tal Galili tal.gal...@gmail.com wrote:
> Hello R help members.
> Today I came across an article on Barnard's exact test
> (http://www.cytel.com/Papers/twobinomials.pdf), which is supposed to be
> more powerful than fisher.test because it doesn't assume that the row
> and column totals are known in advance.
> Any pointers to such a function?
>
> Thanks,
> Tal
>
> --
> My contact information:
> Tal Galili
> Phone number: 972-50-3373767
> FaceBook: Tal Galili
> My Blogs:
> http://www.r-statistics.com/
> http://www.talgalili.com
> http://www.biostatistics.co.il
Re: [R] The KJV
On Sun, Feb 7, 2010 at 8:28 AM, Ted Harding ted.hard...@manchester.ac.uk wrote:
> Delightful! And fascinating in the detail too.
>
>   length(tt)
>   # [1] 5078
>
> with slight changes like:
>
>   barplot(rev(tt[1:50]),horiz=TRUE,las=1,cex.names=0.6,log="x")
>   # ...
>   barplot(rev(tt[101:150]),horiz=TRUE,las=1,cex.names=0.6,log="x")
>   # ...
>
> and see the likes of
>
>   tt["lord"]
>   # lord
>   # 1939
>   tt["god"]
>   # god
>   #  822
>   tt["men"]
>   # men
>   # 204
>   tt["women"]
>   # women
>   #    26
>
> I'm now wondering how it matches up with Zipf's Law (or perhaps
> Fisher's logarithmic ... )
>
> Thanks, Ben!

I'm wondering if someone is now going to write an R package to look for 'bible codes':

http://en.wikipedia.org/wiki/Bible_code

it's all in there:

http://www.biblecodewisdom.com/code/model-goodness-fit-test

Barry

-- 
blog: http://geospaced.blogspot.com/
web: http://www.maths.lancs.ac.uk/~rowlings
web: http://www.rowlingson.com/
twitter: http://twitter.com/geospacedman
pics: http://www.flickr.com/photos/spacedman
Re: [R] The KJV
On 07-Feb-10 12:49:23, Barry Rowlingson wrote:
> I'm wondering if someone is now going to write an R package to look
> for 'bible codes':
>
> http://en.wikipedia.org/wiki/Bible_code
>
> it's all in there:
>
> http://www.biblecodewisdom.com/code/model-goodness-fit-test
>
> Barry

Barry, these things can become distracting! Like the Weighing Pennies Problem (given N pennies, one of which has a different weight from all the others, and a two-pan balance, what is the minimum number of weighings required to determine which is the one with the different weight?).

With reference to the work of British Defence scientists during World War II:

  It was said that the 'weighing-pennies' problem wasted 10,000
  scientist-hours of war-work, and that there was a proposal to
  drop it over Germany.

[page 155 of the Bollobás edition of Littlewood's A Mathematician's Miscellany].

And now, Baz, you come up with Bible Codes ...

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Feb-10 Time: 13:47:09
-- XFMail --
[R] predicting with stl() decomposition
Hi mailing list members,

I'm currently working on a time series prediction, and my approach is to first decompose the series into a trend, a seasonal component and a remainder. For this I'm using the stl() function. But I'm wondering how to get at the individual components in order to predict each fitted series. This code snippet illustrates my problem:

  series <- vector(length=300)
  noise <- rnorm(300,0,2)
  time <- 1:300
  series[1] <- noise[1]
  for(i in 3:300){
    series[i] <- 0.5*series[i-1]+ noise[i] + 0.01*time[i]
  }
  seriesTs <- ts(series, start=c(1980,1), frequency=12)
  decomp <- stl(seriesTs, "periodic")
  plot(decomp)
  llrSaison <- loess(seriesTs~time, span=decomp$win[1], degree=decomp$deg[1])
  llrTrend <- loess(seriesTs~time, span=decomp$win[2], degree=decomp$deg[2])
  plot(llrSaison$fitted)

The last plot differs a lot from the seasonal panel in the plot(decomp) call. That is because the llr estimator doesn't extract the seasonal component, but how can I predict the individual components in the end? Or is there a function that can predict values from the stl object? predict() doesn't work; I've already tried it.

All the best,
Konrad Hoppe
[R] optimized R-selection and R-replacement inside a matrix need, strings coerced to factors
Hello,

I need to modify some huge arrays (2000 individuals x 50 000 variables). To format the data, I think I should use optimized selection and replacement inside a matrix and avoid a naive use of loops. Thank you in advance for any information about the following problem:

file A:
  2 000 individuals in rows
  50 000 columns corresponding to 50 000 variables: each value belongs to {0, 1, 2}

file B:
  50 000 variables in rows
  1st column: character (A,C,G,T) corresponding to code 0
  2nd column: character corresponding to code 1

convention:
  if A[,j]=0, one wants to replace 0 with the character in B[j,1] twice
  if A[,j]=1, one wants to replace 1 with the character in B[j,1] and the character in B[j,2]
  if A[,j]=2, one wants to replace 2 with the character in B[j,2] twice

  C <- matrix(0,2000,0) # initialization to void matrix
  for(j in 1:ncol(A)){
    c <- A[,j]
    zeros <- which(c==0); ones <- which(c==1); twos <- which(c==2); rm(c)
    c1 <- matrix("Z",2000)
    c2 <- matrix("Z",2000)
    c1[zeros] <- B$V1[j]; c2[zeros] <- B$V1[j]
    c1[ones] <- B$V1[j]; c2[ones] <- B$V2[j]
    c1[twos] <- B$V2[j]; c2[twos] <- B$V2[j]
    C <- cbind(C, cbind(c1,c2))
  }

I do think a more elaborate solution might exist. Thanks in advance for your help.
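The recoding convention above can be written without the per-column loop and without growing `C` by `cbind`, which gets slower as the matrix grows. The sketch below is one possible vectorized version on made-up toy data (3 individuals, 4 markers); the names `first` and `second` are illustrative, and `as.character()` is used in case the columns of `B` were read in as factors.

```r
# Hypothetical toy data: 3 individuals x 4 markers (the real A is 2000 x 50000)
A <- matrix(c(0, 1, 2,
              2, 2, 0,
              1, 0, 1,
              0, 0, 2), nrow = 3)        # filled column-wise: column j = marker j
B <- data.frame(V1 = c("A", "C", "G", "T"),   # character for code 0
                V2 = c("G", "T", "A", "C"),   # character for code 1
                stringsAsFactors = FALSE)

n <- nrow(A); p <- ncol(A)

# Expand each column of B so that entry [i, j] holds marker j's character
b1 <- matrix(as.character(B$V1), n, p, byrow = TRUE)
b2 <- matrix(as.character(B$V2), n, p, byrow = TRUE)

# Convention: 0 -> (V1, V1), 1 -> (V1, V2), 2 -> (V2, V2)
first  <- ifelse(A == 2, b2, b1)
second <- ifelse(A == 0, b1, b2)

# Interleave: columns 2j-1 and 2j of C hold the two characters for marker j
C <- matrix(NA_character_, n, 2 * p)
C[, seq(1, 2 * p, by = 2)] <- first
C[, seq(2, 2 * p, by = 2)] <- second
print(C)
```

At the full size the result has 2000 x 100000 character cells, so memory may be the limiting factor; processing the markers in blocks of a few thousand columns with the same code is one way to keep that in check.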
Re: [R] Posting an 'S4-creating Package Problem'...
On 02/06/2010 03:39 PM, Daniel Kosztyla wrote:
> Hello R-Team,
>
> Could you help me post an 'S4-creating Package Problem'? Thanks in
> advance for your support. The problem reads as follows:
>
> Hello R forum,
>
> while compiling my R package these warnings occur:
>
> ...
> Warning in matchSignature(signature, fdef, where) :
>   in the method signature for function "plot" no definition for class: "prediction"
> Warning in matchSignature(signature, fdef, where) :
>   in the method signature for function "plot" no definition for class: "validation"
> ** help
> *** installing help indices
> ...
>
> Maybe my NAMESPACE file is wrong. Does anybody have an idea what it
> should look like to solve this problem? (I use exportClasses(...),
> exportMethods(...).)
> I have 3 classes: 'prediction', 'validation', 'nvalidation', each of
> which has a plot method. There is no warning for class 'nvalidation',
> but there is for the other two.
> Any suggestions?

Hi Dan

Files in a package are collated and then sourced. If your 'prediction' class is in prediction.R, and your plot method is in plot.R, then the files will be collated plot.R, prediction.R, and the class definition for prediction will be unknown when the plot method is defined. Use Collate: in the DESCRIPTION file, or put class (and generic) definitions in files that collate early, e.g., AllClasses.R, AllGenerics.R.

Hope that helps,

Martin

> Greetings. Dan

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
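The ordering issue described in the reply can be seen outside a package as well: the class has to exist before a method that names it in its signature is defined. The sketch below shows the order that works, as it would appear once the files are collated; the slot name `values` is invented for the illustration and is not taken from Dan's package.

```r
library(methods)

# 1. Class definitions first (in a package: a file that collates early,
#    e.g. AllClasses.R, or one listed first in the Collate: field)
setClass("prediction", representation(values = "numeric"))  # 'values' is a made-up slot

# 2. Generic next: this turns graphics::plot into an S4 generic
setGeneric("plot")

# 3. Methods last, once the class named in the signature is known
setMethod("plot", signature(x = "prediction", y = "missing"),
          function(x, y, ...) {
            # delegate to the default plot on the stored values
            plot(x@values, ...)
          })

# The method is registered for the class
print(existsMethod("plot", signature(x = "prediction", y = "missing")))
```

Swapping steps 1 and 3 is what produces the "no definition for class" warning quoted in the question.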
[R] convert R plots into annotated web-graphics
Dear all,

I would like to make a large scatter plot created with R available as an interactive web graphic, in combination with additional text-annotations for each data point in the plot. The idea is to present the text-annotations in an HTML-table and inter-link the data points in the plot with their corresponding entries in the table, i.e. when clicking on a data point in the plot, the corresponding entry in the table should be highlighted or centered and vice-versa, when clicking on a table-entry, the corresponding point in the plot should be highlighted.

I have seen that CRAN contains various R-packages for SVG-based output of interactive graphics (with hyperlinks and tool-tip annotations for each data point); however, SVG is not supported by all browsers. Is anybody aware of another solution for this problem (maybe based on image-maps and javascript)?

If you have alternative ideas for interlinking tabular annotations with plotted data points, I would appreciate any recommendation/suggestion. (I work with R 2.8.1 on different 32-bit PCs with both Linux and Windows operating systems).

Many thanks,
Rainer
Re: [R] R-Help
On Sat, Feb 6, 2010 at 2:46 PM, David Winsemius dwinsem...@comcast.net wrote:
> On Feb 6, 2010, at 3:29 PM, Ravi Ramaswamy wrote:
>> Hi - I am not familiar with R. Could I ask you a quick question?
>> When I read a file like this, I get an error. Not sure what I am
>> doing wrong. I use a MAC. How do I specify a full path name for a
>> file in R? Or do files have to reside locally?
>>
>> KoreaAuto <- read.table(""/Users/

Especially when just starting to use R, the simplest approach is

  KoreaAuto <- read.table(file.choose())

which brings up a file chooser panel so you can point and click your way to the desired file.

If the file is tab-delimited, as appears to be the case in the file you enclosed, you may want to use read.delim instead of read.table. The read.delim function sets up the defaults for the many optional arguments to read.table specifically for tab-delimited files with a header line of column names, as you have shown.

> I think the opening and closing quotes meant that you supplied an
> empty string to the file argument.
>
>> raviramaswamy/Documents/Rutgers/STT 586/HW1 Data.txt"")
>> Error: unexpected numeric constant in "KoreaAuto <-
>> read.table(""/Users/raviramaswamy/Documents/Rutgers/STT 586"
>
> Using single instances of either sort of quote ( " or ' ) on the ends
> of strings should work. If you drag a file from a Finder window to
> the R console you should get a fully specified file path and name.
>
>> Seems like the working directory is
>>
>>   getwd()
>>   [1] "/Users/raviramaswamy"
>
>   rd <- read.table(file="/Users/davidwinsemius/Downloads/meminfo.csv",
>                    sep=",", header=TRUE)
>   rd
>     time     RSS     VSZ MEM
>   1    1 3027932 3141808 4.5
>   2    2 3028572 3141808 4.5
>   3    3 3030208 3141808 4.5
>   4    4     302 3150004 4.5
>   5    5 3035036 3150004 4.5
>
> You can also shorten the Users/username part to "~":
>
>   rd <- read.table(file="~/Downloads/meminfo.csv", sep=",", header=TRUE)
>
>> so I said this and still got an error
>>
>>   KoreaAuto <- read.table(/Documents/Rutgers/HW1Data)
>>   Error: unexpected '/' in "KoreaAuto <- read.table(/"
>
> But using no quotes will definitely not work. (And that was not a
> full path name anyway.)
>
>> Could someone please help me with the correct syntax?
>>
>> Thanks
>> Ravi
>>
>>    Year AO    GNP  CP   OP
>> 01 1974 .0022  183 2322 189
>> 02 1975 .0024  238 2729 206
>> 03 1976 .0027  319 3069 206
>> 04 1977 .0035  408 2763 190
>> 05 1978 .0050  540 2414 199
>> 06 1979 .0064  676 2440 233
>> 07 1980 .0065  785 2430 630
>> 08 1981 .0069  944 2631 740
>> 09 1982 .0078 1036 3155 740
>> 10 1983 .0095 1171 3200 660
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
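To make the quoting advice concrete, here is a small sketch that writes the first three data rows from the question to a temporary file whose name contains a space, then reads it back with a single pair of quotes around the whole path. The temporary location is an assumption made so the example runs anywhere; the real file would live under the user's Documents folder.

```r
# A path containing a space, as in "HW1 Data.txt"; one pair of quotes is enough
path <- file.path(tempdir(), "HW1 Data.txt")

# First three rows of the data shown in the question
writeLines(c("Year AO GNP CP OP",
             "01 1974 .0022 183 2322 189",
             "02 1975 .0024 238 2729 206",
             "03 1976 .0027 319 3069 206"),
           path)

# The header has one field fewer than the data rows, so read.table()
# uses the first column as row names
KoreaAuto <- read.table(path, header = TRUE)
print(KoreaAuto)

# "~" is shorthand for the home directory; path.expand() shows the full form
print(path.expand("~"))
```

For a tab-delimited file, `read.delim(path)` would be the analogous call.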
Re: [R] Non-linear regression
It appears my suspicions about this being homework were unfounded. Given the additional problems with excess zeroes, you may want to examine the extremely informative material on the analysis of such problems written by Zeileis, Kleiber and Jackman (easily found, in case you have misplaced it as I had, with a Google search for: r-project zero-inflated hurdle models):

Regression Models for Count Data in R
http://cran.cnr.berkeley.edu/web/packages/pscl/vignettes/countreg.pdf

-- 
David.

On Feb 6, 2010, at 10:56 PM, kupz wrote:
> Agreed, it would be simple to propose the relationship; however, the
> regression is necessary to model the data properly. Unfortunately a
> simple decay based on those two points does not have the necessary
> shape. This is due to an extreme amount of zero inflation in this
> fisheries data. On another note, I have a working solution for the
> problem: I am excluding a portion of the zero data based on some other
> a priori assumptions. Thanks for your help though.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] metafor package: effect sizes are not fully independent
Dear Gang,

Here are just some general thoughts. Wolfgang Viechtbauer will be in a better position to answer questions related to metafor.

For multivariate effect sizes, we first have to estimate the asymptotic sampling covariance matrix among the effect sizes. Formulas for some common effect sizes are provided by Gleser and Olkin (2009).

If a fixed-effects model is required, it is quite easy to write your own GLS function to conduct the multivariate meta-analysis (see e.g., Becker, 1992). If a random-effects model is required, it is more challenging in R. SAS Proc MIXED can do the work (e.g., van Houwelingen, Arends, & Stijnen, 2002).

Sometimes, it is possible to transform the multivariate effect sizes into independent effect sizes (Kalaian & Raudenbush, 1996; Raudenbush, Becker, & Kalaian, 1988). Then a univariate meta-analysis, e.g., with metafor, can be performed on the transformed effect sizes. This approach works if it makes sense to pool the multivariate effect sizes, as in your case (2), where the effect sizes are the same but measured in different conditions (happy, sad, and neutral). However, this approach does not work if the multivariate effect sizes measure different concepts, e.g., verbal achievement and mathematical achievement.

Hope this helps.

Becker, B. J. (1992). Using results from replicated studies to estimate linear models. Journal of Educational Statistics, 17, 341-362.

Gleser, L. J., & Olkin, I. (2009). Stochastically dependent effect sizes. In H. Cooper, L. V. Hedges, and J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis, 2nd edition (pp. 357-376). New York: Russell Sage Foundation.

Kalaian, H. A., & Raudenbush, S. W. (1996). A multivariate mixed linear model for meta-analysis. Psychological Methods, 1, 227-235.

Raudenbush, S. W., Becker, B. J., & Kalaian, H. (1988). Modeling multivariate effect sizes. Psychological Bulletin, 103, 111-120.

van Houwelingen, H. C., Arends, L. R., & Stijnen, T. (2002). Advanced methods in meta-analysis: multivariate approach and meta-regression. Statistics in Medicine, 21, 589-624.

Regards,
Mike

-- 
Mike W.L. Cheung
Phone: (65) 6516-3702
Department of Psychology
Fax: (65) 6773-1843
National University of Singapore
http://courses.nus.edu.sg/course/psycwlm/internet/

On Sat, Feb 6, 2010 at 6:07 AM, Gang Chen gangch...@gmail.com wrote:
> In a classical meta analysis model y_i = X_i * beta_i + e_i, data
> {y_i} are assumed to be independent effect sizes. However, I'm
> encountering the following two scenarios:
>
> (1) Each source has multiple effect sizes, thus {y_i} are not fully
> independent of each other.
>
> (2) Each source has multiple effect sizes, and each effect size from a
> source can be categorized as one level of a factor (e.g., happy, sad,
> and neutral). Maybe it is better to denote the data as y_ij, the
> effect size at the j-th level from the i-th source. I can code the
> levels with dummy variables in the X_i matrix, but apparently the data
> from the same source are correlated with each other. In this case, I
> would like to run a few tests, one of which is, for example, whether
> there is any difference across all the levels of the factor.
>
> Can metafor handle these two cases?
>
> Thanks,
> Gang
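The fixed-effects GLS estimator mentioned above is short enough to write down directly: beta = (X' V^-1 X)^-1 X' V^-1 y, with (X' V^-1 X)^-1 as its covariance matrix. The sketch below is only an illustration under stated assumptions: `gls_meta` is a made-up helper name, the effect sizes and the within-study covariance matrix are invented numbers, and V is treated as known, whereas in practice it would be estimated with formulas such as those in Gleser and Olkin (2009).

```r
# Hypothetical helper: fixed-effects GLS for dependent effect sizes
gls_meta <- function(y, X, V) {
  Vinv   <- solve(V)                      # inverse of the sampling covariance matrix
  XtVinv <- t(X) %*% Vinv
  vcov_b <- solve(XtVinv %*% X)           # covariance matrix of the estimates
  beta   <- drop(vcov_b %*% XtVinv %*% y) # GLS estimates
  list(beta = beta, se = sqrt(diag(vcov_b)), vcov = vcov_b)
}

# Invented data: 2 studies, each reporting happy / sad / neutral effect sizes
y    <- c(0.30, 0.10, 0.20,   0.45, 0.05, 0.25)
cond <- factor(rep(c("happy", "sad", "neutral"), 2),
               levels = c("happy", "sad", "neutral"))
X    <- model.matrix(~ 0 + cond)          # one dummy column per condition

# Assumed within-study covariance: variance 0.04, covariance 0.02;
# studies are independent, so V is block diagonal
Vi <- matrix(0.02, 3, 3); diag(Vi) <- 0.04
V  <- kronecker(diag(2), Vi)

fit <- gls_meta(y, X, V)
print(fit$beta)

# Wald test of "no difference across conditions" via two contrasts
L <- rbind(c(1, -1, 0), c(0, 1, -1))
Q <- drop(t(L %*% fit$beta) %*% solve(L %*% fit$vcov %*% t(L)) %*% (L %*% fit$beta))
p_value <- pchisq(Q, df = nrow(L), lower.tail = FALSE)
print(c(Q = Q, p = p_value))
```

Because both studies share the same covariance block here, the estimates reduce to the per-condition means; with unequal blocks the weighting would differ.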
Re: [R] predicting with stl() decomposition
Hi,

yes, that warning name is indeed kind of weird. I think it is thrown because the estimation is not robust: every weight is one, so the fit is likely to be influenced by outliers in the provided data, which are only meant as an example.

But do you have an idea how to extract the individual components of the fit? I guess there must be a way to predict from those stl models.

Cheers,
Konrad

From: Dennis Murphy [mailto:djmu...@gmail.com]
Sent: Sunday, 7 February 2010 16:30
To: Konrad Hoppe
Subject: Re: [R] predicting with stl() decomposition

> Hi:
>
> When I ran your code, I got the following message in the first loess
> call:
>
>   > llrSaison <- loess(seriesTs~time , span=decomp$win[1] ,
>   + degree=decomp$deg[1])
>   Warning messages:
>   1: Chernobyl! trL<k 1
>   2: Chernobyl! trL<k 1
>
> Somebody has a sense of humor in their code writing, but I'm pretty
> sure the message trL < k 1 has some meaning, probably telling you the
> fit is unstable. I looked through the loess function code but couldn't
> find anything in it that would be immediately helpful. It calls a
> function simpleLoess(), but that function is evidently non-visible.
> You probably need expert guidance here.
>
> Dennis
>
> On Sun, Feb 7, 2010 at 5:48 AM, Konrad Hoppe konradho...@hotmail.de wrote:
>> [original message quoted in full]
[R] Reading hierarchical data
I would like to read the following hierarchical data set. There is a family record followed by one or more personal records. If col. 7 is 1 it is a family record. If it is 2 it is a personal record.

The family record is formatted as follows:

  col. 1-5   family id
  col. 7     1
  col. 9     dwelling type code

The personal record is formatted as follows:

  col. 1-5   personal id
  col. 7     2
  col. 8-9   age
  col. 11    sex code

The first six family and accompanying personal records look like this:

06470 1 1
    1 232 0
    2 230 1
07470 1 0
    1 240 1
08470 1 0
    1 227 0
09470 1 0
    1 213 1
    2 222 0
    3 224 1
10470 1 1
    1 220 0
    2 211 1
11470 1 0
    1 217 0
    2 210 1
    3 226 1

I want to create a dataset containing

  . family ID
  . dwelling code
  . person ID
  . age
  . sex code

The dataset will contain one observation per person, with the family information repeated for people in the same family.

Can anyone help?

Thanks,
Richard Saba
Re: [R] predicting with stl() decomposition
Hi Dennis,

I've already found that matrix, but I want to predict the time series, for example with predict.loess() on every component. At the moment, though, I'm unable to extract the plotted series (trend and seasonal) as a loess object. That representation is what I'm looking for. Right now I don't see any way to predict the individual components other than with the loess prediction. Do you have an idea how to extract this representation and not just the data?

Thanks in advance.
Konrad

From: Dennis Murphy [mailto:djmu...@gmail.com]
Sent: Sunday, 7 February 2010 17:08
To: Konrad Hoppe
Subject: Re: [R] predicting with stl() decomposition

> Hi:
>
>   > str(decomp)
>   List of 8
>    $ time.series: mts [1:300, 1:3] 0.0928 0.2906 -0.0852 -0.1877 0.0347 ...
>     ..- attr(*, "dimnames")=List of 2
>     .. ..$ : NULL
>     .. ..$ : chr [1:3] "seasonal" "trend" "remainder"
>     ..- attr(*, "tsp")= num [1:3] 1980 2005 12
>     ..- attr(*, "class")= chr [1:2] "mts" "ts"
>    $ weights    : num [1:300] 1 1 1 1 1 1 1 1 1 1 ...
>    $ call       : language stl(x = seriesTs, s.window = "periodic")
>    $ win        : Named num [1:3] 3001 19 13
>     ..- attr(*, "names")= chr [1:3] "s" "t" "l"
>    $ deg        : Named int [1:3] 0 1 1
>     ..- attr(*, "names")= chr [1:3] "s" "t" "l"
>    $ jump       : Named num [1:3] 301 2 2
>     ..- attr(*, "names")= chr [1:3] "s" "t" "l"
>    $ inner      : int 2
>    $ outer      : int 0
>    - attr(*, "class")= chr "stl"
>
> This tells you decomp$time.series is a matrix with columns 'seasonal',
> 'trend' and 'remainder', respectively. You can extract that and go
> from there.
>
> HTH,
> Dennis
>
> On Sun, Feb 7, 2010 at 7:55 AM, Konrad Hoppe konradho...@hotmail.de wrote:
>> [earlier messages in this thread quoted in full]
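Since stl() returns the fitted components as plain series rather than as loess objects, one way forward is to extrapolate each component separately and add the pieces. The sketch below is only one simple possibility and is not a method from the stats package: the seasonal part is repeated (it is exactly periodic when s.window = "periodic"), and the trend is extended with a straight line fitted by lm(), which is an assumption that may be too crude for real data. The horizon `h` and the other names are made up for the example.

```r
set.seed(42)                               # reproducible toy data
n <- 300
noise <- rnorm(n, 0, 2)
series <- numeric(n)
series[1] <- noise[1]
for (i in 2:n) {
  series[i] <- 0.5 * series[i - 1] + noise[i] + 0.01 * i
}
seriesTs <- ts(series, start = c(1980, 1), frequency = 12)
decomp <- stl(seriesTs, s.window = "periodic")

# Components come out of the time.series matrix by column name
seasonal <- decomp$time.series[, "seasonal"]
trend    <- decomp$time.series[, "trend"]

h <- 24                                    # forecast horizon in months (assumed)

# Seasonal part: repeat the last full cycle
seasonal_fc <- rep(tail(as.numeric(seasonal), 12), length.out = h)

# Trend part: straight-line extrapolation of the fitted trend (an assumption)
trend_df  <- data.frame(idx = seq_len(n), trend = as.numeric(trend))
trend_fit <- lm(trend ~ idx, data = trend_df)
trend_fc  <- as.numeric(predict(trend_fit, newdata = data.frame(idx = n + seq_len(h))))

# Combine into a forecast series that starts right after the observed data
fc <- ts(seasonal_fc + trend_fc,
         start = tsp(seriesTs)[2] + 1 / frequency(seriesTs),
         frequency = frequency(seriesTs))
print(fc)
```

The remainder is left out, which amounts to forecasting it as zero; modelling it separately (for example with arima()) would be a natural refinement.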
Re: [R] Reading hierarchical data
Will this do it for you:

input <- readLines(textConnection("06470 1 1
    1 232 0
    2 230 1
07470 1 0
    1 240 1
08470 1 0
    1 227 0
09470 1 0
    1 213 1
    2 222 0
    3 224 1
10470 1 1
    1 220 0
    2 211 1
11470 1 0
    1 217 0
    2 210 1
    3 226 1"))
closeAllConnections()
fid <- NULL
dwell <- NULL
result <- do.call(rbind, lapply(input, function(.line){
    values <- as.integer(substring(.line, c(1, 7, 9), c(5, 7, 9)))
    # assume family record
    if (values[2] == '1'){
        # remember the family fields for the person records that follow
        fid <<- values[1]
        dwell <<- values[3]
        return(NULL)
    } else {
        values <- as.integer(substring(.line, c(1, 7, 8, 11), c(5, 7, 9, 11)))
        return(c(fid=fid, dwell=dwell, pid=values[1], age=values[3], sex=values[4]))
    }
}))
result
        fid dwell pid age sex
 [1,]  6470     1   1  32   0
 [2,]  6470     1   2  30   1
 [3,]  7470     0   1  40   1
 [4,]  8470     0   1  27   0
 [5,]  9470     0   1  13   1
 [6,]  9470     0   2  22   0
 [7,]  9470     0   3  24   1
 [8,] 10470     1   1  20   0
 [9,] 10470     1   2  11   1
[10,] 11470     0   1  17   0
[11,] 11470     0   2  10   1
[12,] 11470     0   3  26   1

On Sun, Feb 7, 2010 at 10:57 AM, Saba(Home) saba...@charter.net wrote:

I would like to read the following hierarchical data set. There is a family record followed by one or more personal records. If col. 7 is 1 it is a family record. If it is 2 it is a personal record.

The family record is formatted as follows:
col. 1-5  family id
col. 7    1
col. 9    dwelling type code

The personal record is formatted as follows:
col. 1-5  personal id
col. 7    2
col. 8-9  age
col. 11   sex code

The first six family and accompanying personal records look like this:

06470 1 1
    1 232 0
    2 230 1
07470 1 0
    1 240 1
08470 1 0
    1 227 0
09470 1 0
    1 213 1
    2 222 0
    3 224 1
10470 1 1
    1 220 0
    2 211 1
11470 1 0
    1 217 0
    2 210 1
    3 226 1

I want to create a dataset containing
• family ID
• dwelling code
• person ID
• age
• sex code

The dataset will contain one observation per person, with the family information repeated for people in the same family. Can anyone help?
Thanks, Richard Saba
-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
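The same column layout can also be handled without a global assignment. The following is a minimal vectorized sketch, assuming the fixed-width positions given in the question; the five-line sample and the names lines, fam and result are illustrative only.

```r
# Sample in the layout described above: col 7 is "1" for a family record
# and "2" for a personal record (whitespace padding is significant).
lines <- c(
  "06470 1 1",
  "    1 232 0",
  "    2 230 1",
  "07470 1 0",
  "    1 240 1"
)

is_family  <- substr(lines, 7, 7) == "1"
family_idx <- cumsum(is_family)          # which family each line belongs to

# One row per family record
fam <- data.frame(
  fid   = as.integer(substr(lines[is_family], 1, 5)),
  dwell = as.integer(substr(lines[is_family], 9, 9))
)

# One row per person, with the family fields repeated
pers <- !is_family
result <- data.frame(
  fid   = fam$fid[family_idx[pers]],
  dwell = fam$dwell[family_idx[pers]],
  pid   = as.integer(substr(lines[pers], 1, 5)),
  age   = as.integer(substr(lines[pers], 8, 9)),
  sex   = as.integer(substr(lines[pers], 11, 11))
)
result
```

Using cumsum() on the family-record flag gives every line the index of the most recent family record, which replaces the running fid/dwell state in the lapply() version.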
[R] conditioned xyplot, many y variables
The example below creates parallel time-series plots of three different y variables conditioned by a dichotomous factor. In the graphical layout,
• Each y variable inhabits its own row and is plotted on its own distinct scale.
• Each level of the factor has its own column, but within each row the scale is held constant across columns.
• The panels fit tightly (as they do in lattice) without superfluous whitespace or ticks.

Currently I know of no lattice solution to this problem, only a traditional graphics solution. Can one solve this problem elegantly using lattice? The difficulty is to lock the levels of the factor (the columns) into the same scale for each y variable (for each row), while allowing the scales to differ between the y variables (between the rows).

Details:

# Toy data:
N<-15
TIME <- (1:N)/N
ppp <- TIME^2
QQQ <- exp(TIME)
z <- ppp / QQQ
JUNK<-data.frame( ppp=ppp, QQQ=QQQ, z=z, TIME=TIME)
JUNK$ID<-1
jank<-JUNK
jank$ID<-2
jank$ppp<-jank$ppp / 2
jank$QQQ<-jank$QQQ / 2
jank$z<-jank$ppp/jank$QQQ
JUNK<-rbind(JUNK, jank)
jank<-JUNK
jank$ppp<-(jank$ppp) ^(1/4)
jank$QQQ<-(jank$QQQ) / 10
jank$z <- jank$ppp / jank$QQQ
JUNK$Species<-"Dog"
jank$Species<-"feline"
JUNK<-rbind(JUNK, jank)
JUNK$Species<-factor(JUNK$Species)
JUNK$ID<-factor(JUNK$ID)
summary(JUNK)

# Traditional graphics solution:
par(mfrow=c(3,2),mar=c(0,0,0,0)+0.0,oma=c(4,4,4,1),xpd=FALSE, las=0)
varNamesAndLabels<-data.frame(
  name=c("z", "QQQ", "ppp")
  , label=c("z (mIU/mL)", "QQQ (pg/L)", "ppp (mg/L)")
)
rownames( varNamesAndLabels)<- varNamesAndLabels$name
count_y_variables<-0
for(this_y_name in rownames( varNamesAndLabels) ) {
  count_y_variables <- count_y_variables + 1
  countSpecies<-0
  for(thisSpecies in levels(JUNK$Species)) {
    countSpecies<-countSpecies + 1
    TEMPORARY<-JUNK[JUNK$Species==thisSpecies,]
    if(countSpecies==1) {
      plot(JUNK$TIME, JUNK[[this_y_name]], xlab="", ylab="", type="n",xaxt='n', log="y")
      mtext( varNamesAndLabels[this_y_name,"label"], side=2, line=2.5)
    } else plot(JUNK$TIME, JUNK[[this_y_name]] , xlab="", ylab="", type="n",xaxt='n', log="y", yaxt="n")
    for( thisID in levels(TEMPORARY$ID)) {
      lines(TEMPORARY$TIME[TEMPORARY$ID==thisID], TEMPORARY[[this_y_name]][TEMPORARY$ID==thisID], type="o")
    }
    if(count_y_variables == nrow(varNamesAndLabels)) mtext( thisSpecies, side=1, line=2.5)
  }
}

library(lattice)

# The three lattice partial solutions below differ only in the value of scales$y$relation.

# scales$y$relation="same"
# forces ppp, QQQ, and z to the same scale, which obscures signal,
# especially for ppp. But at least it enables us to see that the range of QQQ
# differs immensely between Dog and feline.
xyplot ( ppp + QQQ + z ~ TIME | Species
  , group=ID , data=JUNK
  , ylab=c("ppp (mg/L)", "QQQ (pg/L)", "z (mIU/mL)")
  , xlab=c("Dog", "feline")
  , type="o" , strip= FALSE , outer=TRUE , layout=c(2,3)
  , scales=list( ppp=list( alternating=3)
    , y=list( relation="same" , alternating=3 , rot=0 , log=T ) )
)

# scales$y$relation="free"
# displays ppp, QQQ, and z on different scales, but it also allows
# the scales for each variable to differ between Dog and feline.
# This prevents us from visually comparing the species.
xyplot ( ppp + QQQ + z ~ TIME | Species
  , group=ID , data=JUNK
  , ylab=c("ppp (mg/L)", "QQQ (pg/L)", "z (mIU/mL)")
  , xlab=c("Dog", "feline")
  , type="o" , strip= FALSE , outer=TRUE , layout=c(2,3)
  , scales=list( ppp=list( alternating=3)
    , y=list( relation="free" , alternating=3 , rot=0 , log=T ) )
)

# scales$y$relation="sliced"
# shows us that the difference max(z)-min(z) differs greatly between
# Dog and feline. But it obscures the fact that
# QQQ differs wildly between Dog and feline, as we saw when
# relation="same".
xyplot ( ppp + QQQ + z ~ TIME | Species
  , group=ID , data=JUNK
  , ylab=c("ppp (mg/L)", "QQQ (pg/L)", "z (mIU/mL)")
  , xlab=c("Dog", "feline")
  , type="o" , strip= FALSE , outer=TRUE , layout=c(2,3)
  , scales=list( ppp=list( alternating=3)
    , y=list( relation="sliced" , alternating=3 , rot=0 , log=T )
[R] Why does aggregate fail?
I am trying to get hourly totals, given 15-minute bins.

s = seq(0, 95, 1)
s = floor(s/4) # 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 . . .
s
 [1]  0  0  0  0  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5  6
[26]  6  6  6  7  7  7  7  8  8  8  8  9  9  9  9 10 10 10 10 11 11 11 11 12 12
[51] 12 12 13 13 13 13 14 14 14 14 15 15 15 15 16 16 16 16 17 17 17 17 18 18 18
[76] 18 19 19 19 19 20 20 20 20 21 21 21 21 22 22 22 22 23 23 23 23
mode(d)
[1] "list"
d
        0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Sunday  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0
       27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Sunday  0  0  0  0  0  5  5  5  0  0  0  0  0  0  0  6  0  3  1  0  1  6  8  9
       51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
Sunday  7  9 10  5  0  1  0  1  1  1  0  0  0  1  0  0  7 10  9  9 11 11  8  8
       75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
Sunday 10  7  6  7  8  7  4  4  6  5  5  5  5  0  0  0  1  6  2  3  0
x = aggregate(d, by=list(s), FUN=sum)
Error in FUN(X[[1L]], ...) : arguments must have same length
length(s)
[1] 96
length(d)
[1] 96

What am I doing wrong? Thanks in advance list, Jim Rome
Re: [R] Why does aggregate fail?
On Feb 7, 2010, at 1:08 PM, James Rome wrote:

I am trying to get hourly totals, given 15-minute bins.
x = aggregate(d, by=list(s), FUN=sum)
Error in FUN(X[[1L]], ...) : arguments must have same length

I don't know what sort of error is occurring. You have not created a posting that easily lets us see what sort of object d really is. (And it is not being displayed as though it were a simple list.) dput(d) would have allowed us to see what sort of attributes it has. Your code works if one strips out the data and puts it into a vector.

s = seq(0, 95, 1)
s = floor(s/4)
d <- scan()
1: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0
26: 0 0
28: 0 0 0 0 0 5 5 5 0 0 0 0 0 0 0 6 0 3 1 0 1
49: 6 8 9
52: 7 9 10 5 0 1 0 1 1 1 0 0 0 1 0 0 7 10 9 9 11
73: 11 8 8
76: 10 7 6 7 8 7 4 4 6 5 5 5 5 0 0 0 1 6 2 3 0
97:
Read 96 items
x = aggregate(d, by=list(s), FUN=sum)
x
   Group.1  x
1        0  1
2        1  0
3        2  0
4        3  0
5        4  2
6        5  0
7        6  0
8        7  0
9        8 15
10       9  0
11      10  6
12      11  5
13      12 30
14      13 24
15      14  3
16      15  1
17      16  8
18      17 39
19      18 37
20      19 28
21      20 21
22      21 20
23      22  1
24      23 11

length(s) [1] 96 length(d) [1] 96 What am I doing wrong?
Thanks in advance list, Jim Rome
David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] Why does aggregate fail?
dput(d) structure(list(`0` = 0, `1` = 1, `2` = 0, `3` = 0, `4` = 0, `5` = 0, `6` = 0, `7` = 0, `8` = 0, `9` = 0, `10` = 0, `11` = 0, `12` = 0, `13` = 0, `14` = 0, `15` = 0, `16` = 0, `17` = 2, `18` = 0, `19` = 0, `20` = 0, `21` = 0, `22` = 0, `23` = 0, `24` = 0, `25` = 0, `26` = 0, `27` = 0, `28` = 0, `29` = 0, `30` = 0, `31` = 0, `32` = 5, `33` = 5, `34` = 5, `35` = 0, `36` = 0, `37` = 0, `38` = 0, `39` = 0, `40` = 0, `41` = 0, `42` = 6, `43` = 0, `44` = 3, `45` = 1, `46` = 0, `47` = 1, `48` = 6, `49` = 8, `50` = 9, `51` = 7, `52` = 9, `53` = 10, `54` = 5, `55` = 0, `56` = 1, `57` = 0, `58` = 1, `59` = 1, `60` = 1, `61` = 0, `62` = 0, `63` = 0, `64` = 1, `65` = 0, `66` = 0, `67` = 7, `68` = 10, `69` = 9, `70` = 9, `71` = 11, `72` = 11, `73` = 8, `74` = 8, `75` = 10, `76` = 7, `77` = 6, `78` = 7, `79` = 8, `80` = 7, `81` = 4, `82` = 4, `83` = 6, `84` = 5, `85` = 5, `86` = 5, `87` = 5, `88` = 0, `89` = 0, `90` = 0, `91` = 1, `92` = 6, `93` = 2, `94` = 3, `95` = 0), .Names = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95), row.names = Sunday, class = data.frame) On 2/7/2010 1:27 PM, David Winsemius wrote: On Feb 7, 2010, at 1:08 PM, James Rome wrote: I am trying to get hourly totals, given 15-minute bins. s = seq(0, 95, 1) s = floor(s/4) # 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 . . . 
s [1] 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 [26] 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 11 12 12 [51] 12 12 13 13 13 13 14 14 14 14 15 15 15 15 16 16 16 16 17 17 17 17 18 18 18 [76] 18 19 19 19 19 20 20 20 20 21 21 21 21 22 22 22 22 23 23 23 23 mode(d) [1] list d 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 Sunday 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 Sunday 0 0 0 0 0 5 5 5 0 0 0 0 0 0 0 6 0 3 1 0 1 6 8 9 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 Sunday 7 9 10 5 0 1 0 1 1 1 0 0 0 1 0 0 7 10 9 9 11 11 8 8 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 Sunday 10 7 6 7 8 7 4 4 6 5 5 5 5 0 0 0 1 6 2 3 0 x = aggregate(d, by=list(s), FUN=sum) Error in FUN(X[[1L]], ...) : arguments must have same length I don't know what sort of error is occurring. You have not created a posting that easily lets us see what sort of object d really is. (And it is not being display as though it were a simple list.) dput(d) would have allowed us to see what sort of attributes it has. Your code works if one strips out the data and puts it into a vector. s = seq(0, 95, 1) s = floor(s/4) d - scan() 1: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 26: 0 0 28: 0 0 0 0 0 5 5 5 0 0 0 0 0 0 0 6 0 3 1 0 1 49: 6 8 9 52: 7 9 10 5 0 1 0 1 1 1 0 0 0 1 0 0 7 10 9 9 11 73: 11 8 8 76: 10 7 6 7 8 7 4 4 6 5 5 5 5 0 0 0 1 6 2 3 0 97: Read 96 items x = aggregate(d, by=list(s), FUN=sum) x Group.1 x 10 1 21 0 32 0 43 0 54 2 65 0 76 0 87 0 98 15 10 9 0 11 10 6 12 11 5 13 12 30 14 13 24 15 14 3 16 15 1 17 16 8 18 17 39 19 18 37 20 19 28 21 20 21 22 21 20 23 22 1 24 23 11 length(s) [1] 96 length(d) [1] 96 What am I doing wrong? 
Thanks in advance list, Jim Rome
David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] Why does aggregate fail?
You have a dataframe with 96 columns and a single row named Sunday. My guess is that was not your intent. How did d come to exist? -- David. On Feb 7, 2010, at 1:29 PM, James Rome wrote: dput(d) structure(list(`0` = 0, `1` = 1, `2` = 0, `3` = 0, `4` = 0, `5` = 0, `6` = 0, `7` = 0, `8` = 0, `9` = 0, `10` = 0, `11` = 0, `12` = 0, `13` = 0, `14` = 0, `15` = 0, `16` = 0, `17` = 2, `18` = 0, `19` = 0, `20` = 0, `21` = 0, `22` = 0, `23` = 0, `24` = 0, `25` = 0, `26` = 0, `27` = 0, `28` = 0, `29` = 0, `30` = 0, `31` = 0, `32` = 5, `33` = 5, `34` = 5, `35` = 0, `36` = 0, `37` = 0, `38` = 0, `39` = 0, `40` = 0, `41` = 0, `42` = 6, `43` = 0, `44` = 3, `45` = 1, `46` = 0, `47` = 1, `48` = 6, `49` = 8, `50` = 9, `51` = 7, `52` = 9, `53` = 10, `54` = 5, `55` = 0, `56` = 1, `57` = 0, `58` = 1, `59` = 1, `60` = 1, `61` = 0, `62` = 0, `63` = 0, `64` = 1, `65` = 0, `66` = 0, `67` = 7, `68` = 10, `69` = 9, `70` = 9, `71` = 11, `72` = 11, `73` = 8, `74` = 8, `75` = 10, `76` = 7, `77` = 6, `78` = 7, `79` = 8, `80` = 7, `81` = 4, `82` = 4, `83` = 6, `84` = 5, `85` = 5, `86` = 5, `87` = 5, `88` = 0, `89` = 0, `90` = 0, `91` = 1, `92` = 6, `93` = 2, `94` = 3, `95` = 0), .Names = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95), row.names = Sunday, class = data.frame) On 2/7/2010 1:27 PM, David Winsemius wrote: On Feb 7, 2010, at 1:08 PM, James Rome wrote: I am trying to get hourly totals, given 15-minute bins. s = seq(0, 95, 1) s = floor(s/4) # 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 . . . 
s [1] 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 [26] 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 11 12 12 [51] 12 12 13 13 13 13 14 14 14 14 15 15 15 15 16 16 16 16 17 17 17 17 18 18 18 [76] 18 19 19 19 19 20 20 20 20 21 21 21 21 22 22 22 22 23 23 23 23 mode(d) [1] list d 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 Sunday 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 Sunday 0 0 0 0 0 5 5 5 0 0 0 0 0 0 0 6 0 3 1 0 1 6 8 9 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 Sunday 7 9 10 5 0 1 0 1 1 1 0 0 0 1 0 0 7 10 9 9 11 11 8 8 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 Sunday 10 7 6 7 8 7 4 4 6 5 5 5 5 0 0 0 1 6 2 3 0 x = aggregate(d, by=list(s), FUN=sum) Error in FUN(X[[1L]], ...) : arguments must have same length I don't know what sort of error is occurring. You have not created a posting that easily lets us see what sort of object d really is. (And it is not being display as though it were a simple list.) dput(d) would have allowed us to see what sort of attributes it has. Your code works if one strips out the data and puts it into a vector. s = seq(0, 95, 1) s = floor(s/4) d - scan() 1: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 26: 0 0 28: 0 0 0 0 0 5 5 5 0 0 0 0 0 0 0 6 0 3 1 0 1 49: 6 8 9 52: 7 9 10 5 0 1 0 1 1 1 0 0 0 1 0 0 7 10 9 9 11 73: 11 8 8 76: 10 7 6 7 8 7 4 4 6 5 5 5 5 0 0 0 1 6 2 3 0 97: Read 96 items x = aggregate(d, by=list(s), FUN=sum) x Group.1 x 10 1 21 0 32 0 43 0 54 2 65 0 76 0 87 0 98 15 10 9 0 11 10 6 12 11 5 13 12 30 14 13 24 15 14 3 16 15 1 17 16 8 18 17 39 19 18 37 20 19 28 21 20 21 22 21 20 23 22 1 24 23 11 length(s) [1] 96 length(d) [1] 96 What am I doing wrong? 
Thanks in advance list, Jim Rome
David Winsemius, MD Heritage Laboratories West Hartford, CT
David Winsemius, MD Heritage Laboratories West Hartford, CT
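A small sketch of why the call fails on a one-row data frame, assuming the shape shown in the dput() output (1 row, 96 columns). aggregate() on a data frame groups rows, so the grouping vector has to match nrow(d), whereas length(d) counts columns. The toy values 0:95 are placeholders, not the thread's data.

```r
s <- floor(seq(0, 95) / 4)               # hour index for each 15-minute bin

# Placeholder data frame with the same shape as d: 1 row, 96 columns
d <- as.data.frame(matrix(0:95, nrow = 1,
                          dimnames = list("Sunday", as.character(0:95))))

nrow(d)     # number of observations aggregate() will try to group
length(d)   # number of columns, which is what matched length(s)

# Grouping 1 row by a 96-element vector fails
res <- tryCatch(aggregate(d, by = list(s), FUN = sum),
                error = function(e) e)

# Pulling the row out as a plain vector gives 96 observations to group
v <- unlist(d[1, ], use.names = FALSE)
x <- aggregate(v, by = list(hour = s), FUN = sum)
head(x)
```

The exact wording of the error may differ between R versions; the sketch only checks that an error is raised.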
Re: [R] (Another) Bates fortune?
On Fri, 5 Feb 2010, Peter Ehlers wrote:

I vote to 'fortunize' Doug Bates on "Hierarchical data sets: which software to use?":

"The widespread use of spreadsheets or SPSS data sets or SAS data sets which encourage the "single table with a gargantuan number of columns, most of which are missing data in most cases" approach to organization of longitudinal data is regrettable."

http://n4.nabble.com/Hierarchical-data-sets-which-software-to-use-td1458477.html#a1470430

Thanks, added to the devel-version on R-Forge. Z

-- Peter Ehlers University of Calgary
Re: [R] Why does aggregate fail?
On Feb 7, 2010, at 1:32 PM, David Winsemius wrote: You have a dataframe with 96 columns and a single row named Sunday. My guess is that was not your intent. How did d come to exist? But to answer your question: apply(d, 1, function(z) aggregate(z, by=list(s), FUN=sum) ) $Sunday Group.1 x 10 1 21 0 32 0 43 0 54 2 65 0 76 0 87 0 98 15 10 9 0 11 10 6 12 11 5 13 12 30 14 13 24 15 14 3 16 15 1 17 16 8 18 17 39 19 18 37 20 19 28 21 20 21 22 21 20 23 22 1 24 23 11 -- David. On Feb 7, 2010, at 1:29 PM, James Rome wrote: dput(d) structure(list(`0` = 0, `1` = 1, `2` = 0, `3` = 0, `4` = 0, `5` = 0, `6` = 0, `7` = 0, `8` = 0, `9` = 0, `10` = 0, `11` = 0, `12` = 0, `13` = 0, `14` = 0, `15` = 0, `16` = 0, `17` = 2, `18` = 0, `19` = 0, `20` = 0, `21` = 0, `22` = 0, `23` = 0, `24` = 0, `25` = 0, `26` = 0, `27` = 0, `28` = 0, `29` = 0, `30` = 0, `31` = 0, `32` = 5, `33` = 5, `34` = 5, `35` = 0, `36` = 0, `37` = 0, `38` = 0, `39` = 0, `40` = 0, `41` = 0, `42` = 6, `43` = 0, `44` = 3, `45` = 1, `46` = 0, `47` = 1, `48` = 6, `49` = 8, `50` = 9, `51` = 7, `52` = 9, `53` = 10, `54` = 5, `55` = 0, `56` = 1, `57` = 0, `58` = 1, `59` = 1, `60` = 1, `61` = 0, `62` = 0, `63` = 0, `64` = 1, `65` = 0, `66` = 0, `67` = 7, `68` = 10, `69` = 9, `70` = 9, `71` = 11, `72` = 11, `73` = 8, `74` = 8, `75` = 10, `76` = 7, `77` = 6, `78` = 7, `79` = 8, `80` = 7, `81` = 4, `82` = 4, `83` = 6, `84` = 5, `85` = 5, `86` = 5, `87` = 5, `88` = 0, `89` = 0, `90` = 0, `91` = 1, `92` = 6, `93` = 2, `94` = 3, `95` = 0), .Names = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95), row.names = Sunday, class = data.frame) On 2/7/2010 1:27 PM, David Winsemius wrote: On Feb 7, 2010, at 1:08 PM, 
James Rome wrote: I am trying to get hourly totals, given 15-minute bins. s = seq(0, 95, 1) s = floor(s/4) # 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 . . . s [1] 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 [26] 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 11 12 12 [51] 12 12 13 13 13 13 14 14 14 14 15 15 15 15 16 16 16 16 17 17 17 17 18 18 18 [76] 18 19 19 19 19 20 20 20 20 21 21 21 21 22 22 22 22 23 23 23 23 mode(d) [1] list d 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 Sunday 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 Sunday 0 0 0 0 0 5 5 5 0 0 0 0 0 0 0 6 0 3 1 0 1 6 8 9 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 Sunday 7 9 10 5 0 1 0 1 1 1 0 0 0 1 0 0 7 10 9 9 11 11 8 8 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 Sunday 10 7 6 7 8 7 4 4 6 5 5 5 5 0 0 0 1 6 2 3 0 x = aggregate(d, by=list(s), FUN=sum) Error in FUN(X[[1L]], ...) : arguments must have same length I don't know what sort of error is occurring. You have not created a posting that easily lets us see what sort of object d really is. (And it is not being display as though it were a simple list.) dput(d) would have allowed us to see what sort of attributes it has. Your code works if one strips out the data and puts it into a vector. s = seq(0, 95, 1) s = floor(s/4) d - scan() 1: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 26: 0 0 28: 0 0 0 0 0 5 5 5 0 0 0 0 0 0 0 6 0 3 1 0 1 49: 6 8 9 52: 7 9 10 5 0 1 0 1 1 1 0 0 0 1 0 0 7 10 9 9 11 73: 11 8 8 76: 10 7 6 7 8 7 4 4 6 5 5 5 5 0 0 0 1 6 2 3 0 97: Read 96 items x = aggregate(d, by=list(s), FUN=sum) x Group.1 x 10 1 21 0 32 0 43 0 54 2 65 0 76 0 87 0 98 15 10 9 0 11 10 6 12 11 5 13 12 30 14 13 24 15 14 3 16 15 1 17 16 8 18 17 39 19 18 37 20 19 28 21 20 21 22 21 20 23 22 1 24 23 11 length(s) [1] 96 length(d) [1] 96 What am I doing wrong? 
Thanks in advance list, Jim Rome
David Winsemius, MD Heritage Laboratories West Hartford, CT
David Winsemius, MD
Re: [R] Why does aggregate fail?
On 2/7/2010 1:32 PM, David Winsemius wrote:You have a dataframe with 96 columns and a single row named Sunday. My guess is that was not your intent. How did d come to exist? I was trying to make a simpler example. The actual code uses a data frame maxrdf: dput(maxrdf) structure(list(`0` = c(0, 1, 0, 3, 0, 2, 3), `1` = c(1, 1, 0, 2, 0, 1, 2), `2` = c(0, 1, 1, 1, 0, 2, 1), `3` = c(0, 1, 0, 1, 0, 1, 1), `4` = c(0, 1, 3, 2, 0, 1, 1), `5` = c(0, 0, 0, 1, 0, 1, 1), `6` = c(0, 0, 0, 1, 0, 1, 1), `7` = c(0, 0, 1, 0, 0, 0, 0), `8` = c(0, 1, 1, 0, 0, 2, 1), `9` = c(0, 0, 1, 2, 0, 3, 0 ), `10` = c(0, 1, 0, 0, 1, 2, 0), `11` = c(0, 0, 0, 0, 0, 0, 0), `12` = c(0, 0, 0, 0, 0, 0, 1), `13` = c(0, 0, 1, 0, 0, 0, 0), `14` = c(0, 0, 0, 0, 0, 0, 0), `15` = c(0, 0, 0, 1, 0, 0, 0), `16` = c(0, 1, 1, 1, 0, 1, 0), `17` = c(2, 1, 1, 2, 0, 0, 1), `18` = c(0, 1, 3, 4, 0, 4, 2), `19` = c(0, 0, 3, 4, 0, 4, 5), `20` = c(0, 0, 5, 3, 1, 0, 4), `21` = c(0, 0, 5, 5, 2, 0, 4), `22` = c(0, 0, 5, 7, 0, 0, 5), `23` = c(0, 0, 7, 9, 10, 0, 7), `24` = c(0, 0, 6, 8, 4, 5, 9), `25` = c(0, 0, 6, 4, 5, 4, 7), `26` = c(0, 0, 4, 6, 5, 4, 5), `27` = c(0, 0, 7, 9, 8, 5, 10), `28` = c(0, 2, 9, 13, 0, 5, 14), `29` = c(0, 2, 10, 11, 0, 9, 11), `30` = c(0, 3, 9, 8, 0, 8, 9), `31` = c(0, 5, 6, 7, 0, 3, 7), `32` = c(5, 7, 7, 5, 4, 5, 7), `33` = c(5, 6, 8, 5, 7, 6, 5), `34` = c(5, 4, 5, 5, 4, 5, 6), `35` = c(0, 4, 6, 4, 1, 3, 5), `36` = c(0, 6, 5, 5, 4, 7, 5), `37` = c(0, 6, 6, 6, 5, 7, 5), `38` = c(0, 8, 6, 6, 5, 4, 5), `39` = c(0, 6, 5, 3, 3, 4, 4), `40` = c(0, 5, 2, 5, 3, 3, 2), `41` = c(0, 4, 5, 3, 4, 3, 4), `42` = c(6, 5, 6, 0, 3, 2, 5), `43` = c(0, 7, 4, 0, 3, 2, 6), `44` = c(3, 7, 6, 6, 5, 8, 4), `45` = c(1, 8, 5, 3, 2, 5, 9), `46` = c(0, 8, 7, 7, 0, 6, 5), `47` = c(1, 8, 5, 4, 0, 8, 8), `48` = c(6, 5, 8, 0, 0, 4, 0), `49` = c(8, 6, 13, 7, 0, 8, 0), `50` = c(9, 7, 8, 7, 0, 7, 0), `51` = c(7, 7, 8, 10, 0, 6, 0), `52` = c(9, 1, 8, 0, 4, 5, 0), `53` = c(10, 0, 1, 0, 1, 0, 1), `54` = c(5, 0, 3, 0, 3, 0, 0), `55` = 
c(0, 0, 1, 1, 5, 0, 0), `56` = c(1, 0, 10, 5, 10, 1, 0), `57` = c(0, 0, 8, 6, 12, 0, 1), `58` = c(1, 0, 8, 4, 11, 0, 0), `59` = c(1, 0, 6, 5, 6, 0, 0), `60` = c(1, 0, 0, 1, 4, 0, 0), `61` = c(0, 0, 0, 4, 9, 0, 1), `62` = c(0, 0, 0, 2, 5, 0, 0), `63` = c(0, 0, 0, 5, 12, 0, 0), `64` = c(1, 0, 0, 0, 9, 1, 1), `65` = c(0, 0, 0, 1, 7, 0, 0), `66` = c(0, 6, 8, 3, 6, 3, 4), `67` = c(7, 7, 7, 3, 9, 5, 6), `68` = c(10, 6, 7, 0, 6, 6, 6), `69` = c(9, 10, 9, 5, 9, 7, 8), `70` = c(9, 9, 10, 6, 8, 7, 8), `71` = c(11, 10, 11, 7, 11, 8, 8), `72` = c(11, 0, 9, 6, 10, 7, 7), `73` = c(8, 0, 9, 7, 8, 9, 12), `74` = c(8, 3, 9, 0, 9, 9, 7), `75` = c(10, 4, 10, 0, 10, 7, 6), `76` = c(7, 0, 8, 8, 9, 9, 11), `77` = c(6, 0, 12, 8, 9, 10, 7), `78` = c(7, 0, 9, 5, 9, 9, 7), `79` = c(8, 0, 8, 7, 8, 8, 7), `80` = c(7, 11, 7, 8, 4, 8, 5), `81` = c(4, 6, 9, 5, 7, 7, 1), `82` = c(4, 6, 5, 10, 9, 10, 1), `83` = c(6, 5, 6, 7, 9, 7, 0), `84` = c(5, 0, 8, 3, 8, 8, 0), `85` = c(5, 1, 5, 6, 8, 7, 0), `86` = c(5, 2, 5, 3, 7, 7, 0), `87` = c(5, 3, 5, 6, 8, 7, 0), `88` = c(0, 4, 6, 0, 6, 6, 0), `89` = c(0, 1, 8, 0, 4, 7, 0), `90` = c(0, 0, 5, 0, 7, 2, 0), `91` = c(1, 0, 3, 0, 2, 5, 0), `92` = c(6, 0, 5, 0, 6, 5, 0), `93` = c(2, 0, 4, 0, 5, 5, 0), `94` = c(3, 0, 3, 0, 2, 6, 0), `95` = c(0, 0, 3, 0, 4, 4, 0)), .Names = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95), row.names = c(Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday), class = data.frame) maxrdf[1,] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 Sunday 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 Sunday 0 0 
0 0 0 5 5 5 0 0 0 0 0 0 0 6 0 3 1 0 1 6 8 9 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 Sunday 7 9 10 5 0 1 0 1 1 1 0 0 0 1 0 0 7 10 9 9 11 11 8 8 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 Sunday 10 7 6 7 8 7 4 4 6 5 5 5 5 0 0 0 1 6 2 3 0

And the code that fails is

ha = matrix(nrow=7, ncol=24)
colnames(ha) = as.character(c(0:23))
rownames(ha) = rownames(maxrdf)
for(j in 1:7) {
    x = aggregate(maxrdf[j,], by=list(c(s)), FUN=sum)
    ha[j,] = x[[2]]
}
Re: [R] Why does aggregate fail?
On 2/7/2010 1:35 PM, David Winsemius wrote:

But to answer your question:
apply(d, 1, function(z) aggregate(z, by=list(s), FUN=sum) )

David, That works, but I do not understand why I could not use aggregate directly. And the answer comes out as a list, which thus far baffles me. How do I get the answer as a matrix in my original code, which I modified to use apply?

ha = matrix(nrow=7, ncol=24)
colnames(ha) = as.character(c(0:23))
rownames(ha) = rownames(maxrdf)
for(j in 1:7) {
    x = apply(maxrdf[j,], 1, function(z) aggregate(z, by=list(s), FUN=sum) )
    ha[j,] = x[[1]][2]
}

Unfortunately, ha gets converted into a list, and then I can't use it for my plots. And you can probably educate me on how to get what I am aiming for (a matrix with the rows as the days, the columns as the hours, and the content as the hourly sum of the 15-minute chunks) without using the above for loop.

Thanks for the help, Jim
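To make the "comes out as a list" behaviour concrete, here is a minimal sketch assuming a 7 x 96 data frame shaped like maxrdf. The random counts and the name maxrdf_demo are stand-ins. apply() returns one data frame per day, and taking the x column of each one yields the day-by-hour matrix.

```r
s <- floor(seq(0, 95) / 4)
days <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")

set.seed(42)                              # stand-in data, not the thread's counts
m <- matrix(rpois(7 * 96, 3), nrow = 7,
            dimnames = list(days, as.character(0:95)))
maxrdf_demo <- as.data.frame(m)

# One data frame (columns Group.1 and x) per row, collected in a named list
res <- apply(maxrdf_demo, 1, function(z) aggregate(z, by = list(s), FUN = sum))
class(res)         # a list, one element per day
names(res)         # the day names

# Keep only the hourly sums and bind them into a 7 x 24 matrix
ha <- t(sapply(res, function(df) df$x))
colnames(ha) <- 0:23
```

Assigning a whole one-column data frame into a matrix row, as in ha[j,] = x[[1]][2], is what turns ha into a list; extracting the numeric column with $x or [[2]] avoids that.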
Re: [R] conditioned xyplot, many y variables
On Sun, Feb 7, 2010 at 9:32 AM, Jacob Wegelin jacobwege...@fastmail.fm wrote:

The example below creates parallel time-series plots of three different y variables conditioned by a dichotomous factor. In the graphical layout,
• Each y variable inhabits its own row and is plotted on its own distinct scale.
• Each level of the factor has its own column, but within each row the scale is held constant across columns.
• The panels fit tightly (as they do in lattice) without superfluous whitespace or ticks.
Currently I know of no lattice solution to this problem, only a traditional graphics solution. Can one solve this problem elegantly using lattice?

Yes, for some definition of "elegantly". See below.

The difficulty is to lock the levels of the factor (the columns) into the same scale for each y variable (for each row), while allowing the scales to differ between the y variables (between the rows).

This is not generally possible, as this makes sense only when rows and columns correspond to conditioning variables, which is not always true. (It is true for two conditioning variables with the default layout, but lattice does not treat that case specially.)

However, you can start with the relation="free" version, and (1) modify the limits to get same limits across rows, (2) remove the labels for the second column, and (3) remove the space allocated for those labels to get what you want:

## assign the trellis object to a variable for further manipulation
fplot <- xyplot ( ppp + QQQ + z ~ TIME | Species
  , group=ID , data=JUNK
  , ylab=c("ppp (mg/L)", "QQQ (pg/L)", "z (mIU/mL)")
  , xlab=c("Dog", "feline")
  , type="o" , strip= FALSE , outer=TRUE , layout=c(2,3)
  , scales=list( ppp=list( alternating=3)
    , y=list( relation="free" , alternating=3 , rot=0 , log=T ) )
)

## massage the limits (stored in fplot$y.limits) so that rows have the
## same limits. The limits are stored as a linear list, and it is
## useful to make it an array first.
str(fplot$y.limits)
dim(fplot$y.limits) <- dim(fplot)
for (i in seq_len(ncol(fplot$y.limits))) {
    rng <- range(unlist(fplot$y.limits[,i]))
    for (j in seq_len(nrow(fplot$y.limits)))
        fplot$y.limits[j, i][[1]] <- rng
}
str(fplot$y.limits)

## Next, drop the y-axis labels for the second column, and zap the
## space allocated for them.
update(fplot,
       scales = list(y = list(at = rep(list(NA, numeric(0)), 3))),
       par.settings = list(layout.widths = list(axis.panel = c(1, 0))))

(Maybe I should wrap this up in a helper function.)

-Deepayan
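The limit-massaging step can be tried in isolation, without lattice. In this sketch the list lims is a hypothetical stand-in for fplot$y.limits: six ranges in panel order (two species for each of three responses), reshaped to a 2 x 3 list array so that each array column holds the panels that share one y variable.

```r
# Hypothetical per-panel y ranges, in panel order: for each response,
# first the Dog panel, then the feline panel
lims <- list(c(1, 5), c(2, 9),        # response 1
             c(0.1, 0.4), c(0.2, 0.3),  # response 2
             c(10, 20), c(15, 40))    # response 3

# 2 species (layout columns) x 3 responses (layout rows)
dim(lims) <- c(2, 3)

# Give every panel of a response the combined range of that response
for (i in seq_len(ncol(lims))) {
  rng <- range(unlist(lims[, i]))
  for (j in seq_len(nrow(lims))) lims[[j, i]] <- rng
}

str(lims)
```

After the loop both panels of each response carry identical limits, which is what lets the second column's axis labels be dropped safely.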
Re: [R] Why does aggregate fail?
This works. But I wish I could write it without a lot of trial and error. :-(

    ha = matrix(nrow=7, ncol=24)
    colnames(ha) = as.character(c(0:23))
    rownames(ha) = rownames(maxrdf)
    m = as.matrix(maxrdf)
    for(j in 1:7) {
      x = aggregate(m[j,], by=list(s), FUN=sum)
      ha[j,] = x[[2]]
    }

On 2/7/2010 1:57 PM, James Rome wrote:
On 2/7/2010 1:35 PM, David Winsemius wrote:
But to answer your question:

    apply(d, 1, function(z) aggregate(z, by=list(s), FUN=sum) )

David, That works, but I do not understand why I could not use aggregate directly. And the answer comes out as a list, which thus far baffles me. How do I get the answer as a matrix in my original code, which I modified to use apply?

    ha = matrix(nrow=7, ncol=24)
    colnames(ha) = as.character(c(0:23))
    rownames(ha) = rownames(maxrdf)
    for(j in 1:7) {
      x = apply(maxrdf[j,], 1, function(z) aggregate(z, by=list(s), FUN=sum) )
      ha[j,] = x[[1]][2]
    }

Unfortunately, ha gets converted into a list, and then I can't use it for my plots. And you can probably educate me on how to get what I am aiming for (a matrix with the rows as the days, the columns as the hours, and the content as the hourly sum of the 15-minute chunks) without using the above for loop.

Thanks for the help,
Jim
Re: [R] Why does aggregate fail?
On Feb 7, 2010, at 1:57 PM, James Rome wrote: On 2/7/2010 1:35 PM, David Winsemius wrote:But to answer your question: apply(d, 1, function(z) aggregate(z, by=list(s), FUN=sum) ) David, That works, but I do not understand why I could not use aggregate directly. And the answer comes out as a list, which thus far baffles me. It comes out as a data.frame, ... just as promised in the help page. How do I get the answer as a matrix in my original code, which I modified to use apply? You could coerce either maxrdf or the aggregate returns to a matrix with as.matrix or data.matrix. ha = matrix(nrow=7, ncol=24) colnames(ha) = as.character(c(0:23)) rownames(ha) = rownames(maxrdf) for(j in 1:7) { x = apply(maxrdf[j,], 1, function(z) aggregate(z, by=list(s), FUN=sum) ) ha[j,] = x[[1]][2] } Unfortunately, ha gets converted into a list, and then I can't use it for my plots. And you can probably educate me on how to get what I am aiming for (a matrix with the rows as the days, the columns as the hours, and the content as the hourly sum of the 15-minute chunks) without using the above for loop. apply( data.matrix(maxrdf), 1 # loops over the rows function(z) aggregate(z, by=s, sum) ) #Gives you a bunch of dataframes produced by the serial application of aggregate. sapply(apply(maxrdf, 1, function(z) aggregate(z, by=list(s), sum) ), '[', 2) #Gives you a list of vectors by day...almost what you wanted ... # the ' [, 2' part is the extraction of the second column of the dataframe # And what I think you were asking for: do.call(rbind, sapply(apply(maxrdf, 1, function(z) aggregate(z, by=list(s), sum) ), '[', 2) ) # ... 
as a matrix with named rows [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] Sunday.x 10002000 15 0 6 53024 3 1 8 Monday.x 4120300 12 2126 213125 1 0 013 Tuesday.x 14218 22 23 34 2622 1723371332 015 Wednesday.x7421 11 24 27 39 1920 82024 12012 7 Thursday.x 00100 13 220 1617 13 7 013393031 Friday.x 637090 18 25 1922 102725 5 1 0 9 Saturday.x 73118 20 31 41 2319 1726 0 1 1 111 [,18] [,19] [,20] [,21] [,22] [,23] [,24] Sunday.x 3937282120 111 Monday.x 35 7 028 6 5 0 Tuesday.x 37373727232215 Wednesday.x1813283018 0 0 Thursday.x 34373529311917 Friday.x 28323632292020 Saturday.x 303232 7 0 0 0 Thanks for the help, Jim David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
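For the goal stated in this thread (a days-by-hours matrix of hourly sums), here is a minimal sketch of a loop-free alternative using base R's rowsum(). The data below are a simulated stand-in: the real maxrdf and s were never posted, so the 7 x 96 shape and the definition of s as an hour index are assumptions.

```r
# Hypothetical stand-in for the poster's data: 7 days x 96 quarter-hour counts
set.seed(1)
days <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")
maxrdf <- as.data.frame(matrix(rpois(7 * 96, 3), nrow = 7,
                               dimnames = list(days, NULL)))
s <- rep(0:23, each = 4)   # assumed: hour index of each 15-minute column

m <- as.matrix(maxrdf)

# rowsum() adds up rows by group, so transpose (quarter-hours become rows),
# sum within each hour, then transpose back to days x hours
ha <- t(rowsum(t(m), group = s))

dim(ha)        # days in rows, hours in columns
ha[, 1:4]      # first four hourly sums for each day
```

Because rowsum() returns a plain matrix, no coercion from a list or data frame is needed before plotting.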
Re: [R] convert R plots into annotated web-graphics
On Sun, Feb 7, 2010 at 2:35 PM, Rainer Tischler rainer_...@yahoo.de wrote: Dear all, I would like to make a large scatter plot created with R available as an interactive web graphic, in combination with additional text-annotations for each data point in the plot. The idea is to present the text-annotations in an HTML-table and inter-link the data points in the plot with their corresponding entries in the table, i.e. when clicking on a data point in the plot, the corresponding entry in the table should be highlighted or centered and vice-versa, when clicking on a table-entry, the corresponding point in the plot should be highlighted. I have seen that CRAN contains various R-packages for SVG-based output of interactive graphics (with hyperlinks and tool-tip annotations for each data point); however, SVG is not supported by all browsers. Is anybody aware of another solution for this problem (maybe based on image-maps and javascript)? If you have alternative ideas for interlinking tabular annotations with plotted data points, I would appreciate any recommendation/suggestion. (I work with R 2.8.1 on different 32-bit PCs with both Linux and Windows operating systems). My 'imagemaps' package? https://r-forge.r-project.org/projects/imagemap/ Barry __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] convert R plots into annotated web-graphics
On Sun, Feb 7, 2010 at 2:35 PM, Rainer Tischler rainer_...@yahoo.de wrote: If you have alternative ideas for interlinking tabular annotations with plotted data points, I would appreciate any recommendation/suggestion. (I work with R 2.8.1 on different 32-bit PCs with both Linux and Windows operating systems). As an alternative suggestion to my imagemap package, you could use a javascript chart plotting library and just generate a data file and the html from R. Maybe flot: http://code.google.com/p/flot/ I find the R 'brew' package ideal for creating JS or HTML output files from object. Warning: this answer contains small parts. Some assembly required. Barry -- blog: http://geospaced.blogspot.com/ web: http://www.maths.lancs.ac.uk/~rowlings web: http://www.rowlingson.com/ twitter: http://twitter.com/geospacedman pics: http://www.flickr.com/photos/spacedman __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] conditioned xyplot, many y variables
On Sun, Feb 7, 2010 at 11:32 AM, Jacob Wegelin jacobwege...@fastmail.fm wrote:
The example below creates parallel time-series plots of three different y variables conditioned by a dichotomous factor. In the graphical layout,
• Each y variable inhabits its own row and is plotted on its own distinct scale.
• Each level of the factor has its own column, but within each row the scale is held constant across columns.
• The panels fit tightly (as they do in lattice) without superfluous whitespace or ticks.
Currently I know of no lattice solution to this problem, only a traditional graphics solution. Can one solve this problem elegantly using lattice?

It's easy with ggplot2:

    library(ggplot2)
    JUNKm <- melt(JUNK, measure = c("ppp", "QQQ", "z"))
    ggplot(JUNKm, aes(TIME, value, group = ID)) +
      geom_line() +
      geom_point() +
      facet_grid(variable ~ Species, scales = "free_y") +
      scale_y_log10()

Hadley
-- http://had.co.nz/
[R] x-axis plot problem
Hi all,

I tried to plot several vectors in one plot and got a nice plot, but I have a problem with the x-axis. I want only the month and year (Jul.07 means July 2007) on the x-axis, without the other numbers appearing behind them. I would appreciate any help.

The R code:

    F <- c(7.49,6.91,6.78,6.99,7.44,7.42)
    M <- c(4.81,4.51,5.21,4.65,4.75,3.86)
    P <- c(7.49,15.03,15.19,15.32,15.42,15.45)
    B <- c(16.24,15.87,12.94,11.82,10.86,9.61)
    time <- c("Jul/07","Aug/07","Sep/07","Oct/07","Nov/07","Dec/07")
    model <- data.frame(F,M,P,B)
    row.names(model) <- c("Jul07","Aug07","Sep07","Oct07","Nov07","Dec007")
    model
    par(mgp=c(2, 1, 0), bty="o")
    matplot(model, pch = c(1,22,17,16), type = "o", lty=c(2,2,2,5),
            col = c("gray10","gray10","gray10","gray10"),
            xlab="Month-Year", ylab="Zinth",
            xaxs = "i", yaxs = "i", main="Model Output")
    legend("topleft", legend = c("F","M","P","B"),
           text.width = strwidth("1,000,000"), pch=c(1,22,17,16),
           col = c("gray10","gray10","gray10","gray10"), lty=c(2,2,2,5),
           xjust = 1, yjust = 1, bty="n", cex=0.8, ncol=2)
    axis(1, 1:6, row.names(model))

-- View this message in context: http://n4.nabble.com/x-axis-plot-problem-tp1472286p1472286.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] x-axis plot problem
On Feb 7, 2010, at 3:22 PM, abotaha wrote: Hi all, I tried to have plot of many vector in one plot and i have got a nice plot but i have problem with x-axis. I want to have month and year only(Jul.07 means July 2007) in x-axis without appearing other number behaind it. I'm going to assume that you did not want that period between the Mon and Yr since you did not include it in your label strings. I would appercit any help. The R code: F-c(7.49,6.91,6.78,6.99,7.44,7.42) M-c(4.81,4.51,5.21,4.65,4.75,3.86) P-c(7.49,15.03,15.19,15.32,15.42,15.45) B-c(16.24,15.87,12.94,11.82,10.86,9.61) time-c(Jul/07,Aug/07,Sep/07,Oct/07,Nov/07,Dec/07) model-data.frame(F,M,P,B) row.names(model)-c(Jul07,Aug07,Sep07,Oct07,Nov07,Dec007) model par(mgp=c(2, 1, 0),bty=o ) matplot(model, pch = c(1,22,17,16), type = o,lty=c(2,2,2,5), col =c(gray10, gray10,gray10,gray10),xlab=Month- Year,ylab=Zinth, xaxs = i, yaxs = i,main=Model Output) # Change the xaxs=i to xaxt=n to suppress the numbers 1:6 from being stuck under the labels you later lay down with the axis command. legend(topleft, legend = c(F, M,P,B),text.width = strwidth(1,000,000,),pch=c(1,22,17,16),col =c(gray10, gray10,gray10,gray10),lty=c(2,2,2,5), xjust = 1, yjust = 1, bty=n, cex=0.8, ncol=2) axis(1, 1:6, row.names(model)) -- View this message in context: http://n4.nabble.com/x-axis-plot-problem-tp1472286p1472286.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] x-axis plot problem
I think you just need to set axes=FALSE in your call to matplot(). You'll then need to add the y-axis manually --- do axis(2) in addition to your call which draws the x axis. You'll also need to do box() if you want a box around your graph. cheers, Rolf Turner P. S. You are clearly a Good Person! A relative newbie who has read the Posting Guide! :-) R. T. On 8/02/2010, at 9:22 AM, abotaha wrote: Hi all, I tried to have plot of many vector in one plot and i have got a nice plot but i have problem with x-axis. I want to have month and year only(Jul.07 means July 2007) in x-axis without appearing other number behaind it. I would appercit any help. The R code: F-c(7.49,6.91,6.78,6.99,7.44,7.42) M-c(4.81,4.51,5.21,4.65,4.75,3.86) P-c(7.49,15.03,15.19,15.32,15.42,15.45) B-c(16.24,15.87,12.94,11.82,10.86,9.61) time-c(Jul/07,Aug/07,Sep/07,Oct/07,Nov/07,Dec/07) model-data.frame(F,M,P,B) row.names(model)-c(Jul07,Aug07,Sep07,Oct07,Nov07,Dec007) model par(mgp=c(2, 1, 0),bty=o ) matplot(model, pch = c(1,22,17,16), type = o,lty=c(2,2,2,5), col =c(gray10, gray10,gray10,gray10),xlab=Month-Year,ylab=Zinth, xaxs = i, yaxs = i,main=Model Output) legend(topleft, legend = c(F, M,P,B),text.width = strwidth(1,000,000,),pch=c(1,22,17,16),col =c(gray10, gray10,gray10,gray10),lty=c(2,2,2,5), xjust = 1, yjust = 1, bty=n, cex=0.8, ncol=2) -- View this message in context: http://n4.nabble.com/x-axis-plot-problem-tp1472286p1472286.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ## Attention: This e-mail message is privileged and confidential. If you are not the intended recipient please delete the message and notify the sender. Any views or opinions presented are solely those of the author. 
Re: [R] x-axis plot problem
Hi abotaha,

Modify your matplot() call as

    matplot(model, pch = c(1,22,17,16), type = "o", lty=c(2,2,2,5),
            col = c("gray10","gray10","gray10","gray10"),
            xlab="Month-Year", ylab="Zinth",
            xaxt = "n", yaxs = "i", main="Model Output")

and then add

    axis(1, 1:6, time)

HTH,
Jorge

On Sun, Feb 7, 2010 at 3:22 PM, abotaha wrote:
Hi all, I tried to plot several vectors in one plot and got a nice plot, but I have a problem with the x-axis. I want only the month and year (Jul.07 means July 2007) on the x-axis, without the other numbers appearing behind them. I would appreciate any help.

The R code:

    F <- c(7.49,6.91,6.78,6.99,7.44,7.42)
    M <- c(4.81,4.51,5.21,4.65,4.75,3.86)
    P <- c(7.49,15.03,15.19,15.32,15.42,15.45)
    B <- c(16.24,15.87,12.94,11.82,10.86,9.61)
    time <- c("Jul/07","Aug/07","Sep/07","Oct/07","Nov/07","Dec/07")
    model <- data.frame(F,M,P,B)
    row.names(model) <- c("Jul07","Aug07","Sep07","Oct07","Nov07","Dec007")
    model
    par(mgp=c(2, 1, 0), bty="o")
    matplot(model, pch = c(1,22,17,16), type = "o", lty=c(2,2,2,5),
            col = c("gray10","gray10","gray10","gray10"),
            xlab="Month-Year", ylab="Zinth",
            xaxs = "i", yaxs = "i", main="Model Output")
    legend("topleft", legend = c("F","M","P","B"),
           text.width = strwidth("1,000,000"), pch=c(1,22,17,16),
           col = c("gray10","gray10","gray10","gray10"), lty=c(2,2,2,5),
           xjust = 1, yjust = 1, bty="n", cex=0.8, ncol=2)
    axis(1, 1:6, row.names(model))

-- View this message in context: http://n4.nabble.com/x-axis-plot-problem-tp1472286p1472286.html Sent from the R help mailing list archive at Nabble.com.
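The replies in this thread agree on one idea: suppress the default x-axis, then draw a labelled one. A minimal, self-contained sketch of that idea follows. It uses the poster's numbers, but the label vector `labs` and the simplified legend call are choices made here for illustration, not taken from any reply.

```r
# The poster's four series, one column each
model <- data.frame(
  F = c(7.49, 6.91, 6.78, 6.99, 7.44, 7.42),
  M = c(4.81, 4.51, 5.21, 4.65, 4.75, 3.86),
  P = c(7.49, 15.03, 15.19, 15.32, 15.42, 15.45),
  B = c(16.24, 15.87, 12.94, 11.82, 10.86, 9.61)
)
labs <- c("Jul07", "Aug07", "Sep07", "Oct07", "Nov07", "Dec07")  # illustrative labels

# xaxt = "n" stops matplot() from drawing the numeric 1..6 axis
matplot(as.matrix(model), type = "o", pch = c(1, 22, 17, 16),
        lty = c(2, 2, 2, 5), col = "gray10",
        xlab = "Month-Year", ylab = "Zinth",
        xaxt = "n", main = "Model Output")

# Draw the x-axis by hand with one label per observation
axis(1, at = seq_along(labs), labels = labs)

legend("topleft", legend = names(model), pch = c(1, 22, 17, 16),
       lty = c(2, 2, 2, 5), col = "gray10", bty = "n", cex = 0.8, ncol = 2)
```

Setting axes = FALSE instead, as also suggested in the thread, removes both axes and the box, so axis(2) and box() would then have to be added by hand.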
[R] Noval numbers
Dear everybody,

How can I transform numbers to a positional system with the base of, e.g., nine, and do further operations with them?

Thank you in advance.
Yours sincerely,
Mag. Ferri Leberl
Re: [R] Noval numbers
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

It would be useful if you could at least provide some examples of what you want to do. There are various ways of converting numbers back and forth. Are these integers or floating point? What type of operations do you want to do on them?

On Sun, Feb 7, 2010 at 4:25 PM, Mag. Ferri Leberl ferri.leb...@gmx.at wrote:
Dear everybody, How can I transform numbers to a positional system with the base of, e.g., nine, and do further operations with them? Thank you in advance. Yours sincerely, Mag. Ferri Leberl

-- Jim Holtman Cincinnati, OH +1 513 646 9390
What is the problem that you are trying to solve?
[R] using a variable name stored in another variable?
Hello,

I'm trying to figure out how to create a data object, and then save it with a user-defined name that is input as a command line argument. I know how to create the object and assign it the new name; however, I can't figure out how to refer to the new name for a future operation such as save().

The code below creates an object and uses assign() to give it the user-supplied name MyName. However, since I don't know what the new name is in advance, how do I refer to it in the save() command? (The example below only saves an object containing the name, not the object itself.) Is it some kind of dereference? Any ideas?

command:

    cat myscript.r | R --vanilla --args MyName

script:

    # get the command-line argument for the variable name
    myobjectname <- commandArgs()[4]
    # make some data
    somedata <- matrix(rnorm(100),10,10)
    # make a filename for the saved object
    filename <- paste(myobjectname, ".RData", sep="")
    # assign data to the new name
    assign(myobjectname, somedata)
    # save the object to disk
    save(myobjectname, file=filename)
[R] contour persp
I have a data set in which x and y are ordered vectors of length 600 and 700 respectively, and z is a 600 by 700 matrix whose entry z[i,j] is either a missing value (indicated by 'NaN') or a real number between 0 and 1. The contour function

    contour(x,y,z)

gives me a blank picture. I guess the reason is that most of the z entries are missing; fewer than 1% are non-missing.

Question (1): Is there a way that I could manipulate the data or the function to have the non-missing values plotted?

Also, trying the function persp gives me this error message:

    persp(x,y,z)
    Error in persp.default(x, y, z) : invalid 'z' limits

I looked at the manual of persp. I guess the error message comes from its internal call

    zlim = range(z, na.rm = TRUE)

It appears to me that persp can't handle missing values, yet its manual states clearly:

    z: a matrix containing the values to be plotted ('NA's are allowed). Note that 'x' can be used instead of 'z' for convenience.

Question (2): Can persp handle missing values in z? If the answer is a resounding yes, what should I do in my case?

Please help. Thanks!

Your frustrated Andrew
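For Question (1), one option is to skip contouring and plot only the cells that hold a value. The sketch below does that with simulated data: the poster's x, y and z were not posted, so the grid, the number of non-missing cells and the grey palette are all assumptions. It does not address the persp() error.

```r
# Simulated stand-in: a 600 x 700 grid with under 1% non-missing cells
set.seed(42)
x <- seq(0, 1, length.out = 600)
y <- seq(0, 1, length.out = 700)
z <- matrix(NaN, nrow = 600, ncol = 700)
keep <- sample(length(z), 2000)
z[keep] <- runif(2000)

# Row/column positions of the cells that hold a value
# (is.na() is TRUE for NaN as well as NA)
idx <- which(!is.na(z), arr.ind = TRUE)

# Map each value in [0, 1] to one of ten grey levels (darker = larger)
pal <- gray(seq(0.85, 0, length.out = 10))
bin <- as.integer(cut(z[idx], breaks = seq(0, 1, by = 0.1),
                      include.lowest = TRUE))

# One small square per non-missing cell, at its (x, y) location
plot(x[idx[, "row"]], y[idx[, "col"]], pch = 15, cex = 0.4,
     col = pal[bin], xlab = "x", ylab = "y")
```

With so few cells filled, a scatter of coloured points shows where the data are, which contour lines cannot do when nearly every neighbouring cell is missing.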
Re: [R] Noval numbers
On 07/02/2010 4:25 PM, Mag. Ferri Leberl wrote:
Dear everybody, How can I transform numbers to a positional system with the base of, e.g., nine, and do further operations with them?

I don't understand what you want. Decimal, noval or binary are just ways to represent numbers as strings of characters. It doesn't make sense to me to say you are transforming them to a particular representation. You can represent them in a variety of ways: 10 (decimal), 11 (noval), 1010 (binary), ten (English), but it's still the same number.

It does make sense to ask if you can convert numbers to one of these representations, or convert the representation back to the number; is that what you meant? Erich Neuwirth posted a function to do one-way conversions:

http://finzi.psych.upenn.edu/Rhelp10/2008-September/175003.html

With his functions you can do

    makeDigitSeq(numberInBase(10, 9))
    [1] 11

Duncan Murdoch

Thank you in advance. Yours sincerely, Mag. Ferri Leberl
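The point above, that a base is only a representation, can be illustrated with a small round-trip sketch. The helper names to_base() and from_base() are hypothetical (they are not the functions from the linked post), and the sketch assumes non-negative whole numbers and bases from 2 to 10.

```r
# Hypothetical helper: write a non-negative whole number as a digit string
to_base <- function(n, base = 9) {
  stopifnot(n >= 0, n == floor(n), base >= 2, base <= 10)
  if (n == 0) return("0")
  digits <- integer(0)
  while (n > 0) {
    digits <- c(n %% base, digits)   # peel off the least significant digit
    n <- n %/% base
  }
  paste(digits, collapse = "")
}

# Hypothetical helper: read a digit string back into an ordinary integer
from_base <- function(s, base = 9) strtoi(s, base = base)

to_base(10)       # ten written in base nine
from_base("11")   # and back to the number ten

# "Further operations": do the arithmetic on the numbers themselves,
# then convert only for display
total <- from_base("11") + from_base("8")
to_base(total)
```

The design choice is to keep values as ordinary R numbers throughout and to treat the base-nine string purely as input and output formatting.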
[R] mboost: Interpreting coefficients from glmboost if center=TRUE
I'm running R 2.10.1 with mboost 2.0 in order to build predictive models. I am performing prediction on a binomial outcome, using a linear function (glmboost). However, I am running into some confusion regarding centering. (I am not aware of an mboost-specific mailing list, so if the main R list is not the right place for this topic, please let me know.)

The boost_control() function allows for the choice between center=TRUE and center=FALSE. If I select center=FALSE, I am able to interpret the coefficients just like those from standard logistic regression. However, if I select center=TRUE, this is no longer the case. In theory and in practice with my data, centering improves the predictions made by the model, so this is an issue worth pursuing for me.

Below is output from running the exact same data in exactly the same way, differing only in whether the center bit is flipped or not:

Output with center=TRUE:

    [(Intercept)] = -0.04543632
    [painscore] = 0.007553608
    [Offset] = -0.546520621809327

Output with center=FALSE:

    [(Intercept)] = -0.989742
    [painscore] = 0.001342585
    [Offset] = -0.546520621809327

The mean of painscore is 741. It seems to me that for center=FALSE, mboost should modify the intercept by subtracting 741*0.007553608 from it (thus intercept should = -11.285). If I manually do this, the output is credible, and in the ballpark of that given by other methods (e.g., lrm or glm with a Binomial link function). If I don't do this, then the inverse logistic interpretation of the output is off by orders of magnitude.

In the end, with center=TRUE, if I want to make a prediction based on the coefficients returned by mboost, the results only make sense if I manually rescale my independent variables prior to making a prediction. Is this the desired behavior, or am I doing something wrong?

Many thanks.
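The arithmetic in the question can be checked directly. The sketch below uses only the numbers quoted above and assumes that centering means painscore was replaced by painscore minus its mean before fitting. The quoted -11.285 turns out to be twice the back-shifted intercept. One possible reading, an assumption not confirmed in this message, is that the factor of two comes from mboost's Binomial family working on half the log-odds scale.

```r
# Numbers quoted in the message (center=TRUE fit)
b0   <- -0.04543632    # reported intercept
b1   <-  0.007553608   # reported painscore coefficient
xbar <-  741           # mean of painscore

# If the model was fitted on (painscore - xbar), then
#   b0 + b1 * (x - xbar) = (b0 - b1 * xbar) + b1 * x
# so the intercept on the original scale is:
b0_raw <- b0 - b1 * xbar
b0_raw            # about -5.64

# Doubling it reproduces the -11.285 given in the message
2 * b0_raw        # about -11.285
```

The reported offset is left out here because the message does not say how it enters the poster's own calculation.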
Re: [R] using a variable name stored in another variable?
Chris Seidel wrote:
Hello, I'm trying to figure out how to create a data object, and then save it with a user-defined name that is input as a command line argument. I know how to create the object and assign it the new name; however, I can't figure out how to refer to the new name for a future operation such as save().
..snip..

You probably want the get() function:

    get( myobjectname )

The help page for get() has a note which states that it is the complement of assign(). Perhaps a similar note should be added to the help page for assign...

Hope this helps!
-Charlie

-- View this message in context: http://n4.nabble.com/using-a-variable-name-stored-in-another-variable-tp1472371p1472400.html Sent from the R help mailing list archive at Nabble.com.
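For the save() step specifically, a complementary route to get() is save()'s `list` argument, which takes object names as a character vector. A minimal sketch follows; the hard-coded name and the temporary file location are assumptions made so the example runs on its own.

```r
# In the real script this would come from the command line, e.g.
# commandArgs(trailingOnly = TRUE)[1]; hard-coded here for illustration
myobjectname <- "MyName"

somedata <- matrix(rnorm(100), 10, 10)
filename <- file.path(tempdir(), paste0(myobjectname, ".RData"))

# Bind the data to the user-supplied name
assign(myobjectname, somedata)

# list= takes the *names* of the objects to save, as character strings
save(list = myobjectname, file = filename)

# Check: remove the object, reload the file, and see what comes back
rm(list = myobjectname)
loaded <- load(filename)   # load() returns the names it restored
loaded
identical(get(loaded), somedata)
```

By contrast, save(myobjectname, file = filename) stores the character variable itself, which matches the behaviour described in the question.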
[R] Data views (Re: (Another) Bates fortune?)
Note : this post has been motivated more by the hierarchical data subject than the aside joke of Douglas Bates, but might be of interest to its respondents. Le vendredi 05 février 2010 à 21:56 +0100, Peter Dalgaard a écrit : Peter Ehlers wrote: I vote to 'fortunize' Doug Bates on Hierarchical data sets: which software to use? The widespread use of spreadsheets or SPSS data sets or SAS data sets which encourage the single table with a gargantuan number of columns, most of which are missing data in most cases approach to organization of longitudinal data is regrettable. http://n4.nabble.com/Hierarchical-data-sets-which-software-to-use-td1458477.html#a1470430 Hmm, well, it's not like long format data frames (which I actually think are more common in connection with SAS's PROC MIXED) are much better. Those tend to replicate base data unnecessarily - as if rats change sex with millisecond resolution. [ Note to Achim Zeilis : the rats changing sex with millisecond resolution quote is well worth a nomination to fortune fame ; it seems it is not one already... ] The correct data structure would be a relational database with multiple levels of tables, but, to my knowledge, no statistical software, including R, is prepared to deal with data in that form. Well, I can think of two exceptions : - BUGS, in its various incarnations (WinBUGS, OpenBUGS, JAGS), does not require its data to come from the same source. For example, while programming a hierarchical model (a. k. a. mixed-effect model), individual level variables may come from one source and various group level variables may come from other sources. Quite handy : no previous merge() required. Now, writing (and debugging !) such models in BUGS is another story... - SAS has had this concept of data view for a long time, its most useful incarnation being a data view of an SQL view. 
Again, this avoids the need to actually merge the datasets (which, AFAICR, is a serious piece of pain in the @$$ in SAS (maybe that's the *real* etymology of the name ?)). This problem has bugged me for a while. I think that the concept of a data view is right (after all, that's one of the core concepts of SQL for a reason...), but that implementing it *cleanly* in R is probably hard work. Using a DBMS for maintaining tables and views and querying them just at the right time does help, but the ability of using these DBMS data without importing them in R is, AFAIK, currently lacking. One upon a time, a very old version of RPgSQL (a Bioconductor package), aimed to such a representation : it created objects inheriting from data.frame to represent Postgres-based data, allowing to use these data transparently. This package dropped into oblivon when his creator and sole maintainer became unable to maintain it further. As far as I understand it, the DBI specification *might* allow the creation of such objects, but I am not aware of any driver actually implementing that. In fact, there are two elements of solution to this problem : a) creation of (abstract) objects representing data collections as data frames, with the same properties, but not requesting the creation of an actual data frame. As far as my (very poor) object-oriented knowledge goes, these objects should be, in C++/Python parlance, inherit from data.frame. b) creation of objects implementing various realizations of the objects created in a) : DBMS querying, actual data.frame querying (here I'm thinking of sqldf, which does this on the reverse direction, allowing querying R data frames to be queried in SQL. Quite handy...), etc ... I tried my hand once at building such a representation (for DBMS-deposited data), with partial success (read-only was OK, read-write was seriously buggy). 
But my S3 object-oriented code stinks, my Python is pytiful, and, as a public health measure, I won't even try to qualify my C++... So I leave implementation to better programmers as an exercise (a term project, or even a master's thesis subject is probably closer to truth...). A third, much larger, (implementation) element, is lacking in this picture : the algorithms used on these data. SAS is notoriously good (in some simple cases, such as ordinary regression) at handling datasets larger than available memory because the algorithms have been written with punched cards (maybe even paper tape) in mind : *one* *sequential* read of the data was the only *practical* way to go back in those days. So all the matrices and vectors necessary to the computation (notionally, X'X and X'Y) were built in memory in *one* step. Such an organization is probably impossible with most modern algorithms : see Douglas Bates' description of the lmer() algorithms for a nice, big counter-example, or consider MCMC... But coming closer to such an organization *seems* possible : see for example biglm. So I think that data views are a a worthy but not-so-easy possible goal
[R] specifying colors in a heatmap/image -like plot
Hi,

I have searched for a solution but failed to find an answer. I am hoping you may be able to help me.

I have a data set where I have observations for a number of units (n =~40) over a period of time (t =~100), and I have a variable (Z) that codes a categorical variable for each observation. I want to produce a 2D plot where time is on the x-axis and units are on the y-axis. Each block on the 2D plot should then take a color depending on variable Z. Z is not ordered, so using a scale (as in heatmaps) does not make sense. In fact the values of Z have meanings that are intuitively related to colors (e.g. Z=3 means involvement by the United Nations, so I want its color to be blue).

Below is some code that gives an example of what I am aiming to do and why the heatmap and image functions don't work for me. Thanks in advance for your help.

    # Example: Suppose Z had 3 values (0,1,2) and I had 8 observations.
    hitmep <- matrix(c(0,2,1,0,2,1,1,0),2,4)
    # Graph 1:
    heatmap(hitmep, Rowv = NA, Colv = NA, labRow = NULL, scale = "none")
    # Graph 2:
    image(t(hitmep), axes = FALSE)

I like the layout of the plots. My problem with these is that I don't want Z's values (0,1,2) to have colors on a scale. I want to specify, for example, 1=blue, 2=yellow and 3=green. Do you know how to do this?

Thanks in advance,
Kerim Can Kavakli

-- View this message in context: http://n4.nabble.com/specifying-colors-in-a-heatmap-image-like-plot-tp1472388p1472388.html Sent from the R help mailing list archive at Nabble.com.
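One way to get fixed colors per category is to pass image() an explicit `col` vector together with `breaks` that put each category code in its own interval. A minimal sketch on the example matrix follows; the mapping 0 = blue, 1 = yellow, 2 = green is an assumption, since the message lists both 0 to 2 and 1 to 3 as codes.

```r
# Example matrix from the message: 2 units (rows) x 4 time points (columns)
hitmep <- matrix(c(0, 2, 1, 0, 2, 1, 1, 0), 2, 4)

cols <- c("blue", "yellow", "green")   # assumed colors for codes 0, 1, 2
brks <- c(-0.5, 0.5, 1.5, 2.5)         # one interval per code

# image() draws z[i, j] at (x[i], y[j]), so transpose to put time on x
image(x = seq_len(ncol(hitmep)), y = seq_len(nrow(hitmep)), z = t(hitmep),
      col = cols, breaks = brks,
      xlab = "time", ylab = "unit", axes = FALSE)
axis(1, at = seq_len(ncol(hitmep)))
axis(2, at = seq_len(nrow(hitmep)))
box()

# Which color each cell receives, for checking the mapping
cell_col <- matrix(cols[findInterval(hitmep, brks)], nrow = nrow(hitmep))
cell_col
```

With explicit breaks the colors no longer depend on the range of values present, so a category that is absent from the data does not shift the colors of the others.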
Re: [R] Interactively editing point labels in a graph
The built-in R graphics system was not designed for interactivity -- there is no [feasible] way to detect the data point coordinates in a base graphics plot. The playwith package tries to figure out the coordinates from the data objects given in the call: this works for simple scatterplots etc, but is non-trivial for your CA plot.

You *could* define functions to enable playwith to work correctly in this case: the functions would be called something like plotCoords.plot.ca and possibly case.names.ca (if case.names.default does not already work).

Regards
-Felix

On 6 February 2010 23:11, trece por ciento el13porcie...@yahoo.com wrote:
> Many thanks, Felix
> It worked, simply importing the emf into PowerPoint!
> By the way, as you are the maintainer of playwith, a question: Why is playwith unable to cope with it? I liked the playwith option very much because it is easy to use and has all the basic capabilities that I need.
> Best regards,
> Hug
>
> --- On Wed, 2/3/10, Felix Andrews fe...@nfrac.org wrote:
> From: Felix Andrews fe...@nfrac.org
> Subject: Re: [R] Interactively editing point labels in a graph
> To: trece por ciento el13porcie...@yahoo.com
> Cc: Liviu Andronic landronim...@gmail.com, r-help@r-project.org
> Date: Wednesday, February 3, 2010, 4:51 PM
>
>> For your situation, perhaps the best option is to save the plot in a vector format like WMF, PDF or SVG, and open it with an external editor. Inkscape is a good one.
>>
>> On 4 February 2010 06:46, trece por ciento el13porcie...@yahoo.com wrote:
>>> Thanks, Liviu
>>> At first look it seems OK. Two questions:
>>> 1. playwith directly accepts the plots created by the ca package, but it seems unable to identify the point labels. For example:
>>>
>>> data(smoke)
>>> smoke
>>> ca(smoke)
>>> plot(ca(smoke))
>>> playwith(plot(ca(smoke)))
>>>
>>> Then, if I try to identify a label, playwith gives the message "Sorry, can not guess the data point coordinates. Please contact the maintainer with suggestions." If I ask to select the label from a table, playwith sends the following message to RGui:
>>> Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 2, 0
>>> 2. Can playwith draw ellipses or any other figure around selected points?
>>> (The first problem seems to be my fault, but I don't know how to fix it.)
>>> Hug
>>>
>>> --- On Wed, 2/3/10, Liviu Andronic landronim...@gmail.com wrote:
>>> From: Liviu Andronic landronim...@gmail.com
>>> Subject: Re: [R] Interactively editing point labels in a graph
>>> To: trece por ciento el13porcie...@yahoo.com
>>> Cc: r-help@r-project.org
>>> Date: Wednesday, February 3, 2010, 3:49 AM
>>>
>>>> Hello
>>>> On 2/3/10, trece por ciento el13porcie...@yahoo.com wrote:
>>>>> Dear experts,
>>>>> I would like to be able to interactively (if possible, with mouse and click) edit point labels in graphs,
>>>> Try playwith.
>>>> Liviu
>>>>> particularly in multivariate graphs, such as the biplots you get after a correspondence analysis (with, for example, package ca), where labels tend to overlap. The graph aspect ratio is relevant (it needs to be maintained). And I'm working with Windows XP.
>>>>> In this kind of graph, points are identified with labels, generally long (see, for example: http://www.white-history.com/Greece_files/hlafreq.jpg), and sometimes -as in the example- it is good to group certain points within ellipses.
>>>>> Do you know if there is some package able to do this task?
>>>>> Thanks in advance,
>>>>> Hug
>>>>
>>>> --
>>>> Do you know how to read?
>>>> http://www.alienetworks.com/srtest.cfm
>>>> http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
>>>> Do you know how to write?
>>>> http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

--
Felix Andrews / 安福立
Postdoctoral Fellow
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: + 61 2 6125 4670
E: felix.andr...@anu.edu.au
CRICOS Provider No. 00120C
--
http://www.neurofractal.org/felix/
[R] Out-of-sample prediction with VAR
Good day,

I'm using a VAR model to forecast sales with some extra variables (google trends data). I have divided my dataset into a training set (weekly sales + vars in 2006 and 2007) and a holdout set (2008). It is unclear to me how I should predict the out-of-sample data, because the predict() function in the vars package seems to forecast my google trends vars as well. However, I want to forecast the sales figures using the actual google trends data.

My questions:
1. How should I do this? I currently extract the linear model generated by the VAR(3) function to predict the holdout set, but that seems inappropriate?
2. In case I am doing it right, how is it possible that an automatically fitted model with more variables actually performs worse (in terms of MAPE)? Shouldn't it predict at least as well as the simple AR(3), by finding that the extra variables have no added value?

My code:

ts_Y <- ts(log_residuals[1:104]); # detrended sales data
ts_XGG <- ts(salesmodeldata$gtrends_global[1:104]);
ts_XGL <- ts(salesmodeldata$gtrends_local[1:104]);
training_matrix <- data.frame(ts_Y, ts_XGG, ts_XGL);

### Try VAR(3)
var_model <- VAR(y=training_matrix, p=3, type="both", season=NULL, exogen=NULL, lag.max=NULL);

## Out of sample forecasting
var.lm = lm(var_model$varresult$ts_Y); # the generated LM
ts_Y <- ts(log_residuals[105:155]);
ts_XGG <- ts(salesmodeldata$gtrends_global[105:155]);
ts_XGL <- ts(salesmodeldata$gtrends_local[105:155]);

# Notice how I manually create the lagged values to be used in the Linear Model
holdout_matrix <- na.omit(data.frame(ts.union(ts_Y, ts_XGG, ts_XGL,
    ts_Y.l1 = lag(ts_Y,-1), ts_Y.l2 = lag(ts_Y,-2), ts_Y.l3 = lag(ts_Y,-3),
    ts_XGG.l1 = lag(ts_XGG,-1), ts_XGG.l2 = lag(ts_XGG,-2), ts_XGG.l3 = lag(ts_XGG,-3),
    ts_XGL.l1 = lag(ts_XGL,-1), ts_XGL.l2 = lag(ts_XGL,-2), ts_XGL.l3 = lag(ts_XGL,-3),
    const=1, trend=0.0001514194 )));
var.predict = predict(object=var_model, n.ahead=52, dumvar=holdout_matrix);

## Assess accuracy
calc_mape(holdout_matrix$ts_Y, var.predict, islog=T, print=T)

Some context: For my Master's thesis I'm using R to test the predictive power of web metrics (such as google trends data & pageviews) in sales forecasting. To properly assess this, I employ a simple AR model (for time series without the extra variables) and a VAR model for the predictions with the extra variables. I also develop a random forest with and without the buzz variables, to see if MAPE improves.

Many thanks in advance!
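One way to read the first question is as a request for one-step-ahead forecasts of a single equation, where every lagged regressor (including the trends series) is taken from observed data rather than forecast. The sketch below illustrates that idea in base R only. It is not the vars package's method: the data are simulated, the names x, y, lagged, train and hold are made up for the illustration, and the deterministic trend term of the original model is left out.

```r
# Minimal sketch: one-step-ahead holdout prediction for a single equation
# with observed lagged regressors. All data here are simulated stand-ins.
set.seed(42)
n <- 155
x <- as.numeric(arima.sim(list(ar = 0.6), n = n))  # stand-in for a trends series
y <- numeric(n)
for (t in 2:n) y[t] <- 0.5 * y[t - 1] + 0.4 * x[t - 1] + rnorm(1, sd = 0.5)

p <- 3
# embed() puts the current value in column 1 and lags 1..p in the next columns
Ey <- embed(y, p + 1)
Ex <- embed(x, p + 1)
lagged <- data.frame(y = Ey[, 1], Ey[, -1], Ex[, -1])
names(lagged) <- c("y", paste0("y.l", 1:p), paste0("x.l", 1:p))

# Row i of `lagged` corresponds to time i + p, so times 4..104 form the
# training set and times 105..155 the holdout set
train <- lagged[1:(104 - p), ]
hold  <- lagged[(104 - p + 1):nrow(lagged), ]

fit  <- lm(y ~ ., data = train)
pred <- predict(fit, newdata = hold)      # uses actual lagged values
rmse <- sqrt(mean((hold$y - pred)^2))
```

Note that this uses the actual lagged sales values as well, so it measures one-step-ahead accuracy, which is a different quantity from a 52-step-ahead forecast made at the end of the training period.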
Re: [R] using a variable name stored in another variable?
Hi Charlie,

get() will return the contents (value) of a variable. But what I want is to save the named object. Something like save(get(myobjectname), ...) doesn't work.

In the environment there are the object of interest and a variable which holds the name of that object. If you don't know the name of the object, but only the variable which contains its name, how do you use that information to save the object?

-Chris

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Sharpie [ch...@sharpsteen.net]
Sent: Sunday, February 07, 2010 4:13 PM
To: r-help@r-project.org
Subject: Re: [R] using a variable name stored in another variable?

> Chris Seidel wrote:
>> Hello, I'm trying to figure out how to create a data object, and then save it with a user-defined name that is input as a command line argument. I know how to create the object and assign it the new name, however, I can't figure out how to refer to the new name for a future operation such as save().
>> ..snip..
>
> You probably want the get() function:
>
> get( myobjectname )
>
> The help page for get() has a note which states that it is the complement of assign(). Perhaps a similar note should be added to the help page for assign...
> Hope this helps!
> -Charlie
Re: [R] specifying colors in a heatmap/image -like plot
On Feb 7, 2010, at 4:57 PM, kerimcan wrote:

> Hi, I have searched for a solution but I failed to find an answer. I am hoping you may be able to help me. I have a data set where I have observations for a number of units (n =~40) over a period of time (t =~100) and I have a variable (Z) that codes a categorical variable for each observation. I want to produce a 2D plot where time is on the x-axis and units are on the y-axis. Then each block on the 2-d plot should take a color depending on variable Z. Z is not ordered so using a scale (like in heatmaps) does not make sense. In fact the values of Z have meanings that are intuitively related to colors (e.g. Z=3 means involvement by the United Nations so I want its color to be blue). Below is some code that gives an example of what I am aiming to do and why heatmap and image functions don't work for me. Thanks in advance for your help.
>
> # Example: Suppose Z had 3 values (0,1,2) and I had 8 observations.
> hitmep <- matrix(c(0,2,1,0,2,1,1,0),2,4)
> # Graph 1:
> heatmap(hitmep2, Rowv = NA, Colv = NA, labrow = NULL, scale = "none")
> # Graph 2:
> image(t(hitmep2), axes = FALSE)
>
> I like the layout of the plots. My problem with these is that I don't want Z's values (0,1,2) to have colors on a scale. I want to specify, for example, 1=blue, 2=yellow and 3=green. Do you know how to do this?

Well, if you fix the name of your data vector and add the glaringly obvious color argument, it seems to work:

hitmep2 <- matrix(c(0,2,1,0,2,1,1,0), 2, 4)
# Graph 1:
heatmap(hitmep2, col = c("red", "green", "blue"), Rowv = NA, Colv = NA, labrow = NULL, scale = "none")
# Graph 2:
image(t(hitmep2), col = c("red", "green", "blue"), axes = FALSE)

Unless I don't understand what you wanted... always a possibility.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
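With only a col= vector, image() splits the range of the data into equal-width bins, so the code-to-color mapping depends on which values happen to be present. A possible refinement, sketched here under the assumption that the categories are the integer codes 0, 1, 2, is to pass explicit breaks= so each code always gets the same color. The color assignments and the object names z, cols and breaks are illustrative only.

```r
# Minimal sketch: fixed color per category code via explicit breaks.
z <- matrix(c(0, 2, 1, 0, 2, 1, 1, 0), nrow = 2, ncol = 4)  # units x time

# Hypothetical mapping from category code to color
cols   <- c("0" = "red", "1" = "green", "2" = "blue")
codes  <- as.numeric(names(cols))
# One more break than colors; each code sits in the middle of its own bin
breaks <- c(codes - 0.5, max(codes) + 0.5)

image(x = seq_len(ncol(z)), y = seq_len(nrow(z)), z = t(z),
      col = unname(cols), breaks = breaks,
      axes = FALSE, xlab = "time", ylab = "unit")
axis(1, at = seq_len(ncol(z)))
axis(2, at = seq_len(nrow(z)))
legend("topright", legend = names(cols), fill = cols, bg = "white")
```

Because the breaks are fixed, a data set in which one code is absent still shows the remaining codes in their intended colors.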
Re: [R] using a variable name stored in another variable?
On 07/02/2010 6:05 PM, Chris Seidel wrote:
> Hi Charlie,
> get() will return the contents (value) of a variable. But what I want is to save the named object. Something like save(get(myobjectname), ...) doesn't work.

I think you want

save(list=myobjectname, file= ...)

assuming that the object has already been created with that name. If it hasn't, you'll need two steps:

assign( myobjectname, value)
save(list=myobjectname, file=...)

These could be wrapped in local( { ... } ) if you are worried that myobjectname might be the name of an object you want to keep. For example,

x <- 1        # Create a variable I don't want to mess with
name <- "x"   # choose a name to save under
local({ assign(name, 2) ; save(list=name, file="test.Rdata") })
# That created test.Rdata with x equal to 2

Duncan Murdoch

> In the environment, is that object of interest, and a variable which holds the name of the object of interest. If you don't know the name of the object, but only the variable which contains it's name, how do you use that information to save the object?
> -Chris
>
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Sharpie [ch...@sharpsteen.net]
> Sent: Sunday, February 07, 2010 4:13 PM
> To: r-help@r-project.org
> Subject: Re: [R] using a variable name stored in another variable?
>
>> Chris Seidel wrote:
>>> Hello, I'm trying to figure out how to create a data object, and then save it with a user-defined name that is input as a command line argument. I know how to create the object and assign it the new name, however, I can't figure out how to refer to the new name for a future operation such as save().
>>> ..snip..
>>
>> You probably want the get() function:
>>
>> get( myobjectname )
>>
>> The help page for get() has a note which states that it is the complement of assign(). Perhaps a similar note should be added to the help page for assign...
>> Hope this helps!
>> -Charlie
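The save(list = ...) approach can be checked with a round trip: save under a name held in a variable, then load into a fresh environment and see which name comes back. This is only a sketch of that check; the name "results_2010" and the temporary file are placeholders (in the original question the name would come from a command line argument).

```r
# Minimal sketch: save an object under a name stored in a variable,
# then verify the name by loading into a separate environment.
name <- "results_2010"                    # hypothetical user-supplied name
tmp  <- tempfile(fileext = ".RData")      # hypothetical output file

local({
  assign(name, data.frame(a = 1:3))       # creates results_2010 inside local()
  save(list = name, file = tmp)           # saves it under that name
})

env    <- new.env()
loaded <- load(tmp, envir = env)          # load() returns the restored names
value  <- get(name, envir = env)          # get() fetches the value by name
```

Here save(list = ) is the step that takes the name, while get() is what one would use afterwards to retrieve the value by name.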
Re: [R] using a variable name stored in another variable?
Tena koe Chris

Does the following help?

dfName <- 'myDf'
save(dfName, file='test1')
save('dfName', file='test2')
save('myDf', file='test3')
save(myDf, file='test4')

Peter Alspach

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Chris Seidel
Sent: Monday, 8 February 2010 12:05 p.m.
To: r-help@r-project.org
Subject: Re: [R] using a variable name stored in another variable?

> Hi Charlie,
> get() will return the contents (value) of a variable. But what I want is to save the named object. Something like save(get(myobjectname), ...) doesn't work. In the environment, is that object of interest, and a variable which holds the name of the object of interest. If you don't know the name of the object, but only the variable which contains it's name, how do you use that information to save the object?
> -Chris
>
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Sharpie [ch...@sharpsteen.net]
> Sent: Sunday, February 07, 2010 4:13 PM
> To: r-help@r-project.org
> Subject: Re: [R] using a variable name stored in another variable?
>
>> Chris Seidel wrote:
>>> Hello, I'm trying to figure out how to create a data object, and then save it with a user-defined name that is input as a command line argument. I know how to create the object and assign it the new name, however, I can't figure out how to refer to the new name for a future operation such as save().
>>> ..snip..
>>
>> You probably want the get() function:
>>
>> get( myobjectname )
>>
>> The help page for get() has a note which states that it is the complement of assign(). Perhaps a similar note should be added to the help page for assign...
>> Hope this helps!
>> -Charlie
Re: [R] Why does smoothScatter clip when xlim and ylim increased?
On Sat, Feb 6, 2010 at 6:15 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:
> On 06/02/2010 7:51 AM, Jennifer Lyon wrote:
>> Hi:
>> Is there a way to get smoothScatter to not clip when I increase the xlim and ylim parameters? Consider the following example:
>>
>> set.seed(17)
>> x1 <- rnorm(100)
>> x2 <- rnorm(100)
>> smoothScatter(x1,x2)
>> # Now if I increase xlim and ylim notice that the plot seems to be clipped
>> # at the former xlim, and ylim boundaries:
>> smoothScatter(x1,x2, xlim=c(-5,5), ylim=c(-5,5))
>
> If you follow the links on the help page, you'll see that smoothScatter uses bkde2D, which has a range.x argument to control the range of the smoothing. The smoothScatter function never passes the xlim and ylim values to bkde2D, only to the plotting functions, presumably because the author expected you to use them to limit the range, not extend it.
>
> You can get the behaviour you want with specified xlim and ylim by modifying one line in smoothScatter:
>
> map <- grDevices:::.smoothScatterCalcDensity(x, nbin, bandwidth)
>
> should become
>
> map <- grDevices:::.smoothScatterCalcDensity(x, nbin, bandwidth, list(xlim, ylim))
>
> (You can use fix(smoothScatter) to edit your own local copy of smoothScatter and make this change.) However, this messes up the default plot, so a better patch would be needed to permanently fix this.
>
> Duncan Murdoch

Ah. A very helpful explanation. Further exploration led to the realization that if I passed in par("usr") instead of xlim and ylim, both the case I care about and the default case display without clipping. Of course I discovered (very reasonably) that par("usr") doesn't exist until plot() is called, so I ended up calling plot and then modifying the image call with add=T. Along the lines of:

plot(NA, NA, xlab = xlab, ylab = ylab, xlim = xlim, ylim = ylim, xaxs = xaxs, yaxs = yaxs, type="n", ...)
usr <- par("usr")
map <- grDevices:::.smoothScatterCalcDensity(x, nbin, bandwidth, list(usr[1:2], usr[3:4]))
...
image(xm, ym, z = dens, col = colramp(256), xlab = xlab, ylab = ylab, xlim = xlim, ylim = ylim, xaxs = xaxs, yaxs = yaxs, add=T, ...)

This is somewhat wasteful, as bkde2D is computing densities at grid points well away from where the data is located, so I'll just have to increase the number of grid points.

Thank you for your help.

Jen
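The role of range.x can be seen without editing smoothScatter at all, by calling bkde2D from the KernSmooth package directly and drawing the result with image(). The following is a sketch of that, not a replacement for smoothScatter: the bandwidth choice (bw.nrd), the grid size and the color ramp are assumptions made for the illustration and differ from smoothScatter's own defaults.

```r
# Minimal sketch: 2D kernel density over an extended range via range.x.
library(KernSmooth)

set.seed(17)
x1 <- rnorm(100)
x2 <- rnorm(100)

lims <- list(c(-5, 5), c(-5, 5))          # desired plotting range
bw   <- c(bw.nrd(x1), bw.nrd(x2))         # assumed bandwidths, one per axis

# range.x makes the estimate cover the whole plotting region
dens <- bkde2D(cbind(x1, x2), bandwidth = bw,
               gridsize = c(128L, 128L), range.x = lims)

image(dens$x1, dens$x2, dens$fhat,
      col = colorRampPalette(c("white", blues9))(256),
      xlab = "x1", ylab = "x2")
```

As noted in the thread, most of the grid then covers regions far from the data, so a larger gridsize may be needed to keep the same resolution near the points.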
Re: [R] Reading hierarchical data
Try this. It uses input defined in Jim's post and defines the rectype of each row (1 or 2). It then reads the rectype 1 records into DF1 using read.fwf and the rectype 2 records into DF2 also using read.fwf. ix is defined to have one component per personal record giving the row number in DF1 of the corresponding family. We combine DF1 and DF2 using ix and remove the column names that start with X.

# record type (1 or 2)
rectype <- substr(input, 7, 7)

# read in record type 1
input1 <- input[rectype == 1]
DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1),
    col.names = c("familyid", "X", "X", "X", "dwelling"))

# read in record type 2
input2 <- input[rectype == 2]
DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1),
    col.names = c("personalid", "X", "X", "age", "X", "sex"))

# ix is the index in DF1 of family row corresponding to each personal row in DF2
ix <- cumsum(rectype == 1)[rectype == 2]
DF <- cbind(DF1[ix,], DF2)
DF <- DF[substr(names(DF), 1, 1) != "X"]

so DF looks like this:

> DF
    familyid dwelling personalid age sex
1       6470        1          1  32   0
1.1     6470        1          2  30   1
2       7470        0          1  40   1
3       8470        0          1  27   0
4       9470        0          1  13   1
4.1     9470        0          2  22   0
4.2     9470        0          3  24   1
5      10470        1          1  20   0
5.1    10470        1          2  11   1
6      11470        0          1  17   0
6.1    11470        0          2  10   1
6.2    11470        0          3  26   1

On Sun, Feb 7, 2010 at 10:57 AM, Saba(Home) saba...@charter.net wrote:
> I would like to read the following hierarchical data set. There is a family record followed by one or more personal records. If col. 7 is 1 it is a family record. If it is 2 it is a personal record.
>
> The family record is formatted as follows:
> col. 1-5  family id
> col. 7    1
> col. 9    dwelling type code
>
> The personal record is formatted as follows:
> col. 1-5  personal id
> col. 7    2
> col. 8-9  age
> col. 11   sex code
>
> The first six family and accompanying personal records look like this:
>
> 06470 1 1
>     1 232 0
>     2 230 1
> 07470 1 0
>     1 240 1
> 08470 1 0
>     1 227 0
> 09470 1 0
>     1 213 1
>     2 222 0
>     3 224 1
> 10470 1 1
>     1 220 0
>     2 211 1
> 11470 1 0
>     1 217 0
>     2 210 1
>     3 226 1
>
> I want to create a dataset containing
> . family ID
> . dwelling code
> . person ID
> . age
> . sex code
>
> The dataset will contain one observation per person, with the family information repeated for people in the same family. Can anyone help?
>
> Thanks,
> Richard Saba
Re: [R] mboost: Interpreting coefficients from glmboost if center=TRUE
On Feb 7, 2010, at 5:03 PM, Kyle Werner wrote:

> I'm running R 2.10.1 with mboost 2.0 in order to build predictive models. I am performing prediction on a binomial outcome, using a linear function (glmboost). However, I am running into some confusion regarding centering. (I am not aware of an mboost-specific mailing list, so if the main R list is not the right place for this topic, please let me know.)
>
> The boost_control() function allows for the choice between center=TRUE and center=FALSE. If I select center=FALSE, I am able to interpret the coefficients just like those from standard logistic regression. However, if I select center=TRUE, this is no longer the case. In theory and in practice with my data, centering improves the predictions made by the model, so this is an issue worth pursuing for me.
>
> Below is output from running the exact same data in exactly the same way, only differing by whether the center bit is flipped or not:
>
> Output with center=TRUE:
> [(Intercept)] = -0.04543632
> [painscore] = 0.007553608
> [Offset] = -0.546520621809327
>
> Output with center=FALSE:
> [(Intercept)] = -0.989742
> [painscore] = 0.001342585
> [Offset] = -0.546520621809327
>
> The mean of painscore is 741. It seems to me that for center=FALSE, mboost should modify the intercept by subtracting 741*0.007553608 from it (thus intercept should = -11.285). If I manually do this, the output is credible, and in the ballpark of that given by other methods (e.g., lrm or glm with a Binomial link function). If I don't do this, then the inverse logistic interpretation of the output is off by orders of magnitude.
>
> In the end, with center=TRUE, if I want to make a prediction based on the coefficients returned by mboost, the results only make sense if I manually rescale my independent variables prior to making a prediction. Is this the desired behavior, or am I doing something wrong?

I don't know, but my question is ... why aren't you using the predict method for that sort of object? Presumably the authors of the package know how to recognize the differences in the objects. Testing confirms this to be the case with the first example in the glmboost help page.

> Many thanks.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
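The algebra behind the intercept adjustment discussed above is generic to any linear predictor: if the model is fitted on a centered covariate x - mean(x), the intercept on the original scale is the centered intercept minus mean(x) times the slope. The sketch below demonstrates this with an ordinary glm() fit on simulated data. It is deliberately not a glmboost example and says nothing about mboost's internal scaling of coefficients or offsets; the variable names and simulation settings are made up.

```r
# Minimal sketch: converting a centered-covariate intercept back to the
# original scale, shown with plain logistic regression on simulated data.
set.seed(1)
x <- rnorm(200, mean = 741, sd = 50)              # covariate with a large mean
y <- rbinom(200, 1, plogis(-7.4 + 0.01 * x))      # simulated binary outcome

fit_raw <- glm(y ~ x, family = binomial)          # uncentered fit

xc      <- x - mean(x)                            # centered covariate
fit_cen <- glm(y ~ xc, family = binomial)         # centered fit

b_cen <- coef(fit_cen)
# The slope is unchanged by centering; only the intercept shifts
intercept_back <- b_cen[["(Intercept)"]] - mean(x) * b_cen[["xc"]]

# Both fits give the same fitted probabilities
max_diff <- max(abs(fitted(fit_raw) - fitted(fit_cen)))
```

Using the model's own predict method, as suggested in the reply, avoids having to do this conversion by hand.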
Re: [R] Interactively editing point labels in a graph
Create your plot and save it in wmf format, e.g.

DF <- as.data.frame(state.x77)
plot(Income ~ log(Population), DF, pch = 20)
with(DF, text(log(Population), Income, rownames(state.x77), cex = 0.5, pos = 4))
savePlot("states.wmf")

Then insert it into Microsoft Word, right click the image, choose Edit and you can edit all the text labels.

On Wed, Feb 3, 2010 at 2:57 AM, trece por ciento el13porcie...@yahoo.com wrote:
> Dear experts,
> I would like to be able to interactively (if possible, with mouse and click) edit point labels in graphs, particularly in multivariate graphs, such as the biplots you get after a correspondence analysis (with, for example, package ca), where labels tend to overlap. The graph aspect ratio is relevant (it needs to be maintained). And I'm working with Windows XP.
> In this kind of graph, points are identified with labels, generally long (see, for example: http://www.white-history.com/Greece_files/hlafreq.jpg), and sometimes -as in the example- it is good to group certain points within ellipses.
> Do you know if there is some package able to do this task?
> Thanks in advance,
> Hug
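savePlot() with a wmf target is specific to the Windows graphics device. Where that is not available, the same "save as vector graphics, then edit the labels externally" idea can be sketched with the pdf() device, which writes a vector file that editors such as Inkscape can open. The output path below is a placeholder.

```r
# Minimal sketch: write the labelled scatterplot to a vector PDF file.
DF  <- as.data.frame(state.x77)
out <- file.path(tempdir(), "states.pdf")   # hypothetical output path

pdf(out, width = 7, height = 7)
plot(Income ~ log(Population), DF, pch = 20)
with(DF, text(log(Population), Income, rownames(DF), cex = 0.5, pos = 4))
dev.off()                                   # closes the device and writes the file
```

Whether individual labels remain editable as text depends on the external editor and on how it imports the file.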
Re: [R] Reading hierarchical data
Here is a further simplification. We use the colClasses= argument with "NULL" for the columns we do not want so we do not have to later remove those columns.

# record type (1 or 2)
rectype <- substr(input, 7, 7)

# read in record type 1
input1 <- input[rectype == 1]
DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1),
    col.names = c("familyid", "", "", "", "dwelling"),
    colClasses = c("numeric", "NULL", "NULL", "NULL", "numeric"))

# read in record type 2
input2 <- input[rectype == 2]
DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1),
    col.names = c("personalid", "", "", "age", "", "sex"),
    colClasses = c("numeric", "NULL", "NULL", "numeric", "NULL", "numeric"))

# ix is the index in DF1 of family row corresponding to each personal row in DF2
ix <- cumsum(rectype == 1)[rectype == 2]
DF <- cbind(DF1[ix,], DF2)
DF

On Sun, Feb 7, 2010 at 6:30 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> Try this. It uses input defined in Jim's post and defines the rectype of each row (1 or 2). It then reads the rectype 1 records into DF1 using read.fwf and the rectype 2 records into DF2 also using read.fwf. ix is defined to have one component per personal record giving the row number in DF1 of the corresponding family. We combine DF1 and DF2 using ix and remove the column names that start with X.
>
> # record type (1 or 2)
> rectype <- substr(input, 7, 7)
>
> # read in record type 1
> input1 <- input[rectype == 1]
> DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1),
>     col.names = c("familyid", "X", "X", "X", "dwelling"))
>
> # read in record type 2
> input2 <- input[rectype == 2]
> DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1),
>     col.names = c("personalid", "X", "X", "age", "X", "sex"))
>
> # ix is the index in DF1 of family row corresponding to each personal row in DF2
> ix <- cumsum(rectype == 1)[rectype == 2]
> DF <- cbind(DF1[ix,], DF2)
> DF <- DF[substr(names(DF), 1, 1) != "X"]
>
> so DF looks like this:
>
>     familyid dwelling personalid age sex
> 1       6470        1          1  32   0
> 1.1     6470        1          2  30   1
> 2       7470        0          1  40   1
> 3       8470        0          1  27   0
> 4       9470        0          1  13   1
> 4.1     9470        0          2  22   0
> 4.2     9470        0          3  24   1
> 5      10470        1          1  20   0
> 5.1    10470        1          2  11   1
> 6      11470        0          1  17   0
> 6.1    11470        0          2  10   1
> 6.2    11470        0          3  26   1
>
> On Sun, Feb 7, 2010 at 10:57 AM, Saba(Home) saba...@charter.net wrote:
>> I would like to read the following hierarchical data set. There is a family record followed by one or more personal records. If col. 7 is 1 it is a family record. If it is 2 it is a personal record.
>>
>> The family record is formatted as follows:
>> col. 1-5  family id
>> col. 7    1
>> col. 9    dwelling type code
>>
>> The personal record is formatted as follows:
>> col. 1-5  personal id
>> col. 7    2
>> col. 8-9  age
>> col. 11   sex code
>>
>> The first six family and accompanying personal records look like this:
>>
>> 06470 1 1
>>     1 232 0
>>     2 230 1
>> 07470 1 0
>>     1 240 1
>> 08470 1 0
>>     1 227 0
>> 09470 1 0
>>     1 213 1
>>     2 222 0
>>     3 224 1
>> 10470 1 1
>>     1 220 0
>>     2 211 1
>> 11470 1 0
>>     1 217 0
>>     2 210 1
>>     3 226 1
>>
>> I want to create a dataset containing
>> . family ID
>> . dwelling code
>> . person ID
>> . age
>> . sex code
>>
>> The dataset will contain one observation per person, with the family information repeated for people in the same family. Can anyone help?
>>
>> Thanks,
>> Richard Saba
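For readers who want to run the approach end to end, here is a self-contained sketch that defines the input itself (the thread takes it from an earlier post that is not shown here). The record layout is an assumption reconstructed from the column description in the question, with the personal id right-justified in columns 1-5. Instead of dummy column names or colClasses, it skips unwanted columns with negative widths, which read.fwf documents as fields to be dropped.

```r
# Minimal sketch: reading family/personal records with read.fwf.
# The exact spacing of these lines is assumed from the column description.
input <- c(
  "06470 1 1", "    1 232 0", "    2 230 1",
  "07470 1 0", "    1 240 1",
  "08470 1 0", "    1 227 0",
  "09470 1 0", "    1 213 1", "    2 222 0", "    3 224 1",
  "10470 1 1", "    1 220 0", "    2 211 1",
  "11470 1 0", "    1 217 0", "    2 210 1", "    3 226 1"
)

rectype <- substr(input, 7, 7)            # "1" = family, "2" = personal

# Negative widths skip columns: keep cols 1-5 and col 9
fam <- read.fwf(textConnection(input[rectype == "1"]),
                widths = c(5, -3, 1),
                col.names = c("familyid", "dwelling"))

# Keep cols 1-5, cols 8-9 and col 11
per <- read.fwf(textConnection(input[rectype == "2"]),
                widths = c(5, -2, 2, -1, 1),
                col.names = c("personalid", "age", "sex"))

# For each personal record, the index of the family record that precedes it
ix <- cumsum(rectype == "1")[rectype == "2"]
DF <- cbind(fam[ix, ], per)
rownames(DF) <- NULL
```

The cumsum() step is what ties each person to a family: the running count of family records seen so far, evaluated at the positions of the personal records.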
[R] dataframe question
Folks:
Good day. Please see the code below. three_wk_out is a dataframe with columns wk1 through wk209. I want to change the format of the columns. I am trying the code below but it does not work. I need $week in the for loop interpreted as wk1, wk2, etc. Could you please help? Thanks.
Satish

R code below

week_list <- paste("wk", c(1:209), sep="")
for (week in week_list) {
  three_wk_out$week <- as.numeric(three_wk_out$week)
}
Re: [R] dataframe question
Tena koe Satish

Try using

three_wk_out[,week] <- as.numeric(three_wk_out[,week])

HTH

Peter Alspach

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Vadlamani, Satish {FLNA}
Sent: Monday, 8 February 2010 1:51 p.m.
To: r-help@r-project.org
Subject: [R] dataframe question

> Folks:
> Good day. Please see the code below. three_wk_out is a dataframe with columns wk1 through wk209. I want to change the format of the columns. I am trying the code below but it does not work. I need $week in the for loop interpreted as wk1, wk2, etc. Could you please help? Thanks.
> Satish
>
> R code below
>
> week_list <- paste("wk", c(1:209), sep="")
> for (week in week_list) {
>   three_wk_out$week <- as.numeric(three_wk_out$week)
> }
Re: [R] 3D plot of following data
Hi

Jim Lemon wrote:
> On 02/02/2010 11:01 PM, walter.dju...@chello.at wrote:
>> Hello R-experts,
>> I am having difficulties with 3D plotting (i.e. the evolution of various forward curves through time). I have two comma separated files, both ordered by date (in the first column), one containing contracts (meaning forward delivery months from YEAR_ Letter F ... January through letter Z ... December) and the other holding the closing price of the respective contract on the day also defined in the first column (see attachments). What I would like to do is plot a three dimensional figure with trade day (date) on the X-axis, contract on the Y-axis and the price of the forward contract being the z-value. I am quite a newbie and did not manage to merge these two files in a logical way, so that R could do a 3D plot.
>
> Has anyone tried to program Hans Rosling's time evolution graphs in R?

Take a look at http://www.omegahat.org/SVGAnnotation/JSSPaper.html#fig:gapMSS

Paul

> Jim

--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/
Re: [R] dataframe question
On Feb 7, 2010, at 7:51 PM, Vadlamani, Satish {FLNA} wrote:

> Folks:
> Good day. Please see the code below. three_wk_out is a dataframe with columns wk1 through wk209. I want to change the format of the columns. I am trying the code below but it does not work. I need $week in the for loop interpreted as wk1, wk2, etc. Could you please help? Thanks.
> Satish
>
> R code below
>
> week_list <- paste("wk", c(1:209), sep="")

Or more functionally:

three_wk_out <- as.data.frame( lapply(three_wk_out, some_function) )

E.g.:

df
  a b c x
1 1 0 0 1
2 2 3 2 4
3 1 2 1 5
4 2 0 3 2

df <- as.data.frame(lapply(df, "^", 2))
df
  a b c  x
1 1 0 0  1
2 4 9 4 16
3 1 4 1 25
4 4 0 9  4

> for (week in week_list) {
>   three_wk_out$week <- as.numeric(three_wk_out$week)
> }
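The reason the original loop fails is that three_wk_out$week looks for a column literally named "week"; `$` does not evaluate its right-hand side. The sketch below shows the two usual alternatives, `[[` with a character variable and a single lapply() over the selected columns. The three-column data frame is a made-up stand-in for the 209-column one in the question.

```r
# Minimal sketch: converting columns whose names are held in a variable.
three_wk_out <- data.frame(wk1 = c("1", "2"),
                           wk2 = c("3.5", "4"),
                           wk3 = c("5", "6"),
                           stringsAsFactors = FALSE)   # toy stand-in data
week_list <- paste("wk", 1:3, sep = "")

# Option 1: loop, using [[ ]] so that `week` is evaluated
for (week in week_list) {
  three_wk_out[[week]] <- as.numeric(three_wk_out[[week]])
}

# Option 2: convert all selected columns in one step (same result)
three_wk_out[week_list] <- lapply(three_wk_out[week_list], as.numeric)
```

If the columns are factors rather than character vectors, as.numeric() returns the internal level codes, so as.numeric(as.character(...)) would be needed instead.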
Re: [R] metafor package: effect sizes are not fully independent
Dear Gang, It seems that it is possible to use a univariate meta-analysis to handle your multivariate effect sizes. If you want to calculate a weighted average first, Hedges and Olkin (1985) has discussed this approach. Hedges, L. V., Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press. Regards, Mike -- - Mike W.L. Cheung Phone: (65) 6516-3702 Department of Psychology Fax: (65) 6773-1843 National University of Singapore http://courses.nus.edu.sg/course/psycwlm/internet/ - On Mon, Feb 8, 2010 at 6:48 AM, Gang Chen gangch...@gmail.com wrote: Dear Mike, Thanks a lot for the kind help! Actually a few months ago I happened to read a couple of your posts on the R-help archive when I was exploring the possibility of using lme() in R for meta analysis. First of all, I didn't specify the meta analysis model for my cases correctly in my previous message. Currently I'm only interested in random- or mixed-effects meta analysis. So what you've suggested is directly relevant to what I've been looking for, especially for case (2). I'll try to gather those references you listed, and figure out the details. Also I think I didn't state my case (1) clearly in my previous post. In that case, all the effect sizes are the same and in the same condition too (e.g., happy), but each source has multiple samples of the measurement (and also measurement error, or standard error). Could this still be handled as a multivariate meta analysis since the samples for the the same source are correlated? Or somehow the multiple measures from the same source can be somehow summarized (weighted average?) before the meta analysis? Your suggestions are highly appreciated. Best wishes, Gang On Sun, Feb 7, 2010 at 10:39 AM, Mike Cheung mikewlche...@gmail.com wrote: Dear Gang, Here are just some general thoughts. Wolfgang Viechtbauer will be a better position to answer questions related to metafor. 
For multivariate effect sizes, we first have to estimate the asymptotic sampling covariance matrix among the effect sizes. Formulas for some common effect sizes are provided by Gleser and Olkin (2009). If a fixed-effects model is required, it is quite easy to write your own GLS function to conduct the multivariate meta-analysis (see e.g., Becker, 1992). If a random-effects model is required, it is more challenging in R. SAS Proc MIXED can do the work (e.g., van Houwelingen, Arends, & Stijnen, 2002). Sometimes, it is possible to transform the multivariate effect sizes into independent effect sizes (Kalaian & Raudenbush, 1996; Raudenbush, Becker, & Kalaian, 1988). Then a univariate meta-analysis, e.g., with the metafor package, can be performed on the transformed effect sizes. This approach works if it makes sense to pool the multivariate effect sizes as in your case (2): the effect sizes are the same but in different conditions (happy, sad, and neutral). However, this approach does not work if the multivariate effect sizes are measuring different concepts, e.g., verbal achievement and mathematical achievement. Hope this helps.
Becker, B. J. (1992). Using results from replicated studies to estimate linear models. Journal of Educational Statistics, 17, 341-362.
Gleser, L. J., & Olkin, I. (2009). Stochastically dependent effect sizes. In H. Cooper, L. V. Hedges, and J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis, 2nd edition (pp. 357-376). New York: Russell Sage Foundation.
Kalaian, H. A., & Raudenbush, S. W. (1996). A multivariate mixed linear model for meta-analysis. Psychological Methods, 1, 227-235.
Raudenbush, S. W., Becker, B. J., & Kalaian, H. (1988). Modeling multivariate effect sizes. Psychological Bulletin, 103, 111-120.
van Houwelingen, H. C., Arends, L. R., & Stijnen, T. (2002). Advanced methods in meta-analysis: multivariate approach and meta-regression. Statistics in Medicine, 21, 589-624.
Regards, Mike -- Mike W.L.
Cheung Phone: (65) 6516-3702 Department of Psychology Fax: (65) 6773-1843 National University of Singapore http://courses.nus.edu.sg/course/psycwlm/internet/ - On Sat, Feb 6, 2010 at 6:07 AM, Gang Chen gangch...@gmail.com wrote: In a classical meta analysis model y_i = X_i * beta_i + e_i, data {y_i} are assumed to be independent effect sizes. However, I'm encountering the following two scenarios: (1) Each source has multiple effect sizes, thus {y_i} are not fully independent with each other. (2) Each source has multiple effect sizes, and each of the effect size from a source can be categorized as one of a factor levels (e.g., happy, sad, and neutral). Maybe better denote the data
Re: [R] Noval numbers
The attached file gives functions to go both directions. I have used it in class for many years. This is very useful when studying machine representations of numbers, for understanding mixed-radix number systems, for example time (days, hours, minutes, seconds) or British money (pounds, shillings, pence), and for unique indexing of cells in designed experiments. Rich

## base
## Richard M. Heiberger
## See Section 12.1.4.2 of
## Richard M. Heiberger
## Computation for the Analysis of Designed Experiments
## Wiley, 1989

## defaults to 8 bit binary
base <- function(x, basis=c(2,2,2,2,2,2,2,2)) {
  cb <- rev(cumprod(c(1,basis)))
  xx <- x
  y <- rep(0, length(cb))
  for (i in 1:length(cb)) {
    yy <- xx %/% cb[i]
    if (yy > 0) {
      y[i] <- yy
      xx <- xx %% cb[i]
    }
  }
  names(y) <- cb
  y
}

baseinv <- function(y, basis=c(2,2,2,2,2,2,2,2)) {
  sum(y * rev(cumprod(c(1,basis))))
}

base(200)
baseinv(.Last.value)

## British money
basis <- c(12,20) ## 12 pence per shilling, 20 shillings per pound sterling
base(498, basis)
baseinv(.Last.value, basis)

## American weight
base(100, 16) ## 16 ounces per pound avoirdupois
baseinv(.Last.value, 16)

## time
basis <- c(60,60,24) ## 60 seconds per minute, 60 minutes per hour, 24 hours per day
x <- c(1, 2, 3, 40)
y <- baseinv(x, basis)
y
base(y, basis)

## binary arithmetic with 8 bits
basis <- c(2,2,2,2,2,2,2,2)
x <- 100
y <- base(x, basis)
y
baseinv(y, basis)
base(1)
baseinv(.Last.value)
base(200)
baseinv(.Last.value)
base(1000)
baseinv(.Last.value)

## IEEE with 53 base 2 digits
x <- c( 101, 102, 103,
        1001, 1002, 1003,
        10001, 10002, 10003 ## the last three values illustrate
)                           ## the effects of .Machine$double.eps
x
sprintf("%17.0f", x)
y <- sapply(x, base, basis=rep(2,54))
y
print(digits=17, apply(y, 2, baseinv, basis=rep(2,54)) )

## base 9
a <- base(132, c(9,9,9))
b <- base(125, c(9,9,9))
a
b
a+b
baseinv(a+b, c(9,9,9))
base(baseinv(a+b, c(9,9,9)), c(9,9,9))
Re: [R] dataframe question
On Feb 7, 2010, at 8:14 PM, David Winsemius wrote:
On Feb 7, 2010, at 7:51 PM, Vadlamani, Satish {FLNA} wrote:
Folks: Good day. Please see the code below. three_wk_out is a dataframe with columns wk1 through wk209. I want to change the format of the columns. I am trying the code below but it does not work. I need $week in the for loop interpreted as wk1, wk2, etc. Could you please help? Thanks. Satish
R code below
    week_list <- paste("wk", c(1:209), sep="")
Or more functionally:
    three_wk_out <- as.data.frame( lapply(three_wk_out, some_function) )
Or if you wanted to just change the particular columns that matched the wk pattern:
    idx <- grep("wk", names(three_wk_out))
    three_wk_out[, idx ] <- apply( three_wk_out[, idx ], 2, as.numeric)
(I probably should have used apply( ___ , 2, fn) in the prior effort rather than coercing a list back to a dataframe.)
E.g.:
      a b c x
    1 1 0 0 1
    2 2 3 2 4
    3 1 2 1 5
    4 2 0 3 2
    df <- as.data.frame(lapply(df, "^", 2))
    df
       a  b  c   x
    1  1  0  0   1
    2 16 81 16 256
    3  1 16  1 625
    4 16  0 81  16
    for (week in week_list) {
      three_wk_out$week <- as.numeric(three_wk_out$week)
    }
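A minimal sketch of the column-wise conversion discussed in this thread, assuming a small made-up data frame in place of the real three_wk_out (which has wk1 through wk209). The names id, wk1 and wk2 and all values are hypothetical. It uses lapply() on the matched columns, so each column stays a separate vector rather than being coerced to one matrix first:

```r
# Hypothetical stand-in for three_wk_out: character columns that hold numbers.
three_wk_out <- data.frame(id  = c("a", "b", "c"),
                           wk1 = c("1", "2", "3"),
                           wk2 = c("10", "20", "30"),
                           stringsAsFactors = FALSE)

# Index of the columns whose names start with "wk"
idx <- grep("^wk", names(three_wk_out))

# as.character() first, so a factor column converts by label, not by level code
three_wk_out[idx] <- lapply(three_wk_out[idx],
                            function(col) as.numeric(as.character(col)))

str(three_wk_out)
```

The original loop fails because three_wk_out$week looks for a column literally named "week"; indexing with [[week]] or with a vector of column positions, as here, uses the value of the variable instead.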
[R] split plot with aov
I have a factor SAMPLES which is at the lowest level (within) of a split plot anova model but this factor also appears in the ANOVA table at the block level. This happens for unbalanced responses but not for balanced responses. I would be grateful for an explanation of this. The block error term is Day:Treatment:Temp and the within error term is Day:Treatment:Temp:Samples. Day is the main plot, Treatment the first split, Temp is within Treatment and then Samples within Temp. Thanks, Penny B.
[R] Help with apply()
I have a 2 column data.frame:
    d[1:5,]
          a b
    1 80015 C
    2 80016 B
    3 80023 C
    4 80062 B
    5 80069 B
I want to apply a function across each row:
    for(i in 1:nrow(d)) {
      myFun(con, d[i,]$a, d[i,]$b)
    }
How do I do this using apply()? I'm unsure how to tell apply() to pass data from columns a and b for a given row as arguments to the function myFun(). Thanks in advance for any pointers, Nathan -- Dr. Nathan S. Watson-Haigh OCE Post Doctoral Fellow CSIRO Livestock Industries University Drive Townsville, QLD 4810 Australia Tel: +61 (0)7 4753 8548 Fax: +61 (0)7 4753 8600 Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
Re: [R] split plot with aov
The dummy variables for the factors in balanced designs are orthogonal. The treatment dummy variables are not orthogonal to the block dummy variables for unbalanced designs. That is essentially what the term balanced means.
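The orthogonality point can be checked numerically. The following is only a sketch on a made-up 3-block by 2-treatment layout, not the poster's Day/Treatment/Temp/Samples design. The names bal, unbal and cross_terms are hypothetical. It compares the cross-products of the centered block and treatment dummy columns with and without one missing observation:

```r
# Balanced layout: every block x treatment combination occurs exactly once.
bal <- expand.grid(block = factor(1:3), trt = factor(c("A", "B")))

# Unbalanced layout: the same design with one observation removed.
unbal <- bal[-1, ]

# Cross-products between centered block dummies and centered treatment dummies.
# A matrix of zeros means the two sets of columns are orthogonal.
cross_terms <- function(d) {
  Xb <- scale(model.matrix(~ block, d)[, -1, drop = FALSE], scale = FALSE)
  Xt <- scale(model.matrix(~ trt,   d)[, -1, drop = FALSE], scale = FALSE)
  crossprod(Xb, Xt)
}

cross_terms(bal)    # zero up to rounding error
cross_terms(unbal)  # non-zero entries
```

When the cross-products are non-zero, part of the variation attributed to a lower-level factor is confounded with the block stratum, which is consistent with a within-plot term also showing up in the block-level table for unbalanced data.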
Re: [R] mboost: Interpreting coefficients from glmboost if center=TRUE
Thanks for your reply. In fact, I do use the predict method for model assessment, and it shows that centering leads to a substantial improvement using even the bluntest of assessments of 'goodness' (i.e., binary categorization accuracy). So I agree that the package authors must have internal tools to reverse the effects of centering the variables, at least within the predict method. But it seems to me that the coefficients that I get out should be related to the values that I input, not to the centered values. In other words, centering seems like it should be done invisibly; unless I center the variables myself, I would expect the coefficients to be applicable to the original data.
I extract the coefficients returned by the model and store them in a database which is web accessible. I reconstruct models periodically, and track various statistics associated with these models in the database. This is why I highly value the fact that mboost has glmboost, which can return linearly interpretable coefficients. It is also why I do not directly call upon R every time I want to query a model. (As an aside, if I were to use R directly, I might consider the gamboost or blackboost methods, which do not return scalar coefficients that are readily extractable.)
On Sun, Feb 7, 2010 at 6:31 PM, David Winsemius dwinsem...@comcast.net wrote:
On Feb 7, 2010, at 5:03 PM, Kyle Werner wrote:
I'm running R 2.10.1 with mboost 2.0 in order to build predictive models. I am performing prediction on a binomial outcome, using a linear function (glmboost). However, I am running into some confusion regarding centering. (I am not aware of an mboost-specific mailing list, so if the main R list is not the right place for this topic, please let me know.) The boost_control() function allows for the choice between center=TRUE and center=FALSE. If I select center=FALSE, I am able to interpret the coefficients just like those from standard logistic regression.
However, if I select center=TRUE, this is no longer the case. In theory and in practice with my data, centering improves the predictions made by the model, so this is an issue worth pursuing for me. Below is output from running the exact same data in exactly the same way, only differing by whether the center bit is flipped or not:
Output with center=TRUE:
    [(Intercept)] = -0.04543632
    [painscore] = 0.007553608
    [Offset] = -0.546520621809327
Output with center=FALSE:
    [(Intercept)] = -0.989742
    [painscore] = 0.001342585
    [Offset] = -0.546520621809327
The mean of painscore is 741. It seems to me that for center=FALSE, mboost should modify the intercept by subtracting 741*0.007553608 from it (thus intercept should = -11.285). If I manually do this, the output is credible, and in the ballpark of that given by other methods (e.g., lrm or glm with a Binomial link function). If I don't do this, then the inverse logistic interpretation of the output is off by orders of magnitude. In the end, with center=TRUE, and I want to make a prediction based on the coefficients returned by mboost, the results only make sense if I manually rescale my independent variables prior to making a prediction. Is this the desired behavior, or am I doing something wrong?
I don't know, but my question is ... why aren't you using the predict method for that sort of object? Presumably the authors of the package know how to recognize the differences in the objects. Testing confirms this to be the case with the first example in the glmboost help page.
Many thanks.
David Winsemius, MD Heritage Laboratories West Hartford, CT
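The general relationship between coefficients fitted on a centered predictor and coefficients on the original scale can be illustrated with base R's glm(). This is only a sketch on simulated data: painscore and outcome here are made up, and it does not reproduce mboost's own scaling or offset handling. The slope is unchanged by centering, and the intercept on the original scale is the centered intercept minus slope times the predictor mean:

```r
set.seed(1)

# Simulated stand-ins for the poster's data (values are hypothetical)
painscore <- round(rnorm(200, mean = 741, sd = 100))
outcome   <- rbinom(200, 1, plogis(-3 + 0.004 * painscore))

# Fit once on the raw predictor and once on the mean-centered predictor
fit_raw      <- glm(outcome ~ painscore, family = binomial)
centered     <- painscore - mean(painscore)
fit_centered <- glm(outcome ~ centered, family = binomial)

# Back-transform the centered intercept to the original scale
slope  <- coef(fit_centered)[["centered"]]
a_back <- coef(fit_centered)[["(Intercept)"]] - slope * mean(painscore)

c(raw_intercept    = coef(fit_raw)[["(Intercept)"]],
  back_transformed = a_back)
```

If stored coefficients are to be applied to uncentered inputs outside R, a back-transformation of this kind (or storing the centering constants alongside the coefficients) is one way to keep the two consistent.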
Re: [R] Help with apply()
On Feb 7, 2010, at 8:26 PM, Nathan S. Watson-Haigh wrote:
I have a 2 column data.frame:
    d[1:5,]
          a b
    1 80015 C
    2 80016 B
    3 80023 C
    4 80062 B
    5 80069 B
I want to apply a function across each row:
    for(i in 1:nrow(d)) {
      myFun(con, d[i,]$a, d[i,]$b)
    }
How do I do this using apply()? I'm unsure how to tell apply() to pass data from columns a and b for a given row as arguments to the function myFun().
    apply(d, 1, function(x) myFun(x[1], x[2]) )
The reason you cannot use the $ operator is that the row is passed to the function as a vector, rather than as a list. -- David
Thanks in advance for any pointers, Nathan -- Dr. Nathan S. Watson-Haigh OCE Post Doctoral Fellow CSIRO Livestock Industries University Drive Townsville, QLD 4810 Australia Tel: +61 (0)7 4753 8548 Fax: +61 (0)7 4753 8600 Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
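One consequence of the row being passed as a vector is that apply() first converts the data frame to a matrix, so a mix of numeric and character columns arrives as character. A hedged alternative sketch uses mapply(), which walks the columns in parallel and keeps their types. Here con and myFun are placeholders invented for illustration, not the poster's actual objects:

```r
d <- data.frame(a = c(80015, 80016, 80023, 80062, 80069),
                b = c("C", "B", "C", "B", "B"),
                stringsAsFactors = FALSE)

# Hypothetical stand-ins for the poster's connection object and function
con   <- "connection-placeholder"
myFun <- function(con, a, b) paste(a + 1, b)

# apply() coerces each row to a character vector, so 'a' is no longer numeric
a_is_char <- apply(d, 1, function(row) is.character(row[["a"]]))

# mapply() passes d$a and d$b element by element, with their original types
res <- mapply(function(a, b) myFun(con, a, b), d$a, d$b)
res
```

The explicit for loop in the question does the same job; mapply() is mainly a more compact way to collect the results.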
Re: [R] dataframe question
David: Thanks for the idea. Both the one that you suggested and the one that Bill Venables suggested are very good. Unfortunately, this statement is creating out of memory issues like below (system limitations). When I had padded white space before the number, read.csv.sql is correctly treating it as a factor. I am going to take out the padding so that it treats it as numeric and then I can proceed with further steps. Satish
Out of memory warning
    Reached total allocation of 1535Mb: see help(memory.size)
    34: In ans[[i]] <- tmp : Reached total allocation of 1535Mb: see help(memory.size)
Bill Venables's suggestion below
    week_list <- paste("wk", 1:209, sep="") ### no need for c(...)
    for(week in week_list) three_wk_out[[week]] <- as.numeric(three_wk_out[[week]]) ### no need for '{...}'
Bill Venables CSIRO/CMIS Cleveland Laboratories
[R] Contributed packages
Folks: If you wanted to find out about what are the contributed packages and classify them, how would you go about it? For someone new like me, I would like to know what the possibilities are. When I click on install packages on my Windows version of R, it gives me a list but it is hard to figure out from that list what is the purpose of each package and to what class it belongs (for example, class of regular expressions). What is the equivalent of CPAN.org for Perl in R where you can browse Perl modules by category? Thanks. Satish
Re: [R] dataframe question
On Feb 7, 2010, at 11:15 PM, Vadlamani, Satish {FLNA} wrote:
David: Thanks for the idea. Both the one that you suggested and the one that Bill Venables suggested are very good. Unfortunately, this statement is creating out of memory issues like below (system limitations). When I had padded white space before the number, read.csv.sql is correctly treating it as a factor. I am going to take out the padding so that it treats it as numeric and then I can proceed with further steps.
Idea: Write the dataframe and all other useful data to a csv or tab delimited file. Save all other useful data as well. Exit without saving the workspace. Restart and read data in with correct format using colClasses argument.
    three_wk_out <- read.csv(file = "somename.csv", colClasses = rep("numeric", 209) )
Of course if it's that big, you may have problems doing anything useful with it in the space you have available. Details of your machine would be helpful, especially if you are using one of the Windows variants and have 4 GB of physical memory. There is information about this condition in the R-Win FAQ. -- David.
Satish
Out of memory warning
    Reached total allocation of 1535Mb: see help(memory.size)
    34: In ans[[i]] <- tmp : Reached total allocation of 1535Mb: see help(memory.size)
Bill Venables's suggestion below
    week_list <- paste("wk", 1:209, sep="") ### no need for c(...)
    for(week in week_list) three_wk_out[[week]] <- as.numeric(three_wk_out[[week]]) ### no need for '{...}'
Bill Venables CSIRO/CMIS Cleveland Laboratories
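The write-out, restart, and re-read idea can be sketched on a tiny file. This assumes a temporary CSV with three made-up wk columns rather than the real 209-column data. The point is that colClasses fixes the column types at read time, so no factor or character copy has to be converted afterwards:

```r
# Write a small hypothetical CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("wk1,wk2,wk3",
             "1,2,3",
             "4,5,6"), tmp)

# Declare every column numeric up front instead of converting after the fact
three_wk_out <- read.csv(tmp, colClasses = rep("numeric", 3))

sapply(three_wk_out, class)

unlink(tmp)  # clean up the temporary file
```

For the real data the vector passed to colClasses would need one entry per column in the file, in file order.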
Re: [R] Contributed packages
On Feb 7, 2010, at 11:27 PM, Vadlamani, Satish {FLNA} wrote:
Folks: If you wanted to find out about what are the contributed packages and classify them, how would you go about it? For someone new like me, I would like to know what the possibilities are. When I click on install packages on my Windows version of R, it gives me a list but it is hard to figure out from that list what is the purpose of each package and to what class it belongs (for example, class of regular expressions). What is the equivalent of CPAN.org for Perl in R where you can browse Perl modules by category? Thanks.
The CRAN Task Views.
Satish
[R] ggplot2 stacked line plot
Hi all, I have been hunting around for hours trying to figure out how to generate a stacked line chart using ggplot2. This type of chart can be generated in Excel 2007 by selecting: Chart type > Line > Stacked line. I can generate a stacked area chart using the following code:
    p <- ggplot(~, aes(x = ~, y = ~, colour = Type)) + geom_area(aes(position = 'stack', fill = Type))
However, when I try and replicate this using the following code for geom_line:
    p <- ggplot(~, aes(x = ~, y = ~, colour = Type)) + geom_line(aes(position = 'stack'))
the resulting plot is not stacked - i.e. each 'Type' is plotted at its actual value rather than cumulatively to form a stacked chart... I have pored through Hadley's ggplot2 book (ggplot2: elegant graphics for data analysis), the R help list and also done general google searching but cannot find a way to generate this type of plot. R version: 2.9.2 ggplot2 version: 0.8.5 OS: Windows 7 (64-bit). Any suggestions or assistance would be greatly appreciated. Regards, Liam
[R] problem with Tinn-R
Hi, I installed Tinn-R 2.3.4.4, and when I try to execute a calculation it gives me this error: The preferred Rterm not defined. Thank you so much for any help given.
Re: [R] ggplot2 stacked line plot
Hi,
On Sun, Feb 7, 2010 at 11:40 PM, Liam Blanckenberg liam.blanckenb...@gmail.com wrote:
Hi all, I have been hunting around for hours trying to figure out how to generate a stacked line chart using ggplot2. This type of chart can be generated in Excel 2007 by selecting: Chart type > Line > Stacked line. I can generate a stacked area chart using the following code:
    p <- ggplot(~, aes(x = ~, y = ~, colour = Type)) + geom_area(aes(position = 'stack', fill = Type))
However, when I try and replicate this using the following code for geom_line:
    p <- ggplot(~, aes(x = ~, y = ~, colour = Type)) + geom_line(aes(position = 'stack'))
the resulting plot is not stacked - i.e. each 'Type' is plotted at its actual value rather than cumulatively to form a stacked chart... I have pored through Hadley's ggplot2 book (ggplot2: elegant graphics for data analysis), the R help list and also done general google searching but cannot find a way to generate this type of plot. R version: 2.9.2 ggplot2 version: 0.8.5 OS: Windows 7 (64-bit). Any suggestions or assistance would be greatly appreciated.
Are you trying to show a graph that looks like Figure 4.5 from this page? http://learnr.wordpress.com/2009/07/02/ggplot2-version-of-figures-in-lattice-multivariate-data-visualization-with-r-part-4/ sans the coord_flip(), perhaps? That website is a good resource for ggplot graphics. He ran a whole series recreating the graphs in the lattice graphics book with ggplot2. His final post on that subject included a link to a pdf with the code and graphics for all the posts in that series for easy scanning, too. If this isn't the graph you wanted, perhaps you can skim that document to see if there's a graphic that resembles what you're after.
-steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] specifying colors in a heatmap/image -like plot
On 02/08/2010 08:57 AM, kerimcan wrote:
Hi, I have searched for a solution but I failed to find an answer. I am hoping you may be able to help me. I have a data set where I have observations for a number of units (n =~ 40) over a period of time (t =~ 100) and I have a variable (Z) that codes a categorical variable for each observation. I want to produce a 2D plot where time is on the x-axis and units are on the y-axis. Then each block on the 2-d plot should take a color depending on variable Z. Z is not ordered so using a scale (like in heatmaps) does not make sense. In fact the values of Z have meanings that are intuitively related to colors (e.g. Z=3 means involvement by the United Nations so I want its color to be blue). Below is some code that gives an example of what I am aiming to do and why heatmap and image functions don't work for me. Thanks in advance for your help.
    # Example: Suppose Z had 3 values (0,1,2) and I had 8 observations.
    hitmep <- matrix(c(0,2,1,0,2,1,1,0),2,4)
    # Graph 1:
    heatmap(hitmep, Rowv = NA, Colv = NA, labRow = NULL, scale = "none")
    # Graph 2:
    image(t(hitmep), axes = FALSE)
    # I like the layout of the plots. My problem with these is that I don't want Z's values (0,1,2) to have colors on a scale. I want to specify, for example, 1=blue, 2=yellow and 3=green. Do you know how to do this?
Hi Kerim, You can do this with color2D.matplot (plotrix) as well as with image or heatmap. Just pass the desired color vector as the cellcolors argument. Jim
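A sketch of the fixed-colour idea with base graphics only, using the hitmep matrix from the question. The colour assignment (0 = blue, 1 = yellow, 2 = green) and the names pal, brks and cell_cols are assumptions made for illustration. With image(), supplying breaks that sit halfway between the category codes maps each code to exactly one entry of col:

```r
hitmep <- matrix(c(0, 2, 1, 0, 2, 1, 1, 0), 2, 4)

# One colour per category code (assumed mapping); names are the codes as text
pal <- c("0" = "blue", "1" = "yellow", "2" = "green")

# Breaks halfway between the codes, so each code falls into its own bin
brks <- c(-0.5, 0.5, 1.5, 2.5)

pdf(NULL)  # off-screen device so the sketch also runs non-interactively
image(t(hitmep), col = unname(pal), breaks = brks, axes = FALSE)
invisible(dev.off())

# The same mapping as a matrix of colour names, one entry per cell
cell_cols <- matrix(pal[as.character(hitmep)], nrow = nrow(hitmep))
cell_cols
```

A per-cell colour matrix like cell_cols is presumably the kind of input meant by the cellcolors argument mentioned in the reply, but that call is not run or verified here.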
Re: [R] Help with apply()
On 02/08/2010 12:26 PM, Nathan S. Watson-Haigh wrote:
I have a 2 column data.frame:
    d[1:5,]
          a b
    1 80015 C
    2 80016 B
    3 80023 C
    4 80062 B
    5 80069 B
I want to apply a function across each row:
    for(i in 1:nrow(d)) {
      myFun(con, d[i,]$a, d[i,]$b)
    }
How do I do this using apply()? I'm unsure how to tell apply() to pass data from columns a and b for a given row as arguments to the function myFun().
Hi Nathan, apply doesn't work with data frames unless they can be coerced to matrices or arrays (and sometimes not even then). What's wrong with using the code you have above? Jim
Re: [R] ggplot2 stacked line plot
On 02/08/2010 03:40 PM, Liam Blanckenberg wrote:
Hi all, I have been hunting around for hours trying to figure out how to generate a stacked line chart using ggplot2. This type of chart can be generated in Excel 2007 by selecting: Chart type > Line > Stacked line. I can generate a stacked area chart using the following code:
    p <- ggplot(~, aes(x = ~, y = ~, colour = Type)) + geom_area(aes(position = 'stack', fill = Type))
However, when I try and replicate this using the following code for geom_line:
    p <- ggplot(~, aes(x = ~, y = ~, colour = Type)) + geom_line(aes(position = 'stack'))
the resulting plot is not stacked - i.e. each 'Type' is plotted at its actual value rather than cumulatively to form a stacked chart... I have pored through Hadley's ggplot2 book (ggplot2: elegant graphics for data analysis), the R help list and also done general google searching but cannot find a way to generate this type of plot. R version: 2.9.2 ggplot2 version: 0.8.5 OS: Windows 7 (64-bit).
Hi Liam, Are you looking for something like stackpoly (plotrix package)? It's not ggplot, but it might do what you want. Jim
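One package-independent way to get stacked lines is to compute the cumulative values first and then draw ordinary lines at those heights. The sketch below uses base R only, on made-up data: the data frame dat and its columns Year, Type and Value are hypothetical. In ggplot2, position is a layer argument rather than an aesthetic, so it would normally be written outside aes(), as in geom_line(position = "stack"); whether that behaves as wanted in version 0.8.5 is not verified here.

```r
# Hypothetical long-format data: one row per (Year, Type)
dat <- data.frame(Year  = rep(2001:2004, times = 3),
                  Type  = rep(c("A", "B", "C"), each = 4),
                  Value = c(1, 2, 3, 4,   2, 2, 2, 2,   5, 4, 3, 2))

# Order by Type within Year, then take the running total within each Year
dat <- dat[order(dat$Year, dat$Type), ]
dat$Stacked <- ave(dat$Value, dat$Year, FUN = cumsum)

# Reshape to a Year x Type matrix of cumulative heights
wide <- tapply(dat$Stacked, list(dat$Year, dat$Type), sum)

pdf(NULL)  # off-screen device so the sketch also runs non-interactively
matplot(as.numeric(rownames(wide)), wide, type = "l", lty = 1,
        xlab = "Year", ylab = "Cumulative value")
invisible(dev.off())

wide
```

The Stacked column could equally be handed to any plotting system as the y variable, since the stacking has already been done in the data.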
Re: [R] problem with Tinn-R
Roslina Zakaria wrote:
I install Tinn-R 2.3.4.4 and when I want to execute the calculation, it gives me this error: The preferred Rterm not defined.
Set the path to Rterm in Options/Application/R/Path
Dieter